Directory Traversal

Add these crates to your own project:

cargo add glob walkdir

Find all png files recursively

glob-badge cat-filesystem-badge

Recursively find all PNG files in the current directory. In this case, the ** pattern matches the current directory and all subdirectories.

Use the ** pattern in any path portion. For example, /media/**/*.png matches all PNGs in media and it's subdirectories.

use std::error::Error;
use glob::glob;

fn main() -> Result<(), Box<dyn Error>> {
    for entry in glob("**/*.png")? {
        println!("{}", entry?.display());
    }

    Ok(())
}

Recursively calculate file sizes at given depth

walkdir-badge cat-filesystem-badge

Recursion depth can be flexibly set by WalkDir::min_depth & WalkDir::max_depth methods. Calculates sum of all file sizes to 3 subfolders depth, ignoring files in the root folder.

use walkdir::WalkDir;

fn main() {
    let total_size = WalkDir::new(".")
        .min_depth(1)
        .max_depth(3)
        .into_iter()
        .filter_map(|entry| entry.ok())
        .filter_map(|entry| entry.metadata().ok())
        .filter(|metadata| metadata.is_file())
        .fold(0, |acc, m| acc + m.len());

    println!("Total size: {} bytes.", total_size);
}

WalkDir uses the builder pattern to set the arguments, then this struct is converted to an iterator. For each file that exists and has metadata, the size is retrieved and then summed.

File names that have been modified in the last 24 hours

std-badge cat-filesystem-badge

Gets the current working directory by calling env::current_dir, then for each entries in fs::read_dir, extracts the DirEntry::path and gets the metadata via fs::Metadata. The Metadata::modified returns the SystemTime::elapsed time since last modification. Duration::as_secs converts the time to seconds and compared with 24 hours (24 * 60 * 60 seconds). Metadata::is_file filters out directories.

use std::error::Error;
use std::{env, fs};

fn main() -> Result<(), Box<dyn Error>> {
    let current_dir = env::current_dir()?;
    println!(
        "Entries modified in the last 24 hours in {:?}:",
        current_dir
    );

    for entry in fs::read_dir(current_dir)? {
        let entry = entry?;
        let path = entry.path();

        let metadata = fs::metadata(&path)?;
        let last_modified = metadata.modified()?.elapsed()?.as_secs();

        if last_modified < 24 * 3600 && metadata.is_file() {
            println!(
                "Last modified: {:?} seconds, is read only: {:?}, size: {:?} bytes, filename: {:?}",
                last_modified,
                metadata.permissions().readonly(),
                metadata.len(),
                path.file_name().ok_or("No filename")?
            );
        }
    }

    Ok(())
}

Find loops for a given path

same_file-badge cat-filesystem-badge

Because it uses symbolic links, this example will run on:

  • Linux / Unix
  • MacOS
  • Windows Services for Linux (WSL)

Add the same-file crate to your own project:

cargo add same-file

Use same_file::is_same_file to detect loops for a given path. For example, a loop could be created on a Unix system via symlinks:

mkdir -p /tmp/foo/bar/baz
ln -s /tmp/foo/  /tmp/foo/bar/baz/qux

The following would assert that a loop exists.

use std::error::Error;
use std::io;
use std::path::{Path, PathBuf};
use std::fs::{create_dir_all, remove_file};
use std::os::unix::fs::symlink;
use same_file::is_same_file;

fn contains_loop<P: AsRef<Path>>(path: P) -> io::Result<Option<(PathBuf, PathBuf)>> {
    let path = path.as_ref();
    let mut path_buf = path.to_path_buf();
    while path_buf.pop() {
        if is_same_file(&path_buf, path)? {
            return Ok(Some((path_buf, path.to_path_buf())));
        } else if let Some(looped_paths) = contains_loop(&path_buf)? {
            return Ok(Some(looped_paths));
        }
    }
    return Ok(None);
}

fn main() -> Result<(), Box<dyn Error>>{

    create_dir_all("/tmp/foo/bar/baz")?;
    let _ = remove_file("/tmp/foo/bar/baz/qux"); // Dont't care if the file doesn't exist yet.
    symlink("/tmp/foo", "/tmp/foo/bar/baz/qux")?;

    assert_eq!(
        contains_loop("/tmp/foo/bar/baz/qux/bar/baz").unwrap(),
        Some((
            PathBuf::from("/tmp/foo"),
            PathBuf::from("/tmp/foo/bar/baz/qux")
        ))
    );

    Ok(())
}

The Unix / Linux find utility also detects the loop:

$ find /tmp/foo | xargs file
/tmp/foo:             directory
/tmp/foo/bar:         directory
/tmp/foo/bar/baz:     directory
/tmp/foo/bar/baz/qux: symbolic link to /tmp/foo
$ find -L /tmp/foo
find: File system loop detected; ‘/tmp/foo/bar/baz/qux’ is part of the same file system loop as ‘/tmp/foo’.
/tmp/foo
/tmp/foo/bar
/tmp/foo/bar/baz

Recursively find duplicate file names

walkdir-badge cat-filesystem-badge

Find recursively in the current directory any duplicate filenames, printing them only once.

A [HashMap] is used to collect the filenames, the file name itself is the key, and the count of how many times that filename has been seen is the value stored in the HashMap. Any errors encountered, such as a directory that does not have read + execute permissions for the current user are silently ignored.

use std::collections::HashMap;
use walkdir::WalkDir;

fn main() {
    let mut filenames = HashMap::new();

    for entry in WalkDir::new(".")
            .into_iter()
            .filter_map(Result::ok)
            .filter(|e| !e.file_type().is_dir()) {
        let f_name = String::from(entry.file_name().to_string_lossy());
        let counter = filenames.entry(f_name.clone()).or_insert(0);
        *counter += 1;

        if *counter == 2 {
            println!("{}", f_name);
        }
    }
}

Recursively find all files with given predicate

walkdir-badge cat-filesystem-badge

Find JSON files modified within the last day in the current directory. Using follow_links ensures symbolic links are followed like they were normal directories and files.

use std::error::Error;
use walkdir::WalkDir;

fn main() -> Result<(), Box<dyn Error>> {
    for entry in WalkDir::new(".")
            .follow_links(true)
            .into_iter()
            .filter_map(|e| e.ok()) {
        let f_name = entry.file_name().to_string_lossy();
        let sec = entry.metadata()?.modified()?;

        if f_name.ends_with(".json") && sec.elapsed()?.as_secs() < 86400 {
            println!("{}", f_name);
        }
    }

    Ok(())
}

Traverse directories while skipping dotfiles

walkdir-badge cat-filesystem-badge

Uses filter_entry to descend recursively into entries passing the is_not_hidden predicate thus skipping hidden files and directories. Iterator::filter applies to each WalkDir::DirEntry even if the parent is a hidden directory.

Root dir "." yields through WalkDir::depth usage in is_not_hidden predicate.

use walkdir::{DirEntry, WalkDir};

fn is_not_hidden(entry: &DirEntry) -> bool {
    entry
         .file_name()
         .to_str()
         .map(|s| entry.depth() == 0 || !s.starts_with("."))
         .unwrap_or(false)
}

fn main() {
    WalkDir::new(".")
        .into_iter()
        .filter_entry(|e| is_not_hidden(e))
        .filter_map(|v| v.ok())
        .for_each(|x| println!("{}", x.path().display()));
}

Find all files with given pattern ignoring filename case.

glob-badge cat-filesystem-badge

Find all image files in the /media/ directory matching the img_[0-9]*.png pattern.

A custom MatchOptions struct is passed to the glob_with function making the glob pattern case insensitive while keeping the other options Default.

use std::error::Error;
use glob::{glob_with, MatchOptions};


fn main() -> Result<(), Box<dyn Error>> {
    let options = MatchOptions {
        case_sensitive: false,
        ..Default::default()
    };

    for entry in glob_with("test_data/foo_[0-9]*.txt", options)? {
        println!("{}", entry?.display());
    }

    Ok(())
}