Lengths of absolute filepaths

2022-01-03

While there may not be universal agreement as to what a reasonable maximum path length is, I was wondering how long paths are on average in practice.

I started off by making a list of every absolute path on my system (as root):

find / > /tmp/files.txt

This produced a 1.6G file, so I wrote a script in Rust to collect frequencies as well as an example string for each length:

use std::env::args;
use std::error::Error;
use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() -> Result<(), Box<dyn Error>> {
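    // One (count, example path) slot per length; a line of 65536 bytes or more would panic.
    const V: (u32, Vec<u8>) = (0, Vec::new());
    let mut freqmap = [V; 1 << 16];
    let f = args().nth(1).ok_or("expected filename")?;
    let f = File::open(f)?;
    let f = BufReader::new(f);
    // Split on raw bytes, since paths are not guaranteed to be valid UTF-8.
    for l in f.split(b'\n') {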
        let l = l?;
        let i = l.len();
        freqmap[i].0 += 1;
        freqmap[i].1 = l;
    }
    for (i, (c, l)) in freqmap.iter().enumerate() {
        if *c != 0 {
            println!("{:>4} {:>8} '{}'", i, c, String::from_utf8_lossy(l));
        }
    }
    Ok(())
}
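Since the original question was about the average length, the frequency map can be reduced to a mean and a median. The following is only a minimal sketch: it assumes the script's output was saved to a file such as files_freqmap.txt (the name used in the gnuplot command further down), and the function names parse_freqmap and summarize are made up for illustration.

```python
import sys


def parse_freqmap(lines):
    """Yield (length, count) pairs from the frequency map's first two columns."""
    for line in lines:
        fields = line.split(None, 2)
        # Skip anything that does not start with two integer columns.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            yield int(fields[0]), int(fields[1])


def summarize(pairs):
    """Return (total paths, mean length, lower median length)."""
    pairs = sorted(pairs)
    total = sum(count for _, count in pairs)
    if total == 0:
        raise ValueError("no paths counted")
    mean = sum(length * count for length, count in pairs) / total
    # Weighted median: the first length at which the running count
    # covers at least half of all paths.
    seen = 0
    for length, count in pairs:
        seen += count
        if 2 * seen >= total:
            return total, mean, length


if __name__ == "__main__" and len(sys.argv) > 1:
    # newline="\n" keeps a stray carriage return inside an example path
    # from being treated as a line break.
    with open(sys.argv[1], errors="replace", newline="\n") as f:
        total, mean, median = summarize(parse_freqmap(f))
    print(f"{total} paths, mean length {mean:.1f}, median length {median}")
```

Like the Rust script, this measures lengths in bytes, because that is what the first column of the frequency map contains.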

The longest filepaths

On my system, the longest filepath is 344 characters long. The path in question is:

/tank/backup/backup/pc1/david/.cache/yarn/v6/npm-socketcluster-14.4.1-e39883c005becbf1d6dba2ced7e04bbfa857693d-integrity/node_modules/socketcluster/sample/node_modules/socketcluster/sample/node_modules/socketcluster/sample/node_modules/socketcluster/sample/node_modules/scc-broker-client/node_modules/socketcluster-client/lib/scsocketcreator.js

Removing all paths that contain node_modules with rg -Nv node_modules files.txt and rerunning the script shows that the longest path without node_modules is 272 characters long:

/tank/backup/backup/pc1/david/Documents/compilers/osxcross/target/SDK/MacOSX11.1.sdk/System/iOSSupport/System/Library/Frameworks/_AuthenticationServices_SwiftUI.framework/Versions/A/Modules/_AuthenticationServices_SwiftUI.swiftmodule/x86_64-apple-ios-macabi.swiftinterface

The one path that is 271 characters long is:

/tank/backup/disaster/software/os/yocto/poky-support/poky-contrib-archive/scripts/lib/bsp/substrate/target/arch/layer/{{ if create_example_bbappend == "y": }} recipes-example-bbappend/example-bbappend/{{=example_bbappend_name}}-{{=example_bbappend_version}}/example.patch

Removing all instances involving osxcross and yocto reveals no further obvious patterns, other than that the files are part of a software project (with .cabal files as the sole exception).
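The repeated filter-and-rerun step in this section (drop every path containing some substring, then look for the longest one left) can also be done in a single pass. This is a rough sketch rather than what was actually run; longest_without is a hypothetical helper, and it works on bytes so that lengths match what the Rust script counts.

```python
import sys


def longest_without(lines, needles):
    """Return the longest line containing none of `needles`, or None."""
    best = None
    for line in lines:
        line = line.rstrip(b"\n")
        if any(needle in line for needle in needles):
            continue
        # On a tie, the first line of that length is kept.
        if best is None or len(line) > len(best):
            best = line
    return best


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example invocation: script.py files.txt node_modules osxcross yocto
    needles = [arg.encode() for arg in sys.argv[2:]]
    with open(sys.argv[1], "rb") as f:
        best = longest_without(f, needles)
    if best is not None:
        print(len(best), best.decode(errors="replace"))
```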

Outliers

Plotting the frequency map with gnuplot reveals an interesting graph:

Plot

gnuplot> set style data histograms
gnuplot> plot './files_freqmap.txt' using 2:xtic(1)

It seems the lengths follow a roughly normal distribution, but some stand out, such as the filepaths with a length of 64. Filtering those files with rg '^.{64}$' files.txt (the anchors are needed, since a bare .{64} also matches every longer line) and making a frequency map of the first 31 characters shows that the most frequent paths (1004732 instances) start with /tank/backup/disaster/software/. The second most common prefix is /tank/backup/backup/vm0/tank/po and only occurs 6484 times.

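import sys
from collections import Counter, defaultdict

# Count how often each 31-character prefix occurs.
with open(sys.argv[1]) as f:
    counts = Counter(line[:31] for line in f)

# Group prefixes by their count so that equal counts are printed together.
by_count = defaultdict(list)
for prefix, n in counts.items():
    by_count[n].append(prefix)

# Print in ascending order of count; the most frequent prefix comes last.
for n in sorted(by_count):
    print(n)
    for prefix in by_count[n]:
        print(' ', prefix)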

Filtering for files starting with that prefix gives a very long list of SVN files:

...
/tank/backup/disaster/software/web/apache/db/revprops/706/706302
/tank/backup/disaster/software/web/apache/db/revprops/706/706296
/tank/backup/disaster/software/web/apache/db/revprops/706/706375
...
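The two steps used in this section, selecting paths of one exact length and then counting their prefixes, can be combined into one pass over files.txt. The sketch below is an illustration under assumptions, not the procedure used above: prefix_counts and its parameters are hypothetical names, and lengths are measured in bytes to stay consistent with the Rust script.

```python
import sys
from collections import Counter


def prefix_counts(lines, length=64, prefix_len=31):
    """Count the prefixes of all lines that are exactly `length` bytes long."""
    counts = Counter()
    for line in lines:
        line = line.rstrip(b"\n")
        if len(line) == length:
            counts[line[:prefix_len]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        counts = prefix_counts(f)
    # Show the five most common prefixes, most frequent first.
    for prefix, n in counts.most_common(5):
        print(n, prefix.decode(errors="replace"))
```

Changing length makes it easy to inspect any other spike in the plot the same way.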