Lengths of absolute filepaths
2022-01-03
While there may not be universal agreement as to what a reasonable maximum path length is, I was wondering how long paths are on average in practice.
I started off by making a list of every absolute path on my system (as root):
find / > /tmp/files.txt
This produced a 1.6 GB file, so I wrote a Rust script that counts how many paths have each length and keeps one example path per length:
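use std::env::args;
use std::error::Error;
use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() -> Result<(), Box<dyn Error>> {
    // One (count, example path) slot per possible length in bytes.
    // A line of 65,536 bytes or more would panic on the out-of-bounds index.
    const V: (u32, Vec<u8>) = (0, Vec::new());
    let mut freqmap = [V; 1 << 16];

    let f = args().nth(1).ok_or("expected filename")?;
    let f = BufReader::new(File::open(f)?);

    // Split on '\n' as raw bytes, since paths need not be valid UTF-8.
    for l in f.split(b'\n') {
        let l = l?;
        let i = l.len();
        freqmap[i].0 += 1; // count paths of this length
        freqmap[i].1 = l; // keep the most recent path as the example
    }

    // Print "length count 'example'" for every length that occurs.
    for (i, (c, l)) in freqmap.iter().enumerate() {
        if *c != 0 {
            println!("{:>4} {:>8} '{}'", i, c, String::from_utf8_lossy(l));
        }
    }
    Ok(())
}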
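The same histogram can be sketched in Python with collections.Counter. This is an illustrative equivalent, not the script that produced the numbers in this post. Unlike the fixed 2^16-entry array in the Rust version, a Counter has no upper limit on path length. The file is read as bytes, so lengths are byte counts, matching `l.len()` in the Rust code. `length_histogram` and `read_paths` are hypothetical helper names.

```python
import sys
from collections import Counter


def length_histogram(lines):
    """Count paths per byte length and remember the last path seen at each length.

    `lines` is an iterable of bytes objects without trailing newlines.
    """
    counts = Counter()
    examples = {}
    for path in lines:
        n = len(path)
        counts[n] += 1
        examples[n] = path  # like the Rust version, keep the most recent example
    return counts, examples


def read_paths(filename):
    # Binary mode: filenames on Linux are bytes and need not be valid UTF-8.
    with open(filename, "rb") as f:
        for line in f:
            yield line.rstrip(b"\n")


if __name__ == "__main__":
    counts, examples = length_histogram(read_paths(sys.argv[1]))
    for n in sorted(counts):
        example = examples[n].decode("utf-8", errors="replace")
        print(f"{n:>4} {counts[n]:>8} '{example}'")
```

The output format mirrors the Rust script's, so either version can feed the plot further down.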
The longest filepaths
On my system, the longest filepath is 344 characters long. The path in question is:
/tank/backup/backup/pc1/david/.cache/yarn/v6/npm-socketcluster-14.4.1-e39883c005becbf1d6dba2ced7e04bbfa857693d-integrity/node_modules/socketcluster/sample/node_modules/socketcluster/sample/node_modules/socketcluster/sample/node_modules/socketcluster/sample/node_modules/scc-broker-client/node_modules/socketcluster-client/lib/scsocketcreator.js
Removing all paths that contain node_modules with rg -Nv node_modules files.txt and rerunning the script shows that the longest path without node_modules is 272 characters long:
/tank/backup/backup/pc1/david/Documents/compilers/osxcross/target/SDK/MacOSX11.1.sdk/System/iOSSupport/System/Library/Frameworks/_AuthenticationServices_SwiftUI.framework/Versions/A/Modules/_AuthenticationServices_SwiftUI.swiftmodule/x86_64-apple-ios-macabi.swiftinterface
The one path that is 271 characters long is:
/tank/backup/disaster/software/os/yocto/poky-support/poky-contrib-archive/scripts/lib/bsp/substrate/target/arch/layer/{{ if create_example_bbappend == "y": }} recipes-example-bbappend/example-bbappend/{{=example_bbappend_name}}-{{=example_bbappend_version}}/example.patch
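The "filter with rg -Nv, then rerun the script" step can also be folded into a single pass. Below is a minimal sketch, assuming the path list from above. `longest_path` is a hypothetical helper, and the substrings to exclude are passed on the command line.

```python
import sys


def longest_path(paths, exclude=()):
    """Return the longest path containing none of the `exclude` substrings.

    Ties go to the path seen first; returns None if every path is excluded.
    """
    best = None
    for path in paths:
        if any(word in path for word in exclude):
            continue
        if best is None or len(path) > len(best):
            best = path
    return best


if __name__ == "__main__":
    # Hypothetical usage: python longest.py files.txt node_modules
    with open(sys.argv[1], "rb") as f:
        paths = (line.rstrip(b"\n") for line in f)
        words = [w.encode() for w in sys.argv[2:]]
        result = longest_path(paths, words)
    if result is not None:
        print(len(result), result.decode("utf-8", errors="replace"))
```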
Removing all paths involving osxcross and yocto reveals no further obvious patterns, other than that the files are part of a software project (with .cabal files as the sole exception).
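Spotting patterns among the remaining long paths is easier with a short list of the longest ones and a tally of their file extensions. This is a sketch only. `longest_paths` and `extension_tally` are hypothetical helpers, and the cut-off of 50 paths is an arbitrary choice.

```python
import heapq
import os.path
import sys
from collections import Counter


def longest_paths(paths, exclude, n=20):
    """Return the n longest paths, longest first, that avoid every excluded substring."""
    kept = (p for p in paths if not any(word in p for word in exclude))
    return heapq.nlargest(n, kept, key=len)


def extension_tally(paths):
    """Count file extensions such as b'.cabal'; b'' means no extension."""
    return Counter(os.path.splitext(p)[1] for p in paths)


if __name__ == "__main__":
    exclude = [b"node_modules", b"osxcross", b"yocto"]
    with open(sys.argv[1], "rb") as f:
        paths = (line.rstrip(b"\n") for line in f)
        top = longest_paths(paths, exclude, n=50)  # 50 is an arbitrary cut-off
    for p in top:
        print(len(p), p.decode("utf-8", errors="replace"))
    print(extension_tally(top).most_common())
```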
Outliers
Plotting the frequency map with gnuplot reveals an interesting graph:
gnuplot> set style data histograms
gnuplot> plot './files_freqmap.txt' using 2:xtic(1)
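One way to put a rough number on "roughly normal" is to fit a normal curve to the histogram and flag lengths that sit far above it. This sketch makes a few assumptions that are not part of the original analysis. It expects files_freqmap.txt to be the Rust script's output. Its thresholds (more than 3 times the expected count, and at least 1000 paths) are arbitrary. A large spike also pulls the fitted mean and standard deviation toward itself, so treat this as a screening heuristic rather than a statistical test.

```python
import math
import sys


def read_freqmap(filename):
    """Parse lines of the form `len count 'example'` into {length: count}."""
    counts = {}
    with open(filename, encoding="utf-8", errors="replace") as f:
        for line in f:
            fields = line.split(maxsplit=2)
            if len(fields) >= 2:
                counts[int(fields[0])] = int(fields[1])
    return counts


def normal_outliers(counts, ratio=3.0, min_count=1000):
    """Flag lengths whose count is far above a fitted normal curve.

    The curve uses the sample mean and standard deviation. Each integer length
    is a bin of width 1, so the expected count at length n is total * pdf(n).
    `ratio` and `min_count` are arbitrary, assumed thresholds.
    """
    total = sum(counts.values())
    mean = sum(n * c for n, c in counts.items()) / total
    var = sum(c * (n - mean) ** 2 for n, c in counts.items()) / total
    sd = math.sqrt(var)
    if sd == 0:
        return mean, sd, []
    outliers = []
    for n, c in sorted(counts.items()):
        z = (n - mean) / sd
        expected = total * math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))
        if c >= min_count and c > ratio * expected:
            outliers.append((n, c, expected))
    return mean, sd, outliers


if __name__ == "__main__":
    mean, sd, outliers = normal_outliers(read_freqmap(sys.argv[1]))
    print(f"mean {mean:.1f}, sd {sd:.1f}")
    for n, c, expected in outliers:
        print(f"{n:>4} {c:>8} (expected about {expected:.0f})")
```

The `min_count` floor matters in practice. Far in the tails, the fitted curve predicts almost nothing, so even a single path there would otherwise be flagged.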
The lengths seem to follow a roughly normal distribution, but some stand out, such as paths with a length of 64. Filtering those with rg '.{64}' files.txt and making a frequency map of their first 31 characters shows that the most common prefix, /tank/backup/disaster/software/, occurs 1004732 times. The second most common prefix, /tank/backup/backup/vm0/tank/po, occurs only 6484 times.
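import sys

# Count how often each 31-character prefix occurs.
fm = {}
with open(sys.argv[1]) as f:
    for l in f:
        l = l[:31]
        fm[l] = fm.get(l, 0) + 1

# Bucket prefixes by their count (assumes no count reaches 2^20).
f = [[] for _ in range(1 << 20)]
for l, i in fm.items():
    f[i].append(l)

# Print each count, followed by the prefixes that occur that many times.
for i, l in enumerate(f):
    if len(l) > 0:
        print(i)
        for e in l:
            print(' ', e)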
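Two details of this approach are worth keeping in mind. First, rg '.{64}' matches any line that contains 64 characters, so it keeps lines of at least 64 characters, whereas rg -x '.{64}' would keep only lines of exactly 64. Second, the bucket array raises IndexError if any prefix occurs 2^20 (1,048,576) or more times. The sketch below avoids both issues by using an exact byte-length test and Counter.most_common. `top_prefixes` is a hypothetical helper name, and lengths are in bytes, which agree with rg's character counts only for ASCII paths.

```python
import sys
from collections import Counter


def top_prefixes(paths, length, prefix_len=31, n=10):
    """Most common `prefix_len`-byte prefixes among paths exactly `length` bytes long."""
    # Exact match, closer to `rg -x '.{64}'` than to `rg '.{64}'`.
    counts = Counter(p[:prefix_len] for p in paths if len(p) == length)
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical usage: python prefixes.py files.txt 64
    with open(sys.argv[1], "rb") as f:
        paths = (line.rstrip(b"\n") for line in f)
        for prefix, count in top_prefixes(paths, int(sys.argv[2])):
            print(f"{count:>8} {prefix.decode('utf-8', errors='replace')}")
```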
Filtering for paths that start with that prefix yields a very long list of SVN revision-property (revprops) files:
...
/tank/backup/disaster/software/web/apache/db/revprops/706/706302
/tank/backup/disaster/software/web/apache/db/revprops/706/706296
/tank/backup/disaster/software/web/apache/db/revprops/706/706375
...
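The spike at 64 has a plausible arithmetic explanation. These paths look like Subversion's sharded FSFS layout, where revision r is stored under the directory r // 1000. Assume that shard size, which matches 706/706302 in the listing. The part up to and including revprops/ is then 54 bytes, so every revision from 100000 to 999999 lands on a path of 54 + 3 (shard) + 1 (slash) + 6 (revision) = 64 bytes. `revprops_path` is a hypothetical helper for illustration.

```python
def revprops_path(root, rev, shard_size=1000):
    """Hypothetical helper: revprops file for `rev`, assuming FSFS sharding by `shard_size`."""
    return f"{root}/db/revprops/{rev // shard_size}/{rev}"


if __name__ == "__main__":
    root = "/tank/backup/disaster/software/web/apache"
    for rev in (9999, 99999, 706302, 1000000):
        path = revprops_path(root, rev)
        print(len(path), path)
    # Prints lengths 60, 62, 64 and 66 respectively.
```

Under this assumption, only six-digit revision numbers would produce 64-byte paths. Shorter numbers give shorter paths, and seven-digit numbers give 66 bytes.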