# Lengths of absolute filepaths

2022-01-03

While there may not be universal agreement as to what a reasonable maximum path length is, I was wondering how long paths are on average in practice.

I started off by making a list of every absolute path on my system (as root):

```sh
find / > /tmp/files.txt
```

This produced a 1.6G file, so I wrote a script in Rust to collect frequencies as well as an example string for each length:

```rs
use std::env::args;
use std::error::Error;
use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() -> Result<(), Box<dyn Error>> {
    // One (count, example path) slot per possible length in bytes.
    const V: (u32, Vec<u8>) = (0, Vec::new());
    let mut freqmap = [V; 1 << 16];

    let f = args().nth(1).ok_or("expected filename")?;
    let f = File::open(f)?;
    let f = BufReader::new(f);

    // Split on raw bytes, since paths are not guaranteed to be valid UTF-8.
    for l in f.split(b'\n') {
        let l = l?;
        let i = l.len();
        freqmap[i].0 += 1;
        freqmap[i].1 = l; // keep the most recent path as the example
    }

    // Print length, count and example for every length that occurs.
    for (i, (c, l)) in freqmap.iter().enumerate() {
        if *c != 0 {
            println!("{:>4} {:>8} '{}'", i, c, String::from_utf8_lossy(l));
        }
    }

    Ok(())
}
```

## The longest filepaths

On my system, the longest filepath is **344** characters long.
The path in question is:

```
/tank/backup/backup/pc1/david/.cache/yarn/v6/npm-socketcluster-14.4.1-e39883c005becbf1d6dba2ced7e04bbfa857693d-integrity/node_modules/socketcluster/sample/node_modules/socketcluster/sample/node_modules/socketcluster/sample/node_modules/socketcluster/sample/node_modules/scc-broker-client/node_modules/socketcluster-client/lib/scsocketcreator.js
```

Removing all paths that contain `node_modules` with `rg -Nv node_modules files.txt` and rerunning the script shows that the longest path without `node_modules` is **272** characters long:

```
/tank/backup/backup/pc1/david/Documents/compilers/osxcross/target/SDK/MacOSX11.1.sdk/System/iOSSupport/System/Library/Frameworks/_AuthenticationServices_SwiftUI.framework/Versions/A/Modules/_AuthenticationServices_SwiftUI.swiftmodule/x86_64-apple-ios-macabi.swiftinterface
```

The one path that is **271** characters long is:

```
/tank/backup/disaster/software/os/yocto/poky-support/poky-contrib-archive/scripts/lib/bsp/substrate/target/arch/layer/{{ if create_example_bbappend == "y": }} recipes-example-bbappend/example-bbappend/{{=example_bbappend_name}}-{{=example_bbappend_version}}/example.patch
```

Removing all instances involving `osxcross` and `yocto` reveals no further obvious patterns, other than that the files are part of a software project (with `.cabal` files as the sole exception).

## Outliers

Plotting the frequency map with `gnuplot` reveals an interesting graph:

> [![Plot][plot]][plot]

```
gnuplot> set style data histograms
gnuplot> plot './files_freqmap.txt' using 2:xtic(1)
```

It seems the lengths follow a roughly normal distribution, but some stand out, such as the filepaths with a length of 64.

Filtering those files with `rg -N '^.{64}$' files.txt` (anchored, so that only lines of exactly 64 characters match) and making a frequency map of the first **31** characters shows that the most frequent paths (**1004732** instances) start with `/tank/backup/disaster/software/`.
The second most common prefix is `/tank/backup/backup/vm0/tank/po` and only occurs **6484** times.

```py
import sys

# Count how often each 31-character prefix occurs.
fm = {}
with open(sys.argv[1]) as f:
    for l in f:
        l = l[:31]
        fm[l] = fm.get(l, 0) + 1

# Group the prefixes by count and print them in ascending order of count.
by_count = {}
for l, i in fm.items():
    by_count.setdefault(i, []).append(l)

for i in sorted(by_count):
    print(i)
    for e in by_count[i]:
        print(' ', e)
```

Filtering for files starting with the most common prefix gives a very long list of SVN files:

```
...
/tank/backup/disaster/software/web/apache/db/revprops/706/706302
/tank/backup/disaster/software/web/apache/db/revprops/706/706296
/tank/backup/disaster/software/web/apache/db/revprops/706/706375
...
```

[freqmap]: files_freqmap.txt
[plot]: files_freqmap_plot.svg
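
The frequency map also makes it possible to put a number on the original question of how long paths are on average. What follows is only a minimal sketch and not part of the analysis above: `parse_freqmap` and `summarize` are hypothetical helper names, the input is assumed to use the `length count 'example'` layout printed by the Rust script, and the three sample lines are made up for illustration rather than taken from the real data.

```python
def parse_freqmap(lines):
    """Parse lines of the form "<length> <count> '<example>'" into {length: count}."""
    freq = {}
    for line in lines:
        fields = line.split(maxsplit=2)
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        freq[int(fields[0])] = int(fields[1])
    return freq


def summarize(freq):
    """Return (mean, median, mode) of the lengths, weighted by their counts."""
    total = sum(freq.values())
    mean = sum(length * count for length, count in freq.items()) / total

    # Median: the smallest length at which the cumulative count
    # reaches at least half of all paths.
    seen = 0
    median = None
    for length in sorted(freq):
        seen += freq[length]
        if seen * 2 >= total:
            median = length
            break

    # Mode: the length with the highest count.
    mode = max(freq, key=freq.get)
    return mean, median, mode


# Made-up sample in the assumed output format (not real measurements).
sample = [
    "   1        1 '/'",
    "   4        3 '/usr'",
    "  12        1 '/usr/bin/env'",
]

mean, median, mode = summarize(parse_freqmap(sample))
print(f"mean {mean:.1f}, median {median}, mode {mode}")
# prints: mean 5.0, median 4, mode 4
```

To run this against real data, the lines of the saved frequency map could be passed to `parse_freqmap` in place of `sample`. Working from the frequency map rather than the raw 1.6G listing keeps the input to at most a few hundred lines.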