# Lengths of absolute filepaths
2022-01-03
While there may not be universal agreement as to what a reasonable maximum
path length is, I was wondering how long paths are on average in practice.
I started off by making a list of every absolute path on my system (as root):
```sh
find / > /tmp/files.txt
```
This produced a 1.6G file, so I wrote a script in Rust to collect frequencies
as well as an example string for each length:
```rs
use std::env::args;
use std::error::Error;
use std::fs::File;
use std::io::{BufRead, BufReader};
fn main() -> Result<(), Box<dyn Error>> {
    // One (count, example path) slot per possible line length.
    const V: (u32, Vec<u8>) = (0, Vec::new());
    let mut freqmap = [V; 1 << 16];
    let f = args().nth(1).ok_or("expected filename")?;
    let f = File::open(f)?;
    let f = BufReader::new(f);
    for l in f.split(b'\n') {
        let l = l?;
        let i = l.len();
        freqmap[i].0 += 1;
        // Keep the most recently seen path of this length as the example.
        freqmap[i].1 = l;
    }
    for (i, (c, l)) in freqmap.iter().enumerate() {
        if *c != 0 {
            println!("{:>4} {:>8} '{}'", i, c, String::from_utf8_lossy(l));
        }
    }
    Ok(())
}
```
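The frequency map also makes it possible to answer the original question about the average length directly. Below is a minimal sketch in Python, assuming the script's output was saved to a file with one `length count 'example'` row per line. The helper names `parse_freqmap` and `summarize` are made up for illustration.

```python
def parse_freqmap(lines):
    """Parse rows of the form '<length> <count> <example>' into (length, count) pairs."""
    pairs = []
    for line in lines:
        # Split off only the first two fields: the example path may contain spaces.
        fields = line.split(None, 2)
        if len(fields) < 2:
            continue
        pairs.append((int(fields[0]), int(fields[1])))
    return pairs


def summarize(pairs):
    """Return (total, mean, median) of the path lengths described by the pairs."""
    total = sum(count for _, count in pairs)
    if total == 0:
        raise ValueError("empty frequency map")
    mean = sum(length * count for length, count in pairs) / total
    # Lower median: the smallest length whose cumulative count covers half of all paths.
    seen = 0
    for length, count in sorted(pairs):
        seen += count
        if 2 * seen >= total:
            return total, mean, length
```

To use it, open the saved output with `open(path, newline="\n", errors="replace")` and pass the file object to `parse_freqmap`. The `newline` argument keeps a stray carriage return inside an example path from being treated as a line break.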
## The longest filepaths
On my system, the longest filepath is **344** characters long. The path in
question is:
```
/tank/backup/backup/pc1/david/.cache/yarn/v6/npm-socketcluster-14.4.1-e39883c005becbf1d6dba2ced7e04bbfa857693d-integrity/node_modules/socketcluster/sample/node_modules/socketcluster/sample/node_modules/socketcluster/sample/node_modules/socketcluster/sample/node_modules/scc-broker-client/node_modules/socketcluster-client/lib/scsocketcreator.js
```
Removing all paths that contain `node_modules` with `rg -Nv node_modules files.txt`
and rerunning the script shows that the longest path without `node_modules`
is **272** characters long:
```
/tank/backup/backup/pc1/david/Documents/compilers/osxcross/target/SDK/MacOSX11.1.sdk/System/iOSSupport/System/Library/Frameworks/_AuthenticationServices_SwiftUI.framework/Versions/A/Modules/_AuthenticationServices_SwiftUI.swiftmodule/x86_64-apple-ios-macabi.swiftinterface
```
The one path that is **271** characters long is:
```
/tank/backup/disaster/software/os/yocto/poky-support/poky-contrib-archive/scripts/lib/bsp/substrate/target/arch/layer/{{ if create_example_bbappend == "y": }} recipes-example-bbappend/example-bbappend/{{=example_bbappend_name}}-{{=example_bbappend_version}}/example.patch
```
Removing all instances involving `osxcross` and `yocto` reveals no further
obvious patterns other than that the files are part of a software project (with
`.cabal` files as the sole exception).
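The repeated filter-and-rerun steps in this section could also be done in one pass. The following is a rough sketch, not the method used above: `longest_excluding` is a hypothetical helper, and it works on bytes so that lengths are counted the same way as in the Rust script.

```python
def longest_excluding(paths, excluded):
    """Return the longest path containing none of the `excluded` substrings, or None."""
    best = None
    for path in paths:
        path = path.rstrip(b"\n")
        # Skip any path that mentions one of the excluded names.
        if any(part in path for part in excluded):
            continue
        if best is None or len(path) > len(best):
            best = path
    return best
```

With the path list opened in binary mode (`open("/tmp/files.txt", "rb")`), passing the file object and `[b"node_modules", b"osxcross", b"yocto"]` would return the longest remaining path.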
## Outliers
Plotting the frequency map with `gnuplot` reveals an interesting graph:
> [![Plot][plot]][plot]
```
gnuplot> set style data histograms
gnuplot> plot './files_freqmap.txt' using 2:xtic(1)
```
It seems the lengths follow a roughly normal distribution, but some stand
out, such as the filepaths with a length of 64. Filtering those paths with
`rg '^.{64}$' files.txt` (anchored, so that only lines of exactly 64 characters
match) and making a frequency map of the first **31** characters shows that the
most frequent prefix (**1004732** instances) is
`/tank/backup/disaster/software/`. The second most common prefix is
`/tank/backup/backup/vm0/tank/po`, which occurs only **6484** times.
```py
import sys

# Count how often each 31-character prefix occurs.
fm = {}
with open(sys.argv[1]) as f:
    for l in f:
        l = l[:31]
        fm[l] = fm.get(l, 0) + 1

# Bucket the prefixes by count so they print in ascending order of frequency.
f = [[] for _ in range(1 << 20)]
for l, i in fm.items():
    f[i].append(l)
for i, l in enumerate(f):
    if len(l) > 0:
        print(i)
        for e in l:
            print(' ', e)
```
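The length filter and the prefix count can also be combined into a single step. This is only a sketch of an alternative, assuming the full path list as input. `prefix_counts` is a made-up name, and it counts bytes rather than characters, which gives different results for non-ASCII paths.

```python
from collections import Counter


def prefix_counts(paths, length=64, prefix_len=31):
    """Count the `prefix_len`-byte prefixes of paths that are exactly `length` bytes long."""
    counts = Counter()
    for path in paths:
        path = path.rstrip(b"\n")
        if len(path) == length:
            counts[path[:prefix_len]] += 1
    return counts
```

Calling `.most_common(5)` on the result lists the five most frequent prefixes with their counts, without needing a fixed-size bucket list.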
Filtering for paths starting with the most common prefix,
`/tank/backup/disaster/software/`, gives a very long list of SVN files:
```
...
/tank/backup/disaster/software/web/apache/db/revprops/706/706302
/tank/backup/disaster/software/web/apache/db/revprops/706/706296
/tank/backup/disaster/software/web/apache/db/revprops/706/706375
...
```
[freqmap]: files_freqmap.txt
[plot]: files_freqmap_plot.svg