Before I cause a panic… There is about a 1-in-1,000 chance of a collision if you have 3,000 sample files, all the same length.
:: math geekery ahead ::
This is essentially a variation on the Birthday Problem, but with files and hash values, not people and birthdays.
The hash value is 32 bits. Assuming it is a decent algorithm, each file has an arbitrary "birthday" (hash value) somewhere between 0 and about 4 billion (2^32).
For a given number of files, we can compute the likelihood of two of them colliding on the same "birthday" (hash):
p(n files) ≈ 1 - e ^ (-n×n / 2^33)
At about 3,000 files, this is a 1-in-1,000 chance; at 9,000 files, about a 1% chance; and at 77,000 files, a 50% chance of a collision. See the first row from this table.
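If you want to check those numbers yourself, here is a minimal Python sketch of the approximation above. The function name `collision_probability` is just something I made up for illustration, and it assumes the hash behaves like a uniformly random 32-bit value.

```python
import math

HASH_BITS = 32  # hash width from the post


def collision_probability(n_files: int, bits: int = HASH_BITS) -> float:
    """Approximate chance that at least two of n_files share a hash value.

    Uses the birthday approximation p = 1 - e^(-n*n / 2^(bits+1)),
    assuming hash values are uniformly distributed (an assumption).
    """
    exponent = -(n_files * n_files) / 2 ** (bits + 1)
    # -expm1(x) equals 1 - e^x, and stays accurate when p is tiny
    return -math.expm1(exponent)


for n in (3_000, 9_000, 77_000):
    print(f"{n:>6} files: {collision_probability(n):.3%}")
```

This should come out at roughly 0.1%, 0.9%, and 50% for the three file counts.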
BUT before you freak out, the files are only considered the same if the length and the hash are the same. So you need 3,000 files exactly the same size before you hit that 1-in-a-1000 chance.
So - sure, if you load up 3,000 single cycle waveforms, all exactly the same number of samples – you have a 1 in a 1,000 chance that two of these will collide and one won’t load. If you load up 77,000 single cycle waveforms… you have more time to audition samples than I do!
This all assumes that the hash algorithm is good - but any modern algo. producing 32 bits from 1k byte or more of sample data will be good. As far as I can tell, the algo. used by Elektron has all the right goodness.
Yes, it is a bit of a shame they chose only 32 bits for the hash. At 64 bits, you couldn't load enough samples on the unit to get a 1-in-a-million chance of collision. At 128 bits, we could all be loading samples until the heat death of the universe and still have an infinitesimal chance of collision.
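To put rough numbers on the wider hashes, the same approximation can be turned around to ask how many same-length files it takes to reach a given collision chance. This is only a sketch under the same uniform-hash assumption, and `files_for_probability` is a made-up name.

```python
import math


def files_for_probability(p: float, bits: int) -> float:
    """Approximate number of same-length files needed to reach collision chance p.

    Inverts p = 1 - e^(-n*n / 2^(bits+1)) to get n = sqrt(-2^(bits+1) * ln(1 - p)).
    Assumes uniformly distributed hash values (an assumption).
    """
    return math.sqrt(-(2 ** (bits + 1)) * math.log1p(-p))


for bits in (32, 64, 128):
    n = files_for_probability(1e-6, bits)
    print(f"{bits:>3}-bit hash: about {n:.3g} files for a 1-in-a-million chance")
```

For a 64-bit hash this lands at around 6 million files, all exactly the same length, just to reach that 1-in-a-million chance.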
:: end of math geekery ::
Don’t worry - a hash collision is not impossible, but it is not likely unless you load thousands of samples all the same size.