Elektron Transfer 1.1

mzero · March 30, 2018, 8:01pm

Before I cause a panic… There is about a 1-in-1,000 chance of a collision if you have 3,000 sample files, all the same length.

:: math geekery ahead ::

This is essentially a variation on the Birthday Problem, but with files and hash values, not people and birthdays.

The hash value is 32 bits. Assuming it is a decent algorithm, each file has an arbitrary ~~birthday~~ hash value somewhere between 0 and about 4 billion (2^32).

For a given number of files, we can compute the likelihood of two of them ~~partying~~ colliding on the same ~~birthday~~ hash:

p(n files) ≈ 1 - e ^ (-n×n / 2^33)

At about 3,000 files, this is a 1-in-1,000 chance; at 9,000 files, about a 1% chance; and at 77,000 files, a 50% chance of a collision. See the first row from this table.
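As a quick check of those figures, here is a minimal Python sketch of the approximation above. The function name is just for illustration, and it assumes an ideal hash with uniformly spread values.

```python
import math

HASH_BITS = 32  # hash size stated in this post


def collision_probability(n_files: int, hash_bits: int = HASH_BITS) -> float:
    """Birthday-problem approximation: p ≈ 1 - e^(-n*n / 2^(bits+1))."""
    exponent = -(n_files ** 2) / 2 ** (hash_bits + 1)
    # -expm1(x) equals 1 - e^x but stays accurate when x is close to zero.
    return -math.expm1(exponent)


for n in (3_000, 9_000, 77_000):
    print(f"{n:>6} files: {collision_probability(n):.3%}")
```

The three printed values should land close to 0.1%, 1% and 50%.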

BUT before you freak out, the files are only considered the same if the length and the hash are the same. So you need 3,000 files exactly the same size before you hit that 1-in-a-1000 chance.
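To make the length-plus-hash rule concrete, here is a hypothetical sketch in Python. `zlib.crc32` is only a stand-in 32-bit hash; nothing here says which algorithm the machines actually use, and the function names are made up for the example.

```python
import zlib


def sample_key(data: bytes) -> tuple[int, int]:
    """Identity key as described above: (length, 32-bit hash).

    zlib.crc32 is a placeholder, NOT a claim about Elektron's algorithm.
    """
    return (len(data), zlib.crc32(data))


def find_key_clashes(samples: dict[str, bytes]) -> list[tuple[str, str]]:
    """Return (first_seen, later) name pairs that share the same key."""
    seen: dict[tuple[int, int], str] = {}
    clashes: list[tuple[str, str]] = []
    for name, data in samples.items():
        key = sample_key(data)
        if key in seen:
            clashes.append((seen[key], name))
        else:
            seen[key] = name
    return clashes


samples = {
    "a.wav": b"\x00\x01" * 300,
    "b.wav": b"\x00\x01" * 300,            # byte-identical to a.wav
    "c.wav": b"\x00\x01" * 299 + b"\x00\x02",  # same length, last byte differs
    "d.wav": b"\x00\x01" * 301,            # different length
}
print(find_key_clashes(samples))
```

Two files of different lengths can never clash under this rule, whatever their hashes are, which is why only same-length files count toward the odds.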

So - sure, if you load up 3,000 single cycle waveforms, all exactly the same number of samples – you have a 1 in a 1,000 chance that two of these will collide and one won’t load. If you load up 77,000 single cycle waveforms… you have more time to audition samples than I do!

This all assumes that the hash algorithm is good - but any modern algo. producing 32 bits from 1k byte or more of sample data will be good. As far as I can tell, the algo. used by Elektron has all the right goodness.

Yes, it is a bit of a shame they chose only 32 bits for the hash. At 64 bits, you couldn’t load enough samples on the unit to get a 1-in-a-million chance of collision. At 128 bits, we could all be loading samples until the heat death of the universe and still have an infinitesimal chance of collision.
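For a sense of scale, the same approximation can be turned around to ask how many same-length files it takes to reach a given chance. This is a rough sketch that assumes an ideal hash; the function name is illustrative.

```python
import math


def files_for_probability(p: float, hash_bits: int) -> float:
    """Invert p ≈ 1 - e^(-n*n / 2^(bits+1)) to get n."""
    # log1p(-p) is ln(1 - p), computed accurately for small p.
    return math.sqrt(-math.log1p(-p) * 2 ** (hash_bits + 1))


for bits in (32, 64, 128):
    n = files_for_probability(1e-6, bits)
    print(f"{bits:>3}-bit hash: about {n:.3g} same-length files "
          "for a 1-in-a-million chance")
```

With 64 bits the answer comes out in the millions of files, and with 128 bits it is on the order of 10^16.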

:: end of math geekery ::

Don’t worry - a hash collision is not impossible, but it is not likely unless you load thousands of samples all the same size.

vasculator · March 30, 2018, 8:02pm

this explains the dupe file issue. thank you for providing the much needed detail. i hate to say it, but it sounds like this was all architected in a less-than-ideal way. that is an insanely high collision rate.

essentially, if you try to load the Adventure Kid single cycle waveforms, you’ll get a bunch of collisions which is what I experienced.

thelightshineth · March 30, 2018, 8:05pm

blah blah blah something about the death of the universe, partying and lots of alcohol? I’m in!

tnussb · March 30, 2018, 8:17pm

A bunch of collisions with “only” 4300 waveforms (are they really all of the exact same length)?

Either this is something completely different or you are quite an unlucky person.

vasculator · March 30, 2018, 10:22pm

luck is bullshit. it’s logic. single cycle waveforms at the same pitch are all extremely small and by definition the same size. it’s almost a perfect test case.

try it out yourself it takes a few minutes and no space.

tnussb · March 30, 2018, 10:59pm

From your description it wasn’t clear that you are using a single sample length, and the website provides samples with different lengths.

And no, it’s not logic. Even with a sample length of 256, a good hashing algorithm shouldn’t generate “a bunch of collisions” in just 4300 files. It seems that the algorithm used is not that good …

mzero · March 30, 2018, 11:14pm

Actually, it has to do with the size of the hash, not the algorithm. Even a perfect hash, if only 32 bits, will result in the probabilities I gave. Alas, the hash size on these machines is just 32 bits.

tnussb · March 30, 2018, 11:24pm

It’s not the probabilities you gave, mzero. You have done a fine job in explaining the details.

It’s about the “bunch of collisions” with just 4300 files (even if the files are only 256 samples long). That seems quite off the expected result. So either very “unlucky” (~1/800 chance) or a “not so good after all” hashing algorithm.

PS: … or maybe real duplicates? …
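One rough way to put numbers on “a bunch of collisions” is to look at the expected number of colliding pairs. The sketch below assumes an ideal 32-bit hash, assumes all 4300 files have the same length, and treats the number of colliding pairs as Poisson-distributed, which is an approximation and not something measured on the machine.

```python
import math


def expected_colliding_pairs(n_files: int, hash_bits: int = 32) -> float:
    """Number of file pairs times the chance that one pair shares a hash."""
    return n_files * (n_files - 1) / 2 / 2 ** hash_bits


def poisson_tail(k: int, mean: float, terms: int = 60) -> float:
    """P(X >= k) for a Poisson variable, summed term by term.

    Summing the tail directly avoids computing 1 - (almost 1).
    The fixed number of terms is plenty for the small means used here.
    """
    return sum(
        math.exp(-mean) * mean ** i / math.factorial(i)
        for i in range(k, k + terms)
    )


mean = expected_colliding_pairs(4300)
print(f"expected colliding pairs: {mean:.5f}")
for k in (1, 2, 5):
    print(f"P(at least {k} colliding pairs) ≈ {poisson_tail(k, mean):.3g}")
```

Under these assumptions a single collision is already unlikely (well under 1%), and several at once would be far rarer still.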

kwamensah · March 30, 2018, 11:46pm

Or, another issue unrelated to the hash.

Open_Mike · March 30, 2018, 11:50pm

My only hash collisions occur when I first place it in something before reaching for a lighter…

vasculator · March 31, 2018, 12:02am

nope, as i said, a clean machine with no previous material. no true dupes. the collisions (around 10) were in a large group of unique waveforms.

re5et · March 31, 2018, 1:08am

Not something I’m concerned about.
The built-in waveforms sound fine, I’m quite sure I’ll never encounter this issue with normal samples either, and if I do it’ll take a few seconds to fix.

ottucsak · March 31, 2018, 2:12am

The problem is that tuned single cycle waveforms all have the same length, and the most common single cycle collection has enough files to probably cause a collision, so I would not consider this an edge case. To be honest, after they chose a 32-bit hash size, I have my doubts about the hashing algorithm as well.

mzero · March 31, 2018, 5:43am

I had to go deeper…

Turns out, if what we’re talking about is the Adventure Kid single cycle waveform library… The duplicates are real!

The library has four pairs of duplicate files:

AKWF_0001/AKWF_0031.wav
AKWF_0001/AKWF_0032.wav

AKWF_0003/AKWF_0232.wav
AKWF_0003/AKWF_0233.wav

AKWF_0006/AKWF_0515.wav
AKWF_0006/AKWF_0517.wav

AKWF_bw_blended/AKWF_blended_0055.wav
AKWF_bw_blended/AKWF_blended_0056.wav

These are exactly the four duplicates reported by Transfer, and so the second of each pair won’t be transferred. crunch will let you transfer the duplicates, and they will work, but it isn’t clear to me what the instruments do when you load a file into a project and the same samples appear more than once on the +Drive.
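If you want to check a sample folder for real duplicates before transferring, something like the following Python sketch would do it. It is not how Transfer or crunch work internally; it simply groups files whose bytes are identical, using SHA-256 so that accidental collisions are not a practical concern. The function name and the `*.wav` pattern are assumptions for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_identical_files(root: str, pattern: str = "*.wav") -> list[list[Path]]:
    """Group files under root whose entire contents are byte-identical."""
    groups: dict[bytes, list[Path]] = defaultdict(list)
    for path in sorted(Path(root).rglob(pattern)):
        # Hash the whole file, header included. Two files with the same
        # audio but different metadata would NOT be grouped together.
        digest = hashlib.sha256(path.read_bytes()).digest()
        groups[digest].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_identical_files("path/to/library")` returns one list per group of identical files, so an empty result means no byte-for-byte duplicates were found.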

On a stock Digitakt (well, stock plus some of my own stuff…) there were no other collisions. All clear… nothing to see… move along now… rest easy, folks!

— Your intrepid investigative engineer, Mark

Yes, I completely transferred - and then deleted - the whole AKWF library, all 4,358 files of it, twice! Once with crunch and once with Transfer. Each time took about 45min. total! My poor poor Digitakt…

kwamensah · March 31, 2018, 5:53am

Wonder how it performs with c6. Maybe I’ll have to run that test, since my rytm mkii still works with c6.

tnussb · March 31, 2018, 7:14am

Thanx, Mark. I was about to check for duplicates myself today and you saved me that time.

Riddle solved. No more blame on the algorithm

acidhouseforall · March 31, 2018, 9:43am

– more geekery –

Any clue on this ? crc32 ?

md5 would be 128bits, right ?
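On the size question only (this says nothing about which algorithm Elektron actually uses), the widths of the two hashes mentioned can be checked with Python's standard library:

```python
import hashlib
import zlib

data = b"example sample data"  # arbitrary bytes, just for illustration

crc = zlib.crc32(data)                        # unsigned 32-bit integer
md5_bits = hashlib.md5(data).digest_size * 8  # digest_size is in bytes

print(f"crc32 value fits in 32 bits: {0 <= crc < 2 ** 32}")
print(f"md5 digest size: {md5_bits} bits")
```

So yes, md5 produces a 128-bit digest, four times the width of crc32.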

darenager · March 31, 2018, 1:30pm

Experienced another bug today: when copying 2 samples from DT to PC whilst DT was playing, the second sample got corrupted. It had 2 blasts of white noise about 1/3rd and 2/3rds into the waveform. Not a biggie, as I was just seeing if it would work.

Also I’m noticing that selecting a bunch of samples to copy from DT to PC sometimes works just fine, but most of the time at least a couple of files don’t transfer and the app freezes up. If I relaunch and do the files individually, no issues (so far).

digimatt · March 31, 2018, 3:11pm

Digitakt picks up the same noise corruptions every now and then in normal play; it’s been doing it since release. It’s always about a third of the way through the sample and appears out of the blue. Going into the +Drive and either reloading or just deselecting/reselecting the sample without loading removes it every time. I think it happens more than I notice, as it’s not always easy to hear if the sample is short.

PeterHanes · April 5, 2018, 11:56pm

9 posts were split to a new topic: “Things were easier back when I was young …”