We just hit the biggest single win for MCL in its entire development time.
Specifically, 8x TURBO is back and in a big way.
I was having a lousy few days of coding, struggling to get the performance I was looking for from Chain mode. Chain mode relies on being able to transmit parameter changes very quickly. But the timing of the changes was off, and I was unsatisfied with the results.
I had spent an afternoon implementing delay compensation. Estimating the time it would take to transfer a collection of parameters, and then accounting for that delay appropriately.
It wasn’t really working and out of frustration I decided to measure the actual maximum byte rate that the MegaCommand was transmitting over MIDI.
My measurements for 4x TURBO were showing something between 5 and 10 KBps. This was surprising, as I had expected the performance to be consistently closer to the theoretical max.
The theoretical max byte rate for 4x TURBO is 12.5 KBps: (31250 x 4) / 10, since each byte on the wire consists of 8 data bits plus 1 start bit and 1 stop bit.
As expected, 8x TURBO was performing similarly badly.
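As a rough sketch of that arithmetic, here is how the theoretical byte rate and the best-case transfer time for a batch of bytes could be computed. The function and macro names are made up for illustration and are not taken from the MCL source.

```c
#include <stdint.h>

/* Standard MIDI baud rate in bits per second. */
#define MIDI_BAUD 31250UL
/* Bits on the wire per byte: 1 start bit + 8 data bits + 1 stop bit. */
#define WIRE_BITS_PER_BYTE 10UL

/* Theoretical maximum bytes per second for a turbo multiplier (1, 4, 8...). */
uint32_t max_bytes_per_sec(uint32_t turbo_multiplier) {
    return (uint32_t)((MIDI_BAUD * turbo_multiplier) / WIRE_BITS_PER_BYTE);
}

/* Best-case time in microseconds to send n_bytes back to back.
 * Hypothetical helper: real transfers also pay for interrupt overhead. */
uint32_t transfer_time_us(uint32_t n_bytes, uint32_t turbo_multiplier) {
    uint64_t us = ((uint64_t)n_bytes * 1000000ULL) / max_bytes_per_sec(turbo_multiplier);
    return (uint32_t)us;
}
```

With these numbers, 100 bytes at 4x TURBO take at least 8 ms to send, which is the kind of delay Chain mode has to account for.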
–
I spent a couple of hours looking at the interrupt code and rearranging things to win some small but measurable improvements. Running out of solid ideas, I commented out a few lines of safety code that were written to prevent MIDI buffers from overflowing.
To my astonishment this had a significant impact on my throughput results, and then it occurred to me…
A few days ago, I noticed a clever line of code in some of the low level libraries that Wesen developed for MIDICtrl. The specific library was the ring buffer library. The ring buffers are the data structures that hold the MIDI send and receive data.
#define RB_INC(x) (T)(((x) + 1) % N)
Wesen’s code is full of clever tricks of logic like this. What this macro does is increment x while keeping the result below N: it yields x + 1, unless x + 1 == N, in which case it wraps around to 0. The result is then cast to type T.
The mod operation includes division.
C = A % B is equivalent to C = A - B * (A / B), where A / B is integer division,
and division of 16-bit integers is extremely costly on an 8-bit micro-controller.
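To make the hidden division explicit, here is a small sketch that computes the remainder using only the identity above. The function name is hypothetical; it just spells out the work that % implies when the divisor is not known to be cheap to divide by.

```c
#include <stdint.h>

/* Remainder of a / b written out as C = A - B * (A / B).
 * b must be non-zero. Note the division and multiplication that a
 * plain "a % b" hides from view. */
uint16_t mod_via_division(uint16_t a, uint16_t b) {
    uint16_t quotient = (uint16_t)(a / b);   /* integer division, truncates */
    return (uint16_t)(a - (uint16_t)(b * quotient));
}
```

For a ring buffer index this means that stepping from 127 to 0 in a 128-slot buffer costs a full divide, even though the answer is obvious.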
The above macro was being executed multiple times in the ring buffer code. Every time a byte needed to be transmitted or received, at least one division operation was being performed.
By replacing the above macro with a standard approach, we have gained full throughput for both 4x and 8x TURBO (12.5 KBps and 25 KBps respectively). This not only increases transfer speeds, but also frees up an enormous amount of CPU time for other tasks.
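For illustration, one common division-free way to write the increment is a plain compare-and-reset. This is a minimal sketch and not necessarily the exact change made in MCL; the buffer size, type names and function names below are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 128u  /* hypothetical buffer capacity */

typedef struct {
    volatile uint16_t rd;   /* index of the next byte to read */
    volatile uint16_t wr;   /* index of the next free slot */
    uint8_t buf[RB_SIZE];
} ring_buffer_t;

/* Advance an index by one slot, wrapping to 0 at RB_SIZE.
 * One increment and one compare, no division. */
static inline uint16_t rb_inc(uint16_t x) {
    x++;
    if (x >= RB_SIZE) {
        x = 0;
    }
    return x;
}

/* Store one byte; returns false if the buffer is full. */
bool rb_put(ring_buffer_t *rb, uint8_t c) {
    uint16_t next = rb_inc(rb->wr);
    if (next == rb->rd) {
        return false;       /* full: one slot is always left empty */
    }
    rb->buf[rb->wr] = c;
    rb->wr = next;
    return true;
}

/* Fetch one byte into *out; returns false if the buffer is empty. */
bool rb_get(ring_buffer_t *rb, uint8_t *out) {
    if (rb->rd == rb->wr) {
        return false;       /* empty */
    }
    *out = rb->buf[rb->rd];
    rb->rd = rb_inc(rb->rd);
    return true;
}
```

Another option, when the buffer size is a power of two, is to mask the index with (x + 1) & (N - 1), which also avoids the divide.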