Well, the original ISO reference code that I was optimizing produces the EXACT results (it is the standard that other decoders are compared against), but it takes ages to decode. I managed to get the decode time down to 16.25 s for a 17:30-minute song on my 3 GHz machine. Ignoring the fact that newer processors do more work per clock cycle, that suggests it should just about play in real time on a 50 MHz one:
$\frac{3000\,\text{MHz}}{50\,\text{MHz}} = 60, \qquad 60 \times 16.25\,\text{s} = 975\,\text{s} < 1050\,\text{s}\ (= 17{:}30)$
But when I compared my output with the original's, it was completely inaccurate. I hope this is mainly due to how I converted from floating-point math to fixed point...