I've implemented DCT Type-II and Type-I, as well as the DFT. What I like about the DFT is the quality of the results. I suppose the nature of the DCT and DST plays in to that, I don't understand in full yet the consequences of the Even|Real / Odd|Imaginary-ness of the DCT/DST, but I understand enough from exploring their results that it affects the how 'well' they define the spectrum components of a given audio sample. Well, I shouldn't actually say that, I'm gauging that from what I can see and the few statistics I'm performing based on the results.
One of the things that is most important in my application is resolution and not so much precision. I want to get as much information about the signal with taking a degree of hit on the precision of the results (if I have to lose something). Approaching this from a strategic perspective I'm falling towards hw accelerating the DFT, considering GPU's are a little sloppy on the quality of the floating point values in the general case, they are fast, and would perform the 'luxurious' DFT pretty damn quick regardless. If i used a 16x16 texture that's 256 samples, which is double the number of samples I was already satisfied with. Most mid-range GPU's these days have at least 8 stream proc's, the newest one's out there now are over 800 for about $200. (ATI HD5880). From general testing I can tell greater than 32bit floating point precision is not necessary, so that's another aspects in favor of GPU's, I could quadruple the amount of data I could process over four 32-bit color channels and actually pull off 1024 sample DFTs. (I'd store the results in an associated 16x16 'result' texture.) (But of course even GPU's have performance limitations and I'm doing a lot there too so... I'll have to find a balance.)
I think I'm OK with recommending like a cheap-o 9800pro $30 card and just call it a min spec and do this all with only GPU-CPU transfers as a the limit to the CPU load. Hell, maybe I can offset all of my statistics to the GPU. I think I've already convinced myself to just go a whole new route with all of this. (I have *a lot of other information to track for real-time visual models, so I need to be as strict as possible when managing what the CPU is doing.)
I have to say though there are some things I can't hw accelerate, like classes I keep trying to optimize which track and hold vertex locations (particle classes). I definitely never thought to consider the role of L1/L2 cache limits when dealing with large enough arrays/objects(?), I think I could still apply that in other areas where I'm still trying to reach for performance in having to track and manage 9k-13k+ vertices. I'm a little excited to think there's room to get a decent increase in the amount of vertices I can manage within 2-3% CPU use on a ~C2D 2.5ghz, if it turns out I am being a lot more sloppy than I realized.
Good stuff, I got more out of this thread then I thought I would, thinking this out sorta helped me rethink a few things that shoulda been more obvious to me earlier.
I'm not sure if this is addressed to me, or whether you're trying to answer my question about implementing the FFT... But assuming it is, you are mistaken if you think the FFT is somehow less precise than the DFT. The FFT is what is used in ALL production signal processing applications, and I do mean all.
Originally Posted by since
Speaking to why the Fourier transform involves complex numbers, while the cosine and sine transforms do not, this is because the Fourier transform determines not only the magnitude of a particular frequency but also its phase. A wave of arbitrary phase can be expressed as the sum of a cosine and a sine wave. The complex value of the Fourier transform is just a way of representing how much of the wave is cosine and how much of it is sine -- in other words, it describes the phase.
The DCT and DFT produce purely real output because they pay no attention to phase. This results in a loss (or more accurately, an obfuscation) of information, because frequencies in the input which are not phase matched to the basis functions will appear to have magnitudes that are less than the true magnitude. But for many application (JPEG, for instance) this is not so important, since the only requirement is that the original signal be somehow reconstructible from the transformed signal, and that is always possible. These sorts of transforms are just vector projections, and given a complete basis, are always reversible.
I do understand that the FFT is used in all production applications, though I can't say anything about their performance. I've looked over the list of FFT algorithms on WIKI and tried to find websites which explained them more in depth so I could implement them as an educational exercise, but that kind of material is sparse for most of the algorithms. (I found a few very good PDF's but, I'm putting them to the side for now.)
I'm not saying I won't use an FFT algorithm in the end, if I do still... I think I'll implement it on the GPU instead of the CPU.
I wasn't making any comments on the FFT's, if I were I would have specified that. I strictly have implemented and can only speak (limited) about DCT's and DFT's at the moment, so I wouldn't say anything about the FFT's with confidence until I explore them. (sigh, I still have a lot more to go through on the theory side actually, to have a more intuitive understanding.)
I also don't have the time to evaluate every form of FFT (I might look at FFTW) conveniently as far as I know right now. I still feel I could sooner benefit even more greatly from a parallel processing model over any CPU centric optimizations as far as the bottom line is concerned (I'm sure a few of the FFT's could be easily adapted to a parallel processing model, hopefully). So I'm still going to go for getting the traditional DFT in to shaders first and then work my way in to FFT's after that.
I'm open to any suggestions if you have experience implementing FFT's, any problems I'm about to run in to I'm sure you might have already dealt with. It seems like most people just go with existing libraries, which makes sense but since I'm working on shaders instead that changes things a bit.