I got bored and in the complex gyrations of avoiding real responsibility I randomly hacked together this potentially useful set of DSP FFTs. The usage is really simple. Look at example.c to see how to use it.
This DSP FFT supports 16-, 32-, 64-, 128-, 256-, 512-, and 1024-point transforms on the DSP. One advantage is that it takes multiple successive buffers and transforms them all in one sweep. You can transform complex or real data. All input and output values are floats constrained between -1 and +1 (internally they are converted to 16-bit integers). The output is always complex. For real data, the values are in successive buffers in one contiguous 1-D array. For complex input, the data format is real values followed by imaginary values, repeating for as many buffers as you have. The complex output uses the same format as the complex input. The transforms have about 5 digits of accuracy and are normalized by 1/N, where N is the transform size.
There are some dangers due to my laziness: if there is an error (loading the DSP core files, DSP preoccupied, etc.), the code simply prints an error message to stderr and quits. This could cause your program to die suddenly. You can fix this by going into the dspfft.m file and modifying the error handling yourself for better robustness.
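One way to approach that change is to report the error and hand a status code back to the caller instead of quitting. This is just a sketch of the general pattern: the status codes and the fft_status_string and fft_fail names are invented for illustration and do not exist in dspfft.m.

```c
#include <stdio.h>

/* Hypothetical status codes; nothing like this is defined in dspfft.m. */
typedef enum {
    FFT_OK = 0,
    FFT_ERR_LOAD = 1,   /* could not load a DSP core file */
    FFT_ERR_BUSY = 2    /* DSP is in use by something else */
} fft_status;

/* Map a status code to a short human-readable message. */
const char *fft_status_string(fft_status s)
{
    switch (s) {
    case FFT_OK:       return "ok";
    case FFT_ERR_LOAD: return "could not load DSP core file";
    case FFT_ERR_BUSY: return "DSP is busy";
    }
    return "unknown error";
}

/* Hypothetical replacement for a print-and-exit failure path:
 * print the message to stderr, then return the code so the
 * caller decides whether to retry, fall back, or give up. */
fft_status fft_fail(fft_status s)
{
    fprintf(stderr, "dspfft: %s\n", fft_status_string(s));
    return s;
}
```

A caller would then check the returned value against FFT_OK rather than having the whole program die inside the library.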
It uses streaming: for transforms of up to 512 points it uses the double-buffered fast DMA protocol, and for the 1024-point transform it uses the slow blocking DMA protocol. The limitation is that for the 1024-point transform it wasn't possible to fit all the buffers and data in the 8K of DSP memory.
A really easy way to include the utilities in your program is to add the library "libdspfft.a" to your Interface Builder project. Also add the appropriate .snd files to your project (the names are dspfftXXX.snd, where XXX is the number of points). Be sure to include the dspfft.h file in every file that calls the main function, dspfft().
There is a bit of overhead, so the more blocks you transform at once, the faster the average FFT. I believe that most of the time is spent doing DMA transfers. Obviously, if the DSP program is made more complex, there are even better speedups. Meanwhile we're all impatiently waiting for the 200+ MIPS NeXT.