Frequency density of spectrogra[ph/m]



doubleanti
12-16-2004, 02:11 AM
Greetings,

I'm programming a DFT-based spectrograph, or... spectrogram (I'm not sure what it's called exactly) viewer. But I'm having a little snafu. My approach to selecting which frequencies to test was as follows:

I figured since the data was sampled discretely at 44.1 KHz, I would use samples as a measure of which frequencies to test rather than just using frequency. For example, I would use a period of 1 sample, or 2 samples, or 3, or 4, or 6, or 8 and so forth because it'd be easiest to calculate a sin/cos table which could be reused. That is, using arbitrary periods would require recalculation of the sin/cos values since the table's period would not coincide with the period we'd be testing for.
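In code, the idea looks roughly like this (just a sketch of the approach with made-up names, not my actual code):

// Correlate the signal against sine/cosine whose period is a whole number of
// samples, so a single sin/cos lookup table per period can be reused.
// The frequency being tested is sampleRate / period, e.g. 44100 / 5 = 8820 Hz.
public static double magnitudeAtPeriod(int[] samples, int period) {
    double re = 0, im = 0;
    for (int n = 0; n < samples.length; n++) {
        double phase = 2 * Math.PI * (n % period) / period; // table index repeats every 'period' samples
        re += samples[n] * Math.cos(phase);
        im += samples[n] * Math.sin(phase);
    }
    return Math.sqrt(re * re + im * im);
}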

But, using 1, 2, 3, and so forth only corresponds to testing for 44.1K, 22.05K, 14.7K, and so forth. Granted, speech generally lies within the bounds of 8 K, but even when the period gets down to that range using discrete sample boundaries (44.1/5 samples = 8.82K, next 44.1/6 = 7.35K), the delta frequency ends up being a whopping K and a half, roughly.

I don't wanna miss out on data in that range! So what can I do to improve the resolution in that high frequency range? If I could get data sampled at, like, 44.1K * 4 that'd be nice, 'cept what software can do that? Or... can my SoundBlaster compatible even do that? Help!

Thanks a lot!

VirtualAce
12-16-2004, 02:01 PM
You cannot sample over 44100 except on newer 24-bit cards which support 48000 or something like that.

If you are trying to get rid of the aliasing, all I would do is linearly interpolate samples over time. This will, however, cause the sound to become muffled as more and more samples are smoothed, flattening each sound sample out and reducing their dynamic range.

I have Sony's Sound Forge software which can accomplish much of this quite nicely but I do not have the algorithms for it.

doubleanti
12-16-2004, 02:22 PM
I see, that is a great idea!

What I've done in the meantime is calculate sine/cosine values for uneven (in samples) periods, but it's made my algorithm terribly slow. However, it does allow me to have a linear taper of frequency (100 Hz to 8K), or, actually, whichever frequencies I select.

I wonder, though, whether using linear interpolation will give me the sort of resolution I would like at high frequencies. I would figure, using uniform interpolation, that I would get a similar 1/x taper in frequency as before. But again, apparently the standard usage of spectrographs is to have a linear taper... for reasons I suppose I'll find out when I do pattern matching for formants, but that's later.

I would figure I would have to interpolate 10 or so points for the high frequencies (i.e. the 7.xx to 8.xx KHz jump) just to get that 1 KHz gap down to around 100 Hz; is that a correct approximation?
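(Trying to sanity-check that estimate: assuming the effective rate is just 44100 times the interpolation factor and I still only test whole-sample periods, the gap near a given frequency would be roughly this - made-up names, just a back-of-the-envelope check:)

// Gap between adjacent testable frequencies near targetHz when the effective
// sample rate is 44100 * factor and only whole-sample periods are used.
public static double gapNear(double targetHz, int factor) {
    double rate = 44100.0 * factor;
    int p = (int) Math.round(rate / targetHz);   // nearest whole-sample period
    return rate / p - rate / (p + 1);            // spacing to the next testable frequency
}

That comes out to roughly 140 Hz near 8 KHz for a factor of 10, and it takes about a factor of 15 to get under 100 Hz, so 10 or so is in the right ballpark but a little short.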

Or... should I look to FFT? What sort of taper does it offer? I don't quite understand it, which is why I'm avoiding it for now, but as time offers I will check it out. Thanks Bubba, as always.

VirtualAce
12-17-2004, 04:59 AM
I'm not totally familiar with FFTs so I can't offer any advice on them. But I can tell you that using linear interpolation you can effectively choose a resolution and be guaranteed not to miss any samples within that resolution. Since there is an infinite number of values between two numbers, it's impossible to recreate every single value between samples. But if you linearly interpolate over enough samples you can approximate what the actual sample there would have been.

| - points gained by linear interpolation
resolution of .25 - 3 new points

Sample1...........|..............|..............|...........Sample2
Sample1.........(.25)..........(.50)..........(.75).........Sample2

So draw your original wave or line from sample 1 to sample2 -> you are obviously missing a lot of information if the sampling rate is low. So to gain more resolution than is actually available you can linear interpolate it.

For instance, if you have 10000 samples and use a resolution of .1, you gain 9 more samples between each pair, which would bring the total to roughly 10000*10 samples. The obvious problem here is memory, but speed should not be a concern.



float LinearInterpolate(float fV1,float fV2,float fInterp)
{
return (fV1+fInterp*(fV2-fV1));
}




float LinearInterpolate_ASM(float fV1,float fV2,float fInterp)
{
    //Assuming parameters are pushed right to left (cdecl), so after the
    //compiler's own prologue (push ebp / mov ebp,esp):
    //fV1     - ebp+8
    //fV2     - ebp+12
    //fInterp - ebp+16
    asm {
        fld  dword ptr [ebp+12]   //st(0) = fV2
        fsub dword ptr [ebp+8]    //st(0) = fV2 - fV1
        fmul dword ptr [ebp+16]   //st(0) = fInterp * (fV2 - fV1)
        fadd dword ptr [ebp+8]    //st(0) = fV1 + fInterp * (fV2 - fV1)
        //result is left in st(0), which is where a float return value goes
    }
}

I left the (push ebp - mov ebp,esp) stuff out because the compiler does that for you; if you add it yourself you end up pushing ebp twice and the parameter offsets go wrong. Check your compiler documentation about how to access parameters - you could just access them using their C names as well.

Of course you could just use the good old sine filter - but either way you are trying to get more samples than what you currently have. If you have time information (sample period info) and sample value information then you have enough info to discretely sample the wave for all unknown points. It will effectively turn a jagged digital wave into a somewhat more natural wave.
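To make that concrete, here is a rough sketch of upsampling a whole buffer by an integer factor with the same lerp (an illustration with made-up names, not production code):

// Upsample a buffer by an integer factor using linear interpolation.
// The result has 'factor' samples for every original one, so it only makes
// sense if its sample rate is treated as factor times the original rate.
public static float[] upsample(float[] samples, int factor) {
    float[] out = new float[(samples.length - 1) * factor + 1];
    for (int i = 0; i < samples.length - 1; i++) {
        for (int j = 0; j < factor; j++) {
            float t = (float) j / factor;  // interpolant between 0 and 1
            out[i * factor + j] = samples[i] + t * (samples[i + 1] - samples[i]);
        }
    }
    out[out.length - 1] = samples[samples.length - 1];  // keep the final sample
    return out;
}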

Note that this is also assuming that your wave or your data progresses in a linear fashion. That is, if there is some unknown non-linearity between sample 1 and sample 2, the linear interpolation function will NOT find the correct value for the discrete samples following the non-linearity.

This picture from Sony's Sound Forge shows the problem I'm talking about. If you only know sample 1 and sample 2 and you linearly interpolate along that line -> the non-linearity will be skipped. Of course, in this example, linearly interpolating between 1 and 2 will yield next to nothing, since both samples lie on a wave crest, so you will just get a straight line from 1 to 2... but I hope it shows what I'm talking about. Essentially, any non-linearities between samples will be lost during linear interpolation.

doubleanti
12-17-2004, 02:41 PM
Right.

I took to the books last night on the FFT, and it will give you as much frequency resolution as you have samples, just like my previous 'brute-force' n-squared algorithm did (my first attempt). But then there is the tradeoff between running linear interpolation to increase the sample count (and thus the frequency resolution) and feeding that into an FFT algorithm, versus the regular n^2 algorithm itself.

So then we're left with finding at what point

k n log (kn) = n^2

since we'd be introducing k samples for every one of the n samples originally in the signal.

Which I'm not quite sure how to solve...

But for now I'll take it on faith that it is worth it, asymptotically.
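(I can at least check it numerically. A quick brute-force search for the crossover point, ignoring constant factors - purely illustrative:)

// Find the first n where the k-times-interpolated FFT cost, k*n*log2(k*n),
// beats the plain n^2 DFT cost.
public static long crossover(int k) {
    for (long n = 2; ; n++) {
        double fftCost   = k * n * (Math.log((double) k * n) / Math.log(2));
        double bruteCost = (double) n * n;
        if (bruteCost > fftCost)
            return n;
    }
}

For k = 10 this gives a crossover around n = 100, so for any realistic window size the FFT side should win.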

Problem is, my window will only be large enough to allow periods down to 20 Hz (the lower bound on human hearing), so though asymptotically it will be worth it, if I introduce enough samples I will end up throwing away parts of the FFT as it stands.

Well, we'll see... thanks Bubba.

VirtualAce
12-18-2004, 12:31 AM
No problem. Your post prompted me to do some research about sampling rates and so forth. I came across some interesting information about infrasound, or sound below the range of audible human hearing. If you are messing around with infrasound, be extremely careful as to how much you amplify it. If it is amplified and you listen to it you can really hurt your body a great deal. There are certain frequencies that actually will disable a human or incapacitate the human brain if listened to at high enough amplitudes. Other signals are known to cause internal bleeding and have made people very ill.

It sounds strange but it is true. That wave you see in my picture is a 20Hz wave and I amped it and listened to it (prior to researching its effects) and wow what a headache. I could barely hear the sound so I listened closer. When I put the headset down I nearly had to go and lay down to stop my head from throbbing.

So sound sampling is very cool but be careful with it because infrasounds are extremely powerful. It's possible during linear interpolation to reach the infrasound level between samples.

Sort of off topic, but it's a word of caution nonetheless. Any sound at or below 100 Hz is really where the danger threshold is. I would imagine the same is true for those sounds that reach the upper limit of human hearing, although I believe higher-frequency sounds do not affect matter and/or human processes like lower sounds do. It is possible to generate a high-amplitude, low-frequency sound wave that will literally destroy a concrete wall.

doubleanti
12-18-2004, 12:47 AM
>There are certain frequencies that actually will disable a human or incapacitate the human brain if listened to at high enough amplitudes. Other signals are known to cause internal bleeding and have made people very ill.

I have always wondered about that... well, it turns out that, as you know, we use other frequency ranges to transmit radio and other sorts of information. Granted, those are not variations in fluid pressure the way sound is. But I've always wondered, if we did use sound as a transmission medium at subsonic or ultrasonic frequencies, what sorts of other animals or whatnot we'd be ........ing off.

Thanks for the tip... it's scary... good thing people don't walk around with those kinds of things to annoy people or make them ill... lord... knowledge is power.

I'm working on my FFT right now...

So, here's my process:


FFT (start, length, data[])

    if length == 2, do + and - and store in respective slots

    else

        reshuffle into odd-index and even-index halves

        FFT (start, length / 2, data)
        FFT (start + length / 2, length / 2, data)

        recombine



so... I don't know, I'm getting partially correct values for a 4-sample FFT of [1,2,3,4].
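For reference, the textbook recursive radix-2 version I'm checking against looks something like this (a generic sketch assuming the length is a power of two, not my actual code):

// Recursive decimation-in-time FFT. re[] and im[] hold the signal and are
// overwritten with the spectrum; the length must be a power of two.
public static void fft(double[] re, double[] im) {
    int n = re.length;
    if (n == 1) return;

    // reshuffle into even-indexed and odd-indexed halves
    double[] evenRe = new double[n / 2], evenIm = new double[n / 2];
    double[] oddRe  = new double[n / 2], oddIm  = new double[n / 2];
    for (int i = 0; i < n / 2; i++) {
        evenRe[i] = re[2 * i];     evenIm[i] = im[2 * i];
        oddRe[i]  = re[2 * i + 1]; oddIm[i]  = im[2 * i + 1];
    }

    fft(evenRe, evenIm);
    fft(oddRe, oddIm);

    // recombine: X[k] = E[k] + W^k * O[k],  X[k + n/2] = E[k] - W^k * O[k]
    for (int k = 0; k < n / 2; k++) {
        double angle = -2 * Math.PI * k / n;
        double wr = Math.cos(angle), wi = Math.sin(angle);
        double tr = wr * oddRe[k] - wi * oddIm[k];
        double ti = wr * oddIm[k] + wi * oddRe[k];
        re[k]         = evenRe[k] + tr;  im[k]         = evenIm[k] + ti;
        re[k + n / 2] = evenRe[k] - tr;  im[k + n / 2] = evenIm[k] - ti;
    }
}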

Is my ordering correct? Thanks. Workin' on it... I hope the book isn't wrong with the example results!

doubleanti
12-18-2004, 02:22 AM
Ah, crap, I got it...

But then, how do I interpret the results?

I would figure that an n-sample sequence would only have log2(n) different components, considering each must be a harmonic of the length of the piece, right?

But, you get n different values... a 4-value signal cannot have components with a period of 3. Right?

doubleanti
12-19-2004, 01:43 PM
Ah, figured out how to interpret it.
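(For anyone else who hits this, the convention I'm going with: bin k of an n-point transform at sample rate fs sits at k*fs/n Hz, and for a real signal the upper half mirrors the lower half, so only bins 0 through n/2 matter. Roughly:)

// Magnitude spectrum for a real signal: keep bins 0..n/2, since bins k and
// n-k are mirror images; bin k corresponds to k * sampleRate / n Hz.
public static double[] magnitudeSpectrum(double[] re, double[] im) {
    int n = re.length;
    double[] mag = new double[n / 2 + 1];
    for (int k = 0; k <= n / 2; k++)
        mag[k] = Math.sqrt(re[k] * re[k] + im[k] * im[k]);
    return mag;
}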

But then, if I'm to increase the frequency density using linear interpolation, it seems it may not be quite right, for the following reason. Data sampled at, say, 8 KHz can only contain sinusoids up to 4 KHz, by the Nyquist/Shannon sampling theorem (given zero noise). So perhaps this linear interpolation is in vain? Also, I have heard the phone company only passes about 3.x KHz, which apparently is 'good enough' for your noggin's speech recognition algorithm...

Something to think about...

Oh, and I'm sure many have pondered this before, but it brings up the idea of simulating a box in a box, and the paradox that (I think) it is. Similarly, should we not then propose an algorithm which works at the lowest frequencies at which our own brain works?

And I would like to see how I can eliminate the residues I end up with by trying non-linear interpolation. But enough talk, must code... Bubba?

VirtualAce
12-20-2004, 12:29 AM
To use non-linear interpolation you must change the interpolant over time. This will basically result in a type of pseudo spherical linear interpolation and will draw curves.
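For example, one simple way to change the interpolant over time is cosine interpolation, where the linear t is bent through a half-cosine so the result eases into each sample instead of meeting it at a sharp corner (just one of many possible curves; a sketch):

// Cosine interpolation: same form as the linear version, but the interpolant
// is remapped through a half-cosine so the result curves rather than kinks.
public static float cosineInterpolate(float v1, float v2, float t) {
    float t2 = (1.0f - (float) Math.cos(t * Math.PI)) * 0.5f;
    return v1 + t2 * (v2 - v1);
}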

But as for the other FFT stuff you mentioned I have no idea. I have not coded any FFT algorithms - I understand how they work but I have no experience with them. You might try a sine filter instead of linear interpolation or some type of logarithmic filter.

doubleanti
12-20-2004, 12:45 AM
Sand-drax, do you know for a fact that using interpolation will not result in new information (in fact, that it would just add residue)?

What about frequency sweeps in the lower range (within the original signal) that aren't initially covered? I think they exist, and that using the FFT on interpolated data would produce finer resolution further down the spectrum, though I may be incorrect.

And Bubba, right, I figure I might; that would help with the residue introduced by the linearity (since we are testing for sinusoids, not sawtooth or square wave composition). But the residue is negligible as it stands, on the order of one part per hundred of the signal.

One final point of interest and problem. My n-squared standard DFT did a good job of detecting frequencies which weren't specifically tested for. There is a smooth gradient. For example, if the sampling tested for a 512 Hz signal but a 514 Hz signal existed instead, you'd still have appreciable amplitude at the 512 Hz DFT entry. But with the FFT, because of the algorithm I suppose, it inherently cannot leave such nice 'spreading' of frequency.

Is that right? I'll try to get some graphs eventually. Thanks...

VirtualAce
12-20-2004, 01:30 AM
Linear interpolation will add more information and fill in the gaps. However, if the sampling rate stays the same, then all of the straight-line data from sample 1 to sample 2 will be skipped and you still maintain the original sound. You must change the sample rate in order to account for the new data. But linear interpolation is not going to produce brand new data; it's only going to increase the resolution, which may or may not be a good thing.

The problem here is that even though you have added new points, it may actually decrease the quality of the sound. You are adding discrete samples over a period of time from sample 1 to sample 2. You are placing a sample into a position in time based on 'what should have been there' IF:

1. The frequency between sample 1 and sample 2 did not change over time.
2. The sampling rate between sample 1 and sample 2 did not change over time.
3. No non-linearity was introduced into the mix between sample 1 and sample 2.

Now, since the sampling rate is such that we move from sample 1 to sample 2 in X time, you must alter the rate in order to account for the new data... and/or shift all samples over by the number of samples added between them to accommodate the new ones. In other words, sample 1.10 will now be where sample 2 was in order to simulate what was there. But if the sampling rate does not change, the overall wave is altered such that it no longer coincides with time correctly, and it will quickly degrade into a mess. What you probably need is a low-pass or high-pass filter, which does not add any samples but mutes or strengthens samples within a certain range.

doubleanti
12-21-2004, 09:27 PM
Thanks Bubba.

Actually, there is ample data in the 4 KHz range of what I'm looking at at the moment.

But my spectrograph is not quite right and I think it has something to do with my decoding. Do you know of anything to do with u-law decoding? I think perhaps my decoding algorithm is incorrect. Here it is:


public static int [] decode_AU ()
{
    switch (encoding)
    {
        case 1: // 8-bit U-LAW
            for (int i = 0; i < AU_data.length; i++)
            {
                if (i < 100)
                    System.out.print (AU_data[i] + " ");

                // inverse u-law: sign(x) * ((1 + 255)^(|x|/128) - 1) * 128/255
                AU_data[i] = (byte) ((sign ((double) AU_data[i]) * Math.pow (Math.E,
                                ((double) AU_data[i] / 128) * Math.log (1 + 255) /
                                sign ((double) AU_data[i]))
                            - 1) * 128 / 255);

                if (i < 100)
                    System.out.println (AU_data[i]);
                /*
                if (i < 128)
                    System.out.println (i-64 + " " +
                        (byte) ((sign ((double) (i-64)) * Math.pow (Math.E,
                            ((double) (i-64) / 128) * Math.log (1 + 255) /
                            sign ((double) (i-64)))
                        - 1) * 128 / 255));
                */
            }
            break;
        default:
            break;
    }

    // copy to AU_data_i, integer version
    AU_data_i = new int [AU_data.length];

    for (int i = 0; i < AU_data.length; i++)
        AU_data_i [i] = AU_data [i];

    return (AU_data_i);
}

Here is my waveform and corresponding spectrograph.

The comparable waveform and spectrograph can be found in this tutorial at...

http://www.ling.lu.se/research/speechtutorial/tutorial.html

Oh, and you may find the u-law formula here:

http://en.wikipedia.org/wiki/Mu-law

Notes on what I think is the problem:

It seems likely that my decoding code is incorrect because, though I still get bands in the right range, there are 6 or so of them, which is crazy, and they are hard to follow besides. I suppose they are residue of the three or four bands which are shown in the tutorial. For the decoding, I'm not quite sure if I'm supposed to offset the data (assuming it's given as positive only), or take it as 2's complement data, or what. I'll try to recenter it correctly and go over the code again and again.
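For comparison, the usual table-free G.711 u-law byte decode I've seen looks roughly like this - it assumes the .au bytes are stored the standard way (bit-inverted, sign bit, 3-bit exponent, 4-bit mantissa, 0x84 bias), which may or may not be what my formula is effectively doing:

// Standard G.711 u-law expansion of one byte to a 16-bit linear sample.
public static short ulawToLinear(byte b) {
    int u = ~b & 0xFF;                          // u-law bytes are stored complemented
    int sign = u & 0x80;
    int exponent = (u >> 4) & 0x07;
    int mantissa = u & 0x0F;
    int magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
    return (short) (sign != 0 ? -magnitude : magnitude);
}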

Thanks, I hope these pictures and code give you a little more to work with, Bubba. As soon as I get the right spectrograph, it's on to binary feature detection... yum!

Edit: PS, the vertical frequency range is from 0 KHz (bottom) to 4 KHz (top); haven't bothered to code that labeling yet, but I'm on it. Thanks.

doubleanti
12-29-2004, 03:48 AM
Ah... oookay. Update (with new questions for those who may bite!)

So, I tested it without remapping, and there is no change in the frequencies (the amplitudes do change, but they're relative so it doesn't matter). So that seems fine, and looking at the website's spectrogram it seems it's fabricated.

So, I moved on to feature extraction (a fancy term for extracting, well, binary features which characterize and differentiate speech sounds), and for certain features it would be useful to know the derivative of the amplitude of a certain frequency in the spectrogram.

So, I have a discrete sequence; how would I get a discrete sequence of its derivative? Is it as simple as taking the differences between adjacent samples? That is some indication, but I don't know if it is exactly correct. Maybe differences over a larger delta t also have a component in the derivative?
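My current guess at the usual finite-difference estimates, assuming uniformly spaced frames dt apart (made-up names):

// Derivative estimate for a uniformly sampled sequence x[] with spacing dt.
// Central differences in the interior, one-sided differences at the ends.
public static double[] derivative(double[] x, double dt) {
    double[] d = new double[x.length];
    d[0] = (x[1] - x[0]) / dt;
    for (int i = 1; i < x.length - 1; i++)
        d[i] = (x[i + 1] - x[i - 1]) / (2 * dt);
    d[x.length - 1] = (x[x.length - 1] - x[x.length - 2]) / dt;
    return d;
}

So adjacent differences divided by the frame spacing are basically it; the central difference just uses a wider span to smooth the estimate a bit.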

Thanks all!