Beats-Per-Minute in .wav file

**loobian** · 12-15-2003

Hi all!

I am developing an application that should have a feature to count the beats-per-minute in a .wav file.
Does anyone know how to do it? Are there any algorithms for this?

Thank you!

**DougDbug** · 12-15-2003

You could probably get something useful by using a couple of Digital Signal Processing filters.

This might be a bit like speech recognition... a human is better.

I did something like this in hardware (for a lighting system) once.

First, I'd make a low-pass filter to try to extract the beat from the bass. You'd have to experiment... There is beat information in the higher frequencies too.
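A minimal sketch of that first step, assuming the samples have already been converted to floats. It uses a simple one-pole low-pass filter, which is only one of many possible filters; the names `lowpass` and `lowpass_alpha` are made up for this example, and the cutoff you pass in is something you'd have to find by experiment.

```c
#include <stddef.h>

/* Hypothetical helper: smoothing factor for a one-pole low-pass filter
   with the given cutoff, using the usual RC approximation
   alpha = dt / (RC + dt), RC = 1 / (2*pi*cutoff), dt = 1 / sample_rate. */
float lowpass_alpha(float cutoff_hz, float sample_rate_hz)
{
    const float two_pi = 6.2831853f;
    return two_pi * cutoff_hz / (two_pi * cutoff_hz + sample_rate_hz);
}

/* One-pole low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).
   alpha is in (0, 1]; smaller values give a lower cutoff. */
void lowpass(const float *in, float *out, size_t n, float alpha)
{
    float y = 0.0f;  /* filter state, starts at silence */
    for (size_t i = 0; i < n; i++) {
        y += alpha * (in[i] - y);
        out[i] = y;
    }
}
```

For example, `lowpass(samples, bass, n, lowpass_alpha(150.0f, 44100.0f))` would keep mostly the content below roughly 150 Hz. The 150 Hz figure is only a starting guess.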

Then, compare the instantaneous level to the average (a moving average over several seconds). That will allow you to "extract" the peaks.
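Something like the following could do that comparison in software. It is only a sketch: the function name `mark_peaks`, the `ratio` parameter, and the choice to average the absolute value of the samples are assumptions made for the example, not a description of the hardware design.

```c
#include <stddef.h>

/* Flags every sample whose level (absolute value) is more than `ratio`
   times the moving average of the level over the last `window` samples
   (the current sample included). Returns the number of flagged samples.
   Assumes window >= 1. */
size_t mark_peaks(const float *x, unsigned char *flags, size_t n,
                  size_t window, float ratio)
{
    double sum = 0.0;  /* running sum of |x| over the current window */
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        float level = (x[i] < 0.0f) ? -x[i] : x[i];
        sum += level;
        if (i >= window) {
            /* drop the sample that just left the window */
            float old = x[i - window];
            sum -= (old < 0.0f) ? -old : old;
        }
        size_t len = (i + 1 < window) ? i + 1 : window;
        float avg = (float)(sum / (double)len);

        flags[i] = (unsigned char)(level > ratio * avg);
        count += flags[i];
    }
    return count;
}
```

At 44.1 kHz a window of "several seconds" is a few hundred thousand samples, so in practice you would probably run this on a decimated or block-averaged level rather than on every raw sample.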

Finally, I'd use a very-low frequency filter (either low-pass or bandpass) to get the beat-frequency of a few beats per second.

This won't work perfectly, but it should get you started, and may work "well enough". My hardware design didn't have any memory (totally analog processing), so it didn't really "count" beats... it "extracted" beats, and would miss beats, get extra beats, etc. But it was "good enough" for dance lighting!

I have a couple of introductory books on DSP. One by Chris Cant, and one by Doug Coulter... I don't recall the titles, but both are good. The Coulter book is all about audio DSP and includes lots of source code, including the code for a full wave editor. The Cant book is more general. I don't remember if it includes code.

**Cat** · 12-15-2003

Well, a moving average by itself has a pretty poor frequency response; you can probably do better. In fact, a moving average is itself a kind of LPF, but with some pretty big side lobes.
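To make the side-lobe remark concrete, here is the standard result for an $N$-point moving average $y[n] = \frac{1}{N}\sum_{k=0}^{N-1} x[n-k]$ (the symbols $N$ and $\omega$ are introduced here for illustration, with $\omega$ in radians per sample):

$$H(e^{j\omega}) = \frac{1}{N}\sum_{k=0}^{N-1} e^{-j\omega k} = \frac{1}{N}\, e^{-j\omega (N-1)/2}\, \frac{\sin(\omega N/2)}{\sin(\omega/2)}$$

The magnitude is 1 at $\omega = 0$ and has nulls at $\omega = 2\pi m/N$ for $m = 1, 2, \dots$, with side lobes in between. For large $N$ the first side lobe is only about 13 dB below the main lobe, which is why higher-frequency content leaks through.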

Does the beat detection need to be in real time, or can you "look ahead" on the sample? You can do some cool things with filters if you're allowed to use noncausal filters (filters that need future as well as past samples of the data).
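As a small illustration of what "noncausal" buys you, here is a sketch of a centered average that uses samples on both sides of the current one. It deliberately uses the simplest possible filter just to show the look-ahead; the function name `centered_average` and the `half` parameter are made up for the example.

```c
#include <stddef.h>

/* Centered (noncausal) moving average: out[i] is the mean of
   in[i-half] .. in[i+half], with the window clipped at both ends of
   the buffer. Because the window is symmetric, a peak in the input
   stays at the same sample index in the output (no delay). */
void centered_average(const float *in, float *out, size_t n, size_t half)
{
    for (size_t i = 0; i < n; i++) {
        size_t lo = (i >= half) ? i - half : 0;
        size_t hi = (i + half < n) ? i + half : n - 1;
        double sum = 0.0;
        for (size_t k = lo; k <= hi; k++)
            sum += in[k];
        out[i] = (float)(sum / (double)(hi - lo + 1));
    }
}
```

A causal filter would smear the same peak later in time, which matters if you want the beat positions to line up with the original samples.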

Something you might try (a variation on the Pan-Tompkins algorithm for identifying heart beats on an ECG):

* LPF to remove all except the signal you are looking for. Don't use TOO narrow of a band, though, or you will get ringing.
* Square the signal -- this nonlinearly increases the peaks relative to the valleys
* Use an adaptive threshold -- if the signal has a higher amplitude, you set the threshold higher.
* Keep a prediction for when the next beat should happen. If you miss the beat, go back to about when it should have been and "look for it" -- this will alter your threshold. This is also why it's a good idea to be calculating 2 or so beats in the future; you can go back and find a beat that you missed if need be.
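A rough sketch of the squaring and adaptive-threshold steps, assuming the input has already been low-pass filtered. This is not the full Pan-Tompkins procedure: the threshold here is just a multiple of an exponential running average of the squared signal, the `min_gap` spacing rule is an added assumption, and the "go back and look for a missed beat" step is left out entirely. The name `detect_beats` and all parameters are made up for the example.

```c
#include <stddef.h>

/* Stores the sample index of each detected beat in beats[] (at most
   max_beats of them) and returns how many were stored.
   k       : how far above the running average a sample must be
   alpha   : smoothing factor of the running average, in (0, 1]
   min_gap : minimum number of samples between two beats */
size_t detect_beats(const float *x, size_t n, float k, float alpha,
                    size_t min_gap, size_t *beats, size_t max_beats)
{
    if (n == 0)
        return 0;

    /* Seed the running average with the first sample's energy so the
       threshold is not zero at start-up. */
    float avg = x[0] * x[0];
    size_t count = 0;
    size_t last = 0;

    for (size_t i = 0; i < n; i++) {
        float e = x[i] * x[i];  /* squaring emphasizes the peaks */
        int far_enough = (count == 0) || (i - last >= min_gap);

        if (e > k * avg && far_enough && count < max_beats) {
            beats[count++] = i;
            last = i;
        }
        /* The threshold adapts: louder passages raise the average. */
        avg += alpha * (e - avg);
    }
    return count;
}
```

With a quiet background and clear spikes this picks out the spikes, but the values of `k`, `alpha`, and `min_gap` would need tuning on real music.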

**DougDbug** · 12-15-2003

Use an adaptive threshold

Yeah, that's why I suggested a moving average. I don't remember if my hardware design used average amplitude, or average peak level, but either should work... And, you don't want a mathematical moving average of the samples (they swing positive and negative, so their average sits near zero). You need a moving average of the amplitude (i.e. the absolute value).

Cat,
It's nice being quoted, but you don't have to quote the entire freaking post!

**loobian** · 12-16-2003

Hi there!

Thank you for your responses.

Well, I will have to analyze .wav and .mp3 files, but it does not need to be in real time.

But one thing I still don't quite get: what actually is the beat? Is it just a peak in the signal, or a drum, or something else?

Thanks!

**Cat** · 12-16-2003

Well, it's the underlying rhythm of the song. It sometimes has a drum component, but it doesn't have to. A purely vocal song still has rhythm. It need not be a peak either.

**VirtualAce** · 12-16-2003

Best way to pick out the beat or rhythm is to plot the values on a graph and connect the points with lines. If the values are 8 bit then the midrange would be 128 which would be where the centerline would be - but they could be 8 bit signed values too. Same is true for 16-bit and 24-bit sound samples.
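If the samples are 8-bit unsigned, moving that centerline to zero is a one-liner per sample. A small sketch (the function name `center_u8` is made up for the example); as far as I know, standard PCM .wav files store 8-bit samples unsigned and 16-bit samples signed, so this step would only apply to the 8-bit case.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts 8-bit unsigned samples (centerline at 128) to signed values
   centered on zero, so 0 -> -128, 128 -> 0, 255 -> 127. */
void center_u8(const uint8_t *in, int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (int16_t)(in[i] - 128);
}
```

After this, the distance of a sample from zero is its level, which is what the peak-finding ideas in this thread work on.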

It should not be too hard to find the rhythm. One way to do it would be to over-exaggerate the sound sample values by scaling them. When you get a large separation you will know that either it is a drum beat or several sounds are mixing together to form a larger sound. Think of how a mixer works when you see the lights moving up and down to the rhythm of the song. Mixers measure the incoming signal via voltage and display the result.
More sound definitely means more voltage and more amplitude def means higher and larger numbers.

Let's take 1 sample of a track at any given time.
It is assumed this is recorded as 8 bit unsigned values.

Sound sample 1 - voice -> 125
Sound sample 2 - drums -> 204
Sound sample 3 - guitar -> 207
Sound sample 4 - keyboard -> 121

Average -> 164.25

Now let's drop the drums:

Sound sample 1 - voice -> 125
Sound sample 2 - drums -> 103
Sound sample 3 - guitar -> 207
Sound sample 4 - keyboard -> 121

Average -> 139

Notice that the drums are prob going to produce the largest spike in sound values. Guitars would prob be next in line, but it is possible that the drums could nearly reach the 255 limit for the sound - in which case they would be clipping.

But here's the point. If you know the sampling rate then you could easily determine the rhythm by analyzing the peaks and valleys. Even though rhythm does not always produce peaks, most of the time it does in my experience. Take for instance if you are playing the piano - the chords are going to normally make up the rhythm of the song - so the piano player emphasizes them by using more pressure on the keys producing louder sound.
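For instance, once you have the sample positions of the peaks, turning them into beats per minute is just arithmetic on the sampling rate. A minimal sketch, assuming the peak positions are sorted and really are one per beat (the name `estimate_bpm` is made up for the example):

```c
#include <stddef.h>

/* Estimates beats per minute from sorted peak positions (in samples)
   using the average spacing between the first and last peak.
   Returns 0.0 if there are fewer than two peaks. */
double estimate_bpm(const size_t *peaks, size_t count, double sample_rate)
{
    if (count < 2 || sample_rate <= 0.0)
        return 0.0;

    double span = (double)(peaks[count - 1] - peaks[0]);
    if (span <= 0.0)
        return 0.0;

    double samples_per_beat = span / (double)(count - 1);
    return 60.0 * sample_rate / samples_per_beat;
}
```

Peaks every 22050 samples at 44100 Hz are half a second apart, which works out to 120 BPM.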

**VirtualAce** · 12-16-2003

Got me thinking bout this real hard. Gonna post some code that might give you an idea.

Sound track is stored in Sound[] array.
Temp[] is scaled Sound track values

Code:

//Copy Sound to Temp - scale values
asm {
  push     ds
  mov      ecx,[LengthOfSound]   //read the count before DS is changed
  lds      esi,[Sound]
  les      edi,[Temp]
  cld                            //copy forwards
  rep      movsw                 //samples are 16-bit WORDs
  pop      ds                    //restore DS

  les      edi,[Temp]
  mov      ecx,[LengthOfSound]
}

SCALEVALUES:
asm {
  mov      ax,[es:edi]
  shl      ax,1                  //double the sample (loud samples can overflow)
  mov      [es:edi],ax
  add      edi,2                 //advance one WORD
  loop     SCALEVALUES           //loop decrements ecx itself, repeats till ecx=0
}

//Compute Sample Differences
short SampleDifferences[LengthOfSound];
int Difference=0;   //wider than short so the clamp below can actually trigger
for (int i=0;i<LengthOfSound-1;i++)
{
  Difference=Temp[i]-Temp[i+1];
  //Clamp value to the 16-bit range
  if (Difference<-32768) Difference=-32768;
  if (Difference>32767) Difference=32767;

  SampleDifferences[i]=(short)Difference;
}

//Linear interpolation: v=v1+f1*(v2-v1)
//We want to solve for f1 where v=0
//So -v1=f1*(v2-v1)
//Divide by (v2-v1)
//-v1/(v2-v1)=f1

//Floating-point division; returns 0 when v1==v2 to avoid a divide by zero
#define FINDINTERP(v1,v2) (((v2)==(v1)) ? 0.0f : (-(float)(v1))/((float)(v2)-(float)(v1)))

//int rather than short, so long tracks do not overflow the indices
int Peaks[LengthOfSound];
int Valleys[LengthOfSound];
int numpeaks=0,numvalleys=0;

//Only LengthOfSound-1 differences exist, so stop one pair earlier
for (int i=0;i<LengthOfSound-2;i++)
{
  if (FINDINTERP(SampleDifferences[i],SampleDifferences[i+1])*100>40)
  {
    Peaks[numpeaks++]=i;
  }
  else
  {
    Valleys[numvalleys++]=i;
  }
}


for (int i=0;i<numpeaks;i++)
{
   printf("Peaks in sample differences are located at samples %d\n",Peaks[i]);
}

for (int i=0;i<numvalleys;i++)
{
   printf("Valleys in sample differences are located at samples %d\n",Valleys[i]);
}

If my algebra is correct this should compute the interpolation factor of two known values. If this value *100 is greater than 40 (i.e. the factor is above .4) then I treat the two values as separated by more than 40%, which would be an indication of a peak or spike. If it is 40% or lower then I would say it is a valley. The peak locations are stored in Peaks[] and the valley locations in Valleys[]. You could use this to determine the rhythm of the selected track if you know the sampling rate of the sample.

The code might be buggy but you get the idea. I wrote it while sitting here so gimme a break here.

Sorry about the assembly but it's just faster to do memory stuff in assembly than it is in C.
