Thread: Beats-Per-Minute in .wav file

  1. #1
    Registered User
    Join Date
    Oct 2001
    Posts
    104

    Beats-Per-Minute in .wav file

    Hi all!

    I am developing an application that should have a feature to count the beats per minute in a .wav file.
    Does anyone know how to do it? Are there any algorithms for this?

    Thank you!
    Ilia Yordanov,
    http://www.cpp-home.com ; C++ Resources

  2. #2
    Hardware Engineer
    Join Date
    Sep 2001
    Posts
    1,398

    DSP

    You could probably get something useful by using a couple of Digital Signal Processing filters.

    This might be a bit like speech recognition... a human is better.

    I did something like this in hardware (for a lighting system) once.

    First, I'd try a low-pass filter to extract the beat from the bass. You'd have to experiment... there is beat information in the higher frequencies too.

    Then, compare the instantaneous level to the average (a moving average over several seconds). That will allow you to "extract" the peaks.

    Finally, I'd use a very-low-frequency filter (either low-pass or bandpass) to get the beat frequency of a few beats per second.

    This won't work perfectly, but it should get you started, and it may work "well enough". My hardware design didn't have any memory (it was totally analog processing), so it didn't really "count" beats... it "extracted" beats, and it would miss some beats and pick up extra ones. But it was "good enough" for dance lighting!
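
    A minimal sketch of the first two steps above (low-pass the bass, then compare the instantaneous level with a slow average), counting each upward crossing as one beat. The function name `count_beats` and every constant (150 Hz cutoff, 10 ms level smoothing, 2 s average, 1.5x ratio, 1e-4 floor) are assumptions to tune by ear, not values from the hardware design. An exponential average stands in for the moving average, and the final very-low-frequency filter is left out.

```c
#include <assert.h>
#include <stddef.h>

/* One-pole smoothing coefficient for a time constant tau (seconds):
   y += alpha * (input - y), with alpha = dt / (tau + dt). */
static double smoothing_alpha(double tau, double sample_rate)
{
    double dt = 1.0 / sample_rate;
    return dt / (tau + dt);
}

/* Hypothetical helper: counts beat onsets in mono samples scaled to [-1, 1]. */
size_t count_beats(const float *x, size_t n, double sample_rate)
{
    assert(sample_rate > 0.0);
    const double two_pi = 6.283185307179586;
    double a_bass = smoothing_alpha(1.0 / (two_pi * 150.0), sample_rate);
    double a_env  = smoothing_alpha(0.010, sample_rate); /* "instantaneous" level */
    double a_avg  = smoothing_alpha(2.0, sample_rate);   /* slow average */
    double bass = 0.0, env = 0.0, avg = 0.0;
    int above = 0;
    size_t beats = 0;

    for (size_t i = 0; i < n; i++) {
        bass += a_bass * ((double)x[i] - bass);      /* step 1: keep the bass */
        double level = bass < 0.0 ? -bass : bass;    /* amplitude, not raw sample */
        env += a_env * (level - env);
        avg += a_avg * (level - avg);
        /* step 2: a beat starts when the level rises well above the average */
        int now = env > 1.5 * avg && env > 1e-4;
        if (now && !above)
            beats++;
        above = now;
    }
    return beats;
}
```

    Dividing the count by the clip length in minutes gives a rough BPM figure; like the analog version, this will miss some beats and invent others on real music.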

    I have a couple of introductory books on DSP. One by Chris Cant, and one by Doug Coulter... I don't recall the titles, but both are good. The Coulter book is all about audio DSP and includes lots of source code, including the code for a full wave editor. The Cant book is more general. I don't remember if it includes code.
    Last edited by DougDbug; 12-15-2003 at 05:02 PM.

  3. #3
    Registered User
    Join Date
    May 2003
    Posts
    1,619

    Re: DSP

    Originally posted by DougDbug
    [...] Then, compare the instantaneous level to the average (a moving average over several seconds). That will allow you to "extract" the peaks. [...]

    Well, a moving average by itself has a pretty poor frequency response; you can probably do better. In fact, a moving average is itself a kind of LPF (low-pass filter), but one with some pretty big side lobes.
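
    To make the side-lobe remark concrete, here is the standard frequency response of an N-point moving average (N and the normalized frequency \omega in radians per sample are the only symbols introduced):

```latex
H(e^{j\omega}) = \frac{1}{N}\sum_{k=0}^{N-1} e^{-j\omega k}
               = e^{-j\omega (N-1)/2}\,\frac{\sin(N\omega/2)}{N\,\sin(\omega/2)}
```

    The magnitude has nulls at \omega = 2\pi m/N, and for large N the first side lobe sits near \omega \approx 3\pi/N with a height of roughly 2/(3\pi) \approx 0.21, only about 13 dB below the passband. That is why content well above the intended cutoff still leaks through.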

    Does the beat detection need to be in real time, or can you "look ahead" on the sample? You can do some cool things with filters if you're allowed to use noncausal filters (filters that need future as well as past samples of the data).
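
    As a small illustration of a noncausal filter, here is a centered moving average that uses `half` samples on each side of the current one. The function name and the choice to shrink the window at the edges are assumptions made for this sketch. Because the window is symmetric, a peak in the output lines up with the peak in the input instead of lagging behind it.

```c
#include <assert.h>
#include <stddef.h>

/* Centered (noncausal) moving average: out[i] averages x[i-half .. i+half].
   Near the edges the window is clipped to the available samples. */
void centered_average(const double *x, size_t n, size_t half, double *out)
{
    assert(x != out); /* not safe in place: later outputs need the original inputs */
    for (size_t i = 0; i < n; i++) {
        size_t lo = (i >= half) ? i - half : 0;
        size_t hi = (i + half < n) ? i + half : n - 1;
        double sum = 0.0;
        for (size_t k = lo; k <= hi; k++)
            sum += x[k];
        out[i] = sum / (double)(hi - lo + 1);
    }
}
```

    For the input {0, 0, 3, 0, 0} with half = 1, the output is {0, 1, 1, 1, 0}: the bump is smeared but stays centered on index 2. This only works offline, since out[i] needs x[i+half].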

    Something you might try (a variation on the Pan-Tompkins algorithm for identifying heartbeats on an ECG):

    * Apply an LPF to remove everything except the signal you are looking for. Don't use too narrow a band, though, or you will get ringing.
    * Square the signal -- this nonlinearly increases the peaks relative to the valleys.
    * Use an adaptive threshold -- if the signal has a higher amplitude, you set the threshold higher.
    * Keep a prediction for when the next beat should happen. If you miss the beat, go back to about when it should have been and "look for it" with a lowered threshold. This is also why it's a good idea to be calculating 2 or so beats into the future; you can go back and find a beat that you missed if need be.
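
    A rough sketch of the squaring and adaptive-threshold steps from the list above, assuming the input has already been low-pass filtered. The name `detect_beats`, its parameters, and the smoothing weights (0.125 and 0.25, the values usually quoted for Pan-Tompkins) are assumptions here; the "go back and look for a missed beat" step is not implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: writes the sample index of each detected beat to out
   (capacity max_out) and returns how many were found. `window` is the length
   of the energy sum, `refractory` the minimum spacing between beats. */
size_t detect_beats(const double *x, size_t n, size_t window,
                    size_t refractory, size_t *out, size_t max_out)
{
    assert(window > 0);
    double energy = 0.0, prev = 0.0, prev2 = 0.0;
    double spk = 0.0;  /* running estimate of beat peak energy */
    double npk = 0.0;  /* running estimate of non-beat peak energy */
    size_t count = 0, last = 0;

    for (size_t i = 0; i < n; i++) {
        if (count == max_out)
            break;
        /* squared signal summed over the last `window` samples */
        energy += x[i] * x[i];
        if (i >= window)
            energy -= x[i - window] * x[i - window];

        /* prev is a local maximum if the curve rose into it and does not keep rising */
        if (i >= 1 && prev > prev2 && prev >= energy) {
            size_t at = i - 1;
            double threshold = npk + 0.25 * (spk - npk);
            if (count > 0 && at - last < refractory) {
                /* too close to the previous beat: ignore */
            } else if (prev > threshold) {
                out[count++] = at;
                last = at;
                spk = 0.125 * prev + 0.875 * spk;  /* threshold follows the beats up */
            } else {
                npk = 0.125 * prev + 0.875 * npk;  /* and follows the noise floor */
            }
        }
        prev2 = prev;
        prev = energy;
    }
    return count;
}
```

    Both estimates start at zero, so the very first local maximum is always accepted; a real detector would use a short learning phase first.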
    You ever try a pink golf ball, Wally? Why, the wind shear on a pink ball alone can take the head clean off a 90 pound midget at 300 yards.

  4. #4
    Hardware Engineer
    Join Date
    Sep 2001
    Posts
    1,398
    Use an adaptive threshold
    Yeah, that's why I suggested a moving average. I don't remember if my hardware design used average amplitude or average peak level, but either should work... And you don't want a mathematical moving average of the raw samples, because for audio centered on zero that just hovers near zero. You need a moving average of the amplitude (i.e. the absolute value).
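
    A tiny sketch of the difference, assuming signed 16-bit samples; both function names are made up for the illustration, and each averages one window rather than sliding it.

```c
#include <assert.h>
#include <stddef.h>

/* Plain mean of the samples: close to zero for audio centered on zero. */
double mean_sample(const short *s, size_t n)
{
    assert(n > 0);
    long long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += s[i];
    return (double)sum / (double)n;
}

/* Mean of the absolute values: tracks how loud the window is. */
double mean_amplitude(const short *s, size_t n)
{
    assert(n > 0);
    long long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (s[i] < 0) ? -(long long)s[i] : (long long)s[i];
    return (double)sum / (double)n;
}
```

    For the window {1000, -1000, 2000, -2000}, mean_sample returns 0 while mean_amplitude returns 1500, so only the second is usable as a threshold reference.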

    Cat,
    It's nice being quoted, but you don't have to quote the entire freaking post!
    Last edited by DougDbug; 12-15-2003 at 07:57 PM.

  5. #5
    Registered User
    Join Date
    Oct 2001
    Posts
    104
    Hi there!

    Thank you for your responses.

    Well, I will have to analyze .wav and .mp3 files, but it does not need to be in real time.

    But one thing I still don't quite get: what actually is the beat? Is it just a peak in the signal, a drum, or something else?

    Thanks!

  6. #6
    Registered User
    Join Date
    May 2003
    Posts
    1,619
    Well, it's the underlying rhythm of the song. It sometimes has a drum component, but it doesn't have to. A purely vocal song still has rhythm. It need not be a peak either.

  7. #7
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    The best way to pick out the beat or rhythm is to plot the values on a graph and connect the points with lines. If the values are 8-bit unsigned, the midrange is 128, which is where the centerline would be, but they could be 8-bit signed values too, in which case the centerline is 0. The same idea applies to 16-bit and 24-bit sound samples.
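
    A small sketch of the centerline idea, assuming 8-bit unsigned samples (which is how 8-bit PCM is stored in .wav files; 16-bit PCM is signed and already centered on 0). The function names are hypothetical.

```c
#include <assert.h>

/* Shift an 8-bit unsigned sample so that silence (128) becomes 0. */
int center_u8(unsigned char v)
{
    return (int)v - 128;
}

/* Distance from the centerline, i.e. the amplitude used for beat detection. */
int amplitude_u8(unsigned char v)
{
    int c = center_u8(v);
    return c < 0 ? -c : c;
}
```

    With this, a sample of 204 has amplitude 76 and a sample of 121 has amplitude 7, which is what you would actually compare when looking for loud events.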

    It should not be too hard to find the rhythm. One way to do it would be to exaggerate the sound sample values by scaling them. When you get a large separation, you will know that either it is a drum beat or several sounds are mixing together to form a larger sound. Think of how a mixer works when you see the lights moving up and down to the rhythm of the song. Mixers measure the incoming signal via voltage and display the result.
    More sound definitely means more voltage, and more amplitude definitely means larger numbers.

    Let's take 1 sample of a track at any given time.
    It is assumed this is recorded as 8 bit unsigned values.


    Sound sample 1 - voice -> 125
    Sound sample 2 - drums -> 204
    Sound sample 3 - guitar -> 207
    Sound sample 4 - keyboard -> 121

    Average -> 164.25

    Now let's drop the drums:

    Sound sample 1 - voice -> 125
    Sound sample 2 - drums -> 103
    Sound sample 3 - guitar -> 207
    Sound sample 4 - keyboard -> 121

    Average -> 139

    Notice that the drums are probably going to produce the largest spike in sound values. Guitars would probably be next in line, but it is possible that the drums could nearly reach the 255 limit for the sound, in which case they would be clipping.


    But here's the point. If you know the sampling rate, then you can determine the rhythm by analyzing the peaks and valleys. Even though rhythm does not always produce peaks, most of the time it does in my experience. Take a piano player, for instance: the chords normally make up the rhythm of the song, so the player emphasizes them by using more pressure on the keys, producing a louder sound.
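
    Once you have the sample positions of the beat peaks, the sampling rate turns them into a tempo. A minimal sketch, assuming the peak positions are already known and sorted; `bpm_from_peaks` is a made-up name.

```c
#include <stddef.h>

/* Average tempo from sorted peak positions (in samples).
   Returns 0 if there are fewer than two peaks or they do not span any time. */
double bpm_from_peaks(const size_t *peaks, size_t count, double sample_rate)
{
    if (count < 2 || peaks[count - 1] <= peaks[0])
        return 0.0;
    double span = (double)(peaks[count - 1] - peaks[0]); /* samples covered */
    double beats = (double)(count - 1);                  /* intervals between peaks */
    return 60.0 * sample_rate * beats / span;
}
```

    For example, peaks at samples 0, 22050, 44100 and 66150 in a 44100 Hz file are half a second apart, which works out to 120 beats per minute.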

  8. #8
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    This got me thinking about it real hard. I'm going to post some code that might give you an idea.


    The sound track is stored in the Sound[] array.
    Temp[] holds the scaled sound track values.

    Code:
    #include <stdio.h>
    #include <stdlib.h>

    //Linear interpolation=v1+f1*(v2-v1);
    //We want to solve for f1 where the interpolated value is zero
    //So -v1=f1*(v2-v1)
    //Divide by (v2-v1)
    //-v1/(v2-v1)=f1
    static double FindInterp(int v1, int v2)
    {
      if (v2 == v1) return 0.0;  //avoid divide by zero; counted as a valley below
      return -(double)v1 / (double)(v2 - v1);
    }

    //Sound holds LengthOfSound signed 16-bit samples
    void FindPeaksAndValleys(const short *Sound, size_t LengthOfSound)
    {
      if (LengthOfSound < 3) return;

      int *Temp = malloc(LengthOfSound * sizeof *Temp);
      short *SampleDifferences = malloc((LengthOfSound - 1) * sizeof *SampleDifferences);
      size_t *Peaks = malloc(LengthOfSound * sizeof *Peaks);
      size_t *Valleys = malloc(LengthOfSound * sizeof *Valleys);
      size_t numpeaks = 0, numvalleys = 0;

      if (Temp && SampleDifferences && Peaks && Valleys)
      {
        //Copy Sound to Temp - scale values (Temp is int, so doubling cannot overflow)
        for (size_t i = 0; i < LengthOfSound; i++)
          Temp[i] = Sound[i] * 2;

        //Compute Sample Differences
        for (size_t i = 0; i < LengthOfSound - 1; i++)
        {
          int Difference = Temp[i] - Temp[i + 1];
          //Truncate value to the 16-bit range
          if (Difference < -32768) Difference = -32768;
          if (Difference > 32767) Difference = 32767;
          SampleDifferences[i] = (short)Difference;
        }

        //There are only LengthOfSound-1 differences, so stop one pair early
        for (size_t i = 0; i < LengthOfSound - 2; i++)
        {
          if (FindInterp(SampleDifferences[i], SampleDifferences[i + 1]) * 100 > 40)
            Peaks[numpeaks++] = i;      //store first, then advance the count
          else
            Valleys[numvalleys++] = i;
        }

        for (size_t i = 0; i < numpeaks; i++)
          printf("Peaks in sample differences are located at samples %zu\n", Peaks[i]);

        for (size_t i = 0; i < numvalleys; i++)
          printf("Valleys in sample differences are located at samples %zu\n", Valleys[i]);
      }

      free(Temp);
      free(SampleDifferences);
      free(Peaks);
      free(Valleys);
    }
    If my algebra is correct, FindInterp computes the interpolation factor at which the line between two known values crosses zero. If this value times 100 is greater than 40 (i.e. the factor is above 0.4), then the two values are separated by more than 40%, which would be an indication of a peak or spike. If it is 40% or lower, then I would say it is a valley. The peak locations are stored in Peaks[] and the valley locations in Valleys[]. You could use these to determine the rhythm of the selected track if you know the sampling rate of the sample.

    The code might be buggy, but you get the idea. I wrote it while sitting here, so gimme a break.

    The copy-and-scale step could be hand-written in assembly, but a plain C loop is portable, and modern compilers generally optimize it well.
