Thread: Speech Analysis

  1. #1
    Registered User
    Join Date
    Jun 2010
    Posts
    16

    Speech Analysis

    Hello, I know this isn't 100% C related, but I don't know where else to post this.
    In a book I have for a computing class at school, we are designing a very simple speech recognition program. It takes data from a speech utterance, computes the average power magnitude, zero crossings, etc., and compares them to a control set of data to see if the two match.
    The book gives me a file that contains data about the audio file "zero", a wave file of someone saying "zero". However, the initial data that it compares must be in text format. These are some values from the example file they give you:

    2.8320312e-02
    2.9296875e-02
    2.8808594e-02
    2.7832031e-02
    2.6855469e-02
    2.7343750e-02
    2.6367188e-02
    2.4414062e-02
    1.8554688e-02
    1.5625000e-02
    1.5625000e-02
    1.2207031e-02
    8.3007812e-03
    4.8828125e-03
    1.4648438e-03
    0.0000000e+00
    -2.9296875e-03
    -5.3710938e-03
    -6.8359375e-03
    -8.3007812e-03
    -9.2773438e-03
    -1.1718750e-02

    These are values of the amplitude of the audio signal. My main question, and believe me I've searched for 3 days, is: how can I get this data from a wave file? I've been trying to find a program that can extract the amplitudes of the wave into a text file or anything similar, and I have had no luck at all. Is there a way to extract this info in C? Any help, please.

  2. #2
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    I suppose you could start by studying the WAV file format?
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  3. #3
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    Just seek 44 bytes into the file and there it is. Of course, you'll need to look up things like sampling frequency, bit depth and number of channels, but all of that is in the preceding 44 bytes. Information about the header and related offsets is available in the link above.
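
    A minimal sketch of pulling those header fields out, assuming the file uses the plain 44-byte PCM layout with no extra chunks before the sample data (files with extra chunks put the samples at a different offset). The struct and function names are made up for illustration; the offsets in the comments are those of that plain layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fields of interest from the plain 44-byte header (hypothetical struct). */
struct wav_info {
    uint16_t channels;        /* offset 22: 1 = mono, 2 = stereo */
    uint32_t sample_rate;     /* offset 24: samples per second   */
    uint16_t bits_per_sample; /* offset 34: e.g. 16              */
    uint32_t data_bytes;      /* offset 40: size of sample data  */
};

/* WAV stores numbers little-endian; build them byte by byte so the
   result does not depend on the byte order of the machine. */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the bytes do not match the plain layout. */
int parse_wav_header(const unsigned char *hdr, struct wav_info *out)
{
    assert(hdr != NULL && out != NULL);
    if (memcmp(hdr, "RIFF", 4) != 0 || memcmp(hdr + 8, "WAVE", 4) != 0)
        return -1;
    if (memcmp(hdr + 12, "fmt ", 4) != 0 || memcmp(hdr + 36, "data", 4) != 0)
        return -1; /* extra chunks: samples do not start at offset 44 */
    out->channels        = le16(hdr + 22);
    out->sample_rate     = le32(hdr + 24);
    out->bits_per_sample = le16(hdr + 34);
    out->data_bytes      = le32(hdr + 40);
    return 0;
}
```

    To use it, fread the first 44 bytes of the file into an unsigned char array and pass that in. A return of -1 means the fixed offset of 44 cannot be trusted for that file.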

  4. #4
    Registered User
    Join Date
    Jun 2010
    Posts
    16
    Yes, I've read a lot about extracting data from the wave file itself, and there was way too much info; it lost me. I just want the amplitudes of the audio signal every 0.00125 seconds. How do I get that out of the data chunk? It's all in hex; it's just numbers and letters to me. That's why I'm asking if there is a program. If someone can explain exactly what to do in C to get that, great; if not, I need a program. I don't know how to decipher 5000 lines of hex into the amplitude at 0.00125 second intervals, so if it is that easy, then please explain a little more.

  5. #5
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    You don't need to decipher anything; just read the file into a buffer in binary using fread.
    Whether it's in hex or not depends on how you view it. The bit depth field in the header tells you how large each sample is; for example, a 16-bit sound has two-byte samples.

    The amplitude is the value of the sample. So, if you know that two bytes make one sample, you also know that the bytes at offsets 44 and 45 in the file are your first sample.

    Your list is floats, so read the data from the file, starting at offset 44, two bytes at a time into a buffer of short ints, and convert it afterwards by assigning each element to a float. (fread copies raw bytes, so reading 2-byte samples straight into an array of floats will not convert them.)

    Now the interval you said you wanted, 0.00125 seconds. The header gives you the sample rate in samples per second; normal CD quality is 44100 Hz. Your figure corresponds to a sampling frequency of 1/0.00125 = 800 Hz, which is very low for speech: the highest frequency you can then reproduce is 400 Hz.

    Make sure the file is mono, or you need to take that into consideration as well.
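
    A rough sketch of the short-to-float step. Dividing by 32768 is an assumption about how the book's text file was scaled, although it is consistent with the values in the first post (2.8320312e-02 is exactly 928/32768). The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Scale one signed 16-bit sample into the range [-1.0, 1.0).
   The divisor 32768 is an assumed scaling, see the note above. */
float sample_to_float(short s)
{
    return (float)s / 32768.0f;
}

/* Write n samples to out, one per line, in exponent notation
   with seven decimals, similar to the book's example file. */
void write_samples_as_text(FILE *out, const short *samples, size_t n)
{
    size_t i;

    assert(out != NULL && samples != NULL);
    for (i = 0; i < n; i++)
        fprintf(out, "%.7e\n", sample_to_float(samples[i]));
}
```

    Keeping the raw shorts in one buffer and converting only when writing means the file is read exactly as stored, and the scaling can be changed later without touching the reading code.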
    Last edited by Subsonics; 06-09-2010 at 05:31 PM.

  6. #6
    Registered User
    Join Date
    Jun 2010
    Posts
    16
    Thanks, you rock! I will play with that, but I can almost guarantee I'll have a few more questions for you along the way. Thanks a lot, great explanation.

  7. #7
    Registered User
    Join Date
    Jun 2010
    Posts
    16
    So here is the code I'm using to experiment with reading this wave file and viewing its contents in C:

    Code:
    int main(void)
    {
    	int k, x;
    	int clip_info[10000];
    	FILE * file_in;
    	
    	
    	file_in = fopen(FILENAME,"r");
    	
    	if (file_in == NULL) 
    	{
    		printf("Error Opening file.");
    	}
    	else
    	{
    		printf("success");
    		
    		for (k=0; k<=5001; k++)
    		{
    			if ((fread(&clip_info[k],2,5000,file_in))!= NULL)
    			 {
    				 fread(&clip_info[k],2,5000,file_in);
    			}
    	
    		
    		}
    	}
    
    		for (x=0; x<= 5000; x++)
    		printf("%d\n", clip_info[x]);
    
    
    	getchar();
    	getchar();
    	return 0;
    }

    In this particular situation it will start to read in numbers and then print them out, but eventually they disappear and all that is left on the screen is zeros. When I change the last for loop to x <= 20 or so, it will print a few numbers and keep them on the screen. Any idea what is going on here, or whether I'm even trying to read this wave file properly?

  8. #8
    Registered User
    Join Date
    Jun 2010
    Posts
    16
    Okay, so instead of printing it to the screen I wrote the values to a .txt file. Now I'm getting somewhere. I get numbers such as these:

    -5963985
    -10616913
    -10027213
    -2687069
    -8126540
    -1245280
    13697149
    4522152
    3407916
    1179695
    2424840
    -1900494
    -8847489
    -9109674
    -4456527
    -7274604
    -7536781
    -2752549
    -7274586
    -852057
    -65506
    -2687002
    851942
    262200
    -3932182
    -19792047
    -8716559
    -1900607
    -8454214
    -6684806



    These seem like huge amplitudes. I'm not sure if it's pulling the numbers out correctly.

  9. #9
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    Quote Originally Posted by rambo5330 View Post

    In this particular situation it will start to read in numbers and then print them out, but eventually they disappear and all that is left on the screen is zeros. When I change the last for loop to x <= 20 or so, it will print a few numbers and keep them on the screen. Any idea what is going on here, or whether I'm even trying to read this wave file properly?
    Your problem is in this section:

    Code:
    		for (k=0; k<=5001; k++)
    		{
    			if ((fread(&clip_info[k],2,5000,file_in))!= NULL)
    			 {
    				 fread(&clip_info[k],2,5000,file_in);
    			}
    	
    		
    		}
    The fread() in the condition already performs a read, so each iteration reads twice. And each call tells fread() to read 5000 items of 2 bytes each starting at clip_info[k], which is far more than one array element can hold.

    Code:
    for (k = 0; k < 5000; k++)
    {
        /* read one 2-byte item per iteration */
        if (fread(&clip_info[k], 2, 1, file_in) != 0)
        {
            /* second read removed: */
            /* fread(&clip_info[k],2,5000,file_in); */
        }
    }
    Note the section that is commented out. This is just a quick edit; you still need to handle the error checking of fread in a different way.
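
    One possible way to do the read with a usable error check, as a sketch only. It assumes the plain 44-byte header, a short that is 16 bits wide, a little-endian machine, and a file opened in binary mode ("rb"). In text mode ("r") the Windows runtime translates line endings and can stop early at a 0x1A byte, which may be part of why the values run out. The function name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define WAV_HEADER_BYTES 44L /* assumes the plain 44-byte header */

/* Reads up to max 16-bit samples that follow the header into buf.
   Returns how many samples were actually read (0 on seek failure). */
size_t read_samples(FILE *in, short *buf, size_t max)
{
    assert(in != NULL && buf != NULL);
    if (fseek(in, WAV_HEADER_BYTES, SEEK_SET) != 0)
        return 0;
    /* One call is enough: item size is one short, item count is max.
       fread returns the number of whole items read, which is less
       than max when the file ends first. */
    return fread(buf, sizeof(short), max, in);
}
```

    The return value is a count, not a pointer, so it is compared against numbers rather than NULL, and the print loop should only run up to that count instead of a fixed 5000.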
    Last edited by Subsonics; 06-10-2010 at 12:31 AM.

  10. #10
    Registered User
    Join Date
    Jun 2010
    Posts
    16
    Yes, I just do this now:

    Code:
    		for (k=0; k<=6001; k++)
    		{
    			
    				 fread(&clip_info[k],6000,2,file_in);
    			
    	
    		
    		}
    	}
    	
    		file_out = fopen(FILENAME2,"w");
    	    for (x=0; x<= 6000; x++)
    			{
    				fprintf(file_out,"%ld\n",clip_info[x]);
    			   
    		}
    
    	fclose;
        fclose;
    	getchar();
    	getchar();
    	return 0;

    It prints several values to a txt file that I can read. I get signed integers in the data section, much like the amplitude should be, going from negative numbers to positive. However, these integers are huge and I think it must be reading them wrong. I'm not sure how to check, but everything I've found about the wave format says 16-bit samples are stored as 2's-complement signed integers, ranging from -32768 to 32767,
    and as you can see from above, my integer values are much larger than these.

  11. #11
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    BTW, where is the sound file from? Is it already in the correct format regarding sampling frequency and channels? If not, I would preprocess it in some audio editing software and re-save it.

  12. #12
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Rambo, would you post a line or two from a wav file that you know the correct values for (there must be an example someplace)?

    If you will post that, then we can look at your code more closely, and see what's going on.

  13. #13
    Registered User
    Join Date
    Jun 2010
    Posts
    16
    As of right now I am just using a wave file I recorded myself in Windows Sound Recorder, 16 bits mono. I do not have a specific wave file for which I know the data. This is a good idea though, and I can't believe I've overlooked that. I will try to find examples of a wave file with its data extracted. It may be difficult though.

  14. #14
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    Quote Originally Posted by rambo5330 View Post
    However, these integers are huge and I think it must be reading them wrong. I'm not sure how to check, but everything I've found about the wave format says 16-bit samples are stored as 2's-complement signed integers, ranging from -32768 to 32767,
    and as you can see from above, my integer values are much larger than these.
    Yes, but you are reading 2 chunks of 6000 bytes each. Your buffer is int and the samples are only shorts, so you are getting two samples in each int. That's why they are huge: they are interpreted as one 32-bit value. Either read one sample at a time with a loop, or use short instead of int for the array, like I mentioned before.
    Last edited by Subsonics; 06-10-2010 at 12:49 AM.

  15. #15
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    Quote Originally Posted by rambo5330 View Post
    As of right now I am just using a wave file I recorded myself in Windows Sound Recorder, 16 bits mono. I do not have a specific wave file for which I know the data. This is a good idea though, and I can't believe I've overlooked that. I will try to find examples of a wave file with its data extracted. It may be difficult though.
    16 bit and mono sounds right, but sampling frequency conversion is tricky, so it's probably better to match it to the interval you mentioned earlier, or you will get the wrong pitch.

    Extracting the data is what you are doing now; a wave file is nothing but linear PCM data after the first 44-byte header.
