I have been trying to write a pitch detector in C++ for voiced speech. Currently I read in the speech samples, apply centre clipping to the entire signal, and then find all the peaks. However, I am not sure whether I need to split the data into frames first and then apply a window (a Hamming window, for example) to each frame. Is this necessary? All of this is done in the time domain. Additionally, I want to use autocorrelation to determine the pitch period, and to decide whether or not a detected peak is a glottal one. All the examples of autocorrelation that I have seen use something called a lag, but they never explain how to compute this lag.
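To make the clipping step concrete, here is a minimal sketch of the kind of centre clipping I mean. The function name `centreClip` and the `ratio` parameter (threshold as a fraction of the peak magnitude, defaulting to 0.3) are assumptions for illustration only, not necessarily the exact code or values I use.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Centre clipping over the whole signal (hypothetical helper).
// Samples whose magnitude is at or below the threshold are set to zero;
// all other samples are moved towards zero by the threshold.
// `ratio` is an assumed parameter: threshold = ratio * peak magnitude.
std::vector<double> centreClip(const std::vector<double>& x, double ratio = 0.3)
{
    // Find the largest absolute sample value in the signal.
    double peak = 0.0;
    for (double s : x) {
        peak = std::max(peak, std::fabs(s));
    }
    const double threshold = ratio * peak;

    std::vector<double> y(x.size(), 0.0);
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] > threshold) {
            y[i] = x[i] - threshold;
        } else if (x[i] < -threshold) {
            y[i] = x[i] + threshold;
        }
        // Otherwise y[i] stays at 0.
    }
    return y;
}
```

At the moment this runs once over the whole recording, so the threshold comes from the global peak rather than from each frame, which is part of why I am asking about framing.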
So, is it necessary to use framing and windowing? And
how should I calculate the glottal period from the other peaks?
Thanks in advance for any advice. Sub.