speech identification

01-07-2005, 06:53 PM
Hi all, I know this is quite a specialised topic, but I was wondering whether any members who have worked in the speech field might have a bit of advice for me. I am writing a speaker identification system in C++. The feature I am using from the voiced data is the glottal period, which is unique to each speaker. The feature extraction module works great; however, I think I need some sort of stochastic model to produce a probability that a claimed speaker x really is speaker x, based on their live sample compared against all the others in the database. I was wondering whether I should use Hidden Markov Models for this, or distance templates. Each template produced by the extraction module contains the length of the glottal cycle in ms and the time at which it occurred during enrolment. Any help on this would be appreciated… thanks. BTW, the system will be operating in text-dependent mode.
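
For illustration only, here is a minimal sketch of the distance-template option mentioned above. It is not the poster's code, and it makes several assumptions: each template is reduced to an ordered list of glottal cycle lengths in ms (the enrolment timestamps are ignored, only their order is kept), two templates are compared with dynamic time warping, and the names Template, Enrolled, dtwDistance and identify, as well as the rejection threshold, are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical template: glottal cycle lengths in ms, in order of occurrence.
using Template = std::vector<double>;

// Hypothetical database entry: one enrolled template per speaker.
struct Enrolled {
    std::string speaker;
    Template periods_ms;
};

// Dynamic time warping distance between two period sequences,
// normalised by the combined length so templates of different
// sizes stay comparable. Returns infinity if either is empty.
double dtwDistance(const Template& a, const Template& b) {
    const double inf = std::numeric_limits<double>::infinity();
    if (a.empty() || b.empty()) return inf;

    // cost[i][j] = cheapest alignment of the first i items of a
    // with the first j items of b.
    std::vector<std::vector<double>> cost(
        a.size() + 1, std::vector<double>(b.size() + 1, inf));
    cost[0][0] = 0.0;

    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const double d = std::fabs(a[i - 1] - b[j - 1]);
            cost[i][j] = d + std::min({cost[i - 1][j],
                                       cost[i][j - 1],
                                       cost[i - 1][j - 1]});
        }
    }
    return cost[a.size()][b.size()] /
           static_cast<double>(a.size() + b.size());
}

// Returns the index of the closest enrolled speaker, or -1 when no
// template is closer than the (assumed, hand-tuned) threshold.
int identify(const std::vector<Enrolled>& db, const Template& live,
             double threshold) {
    assert(threshold >= 0.0);
    int best = -1;
    double bestDist = threshold;
    for (std::size_t k = 0; k < db.size(); ++k) {
        const double d = dtwDistance(db[k].periods_ms, live);
        if (d < bestDist) {
            bestDist = d;
            best = static_cast<int>(k);
        }
    }
    return best;
}
```

This produces a distance and a hard accept/reject decision, not a probability. A stochastic model such as an HMM would be trained per speaker and scored by likelihood instead, which is closer to what the post asks for.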

01-07-2005, 10:14 PM
Doubleanti is quite into that kind of thing. He posted a similar thread a while ago, and all the speech-nerds (no offence - we're all nerds :D) discussed it for a while. Do a board search. It was within the last couple of weeks, so it should be easy to find - it was in the GD forum.

Brain Cell
01-07-2005, 11:03 PM
(no offence - we're all nerds :D)

nerd also nurd (nűrd)
n. Slang
1. A foolish, inept, or unattractive person.
2. A person who is single-minded or accomplished in scientific or technical pursuits but is felt to be socially inept
source : www.dictionary.com

Not "all" of us are nerds :rolleyes:

01-07-2005, 11:35 PM
>unattractive person.

That makes 95% of the people that visit these boards, a nerd....face the facts.

I myself am very attractive. But I'm socially inept....so :(

01-08-2005, 12:24 PM
All right, thanks.

01-08-2005, 12:44 PM
I'm not socially inept or foolish, but I'm not sure how attractive I am :P

01-08-2005, 12:56 PM
I am not familiar with any of the terms you used, so what I'm going to say may come across as stupid.

I personally haven't implemented this yet, but I sat down and talked with a grad student who has implemented voice recognition software. They used a neural network model. To train the neural network, you speak into a microphone and say a couple of words. A bunch of calculations are done, and then afterwards when you speak into the microphone it can recognize a key set of words. It worked 80% of the time.

I don't think an elegant solution exists outside of using neural networks (or some type of 'fuzzy logic' with thresholds which are basically just a special type of neural network anyway).