Is voice recognition, text-to-speech, or anything like that possible in C++? Or should I just do it in assembly and/or machine code? Just checking, thanks;-)!
One more thing. If you can do it in C++, can I get the code? Thanx;-)!
Yes, it is possible... it's been done! Could you do it? No. Could 99% of the population of the world do it? No. Could 99% of all programmers do it? I'm guessing no.
Code for something like this would span many many many many many pages and would be extremely complicated. If you really want, you may be able to find examples somewhere on the internet... why not try www.google.com?
> Could you do it? No
Check out the Windows SAPI interface. For further help, try posting to the discussion board on www.generation5.org.
If you want to write such a thing from scratch, then admittedly this would be hard.
Dooh - I've done it again. The full stop at the end of the link above will stop the link from working. Try
www.generation5.org
>Is voice recognition, text-to-speech, or anything like that
>possible in C++?
Yes it is. Though it requires a lot of math, especially the theory of signals and systems, to understand the involved algorithms which are used in digital signal processing.
A very good resource:
www.dspguide.com
An easy way out would be to download the Speech SDK from MS.
Thanks. The links didn't help me with my program (I gave in and used a Microsoft product, the Speech SDK), though they helped me with another part of the program. I'll post it when I've finished it. It's more assembly than C++.
>the links didn't help me with my program,
Perhaps you weren't specific enough.
>I'll post it when I finished it.
That would be nice.
I was talking about programming it from scratch... which, it seems to me, is what he was asking.
It also seemed to me that he was asking how to do speech processing in C++ from scratch. But "voice recognition, text-to-speech, or anything like that" is not very specific.
It's possible in Linux (text to speech) with the Festival packages.
I'll be able to do voice recognition one day. I really will. And about that 99% of programmers who can't do it? Hah! My uncle did it before 1995. Of course you can do it: just put your mind to it, learn what it involves, learn the mathematics, and of course the API for the sound card (no, not the Win32 API). What always makes me laugh is when I see things like "Microsoft is developing Voice Recognition! For years and years and it's almost done! We've got the technology and the scientists to get the task done!" lol, because they make it sound like they're so great and that it's so hard, but once you figure it out... how hard can anything be? Anyway, good luck with it, man.
I have limited knowledge on this, but anyway, when starting from scratch it does seem hard, since you would have to translate the input from the mic, presumably measuring and then translating the wavelengths or something. How to do it, I have no idea.
>And about that 99% of the programmers cant do it?
I'm quite sure that most programmers can't write speech processing software from scratch. It requires a lot of knowledge and a good understanding of digital signal processing, and of speech processing in particular. Most programmers don't have that knowledge.
>Hah! My uncle did before 1995.
So?
>it does seem hard since you would have to translate the input
>from the mic, presumeably measuring and then translating the
>wavelengths or something. how to do it, I have no idea.
The process of recording audio with the sound card is called sampling. If you sample, for example, at a frequency of 44 kHz, then a sample is taken every 1/44,000 of a second. Each sample can be seen as the amplitude of the signal at that moment. The sound card provides this value as N bits, where N depends on the type of sound card.
So the original audio signal, which is analog, is represented as a digital signal, which is in fact a long array of values (samples). Note that you cannot always capture the full signal: if the sampling frequency is N kHz, then the highest frequency you can capture is N/2 kHz.
Note that this is the process for recording the speech. After recording, you can use the digital version to perform operations on it for recognition and that kind of thing.
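To make the sampling description concrete, here is a minimal C++ sketch. It does not talk to a real sound card; it fakes the input with a pure tone and assumes 16-bit samples. The names sample_tone and nyquist_limit_hz are made up for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: "samples" a pure tone the way a sound card would,
// returning signed 16-bit amplitudes (N = 16 bits is an assumption).
std::vector<std::int16_t> sample_tone(double tone_hz, double sample_rate_hz,
                                      std::size_t sample_count) {
    assert(sample_rate_hz > 0.0);
    const double pi = 3.14159265358979323846;
    std::vector<std::int16_t> samples(sample_count);
    for (std::size_t n = 0; n < sample_count; ++n) {
        // Sample n is taken n / sample_rate seconds after the start.
        const double t = static_cast<double>(n) / sample_rate_hz;
        const double amplitude = std::sin(2.0 * pi * tone_hz * t);  // in [-1, 1]
        // Quantize the analog amplitude to a 16-bit integer.
        samples[n] = static_cast<std::int16_t>(std::lround(amplitude * 32767.0));
    }
    return samples;
}

// Highest frequency a given sampling rate can capture: half the rate.
double nyquist_limit_hz(double sample_rate_hz) {
    assert(sample_rate_hz > 0.0);
    return sample_rate_hz / 2.0;
}
```

For example, sample_tone(1000.0, 44000.0, 440) gives the "long array of values" for 10 ms of a 1 kHz tone, and nyquist_limit_hz(44000.0) gives the 22 kHz ceiling for that sampling rate.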
Speech synthesis, the process behind text-to-speech, is something different. You have to translate the text to a phonetic alphabet. Then you can link each element of the phonetic alphabet to a sound. This is how primitive text-to-speech is done. Today there are much more sophisticated algorithms for doing this.
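A rough sketch of that primitive two-step approach in C++ might look like the following. Everything specific here is an assumption for illustration: the two-word dictionary, the ARPAbet-style symbols, and the "one .wav file per phonetic symbol" naming. Actually playing the clips is left out.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical pronunciation dictionary: word -> phonetic symbols.
const std::map<std::string, std::vector<std::string>>& pronunciation_dictionary() {
    static const std::map<std::string, std::vector<std::string>> dict = {
        {"hello", {"HH", "AH", "L", "OW"}},
        {"world", {"W", "ER", "L", "D"}},
    };
    return dict;
}

// Step 1: translate text into phonetic symbols. Unknown words are skipped.
std::vector<std::string> text_to_phonemes(const std::string& text) {
    std::vector<std::string> phonemes;
    std::istringstream words(text);
    std::string word;
    while (words >> word) {
        const auto entry = pronunciation_dictionary().find(word);
        if (entry != pronunciation_dictionary().end()) {
            phonemes.insert(phonemes.end(), entry->second.begin(), entry->second.end());
        }
    }
    return phonemes;
}

// Step 2: link each phonetic symbol to a recorded sound (assumed file naming).
std::vector<std::string> phonemes_to_clips(const std::vector<std::string>& phonemes) {
    std::vector<std::string> clips;
    for (const std::string& phoneme : phonemes) {
        assert(!phoneme.empty());
        clips.push_back(phoneme + ".wav");
    }
    return clips;
}
```

Calling phonemes_to_clips(text_to_phonemes("hello world")) yields the list of clips to play back to back. A real system would also need rules for words that are not in the dictionary.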
Maybe, but how many programmers in the world have a need for voice recognition? I'm sure that if you set yourself the goal of doing it, you would figure it out, and then make updates along the way. Since there are so many ways to sample every frequency and frequency change to figure out what is being said, there is not just one way to do it. You could use plenty of different algorithms that are equally good. But in the end it is just comparing frequencies and their changes in order to estimate what word the user is attempting to say. So once you can break sound into its frequencies and then do the comparisons, I'd say you're off to a really good start.
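As a deliberately naive illustration of the "comparison" idea, the sketch below picks the stored word whose feature vector is closest to what was heard. It assumes each word has already been reduced to a fixed-length list of numbers (for example, energy per frequency band); how to get good features is the hard part and is not shown. The names Features, squared_distance, and closest_word are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Assumed representation: one fixed-length feature vector per utterance.
using Features = std::vector<double>;

// Squared Euclidean distance between two feature vectors of equal length.
double squared_distance(const Features& a, const Features& b) {
    assert(a.size() == b.size());
    double total = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = a[i] - b[i];
        total += diff * diff;
    }
    return total;
}

// Returns the word whose stored template is nearest to the heard features,
// or an empty string when there are no templates.
std::string closest_word(const Features& heard,
                         const std::map<std::string, Features>& templates) {
    std::string best_word;
    double best_distance = std::numeric_limits<double>::infinity();
    for (const auto& entry : templates) {
        const double distance = squared_distance(heard, entry.second);
        if (distance < best_distance) {
            best_distance = distance;
            best_word = entry.first;
        }
    }
    return best_word;
}
```

This only works when the speaker, speed, and pitch match the stored templates closely, which real voices rarely do.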
>Maybe, however how many programmers in the world have a
>need for voice recognition?
Only a very small fraction of all programmers work in the field of speech processing.
>So once you can gather sound into its frequency's then do the
>comparisons I'd say your off to a real good start.
From a mathematical/technical point of view it's not hard to get the frequency spectrum of a signal and let some filters work on it. But the filters are the hard part of speech processing: no two voices are the same, so you would need adaptive filters and a good knowledge of human speech.
The same goes for audio coding. Compression algorithms for audio and for data are not hard to implement. But it takes a lot of testing to get a good-quality audio coder.
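To show that the "get the frequency spectrum" step really is the easy part, here is a minimal sketch using a naive discrete Fourier transform. It runs in O(N^2) time, so real code would use an FFT instead; the function names are made up for this example, and no filtering is attempted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <iterator>
#include <vector>

// Naive DFT: returns the magnitude of each frequency bin k = 0 .. N/2.
// Bin k corresponds to a frequency of k * sample_rate / N hertz.
std::vector<double> magnitude_spectrum(const std::vector<double>& samples) {
    assert(!samples.empty());
    const double pi = 3.14159265358979323846;
    const std::size_t n = samples.size();
    std::vector<double> magnitudes(n / 2 + 1);
    for (std::size_t k = 0; k <= n / 2; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = -2.0 * pi * static_cast<double>(k) *
                                 static_cast<double>(t) / static_cast<double>(n);
            sum += samples[t] * std::polar(1.0, angle);
        }
        magnitudes[k] = std::abs(sum);
    }
    return magnitudes;
}

// Index of the bin with the largest magnitude.
std::size_t strongest_bin(const std::vector<double>& magnitudes) {
    assert(!magnitudes.empty());
    const auto peak = std::max_element(magnitudes.begin(), magnitudes.end());
    return static_cast<std::size_t>(std::distance(magnitudes.begin(), peak));
}
```

Feeding in 64 samples of a sine wave that completes 5 cycles over the window puts the peak in bin 5. Deciding what to do with the spectrum afterwards is where the difficulty described above begins.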
Yes, getting an adaptive filter to work successfully would take some practice and time. I think the way to do it would be to take an average of the frequency change each spoken character has. For example, if you say "Hello", the average person who speaks English should have approximately the same frequency change between the H and the e, regardless of the pitch of their voice. At least that's how I think it would be done. Then of course there may be some software-like troubleshooting to find out how fast they speak, etc., and how long each character the person speaks is.
>I'll be able to do Voice Recognition one day. I really will.
That's great, I'm so proud of you... but right now you fall into the 99% category who cannot program this from scratch.
>And about that 99% of the programmers can't do it? Hah! My uncle did before 1995.
Wow, I'm so proud of your uncle. But what relevance does this have to the assumption that 99% of all programmers cannot do it?
>Of course you can do it, just put your mind to it, learn what it involves, learn the mathematics, and of course the API for the sound card (no, not Win32 API).
Mmm, this reminds me of all the newbies who come and want to program an OS, or a game which ranks up with StarCraft or Quake 3. Sure, if I decided today that I wanted to program a text-to-speech program, I'd eventually finish... but until then, guess what? I side with the 99% who cannot do it.
sure why not
lol, what relevance? Well if you read the board before making your own assumptions you might have re-phrased that.
All of that message was in regards to the person saying "Could you do it? No." because that is an extremely negative attitude.
You know what else makes me mad? When people say things like this:
>"Mmm, this reminds me of all the newbies who come and want to program an OS"
It reminds you of a newb, eh? Good for you, I'm so proud of you too!! But what does your comment have to do with relevance? My first message was just aimed at giving him confidence; the rest of my posts were just general conversation. But if shooting down other people's posts gives you that warm feeling inside... just go ahead and continue.
You're new around here aren't ya Xei? Stick around for awhile, I think you'll be reading many posts like this one. I'm not going to get into a flame war with you, because that's exactly where this is headed.
Give him all the confidence in the world... but I'll just prepare him for when he returns to reality.
Yes, I am new. I am not trying to be mean, but I don't think you read the post very well. But oh well, whatever, maybe we both just don't understand each other or something ;). Reality doesn't say that he can't do it, and that is what my whole post was about.
-Confidence breeds success.
A good idea is to learn the way .wav files are written and check to see if the default file is within 50 ASCII codes of the ones the... something or other, I dunno. Delete the post if you think it's spam, lol.
I wrote a text-to-speech program just the other day. Here is a section of the code:
Code:
std::string text_string;
std::cout << "Give me your word: ";
std::cin >> text_string;
if (text_string == "lostminds") {
    Play_wav("lostminds.wav");
}
What's the lib for Play_wav?
Tis from the lostminds library :)