Is voice recognition, text-to-speech, or anything like that possible in C++? Or should I just do it in assembly and/or machine? Just checking, thanks;-)!
One more thing. If you can do it in C++, can I get the code? Thanx;-)!
Yes, it is possible... it's been done! Could you do it? No. Could 99% of the population of the world do it? No. Could 99% of all programmers do it? I'm guessing no.
Code for something like this would span many many many many many pages and would be extremely complicated. If you really want, you may be able to find examples somewhere on the internet... why not try www.google.com?
> Could you do it? No
Check out the Windows SAPI interface. For further help, try posting to the discussion board on www.generation5.org.
If you want to write such a thing from scratch, then admittedly this would be hard.
Dooh - I've done it again. The full stop on the end of the link above will stop the link from working. Try
www.generation5.org
>Is voice recognition, text-to-speech, or anything like that
>possible in C++?
Yes, it is, though it requires a lot of math, especially the theory of signals and systems, to understand the algorithms used in digital signal processing.
A very good resource:
www.dspguide.com
An easy way out would be to download the Speech SDK from MS.
Thanks, the links didn't help me with my program (I gave in and used a Microsoft product, the Speech SDK), though they helped me with another part of my program. I'll post it when I've finished it. It's more assembly than C++.
>the links didn't help me with my program,
Perhaps you weren't specific enough.
>I'll post it when I've finished it.
That would be nice.
I was talking about programming it from scratch, which, it seems to me, is what he was asking.
It also seemed to me that he was asking how to do speech processing in C++ from scratch. But "voice recognition, text-to-speech, or anything like that" is not very specific.
It's possible on Linux (text-to-speech) with the Festival packages.
I'll be able to do voice recognition one day. I really will. And about that 99% of the programmers can't do it? Hah! My uncle did before 1995. Of course you can do it, just put your mind to it, learn what it involves, learn the mathematics, and of course the API for the sound card (no, not the Win32 API). What always makes me laugh is when I see things like "Microsoft is developing Voice Recognition! For years and years, and it's almost done! We've got the technology and the scientists to get the task done!" lol, because they make it sound like they're so great and that it's so hard, but once you figure it out... how hard can anything be? Anyway, good luck with it, man.
I have limited knowledge on this, but anyway, when starting from scratch it does seem hard, since you would have to translate the input from the mic, presumably measuring and then translating the wavelengths or something. How to do it, I have no idea.
think only with code.
write only with source.
>And about that 99% of the programmers can't do it?
I'm quite sure that most programmers can't write speech processing software from scratch. It requires a lot of knowledge and a good understanding of digital signal processing, and of speech processing in particular. Most programmers don't have that knowledge.
>Hah! My uncle did before 1995.
So?
>it does seem hard since you would have to translate the input
>from the mic, presumably measuring and then translating the
>wavelengths or something. how to do it, I have no idea.
The process of recording audio with the sound card is called sampling. If you sample at a frequency of 44 kHz, for example, then a sample is taken every 1/44,000 of a second. Each sample can be seen as the amplitude of the signal at that instant. The sound card provides this value as N bits, where N depends on the type of sound card.
So the original audio signal, which is analog, is represented as a digital signal, which is in fact a long array of values (samples). Note that you cannot always capture the full signal: if the sampling frequency is f kHz, then you can capture frequencies of at most f/2 kHz.
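To make the sampling idea concrete, here is a minimal C++ sketch. It does not talk to a sound card at all; it just computes what the samples of a pure sine tone would look like, assuming 16-bit samples. The function names and parameters (sample_tone, max_capturable_hz, tone_hz, sample_rate_hz) are made up for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Compute `count` samples of a sine tone, quantized to signed 16-bit values.
// This imitates what the converter on a sound card produces for an analog
// input; a real program would read these values from the audio device.
std::vector<std::int16_t> sample_tone(double tone_hz, double sample_rate_hz,
                                      std::size_t count) {
    assert(sample_rate_hz > 0.0);
    const double pi = std::acos(-1.0);
    std::vector<std::int16_t> samples(count);
    for (std::size_t n = 0; n < count; ++n) {
        const double t = n / sample_rate_hz;  // time of sample n, in seconds
        const double amplitude = std::sin(2.0 * pi * tone_hz * t);  // in [-1, 1]
        samples[n] = static_cast<std::int16_t>(std::lround(amplitude * 32767.0));
    }
    return samples;
}

// Highest frequency that can be captured at a given sampling rate.
double max_capturable_hz(double sample_rate_hz) {
    return sample_rate_hz / 2.0;
}
```

If you call sample_tone(1000.0, 44000.0, 44) and sample_tone(43000.0, 44000.0, 44), the second array comes out as the first one with the sign flipped (give or take rounding). In other words, at 44 kHz a 43 kHz tone is indistinguishable from an inverted 1 kHz tone, which is why anything above half the sampling frequency cannot be captured.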
Note that this is only the process for recording the speech. After recording, you can use the digital version to perform operations on it for recognition and similar tasks.
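As an example of what "performing operations" on the samples can mean, here is a rough sketch of one of the simplest ones: cutting the array into short frames and measuring how loud each frame is, which can be used to separate sound from silence. This is nowhere near a recognizer, and the names frame_energy and loud_frames and the threshold parameter are my own, purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mean squared amplitude of each non-overlapping frame of `frame_size`
// samples. A trailing partial frame is ignored.
std::vector<double> frame_energy(const std::vector<std::int16_t>& samples,
                                 std::size_t frame_size) {
    assert(frame_size > 0);
    std::vector<double> energy;
    for (std::size_t start = 0; start + frame_size <= samples.size();
         start += frame_size) {
        double sum = 0.0;
        for (std::size_t i = 0; i < frame_size; ++i) {
            const double s = samples[start + i];
            sum += s * s;
        }
        energy.push_back(sum / static_cast<double>(frame_size));
    }
    return energy;
}

// Mark each frame whose energy is above `threshold` as probably containing
// sound. The threshold has to be tuned by hand for a given recording.
std::vector<bool> loud_frames(const std::vector<double>& energy,
                              double threshold) {
    std::vector<bool> loud;
    for (double e : energy) {
        loud.push_back(e > threshold);
    }
    return loud;
}
```

For the samples {0, 0, 0, 0, 100, -100, 100, -100} with a frame size of 4, the first frame has energy 0 and the second has energy 10000, so with a threshold of 50 only the second frame is marked as loud.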
Speech synthesis, the process behind text-to-speech, is something different. You first translate the text into a phonetic alphabet. Then you link each element of the phonetic alphabet to a sound. This is how primitive text-to-speech is done. Today there are much more sophisticated algorithms for doing this.
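The two steps of the primitive approach could look roughly like this in C++. Everything here is a toy: the two-word dictionary, the phonetic symbols, and the idea of one .wav file per symbol are assumptions for the sake of the example, and the input is assumed to be lowercase words separated by spaces. Actually playing the sounds is left out.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using Phonemes = std::vector<std::string>;

// Hypothetical pronunciation dictionary with made-up phonetic symbols.
// A real system would need a far larger dictionary plus rules for
// words that are not in it.
const std::map<std::string, Phonemes>& toy_lexicon() {
    static const std::map<std::string, Phonemes> lexicon = {
        {"hello", {"HH", "AH", "L", "OW"}},
        {"world", {"W", "ER", "L", "D"}},
    };
    return lexicon;
}

// Step 1: translate text into phonetic symbols.
// Words missing from the dictionary are silently skipped.
Phonemes to_phonemes(const std::string& text) {
    Phonemes out;
    std::istringstream words(text);
    std::string word;
    while (words >> word) {
        const auto it = toy_lexicon().find(word);
        if (it != toy_lexicon().end()) {
            out.insert(out.end(), it->second.begin(), it->second.end());
        }
    }
    return out;
}

// Step 2: link each phonetic symbol to a sound, represented here by a
// hypothetical file name such as "HH.wav".
std::vector<std::string> to_sound_files(const Phonemes& phonemes) {
    std::vector<std::string> files;
    for (const auto& p : phonemes) {
        assert(!p.empty());
        files.push_back(p + ".wav");
    }
    return files;
}
```

Calling to_sound_files(to_phonemes("hello world")) gives a list of eight file names starting with "HH.wav", which a player would then play back one after another. Gluing recorded sounds together like this sounds choppy, which is one reason the more sophisticated algorithms exist.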