Speech Recognition - w/ attachment
I'm wondering if someone could help me with a few questions about speech recognition. I'm using the Kinect as my input and have some sample code that I want to build on.
Code:
using System;
using System.IO;
using System.Linq;
// Assumed namespaces for the Kinect SDK beta and the Microsoft Speech Platform.
using Microsoft.Research.Kinect.Audio;
using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;

namespace WpfApplication1
{
    class Program
    {
        private const string RecognizerId = "SR_MS_en-US_Kinect_10.0";

        // The entry point of a console application must be named Main.
        static void Main(string[] args)
        {
            using (var source = new KinectAudioSource())
            {
                source.FeatureMode = true;
                source.AutomaticGainControl = false; //Important to turn this off for speech recognition
                source.SystemMode = SystemMode.OptibeamArrayOnly; //No AEC for this sample

                RecognizerInfo ri = SpeechRecognitionEngine.InstalledRecognizers().Where(r => r.Id == RecognizerId).FirstOrDefault();
                if (ri == null)
                {
                    Console.WriteLine("Could not find speech recognizer: {0}. Please refer to the sample requirements.", RecognizerId);
                    return;
                }

                Console.WriteLine("Using: {0}", ri.Name);

                using (var sre = new SpeechRecognitionEngine(ri.Id))
                {
                    var colors = new Choices();
                    colors.Add("red");
                    colors.Add("green");
                    colors.Add("blue");
                    colors.Add("yellow");
                    colors.Add("orange");
                    colors.Add("brown");
                    colors.Add("black");
                    colors.Add("white");
                    colors.Add("pink");
                    colors.Add("go home");
                    colors.Add("Moose");
                    colors.Add("computer");

                    var gb = new GrammarBuilder();
                    //Specify the culture to match the recognizer in case we are running in a different culture.
                    gb.Culture = ri.Culture;
                    gb.Append(colors);

                    // Create the actual Grammar instance, and then load it into the speech recognizer.
                    var g = new Grammar(gb);
                    sre.LoadGrammar(g);

                    sre.SpeechRecognized += SreSpeechRecognized;
                    sre.SpeechHypothesized += SreSpeechHypothesized;
                    sre.SpeechRecognitionRejected += SreSpeechRecognitionRejected;

                    using (Stream s = source.Start())
                    {
                        sre.SetInputToAudioStream(s,
                            new SpeechAudioFormatInfo(
                                EncodingFormat.Pcm, 16000, 16, 1,
                                32000, 2, null));

                        Console.WriteLine("Recognizing. Say: 'red', 'green' or 'blue'. Press ENTER to stop");
                        sre.RecognizeAsync(RecognizeMode.Multiple);
                        Console.ReadLine();
                        Console.WriteLine("Stopping recognizer ...");
                        sre.RecognizeAsyncStop();
                    }
                }
            }
        }

        static void SreSpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
        {
            Console.WriteLine("\nSpeech Rejected");
            if (e.Result != null)
                DumpRecordedAudio(e.Result.Audio);
        }

        static void SreSpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
        {
            Console.Write("\rSpeech Hypothesized: \t{0}", e.Result.Text);
        }

        static void SreSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            //This first release of the Kinect language pack doesn't have a reliable confidence model, so
            //we don't use e.Result.Confidence here.
            Console.WriteLine("\nSpeech Recognized: \t{0}", e.Result.Text);
        }

        // Saves the audio of a rejected utterance to the first unused RetainedAudio_N.wav file.
        private static void DumpRecordedAudio(RecognizedAudio audio)
        {
            if (audio == null) return;

            int fileId = 0;
            string filename;
            while (File.Exists((filename = "RetainedAudio_" + fileId + ".wav")))
                fileId++;

            Console.WriteLine("\nWriting file: {0}", filename);
            using (var file = new FileStream(filename, System.IO.FileMode.CreateNew))
                audio.WriteToWaveStream(file);
        }
    }
}
1. How could the grammar phrases (e.g. green, blue, go home, moose, computer) produce a confirmation when they are recognized and trigger a response, whether that is a verbal reply or an action?
For example, given these phrases:
Code:
var colors = new Choices();
colors.Add("red");
colors.Add("pink");
colors.Add("go home");
colors.Add("Moose");
colors.Add("computer");
Code:
Console.WriteLine("Recognizing. Say: 'red', 'green' or 'blue'. Press ENTER to stop");
sre.RecognizeAsync(RecognizeMode.Multiple);
Console.ReadLine();
Console.WriteLine("Stopping recognizer ...");
sre.RecognizeAsyncStop();
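When a phrase is recognized, the program only prints the recognized text (the sample's comments say the confidence value is not reliable in this release, so it is not used). How would I use the recognized text to give a verbal response rather than just printing it? Any help would be greatly appreciated.
These are the handlers in question: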
Code:
static void SreSpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
    Console.WriteLine("\nSpeech Rejected");
    if (e.Result != null)
        DumpRecordedAudio(e.Result.Audio);
}

static void SreSpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
{
    Console.Write("\rSpeech Hypothesized: \t{0}", e.Result.Text);
}

static void SreSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    //This first release of the Kinect language pack doesn't have a reliable confidence model, so
    //we don't use e.Result.Confidence here.
    Console.WriteLine("\nSpeech Recognized: \t{0}", e.Result.Text);
}
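To make the first question more concrete, here is a rough sketch of the kind of thing I have in mind, not something I have working with the Kinect. It maps each recognized phrase to a reply and an optional action. PhraseResponder, On, and Respond are names I made up for illustration; the "speaking" is just a delegate, so in this sketch it only writes to the console.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the Kinect sample): looks up a reply
// and an optional action for a recognized phrase.
public class PhraseResponder
{
    private readonly Dictionary<string, string> replies =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    private readonly Dictionary<string, Action> actions =
        new Dictionary<string, Action>(StringComparer.OrdinalIgnoreCase);
    private readonly Action<string> speak;

    // 'speak' is whatever produces the response: console output here,
    // possibly a text-to-speech call in the real program.
    public PhraseResponder(Action<string> speak)
    {
        this.speak = speak;
    }

    // Registers a reply (and optionally an action) for a phrase.
    public void On(string phrase, string reply, Action action = null)
    {
        replies[phrase] = reply;
        if (action != null) actions[phrase] = action;
    }

    // Returns true if the recognized text had a registered response.
    public bool Respond(string recognizedText)
    {
        string reply;
        if (!replies.TryGetValue(recognizedText, out reply)) return false;

        speak(reply);

        Action action;
        if (actions.TryGetValue(recognizedText, out action)) action();
        return true;
    }
}

public static class ResponderDemo
{
    public static void Main()
    {
        var responder = new PhraseResponder(Console.WriteLine);
        responder.On("go home", "Going home", () => Console.WriteLine("(home action runs here)"));
        responder.On("computer", "Yes?");

        responder.Respond("go home");   // stands in for e.Result.Text
        responder.Respond("Computer");  // lookup ignores case
    }
}
```

The idea would be to call responder.Respond(e.Result.Text) from SreSpeechRecognized. For an actual voice, I assume the Speak(string) method of System.Speech.Synthesis.SpeechSynthesizer could be passed in as the speak delegate, but I have not tried that while the Kinect audio stream is running.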
2. Would it be possible to add more grammar phrases that are not included in colors? Would that mean building a new Grammar instance alongside the one created here?
Code:
// Create the actual Grammar instance, and then load it into the speech recognizer.
var g = new Grammar(gb);
sre.LoadGrammar(g);

sre.SpeechRecognized += SreSpeechRecognized;
sre.SpeechHypothesized += SreSpeechHypothesized;
sre.SpeechRecognitionRejected += SreSpeechRecognitionRejected;

using (Stream s = source.Start())
{
    sre.SetInputToAudioStream(s,
        new SpeechAudioFormatInfo(
            EncodingFormat.Pcm, 16000, 16, 1,
            32000, 2, null));

    Console.WriteLine("Recognizing. Say: 'red', 'green' or 'blue'. Press ENTER to stop");
    sre.RecognizeAsync(RecognizeMode.Multiple);
    Console.ReadLine();
    Console.WriteLine("Stopping recognizer ...");
    sre.RecognizeAsyncStop();
}
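For the second question, this is a sketch of what I imagine a second grammar might look like; it is untested. It assumes the same Microsoft.Speech.Recognition assembly as the sample and that LoadGrammar can be called once per grammar. GrammarSketch, BuildGrammar, and LoadAll are hypothetical names of my own, and the "commands" grammar is just an example grouping.

```csharp
using System;
using System.Globalization;
using Microsoft.Speech.Recognition; // assumption: same Speech Platform assembly as the sample

public static class GrammarSketch
{
    // Hypothetical helper: builds one named Grammar from a list of phrases.
    public static Grammar BuildGrammar(string name, CultureInfo culture, params string[] phrases)
    {
        var gb = new GrammarBuilder();
        gb.Culture = culture;               // match the recognizer's culture, as in the sample
        gb.Append(new Choices(phrases));

        var grammar = new Grammar(gb);
        grammar.Name = name;                // lets a handler tell the grammars apart
        return grammar;
    }

    // Loads two separate grammars into an existing engine and reports
    // which grammar each recognized phrase came from.
    public static void LoadAll(SpeechRecognitionEngine sre, CultureInfo culture)
    {
        sre.LoadGrammar(BuildGrammar("colors", culture, "red", "green", "blue"));
        sre.LoadGrammar(BuildGrammar("commands", culture, "go home", "computer"));

        sre.SpeechRecognized += (sender, e) =>
            Console.WriteLine("{0} (from grammar '{1}')", e.Result.Text, e.Result.Grammar.Name);
    }
}
```

In the sample this would be called as GrammarSketch.LoadAll(sre, ri.Culture) before SetInputToAudioStream. The other option I can see is simply adding the extra phrases to the existing colors Choices, which keeps a single Grammar but loses the ability to tell the groups apart.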