Speech Recognition - w/ attachment

I'm wondering if someone could help me with a few questions about speech recognition. I'm using the Kinect as my input and have some sample code that I want to build on.

Code:

namespace WpfApplication1
{
    class Program
    {
        private const string RecognizerId = "SR_MS_en-US_Kinect_10.0";

        static void MainWindow(string[] args)
        {
            using (var source = new KinectAudioSource())
            {
                source.FeatureMode = true;
                source.AutomaticGainControl = false; //Important to turn this off for speech recognition
                source.SystemMode = SystemMode.OptibeamArrayOnly; //No AEC for this sample

                RecognizerInfo ri = SpeechRecognitionEngine.InstalledRecognizers().Where(r => r.Id == RecognizerId).FirstOrDefault();

                if (ri == null)
                {
                    Console.WriteLine("Could not find speech recognizer: {0}. Please refer to the sample requirements.", RecognizerId);
                    return;
                }

                Console.WriteLine("Using: {0}", ri.Name);

                using (var sre = new SpeechRecognitionEngine(ri.Id))
                {
                    var colors = new Choices();
                    colors.Add("red");
                    colors.Add("green");
                    colors.Add("blue");
                    colors.Add("yellow");
                    colors.Add("orange");
                    colors.Add("brown");
                    colors.Add("black");
                    colors.Add("white");
                    colors.Add("pink");
                    colors.Add("go home");
                    colors.Add("Moose");
                    colors.Add("computer");

                    var gb = new GrammarBuilder();
                    //Specify the culture to match the recognizer in case we are running in a different culture.
                    gb.Culture = ri.Culture;
                    gb.Append(colors);

                    // Create the actual Grammar instance, and then load it into the speech recognizer.
                    var g = new Grammar(gb);

                    sre.LoadGrammar(g);
                    sre.SpeechRecognized += SreSpeechRecognized;
                    sre.SpeechHypothesized += SreSpeechHypothesized;
                    sre.SpeechRecognitionRejected += SreSpeechRecognitionRejected;

                    using (Stream s = source.Start())
                    {
                        sre.SetInputToAudioStream(s, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));

                        Console.WriteLine("Recognizing. Say: 'red', 'green' or 'blue'. Press ENTER to stop");

                        sre.RecognizeAsync(RecognizeMode.Multiple);
                        Console.ReadLine();
                        Console.WriteLine("Stopping recognizer ...");
                        sre.RecognizeAsyncStop();
                    }
                }
            }
        }

        static void SreSpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
        {
            Console.WriteLine("\nSpeech Rejected");
            if (e.Result != null)
                DumpRecordedAudio(e.Result.Audio);
        }

        static void SreSpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
        {
            Console.Write("\rSpeech Hypothesized: \t{0}", e.Result.Text);
        }

        static void SreSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            //This first release of the Kinect language pack doesn't have a reliable confidence model, so
            //we don't use e.Result.Confidence here.
            Console.WriteLine("\nSpeech Recognized: \t{0}", e.Result.Text);
        }

        private static void DumpRecordedAudio(RecognizedAudio audio)
        {
            if (audio == null)
                return;

            int fileId = 0;
            string filename;
            while (File.Exists((filename = "RetainedAudio_" + fileId + ".wav")))
                fileId++;

            Console.WriteLine("\nWriting file: {0}", filename);
            using (var file = new FileStream(filename, System.IO.FileMode.CreateNew))
                audio.WriteToWaveStream(file);
        }
    }
}

1. How could the voice recognition grammar phrases (e.g. green, blue, go home, moose, computer) produce a confirmation when recognized and trigger a response, whether a verbal reply or an action?

For example:

Code:

var colors = new Choices();
colors.Add("red");
colors.Add("pink");
colors.Add("go home");
colors.Add("Moose");
colors.Add("computer");

Code:

Console.WriteLine("Recognizing. Say: 'red', 'green' or 'blue'. Press ENTER to stop");
sre.RecognizeAsync(RecognizeMode.Multiple);
Console.ReadLine();
Console.WriteLine("Stopping recognizer ...");
sre.RecognizeAsyncStop();

When recognized, the program prints the recognized phrase and a confidence value. How would I use this data to offer a verbal response rather than just printing? Any help would be greatly appreciated.

For example:

Code:

static void SreSpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
    Console.WriteLine("\nSpeech Rejected");
    if (e.Result != null)
        DumpRecordedAudio(e.Result.Audio);
}

static void SreSpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
{
    Console.Write("\rSpeech Hypothesized: \t{0}", e.Result.Text);
}

static void SreSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    //This first release of the Kinect language pack doesn't have a reliable confidence model, so
    //we don't use e.Result.Confidence here.
    Console.WriteLine("\nSpeech Recognized: \t{0}", e.Result.Text);
}

2. Would it be possible to add more grammar phrases that are not included in colors? Would this mean building a new Grammar instance?

Code:

// Create the actual Grammar instance, and then load it into the speech recognizer.
var g = new Grammar(gb);

sre.LoadGrammar(g);
sre.SpeechRecognized += SreSpeechRecognized;
sre.SpeechHypothesized += SreSpeechHypothesized;
sre.SpeechRecognitionRejected += SreSpeechRecognitionRejected;

using (Stream s = source.Start())
{
    sre.SetInputToAudioStream(s, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));

    Console.WriteLine("Recognizing. Say: 'red', 'green' or 'blue'. Press ENTER to stop");

    sre.RecognizeAsync(RecognizeMode.Multiple);
    Console.ReadLine();
    Console.WriteLine("Stopping recognizer ...");
    sre.RecognizeAsyncStop();
}

Looks nothing like C, moved to C#

Quote:

Originally Posted by Salem

Looks nothing like C, moved to C#

Thanks

Well, since I'm still researching this too, I'll include possible solutions here as I come across them. Would the code below answer the question about response/action above if the two were merged?

Code:

Namespace ShapeGame_Speech
    Public Class Recognizer
        Public Enum Verbs
            None = 0
            Bigger
            Biggest
            Smaller
            Smallest
            More
            Fewer
            Faster
            Slower
            Colorize
            RandomColors
            DoShapes
            ShapesAndColors
            Reset
            Pause
            [Resume]
        End Enum

        Private Structure WhatSaid
            Public verb As Verbs
            Public shape As PolyType
            Public color As System.Windows.Media.Color
        End Structure

        Private GameplayPhrases As New Dictionary(Of String, WhatSaid) From {
            {"Faster", New WhatSaid With {.verb = Verbs.Faster}},
            {"Slower", New WhatSaid With {.verb = Verbs.Slower}},
            {"Bigger", New WhatSaid With {.verb = Verbs.Bigger}},
            {"Bigger Shapes", New WhatSaid With {.verb = Verbs.Bigger}},
            {"Larger", New WhatSaid With {.verb = Verbs.Bigger}},
            {"Huge", New WhatSaid With {.verb = Verbs.Biggest}},
            {"Giant", New WhatSaid With {.verb = Verbs.Biggest}},
            {"Biggest", New WhatSaid With {.verb = Verbs.Biggest}},
            {"Super Big", New WhatSaid With {.verb = Verbs.Biggest}},
            {"Smaller", New WhatSaid With {.verb = Verbs.Smaller}},
            {"Tiny", New WhatSaid With {.verb = Verbs.Smallest}},
            {"Super Small", New WhatSaid With {.verb = Verbs.Smallest}},
            {"Smallest", New WhatSaid With {.verb = Verbs.Smallest}},
            {"More Shapes", New WhatSaid With {.verb = Verbs.More}},
            {"More", New WhatSaid With {.verb = Verbs.More}},
            {"Less", New WhatSaid With {.verb = Verbs.Fewer}},
            {"Fewer", New WhatSaid With {.verb = Verbs.Fewer}}
        }

        Private ShapePhrases As New Dictionary(Of String, WhatSaid) From {
            {"7 Pointed Stars", New WhatSaid With {.verb = Verbs.DoShapes, .shape = PolyType.Star7}},
            {"Triangles", New WhatSaid With {.verb = Verbs.DoShapes, .shape = PolyType.Triangle}},
            {"Squares", New WhatSaid With {.verb = Verbs.DoShapes, .shape = PolyType.Square}},
            {"Hexagons", New WhatSaid With {.verb = Verbs.DoShapes, .shape = PolyType.Hex}},
            {"Pentagons", New WhatSaid With {.verb = Verbs.DoShapes, .shape = PolyType.Pentagon}},
            {"Stars", New WhatSaid With {.verb = Verbs.DoShapes, .shape = PolyType.Star}},
            {"Circles", New WhatSaid With {.verb = Verbs.DoShapes, .shape = PolyType.Circle}},
            {"Bubbles", New WhatSaid With {.verb = Verbs.DoShapes, .shape = PolyType.Bubble}},
            {"All Shapes", New WhatSaid With {.verb = Verbs.DoShapes, .shape = PolyType.All}},
            {"Everything", New WhatSaid With {.verb = Verbs.DoShapes, .shape = PolyType.All}},
            {"Shapes", New WhatSaid With {.verb = Verbs.DoShapes, .shape = PolyType.All}}
        }

        Private ColorPhrases As New Dictionary(Of String, WhatSaid) From {
            {"Every Color", New WhatSaid With {.verb = Verbs.RandomColors}},
            {"All Colors", New WhatSaid With {.verb = Verbs.RandomColors}},
            {"Random Colors", New WhatSaid With {.verb = Verbs.RandomColors}},
            {"Red", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(240, 60, 60)}},
            {"Green", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(60, 240, 60)}},
            {"Blue", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(60, 60, 240)}},
            {"Yellow", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(240, 240, 60)}},
            {"Orange", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(255, 110, 20)}},
            {"Purple", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(70, 30, 255)}},
            {"Violet", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(160, 30, 245)}},
            {"Pink", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(255, 128, 225)}},
            {"Gray", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(192, 192, 192)}},
            {"Brown", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(130, 80, 50)}},
            {"Dark", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(40, 40, 40)}},
            {"Black", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(5, 5, 5)}},
            {"Bright", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(240, 240, 240)}},
            {"White", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(255, 255, 255)}}
        }

        Private SinglePhrases As New Dictionary(Of String, WhatSaid) From {
            {"Speed Up", New WhatSaid With {.verb = Verbs.Faster}},
            {"Slow Down", New WhatSaid With {.verb = Verbs.Slower}},
            {"Reset", New WhatSaid With {.verb = Verbs.Reset}},
            {"Clear", New WhatSaid With {.verb = Verbs.Reset}},
            {"Stop", New WhatSaid With {.verb = Verbs.Pause}},
            {"Pause Game", New WhatSaid With {.verb = Verbs.Pause}},
            {"Freeze", New WhatSaid With {.verb = Verbs.Pause}},
            {"Unfreeze", New WhatSaid With {.verb = Verbs.Resume}},
            {"Resume", New WhatSaid With {.verb = Verbs.Resume}},
            {"Continue", New WhatSaid With {.verb = Verbs.Resume}},
            {"Play", New WhatSaid With {.verb = Verbs.Resume}},
            {"Start", New WhatSaid With {.verb = Verbs.Resume}},
            {"Go", New WhatSaid With {.verb = Verbs.Resume}}
        }

        Public Class SaidSomethingArgs
            Inherits EventArgs
            Public Property Verb As Verbs
            Public Property Shape As PolyType
            Public Property RGBColor As System.Windows.Media.Color
            Public Property Phrase As String
            Public Property Matched As String
        End Class

        Public Event SaidSomething As EventHandler(Of SaidSomethingArgs)

        Private kinectSource As KinectAudioSource
        Private sre As SpeechRecognitionEngine
        Private Const RecognizerId As String = "SR_MS_en-US_Kinect_10.0"
        Private paused As Boolean = False
        Private valid As Boolean = False

        Public Sub New()
            Try
                Dim ri As RecognizerInfo = SpeechRecognitionEngine.InstalledRecognizers().Where(Function(r) r.Id = RecognizerId).FirstOrDefault()
                If ri Is Nothing Then
                    Return
                End If
                ' Build a simple grammar of shapes, colors, and some simple program control
                sre = New SpeechRecognitionEngine(ri.Id)
            Catch _Exception As Exception
                Console.WriteLine(_Exception.ToString())
                Return
            End Try

            Dim [single] = New Choices
            For Each phrase In SinglePhrases
                [single].Add(phrase.Key)
            Next phrase

            Dim gameplay = New Choices
            For Each phrase In GameplayPhrases
                gameplay.Add(phrase.Key)
            Next phrase

            Dim shapes = New Choices
            For Each phrase In ShapePhrases
                shapes.Add(phrase.Key)
            Next phrase

            Dim colors = New Choices
            For Each phrase In ColorPhrases
                colors.Add(phrase.Key)
            Next phrase

            Dim coloredShapeGrammar = New GrammarBuilder
            coloredShapeGrammar.Append(colors)
            coloredShapeGrammar.Append(shapes)

            Dim objectChoices = New Choices
            objectChoices.Add(gameplay)
            objectChoices.Add(shapes)
            objectChoices.Add(colors)
            objectChoices.Add(coloredShapeGrammar)

            Dim actionGrammar = New GrammarBuilder
            actionGrammar.AppendWildcard()
            actionGrammar.Append(objectChoices)

            Dim allChoices = New Choices
            allChoices.Add(actionGrammar)
            allChoices.Add([single])

            Dim gb = New GrammarBuilder
            gb.Append(allChoices)

            Dim g = New Grammar(gb)
            sre.LoadGrammar(g)
            AddHandler sre.SpeechRecognized, AddressOf sre_SpeechRecognized
            AddHandler sre.SpeechHypothesized, AddressOf sre_SpeechHypothesized
            AddHandler sre.SpeechRecognitionRejected, AddressOf sre_SpeechRecognitionRejected

            Dim t = New Thread(AddressOf StartDMO)
            t.Start()

            valid = True
        End Sub

        Public Function IsValid() As Boolean
            Return valid
        End Function

        Private Sub StartDMO()
            kinectSource = New KinectAudioSource
            kinectSource.SystemMode = SystemMode.OptibeamArrayOnly
            kinectSource.FeatureMode = True
            kinectSource.AutomaticGainControl = False
            kinectSource.MicArrayMode = MicArrayMode.MicArrayAdaptiveBeam
            Dim kinectStream = kinectSource.Start()
            sre.SetInputToAudioStream(kinectStream, New SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, Nothing))
            sre.RecognizeAsync(RecognizeMode.Multiple)
        End Sub

        Public Sub [Stop]()
            If sre IsNot Nothing Then
                sre.RecognizeAsyncCancel()
                sre.RecognizeAsyncStop()
                kinectSource.Dispose()
            End If
        End Sub

        Private Sub sre_SpeechRecognitionRejected(ByVal sender As Object, ByVal e As SpeechRecognitionRejectedEventArgs)
            Dim said = New SaidSomethingArgs
            said.Verb = Verbs.None
            said.Matched = "?"
            RaiseEvent SaidSomething(New Object, said)
            Console.WriteLine(vbLf & "Speech Rejected")
        End Sub

        Private Sub sre_SpeechHypothesized(ByVal sender As Object, ByVal e As SpeechHypothesizedEventArgs)
            Console.Write(vbCr & "Speech Hypothesized: " & vbTab & "{0}" & vbTab & "Conf:" & vbTab & "{1}", e.Result.Text, e.Result.Confidence)
        End Sub

        Private Sub sre_SpeechRecognized(ByVal sender As Object, ByVal e As SpeechRecognizedEventArgs)
            Console.Write(vbCr & "Speech Recognized: " & vbTab & "{0}" & vbTab & "Conf:" & vbTab & "{1}", e.Result.Text, e.Result.Confidence)

            If (e.Result.Confidence < 0.75) OrElse (SaidSomethingEvent Is Nothing) Then
                Return
            End If

            Dim said = New SaidSomethingArgs
            said.RGBColor = System.Windows.Media.Color.FromRgb(0, 0, 0)
            said.Shape = 0
            said.Verb = 0
            said.Phrase = e.Result.Text

            ' First check for color, in case both color _and_ shape were both spoken
            Dim foundColor As Boolean = False
            For Each phrase In ColorPhrases
                If e.Result.Text.Contains(phrase.Key) AndAlso (phrase.Value.verb = Verbs.Colorize) Then
                    said.RGBColor = phrase.Value.color
                    said.Matched = phrase.Key
                    foundColor = True
                    Exit For
                End If
            Next phrase

            ' Look for a match in the order of the lists below, first match wins.
            Dim allDicts As New List(Of Dictionary(Of String, WhatSaid)) From {GameplayPhrases, ShapePhrases, ColorPhrases, SinglePhrases}

            Dim found As Boolean = False
            Dim i As Integer = 0
            Do While i < allDicts.Count AndAlso Not found
                For Each phrase In allDicts(i)
                    If e.Result.Text.Contains(phrase.Key) Then
                        said.Verb = phrase.Value.verb
                        said.Shape = phrase.Value.shape
                        If (said.Verb = Verbs.DoShapes) AndAlso (foundColor) Then
                            said.Verb = Verbs.ShapesAndColors
                            said.Matched &= " " & phrase.Key
                        Else
                            said.Matched = phrase.Key
                            said.RGBColor = phrase.Value.color
                        End If
                        found = True
                        Exit For
                    End If
                Next phrase
                i += 1
            Loop

            If Not found Then
                Return
            End If

            If paused Then ' Only accept restart or reset
                If (said.Verb <> Verbs.Resume) AndAlso (said.Verb <> Verbs.Reset) Then
                    Return
                End If
                paused = False
            Else
                If said.Verb = Verbs.Resume Then
                    Return
                End If
            End If

            If said.Verb = Verbs.Pause Then
                paused = True
            End If

            RaiseEvent SaidSomething(New Object, said)
        End Sub
    End Class
End Namespace

Quote:

When recognized, the program prints the recognized phrase and a confidence value. How would I use this data to offer a verbal response rather than just printing? Any help would be greatly appreciated.

You could store the recorded audio and play it back to the user. Naturally you should also print the recognized text, so the user gets feedback that their speech was correctly recognized. However, to offer computer-generated speech you need more than Kinect; I don't think it offers a speech generator.
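
For the "actions" half of the question, one possible shape is a lookup table from phrase to handler, called from SreSpeechRecognized with e.Result.Text and e.Result.Confidence. The sketch below is plain C# with no Kinect dependency. CommandResponder, Register and Handle are made-up names for illustration, and the 0.75 threshold is simply borrowed from the VB sample above (the C# sample's comment warns that the confidence value may not be reliable). The speak delegate is a stand-in: on Windows it could wrap SpeechSynthesizer.Speak from System.Speech.Synthesis, assuming that assembly is available to the project.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps recognized phrases to an action plus a spoken reply.
public class CommandResponder
{
    // Phrase lookup ignores case, so "Go Home" and "go home" hit the same handler.
    private readonly Dictionary<string, Func<string>> handlers =
        new Dictionary<string, Func<string>>(StringComparer.OrdinalIgnoreCase);

    private readonly Action<string> speak;   // how a reply is delivered (console, TTS, ...)
    private readonly double minConfidence;   // results below this are ignored

    public CommandResponder(Action<string> speak, double minConfidence = 0.75)
    {
        this.speak = speak;
        this.minConfidence = minConfidence;
    }

    // Each handler performs its action and returns the reply text.
    public void Register(string phrase, Func<string> handler)
    {
        handlers[phrase] = handler;
    }

    // Returns true if the phrase was known and confident enough to act on.
    public bool Handle(string text, double confidence)
    {
        Func<string> handler;
        if (confidence < minConfidence || !handlers.TryGetValue(text, out handler))
            return false;

        speak(handler());
        return true;
    }
}
```

In the recognized handler this would be called roughly as responder.Handle(e.Result.Text, e.Result.Confidence), with Console.WriteLine passed as the speak delegate while testing and a speech synthesizer swapped in later.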

Quote:

Would it be possible to add more grammar variables that were not included in colors? Would this be building a new grammar instance?

I'm not sure what Kinect understands grammar to be. It should only concern itself with word tokens; that's where the value of speech recognition lies. Grammar is then dealt with outside the speech recognition routines by a traditional textual parser.

If, on the other hand, grammar is understood here as simply a list of words (a dictionary), then note that Kinect uses the Windows Speech Recognition API, which, to my knowledge, does not support user-defined word lists.
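
For what it's worth, the C# sample in the first post already hands the recognizer its own phrase list through Choices and GrammarBuilder, so with that API the question about phrases beyond colors may come down to loading a second Grammar. A minimal sketch, assuming the Microsoft.Speech.Recognition namespace that the Kinect sample appears to use; GrammarFactory and Build are made-up names, not part of the SDK.

```csharp
using System.Globalization;
using Microsoft.Speech.Recognition; // assumption: the namespace behind the sample's Choices/Grammar types

// Hypothetical helper: builds one named Grammar from a flat list of phrases.
public static class GrammarFactory
{
    public static Grammar Build(string name, CultureInfo culture, params string[] phrases)
    {
        // Match the recognizer's culture, as the original sample does with ri.Culture.
        var builder = new GrammarBuilder { Culture = culture };
        builder.Append(new Choices(phrases));

        // Naming the grammar lets a handler tell which phrase list produced a result.
        return new Grammar(builder) { Name = name };
    }
}
```

Calling sre.LoadGrammar(GrammarFactory.Build("commands", ri.Culture, "go home", "computer")) next to the existing colors grammar should keep both active, and e.Result.Grammar.Name in the recognized handler should say which one matched.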