Thread: Speech Recognition - w/ attachment

  1. #1
    Registered User
    Join Date
    Jun 2011
    Posts
    7

    Speech Recognition - w/ attachment

    I'm wondering if someone could help me with a few questions about speech recognition. I'm using the Kinect as my input and have some code that I want to build on.

    Code:
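     using System;
     using System.IO;
     using System.Linq;
     using Microsoft.Research.Kinect.Audio;   // KinectAudioSource, SystemMode
     using Microsoft.Speech.AudioFormat;      // SpeechAudioFormatInfo, EncodingFormat
     using Microsoft.Speech.Recognition;      // SpeechRecognitionEngine, Choices, GrammarBuilder
    
     namespace WpfApplication1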
    {
    
        class Program
        {
            private const string RecognizerId = "SR_MS_en-US_Kinect_10.0";
    
            // Entry point of the console program.
            static void Main(string[] args)
            {
                using (var source = new KinectAudioSource())
                {
                    source.FeatureMode = true;
                    source.AutomaticGainControl = false; //Important to turn this off for speech recognition
                    source.SystemMode = SystemMode.OptibeamArrayOnly; //No AEC for this sample
    
                    RecognizerInfo ri = SpeechRecognitionEngine.InstalledRecognizers().Where(r => r.Id == RecognizerId).FirstOrDefault();
    
                    if (ri == null)
                    {
                        Console.WriteLine("Could not find speech recognizer: {0}. Please refer to the sample requirements.", RecognizerId);
                        return;
                    }
    
                    Console.WriteLine("Using: {0}", ri.Name);
    
                    using (var sre = new SpeechRecognitionEngine(ri.Id))
                    {
    
                        
    
                        var colors = new Choices();
                        colors.Add("red");
                        colors.Add("green");
                        colors.Add("blue");
                        colors.Add("yellow");
                        colors.Add("orange");
                        colors.Add("brown");
                        colors.Add("black");
                        colors.Add("white");
                        colors.Add("pink");
                        colors.Add("go home");
                        colors.Add("Moose");
                        colors.Add("computer");
                        
                      
                         var gb = new GrammarBuilder();
    
    
                        //Specify the culture to match the recognizer in case we are running in a different culture.                                 
                        gb.Culture = ri.Culture;
                        gb.Append(colors);
    
    
                        // Create the actual Grammar instance, and then load it into the speech recognizer.
                        var g = new Grammar(gb);
    
                        sre.LoadGrammar(g);
                        sre.SpeechRecognized += SreSpeechRecognized;
                        sre.SpeechHypothesized += SreSpeechHypothesized;
                        sre.SpeechRecognitionRejected += SreSpeechRecognitionRejected;
    
                        using (Stream s = source.Start())
                        {
                            sre.SetInputToAudioStream(s,
                                                      new SpeechAudioFormatInfo(
                                                          EncodingFormat.Pcm, 16000, 16, 1,
                                                          32000, 2, null));
    
                            Console.WriteLine("Recognizing. Say: 'red', 'green' or 'blue'. Press ENTER to stop");
    
                            sre.RecognizeAsync(RecognizeMode.Multiple);
                            Console.ReadLine();
                            Console.WriteLine("Stopping recognizer ...");
                            sre.RecognizeAsyncStop();
                        }
                    }
                }
            }
    
            static void SreSpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
            {
                Console.WriteLine("\nSpeech Rejected");
                if (e.Result != null)
                    DumpRecordedAudio(e.Result.Audio);
            }
    
            static void SreSpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
            {
                Console.Write("\rSpeech Hypothesized: \t{0}", e.Result.Text);
            }
    
            static void SreSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
            {
                //This first release of the Kinect language pack doesn't have a reliable confidence model, so 
                //we don't use e.Result.Confidence here.
                Console.WriteLine("\nSpeech Recognized: \t{0}", e.Result.Text);
            }
    
            private static void DumpRecordedAudio(RecognizedAudio audio)
            {
                if (audio == null) return;
    
                int fileId = 0;
                string filename;
                while (File.Exists((filename = "RetainedAudio_" + fileId + ".wav")))
                    fileId++;
    
                Console.WriteLine("\nWriting file: {0}", filename);
                using (var file = new FileStream(filename, System.IO.FileMode.CreateNew))
                    audio.WriteToWaveStream(file);
            }
    
        }
    }


    1. How can the recognized grammar phrases (e.g. green, blue, go home, moose, computer) produce a confirmation and trigger a response, whether a verbal response or an action?

    e.g.

    Code:
     var colors = new Choices();
                        colors.Add("red");
                        colors.Add("pink");
                        colors.Add("go home");
                        colors.Add("Moose");
                        colors.Add("computer");

    Code:
     Console.WriteLine("Recognizing. Say: 'red', 'green' or 'blue'. Press ENTER to stop");
    
                            sre.RecognizeAsync(RecognizeMode.Multiple);
                            Console.ReadLine();
                            Console.WriteLine("Stopping recognizer ...");
                            sre.RecognizeAsyncStop();

    When recognized, the program prints the recognized phrase and a confidence model. How would I use this data to offer a verbal response rather than just printing? Any help would be greatly appreciated.

    e.g.

    Code:
         static void SreSpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
            {
                Console.WriteLine("\nSpeech Rejected");
                if (e.Result != null)
                    DumpRecordedAudio(e.Result.Audio);
            }
    
            static void SreSpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
            {
                Console.Write("\rSpeech Hypothesized: \t{0}", e.Result.Text);
            }
    
            static void SreSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
            {
                //This first release of the Kinect language pack doesn't have a reliable confidence model, so 
                //we don't use e.Result.Confidence here.
                Console.WriteLine("\nSpeech Recognized: \t{0}", e.Result.Text);
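    One way to turn a recognized phrase into a confirmation or a response is a lookup table keyed on the recognized text. This is a hypothetical sketch: the class name ResponseTable, the reply strings and the 0.75 confidence threshold are assumptions (0.75 mirrors the VB recognizer posted later in the thread, while the sample's own comment says the first Kinect language pack has no reliable confidence model). The table holds plain strings so the logic can be tried without a Kinect.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lookup table from a recognized phrase to a reply.
public static class ResponseTable
{
    // Assumed cut-off; tune it, or drop the check if confidence is unreliable.
    public const float MinConfidence = 0.75f;

    // Case-insensitive, so "Go Home" matches the "go home" entry.
    private static readonly Dictionary<string, string> Replies =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "red", "Switching to red." },
            { "go home", "Heading home." },
            { "computer", "Yes?" },
        };

    // Returns the reply for a recognized phrase, or null when the phrase has
    // no entry or the recognizer was not confident enough.
    public static string ChooseResponse(string text, float confidence)
    {
        if (confidence < MinConfidence) return null;
        string reply;
        return Replies.TryGetValue(text, out reply) ? reply : null;
    }

    public static void Main()
    {
        Console.WriteLine(ChooseResponse("Go Home", 0.9f) ?? "(no response)"); // Heading home.
        Console.WriteLine(ChooseResponse("moose", 0.9f) ?? "(no response)");   // (no response)
    }
}
```

    Inside SreSpeechRecognized the call would be ResponseTable.ChooseResponse(e.Result.Text, e.Result.Confidence); the returned string can be printed or handed to a text-to-speech engine. Storing Action delegates instead of strings turns the same table into an action dispatcher.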

    Would it be possible to add more grammar variables that were not included in colors? Would this mean building a new grammar instance?

    Code:
     // Create the actual Grammar instance, and then load it into the speech recognizer.
                        var g = new Grammar(gb);
    
                        sre.LoadGrammar(g);
                        sre.SpeechRecognized += SreSpeechRecognized;
                        sre.SpeechHypothesized += SreSpeechHypothesized;
                        sre.SpeechRecognitionRejected += SreSpeechRecognitionRejected;
    
                        using (Stream s = source.Start())
                        {
                            sre.SetInputToAudioStream(s,
                                                      new SpeechAudioFormatInfo(
                                                          EncodingFormat.Pcm, 16000, 16, 1,
                                                          32000, 2, null));
    
                            Console.WriteLine("Recognizing. Say: 'red', 'green' or 'blue'. Press ENTER to stop");
    
                            sre.RecognizeAsync(RecognizeMode.Multiple);
                            Console.ReadLine();
                            Console.WriteLine("Stopping recognizer ...");
                            sre.RecognizeAsyncStop();
    Last edited by Reece Allen; 06-19-2011 at 12:00 PM.
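    On the second question: a single SpeechRecognitionEngine can hold several grammars at once, because each LoadGrammar call adds a grammar without replacing the ones already loaded. Giving each Grammar a Name lets the handler tell which list matched through e.Result.Grammar.Name. The sketch below is written against the desktop System.Speech.Recognition API so it runs without a Kinect; the Kinect sample uses Microsoft.Speech.Recognition, whose Choices, GrammarBuilder, Grammar and LoadGrammar calls have the same shape (with the Kinect recognizer you would also keep the gb.Culture = ri.Culture line). The class name, grammar names and phrases are hypothetical.

```csharp
using System;
using System.Speech.Recognition; // System.Speech.dll (.NET Framework, Windows)

public static class MultiGrammarDemo
{
    // Wraps a phrase list in its own named Grammar.
    public static Grammar BuildNamedGrammar(string name, params string[] phrases)
    {
        var builder = new GrammarBuilder(new Choices(phrases));
        return new Grammar(builder) { Name = name };
    }

    // Picks a reaction based on which grammar produced the result.
    public static string Describe(RecognitionResult result)
    {
        switch (result.Grammar.Name)
        {
            case "colors": return "Color: " + result.Text;
            case "commands": return "Command: " + result.Text;
            default: return "Unknown: " + result.Text;
        }
    }

    public static void Main()
    {
        // Uses the default desktop recognizer for the current culture.
        using (var sre = new SpeechRecognitionEngine())
        {
            // Each LoadGrammar call adds a grammar; earlier ones stay active.
            sre.LoadGrammar(BuildNamedGrammar("colors", "red", "green", "blue"));
            sre.LoadGrammar(BuildNamedGrammar("commands", "go home", "computer"));

            // Live recognition would route results through the same method.
            sre.SpeechRecognized += (s, e) => Console.WriteLine(Describe(e.Result));

            // EmulateRecognize takes text instead of audio, so no microphone is needed.
            RecognitionResult result = sre.EmulateRecognize("go home");
            if (result != null) Console.WriteLine("Emulated -> " + Describe(result));
        }
    }
}
```

    Adding phrases to the existing colors Choices also works; separate named grammars are mainly useful when different phrase groups should trigger different kinds of action, or be enabled and unloaded independently.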

  2. #2
    Salem
    and the hat of int overfl
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Looks nothing like C, moved to C#
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Jun 2011
    Posts
    7
    Quote Originally Posted by Salem
    Looks nothing like C, moved to C#
    Thanks

  4. #4
    Registered User
    Join Date
    Jun 2011
    Posts
    7
    Since I'm still researching this too, I'll post possible solutions here as I come across them. Would the VB.NET code below answer the question about responses/actions above if the two were merged?

    Code:
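     Imports System.Collections.Generic
     Imports System.Linq
     Imports System.Threading                  ' Thread
     Imports Microsoft.Research.Kinect.Audio   ' KinectAudioSource, SystemMode, MicArrayMode
     Imports Microsoft.Speech.AudioFormat      ' SpeechAudioFormatInfo, EncodingFormat
     Imports Microsoft.Speech.Recognition      ' SpeechRecognitionEngine, Choices, GrammarBuilder
    
     Namespace ShapeGame_Speech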
    
    	Public Class Recognizer
    
    		Public Enum Verbs
    
    			None = 0
    			Bigger
    			Biggest
    			Smaller
    			Smallest
    			More
    			Fewer
    			Faster
    			Slower
    			Colorize
    			RandomColors
    			DoShapes
    			ShapesAndColors
    			Reset
    			Pause
    			[Resume]
    
    		End Enum
    
    		Private Structure WhatSaid
    
    			Public verb As Verbs
    			Public shape As PolyType
    			Public color As System.Windows.Media.Color
    
    		End Structure
    
    		Private GameplayPhrases As New Dictionary(Of String, WhatSaid) From {
    			{"Faster", New WhatSaid With {.verb = Verbs.Faster}},
    			{"Slower", New WhatSaid With {.verb = Verbs.Slower}},
    			{"Bigger", New WhatSaid With {.verb = Verbs.Bigger}},
    			{"Bigger Shapes", New WhatSaid With {.verb = Verbs.Bigger}},
    			{"Larger", New WhatSaid With {.verb = Verbs.Bigger}},
    			{"Huge", New WhatSaid With {.verb = Verbs.Biggest}},
    			{"Giant", New WhatSaid With {.verb = Verbs.Biggest}},
    			{"Biggest", New WhatSaid With {.verb = Verbs.Biggest}},
    			{"Super Big", New WhatSaid With {.verb = Verbs.Biggest}},
    			{"Smaller", New WhatSaid With {.verb = Verbs.Smaller}},
    			{"Tiny", New WhatSaid With {.verb = Verbs.Smallest}},
    			{"Super Small", New WhatSaid With {.verb = Verbs.Smallest}},
    			{"Smallest", New WhatSaid With {.verb = Verbs.Smallest}},
    			{"More Shapes", New WhatSaid With {.verb = Verbs.More}},
    			{"More", New WhatSaid With {.verb = Verbs.More}},
    			{"Less", New WhatSaid With {.verb = Verbs.Fewer}},
    			{"Fewer", New WhatSaid With {.verb = Verbs.Fewer}}
    		}
    
    		Private ShapePhrases As New Dictionary(Of String, WhatSaid) From {
    			{"7 Pointed Stars", New WhatSaid With {.verb = Verbs.DoShapes, .shape = PolyType.Star7}},
    			{"Triangles", New WhatSaid With {.verb = Verbs.DoShapes, .shape = PolyType.Triangle}},
    			{"Squares", New WhatSaid With {.verb = Verbs.DoShapes, .shape = PolyType.Square}},
    			{"Hexagons", New WhatSaid With {.verb = Verbs.DoShapes, .shape = PolyType.Hex}},
    			{"Pentagons", New WhatSaid With {.verb = Verbs.DoShapes, .shape = PolyType.Pentagon}},
    			{"Stars", New WhatSaid With {.verb = Verbs.DoShapes, .shape = PolyType.Star}},
    			{"Circles", New WhatSaid With {.verb = Verbs.DoShapes, .shape = PolyType.Circle}},
    			{"Bubbles", New WhatSaid With {.verb = Verbs.DoShapes, .shape = PolyType.Bubble}},
    			{"All Shapes", New WhatSaid With {.verb = Verbs.DoShapes, .shape = PolyType.All}},
    			{"Everything", New WhatSaid With {.verb = Verbs.DoShapes, .shape = PolyType.All}},
    			{"Shapes", New WhatSaid With {.verb = Verbs.DoShapes, .shape = PolyType.All}}
    		}
    
    		Private ColorPhrases As New Dictionary(Of String, WhatSaid) From {
    			{"Every Color", New WhatSaid With {.verb = Verbs.RandomColors}},
    			{"All Colors", New WhatSaid With {.verb = Verbs.RandomColors}},
    			{"Random Colors", New WhatSaid With {.verb = Verbs.RandomColors}},
    			{"Red", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(240, 60, 60)}},
    			{"Green", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(60, 240, 60)}},
    			{"Blue", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(60, 60, 240)}},
    			{"Yellow", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(240, 240, 60)}},
    			{"Orange", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(255, 110, 20)}},
    			{"Purple", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(70, 30, 255)}},
    			{"Violet", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(160, 30, 245)}},
    			{"Pink", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(255, 128, 225)}},
    			{"Gray", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(192, 192, 192)}},
    			{"Brown", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(130, 80, 50)}},
    			{"Dark", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(40, 40, 40)}},
    			{"Black", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(5, 5, 5)}},
    			{"Bright", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(240, 240, 240)}},
    			{"White", New WhatSaid With {.verb = Verbs.Colorize, .color = System.Windows.Media.Color.FromRgb(255, 255, 255)}}
    		}
    
    		Private SinglePhrases As New Dictionary(Of String, WhatSaid) From {
    			{"Speed Up", New WhatSaid With {.verb = Verbs.Faster}},
    			{"Slow Down", New WhatSaid With {.verb = Verbs.Slower}},
    			{"Reset", New WhatSaid With {.verb = Verbs.Reset}},
    			{"Clear", New WhatSaid With {.verb = Verbs.Reset}},
    			{"Stop", New WhatSaid With {.verb = Verbs.Pause}},
    			{"Pause Game", New WhatSaid With {.verb = Verbs.Pause}},
    			{"Freeze", New WhatSaid With {.verb = Verbs.Pause}},
    			{"Unfreeze", New WhatSaid With {.verb = Verbs.Resume}},
    			{"Resume", New WhatSaid With {.verb = Verbs.Resume}},
    			{"Continue", New WhatSaid With {.verb = Verbs.Resume}},
    			{"Play", New WhatSaid With {.verb = Verbs.Resume}},
    			{"Start", New WhatSaid With {.verb = Verbs.Resume}},
    			{"Go", New WhatSaid With {.verb = Verbs.Resume}}
    		}
    
    		Public Class SaidSomethingArgs
    			Inherits EventArgs
    
    			Public Property Verb As Verbs
    			Public Property Shape As PolyType
    			Public Property RGBColor As System.Windows.Media.Color
    			Public Property Phrase As String
    			Public Property Matched As String
    
    		End Class
    
    		Public Event SaidSomething As EventHandler(Of SaidSomethingArgs)
    
    		Private kinectSource As KinectAudioSource
    		Private sre As SpeechRecognitionEngine
    		Private Const RecognizerId As String = "SR_MS_en-US_Kinect_10.0"
    		Private paused As Boolean = False
    		Private valid As Boolean = False
    
    		Public Sub New()
    
    			Try
    
    				Dim ri As RecognizerInfo = SpeechRecognitionEngine.InstalledRecognizers().Where(Function(r) r.Id = RecognizerId).FirstOrDefault()
    				If ri Is Nothing Then
    
    					Return
    
    				End If
    
    				' Build a simple grammar of shapes, colors, and some simple program control
    				sre = New SpeechRecognitionEngine(ri.Id)
    
    			Catch _Exception As Exception
    
    				Console.WriteLine(_Exception.ToString())
    				Return
    
    			End Try
    
    			Dim [single] = New Choices
    			For Each phrase In SinglePhrases
    				[single].Add(phrase.Key)
    			Next phrase
    
    			Dim gameplay = New Choices
    			For Each phrase In GameplayPhrases
    				gameplay.Add(phrase.Key)
    			Next phrase
    
    			Dim shapes = New Choices
    			For Each phrase In ShapePhrases
    				shapes.Add(phrase.Key)
    			Next phrase
    
    			Dim colors = New Choices
    			For Each phrase In ColorPhrases
    				colors.Add(phrase.Key)
    			Next phrase
    
    			Dim coloredShapeGrammar = New GrammarBuilder
    			coloredShapeGrammar.Append(colors)
    			coloredShapeGrammar.Append(shapes)
    
    			Dim objectChoices = New Choices
    			objectChoices.Add(gameplay)
    			objectChoices.Add(shapes)
    			objectChoices.Add(colors)
    			objectChoices.Add(coloredShapeGrammar)
    
    			Dim actionGrammar = New GrammarBuilder
    			actionGrammar.AppendWildcard()
    			actionGrammar.Append(objectChoices)
    
    			Dim allChoices = New Choices
    			allChoices.Add(actionGrammar)
    			allChoices.Add([single])
    
    			Dim gb = New GrammarBuilder
    			gb.Append(allChoices)
    
    			Dim g = New Grammar(gb)
    			sre.LoadGrammar(g)
    			AddHandler sre.SpeechRecognized, AddressOf sre_SpeechRecognized
    			AddHandler sre.SpeechHypothesized, AddressOf sre_SpeechHypothesized
    			AddHandler sre.SpeechRecognitionRejected, AddressOf sre_SpeechRecognitionRejected
    
    			Dim t = New Thread(AddressOf StartDMO)
    			t.Start()
    
    			valid = True
    
    		End Sub
    
    		Public Function IsValid() As Boolean
    
    			Return valid
    
    		End Function
    
    		Private Sub StartDMO()
    
    			kinectSource = New KinectAudioSource
    			kinectSource.SystemMode = SystemMode.OptibeamArrayOnly
    			kinectSource.FeatureMode = True
    			kinectSource.AutomaticGainControl = False
    			kinectSource.MicArrayMode = MicArrayMode.MicArrayAdaptiveBeam
    			Dim kinectStream = kinectSource.Start()
    			sre.SetInputToAudioStream(kinectStream, New SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, Nothing))
    			sre.RecognizeAsync(RecognizeMode.Multiple)
    
    		End Sub
    
    		Public Sub [Stop]()
    
    			If sre IsNot Nothing Then
    
    				sre.RecognizeAsyncCancel()
    				sre.RecognizeAsyncStop()
    				kinectSource.Dispose()
    
    			End If
    
    		End Sub
    
    		Private Sub sre_SpeechRecognitionRejected(ByVal sender As Object, ByVal e As SpeechRecognitionRejectedEventArgs)
    
    			Dim said = New SaidSomethingArgs
    			said.Verb = Verbs.None
    			said.Matched = "?"
    			RaiseEvent SaidSomething(Me, said) ' sender is this recognizer
    			Console.WriteLine(vbLf & "Speech Rejected")
    
    		End Sub
    
    		Private Sub sre_SpeechHypothesized(ByVal sender As Object, ByVal e As SpeechHypothesizedEventArgs)
    
    			Console.Write(vbCr & "Speech Hypothesized: " & vbTab & "{0}" & vbTab & "Conf:" & vbTab & "{1}", e.Result.Text, e.Result.Confidence)
    
    		End Sub
    
    		Private Sub sre_SpeechRecognized(ByVal sender As Object, ByVal e As SpeechRecognizedEventArgs)
    
    			Console.Write(vbCr & "Speech Recognized: " & vbTab & "{0}" & vbTab & "Conf:" & vbTab & "{1}", e.Result.Text, e.Result.Confidence)
    
    			If (e.Result.Confidence < 0.75) OrElse (SaidSomethingEvent Is Nothing) Then
    				Return
    			End If
    
    			Dim said = New SaidSomethingArgs
    			said.RGBColor = System.Windows.Media.Color.FromRgb(0, 0, 0)
    			said.Shape = 0
    			said.Verb = 0
    			said.Phrase = e.Result.Text
    
    			' First check for color, in case both color _and_ shape were both spoken
    			Dim foundColor As Boolean = False
    			For Each phrase In ColorPhrases
    				If e.Result.Text.Contains(phrase.Key) AndAlso (phrase.Value.verb = Verbs.Colorize) Then
    
    					said.RGBColor = phrase.Value.color
    					said.Matched = phrase.Key
    					foundColor = True
    					Exit For
    
    				End If
    			Next phrase
    
    			' Look for a match in the order of the lists below, first match wins.
    			Dim allDicts As New List(Of Dictionary(Of String, WhatSaid)) From {GameplayPhrases, ShapePhrases, ColorPhrases, SinglePhrases}
    
    			Dim found As Boolean = False
    			Dim i As Integer = 0
    			Do While i < allDicts.Count AndAlso Not found
    
    				For Each phrase In allDicts(i)
    
    					If e.Result.Text.Contains(phrase.Key) Then
    
    						said.Verb = phrase.Value.verb
    						said.Shape = phrase.Value.shape
    						If (said.Verb = Verbs.DoShapes) AndAlso (foundColor) Then
    
    							said.Verb = Verbs.ShapesAndColors
    							said.Matched &= " " & phrase.Key
    
    						Else
    
    							said.Matched = phrase.Key
    							said.RGBColor = phrase.Value.color
    
    						End If
    						found = True
    						Exit For
    
    					End If
    
    				Next phrase
    
    				i += 1
    			Loop
    
    			If Not found Then
    				Return
    			End If
    
    			If paused Then ' Only accept restart or reset
    
    				If (said.Verb <> Verbs.Resume) AndAlso (said.Verb <> Verbs.Reset) Then
    					Return
    				End If
    				paused = False
    
    			Else
    
    				If said.Verb = Verbs.Resume Then
    					Return
    				End If
    
    			End If
    
    			If said.Verb = Verbs.Pause Then
    				paused = True
    			End If
    
    			RaiseEvent SaidSomething(Me, said) ' sender is this recognizer
    
    		End Sub
    
    	End Class
    
    End Namespace
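    The part of the VB code that turns a recognized phrase into an action is the tail of sre_SpeechRecognized: results under 0.75 confidence are dropped, and a paused flag decides which verbs get through before SaidSomething is raised. A hypothetical C# transliteration of just that gating logic, with a cut-down Verb enum standing in for the VB Verbs enum, could look like this; phrase matching is left out and assumed to have produced the verb already.

```csharp
using System;

// Hypothetical subset of the Verbs enum from the VB code.
public enum Verb { None, Faster, Slower, Reset, Pause, Resume }

public class CommandGate
{
    // Same cut-off as the VB code; an assumption worth tuning.
    public const float MinConfidence = 0.75f;

    public bool Paused { get; private set; }

    // Returns true when the command should be passed on to the game.
    public bool Accept(Verb verb, float confidence)
    {
        // Low-confidence or unmatched results are dropped.
        if (confidence < MinConfidence || verb == Verb.None) return false;

        if (Paused)
        {
            // While paused, only Resume or Reset gets through (and unpauses).
            if (verb != Verb.Resume && verb != Verb.Reset) return false;
            Paused = false;
        }
        else if (verb == Verb.Resume)
        {
            // Resume has no effect when the game is already running.
            return false;
        }

        if (verb == Verb.Pause) Paused = true;
        return true;
    }

    public static void Main()
    {
        var gate = new CommandGate();
        Console.WriteLine(gate.Accept(Verb.Pause, 0.9f));  // True
        Console.WriteLine(gate.Accept(Verb.Faster, 0.9f)); // False (paused)
        Console.WriteLine(gate.Accept(Verb.Resume, 0.9f)); // True
    }
}
```

    Merged with the colors sample, the event raised after Accept returns true is where a verbal confirmation or a game action would be triggered.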

  5. #5
    Mario F.
    (?<!re)tired
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    When recognized, the program prints the recognized phrase and a confidence model. How would I use this data to offer a verbal response rather than just printing? Any help would be greatly appreciated.
    You could store the recorded user voice and play it back. Naturally you should also print the recognized text, so the user gets feedback that the speech was correctly recognized. However, to offer computer-generated speech you need more than the Kinect; I don't think it offers a speech generator.
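    On Windows, one speech generator that can sit alongside the Kinect is the .NET Framework's System.Speech.Synthesis.SpeechSynthesizer. A minimal sketch that speaks a confirmation of the recognized phrase; the class name Responder, the "You said ..." wording and the optional .wav output path are assumptions made for illustration.

```csharp
using System;
using System.Speech.Synthesis; // System.Speech.dll (.NET Framework, Windows only)

public static class Responder
{
    // Speaks a short confirmation of the recognized phrase. When outputWavPath
    // is given, the speech goes to that .wav file instead of the speakers.
    public static void Confirm(string recognizedText, string outputWavPath = null)
    {
        using (var synth = new SpeechSynthesizer())
        {
            if (outputWavPath != null)
                synth.SetOutputToWaveFile(outputWavPath);
            else
                synth.SetOutputToDefaultAudioDevice();

            // Speak blocks until the whole sentence has been rendered.
            synth.Speak("You said " + recognizedText);

            // Release the output (closes the .wav file if one was used).
            synth.SetOutputToNull();
        }
    }

    public static void Main()
    {
        Confirm("go home");
    }
}
```

    From inside a SpeechRecognized handler, SpeakAsync on a long-lived synthesizer avoids blocking the recognizer's event thread. Since the sample turns echo cancellation off (OptibeamArrayOnly), the Kinect microphones may also pick up the synthesized voice, so pausing recognition while speaking is worth considering.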

    Would it be possible to add more grammar variables that were not included in colors? Would this mean building a new grammar instance?
    I'm not sure what Kinect understands grammar to be. It should only concern itself with word tokens; that's where the value of speech recognition lies. Grammar is then dealt with outside the speech recognition routines by a traditional textual parser.

    If, on the other hand, grammar is understood here as simply a list of words (a dictionary), then note that Kinect uses the Windows Speech Recognition API, which, to my knowledge, does not support user-defined word lists.
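    The "traditional textual parser" approach can be very small: take the recognized text, split it into words, and look each word up in known lists. This hypothetical sketch pulls a color and a shape out of a phrase such as "make the red squares"; the word lists and the CommandParser name are illustrative assumptions, not part of any Kinect or speech API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical parser applied to text returned by the recognizer.
public static class CommandParser
{
    static readonly HashSet<string> Colors =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "red", "green", "blue" };
    static readonly HashSet<string> Shapes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "squares", "circles", "stars" };

    // Returns (color, shape); either part is null when the phrase lacks it.
    public static Tuple<string, string> Parse(string recognizedText)
    {
        string[] words = recognizedText.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        string color = words.FirstOrDefault(w => Colors.Contains(w));
        string shape = words.FirstOrDefault(w => Shapes.Contains(w));
        return Tuple.Create(color, shape);
    }

    public static void Main()
    {
        var parsed = Parse("make the Red squares");
        Console.WriteLine("{0} / {1}", parsed.Item1, parsed.Item2); // Red / squares
    }
}
```

    Matching whole words rather than substrings (as the VB code's Contains calls do) avoids accidental hits inside longer words, but multi-word phrases such as "go home" would need their own lookup.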
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.
