Creating a Speech Recognition Application

Creating a speech recognition application is just as easy as creating any other application in Visual Basic, primarily because all of the necessary API calls are encapsulated in a set of six ActiveX controls.

One popular use of speech recognition is letting a user speak a word or phrase and having the computer perform a specific task based on what it heard. This kind of speech recognition is commonly referred to as “Command and Control.” Popular uses for this type of speech recognition include the following:

  • Automatic e-mail handling
  • Computer-activated security systems
  • Computer-controlled devices
  • Games
  • Remote data entry
  • Learning systems

Command and Control can be used in almost every case where a directive must be issued to a computer to perform a specific task. Until now, these directives have been limited to input from a mouse or a keyboard; however, with the advent of enhanced speech engines, speech can also be used as a form of input to a computer.

In this section you will create your own version of an application that responds to voice commands. This application, although limited, can serve as the core of a much more sophisticated application. As you use the Speech SDK to build applications, you will find that the underlying code is nearly identical from one to the next; the major differences are in the design of the interface.

Setting up the Microphone

Before you can begin building any application that depends on speech recognition, the speech engine needs to be trained for the type of microphone you will use as well as for your individual speech patterns.

The opening page of the Speech SDK Web page, shown in Figure 30.1, includes a link that enables you to adjust the microphone for the speech engine. The Microphone Setup Wizard is accessed from this link, as shown in Figure 30.2.

Figure 30.1

The Welcome page of the Speech SDK Web page provides links to sample applications.

Figure 30.2

The Microphone setup page of the Speech SDK includes a link to the C++ source code for the Wizard.

After it is installed on your computer, the Microphone Setup Wizard prompts you to enter the type of microphone and the type of speakers that you have. After you have answered these questions, you need to adjust the microphone levels. You do this simply by speaking into the microphone and reciting a specific paragraph, as shown in Figure 30.3. When you are done, click the Finish button to complete the setup process.

Figure 30.3

A good quality microphone will reproduce your speech more accurately than one of a lesser quality.

tip

You can re-run the Microphone Setup Wizard as often as you like. It is also a good idea to run it again if you have recently changed the type of microphone that you’re using.

Using the Direct Speech Control

To create the Direct Speech application, first start a new Visual Basic 6 project and perform the following steps:

  1. Add the Microsoft Direct Speech control to your toolbox as shown in Figure 30.4.
  2. Add the controls to the form and set their properties as shown in Table 30.1. When it is complete, your form should resemble the one shown in Figure 30.5.
  3. Add the code in Listing 30.1.
  4. Save your project and run the application.

Figure 30.4

The Direct Speech Control is one of six ActiveX controls in the SAPI.

Table 30.1. The controls for the Speech Recognition program.

Control                      Property   Setting
TextBox                      Name       txtCommand
Direct Speech Recognition    Name       DirectSR

Figure 30.5

The completed Command and Control application.

The Form_Load() event procedure in Listing 30.1 loads the Direct Speech control with the grammar specified in the text string. This string contains the words or phrases that the speech engine expects to hear. The DirectSR_PhraseFinish() event is raised when the user has finished speaking, and the code in it compares the Phrase argument (what was heard) with the phrases in the grammar. If a match is found, the corresponding application is launched.

  Listing 30.1 - command.frm - The code required to launch applications with the spoken word.

  Private Sub Form_Load()
      ' Load a command-and-control grammar; each line ends with vbNewLine
      DirectSR.GrammarFromString "[Grammar]" & vbNewLine & _
                                 "type=cfg" & vbNewLine & _
                                 "[<start>]" & vbNewLine & _
                                 "<start>=Launch Notepad" & vbNewLine & _
                                 "<start>=Launch Browser" & vbNewLine & _
                                 "<start>=Launch Calculator" & vbNewLine
      ' Start listening for the phrases in the grammar
      DirectSR.Activate
  End Sub

  Private Sub DirectSR_PhraseFinish(ByVal flags As Long, ByVal beginhi As Long, _
      ByVal beginlo As Long, ByVal endhi As Long, ByVal endlo As Long, _
      ByVal Phrase As String, ByVal parsed As String, ByVal results As Long)
      Dim retval As Double   ' Shell returns the task ID as a Double
      ' Show what the engine heard
      txtCommand.Text = Phrase
      ' Launch the matching program (paths assume a default install; adjust as needed)
      Select Case Phrase
          Case "Launch Notepad"
              retval = Shell("C:\windows\Notepad.exe", vbNormalFocus)
          Case "Launch Browser"
              retval = Shell("C:\Program Files\Internet Explorer\Iexplore.exe", vbNormalFocus)
          Case "Launch Calculator"
              retval = Shell("C:\windows\Calc.exe", vbNormalFocus)
      End Select
  End Sub

tip

Comparisons against the Phrase argument are case sensitive (under Visual Basic’s default Option Compare Binary setting). Make sure that the phrases in your grammar match the case used in your Select Case statement.
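If you would rather not depend on the case matching exactly, one option is to normalize the phrase before comparing it. The following sketch assumes a hypothetical helper named ProgramForPhrase (it is not part of the Speech SDK) that lowercases and trims the recognized phrase with LCase$ and Trim$, then returns the program to launch, or an empty string when nothing matches. The program paths are assumptions about a default Windows installation.

```vb
' Hypothetical helper (not part of the Speech SDK): maps a recognized phrase
' to the program it should launch. LCase$ and Trim$ normalize the phrase, so
' "Launch Notepad" and "launch notepad" both match.
' The paths assume a default Windows 9x installation; adjust them as needed.
Private Function ProgramForPhrase(ByVal Phrase As String) As String
    Select Case LCase$(Trim$(Phrase))
        Case "launch notepad"
            ProgramForPhrase = "C:\windows\Notepad.exe"
        Case "launch browser"
            ProgramForPhrase = "C:\Program Files\Internet Explorer\Iexplore.exe"
        Case "launch calculator"
            ProgramForPhrase = "C:\windows\Calc.exe"
        Case Else
            ProgramForPhrase = ""   ' no match: nothing to launch
    End Select
End Function
```

Inside DirectSR_PhraseFinish you could then call Shell only when ProgramForPhrase(Phrase) returns a non-empty string.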

After the application has started, speak one of the phrases in the grammar string. If the speech engine was able to properly discern what was said, the desired application should be launched.
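If one of the paths passed to Shell is wrong for your machine, Visual Basic raises a run-time error (typically error 53, “File not found”), and an unhandled error stops the program. A minimal defensive sketch follows; TryLaunch is a hypothetical helper name, not part of the Speech SDK, and it simply traps the error and reports failure instead.

```vb
' Hypothetical helper (not part of the Speech SDK): launches a program with
' Shell and returns False instead of stopping on a run-time error such as
' error 53 ("File not found") when the path is wrong.
Private Function TryLaunch(ByVal PathName As String) As Boolean
    Dim taskId As Double
    On Error GoTo LaunchFailed
    taskId = Shell(PathName, vbNormalFocus)   ' task ID of the new program
    TryLaunch = True
    Exit Function
LaunchFailed:
    TryLaunch = False
End Function
```

In the PhraseFinish handler you might then display a message in txtCommand whenever TryLaunch returns False, so the user knows the command was heard but the program could not be started.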

Properties and Methods of the Direct Speech Control

The Direct Speech ActiveX control has a rich collection of properties and methods that enable you to configure the control to your needs. Due to space limitations, not all of them are discussed here; only those used in the sample project are covered.

Activate

The Activate method tells the speech recognizer to start listening. The recognizer must be initialized and a grammar must be loaded before calling Activate.

GrammarFromString

The GrammarFromString method loads a grammar from a string and automatically initializes the speech engine if that has not already been done. When creating a grammar string, pay special attention to the explicit use of the vbNewLine constant that ends each line of the grammar. The declaration for this method is:

GrammarFromString(grammar As String)

This method is similar in nature to the GrammarFromFile and GrammarFromResource methods.
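Because the lines of a grammar string are separated explicitly with vbNewLine, it can be convenient to build the string from a list of phrases. The sketch below uses a hypothetical helper, BuildCommandGrammar (not part of the Speech SDK), that produces a grammar in the same [Grammar] / type=cfg / [<start>] format used in Listing 30.1, with one <start>= line per phrase. It covers only this simple one-rule format.

```vb
' Hypothetical helper (not part of the Speech SDK): builds a simple
' command-and-control grammar string in the format used in Listing 30.1.
' Phrases is a Variant holding an array of strings, e.g. from Array(...).
Private Function BuildCommandGrammar(Phrases As Variant) As String
    Dim i As Long
    Dim s As String
    ' Header lines: grammar section, grammar type, and the start rule
    s = "[Grammar]" & vbNewLine & _
        "type=cfg" & vbNewLine & _
        "[<start>]" & vbNewLine
    ' One "<start>=" line per phrase, each terminated with vbNewLine
    For i = LBound(Phrases) To UBound(Phrases)
        s = s & "<start>=" & Phrases(i) & vbNewLine
    Next i
    BuildCommandGrammar = s
End Function

' Example use in Form_Load:
'     DirectSR.GrammarFromString BuildCommandGrammar( _
'         Array("Launch Notepad", "Launch Browser", "Launch Calculator"))
'     DirectSR.Activate
```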

PhraseFinish

The PhraseFinish event is raised when the user has finished speaking a phrase and the speech recognition engine is certain about the words that were spoken. The declaration for this event is:

PhraseFinish(flags As Long, beginhi As Long, beginlo As Long, endhi As Long, endlo As Long, Phrase As String, parsed As String, results As Long)

sidebar

Do you have the power?

Developing applications using any of the ActiveX controls included with the Speech SDK takes a lot of computer power. Because of the kind of processing taking place in your computer, what would seem like a “normal” amount of RAM can be woefully inadequate.

Microsoft indicates that the speech engines require a Pentium processor faster than 90 MHz with at least 32MB of RAM. I tested the Speech API on a number of machines ranging from a Pentium 75 to a Pentium 233 with anywhere from 16MB to 64MB of RAM.

The results showed that if your machine has more than 32MB of RAM and a processor faster than 150 MHz, the speech engines produce excellent results. Systems with less than 32MB of RAM suffer from missed words and cannot handle both speech recognition and text-to-speech in the same application.
