Apr 14 AT 8:00 AM Alex Byrnes 7 Comments

Voice Commands Coming to Android Cupcake?

Android phones already have voice dialing and voice search.  What about voice commands?  The signs say voice commands are coming, and soon.  The Cupcake development branch features page includes the note “VoiceDialer supports ‘open app’ command”.  Also announced in the Android roadmap was the new Input Method Framework (IMF).  The IMF supports input methods other than hardware keyboards, such as soft keyboards (see HTC Dream).  We can only speculate as to where voice commands fit into the roadmap, and where Cupcake itself fits in.  It is likely that voice commands fit under the umbrella of the IMF, and the IMF takes center stage in first-quarter ’09 development.

The Cupcake source code has more detail.  Voice recognition is provided by Nuance Communications (makers of Dragon NaturallySpeaking), but licensed under the Apache license, like most of Android.  Currently, the only core application to use voice commands is the settings application, although there may be more on someone’s desktop at Google, or in proprietary code.

All this is certainly not groundbreaking, but the implications go beyond everyday phone use because of the way the feature is implemented.

Some background

Most applications in Android, or pieces of applications (roughly corresponding to a single screen), are what are known as “Activities.”  Activities communicate with each other by way of “Intents,” sort of the way web pages “communicate” by linking to each other.  The Maps application, for instance, launches the phone application when you click on a business’s phone number, the way it would on Google Maps.  When one activity wants to start another, it creates an intent.

Every Android application lists the types of intents that it wants to respond to.  That’s why you get a dialog asking you which application you’d like to use to complete an action when there are two similar applications installed.  It means that two applications have said that they can handle the current intent.  Pushing “Home” after you install a new desktop manager will do this because both the built-in desktop manager and the new one (aHome, Open Home, dxTop) offer to handle “Home” intents.
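To make the resolution step concrete, here is a minimal sketch in plain Java.  It is a toy model only and does not use the Android SDK: IntentRegistry, register, resolve, and dispatch are names made up for illustration, and the action strings are placeholders rather than real Android constants.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of intent resolution. This is NOT the Android API;
// every name here is hypothetical and exists only for illustration.
class IntentRegistry {
    // Maps an action name (e.g. "HOME") to the apps that said they handle it.
    private final Map<String, List<String>> handlers = new LinkedHashMap<>();

    // An application declares that it can respond to the given action.
    void register(String action, String appName) {
        handlers.computeIfAbsent(action, k -> new ArrayList<>()).add(appName);
    }

    // Returns every app that offered to handle the action, in registration order.
    List<String> resolve(String action) {
        return new ArrayList<>(handlers.getOrDefault(action, new ArrayList<>()));
    }

    // Describes the outcome: launch directly, ask the user, or report no handler.
    String dispatch(String action) {
        List<String> candidates = resolve(action);
        if (candidates.isEmpty()) {
            return "no handler for " + action;
        }
        if (candidates.size() == 1) {
            return "launch " + candidates.get(0);
        }
        return "ask user to choose among " + candidates;
    }

    public static void main(String[] args) {
        IntentRegistry registry = new IntentRegistry();
        registry.register("HOME", "Built-in Home");
        registry.register("HOME", "aHome");
        registry.register("DIAL", "Phone");
        System.out.println(registry.dispatch("DIAL")); // one handler: launched directly
        System.out.println(registry.dispatch("HOME")); // two handlers: user is asked
    }
}
```

In this model, one registered handler means a direct launch, while two handlers for the same action produce the "which application?" prompt described above.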

Skip here if you don’t want any technical jive.  What this means is not only does the new Android include some voice commands, it allows market applications to use them too.  This could open up a whole new avenue of voice activated games and applications.  Applications could take pictures remotely, record voices on cue, skip calls, and more.  The possibilities are endless.  The best applications probably haven’t been conceived of yet (as in cameras and bar code reading).

Great, right?

Most people’s reaction will be tempered by the reception of other Google voice recognition technologies:  voice dialing, voice search, and Google Voice.  The first time I used voice search on my phone, I tried to think of the most common, yet distinctive, and therefore easiest search possible, so I said, very clearly, AVVRIILL LAAVVVIIGNE.  I got “Emerald Limousine.”

Aside from taking you down interesting new avenues and giving you the occasional chuckle, it’s not very useful.  Obviously, until it gets easier than typing a search, most people won’t use it.

But this is not voice recognition, it’s voice command.  The application doesn’t need to differentiate amongst every word in its dictionary, it just needs to know if you said, “Pawn to a5” or “Queen to c6.”  “Pawn” and “Queen” are very different words.  “Pawn,” “Prawn,” and “Palm” are not, and you could be searching for any of them, or any of the billions of other possible searches.
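A rough way to see why a small vocabulary helps is to pick the closest entry from a short list instead of searching everything.  The sketch below is plain Java and makes a big simplifying assumption: it compares spelled-out text using edit distance, whereas a real recognizer compares audio.  CommandMatcher and its methods are hypothetical names, not part of Android or of any Nuance code.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of small-vocabulary matching. All names are hypothetical, and text
// edit distance merely stands in for acoustic similarity.
class CommandMatcher {
    // Levenshtein edit distance between two strings (two-row dynamic programming).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows so prev always holds the latest row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the vocabulary entry closest to what was heard (first one wins ties).
    static String closest(String heard, List<String> vocabulary) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String word : vocabulary) {
            int d = editDistance(heard.toLowerCase(), word.toLowerCase());
            if (d < bestDistance) {
                bestDistance = d;
                best = word;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> chessPieces = Arrays.asList("pawn", "queen");
        // A garbled "prawn" is still far closer to "pawn" than to "queen".
        System.out.println(closest("prawn", chessPieces));
    }
}
```

With only "pawn" and "queen" to choose from, even a badly heard word lands on the right command.  Add "prawn" and "palm" to the list and a slightly garbled input sits within a single edit of more than one entry, which is the open-ended search problem in miniature.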

The same goes for voicemail transcription from Google Voice.

However, in our forthcoming, perfect, Cupcake universe, we can expect to glance at a call from across our desks and say, “No,” to make it go to voicemail.  Or “skip!” when the newest Emerald Limousine single comes on imeem.

Of course, it’s hard to say whether or not Android voice recognition will be optimized for voice commands.  Will the application be able to choose which words it wants the voice recognition engine to differentiate?  Will the user be able to create their own dictionary or use their custom spell check dictionary?  These, and many other questions, will only be answered by the actual release of Cupcake on a production phone.  Even if something exists in the code today, it could be gone tomorrow, or not integrated into the new phone where developers can make use of it.

Fortunately, in this case, the cupcake’s trajectory is looking pretty good.

Are you ready to "talk" with your phone yet?


  • http://yellowrex.com William Furr

    I have a dash mount for my phone. It would be awesome to be able to input directions to Google Maps or control the Music application or TuneWiki with my voice.

    In the meantime, I’ll settle for an on-screen predictive soft keyboard so I don’t have to take my phone out of the mount to type in addresses while I’m pulled over.

  • Tim H

    The odd voice search results are because the speech recognition uses a Google search to get your results, and fine-tunes the results according to Google searches, not by what the recognition engine actually picked up. At least, that was my interpretation of the code when I added voice for launching stuff in my application.

    Look for LANGUAGE_MODEL here:

    • http://androidandme.com Alex Byrnes

      Thanks for the link, Tim. I think this will take some more investigation. My sense was that it was processed locally but I could be wrong. Whatever the algorithm is, it needs to be optimized somehow because speech recognition is just not good enough yet, especially with all the conditions a phone has to go through. Background noise, distance from the speaker, etc. Not to mention all the variations in people’s voices. BUT if we could get simple recognition amongst 10 to 20 words at a time… wow. Add in adaptation for a particular user… Could be very cool.

      • nEx.Software

        Alex, it is my understanding that the Voice Dialer processes the speech locally, while the voice recognition API processes via the same recognition engine as Google 411, over the web. I’m not positive but we might try running it without a data connection.

  • nEx.Software

    I had started a Voice-Controlled SMS app a while back using the unsupported voice recognition intent and found it to work well for longer words/phrases, but it was very unreliable on short words and phrases such as yes and no commands. Maybe had I used the language model option (I didn’t know it existed) that would have been different. I hope it works better with Cupcake. I have downloaded it but have not worked with it yet.

    • http://androidandme.com Alex Byrnes

      Good to know. Context must be important. Hopefully they’ll leave it fairly open for developers to tune it the way they want. The yes/no scheme is naturally much easier to implement but their ambitions were high.

      And even if shorter words are more difficult, I don’t think anyone would have a problem with saying, “Voice mail” versus “Speaker phone” or some such.

      Clap on! Clap off!

  • Gammax

    voice anything will help for on the road!!!