Apr 14 AT 8:00 AM Alex Byrnes 7 Comments

Voice Commands Coming to Android Cupcake?

Android phones already have voice dialing and voice search. What about voice commands? The signs say voice commands are coming, and soon. The Cupcake development branch features page includes the note “VoiceDialer supports ‘open app’ command.” Also announced in the Android roadmap was the new Input Method Framework (IMF). The IMF supports input methods other than hardware keyboards, such as soft keyboards (see HTC Dream). We can only speculate as to where voice commands fit into the roadmap, and where Cupcake itself fits in. It is likely that voice commands fall under the umbrella of the IMF, and the IMF takes center stage in development for the first quarter of ’09.

The Cupcake source code has more detail. Voice recognition is provided by Nuance Communications (makers of Dragon NaturallySpeaking), but released under the Apache license, like most of Android. Currently, the only core application to use voice commands is the settings application, although there may be more on someone’s desktop at Google, or in proprietary code.

All this is certainly not groundbreaking, but the implications go beyond everyday phone use because of the way the feature is implemented.

Some background

Most applications in Android, or pieces of applications (roughly corresponding to a single screen), are what are known as “Activities.” Activities communicate with each other by way of “Intents,” sort of the way web pages “communicate” by linking to each other. The Maps application, for instance, launches the phone application when you click on a business’ phone number, the way it would on Google Maps. When one activity wants to start another, it creates an intent.

Every Android application lists the types of intents that it wants to respond to. That’s why you get a dialog asking which application you’d like to use to complete an action when there are two similar applications installed. It means that two applications have said that they can handle the current intent. Pushing “Home” after you install a new desktop manager will do this because both the built-in desktop manager and the new one (aHome, Open Home, dxTop) offer to handle “Home” intents.
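To make the matching idea concrete, here is a toy model in plain Java. It is not the Android API: the class IntentResolverSketch, its methods, and the action strings "HOME" and "DIAL" are all made up for illustration. It only sketches the behavior described above, assuming each application declares a list of actions and the system shows a chooser when more than one application matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of intent resolution. NOT the Android API: every name in this
// class is hypothetical and exists only to illustrate the idea.
class IntentResolverSketch {
    // Maps each application name to the actions it has declared it can handle.
    private final Map<String, List<String>> declaredActions = new LinkedHashMap<>();

    // An application lists the types of intents it wants to respond to.
    void register(String appName, String... actions) {
        declaredActions.put(appName, Arrays.asList(actions));
    }

    // Returns every registered application that declared the given action,
    // in registration order.
    List<String> resolve(String action) {
        List<String> handlers = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : declaredActions.entrySet()) {
            if (entry.getValue().contains(action)) {
                handlers.add(entry.getKey());
            }
        }
        return handlers;
    }

    // Describes what the system would do with the action: launch the single
    // handler, or ask the user when several applications offer to handle it.
    String dispatch(String action) {
        List<String> handlers = resolve(action);
        if (handlers.isEmpty()) {
            return "no handler for " + action;
        }
        if (handlers.size() == 1) {
            return "launch " + handlers.get(0);
        }
        return "ask user to choose among " + handlers;
    }

    public static void main(String[] args) {
        IntentResolverSketch system = new IntentResolverSketch();
        system.register("Built-in Home", "HOME");
        system.register("aHome", "HOME");
        system.register("Phone", "DIAL");
        System.out.println(system.dispatch("DIAL")); // one handler: launched directly
        System.out.println(system.dispatch("HOME")); // two handlers: chooser dialog
    }
}
```

In this sketch, "DIAL" has a single handler and goes straight through, while "HOME" has two and falls into the chooser branch, which is the dialog you see after installing a second desktop manager.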

Skip here if you don’t want any technical jive. What this means is not only does the new Android include some voice commands, it allows market applications to use them too. This could open up a whole new avenue of voice activated games and applications. Applications could take pictures remotely, record voices on cue, skip calls, and more. The possibilities are endless. The best applications probably haven’t been conceived of yet (think of how phone cameras ended up being used for bar code reading).

Great, right?

Most people’s reaction will be tempered by the reception of other Google voice recognition technologies: voice dialing, voice search, and Google Voice. The first time I used voice search on my phone, I tried to think of the most common, yet distinctive, and therefore easiest search possible, so I said, very clearly, AVVRIILL LAAVVVIIGNE. I got “Emerald Limousine.”

Aside from taking you down interesting new avenues and giving you the occasional chuckle, it’s not very useful. Obviously, until it gets easier than typing a search, most people won’t use it.

But this is not voice recognition, it’s voice command.  The application doesn’t need to differentiate amongst every word in its dictionary, it just needs to know if you said, “Pawn to a5” or “Queen to c6.”  “Pawn” and “Queen” are very different words.  “Pawn,” “Prawn,” and “Palm” are not, and you could be searching for any of them, or any of the billions of other possible searches.
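Here is a rough sketch of why a small command vocabulary is easier to get right. It uses Levenshtein edit distance on spelling as a crude stand-in for how similar two words sound, which is an assumption made purely for illustration: a real recognizer compares acoustics, not letters, and nothing here comes from the Cupcake source. The class CommandMatcherSketch and the misheard input "pown" are hypothetical.

```java
// Toy comparison of a command vocabulary against a search vocabulary.
// Spelling distance stands in for acoustic similarity (an assumption);
// all names here are hypothetical and unrelated to Android's recognizer.
class CommandMatcherSketch {
    // Classic dynamic-programming Levenshtein edit distance.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] curr = new int[b.length() + 1];
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            prev = curr;
        }
        return prev[b.length()];
    }

    // Returns the vocabulary entry closest to what was "heard".
    static String closest(String heard, String[] vocabulary) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String word : vocabulary) {
            int d = editDistance(heard.toLowerCase(), word.toLowerCase());
            if (d < bestDistance) {
                bestDistance = d;
                best = word;
            }
        }
        return best;
    }

    // Gap between the best and second-best match; a larger gap means a less
    // ambiguous decision. Expects a vocabulary with at least two entries.
    static int margin(String heard, String[] vocabulary) {
        int best = Integer.MAX_VALUE;
        int second = Integer.MAX_VALUE;
        for (String word : vocabulary) {
            int d = editDistance(heard.toLowerCase(), word.toLowerCase());
            if (d < best) {
                second = best;
                best = d;
            } else if (d < second) {
                second = d;
            }
        }
        return second - best;
    }

    public static void main(String[] args) {
        String[] commands = {"Pawn", "Queen"};
        String[] searchTerms = {"Pawn", "Prawn", "Palm"};
        String heard = "pown"; // a slightly misheard "pawn"
        System.out.println(closest(heard, commands) + ", margin " + margin(heard, commands));
        System.out.println(closest(heard, searchTerms) + ", margin " + margin(heard, searchTerms));
    }
}
```

Both vocabularies pick “Pawn” here, but the margin is wider with only two commands to choose from than with three similar search terms. Scale the second list up to every possible search and the gap all but disappears.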

The same goes for voicemail transcription from Google Voice.

However, in our forthcoming, perfect, Cupcake universe, we can expect to glance at a call from across our desks and say, “No,” to make it go to voicemail.  Or “skip!” when the newest Emerald Limousine single comes on imeem.

Of course, it’s hard to say whether or not Android voice recognition will be optimized for voice commands. Will the application be able to choose which words it wants the voice recognition engine to differentiate? Will the user be able to create their own dictionary or use their custom spell check dictionary? These, and many other questions, will only be answered by the actual release of Cupcake on a production phone. Even if something exists in the code today, it could be gone tomorrow, or not integrated into the new phone where developers can make use of it.

Fortunately, in this case, the cupcake’s trajectory is looking pretty good.

Are you ready to "talk" with your phone yet?

