Alexa is the New Command Line

In the beginning…

In the beginning, man created the command-line interface, and lo! There were commands that no one could remember and syntax designed by Satan himself.

User interface experts call this problem discoverability: given that you’re at a particular point in an application, how do you find where you can go next, or what you can do? The early graphical user interfaces beat the more powerful command line because they allowed users to discover features without needing to remember that a feature was there. This property turns out to be so compelling that command lines were relegated to, well, somewhere that you can discover with a bit of digging.

Challenging the graphical user interface

The long-unchallenged dominance of the graphical user interface is facing a new contender: voice-activated assistants, such as Siri, Alexa, and Google Assistant. These ever-listening devices attack the soft underbelly of the graphical user interface: the (non-alternative) fact that you need a graphical screen to interact with it, and you need to be within touching distance of that screen. With voice-activated assistants, you only need to be within yelling distance (or have your phone nearby).

Asking the same question?

Once you’ve vocally activated your assistant, you need to give it commands. One of the hard problems with this, and with life in general, is that different people ask questions in different ways. Where you’ll say “Alexa, what time is it?”, I’ll proclaim “Alexa, what be the hour?”. Internally, the servers powering Alexa need to figure out that we’re asking the same question, which we call disambiguation. One of the strengths of command-line and graphical interfaces is that input is unambiguous (yes, you really did click the “delete all my files” button). Unfortunately, disambiguation is a hard problem, even for relatively simple commands. Try adding “fork handles” to your shopping list to discover this for yourself.
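
To make the problem concrete, here is a toy sketch of disambiguation in Python. It assumes a hand-written table of sample phrasings (the intent names, the phrases, and the 0.5 threshold are all invented for illustration) and picks whichever intent shares the most words with what was said. Real assistants use far more sophisticated language models, so treat this as an illustration of the shape of the problem rather than how Alexa actually works.

```python
import re

# Hypothetical intent table: each intent lists a few phrasings that should map to it.
INTENT_PHRASES = {
    "get_time": ["what time is it", "what be the hour", "tell me the time"],
    "add_to_list": ["add to my shopping list", "put on my shopping list"],
}


def tokens(text):
    """Lowercase the utterance and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def match_intent(utterance, threshold=0.5):
    """Return the intent whose sample phrase best overlaps the utterance, or None.

    Overlap is measured as Jaccard similarity: shared words / all words.
    """
    words = tokens(utterance)
    best_intent, best_score = None, 0.0
    for intent, phrases in INTENT_PHRASES.items():
        for phrase in phrases:
            sample = tokens(phrase)
            union = words | sample
            score = len(words & sample) / len(union) if union else 0.0
            if score > best_score:
                best_intent, best_score = intent, score
    # Below the threshold we admit defeat instead of guessing.
    return best_intent if best_score >= threshold else None


print(match_intent("Alexa, what time is it?"))
print(match_intent("Alexa, what be the hour?"))
print(match_intent("Alexa, order me a chimichanga"))
```

Both ways of asking the time land on the same made-up `get_time` intent, and the chimichanga request matches nothing. Note that word matching only helps once the speech has been correctly turned into text; it does nothing for “fork handles”.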

Discoverability on a voice assistant?

If we can make the simplifying assumption that we’ve solved the above problem, we’ve just discovered a deeper problem: how do you do discoverability on a voice assistant? “Siri, tell me everything you can do” is likely to flatten your phone battery pretty quickly (which I don’t believe is an intended feature of Siri), and it doesn’t help you decide whether Siri can order you a late-night chimichanga delivery. At the moment, this isn’t really a problem because voice assistants are very limited in what they can achieve. Alexa is about to run face-first into this problem with the addition of Skills. Given two Alexa devices with Skills, there’s no way to tell which Skills are available. Without a good solution to the discoverability problem (wait, you were expecting me to have one?), voice assistants will be limited to simple commands and instructions.

An interesting property of command lines that hasn’t featured in voice assistants yet is that of composition (i.e. can I chain the output of multiple commands together?). We even have this concept in the graphical world: the humble copy-paste allows us to move data from one program to another with only a modicum of mouse-pointer shuffling. Telling Siri to “email the news story about the giraffe to my mother” could lead to some unexpected (but possibly hilarious) results. Which is a pity, because composition is incredibly powerful, and we really ought to continue making it available.
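
For anyone who hasn’t met composition on a command line, here is a rough Python sketch of the giraffe request as a chain of small steps, each feeding its output to the next. Everything in it is hypothetical: the `NEWS` list stands in for a news feed, and `search`, `first`, `email_to` and `pipe` are names invented for this example, not part of any assistant’s API.

```python
from functools import reduce

# Hypothetical data standing in for a news feed.
NEWS = ["Giraffe escapes zoo", "Local bakery wins award", "Giraffe learns to paint"]


def search(keyword):
    """Build a step that keeps only the stories mentioning the keyword."""
    return lambda stories: [s for s in stories if keyword.lower() in s.lower()]


def first(stories):
    """Keep only the first story, if there is one."""
    return stories[:1]


def email_to(recipient):
    """Build a step that turns stories into (pretend) outgoing messages."""
    return lambda stories: [f"To {recipient}: {s}" for s in stories]


def pipe(data, *steps):
    """Feed data through each step in turn, like `a | b | c` in a shell."""
    return reduce(lambda acc, step: step(acc), steps, data)


outbox = pipe(NEWS, search("giraffe"), first, email_to("my mother"))
print(outbox)
```

Each step knows nothing about the others, which is exactly what makes the chain easy to rearrange. The hard part for a voice assistant is not running the chain but working out, from one spoken sentence, which steps you meant.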

How long will Alexa last?

Is the end nigh for our mellifluous Alexa? It seems unlikely; convenience outweighs theoretical concerns, and there are some genuinely good uses as well as novelties and party tricks. Only time will tell. If we can figure out how to ask for it, anyway.