Alexa, ask Fajita what’s for dinner

Several weeks ago I acquired an Amazon Echo. One of the most interesting things about this device, to me, is that I can write my own Echo skills using the free Alexa Skills Kit.

This turned out to be both easier and harder than expected.

TL;DR: The code is at https://github.com/rbowen/fajita_echo

The biggest challenges were 1) that the docs are convoluted and incomplete, and 2) that the Echo doesn’t like SNI (Server Name Indication) SSL vhosts. Details follow.

At a very simple level, when you say a certain thing to the Echo, it can make an HTTPS request to an endpoint that you define, and then returns the response to you. The request POSTs some JSON, and expects some specially-formatted JSON in response.
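
To make that request/response exchange concrete, here is a minimal Python sketch (the real app is PHP; see the repo linked above). The function names `build_response` and `handle_post` are made up for illustration, and the envelope shown is only the small subset of the Skills Kit JSON format needed to speak a plain-text reply.

```python
import json


def build_response(speech_text, end_session=True):
    """Wrap some text in the JSON envelope that Alexa expects back."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": end_session,
        },
    }


def handle_post(body):
    """Take the raw POST body (a JSON string) and return a JSON string.

    Hypothetical helper: a real endpoint would also validate the request.
    """
    request = json.loads(body)
    request_type = request["request"]["type"]  # e.g. "IntentRequest"
    if request_type == "IntentRequest":
        intent_name = request["request"]["intent"]["name"]
        speech = "You asked for " + intent_name
    else:
        speech = "Hello from Fajita"
    return json.dumps(build_response(speech))
```

Whatever web framework or language you use, the job is the same: decode the POSTed JSON, decide what to say, and send back JSON in this shape.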

The Echo developer interface gives you a web-based config form to tell it where to POST your data, and what speech patterns should trigger this POST. This can be any web service of your creation, as long as it’s running HTTPS, not HTTP. Your service can also run on something called Lambda. I have no idea what that is.

Here’s how you do it:

  1. Register a developer account at https://developer.amazon.com/home.html and then go to https://developer.amazon.com/edw/home.html#/skills/list to create a new Alexa app.
  2. On the ‘Skill Information’ tab, point it to a web app endpoint running on an HTTPS server somewhere. This can be anywhere, and you can write it in any language you’re familiar with.

3. On the ‘Interaction Model’ page, tell Alexa what phrases she should respond to. This is done in two steps.

a) First, tell it what actions, or ‘Intents’, your app will perform. For some reason, for this step you have to write a schema in JSON.
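
As a rough illustration, the schema is a small JSON document listing intent names. The Python sketch below just builds and prints one. `GetMenu` is the intent named in this post; `GetVegetarianMenu` is a hypothetical second intent added for the example, and the exact field names should be checked against the current Skills Kit docs, since this follows the format as it was documented around the time of writing.

```python
import json

# Assumed schema layout: a top-level "intents" list, one entry per intent.
intent_schema = {
    "intents": [
        {"intent": "GetMenu"},
        {"intent": "GetVegetarianMenu"},  # hypothetical, for illustration only
    ]
}

# Paste the printed JSON into the Intent Schema box on the Interaction Model page.
print(json.dumps(intent_schema, indent=2))
```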

b) Second, tell Alexa what phrases should trigger each of these actions by creating ‘Sample Utterances’. For example, to trigger my GetMenu action, I create a sample utterance of ‘GetMenu Whats for dinner’, which means what I will actually say is ‘Alexa, ask Fajita what’s for dinner’. (In my case, my skill is named “Fajita”, for reasons that I’ll explain below. You just need a name for your skill that isn’t already taken by another skill.)

4. On the SSL Certificate tab, indicate that you have a valid SSL cert.

5. On the ‘Test’ tab, you can start experimenting with your app. This will cause it to send JSON requests to your app, and tell you what comes back. The formats of the request and response are most clearly described (with examples) here: https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/alexa-skills-kit-interface-reference.

See my example code at https://github.com/rbowen/fajita_echo/blob/master/echo.php to get an idea of what you can do. In my particular case, I have Alexa making a dinner recommendation, and, optionally, only returning vegetarian options.
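
The logic is simple enough to sketch in a few lines. The Python version below is an illustration only, not a translation of echo.php: the menu entries, the `GetVegetarianMenu` intent name, and the function names are all invented for the example.

```python
import random

# Hypothetical menu; the real options live in echo.php.
MENU = [
    {"name": "black bean tacos", "vegetarian": True},
    {"name": "chicken curry", "vegetarian": False},
    {"name": "vegetable stir fry", "vegetarian": True},
]


def recommend(vegetarian_only=False, rng=random):
    """Pick a random dish, optionally restricted to vegetarian ones."""
    options = [m for m in MENU if m["vegetarian"] or not vegetarian_only]
    return rng.choice(options)["name"]


def speech_for_intent(intent_name, rng=random):
    """Map an intent name to the sentence Alexa should speak."""
    if intent_name == "GetVegetarianMenu":
        return "How about " + recommend(True, rng) + "?"
    if intent_name == "GetMenu":
        return "How about " + recommend(False, rng) + "?"
    return "Sorry, I don't know how to do that."
```

The returned sentence is what goes into the plain-text speech field of the JSON response.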

Hurdles

The biggest problem in all of this was not the code, which was fairly easy, but the SSL setup. In short, it appears that the Amazon Echo doesn’t know about SNI. So, my server configuration, where I have several different HTTPS virtual hosts, each with its own SSL cert, doesn’t work. For a while, I thought it was because it didn’t like my Let’s Encrypt certificate, but this turned out not to be the case. As soon as I set my Echo app vhost to be the first, default SSL host, everything worked fine. This tells me that it’s just not making an SNI request. So, easy enough to fix, but this cost me a couple hours to hunt down.
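
If you suspect the same problem, one way to see what a non-SNI client gets is to connect without sending a server name and look at which certificate comes back. The Python sketch below does that with the standard `ssl` module. The function names are made up for the example, and it assumes the default certificate is signed by a CA your system trusts (a self-signed default cert will raise an `ssl.SSLError` instead).

```python
import socket
import ssl


def names_in_cert(cert):
    """Collect DNS names from a certificate dict as returned by getpeercert()."""
    names = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.append(value)
    return names


def default_cert_names(host, port=443, timeout=10):
    """Connect WITHOUT sending SNI and list the names on the cert served."""
    context = ssl.create_default_context()
    # Hostname checking must be off when no server_hostname is passed;
    # the certificate chain itself is still verified.
    context.check_hostname = False
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Omitting server_hostname means no SNI extension is sent.
        with context.wrap_socket(sock) as tls:
            return names_in_cert(tls.getpeercert())
```

If the list returned for your server doesn’t include the hostname of your skill’s endpoint, then a client that skips SNI is being handed a different vhost’s certificate, which matches the behavior described above.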

The other, smaller problem was that the documentation is thorough, but not laid out for a beginner. Finding the JSON format definitions, for example, took a great deal of hunting, as well as a lot of debug dumping of POST data. The links above should help bypass a little of that learning curve.

TODO

There are several things I still want to do with my app.

  • Put the menu options in a database, rather than hard-coded
  • Provide an interface where someone can edit the menu options via the web, indicating whether a particular one is vegetarian or not
  • The interaction model provides the ability to give some variables in the sentence structure. For example, I could ask for a menu suggestion “containing ginger”, and have “ginger” passed as an argument to my app. I’d like to play with this some.
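
For that last item, the variables (Alexa calls them slots) arrive inside the intent portion of the request JSON. Here is a small Python sketch of pulling one out. The slot name `Ingredient` and the helper `slot_value` are hypothetical, since this skill doesn’t define any slots yet.

```python
def slot_value(alexa_request, slot_name):
    """Return the spoken value of a slot, or None if it is absent."""
    intent = alexa_request.get("request", {}).get("intent", {})
    slot = intent.get("slots", {}).get(slot_name, {})
    return slot.get("value")


# A trimmed-down request, as it might look for "... containing ginger".
sample = {
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "GetMenu",
            "slots": {"Ingredient": {"name": "Ingredient", "value": "ginger"}},
        },
    }
}

print(slot_value(sample, "Ingredient"))  # prints: ginger
```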

Why Fajita?

An IRC channel where I spend a lot of time – #httpd on Freenode – has a helper bot named Fajita. I think of her as my personal assistant, and have long wanted to have some kind of voice interface to her in my home. Alexa is that, but Fajita will supply some of the talents that Alexa doesn’t have yet.