BrightHR’s Alexa Integration: Lessons learnt building an Alexa Skill

August 23, 2017

Published by Lazy Dave

You may have seen that we recently released the BrightHR Alexa skill and read our blog about how voice technology could be integral in the future of the workplace. But making a skill from scratch isn’t easy. So how did we go about it? We asked our Head of Innovation, Dave Sellars, to share his thoughts on developing our Alexa skill.

"Where do I start?"

Amazon has provided a lot of documentation around skills. There are tutorials that explain how to ‘build a voice experience in 5 minutes or less’. I initially followed the Node.js flavour of these tutorials.

Whilst it is true that, in a very short time, I had a skill up and running in AWS Lambda, I never actually touched any code during the tutorials. The tutorials gave me an introduction to the key components that comprise an Alexa skill: AWS Lambda functions and the Interaction Model. After completing the tutorial, the code which powered the function was available in the Lambda console under the ‘code’ tab and was editable inline. I set about creating some responses I expected would make it into the public version of the skill. I started with the response to ‘Alexa, ask BrightHR who is out today?’, the response being ‘It’s a full house, there are no absences!’.

Lesson: The tutorials help to explain the terminology used when creating Alexa Skills. Once you’ve followed one of the tutorials, you can see the code in the AWS Lambda function console and edit it directly in your browser. Be sure to copy and paste the code somewhere safe so that you can roll back to a working version when you break something!

"Ok, Let’s add a custom response!"

In the Lambda development portal, I added a new intent to the handlers array using the code editor:

TodaysAbsencesIntent: function () {
    // Canned response for now - real data comes from the BrightHR API later.
    this.emit(':tell', "It's a full house, there are no absences!");
},

This is the code which runs when the intent has been triggered. I took inspiration from the ‘GetNewFactIntent’ handler that was there as a result of completing the tutorial.

In the Interaction Model of the skill development portal, I added the details of the new intent I wanted to be available to the skill. This was added to the intents array within the Intent Schema section:

{
    "intent": "TodaysAbsencesIntent"
}

This makes our TodaysAbsencesIntent in the code available to our skill’s utterances. Finally, I added an utterance in the Sample Utterances section:

TodaysAbsencesIntent who is out today

This is the glue between the intent and the spoken word - when the Alexa Voice Service recognises ‘who is out today’, it tries to find ‘TodaysAbsencesIntent’ in the Intent Schema and, in turn, the ‘TodaysAbsencesIntent’ handler in the Lambda function.

Lesson: There are a couple of moving parts to wire up the function to the voice recognition, but it’s simple enough to understand. Using two development portals feels a little clumsy - it’d be nice if there were a way to jump between them instead of having to manually log into each respective console.

"How can I test the skill?"

I went to the test page of the developer portal and entered ‘who is out today’ into the utterances field and hit ‘Ask BrightHR’.

In the Service Request pane, I saw that the new intent was being triggered - it contained (amongst some other uninteresting things):

"request": {
    "type": "IntentRequest",
    "requestId": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "intent": {
        "name": "TodaysAbsencesIntent",
        "slots": {}
        }
    }

In the Service Response pane, I saw the result of the Lambda function contained the response I wanted:

    "outputSpeech": {
                    "ssml": "<speak> It’s a full house! There are no absences </speak>",
                    "type": "SSML"
        }

The skill was working and I was able to test it without an Amazon Echo, Dot, Kindle or Fire TV!

Lesson: You can test the functionality of the skill using the test page in the development portal. This will confirm that the skill fires the correct intent if your speech has been successfully recognised. However, it won’t verify that Alexa actually understands your speech.

"What is this SSML tag?"

Alexa uses SSML (Speech Synthesis Markup Language), an XML-like markup language for speech. Instead of having the function return plain text, you can include additional tags from the documentation here.

"When the service for your skill returns a response to a user’s request, you provide text that the Alexa service converts to speech. Alexa automatically handles normal punctuation, such as pausing after a period, or speaking a sentence ending in a question mark as a question.

However, in some cases you may want additional control over how Alexa generates the speech from the text in your response. For example, you may want a longer pause within the speech, or you may want a string of digits read back as a standard telephone number. The Alexa Skills Kit provides this type of control with Speech Synthesis Markup Language (SSML) support."
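
For instance (a minimal sketch in the same handler style as before, not code from our skill), an explicit pause and a telephone-style read-back look like this:

ExampleSsmlIntent: function () {
    // Hypothetical intent, shown only to illustrate SSML tags:
    // <break> adds an explicit pause and <say-as> reads the digits as a phone number.
    this.emit(':tell',
        "It's a full house! <break time=\"500ms\"/> " +
        "If that seems wrong, call the office on <say-as interpret-as=\"telephone\">0161 496 0000</say-as>.");
},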

Lesson: Rather than simply returning a plain text response, it's possible to have a little more control over how the responses are transformed to speech.

"Let’s make a REAL device talk!"

I’d confirmed that the code fired in response to the intent when it was typed, and that was cool. However, there was something bothering me at this point - the test was using an emulated device. If there is one thing I have learnt from developing software for devices rather than computers (I’ve written code for lots of different devices, from phones to microcontrollers), it is that there is no substitute for the real thing. It is essential that software is run on the intended platform and not only an emulator - I really can’t labour this point enough.

I grabbed an Amazon Echo and my phone. I installed the Alexa app on my phone and started it. After following the on-screen prompts, the Echo and app were up and running. I made sure to set up the app with the same account that I was using for the Alexa skill. I could see that the skill was enabled by tapping ‘skills’ in the menu and, from there, ‘my skills’. I could tell that the skill was under development as there was a little badge which read ‘dev uk’.

Lesson: Alexa skills which are in development are shown in the Alexa app. If yours isn’t, go to the test page of the developer portal and hit enable at the top of the page and it will appear. I didn’t have to do this - it was already there, waiting for me to test it.

"Alexa pronounces some words strangely!"

I fired up the Echo and proudly said ‘Alexa, ask BrightHR who is out today?’. After the blue ring flashed, Alexa began to speak. But there was a problem: the pronunciation was off. The word ‘absences’ sounded more like ‘ab sawn sez’.

I could replicate the problem using the Voice Simulator in the Alexa Developer Console, too!

Remembering the SSML syntax, I did a little digging around the documentation and found that I could use a <phoneme> tag.

"A phoneme (/ˈfoʊniːm/) is one of the units of sound (or gesture in the case of sign languages, see chereme) that distinguish one word from another in a particular language. In most dialects of English, the difference in meaning between the words kill /kɪl/ and kiss /kɪs/ is a result of the substitution of one phoneme, /l/, for another phoneme, /s/. Two words like this that differ in meaning through a contrast of a single phoneme form what is called a minimal pair." - wikipedia

In order to find an example of how to use phonemes to build up ‘absences’, I turned to Google and searched for ‘define absences’. The pronunciation is shown right at the top of the search results.

I built up a phoneme tag as shown in the documentation and ended up with:

<phoneme alphabet="ipa" ph="ˈabs(ə)ns/(ə)s">absences</phoneme>

I used the Voice Simulator to verify that the word was pronounced as expected.

I suspect this isn’t the only word with a dubious pronunciation I’ll encounter, so I’ve created a simple key-value dictionary to hold phonemes:

{
    "absences":"ˈabs(ə)ns/(ə)s",
    "absence":"ˈabs(ə)ns/"
}

Each time I send text back to Alexa, I pass the text through a function that checks for words that need forced pronunciation.
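
A minimal sketch of that check (the function name and replacement logic here are illustrative rather than our production code) could look like this:

// Illustrative sketch: wrap any word found in the phoneme dictionary in a <phoneme> tag
// before the text is handed to this.emit(':tell', ...).
var phonemes = {
    "absences": "ˈabs(ə)ns/(ə)s",
    "absence": "ˈabs(ə)ns/"
};

function forcePronunciation(text) {
    return text.split(' ').map(function (word) {
        var ph = phonemes[word.toLowerCase()];
        return ph
            ? '<phoneme alphabet="ipa" ph="' + ph + '">' + word + '</phoneme>'
            : word;
    }).join(' ');
}

// e.g. this.emit(':tell', forcePronunciation("There are two absences today"));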

Lesson: Always test the spoken word using your ears - relying on the text alone is likely to lead to unexpected results. Alexa will sometimes say things in an unexpected way, and you simply won’t know unless you hear it. You can test your output in the Voice Simulator or on a real device. When it comes to forcing a pronunciation, phonemes are easy to find on Google - copy and paste!

"Let’s wire it up to an external API!"

The data used by Alexa comes from the BrightHR API. I used a node module to make the requests to the API and get some data back. The module I used makes it easy to make requests and read their responses from the server.
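
The request code looks roughly like this (a minimal sketch using Node’s built-in https client and a placeholder endpoint, rather than our actual module):

var https = require('https');

// Illustrative sketch only - the real endpoint and response shape belong to the BrightHR API.
function getAbsences(date, callback) {
    https.get('https://api.example.com/absences?date=' + date, function (res) {
        var body = '';
        res.on('data', function (chunk) { body += chunk; });
        res.on('end', function () { callback(null, JSON.parse(body)); });
    }).on('error', callback);
}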

I wrote my own module to wrap up the call to the API and, after ensuring it worked as needed, it was time to push it to AWS. This is where I learnt that the inline editor wasn’t going to work for me. The inline editor is fine if you only have a single file for all of your skill’s logic, but now I had a module I needed to reference. I zipped up all my code except for the node_modules folder, uploaded the package and tried my skill on the Echo.

It didn’t work.

I had expected AWS to update my dependencies for me by calling ‘npm install’ but it didn’t. After I had included the node_modules folder, the skill worked as expected.

Lesson: Include your node_modules folder inside your zip archive. AWS Lambda won’t run ‘npm install’ for you.

"I’m lazy, updating my skill is getting tedious!"

After making changes to the skill, it’s necessary to upload the fresh version of the code to AWS. I pride myself on being "Lazy" - I don’t want to waste my time carrying out repetitive tasks which can be automated away from me. I realise that this is only a case of zipping up the files and uploading the zip through the portal, but it’s definitely an automation candidate.

In order to update the skill, the steps are:

  • run npm install
  • zip up all the files
  • log on to the AWS Lambda console
  • upload the zip

I installed and set up the AWS CLI (Amazon Web Services Command Line Interface). This is a tool from Amazon used specifically for interacting with AWS via the command line. You can read more about it here.

I wrote a script to update the dependencies and create the zip file. I was able to remove the need to log on to AWS and upload the file using the browser with this command:

aws lambda update-function-code \
    --function-name alexaBrightHR \
    --zip-file fileb://$filename \
    --no-publish \
    --no-dry-run

All of the options should be fairly self-explanatory:

  • function-name - the name of our Lambda function
  • zip-file fileb:// - the location of our zip file on disk
  • no-publish - don’t publish a new version of the Lambda function straight away (I can test the updated code before deciding to publish)
  • no-dry-run - actually run the update. If you only want to validate the request, you can use --dry-run here instead.

Lesson: Use the AWS CLI to deploy your skill - it’ll save you precious time and keystrokes once it’s all set up. Be lazy and make a cuppa instead!

"I want to make my sentences dynamic"

I wanted to make my skill a little cleverer now: I wanted to ask "Alexa, ask BrightHR is Dave out on October 23rd?"

When using variables in sentences (in the example above, ‘Dave’ and ‘October 23rd’), you use ‘slots’. The Alexa Voice Service substitutes the spoken values into these slots before they are sent to your Lambda function.

In the Interaction Model of the developer portal, I put a new utterance including the slots:

TodaysAbsencesIntent is {employeeFirstName} out {absenceDate}

After reading about Amazon’s built-in slot types, I settled on using AMAZON.GB_FIRST_NAME and AMAZON.DATE as these seemed the most appropriate.

AMAZON.GB_FIRST_NAME provides recognition of thousands of popular first names commonly used by speakers in the United Kingdom.

The slot type recognises both formal names and nicknames. The name sent to your service matches the value spoken by the user. That is, the Alexa service does not attempt to convert from the nickname to the formal name.

For first names that sound alike, but are spelled differently, the Alexa service typically sends your service a single common form.

AMAZON.DATE converts words that represent dates into a date format.

The date is provided to your service in ISO-8601 date format. Note that the date your service receives in the slot can vary depending on the specific phrase uttered by the user:

  • Utterances that map to a specific date (such as "today", or "November twenty-fifth") convert to a complete date: 2015-11-25. Note that this defaults to dates on or after the current date (see below for more examples)
  • Utterances that map to just a specific week (such as "this week" or "next week") convert to a date indicating the week number: 2015-W49
  • Utterances that map to the weekend for a specific week (such as "this weekend") convert to a date indicating the week number and weekend: 2015-W49-WE.
  • Utterances that map to a month, but not a specific day (such as "next month", or "December") convert to a date with just the year and month: 2015-12.
  • Utterances that map to a year (such as "next year") convert to a date containing just the year: 2016.
  • Utterances that map to a decade convert to a date indicating the decade: 201X.
  • Utterances that map to a season (such as "next winter") convert to a date with the year and a season indicator (winter: WI, spring: SP, summer: SU, fall: FA).
  • The utterance "now" resolves to the indicator PRESENT_REF rather than a specific date or time.

I added a slots array inside the Intent Schema section.

{
  "slots": [
    {
      "name": "employeeFirstName",
      "type": "AMAZON.GB_FIRST_NAME"
    },
    {
      "name": "absenceDate",
      "type": "AMAZON.DATE"
    }
  ],
  "intent": "TodaysAbsencesIntent"
}

Now that the Interaction Model is complete, it’s time to use the values of these slots in my function. I can get them by reading the event object.

The employee name is here:

this.event.request.intent.slots.employeeFirstName.value;

and the date is here:

this.event.request.intent.slots.absenceDate.value;

Lesson: Amazon provides built-in slot types and these are helpful. Sometimes dates won’t fit the YYYY-MM-DD format, and you’ll need to write code to handle them. You will probably want to say something different when working with week or month ranges rather than single dates, or even ignore these completely.
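
For example (a rough sketch, not our production code), the handler can check the shape of the slot value before treating it as a single day:

// Illustrative sketch: classify an AMAZON.DATE value so the response can vary.
function classifyDateSlot(value) {
    if (/^\d{4}-\d{2}-\d{2}$/.test(value)) { return 'day'; }    // e.g. 2015-11-25
    if (/^\d{4}-W\d{2}(-WE)?$/.test(value)) { return 'week'; }  // e.g. 2015-W49 or 2015-W49-WE
    if (/^\d{4}-\d{2}$/.test(value)) { return 'month'; }        // e.g. 2015-12
    return 'other';                                             // year, decade, season or PRESENT_REF
}

// e.g. only look up absences when classifyDateSlot(absenceDate) === 'day'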

"Alexa, I mean Jon, not John!"

We use names to look up absences in BrightHR’s API. As mentioned above, if Alexa hears "John", it sets the employeeFirstName value to "John" - including that silent "h". That’s awesome, it’s spelt correctly. We can find anyone called John!

But what if you don't actually want that? What if it’s the short version of Jonathon? What about Lesley and Leslie?

I had to do a two-phase filtering of the data returned from the API to get around this problem.

First I grab all the absences for the specific date, then I filter on names which sound like the name we have been given. We use a phonetics library to compare the names. It has a function that allows me to test whether two words sound alike.
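
As an illustration of the second phase (using a home-rolled, simplified Soundex here instead of the phonetics library we actually use, and assuming a hypothetical firstName field on each absence record):

// Illustrative sketch: keep only absences whose first name *sounds* like the spoken name.
function soundex(name) {
    var codes = { b: '1', f: '1', p: '1', v: '1',
                  c: '2', g: '2', j: '2', k: '2', q: '2', s: '2', x: '2', z: '2',
                  d: '3', t: '3', l: '4', m: '5', n: '5', r: '6' };
    var letters = name.toLowerCase().replace(/[^a-z]/g, '');
    var result = letters[0] || '';
    var previous = codes[letters[0]];
    for (var i = 1; i < letters.length; i++) {
        var code = codes[letters[i]];
        if (code && code !== previous) { result += code; }
        previous = code;
    }
    return (result + '000').slice(0, 4);
}

function absencesForSpokenName(absences, spokenName) {
    return absences.filter(function (absence) {
        return soundex(absence.firstName) === soundex(spokenName);
    });
}

// e.g. soundex('Jon') === soundex('John') and soundex('Lesley') === soundex('Leslie')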

Lesson: Double check the values that the Alexa Voice Service is giving you. Just because it sounds like a duck, it doesn't mean it actually IS a duck!

"Alexa, ask BrightHR to pick someone to make tea!"

At this point in the mission, I demonstrated the work I’d done to the rest of BrightHR to gather feedback. The feedback was good and we decided to put it into the hands of our users. There was one suggestion though: "Let’s see if we can add something fun in there too…"
