Building Nelly, my DIY voice assistant for Android

May 29, 2012 Andreas Ødegård 35 Comments Tasker

Few things in the mobile technology space have been as rewarding and useful as the time I spent learning how to use Tasker. My phone is so different now than it was a month ago, in a good way. From barely being able to make the speakerphone turn on while the phone is flat on a table during a call, I’ve trial-and-errored my way to a sleep mode that can even turn off my PC monitors remotely. With the ability to make basically anything I want in Tasker, the big question was what to make.

I’ve been wishing for a good voice assistant for a while, but have been disappointed by the accuracy, capability, and overall design of the ones available on Android – and on iOS for that matter, as Siri hasn’t exactly impressed me.

The name “Nelly”, and the choice of voice for her

Voice assistants often have personalities, so why should mine be any different? Nelly is a reference that I doubt anyone will pick up on without an explanation, because it’s a reference to a science fiction book series that is unfortunately not as popular as Harry Potter. Written by Mike Shepherd (Mike Moscoe), the series of books about the space heroine Kris Longknife features a range of original characters, where one of my favorites is Nelly, the main character’s personal computer. Featuring the best hardware money can buy, Nelly’s abilities are far beyond even many full sized computers in the Kris Longknife universe. Later in the series she even has kids, which happens by borrowing her owner’s credit card to go shopping for parts to build more of herself. Point being, it’s an awesome personal computer, just what I want my Nelly to be.

As for the voice, well, I started the book series back when I consumed audio books left and right while working, and have continued on with the series in audio format ever since. The narrator, Dina Pearlman, has a voice she uses for Nelly that is very consistent throughout the books. I wish I could say it was on purpose, but the truth is that the similarity of that voice to IVONA’s Amy British English TTS (Text To Speech) engine (which I had already installed before thinking of my Nelly) is entirely coincidental. That doesn’t mean I get any less satisfaction out of Nelly sounding like Nelly of course. IVONA has some awesome TTS engines, and I have to say that its Amy engine puts anything else I’ve tried to shame. Siri sounds like a robot in comparison, and that’s with Amy still being in beta!

How Nelly works

Tasker is able to tap into Android’s speech recognition system through its Get Voice action. This pops up a voice input window to indicate that it is listening, and then stores the converted text in a variable named %VOICE. If you say hello world, %VOICE’s value will be hello world. Pretty much all of Tasker’s other actions can use an If condition, which means that they will only run if a certain condition is fulfilled. These conditions use variables and various types of math to control whether an action can run.

For Nelly, most of her responses are very simple: Use Tasker’s Say action and the IVONA Amy TTS engine to speak a specified text If %VOICE matches *trigger*. The trigger depends on what I want it to react to, and the asterisks are there when the term can be part of a much longer phrase. For instance, the trigger for Nelly’s answer to what the best smartphone is couldn’t be much simpler: *best smartphone*. While this means that in theory it would be fooled by asking “what is not the best smartphone”, keeping things simple also means that it’s less picky about what you ask it. Sup dawg, so, yeah, um, I was kinda just wondering, you know, what be the best smartphone would work just as well as what is the best smartphone.

For a voice assistant that is custom made for a single person, me, this is definitely the way to go. I know the triggers, which means that I know how to use them correctly. I can however make them as specific or non-specific as I want, should there ever be a problem.

Using the If system, Nelly goes through a list of potential actions and checks if they apply for the current situation based on what I just told it. Since it won’t do anything if it can’t find any matches, I naturally won’t have to cover every single possibility out there.
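For those who think more easily in code, the matching logic described above can be sketched roughly in Python. This is only an illustration and not how Tasker works internally: the RULES table and the matches and respond functions are hypothetical names, lowercasing the input is an assumption, and Python's fnmatch merely stands in for Tasker's own pattern matching. The triggers and replies are ones used elsewhere in this article.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table: (trigger pattern, spoken reply).
# "*" matches any run of characters; "/" separates alternatives.
RULES = [
    ("*life*universe*everything*", "Ask the dolphins"),
    ("*have a question*/*boston*", "Let me know what it is and I'll pass it along"),
]


def matches(voice, trigger):
    """Return True if the spoken text matches any '/'-separated alternative."""
    text = voice.lower()  # assumption: compare case-insensitively
    return any(fnmatchcase(text, alt.lower()) for alt in trigger.split("/"))


def respond(voice):
    """Walk the whole rule list and collect the replies whose trigger matches."""
    return [reply for trigger, reply in RULES if matches(voice, trigger)]


print(respond("what is the answer to life, the universe, and everything"))
print(respond("hello world"))  # nothing matches, so nothing happens
```

Because every rule is checked independently, an unmatched phrase simply produces an empty list, which mirrors Nelly doing nothing when no trigger applies.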

Nelly’s features

This is a list of all the features – or responses if you will – that Nelly has. Anything you see here can be triggered with voice input by pressing the invisible Nelly activation icon I have on my home screen, and the voice input box will pop up as an overlay to the home screen – so no need to enter any apps.

Best smartphone

Already covered this one, but basically, the trigger is best smartphone and that trigger is used in two other actions: a Say action to give the response, and a Browse Url action that opens Pocketables once the spoken response finishes.

Google search

In order to use Nelly to search Google, I made a separate task called Nelly search. This task contains four actions, and what’s special about them is that none of them have any If triggers. Instead, the main Nelly task has a Perform Task action that is tied to a *find something* trigger. This means that if I tell Nelly anything that contains find something, it then starts the separate task. The reason for this is to tie several independent actions to a single trigger, as well as be able to overwrite %VOICE without that affecting the following tasks’ ability to trigger.

The first action in the Nelly search task is a normal Say, with Sure, what do you need as the spoken text. Next you have a new Get Voice action, which then overwrites the original %VOICE that was created when I first asked Nelly to find something. Then there’s a new Say, this time with Here you go. If this doesn’t do it for you, blame google. Finally, there’s a Browse Url action with http://www.google.com/search?q=%VOICE as the URL.

What happens in practice when I ask Nelly to find something is that she replies Sure, what do you need, then records a response, replies Here you go. If this doesn’t do it for you, blame Google, and then does a Google search for whatever I told her to search for. This will bring up a normal Google search results page in the browser.
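The flow of the Nelly search task can be sketched in Python as follows. This is an approximation under stated assumptions: say, get_voice and browse are hypothetical stand-ins for Tasker's Say, Get Voice and Browse Url actions, and the URL encoding with quote_plus is an addition for the sketch, since the Tasker task simply drops %VOICE into the URL.

```python
from urllib.parse import quote_plus


def nelly_search(say, get_voice, browse):
    """Run the four steps of the search task in order."""
    say("Sure, what do you need")
    voice = get_voice()  # replaces whatever was heard when the task was triggered
    say("Here you go. If this doesn't do it for you, blame Google")
    browse("http://www.google.com/search?q=" + quote_plus(voice))


# Usage with simple stand-ins that print instead of speaking or opening a browser.
nelly_search(print, lambda: "weather in oslo", print)
```

Passing the three actions in as parameters keeps the sketch runnable anywhere and makes the order of the steps easy to see.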

Sleep Mode on/off

My Sleep Mode Tasker profile is a rather complicated set of actions in itself, and this feature lets me turn Sleep Mode on and off using Nelly. The triggers are *night* and *morning*, which will use the Set Variable action to set %Sleepmode to on or off respectively. These triggers are rather generic, but I’m unlikely to use them elsewhere (if so I can change things around), and keeping them simple means that Nelly will both respond to variations like night night, Nelly and be less sensitive to sleepy mumbling.

The actual sleep mode itself is an independent profile outside of the task that Nelly runs in. Its context – the way the entire profile triggers – is simply Variable Value: %Sleepmode matches on. This makes the profile active when I’ve told Nelly to make it active, and not active when I’ve told Nelly to deactivate it.

The profile has both an enter and an exit task, meaning a task that is run when it’s turned on, and one that is run when it turns off. The enter task starts off by running a separate task called Screen Off. Screen Off isn’t a complicated task in terms of number of actions; the reason for having it as a separate task and using Perform Task is to be able to quickly access it from other tasks as well. What it actually does is append the date to a file called sleepmode.txt, which is then synced automatically using Dropsync. On my computer, I use a program called RoboTask to monitor the Dropbox folder that sleepmode.txt is synced to, and trigger its own task when the file changes. That task runs a UI-less .exe file called nircmd with the screen off parameter. In practice, my phone creates a file that my PC reads and uses to turn my two monitors off. I tell Nelly to activate sleep mode, and about 15 seconds later my PC monitors go black.
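The file-as-signal idea is not tied to RoboTask. As a rough sketch of the same pattern in Python, one side appends a timestamp to a synced file and the other side polls the file's modification time. The function names, the polling interval and the timeout are all assumptions made for this illustration, and the step that actually turns the monitors off is left out.

```python
import os
import time
from datetime import datetime


def append_timestamp(path):
    """Phone side: append the current date and time to the signal file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(datetime.now().isoformat() + "\n")


def wait_for_change(path, last_mtime, poll_seconds=1.0, timeout=None):
    """PC side: poll until the file's modification time differs from last_mtime.

    Returns the new modification time, or None if the timeout expires first.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        mtime = os.path.getmtime(path)
        if mtime != last_mtime:
            return mtime
        if deadline is not None and time.monotonic() >= deadline:
            return None
        time.sleep(poll_seconds)
```

A watcher would remember the last modification time it saw, call wait_for_change in a loop, and launch whatever command it likes each time a new value comes back.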

The next two actions maximize alarm volume and set screen brightness to 20, about 10% of max. The task then sets the variable %Lastsleep to SM is on, and runs a plugin that passes the value of that variable to Make Your Clock Widget. In practice, when I activate sleep mode, my homescreen clock widget displays SM is on. Next it sets %Smactivation to %TIMES, where %TIMES is the current date and time in seconds – the only way to really make date and time compatible with normal math, as you can’t simply ask a calculator to subtract June 20th 2011 1:43 AM from July 30th 2012 3:10 PM. You can however make one subtract a large number of seconds that corresponds to the first date from an even larger number of seconds that corresponds to the second date, and then convert it back. Think of it as a real life example of stardates from Star Trek…or something like that.
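To make the seconds trick concrete, here is the subtraction from the example above worked through in Python. Treating both dates as UTC is an assumption made to keep the arithmetic free of daylight saving shifts, and to_epoch_seconds is a hypothetical helper name, not something Tasker provides.

```python
from datetime import datetime, timezone


def to_epoch_seconds(moment):
    """Convert a date and time to whole seconds since 1970, like %TIMES."""
    return int(moment.timestamp())


start = datetime(2011, 6, 20, 1, 43, tzinfo=timezone.utc)   # June 20th 2011 1:43 AM
end = datetime(2012, 7, 30, 15, 10, tzinfo=timezone.utc)    # July 30th 2012 3:10 PM

# Plain subtraction now works because both values are ordinary numbers.
elapsed = to_epoch_seconds(end) - to_epoch_seconds(start)

# Convert the difference back into something readable.
days, remainder = divmod(elapsed, 86400)
hours, remainder = divmod(remainder, 3600)
minutes = remainder // 60

print(elapsed)                                                # 35126820
print(f"{days} days, {hours} hours, {minutes} minutes")       # 406 days, 13 hours, 27 minutes
```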

Next up is the actual response. These tasks run so quickly that the delay is unnoticeable, with the exception of the Say action, which has to wait for the TTS engine to actually say the text before continuing. That’s why the response is so far down the list of actions, allowing the process of turning off the monitor to run while Nelly is talking. The actual response is simple: Sleep mode activated. Good night.

The next action writes Andreas has been sleeping since %TIME to status.txt, a file that is synced to the web using Dropsync and available to friends and family who can then quickly check if I’m asleep, home, or away (other profiles change this file too), and for how long I’ve been sleeping.

Finally, there’s a 15 second wait, and then it triggers a complete Dropsync sync session, rather than the partial ones triggered by monitoring file changes. This is both a way to make sure I end the day with a full sync, and a way to make sure that the status.txt and sleepmode.txt files are really synced, even if I turn off my screen and mess up the file monitor syncs. Waiting for this complete sync to finish is why the sleep mode system seems slow in the video above, as sleep mode isn’t designed to be turned off right after being turned on.

The exit task is basically reversed. It writes Andreas is home to the status.txt file, writes to a wakeup.txt file that wakes my computer, and updates the widget to display LS %TIME, meaning the time it was deactivated and in turn the time I woke up. It also creates a variable called %Smduration that subtracts the %Smactivation variable written in the enter task from the current time in seconds, which means the result is how long sleep mode was active, in seconds. It also divides this by 3600 to get hours instead of seconds.

If %Smduration is a number greater than 9 (hours), the task writes You lazy bastard to %Lazy. If %Smduration is lower than 9, %Lazy simply retains its original value, which is a space. There’s also a Set Variable action at the very end of the exit task that sets %Lazy to a space so that the default for next time is a space even if %Smduration was higher than 9 this time.

The Say action that announces the duration utilizes data from two of the variables that were just created. The Say text is Good morning. You slept for %Smduration hours. %Lazy. %Smduration is the time sleep mode was active, and %Lazy is either nothing or You lazy bastard depending on the value of %Smduration. %Lazy needs to default to a space rather than nothing, otherwise the TTS engine will actually read the word %Lazy (as in “percentage lazy”) if %Smduration is lower than 9. In practice, this results in two types of responses based on whether or not I slept for more than 9 hours. Examples are Good morning. You slept for 5.443 hours and Good morning. You slept for 9.724 hours. You lazy bastard. Just a little bit of an automated pep talk if I sleep for too long.
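The exit task's arithmetic and the conditional %Lazy text can be sketched in Python like this. The function name wake_up_message and its parameters are hypothetical, and rounding to three decimals is an assumption made to match the example responses rather than something Tasker is known to do.

```python
def wake_up_message(activated_at, now, limit_hours=9):
    """Build the morning greeting from two timestamps given in seconds."""
    duration_hours = round((now - activated_at) / 3600, 3)  # seconds to hours

    lazy = " "  # default is a space, as with %Lazy
    if duration_hours > limit_hours:
        lazy = "You lazy bastard"

    # rstrip drops the trailing space left behind when lazy is just a space.
    return f"Good morning. You slept for {duration_hours} hours. {lazy}".rstrip()


print(wake_up_message(0, 19595))  # Good morning. You slept for 5.443 hours.
print(wake_up_message(0, 35006))  # Good morning. You slept for 9.724 hours. You lazy bastard
```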

The answer to life, the universe, and everything

There are quite a few questions that people always end up asking voice assistants, and what is the answer to life, the universe, and everything is a classic. It refers to the brilliant book series The Hitchhiker’s Guide to the Galaxy. The answer according to the book is 42, which is what you’ll get from most voice assistants. That means that mine can’t give such a “boring” reply though, so I set it to instead reply Ask the dolphins. Read the books to find out why. As for the trigger, *life*universe*everything* handles that nicely.

Am I in danger?

Another novelty response like the one above, but much, much less common. Since my Nelly is from the Kris Longknife series, I wanted something that actually hints at that, even if I’m probably the only user of my Nelly who would ever understand the reference. Anyways, the trigger is am I in danger, and the response is Not unless there’s a Longknife nearby. Again, read the books to understand why ;)

Universal fixing substance

Another internal joke for readers of science fiction books. If asked to give the “universal fixing substance”, Nelly will reply “That’s the wrong book series, you moron. There aren’t any spiderwolves here”. References this book series.

Ask the boffins

Another actually useful feature, but with a twist from the book series Nelly is from. Boffin is a British slang term for scientist, and is used in the books to describe the scientists on board. The trigger here is *have a question*/*boston*, which means it will trigger on any mention of have a question or Boston. Why Boston? That’s what the speech to text system thinks I say when I say boffins. When presented with the pronunciation of the word from the Oxford dictionary iPad app I have, it instead heard it as office. Either way, I couldn’t get it to recognize boffin, so I adapted. For the record, if you have trouble making triggers work, you can create a new task with two actions: Get Voice, and Alert – Flash with %VOICE as text. This makes a message pop up on screen for a few seconds with the text that the voice recognition system thought you said after you read it in, and you can trigger the task manually when you need to test something. In my case, I wanted it to trigger on I have a question for the boffins, so if it consistently thinks I ask for Boston, the result is the same.

As for what this triggers, well, it’s another case of Perform Task where the actions are in a separate task. First Nelly replies Let me know what it is and I’ll pass it along, and then you have another Get Voice. I then use Variable Set %Wolfram to %VOICE to get a custom variable that I can work with in ways that Tasker won’t allow you to work with built in ones. The %Wolfram variable is then split using Variable Split, and then Variable Join puts it back together with + as the joiner. This is to convert it to a format that Wolfram Alpha can read; specifically, %Wolfram now looks something like word1+word2+word3 instead of word1 word2 word3.

Nelly then says The surviving boffins asked me to give you this, a reminder of the events of the latest book. The final action is Browse Url, with http://www.wolframalpha.com/input/?i=%Wolfram as the URL. That brings up a search result page on Wolfram Alpha, and since WA is a mathematical search engine that can do calculations this way, it truly does ask the scientists. If you don’t go via the Split and Join system, any multi-word search phrase will only have the first word actually make it to WA’s search box.
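The Split and Join step is easy to picture in Python. The sketch below is an illustration only: wolfram_url is a hypothetical function name, and splitting on whitespace is assumed to be equivalent to what Variable Split does with a spoken phrase.

```python
def wolfram_url(voice):
    """Turn a spoken phrase into a Wolfram Alpha URL with + between the words."""
    words = voice.split()      # like Variable Split: one piece per word
    query = "+".join(words)    # like Variable Join with + as the joiner
    return "http://www.wolframalpha.com/input/?i=" + query


print(wolfram_url("distance to the moon"))
# http://www.wolframalpha.com/input/?i=distance+to+the+moon
```

This only handles spaces. A phrase containing characters such as & or ? would need proper URL encoding, for example with urllib.parse.quote_plus.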

Norwegian Yellow Pages

If I need to find any business or similar things nearby, nothing based on English is going to help me. I need to search a Norwegian website with Norwegian text, so an English voice recognition system will be less than useless. As such, my yellow pages search trigger and action is much simpler than you’d think: trigger on *yellow pages* and browse the URL. Then I enter whatever I need manually, because that’s frankly the only way to do it with the language difference.

Music

Another simple one. A separate Task has Load App: Poweramp and Music Control: Toggle Pause as actions. A Perform Task with *music* as trigger in the Nelly task then starts Poweramp and starts playback when the word music is heard.

Notification note

The final feature in this first version of Nelly lets you create a note in the notification menu of your device by using your voice. It’s a separate task that is triggered in the Nelly task with *a note*, e.g. Add a note or Nelly, I need you to make a note of something for me. The separate task starts out with Say: What do you want it to say. Then it uses Get Voice to get the content of the note, does a Say: Done for good measure, and then uses the Notify action under Alerts with %VOICE as name. It also has a note pad as the notification icon just for good measure. I already have a fairly complex todo system set up on my device, and if I find a way to control that with Tasker I will add that to Nelly too, but for now this is sort of a “quick note” system for something that I need to do ASAP and so having a notification to remind whenever I look at the device works well.

In conclusion

There are clear advantages to making your own voice assistant, as well as disadvantages. While you’re stuck having to do everything yourself, and in many ways are limited compared to what you can do with true app programming, you only need one feature that you’ll actually use to make it worthwhile to skip 10 that you probably won’t. For Nelly, sleep mode and notification note are the two features you’re unlikely to find in other voice assistants, aside from all the comic relief responses. Even so, being able to activate those using voice isn’t really 100% necessary, as it could be just as easily done with buttons, and in the case of the notification note, a text input field. A voice assistant really becomes the most useful when it reads incoming messages, lets you dictate outgoing ones, and does it all in one go. I could easily program Nelly to do that if I wanted to, as it’s just a matter of transferring %VOICE to variables which go into Send SMS actions and so on. Problem is I don’t use SMS very often, and most definitely not in English.

Point being that you shouldn’t look at this article as something to be duplicated, but rather as an idea to be adapted for other uses, and more importantly: custom uses. I can tell my phone “good night” and it turns off my computer monitors for me. Try doing that with Siri, which won’t even let you look for a replacement phone without overriding web searches with propaganda. To send you off, here’s an undocumented Nelly easter egg for Mass Effect fans:
