Voicebot v2

February 26, 2019

One of my most popular recent posts has been about creating a simple voicebot to work with api.ai. Since I wrote that post, a number of things have happened. Most notably, Google acquired api.ai and rebranded it as Dialogflow. They also deprecated v1 of their API, which Voicebot was previously using; it will be shut down in October 2019.

I have received emails from people who have found the Voicebot demo valuable, and I wanted to update it for the current state of the art. I've now updated the demo running at voicebot.jaanus.com, and the code remains available on GitHub. Here are a few notes from the update process. (Update March 2020: the online demo is no longer available. The code remains available as an educational resource.)

Google APIs

The most significant change was to authentication. In the previous api.ai world, you had a simple token that you could embed in your code. This was insecure for pure client-side apps like Voicebot, but I understood and accepted the risks, and it made for a very simple demo.

In the new world, the Dialogflow APIs are part of the Google Cloud ecosystem, and you must access them with Google's methods. I hadn't interacted with Google Cloud APIs much before, and I found this part not very approachable. There are many ways to access the same APIs, and different flavors are highlighted in different parts of the docs. The Dialogflow docs are geared towards server-side apps and have plenty of information about working with server keys and such, but I wanted to keep a purely client-side app. I eventually figured out that I could authenticate with standard Google OAuth and then call the Dialogflow APIs with JavaScript, but I had to triangulate across many disjointed docs and examples to get there.
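To make the flow concrete, here is a minimal sketch of what the client-side call ends up looking like. PROJECT_ID and SESSION_ID are placeholders, and I'm assuming an OAuth access token with a Dialogflow scope has already been obtained (for example via Google's JavaScript sign-in library); the endpoint and request shape follow the v2 REST API.

```js
// Sketch: call the Dialogflow v2 detectIntent REST endpoint from the
// browser. PROJECT_ID and SESSION_ID are placeholders, and accessToken
// is assumed to come from Google's OAuth flow with a Dialogflow scope.
async function detectIntent(accessToken, text) {
  const url =
    'https://dialogflow.googleapis.com/v2/projects/PROJECT_ID' +
    '/agent/sessions/SESSION_ID:detectIntent';
  const response = await fetch(url, {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer ' + accessToken,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      queryInput: {
        text: { text: text, languageCode: 'en' }
      }
    })
  });
  const result = await response.json();
  // In v2, the agent's reply lives in queryResult.fulfillmentText.
  return result.queryResult.fulfillmentText;
}
```

The session ID just needs to be a unique identifier per conversation; Dialogflow uses it to keep track of contexts across turns.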

This makes the user experience worse, as you must authenticate as a first (extra) step. I apologize for that. The app itself does not need any authentication, since I do not store or keep any personal data; it is needed only to access the new Dialogflow API. I could avoid it with some kind of server-side solution, but I'm not interested in building or running a server for this demo.

Request fulfillment

Dialogflow (api.ai) didn't have any capability to run code on its own, so any external fulfillment had to happen by calling external webhooks. The request/response format for these hooks also changed in the v2 API, so I had to rewrite my custom fulfillment example slightly. The logic and approach are largely the same.
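For illustration, here is a minimal webhook in Express-style Node that responds in the v2 format. The field names are the main thing that changed: v1 used speech and displayText in the response and result in the request, while v2 uses fulfillmentText and queryResult. This is a sketch, not the demo's actual fulfillment code.

```js
// Sketch of a v2-format webhook in Express-style Node. In v1 the
// request carried parameters under result.parameters and the response
// used speech/displayText; in v2 it's queryResult.parameters and
// fulfillmentText.
const express = require('express');
const app = express();
app.use(express.json());

app.post('/webhook', (req, res) => {
  const params = req.body.queryResult.parameters;
  res.json({ fulfillmentText: 'You said: ' + JSON.stringify(params) });
});

app.listen(3000);
```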

These days, you can also write custom fulfillment code straight in Dialogflow and deploy it as a Firebase Cloud Function, but I didn't want to change my demo logic that much.
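If you do go that route, the shape is roughly as follows. The export name matches what Dialogflow's inline editor deploys; the handler body is just an assumed minimal example, not anything from this demo.

```js
// Sketch of inline fulfillment deployed as a Firebase Cloud Function.
// dialogflowFirebaseFulfillment is the export name Dialogflow's inline
// editor uses; the handler body here is a made-up minimal example.
const functions = require('firebase-functions');

exports.dialogflowFirebaseFulfillment =
  functions.https.onRequest((req, res) => {
    const intent = req.body.queryResult.intent.displayName;
    res.json({ fulfillmentText: 'Handled intent: ' + intent });
  });
```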

Voice recognition in browsers

I was surprised to find that voice recognition capability in browsers remains the same in 2019 as it was in 2017, with only Chrome having it. I was expecting to open the demo up to more browsers, but alas, this isn't happening yet.
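For reference, detecting that capability boils down to checking for the Web Speech API, roughly like this; only Chrome exposes the recognition interface (under a webkit prefix), which is why the demo stays Chrome-only.

```js
// Feature-detect browser speech recognition. Only Chrome exposes this,
// under the webkit prefix, which is why the demo stays Chrome-only.
const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;

if (SpeechRecognition) {
  const recognition = new SpeechRecognition();
  recognition.lang = 'en-US';
  recognition.onresult = (event) => {
    console.log('Heard:', event.results[0][0].transcript);
  };
  recognition.start();
} else {
  console.log('No speech recognition in this browser.');
}
```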

One possible approach for supporting more browsers would be to use the new Dialogflow audio APIs. Previously, api.ai accepted only text. Dialogflow now has APIs that accept voice audio from browsers, do speech-to-text on the server side, and then respond to those requests. Most modern browsers have the audio capture APIs needed to feed this. Again, though, I don't have that much time and energy for this demo, so I didn't change the voice recognition part at all.
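If someone wanted to try that route, a rough sketch could look like the following: record a short utterance with MediaRecorder and send it to detectIntent as base64-encoded inputAudio. This is untested scaffolding with the same assumed placeholders as the earlier sketch, and the recorded encoding must be one Dialogflow v2 accepts (for example Ogg Opus, which not every browser's MediaRecorder produces; Chrome records WebM, which would need transcoding first).

```js
// Untested sketch: record a short utterance and send it to detectIntent
// as base64 inputAudio. Assumes the browser's MediaRecorder can produce
// Ogg Opus (Dialogflow's AUDIO_ENCODING_OGG_OPUS); Chrome records WebM,
// which would need transcoding. Placeholders as in the earlier sketch.
async function recordAndDetect(accessToken) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream, { mimeType: 'audio/ogg; codecs=opus' });
  const chunks = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);

  recorder.start();
  await new Promise((resolve) => setTimeout(resolve, 4000)); // ~4s of audio
  const stopped = new Promise((resolve) => (recorder.onstop = resolve));
  recorder.stop();
  await stopped;

  // Base64-encode the recording via a data: URL.
  const blob = new Blob(chunks, { type: 'audio/ogg; codecs=opus' });
  const base64Audio = await new Promise((resolve) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(blob);
  });

  const response = await fetch(
    'https://dialogflow.googleapis.com/v2/projects/PROJECT_ID' +
    '/agent/sessions/SESSION_ID:detectIntent',
    {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer ' + accessToken,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        queryInput: {
          audioConfig: {
            audioEncoding: 'AUDIO_ENCODING_OGG_OPUS',
            sampleRateHertz: 48000,
            languageCode: 'en'
          }
        },
        inputAudio: base64Audio
      })
    }
  );
  return (await response.json()).queryResult;
}
```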