February 26, 2019
One of my most popular recent posts has been about creating a simple voicebot to work with api.ai. Since I wrote that post, a bunch of things have happened. Most notably, Google acquired api.ai and rebranded it as Dialogflow. They also deprecated v1 of their API, which Voicebot was previously using, and it will be shut down in October 2019.
I have gotten emails from people who have found the Voicebot demo valuable, and I wanted to update it to reflect the current state of the art. I've now updated the demo running at voicebot.jaanus.com, and the code remains available on GitHub. Here are a few notes from the update process.
The most significant change was to authentication. In the previous api.ai world, you had a simple token that you could embed in your code. This was insecure for a pure client-side app like Voicebot, but I understood and accepted the risks, and it made for a very simple demo.
The v2 API replaces those tokens with Google's OAuth-based authentication, which makes the user experience worse: you must now sign in as the first (extra) step. I apologize for that. The app itself does not need any authentication, as I do not store or keep any personal data; it is needed purely to access the new Dialogflow API. I could avoid it with some kind of server-side solution, but I'm not interested in building or running a server for this demo.
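To give a feel for the new flow, here is a minimal sketch of calling the Dialogflow v2 detectIntent endpoint from the browser with an OAuth bearer token. The project ID, session ID, and the way the access token is obtained are placeholders, not taken from the actual Voicebot code:

```javascript
// Build the URL and JSON body for a Dialogflow v2 detectIntent call.
function buildDetectIntentRequest(projectId, sessionId, text) {
  return {
    url: `https://dialogflow.googleapis.com/v2/projects/${projectId}` +
         `/agent/sessions/${sessionId}:detectIntent`,
    body: {
      queryInput: {
        text: { text: text, languageCode: 'en' },
      },
    },
  };
}

// Send the request with an OAuth access token (e.g. from Google Sign-In).
async function detectIntent(accessToken, projectId, sessionId, text) {
  const { url, body } = buildDetectIntentRequest(projectId, sessionId, text);
  const response = await fetch(url, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${accessToken}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
  });
  const result = await response.json();
  // The agent's reply is in queryResult (fulfillmentText and friends).
  return result.queryResult;
}
```

In the v1 world the `Authorization` header simply carried the embedded client token; in v2 the bearer token comes from a real OAuth sign-in, which is why the demo now asks you to authenticate first.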
Dialogflow (api.ai) didn't have any capability to run code on its own, so any custom fulfillment had to happen through external webhooks. The request/response format for these hooks also changed in the v2 API, so I had to rewrite my custom fulfillment example slightly. The logic and approach are largely the same.
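The shape of the format change can be sketched with a minimal v2-style webhook handler. The intent name and parameter below are made up for illustration; the point is that v2 reads from `queryResult` (v1 used a top-level `result` object) and replies with `fulfillmentText` (v1 used the `speech`/`displayText` pair):

```javascript
// Minimal sketch of a Dialogflow v2 webhook handler.
// "get-greeting" and the "name" parameter are hypothetical examples.
function handleWebhook(requestBody) {
  // v2 puts the matched intent and parameters under queryResult;
  // v1 kept the same data under a top-level "result" object.
  const intent = requestBody.queryResult.intent.displayName;
  const params = requestBody.queryResult.parameters;

  let reply = "Sorry, I don't know how to help with that.";
  if (intent === 'get-greeting') {
    reply = `Hello, ${params.name || 'there'}!`;
  }

  // v2 expects fulfillmentText instead of v1's speech/displayText pair.
  return { fulfillmentText: reply };
}
```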
These days, you can also write custom fulfillment code straight in Dialogflow and deploy it as a Firebase Cloud Function, but I didn't want to rework my demo logic that much.
Voice recognition in browsers
I was surprised to find that browser voice recognition remains where it was in 2017: only Chrome supports it. I was expecting to open the demo to more browsers, but alas, this isn't happening yet.
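Concretely, the situation looks like this feature check. Chrome ships the speech recognition API under a `webkit` prefix, while other browsers return nothing; the helper takes a window-like object only so the logic can be exercised outside a browser:

```javascript
// Return the speech recognition constructor if the browser has one,
// or null otherwise. Chrome exposes it as webkitSpeechRecognition.
function getSpeechRecognition(win) {
  return win.SpeechRecognition || win.webkitSpeechRecognition || null;
}

// In the browser, usage would look roughly like:
// const Recognition = getSpeechRecognition(window);
// if (Recognition) {
//   const recognizer = new Recognition();
//   recognizer.lang = 'en-US';
//   recognizer.onresult = (event) => {
//     console.log(event.results[0][0].transcript);
//   };
//   recognizer.start();
// } else {
//   // Fall back to a plain text input.
// }
```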
One possible approach for supporting more browsers would be to use the new Dialogflow audio stream APIs. Previously, api.ai only accepted text. Dialogflow now has APIs that accept voice streams from browsers, do speech-to-text on the server side, and then respond to those requests. Most modern browsers have the audio capture APIs this would need. Again, though, I don't have much time to spend on this demo, so I didn't change the voice recognition part at all.
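A rough sketch of what that approach could look like: capture audio with the widely supported MediaRecorder API and send it to the v2 detectIntent endpoint as base64 `inputAudio`. The encoding and sample rate below are illustrative assumptions and would have to match what the browser actually records:

```javascript
// Build a v2 detectIntent body that carries recorded audio instead of
// text. Encoding and sample rate are assumptions for illustration.
function buildAudioQuery(base64Audio) {
  return {
    queryInput: {
      audioConfig: {
        audioEncoding: 'AUDIO_ENCODING_LINEAR_16', // assumed encoding
        sampleRateHertz: 16000,                    // assumed sample rate
        languageCode: 'en',
      },
    },
    inputAudio: base64Audio, // base64-encoded recorded audio
  };
}

// Browser-side capture, roughly:
// const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
// const recorder = new MediaRecorder(stream);
// const chunks = [];
// recorder.ondataavailable = (e) => chunks.push(e.data);
// recorder.onstop = async () => {
//   const blob = new Blob(chunks);
//   const base64 = await blobToBase64(blob); // hypothetical helper
//   // POST buildAudioQuery(base64) to the detectIntent endpoint
// };
```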