Speech recognition has become more relevant in recent years: it enables computers to translate spoken language into text. It can be found in different types of applications, such as translators or closed captioning. An example of this technology is Mozilla’s DeepSpeech, an open-source speech-to-text engine, which uses a model trained by machine learning techniques based on Baidu’s Deep Speech research paper. We are going to provide an overview of how we are running version 0.5.1 of this model, by accelerating a static LSTM network on the Imagination neural network accelerator (NNA), with the goal of creating a prototype of a voice assistant for an automotive use case.
Jesus Garza is an AI Applications Engineer in the demo development team at Imagination Technologies. He focuses on working with various types of neural networks and producing demos to illustrate the capabilities of Imagination’s AI tools and accelerators.