THE CHALLENGES OF EDGE AI
As AI applications grow in popularity, from time-saving robots to advanced content creation tools, the need for advanced computing resources to deliver on the promise of AI grows as well. The cloud can handle an incredible amount of processing, but the real interest lies in increasing the intelligence of the devices around us: our cars, smart speakers, smartphones, and laptops. This is what is known as Edge AI.
There are two main challenges when it comes to AI at the edge. The first is giving the solution the performance it needs to deliver a good user experience. If you consider large language models, the performance question would be around how quickly the system can provide an answer to the user, both in terms of time to first token (the first words appearing) and then tokens per second (how quickly the answer appears after that).
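The two latency metrics above can be made concrete with a small sketch. The helper below is hypothetical (not from any real LLM library): it derives time to first token and steady-state tokens per second from a list of token arrival timestamps.

```python
# Hypothetical helper: derive the two LLM latency metrics described above
# from a list of token arrival timestamps (seconds since the request).
def llm_latency_metrics(token_timestamps):
    """Return (time_to_first_token, tokens_per_second) for one response."""
    if not token_timestamps:
        raise ValueError("no tokens generated")
    ttft = token_timestamps[0]  # time to first token, in seconds
    if len(token_timestamps) < 2:
        return ttft, 0.0
    # Steady-state rate: tokens after the first, over the remaining time.
    duration = token_timestamps[-1] - token_timestamps[0]
    tps = (len(token_timestamps) - 1) / duration
    return ttft, tps

# Example: first token after 0.5 s, then one token every 0.05 s.
stamps = [0.5 + 0.05 * i for i in range(101)]
ttft, tps = llm_latency_metrics(stamps)
print(ttft, round(tps))  # 0.5 s to first token, ~20 tokens/second
```

Separating the two numbers matters because they stress different parts of the system: time to first token is dominated by processing the prompt, while tokens per second reflects sustained generation throughput.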
Performance in an Edge AI system can be achieved in several ways. AI workloads are easily parallelised and benefit from running on a highly parallel accelerator such as a GPU or an NPU. The more tera operations per second (TOPS) an accelerator can achieve, the higher its potential performance. At the edge, the challenge has been delivering this performance within an area and power budget consistent with the needs of the end device. Many devices cannot accommodate a massive SoC, or the cost of that much silicon would make them unviable in the market; others are battery constrained and cannot dissipate the excess heat produced by heavy power consumption. So, while theoretical TOPS performance is important, Edge AI hardware development also emphasises compute density: packing as many TOPS as possible into the same silicon area.
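As a quick illustration of why raw TOPS is not the whole story, compute density can be expressed as TOPS per unit area (and, relatedly, TOPS per watt). The figures below are made up purely for illustration, not specifications of any real accelerator.

```python
# Illustrative arithmetic (made-up figures): compute density compares
# accelerators by TOPS per mm^2 and TOPS per watt, not raw TOPS alone.
def compute_density(tops, area_mm2, power_w):
    """Return (TOPS per mm^2, TOPS per watt) for an accelerator."""
    return tops / area_mm2, tops / power_w

# Two hypothetical NPUs: B offers fewer raw TOPS, but higher density,
# so it fits a tighter area and power budget at the edge.
npu_a = compute_density(tops=40.0, area_mm2=20.0, power_w=10.0)  # (2.0, 4.0)
npu_b = compute_density(tops=30.0, area_mm2=10.0, power_w=5.0)   # (3.0, 6.0)
```

In this toy comparison, the device-level question is not "which chip has more TOPS?" but "which chip delivers more TOPS within my area and power budget?".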
The other area of focus when it comes to performance is efficiency. Real-world performance comes down to the ability of the processor to convert theoretical TOPS into actual work. Many processors are bottlenecked by poor data management, with pipelines left idle as data is fetched or stored. Others fail to map models effectively onto the hardware, leaving utilisation low. Low-level libraries and back-end graph compilers can help address this challenge.
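The utilisation point can be summarised in one line of arithmetic: delivered throughput is the theoretical peak scaled by how busy the compute units actually are. The numbers below are illustrative, not measurements.

```python
# Sketch of the efficiency argument above: real-world throughput is the
# theoretical peak scaled by utilisation, which idle pipelines and poor
# model-to-hardware mapping erode. All numbers are illustrative.
def effective_tops(peak_tops, utilisation):
    """Delivered TOPS given a utilisation fraction between 0 and 1."""
    assert 0.0 <= utilisation <= 1.0
    return peak_tops * utilisation

# A 40-TOPS accelerator stalled at 30% utilisation does less real work
# than a 20-TOPS accelerator a good graph compiler keeps 80% busy.
print(effective_tops(40.0, 0.3))  # 12.0
print(effective_tops(20.0, 0.8))  # 16.0
```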
Read about how our processors deliver exceptional performance efficiency on the Imagination Blog.
The second challenge is power efficiency. As mentioned above, many Edge AI devices are battery operated, and excessive power consumption can quickly drain the battery or cause thermal throttling from excess heat, which in turn degrades performance and the user experience.
For Edge AI processors, memory management and data movement are the main sources of poor power consumption. A lot of data must be fetched to keep processor threads fed and working, and the more parallel the processor, the more data must move; the results of each operation then need to be stored somewhere. The more reads/writes a processor needs to produce a result, and the further the data has to travel, the more power-hungry the application will be. The best Edge AI processors have highly optimised memory subsystems that limit data movement while maintaining processor utilisation. Additional techniques such as data compression can further reduce the impact of data moving around the system.
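The "further the data travels" point can be sketched with a back-of-envelope energy model. It is well established that an external DRAM access costs orders of magnitude more energy than a local on-chip access; the specific per-byte figures below are illustrative placeholders, not measured values for any processor.

```python
# Back-of-envelope energy model for the point above: the cost of moving a
# byte grows with its distance from the compute units. The per-access
# energies are illustrative placeholders, not figures for any real chip.
ENERGY_PJ_PER_BYTE = {
    "register": 0.1,
    "local_sram": 1.0,
    "on_chip_cache": 10.0,
    "external_dram": 100.0,
}

def transfer_energy_nj(bytes_moved, level):
    """Energy in nanojoules to move bytes_moved at a given memory level."""
    return bytes_moved * ENERGY_PJ_PER_BYTE[level] / 1000.0

# Fetching a 1 MiB weight tile from DRAM vs reusing it from local SRAM:
dram = transfer_energy_nj(1 << 20, "external_dram")
sram = transfer_energy_nj(1 << 20, "local_sram")
print(dram / sram)  # DRAM costs 100x more energy in this toy model
```

This is why optimised memory subsystems and data compression pay off: every fetch that is satisfied locally, or moves fewer bytes, avoids the most expensive trips in the hierarchy.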
We are a leading designer of processor IP for edge applications, and our latest GPU contains numerous memory management advantages for handling Edge AI efficiently. Find out more on the E-Series pages.