COMMON EDGE AI SOFTWARE
For Edge AI, the software ecosystem separates into distinct areas:
- Device delegates: Providing the runtime environment and work-dispatch mechanics for running multiple user models in parallel on the same hardware. This is very much like running graphics applications, where priority is given to foreground or visible tasks. PyTorch and LiteRT are examples of device delegate software.
- Graph runtimes: Providing a more data-centre-like approach, where an entire model is compiled and executed as a single task to meet latency or performance requirements. ONNX Runtime and TVM are examples of graph runtime software.
- Portability software: Allowing customers and users to easily migrate custom applications written for another GPU compute platform, such as CUDA, providing that first-user confidence of getting things running in hours rather than days. SYCL and oneAPI are examples of portability software.
- Low-level software: Compute libraries and DDKs which support the major open hardware-level abstractions for compute and graphics. The core interfaces of OpenCL, Vulkan, OpenGL and DirectX sit on top of a set of compilation and workload-scheduling software that maps from the higher-level applications down into instructions which execute on the underlying compute platforms.
Let’s explore some examples of software solutions in these areas in more detail.
DEVICE DELEGATES: LiteRT
LiteRT delegates are a powerful feature of the LiteRT framework, designed to optimise and accelerate the execution of machine learning models on various hardware platforms. A delegate acts as an intermediary, enabling LiteRT to offload certain operations or entire models from the default CPU execution to specialised hardware accelerators, such as GPUs, DSPs (Digital Signal Processors), and NPUs (Neural Processing Units).
Delegates in LiteRT are utilised to enhance the performance of inference operations by taking advantage of specific hardware accelerators available on the device. They can significantly reduce inference time and increase efficiency by leveraging the hardware’s computational capabilities. LiteRT provides a range of delegates to choose from, allowing developers to select the most suitable hardware acceleration based on the device’s capabilities and the application’s requirements.
LiteRT provides a GPU delegate which accelerates model execution on the device’s GPU, offering significant speedups for operations that are parallelisable. This delegate is especially beneficial for models with high computational demands, such as those used in image processing and computer vision tasks. Beyond the standard delegates provided by LiteRT, developers have the option to create custom delegates for specialised hardware accelerators. This advanced feature allows for further optimisation and customisation for unique hardware platforms not directly supported by the predefined delegates.
Implementing a delegate in a LiteRT application involves minimal changes to the codebase. Developers can instantiate a delegate and apply it to the LiteRT interpreter with simple API calls. This flexibility makes it easy to experiment with different hardware accelerations to find the optimal configuration for a given model and device.
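As a rough illustration, the sketch below assumes the TensorFlow Lite Python API (the LiteRT package exposes an equivalent Interpreter), an illustrative model.tflite file and a GPU delegate library whose name varies by platform; it is a minimal sketch rather than a definitive recipe.

# Minimal sketch: loading a LiteRT (TensorFlow Lite) model and attaching the
# GPU delegate. The model path and delegate library name are illustrative.
import numpy as np
import tensorflow as tf

# Load the GPU delegate shared library (the file name differs per platform).
gpu_delegate = tf.lite.experimental.load_delegate("libtensorflowlite_gpu_delegate.so")

# Create the interpreter with the delegate attached; supported ops run on the GPU.
interpreter = tf.lite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[gpu_delegate],
)
interpreter.allocate_tensors()

# Run inference as usual; delegated operations are dispatched to the accelerator.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
interpreter.set_tensor(input_details[0]["index"],
                       np.zeros(input_details[0]["shape"], dtype=np.float32))
interpreter.invoke()
result = interpreter.get_tensor(output_details[0]["index"])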
LiteRT delegates are a crucial component for optimising ML model inference on diverse hardware platforms. They enable developers to leverage device-specific accelerators, enhancing performance and efficiency across a wide range of applications and devices.
DEVICE DELEGATES: PyTorch (ExecuTorch)
ExecuTorch is an end-to-end solution for Edge AI inference across mobile and Edge AI devices, including wearables, embedded devices, and microcontrollers. It enables efficient deployment of PyTorch models to edge devices.
ExecuTorch is compatible with a wide variety of computing platforms, from high-end mobile phones to highly constrained embedded systems and microcontrollers. This means developers can use the same toolchains and SDK from PyTorch model authoring and conversion through to debugging and deployment across a wide variety of platforms. It provides end users with a seamless, high-performance experience thanks to a lightweight runtime that utilises full hardware capabilities such as CPUs, NPUs, and DSPs.
The basic flow for running a PyTorch model with ExecuTorch on an Edge AI device is to export the model, compile it into an executable format and then run this format on the target device. The compilation stage introduces optimisations like model compression and memory planning. ExecuTorch has a standardised interface for delegation to compilers. This allows third-party vendors like Imagination to implement interfaces and API entry points for compilation and execution of (either partial or full) graphs targeting our hardware. This provides greater flexibility in terms of hardware support and performance optimisation, as well as easier integration with the PyTorch open-source ecosystem for Edge AI.
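A hedged sketch of the export-and-compile stage, assuming the executorch Python package (whose module layout has shifted between releases) and a small illustrative model, might look like this:

# Illustrative sketch of the ExecuTorch export flow: capture a PyTorch model,
# lower it to the Edge dialect, and serialise it as a .pte file for the device runtime.
import torch
from executorch.exir import to_edge

class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))

model = TinyModel().eval()
example_inputs = (torch.randn(1, 16),)

# 1. Export the model as a graph program.
exported_program = torch.export.export(model, example_inputs)

# 2. Lower to the Edge dialect, where backend delegation and optimisations apply.
edge_program = to_edge(exported_program)

# 3. Compile to the ExecuTorch format and write the .pte file for deployment.
executorch_program = edge_program.to_executorch()
with open("tiny_model.pte", "wb") as f:
    f.write(executorch_program.buffer)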
DEVICE DELEGATES: DirectML
DirectML provides a powerful and flexible platform for integrating machine learning into applications on Windows. It is a low-level, hardware-accelerated DirectX 12 API designed to provide high-performance machine learning (ML) inference. It operates on Windows 10 and newer versions, offering developers consistent ML performance across a wide range of graphics hardware, including integrated GPUs, discrete GPUs, and other DirectX 12 compatible devices. DirectML works well with other DirectX 12 APIs, making it easier for developers to add ML features like post-processing effects into games. It provides a flexible API that supports a broad range of models and operations, including those from popular frameworks like LiteRT and PyTorch outlined above. It can run on any DirectX 12 compatible device, meaning that developers can use DirectML to accelerate their applications without fear of vendor lock-in.
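One practical way to reach DirectML from Python is through the ONNX Runtime DirectML execution provider; the sketch below assumes the onnxruntime-directml package, an illustrative model.onnx file and input shape, and is only one of several routes into the API.

# Minimal sketch: running an ONNX model on DirectML via ONNX Runtime on a
# DirectX 12 capable Windows device.
import numpy as np
import onnxruntime as ort

# Prefer DirectML, falling back to the CPU provider if it is unavailable.
session = ort.InferenceSession(
    "model.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # illustrative input shape
outputs = session.run(None, {input_name: x})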
DEVICE DELEGATES: WinML
Windows Machine Learning (WinML) democratises access to Edge AI technologies, enabling developers to harness the power of machine learning directly within Windows applications. It is developer-friendly and high-level, abstracting away the complexities of running machine learning models.
WinML is a powerful machine learning framework introduced by Microsoft that leverages the hardware-accelerated performance of modern Windows devices to run machine learning models more efficiently. It supports hardware-accelerated inference and optimises performance across a range of different devices by using the CPU, GPU, or dedicated AI processors. It relies on ONNX (Open Neural Network Exchange) format, an open model format that allows for interoperability across different frameworks and tools. So, developers building models in frameworks such as LiteRT or PyTorch can convert them to ONNX for use with WinML.
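For instance, a PyTorch model can be exported to ONNX with torch.onnx.export (LiteRT models can take a comparable route via converters such as tf2onnx); the sketch below uses an illustrative model and file name, and the resulting .onnx file can then be loaded by WinML in a Windows application.

# Illustrative sketch: exporting a small PyTorch model to the ONNX format.
import torch

class SmallClassifier(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10)
        )

    def forward(self, x):
        return self.net(x)

model = SmallClassifier().eval()
dummy_input = torch.randn(1, 32)

# Write classifier.onnx, which any ONNX-compatible runtime (including WinML) can load.
torch.onnx.export(
    model,
    dummy_input,
    "classifier.onnx",
    input_names=["input"],
    output_names=["logits"],
    opset_version=17,
)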
GRAPH RUNTIME: ONNX RUNTIME
Referenced above for WinML, ONNX is gaining traction with the developer community as a low-resistance path for deploying AI models, thanks to its integration with platforms like Hugging Face. ONNX Runtime is a high-performance inference engine for machine learning models, optimised for both CPU and GPU, enabling fast model inference on servers or for Edge AI. It uses the ONNX format; once models trained in frameworks like PyTorch are converted into the ONNX format they can use the ONNX Runtime and be deployed across a wide range of devices and operating systems. It leverages parallel computing and hardware acceleration to achieve the best possible performance.
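A minimal sketch of that deployment step, assuming the onnxruntime Python package and an illustrative classifier.onnx file, is shown below.

# Minimal sketch: running inference with ONNX Runtime, preferring a GPU
# execution provider when the installed build exposes one.
import numpy as np
import onnxruntime as ort

available = ort.get_available_providers()
providers = (["CUDAExecutionProvider"] if "CUDAExecutionProvider" in available else [])
providers.append("CPUExecutionProvider")

session = ort.InferenceSession("classifier.onnx", providers=providers)

x = np.random.rand(1, 32).astype(np.float32)  # illustrative input
outputs = session.run(None, {session.get_inputs()[0].name: x})
print(outputs[0].shape)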
GRAPH RUNTIME: APACHE TVM
Apache TVM is a comprehensive, open-source machine learning compiler framework that enables the efficient deployment of deep learning models on different hardware platforms. It bridges the gap between the rapidly evolving landscape of deep learning models and the diverse ecosystem of computing hardware. It focuses on optimising and compiling models from higher-level frameworks (like PyTorch) into machine-executable code that is optimised for the specific target hardware. It allows for the addition of new hardware backends as the computing landscape evolves to maintain cross-platform support.
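A hedged sketch of that compile-and-run flow, using TVM’s classic Relay path (newer releases also offer the Relax front end) with an illustrative ONNX model, input name and shape, might look like this:

# Illustrative sketch: import an ONNX model into TVM, compile it for a target,
# and run it with the graph executor.
import numpy as np
import onnx
import tvm
from tvm import relay
from tvm.contrib import graph_executor

onnx_model = onnx.load("classifier.onnx")
shape_dict = {"input": (1, 32)}          # assumed input name and shape
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

target = "llvm"                          # e.g. "llvm", "cuda", "opencl"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

dev = tvm.device(target, 0)
module = graph_executor.GraphModule(lib["default"](dev))
module.set_input("input", np.random.rand(1, 32).astype(np.float32))
module.run()
out = module.get_output(0).numpy()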
PORTABILITY: oneAPI
oneAPI is an initiative started by Intel and now run by the UXL Foundation (a project within the Linux Foundation) aimed at creating a unified and open programming model for developing applications that can run across various computing architectures, including CPUs, GPUs, FPGAs, and other accelerators. The project seeks to address the complexity of programming for diverse hardware by providing a single, coherent framework that abstracts hardware-specific details.
oneAPI consists of the DPC++ programming language, toolkits targeting specific development domains, and a set of performance libraries and APIs. It facilitates cross-platform software development and builds on open standards, notably SYCL (more below), to ensure broad compatibility and encourage adoption by the developer community.
PORTABILITY: SYCL
SYCL is a high-level programming model designed to help developers write code for heterogeneous computing using standard C++. It is developed by the Khronos Group and aims to make parallel programming more accessible and efficient by targeting different platforms such as CPUs, GPUs, DSPs, and FPGAs through a unified framework. It provides abstractions for expressing parallelism and managing memory across different compute devices. SYCL code is portable and can run on any Edge AI device that supports the SYCL runtime.
INTERFACES: OpenCL
OpenCL is widely used in applications that require high performance computing and offers a powerful way to harness the computational power of GPUs and other processors alongside traditional CPU resources. It is an open standard for cross-platform, parallel programming of diverse processors found in Edge AI devices, using kernels that execute across many parallel work units and offering explicit memory management control. It provides a framework for writing programs that execute on a heterogeneous platform. Managed by the Khronos Group, it enables developers to write efficient and portable code for a wide range of Edge AI devices.
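To keep the examples in a single language, the sketch below uses the pyopencl Python bindings; the embedded kernel source is standard OpenCL C and would be identical from the C or C++ host APIs. Device selection, buffer sizes and the kernel itself are illustrative.

# Minimal sketch: an OpenCL kernel launched through pyopencl. The kernel doubles
# each element of a buffer, one work-item per element.
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()       # pick an available OpenCL device
queue = cl.CommandQueue(ctx)

a = np.arange(1024, dtype=np.float32)
mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

program = cl.Program(ctx, """
__kernel void scale(__global const float *a, __global float *out) {
    int gid = get_global_id(0);
    out[gid] = a[gid] * 2.0f;
}
""").build()

# Launch with one work-item per element; the runtime maps work-items onto the device.
program.scale(queue, a.shape, None, a_buf, out_buf)

result = np.empty_like(a)
cl.enqueue_copy(queue, result, out_buf)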
INTERFACES: VULKAN COMPUTE
Vulkan Compute offers a powerful and flexible platform for leveraging GPUs for non-graphics computational tasks. It is part of the Vulkan API that operates across multiple platforms, enabling applications to run on a wide range of devices. It gives developers explicit control over GPU operations, memory management and synchronisation for finely tuned performance optimisations. Its cross-platform nature and explicit control over GPU resources make it a compelling choice for developers looking to optimise the performance of compute-intensive Edge AI applications.