Embedded Vision Summit 2026
The premier event for engineers designing edge AI and vision-related solutions for embedded devices.
Event Details:
11–13 May 2026
Santa Clara Convention Center
IMAGINATION AT EMBEDDED VISION SUMMIT
The Embedded Vision Summit 2026, taking place May 11–13 in Santa Clara, California, is a premier industry event focused on the latest advancements in embedded vision, edge AI, and perception technologies. The Summit features more than 90 sessions spanning tutorials, case studies, panel discussions, and leading-edge research, along with keynotes from industry and academic thought leaders.
With the industry accelerating toward increasingly sophisticated edge inference and complex vision workloads, the Summit aligns closely with Imagination’s commitment to pushing the boundaries of embedded AI and visual processing.
James Imber, Director of Research at Imagination Technologies, will present “Self-Compression for Edge Inference” on Monday 11 May at 1:30 PM. James brings to the Summit 14 years of expertise in network compression, embedded AI inference, and algorithm design for resource-constrained systems. His session will explore techniques that allow neural networks to compress themselves, enabling more efficient deployment on edge devices.
He will be joined by David Doyle, VP of Sales, who is available for meetings at the show – use the button below to book a meeting.
Self-Compression for Edge Inference
Monday 11 May, 1:30 PM
Self-compression is a quantization-aware training technique that reduces neural network size and optimizes performance for edge inference. By learning optimal bit depths for weights and activations during training, self-compression achieves significant reductions in memory footprint and bandwidth consumption while maintaining accuracy. The method combines high sparsity with low-bit representations, enabling efficient deployment on CPUs, GPUs, DSPs and NPUs without specialized hardware.
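For readers curious how learned bit depths can be set up in practice, the following is a minimal, hypothetical PyTorch sketch of quantization-aware training with trainable per-channel bit depths, in the spirit of the abstract above. It is not Imagination’s actual implementation: the class name LearnedBitQuant, the per-channel parameters, the straight-through rounding and the size penalty are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class LearnedBitQuant(nn.Module):
    """Illustrative quantizer with learnable per-channel bit depths.

    Hypothetical sketch only: bit depth b and scale exponent e are
    trained jointly with the weights, and a penalty on total bits
    lets the network shrink itself wherever accuracy permits.
    """
    def __init__(self, num_channels: int, init_bits: float = 8.0):
        super().__init__()
        self.bits = nn.Parameter(torch.full((num_channels, 1), init_bits))
        self.exp = nn.Parameter(torch.zeros(num_channels, 1))

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        # w: (num_channels, weights_per_channel), e.g. flattened conv filters
        b = torch.clamp(self.bits, min=0.0)       # non-negative bit depth
        scale = torch.exp2(self.exp)              # per-channel step size
        qmax = torch.exp2(b - 1.0) - 1.0          # signed integer range
        x = torch.minimum(torch.maximum(w / scale, -qmax - 1.0), qmax)
        # Straight-through estimator: round in the forward pass only,
        # pass gradients through unchanged in the backward pass.
        x = x + (torch.round(x) - x).detach()
        return x * scale

    def size_penalty(self, weights_per_channel: int) -> torch.Tensor:
        # Differentiable proxy for model size: total bits stored.
        return torch.clamp(self.bits, min=0.0).sum() * weights_per_channel
```

Because the size penalty is differentiable, each channel’s bit depth can be driven toward zero wherever the task loss tolerates it, and a channel whose bit depth reaches zero carries no information and can be pruned outright, which is one way the sparsity described above can arise.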
Unlike traditional compression approaches, self-compression both removes redundant weights and minimizes the bits required for the remaining parameters. Experiments demonstrate that floating-point accuracy is maintained across applications, including perception CNNs (with as few as 3% of the original bits and 18% of the weights retained) and LLMs (outperforming ternary compression in transformer-based language models). In this presentation, we explain how self-compression works, its practical implementation, and its real-world benefits for embedded systems, offering a simple yet powerful solution for reducing inference costs (execution time, power consumption, bandwidth and memory usage).
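As a toy illustration of the size/accuracy trade-off described above, the sketch from earlier can be trained on a synthetic regression task, with a single scalar gamma balancing the task loss against the bit-count penalty. All names and values here are illustrative assumptions, not figures from the session.

```python
# Hypothetical usage: a toy regression in which the size penalty
# competes with the task loss, shrinking bit depths where accuracy allows.
torch.manual_seed(0)
quant = LearnedBitQuant(num_channels=16)
w = torch.randn(16, 8, requires_grad=True)        # 16 channels, 8 weights each
x = torch.randn(256, 8)
y = x @ torch.randn(8, 16)                        # synthetic targets

opt = torch.optim.Adam([w, *quant.parameters()], lr=1e-2)
gamma = 1e-3                                      # size/accuracy trade-off
for _ in range(500):
    opt.zero_grad()
    task_loss = ((x @ quant(w).t() - y) ** 2).mean()
    loss = task_loss + gamma * quant.size_penalty(weights_per_channel=8)
    loss.backward()
    opt.step()

print(quant.bits.clamp(min=0).squeeze())          # learned per-channel bit depths
```

Raising gamma in this sketch trades accuracy for smaller learned bit depths, mirroring the inference-cost reductions (execution time, power, bandwidth and memory) the session abstract describes.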

James Imber, Director of Research, Imagination Technologies
Dr. James Imber is Director of Research at Imagination Technologies, with 14 years of experience in the semiconductor IP industry. His team’s work spans network compression, embedded AI inference and algorithm design for resource-constrained systems. James has expertise in neural graphics, quantization-aware training, edge perception, numerical optimization and classical computer vision. Prior to his current role, he worked extensively with NPUs, ISPs and GPUs, driving innovation in embedded AI and advanced imaging technologies. James holds a PhD from the University of Surrey’s Centre for Vision, Speech and Signal Processing.