Understanding PowerVR Series5XT: Multithreading, multitasking ALUs, the MicroKernel and core scalability


A unified architecture means that the ALUs (Arithmetic Logic Units – the bits of the hardware that do all the number crunching as part of shader or compute program processing) are generic and not dedicated to specific processing tasks.

This ‘universal (or unified) shading’ approach is widely recognised as the most efficient. Unified shaders execute tasks efficiently irrespective of the balance between vertex and pixel (or compute) processing, because all the available ALU resources are always in use.
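To make the efficiency argument concrete, here is a toy throughput model in Python. It is purely illustrative: the ALU counts, the fixed 2/6 vertex/pixel split and the operation counts are invented numbers, not figures for any real GPU.

```python
# Toy throughput model (illustrative numbers only): a dedicated design fixes
# the vertex/pixel ALU split at design time, while a unified design applies
# every ALU to whatever work actually exists in the frame.

TOTAL_ALUS = 8

def time_dedicated(vertex_ops, pixel_ops, vertex_alus=2):
    # Hypothetical fixed split: the busier block limits the frame time,
    # while the other block sits partly idle.
    pixel_alus = TOTAL_ALUS - vertex_alus
    return max(vertex_ops / vertex_alus, pixel_ops / pixel_alus)

def time_unified(vertex_ops, pixel_ops):
    # All ALUs work through the combined load, whatever its balance.
    return (vertex_ops + pixel_ops) / TOTAL_ALUS

# A pixel-heavy frame: the dedicated design's vertex ALUs are mostly idle.
print(time_dedicated(100, 1400))   # ~233.3 cycles
print(time_unified(100, 1400))     # 187.5 cycles
```

When the workload balance matches the fixed split exactly the two designs tie; for any other balance, the unified design finishes sooner.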

Typically, these ALU designs in GPUs are multithreaded to help hide memory latency (the delay between the hardware requesting data for processing and that data actually becoming available), where each thread is a different vertex, pixel or compute element. By cycling through many elements, the hope is that by the time execution returns to the first element, the data it needs has arrived.
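The latency-hiding idea can be sketched with a small simulation. This is an illustrative model only; the latency figure and the round-robin scheduling policy are assumptions, not a description of the real hardware scheduler.

```python
# Toy model of latency hiding: each "thread" (a pixel, vertex or compute
# element) alternates one ALU operation with a memory fetch that takes
# FETCH_LATENCY cycles. With enough threads in flight, the ALU finds
# runnable work every cycle and never stalls.

FETCH_LATENCY = 4  # hypothetical memory latency, in cycles

def simulate(num_threads, total_ops):
    ready_at = [0] * num_threads  # cycle at which each thread's data arrives
    done = 0
    cycle = 0
    idle_cycles = 0
    while done < total_ops:
        # Pick the first thread whose data is available.
        runnable = [t for t in range(num_threads) if ready_at[t] <= cycle]
        if runnable:
            t = runnable[0]
            done += 1                                  # one ALU op retires
            ready_at[t] = cycle + 1 + FETCH_LATENCY    # next op waits on memory
        else:
            idle_cycles += 1                           # ALU stalls: nothing ready
        cycle += 1
    return idle_cycles

print(simulate(num_threads=1, total_ops=20))   # many idle cycles
print(simulate(num_threads=8, total_ops=20))   # no idle cycles
```

With a single thread the ALU stalls on every fetch; once the thread count exceeds the fetch latency, the stalls disappear entirely.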

Multithreading and multitasking ALUs for efficiency

Imagination takes the efficiency concept to the next level by not only allowing multiple ‘threads’, but also allowing these threads to belong to different ‘tasks’. This means latency is absorbed not only by having lots of pixels, vertices or compute operations in flight (the threads of a single task), but also by scheduling all of these tasks based on data availability.

By enabling multitasking, latency hiding and processing efficiency become far more effective than with many elements devoted to a single task. Independent tasks have a higher chance of having data available for processing: if data is not yet available for one element, it is likely unavailable for all elements of the same task, since they tend to access related data. Running independent tasks avoids this correlated stalling.
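A toy model can show why independent tasks hide correlated stalls better than one large task. All numbers here (work and stall lengths, phase offsets) are invented for illustration and do not describe the real hardware.

```python
# Toy model of correlated stalls: every thread of a task stalls at once,
# so each task alternates WORK runnable cycles with STALL stalled cycles.
# Independent tasks are phase-shifted, so their stalls need not overlap.

STALL = 4   # cycles a task spends stalled waiting for memory (hypothetical)
WORK  = 4   # cycles of runnable work between stalls (hypothetical)

def busy_cycles(num_tasks, total_cycles):
    """The ALU is busy in any cycle where at least one task is runnable."""
    busy = 0
    period = WORK + STALL
    for cycle in range(total_cycles):
        # Task t is runnable during the first WORK cycles of its shifted period.
        if any((cycle + t * STALL) % period < WORK for t in range(num_tasks)):
            busy += 1
    return busy

print(busy_cycles(1, 80))   # one task: the ALU idles through every stall
print(busy_cycles(2, 80))   # two tasks: the second task covers the gaps
```

A single task keeps the ALU busy only half the time in this model; a second, independent task fills every gap, lifting utilisation to 100%.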

Some vendors falsely claim that non-unified processing is more power-efficient. Their flawed argument in support of this claim is based around the ability to turn vertex processing blocks off when geometry processing is completed. This reasoning ignores the simple fact that unified designs can also turn off fixed function processing blocks related to geometry processing. And the ALUs in such a unified design ultimately speed up pixel processing, which thus completes more quickly and allows the complete GPU core to be turned off sooner than when using separate processing units.

This efficiency concept is illustrated below:

Dedicated shader processing modules (competing GPUs) vs. unified shader processing modules (PowerVR GPUs)

Multi-pipeline first, multi-core last

Scalability in GPU IP design is extremely important, as there is no one-size-fits-all scheme for the range of devices we need to support, from low-cost mobile phones (with perhaps a VGA screen) to high-end set-top boxes, where gaming on a 4K×2K screen is a key feature requirement.

Driven by efficiency, Imagination has focussed first on pipeline scalability in our PowerVR Series5/5XT cores. Scaling pipelines grows only the processing blocks which deliver extra ALU (GFLOPS) and texturing (fill rate) capability, ensuring that support logic is not blindly duplicated.

PowerVR USSE multithreading scheduling

The reasons behind this approach become clear when you consider multi-core scaling. With multi-core, a whole core design is copied across multiple instances. This scales absolutely everything, including all the support logic, and is therefore inefficient: silicon area basically doubles to achieve double the performance. With pipeline scaling, it is possible to double performance without doubling the silicon area invested.
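The area argument can be made concrete with a simple cost model. The area figures below are hypothetical placeholders, not real silicon numbers.

```python
# Toy area model (all numbers are illustrative, not real silicon figures):
# a core consists of fixed support logic plus per-pipeline processing logic.

SUPPORT_AREA  = 4.0   # mm^2 of shared support logic (hypothetical)
PIPELINE_AREA = 1.0   # mm^2 per processing pipeline (hypothetical)

def area_multi_pipeline(pipelines):
    """One core scaled by adding pipelines: support logic is paid for once."""
    return SUPPORT_AREA + pipelines * PIPELINE_AREA

def area_multi_core(cores, pipelines_per_core):
    """Whole cores copied: support logic is paid for per core."""
    return cores * (SUPPORT_AREA + pipelines_per_core * PIPELINE_AREA)

# Doubling performance from a 4-pipeline baseline:
baseline = area_multi_pipeline(4)            # 8.0 mm^2
print(area_multi_pipeline(8) / baseline)     # 1.5x area for 2x performance
print(area_multi_core(2, 4) / baseline)      # 2.0x area for 2x performance
```

The larger the fraction of the core taken up by shared support logic, the bigger the advantage of pipeline scaling over core duplication.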

This concept is illustrated below:

PowerVR Series5XT scalability

Imagination’s approach to scalability: multithreading to multi-pipeline to multi-core

It’s worth noting that this approach results in GPU cores with different levels of performance as we scale the number of processing pipelines. Our Series5/5XT cores are not all equal: cores with fewer pipelines deliver less performance and cores with more pipelines deliver more. This is important to understand when wading through marketing hype claiming that ‘higher numbers of GPU cores must be better’.

PowerVR SGX Series5XT Multicore

Keep in mind that the performance offered by a single PowerVR SGX544 core is far higher than that offered by any of our competitors’ ‘cores’, across GFLOPS and fillrate as well as benchmark performance.

MicroKernel firmware flexibility

In an earlier article about Series5XT’s market-leading feature set, I explained the importance of keeping cores in line with market requirements while balancing this effort against other factors such as power consumption.

A key hardware feature underpinning our capabilities in this area is the MicroKernel-based operation of our PowerVR GPUs. The MicroKernel is firmware which controls GPU operation and data flow. This has two key benefits. First, the GPU becomes truly autonomous: events are handled locally rather than keeping the main system CPU busy with costly interrupts. Second, because this is a software-based approach rather than a rigid hardware state machine, it has given us the flexibility to adapt to new market requirements over time.

A nice example of the MicroKernel’s capability, and of how it has allowed us to deliver better user experiences over time, is support for priority-based rendering. Today’s Android devices put every pixel on the display through the GPU, which means multiple applications render concurrently, and all of them are then composited by the Android SurfaceFlinger to generate the user interface on the phone’s display.

PowerVR MicroKernel

All of this rendering goes through the GPU via the OpenGL ES API. Simplistic competing implementations handle these concurrent renders through software scheduling, executing each application render and composition pass sequentially. This means GUI smoothness can easily degrade when badly-behaved or excessively complex applications run in the background. Through the MicroKernel firmware, we can assign priorities to different render contexts, allowing the firmware to influence task scheduling on the GPU and give higher priority to the compositor. This leads to a consistently smooth GUI experience even when expensive tasks run in the background. With a fixed state-machine design we could never have adapted in this way; delivering a solid, fluid user experience would have been far more complex and taxing, possibly even forcing the GPU to be managed by the CPU to work around state-machine limitations.
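The priority idea can be sketched as a small scheduler. This is a minimal illustration, not the real MicroKernel firmware; the class, names and priority values are all assumptions made for the example.

```python
import heapq

# Minimal sketch of priority-based render scheduling (illustrative only:
# names and priority values are invented, not the real firmware API).
# Lower number = higher priority, so the compositor always jumps ahead
# of application renders waiting in the queue.

COMPOSITOR_PRIORITY = 0
APP_PRIORITY        = 1

class RenderQueue:
    def __init__(self):
        self._heap = []
        self._seq = 0   # tie-breaker keeps FIFO order within one priority

    def submit(self, priority, name):
        heapq.heappush(self._heap, (priority, self._seq, name))
        self._seq += 1

    def next_task(self):
        return heapq.heappop(self._heap)[2]

q = RenderQueue()
q.submit(APP_PRIORITY, "game frame")
q.submit(APP_PRIORITY, "background app frame")
q.submit(COMPOSITOR_PRIORITY, "SurfaceFlinger composition")

# The compositor runs first even though it was submitted last,
# then the application renders proceed in submission order.
print(q.next_task())   # SurfaceFlinger composition
print(q.next_task())   # game frame
print(q.next_task())   # background app frame
```

However long the background renders queue up, the composition task is always dequeued first, which is what keeps the user interface responsive.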

Use of the MicroKernel firmware not only provides great flexibility, it also lowers power consumption. This is extremely important: one of the biggest consumers of power in today’s SoCs is the CPU, and anything that can be moved off the CPU helps keep its power consumption in check.

In the next blog post, I will discuss how PowerVR SGX supports texture compression and why PVRTC and PVRTC2 are vital not just for reducing application size but for keeping power consumption and memory traffic at a minimum.

If you have any questions or feedback about Imagination’s graphics IP, please use the comments box below. To keep up to date with the latest developments on PowerVR, follow us on Twitter (@GPUCompute, @PowerVRInsider and @ImaginationTech) and subscribe to our blog feed.

‘Understanding PowerVR’ is an on-going, multi-part series of blog posts from Kristof Beets, Imagination’s Senior Business Development Manager for PowerVR. These articles not only focus on the features that make PowerVR GPUs great, but also provide a detailed look at graphics hardware architectures and software ecosystems in mobile markets.


Kristof Beets


Kristof Beets is the senior director of product management for PowerVR Graphics at Imagination Technologies, where he drives the product roadmaps to ensure alignment with market requirements. Prior to this, he was part of the business development group, and before that he led the in-house demo development team and the competitive analysis team. His engineering background includes work on SDKs and tools for both PC and mobile products as a member of the PowerVR Developer Relations Team. His work has been published in many guides for game and graphics programmers, such as ShaderX2, X5 and X6, ARM IQ Magazine, and online by the Khronos Group, Beyond3D.com and 3Dfx Interactive. Kristof has a background in electrical engineering and received a Master’s degree in artificial intelligence. He has spoken at GDC, SIGGRAPH, Embedded Technology, MWC and many other conferences.
