PowerVR 'Rogue': Designing an optimal architecture for graphics and GPU compute

When designing our PowerVR ‘Rogue’ architecture, we reworked and optimised every component for efficiency (more on this in a future article). Part of that effort came from a deeper consideration of GPU compute. In this article, I will therefore focus on just two key features of the PowerVR Series6 GPU design that are linked to this effort.

PowerVR ‘Rogue’ GPUs feature scalar ALUs for the highest utilization of compute resources and easy programming

The PowerVR ‘Rogue’ architecture is built around scalar processing engines rather than the vector engines used in older GPU designs. There are numerous benefits in going to a scalar processing architecture, most notably that it is easier to write optimal software. This ease of development benefits our compiler teams, who no longer need complex and expensive vectorisation efforts at the compiler level. It equally benefits developers, since it no longer matters whether they vectorise their algorithms or not. This benefit is illustrated in the graph below:

GPU compute on PowerVR 'Rogue': ALU utilization scalar vs vector architecture

As can be seen in the graph, the scalar architecture does not care whether the algorithm is written with scalar ops (R), vec2 ops (RG), vec3 ops (RGB) or full vec4 ops (RGBA), whereas the vector-based architecture is highly sensitive to vectorisation. With vector-based architectures, the problem of efficiency is shifted to the software developer, rather than being tackled directly through a modern architecture with a scalar design.

This architectural efficiency is essential for optimising image processing algorithms. Image processing is one of the most popular and sensible uses of GPU compute in the mobile segment, and many of these algorithms discard colour information as a first step and limit processing to intensity information only. Such an approach is no problem at all on a scalar architecture.

On a poorly-implemented vector architecture, however, the developer is faced with a choice: accept 25% of peak performance, because a single intensity value occupies only one of the four vector lanes, or take the expensive option of vectorising the entire algorithm to process multiple intensity values in one go (e.g. 4 pixels).
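To make the 25% figure concrete, here is a minimal sketch of lane utilisation on a vec4 engine versus a scalar engine. It is an idealised model, not a description of any real hardware: it assumes the vector engine issues one operation per cycle across four lanes, that unused lanes are simply wasted, and that the scalar engine splits every operation into one single-lane operation per component. The function names and the instruction mix are hypothetical.

```python
VEC4_WIDTH = 4  # assumed lane count of the hypothetical vector ALU


def vector_utilization(op_widths, alu_width=VEC4_WIDTH):
    """Fraction of vector lanes doing useful work.

    Each entry in op_widths is the number of components (1 to alu_width)
    that one operation actually needs; the rest of the lanes sit idle.
    """
    if any(not 1 <= w <= alu_width for w in op_widths):
        raise ValueError("each op width must be between 1 and alu_width")
    used_lanes = sum(op_widths)
    available_lanes = alu_width * len(op_widths)
    return used_lanes / available_lanes


def scalar_utilization(op_widths):
    """Fraction of scalar lane-cycles doing useful work.

    A width-w operation becomes w single-lane operations, so every
    issued cycle is useful regardless of how the algorithm was written.
    """
    if any(w < 1 for w in op_widths):
        raise ValueError("each op width must be at least 1")
    used = sum(op_widths)
    issued = sum(op_widths)  # one cycle per component, none wasted
    return used / issued


# Intensity-only image processing: every operation is a single component.
intensity_only = [1] * 8
# A hypothetical algorithm that mixes R, RG, RGB and RGBA operations.
mixed_widths = [1, 2, 3, 4]

print("vec4 engine, intensity only:", vector_utilization(intensity_only))
print("vec4 engine, mixed widths:  ", vector_utilization(mixed_widths))
print("scalar engine, mixed widths:", scalar_utilization(mixed_widths))
```

In this model the only way to raise the vec4 figure is to repack the work so that every operation fills all four lanes, which is exactly the vectorisation effort that falls on the developer.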

Vectorising may be manageable for simple algorithms, but it quickly becomes a lot more complex because algorithms commonly mix vector widths, which significantly complicates the effort. Developers typically focus on optimising for the dominant architecture and, given current market share ratios, it seems extremely likely that scalar-based architectures like PowerVR ‘Rogue’ will be dominant in the mobile market (not a surprise given the gain in efficiency). The PC market further strengthens this developer focus, since compute architectures there also use scalar pipelines. Algorithms ported from that market to mobile will already have been optimised for scalar rather than vector-based GPU designs.

PowerVR ‘Rogue’ GPUs have improved support for local memory

Compute APIs have the opportunity to expose different memory types, which, depending on the implementation, may provide different performance levels. These are typically referred to as local memory (fast memory) and global memory.

When writing algorithms using just global memory, you just address data as you would normally do, and access goes to system memory through a standard cache infrastructure. With local memory, however, algorithms can be rewritten to pre-load data into the local memory (a kind of on-chip cache). Then the algorithm accesses this fast local memory store during its compute processing, and at the end, the results again are burst-written to system memory.

It should already be clear that the latter approach is far more bandwidth- and power-efficient, as data is fetched into local memory once and all subsequent accesses stay on-chip. This is unlike the first approach, where it is all left up to chance (any cache implementation depends on luck: if you are lucky, the data is still in the cache; if you are unlucky, the data has already been evicted by other data accesses and you need to re-fetch it).
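The difference between the two approaches can be sketched with a small simulation that counts accesses to global memory for a 1D box filter. This is an illustrative model only: it ignores caches entirely (so the global-memory count is a worst case), the "local memory" is just a Python list, and the function names, the filter and the work-group size are assumptions chosen for the example rather than anything specific to PowerVR or to a particular compute API.

```python
def box_filter_global(data, radius):
    """Sum each element's neighbourhood, reading global memory for every tap."""
    n = len(data)
    global_reads = 0
    out = []
    for i in range(n):
        total = 0
        for j in range(i - radius, i + radius + 1):
            total += data[min(max(j, 0), n - 1)]  # clamp at the edges
            global_reads += 1
        out.append(total)
    return out, global_reads


def box_filter_local(data, radius, group_size):
    """Same filter, but each work-group pre-loads its tile into local memory."""
    n = len(data)
    global_reads = 0
    out = []
    for start in range(0, n, group_size):
        end = min(start + group_size, n)
        # Pre-load the tile plus a halo of `radius` elements on each side.
        # This is the only place global memory is touched.
        tile = []
        for j in range(start - radius, end + radius):
            tile.append(data[min(max(j, 0), n - 1)])
            global_reads += 1
        # Every tap is now served from the local tile.
        for i in range(start, end):
            first_tap = i - start  # tile index of element i - radius
            out.append(sum(tile[first_tap:first_tap + 2 * radius + 1]))
    return out, global_reads


samples = list(range(64))
reference, reads_global = box_filter_global(samples, radius=2)
tiled, reads_local = box_filter_local(samples, radius=2, group_size=16)

print("results match:", reference == tiled)
print("global reads without local memory:", reads_global)
print("global reads with local memory:   ", reads_local)
```

With 64 samples, a radius of 2 and work-groups of 16, the first version issues 320 global reads and the tiled version 80. A real cache would absorb some of the 320, but how many depends on what else is competing for the cache, which is the element of chance described above.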

If you remember our graphics approach of Tile Based Deferred Rendering (TBDR) from other posts, you will remember that by using our tile sorting, we ensure that caches become 100% effective. It comes as no surprise then that Imagination has implemented the equivalent concept for compute, using the efficiency of fast on-chip memory.

GPU compute on PowerVR 'Rogue': memory hierarchy in OpenCL

Within the PowerVR ‘Rogue’ architecture, there are numerous optimisations linked to compute usage scenarios. We also continue to make our architecture more efficient and effective by studying actual practical mobile compute use cases as they come to market from third parties.

Kristof Beets

Kristof Beets is the senior director of product management for PowerVR Graphics at Imagination Technologies, where he drives the product roadmaps to ensure alignment with market requirements. Prior to this, he was part of the business development group, and before that he led the in-house demo development team and the competitive analysis team. His engineering background includes work on SDKs and tools for both PC and mobile products as a member of the PowerVR Developer Relations Team. His work has been published in many guides for game and graphics programmers, such as Shader X2, X5 and X6, ARM IQ Magazine, and online by the Khronos Group, Beyond3D.com and 3Dfx Interactive. Kristof has a background in electrical engineering and received a Master's degree in artificial intelligence. He has spoken at GDC, SIGGRAPH, Embedded Technology, MWC and many other conferences.
