Mobile GPU compute must be practical GPU compute


By definition, mobile application scenarios must be power efficient, for a simple reason: they run from a battery. The goal is to let a consumer enjoy the full functionality of a device for as long as possible on a single charge. Any usage scenario must therefore be practical and useful, not something that burns through battery life and leaves an unhappy consumer carrying around an unusable device.

In terms of mobile GPU compute, any compute scenario must be a practical, useful GPU compute scenario. The key characteristics explained in my previous article immediately come to mind: only consider tasks suitable for the GPU. This ideally means parallel compute tasks with minimal divergence rather than serial, divergent tasks; in other words, use the appropriate compute resource for each task.
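To make the distinction concrete, here is a minimal, purely illustrative Python sketch (not GPU code, and the function names are invented for this example). The first task applies identical arithmetic to every element, the shape of work a wide GPU handles well; the second branches differently per element, which serialises GPU execution lanes:

```python
# Purely illustrative: the kind of task that suits GPU compute vs. one that
# does not. Real GPU code would use OpenCL or Renderscript, not Python lists.

def brighten(pixels, gain):
    # Same multiply-and-clamp on every pixel: parallel, non-divergent.
    return [min(255, int(p * gain)) for p in pixels]

def per_element_branching(pixels):
    # Each element takes a different branch: divergent work that would
    # serialise the lanes of a wide GPU.
    out = []
    for p in pixels:
        if p % 3 == 0:
            out.append(p // 2)
        elif p % 3 == 1:
            out.append(p * 2)
        else:
            out.append(p)
    return out

print(brighten([100, 200], 1.5))         # [150, 255]
print(per_element_branching([3, 4, 5]))  # [1, 8, 5]
```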

But even the task itself has to be practical and the overall usage scenario of the device has to be practical.

Examples of practical and impractical mobile GPU compute applications

When running a game with console-quality graphics, using GPU compute for some physics calculations does not make sense. While physics calculations are parallel and have limited divergence, in this usage scenario the GPU is already very busy delivering stunning graphical quality to a high-resolution screen. Further loading (or, more accurately, overloading) the GPU with a physics task will typically just degrade the overall consumer experience, for example through a lower framerate and/or lower image quality.

On the other hand, when snapping multi-megapixel pictures with your mobile phone camera, you may want to run some image enhancement routines. Loading this onto the GPU makes sense, as this is a parallel, non-divergent type of task. Additionally, during the processing the user is simply waiting to see their picture, so the GPU will not be very busy, apart from perhaps showing an idle/waiting animation in the GUI.
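As an illustration of why such image enhancement maps well to GPU compute, here is a hypothetical per-pixel routine sketched in Python: every pixel runs the same arithmetic with no data-dependent branching, so the work could be spread evenly across GPU compute lanes.

```python
def contrast_stretch(pixels):
    """Linearly remap 8-bit pixel values to the full 0-255 range."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:  # flat image: nothing to stretch
        return list(pixels)
    # Identical arithmetic per pixel: parallel and non-divergent.
    return [(p - lo) * 255 // (hi - lo) for p in pixels]

print(contrast_stretch([50, 100, 150]))  # [0, 127, 255]
```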

So two different scenarios both pass the type-of-processing check, but only one passes the practical usage check.

There are other usage scenarios that pass the processing type check but may be at odds with the practical check. Video encode and decode fall somewhere in this grey area, where most devices have dedicated resources for executing these tasks in the form of hardware blocks (video processing units). For example, PowerVR VPUs (essentially, fixed function hardware) are far more power and bandwidth efficient than using a programmable, generalized compute resource such as a PowerVR GPU. However, for platforms that do not include dedicated hardware resources for video transcoding, video transcoding using GPU compute might be a more realistic and efficient way of performing these tasks compared to, for example, using the CPU.

A failed usage scenario for mobile would be extreme types of compute which require massive processing time and precision, e.g. molecular folding or other scientific tasks. These fail the practical check: they are things you should never even consider doing on a mobile device. While you may want to view the results on your mobile device, this type of compute should run on dedicated servers in the cloud.

Biomedical simulations and weather pattern distributions are some examples of impractical use cases for mobile GPU compute.

Most compute usage scenarios for mobile battery-powered devices, at least in the near-term, will be practical, common-sense usage scenarios dominated by image and video post-processing and camera vision type of tasks. All of these meet the checks for types of compute as well as the practicality requirement.

Image processing, camera vision and augmented reality applications are some examples of practical use cases for mobile GPU compute.

A basic rule to remember: just because a task is parallel and non-divergent doesn’t mean it should run on a mobile GPU; it must always be a sensible use of your valued battery life.
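The rule above can be summed up as a toy checklist (illustrative only; a real decision involves profiling, not booleans):

```python
def gpu_compute_is_sensible(parallel, low_divergence, gpu_has_headroom,
                            worth_the_battery):
    # All checks must pass: the right *kind* of task (parallel,
    # low-divergence), a GPU not already saturated by graphics work,
    # and a payoff that justifies the battery cost.
    return parallel and low_divergence and gpu_has_headroom and worth_the_battery

# Camera image enhancement while the user waits: passes every check.
print(gpu_compute_is_sensible(True, True, True, True))   # True
# Physics in a console-quality game: the GPU has no headroom left.
print(gpu_compute_is_sensible(True, True, False, True))  # False
```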

If you have any questions or feedback about Imagination’s graphics IP, please use the comments box below. To keep up to date with the latest developments on PowerVR, follow us on Twitter (@GPUCompute, @PowerVRInsider and @ImaginationTech) and subscribe to our blog feed.

Kristof Beets

Kristof Beets is the senior director of product management for PowerVR Graphics at Imagination Technologies, where he drives the product roadmaps to ensure alignment with market requirements. Prior to this, he was part of the business development group, and before that he led the in-house demo development team and the competitive analysis team. His engineering background includes work on SDKs and tools for both PC and mobile products as a member of the PowerVR Developer Relations Team. His work has been published in many guides for game and graphics programmers, such as ShaderX2, X5 and X6 and ARM IQ Magazine, and online by the Khronos Group and 3Dfx Interactive. Kristof has a background in electrical engineering and received a Master's degree in artificial intelligence. He has spoken at GDC, SIGGRAPH, Embedded Technology, MWC and many other conferences.

22 thoughts on “Mobile GPU compute must be practical GPU compute”

  1. Alexandru Voica, you’re wrong:
    PowerVR G6200 (MediaTek):
    16 USSE2 x 2 clusters x 0.280 GHz x 9 = 80 GFLOPS
    This is tantamount to the SGX554 MP4 (80 GFLOPS).
    eeTimes says:

    • Hi,
      How does that chart contradict any of the statements in the article above (or my comments)?
      Please try to stay on topic, this article is about mobile GPU compute applications and makes no reference to PowerVR G6200 or any other specific GFLOPS numbers.
      Best regards,

      • Below, it was said that the formula was wrong. On the contrary.
        Although that article does not mention it, it is difficult to know the specific results. Ultimately, it is almost certain that things are as the formula indicates. Of course, the G Series allows much higher clock frequencies, but that is harmful to battery life in mobile devices.
        Thanks for the reply!

        • On the contrary, PowerVR Series6 GPUs introduce a number of hardware features designed to keep power consumption to a minimum (lossless image compression, PVRTC/PVRTC2, etc.).
          Please read the articles carefully before jumping to conclusions or speculating on performance numbers.

          • Whether it is USSE2 or USC does not matter; the important thing is that there are 16 pipelines!
            Ultimately, the PowerVR G6200 (MediaTek) is an 80 GFLOPS part (at 280 MHz), as eeTimes says.

  2. Because our PowerVR ‘Rogue’ cluster architecture scales linearly in performance, PowerVR G6200 (2 clusters) is 2x the GFLOPS performance of PowerVR G6100 (1 cluster), for the same frequency.
    Another great advantage of PowerVR ‘Rogue’ GPUs is that the cluster-based structure avoids replicating coherency-related overhead resources that competing multicore GPUs still need to maintain.

  3. Always write that the new PowerVR G6XX0 has a small area and power consumption. But…
    1. What is REAL area of PowerVR G6100/6200/6400 ?
    2. And what is frequency of GPU PowerVR G6400/6200/6100 now (for example G6200 in MT8135)?

    • 1. The resulting die size of a core implemented in a system-on-chip can differ depending on whether the semiconductor company optimized the implementation for a smaller area or a higher frequency, or laid it out to better control power consumption and heat dissipation (at the expense of extra die area). Choices to vary on-die buffer sizes, or considerations for the support resources that give the core better bandwidth, can also affect the resulting die area.
      I assume PowerVR cores might end up somewhat larger than some competing cores, but trading some extra area for better thermals/power efficiency is the right design choice to make for mobile designs.
      2. Speculation is that the G6200 in the MT8135 may end up getting targeted at around 300 MHz. That’s a low target compared to what some other semiconductor companies have been considering with their Rogue implementations: 400 MHz to 600 MHz and even beyond.

      • Thanks for the answer!
        Can you clarify about the PowerVR G6200 in MT8135:
        1. About the area.
        If you compare the G6200 with SGX544 and SGX554, so its area will be like MP1 or MP2 or MP4 (or between some of them)?
        2. About the frequency.
        MediaTek “said” about 80 GFLOPS for the PowerVR G6200 in the MT8135. But if you look at the “formula”:
        16 USSE2 x 2 clusters x 0.300 GHz x 9 = 86.4 GFLOPS
        So is the frequency of the PowerVR G6200 in the MT8135 less than 300 MHz? Or what is the TRUE formula for calculating GFLOPS (PowerVR G6xxx)?
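For readers following along, the arithmetic being debated in this thread can be written out explicitly. This is a sketch of the commenters' formula; the factor of 9 FLOPS per pipeline per clock is their assumption, not an official Imagination figure:

```python
def rogue_gflops(pipelines, clusters, ghz, flops_per_pipe_per_clock=9):
    """Peak GFLOPS per the commenters' formula (not an official figure)."""
    return pipelines * clusters * ghz * flops_per_pipe_per_clock

print(rogue_gflops(16, 2, 0.280))  # ~80.6 GFLOPS
print(rogue_gflops(16, 2, 0.300))  # ~86.4 GFLOPS
```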

      • Hi,
        Indeed, all PowerVR G6x30 GPUs have been optimised for maximum efficiency and still keep power consumption to a minimum even with an incremental increase in area; PowerVR G6x00 GPUs are designed to deliver the best performance in the smallest possible area.
        An example feature included in PowerVR G6x30 cores is lossless compression which reduces GPU bandwidth usage thus enabling higher performance and reduced power consumption.
        As always, I am unable to comment on the specific GPU frequency for any application processor unless explicitly stated by the silicon vendor.

  4. The microkernel of PowerVR GPUs should assist them in getting better results from Renderscript even with the API’s lack of functionality in that regard. Future enhancements to Renderscript, as well as more targeted implementations with Filterscript, should bring improvements too.
    Determining whether mobile GPU compute makes sense on a task level requires evaluating the trade-offs involved. Processing game physics when it takes away from the visual splendor you were after in the first place can be a bad trade-off, but seeing realistic physical behavior of in-game objects could sometimes be more impressive than simply layering additional scene complexity onto the graphics.

  5. Great post! Succinct, clear, and spot on.
    Now if only Android’s Renderscript allowed you to specify the executing hardware. This, and the lack of features, is somewhat frustrating when considering Android’s heterogeneous-compute API; I sincerely hope it improves to take advantage of the wonderful characteristics of your GPUs.

      • Thanks Alex! Any additional insight into Renderscript and Imagination GPUs (and CPUs) would be very welcome; I’m looking forward to the post! I’m actually excited about the API, though its progress seems somewhat slow.

