Micro-benchmark your render on PowerVR Series5, Series5XT and Series6 GPUs

Benchmarking the performance of applications running on PowerVR GPUs isn’t as simple as collecting time stamps at various points in your render. The reason for this is that the graphics driver submits work to the GPU independently of the API calls that an application makes, i.e. very few graphics API calls are blocking operations.

Additionally, PowerVR GPUs will try to process as much vertex and fragment work in parallel as possible to keep idle time to a minimum and make optimal use of the resources available. This means that to accurately measure performance you will need to render a number of frames and calculate the average frame-time to understand the true cost of your render.

Because of this behaviour, it’s best to write micro-benchmarks to accurately measure performance. The aim of a micro-benchmark is to understand the cost of rendering a static frame. Writing a benchmark for a dynamic scene (e.g. a fly-through mode in a game) is beyond the scope of this guide.

The following sections describe a simple, generic micro-benchmark. For the sake of simplicity, it assumes that the benchmark is written using the OpenGL ES 2.0 graphics API.

This benchmark guide makes use of glReadPixels() to force renders to complete. This is a very expensive operation, as it removes all parallelism between the CPU and GPU, so we recommend only using glReadPixels() when it’s absolutely necessary.

If you are not already familiar with the PowerVR GPU architecture, you should check out our PowerVR Series5 Architecture Guide for Developers document.

Platform setup

Before you begin benchmarking your application, you need to ensure that your target platform is set up appropriately.

Disabling V-Sync

V-Sync is a feature enabled on most platforms that synchronises the GPU’s frame rate with the display’s refresh rate to avoid tearing (an artefact caused by the GPU updating a surface the display is still reading from). As V-Sync limits the number of frames that the GPU will process, it prevents you from accurately calculating the cost of your render. You have two options:

1. Disable V-Sync: If possible, you should disable V-Sync on your platform. This will remove the limit and will allow the GPU to render frames as fast as possible.
2. Render off-screen: If you cannot disable V-Sync on your platform, you should repeatedly render to off-screen surfaces (e.g. OpenGL ES FBOs) to keep the GPU busy. Rendering off-screen is beyond the scope of this guide.

Ensure no other application is using the PowerVR GPU

When benchmarking, you must ensure that the GPU is only processing work submitted by your application. If you’re unsure which processes are utilising the GPU, you can use PVRTune to profile the GPU and identify the processes that are submitting work to it.

If you cannot disable other processes that are using the GPU but they have a fixed cost (for example, the SurfaceFlinger compositor on some Android devices), you can factor this cost into your calculations and still run your benchmark on the device. Keep in mind that even with a fixed cost, your benchmark will be less accurate when other applications are using the GPU’s resources.
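One simple way to factor in a fixed cost is to subtract it from the measured average frame time. The sketch below assumes you already have an estimate of the other process's GPU time per frame (for example, from a PVRTune capture) and that its work does not overlap with yours, which is only an approximation on a GPU that runs work in parallel. The helper name and its parameters are illustrative, not part of any SDK.

```c
#include <assert.h>

/* Illustrative helper (hypothetical, not an SDK function).
 * measured_ms:   average frame time reported by the benchmark
 * fixed_cost_ms: GPU time per frame used by the other process, assumed constant
 * Returns the estimated cost of the application's own render, clamped at zero. */
double estimate_own_frame_ms(double measured_ms, double fixed_cost_ms)
{
    assert(measured_ms >= 0.0 && fixed_cost_ms >= 0.0);
    double own = measured_ms - fixed_cost_ms;
    return own > 0.0 ? own : 0.0;
}
```

With a measured average of 5.0 ms and a fixed compositor cost of 1.5 ms, the estimate for your own render is 3.5 ms. Treat the result as a rough figure rather than an exact measurement.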

You should not run your benchmark on the device if other processes using the GPU have a varying cost, as this will severely impact the accuracy of your tests.

What Should I Be Benchmarking?

Static scenes

A micro-benchmark should render a static scene so that the results are well defined. There should be no dynamic parts to the render. Additionally, the graphics API calls made in each frame should be consistent. Ideally, the benchmark should render identical frames over and over again to understand their average cost.

Asset Warm-Up

When writing a benchmark, the first thing to remember is that drivers don’t have to upload textures or compile shaders at the point that they are submitted to the graphics API. The graphics driver may, instead, defer this work until the first time that the resource is referenced by a draw call (this allows the driver to avoid uploading redundant resources that are never actually used in the render).

An asset warm-up phase allows you to force the driver to upload the resources that you will be using in your micro-benchmark.

How can I make the driver do that?

As the driver will upload and compile assets at the point that they are first used, the easiest way to force the operations is to do the following:

1. Render your static scene a number of times (~10 frames should do)
2. Call glReadPixels() before the final eglSwapBuffers(). This will force the driver to complete all renders that have been submitted so far. Reading back a 1×1 region is sufficient, as you don’t need the returned data
a. You only need to call glReadPixels() once here
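The warm-up steps above can be sketched as a small loop. To keep the example runnable without a GPU, the rendering calls are passed in as hooks; the WarmupHooks struct, warm_up_assets() and the counting stubs are hypothetical names introduced for illustration only.

```c
#include <assert.h>

/* Hypothetical hooks (not part of any SDK). In an OpenGL ES 2.0 application
 * they would wrap the scene's draw calls, a 1x1 glReadPixels() and
 * eglSwapBuffers() respectively. */
typedef struct {
    void (*draw_scene)(void *user);
    void (*read_one_pixel)(void *user);
    void (*swap_buffers)(void *user);
    void *user;
} WarmupHooks;

/* Renders warmup_frames frames so that every texture and shader is referenced
 * by a draw call, then forces completion once, before the final swap. */
void warm_up_assets(const WarmupHooks *h, unsigned warmup_frames)
{
    assert(h != 0 && warmup_frames > 0);
    for (unsigned i = 0; i < warmup_frames; ++i) {
        h->draw_scene(h->user);
        if (i + 1 == warmup_frames)
            h->read_one_pixel(h->user);   /* once only, before the last swap */
        h->swap_buffers(h->user);
    }
}

/* Stand-in hooks that only count calls, so the sketch runs without a GPU. */
typedef struct { int draws, reads, swaps, swaps_before_read; } WarmupLog;
void log_draw(void *u) { ((WarmupLog *)u)->draws++; }
void log_read(void *u)
{
    WarmupLog *l = (WarmupLog *)u;
    l->reads++;
    l->swaps_before_read = l->swaps;      /* records where the read-back landed */
}
void log_swap(void *u) { ((WarmupLog *)u)->swaps++; }
```

In a real application, read_one_pixel would typically wrap something like glReadPixels(0, 0, 1, 1, GL_RGBA, GL_UNSIGNED_BYTE, pixel) with a 4-byte buffer, and warm_up_assets(&hooks, 10) would match the ~10 frames suggested above.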

Benchmarking the scene

Now that the driver has warmed up the required assets and we’re happy with the platform’s setup, it’s time to start benchmarking!

To get an accurate measure of the cost of your render, you should send a large number of frames to the GPU between your first timestamp and your last (the more frames, the better!). Processing a large number of frames allows the GPU to keep multiple frames in flight at a time (as it would in a standard, well-written application) and it also reduces the impact of any setup and shutdown costs caused by the benchmark. Here’s an overview of how this should be implemented:

1. Collect a time stamp after the asset warm-up frame has completed (eglSwapBuffers() has returned)
2. Render your static scene a large number of times (at least 10 seconds worth of rendering)
3. Call glReadPixels() before the last eglSwapBuffers() to force all submitted renders to complete.
a. You only need to call glReadPixels() at the end of the benchmark. Do not call this every frame, as it will severely impact the benchmark performance.
4. Collect a time stamp after glReadPixels() returns
5. Calculate the average frame time (the elapsed time divided by the number of frames that were rendered)
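The five steps above can be sketched as follows. The rendering calls and the CPU clock are passed in as hooks so that the example runs without a GPU; BenchHooks, benchmark_average_frame_time() and the FakeGpu stubs are hypothetical names for illustration, and the 4 ms of work per frame in the stub is an arbitrary stand-in value. The sketch assumes the asset warm-up has already completed and that you pick a frame count large enough to cover at least 10 seconds of rendering.

```c
#include <assert.h>

/* Hypothetical hooks (not part of any SDK). In an OpenGL ES 2.0 application
 * they would wrap: the draw calls for the static scene, a 1x1 glReadPixels(),
 * eglSwapBuffers(), and a monotonic CPU clock that returns seconds. */
typedef struct {
    void   (*draw_scene)(void *user);
    void   (*read_one_pixel)(void *user);
    void   (*swap_buffers)(void *user);
    double (*now_seconds)(void *user);
    void   *user;
} BenchHooks;

/* Times frame_count identical frames and returns the average frame time in
 * seconds. */
double benchmark_average_frame_time(const BenchHooks *h, unsigned long frame_count)
{
    assert(h != 0 && frame_count > 0);

    double start = h->now_seconds(h->user);        /* step 1: first time stamp */
    double end = start;

    for (unsigned long i = 0; i < frame_count; ++i) {
        h->draw_scene(h->user);                    /* step 2: identical frames */
        if (i + 1 == frame_count) {
            h->read_one_pixel(h->user);            /* step 3: last frame only */
            end = h->now_seconds(h->user);         /* step 4: after read-back */
        }
        h->swap_buffers(h->user);
    }
    return (end - start) / (double)frame_count;    /* step 5: average */
}

/* Stand-in hooks so the sketch runs without a GPU: each frame queues 4 ms of
 * pretend GPU work, and the read-back waits for all queued work to finish. */
typedef struct { double clock; double queued; int reads; } FakeGpu;
void fake_draw(void *u) { ((FakeGpu *)u)->queued += 0.004; }
void fake_read(void *u)
{
    FakeGpu *g = (FakeGpu *)u;
    g->clock += g->queued;                         /* drain the queued work */
    g->queued = 0.0;
    g->reads++;
}
void fake_swap(void *u) { (void)u; }
double fake_now(void *u) { return ((FakeGpu *)u)->clock; }
```

As a worked example of step 5, 3,000 frames completing in 12.0 seconds gives 12.0 / 3000 = 0.004 s, i.e. an average of 4 ms per frame. Note that the read-back happens exactly once; calling it inside every iteration would serialise the CPU and GPU and distort the result.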

…and you’re done!

My benchmark is more complex than this. What should I do?

You can use our dedicated PowerVR forum. We’re more than happy to help anyone who would like to accurately measure the performance of their graphics rendering on PowerVR hardware. If you would rather discuss your benchmark privately, you can email [email protected].

For more announcements and news from Imagination’s PowerVR ecosystem, follow us on Twitter (@PowerVRInsider and @ImaginationTech) and keep coming back to the PowerVR developers’ blog.

Joe Davis

Joe Davis leads the PowerVR Graphics developer support team. He and his team support a wide variety of graphics developers including those writing games, middleware, UIs, navigation systems, operating systems and web browsers. Joe regularly attends and presents at developer conferences to help graphics developers get the most out of PowerVR GPUs. You can follow him on Twitter @joedavisdev.
