Gnomes per second in Vulkan and OpenGL ES


It’s been a while since we first showed off our Vulkan* driver for PowerVR Rogue GPUs. Since then, our PowerVR driver and graphics demo teams have been working hard to synchronize with the spec as it evolves towards its final form.

Today we are excited to show you a new demo we have been working on that better highlights the specific benefits we believe this API should bring to developers and devices.

Vulkan and OpenGL ES in Gnome Horde

This new demo is called Gnome Horde and runs under Android on the Intel-based Nexus Player, a consumer device integrating a PowerVR G6430 GPU; it uses the latest prototype Vulkan API driver for PowerVR GPUs (final performance may differ).

On the left-hand side of the video, we are showing Vulkan and on the right we have OpenGL® ES 3.0. We have attempted to ensure both versions run equivalent code and both run without extensions. The demos do not use instancing either: each draw call could be a different piece of geometry with a different material or texture, and the CPU performance would be very similar.

Before reading any further, please note that this is an exaggerated scenario that is intended to highlight and amplify Vulkan’s strengths. It is not intended to show OpenGL ES in a bad light – we are deliberately using OpenGL ES in a way that it was not designed for. We are also aiming to be GPU bound using the Vulkan API; this means the GPU and CPU are being used as effectively as possible, which is a great thing for developers and vendors alike.

The implementation details

Using Vulkan, we batch draw calls into tiles and render a tile at a time. Each time a tile goes out of view, comes into view or changes its level of detail, we regenerate a command buffer (more on this later). By avoiding changes in the command buffer, we reduce overall CPU usage significantly compared to OpenGL ES.

This is explained in more detail below.

Tiled rendering

In OpenGL ES, all draw calls are submitted dynamically according to the tiles in view, with no opportunity to cache draw calls that have already been executed.
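The caching idea can be sketched in a few lines. This is a conceptual model only, not the demo's code and not the Vulkan API: `CommandBuffer`, `TileCache` and `update` are hypothetical names, and dropping a buffer when its tile leaves the view is an assumption about how the cache might be managed. The 45 draw calls per tile is the figure quoted later in this article.

```python
from dataclasses import dataclass, field


@dataclass
class CommandBuffer:
    """Stand-in for a pre-recorded command buffer (not a real Vulkan object)."""
    tile_id: int
    lod: int
    draw_calls: int


@dataclass
class TileCache:
    """Keeps one recorded command buffer per visible tile (hypothetical helper)."""
    draws_per_tile: int = 45                      # figure quoted in the article
    buffers: dict = field(default_factory=dict)   # tile id -> CommandBuffer
    recordings: int = 0                           # how often we had to re-record

    def update(self, visible: dict) -> list:
        """`visible` maps tile id -> level of detail for the current frame."""
        # Assumption: buffers for tiles that left the view are simply dropped.
        for tile_id in list(self.buffers):
            if tile_id not in visible:
                del self.buffers[tile_id]
        # Record only tiles that are new or whose level of detail changed.
        for tile_id, lod in visible.items():
            cached = self.buffers.get(tile_id)
            if cached is None or cached.lod != lod:
                self.buffers[tile_id] = CommandBuffer(tile_id, lod, self.draws_per_tile)
                self.recordings += 1
        # Everything returned is submitted; most of it is reused unchanged.
        return [self.buffers[tile_id] for tile_id in visible]


cache = TileCache()
cache.update({0: 0, 1: 0, 2: 1})   # first frame: three tiles recorded
cache.update({0: 0, 1: 0, 2: 1})   # static camera: nothing re-recorded
cache.update({1: 0, 2: 0, 3: 1})   # tile 0 leaves, tile 2 changes LOD, tile 3 enters
print(cache.recordings)            # 5
```

With a static camera, the per-frame work in this model reduces to looking up buffers that already exist, which is the behaviour the text above attributes to the Vulkan path.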

Lower CPU usage

As you can see from the CPU usage graph in the bottom left of the video, CPU usage is very low for this many draw calls in the first mode. In the highest zoom level we are drawing around 400,000 gnomes (and other objects) per second. Each object has a different transformation, and there are many different materials, textures, blend modes and shaders being used.

The OpenGL ES API struggles with these tasks because it requires many calls into kernel mode to change the state of the driver, along with validating that state and any extra work that goes on behind the scenes, all during an application's render loop.

This is in contrast to Vulkan where we can pre-generate these commands. Executing pre-generated commands in Vulkan is very fast, with little CPU overhead and no need for the driver to validate or compile anything inside the render loop. These pre-generated commands are called command buffers.

Vulkan CPU usage (left) and OpenGL ES CPU usage (right) for Gnome Horde

The lower line is the process CPU usage and the top line represents system CPU usage. Both are reduced in Vulkan due to the ability to process command buffers before submission.

Command buffer re-use

Being able to re-use command buffers proves useful in some circumstances. This feature will not be a panacea, but it will be possible to use it to a great extent in many games and applications. In this specific instance we decided that being able to re-use command buffers for each tile would reduce the overall CPU usage.

Before drawing in both APIs, the driver needs to compile a set of commands for the GPU to execute, validate those commands, and do other work – all before actually starting the GPU. With OpenGL ES, this needs to be performed for each draw call, during the render loop. In Vulkan we can compile and validate this list of commands ahead of time, and then have the GPU execute these pre-generated commands.
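A toy model makes the difference in CPU work concrete. It is a sketch under stated assumptions, not a description of any real driver: `validate`, `immediate_mode` and `prerecorded` are hypothetical names, and "validation" here is just a counter standing in for whatever the driver does per draw call.

```python
def validate(draw):
    """Stand-in for the driver's per-draw validation and command encoding."""
    return ("encoded", draw)


def immediate_mode(draws, frames):
    """OpenGL ES-style path in this model: validate every draw in every frame."""
    validations = 0
    for _ in range(frames):
        for draw in draws:
            validate(draw)
            validations += 1
    return validations


def prerecorded(draws, frames):
    """Vulkan-style path in this model: validate once, then replay the result."""
    recorded = [validate(draw) for draw in draws]   # done outside the render loop
    validations = len(recorded)
    submissions = 0
    for _ in range(frames):
        submissions += 1                            # one submit per frame, no re-validation
    return validations, submissions


draws = list(range(45))                     # 45 draw calls per tile, as in the article
print(immediate_mode(draws, frames=30))     # 1350
print(prerecorded(draws, frames=30))        # (45, 30)
```

In this model the per-frame cost of the pre-recorded path does not grow with the number of draw calls, only the one-off recording cost does.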

Vulkan in action: Gnome Horde demo screenshot

In this screenshot there are 300 tiles with a total of 13,500 draw calls being run at roughly 30fps with very little CPU usage. This is approaching half a million draw calls per second without instancing.
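The arithmetic behind these figures is easy to check. The inputs below are the numbers quoted in the article; the frame rate is approximate, so the result is too.

```python
tiles = 300
draw_calls_per_frame = 13_500
frames_per_second = 30          # "roughly 30fps" in the article

draws_per_tile = draw_calls_per_frame // tiles
draws_per_second = draw_calls_per_frame * frames_per_second

print(draws_per_tile)           # 45
print(draws_per_second)         # 405000
```

The 45 draw calls per tile matches the per-command-buffer figure given for the parallel mode, and 405,000 draw calls per second agrees with the "around 400,000" objects per second mentioned earlier.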

Parallel command buffer generation

In the next demo modes, the CPU graph shows that we go from very little CPU usage to using nearly the whole of every CPU core. What's happening here is that the camera is moving much faster, so command buffers need to be regenerated more frequently (a slightly unrealistic use case). In OpenGL ES we are CPU bound and cannot feed the GPU with enough commands. With Vulkan, however, we have the opportunity to distribute the regeneration of the tiles' command buffers to different threads. This is not possible with OpenGL ES, which was designed before multi-threading was widely available. In a real application, the workload will be somewhere between the two extremes of dynamic draw calls and static draw calls.

In this case we are sacrificing CPU usage for memory usage. We could store all of the command buffers for the entire scene in memory. However, on mobile devices memory is often limited, so we only store the command buffers that are in the viewable frustum instead. With Vulkan we are purposefully bound by GPU performance, which goes to show that we are using the CPU effectively and feeding the GPU with enough commands.

Vulkan CPU vs OpenGL ES CPU: Note how OpenGL ES cannot do multi-threading

For equivalent performance, the Vulkan demo could have the CPU run at a much lower clock frequency, increasing efficiency and battery life compared to OpenGL ES.

In this mode there are roughly 80 command buffers being re-created each frame distributed between the cores of the CPU. Each command buffer has 45 draw calls and other state setting information. With all this work going on, it is good to see the frame rate stays the same as in the previous mode.
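The way this work is spread across cores can be sketched with a thread pool. This is a structural illustration only: `record_tile` and `regenerate` are hypothetical names, the "command buffer" is a plain list, and Python threads will not actually run this CPU-bound work in parallel the way native threads would. In Vulkan itself a command pool must not be used from several threads at once, so each worker would typically own its own pool; that detail is not modelled here.

```python
import os
from concurrent.futures import ThreadPoolExecutor

DRAWS_PER_BUFFER = 45   # figure quoted in the article


def record_tile(tile_id):
    """Hypothetical stand-in for recording one tile's command buffer."""
    return [("draw", tile_id, index) for index in range(DRAWS_PER_BUFFER)]


def regenerate(dirty_tiles, workers=None):
    """Re-record the given tiles on a pool of worker threads."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so buffers line up with dirty_tiles.
        buffers = list(pool.map(record_tile, dirty_tiles))
    return dict(zip(dirty_tiles, buffers))


dirty = list(range(80))             # roughly 80 buffers re-created per frame
rebuilt = regenerate(dirty)
print(len(rebuilt), sum(len(buffer) for buffer in rebuilt.values()))   # 80 3600
```

At the quoted figures, that is 3,600 draw calls re-recorded every frame, which is why the CPU graph fills up in this mode even though the frame rate holds.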

Memory allocation strategies

One advantage of Vulkan over OpenGL ES is that the developer has more visibility of the memory that needs to be allocated. With OpenGL ES the driver handles most of the allocation and hides it away from the developer. With Vulkan the memory that the driver allocates is very minimal and the developer can use different memory allocation strategies. For example, if an image is not in use by the GPU, the developer could decide to use that memory for other purposes like uploading a texture.
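As an illustration of what an application-side strategy could look like, here is a minimal first-fit sub-allocator over one large block. It is a sketch, not the demo's allocator and not a Vulkan API: `Block`, `alloc` and `release` are hypothetical names, sizes are arbitrary, and a real allocator would also deal with alignment, memory types and making sure the GPU has finished with a range before it is reused.

```python
class Block:
    """One large allocation that the application carves up itself."""

    def __init__(self, size):
        self.size = size
        self.free = [(0, size)]     # list of (offset, length) free ranges
        self.used = {}              # resource name -> (offset, length)

    def alloc(self, name, length):
        for index, (offset, free_length) in enumerate(self.free):
            if free_length >= length:               # first fit
                remainder = free_length - length
                if remainder:
                    self.free[index] = (offset + length, remainder)
                else:
                    del self.free[index]
                self.used[name] = (offset, length)
                return offset
        raise MemoryError(f"no free range of {length} bytes for {name}")

    def release(self, name):
        # Adjacent free ranges are not merged, to keep the sketch short.
        self.free.append(self.used.pop(name))


block = Block(size=1024)
image_offset = block.alloc("offscreen_image", 512)
block.alloc("vertex_data", 512)
block.release("offscreen_image")            # image no longer in use by the GPU
staging_offset = block.alloc("texture_upload", 256)
print(image_offset, staging_offset)         # 0 0
```

The texture upload lands at the same offset the unused image occupied, which is the kind of reuse the paragraph above describes.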

Render pass – pixel local storage

In Vulkan there is a structure called a render pass; each render pass has one or more sub passes. These sub passes can be exploited to utilise pixel local storage to store intermediate values for shaders between sub passes.

Being a tile-based deferred renderer, PowerVR can execute multiple shaders for the same pixel in an FBO effectively using fast on-chip memory. This is a good idea in rendering techniques such as deferred rendering. The benefit of doing this is that it avoids wastefully writing intermediate values back to main memory, saving bandwidth and therefore power. However, this functionality is an extension in OpenGL ES, requiring more code to check if the extension exists.

In Vulkan this functionality is a core feature that will benefit battery life and the efficiency of applications and devices. Vulkan also allows the driver to handle out-of-memory issues gracefully with respect to deferred renderers and the transient memory they use.


All of the features above require implementation in code, so the use of Vulkan does come with added code complexity compared to OpenGL ES. However, Imagination is committed to continuing full support for OpenGL ES for a long time to come alongside developing a new Vulkan API driver for PowerVR Rogue GPUs.

Devices with the new Vulkan API should bring new optimisation opportunities and increased efficiency to application developers.

If you are heading to SIGGRAPH 2015 this week, drop by the Khronos Group BoF meeting on Wednesday to see this demo in action and get an explanation of what is going on.

Stay tuned to our blog as we will bring you more details after the BoF.

Remember to also follow us on Twitter (@ImaginationTech, @PowerVRInsider) for the latest news and announcements from the PowerVR Insider team.

Editor’s Note

* The prototype Vulkan driver for PowerVR Rogue GPUs is based on an internal draft Khronos Specification, which may change prior to final release. Conformance criteria for this Specification have not yet been established.

PowerVR Rogue GPUs are based on published Khronos specifications, and are expected to pass the Khronos Conformance Testing Process. Multiple PowerVR Rogue GPU cores have already achieved OpenGL ES conformance. Current conformance status can be found at

OpenGL is a registered trademark and the OpenGL ES logo is a trademark of Silicon Graphics Inc. used by permission by Khronos.

32 thoughts on “Gnomes per second in Vulkan and OpenGL ES”

  1. I don’t know if this is the right place to ask this but… OpenGL has things like glGenBuffers, glGenFramebuffers, etc. Vulkan will keep this kind of API call or it will use a completely different approach regarding data storage into the GPU?

  2. Are these kinds of improvements available for geometry which changes position, or only static (I notice the gnomes do not move)

  3. Interesting test and comparison.
    I’m hoping a future version of Gnome Horde comes out with other optimization features and CPU-saving tricks (post Vulkan 1.0, say Vulkan 1.1 or something). Being able to see the impact of those features on the different APIs would be very interesting.

    • Hey, each object has various levels of detail. From 13K vertices to ~300. In the zoomed out screen shot there are about 1M triangles. Each type of object has a different texture, so ~10 different textures including the shadows. There is alpha-testing for the plants and alpha blending for the shadows and each object type is using a different shader.

  4. May I ask a stupid question? Since there is such a huge difference in performance between these 2 APIs, why is there still a need to support OpenGL ES? The latter seems not efficient at all.

    • That’s not a stupid question at all. Vulkan is not a replacement for OpenGL (ES), but a complementary API for developers who want to get close to the metal (e.g. game engine guys).
      OpenGL ES will still be available for a lot of high-level, quick-and-easy rendering. For example, simple UIs or 2D/3D games will likely not see any benefit from Vulkan.

    • Consoles (and the console developers) have always had the ability to directly access hardware, which is the main reason why the xbox 360 and PS3 could have such good graphics on hardware that is very old, and PCs required much better GPUs to run at an equivalent quality.
      Since Vulkan is coming to PCs and mobiles, this will add the ability for developers to release more optimized games for Android.

    • Thank you, we hope to achieve even better results when the spec is finalized and the PowerVR driver achieves conformance.

  5. Yeah, i’d also really like to see this demo open sourced… We need more open demos in the ecosystem that test the capabilities of API’s

    • I’d love to open source it. Unfortunately we’d have to wait until the Vulkan spec has been released. I might be able to show you the code from the OpenGL side of things only.

    • I’m working on an open source implementation (desktop for now). However it seems that the performance depends heavily on the polygon count of the rendered objects and/or the actual hardware.

