Dwarf Hall: physically based rendering on a PowerVR GPU


Physically based rendering (PBR) is becoming more prevalent on mobile-class GPUs. In this blog post I will give a quick overview of what PBR is, its advantages and disadvantages, and some tips on using PBR and deferred rendering on a PowerVR GPU. I’ll also show you how we used PBR in Dwarf Hall, a recent OpenGL ES demo we produced.

Physically based rendering has no precise definition; it is really a set of guidelines for rendering a scene in a more scientific and intuitive way. A good example is specular lighting. Before PBR, many graphical applications used ad-hoc material parameters such as a shininess value. This value could range from zero to infinity, and artists had to try several values to see which one looked best.

This method is not intuitive and produces varying levels of quality. It can also produce results that are physically impossible, for example a surface that reflects more light than it receives. PBR makes the input values for materials more understandable to both artists and programmers.
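The energy problem can be illustrated numerically. The sketch below assumes a Phong-style cosine-power lobe lit along the surface normal; the post does not say which legacy model it has in mind, so the lobe, the normalisation factor and the function name are illustrative assumptions rather than anything taken from the demo.

```python
import math


def reflected_fraction(shininess: float, normalised: bool, steps: int = 20000) -> float:
    """Fraction of incoming light reflected by a cos^n lobe at normal incidence.

    Integrates lobe * cos(theta) over the hemisphere with the midpoint rule.
    A physically plausible material must never return more than 1.0.
    """
    # Assumed normalisation for this lobe: (n + 2) / (2 * pi).
    scale = (shininess + 2.0) / (2.0 * math.pi) if normalised else 1.0
    d_theta = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta
        lobe = scale * math.cos(theta) ** shininess
        # sin(theta) * d_theta is the solid-angle weight of this ring.
        total += lobe * math.cos(theta) * math.sin(theta) * d_theta
    return 2.0 * math.pi * total


if __name__ == "__main__":
    for n in (1.0, 10.0, 50.0):
        print(n, reflected_fraction(n, False), reflected_fraction(n, True))
```

Without the normalisation factor the reflected fraction changes with the shininess value and exceeds 1.0 for low exponents, which is the physically impossible case described above. With it, the result stays at 1.0 whatever value the artist picks.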

Image based lighting

Specular lighting is an important part of a scene’s appearance. One way to store the light arriving at a point in space in the scene (irradiance) is in a cubemap. An offline renderer can create these cubemaps, so at runtime the application merely needs to look up a direction in the cubemap when shading a surface.

An example of image based lighting in a physically based renderer

In our demos we use a ray tracer to create the cubemaps offline from multiple points in space in the scene. We also use the mip maps of the cubemap to represent the blurry, glossy reflections that rough surfaces have (Prefiltered Mipmapped Radiance Environment Maps). We followed the method that Sébastien Lagarde describes in his blog post.
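As a rough illustration of how a roughness value can select a blurrier mip of the prefiltered cubemap, here is a minimal sketch. It assumes a simple linear mapping from roughness to mip level, which is a common choice but is not confirmed by the post; the function names are hypothetical.

```python
def mip_level_for_roughness(roughness, mip_count):
    """Map roughness in [0, 1] to a fractional mip level (0 = sharpest mip)."""
    if mip_count < 1:
        raise ValueError("mip_count must be at least 1")
    clamped = min(max(roughness, 0.0), 1.0)
    # Assumed linear mapping: fully rough surfaces use the smallest mip.
    return clamped * (mip_count - 1)


def mips_to_blend(level):
    """Return the two mips to sample and the weight of the blurrier one."""
    lower = int(level)
    upper = lower + 1 if level > lower else lower
    return lower, upper, level - lower


if __name__ == "__main__":
    level = mip_level_for_roughness(0.3, 8)
    print(level, mips_to_blend(level))
```

On the GPU the blend between neighbouring mips is usually left to trilinear filtering by passing the fractional level to the texture lookup.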

This process of encoding the irradiance at a point in space is called image based lighting (IBL) and is commonly combined with PBR. We encode these irradiance values in HDR for use with post processing effects later on, and use a tool to store them in an efficient PVRTC format.

Material system

A PBR system usually takes a handful of parameters as inputs. Our demo used the following:

  • Albedo colour
  • Roughness (or smoothness)
  • Specular colour
  • Tangent normals

These inputs are enough for the PBR shader to produce good looking results, and they are intuitive for artists and programmers to understand.


Gamma

One thing that is difficult to get right in 3D rendering is gamma. Colour can live in one of two main spaces: gamma space or linear space. Most monitors and other output devices expect gamma space input. The two things to look out for are banding and incorrect gamma compensation.

Banding happens when we quantise from a higher precision to a lower precision and then do more computations on that result. In this situation we want to stay in higher precision while we do the computations and then quantise as late as possible to avoid banding.
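The effect of quantising too early can be shown with a toy example. The sketch below is only an illustration under assumed numbers: it darkens an 8-bit ramp by a factor of 8 and then brightens it again, once storing the intermediate result in 8 bits and once keeping it in floating point.

```python
def quantise_8bit(value):
    """Round to the nearest integer and clamp to the 8-bit range."""
    return min(255, max(0, int(value + 0.5)))


def quantise_early(value):
    """Store the darkened value in 8 bits, then brighten it again."""
    stored = quantise_8bit(value / 8.0)
    return quantise_8bit(stored * 8.0)


def quantise_late(value):
    """Keep the intermediate result in floating point, quantise once at the end."""
    return quantise_8bit((value / 8.0) * 8.0)


if __name__ == "__main__":
    ramp = range(256)
    print("levels, early:", len({quantise_early(v) for v in ramp}))
    print("levels, late: ", len({quantise_late(v) for v in ramp}))
```

The early path collapses the 256 input levels into a few dozen output levels, which shows up on screen as visible bands in a smooth gradient.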

Input textures are the main area where incorrect gamma compensation can happen. Make sure you know which colour space a texture is stored in and apply the correct gamma compensation when sampling it. Also make sure that texture filtering takes gamma into account. This can be achieved by using the extension EXT_pvrtc_sRGB when working with efficient PVRTC encoded textures.
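To show why filtering has to take gamma into account, the sketch below uses the standard sRGB transfer functions and averages a black texel and a white texel in both spaces. It is a CPU-side illustration only; with an sRGB texture format the hardware performs the decode before filtering.

```python
def srgb_to_linear(c):
    """Decode one sRGB-encoded channel in [0, 1] to linear light."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c):
    """Encode one linear channel in [0, 1] to sRGB."""
    if c <= 0.0031308:
        return c * 12.92
    return 1.055 * c ** (1.0 / 2.4) - 0.055


def filter_in_gamma_space(a, b):
    """Incorrect: averages the encoded values directly."""
    return (a + b) / 2.0


def filter_in_linear_space(a, b):
    """Correct: decode, average in linear light, then re-encode."""
    return linear_to_srgb((srgb_to_linear(a) + srgb_to_linear(b)) / 2.0)


if __name__ == "__main__":
    print(filter_in_gamma_space(0.0, 1.0))   # too dark
    print(filter_in_linear_space(0.0, 1.0))  # brighter, physically correct mix
```

Averaging the encoded values gives a result that is noticeably darker than the true half-intensity mix, which is why the decode must happen before the filter rather than after it.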

Deferred rendering

In the demo we wanted to have many moving lights and we used deferred rendering to achieve this. We still want the same inputs to our PBR rendering, so we started with the following layout for our G-buffer:

  • RGB Light accumulation
  • F32 Depth
  • RGB Tangent normals
  • 0-1 Roughness
  • RGB Albedo
  • RGB Specular colour
  • 0-1 Metallicness

Dwarf Hall render targets: a visualisation of the G-buffer

Metallicness vs Specular

It is possible to reduce the number of inputs, and therefore the size of the G-buffer, if we assume that all dielectrics share the same low specular reflectance and that metals have no diffuse colour. The albedo colour of a dielectric and the specular colour of a metal can then be stored in the same image, with a metallicness value telling the shader how to interpret it. This is the “Metallicness” workflow we use below.

Drone input textures. TL: albedo/specular colour, TR: metallicness, BL: normals, BR: roughness.
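A minimal sketch of how a shader might unpack the shared colour in a metallicness workflow follows. The 0.04 reflectance for dielectrics is a widely used convention and an assumption here, not a value given in this post; the function name is hypothetical.

```python
# Assumed constant specular reflectance for all dielectrics (common convention).
DIELECTRIC_REFLECTANCE = 0.04


def split_base_colour(base_colour, metallicness):
    """Derive diffuse and specular colours from one shared RGB value.

    metallicness = 0 treats base_colour as albedo (dielectric);
    metallicness = 1 treats it as specular reflectance (metal).
    """
    m = min(max(metallicness, 0.0), 1.0)
    diffuse = tuple(c * (1.0 - m) for c in base_colour)
    specular = tuple(DIELECTRIC_REFLECTANCE * (1.0 - m) + c * m for c in base_colour)
    return diffuse, specular


if __name__ == "__main__":
    print(split_base_colour((0.8, 0.2, 0.1), 0.0))  # painted surface
    print(split_base_colour((1.0, 0.77, 0.34), 1.0))  # gold-like metal
```

Intermediate metallicness values blend between the two interpretations, which is mainly useful at texel boundaries between metal and non-metal.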

This gives us a final G-buffer layout that looks like the following:

  • rgb10a2: Light accumulation
  • r32f: Depth
  • rgba8: Normals & roughness
  • rgba8: (Albedo or reflectance) & metallicness

This fits nicely into the 128 bits of on-chip memory we have available per pixel for our PowerVR Series6 target device. (Note that 256 bits are available in PowerVR Series6XT and later GPUs.)
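The bit budget can be checked with a few lines of arithmetic. The sketch below sums the final layout and also shows one plausible way of packing a normal and a roughness value into a single rgba8 word; the packing scheme and helper names are assumptions for illustration, not the demo's actual encoding.

```python
ON_CHIP_BITS_PER_PIXEL = 128  # Series6 figure quoted above

GBUFFER_LAYOUT = {
    "light accumulation (rgb10a2)": 32,
    "depth (r32f)": 32,
    "normals & roughness (rgba8)": 32,
    "albedo or reflectance & metallicness (rgba8)": 32,
}


def bits_per_pixel(layout):
    """Total G-buffer footprint per pixel, in bits."""
    return sum(layout.values())


def pack_rgba8(r, g, b, a):
    """Pack four [0, 1] floats into one 32-bit word, 8 bits per channel."""
    def to_byte(x):
        return int(min(max(x, 0.0), 1.0) * 255.0 + 0.5)
    return to_byte(r) | (to_byte(g) << 8) | (to_byte(b) << 16) | (to_byte(a) << 24)


def encode_normal_roughness(normal, roughness):
    """Remap a unit normal from [-1, 1] to [0, 1] and append roughness."""
    nx, ny, nz = normal
    return pack_rgba8(nx * 0.5 + 0.5, ny * 0.5 + 0.5, nz * 0.5 + 0.5, roughness)


if __name__ == "__main__":
    assert bits_per_pixel(GBUFFER_LAYOUT) <= ON_CHIP_BITS_PER_PIXEL
    print(hex(encode_normal_roughness((0.0, 0.0, 1.0), 1.0)))
```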

Post processing

Because we have the light accumulation information in HDR format, we can employ many different post processing effects as a last step before drawing to the back buffer. In the demo we have used: bloom, tone mapping, lens flare and a film grain effect. Many physically based renderers use post processing to tune the final image to an artist’s taste.
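As an example of what the tone mapping step does, here is a minimal sketch using the simple Reinhard operator. The post does not say which operator the demo uses, so the operator and the exposure parameter are assumptions chosen for illustration.

```python
def reinhard(hdr_value):
    """Compress an HDR value in [0, inf) into [0, 1)."""
    return hdr_value / (1.0 + hdr_value)


def tone_map_pixel(rgb, exposure=1.0):
    """Apply exposure, then tone map each channel of an HDR pixel."""
    return tuple(reinhard(max(c, 0.0) * exposure) for c in rgb)


if __name__ == "__main__":
    # A very bright HDR pixel still maps to a displayable value below 1.0.
    print(tone_map_pixel((0.5, 4.0, 1000.0)))
```

The result is still linear light, so it needs gamma encoding before it is written to the back buffer.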


In a traditional deferred rendering system, the GPU reads the G-buffer in from main memory and then writes the light accumulation out to another render target in main memory. This works well for desktop GPUs, which have many watts of power available and large, fast, dedicated but power hungry DDR RAM.

For power constrained devices, however, this becomes a problem: a mobile device may only have access to slow system memory for power reasons. We therefore want to avoid using main memory bandwidth as much as possible.
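A back-of-the-envelope estimate gives a feel for the bandwidth involved. The sketch below assumes the G-buffer is written to main memory once and read back once per frame, ignoring overdraw and multiple lighting passes; the resolution and frame rate are hypothetical and not measurements from the demo.

```python
def gbuffer_traffic_bytes_per_second(width, height, bits_per_pixel, frames_per_second):
    """Main memory traffic for one G-buffer write plus one read per frame."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return 2 * bytes_per_frame * frames_per_second


if __name__ == "__main__":
    # Hypothetical 1080p target at 60 frames per second with a 128-bit G-buffer.
    traffic = gbuffer_traffic_bytes_per_second(1920, 1080, 128, 60)
    print(f"{traffic / 1e9:.2f} GB/s")
```

Under these assumptions the G-buffer alone accounts for roughly 4 GB/s of main memory traffic, all of which disappears if the data never leaves on-chip memory.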

PowerVR GPUs have a small amount of dedicated, fast local memory because they work in tiles. Either 128 or 256 bits per pixel are available, depending on the hardware. We can utilise this to avoid using main memory bandwidth in a deferred renderer. Some extensions that allow us to do this are:

Framebuffer fetch

The framebuffer fetch extension allows us to read back a fragment’s value that was written by a previous shader. This all happens while the shaders and triangles are being processed inside a tile, so the reads and writes go to local memory and are faster than using system memory. There is a good explanation of this extension here.

Pixel local storage (PLS)

The pixel local storage extension is much like the framebuffer fetch extension, but allows us to specify the format of the intermediate variables stored in local memory. This means fewer format conversions need to happen. We used the following layout in the demo:

// Members of the pixel local storage block: 4 x 32 bits = 128 bits per pixel.
layout(rgb10_a2) highp vec4 lightAccumulation_padding;
layout(r32f) highp float depth;
layout(rgba8) highp vec4 normals_roughness;
layout(rgba8) highp vec4 albedoOrReflectance_metallicness;


Pixel local storage 2

The pixel local storage 2 extension allows us to write to the back buffer instead of to an intermediate buffer. We therefore do not have to render to an intermediate render target living in system memory and then blit it to the back buffer; we can write directly to the back buffer instead, which saves further main memory bandwidth.


An analysis of our demo using the PVRTune tool shows that the optimised renderers (Framebuffer Fetch and PLS) execute fewer rendering tasks.

This means that the G-buffer generation, lighting and tonemapping stages are properly merged into one task. It also shows a clear reduction in bandwidth between on-chip and main memory: a 53% decrease in reads and a 54% decrease in writes.

All these optimisations result in a slightly lower frame time but much lower power consumption, as shown below. This means longer battery life on mobile devices.

Efficiency in total system power

Future work


Vulkan

Vulkan allows us to specify the load and store operations explicitly. We have seen very good CPU efficiency results when we ran our Gnome Horde demo using Vulkan and OpenGL ES. This is good news for mobile devices: less CPU usage means less power, which means longer battery life and more features. I will be looking at combining PLS and Vulkan in a later article.

Ray tracing

Image based lighting is an approximation. If a light moves in the scene, the offline render of the scene becomes incorrect, and recomputing it is costly. If we used ray tracing to generate these IBL probes, we could have completely dynamic IBL at a fraction of the cost of recomputing them using rasterisation.

Imagine the low CPU overhead of Vulkan, issuing commands to a power efficient PowerVR GPU using ray tracing to achieve fully dynamic lighting and shadows!

This is something we are looking at right now and I hope to give you more information soon.

4 thoughts on “Dwarf Hall: physically based rendering on a PowerVR GPU”

  1. Is this demo running on the same specs as the last Dwarf Hall demo? I believe it was an 8-cluster 7XT with an unspecified clock speed?

      • Any chance that you could tell us its clock rate so we can gauge its peak GFLOPS? I’m curious how it compares to a theoretical 1GHz 16-cluster 7XT monster to get a sense of how much improvement can be made over the visuals in this demo.

