Flow Control on PowerVR – Optimising Shaders


We’re back again with another excerpt from our new documentation website. Today, we’ll be looking at flow control and branching in shaders. This series has already covered a range of topics from mipmapping to balancing GPU workloads and we’re really glad you’ve been enjoying these little titbits so far. Judging by the traffic stats, you’re taking the time to have a read of the site afterwards, so that’s fantastic to see.

For those of you who are unaware, docs.imgtec.com is packed with plenty of information for graphics developers, both new and more experienced.

Those of you who are developing for PowerVR hardware are obviously going to get the most out of this site but everybody is sure to find something to interest them, so why not take a look?

With that preamble out of the way, let’s take a quick look at another one of our PowerVR performance recommendations: flow control in shaders.


So we’ll start with the good news: PowerVR hardware supports flow control in both vertex and fragment shaders by default, i.e. without having to explicitly enable any extensions.

That’s great, but what is flow control?

Well, flow control is simply controlling the execution path in a shader through branching or looping, using statements like if, else, for, and so on. This can often lead to multiple branching paths within a shader, which are executed based on some kind of condition. Flow control is a very basic concept when programming for a CPU, but it's slightly more complicated in shaders because of the highly parallelised nature of GPUs.

An example of this branching can be seen in the fragment shader of one of our PowerVR SDK examples, GaussianBlur.

mediump float imageCoordX = gl_FragCoord.x - 0.5;    // pixel column (gl_FragCoord is at pixel centres)
mediump float windowWidth = config.x;                // window width in pixels
mediump float xPosition = imageCoordX / windowWidth; // normalised horizontal position
mediump vec3 col = vec3(0.0);
if (xPosition < 0.5)
{
    // Left half: the original, unblurred image
    col = texture(sOriginalTexture, vTexCoords[NumGaussianWeightsAndOffsets]).rgb;
}
else if (xPosition > 0.497 && xPosition < 0.503)
{
    // Thin white divider (only its right half is reached, as xPosition < 0.5 is handled above)
    col = vec3(1.0);
}
else
{
    // Right half: symmetric 10-tap Gaussian blur
    col = Weights[0] * texture(sTexture, vTexCoords[0]).rgb +
          Weights[1] * texture(sTexture, vTexCoords[1]).rgb +
          Weights[2] * texture(sTexture, vTexCoords[2]).rgb +
          Weights[3] * texture(sTexture, vTexCoords[3]).rgb +
          Weights[4] * texture(sTexture, vTexCoords[4]).rgb +
          Weights[4] * texture(sTexture, vTexCoords[5]).rgb +
          Weights[3] * texture(sTexture, vTexCoords[6]).rgb +
          Weights[2] * texture(sTexture, vTexCoords[7]).rgb +
          Weights[1] * texture(sTexture, vTexCoords[8]).rgb +
          Weights[0] * texture(sTexture, vTexCoords[9]).rgb;
}
oColor = vec4(col, 1.0);

In the example above, the execution path depends on the value of xPosition, which measures the normalised horizontal position across the screen. Conditional branching is used to show the original image on the left half of the screen, draw a thin white dividing line, and apply the blur to the second half of the image. The result of this branching can be seen clearly in the screenshot of this example below.

Screenshot demonstrating how branching is used in a PowerVR SDK example, GaussianBlur

In general, when we’re talking about flow control in shaders, we’re usually referring to one of two things:

Static Flow Control

Here, a shader has two or more branching paths in code, which are selected depending on the value of some uniform variable. A uniform has the same value for every vertex and fragment in a draw call, so the same shader path is executed for all vertices/fragments in that draw call.

Static flow control is sometimes used to combine many smaller shaders into one large shader (an uber shader!), with the path to execute selected at runtime. However, a better solution is often to use preprocessor directives to generate multiple shaders from the uber shader at compile time. This means you can still create many shaders from a single source file, without paying for the branches at runtime.

Dynamic Flow Control

This one’s a bit more tricky. Again, a shader with dynamic flow control has multiple branching execution paths but this time the condition which controls the branching changes on a per-vertex or per-fragment basis, often based on texture or vertex attributes.  This means the shader could potentially have to execute different paths from one vertex or fragment to the next.

So why is this a problem? Well, a graphics core uses a single instruction, multiple data (SIMD) architecture, which means all processors in the core must execute the same instruction at the same time. When a graphics core executes a group of shader invocations (for example, when a fragment shader processes a set of fragments), all of the invocations in that group must follow the same path. If they disagree on a branch condition, the group executes both paths, and each invocation sits idle, with its results masked off, while the path it did not take is running. This means that during branching the processors spend time executing instructions they don't actually need, and because the cost depends on how the condition varies within each group, dynamic flow control has a much less predictable effect on performance than static flow control.

Recommendations for PowerVR GPUs

So now we’ve covered a little about flow control, here are a few of our recommendations:

Avoid using discard in conditional branches

It is usually best to avoid branching to discard when developing for PowerVR devices. Using discard in the fragment shader negates some of the key benefits of PowerVR’s TBDR architecture.

This mainly affects hidden surface removal (HSR), as this operation assumes all of the fragments of an opaque object are going to be drawn, occluding anything behind them. If fragments can potentially be discarded in the fragment shader the hardware can no longer assume this, meaning the GPU has to wait until the fragment shader has finished before determining which fragments are visible. This invalidates the “deferred” part of the tile-based deferred rendering (TBDR) and can reduce the performance of an application on PowerVR.

Our advice is to use alpha blending instead.

Avoid sampling textures in conditional branches (PowerVR Series5 and Series5XT only)

When developing for PowerVR Series5 and Series5XT, avoid branching to a texture read, as using a sampler in a dynamic branch qualifies as a dependent texture read.

A dependent texture read occurs when the coordinates used to sample the texture depend on some calculation in the shader, rather than coming directly from a varying. In a normal texture read, the hardware can fetch texture data before the fragment shader starts, reducing sampling latency. In a dependent texture read, the texture coordinates can't be predicted ahead of time, so the texture data can't be pre-fetched, leading to greater latency and stalls. This can have a very noticeable effect on performance.

From PowerVR Series6 onwards, dependent texture reads are much more efficient, so this is less important on these architectures, but every little performance boost helps when you're trying to optimise your application.

Try to use branching to skip unnecessary operations

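Finally, it is a good idea to use conditional branching to skip unnecessary operations. This has the greatest impact on performance when the condition is met in a significant number of cases, and when neighbouring vertices or fragments tend to take the same path, so that whole groups of invocations can skip the work together.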

Optimising shaders for OpenGL ES 3.0

If you’re using OpenGL ES 3.0 and want to optimise any branching in your shaders, it might be worth taking a look at the extension GL_EXT_shader_group_vote.

To illustrate how this extension works, consider some basic branching like this:

if (condition)
    result = do_fast_path();
else
    result = do_general_path();

As mentioned before, sets of shader invocations in a graphics core must all execute the same code path. In the example above, if the condition is true for even a single invocation in the group, do_fast_path() is executed for that invocation while the rest of the invocations sit dormant, waiting for do_fast_path() to return. Once it returns, the remaining invocations can call do_general_path().

This is a bit of a pain because the shader is wasting resources by executing both the fast and the general path. Instead, we can modify the above code using the new built-in functions from this extension:

if (allInvocationsEXT(condition))
    result = do_fast_path();
else
    result = do_general_path();

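The function allInvocationsEXT() only returns true if the given condition is met across the entire set of invocations. Crucially, it returns the same value to every invocation in the group, so the group executes either do_fast_path() or do_general_path(), but never both. This relies on do_general_path() producing correct results for every invocation, including those for which condition is true, since they take the general path whenever any other invocation in the group does not meet the condition.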

GL_EXT_shader_group_vote also provides two other built-in functions that, like allInvocationsEXT(), return the same value to all invocations in the same group.

These are:

  • anyInvocationEXT(bool value) – This returns true if value is true for at least one of the invocations in the group.
  • allInvocationsEqualEXT(bool value) – This returns true if value is the same for all invocations in the group.

And finally…

For more PowerVR performance recommendations, and other useful developer information, take a look at our regularly-updated website at docs.imgtec.com.

Do feel free to leave feedback through our usual forums or ticketing systems.

You can also follow @tom_devtech on Twitter for developer technology-related news, or @powervrinsider for the latest on PowerVR!

Benjamin Anuworakarn


Ben Anuworakarn is a technical author in the PowerVR Developer Technology team and has a computer science engineering background. Primarily responsible for producing and maintaining both internal and external documentation, he has a knack for coming up with solutions to problems that don't exist yet. You can find him either clacking away at his keyboard or shuffling trading cards every few hours.

2 thoughts on “Flow Control on PowerVR – Optimising Shaders”

  1. Is it possible to avoid using group_vote but still skip a branch if all the invocations/threads in a group/wavefront/warp happen to have the same dynamic (not uniform) condition?

    For example, there can be an EXEC thread mask and an EXECZ (indicating EXEC bits are all 0), s_cbranch_execz will skip instructions of a branch based on EXECZ.

  2. “Avoid using discard in conditional branches”

    I think the original advice is to avoid “discard” as much as possible; in any case, if you use “discard”, it is always in a branch.

