Understanding OpenGL ES: Multi-thread and multi-window rendering

As the CPUs and GPUs in mobile devices have become more powerful and devices with one or more high-resolution screens have become ubiquitous, the demand for complex interactions with the graphics driver has increased. In this blog post, I’ll discuss what multi-thread and multi-window rendering mean for developers, and when (if at all) these techniques should be used in your apps.

What is multi-thread rendering?

Traditionally, OpenGL ES applications only render to one surface from one thread. However, as the complexity of 3D rendering engines has increased, the CPU overhead of graphics API operations has become a common bottleneck – particularly when loading assets. This is where multi-threaded rendering becomes interesting.

A rendering thread is one CPU thread associated with one graphics context. By default, each graphics context will not be able to access the resources (textures, shaders and vertex buffers) of another context. For this reason, shared contexts are used so one or more background loading threads can access the resources of a primary thread. There are two reasons why this rendering model is extremely useful:

  1. The primary thread won’t block
    By their nature, graphics API calls that upload data have to block until the transfer between application and driver memory has completed. Additionally, shader compilation is a blocking operation in many graphics drivers. This blocking introduces a costly overhead that results in the GPU being starved of work. By moving all upload operations to a background thread, the primary thread can maintain a consistent frame rate.
  2. Parallel work distribution on multi-core CPUs
    As the graphics driver runs on the CPU, splitting the work across multiple rendering threads enables the OS to schedule it on multiple CPU cores in parallel. This results in the driver’s workload being processed faster than a single rendering thread would be capable of.

When should I use multi-threaded rendering?

[Figure: OpenGL ES Data Upload, Unoptimized and Optimized]

Multi-threaded rendering is best suited to applications that are CPU limited when compiling shaders or uploading data to the graphics driver. Multi-threaded rendering (when done sensibly) enables better distribution of driver work and allows applications to maintain consistent frame rates.

In the simple example above, the transition from Level 1 to Level 2 in a game requires additional textures, VBOs and shader programs to be uploaded. Assuming a seamless transition between the two is required (i.e. splash screens, videos etc. can’t be used to hide the upload cost), the game must upload the new resources to the driver while Level 1 is still being rendered.

In the unoptimized case, the time spent issuing calls to the driver each frame is erratic due to the additional overhead of upload and compilation operations. The increased frame submission time may cause V-syncs to be missed, which makes the game feel jerky as the frame rate stutters.
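To make the effect of erratic submission time concrete, the sketch below counts how many frames overrun one refresh interval. It is a simplification that looks only at CPU submission time and ignores GPU time. The 60 Hz refresh rate and the per-frame timings are made-up illustrative numbers rather than measurements, and `count_missed_vsyncs` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the frames whose CPU submission time exceeds one refresh interval.
 * Simplification: a frame that overruns the interval is treated as having
 * missed the next V-sync. */
size_t count_missed_vsyncs(const double *frame_ms, size_t n, double refresh_hz)
{
    assert(refresh_hz > 0.0);
    const double budget_ms = 1000.0 / refresh_hz; /* about 16.67 ms at 60 Hz */
    size_t missed = 0;
    for (size_t i = 0; i < n; ++i) {
        if (frame_ms[i] > budget_ms) {
            ++missed;
        }
    }
    return missed;
}

/* Hypothetical timings (ms): uploads on the render thread cause spikes. */
const double unoptimized_ms[] = { 9.0, 10.0, 31.0, 9.5, 24.0, 10.0 };
/* The same frames with the uploads moved to a background thread. */
const double optimized_ms[]   = { 9.0, 10.0, 10.5, 9.5, 10.0, 10.0 };
```

With these example numbers, two of the six unoptimized frames exceed the budget and none of the optimized frames do.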

In the optimized case, a secondary thread is used to upload resources in the background. This allows the primary thread to retain a consistent call submission time and, in turn, a smooth frame rate.
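A minimal sketch of that hand-off, assuming POSIX threads and C11 atomics. No real driver calls are made: `Asset`, `loader_main`, `texture_for_frame` and `render_until_loaded` are hypothetical names, and the "upload" is a plain assignment standing in for the upload calls a real loader would issue on its own shared context.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a GPU resource; a real engine would store GL object names. */
typedef struct {
    unsigned texture_id; /* hypothetical handle written by the loader */
    atomic_bool ready;   /* set once the upload has finished */
} Asset;

/* Loader thread. In a real app this thread would bind its own context
 * (created as shared with the render context) once at start-up, upload the
 * data (e.g. glTexImage2D), and make sure the upload has completed (e.g.
 * glFinish(), or a fence sync object on OpenGL ES 3.0) before publishing
 * the handle. Here the "upload" is just an assignment. */
void *loader_main(void *arg)
{
    Asset *asset = arg;
    asset->texture_id = 42u; /* placeholder for the real upload */
    atomic_store_explicit(&asset->ready, true, memory_order_release);
    return NULL;
}

/* Render thread, once per frame: never waits for the loader. Returns the
 * handle to draw with, falling back to a placeholder while loading. */
unsigned texture_for_frame(Asset *asset, unsigned fallback_id)
{
    if (atomic_load_explicit(&asset->ready, memory_order_acquire)) {
        return asset->texture_id;
    }
    return fallback_id;
}

/* Starts the loader, "renders" frames until the asset appears, then joins.
 * Returns the texture handle that ended up being used (0 on thread error). */
unsigned render_until_loaded(void)
{
    Asset asset;
    asset.texture_id = 0u;
    atomic_init(&asset.ready, false);

    pthread_t loader;
    if (pthread_create(&loader, NULL, loader_main, &asset) != 0) {
        return 0u;
    }

    unsigned bound = 0u; /* 0 = placeholder texture */
    while (bound == 0u) {
        /* One iteration per frame; a real loop would draw and swap here. */
        bound = texture_for_frame(&asset, 0u);
    }
    pthread_join(loader, NULL);
    return bound;
}
```

The design point is that the render thread only polls a flag each frame, so its submission time does not depend on how long the upload takes.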

Best practices

For the best possible performance, rendering threads should be created at startup. A primary thread should be used for all rendering. Additional threads (each created with a shared context) should only be used for shader compilation and buffer data upload. The number of background threads should be kept to a minimum (e.g. one thread per CPU core). Creating threads in excess will lead to unmaintainable, hard-to-debug code.

Calls to eglMakeCurrent() should be kept to a minimum due to its cost (the EGL specification states that all outstanding operations must be flushed before the context is bound to a new thread).

When shouldn’t I use multi-threaded rendering?

When you’re not CPU limited or load times are not a concern

If you’re not CPU limited by the graphics driver, you should avoid multi-threaded rendering. It will increase the complexity of your rendering engine and may even reduce performance if it’s implemented badly.

When trying to “simplify” your rendering engine

The worst use case is to frequently bind a single graphics context to different threads (using eglMakeCurrent()). This is bad for two reasons:

  1. The cost of context binding
    As discussed above, calling eglMakeCurrent() forces the driver to kick all outstanding operations.
  2. API calls are serialized
    As a graphics context can only be bound to a single CPU thread at any point in time, all API calls will be submitted serially.

So the API calls cost the same as in a single-threaded renderer (as API call submission is serialized), but there is the additional overhead of each context switch, which means that performance will be worse than that of a single-threaded renderer 😐
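The argument can be written as a toy cost model. This is a sketch only: the per-call and per-rebind costs are hypothetical inputs, not measured driver figures, and both function names are invented for illustration.

```c
#include <assert.h>

/* CPU time (microseconds) to submit `calls` API calls from one thread. */
double single_thread_submit_us(unsigned calls, double per_call_us)
{
    assert(per_call_us >= 0.0);
    return calls * per_call_us;
}

/* The same calls issued from several threads that pass ONE context around.
 * Submission is still serial, so the call cost is unchanged; every rebind
 * (eglMakeCurrent on another thread) adds its own cost on top. */
double shared_context_submit_us(unsigned calls, double per_call_us,
                                unsigned rebinds, double rebind_us)
{
    assert(rebind_us >= 0.0);
    return single_thread_submit_us(calls, per_call_us) + rebinds * rebind_us;
}
```

Whatever numbers are plugged in, the second figure can never be lower than the first, which is why passing one context between threads cannot beat a single-threaded renderer.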

It may seem like a good design decision, but rendering in this way always results in complex, messy code, where submission order is very difficult to understand (and even more difficult to debug!).

Don’t do this!

What is multi-window rendering?

Multi-window rendering is when an application renders into more than one window surface. These surfaces are then composed by the OS’s window compositor (for example, SurfaceFlinger on Android or X11 on many Linux distros) into a single surface that can be presented to the device’s screen.

In a multi-window application, there is a one-to-one mapping of CPU threads to graphics contexts. Each graphics context is used to render into its own window surface.

When should I use multi-window rendering?

Multi-window rendering is best suited to use cases where an application needs to render to more than one screen, for example when a TV is used as a second screen.

When shouldn’t I use multi-window rendering?

To compose layers

[Figure: Layer Composition, Unoptimized and Optimized]

In the unoptimized example above, the game scene, touch controls and mini map are rendered to individual surfaces. The application then relies on the OS’s compositor to combine them into a surface that can be displayed. This approach is wasteful: memory has to be allocated for several surfaces, the compositor has to process transparent pixels, and the GPU’s Hidden Surface Removal (HSR) isn’t being used to its full potential (i.e. fragments that are occluded by opaque UI elements will be redundantly coloured).
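As a rough illustration of the memory cost alone, the sketch below estimates the colour-buffer footprint of a window surface. The resolutions, the 4-byte RGBA8888 format and the double buffering are assumptions chosen for the example, and `surface_bytes` is a hypothetical helper; real allocations also depend on driver padding and any depth or stencil attachments.

```c
#include <assert.h>
#include <stddef.h>

/* Colour-buffer memory for one window surface, ignoring driver padding. */
size_t surface_bytes(size_t width, size_t height,
                     size_t bytes_per_pixel, size_t buffer_count)
{
    assert(buffer_count > 0);
    return width * height * bytes_per_pixel * buffer_count;
}

/* Assumed layout: full-screen scene, full-screen controls overlay and a
 * 512x512 mini map, each RGBA8888 and double-buffered. */
size_t layered_total_bytes(void)
{
    return surface_bytes(1920, 1080, 4, 2)  /* game scene     */
         + surface_bytes(1920, 1080, 4, 2)  /* touch controls */
         + surface_bytes(512, 512, 4, 2);   /* mini map       */
}
```

Under these assumptions a single 1920x1080 surface needs 16,588,800 bytes (about 15.8 MiB), while the three-surface layout needs 35,274,752 bytes, more than double.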

In the optimized case, the game scene is rendered first, and then the touch controls and mini map are rendered directly into the same surface. In cases where this approach isn’t suitable, FBOs can be used to perform the composition within the application. For example, the game scene could be rendered to a lower-resolution FBO and blitted into the app’s window surface, and the UI elements could then be drawn on top at the native resolution (this technique is commonly used to increase the per-pixel performance when rendering game scenes).
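To give a feel for the saving from a lower-resolution scene FBO, the sketch below compares fragment counts. The 1280x720 FBO and 1920x1080 window are assumed sizes, and `scene_fragment_percent` is a hypothetical helper; the cost of the upscaling blit itself is not modelled.

```c
#include <assert.h>

/* Percentage (rounded down) of window-resolution fragments that the scene
 * pass shades when it renders into a smaller FBO instead of the window. */
unsigned scene_fragment_percent(unsigned fbo_w, unsigned fbo_h,
                                unsigned win_w, unsigned win_h)
{
    assert(win_w > 0 && win_h > 0);
    unsigned long long fbo_px = (unsigned long long)fbo_w * fbo_h;
    unsigned long long win_px = (unsigned long long)win_w * win_h;
    return (unsigned)(fbo_px * 100ULL / win_px);
}
```

For the assumed sizes, the scene pass shades about 44% of the fragments it would shade at native resolution, while the UI drawn afterwards stays sharp.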

Multi-thread, multi-window support in PVRTrace

As of our PowerVR Graphics 3.5 SDK, PVRTrace (our OpenGL ES capture and analysis tool) supports applications that rely on these complex graphics driver interactions. This includes per-thread render state inspectors, per-thread filtering in the Call View and Frame Selector, and a thread usage timeline graph. The combination of all of these features makes multi-threaded OpenGL ES much easier for you (and us!) to debug. Additionally, the multi-thread support in our PVRVFrame OpenGL ES emulator has been significantly improved.


Multi-thread, multi-window rendering makes it very easy to shoot yourself in the foot by creating complex, hard to debug rendering engines. However, it also provides a lot of power and flexibility when used correctly. If you stick to the guidelines outlined in this post, you can improve the performance of your resource loading without introducing unnecessary headaches.

Joe Davis

Joe Davis leads the PowerVR Graphics developer support team. He and his team support a wide variety of graphics developers including those writing games, middleware, UIs, navigation systems, operating systems and web browsers. Joe regularly attends and presents at developer conferences to help graphics developers get the most out of PowerVR GPUs. You can follow him on Twitter @joedavisdev.

6 thoughts on “Understanding OpenGL ES: Multi-thread and multi-window rendering”

  1. I was able to do multi-window shared contexts for screen casting to a video encoder’s input surface. Excellent result.
    I had a go at shared context for loading textures. As I understand it, since this does not involve a call to swapBuffers, no call to makeCurrent is required. So the loader thread’s texture and buffer handles are shared to the render context as they are bound. No context switching involved, the essence of a sharing. Awesome.

  2. At least “When should I use multi-threaded rendering?” section shows obvious issues in the OpenGL driver implementation. The driver should be able to launch a separate thread per compiler invocations.
    Also DMA controllers can live their lifes on they own, so the driver should not wait on transfers.
    But then, that sounds like working workarounds for implementation defects.

    • “But then, that sounds like working workarounds for implementation defects”
      Now that devices with multi-core CPUs are common place, this optimization is being considered for future drivers.
      “Also DMA controllers can live their lifes on they own, so the driver should not wait on transfers.”
      The efficiency of the texture transfer queue has been improved in Series6 and beyond. The recommendation in this post is still valid though as the blocking operation of copying data from application to driver memory can be moved to another thread.

  3. good article. I already used threaded rendering for loading assets progressively on iOS. iOS has a great API with a good documentation for that.

