Debugging OpenCL programs with Oclgrind


James Price is currently completing a PhD degree at the Department of Computer Science, University of Bristol.

When developing programs that utilise GPU compute via OpenCL, we can’t use our traditional CPU development tools. This can make debugging complex OpenCL kernels challenging. As part of my PhD, funded by Imagination Technologies, I’ve developed an OpenCL device simulator called Oclgrind, which enables a raft of tools that can make debugging OpenCL kernels a much simpler task.


At its core, Oclgrind simulates how an OpenCL device executes a kernel. It does this in a manner independent of any specific architecture, which enables it to uncover many portability issues that arise during OpenCL development. By exposing a simple plugin interface, Oclgrind enables the creation of a wide variety of tools for analysing and debugging OpenCL kernels.

Using Oclgrind is straightforward, as it implements the full OpenCL 1.2 runtime API. This means that existing OpenCL programs can be simulated without requiring modifications. An alternative interface for simulating specific kernels in isolation is also provided.
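For example, an existing host program can be simulated simply by prefixing its usual command line with the Oclgrind launcher (the application name here is hypothetical; the launcher script is named oclgrind in the binary distributions):

```
oclgrind ./myapplication
```

Oclgrind then appears to the program as an ordinary OpenCL platform, so no recompilation is needed.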

Detecting invalid memory accesses

Accessing memory locations that have not been properly allocated is a common problem that can cause headaches for developers using GPU compute. Behaviour varies between platforms, but many GPUs cannot detect these errors in hardware, and so produce no meaningful feedback about what went wrong.

Oclgrind provides a simple plugin that checks every memory access your OpenCL kernels perform, ensuring they don’t touch memory they shouldn’t, and produces clear diagnostics to help the developer pinpoint the problem when something does go wrong. Take the following kernel as an example – a trivial parallel vector addition.

kernel void vecadd(global float *a, global float *b, global float *c)
{
  int i = get_global_id(0);
  c[i] = a[i] + b[i];
}

Running this kernel through Oclgrind with a global work size of 1024 but using buffers that are only 4000 bytes in size produces the following error message:

Invalid write of size 4 at global memory address 0x3000000000fa0
  Kernel: vecadd
  Entity: Global(1000,0,0) Local(8,0,0) Group(31,0,0)
    store float %7, float addrspace(1)* %8, align 4, !dbg !21
At line 8 of
  c[i] = a[i] + b[i];

This shows us that we were trying to write 4 bytes (one 32-bit float) to global memory. It gives us the specific kernel and line of code, and also tells us which work-item was responsible. In this case, it’s clear that we need to guard these memory accesses by checking that the index is within the bounds of our problem.

Detecting race conditions

Another common bug in parallel programs arises when memory is accessed concurrently from multiple units of execution without sufficient synchronisation, resulting in a race condition. These defects can be difficult to spot in complex programs, and might only cause problems on certain devices.

Consider the simple kernel depicted below. Here we have a shared variable named ‘temp’ which is allocated in the local address space, and is therefore shared between work-items in the same work-group. In this kernel, the work-item with a local ID of 1 assigns a value to this variable, and then the work-item with local ID 0 tries to read and use the variable.

kernel void racy(global float *input, global float *output)
{
  local float temp;
  int lid = get_local_id(0);

  if (lid == 1)
    temp = input[lid];
  if (lid == 0)
    output[0] = temp;
}

We can enable the data-race detection plugin in Oclgrind by passing the --data-races flag. If we simulate the above kernel, Oclgrind will report the potential race with a message like this:

Read-write data race at local memory address 0x1000000000000
  Kernel: racy
  First entity: Global(1,0,0) Local(1,0,0) Group(0,0,0)
    store float %5, float addrspace(3)* @racy.temp, align 4, !dbg !23
  At line 7 of
    temp = input[lid];
  Second entity: Global(0,0,0) Local(0,0,0) Group(0,0,0)
    %7 = load float, float addrspace(3)* @racy.temp, align 4, !dbg !26
  At line 10 of
    output[0] = temp;

This tells us that we have a data-race on a local memory address, between two work-items with IDs 0 and 1. It also shows the lines of code that were responsible for the problem. Adding a barrier(CLK_LOCAL_MEM_FENCE) between these two statements and then running the code again gives a clean bill of health.

Interactive source-line debugging

Sometimes, getting error messages after the fact is not enough to get to the root of a problem, and you need to delve into a kernel to find out why it is misbehaving. Oclgrind provides a plugin that lets a developer interactively step through kernel execution in a GDB-like command-line environment. The plugin is enabled by passing the --interactive flag when launching Oclgrind. When kernel execution begins, we are presented with a prompt at which we can enter a variety of commands. Much like GDB, we can step through the kernel line by line, print the values of variables or memory addresses, or instruct the plugin to break at a particular line of code (type help for a full list of commands). This plugin also interacts with the other plugins, and will automatically break into the prompt when an error is encountered.
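An illustrative session might look like the following, using only the commands mentioned above (the exact prompt and output formatting may differ between Oclgrind versions):

```
$ oclgrind --interactive ./myapplication
(oclgrind) break 7
(oclgrind) step
(oclgrind) print i
(oclgrind) help
```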

Checking for runtime API errors

Writing OpenCL programs isn’t just about developing kernels: there are plenty of issues that can arise when using the various runtime API functions from the host. The error codes that these API calls return can be confusing, or can indicate several possible causes. By passing the --check-api flag to Oclgrind, we can get human-readable error messages that describe the exact problem an API call is reporting:

Oclgrind - OpenCL runtime error detected
  Function: clEnqueueNDRangeKernel
  local_work_size[0]=32 does not divide global_work_size[0]=1000

Additional information

Oclgrind is an open source tool, and is provided under a BSD license. Binaries for Windows, Linux and OS X are available.

More information about using Oclgrind can be found on its Wiki.
