Instagram-like computational photography with PowerVR SGX GPUs

Mobile phones have been visually-oriented devices since the appearance of the first camera phones, and today user-produced content is a driving force behind everything from network utilisation to app creation. More recently, developers have begun to use sophisticated manipulation of the image data to create a wide range of new user experiences, from intelligent vision systems through augmented reality apps to avatar creation, and many others. This burst of creativity is limited only by the amount of processing which can be performed on the handset, and therein lies the challenge.

Computational photography: one of the main applications for mobile compute

Imaging workloads have long been the domain of the CPU, which works well enough for still images and small format video, but with HD now standard and 4K arriving very shortly, the ability to run interesting apps on the CPU has become severely limited by the thermal envelope of the device – simply adding more cores and running at higher frequencies does no good if the transistors overheat and shut down. This is a problem crying out for a solution.

Fortunately a solution is here: all modern application processors contain many compute engines such as a GPU, DSP and ISP (image signal processor) which can efficiently perform the required imaging tasks. As a result, developers are turning to heterogeneous computing, which is the art of combining these blocks to provide high performance within restricted power and thermal budgets.

Figure: The PowerVR SGX GPU running a saturation filter on Full HD (1080p) real-time video at 70 fps (Exynos 5410, PowerVR SGX544MP3)

PowerVR GPUs can be found in many of the world’s most iconic mobile products, delivering world-class 3D graphics at low power. Thanks to an extremely flexible architecture, devices incorporating our industry-leading PowerVR GPUs can be deployed as efficient compute engines to accelerate image and video processing tasks. Apps can delegate imaging kernels to the GPU using the OpenCL EP (Embedded Profile) API, which provides the optimal precision for visual computation, while keeping power consumption to a minimum.

This leaves the issue of how to share the image data between the various compute engines without redundant operations. The rest of this blog is devoted to describing how that can be done using tools developed by Imagination which were recently released to developers.

The challenge with sharing – ensuring zero copy

Apps designed to take advantage of heterogeneous computing require efficient interoperability between the APIs used to program the different compute engines. In a teleconferencing app that performs real-time airbrushing, for example, input frames captured by the camera might first be inspected by the GPU to determine the position of a face and its individual features (e.g. eyes, lips and nose), passing these coordinates to the CPU to analyse. The CPU can then determine a set of image filters for the GPU to apply, such as removing blemishes or wrinkles, with the GPU generating a transformed frame for input to a video codec. In this scenario, up to four different system components (camera, GPU, CPU and video codec) each require access to the same image data in memory.

Until now, all OpenCL implementations on the market created a behind-the-scenes copy of the image data when transferring its ownership between the camera and GPU. This operation unnecessarily increases system memory traffic, reduces performance and consumes power, reducing (and in some cases eliminating) the benefit of offloading a task to the GPU. Imagination has been working with its lead partners over the last year to remove this barrier so that camera-based applications can use the GPU efficiently through OpenCL.
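To give a rough sense of scale, the sketch below estimates the memory traffic caused by one extra copy per frame. It assumes an NV21 frame (12 bits per pixel) and counts one read plus one write for every copied byte. The function names are illustrative, and the figures ignore caches, row padding and any further copies in the pipeline.

```c
#include <stdint.h>

/* Bytes in one NV21 frame: a full-resolution Y plane plus an interleaved
 * VU plane subsampled 2x2, i.e. 1.5 bytes per pixel.
 * Assumes even width and height and no row padding. */
uint64_t nv21_frame_bytes(uint32_t width, uint32_t height)
{
    uint64_t luma = (uint64_t)width * height;
    return luma + luma / 2;
}

/* Memory traffic in bytes per second caused by copying every frame once:
 * each byte is read from the source and written to the destination. */
uint64_t copy_traffic_bytes_per_sec(uint32_t width, uint32_t height, uint32_t fps)
{
    return 2 * nv21_frame_bytes(width, height) * fps;
}
```

For 1920x1080 at 30 fps this comes to 3,110,400 bytes per frame and roughly 187 MB/s of traffic that a zero-copy path avoids.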

How did we solve this challenge?

We developed a set of extensions that allow images to be shared between multiple components which share the same system memory. These extensions are based on Khronos EGL images, which provide an interface between multiple Khronos APIs and the native platform windowing system, and handle issues related to binding and synchronization.

Many cameras generate image data in YUV format, where the Y and UV data are stored in separate planes. The PowerVR SGX GPU can automatically, in its fixed function texturing hardware, perform colour space conversion from YUV (NV21) to RGB and each pixel can be processed by the GPU as a vector operation, enabling efficient operation on R, G and B pixel values in parallel. This leads to OpenCL kernels similar in structure to the example below. The input and output images are represented as OpenCL Image data types, with individual pixels processed as float4 data types that match the native SGX vector width.

/* Generic 3x3 linear filter kernel. */
__kernel void convolve3x3(__read_only  image2d_t  srcImage,
                          __write_only image2d_t  dstImage,
                          sampler_t               sampler,
                          __constant float        *kVals,
                          float                    normalizationVal)
{
        /* One work-item per output pixel. */
        int2 coords = (int2)(get_global_id(0), get_global_id(1));

        float4 colour;
        float4 colours[9];

        /* Read the 3x3 neighbourhood, row by row; the sampler decides
           what happens at the image border. */
        colours[0] = read_imagef(srcImage, sampler, coords + (int2)(-1, -1));
        colours[1] = read_imagef(srcImage, sampler, coords + (int2)( 0, -1));
        colours[2] = read_imagef(srcImage, sampler, coords + (int2)( 1, -1));
        colours[3] = read_imagef(srcImage, sampler, coords + (int2)(-1,  0));
        colours[4] = read_imagef(srcImage, sampler, coords + (int2)( 0,  0));
        colours[5] = read_imagef(srcImage, sampler, coords + (int2)( 1,  0));
        colours[6] = read_imagef(srcImage, sampler, coords + (int2)(-1,  1));
        colours[7] = read_imagef(srcImage, sampler, coords + (int2)( 0,  1));
        colours[8] = read_imagef(srcImage, sampler, coords + (int2)( 1,  1));

        /* Weighted sum; each float4 operation covers R, G, B and A at once. */
        colour  = colours[0] * kVals[0] + colours[1] * kVals[1] + colours[2] * kVals[2];
        colour += colours[3] * kVals[3] + colours[4] * kVals[4] + colours[5] * kVals[5];
        colour += colours[6] * kVals[6] + colours[7] * kVals[7] + colours[8] * kVals[8];

        colour /= normalizationVal;
        write_imagef(dstImage, coords, colour);
}
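When bringing up a kernel like this, it can help to have a plain C reference to compare the GPU output against. The sketch below is a CPU equivalent written for illustration, not part of Imagination's SDK. It works on a single float channel rather than float4 pixels, and it clamps coordinates at the image border, which corresponds to a sampler created with CLK_ADDRESS_CLAMP_TO_EDGE. The function names are assumptions.

```c
#include <stddef.h>

/* Clamp v into [lo, hi]; mirrors clamp-to-edge addressing. */
static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* CPU reference for a 3x3 linear filter on one float channel.
 * src and dst are width*height arrays in row-major order and must not overlap.
 * k_vals holds the 9 coefficients row by row, as in the kernel above. */
void convolve3x3_ref(const float *src, float *dst, int width, int height,
                     const float k_vals[9], float normalization_val)
{
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    int sx = clamp_int(x + dx, 0, width - 1);
                    int sy = clamp_int(y + dy, 0, height - 1);
                    sum += src[(size_t)sy * width + sx]
                         * k_vals[(dy + 1) * 3 + (dx + 1)];
                }
            }
            dst[(size_t)y * width + x] = sum / normalization_val;
        }
    }
}
```

A box blur uses nine coefficients of 1.0 with a normalization value of 9.0; applied to a constant image it should return the same constant, which makes a quick sanity check.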

An illustration of the complete system solution is shown below. We convert the camera YUV data to RGB using the Khronos extension OES_EGL_image_external, and then use PowerVR’s zero-copy functionality to directly access the camera texture data from OpenCL, without requiring any intermediate copy from OpenGL ES to OpenCL. The filtered output image is then inserted into another zero-copy texture which can be written to the screen in 2D or 3D using OpenGL ES.

Figure: Zero-copy sharing of data between multiple system components
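On SGX the YUV-to-RGB step is performed by the texturing hardware, so no application code is needed for it. For readers who want to see what that step computes, here is a minimal CPU sketch in C. It assumes the NV21 layout (a full-resolution Y plane followed by an interleaved V,U plane at half resolution in each direction) and full-range BT.601 coefficients. The coefficients actually applied depend on the camera and driver, and the function names are illustrative.

```c
#include <stdint.h>
#include <stddef.h>

/* Round and clamp a float to the 0..255 range of an 8-bit channel. */
static uint8_t to_u8(float v)
{
    if (v <= 0.0f) return 0;
    if (v >= 255.0f) return 255;
    return (uint8_t)(v + 0.5f);
}

/* Convert the pixel at (x, y) of an NV21 frame to 8-bit RGB.
 * nv21 points at the Y plane; the VU plane starts width*height bytes later.
 * Assumes even width and height, no row padding, full-range BT.601. */
void nv21_pixel_to_rgb(const uint8_t *nv21, int width, int height,
                       int x, int y, uint8_t rgb[3])
{
    const uint8_t *y_plane  = nv21;
    const uint8_t *vu_plane = nv21 + (size_t)width * height;

    /* Each V,U pair is shared by a 2x2 block of pixels. */
    size_t vu_index = (size_t)(y / 2) * width + (size_t)(x / 2) * 2;

    float luma = (float)y_plane[(size_t)y * width + x];
    float v    = (float)vu_plane[vu_index]     - 128.0f; /* V comes first in NV21 */
    float u    = (float)vu_plane[vu_index + 1] - 128.0f;

    rgb[0] = to_u8(luma + 1.402f * v);
    rgb[1] = to_u8(luma - 0.344136f * u - 0.714136f * v);
    rgb[2] = to_u8(luma + 1.772f * u);
}
```

With neutral chroma (U = V = 128) the output is a grey level equal to Y, which is an easy way to check that the plane offsets are right before testing coloured input.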

OpenCL video processing in action

In a series of recent videos, Imagination demonstrates video-rate computational photography running at Full HD (1080p). The demonstration shows an ‘Instagram-like’ app written for the Samsung Exynos 5410, the application processor found in the Samsung Galaxy S4 i9500 and Meizu MX3 smartphones as well as the Hardkernel ODROID-XU development board. By offloading the image processing tasks to the PowerVR GPU, the app achieves significantly better performance (30-70 fps) than when running them on the CPU (4-7 fps), while also reducing power consumption.

The PowerVR SGX544MP3 GPU inside the Exynos 5410-based Galaxy S4 is able to process real-time, Full HD (1080p) video

To enable developers to create their own high-performance GPU compute applications, Imagination will soon be releasing a GPU compute dev program where registered users will have the opportunity to access the PowerVR GPU compute SDK and programming guidelines for PowerVR GPUs. The extensions discussed in this post are fully supported in the OpenCL driver that ships with the low-cost Hardkernel ODROID-XU board. Furthermore, developers looking to take full advantage of the compute capabilities of our PowerVR Series6 GPU will be able to use tools like PVRTune, which will include enhanced profiling for OpenCL and API timing data, allowing developers to better profile their apps.

What are your first impressions of the demos shown above? Leave us your feedback in the comments box below and follow us on Twitter (@ImaginationTech, @GPUCompute and @PowerVRInsider) for more news and announcements.

Alex Voica

Before deciding to pursue his dream of working in technology marketing, Alexandru held various engineering roles at leading semiconductor companies in Europe. His background also includes research in computer graphics and VR at the School of Advanced Studies Sant'Anna in Pisa. You can follow him on Twitter @alexvoica.

11 thoughts on “Instagram-like computational photography with PowerVR SGX GPUs”

  1. Hi, I make Video Filter Application using OpenGL ES 2.0 Shader on Android (Samsung Galaxy S4, PowerVR SGX 544 MP)
    1) glReadPixels is so slow.
    2) “new GraphicBuffer” is not working on PowerVR SGX 544 MP
    What can I do to get rendered Image fast!! from GPU to CPU SGX 544 MP?
    Please Help me.

      • Thank you for your response.
        I wonder if “a set of extensions that allow images to be shared between multiple components” in this article solves my problem.
        That’s why I asked here.

        • The extensions work within the OpenCL / OpenGL ES / EGL paradigm described above. If you register on the forum, the Developer Technology team will be able to help you understand what your issue is.

  2. Hi, is this instagram-like app available within your sdk as a sample or elsewhere? Would really like to show this off to my colleagues.

  3. “Imagination will be soon releasing a GPU compute dev program where registered users will have the opportunity to access the PowerVR GPU compute SDK and programming guidelines for PowerVR GPUs.”
    – Any ETA?
    Best Regards!
    Roosemberth Palacios

  4. Saving almost half the power while boosting performance by several whole multiples is a real testament to how efficiently these new API extensions can get all of the separate hardware and software components to work together.

    • The best part is these gains are achievable on current-gen hardware (PowerVR Series5XT). With PowerVR Series6 GPUs, the delta in performance will be even more noticeable.

