Improve Machine Vision up to 20 Times with Renesas DRP!

Feng Zhao
Senior Specialist
Published: November 10, 2020

Machine vision is essential for industrial automation applications. The Renesas RZ/A2M delivers high-performance graphics and image processing through Renesas' unique DRP (Dynamically Reconfigurable Processor) technology. DRP combines hardware-level computing performance with the flexibility to modify the computing logic from software in real time. In this article, we focus on the advantages of the RZ/A2M in industrial automation applications, including higher image processing performance and ultra-low power consumption.

The robotic arm in the video can recognize and grasp the target object so flexibly because the DRP module integrated into the RZ/A2M accelerates the image processing for machine vision. The external monitor shows that the entire image processing pipeline (Bayer to RGB, shadow correction and white balance, RGB to binary image, contour search, image noise reduction, Bayer to grayscale for display, and so on) takes less than 3ms and achieves 60fps at VGA (640x480) resolution.

Image: Picture1
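
To put these figures in perspective, here is a quick back-of-the-envelope check. It is our own arithmetic on the numbers quoted above, not an additional measurement: at 60fps each frame has a time budget of about 16.7ms, so a pipeline that finishes in under 3ms uses less than a fifth of it.

```python
# Frame-time budget at 60fps versus the "less than 3ms" pipeline time quoted above.
FPS = 60
PIPELINE_MS = 3.0  # upper bound quoted in the article

frame_budget_ms = 1000.0 / FPS               # time available per frame
headroom_ms = frame_budget_ms - PIPELINE_MS  # time left for everything else
utilization = PIPELINE_MS / frame_budget_ms  # share of the frame spent on vision

print(f"frame budget: {frame_budget_ms:.2f} ms")
print(f"headroom:     {headroom_ms:.2f} ms")
print(f"utilization:  {utilization:.0%}")
```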

What is DRP?

DRP stands for Dynamically Reconfigurable Processor. It is a unique architecture developed by Renesas, which can dynamically adjust the logic circuit of the hardware arithmetic unit to realize various arithmetic functions.

DRP has six independent units called "Tiles". They can load multiple sets of configuration data (that is, algorithm libraries) and execute them in parallel. These algorithm libraries are stored in system memory, and the CPU issues instructions to load them into each Tile when needed.

Image: gif-1

The algorithm library in each Tile can be modified at any time, and the operation of other Tiles will not be affected during the modification process.

Image: gif-2

With dynamic loading, DRP provides high-speed processing for the different image processing algorithms an application requires while using minimal hardware resources.

Image: Picture2

Why can DRP offer excellent performance?

DRP is a hardware resource that implements all computing logic at the hardware level. Each Tile is an independent computing unit with limited hardware resources, so two or more Tiles must work together when an algorithm library needs more resources than one Tile provides. The following are the hardware resources available in DRP.

Image: Picture3

During operation, DRP automatically combines hardware resources according to the complexity of the algorithm library. For example, it can combine two 16-bit multipliers in a Tile into one 32-bit multiplier, or use a 16-bit multiplier together with a counter, to further expand its computing power.

The computing power of dedicated hardware is well known. For example, the computing power of an FPGA, which is widely used at present, is on a different order of magnitude from that of a CPU. However, the limitation of the FPGA is also obvious: the computing scale is tied directly to the number of gates. Once the number of gates required by the algorithm exceeds the capacity of the FPGA selected at the beginning of the project, the FPGA must be replaced with a larger device, which is very inconvenient.

DRP avoids this limitation. Its flexibility allows not only switching among different libraries but also dynamically changing the connections among arithmetic circuits within the same library from one clock cycle to the next, which enables a variety of calculation methods. Through this time-division multiplexing, DRP maximizes computing performance and supports a wide range of algorithms within a small footprint.

Image: Picture4

Dynamic reconfiguration can change the combination of arithmetic circuits every clock cycle, while dynamic loading can load an entirely new algorithm library within 1ms.

DRP can even run the same algorithm library on multiple Tiles to increase processing speed. For example, simply dividing a picture into six equal parts and handing them to six Tiles for image processing can improve performance by six times over a single Tile!

Image: Picture5
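
The idea of dividing a picture into six parts can be sketched in plain Python. This is a toy model under our own assumptions, not DRP code: the image is a list of rows, a simple per-pixel threshold (`binarize`) stands in for an algorithm library, and a thread pool stands in for the six Tiles. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_TILES = 6  # the DRP has six Tiles

def binarize(rows, threshold=128):
    """Stand-in for a per-pixel library: threshold each pixel to 0 or 255."""
    return [[255 if p >= threshold else 0 for p in row] for row in rows]

def split_rows(height, parts):
    """Return (start, end) row ranges that cover 0..height in `parts` strips."""
    base, extra = divmod(height, parts)
    ranges, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges

def process_in_strips(image, parts=NUM_TILES):
    """Process each horizontal strip independently, then stitch the results."""
    ranges = split_rows(len(image), parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        strips = list(pool.map(lambda r: binarize(image[r[0]:r[1]]), ranges))
    return [row for strip in strips for row in strip]

# A small synthetic "image": 480 rows of 8 pixels each.
image = [[(x * 37 + y * 11) % 256 for x in range(8)] for y in range(480)]

# Strip-wise processing gives the same result as processing the whole image.
assert process_in_strips(image) == binarize(image)
print(split_rows(480, NUM_TILES))
```

The split is this simple only because the stand-in operation works on each pixel independently. An operation that reads neighboring pixels, such as a filter, would also need a few overlapping rows at each strip boundary.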

In general, performance improvements are accompanied by an increase in power consumption. DRP, however, offers a way to improve image processing performance with much lower energy consumption than a CPU.

How to Use DRP

Having seen the many advantages of DRP, you may wonder whether it is hard to get started with. Renesas provides a complete solution that lets you jump into development with ease. At present, we have developed over 50 algorithm libraries. Most of them have functions and interfaces similar to those of the OpenCV library, which allows developers to use DRP conveniently in an ordinary project.

Image: Picture6

Switch between CV Library and DRP Library

Let us take the Bayer to RGB library as an example to see what needs to be done when using DRP.

First, let us consider the function interface. The parameters to provide include the input and output addresses, the image width and height, and whether binning is required (binning reduces the image size during the conversion).

Image: Picture7

Inside the function, you first need to load the DRP library into the DRP hardware. The library is stored in compiled binary form in the array g_drp_lib_bayer_binning2rgb. To use six Tiles to process a picture in parallel, load this library into all six Tiles.

Image: Picture8

Next, once loading is complete, pass the calculation parameters to each Tile and start it. Since one complete picture is cut into six sections, each Tile is responsible for only 1/6 of the picture, so you need to calculate the input start position and the output position for each Tile separately. Once these are calculated, issue the Start command to start the Tile.

Image
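
The per-Tile position calculation can be sketched as follows. This is an illustration under stated assumptions rather than the actual driver parameters: we assume an 8-bit Bayer input (1 byte per pixel), a 3-byte RGB output, and a frame height that divides evenly into strips. The `scale` parameter is a hypothetical per-axis reduction factor for the binning case, and `TileJob` and `plan_tile_jobs` are names invented for this example.

```python
from dataclasses import dataclass

NUM_TILES = 6

@dataclass
class TileJob:
    tile: int        # index of the first Tile used by this instance
    src_offset: int  # byte offset of this strip in the input buffer
    dst_offset: int  # byte offset of this strip in the output buffer
    width: int       # input strip width in pixels
    height: int      # input strip height in rows

def plan_tile_jobs(width, height, tiles_per_instance=1, scale=1,
                   src_bpp=1, dst_bpp=3):
    """Split a frame into horizontal strips, one per library instance.

    scale is the assumed per-axis reduction of the output (1 = no reduction).
    """
    instances = NUM_TILES // tiles_per_instance
    if height % (instances * scale) != 0 or width % scale != 0:
        raise ValueError("frame size must divide evenly into strips")
    strip_h = height // instances
    out_w, out_strip_h = width // scale, strip_h // scale
    jobs = []
    for i in range(instances):
        jobs.append(TileJob(
            tile=i * tiles_per_instance,
            src_offset=i * strip_h * width * src_bpp,
            dst_offset=i * out_strip_h * out_w * dst_bpp,
            width=width,
            height=strip_h,
        ))
    return jobs

for job in plan_tile_jobs(640, 480):
    print(job)
```

For a library that occupies two Tiles, `plan_tile_jobs(640, 480, tiles_per_instance=2)` yields three strips of 160 rows each instead of six strips of 80 rows.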

Finally, wait for the computation to complete in all six Tiles.

Image: Picture10
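
The overall load, start, and wait sequence can be modeled with ordinary Python threads. Note that `FakeTile` and its methods are invented for this illustration and are not the Renesas driver API; the sketch only shows the control flow, in which every Tile is started without blocking and the caller then waits for all of them to signal completion.

```python
import threading

NUM_TILES = 6

class FakeTile:
    """Toy stand-in for one DRP Tile (hypothetical, not the real driver)."""

    def __init__(self, index):
        self.index = index
        self.library = None
        self.done = threading.Event()
        self.result = None

    def load(self, library):
        self.library = library  # models loading configuration data

    def start(self, params):
        self.done.clear()
        worker = threading.Thread(target=self._run, args=(params,))
        worker.start()          # returns immediately, like a Start command

    def _run(self, params):
        self.result = self.library(params)
        self.done.set()         # completion notification

def run_on_all_tiles(library, per_tile_params):
    tiles = [FakeTile(i) for i in range(NUM_TILES)]
    for tile in tiles:          # 1. load the same library into every Tile
        tile.load(library)
    for tile, params in zip(tiles, per_tile_params):
        tile.start(params)      # 2. pass parameters and start each Tile
    for tile in tiles:          # 3. wait until every Tile has finished
        tile.done.wait()
    return [tile.result for tile in tiles]

# Each "Tile" sums its own strip of numbers.
strips = [list(range(i * 10, (i + 1) * 10)) for i in range(NUM_TILES)]
print(run_on_all_tiles(sum, strips))
```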

After the robot arm is powered on and initialized, no target has been found yet, so it enters object detection mode. The workflow of this mode is as follows:

Image

After the camera captures a frame, the following steps run:

  1. DRP performs a Bayer to RGB conversion. This conversion also reduces the width and height of the image to 1/4 of the original, which speeds up the subsequent processing without losing accuracy. This library occupies only one Tile, so the picture can be split into six parts that are processed at the same time. This step takes 0.4ms.
  2. The shadow correction and white balance library is loaded into DRP to correct the result of the previous step. This library uses more resources and occupies two Tiles, so the image is split into three parts. This step takes 0.8ms.
  3. The RGB image is converted to HSV and the V (value) channel is extracted, which is convenient for the subsequent moving object detection and contour extraction. This step takes 0.2ms.
  4. A "weighted moving average method" is used to extract moving objects. This algorithm takes 0.6ms.
  5. The contour and center point are found from the object information obtained in the previous step. (Done by the CPU.)
  6. If the target is found, processing switches to object tracking; otherwise, the process repeats.
  7. Other display-related processing is performed.
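
Steps 3 and 4 can be illustrated with a small plain-Python sketch. The V channel of HSV is the maximum of the R, G, and B components. For the weighted moving average we assume a common formulation, in which a running background is updated as `bg = (1 - alpha) * bg + alpha * frame` and pixels that differ from it by more than a threshold count as moving. The exact formula, `ALPHA`, and `THRESHOLD` used by the DRP library are not given in this article, so treat those as assumptions.

```python
def value_channel(rgb_image):
    """V of HSV is the maximum of the R, G and B components."""
    return [[max(pixel) for pixel in row] for row in rgb_image]

def update_background(background, frame, alpha):
    """Weighted moving average: bg = (1 - alpha) * bg + alpha * frame."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]

def moving_mask(background, frame, threshold):
    """Mark pixels that differ from the averaged background by more than threshold."""
    return [[1 if abs(f - b) > threshold else 0 for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]

ALPHA = 0.1      # assumed update weight
THRESHOLD = 30   # assumed difference threshold

# A static 4x4 scene, then a bright object appears at row 1, column 2.
static = [[(20, 40, 60)] * 4 for _ in range(4)]
moved = [row[:] for row in static]
moved[1][2] = (250, 200, 180)

background = [[float(v) for v in row] for row in value_channel(static)]
for _ in range(5):  # learn the empty scene
    background = update_background(background, value_channel(static), ALPHA)

frame = value_channel(moved)
mask = moving_mask(background, frame, THRESHOLD)
background = update_background(background, frame, ALPHA)
print(mask)
```

Only the pixel where the object appeared is flagged in `mask`. Because the background keeps being averaged, an object that stops moving would gradually blend into it.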

If the target is found in the above process, the system enters object tracking mode. The workflow of this mode is as follows:

Image
  1. The first two steps are the same as in object detection mode: Bayer to RGB plus shadow correction and white balance take a total of 1.2ms.
  2. Since this mode has already determined that an object is in the frame, the image is binarized directly by calling the RGB to Binary library in DRP, which takes 0.8ms.
  3. The CPU calculates the deviation of the object's coordinates and angle from the center of the screen and adjusts the control amount of each motor accordingly, which takes 0.7ms.
  4. Since the image size was reduced in an earlier step, the coordinates calculated in the previous step are not actual coordinates, so a coordinate conversion is required.
  5. The Bayer to grayscale library is used for display, along with other display-related processing, taking 0.3ms.
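
The coordinate conversion and the deviation from the screen center in steps 3 and 4 can be sketched as follows. The reduction factor `SCALE = 4` per axis follows the 1/4 figure quoted for the Bayer to RGB step and should be adjusted if the actual ratio differs. The function names and the choice to map each reduced pixel to the center of the block it covers are our own assumptions.

```python
FULL_WIDTH, FULL_HEIGHT = 640, 480  # VGA frame
SCALE = 4                           # assumed per-axis reduction factor

def to_full_resolution(x, y, scale=SCALE):
    """Map a point found in the reduced image back to full-frame pixels.

    Uses the center of the scale x scale block that the reduced pixel covers.
    """
    return (x * scale + (scale - 1) / 2, y * scale + (scale - 1) / 2)

def deviation_from_center(x, y, width=FULL_WIDTH, height=FULL_HEIGHT):
    """Signed offset of a full-frame point from the screen center."""
    return (x - (width - 1) / 2, y - (height - 1) / 2)

# Object center found at (100, 45) in the 160x120 reduced image.
fx, fy = to_full_resolution(100, 45)
dx, dy = deviation_from_center(fx, fy)
print((fx, fy), (dx, dy))
```

A positive `dx` means the object lies to the right of the screen center, and a negative `dy` means it lies above it. How these deviations translate into motor control amounts depends on the arm and is not covered here.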

At present, the more than 50 libraries we provide achieve performance improvements ranging from 2 to 80 times compared with the 528MHz Cortex®-A9 CPU in the RZ/A2M. The improvement is generally between 10 and 20 times. Libraries that improve by less than 10 times do so mainly because the algorithm itself is too simple and too small to leave much room for optimization. The table below lists some of the existing libraries and compares their computing performance for your reference.

Image

Of course, we can also develop new libraries based on customer needs.

Or, if a customer needs to implement confidential algorithms of their own using DRP, we can provide training on DRP library development.
