Background
The R-Car SoC is our SoC offering high performance and low power consumption. However, to run DNN models trained with popular deep learning frameworks such as PyTorch or TensorFlow on an R-Car SoC, non-equivalent approximation by model compression, such as pruning (*1) and quantization (*2), is required. We provide the R-Car CNN tool, which not only performs these approximation steps so that a trained deep learning model can run on the R-Car SoC, but also provides simulators with accuracy and speed to suit your application, so that you can carry out verification and performance estimation even if you do not have R-Car SoC hardware. (*3)
This blog describes debugging analysis and accuracy improvement of models using Accurate Simulator, which is specifically designed to produce output results equivalent to those of an actual R-Car SoC. You can use it to identify the cause of unexpected results by following the intermediate outputs in the model one after another, which cannot be checked on the actual R-Car SoC.
Assumed case
To convert a trained DNN model into an execution format that can run on an R-Car SoC, non-equivalent approximate model compression, such as pruning and quantization, is required. Quantization approximates a trained DNN model written in floating-point arithmetic with an integer arithmetic model. In this process, the maximum and minimum values of the output tensor of each layer are estimated from a sufficiently large number of input images, and the quantization parameters (scale and zero point) are determined (calibration). When this quantized model is validated on an actual R-Car SoC or a simulator (*4), some input images may give unexpected results compared with the original trained model. In such cases, model analysis using Accurate Simulator is effective, because it lets you follow the intermediate outputs in the model one after another, which cannot be checked on the actual R-Car SoC.
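The calibration step described above can be sketched as follows. This is a minimal illustration, assuming unsigned asymmetric (affine) quantization and plain Python lists in place of real tensors; the function names `calibrate`, `quantize`, and `dequantize` are hypothetical and are not part of the R-Car CNN tool, whose actual quantization scheme may differ.

```python
def calibrate(samples, num_bits=8):
    """Estimate (scale, zero_point) from values observed during calibration.

    Assumption: unsigned affine quantization, q = round(x / scale) + zero_point.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # Include 0.0 in the range so that zero is exactly representable.
    lo = min(0.0, min(samples))
    hi = max(0.0, max(samples))
    if hi == lo:
        return 1.0, 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x, scale, zero_point, num_bits=8):
    """Map a float to an integer code, saturating at the representable range."""
    q = int(round(x / scale)) + zero_point
    return max(0, min(2 ** num_bits - 1, q))


def dequantize(q, scale, zero_point):
    """Map an integer code back to an approximate float value."""
    return scale * (q - zero_point)


# Calibration data covers only [-1.0, 3.0].
scale, zero_point = calibrate([-1.0, 0.5, 3.0])
in_range = dequantize(quantize(0.5, scale, zero_point), scale, zero_point)
# A value far outside the calibrated range saturates at the largest code.
out_of_range = dequantize(quantize(10.0, scale, zero_point), scale, zero_point)
```

In this sketch, a value inside the calibrated range is recovered to within half a quantization step, while a value outside it is clipped. This is one way an input image unlike the calibration images can produce an unexpected result.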
Flow of model analysis using Accurate Simulator
In the above case, the possible causes are (a) poor calibration, because the quality or quantity of the input image data used for calibration was not sufficient, or (b) quantization failure in a layer with a large intermediate output swing. The first step is therefore to determine whether the cause is (a) or (b). Then either (a) add or update the input image data and perform calibration again, or (b) identify the layer where the problem occurs and increase the bit width of that layer. Either measure improves the accuracy of the quantized DNN model.
Accurate Simulator is a simulator designed so that its output results exactly match those of the actual R-Car SoC. It allows the user to extract the intermediate output of each layer of the model, one by one, starting from the side where the image data is input, and to check for errors by comparing each one with the corresponding intermediate output of the original trained model.
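The difference between the two remedies can be illustrated with a small numerical sketch. It assumes uniform quantization over a fixed range `[lo, hi]`; the helper names are hypothetical and the numbers are made up for illustration only. Rounding error inside the range shrinks when the bit width grows, whereas the error for a value outside the calibrated range does not, which is why case (a) calls for recalibration rather than more bits.

```python
def quantization_step(lo, hi, num_bits):
    """Width of one quantization level for a uniform quantizer over [lo, hi]."""
    return (hi - lo) / (2 ** num_bits - 1)


def max_roundtrip_error(values, lo, hi, num_bits):
    """Worst absolute error after quantizing and de-quantizing each value."""
    step = quantization_step(lo, hi, num_bits)
    worst = 0.0
    for v in values:
        clipped = max(lo, min(hi, v))      # saturate to the calibrated range
        q = round((clipped - lo) / step)   # integer level index
        worst = max(worst, abs(v - (lo + q * step)))
    return worst


# Case (b): a layer with a large output swing but values inside the range.
wide_layer = [-100.0, -0.3, 0.2, 0.4, 57.3, 100.0]
err_8bit = max_roundtrip_error(wide_layer, -100.0, 100.0, 8)
err_16bit = max_roundtrip_error(wide_layer, -100.0, 100.0, 16)

# Case (a): a value outside the calibrated range; extra bits do not help.
err_clipped_16bit = max_roundtrip_error([150.0], -100.0, 100.0, 16)
```

Here the 16-bit error for the wide layer is far below the 8-bit error, while the clipped value keeps an error of about 50 at any bit width.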
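The layer-by-layer comparison described above could be organized along these lines. This is a sketch only: it assumes the intermediate outputs of both models have already been extracted and flattened into lists of floats keyed by layer name, and the function names, layer names, and tolerance are hypothetical.

```python
def max_abs_error(reference, candidate):
    """Largest element-wise absolute difference between two flattened tensors."""
    if len(reference) != len(candidate):
        raise ValueError("tensor sizes differ")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def first_divergent_layer(layer_order, reference_outputs, simulator_outputs, tolerance):
    """Walk the layers from the input side and return the first one whose
    error exceeds the tolerance, as (layer_name, error), or None."""
    for name in layer_order:
        err = max_abs_error(reference_outputs[name], simulator_outputs[name])
        if err > tolerance:
            return name, err
    return None


# Made-up example data: the second layer is where the outputs start to differ.
order = ["conv1", "conv2", "fc"]
reference = {"conv1": [0.10, 0.20], "conv2": [1.00, 2.00], "fc": [0.70, 0.30]}
simulated = {"conv1": [0.11, 0.20], "conv2": [1.00, 2.60], "fc": [0.40, 0.60]}
result = first_divergent_layer(order, reference, simulated, tolerance=0.05)
```

Scanning from the input side matters because an error introduced in one layer propagates to every later layer, so the first layer that exceeds the tolerance is the most likely origin.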
Example
When using our R-Car SoC, you convert the trained DNN model to the R-Car SoC execution format with our R-Car CNN tool and then execute it. The following assumes that the output of the original trained DNN model (say, in TensorFlow) and the output of the R-Car SoC do not match at runtime, and that the cause is to be identified and resolved. We explain how to estimate the quantization error by using Accurate Simulator to directly compare the intermediate outputs of the original TensorFlow model with those of the R-Car execution format model.
(1) Convert your trained TensorFlow model to ONNX, then use our R-Car CNN tool to convert it into a format that can be run on Accurate Simulator, supplying the quantization conditions and a sufficient number of images for calibration.
(2) Run your TensorFlow model and extract the intermediate output of the layers to be compared.
(3) Run the execution format for Accurate Simulator created in (1) using the R-Car SDK runtime, and extract the intermediate output of the layer you want to compare.
(4) Compare the outputs obtained in (2) and (3). The Accurate Simulator output is represented in integer precision because the model is quantized, but a de-quantization tool is also available.
The graph in the figure directly compares the intermediate output tensor components generated by TensorFlow and Accurate Simulator. In this example, the comparison results are almost identical and there is no problem with this layer. Repeat steps (1) through (4) to identify the layer for which the approximation breaks down. By increasing the bit width of the quantization parameters of the layer in question (e.g., from 8 bits to 16 bits), the output accuracy of the quantized model can be improved.
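The comparison in step (4) can be sketched as below. The code assumes the integer output of a layer, its scale, and its zero point are already available as plain Python values; it does not use the actual de-quantization tool or any R-Car SDK API, and the function names and sample numbers are hypothetical. The signal-to-quantization-noise ratio (SQNR) is one common way to summarize how closely two tensors agree; it is used here as an assumption, not as the metric behind the figure.

```python
import math


def dequantize_tensor(q_values, scale, zero_point):
    """Convert integer codes back to floats: x = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


def sqnr_db(reference, approx):
    """Signal-to-quantization-noise ratio in dB; higher means closer agreement."""
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approx))
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


# Made-up float reference output for one layer (the role of step (2)).
reference = [0.0, 0.5, 1.0, -0.5]
scale, zero_point = 0.01, 128

# Integer output that matches the reference well (the role of step (3)).
good = dequantize_tensor([128, 178, 228, 78], scale, zero_point)
# Integer output that has saturated at the ends of the 8-bit range.
bad = dequantize_tensor([128, 255, 255, 0], scale, zero_point)

good_sqnr = sqnr_db(reference, good)
bad_sqnr = sqnr_db(reference, bad)
```

A layer whose SQNR drops sharply compared with the layers before it is a candidate for a larger bit width.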
Summary
In this blog, we introduced Accurate Simulator, which is designed to produce calculation results equivalent to those of the actual R-Car SoC and allows the user to examine intermediate outputs of the model that cannot be checked on the actual R-Car SoC device. We hope you will make use of this tool for debugging, evaluation, and improvement of model accuracy. Renesas will continue to enhance the R-Car CNN tool for users' model evaluation and validation.
(*1) Reduces the amount of computation and memory usage by setting weights with small contribution to the result to 0 and skipping the computation for those weights.
(*2) Approximate conversion of inference processing, which is usually computed in floating point, to integer operations such as 8-bit. The quantization discussed here is PTQ (post-training quantization), which optimizes the quantization parameters (scale and zero point) by performing calibration using multiple input images.
(*3) Blog article "Introduction to R-Car DNN Simulator"
(*4) In addition to Accurate Simulator, Renesas also offers Instruction Set Simulator (ISS), which is designed to produce calculation results equivalent to those of the actual R-Car SoC device. ISS simulates not only the calculation result but also the calculation process itself,