High-Speed Simulator for R-Car S4 - Renesas QEMU Environment

Kota Shima
Software Engineer
Published: October 31, 2023

High-Speed Simulator for ECU Level Simulation

In recent years, as advanced driving systems, autonomous driving, and other functions have been added to automobiles, the software installed in electronic control units (ECUs) has grown steadily larger. ECU-level simulations of such software therefore tend to take a long time, and simulators with faster execution speed are needed to resolve this problem.

Renesas QEMU Environment

To realize high-speed simulation, Renesas is developing the Renesas QEMU Environment, a simulator for the R-Car S4 automotive system-on-chip (SoC) (hereinafter referred to as this simulator) that combines QEMU models with SystemC models. A single QEMU instance can simulate only one CPU architecture. However, R-Car S4 has three types of CPU architecture (CA55, CR52, and G4MH), and this simulator needs to simulate all of them at the same time.

Therefore, each CPU core is wrapped in SystemC, as shown in Figure 1. This allows the QEMU models to be simulated in parallel as SystemC instances. The wrapper converts the C-language interfaces provided by QEMU into SystemC interfaces and controls the timing between QEMU and SystemC. As a result, this simulator can simulate the three types of CPU simultaneously.

Figure 1. Structure of the Renesas QEMU Environment
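
The article does not describe how the wrapper controls timing, so the following is only a rough sketch of one common approach, assumed here for illustration: each CPU model runs for a fixed quantum of simulated time and then hands control back to a shared scheduler, which always advances the model that is furthest behind. The names `cpu_model`, `run_lockstep`, and `QUANTUM_NS`, and the per-instruction costs, are hypothetical and are not taken from the Renesas QEMU Environment.

```python
import heapq
from typing import Dict, Iterator, Tuple


def cpu_model(ns_per_instr: int, total_instrs: int, quantum_ns: int) -> Iterator[int]:
    """Hypothetical stand-in for one wrapped CPU model.

    Each next() call runs at most one quantum of simulated time and
    yields the simulated nanoseconds actually consumed.
    """
    done = 0
    budget = max(1, quantum_ns // ns_per_instr)  # instructions per quantum
    while done < total_instrs:
        step = min(budget, total_instrs - done)
        done += step
        yield step * ns_per_instr


def run_lockstep(models: Dict[str, Iterator[int]]) -> Tuple[Dict[str, int], int]:
    """Always advance the model whose local simulated time is furthest behind."""
    queue = [(0, name) for name in models]  # (local simulated time, model name)
    heapq.heapify(queue)
    finish: Dict[str, int] = {}
    max_skew = 0
    while queue:
        now, name = heapq.heappop(queue)
        try:
            elapsed = next(models[name])
        except StopIteration:
            finish[name] = now  # this model has no more work
            continue
        heapq.heappush(queue, (now + elapsed, name))
        # Lead of the model just run over the slowest model still active.
        max_skew = max(max_skew, now + elapsed - queue[0][0])
    return finish, max_skew


QUANTUM_NS = 100  # assumed synchronization quantum, in simulated nanoseconds
models = {
    "CA55": cpu_model(1, 1000, QUANTUM_NS),  # assumed 1 ns per instruction
    "CR52": cpu_model(2, 300, QUANTUM_NS),   # assumed 2 ns per instruction
    "G4MH": cpu_model(4, 100, QUANTUM_NS),   # assumed 4 ns per instruction
}
finish, max_skew = run_lockstep(models)
```

In this sketch no model gets more than one quantum ahead of the slowest active model, which is the property a timing wrapper of this kind is meant to provide. A smaller quantum gives tighter synchronization at the cost of more scheduler switches.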

QEMU Features

  • Dynamic Binary Translation (DBT) for faster execution
  • High abstraction model for faster execution

Of these features, DBT is the key technology for accelerating QEMU's CPU models. DBT divides the code that runs on QEMU's CPU model (guest code) into short code blocks. It then translates each guest code block into code for the CPU that runs this simulator (host CPU), referred to as host code, and caches the result. The execution procedure of DBT is as follows.

  1. The translation function of DBT divides the guest code into basic blocks (BBs), each ending at a branch instruction, and decodes each BB into intermediate code. The intermediate code is an intermediate representation used to translate guest code into host code.
  2. The Tiny Code Generator (TCG) converts the intermediate code into a translation block (TB), which is a binary block of host code. The converted TB is stored in the translation cache.
  3. The host CPU executes the TB stored in the translation cache. If the next TB to be executed is not in the translation cache, processing returns to step (1). If it is in the translation cache, the host CPU executes it from the cache. If the next branch destination address is fixed, the branch destination TB is also fixed. In that case, code that links the two TBs is embedded so that the next TB can be executed without checking the translation cache. This mechanism is called direct block chaining.
Figure 2. Dynamic Binary Translation Mechanism

When guest code is executed, a series of instructions converted into TBs is executed on the host CPU by repeating steps (1) to (3). The conversion in steps (1) and (2) is performed only the first time a BB is executed. If the code at a given address has already been executed, steps (1) and (2) can be skipped because its TB is already stored in the translation cache.

In addition, when TBs are connected by direct block chaining as described in (3), both the check for whether the guest code needs to be translated and the lookup of the next TB in the translation cache are skipped.

In this way, DBT speeds up guest code execution by skipping repeated translation and by using direct block chaining.
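
Steps (1) to (3) can be illustrated with a toy model. The sketch below is not QEMU code: the three-instruction guest ISA, the `TB` class, and the `translate` and `execute` functions are all hypothetical, and a Python closure stands in for generated host code. It only shows how a translation cache and direct block chaining reduce repeated work.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Instr = Tuple  # ("addi", reg, imm), ("bnez", reg, target), ("jmp", target), ("halt",)
BRANCH_OPS = {"bnez", "jmp", "halt"}


@dataclass
class TB:
    """Toy translation block: a callable plus links to already-known successors."""
    pc: int
    run: Callable[[Dict[str, int]], Optional[int]]
    links: Dict[int, "TB"] = field(default_factory=dict)


def translate(program: List[Instr], pc: int) -> TB:
    # Step (1): collect one basic block, ending at the first branch instruction.
    block = []
    while True:
        ins = program[pc + len(block)]
        block.append(ins)
        if ins[0] in BRANCH_OPS:
            break
    fallthrough = pc + len(block)

    # Step (2): build the "host code" (here a closure) that returns the next guest pc.
    def run(regs: Dict[str, int]) -> Optional[int]:
        for ins in block:
            if ins[0] == "addi":
                regs[ins[1]] += ins[2]
            elif ins[0] == "bnez":
                return ins[2] if regs[ins[1]] != 0 else fallthrough
            elif ins[0] == "jmp":
                return ins[1]
        return None  # "halt" ends execution

    return TB(pc, run)


def execute(program: List[Instr], regs: Dict[str, int], entry: int = 0) -> Dict[str, int]:
    cache: Dict[int, TB] = {}
    stats = {"translations": 0, "cache_lookups": 0, "chained": 0}

    def lookup_or_translate(pc: int) -> TB:
        stats["cache_lookups"] += 1
        tb = cache.get(pc)
        if tb is None:  # first execution of this block: run steps (1) and (2)
            tb = translate(program, pc)
            cache[pc] = tb
            stats["translations"] += 1
        return tb

    tb = lookup_or_translate(entry)
    while True:
        next_pc = tb.run(regs)  # step (3): execute the cached block
        if next_pc is None:
            return stats
        nxt = tb.links.get(next_pc)
        if nxt is not None:
            stats["chained"] += 1  # direct block chaining: no cache lookup
        else:
            nxt = lookup_or_translate(next_pc)
            tb.links[next_pc] = nxt  # branch targets are fixed in this toy ISA
        tb = nxt


program = [
    ("addi", "r1", 2),   # pc 0
    ("addi", "r0", -1),  # pc 1
    ("bnez", "r0", 0),   # pc 2: loop back to pc 0 while r0 != 0
    ("halt",),           # pc 3
]
regs = {"r0": 5, "r1": 0}
stats = execute(program, regs)
```

Here the loop body runs five times but is translated only once, and after the first iteration each loop-back follows the stored link instead of consulting the cache. Real DBT emits native host instructions and patches jump targets in place; the dictionary of links is only a stand-in for that mechanism.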

CPU Performance for the Renesas QEMU Environment

Figure 3 shows the results of a CPU benchmark on the actual R-Car S4 and on the Renesas QEMU Environment. The results show that the processing performance of this simulator, measured as the number of instructions executed, is approximately 0.8 instructions for every instruction executed on the actual machine.

Figure 3. CPU Benchmark Result
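
As a rough guide to what the 0.8 ratio means in practice, the sketch below assumes the ratio is a throughput measured over the same wall-clock time, so that a workload takes 1/0.8 times as long on the simulator as on the actual machine. That reading is an assumption, and both the `simulated_wall_time` helper and the 10-second input are hypothetical rather than taken from the benchmark.

```python
def simulated_wall_time(hardware_seconds: float, relative_throughput: float = 0.8) -> float:
    """Estimate simulator run time from hardware run time.

    Assumes relative_throughput is (simulator instructions) / (hardware
    instructions) over the same wall-clock interval.
    """
    if relative_throughput <= 0:
        raise ValueError("relative_throughput must be positive")
    return hardware_seconds / relative_throughput


# Hypothetical workload that takes 10 seconds on the actual R-Car S4.
estimate = simulated_wall_time(10.0)
slowdown = estimate / 10.0  # about 1.25x under the stated assumption
```

Under this assumption the simulator would run about 25% slower than the actual machine for CPU-bound code.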

Learn more about the R-Car S4 automotive system-on-chip (SoC).

Future Prospects

This article introduced the Renesas QEMU Environment, the simulator for R-Car S4, and its CPU performance. The simulator is currently based on QEMU 7.0 and will support QEMU 8.0. In addition, we will improve simulation speed, including that of the CPU peripheral functions. Renesas will continue to prepare next-generation platforms that customers can use for software development and system studies.
