Introduction
In the world of embedded devices, there is a growing demand for advanced machine learning (ML) and digital signal processing (DSP) capabilities. Arm® Cortex®-M85, the latest general-purpose core, aims to meet this demand with the 32-bit Armv8.1-M architecture, offering high performance and power efficiency. The core's Helium technology, the M-Profile Vector Extension (MVE), provides a significant uplift for ML and DSP applications: 4x the ML performance and 3x the DSP performance of Cortex-M7. In this two-part blog series, we explore Helium technology and the features that make Cortex-M85 stand out in the market; the second blog explains the demos presented at Embedded World 2023.
Helium-MVE
Helium is an architecture extension of Armv8.1-M, implemented in the Cortex-M55 and Cortex-M85 cores. It is designed to bring signal processing that previously required a separate low- to mid-range DSP core onto the Cortex-M core itself. Helium uses eight 128-bit vector registers and supports a wide range of vector data types, including 8-, 16-, and 32-bit integers and half- and single-precision floating point. Its performance comes from features such as overlapping execution of vector instructions, improved branch prediction, and low-overhead loops with tail predication. In addition, enhanced memory access instructions (such as gather/scatter and interleaving loads and stores) and support for complex-number arithmetic make Helium a powerful addition to Cortex-M85.
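One of the looping optimizations in Armv8.1-M is tail predication: when an array length is not a multiple of the vector width, the last vector operation runs with the out-of-range lanes disabled, so no separate scalar clean-up loop is needed. The portable C sketch below only emulates that idea and does not use Helium itself. Each outer iteration stands in for one 128-bit vector multiply-accumulate over eight `int16_t` lanes, and `tail_mask` is a hypothetical helper loosely modeled on what the `VCTP` instruction produces. Real Helium code would normally come from compiler auto-vectorization or the ACLE intrinsics in `arm_mve.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 128-bit vector register holds 128 / 16 = 8 int16_t lanes. */
#define VEC_BITS  128u
#define LANES_S16 (VEC_BITS / 16u)

/* Hypothetical helper: lane mask for the current iteration. All lanes are
   active while at least LANES_S16 elements remain; otherwise only the first
   `remaining` lanes are. Simplified to one bit per lane (the hardware
   predicate uses one bit per byte). */
static unsigned tail_mask(size_t remaining)
{
    return remaining >= LANES_S16 ? (1u << LANES_S16) - 1u
                                  : (1u << remaining) - 1u;
}

/* Portable scalar emulation of a tail-predicated int16 dot product.
   The final partial "vector" is masked instead of being handled by a
   separate scalar tail loop. */
int64_t dot_s16_predicated(const int16_t *a, const int16_t *b, size_t n)
{
    assert((a != NULL && b != NULL) || n == 0);
    int64_t acc = 0;
    for (size_t i = 0; i < n; i += LANES_S16) {
        unsigned mask = tail_mask(n - i);
        for (unsigned lane = 0; lane < LANES_S16; ++lane) {
            if (mask & (1u << lane)) {   /* inactive lanes are skipped */
                acc += (int32_t)a[i + lane] * (int32_t)b[i + lane];
            }
        }
    }
    return acc;
}
```

With 11 elements, the loop runs one full 8-lane iteration followed by one iteration with only 3 active lanes, which mirrors how a tail-predicated loop finishes without extra branching.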
Cortex-M85 Features
Cortex-M85 outshines other Cortex-M cores with a wealth of features, including enhanced security options such as Pointer Authentication and Branch Target Identification (PACBTI) and the Unprivileged Debug Extension (UDE). The core also has a 7-stage scalar pipeline and a 9- to 10-stage vector and floating-point pipeline, and supports a range of integer and floating-point data types. Table 1 compares the features of the three highest-performance Arm Cortex-M microcontroller cores.
Table 1. Comparison of Cortex-M7, Cortex-M55, and Cortex-M85
Feature | Cortex-M7 | Cortex-M55 | Cortex-M85
---|---|---|---
Architecture | Armv7-M | Armv8.1-M | Armv8.1-M
Security | MPU (PMSAv7) | TrustZone, stack limit checking, Unprivileged Debug Extension, MPU (PMSAv8) | PACBTI, TrustZone, stack limit checking, Unprivileged Debug Extension, MPU (PMSAv8)
Pipeline | 6-stage superscalar with branch prediction | 4-stage (main integer pipeline) | 7-stage scalar pipeline; 9- to 10-stage vector and floating-point pipeline
Helium (MVE) | Not supported | Supported | Supported
FPU | FPv5: fp32, fp64 | FPv5: fp16, fp32, fp64 | FPv5: fp16, fp32, fp64
MACs per cycle | 1 (32b x 32b) | 8 (8b x 8b), 4 (16b x 16b), 2 (32b x 32b) | 8 (8b x 8b), 4 (16b x 16b), 2 (32b x 32b)
CoreMark/MHz | 5.29 | 4.4 | 6.28
DMIPS/MHz (ground rules / inlining / multi-file compilation) | 2.31 / 3.23 / 6.78 | 1.69 / 2.16 / 5.32 | 3.13 / 4.52 / 8.76
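The MACs-per-cycle and CoreMark/MHz rows of Table 1 lend themselves to quick back-of-the-envelope arithmetic. The sketch below uses a hypothetical 400 MHz clock (not a figure from this article) and assumes scores scale linearly with frequency, which ignores memory wait states. It also makes one pattern in the table explicit: each Helium MAC figure (8 x 8-bit, 4 x 16-bit, 2 x 32-bit) corresponds to 64 bits of operands per cycle, i.e., half of a 128-bit vector register. The function names are illustrative only.

```c
#include <stdint.h>

/* Operand bits consumed per cycle: MACs per cycle x element width in bits. */
static unsigned operand_bits_per_cycle(unsigned macs_per_cycle, unsigned elem_bits)
{
    return macs_per_cycle * elem_bits;
}

/* Theoretical peak throughput in millions of MACs per second (MMAC/s). */
static uint64_t peak_mmacs(unsigned macs_per_cycle, unsigned clock_mhz)
{
    return (uint64_t)macs_per_cycle * clock_mhz;
}

/* Idealized CoreMark estimate: CoreMark/MHz x clock in MHz, assuming the
   score scales linearly with frequency (no added memory wait states). */
static double coremark_estimate(double coremark_per_mhz, unsigned clock_mhz)
{
    return coremark_per_mhz * clock_mhz;
}
```

At the assumed 400 MHz, 8 MACs per cycle gives a peak of 3,200 MMAC/s for 8-bit data on Cortex-M55 or Cortex-M85, versus 400 MMAC/s from the single 32b x 32b MAC listed for Cortex-M7, and Cortex-M85's 6.28 CoreMark/MHz gives an idealized estimate of about 2,512 CoreMark. Sustained throughput will usually be lower once loads, stores, and loop overhead are included.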
Benchmarks
Cortex-M85 achieves significant performance uplifts over other Cortex-M cores, outperforming Cortex-M7 by 4x and Cortex-M55 by 20% in AI/ML performance.
Figure 1. Performance uplift of Cortex-M85 vs. Cortex-M7 and Cortex-M55 [Data source: Arm]
Empirical data indicates that Helium improves the performance of some ML kernels by up to 787%, and of fast Fourier transform (FFT) and finite impulse response (FIR) filtering by up to 57% and 64%, respectively, for floating-point data types. Note, however, that the FFT and FIR figures cover floating-point data only. Because Helium natively supports multiple data types, including narrower integer and fixed-point types that fit more elements into each 128-bit register, the uplift for those types can be significantly higher, as will be shown in the second part of this blog series.
Figure 2. Benchmark performance of MVE-aware vs. non-MVE-aware devices.
(a) CMSIS-NN with Arm Compiler 6.15 (AC6.15): average performance over a fully connected layer
(b) CMSIS-DSP FFT and FIR functions for floating point with Arm Compiler 6.16 (AC6.16): normalized performance
[Data source: Arm]
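One way to see why narrower data types can yield larger uplifts: a 128-bit register holds 4 fp32 or Q31 elements, but 8 Q15 (or fp16) elements and 16 Q7 elements, so each vector instruction processes more samples. The hypothetical helpers below compute lanes per register and convert a float sample to Q15 with rounding and saturation, similar in spirit to (but not the actual implementation of) CMSIS-DSP's `arm_float_to_q15`. This illustrates the lane-count argument only; it is not a measured result.

```c
#include <stdint.h>

#define VEC_BITS 128

/* Number of elements of width elem_bits in one 128-bit vector register. */
static int lanes_per_vector(int elem_bits)
{
    return VEC_BITS / elem_bits;
}

/* Hypothetical helper: convert a float to Q15 (1 sign bit, 15 fractional
   bits), rounding half away from zero and saturating to the int16_t range. */
static int16_t float_to_q15(float x)
{
    if (x != x) {                    /* NaN: map to zero */
        return 0;
    }
    float scaled = x * 32768.0f;     /* scale by 2^15 */
    if (scaled >= 32767.0f) {
        return INT16_MAX;            /* saturate at about +0.99997 */
    }
    if (scaled <= -32768.0f) {
        return INT16_MIN;            /* saturate at -1.0 */
    }
    float rounded = scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f;
    return (int16_t)rounded;         /* conversion truncates toward zero */
}
```

For example, 0.5f converts to 16384 and -0.25f to -8192, while 1.0f saturates to 32767. Moving a signal from fp32 to Q15 doubles the elements per register from 4 to 8, at the cost of reduced dynamic range.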
In conclusion, Cortex-M85 with Helium delivers a significant uplift in AI/ML and DSP performance while outperforming the other Cortex-M cores in scalar performance (6.28 CoreMark/MHz, vs. 5.29 for Cortex-M7 and 4.4 for Cortex-M55). This makes it well suited to more complex processing tasks. Stay tuned for the second part of this blog series, where we will discuss the AI demos showcased by Renesas Electronics at Embedded World 2023 and how Cortex-M85 performed in real-world applications.