Leveraging Helium and ARM® Cortex®-M85 for Unprecedented DSP and AI Performance on an MCU Core

Eldar Sido
AI Technical Marketing
Published: April 20, 2023

Introduction

In the world of embedded devices, demand is growing for advanced machine learning (ML) and digital signal processing (DSP) capabilities. ARM® Cortex®-M85, the latest general-purpose Cortex-M core, aims to meet this demand with its 32-bit Armv8.1-M architecture, offering high performance and power efficiency. The core's Helium technology, also known as the M-Profile Vector Extension (MVE), provides a significant uplift for ML and DSP applications (4x ML performance and 3x DSP performance vs. Cortex-M7). This is the first post of a two-part blog series: here we explore Helium technology and the features that make Cortex-M85 stand out in the market, and the second post explains the demos presented at Embedded World 2023.

Helium-MVE

Helium technology is an extension of the Armv8.1-M architecture, implemented in the Cortex-M85 and Cortex-M55 cores and designed so that on-chip processing can replace low- to mid-tier DSP cores. Helium provides eight 128-bit vector registers and supports a wide range of vector data types for various applications. Features such as overlapping pipelines, improved branch prediction, and loop optimization contribute to its performance. In addition, enhanced memory access instructions and support for complex-number processing make Helium a powerful addition to Cortex-M85.
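
To make the register description concrete, here is a minimal Python sketch of how many elements fit in one 128-bit vector register. The register width and count come from the text above; the element widths (8, 16 and 32 bits) are illustrative choices, and the function name `lanes_per_register` is hypothetical, not part of any Arm tool or library.

```python
# Sketch only: arithmetic on the register sizes quoted in the text.
VECTOR_REGISTER_BITS = 128  # width of each Helium vector register (from the text)
NUM_VECTOR_REGISTERS = 8    # number of vector registers (from the text)


def lanes_per_register(element_bits: int) -> int:
    """Return how many elements of the given width fit in one vector register."""
    if element_bits <= 0 or VECTOR_REGISTER_BITS % element_bits != 0:
        raise ValueError("element width must evenly divide the register width")
    return VECTOR_REGISTER_BITS // element_bits


# Element widths chosen for illustration (assumption, not an exhaustive list).
for element_bits in (8, 16, 32):
    lanes = lanes_per_register(element_bits)
    print(f"{element_bits}-bit elements: {lanes} lanes per register")

# Total vector register storage in bytes.
total_bytes = NUM_VECTOR_REGISTERS * VECTOR_REGISTER_BITS // 8
print(f"Vector register file: {total_bytes} bytes")
```

Under these assumptions, one register holds 16 8-bit, 8 16-bit or 4 32-bit elements, which is why narrower data types gain the most from vectorization.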

Cortex-M85 Features

Cortex-M85 outshines other Cortex-M cores with a wealth of features, including enhanced security options such as pointer authentication and branch target identification (PACBTI) and the Unprivileged Debug Extension (UDE). The core also offers a 7-stage scalar pipeline and a 9-10 stage vector and floating-point pipeline, with support for various data types. Table 1 gives a detailed feature comparison of the three high-end Arm microcontroller cores.

Table 1. Comparison of Cortex-M7, Cortex-M55, and Cortex-M85

| Feature | Cortex-M7 | Cortex-M55 | Cortex-M85 |
|---|---|---|---|
| Architecture | Armv7-M | Armv8.1-M | Armv8.1-M |
| Security | - | - | PACBTI |
| Security | - | Unprivileged Debug Extension | Unprivileged Debug Extension |
| Security | - | Stack limit checking | Stack limit checking |
| Security | - | TrustZone | TrustZone |
| Security | MPU (PMSAv7) | MPU (PMSAv8) | MPU (PMSAv8) |
| Pipeline | 6-stage superscalar and branch prediction | 4-stage (for main integer pipeline) | 7-stage scalar pipeline and 9-10 stage vector and floating-point pipeline |
| Helium (MVE) | Not supported | Supported | Supported |
| FPU | fp32, fp64 (FPv5) | fp16, fp32, fp64 (FPv5) | fp16, fp32, fp64 (FPv5) |
| MACs per cycle | 1 32b x 32b | 8 8b x 8b; 4 16b x 16b; 2 32b x 32b | 8 8b x 8b; 4 16b x 16b; 2 32b x 32b |
| CoreMarks/MHz | 5.29 | 4.4 | 6.28 |
| DMIPS/MHz | 2.31/3.23/6.78 | 1.69/2.16/5.32 | 3.13/4.52/8.76 |
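
The scalar and MAC figures in Table 1 can be turned into relative numbers with a few lines of Python. This is a rough sketch: the data values are copied from the table, while the helper name `relative_uplift` is hypothetical. The observation that the Helium MAC counts equal 64 divided by the element width is consistent with 64 bits of vector data being processed per cycle, but that reading is an assumption made here for illustration and is not stated in the table.

```python
# Sketch only: ratios computed from the values listed in Table 1.
COREMARK_PER_MHZ = {"Cortex-M7": 5.29, "Cortex-M55": 4.4, "Cortex-M85": 6.28}

# MACs per cycle listed for the Helium cores (Cortex-M55 and Cortex-M85),
# keyed by element width in bits.
HELIUM_MACS_PER_CYCLE = {8: 8, 16: 4, 32: 2}


def relative_uplift(metric: dict, core: str, baseline: str) -> float:
    """Return metric[core] divided by metric[baseline]."""
    return metric[core] / metric[baseline]


for baseline in ("Cortex-M7", "Cortex-M55"):
    ratio = relative_uplift(COREMARK_PER_MHZ, "Cortex-M85", baseline)
    print(f"CoreMark/MHz, Cortex-M85 vs {baseline}: {ratio:.2f}x")

# Assumption for illustration: 64 bits of vector data processed per cycle.
ASSUMED_BITS_PER_CYCLE = 64
for element_bits, macs in HELIUM_MACS_PER_CYCLE.items():
    matches = macs == ASSUMED_BITS_PER_CYCLE // element_bits
    print(f"{element_bits}-bit MACs per cycle: {macs} (matches assumption: {matches})")
```

On these numbers, the Cortex-M85 CoreMark/MHz score is roughly 1.19 times that of Cortex-M7 and 1.43 times that of Cortex-M55, so the scalar gain is modest compared with the vector gains discussed in the next section.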

Benchmarks

Cortex-M85 achieves significant performance uplifts over other Cortex-M cores: according to Arm's data, it delivers 4 times the AI/ML performance of Cortex-M7 and about 20% more than Cortex-M55.

Figure 1. Performance uplift of CM85 vs CM7 and CM55 [Data Source: Arm]

Empirical data indicates that Helium technology improves the performance of some ML kernels by up to 787%, and the performance of floating-point fast Fourier transform (FFT) and finite impulse response (FIR) kernels by up to 57% and 64%, respectively. Note that these FFT and FIR results cover floating-point data only. Since Helium natively supports multiple data types, the uplift is significantly higher for the other data types, as the second part of this blog series will show.
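
Percentage improvements and "x times" factors are easy to mix up, so the short Python sketch below converts one into the other. It assumes that "improves performance by P%" means the Helium result is (100 + P)% of the non-Helium baseline; that interpretation, and the helper name `speedup_factor`, are assumptions made here rather than definitions taken from Arm's data.

```python
# Sketch only: converts the quoted percentage improvements into speedup factors,
# assuming "improved by P%" means new = baseline * (1 + P / 100).
def speedup_factor(percent_improvement: float) -> float:
    """Return the speedup factor corresponding to a percentage improvement."""
    return 1.0 + percent_improvement / 100.0


# Percentages quoted in the text above.
QUOTED_IMPROVEMENTS = {
    "ML kernels (best case)": 787,
    "FFT, floating point": 57,
    "FIR, floating point": 64,
}

for name, percent in QUOTED_IMPROVEMENTS.items():
    print(f"{name}: +{percent}% is about {speedup_factor(percent):.2f}x")
```

Read this way, the best-case ML kernel result corresponds to roughly an 8.87x speedup, while the floating-point FFT and FIR results correspond to about 1.57x and 1.64x.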

Figure 2. Benchmark performance for MVE-aware vs non-MVE-aware devices.
(a) CMSIS-NN with Arm Compiler AC6.15, performance averaged over a fully connected layer
(b) CMSIS FFT and FIR for floating point with Arm Compiler AC6.16 (normalized performance)
[Data source: Arm]

In conclusion, Cortex-M85 with Helium delivers a significant uplift in AI/ML and DSP performance while also outshining the rest of the Cortex-M cores in scalar performance. This makes it an ideal choice for more complex processing tasks. Stay tuned for the second part of this blog series, where we will discuss the AI demos showcased by Renesas Electronics at Embedded World 2023 and how Cortex-M85 performed in real-life applications.