Whisper System Information

Code analysis of the system information printed by the whisper library

Conrad Gomes • January 21, 2025

Why is NEON set to -1?

In the last post, the system information listed a set of fields describing the features of the CPU it was running on. We saw that NEON was set to -1 and ARM_FMA was set to 1.

Transcribe the JFK sample

My initial impression was that the NEON field was not getting set correctly and possibly leading to a degradation in performance. This assumption was incorrect.

What is NEON?

As seen on the Arm website:

Arm Neon is an advanced single instruction multiple data (SIMD) architecture extension for the Arm Cortex-A and Arm Cortex-R series of processors with capabilities that vastly improve use cases on mobile devices, such as multimedia encoding/decoding, user interface, 2D/3D graphics and gaming.

Given this definition, it makes sense to use NEON wherever a platform provides it, in order to speed up the computations behind machine learning algorithms.
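To make the SIMD idea concrete, here is a minimal sketch of adding two float arrays four lanes at a time with NEON intrinsics, falling back to a scalar loop when NEON is not available. The function name `add_arrays` is hypothetical and used only for illustration; it is not taken from whisper or ggml.

```cpp
#include <cstddef>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper (not part of whisper/ggml): out[i] = a[i] + b[i].
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__ARM_NEON)
    // Process four floats per iteration using 128-bit NEON registers.
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    // Scalar loop: handles the tail, or the whole array when NEON is absent.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The same source builds on any target; only the NEON build takes the vector path.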

Review code displaying the system information

The system information is retrieved in the whisper_print_system_info function of whisper.cpp.

whisper.cpp::whisper_print_system_info

Checking the definition of the ggml_cpu_has_neon function, which lives in ggml-cpu.cpp, we can see that it depends on whether __ARM_ARCH and __ARM_NEON are defined.

ggml-cpu.cpp::ggml_cpu_has_neon
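The pattern described above, a function whose result depends on compiler-predefined macros, can be sketched as follows. This is not the ggml source: `example_cpu_has_neon` is a hypothetical name, and the sketch shows only the compile-time gate.

```cpp
// Hypothetical illustration of a compile-time feature gate (not ggml code).
// __ARM_ARCH and __ARM_NEON are predefined by the compiler when it targets
// an Arm architecture with NEON enabled.
int example_cpu_has_neon() {
#if defined(__ARM_ARCH) && defined(__ARM_NEON)
    return 1;  // compiled for an Arm target with NEON enabled
#else
    return 0;  // compiled for any other target
#endif
}
```

Because the check happens in the preprocessor, the answer is fixed when the library is built and does not change with the device the binary later runs on.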

According to Android’s NDK documentation:

All ARMv8-based (“arm64”) Android devices support Neon. Almost all ARMv7-based (“32-bit”) Android devices support Neon, including all devices that shipped with API level 21 or later. The NDK enables Neon by default for both.

So it looks like the field shows -1 because that is the value the ggml library predefines for it, not because NEON is missing.

ggml-cpu.cpp::ggml_arm_arch_features
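As a rough illustration of how a predefined value can end up in the printed output, the sketch below keeps feature flags in a struct whose fields start at -1 and reports whatever is stored. All names are hypothetical, and reading -1 as "not probed yet" is an assumption made for this sketch, not something confirmed from the ggml source.

```cpp
// Hypothetical sketch (not ggml code): feature flags with a -1 default.
struct ExampleArchFeatures {
    int has_neon = -1;  // assumption: -1 stands for "not probed yet"
};

static ExampleArchFeatures example_features;

// Reports whatever is currently stored, including the -1 default.
int example_report_neon() {
    return example_features.has_neon;
}

// Overwrites the default with a compile-time answer.
void example_probe_features() {
#if defined(__ARM_NEON)
    example_features.has_neon = 1;
#else
    example_features.has_neon = 0;
#endif
}
```

If the reporting function is called before any probe runs, it returns -1, which matches the kind of value seen in the system information line.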

What about ARM_FMA?

The Arm compiler documentation for the Fused Multiply Add (FMA) intrinsics names only Cortex-A5 and Cortex-M4 processors, although ARMv8-A cores also provide fused multiply-add instructions. As the documentation notes:

Performing the calculation with a single rounding step, rather than multiplying and then adding with two roundings, can result in a better degree of accuracy.
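The effect of the single rounding step can be observed with the standard `std::fma` function. The sketch below compares a two-step multiply and add against the fused version; the function names are hypothetical, and it assumes IEEE 754 double precision with round-to-nearest.

```cpp
#include <cmath>

// Hypothetical helper: multiply, round, then add and round again.
double two_step_multiply_add(double a, double b, double c) {
    // volatile forces the product to be rounded and stored before the add,
    // so the compiler cannot fuse the two operations on its own.
    volatile double product = a * b;
    return product + c;
}

// Hypothetical helper: a * b + c computed with a single rounding.
double fused_multiply_add(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

With a = 1 + 2^-27, b = 1 - 2^-27 and c = -1, the exact product is 1 - 2^-54, which rounds to 1.0 as a double. The two-step version therefore returns 0, while the fused version keeps the small term and returns -2^-54.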

Our Amlogic A311D processor, with 4 Cortex-A73 cores and 2 Cortex-A53 cores, also has this enabled.

ggml-cpu.cpp::ggml_arm_arch_fma_feature