Preface: Matrices help break down large, complex datasets into digestible chunks. Matrix multiplication allows machine learning models to identify complex patterns. By updating these matrices during training, the AI system continually improves and becomes more accurate.
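The idea above — a model's knowledge living in matrices that transform inputs — can be sketched in plain Python. The weight values and sizes here are hypothetical, just to show one layer's matrix-vector product:

```python
def matvec(W, x):
    """Multiply weight matrix W (a list of rows) by input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Hypothetical 2x3 weight matrix; training would repeatedly update these values.
W = [[0.5, -1.0, 2.0],
     [1.5,  0.0, -0.5]]
x = [1.0, 2.0, 3.0]  # input features

print(matvec(W, x))  # one layer's output: W @ x -> [4.5, 0.0]
```

Every dense layer in a neural network is, at its core, this operation repeated at scale, which is why hardware matrix acceleration matters.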
Background: SME (Scalable Matrix Extension) is an Instruction Set Architecture (ISA) extension introduced in the Armv9-A architecture. This new Armv9 feature accelerates AI and ML workloads, including generative AI, delivering significant performance uplifts along with improved power efficiency and flexibility for AI and ML applications running on the Arm CPU.
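SME's central hardware operation is an outer-product accumulate into a tile register (the FMOPA instruction): a matrix multiply decomposes into a sum of column-by-row outer products. A rough sketch of that decomposition in plain Python (not real SME intrinsics; names like `ZA` are only illustrative of the SME tile):

```python
def outer_product_accumulate(ZA, a, b):
    """Accumulate the outer product of vectors a and b into tile ZA.
    This mirrors the core operation behind SME's FMOPA instruction."""
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            ZA[i][j] += ai * bj
    return ZA

# C = A @ B expressed as a sum of outer products (column k of A times row k of B).
A = [[1.0, 2.0],
     [3.0, 4.0]]
B = [[5.0, 6.0],
     [7.0, 8.0]]
ZA = [[0.0, 0.0], [0.0, 0.0]]  # accumulator tile, zeroed before use
for k in range(2):
    col_k = [A[i][k] for i in range(2)]  # k-th column of A
    row_k = B[k]                         # k-th row of B
    outer_product_accumulate(ZA, col_k, row_k)

print(ZA)  # equals A @ B -> [[19.0, 22.0], [43.0, 50.0]]
```

The hardware performs each outer-product step on whole vector-length tiles at once, which is where the throughput gain over scalar code comes from.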
Technology focus: AMX was Apple’s proprietary matrix coprocessor design. It takes over matrix-heavy ML work on the CPU side when a workload has not been programmed for, or cannot be accelerated by, the Neural Engine itself — that is, bleeding-edge experimental ML that has not yet been “baked in” to the hardware. In particular, it improves the CPU’s performance on sparse matrices.
Ref: Sparse matrices are widely used in many fields, particularly in machine learning and data science. Recommendation systems are a typical case: in collaborative filtering, user-item interaction matrices are often sparse because each user typically interacts with only a small subset of items.
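A minimal sketch of why such matrices are stored sparsely, using a hypothetical user-item rating table (the users, items, and catalogue size are made up for illustration):

```python
# Hypothetical user-item ratings stored sparsely as dict-of-dicts:
# most entries are missing, since each user rates only a few items.
ratings = {
    "alice": {"item1": 5, "item9": 3},
    "bob":   {"item1": 4},
    "carol": {"item5": 5, "item9": 2},
}

n_users, n_items = 3, 1000  # assumed catalogue size
stored = sum(len(user_ratings) for user_ratings in ratings.values())
density = stored / (n_users * n_items)

print(f"{stored} stored entries out of {n_users * n_items} cells "
      f"(density {density:.4%})")  # 5 stored entries, density well under 1%
```

Storing only the nonzero entries, and having hardware that handles sparse structure well, avoids wasting memory and compute on the overwhelming majority of empty cells.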
SME is Arm’s equivalent, and it is now an industry standard that ordinary Armv9 toolchains can target. Its appearance as a new feature in the M4 shows that Apple has moved to this industry standard.
Official announcement: Apple introduces M4 chip – https://www.apple.com/hk/en/newsroom/2024/05/apple-introduces-m4-chip/