Avalanche: Optimizing Cache Utilization via Matrix Reordering for Sparse Matrix Multiplication Accelerator
Gwangeun Byeon
Seongwook Kim
Hyungjin Kim
Sukhyun Han
Jinkwon Kim
Prashant Nair
Taewook Kang
Seokin Hong
Sparse Matrix Multiplication (SpMM) is essential in various scientific and engineering applications but poses significant challenges due to irregular memory access patterns. Many hardware accelerators have been proposed to accelerate SpMM; however, they have yet to focus on on-chip memory utilization. In this paper, we highlight the underutilization of on-chip memory in SpMM accelerators. We then propose Avalanche, a novel hardware accelerator that optimally utilizes the on-chip memory to efficiently cache both matrices B and C. Avalanche incorporates three key techniques: Matrix Reordering (Mat-Reorder), Dead-Product Early Eviction (DP-Evict), and Reuse Distance-Aware Matrix Caching (RM-Caching). Mat-Reorder enhances data locality by reordering the columns of matrix A, ensuring early completion of computations for matrix C. DP-Evict optimizes on-chip memory usage by promptly evicting fully computed (dead) products from on-chip memory. RM-Caching maximizes data reuse by caching frequently accessed elements of matrix B based on their reuse distance. Experimental results demonstrate that Avalanche achieves an average performance improvement of 1.97× compared to the state-of-the-art SpMM accelerator, with a chip area of 6.15 mm².
Keywords