CAESAR: A CNN Accelerator Exploiting Sparsity and Redundancy Pattern
Seongwook Kim
Yongjun Kim
Gwangeun Byeon
Seokin Hong
Convolutional Neural Networks (CNNs) have shown outstanding performance in many computer vision applications. However, CNN inference on mobile and edge devices is challenging due to its high computation demands. Many recent studies have addressed this challenge by reducing data precision through quantization techniques, which introduces abundant redundancy into CNN models. This paper proposes CAESAR, a CNN accelerator that eliminates redundant computations to reduce the computation demands of CNN inference. By analyzing the computation pattern of the convolution layers, CAESAR predicts where redundant computations will occur and removes them during execution. It then remaps the remaining effectual computations onto the processing elements originally assigned to the redundant ones, so that all processing elements are fully utilized. Based on our evaluation with a cycle-level microarchitecture simulator, CAESAR achieves a speedup of up to 2.13x and reduces energy consumption by 78% compared to a TPU-like baseline accelerator.
Keywords