Publications

In-Cache Processing with Power-of-Two Quantization for Fast CNN Inference on CPUs

2023 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC)

  • Joseph Woo

  • Seungtae Lee

  • Seongwook Kim

  • Gwangeun Byeon

  • Seokin Hong

Abstract

Convolutional Neural Networks (CNNs) demand high computational capability, motivating researchers to leverage Processing-In-Memory (PIM) technology to achieve significant performance improvements. However, implementing complex arithmetic operations such as multiplication within memory is a major challenge in PIM architectures. To address this challenge, this paper proposes a PIM-enabled cache (PEC) architecture that uses shifters to perform multiplication operations at low cost. We also introduce a filter-wise, hardware-friendly Power-of-Two (POT) quantization scheme that quantizes the weights of selected filters to power-of-two values, accelerating convolution operations on the PEC. Our experimental results demonstrate that the proposed PEC, together with POT quantization, achieves a 2.28x performance improvement on average with an accuracy degradation of only 0.784%.
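The core idea behind POT quantization is that multiplying by a power-of-two weight reduces to a bit shift, which is far cheaper to implement in memory than a full multiplier. The abstract does not give the paper's exact filter-wise algorithm, so the sketch below only illustrates the general principle: each weight is stored as a sign and an exponent, and the multiply becomes a shift. All names (`pot_quantize`, `pot_multiply`) are hypothetical.

```python
import math

def pot_quantize(w):
    """Quantize a real-valued weight to the nearest power of two.

    Returns (sign, exponent) so that w is approximated by sign * 2**exponent.
    This is the generic POT idea, not the paper's filter-wise scheme.
    """
    if w == 0:
        return (0, 0)
    sign = 1 if w > 0 else -1
    exponent = round(math.log2(abs(w)))  # nearest power of two in log domain
    return (sign, exponent)

def pot_multiply(x, sign, exponent):
    """Multiply an integer activation x by a POT weight using only shifts.

    A left shift handles exponents >= 0; a right shift handles negative
    exponents (i.e., fractional power-of-two weights).
    """
    if sign == 0:
        return 0
    y = (x << exponent) if exponent >= 0 else (x >> -exponent)
    return sign * y
```

In hardware, `pot_multiply` corresponds to a barrel shifter plus a sign flip, which is the kind of low-cost datapath the PEC places near the cache arrays in place of a multiplier.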

Keywords

  • Degradation
  • Computers
  • Quantization (signal)
  • Costs
  • Multicore processing
  • Convolution
  • Computer architecture
  • Convolutional Neural Network
  • Power-of-Two Quantization
  • Hardware-Friendly Quantization
  • Processing in Memory