Publications

Outlier Matters: A Statistical Analysis of LLM Tensor Distributions and Quantization Effects

40th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC 2025)

  • Taein Kim

  • Seongwook Kim

  • Sukhyun Han

  • Woojin Cho

  • Youngjae Choi

  • Youngseok Bae

  • Seokin Hong

Abstract

As transformer-based Large Language Models (LLMs) grow, deploying them under resource constraints has become increasingly complex, making quantization a vital technique for efficient inference. However, unlike convolutional neural networks (CNNs), LLMs exhibit distinctive tensor distribution characteristics, particularly in their activations, that significantly hinder low-bit quantization. This paper presents a statistical analysis grounded in standard distribution theory, revealing that LLM activations contain rare but high-magnitude outliers that significantly influence model performance. Our empirical findings show that these outliers are not merely noise but carry semantically critical information, and that mishandling them during quantization leads to severe accuracy degradation. To address this, we propose an efficient Outlier-Rescaled quantization method that preserves expressive outlier representations using a lightweight shift-based mechanism within a 4-bit format. Evaluations demonstrate that our method substantially restores the performance lost under INT4 quantization, particularly in LLMs, without requiring additional hardware or mixed-precision schemes. This study underscores the importance of activation-aware design in LLM quantization and provides a practical path forward for ultra-low-bit deployment.
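To make concrete why activation outliers hinder low-bit quantization, the minimal NumPy sketch below compares plain symmetric INT4 quantization against a generic outlier-rescaled variant that divides outliers by a power-of-two shift before quantizing with a shared 4-bit scale. This is an illustrative assumption of how a shift-based mechanism could work, not the paper's actual Outlier-Rescaled method; the sigma threshold, the shift heuristic, and all function names are hypothetical.

```python
# Illustrative sketch only: generic outlier-aware 4-bit quantization of an
# activation tensor. NOT the paper's Outlier-Rescaled method; the threshold
# and power-of-two rescale heuristic are assumptions for demonstration.
import numpy as np

def quantize_int4_naive(x):
    """Plain symmetric INT4: a few outliers stretch the scale and crush the
    resolution available to the bulk of the distribution."""
    scale = np.abs(x).max() / 7.0            # symmetric INT4 range: [-8, 7]
    q = np.clip(np.round(x / scale), -8, 7)
    return q * scale                          # dequantized values

def quantize_int4_outlier_rescaled(x, outlier_sigma=6.0):
    """Outlier-aware variant: values beyond a sigma threshold are divided by
    a power-of-two shift before quantization, so the shared scale follows the
    bulk of the distribution while outliers remain representable."""
    threshold = outlier_sigma * x.std()
    outlier_mask = np.abs(x) > threshold

    # Smallest power-of-two shift that brings outliers inside the threshold.
    max_out = np.abs(x[outlier_mask]).max() if outlier_mask.any() else 0.0
    shift = int(np.ceil(np.log2(max_out / threshold))) if max_out > threshold else 0

    x_rescaled = np.where(outlier_mask, x / (1 << shift), x)
    scale = np.abs(x_rescaled).max() / 7.0
    q = np.clip(np.round(x_rescaled / scale), -8, 7)

    # Dequantize, multiplying outlier positions back by the shift factor.
    return np.where(outlier_mask, q * scale * (1 << shift), q * scale)

# Toy activation tensor: near-Gaussian bulk plus a few large outliers.
rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=4096)
acts[:4] = [40.0, -35.0, 55.0, -48.0]

err_naive = np.mean((acts - quantize_int4_naive(acts)) ** 2)
err_rescaled = np.mean((acts - quantize_int4_outlier_rescaled(acts)) ** 2)
print(f"MSE, naive INT4:            {err_naive:.4f}")
print(f"MSE, outlier-rescaled INT4: {err_rescaled:.4f}")
```

On this toy tensor the naive scheme shows a much larger reconstruction error, because the handful of outliers dictate the quantization scale for every element; the rescaled variant keeps a fine scale for the bulk while still encoding the outliers, which is the general effect the abstract describes.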

Keywords

  • Large language models
  • Convolutional neural networks
  • Deep learning
  • Accuracy
  • Quantization
  • Outlier
  • Data distribution