Outlier Matters: A Statistical Analysis of LLM Tensor Distributions and Quantization Effects
Taein Kim
Seongwook Kim
Sukhyun Han
Woojin Cho
Youngjae Choi
Youngseok Bae
Seokin Hong
As transformer-based Large Language Models (LLMs) grow, deploying them under resource constraints has become increasingly complex, making quantization a vital technique for efficient inference. However, unlike convolutional neural networks (CNNs), LLMs exhibit distinctive tensor distribution characteristics, particularly in activations, that substantially hinder low-bit quantization. Through a statistical analysis grounded in standard distribution theory, this paper reveals that LLM activations contain rare but high-magnitude outliers that significantly influence model performance. Our empirical findings show that these outliers are not merely noise but carry semantically critical information, and that mishandling them during quantization leads to severe accuracy degradation. To address this, we propose an efficient Outlier-Rescaled quantization method that preserves expressive outlier representations using a lightweight shift-based mechanism within a 4-bit format. Evaluations demonstrate that our method substantially restores the performance lost under INT4 quantization of LLMs, without requiring additional hardware or mixed-precision schemes. This study underscores the importance of activation-aware design in LLM quantization and provides a practical path toward ultra-low-bit deployment.
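The contrast the abstract draws can be illustrated with a minimal sketch. The snippet below is not the paper's implementation: the per-tensor scale, outlier threshold, and power-of-two shift amount are all illustrative assumptions, used only to show how naive symmetric INT4 quantization clips a high-magnitude activation outlier while a shift-based rescaling preserves it within the same 4-bit integer range.

```python
import numpy as np

def quantize_int4(x, scale):
    # Naive symmetric 4-bit quantization: integers in [-8, 7].
    q = np.clip(np.round(x / scale), -8, 7)
    return q * scale  # dequantized values

def quantize_int4_outlier_rescaled(x, scale, threshold, shift=4):
    # Illustrative outlier-rescaled scheme (assumed, not the authors'
    # exact method): rare high-magnitude values are divided by 2**shift
    # before quantization so they fit the 4-bit range, then shifted
    # back after dequantization. A shift by 2**shift is hardware-cheap.
    is_outlier = np.abs(x) > threshold
    rescaled = np.where(is_outlier, x / (1 << shift), x)
    q = np.clip(np.round(rescaled / scale), -8, 7)
    deq = q * scale
    return np.where(is_outlier, deq * (1 << shift), deq)

# Toy activations with one high-magnitude outlier, mimicking the
# heavy-tailed distributions observed in LLM activations.
acts = np.array([0.1, -0.2, 0.05, 6.4])
scale = 0.1  # chosen so the inliers use the full 4-bit range

naive = quantize_int4(acts, scale)
rescaled = quantize_int4_outlier_rescaled(acts, scale, threshold=1.0)
```

Here the naive path clips the 6.4 outlier to 0.7 (the largest representable value), destroying most of its magnitude, while the rescaled path recovers it almost exactly at the cost of coarser resolution on that single value; the inliers are quantized identically in both schemes.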
Keywords