Universal Properties of Activation Sparsity in Modern Large Language Models
Published in International Conference on Learning Representations (ICLR), 2026
Methods that rely on exactly zero activations do not apply to modern LLMs built on SiLU or GELU, leading to fragmented strategies and a gap in general understanding. We introduce a general framework for evaluating sparsity robustness and conduct a systematic investigation across diverse model families and scales. Our results uncover universal properties of activation sparsity, notably that the potential for effective sparsity grows with model size. We also present the first study of activation sparsity in diffusion-based LLMs.
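To illustrate the setting, the sketch below shows magnitude-based thresholding of hidden activations in a SwiGLU-style MLP, where SiLU outputs are small but rarely exactly zero, so exact-zero sparsity methods find nothing to skip. This is a minimal illustration, not the paper's framework; the function name, weight shapes, and threshold value are assumptions for the example.

```python
# Minimal sketch (assumed, not the paper's method): magnitude-based
# sparsification of SiLU activations in a SwiGLU-style MLP block.
import torch
import torch.nn.functional as F


def sparsify_silu_mlp(x, w_gate, w_up, w_down, threshold=0.05):
    """Zero out small-magnitude hidden activations and report sparsity.

    `threshold` is an illustrative hyperparameter, not a value from the paper.
    """
    # Gated hidden activation: SiLU(x W_gate) * (x W_up)
    hidden = F.silu(x @ w_gate) * (x @ w_up)

    # SiLU/GELU activations are rarely exactly zero, so sparsity must be
    # induced explicitly, e.g. by zeroing entries below a magnitude threshold.
    mask = hidden.abs() >= threshold
    sparsity = 1.0 - mask.float().mean().item()

    return (hidden * mask) @ w_down, sparsity


if __name__ == "__main__":
    d_model, d_hidden = 64, 256
    x = torch.randn(8, d_model)
    w_gate = torch.randn(d_model, d_hidden) / d_model**0.5
    w_up = torch.randn(d_model, d_hidden) / d_model**0.5
    w_down = torch.randn(d_hidden, d_model) / d_hidden**0.5

    out, sparsity = sparsify_silu_mlp(x, w_gate, w_up, w_down)
    print(f"fraction of hidden activations zeroed: {sparsity:.2%}")
```

Evaluating sparsity robustness then amounts to measuring how model quality degrades as the fraction of zeroed activations increases.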
