Low-Rank Compression of Language Models via Differentiable Rank Selection
Sidhant Sundrani,
Francesco Tudisco,
Pasquale Minervini,
In: Language Resources and Evaluation Conference (LREC), 2026
Abstract
While low-rank decomposition offers a straightforward approach to compressing large language models, selecting an optimal rank for each layer remains challenging. Prior work using activation- and loss-aware SVD has improved compression-performance tradeoffs, but choosing ranks across layers is still difficult. We propose LLRC (Learning to Low-Rank Compress), a gradient-based method that learns mask weights to select singular values, without requiring post-compression fine-tuning. Using only a calibration dataset, LLRC trains the masks to progressively eliminate singular values while preserving the original model's intermediate activation patterns. LLRC consistently outperforms competing rank-selection methods across compression rates on reasoning and question-answering benchmarks, including a 12% improvement over STRS on MMLU at 20% compression for Llama-2-13B.
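The core idea can be sketched in a few lines: decompose a layer's weight matrix with SVD, gate each singular value with a learnable mask, and fit the mask on calibration data so that the compressed layer's activations match the original's. The sketch below is a toy illustration under stated assumptions, not the paper's implementation: the sigmoid mask parameterization, the sparsity penalty `lam`, and the finite-difference gradient step (standing in for autodiff) are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy weight matrix of one linear layer.
W = rng.standard_normal((32, 32))
U, S, Vt = np.linalg.svd(W, full_matrices=False)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One learnable mask logit per singular value (hypothetical
# parameterization; the paper's exact scheme may differ).
logits = np.full(S.shape, 3.0)  # sigmoid(3) ~ 0.95: start nearly open

def masked_reconstruction(logits):
    m = sigmoid(logits)
    return (U * (m * S)) @ Vt  # soft low-rank approximation of W

# Calibration objective: match the original layer's activations on
# calibration inputs X, plus sparsity pressure on the mask (assumed form).
X = rng.standard_normal((64, 32))
lam = 1e-2

def loss(logits):
    W_hat = masked_reconstruction(logits)
    act_err = np.mean((X @ W.T - X @ W_hat.T) ** 2)
    return act_err + lam * sigmoid(logits).sum()

# One finite-difference gradient step, standing in for backprop.
eps, lr = 1e-4, 0.5
grad = np.array([
    (loss(logits + eps * np.eye(len(S))[i])
     - loss(logits - eps * np.eye(len(S))[i])) / (2 * eps)
    for i in range(len(S))
])
logits -= lr * grad

# After training, keep only singular values whose mask stays open;
# the layer is then stored as two thin factors A (d_out x r), B (r x d_in).
keep = sigmoid(logits) > 0.5
A = U[:, keep] * S[keep]
B = Vt[keep]
print("retained rank:", int(keep.sum()), "of", len(S))
```

In practice the mask is trained jointly across layers, so the optimization allocates rank where the activation-matching loss demands it rather than using a single global threshold per layer.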
Please cite this paper as:
@inproceedings{sundrani2026lowrank,
title={Low-Rank Compression of Language Models via Differentiable Rank Selection},
author={Sundrani, Sidhant and Tudisco, Francesco and Minervini, Pasquale},
booktitle={Language Resources and Evaluation Conference (LREC)},
year={2026}
}
Links:
arXiv
Keywords:
low-rank compression
large language models
SVD
model compression
NLP