Francesco Tudisco

Low-Rank Compression of Language Models via Differentiable Rank Selection

Sidhant Sundrani, Francesco Tudisco, Pasquale Minervini
In: Language Resources and Evaluation Conference (LREC), 2026

Abstract

Low-rank decomposition offers a straightforward approach to compressing large language models, and prior work using activation- and loss-aware SVD has improved compression-performance tradeoffs; however, selecting a good rank for each layer remains challenging. We propose LLRC (Learning to Low-Rank Compress), a gradient-based method that learns mask weights to select singular values, without requiring post-compression fine-tuning. Using only a calibration dataset, our method trains the masks to progressively eliminate singular values while preserving the intermediate activations of the original model. LLRC consistently outperforms competing rank-selection methods across compression rates on reasoning and question-answering benchmarks, including a 12% improvement over STRS on MMLU at 20% compression with Llama-2-13B.
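To illustrate the core idea of differentiable rank selection described in the abstract, here is a minimal, hypothetical sketch (not the authors' implementation): a sigmoid gate over the singular values of one weight matrix is trained on calibration inputs to preserve the layer's original activations, while a sparsity penalty pushes gates toward zero. The gate parameterization, penalty weight, and learning rate below are all assumptions for illustration.

```python
# Hypothetical sketch of differentiable singular-value masking, NOT the paper's code.
# A sigmoid gate m = sigmoid(theta) scales each singular value; training trades off
# activation reconstruction against a penalty that shrinks the gates (assumed setup).
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))            # one layer's weight matrix
X = rng.standard_normal((64, 16))            # calibration activations feeding the layer
U, s, Vt = np.linalg.svd(W, full_matrices=False)

theta = np.ones_like(s)                      # gate logits (assumed parameterization)
lam, lr, steps = 2.0, 0.1, 300               # penalty weight and step size (assumed)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(steps):
    m = sigmoid(theta)                       # soft mask in (0, 1)
    W_hat = (U * (m * s)) @ Vt               # masked low-rank reconstruction of W
    R = X @ W_hat - X @ W                    # activation reconstruction error
    # gradient of ||R||_F^2 / n + lam * sum(m) w.r.t. theta (chain rule through m)
    dW = 2.0 * X.T @ R / X.shape[0]
    dm = (U.T @ dW @ Vt.T).diagonal() * s + lam
    theta -= lr * dm * m * (1.0 - m)

mask = sigmoid(theta)
kept = int((mask > 0.5).sum())               # effective rank after training
```

Gates attached to large singular values settle near 1 (dropping them would hurt reconstruction), while gates on small singular values are driven toward 0 by the penalty, yielding a learned per-layer rank rather than a hand-chosen one.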

Please cite this paper as:

@inproceedings{sundrani2026lowrank,
  title={Low-Rank Compression of Language Models via Differentiable Rank Selection},
  author={Sundrani, Sidhant and Tudisco, Francesco and Minervini, Pasquale},
  booktitle={Language Resources and Evaluation Conference (LREC)},
  year={2026}
}

Links: arXiv

Keywords: low-rank compression, large language models, SVD, model compression, NLP