← All projects Efficient ML Algorithm · ICLR 2026

CARE: Covariance-Aware and Rank-Enhanced Decomposition for Enabling Multi-Head Latent Attention

Lab-led project Converts pretrained GQA/MHA into MLA

Abstract

CARE is a covariance-aware low-rank decomposition that converts pretrained GQA/MHA models into Multi-Head Latent Attention (MLA) — achieving up to 215× lower one-shot perplexity and 1.70× mean accuracy at matched KV budgets on Llama-3.1-8B/70B and Qwen3-4B/30B. By estimating activation covariance and scheduling the singular spectrum under KV purity, it preserves activation geometry for a far stronger one-shot initialization with less healing.

Authors

Zhongzhu Zhou, Fengxiang Bie, Ziyan Chen, Zhenyu Zhang, Yibo Yang, Junxiong Wang, Ben Athiwaratkun, Xiaoxia Wu, Shuaiwen Leon Song

Links

Paper Code ↗ Project page