Large Language Models Can Control Their Own Attention Span
May 22, 2026 · [P2] Namgyu Ho*, Huzama Ahmad*, Woosung Koh*, Se-Young Yun, Tal Schuster, Cicero Nogueira Dos Santos
Type: Conference paper
Publication: Pre-print
Last updated on May 29, 2026
Long-Context · Efficient Inference · Sparse Attention