GEM: Geometric Entropy Mixing for Optimal LLM Data Curation
Optimizing LLM data curation with geometric entropy mixing; the new method has been submitted to ICML 2026
arXiv:2605.26121v1 Announce Type: cross Abstract: LLM pre-training efficacy increasingly depends on data composition rather than sheer volume. Yet, op…