Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training
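Proposes decoupling data-mixture search from training via model merging, to efficiently scale data-mixing strategies for LLM pre-training.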
arXiv:2602.00747v2 Announce Type: replace-cross Abstract: Determining an effective data mixture is a key factor in Large Language Model (LLM) pre-trai…