Positional Failures in Long-Context LLMs: A Blind Spot in Reasoning Benchmarks
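Exposes a blind spot in which positional bias causes long-context LLMs to fail at reasoning, challenging existing benchmark evaluation practice.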
arXiv:2605.23170v1 Announce Type: cross Abstract: Position-controlled evaluation is standard for retrieval tasks such as Needle-in-a-Haystack and RULE…