Is One Score Enough? Rethinking the Evaluation of Sequentially Evolving LLM Memory
Existing evaluations of LLM memory rely on final accuracy, which can mask key failure modes; this paper proposes a new perspective.
arXiv:2605.15384v1 Announce Type: cross Abstract: Memory plays a central role in enabling large language models (LLMs) to operate over sequential task…