Parallel Context Compaction for Long-Horizon LLM Agent Serving
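Addresses context overflow in long-horizon LLM agents by proposing a parallel compaction method that reduces inference stalls lasting tens of seconds.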
arXiv:2605.23296v1 Announce Type: new Abstract: Long-horizon LLM agents accumulate growing conversation histories that eventually exceed the model's c…