Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs
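A new method keeps per-round risk under control for LLMs deployed online; it builds on conformal prediction and RLVR training to make safety certification more reliable.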
arXiv:2605.20270v1 Announce Type: new Abstract: A local specialist LLM, fine-tuned with reinforcement learning from verifiable rewards (RLVR) on opera…