VRPRM: Process Reward Modeling via Visual Reasoning
Improves the accuracy of process reward modeling through visual reasoning, offering a new approach to training on complex tasks.
arXiv:2508.03556v3 Announce Type: replace Abstract: Process Reward Model (PRM) is widely used in the post-training of Large Language Model (LLM) becau…
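Uncovers the linear dynamics underlying RLVR training of large language models, offering a new perspective on reinforcement learning optimization.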
arXiv:2601.04537v3 Announce Type: replace-cross Abstract: Reinforcement learning with verifiable rewards (RLVR) has driven significant performance gai…
As cloud API costs soar, running open-source models locally for coding tasks may become the mainstream.
There's extreme price escalation on the part of Anthropic, with token spend now approaching levels that have made many an enterprise scratch its head. …
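A new method keeps per-round risk under control in online LLM deployment; built on conformal prediction and RLVR training, it makes safety certification more reliable.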
arXiv:2605.20270v1 Announce Type: new Abstract: A local specialist LLM, fine-tuned with reinforcement learning from verifiable rewards (RLVR) on opera…
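Reveals a rank-one structure in parameter trajectories during RLVR training; LLM reasoning ability can be extrapolated from only a very small amount of training, overturning conventional wisdom.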
arXiv:2605.21468v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has become a dominant paradigm for improving rea…
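Proposes a step-by-step scoring reward mechanism that improves supervision of intermediate steps in LLM reasoning, moving beyond the traditional limitation of rewarding only the final answer.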
arXiv:2605.17291v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is widely used to improve reasoning in large lan…
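Proposes CuSearch, a curriculum sampling method that uses search depth to optimize reinforcement learning training for Agentic RAG and improve efficiency.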
arXiv:2605.11611v2 Announce Type: replace Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising paradigm for trai…
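The Muon optimizer exhibits spectral failures in vision-language alignment and reinforcement learning fine-tuning; the authors propose a high-pass filter as a remedy, updating the understanding of large-model training.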
arXiv:2605.19282v1 Announce Type: new Abstract: Muon is a matrix-aware optimizer that leverages Newton-Schulz (NS) iterations to enforce spectral grad…
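Introduces the new concept of "reasoning portability," pointing the way for continual learning of multimodal large models in the reinforcement learning era.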
arXiv:2605.18903v1 Announce Type: new Abstract: Vision-Language Models in Continual Learning (VLM-CL) aim to continuously adapt to new multimodal task…
Explores how reinforcement learning with verifiable rewards improves LLM reasoning in knowledge-intensive domains, filling a research gap.
arXiv:2605.18261v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has demonstrated promising potential to enhance …
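Uses reward models to overcome the limits of test cases, enabling scalable reinforcement learning for code LLMs at both training and inference time.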
arXiv:2602.17684v2 Announce Type: replace Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) has driven recent progress in code large lan…
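Reveals the counterintuitive phenomenon that LLMs fail to learn from hard samples during RLVR training, challenging existing assumptions.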
arXiv:2605.16787v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Reward (RLVR) has proven effective in improving Large Language …
An AI ring worn on the finger translates sign language in real time and may open up new VR/AR interaction scenarios.
Article URL: https://spectrum.ieee.org/sign-language-interpreter Comments URL: https://news.ycombinator.com/item?id=48181012 Points: 1 # Comments: 0
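Proposes a dual-token constraint method that stabilizes knowledge and improves reasoning, addressing the problem of uniform token optimization in RLVR.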
arXiv:2507.15778v2 Announce Type: replace Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become an effective post-training method…
Build AR/VR apps with React Native and deploy them directly to devices, simplifying cross-platform immersive development.
Build AR/VR Apps in React Native + ship directly to devices
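A hardcore hands-on guide to building a tiny web server on an 8-bit microcontroller (an AVR chip), with code and hardware fully published; thoroughly geeky.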
Article URL: https://maurycyz.com/projects/mcusite/ Comments URL: https://news.ycombinator.com/item?id=48165295 Points: 211 # Comments: 17