What Developers Don’t Say in Interviews—but Show on GitHub
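A look, through a research paper, at developer behavior on GitHub, revealing the side of developers that rarely comes out in interviews.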
When I started working on my usability study project with KServe, I interacted with KServe users to understand the challenges they were experiencing w…
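Proposes a step-wise scored reward mechanism that improves supervision of the intermediate steps in LLM reasoning, moving past the traditional limitation of rewarding only the final answer.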
arXiv:2605.17291v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is widely used to improve reasoning in large lan…
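Recent research identifies two kinds of subtle bias in LLMs, stereotyping and deviation, and introduces a method for quantifying them.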
arXiv:2508.06649v3 Announce Type: replace Abstract: Large language models (LLMs) are widely applied across diverse domains, raising concerns about the…
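Examines whether reinforcement learning can teach large models long-horizon reasoning; the key lies in expressiveness, which offers a new perspective on extending LLM capabilities.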
arXiv:2605.06638v3 Announce Type: replace-cross Abstract: Reinforcement learning (RL) has been applied to improve large language model (LLM) reasoning…
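Studies ChatGPT's automated coding of communication data and checks coding consistency across subgroups, providing reliability evidence for automated content analysis.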
arXiv:2510.20584v3 Announce Type: replace Abstract: Assessing communication and collaboration at scale depends on a labor-intensive task of coding com…
A quick look at a semi-supervised solution to sparse rewards in reinforcement learning, from recent arXiv research.
arXiv:2501.19128v5 Announce Type: replace-cross Abstract: In many real-world scenarios, reward signals for agents are exceedingly sparse, making it cha…
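Revisits the paradigm shift in agentic reinforcement learning for large language models: traditional RL versus new thinking about open-ended tasks.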
arXiv:2604.27859v3 Announce Type: replace Abstract: Reinforcement Learning (RL) has traditionally focused on training specialized agents to optimize p…