One Token to Fool LLM-as-a-Judge
A single token is enough to fool an LLM judge, exposing a security weakness in AI evaluation pipelines.
arXiv:2507.08794v3 Announce Type: replace-cross Abstract: Large language models (LLMs) are increasingly trusted as automated judges, assisting evaluat…
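New paper ReSum uses reinforcement learning to coordinate LLM reasoning with summarization, targeting the inefficiency of long reasoning chains.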
arXiv:2606.13316v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is a central technique for improving long-horizo…
Semantic grounding plus fixed-penalty constrained optimization gives LLM alignment certifiable safety guarantees.
arXiv:2510.03520v2 Announce Type: replace-cross Abstract: Ensuring safety is a foundational requirement for large language models (LLMs). Achieving an…
TruthRL uses reinforcement learning to make LLMs more honest and improve the truthfulness of their answers; the code is open source.
arXiv:2509.25760v2 Announce Type: replace-cross Abstract: While large language models (LLMs) have demonstrated strong performance on factoid question …
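Non-uniform token-level trust region optimization moves past the limits of conventional approaches and improves the stability of reinforcement learning training for large models.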
arXiv:2606.10968v1 Announce Type: cross Abstract: Reinforcement learning with verifiable rewards (RLVR) has become standard for improving LLM reasonin…
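A new advance in LLM reasoning training: training-time decomposition tackles the zero-reward problem so the model can learn from failed trajectories.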
arXiv:2606.09883v1 Announce Type: cross Abstract: Large language models (LLMs) have made remarkable progress in reasoning tasks, largely driven by pos…
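Proposes expert scoring rubrics to handle complex constraints in RLVR, offering a new paradigm for reinforcement learning reward design.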
arXiv:2606.09118v1 Announce Type: new Abstract: As LLM capabilities advance rapidly, the evaluation methods used to assess them increasingly lag behin…
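CATPO uses critique-augmented tree policy optimization to markedly improve how efficiently large language models obtain dense rewards during reasoning.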
arXiv:2606.08346v1 Announce Type: cross Abstract: Reinforcement learning with verifiable rewards (RLVR) has become a dominant paradigm for improving t…
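As OpenAI races toward an IPO, Altman's eyeball-scanning company, valued at $2.5 billion, faces layoffs and regulatory pushback in multiple countries.
ITHome, June 9: OpenAI announced on Monday local time that it has confidentially filed for an initial public offering (IPO), which could become one of the most emblematic listings of the past decade. Separately, Business Insider reports that Tools for Humanity, another company led by OpenAI CEO Sam Altman, is laying off staff. …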
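As OpenAI races toward its IPO, Sam Altman's iris-scanning company World is laying off staff amid regulatory and commercial trouble; the contrast shows how complicated the big-tech ecosystem is.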
Tools for Humanity, Sam Altman's identity verification company, is reportedly struggling to generate revenue and will downsize its staff.
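A top-conference paper exposes a fundamental flaw in RLHF's aggregated preferences and systematically maps the genuinely diverse things people want from AI.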
arXiv:2606.06674v1 Announce Type: new Abstract: Large Language Models (LLMs) are often fine-tuned through Reinforcement Learning from Human Feedback (…
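A multi-engine malicious-file detection platform that supports URL and file scanning and community-shared threat intelligence, aiding white-hat security research.
ITHome, June 7: According to a report today from tech outlet Notebookcheck, Microsoft has formally withdrawn its earlier hard-line legal threat against the white-hat hacker "Nightmare Eclipse" after strong pushback from cybersecurity professionals worldwide. According to the report, "Nightmare Eclipse" had previously bypassed Microsoft's conventional vulnerability submission process and directly disclosed multiple Window…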
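Rethinking learning with LLMs: Lathe helps you dig into a new field rather than take shortcuts, and every tutorial transparently records its sources, model, and prompts.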
Hey HN! Lathe is an experiment in using LLMs to teach me something new, instead of doing the work for me. It generates a hands-on, source-backed tutor…
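Paste a YouTube video link into the AI chat box to get an automatic summary without watching the video; the feature is built into the AI tool, which makes it more convenient to use.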
We added tooling to our chat to make it agentic. It can control our 40+ apps suite. One of the tools is url fetching with pagination. Comments URL: ht…
Clarity, the first interpretable AI platform, shows which concepts a large model used and traces them back to training data.
Article URL: https://www.guidelabs.ai/post/meet-clarity/ Comments URL: https://news.ycombinator.com/item?id=48401606 Points: 3 # Comments: 1
A systematic study of reinforcement-learning-based jailbreak attacks on LLMs that reveals new AI safety risks worth watching.
arXiv:2605.07032v2 Announce Type: replace-cross Abstract: The evolution of generative models from next-token predictors to autonomous engines of compl…
A flexible swarm training framework that makes agent reinforcement learning coordinate more efficiently.
arXiv:2606.04484v1 Announce Type: new Abstract: We present AgentJet, a distributed swarm training framework for large language model (LLM) agent reinf…
Libra, a new scheme for efficiently managing resources in agentic RL post-training, lowers training cost and improves performance.
arXiv:2606.03077v1 Announce Type: cross Abstract: Reinforcement learning (RL) has become a standard post-training paradigm for large language models (…
AUGUSTE is an online-learning dApp for 5G URLLC that uses AI-based predictive scheduling to achieve 1 ms ultra-reliable low-latency communication.
arXiv:2606.03664v1 Announce Type: cross Abstract: Ultra Reliable and Low Latency Communications (URLLC) was one of the main motivations behind 5G, wit…
Quickly encode or decode special characters in URLs so links stay safe and error-free; supports multiple encoding formats.
URL Encoding Explained: Special Characters and How to Handle Them (May 25, 2026, 7 min read, Network Tools) Every character in a URL has a meaning. S…
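For readers who want to see what percent-encoding looks like in practice, here is a minimal sketch using Python's standard `urllib.parse` module. It is not taken from the linked article; the sample string `raw` and the query parameters are made up for illustration.

```python
from urllib.parse import quote, quote_plus, unquote, urlencode

# Hypothetical sample containing characters that are reserved in URLs.
raw = "a b&c=d/e?"

# quote() leaves "/" unescaped by default, which suits whole path strings.
path_safe = quote(raw)

# Passing safe="" also escapes "/", which suits a single path or query component.
component = quote(raw, safe="")

# quote_plus() follows the form-encoding convention: spaces become "+".
form_style = quote_plus(raw)

# urlencode() builds a query string from key/value pairs (values are illustrative).
query = urlencode({"q": "café & tea", "page": 2})

# unquote() reverses percent-encoding.
round_trip = unquote(component)

for label, value in [
    ("path_safe", path_safe),
    ("component", component),
    ("form_style", form_style),
    ("query", query),
    ("round_trip", round_trip),
]:
    print(f"{label}: {value}")
```

The difference between `quote` with and without `safe=""` matters most when a value that itself contains `/` is being placed inside a single path segment or query parameter.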