Graph-Regularized Sparse Autoencoders for LLM Safety Steering
Proposes graph-regularized sparse autoencoders to make safety-behavior interventions on large models more precise.
arXiv:2512.06655v3 Announce Type: replace-cross Abstract: Sparse autoencoders (SAEs) are increasingly used to extract activation directions for infere…
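A self-supervised dual-track framework that, for the first time, decouples structure from content to precisely evaluate LLM performance in Web systems.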
arXiv:2601.19923v2 Announce Type: replace-cross Abstract: As Large Language Models (LLMs) evolve into the core of Web-based autonomous agents and comp…
The first unified multimodal brain foundation model, spanning fMRI, EEG, and MEG and moving beyond single-modality limits.
arXiv:2602.23410v3 Announce Type: replace-cross Abstract: Brain foundation models have achieved remarkable advances across a wide range of neuroscienc…
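Unifies score-based models and drifting models from a theoretical perspective, revealing the key mechanism behind the kernel-induced mean-shift discrepancy.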
arXiv:2603.07514v3 Announce Type: replace-cross Abstract: Drifting models train one-step generators by optimizing a kernel-induced mean-shift discrepa…
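How can LLM coding agents optimize large codebases? This paper proposes the FormulaCode benchmark, which evaluates holistic optimization ability in real-world settings, going beyond traditional synthetic tasks and binary signals.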
arXiv:2603.16011v2 Announce Type: replace-cross Abstract: Large language model (LLM) coding agents increasingly operate at the repository level, motiv…
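This paper evaluates how large models perform at differential privacy reasoning, providing a new benchmark for automatically designing privacy-preserving algorithms.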
arXiv:2604.15851v2 Announce Type: replace-cross Abstract: Differential privacy (DP) has a wide range of applications for protecting data privacy, but …
A new dataset and a graph-augmented framework that address the bottleneck in automated report generation for 3D PET/CT imaging.
arXiv:2604.18145v2 Announce Type: replace-cross Abstract: Automated medical report generation for 3D PET/CT imaging is fundamentally challenged by the…
Makes large models forget sensitive information through prompting alone, with no parameter changes; efficient and safe.
arXiv:2604.21251v5 Announce Type: replace-cross Abstract: Large language models (LLMs) trained on unfiltered corpora inherently risk retaining sensiti…
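Proposes agent skills as verifiable artifacts, using a biconditional correctness criterion to address trust in human-AI collaboration; a new paradigm for LLM deployment.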
arXiv:2605.00424v2 Announce Type: replace-cross Abstract: Agent skills - structured packages of instructions, scripts, and references that augment a l…
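Identifies the root cause of hallucination in vision-language models, where language overrides vision, and proposes a geometric debiasing method to improve reliability.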
arXiv:2605.08245v3 Announce Type: replace-cross Abstract: Vision-Language Models (VLMs) increasingly power high-stakes applications, from medical imag…
In extreme LLM quantization, smoothness matters more than numerical fit, which reveals a new cause of performance degradation.
arXiv:2605.08894v2 Announce Type: replace-cross Abstract: Large language models (LLMs) achieve strong performance but incur high deployment costs, mot…
Shows how architectural sparsity in the FFN reshapes attention computation and affects the learning mechanisms of small Transformer models.
arXiv:2605.09403v2 Announce Type: replace-cross Abstract: Architectural choices inside the Transformer feedforward network (FFN) block do not merely a…
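Proposes ODRPO, a method that ordinally decomposes discrete rewards to make policy optimization in LLM alignment more robust to the stochasticity of automated graders.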
arXiv:2605.12667v2 Announce Type: replace-cross Abstract: The alignment of Large Language Models (LLMs) utilizes Reinforcement Learning from AI Feedba…
Proposes an on-policy self-distillation method that reduces the "safety tax" of LLM safety alignment without sacrificing reasoning ability.
arXiv:2605.15239v1 Announce Type: new Abstract: Safety alignment often improves robustness to harmful queries at the cost of reasoning ability, a trad…
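Uses shape analysis to show how data augmentation reshapes the geometry of a neural network's internal representations, offering a new perspective on improving generalization.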
arXiv:2605.15306v1 Announce Type: new Abstract: Data augmentation is widely recognized for improving generalization in deep networks, yet its impact o…
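Proposes the first unified framework that uses reinforcement-learning post-training to tackle controllability in molecular graph generation, addressing the atom-level action space and chemically invalid intermediate states.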
arXiv:2605.15354v1 Announce Type: new Abstract: Despite the success of foundation models in language and vision, molecular graph generation still lack…
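LPDS, a new method that changes entities while preserving logic, precisely tests whether large models stay robust when surface details change.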
arXiv:2605.15393v1 Announce Type: new Abstract: As large language models (LLMs) are increasingly deployed to perform tasks with minimal human oversigh…
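Shows that network growth is not the inverse of pruning, and examines the key challenges and mechanisms of stable growth under structural plasticity.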
arXiv:2605.15435v1 Announce Type: new Abstract: Standard deep-learning pipelines usually choose the network architecture before training and keep it f…
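Takes a physics perspective on the problem of sampling the minima of neural network loss functions, with a notably novel dissipative Riemannian mechanics approach.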
arXiv:2605.15459v1 Announce Type: new Abstract: The minima of modern neural network loss functions are typically not isolated, rather they form connec…
Proposes an input-output whitened SVD method for adaptive-rank compression of large language models, improving inference efficiency.
arXiv:2605.15626v1 Announce Type: new Abstract: Large language models deliver strong performance across language and reasoning tasks, but their storag…