Finding Interpretable Prompt-Specific Circuits in Language Models
A new method, ACC++, precisely traces attention-head circuits in language models, revealing prompt-specific internal mechanisms.
arXiv:2602.13483v2 (announce type: replace-cross)
Abstract: Understanding the internal circuits that language models use to solve tasks remains a centra…