FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning
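The first dedicated benchmark for multimodal large models on physics diagram reasoning, revealing key weaknesses in how models read diagrams to understand physics.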
arXiv:2604.03893v2 Announce Type: replace Abstract: Current multimodal benchmarks for scientific reasoning primarily evaluate local information extrac…