BlueFin: Benchmarking LLM Agents on Financial Spreadsheets
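A dedicated benchmark for large language models on financial spreadsheets. It evaluates LLM agents' abilities in synthesis, manipulation, and comprehension, filling a gap in evaluation for professional financial scenarios.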
arXiv:2605.30907v1 Announce Type: cross Abstract: We present BlueFin, a benchmark that tasks large language model (LLM) agents with synthesis, manipul…