1
From Guidelines to Guarantees: A Graph-Based Evaluation Harness for Domain-Specific Evaluation of LLMs
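A graph-based evaluation framework that makes domain-specific LLM evaluation more comprehensive, contamination-resistant, and maintainable.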
arXiv:2508.20810v3 (Announce Type: replace)
Abstract: Rigorous evaluation of domain-specific language models requires benchmarks that are comprehensive,…