Soft-Prompt Tuning for Fair and Efficient LLM Benchmark Evaluation
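LLM benchmarks often misjudge a model's knowledge because of their formatting requirements; Soft-Prompt Tuning lets base models demonstrate their true capabilities fairly.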
arXiv:2606.12117v1 Announce Type: cross Abstract: Benchmark scores often misrepresent a large language model's (LLM's) knowledge, because they rely, e…