Many recent benchmarking papers argue for benchmarks that update themselves over time.
Benefits:
reduce data contamination
decentralize benchmark construction
keep pace with evolving agents: as agents improve, benchmarks should self-evolve too
(2024 June) LiveBench: A Challenging, Contamination-Limited LLM Benchmark
(2024 July) AutoBencher: Towards Declarative Benchmark Construction