Model Benchmark
LLM performance rankings across key benchmarks
LMArena Elo Ranking
| Rank | Model | Elo |
|---|---|---|
LMArena
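Crowdsourced Elo-style rating based on human preference. Users compare two anonymous model responses in blind tests and vote for the better one. Because the prompts come from real users rather than a fixed test set, it is often regarded as one of the most comprehensive real-world benchmarks.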
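To show how pairwise blind votes can become a ranked table, here is a minimal sketch of the classic online Elo update. It is not LMArena's actual pipeline, which uses its own statistical fitting. The K-factor of 32, the starting rating of 1000, the model names, and the votes are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical blind votes: (model_a, model_b, winner), winner is "a", "b" or "tie".
VOTES = [
    ("model-x", "model-y", "a"),
    ("model-y", "model-z", "a"),
    ("model-x", "model-z", "a"),
    ("model-z", "model-x", "tie"),
]


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_ratings(votes, k: float = 32.0, initial: float = 1000.0) -> dict:
    """Apply one Elo update per vote, in order; K and the initial rating are assumptions."""
    ratings = defaultdict(lambda: initial)
    for a, b, winner in votes:
        s_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]  # actual score for A
        delta = k * (s_a - expected_score(ratings[a], ratings[b]))
        ratings[a] += delta  # A gains what B loses, so total rating is conserved
        ratings[b] -= delta
    return dict(ratings)


def leaderboard(ratings: dict) -> str:
    """Render ratings as a Rank | Model | Elo table, highest rating first."""
    ordered = sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
    rows = ["| Rank | Model | Elo |", "|---|---|---|"]
    for rank, (model, rating) in enumerate(ordered, start=1):
        rows.append(f"| {rank} | {model} | {rating:.0f} |")
    return "\n".join(rows)


if __name__ == "__main__":
    print(leaderboard(elo_ratings(VOTES)))
```

Note that online Elo depends on vote order; fitting all votes at once (for example with a Bradley-Terry model) avoids that sensitivity, which is one reason large leaderboards may prefer a batch fit.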
MMLU
Massive Multitask Language Understanding. Multiple-choice questions spanning 57 subjects, from STEM to the humanities. Tests broad knowledge and reasoning.