Model Benchmark

LLM performance rankings across key benchmarks

LMArena Elo Ranking

Rank Model Elo

LMArena

Crowdsourced Elo rating based on human preference. Users compare responses from two anonymized models in blind tests and vote for the one they prefer. One of the most widely used benchmarks of real-world, open-ended use.
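
To make the idea of turning blind pairwise votes into a rating concrete, here is a minimal sketch of the classic online Elo update. The K-factor of 32, the starting rating of 1000, the model names, and the votes are all illustrative assumptions, and the leaderboard's actual statistical procedure may differ from this simple update.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, winner, loser, k=32.0, initial=1000.0):
    """Apply one vote: the winner gains exactly what the loser gives up.

    k and initial are assumed values chosen for illustration.
    """
    r_win = ratings.get(winner, initial)
    r_lose = ratings.get(loser, initial)
    gain = k * (1.0 - expected_score(r_win, r_lose))
    ratings[winner] = r_win + gain
    ratings[loser] = r_lose - gain
    return ratings


# Hypothetical votes as (preferred model, other model) pairs.
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
]

ratings = {}
for winner, loser in votes:
    update_elo(ratings, winner, loser)

# Sort from highest to lowest rating to get a leaderboard.
ranking = sorted(ratings.items(), key=lambda item: item[1], reverse=True)
for rank, (model, rating) in enumerate(ranking, start=1):
    print(rank, model, round(rating, 1))
```

An upset (a low-rated model beating a high-rated one) moves both ratings more than an expected result does, which is why a model's position depends on whom it beat and not only on how often it won.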

MMLU

Massive Multitask Language Understanding. A multiple-choice test covering 57 subjects, from STEM to the humanities. Tests broad knowledge and reasoning.
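
Because the benchmark spans many subjects, a single score can be computed in more than one way. The sketch below shows one plausible scoring scheme for a multi-subject multiple-choice test: per-subject accuracy, the overall (micro) accuracy, and the unweighted mean across subjects (macro). The function name and the records are made up for illustration, and which average a given leaderboard reports is not stated here.

```python
from collections import defaultdict


def multitask_accuracy(records):
    """Score (subject, predicted_choice, correct_choice) records.

    Returns per-subject accuracy, micro accuracy (every question counts
    equally), and macro accuracy (every subject counts equally).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for subject, predicted, answer in records:
        total[subject] += 1
        if predicted == answer:
            correct[subject] += 1
    per_subject = {s: correct[s] / total[s] for s in total}
    micro = sum(correct.values()) / sum(total.values())
    macro = sum(per_subject.values()) / len(per_subject)
    return per_subject, micro, macro


# Hypothetical model answers; choices are letters A-D.
records = [
    ("astronomy", "A", "A"),
    ("astronomy", "B", "C"),
    ("philosophy", "D", "D"),
    ("philosophy", "C", "C"),
    ("philosophy", "A", "B"),
    ("philosophy", "B", "B"),
]

per_subject, micro, macro = multitask_accuracy(records)
print(per_subject, round(micro, 3), round(macro, 3))
```

The two averages differ whenever subjects have different numbers of questions, so it is worth checking which one is meant when comparing reported scores.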