Two of these three AI benchmark competitors are the same company.
Arena (formerly LMArena, rebranded January 28, 2026) is the clear category leader in AI model evaluation, with $250M raised, a $1.7B valuation, and a $30M annualized run rate reached within 4 months of commercial operation. OpenLM.ai is a lean, unfunded inference-and-leaderboard platform that aggregates Arena's own data alongside other benchmarks. These two don't compete for the same buyer. The real question is whether Arena's neutrality problem, documented in the Leaderboard Illusion paper (published April 2025, accepted at NeurIPS 2025), creates a lasting opening for challengers.
Side-by-side
| Dimension | Arena AI (LMArena / Arena Intelligence Inc.) | Arena | OpenLM.ai |
|---|---|---|---|
| Free tier | — | — | Chatbot Arena+ leaderboard and blog are free. No free inference tier publicly documented. |
| Starting paid price | — | — | Not publicly disclosed. Contact sales inferred from product pages. |
| Total funding raised | — | — | Not disclosed. No public funding rounds found. |
| Team size and hiring signal | — | — | Unknown. No public hiring page. 0 open roles detected anywhere. |
| Leaderboard methodology | — | — | Aggregates Arena Elo (6M+ votes), AAII v3 (10 evaluations), and ARC-AGI v2. Multi-source, not original data collection. |
| Benchmark integrity risk | — | — | Lower commercial conflict risk. No lab partnerships. But relies on Arena's underlying data, inheriting its integrity issues. |
| Enterprise data privacy | — | — | Complete data privacy and full model ownership claimed. Multi-cloud, no vendor lock-in across 7 cloud providers. |
| Agentic evaluation | — | — | No dedicated agentic evaluation leaderboard. Covers agentic models in static benchmark aggregation only. |
| Named customer social proof | — | — | Zero named customers. Cited by Stanford HAI, CMU, Capital Economics, and ACL Anthology as a data source, not as a vendor. |
| Open-source research output | — | — | Research-grade blog, free LLM course, curated open-source project directory with GitHub star tracking. No original peer-reviewed papers found. |
What we found
The elephant in the room: two of these three profiles are the same company
LMArena rebranded to Arena on January 28, 2026 (per arena.ai/blog). The two profiles labeled Arena AI (LMArena) and Arena describe the same entity, Arena Intelligence Inc., now operating at arena.ai. This isn't a minor naming issue. It means the real comparison here is between one dominant, well-funded platform and one lean, unfunded aggregator. Buyers researching this space should know they're not choosing between three independent vendors.
Arena's moat is real, but its neutrality problem is also real
The April 2025 Leaderboard Illusion paper (Cohere, AI2, Princeton, Stanford authors, accepted at NeurIPS 2025) documented that Google received 19.2% and OpenAI 20.4% of all Chatbot Arena data, while 83 open-weight models combined got only 29.7% (per arXiv:2504.20879). Meta tested 27 private Llama-4 variants and cherry-picked the best score. Arena disputed some findings but didn't resolve the core structural issue: you can't fully monetize the labs you rank without creating incentive misalignment.
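Why testing many private variants matters can be shown with a small simulation. The sketch below is illustrative only and is not the paper's method: it assumes every variant has the same true quality, that each measured score is that quality plus Gaussian noise, and that only the best of the 27 scores is published. The `true_score` and `noise_sd` values are hypothetical and are not taken from Arena data.

```python
import random
import statistics


def simulate_best_of_n(n_variants, true_score=1200.0, noise_sd=15.0,
                       trials=2000, seed=0):
    """Return (mean single-draw score, mean best-of-n score).

    Assumption: all variants share the same true quality, so any gap
    between the two means comes purely from selecting the maximum.
    """
    rng = random.Random(seed)
    single, best = [], []
    for _ in range(trials):
        # One noisy measurement per privately tested variant.
        draws = [rng.gauss(true_score, noise_sd) for _ in range(n_variants)]
        single.append(draws[0])   # what one honest submission would report
        best.append(max(draws))   # what a cherry-picked submission reports
    return statistics.mean(single), statistics.mean(best)


single_mean, best_mean = simulate_best_of_n(n_variants=27)
inflation = best_mean - single_mean
print(f"single submission: {single_mean:.1f}")
print(f"best of 27:        {best_mean:.1f}")
print(f"inflation:         {inflation:.1f} points")
```

Under these assumptions the published score rises even though no variant is actually better, which is the selection effect the paper's critique points at. With `n_variants=1` the two means are identical.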
OpenLM.ai is a research asset, not a commercial evaluation platform
OpenLM.ai's Chatbot Arena+ leaderboard is cited in Stanford HAI policy briefs and CMU academic papers as a data source, not as a vendor. It has no disclosed customers, no pricing, no open roles, and no visible funding. Its value is as a free, multi-benchmark aggregator for developers who want to triangulate Arena Elo scores against AAII v3 and ARC-AGI v2 in one view. That's genuinely useful. It's just not a commercial competitor to Arena's enterprise evaluation service.
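One simple way to triangulate several leaderboards is to compare each model's rank across them and flag where the sources disagree. The sketch below is an assumption about how a reader might do this by hand, not a description of how OpenLM.ai computes anything. The model names and all scores are hypothetical placeholders, and every benchmark is treated as higher-is-better.

```python
from statistics import mean

# Hypothetical scores for illustration only; not real leaderboard data.
scores = {
    "model-a": {"arena_elo": 1450, "aaii_v3": 62.0, "arc_agi_v2": 18.0},
    "model-b": {"arena_elo": 1430, "aaii_v3": 68.0, "arc_agi_v2": 25.0},
    "model-c": {"arena_elo": 1410, "aaii_v3": 55.0, "arc_agi_v2": 9.0},
}
benchmarks = ["arena_elo", "aaii_v3", "arc_agi_v2"]


def rank_by(benchmark):
    """Map each model to its 1-based rank on one benchmark (1 = best)."""
    ordered = sorted(scores, key=lambda m: scores[m][benchmark], reverse=True)
    return {model: position for position, model in enumerate(ordered, start=1)}


ranks = {b: rank_by(b) for b in benchmarks}

# Average rank across sources, plus the spread between best and worst rank.
mean_rank = {m: mean(ranks[b][m] for b in benchmarks) for m in scores}
disagreement = {
    m: max(ranks[b][m] for b in benchmarks) - min(ranks[b][m] for b in benchmarks)
    for m in scores
}

for model in sorted(scores, key=mean_rank.get):
    print(f"{model}: mean rank {mean_rank[model]:.2f}, spread {disagreement[model]}")
```

A large spread for a model is a cue to look more closely at the benchmark where it ranks unusually high or low, rather than trusting any single source.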
Pricing strategy reveals who each platform is actually building for
Arena's freemium model is a flywheel. Free access to 400+ models drives 60M conversations per month, which generates the human preference data that powers the paid enterprise product. Reaching a $30M annualized run rate in 4 months (per TechCrunch) validates the strategy. OpenLM.ai's undisclosed pricing and lack of named customers suggest it's either pre-revenue or building toward a different exit. The 11-50 person Arena team running a $1.7B-valued platform is itself a signal: the data network effect does most of the work.
Who should actually pick whom
AI labs testing pre-release models need Arena. No other platform offers 5M+ real users evaluating models blind, and labs like xAI and Microsoft already use it. Enterprises in regulated industries wanting private evaluations with SLAs should start with Arena's commercial product, but push hard on data privacy terms. Developers building on open-source models who want free, aggregated benchmark data without sharing prompts publicly should bookmark OpenLM.ai's Chatbot Arena+ as a research tool, then deploy via their own infrastructure.
Sources & references
Every claim in this report was triangulated against 14 third-party sources (analyst reports, developer surveys, news coverage, and pricing pages). Sources are listed below in citation order.
- LMArena lands $1.7B valuation four months after launching its product
- LMArena Raises $150 Million to Build the World's Most Trusted AI Evaluation Platform
- The Leaderboard Illusion (arXiv:2504.20879)
- Understanding the recent criticism of the Chatbot Arena (Simon Willison)
- LMArena is now Arena (arena.ai blog)
- March 2026: Arena Updates across Product, Leaderboard Rankings and Research
- Chatbot Arena Plus | OpenLM.ai
- Products | OpenLM.ai
- Jobs and Employment at LMArena | Simplify Jobs
- Leaderboard illusion: How big tech skewed AI rankings on Chatbot Arena (Computerworld)
- LLM Benchmarks Are Junk Science (Towards AI, Oxford OII NeurIPS 2025 review)
- Gaming the System: Goodhart's Law Exemplified in AI Leaderboard Controversy (Collinear AI)
- China's Diverse Open-Weight AI Ecosystem and Its Policy Implications (Stanford HAI)
- Deepseek V3 and R1: An Overview Of Technology Innovations (CMU KiltHub)