Published June 7, 2026

Arena AI (LMArena / Arena Intelligence Inc.)

Arena

OpenLM.ai

Two of these three AI benchmark competitors are the same company.

Arena (formerly LMArena, rebranded January 28, 2026) is the category leader in AI model evaluation, with $250M raised, a $1.7B valuation, and a $30M ARR run rate reached within 4 months of its commercial launch. OpenLM.ai is a lean inference-and-leaderboard platform with no disclosed funding that aggregates Arena's own data alongside other benchmarks. These two don't compete for the same buyer. The real question is whether Arena's neutrality problem, documented in the Leaderboard Illusion paper (April 2025, accepted at NeurIPS 2025), creates a lasting opening for challengers.

Key takeaways

If you're evaluating

If you're an AI lab or enterprise needing third-party model evaluation with SLAs and auditable human-feedback data, pick Arena. If you're a developer or ML engineer wanting a free, aggregated multi-benchmark view of open-source models with full data sovereignty for inference, OpenLM.ai is worth bookmarking as a research starting point, not a production evaluation partner.

If you're building

Arena's path from free research tool to $30M ARR in 4 months shows that credibility-as-infrastructure is a real monetization strategy. The conflict-of-interest risk is also a product lesson: when your revenue comes from the same parties whose models you rank, you need structural separation between commercial and evaluation functions, or critics will fill that gap for you.

Winners by dimension

Brand authority and industry trust

Cited by OpenAI, Google, Anthropic, xAI, and Microsoft in official model launches; used as resolution source for Polymarket AI prediction markets.

Arena (arena.ai)

Free tier value

Completely free access to 400+ frontier models, 11 leaderboard types, Battle Mode, and Agent Mode with no usage caps or login required.

Arena (arena.ai)

Benchmark breadth and multi-modal coverage

11 dedicated leaderboards spanning Text, Code, Vision, Image, Video, Search, Document, and Agent. No other platform matches this breadth.

Arena (arena.ai)

Data privacy and model ownership for inference

Full model ownership, multi-cloud orchestration across 7 providers (AWS, Azure, GCP, IBM, Lambda, OCI, Cloudflare), zero data sharing with third parties.

OpenLM.ai

Benchmark aggregation depth

Chatbot Arena+ combines Arena Elo (6M+ votes), AAII v3 (10 evaluations), and ARC-AGI v2 in one free view, vs. Arena's single-methodology scores.

OpenLM.ai

Enterprise evaluation services

$30M ARR run rate within 4 months of Sept 2025 launch; SLA-backed evaluations with auditable human-feedback data across law, medicine, and software engineering.

Arena (arena.ai)

Pricing transparency

Free tier is fully public with clear boundaries. OpenLM.ai discloses no pricing, no tiers, and no ARR publicly.

Arena (arena.ai)

Neutrality and benchmark integrity

No commercial lab partnerships to defend. The Leaderboard Illusion paper found that Google received 19.2% and OpenAI 20.4% of all Arena data (per arXiv:2504.20879).

OpenLM.ai

Open-source model coverage

Tracks MIT-licensed open models with dedicated inference deployment. The Leaderboard Illusion paper showed 83 open-weight models combined got only 29.7% of total Arena data.

OpenLM.ai

Commercial momentum and growth signal

$250M raised in 7 months, $1.7B valuation, 5 open roles vs. OpenLM.ai's 0 disclosed roles and no visible funding.

Arena (arena.ai)

Side-by-side

	Arena AI (LMArena / Arena Intelligence Inc.)	Arena	OpenLM.ai
Free tier	—	—	Chatbot Arena+ leaderboard and blog are free. No free inference tier publicly documented.
Starting paid price	—	—	Not publicly disclosed. Contact sales inferred from product pages.
Total funding raised	—	—	Not disclosed. No public funding rounds found.
Team size and hiring signal	—	—	Unknown. No public hiring page. 0 open roles detected anywhere.
Leaderboard methodology	—	—	Aggregates Arena Elo (6M+ votes), AAII v3 (10 evaluations), and ARC-AGI v2. Multi-source, not original data collection.
Benchmark integrity risk	—	—	Lower commercial conflict risk. No lab partnerships. But relies on Arena's underlying data, inheriting its integrity issues.
Enterprise data privacy	—	—	Complete data privacy and full model ownership claimed. Multi-cloud, no vendor lock-in across 7 cloud providers.
Agentic evaluation	—	—	No dedicated agentic evaluation leaderboard. Covers agentic models in static benchmark aggregation only.
Named customer social proof	—	—	Zero named customers. Cited by Stanford HAI, CMU, Capital Economics, and ACL Anthology as a data source, not as a vendor.
Open-source research output	—	—	Research-grade blog, free LLM course, curated open-source project directory with GitHub star tracking. No original peer-reviewed papers found.

Who should pick whom

AI lab or model team needing third-party pre-release evaluation

→ Arena (arena.ai)

300+ pre-release model tests conducted. Real-world human votes from 5M+ users across 150 countries, SLA-backed delivery, and auditable data samples. No other platform has this community scale or lab trust.

ML engineer comparing open-source models without sharing proprietary prompts

→ OpenLM.ai

Free Chatbot Arena+ aggregates Arena Elo, AAII v3, and ARC-AGI v2 in one view. No account required, no data shared, and the inference platform offers full model ownership with multi-cloud deployment.

Enterprise AI team in healthcare, legal, or finance needing production model selection

→ Arena (arena.ai)

Only platform with SLA-backed evaluations, auditable human-feedback samples, and named customers including NIH BiomedArena for biomedical AI. Private Arena deployments available for regulated-industry use cases.

What we found

The elephant in the room: two of these three profiles are the same company

LMArena rebranded to Arena on January 28, 2026 (per arena.ai/blog). The two profiles labeled Arena AI (LMArena) and Arena describe the same entity, Arena Intelligence Inc., now operating at arena.ai. This isn't a minor naming issue. It means the real comparison here is between one dominant, well-funded platform and one lean aggregator with no disclosed funding. Buyers researching this space should know they're not choosing between three independent vendors.

Arena's moat is real, but its neutrality problem is also real

The Leaderboard Illusion paper (posted April 2025 by authors from Cohere, AI2, Princeton, and Stanford; accepted at NeurIPS 2025) documented that Google received 19.2% and OpenAI 20.4% of all Chatbot Arena data, 39.6% between the two, while 83 open-weight models combined got only 29.7% (per arXiv:2504.20879). According to the paper, Meta tested 27 private Llama-4 variants and cherry-picked the best score. Arena disputed some findings but didn't resolve the core structural issue: you can't fully monetize the labs you rank without creating incentive misalignment.
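Why does testing many private variants matter? The concern is a selection effect: if several variants of roughly equal quality are measured with noise and only the top scorer is published, the published number drifts upward even though no variant is actually better. The sketch below is a rough Python simulation of that effect, not the paper's method. The true rating of 1300 and the noise level of 15 rating points are illustrative assumptions, and the function names are hypothetical.

```python
import random
import statistics

TRUE_SKILL = 1300.0   # assumed true rating shared by every variant (illustrative)
NOISE_SD = 15.0       # assumed measurement noise in rating points (illustrative)

def published_score(n_variants: int, rng: random.Random) -> float:
    """Measure n equally skilled variants and return only the best score."""
    return max(rng.gauss(TRUE_SKILL, NOISE_SD) for _ in range(n_variants))

def mean_published_score(n_variants: int, trials: int = 5000, seed: int = 0) -> float:
    """Average the published (best-of-n) score over many simulated launches."""
    rng = random.Random(seed)
    return statistics.fmean(published_score(n_variants, rng) for _ in range(trials))

single = mean_published_score(1)        # honest case: one variant, one score
best_of_27 = mean_published_score(27)   # selective case: publish the top of 27
inflation = best_of_27 - single

print(f"one variant: {single:.1f}")
print(f"best of 27:  {best_of_27:.1f}")
print(f"inflation:   {inflation:.1f} rating points from selection alone")
```

Under these assumptions the gap comes entirely from picking the maximum of noisy measurements. Its size depends on the assumed noise level, so treat the output as a direction of bias, not an estimate of any real model's advantage.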

OpenLM.ai is a research asset, not a commercial evaluation platform

OpenLM.ai's Chatbot Arena+ leaderboard is cited in Stanford HAI policy briefs and CMU academic papers as a data source, not as a vendor. It has no disclosed customers, no pricing, no open roles, and no visible funding. Its value is as a free, multi-benchmark aggregator for developers who want to triangulate Arena Elo scores against AAII v3 and ARC-AGI v2 in one view. That's genuinely useful. It's just not a commercial competitor to Arena's enterprise evaluation service.
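For readers who want to see what "triangulating" across benchmarks can look like in practice, here is a minimal Python sketch. It ranks models on each benchmark separately, then reports the mean rank and how far the benchmarks disagree. The model names and every score are placeholders invented for illustration, not real leaderboard values, and this is not a description of how OpenLM.ai computes its view.

```python
from statistics import fmean

# Placeholder scores for hypothetical models; NOT real leaderboard values.
scores = {
    "model-a": {"arena_elo": 1400.0, "aaii_v3": 60.0, "arc_agi_v2": 10.0},
    "model-b": {"arena_elo": 1380.0, "aaii_v3": 65.0, "arc_agi_v2": 15.0},
    "model-c": {"arena_elo": 1350.0, "aaii_v3": 50.0, "arc_agi_v2": 5.0},
}
benchmarks = ["arena_elo", "aaii_v3", "arc_agi_v2"]

def rank_by(benchmark: str) -> dict[str, int]:
    """Rank models on one benchmark, 1 = best (higher score is better)."""
    ordered = sorted(scores, key=lambda m: scores[m][benchmark], reverse=True)
    return {model: position for position, model in enumerate(ordered, start=1)}

ranks = {b: rank_by(b) for b in benchmarks}

# Mean rank summarizes overall standing; rank spread flags disagreement
# between benchmarks (0 means every benchmark places the model identically).
summary = {
    model: {
        "mean_rank": fmean(ranks[b][model] for b in benchmarks),
        "rank_spread": max(ranks[b][model] for b in benchmarks)
        - min(ranks[b][model] for b in benchmarks),
    }
    for model in scores
}

for model, stats in sorted(summary.items(), key=lambda kv: kv[1]["mean_rank"]):
    print(model, stats)
```

Comparing ranks instead of raw scores sidesteps the fact that the three benchmarks use different scales. A large rank spread is a hint that a single leaderboard position may not generalize.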

Pricing strategy reveals who each platform is actually building for

Arena's freemium model is a flywheel. Free access to 400+ models drives 60M conversations per month, which generate the human preference data that powers the paid enterprise product. The $30M ARR run rate in 4 months (per TechCrunch) validates the strategy. OpenLM.ai's undisclosed pricing and absence of customer logos suggest it's either pre-revenue or building toward a different exit. The 11-50 person Arena team running a $1.7B-valued platform is itself a signal: the data network effect does most of the work.

Who should actually pick whom

AI labs testing pre-release models need Arena. No other platform offers 5M+ real users evaluating models blind, and labs like xAI and Microsoft already use it. Enterprises in regulated industries wanting private evaluations with SLAs should start with Arena's commercial product, but push hard on data privacy terms. Developers building on open-source models who want free, aggregated benchmark data without sharing prompts publicly should bookmark OpenLM.ai's Chatbot Arena+ as a research tool, then deploy via their own infrastructure.

Sources & references

Every claim in this report was triangulated against 14 third-party sources (analyst reports, developer surveys, news coverage, and pricing pages). Sources are listed below in citation order.

LMArena lands $1.7B valuation four months after launching its product (headline | key_stat | winners | narrative | side_by_side)
LMArena Raises $150 Million to Build the World's Most Trusted AI Evaluation Platform (key_stat | winners | side_by_side | narrative)
The Leaderboard Illusion (arXiv:2504.20879) (headline | winners | narrative | side_by_side | key_stat)
Understanding the recent criticism of the Chatbot Arena (Simon Willison) (narrative | winners | side_by_side)
LMArena is now Arena (arena.ai blog) (headline | narrative | side_by_side | takeaways)
March 2026: Arena Updates across Product, Leaderboard Rankings and Research (side_by_side | narrative | winners)
Chatbot Arena Plus | OpenLM.ai (side_by_side | winners | narrative)
Products | OpenLM.ai (side_by_side | winners | personas)
Jobs and Employment at LMArena | Simplify Jobs (side_by_side | winners | facts_strip)
Leaderboard illusion: How big tech skewed AI rankings on Chatbot Arena (Computerworld) (narrative | key_stat | winners)
LLM Benchmarks Are Junk Science (Towards AI, Oxford OII NeurIPS 2025 review) (narrative | winners)
Gaming the System: Goodhart's Law Exemplified in AI Leaderboard Controversy (Collinear AI) (narrative | winners)
China's Diverse Open-Weight AI Ecosystem and Its Policy Implications (Stanford HAI) (narrative | personas)
Deepseek V3 and R1: An Overview Of Technology Innovations (CMU KiltHub) (narrative | side_by_side)
