CAPTCHAv2 Leaderboard
Compare model performance across different CAPTCHA types
📤 Upload Results
Option A: Using the browser-use Agent Framework
Start the CAPTCHA server:
python app.py
The server will run on http://127.0.0.1:7860.
Run the browser-use agent evaluation (the default LLM is browser-use's in-house model, BU1.0):
python -m agent_frameworks.browseruse_cli \
  --url http://127.0.0.1:7860 \
  --llm browser-use
Or with a different LLM:
python -m agent_frameworks.browseruse_cli \
  --url http://127.0.0.1:7860 \
  --llm openai \
  --model gpt-4o
The evaluation automatically saves results to benchmark_results.json in the project root. Each puzzle attempt is logged as a JSON object with the fields: puzzle_type, puzzle_id, user_answer, correct_answer, correct, elapsed_time, timestamp, model, provider, agent_framework.
Option B: Using Other Agent Frameworks
Follow your framework's evaluation protocol, and make sure the results are saved in the benchmark_results.json format (JSONL: one JSON object per line) with the same fields as the browser-use output.
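For frameworks other than browser-use, a minimal Python sketch of a logger that appends one attempt per line might look like this. The function name `log_attempt` and its keyword parameters are hypothetical, not part of the project. The caller decides whether an answer is correct, since the benchmark's own grading rules are not described here, and the timestamp is written in the same shape as the example record under "File format requirements".

```python
import json
from datetime import datetime, timezone


def log_attempt(path, *, puzzle_type, puzzle_id, user_answer, correct_answer,
                correct, elapsed_time, model, provider, agent_framework):
    """Append one puzzle attempt to a JSONL file (one JSON object per line).

    Hypothetical helper: field names follow the benchmark_results.json format.
    """
    record = {
        "puzzle_type": puzzle_type,
        "puzzle_id": puzzle_id,
        "user_answer": user_answer,
        "correct_answer": correct_answer,
        "correct": bool(correct),  # grading is left to the caller
        "elapsed_time": elapsed_time,
        # UTC timestamp shaped like the example record, e.g. 2025-01-01T00:00:00Z
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "model": model,
        "provider": provider,
        "agent_framework": agent_framework,
    }
    # Append mode, so repeated runs accumulate in the same file.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Call it once per puzzle attempt, passing your own framework, model, and provider names; because each call writes a single line, an interrupted run still leaves a valid JSONL file.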
Method 1: Convert to CSV Format (Recommended)
Use the provided conversion script (convert_benchmark_to_csv.py in the project root):
python convert_benchmark_to_csv.py benchmark_results.json leaderboard/results.csv
Method 2: Directly Upload to Leaderboard (Auto-conversion)
You can upload benchmark_results.json directly here; the system converts it to the aggregated format automatically.
Optionally, provide the metadata below if auto-detection fails:
- Model Name (e.g., "gpt-4", "claude-3-sonnet", "bu-1-0")
- Provider (e.g., "OpenAI", "Anthropic", "browser-use")
- Agent Framework (e.g., "browser-use", "crewai")
Supported file formats:
- ✅ benchmark_results.json - Per-puzzle results (JSONL format)
- ✅ results.csv - Aggregated results (recommended)
- ✅ JSON files - Single object or array of aggregated results
File format requirements:
For benchmark_results.json (per-puzzle format):
{"puzzle_type": "Dice_Count", "puzzle_id": "dice1.png", "user_answer": "24", "correct_answer": 24, "correct": true, "elapsed_time": "12.5", "timestamp": "2025-01-01T00:00:00Z", "model": "bu-1-0", "provider": "browser-use", "agent_framework": "browser-use"}
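Before uploading, a quick local check can catch malformed lines. The helper below (`check_jsonl`, hypothetical and not part of the project) assumes that every record should carry all ten fields shown in the example, that `correct` is a JSON boolean, and that `elapsed_time` parses as a number even though the example stores it as a string. It skips blank lines; the leaderboard's actual validation may be stricter or looser.

```python
import json

# Field names taken from the per-puzzle example record.
REQUIRED_FIELDS = (
    "puzzle_type", "puzzle_id", "user_answer", "correct_answer", "correct",
    "elapsed_time", "timestamp", "model", "provider", "agent_framework",
)


def check_jsonl(path):
    """Return a list of (line_number, problem) tuples; an empty list means no problems found."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # assumption: blank lines are harmless
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((lineno, "not a JSON object"))
                continue
            missing = [k for k in REQUIRED_FIELDS if k not in record]
            if missing:
                problems.append((lineno, "missing fields: " + ", ".join(missing)))
            if "correct" in record and not isinstance(record["correct"], bool):
                problems.append((lineno, "'correct' should be true/false"))
            if "elapsed_time" in record:
                try:
                    float(record["elapsed_time"])  # accepts "12.5" as well as 12.5
                except (TypeError, ValueError):
                    problems.append((lineno, "'elapsed_time' is not numeric"))
    return problems
```

Running it on the example record above returns an empty list.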
For CSV (aggregated format):
- Required columns: Model, Provider, Agent Framework, Type, Overall Pass Rate, Avg Duration (s), Avg Cost ($), plus puzzle-type columns (e.g., Dice_Count, Mirror)
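To illustrate how per-puzzle records relate to the aggregated columns, here is a rough Python sketch that groups records by model, provider, and agent framework. It is not convert_benchmark_to_csv.py, whose logic may differ; use that script for real submissions. Assumptions: pass rates are written as percentages rounded to one decimal, Type and Avg Cost ($) are left blank because the per-puzzle records have no matching fields, and the name `aggregate` is hypothetical.

```python
import csv
import json
from collections import defaultdict

FIXED_COLUMNS = [
    "Model", "Provider", "Agent Framework", "Type",
    "Overall Pass Rate", "Avg Duration (s)", "Avg Cost ($)",
]


def aggregate(jsonl_path, csv_path):
    """Write one CSV row per (model, provider, agent_framework) group; return the rows."""
    groups = defaultdict(list)
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                r = json.loads(line)
                groups[(r["model"], r["provider"], r["agent_framework"])].append(r)

    # One column per puzzle type seen anywhere in the file.
    puzzle_types = sorted({r["puzzle_type"] for recs in groups.values() for r in recs})

    rows = []
    for (model, provider, framework), recs in groups.items():
        row = {
            "Model": model,
            "Provider": provider,
            "Agent Framework": framework,
            "Type": "",  # assumption: meaning of this column is not specified
            # Assumption: pass rates as percentages (True counts as 1).
            "Overall Pass Rate": round(100 * sum(r["correct"] for r in recs) / len(recs), 1),
            "Avg Duration (s)": round(sum(float(r["elapsed_time"]) for r in recs) / len(recs), 2),
            "Avg Cost ($)": "",  # per-puzzle records carry no cost field
        }
        for pt in puzzle_types:
            subset = [r for r in recs if r["puzzle_type"] == pt]
            # Blank cell when this group never attempted the puzzle type.
            row[pt] = round(100 * sum(r["correct"] for r in subset) / len(subset), 1) if subset else ""
        rows.append(row)

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIXED_COLUMNS + puzzle_types)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Grouping on all three metadata fields keeps, for example, the same model run under two different agent frameworks as separate leaderboard rows.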