Search-Agent Evaluation Harness/live

LLM Tool-Selection Benchmark

North-star: CPC (Cost Per Correct) = total run cost / tasks with correct retrieval strategy  ·  Priority: Quality → Latency → Dollar cost
OpenAI
Anthropic
Ollama (local, $0)