Model Leaderboard

AIWF Medium Context Benchmark: a 30-turn conversation evaluation over a ~12K-token knowledge base. Responses are judged by Claude Opus 4.5.

View benchmark on GitHub