Model Leaderboard
AIWF Medium Context Benchmark: a 30-turn conversation evaluation against a knowledge base of ~12K tokens. Responses are judged by Claude Opus 4.5.
View benchmark on GitHub