Comparing scores across many models

by SlavikF - opened Oct 22, 2025

I compared scores across many models:

RED color - high score
BLUE color - significantly low score
GREEN background - MoE (mixture-of-experts) model

My conclusions:

Dense models have slightly higher scores than MoE models, but they are much slower.
Overall, the Qwen3-VL models look very competitive with closed state-of-the-art models.

I think we need more coding scores.
