Comparing scores across many models

#2
by SlavikF - opened

I compared scores across many models:

RED color - high score
BLUE color - notably low score
GREEN background - MOE

Few models

My conclusions:

  • Dense models have slightly higher scores than MOE models, but are much slower
  • Overall, it looks like QWEN3-VL models are very competitive with closed state-of-the-art models

I think we need more coding scores.
