Comparing scores across many models
#2, opened by SlavikF
I compared scores across many models:
RED color - high score
BLUE color - significantly low score
GREEN background - MOE
My conclusions:
- Dense models have slightly higher scores than MOE models, but they are much slower
- Overall, it looks like QWEN3-VL models are very competitive with closed state-of-the-art models
I think we need more coding scores.
