[request] DeepSeek-V3.1-Terminus

#3
by willfalco - opened

I think we are missing DeepSeek-V3.1-Terminus in this chain. Or is there some reason it is skipped?
Thanks

QuantTrio org

The quantized evaluation results for DeepSeek-V3.1-Terminus were not satisfactory, so we decided not to release a quantized version of this model.

Maybe this will help: a mixed-int4 quantization of this model already exists. Could the same mixing approach be applied, but with AWQ, since AWQ should provide better accuracy?
https://huggingface.co/Intel/DeepSeek-V3.1-Terminus-int4-mixed-AutoRound
No artifacts, no stalled requests.
MMLU-Pro:

| Category | Wrong / Total | Accuracy |
|---|---|---|
| business | 87 / 789 | 89.0% |
| law | 355 / 1101 | 67.8% |
| psychology | 137 / 798 | 82.8% |
| biology | 64 / 717 | 91.1% |
| chemistry | 114 / 1132 | 89.9% |
| history | 101 / 381 | 73.5% |
| other | 169 / 924 | 81.7% |
| health | 181 / 818 | 77.9% |
| economics | 109 / 844 | 87.1% |
| math | 80 / 1351 | 94.1% |
| physics | 143 / 1299 | 89.0% |
| computer science | 59 / 410 | 85.6% |
| philosophy | 104 / 499 | 79.2% |
| engineering | 180 / 969 | 81.4% |
| **ALL CATEGORIES** | **1883 / 12032** | **84.4%** |
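As a sanity check, the aggregate figure can be reproduced from the per-category counts (a minimal sketch; the numbers are copied from the results above):

```python
# Per-category (wrong, total) counts from the MMLU-Pro run above.
results = {
    "business": (87, 789), "law": (355, 1101), "psychology": (137, 798),
    "biology": (64, 717), "chemistry": (114, 1132), "history": (101, 381),
    "other": (169, 924), "health": (181, 818), "economics": (109, 844),
    "math": (80, 1351), "physics": (143, 1299),
    "computer science": (59, 410), "philosophy": (104, 499),
    "engineering": (180, 969),
}

# Aggregate accuracy is 1 - (total wrong / total questions).
wrong = sum(w for w, _ in results.values())
total = sum(t for _, t in results.values())
accuracy = 100 * (1 - wrong / total)
print(f"{wrong}/{total} wrong ({accuracy:.1f}% accuracy)")
# -> 1883/12032 wrong (84.4% accuracy)
```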

Thanks a lot for running the quick benchmark and sharing the results — I really appreciate you taking the time to help validate the project.

Also, your note about AutoRound is super helpful. I’d always thought of it mainly as an Intel-focused tool, but from what you described it looks more like a broader “INT quantization toolkit” (covering things like GPTQ/AWQ and INT2/3/4/8 options). That definitely caught my interest. I’ll take a closer look at it and share any updates if I find something useful.

@tclf90 Thank you for making these quantized models!

We had to use AutoRound for a while, since vLLM/SGLang lacked stable SM120 (RTX 6000 PRO) support for other quantization formats, and we also needed to run with tensor-parallel 4 (vs. the usual 8).
Whichever layers they actually mix in this int4-mixed-AutoRound quantization might be a useful recipe for achieving a better AWQ quant of V3.1-Terminus or similar models.
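For reference, "mixed" here generally means assigning different bit-widths to different layers. A purely hypothetical sketch of such a recipe follows; the layer names, keywords, and thresholds are made up for illustration and are not Intel's actual configuration:

```python
# Hypothetical mixed-precision recipe: keep quantization-sensitive layers
# at higher precision and quantize the rest to int4. Illustrative only;
# the "sensitive" keywords below are assumptions, not the real recipe.
SENSITIVE_KEYWORDS = ("lm_head", "self_attn", "shared_experts")

def pick_bits(layer_name: str) -> int:
    """Return the bit-width to assign to a given layer."""
    if any(k in layer_name for k in SENSITIVE_KEYWORDS):
        return 8  # keep sensitive layers at int8
    return 4      # everything else goes to int4

# A few example layer names (hypothetical, DeepSeek-style naming).
layers = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.experts.3.down_proj",
    "model.layers.1.mlp.shared_experts.gate_proj",
    "lm_head",
]
plan = {name: pick_bits(name) for name in layers}
avg_bits = sum(plan.values()) / len(plan)
print(plan)
print(f"average bits: {avg_bits:.2f}")
```

The point of such a plan is the trade-off: the average bit-width stays close to 4 while the layers that hurt accuracy most when compressed stay at 8.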
