[request] DeepSeek-V3.1-Terminus

#3
by willfalco - opened

I think we are missing DeepSeek-V3.1-Terminus in this chain. Or is there some reason it is skipped?
Thanks

QuantTrio org

The quantized evaluation results for DeepSeek-V3.1-Terminus were not satisfactory, so we decided not to release a quantized version of this model.

Maybe this will help: a mixed-int4 quantization of this model already exists. Could the same mixing approach be applied, but with AWQ, since AWQ should provide better accuracy?
https://huggingface.co/Intel/DeepSeek-V3.1-Terminus-int4-mixed-AutoRound
No artifacts, no stalled requests.
MMLU-Pro:

| Category | Wrong / Total | Accuracy |
|---|---|---|
| business | 87 / 789 | 89.0% |
| law | 355 / 1101 | 67.8% |
| psychology | 137 / 798 | 82.8% |
| biology | 64 / 717 | 91.1% |
| chemistry | 114 / 1132 | 89.9% |
| history | 101 / 381 | 73.5% |
| other | 169 / 924 | 81.7% |
| health | 181 / 818 | 77.9% |
| economics | 109 / 844 | 87.1% |
| math | 80 / 1351 | 94.1% |
| physics | 143 / 1299 | 89.0% |
| computer science | 59 / 410 | 85.6% |
| philosophy | 104 / 499 | 79.2% |
| engineering | 180 / 969 | 81.4% |
| **ALL CATEGORIES** | **1883 / 12032** | **84.4%** |
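As a sanity check, the aggregate figure can be reproduced from the per-category counts (a minimal sketch; the numbers are copied from the results above):

```python
# Per-category (wrong, total) counts from the MMLU-Pro run above.
results = {
    "business": (87, 789), "law": (355, 1101), "psychology": (137, 798),
    "biology": (64, 717), "chemistry": (114, 1132), "history": (101, 381),
    "other": (169, 924), "health": (181, 818), "economics": (109, 844),
    "math": (80, 1351), "physics": (143, 1299),
    "computer science": (59, 410), "philosophy": (104, 499),
    "engineering": (180, 969),
}

# Aggregate accuracy is 1 - (total wrong / total questions).
wrong = sum(w for w, _ in results.values())
total = sum(t for _, t in results.values())
accuracy = 100 * (1 - wrong / total)
print(f"{wrong}/{total} wrong ({accuracy:.1f}% accuracy)")
# -> 1883/12032 wrong (84.4% accuracy)
```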

Thanks a lot for running the quick benchmark and sharing the results — I really appreciate you taking the time to help validate the project.

Also, your note about AutoRound is super helpful. I’d always thought of it mainly as an Intel-focused tool, but from what you described it looks more like a broader “INT quantization toolkit” (covering things like GPTQ/AWQ and INT2/3/4/8 options). That definitely caught my interest. I’ll take a closer look at it and share any updates if I find something useful.

@tclf90 Thank you for making these quantized models!

We had to use AutoRound for a while, since vLLM/SGLang lacked stable SM120 (RTX 6000 PRO) support for other quantization formats, and we also needed to run with tensor-parallel 4 (vs. the usual 8).
Whichever layers they actually mix in this int4-mixed-AutoRound quantization might be a useful recipe for achieving a better AWQ quant of V3.1-Terminus or similar models.
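For reference, "mixed" here generally means assigning different bit-widths to different layers. A purely hypothetical sketch of such a recipe follows; the layer names, keywords, and thresholds are made up for illustration and are not Intel's actual configuration:

```python
# Hypothetical mixed-precision recipe: keep quantization-sensitive layers
# at higher precision and quantize the rest to int4. Illustrative only;
# the "sensitive" keywords below are assumptions, not the real recipe.
SENSITIVE_KEYWORDS = ("lm_head", "self_attn", "shared_experts")

def pick_bits(layer_name: str) -> int:
    """Return the bit-width to assign to a given layer."""
    if any(k in layer_name for k in SENSITIVE_KEYWORDS):
        return 8  # keep sensitive layers at int8
    return 4      # everything else goes to int4

# A few example layer names (hypothetical, DeepSeek-style naming).
layers = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.experts.3.down_proj",
    "model.layers.1.mlp.shared_experts.gate_proj",
    "lm_head",
]
plan = {name: pick_bits(name) for name in layers}
avg_bits = sum(plan.values()) / len(plan)
print(plan)
print(f"average bits: {avg_bits:.2f}")
```

The point of such a plan is the trade-off: the average bit-width stays close to 4 while the layers that hurt accuracy most when compressed stay at 8.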
