Models with that certain something. Non-exhaustive list, no particular order.
Barney Greenway
McG-221
AI & ML interests
LLMs on Apple Silicon
Recent Activity
reacted to nightmedia's post with 🚀 7 minutes ago
Qwen3.5 Performance Metrics
With the 3.5 architecture, many of the old quantization methods no longer work as they used to. I noticed this when benchmarking Deckard (qx) quants: by mistake I ran a q8 that scored better, which only happens when the qx is bad, and it was. Enhancing layers just because they look interesting doesn't work anymore, so until I have a clear understanding of the architecture, I will publish mxfp4 and mxfp8 quants of the 3.5 models, which seem very stable and performant.
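For context on what mxfp4/mxfp8 do: these are block-scaled ("microscaling") formats, where a small run of elements shares one power-of-two scale and each element is stored in a tiny low-precision type. The sketch below is a simplified NumPy illustration of that shared-scale idea, using a symmetric integer grid as a stand-in for the real FP4/FP8 element encodings; it is not the actual MXFP spec or the MLX implementation.

```python
import numpy as np

def mx_quantize(x, block=32, bits=4):
    """Block-scaled quantization: each run of `block` values shares one
    power-of-two scale; elements land on a small symmetric integer grid.
    (Real MXFP4/MXFP8 elements are tiny floats, not integers -- this is a
    simplified stand-in for the shared-scale idea.)"""
    x = np.asarray(x, dtype=np.float64)
    pad = (-x.size) % block
    blocks = np.pad(x, (0, pad)).reshape(-1, block)
    maxabs = np.max(np.abs(blocks), axis=1, keepdims=True)
    maxabs[maxabs == 0] = 1.0
    qmax = 2 ** (bits - 1) - 1                      # e.g. +/-7 at 4 bits
    scale = 2.0 ** np.ceil(np.log2(maxabs / qmax))  # shared per-block scale
    q = np.clip(np.round(blocks / scale), -qmax, qmax)
    return q, scale, pad

def mx_dequantize(q, scale, pad):
    out = (q * scale).reshape(-1)
    return out[: out.size - pad] if pad else out

rng = np.random.default_rng(0)
x = rng.normal(size=256)
errs = {}
for bits in (8, 4):
    q, scale, pad = mx_quantize(x, bits=bits)
    errs[bits] = float(np.mean(np.abs(mx_dequantize(q, scale, pad) - x)))
    print(f"{bits}-bit block quant, mean abs error: {errs[bits]:.4f}")
```

Dropping from 8 to 4 element bits shrinks the grid from ±127 to ±7 steps, so the reconstruction error grows by roughly 16× in this toy setup, which is the memory-for-accuracy trade visible in the mxfp4-vs-mxfp8 rows below.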
I will start posting the metrics I gather from the series here, starting with the smallest model. Where I have numbers from previous or similar models, I will post them for comparison.
Qwen3.5-0.8B
```brainwaves
quant  arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8  0.351  0.501  0.733  0.462  0.348  0.682  0.573
mxfp4  0.339  0.489  0.738  0.433  0.330  0.672  0.553

Old model performance: Qwen3-0.6B
quant  arc    arc/e  boolq  hswag  obkqa  piqa   wino
bf16   0.298  0.354  0.378  0.415  0.344  0.649  0.534
q8-hi  0.296  0.355  0.378  0.416  0.348  0.652  0.529
q8     0.299  0.354  0.378  0.414  0.346  0.650  0.535
q6-hi  0.301  0.356  0.378  0.415  0.350  0.651  0.541
q6     0.300  0.367  0.378  0.416  0.344  0.647  0.524
mxfp4  0.286  0.364  0.609  0.404  0.316  0.626  0.531

quant  perplexity     peak memory
mxfp8  6.611 ± 0.049  7.65 GB
mxfp4  7.455 ± 0.057  6.33 GB
```
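For readers wondering where the "ppl ± err" shape of the perplexity rows comes from: perplexity is the exponential of the mean per-token negative log-likelihood, and the error bar is typically the standard error of that mean propagated through exp() via the delta method. A minimal sketch, assuming that convention (the post doesn't say which tool produced the numbers):

```python
import math
import statistics

def perplexity_with_error(nlls):
    """Perplexity and an error bar from per-token negative
    log-likelihoods (in nats). The error bar is the standard error of
    the mean NLL, pushed through exp() via the delta method."""
    mean_nll = statistics.fmean(nlls)
    sem = statistics.stdev(nlls) / math.sqrt(len(nlls))
    ppl = math.exp(mean_nll)
    return ppl, ppl * sem

# Synthetic per-token NLLs, just to show the shape of the output.
nlls = [1.7, 2.1, 1.9, 2.3, 1.8, 2.0, 1.95, 2.05]
ppl, err = perplexity_with_error(nlls)
print(f"perplexity: {ppl:.3f} ± {err:.3f}")
```

This is also why a lower mean NLL (better model fit) shows up multiplicatively in perplexity: the mxfp8-vs-mxfp4 gap above is small in log space but visible once exponentiated.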
Detailed metrics by model
https://huggingface.co/nightmedia/Qwen3.5-0.8B-mxfp8-mlx
https://huggingface.co/nightmedia/Qwen3.5-2B-mxfp8-mlx
https://huggingface.co/nightmedia/Qwen3.5-4B-mxfp8-mlx
https://huggingface.co/nightmedia/Qwen3.5-9B-mxfp8-mlx
https://huggingface.co/nightmedia/Qwen3.5-27B-Text
https://huggingface.co/nightmedia/Qwen3.5-122B-A10B-Text-mxfp4-mlx
More metrics coming soon.
I am running these on my Mac, an M4 Max with 128 GB RAM, so performance numbers like tokens/second reflect this machine.
This post will be updated with every model that gets tested. The larger models take hours, and the 27B a couple of days, so it will be a long process.
-G

updated a model 9 minutes ago
McG-221/Sketch-Cydonia-mlx-8Bit

published a model 9 minutes ago
McG-221/Sketch-Cydonia-mlx-8Bit