embedl/Cosmos-Reason2-2B-W4A16-Edge2-FlashHead Image-Text-to-Text • 2B • Updated 5 days ago • 1.09k • 7
FlashHead benchmarks for Llama 3.2, Gemma 3, and Qwen3 are now on embedl/Edge-Inference-Benchmarks! These are some of the models used in the FlashHead paper, now easier to explore and compare interactively.

Jetson AGX Thor (tok/s, batch=1):
- Llama-3.2-1B: 77 → 285 (FlashHead+W4A16, 3.7x)
- Llama-3.2-3B: 34 → 112 (3.3x)
- Gemma-3-1B: 79 → 153 (1.9x)
- Qwen3-1.7B: 49 → 189 (3.8x)
- Qwen3-0.6B: 140 → 177 (1.3x)

Accuracy matches baseline on MMLU-Pro, IFEval, BBH, TruthfulQA, GSM8K.
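The reported speedup factors follow directly from the before/after throughput numbers in the post. A minimal sketch that recomputes them from that table (the dict names are illustrative, not part of any benchmark API; small differences from the quoted factors can arise from rounding of the published tok/s values):

```python
# Throughput (tok/s, batch=1) on Jetson AGX Thor, copied from the post above.
baseline = {
    "Llama-3.2-1B": 77,
    "Llama-3.2-3B": 34,
    "Gemma-3-1B": 79,
    "Qwen3-1.7B": 49,
    "Qwen3-0.6B": 140,
}
flashhead = {
    "Llama-3.2-1B": 285,
    "Llama-3.2-3B": 112,
    "Gemma-3-1B": 153,
    "Qwen3-1.7B": 189,
    "Qwen3-0.6B": 177,
}

# Speedup = FlashHead tok/s divided by baseline tok/s, rounded to one decimal.
speedups = {m: round(flashhead[m] / baseline[m], 1) for m in baseline}
for model, s in speedups.items():
    print(f"{model}: {s}x")
```

Running this reproduces the roughly 1.3x to 3.8x range quoted in the post, with the largest gains on the models where the output head dominates decode time.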