# d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation
## Model Description
d3LLM-LLaDA is an ultra-fast diffusion language model that achieves high generation speed while maintaining competitive performance. It is built on the Dream architecture.
## Key Features
- **High throughput:** 5.0× faster than autoregressive models (Qwen-2.5-7B-it) on an H100 GPU and 3.5× faster on an A100 GPU; achieves 288.73 tokens/s on an H100 (vs. 57.32 tokens/s for the AR baseline) on the GSM8K-CoT dataset
- **High AUP (Accuracy Under Parallelism)** scores across benchmarks
- **Optimized for coding and math reasoning tasks**
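The headline 5.0× speedup follows directly from the two throughput numbers quoted above; a quick sanity check in Python (figures taken from this card):

```python
# Reported GSM8K-CoT throughputs on an H100 GPU, in tokens/s (from the card above).
d3llm_tps = 288.73  # d3LLM-LLaDA
ar_tps = 57.32      # autoregressive baseline (Qwen-2.5-7B-it)

# Speedup is the ratio of the two throughputs.
speedup = d3llm_tps / ar_tps
print(f"{speedup:.2f}x")  # ~5.04x, consistent with the quoted 5.0x figure
```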
## Usage
For detailed usage instructions, evaluation scripts, training datasets, and training code, please refer to the official GitHub repository and our blog:
- Code repo: https://github.com/hao-ai-lab/d3LLM
- Blog: https://hao-ai-lab.github.io/blogs/text-diffusion/