d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation πŸš€

Model Description

d3LLM-LLaDA is an ultra-fast diffusion language model, built on the LLaDA architecture, that achieves high generation speed while maintaining competitive performance.

Key Features

  • πŸš€ High throughput: 5.0Γ— faster than the autoregressive baseline (Qwen-2.5-7B-it) on an H100 GPU and 3.5Γ— faster on an A100 GPU; on the GSM8K-CoT dataset it reaches 288.73 tokens/s on H100 vs. 57.32 tokens/s for the AR baseline.
  • πŸ“Š High AUP (Accuracy Under Parallelism) scores across benchmarks
  • πŸ”§ Optimized for coding and math reasoning tasks
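As a quick sanity check (not part of the original card), the H100 figures quoted above are consistent with the advertised speedup:

```python
# Verify that the reported throughputs imply the quoted 5.0x speedup.
# Both numbers are taken directly from the bullet list above.
d3llm_tps = 288.73   # d3LLM-LLaDA tokens/s on H100 (GSM8K-CoT)
ar_tps = 57.32       # Qwen-2.5-7B-it autoregressive baseline tokens/s

speedup = d3llm_tps / ar_tps
print(f"{speedup:.2f}x")  # ~5.04x, which rounds to the advertised 5.0x
```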

Usage

For detailed usage instructions, evaluation scripts, training datasets, and training code, please refer to the official GitHub repository and the accompanying blog post.
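Until you consult the repository, here is a minimal loading sketch. It is an assumption based on the standard Hugging Face workflow for diffusion LLMs such as LLaDA: the repo id is taken from this card, and `trust_remote_code=True` is assumed because such architectures typically ship custom modeling code.

```python
REPO_ID = "d3LLM/d3LLM_LLaDA"  # repo id as listed on this card

def load_d3llm(repo_id: str = REPO_ID):
    """Load the tokenizer and BF16 weights for d3LLM-LLaDA.

    Imports are deferred so this sketch can be read (and imported)
    without torch/transformers installed.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        repo_id,
        torch_dtype=torch.bfloat16,  # matches the BF16 tensor type below
        trust_remote_code=True,
    ).eval()
    return tokenizer, model
```

Note that the diffusion sampling loop (parallel token unmasking) lives in the official repository, not in this sketch, which only loads the weights and tokenizer.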

Model Details

  • Model size: 8B params
  • Tensor type: BF16
  • Format: Safetensors
