RandomXiong committed · verified · Commit 8d2158a · Parent: f563d1d

Create README.md

Files changed (1): README.md (+32 −0)
---
license: apache-2.0
---

# DeepSeek-R1-0528-W4AFP8
## Model Overview
- **Model Architecture:** DeepseekV3ForCausalLM
- **Input:** Text
- **Output:** Text
- **Model Optimizations:**
  - **Dense weight quantization:** FP8
  - **MoE weight quantization:** INT4
  - **Activation quantization:** FP8
- **Release Date:** 25/10/2025
- **Version:** 1.0

Quantized version of [deepseek-ai/R1-0528-W4AFP8](https://huggingface.co/deepseek-ai/R1-0528-W4AFP8).

### Model Optimizations
This model was obtained by quantizing the weights and activations of a DeepSeek model to mixed-precision data types: W4A8 (INT4 weights, FP8 activations) for the MoE layers and FP8 for the dense layers.
This optimization reduces the number of bits per parameter to 4 or 8, significantly reducing GPU memory requirements.
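The memory saving from INT4 weights can be illustrated with a small sketch. This is not the model's actual quantization recipe; the group size (128), max-abs symmetric scaling, and FP16 scale storage are assumptions for illustration only:

```python
# Illustrative group-wise symmetric INT4 weight quantization (a sketch,
# not the exact scheme used for this checkpoint).
import numpy as np

def quantize_int4_groupwise(w: np.ndarray, group_size: int = 128):
    """Quantize a 1-D float weight vector to signed INT4 codes,
    with one scale per group of `group_size` weights."""
    w = w.reshape(-1, group_size)
    # Signed INT4 range is [-8, 7]; scale so the max-magnitude weight maps to 7.
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)
q, s = quantize_int4_groupwise(w)
w_hat = dequantize(q, s)
err = np.abs(w - w_hat).max()

# 4-bit codes plus one FP16 scale per 128 weights is about 4.125 bits
# per parameter, versus 16 bits for BF16 -- roughly a 3.9x reduction.
bits_per_param = 4 + 16 / 128
print(f"max abs error: {err:.4f}, bits/param: {bits_per_param:.3f}")
```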
24
+
## Use with SGLang
This model can be deployed efficiently using the SGLang backend on as few as 4x H200 GPUs, as shown in the example below.
```bash
python -m sglang.launch_server --model novita/Deepseek-R1-0528-W4AFP8 --mem-fraction-static 0.85 --disable-shared-experts-fusion --tp-size 4
```
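Once the server is up, SGLang exposes an OpenAI-compatible HTTP API (default port 30000). A minimal client sketch follows; the endpoint path and port are SGLang defaults, assumed here to match the launch command above:

```python
# Minimal client sketch for an SGLang server launched as shown above.
import json
import urllib.request

def build_chat_request(prompt: str) -> dict:
    # OpenAI-compatible chat payload; model name matches the launch command.
    return {
        "model": "novita/Deepseek-R1-0528-W4AFP8",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

def query(prompt: str,
          url: str = "http://localhost:30000/v1/chat/completions") -> str:
    req = urllib.request.Request(
        url,
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# With the server running:
# print(query("Explain W4A8 quantization in one sentence."))
```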