Upload README.md
README.md
CHANGED
|
@@ -15,28 +15,37 @@ tags:
|
|
| 15 |
We introduce AceMath, a family of frontier models designed for mathematical reasoning. The models in the AceMath family, including AceMath-1.5B/7B/72B-Instruct and AceMath-7B/72B-RM, are <b>Improved using Qwen</b>.
|
| 16 |
The AceMath-1.5B/7B/72B-Instruct models excel at solving English mathematical problems using Chain-of-Thought (CoT) reasoning, while the AceMath-7B/72B-RM models, as outcome reward models, specialize in evaluating and scoring mathematical solutions.
|
| 17 |
|
| 18 |
-
The AceMath-
|
| 19 |
|
| 20 |
For more information about AceMath, check our [website](https://research.nvidia.com/labs/adlr/acemath/) and [paper](https://arxiv.org/abs/2412.15084).
|
| 21 |
|
| 22 |
## All Resources
|
| 23 |
-
[AceMath-1.5B-Instruct](https://huggingface.co/nvidia/AceMath-1.5B-Instruct)   [AceMath-7B-Instruct](https://huggingface.co/nvidia/AceMath-7B-Instruct)   [AceMath-72B-Instruct](https://huggingface.co/nvidia/AceMath-72B-Instruct)
|
| 24 |
|
| 25 |
-
|
|
|
|
| 26 |
|
| 27 |
-
|
|
|
|
| 28 |
|
| 29 |
-
|
|
|
|
| 30 |
|
| 31 |
-
|
|
|
|
| 32 |
|
| 33 |
-
<p align="center">
|
| 34 |
-
<img src="https://research.nvidia.com/labs/adlr/images/acemath/acemath.png" alt="AceMath Benchmark Results" width="800">
|
| 35 |
-
</p>
|
| 36 |
|
| 37 |
|
| 38 |
-
|
| 39 |
| 40 |
|
| 41 |
## How to use
|
| 42 |
```python
|
|
@@ -82,7 +91,7 @@ input_ids = tokenizer.encode(
|
|
| 82 |
).to(model.device)
|
| 83 |
|
| 84 |
outputs = model(input_ids=input_ids)
|
| 85 |
-
print(outputs[0][0])
|
| 86 |
```
|
| 87 |
|
| 88 |
|
|
|
|
| 15 |
We introduce AceMath, a family of frontier models designed for mathematical reasoning. The models in the AceMath family, including AceMath-1.5B/7B/72B-Instruct and AceMath-7B/72B-RM, are <b>Improved using Qwen</b>.
|
| 16 |
The AceMath-1.5B/7B/72B-Instruct models excel at solving English mathematical problems using Chain-of-Thought (CoT) reasoning, while the AceMath-7B/72B-RM models, as outcome reward models, specialize in evaluating and scoring mathematical solutions.
|
| 17 |
|
| 18 |
+
The AceMath-7B/72B-RM models are developed from the AceMath-7B/72B-Instruct models and trained on AceMath-RM-Training-Data using the Bradley-Terry loss. The architecture employs standard sequence classification with a linear layer on top of the language model, using the final token to output a scalar score.
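As a rough illustration of the Bradley-Terry objective mentioned above, the sketch below computes the pairwise form of the loss for a single (chosen, rejected) pair of scalar reward scores. The function name `bradley_terry_loss` and the pairwise setup are assumptions made for illustration; the README does not specify how training pairs are constructed or batched.

```python
import math


def bradley_terry_loss(chosen_score: float, rejected_score: float) -> float:
    """Pairwise Bradley-Terry loss: -log(sigmoid(chosen - rejected)).

    Hypothetical helper for illustration only, not the authors' training code.
    """
    margin = chosen_score - rejected_score
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of the
    # margin so that exp() never receives a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The loss shrinks as the chosen response is scored further above the rejected one.
print(bradley_terry_loss(2.0, 0.0))
print(bradley_terry_loss(0.0, 2.0))
```

Under this formulation, only the difference between the two scores matters, which is why the reward model can emit an unnormalized scalar.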
|
| 19 |
|
| 20 |
For more information about AceMath, check our [website](https://research.nvidia.com/labs/adlr/acemath/) and [paper](https://arxiv.org/abs/2412.15084).
|
| 21 |
|
| 22 |
## All Resources
|
|
|
|
| 23 |
|
| 24 |
+
### AceMath Instruction Models
|
| 25 |
+
- [AceMath-1.5B-Instruct](https://huggingface.co/nvidia/AceMath-1.5B-Instruct), [AceMath-7B-Instruct](https://huggingface.co/nvidia/AceMath-7B-Instruct), [AceMath-72B-Instruct](https://huggingface.co/nvidia/AceMath-72B-Instruct)
|
| 26 |
|
| 27 |
+
### AceMath Reward Models
|
| 28 |
+
- [AceMath-7B-RM](https://huggingface.co/nvidia/AceMath-7B-RM), [AceMath-72B-RM](https://huggingface.co/nvidia/AceMath-72B-RM)
|
| 29 |
|
| 30 |
+
### Evaluation & Training Data
|
| 31 |
+
- [AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench), [AceMath-Instruct Training Data](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data), [AceMath-RM Training Data](https://huggingface.co/datasets/nvidia/AceMath-RM-Training-Data)
|
| 32 |
|
| 33 |
+
### Base Models
|
| 34 |
+
- [AceInstruct-1.5B](https://huggingface.co/nvidia/AceInstruct-1.5B), [AceInstruct-7B](https://huggingface.co/nvidia/AceInstruct-7B), [AceInstruct-72B](https://huggingface.co/nvidia/AceInstruct-72B)
|
| 35 |
|
| 36 |
|
| 37 |
|
| 38 |
+
## Reward Model Benchmark Results
|
| 39 |
|
| 40 |
+
| Model | GSM8K | MATH500 | Minerva Math | GaoKao 2023 En | Olympiad Bench | College Math | MMLU STEM | Avg. |
|
| 41 |
+
|---------------------------|-------|---------|--------------|----------------|-----------------|--------------|-----------|--------|
|
| 42 |
+
| majority@8 | 96.22 | 83.11 | 41.20 | 68.21 | 42.69 | 45.01 | 78.21 | 64.95 |
|
| 43 |
+
| Skywork-o1-Open-PRM-Qwen-2.5-7B | 96.92 | 86.64 | 41.00 | 72.34 | 46.50 | 46.30 | 74.01 | 66.24 |
|
| 44 |
+
| Qwen2.5-Math-RM-72B | 96.61 | 86.63 | 43.60 | 73.62 | 47.21 | 47.29 | 84.24 | 68.46 |
|
| 45 |
+
| AceMath-7B-RM (Ours) | 96.66 | 85.47 | 41.96 | 73.82 | 46.81 | 46.37 | 80.78 | 67.41 |
|
| 46 |
+
| AceMath-72B-RM (Ours) | 97.23 | 86.72 | 45.06 | 74.69 | 49.23 | 46.79 | 87.01 | 69.53 |
|
| 47 |
+
|
| 48 |
+
*Reward model evaluation on AceMath-RewardBench. Reported numbers are the average rm@8 results of reward models on math benchmarks, computed by randomly sampling 8 responses from 64 candidates with 100 random seeds. Response candidates are generated from a pool of 8 LLMs.*
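The rm@8 protocol described in the caption can be sketched for a single problem as follows. This is a minimal illustration, assuming each candidate response already has a reward score and a correctness label; the function name `rm_at_k` and the use of the integers 0 to 99 as seeds are hypothetical and are not taken from the AceMath evaluation code.

```python
import random


def rm_at_k(scores, correct, k=8, num_seeds=100):
    """Estimate rm@k for one problem.

    scores:  reward-model score for each candidate response
    correct: whether each candidate's final answer is correct
    For each seed, sample k candidates, keep the one with the highest
    reward score, and record whether that candidate is correct.
    """
    indices = range(len(scores))
    hits = 0
    for seed in range(num_seeds):
        rng = random.Random(seed)  # one independent sample per seed
        sampled = rng.sample(indices, k)
        best = max(sampled, key=lambda i: scores[i])
        hits += int(correct[best])
    return hits / num_seeds


# Toy example: 64 candidates, the first 7 are wrong, and the scores
# separate correct from incorrect answers perfectly.
correct = [False] * 7 + [True] * 57
scores = [1.0 if c else 0.0 for c in correct]
print(rm_at_k(scores, correct))
```

A benchmark-level number would then be the mean of this quantity over all problems in the benchmark.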
|
| 49 |
|
| 50 |
## How to use
|
| 51 |
```python
|
|
|
|
| 91 |
).to(model.device)
|
| 92 |
|
| 93 |
outputs = model(input_ids=input_ids)
|
| 94 |
+
print(outputs[0][0])
|
| 95 |
```
|
| 96 |
|
| 97 |
|