nvidia/AceMath-7B-RM

@@ -15,28 +15,37 @@ tags:
 We introduce AceMath, a family of frontier models designed for mathematical reasoning. The models in AceMath family, including AceMath-1.5B/7B/72B-Instruct and AceMath-7B/72B-RM, are <b>Improved using Qwen</b>.
 The AceMath-1.5B/7B/72B-Instruct models excel at solving English mathematical problems using Chain-of-Thought (CoT) reasoning, while the AceMath-7B/72B-RM models, as outcome reward models, specialize in evaluating and scoring mathematical solutions.
-The AceMath-1.5B/7B/72B-Instruct models are developed from the Qwen2.5-Math-1.5B/7B/72B-Base models, leveraging a multi-stage supervised fine-tuning (SFT) process: first with general-purpose SFT data, followed by math-specific SFT data. We are releasing all training data to support further research in this field.
 For more information about AceMath, check our [website](https://research.nvidia.com/labs/adlr/acemath/) and [paper](https://arxiv.org/abs/2412.15084).
 ## All Resources
-[AceMath-1.5B-Instruct](https://huggingface.co/nvidia/AceMath-1.5B-Instruct) &ensp; [AceMath-7B-Instruct](https://huggingface.co/nvidia/AceMath-7B-Instruct) &ensp; [AceMath-72B-Instruct](https://huggingface.co/nvidia/AceMath-72B-Instruct)
-[AceMath-7B-RM](https://huggingface.co/nvidia/AceMath-7B-RM) &ensp; [AceMath-72B-RM](https://huggingface.co/nvidia/AceMath-72B-RM)
-[AceMath-Instruct Training Data](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data) &ensp; [AceMath-RM Training Data](https://huggingface.co/datasets/nvidia/AceMath-RM-Training-Data)
-[AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench) &ensp; [AceMath Evaluation Script](https://huggingface.co/datasets/nvidia/AceMath-RewardBench/tree/main/scripts)
-## Benchmark Results
-<p align="center">
-  <img src="https://research.nvidia.com/labs/adlr/images/acemath/acemath.png" alt="AceMath Benchmark Results" width="800">
-</p>
-Greedy decoding (pass@1) results on a variety of math reasoning benchmarks. AceMath-7B-Instruct significantly outperforms the previous best-in-class Qwen2.5-Math-7B-Instruct (67.2 vs. 62.9) and comes close to the performance of 10× larger Qwen2.5-Math-72B-Instruct (67.2 vs. 68.2). Notably, our AceMath-72B-Instruct outperforms the state-of-the-art Qwen2.5-Math-72B-Instruct (71.8 vs. 68.2), GPT-4o (67.4) and Claude 3.5 Sonnet (65.6) by a margin.
 ## How to use
 ```python
@@ -82,7 +91,7 @@ input_ids = tokenizer.encode(
 ).to(model.device)
 outputs = model(input_ids=input_ids)
-print(outputs[0][0])
 ```

 We introduce AceMath, a family of frontier models designed for mathematical reasoning. The models in AceMath family, including AceMath-1.5B/7B/72B-Instruct and AceMath-7B/72B-RM, are <b>Improved using Qwen</b>.
 The AceMath-1.5B/7B/72B-Instruct models excel at solving English mathematical problems using Chain-of-Thought (CoT) reasoning, while the AceMath-7B/72B-RM models, as outcome reward models, specialize in evaluating and scoring mathematical solutions.
+The AceMath-7B/72B-RM models are developed from the AceMath-7B/72B-Instruct models and trained on AceMath-RM-Training-Data using a Bradley-Terry loss. The architecture employs standard sequence classification: a linear layer on top of the language model uses the final token to output a scalar score.
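The Bradley-Terry objective mentioned above can be illustrated with a minimal sketch. It assumes each response has already been reduced to one scalar score (as the final-token linear head does), and it uses plain Python floats rather than the actual training code, which is not shown in this model card. The function names `bradley_terry_loss` and `preference_probability` are hypothetical and chosen for illustration only.

```python
import math


def preference_probability(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry probability that the chosen response beats the rejected one.

    Equals sigmoid(score_chosen - score_rejected), computed in a numerically
    stable way for large positive or negative margins.
    """
    margin = score_chosen - score_rejected
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    exp_margin = math.exp(margin)
    return exp_margin / (1.0 + exp_margin)


def bradley_terry_loss(score_chosen: float, score_rejected: float) -> float:
    """Negative log-likelihood of the preferred ordering: -log(sigmoid(margin))."""
    margin = score_chosen - score_rejected
    # -log(sigmoid(m)) = log(1 + exp(-m)); branch to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Illustrative scores only; these are not outputs of AceMath-7B-RM.
    print(preference_probability(2.0, -1.0))
    print(bradley_terry_loss(2.0, -1.0))
```

The loss shrinks as the score of the preferred (correct) solution rises above that of the rejected one, which is what pushes the scalar head toward ranking correct solutions higher.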
 For more information about AceMath, check our [website](https://research.nvidia.com/labs/adlr/acemath/) and [paper](https://arxiv.org/abs/2412.15084).
 ## All Resources
+### AceMath Instruction Models
+- [AceMath-1.5B-Instruct](https://huggingface.co/nvidia/AceMath-1.5B-Instruct), [AceMath-7B-Instruct](https://huggingface.co/nvidia/AceMath-7B-Instruct), [AceMath-72B-Instruct](https://huggingface.co/nvidia/AceMath-72B-Instruct)
+### AceMath Reward Models
+- [AceMath-7B-RM](https://huggingface.co/nvidia/AceMath-7B-RM), [AceMath-72B-RM](https://huggingface.co/nvidia/AceMath-72B-RM)
+### Evaluation & Training Data
+- [AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench), [AceMath-Instruct Training Data](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data), [AceMath-RM Training Data](https://huggingface.co/datasets/nvidia/AceMath-RM-Training-Data)
+### Base Models
+- [AceInstruct-1.5B](https://huggingface.co/nvidia/AceInstruct-1.5B), [AceInstruct-7B](https://huggingface.co/nvidia/AceInstruct-7B), [AceInstruct-72B](https://huggingface.co/nvidia/AceInstruct-72B)
+## Reward Model Benchmark Results
+| Model                     | GSM8K | MATH500 | Minerva Math | GaoKao 2023 En | Olympiad Bench | College Math | MMLU STEM | Avg.   |
+|---------------------------|-------|---------|--------------|----------------|-----------------|--------------|-----------|--------|
+| majority@8               | 96.22 | 83.11   | 41.20        | 68.21          | 42.69           | 45.01        | 78.21     | 64.95  |
+| Skywork-o1-Open-PRM-Qwen-2.5-7B | 96.92 | 86.64   | 41.00        | 72.34          | 46.50           | 46.30        | 74.01     | 66.24  |
+| Qwen2.5-Math-RM-72B      | 96.61 | 86.63   | 43.60        | 73.62          | 47.21           | 47.29        | 84.24     | 68.46  |
+| AceMath-7B-RM (Ours)     | 96.66 | 85.47   | 41.96        | 73.82          | 46.81           | 46.37        | 80.78     | 67.41  |
+| AceMath-72B-RM (Ours)    | 97.23 | 86.72   | 45.06        | 74.69          | 49.23           | 46.79        | 87.01     | 69.53  |
+*Reward model evaluation on AceMath-RewardBench. We report the average rm@8 results of reward models on math benchmarks: 8 responses are randomly sampled from 64 candidates, and the results are averaged over 100 random seeds. Response candidates are generated from a pool of 8 LLMs.
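The rm@8 protocol described in the table note can be sketched as follows. This is not the official AceMath evaluation script (that is linked from AceMath-RewardBench); it is a simplified illustration that assumes each candidate is already available as a `(reward_score, is_correct)` pair. The function name `rm_at_k` and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Assumed candidate format: (reward model score, whether the final answer is correct).
Candidate = Tuple[float, bool]


def rm_at_k(
    problems: Sequence[Sequence[Candidate]],
    k: int = 8,
    num_seeds: int = 100,
) -> float:
    """Average best-of-k accuracy (in percent) over several random seeds.

    For each seed and each problem, sample k candidates without replacement,
    keep the one with the highest reward score, and count it if it is correct.
    """
    accuracies: List[float] = []
    for seed in range(num_seeds):
        rng = random.Random(seed)
        num_correct = 0
        for candidates in problems:
            subset = rng.sample(list(candidates), k)
            _, best_is_correct = max(subset, key=lambda cand: cand[0])
            num_correct += int(best_is_correct)
        accuracies.append(num_correct / len(problems))
    return 100.0 * sum(accuracies) / len(accuracies)


if __name__ == "__main__":
    # Toy data: 5 problems with 64 candidates each and noisy synthetic scores.
    data_rng = random.Random(0)
    toy_problems = []
    for _ in range(5):
        candidates = []
        for _ in range(64):
            is_correct = data_rng.random() < 0.5
            score = data_rng.gauss(1.0 if is_correct else 0.0, 1.0)
            candidates.append((score, is_correct))
        toy_problems.append(candidates)
    print(f"rm@8 on toy data: {rm_at_k(toy_problems):.2f}")
```

Averaging over many seeds reduces the variance that comes from which 8 of the 64 candidates happen to be drawn for each problem.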
 ## How to use
 ```python
 ).to(model.device)
 outputs = model(input_ids=input_ids)
+print(outputs[0][0])
 ```