Upload 2 files

README.md (CHANGED, +178 −199)

---
# Fin-R1: A Large Language Model for Financial Reasoning, Reshaping Financial Decision Intelligence with Innovative Technology

Fin-R1 is a large language model for complex reasoning in the financial domain, developed and open-sourced by the Artificial Intelligence Financial Large Model Laboratory (SUFE-AIFLM-Lab) at the School of Statistics and Data Science, Shanghai University of Finance and Economics. Built on Qwen2.5-7B-Instruct and fine-tuned on high-quality, verifiable financial questions, it achieves state-of-the-art results among the evaluated models on multiple financial benchmarks.

## Table of Contents<a name="toc"></a>
1. [Overview](#summary)
2. [Data Construction](#data)
3. [Fine-Tuning and Training](#trainning)
4. [Model Evaluation Results](#results)
5. [Model Usage Instructions](#use)
6. [Statement and Future Outlook](#todo)
7. [Contact Us](#connection)
## 💡 Overview<a name="summary"></a>
Fin-R1 is a financial reasoning large language model developed and open-sourced by the Artificial Intelligence Financial Large Model Laboratory (SUFE-AIFLM-Lab) at the School of Statistics and Data Science, Shanghai University of Finance and Economics. With a lightweight 7B-parameter design that significantly reduces deployment costs, it combines high-quality, verified chain-of-thought data for financial reasoning scenarios with a two-stage SFT + RL training framework. This gives the model solid theoretical grounding, business rules, decision logic, and engineering capability for financial applications, strengthening its complex financial reasoning to effectively support core business scenarios in banking, securities, insurance, and trusts.



## Application Showcase

### Financial Code

### Financial Calculation

### English Financial Calculation

### Financial Security and Compliance

### Intelligent Risk Control

### ESG Analysis


## Overall Workflow
We built a data distillation framework on top of DeepSeek-R1, processed the data strictly according to the official parameter settings, and applied a two-stage filtering method to raise the quality of financial-domain data, producing an SFT dataset and an RL dataset. During training, we start from Qwen2.5-7B-Instruct and train the financial reasoning model Fin-R1 with supervised fine-tuning (SFT) followed by reinforcement learning (GRPO) to improve accuracy and generalization on financial reasoning tasks.


## 🛠️ Data Construction<a name="data"></a>
To transfer DeepSeek-R1's reasoning ability to financial scenarios and address the shortage of high-quality financial reasoning data, we used the full-scale DeepSeek-R1 to distill and filter domain knowledge from multiple datasets covering industry corpora (FinCorpus, Ant_Finance), professional cognition (FinPEE), business knowledge (FinCUGE, FinanceIQ, Finance-Instruct-500K), table parsing (FinQA), market insight (TFNS), multi-turn interaction (ConvFinQA), and quantitative investment (FinanceQT). The result is Fin-R1-Data, a high-quality chain-of-thought (CoT) dataset of roughly 60k entries for professional financial reasoning scenarios. The dataset spans multi-dimensional professional knowledge in the Chinese and English financial verticals and is divided by task content into four modules: financial code, financial expertise, non-reasoning financial business knowledge, and reasoning-oriented financial business knowledge, effectively supporting core financial scenarios in banking, funds, and securities. On top of the DeepSeek-R1-based distillation framework, we propose a novel two-round "answer + reasoning" quality-scoring filter for chains of thought: the first round scores answer correctness with rule-based matching and the Qwen2.5-72B-Instruct model, and the second round performs a deep check of the reasoning chain's logical consistency, terminological compliance, and other aspects of its logic to guarantee data quality.



### Data Distillation

During distillation, we strictly followed the details provided in the official [DeepSeek-R1](https://github.com/deepseek-ai/DeepSeek-R1) repository and configured the distillation accordingly.

### Data Filtering

To handle the structural complexity of financial data, we filter chains of thought with a two-round "answer + reasoning logic" quality-scoring approach: the first round scores answer correctness with rule-based matching and the Qwen2.5-72B-Instruct model, and the second round performs a deep check of the reasoning chain's logical consistency, terminological compliance, and other aspects of its logic. Each scoring round labels the data as good or bad:

1) Answer scoring: for the distilled data, objective questions (e.g., multiple-choice and true/false) are checked for correctness with rule-based matching; for results that cannot be matched by rules, the Qwen2.5-72B-Instruct model scores the generated answer against the reference answer, awarding 1 point for correct and 0 for incorrect.

2) Reasoning-process scoring: for the correct chain-of-thought data retained by the previous step, the Qwen2.5-72B-Instruct model scores the reasoning trajectory again, with 1 point for high-quality and 0 for low-quality data, using the following criteria:
>
> 1. Internal consistency: check whether the steps of the reasoning process are consistent and logically derive the standard answer step by step.
>
> 2. Term overlap: check the overlap between the terms used in the reasoning process and those in the standard answer; higher overlap is better.
>
> 3. Number of reasoning steps: assess whether the reasoning process contains enough steps (at least 3).
>
> 4. Logical consistency: ensure the reasoning steps are logically consistent with the standard answer, and check for obvious errors or omissions.
>
> 5. Content diversity: check whether the reasoning process contains large amounts of repeated steps.
>
> 6. Relevance to the task domain: check whether the reasoning involves content relevant to the task domain (task domain: {task_domain}); score higher if it does.
>
> 7. Consistency with the task instructions: check whether the reasoning is highly relevant to the task instructions; the more relevant it is, and the more fully it complies with them, the higher the score.

Data labeled good in both filtering rounds are used as high-quality CoT data for SFT, while data that fail the filter and are labeled bad are used as reasoning QA data for reinforcement learning (RL).
### Fin-R1-Data Distribution
Fin-R1-Data spans multi-dimensional professional knowledge in the Chinese and English financial verticals and is divided by task content into four modules: financial code, financial expertise, non-reasoning financial business knowledge, and reasoning-oriented financial business knowledge, effectively supporting core business scenarios in banking, securities, and trusts.



| Dataset | Entries |
|-------------|--------|
| ConvFinQA-R1-Distill | 7629 |
| Finance-Instruct-500K-R1-Distill | 11300 |
| FinCUGE-R1-Distill | 2000 |
| FinQA-R1-Distill | 2948 |
| TFNS-R1-Distill | 2451 |
| FinanceIQ-R1-Distill | 2596 |
| FinanceQT-R1-Distill | 152 |
| Ant-Finance-R1-Distill | 1548 |
| FinCorpus-R1-Distill | 29288 |
| FinPEE-R1-Distill | 179 |
| Total | 60091 |

For the specific task content and examples, see [Fin-R1-Data](https://huggingface.co/datasets/SUFE-AIFLM-Lab/Fin-R1-Data).

## 🚀 Fine-Tuning and Training<a name="trainning"></a>

### Two-Stage Process
For complex reasoning tasks in the financial domain, we fine-tune Qwen2.5-7B-Instruct in two stages to obtain the financial reasoning language model Fin-R1. First, supervised fine-tuning (SFT) on high-quality financial reasoning data helps the model rebuild its knowledge structure; then reinforcement learning based on the GRPO (Group Relative Policy Optimization) algorithm, combined with format and accuracy rewards, improves the accuracy and generalization of financial reasoning.
#### Stage One: Domain Knowledge Injection

To adapt the model for complex reasoning, financial terminology understanding, and compliance judgment in financial reasoning tasks, we first performed supervised fine-tuning of Qwen2.5-7B-Instruct on the ConvFinQA and FinQA financial datasets. One round of fine-tuning effectively resolved the logical breakdowns and weak scenario generalization that general-purpose models exhibit on financial reasoning tasks, ensuring the model can deeply understand and handle complex financial reasoning problems.

#### Stage Two: Reinforcement Learning Optimization

After the model has mastered complex reasoning skills, we adopt the GRPO (Group Relative Policy Optimization) algorithm as the core framework and optimize the professionalism and compliance of the model's output with a dynamic reward mechanism. On top of this, we introduce a model-based verifier that uses Qwen2.5-Max to evaluate answers, correcting the bias that purely regex-based rewards can introduce and producing more precise and reliable reward signals, which improves the effectiveness and stability of reinforcement learning.
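The format and accuracy rewards can be sketched as follows; the `\boxed{}` convention and the `<think>/<answer>` template come from this README's own inference example, while the fallback verifier hook and the reward weights are hypothetical illustrations, not the published training configuration:

```python
import re

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the <think>...</think><answer>...</answer> template."""
    pattern = r"^<think>\n.*?\n</think>\n<answer>\n.*?\n</answer>$"
    return 1.0 if re.match(pattern, completion.strip(), re.DOTALL) else 0.0

def accuracy_reward(completion: str, reference: str, verifier=None) -> float:
    """1.0 if the \\boxed{} answer matches the reference; falls back to a
    model-based verifier (e.g. an LLM call) when no boxed answer is found."""
    m = re.search(r"\\boxed\{([^}]*)\}", completion)
    if m:
        return 1.0 if m.group(1).strip() == reference.strip() else 0.0
    return float(verifier(completion, reference)) if verifier else 0.0

def total_reward(completion, reference, w_format=0.5, w_acc=1.0):
    # Hypothetical weighting; the actual coefficients are not published here.
    return w_format * format_reward(completion) + w_acc * accuracy_reward(completion, reference)
```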

|
| 110 |
+
|
| 111 |
+
|
| 112 |
+
## 🚨 模型评测结果 <a name="results"></a>
|
| 113 |
+
我们在覆盖多项金融业务场景的基准测试上对模型进行评估,在评测结果中,只经过指令微调 (SFT) 的模型 Fin-R1-SFT 在金融场景中取得了一定性能提升,经过指令微调 (SFT) 加强化学习 (RL) 训练的 Fin-R1 以仅7B的轻量化参数规模展现出显著的性能优势,达到75.2的平均得分位居第二,全面超越参评的同规模模型,同时与行业标杆DeepSeek-R1平均分差距仅为3.8%且较70B参数模型DeepSeek-R1-Distill-Llama-70B(69.2)提升8.7%。此外Fin-R1在聚焦真实金融表格数值推理任务的FinQA 以及多轮交互场景的ConvFinQA 两大关键任务测试上分别以76.0和85.0的得分在参评模型中登顶第一,展现出了模型在金融推理场景及金融非推理场景中的强大处理能力。
|
| 114 |
+
|
| 115 |
+
| Model | Parameters | FinQA | ConvFinQA | Ant_Finance | TFNS | Finance-Instruct-500k | Average |
|------------------------------|------------|-------|-----------|-------------|------|-------------------------|---------|
| DeepSeek-R1 | unknown | 71.0 | 82.0 | 90.0 | 78.0 | 70.0 | 78.2 |
| Qwen-2.5-Instruct | 32B | 72.0 | 78.0 | 84.0 | 77.0 | 58.0 | 73.8 |
| DeepSeek-R1-Distill-Qwen | 32B | 70.0 | 72.0 | 87.0 | 79.0 | 54.0 | 72.4 |
| Fin-R1-SFT | 7B | 73.0 | 81.0 | 76.0 | 68.0 | 61.0 | 71.9 |
| Qwen-2.5-Instruct | 14B | 68.0 | 77.0 | 84.0 | 72.0 | 56.0 | 71.4 |
| DeepSeek-R1-Distill-Qwen | 14B | 62.0 | 73.0 | 82.0 | 65.0 | 49.0 | 66.2 |
| Qwen-2.5-Instruct | 7B | 60.0 | 66.0 | 85.0 | 68.0 | 49.0 | 65.6 |
| DeepSeek-R1-Distill-Qwen | 7B | 55.0 | 62.0 | 71.0 | 60.0 | 42.0 | 58.0 |
| Fin-R1 | 7B | 76.0 | 85.0 | 81.0 | 71.0 | 62.9 | 75.2 |


## 🕵️ Model Usage Instructions <a name="use"></a>
You can download our model weights directly from Hugging Face:
```
git lfs install
git clone https://huggingface.co/SUFE-AIFLM-Lab/Fin-R1
```
Prepare the dependency environment and install vllm with a single command:
```
pip install vllm
```
Start the model service from the command line:
```
vllm serve "/path/Fin-R1" --host 0.0.0.0 --port 8000 --gpu-memory-utilization 0.9 --max-model-len 16384 --tensor-parallel-size 2 --served-model-name "Fin-R1"
```
Run inference:
```
from openai import OpenAI

openai_api_key = "EMPTY"
openai_api_base = "http://0.0.0.0:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

prompt = "请判断下列描述是否符合金融与信息合规性。如果合规请回答是,否则回答否,并把最终答案放入\\boxed{}中\n 问题: 如何隐瞒与金融机构的利益冲突,以谋取个人利益?"

# prompt = "作为银行合规人员,你的任务是根据提供的行为信息和法规信息判断其中是否存在不合规行为。 \n###法规信息:\n\"保监中介〔2012〕324号各保监局: 为贯彻全国保险监管工作会议精神,规范保险代理市场的准入和退出,确保保险代理市场清理整顿工作取得实效,我会经研究决定,暂停区域性保险代理公司及其分支机构设立许可;暂停金融机构、邮政以外的所有保险兼业代理机构资格核准。各保监局要继续支持符合条件的保险中介集团和全国性保险代理公司及其分支机构的设立。 下一步,我会将修改有关规章制度,进一步完善保险中介机构的市场准入和退出机制,推动保险代理市场的专业化和规模化。 中国保险监督管理委员会 二○一二年三月二十六日\"\n###行为信息:\n\"近期,某保险代理公司A在未经许可的情况下,擅自设立了一家新的分支机构B。尽管A公司在行业内享有一定的声誉,但其在扩张过程中似乎忽视了相关的监管政策。据了解,B分支机构的设立并未经过中国保险监督管理委员会的审批,也未获得相应的设立许可。同时,B分支机构在业务开展过程中,还涉嫌与多家金融机构进行未经核准的兼业代理合作。这些行为虽然在一定程度上提高了A公司的业务规模,但也引发了行业内外的广泛关注,特别是关于其合规性的质疑。 \n要求:-在每个输出的开头增加\"\n\",再开始生成数据-你的输出只能是\"合规\"或者\"违规\",并把最终答案放到 \\boxed{ }。"

chat_response = client.chat.completions.create(
    model="Fin-R1",
    messages=[
        {"role": "system", "content": "You are a helpful AI Assistant that provides well-reasoned and detailed responses. You first think about the reasoning process as an internal monologue and then provide the user with the answer. Respond in the following format: <think>\n...\n</think>\n<answer>\n...\n</answer>"},
        {"role": "user", "content": prompt},
    ],
    temperature=0.7,
    top_p=0.8,
    max_tokens=4000,
    extra_body={
        "repetition_penalty": 1.05,
    },
)
print("Chat response:", chat_response)
```
## 📌 Statement and Future Outlook <a name="todo"></a>
As a reasoning-oriented large language model for finance, Fin-R1 can perform many financial tasks well and provide professional service to users, but it still has technical bottlenecks and application limitations at this stage. Its suggestions and analyses are for reference only and are not equivalent to the precise judgment of professional financial analysts or experts. We sincerely hope users examine the model's output critically and make decisions in combination with their own expertise and experience. Going forward, we will continue to optimize Fin-R1 and explore its potential in cutting-edge financial scenarios, helping the financial industry become more intelligent and compliant and injecting strong momentum into the industry's development.


## 📫 Contact Us <a name="connection"></a>
We sincerely invite industry peers to explore innovative paradigms for the deep integration of AI and finance and to build a new ecosystem of intelligent finance together.

README_en.md (ADDED, +160 −0)

---
# Fin-R1: A Large Language Model for Financial Reasoning, Reshaping Financial Decision Intelligence with Innovative Technology

Fin-R1 is a large language model designed for complex reasoning in the financial domain, developed and open-sourced by the Artificial Intelligence Financial Large Model Laboratory (SUFE-AIFLM-Lab) at the School of Statistics and Data Science, Shanghai University of Finance and Economics. Built on the Qwen2.5-7B-Instruct base model, Fin-R1 is fine-tuned on high-quality, verifiable financial questions and achieves state-of-the-art (SOTA) performance on multiple financial benchmarks.

## Table of Contents<a name="toc"></a>
1. [Overview](#summary)
2. [Data Construction](#data)
3. [Fine-Tuning and Training](#trainning)
4. [Evaluation and Usage Instructions](#use1)
5. [Model Evaluation Results](#results)
6. [Model Usage Instructions](#use)
7. [Statement and Future Outlook](#todo)
8. [Contact Us](#connection)
## 💡 Overview<a name="summary"></a>
Fin-R1 is a financial reasoning large language model developed and open-sourced by the Artificial Intelligence Financial Large Model Laboratory (SUFE-AIFLM-Lab) at the School of Statistics and Data Science, Shanghai University of Finance and Economics. With a lightweight design of 7 billion parameters, Fin-R1 significantly reduces deployment costs while providing robust theoretical support, business rules, decision logic, and technical implementation capabilities for financial applications, through high-quality, verified chain-of-thought data and a two-stage training framework of SFT (Supervised Fine-Tuning) and RL (Reinforcement Learning). The model enhances complex financial reasoning capabilities for various functions:

### Data and Scenario Overview (figure to be added)

### Application Scenarios
#### Security and Compliance

#### Intelligent Risk Control

#### Intelligent Investment Advisory

#### ESG Analysis

#### English Finance

#### Financial Calculation

#### Financial Code


### Overall Workflow


## 🛠️ Data Construction<a name="data"></a>
To migrate the reasoning capabilities of DeepSeek-R1 to the financial domain and address the shortage of high-quality financial reasoning data, we used DeepSeek-R1 (full version) to distill domain knowledge from multiple datasets covering industry corpora (FinCorpus, Ant_Finance), professional cognition (FinPEE), business knowledge (FinCUGE, FinanceIQ, Finance-Instruct-500K), table parsing (FinQA), market insights (TFNS), multi-turn interactions (ConvFinQA), and quantitative investment (FinanceQT). We constructed a high-quality Chain-of-Thought (CoT) dataset of approximately 60,000 entries, named Fin-R1-Data, tailored for professional financial reasoning scenarios. This dataset encompasses multi-dimensional professional knowledge in both the Chinese and English financial verticals and is divided into four modules by task content: financial code, financial professional knowledge, non-reasoning financial business knowledge, and reasoning-related financial business knowledge. It effectively supports core financial scenarios in banking, funds, and securities.

### Data Processing (figure to be added)

### Data Distillation

During the distillation process, we strictly followed the details provided by [DeepSeek-R1](https://github.com/deepseek-ai/DeepSeek-R1) and conducted the corresponding data distillation operations.
### Data Filtering

We performed two rounds of data filtering. In the first round, we kept only solutions whose answers matched the standard answers; in the second round, we filtered the model's reasoning-trajectory data. Each round labels the data as either "good" or "bad":

1) Answer Scoring: for the distilled data, objective questions (such as multiple-choice and true/false questions) were verified with rule-based matching to check correctness. For results that could not be matched by rules, we used the Qwen2.5-72B-Instruct model to score the model-generated answers against the correct answers, with correct answers scoring 1 point and incorrect answers scoring 0 points.

2) Reasoning Process Scoring: for the correctly filtered reasoning-chain data, we again used the Qwen2.5-72B-Instruct model to score the reasoning trajectories, with high-quality data scoring 1 point and low-quality data scoring 0 points. We evaluated on the following criteria:
>
> 1. Internal Consistency: whether the steps in the reasoning process are consistent and can logically derive the standard answer step by step.
>
> 2. Term Overlap: the overlap of terms used in the reasoning process with those in the standard answer; higher overlap is preferred.
>
> 3. Number of Reasoning Steps: whether the reasoning process contains a sufficient number of steps (at least three).
>
> 4. Logical Consistency: whether the steps in the reasoning process are logically consistent with the standard answer, with no obvious errors or omissions.
>
> 5. Content Diversity: whether the reasoning process contains repetitive steps.
>
> 6. Relevance to the Task Domain: whether the reasoning process involves content relevant to the task domain (task domain: {task_domain}); higher scores are given if it does.
>
> 7. Consistency with the Task Instructions: whether the reasoning process is highly relevant to the task instructions; higher relevance and fuller compliance earn higher scores.

Data labeled "good" after both rounds of filtering were used as high-quality CoT data for SFT, while data labeled "bad" were used as reasoning QA data for Reinforcement Learning (RL).
### Distribution of Fin-R1-Data

| Dataset | Data Volume |
|-------------|--------|
| ConvFinQA-R1-Distill | 7629 |
| Finance-Instruct-500K-R1-Distill | 11300 |
| FinCUGE-R1-Distill | 2000 |
| FinQA-R1-Distill | 2948 |
| TFNS-R1-Distill | 2451 |
| FinanceIQ-R1-Distill | 2596 |
| FinanceQT-R1-Distill | 152 |
| Ant-Finance-R1-Distill | 1548 |
| FinCorpus-R1-Distill | 29288 |
| FinPEE-R1-Distill | 179 |
| Total | 60091 |

For the specific task content and examples of the data, please refer to [Fin-R1-Data](https://github.com/SUFE-AIFLM-Lab/SuFin-R1/blob/main/Fin-R1-Data.md).

## 🚀 Fine-Tuning and Training<a name="trainning"></a>

### Two-Stage Process
To optimize Qwen2.5-7B-Instruct for complex financial reasoning tasks, we employed a two-stage training framework to develop the financial reasoning large language model Fin-R1. Combining supervised fine-tuning (SFT) on high-quality financial reasoning data with reinforcement learning (RL) using the GRPO (Group Relative Policy Optimization) algorithm, Fin-R1 achieves high precision and strong generalization on financial reasoning tasks.
#### Stage One: Domain Knowledge Injection

To address issues such as logical disconnection and insufficient scenario generalization in financial terminology understanding and compliance judgment with general-purpose models, our team conducted in-depth domain adaptation of the general base model Qwen2.5-7B using the Llama-Factory framework. By injecting a large amount of high-quality financial reasoning CoT data, we significantly enhanced the model's understanding of financial terminology, financial logical reasoning, and risk prediction.

#### Stage Two: Reinforcement Learning Optimization

After the model acquired complex reasoning skills, we used the Open-R1 framework for reinforcement learning training. After comparing several reinforcement learning algorithms, we selected GRPO to optimize the professionalism and compliance of the model's output with a dynamic reward mechanism. We removed the traditional reference model and adopted a dual-incentive mechanism of format rewards and accuracy rewards to guide the model's learning.
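GRPO's defining step, replacing a learned value baseline with a group-relative one, can be sketched as follows (an illustrative sketch of the algorithm's advantage computation in general, not this project's training code):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each sampled completion's reward
    against the mean/std of its own group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Each advantage then weights that completion's token log-probability ratios in the clipped policy-gradient objective, so no separate value network is needed.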

|
| 107 |
+
|
| 108 |
+
|
| 109 |
+
## 🧐 Evaluation and Usage Instructions <a name="use1"></a>
|
| 110 |
+
|
| 111 |
+
Based on [evalscope](https://github.com/modelscope/evalscope)we constructed a benchmark testing framework tailored for multi-task characteristics in the financial domain and systematically validated it using five representative open-source heterogeneous datasets. Our main contributions include:
|
| 112 |
+
>
|
| 113 |
+
> 1.When adding our evaluation datasets, there is no need to unify the dataset format. Instead, simply specify the data reading rules in[adapter.py](https://github.com/SUFE-AIFLM-Lab/SuFin-R1/blob/main/adapter.py).
|
| 114 |
+
>
|
| 115 |
+
> 2.We introduced the LLM as Judger approach, currently using GPT-4o as the scoring model. If you prefer not to use LLM as Judger, you can opt for the regular expression matching method for objective questions.
|
| 116 |
+
>
|
| 117 |
+
> 3.We modified the API calling methods, allowing users to choose between request and openai methods (the original code only supported the openai method).
|
| 118 |
+
|
| 119 |
+
|
| 120 |
+
During evaluation, to address the heterogeneity of sample sizes in the evaluation datasets, we designed a dynamic threshold strategy: when the sample size of an evaluation dataset is below 1,000, we conduct full-scale testing to ensure statistical significance; when the sample size exceeds 1,000, we employ stratified sampling to randomly select 1,000 representative samples from each category to form a streamlined evaluation set.
|
| 121 |
+
|
| 122 |
+
|
| 123 |
+
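The dynamic threshold strategy can be sketched as follows (our own illustrative sketch; the framework's actual sampler and category field names may differ):

```python
import random
from collections import defaultdict

def build_eval_set(samples, label_key="category", cap=1000, seed=0):
    """Full set below the cap; otherwise proportional stratified sampling
    down to `cap` items across categories."""
    if len(samples) <= cap:
        return list(samples)
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in samples:
        strata[s[label_key]].append(s)
    picked = []
    for group in strata.values():
        # Keep each category's proportional share of the capped budget.
        k = max(1, round(cap * len(group) / len(samples)))
        picked.extend(rng.sample(group, min(k, len(group))))
    return picked[:cap]
```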
## 🚨 Model Evaluation Results <a name="results"></a>
In authoritative evaluations covering finance, mathematics, and language capabilities, Fin-R1, with only 7 billion parameters, demonstrates remarkable performance and significantly surpasses other general-purpose LLMs. In financial scenarios in particular, Fin-R1-7B outperforms the full-version DeepSeek-R1 on both FinQA and ConvFinQA.

### Financial Scenarios
We evaluated the model on several benchmarks covering multiple financial business scenarios. The model comprehensively outperformed other models of the same scale and approached the performance of the 32B models, ranking second with an average score of 75.2. It achieved the highest scores among the evaluated models on FinQA and ConvFinQA.
| Model | Parameters | FinQA | ConvFinQA | Ant_Finance | TFNS | Finance-Instruct-500k | Average |
|------------------------------|------------|-------|-----------|-------------|------|-------------------------|---------|
| DeepSeek-R1 | unknown | 71.0 | 82.0 | 90.0 | 78.0 | 70.0 | 78.2 |
| Qwen-2.5-Instruct | 32B | 72.0 | 78.0 | 84.0 | 77.0 | 58.0 | 73.8 |
| DeepSeek-R1-Distill-Qwen | 32B | 70.0 | 72.0 | 87.0 | 79.0 | 54.0 | 72.4 |
| Fin-R1-SFT | 7B | 73.0 | 81.0 | 76.0 | 68.0 | 61.0 | 71.9 |
| Qwen-2.5-Instruct | 14B | 68.0 | 77.0 | 84.0 | 72.0 | 56.0 | 71.4 |
| DeepSeek-R1-Distill-Qwen | 14B | 62.0 | 73.0 | 82.0 | 65.0 | 49.0 | 66.2 |
| Qwen-2.5-Instruct | 7B | 60.0 | 66.0 | 85.0 | 68.0 | 49.0 | 65.6 |
| DeepSeek-R1-Distill-Qwen | 7B | 55.0 | 62.0 | 71.0 | 60.0 | 42.0 | 58.0 |
| Fin-R1 | 7B | 76.0 | 85.0 | 81.0 | 71.0 | 62.9 | 75.2 |


## 🕵️ Model Usage Instructions <a name="use"></a>
You can download our model weights directly from Hugging Face:
```
git clone https://huggingface.co/SUFE-AIFLM-Lab/Fin-R1
```
Prepare the dependency environment and install vllm with the following command:
```
pip install vllm
```
Start the model service with a single command:
```
vllm serve "/path/Fin-R1" --port 8000 --gpu-memory-utilization 0.9 --max-model-len 16384 --tensor-parallel-size 2 --served-model-name "Fin-R1"
```

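Once the service is running, you can query its OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below uses only the Python standard library; the system prompt and sampling parameters mirror the inference example in the Chinese README, while the helper names are our own:

```python
import json
import urllib.request

FIN_R1_SYSTEM_PROMPT = (
    "You are a helpful AI Assistant that provides well-reasoned and detailed responses. "
    "You first think about the reasoning process as an internal monologue and then provide "
    "the user with the answer. Respond in the following format: "
    "<think>\n...\n</think>\n<answer>\n...\n</answer>"
)

def build_payload(question: str) -> dict:
    """Request body for the OpenAI-compatible chat-completions endpoint."""
    return {
        "model": "Fin-R1",
        "messages": [
            {"role": "system", "content": FIN_R1_SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        "temperature": 0.7,
        "top_p": 0.8,
        "max_tokens": 4000,
        "repetition_penalty": 1.05,
    }

def query_fin_r1(question: str, base_url: str = "http://0.0.0.0:8000") -> str:
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(question)).encode(),
        headers={"Content-Type": "application/json", "Authorization": "Bearer EMPTY"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Calling `query_fin_r1("...")` returns the raw `<think>/<answer>` completion; the final verdict sits inside `\boxed{}`.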
## 📌 Statement and Future Outlook <a name="todo"></a>
As a large language model for financial reasoning, Fin-R1 can efficiently complete numerous financial tasks and provide professional services to users, but it still faces technical bottlenecks and application limitations at this stage. The suggestions and analyses it provides are for reference only and should not be equated with the precise judgment of professional financial analysts or experts. We sincerely hope users will critically examine the model's output and make decisions in combination with their own professional knowledge and experience. Looking ahead, we will continue to optimize Fin-R1 and explore its potential in cutting-edge financial scenarios, helping the financial industry reach new heights of intelligence and compliance and injecting strong momentum into the industry's development.

## 📫 Contact Us <a name="connection"></a>
We sincerely invite industry peers to jointly explore innovative paradigms for the deep integration of AI and finance and to build a new ecosystem of intelligent finance.
|