Correct pipeline tag and add Github link
This PR corrects the `pipeline_tag` to `text-generation`, which is more appropriate for this model. It also adds a link to the GitHub repository for easier access to the code.
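For reviewers who want to verify the change once merged, a minimal sketch that reads the tag back through the Hub API; the repo id below is a placeholder, since the model's Hub id is not named in this diff:

```python
from huggingface_hub import model_info

# Placeholder id -- substitute the actual model repository this PR targets.
REPO_ID = "some-author/some-model"

# model_info() returns repo metadata, including the pipeline_tag declared
# in the README's YAML front matter.
info = model_info(REPO_ID)
print(info.pipeline_tag)  # expected after merge: "text-generation"
```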
README.md CHANGED

```diff
@@ -1,11 +1,11 @@
 ---
-library_name: transformers
-license: apache-2.0
-language:
-- en
 base_model:
 - answerdotai/ModernBERT-large
-
+language:
+- en
+library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
 author: Shreyan C (@thethinkmachine)
 ---
 
@@ -88,9 +88,7 @@ print("Scaled Complexity Score:", get_scaled_complexity_score(query))
 
 ### Training Data
 
-We use the [BhabhaAI/DEITA-Complexity](https://huggingface.co/datasets/BhabhaAI/DEITA-Complexity) dataset for training the model. The dataset contains 66.5K diverse English instructions along with their complexity scores, computed using the DEITA-Evol-Complexity scoring scheme, which uses an LLM judge to rank a sextuple containing 1 seed + 5 progressively complexified (*evolved*) instructions based on their complexity & difficulty. The scheme assigns scores within the [1, 6] range, with 1 being the least complex and 6 the most complex.
-
-However, the training dataset was observed to contain instruction-score pairs with scores spanning the range [0, 9]. We suspect this range includes scoring errors, as the anomalous scores (0, 7, 8, 9) account for less than 1% of the total instructions.
+We use the [BhabhaAI/DEITA-Complexity](https://huggingface.co/datasets/BhabhaAI/DEITA-Complexity) dataset for training the model. The dataset contains 66.5K diverse English instructions along with their complexity scores, computed using the DEITA-Evol-Complexity scoring scheme, which uses an LLM judge to rank a sextuple containing 1 seed + 5 progressively complexified (*evolved*) instructions based on their complexity & difficulty. The scheme assigns scores within the [1, 6] range, with 1 being the least complex and 6 the most complex. However, the training dataset was observed to contain instruction-score pairs with scores spanning the range [0, 9]. We suspect this range includes scoring errors, as the anomalous scores (0, 7, 8, 9) account for less than 1% of the total instructions.
 
 The distribution of scores within the dataset is as follows:
 | Score | Frequency | Relative Freq. |
@@ -142,7 +140,7 @@ You are advised to use the model keeping these factors in mind.
 
 ### CO2 Emissions
 
-Experiments were conducted using Google Cloud Platform in region asia-south1, which has a carbon efficiency of 0.92 kgCO2eq/kWh. A cumulative 13.24 hours of computation was performed on hardware of type L4 (TDP of 72W)
+Experiments were conducted using Google Cloud Platform in region asia-south1, which has a carbon efficiency of 0.92 kgCO2eq/kWh. A cumulative 13.24 hours of computation was performed on hardware of type L4 (TDP of 72W).\
 
 Total emissions are estimated to be 0.87 kgCO2eq, of which 100% was directly offset by the cloud provider.
 
@@ -164,4 +162,5 @@ For any queries, suggestions or feedback, please contact Shreyan C at *shreyan(a
 - [[2312.15685] What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning](https://arxiv.org/abs/2312.15685)
 - [[2404.02948] PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models](https://arxiv.org/abs/2404.02948)
 - [DEITA-Complexity](https://huggingface.co/datasets/BhabhaAI/DEITA-Complexity)
-- [ModernBERT-Large](https://huggingface.co/answerdotai/ModernBERT-large)
\ No newline at end of file
+- [ModernBERT-Large](https://huggingface.co/answerdotai/ModernBERT-large)
+- [Github](https://github.com/thethinkmachine/Maxwell-Task-Complexity-Scorer)
```
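A note on the Training Data hunk: the merged paragraph's claim about anomalous scores is easy to check directly. A minimal sketch, assuming the dataset exposes a `train` split and a `score` column (both assumptions; inspect `ds.column_names` first):

```python
from collections import Counter

from datasets import load_dataset

# Dataset cited in the card; the "train" split and "score" column are assumptions.
ds = load_dataset("BhabhaAI/DEITA-Complexity", split="train")

# Reproduce the score distribution table from the card.
counts = Counter(ds["score"])
total = sum(counts.values())
for score in sorted(counts):
    print(f"score={score}: {counts[score]} ({counts[score] / total:.2%})")

# DEITA-Evol-Complexity only defines scores in [1, 6]; drop the anomalous
# 0/7/8/9 rows, which the card says make up under 1% of the data.
clean = ds.filter(lambda ex: 1 <= ex["score"] <= 6)
print(f"kept {len(clean)} of {len(ds)} instructions")
```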
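And on the CO2 Emissions hunk: the 0.87 kgCO2eq figure follows from the standard power x time x carbon-intensity estimate, using only the numbers already in the card:

```python
# Emissions estimate: TDP (kW) x runtime (h) x grid carbon intensity (kgCO2eq/kWh).
tdp_kw = 72 / 1000      # L4 GPU TDP from the card: 72 W
hours = 13.24           # cumulative compute time from the card
kg_co2_per_kwh = 0.92   # asia-south1 carbon efficiency from the card

energy_kwh = tdp_kw * hours                 # ~0.953 kWh
emissions_kg = energy_kwh * kg_co2_per_kwh  # ~0.877 kgCO2eq
print(f"{emissions_kg:.3f} kgCO2eq")        # -> 0.877, matching the card's ~0.87
```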
|