---
library_name: transformers
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
---
# Model Details

This is a 1B-parameter Llama 3 model pretrained from scratch with [torchtitan](https://github.com/pytorch/torchtitan) on fineweb-edu using the C_AdamW (Cautious AdamW) optimizer, for a total of 100B training tokens.
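The cautious modification to AdamW is simple to state: at each step, components of the optimizer update whose sign disagrees with the current gradient are zeroed out. A minimal NumPy sketch of that masking rule, assuming C_AdamW follows the Cautious Optimizers formulation (variable names here are illustrative, not taken from the training code):

```python
import numpy as np

def cautious_mask(update: np.ndarray, grad: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Zero out update components whose sign disagrees with the gradient,
    rescaling by the mask's mean so the average update magnitude is preserved."""
    mask = (update * grad > 0).astype(update.dtype)
    return update * mask / (mask.mean() + eps)

# Toy example: the second component opposes the gradient and is dropped.
u = np.array([0.5, -0.2, 0.1])   # hypothetical AdamW update
g = np.array([1.0, 1.0, 1.0])    # hypothetical gradient
print(cautious_mask(u, g))       # → [0.75 0.   0.15]
```

In the full optimizer, this mask is applied to the AdamW update direction before the weight-decayed parameter step; everything else is unchanged.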
## How to use

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="kz919/llama3_1b_cautious_100B_token_8222025",
)
print(pipe("The key to life is"))
```
## Downstream Eval

Zero-shot results on ARC (Easy/Challenge), HellaSwag, LAMBADA (OpenAI), OpenBookQA, and PIQA:

```shell
lm_eval --model hf \
  --model_args pretrained=kz919/llama3_1b_cautious_100B_token_8222025,dtype="bfloat16",add_bos_token=True \
  --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,openbookqa \
  --device cuda:7 --batch_size 8
```
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc | ↑ | 0.3183 | ± | 0.0136 |
| | | none | 0 | acc_norm | ↑ | 0.3379 | ± | 0.0138 |
| arc_easy | 1 | none | 0 | acc | ↑ | 0.6650 | ± | 0.0097 |
| | | none | 0 | acc_norm | ↑ | 0.6061 | ± | 0.0100 |
| hellaswag | 1 | none | 0 | acc | ↑ | 0.3999 | ± | 0.0049 |
| | | none | 0 | acc_norm | ↑ | 0.5025 | ± | 0.0050 |
| lambada_openai | 1 | none | 0 | acc | ↑ | 0.3912 | ± | 0.0068 |
| | | none | 0 | perplexity | ↓ | 23.8709 | ± | 0.8855 |
| openbookqa | 1 | none | 0 | acc | ↑ | 0.2580 | ± | 0.0196 |
| | | none | 0 | acc_norm | ↑ | 0.3740 | ± | 0.0217 |
| piqa | 1 | none | 0 | acc | ↑ | 0.7116 | ± | 0.0106 |
| | | none | 0 | acc_norm | ↑ | 0.7149 | ± | 0.0105 |
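As a sanity check on the table, accuracy Stderr values from lm-eval follow the binomial standard error sqrt(p(1-p)/n); inverting that formula recovers the evaluation set size. For arc_challenge (acc 0.3183 ± 0.0136) this lands close to the 1172-item ARC-Challenge test split:

```python
def implied_n(acc: float, stderr: float) -> float:
    """Invert the binomial standard error sqrt(p*(1-p)/n) to estimate
    the number of evaluation examples behind a reported accuracy."""
    return acc * (1 - acc) / stderr**2

# arc_challenge: acc 0.3183 ± 0.0136
print(round(implied_n(0.3183, 0.0136)))  # → 1173
```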
### MMLU

| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | ↑ | 0.2519 | ± | 0.0037 |
| - humanities | 2 | none | | acc | ↑ | 0.2540 | ± | 0.0064 |
| - other | 2 | none | | acc | ↑ | 0.2527 | ± | 0.0078 |
| - social sciences | 2 | none | | acc | ↑ | 0.2480 | ± | 0.0078 |
| - stem | 2 | none | | acc | ↑ | 0.2518 | ± | 0.0077 |