library_name: transformers
license: apache-2.0
datasets:
  - HuggingFaceFW/fineweb-edu
language:
  - en

Model Details

This is a 1B-parameter Llama 3 model pretrained from scratch with torchtitan on fineweb-edu, using the C_AdamW (cautious AdamW) optimizer. It was trained on 100B tokens.

How to use

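from transformers import pipeline

# Load the model from the Hugging Face Hub as a text-generation pipeline.
pipe = pipeline(
    "text-generation",
    model="kz919/llama3_1b_cautious_100B_token_8222025",
)

# The pipeline returns a list of dicts, each with a "generated_text" key.
print(pipe("The key to life is"))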
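
To control output length and sampling, generation arguments can be passed when calling the pipeline. The following is a minimal sketch: the sampling values and the `generate` helper are illustrative assumptions, not settings recommended by the model author.

```python
MODEL_ID = "kz919/llama3_1b_cautious_100B_token_8222025"

# Illustrative sampling settings (assumptions, not values from this model card).
GENERATION_KWARGS = {
    "max_new_tokens": 64,
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,
}


def generate(prompt: str) -> str:
    """Generate a continuation of `prompt` and return it as a plain string."""
    # Imported lazily so the settings above can be inspected without loading transformers.
    from transformers import pipeline

    pipe = pipeline("text-generation", model=MODEL_ID)
    outputs = pipe(prompt, **GENERATION_KWARGS)
    # The pipeline returns a list of dicts; "generated_text" includes the prompt.
    return outputs[0]["generated_text"]
```

Calling `print(generate("The key to life is"))` downloads the model on first use and prints the prompt followed by a sampled continuation. Because this is a base model pretrained on web text, it continues text rather than following instructions.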

Downstream Eval

ARC, HellaSwag, LAMBADA (OpenAI), OpenBookQA, PIQA

lm_eval --model hf --model_args pretrained=kz919/llama3_1b_cautious_100B_token_8222025,dtype="bfloat16",add_bos_token=True --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,openbookqa --device cuda:7 --batch_size 8

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc | 0.3183 | ± 0.0136 |
| | | none | 0 | acc_norm | 0.3379 | ± 0.0138 |
| arc_easy | 1 | none | 0 | acc | 0.6650 | ± 0.0097 |
| | | none | 0 | acc_norm | 0.6061 | ± 0.0100 |
| hellaswag | 1 | none | 0 | acc | 0.3999 | ± 0.0049 |
| | | none | 0 | acc_norm | 0.5025 | ± 0.0050 |
| lambada_openai | 1 | none | 0 | acc | 0.3912 | ± 0.0068 |
| | | none | 0 | perplexity | 23.8709 | ± 0.8855 |
| openbookqa | 1 | none | 0 | acc | 0.2580 | ± 0.0196 |
| | | none | 0 | acc_norm | 0.3740 | ± 0.0217 |
| piqa | 1 | none | 0 | acc | 0.7116 | ± 0.0106 |
| | | none | 0 | acc_norm | 0.7149 | ± 0.0105 |
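
The same zero-shot evaluation can also be started from Python through the harness's `simple_evaluate` entry point. This is a sketch under assumptions: it presumes a recent lm-evaluation-harness (0.4 or later) is installed, and the `run_eval` helper and its `cuda:0` default are hypothetical (the command above used `cuda:7` on the author's machine).

```python
MODEL_ID = "kz919/llama3_1b_cautious_100B_token_8222025"

# Same task list as the lm_eval command in this section.
TASKS = [
    "lambada_openai",
    "hellaswag",
    "piqa",
    "arc_easy",
    "arc_challenge",
    "openbookqa",
]

# Same model arguments as the command line, written as one comma-separated string.
MODEL_ARGS = f"pretrained={MODEL_ID},dtype=bfloat16,add_bos_token=True"


def run_eval(device: str = "cuda:0", batch_size: int = 8) -> dict:
    """Run the tasks with the Hugging Face backend and return per-task metrics."""
    # Imported lazily so the configuration above is usable without the harness installed.
    import lm_eval

    results = lm_eval.simple_evaluate(
        model="hf",
        model_args=MODEL_ARGS,
        tasks=TASKS,
        device=device,
        batch_size=batch_size,
    )
    # "results" maps each task name to its metric dictionary.
    return results["results"]
```

Exact scores may differ slightly from the table depending on harness version, hardware, and batch size.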

MMLU

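| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | 0.2519 | ± 0.0037 |
| - humanities | 2 | none | | acc | 0.2540 | ± 0.0064 |
| - other | 2 | none | | acc | 0.2527 | ± 0.0078 |
| - social sciences | 2 | none | | acc | 0.2480 | ± 0.0078 |
| - stem | 2 | none | | acc | 0.2518 | ± 0.0077 |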
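
MMLU questions have four answer options, so random guessing gives an expected accuracy of 0.25. One way to read the numbers above is to express each accuracy as a distance from that baseline in units of its reported standard error. The sketch below does this with the values copied from the table; the two-standard-error cutoff is a common rule of thumb, used here as an assumption.

```python
CHANCE = 0.25  # expected accuracy of random guessing on 4-option questions

# (accuracy, standard error) copied from the MMLU table.
MMLU_RESULTS = {
    "mmlu": (0.2519, 0.0037),
    "humanities": (0.2540, 0.0064),
    "other": (0.2527, 0.0078),
    "social sciences": (0.2480, 0.0078),
    "stem": (0.2518, 0.0077),
}

# Distance from chance, measured in standard errors.
z_scores = {
    group: (acc - CHANCE) / stderr
    for group, (acc, stderr) in MMLU_RESULTS.items()
}

for group, z in z_scores.items():
    # Rule-of-thumb cutoff (assumption): within 2 standard errors of chance.
    verdict = "within 2 stderr of chance" if abs(z) < 2 else "differs from chance"
    print(f"{group:16s} z = {z:+.2f}  ({verdict})")
```

Every group falls within one standard error of 0.25, so at this scale and token budget the MMLU scores are statistically indistinguishable from random guessing.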