# SP-LM-alpha
A GPT model trained on the TinyStories dataset using PyTorch.
## Model Details
- Model Type: GPT (Causal Language Model)
- Vocab Size: 50257
- Context Length: 128
- Layers: 6
- Attention Heads: 6
- Embedding Dimension: 384
- Training Dataset: TinyStories
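For orientation, these are the values the Quick Start below reads from `config.json`. A minimal sketch of what that configuration might look like in Python; the attribute names (`vocab_size`, `block_size`, `n_layer`, `n_head`, `n_embd`) are assumptions for illustration, not necessarily the repo's actual keys:

```python
# Hypothetical configuration mirroring the numbers above.
# The real attribute names are whatever config.json in the repo defines.
config_dict = {
    "vocab_size": 50257,  # GPT-2 BPE vocabulary
    "block_size": 128,    # context length
    "n_layer": 6,         # transformer blocks
    "n_head": 6,          # attention heads per block
    "n_embd": 384,        # embedding dimension
}
config = type("Config", (), config_dict)()  # same construction as in the Quick Start
```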
## Architecture
The model uses a decoder-only transformer architecture with the following components (sketched after this list):
- Token and positional embeddings
- 6 transformer blocks
- Causal self-attention with 6 heads
- Feed-forward networks with GELU activation
- Layer normalization
- Residual connections
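A minimal sketch of one such block in PyTorch, for illustration only; the actual implementation lives in `sp_lm.py`, and its module and attribute names may differ (for example, it likely uses a custom causal-attention module rather than `nn.MultiheadAttention`):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Illustrative pre-norm transformer block: attention + MLP, each with a residual."""
    def __init__(self, n_embd=384, n_head=6, block_size=128):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )
        # Boolean mask with True above the diagonal blocks attention to future tokens.
        mask = torch.triu(torch.ones(block_size, block_size, dtype=torch.bool), diagonal=1)
        self.register_buffer("mask", mask)

    def forward(self, x):
        t = x.size(1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=self.mask[:t, :t])
        x = x + attn_out               # residual connection around attention
        x = x + self.mlp(self.ln2(x))  # residual connection around feed-forward
        return x
```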
## Usage

### Quick Start
```python
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
import json
import torch

from sp_lm import GPT  # model class defined in sp_lm.py from this repo

repo_id = "wizardoftrap/SP-LM-alpha"

# Load the tokenizer and model configuration from the Hub.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
config_dict = json.load(open(hf_hub_download(repo_id=repo_id, filename="config.json")))
config = type('Config', (), config_dict)()

# Load the safetensors weights and build the model.
model_weights = load_file(hf_hub_download(repo_id=repo_id, filename="model.safetensors"))
model = GPT(config)
model.load_state_dict(model_weights)
model.eval()  # disable dropout etc. for inference

# Generate a continuation of a prompt.
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    generated_ids = model.generate(inputs["input_ids"], max_new_tokens=50, temperature=1.0, top_k=50)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```
## Installation

1. Download `sp_lm.py` from this repo (or fetch it programmatically, as sketched below); it defines the `GPT` model class used in the Quick Start.
2. Install the required packages:
   ```bash
   pip install transformers safetensors huggingface-hub torch
   ```
3. Load the model and generate text as shown above.
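If you prefer not to download `sp_lm.py` by hand, something along these lines should work, assuming the file sits at the root of the repo:

```python
import os
import sys
from huggingface_hub import hf_hub_download

# Fetch sp_lm.py from the repo (assumption: it lives at the repo root)
# and put its directory on the import path so `from sp_lm import GPT` resolves.
path = hf_hub_download(repo_id="wizardoftrap/SP-LM-alpha", filename="sp_lm.py")
sys.path.append(os.path.dirname(path))

from sp_lm import GPT
```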
## Training Details
- Learning Rate: 1e-4 with linear warmup and cosine annealing decay
- Batch Size: 32
- Gradient Accumulation Steps: 32
- Max Iterations: 20000
- Optimizer: AdamW with weight decay
- Mixed Precision: bfloat16 / float16
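With a batch size of 32 and 32 gradient-accumulation steps, each optimizer step sees an effective batch of 1024 sequences. The learning-rate schedule described above (linear warmup followed by cosine annealing) can be sketched roughly as follows; the warmup length and minimum learning rate are illustrative assumptions, not values taken from the actual training run:

```python
import math

def get_lr(it, max_lr=1e-4, min_lr=1e-5, warmup_iters=1000, max_iters=20000):
    """Linear warmup to max_lr, then cosine annealing down to min_lr.

    warmup_iters and min_lr are assumed values for illustration only.
    """
    if it < warmup_iters:
        return max_lr * (it + 1) / warmup_iters           # linear warmup
    progress = (it - warmup_iters) / (max_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))    # decays from 1 to 0
    return min_lr + coeff * (max_lr - min_lr)
```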