SP-LM-alpha

A GPT model trained on the TinyStories dataset using PyTorch.

Model Details

  • Model Type: GPT (Causal Language Model)
  • Vocab Size: 50257
  • Context Length: 128
  • Layers: 6
  • Attention Heads: 6
  • Embedding Dimension: 384
  • Training Dataset: TinyStories
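
These hyperparameters correspond to a small configuration along the lines of the sketch below. The key names are illustrative assumptions only; the authoritative values live in this repo's config.json, which is loaded in the Quick Start section.

# Illustrative only: key names are assumptions and may differ from config.json.
example_config = {
    "vocab_size": 50257,   # matches the GPT-2 BPE vocabulary size
    "block_size": 128,     # maximum context length
    "n_layer": 6,          # transformer blocks
    "n_head": 6,           # attention heads per block
    "n_embd": 384,         # embedding dimension
}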

Architecture

The model uses a decoder-only transformer architecture with the following components (a minimal sketch of one block follows this list):

  • Token and positional embeddings
  • 6 transformer blocks
  • Causal self-attention with 6 heads
  • Feed-forward networks with GELU activation
  • Layer normalization
  • Residual connections
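
For orientation, the sketch below shows what one such block can look like in plain PyTorch, using the dimensions listed above (384-dim embeddings, 6 heads, 128-token context). It assumes a pre-norm layout; class and attribute names are illustrative and need not match the actual implementation in sp_lm.py.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd=384, n_head=6, block_size=128):
        super().__init__()
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # joint Q, K, V projection
        self.proj = nn.Linear(n_embd, n_embd)      # output projection
        # lower-triangular mask enforces causality (no attention to future tokens)
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape to (B, n_head, T, head_dim) for multi-head attention
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        y = F.softmax(att, dim=-1) @ v
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)

class Block(nn.Module):
    def __init__(self, n_embd=384, n_head=6, block_size=128):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head, block_size)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(                  # feed-forward network with GELU
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))             # residual connection around attention
        x = x + self.mlp(self.ln2(x))              # residual connection around MLP
        return x

# quick shape check
x = torch.randn(1, 16, 384)
print(Block()(x).shape)   # torch.Size([1, 16, 384])

The full model stacks 6 of these blocks on top of the token and positional embeddings and projects the final hidden states back to vocabulary logits.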

Usage

Quick Start

from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
import json
import torch
from sp_lm import GPT  # model class defined in sp_lm.py from this repo

repo_id = "wizardoftrap/SP-LM-alpha"

# Load the tokenizer hosted in the repo
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Download config.json and expose its fields as attributes of a simple config object
with open(hf_hub_download(repo_id=repo_id, filename="config.json")) as f:
    config_dict = json.load(f)
config = type("Config", (), config_dict)()

# Download the safetensors weights and load them into a freshly built model
model_weights = load_file(hf_hub_download(repo_id=repo_id, filename="model.safetensors"))
model = GPT(config)
model.load_state_dict(model_weights)
model.eval()  # put the model in inference mode

# Generate a continuation of the prompt
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    generated_ids = model.generate(inputs["input_ids"], max_new_tokens=50, temperature=1.0, top_k=50)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
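
If a GPU is available, you can move the model and the input IDs onto it before generating. This assumes GPT is a standard torch.nn.Module (which the state_dict loading above suggests):

# Optional: run generation on GPU if available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
input_ids = inputs["input_ids"].to(device)
with torch.no_grad():
    generated_ids = model.generate(input_ids, max_new_tokens=50, temperature=1.0, top_k=50)
print(tokenizer.decode(generated_ids[0].cpu(), skip_special_tokens=True))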

Installation

  1. Download the sp_lm.py file from this repo; it contains the GPT model definition.

  2. Install required packages:

pip install transformers safetensors huggingface-hub torch

  3. Load the model and generate text as shown in the Quick Start above.

Training Details

  • Learning Rate: 1e-4 with linear warmup and cosine annealing decay
  • Batch Size: 32
  • Gradient Accumulation Steps: 32
  • Max Iterations: 20000
  • Optimizer: AdamW with weight decay
  • Mixed Precision: bfloat16 / float16
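
The schedule and optimizer above can be reproduced roughly as in the sketch below. The warmup length, minimum learning rate, and weight-decay value are assumptions and are not stated on this card:

import math
import torch

max_iters = 20000
warmup_iters = 1000            # assumption, not stated on the card
base_lr, min_lr = 1e-4, 1e-5   # min_lr is an assumption

def lr_at(step):
    """Linear warmup to base_lr, then cosine annealing down to min_lr."""
    if step < warmup_iters:
        return base_lr * (step + 1) / warmup_iters
    progress = (step - warmup_iters) / (max_iters - warmup_iters)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Stand-in module so the snippet runs on its own; replace with GPT(config).
model = torch.nn.Linear(384, 384)
optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr, weight_decay=0.1)  # decay value assumed

for step in range(3):  # the real loop would run to max_iters
    for group in optimizer.param_groups:
        group["lr"] = lr_at(step)
    # ... forward pass and loss.backward() every micro-batch, optimizer.step()
    #     every 32 micro-batches (gradient accumulation), typically under
    #     torch.autocast with bfloat16/float16 for mixed precision ...

With a batch size of 32 and 32 gradient-accumulation steps, the effective batch size per optimizer update is 1024 sequences.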