# SAGI - Swarm AGI Language Model
SAGI is a novel causal language model that integrates swarm intelligence dynamics with transformer architecture. The model treats cognition as a dynamic, adaptive system where multiple internal "agents" collaborate through differentiable routing, trust mechanisms, and shared memory.
## Model Details

| Property | Value |
|---|---|
| Parameters | 52.72M |
| Architecture | Transformer Decoder + Swarm Dynamics |
| Hidden Size | 512 |
| Layers | 6 |
| Attention Heads | 8 |
| Context Length | 2048 |
| Vocabulary | GPT-2 tokenizer (50,257 tokens) |
## Architecture

Key design features:

- Differentiable routing (`DiffRouter`) instead of hard module selection
- A `MetaController` that activates capacity under resource constraints
- Trust dynamics that bias routing toward reliable components

```
┌─────────────────────────────────────────────────────────┐
│                       SAGI Model                        │
├─────────────────────────────────────────────────────────┤
│  ┌─────────────────┐      ┌─────────────────────────┐   │
│  │   Swarm-7 V2.2  │─────▶│    Swarm State S, T     │   │
│  │   (Cognitive    │      │    (Working Memory)     │   │
│  │    Dynamics)    │      └────────────┬────────────┘   │
│  └────────▲────────┘                   │                │
│           │                            ▼                │
│           │               ┌─────────────────────────┐   │
│           │               │   Transformer Decoder   │   │
│           │               │   - Swarm-conditioned   │   │
│           │               │     attention & FFN     │   │
│           │               │   - RoPE embeddings     │   │
│           │               └────────────┬────────────┘   │
│           │                            │                │
│  ┌────────┴────────┐      ┌────────────▼────────────┐   │
│  │   Observation   │◀─────│         LM Head         │   │
│  │  (from tokens)  │      └─────────────────────────┘   │
│  └─────────────────┘                                    │
└─────────────────────────────────────────────────────────┘
```
The swarm processes observations derived from token embeddings, updating its internal state S. This state conditions the transformer's attention patterns and feed-forward activations via learned projections, creating bidirectional information flow between symbolic (tokens) and subsymbolic (swarm dynamics) processing.
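To make the conditioning concrete, here is a minimal PyTorch sketch of a FiLM-style swarm-conditioned transformer block. It is illustrative only: the class name, the scale/shift parameterization, and the exact wiring are assumptions, not SAGI's actual implementation (the dimensions match the table above: hidden size 512, 8 heads, `dim_s=64`).

```python
import torch
import torch.nn as nn

class SwarmConditionedBlock(nn.Module):
    """Hypothetical sketch: swarm state S modulates attention and FFN."""

    def __init__(self, hidden=512, n_heads=8, dim_s=64):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden)
        )
        # Learned projections from swarm state to per-channel scale/shift.
        self.attn_mod = nn.Linear(dim_s, 2 * hidden)
        self.ffn_mod = nn.Linear(dim_s, 2 * hidden)
        self.norm1 = nn.LayerNorm(hidden)
        self.norm2 = nn.LayerNorm(hidden)

    def forward(self, x, swarm_state):
        # x: (batch, seq, hidden); swarm_state: (batch, dim_s).
        # Causal mask omitted for brevity.
        scale, shift = self.attn_mod(swarm_state).chunk(2, dim=-1)
        h = self.norm1(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        scale, shift = self.ffn_mod(swarm_state).chunk(2, dim=-1)
        h = self.norm2(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
        return x + self.ffn(h)
```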
## Usage

Install the dependencies:

```bash
pip install torch transformers datasets
```
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("reaperdoesntknow/SAGI")
tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/SAGI")

# Generate text
model.eval()
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_k=50,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
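Note: SAGI is a custom architecture. If the Hub repository ships its own modeling code, loading it with `AutoModelForCausalLM` will also require passing `trust_remote_code=True` to `from_pretrained`.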
## Swarm Configuration

| Parameter | Value | Description |
|---|---|---|
| `max_agents` | 20 | Number of internal cognitive agents |
| `dim_s` | 64 | State dimension |
| `dim_t` | 32 | Task/goal dimension |
| `dim_obs` | 48 | Observation dimension |
| `topk_route` | 5 | Sparse routing top-k |
| `K_thought_max` | 5 | Maximum thinking iterations per step |
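The sparse routing parameters above suggest a differentiable top-k scheme. The following is a hypothetical sketch of such routing (the real `DiffRouter` interface is not documented here); the agent keys and the dot-product scoring rule are assumptions:

```python
import torch
import torch.nn.functional as F

def route(state, agent_keys, topk=5):
    """Softly route a state vector to the top-k of max_agents agents.

    state:      (batch, dim_s)
    agent_keys: (max_agents, dim_s), learned per-agent keys (assumed)
    Returns weights of shape (batch, max_agents): zero outside the
    top-k, differentiable w.r.t. the selected agents' scores.
    """
    logits = state @ agent_keys.T                   # (batch, max_agents)
    top_vals, top_idx = logits.topk(topk, dim=-1)   # k = topk_route
    weights = torch.zeros_like(logits)
    weights.scatter_(-1, top_idx, F.softmax(top_vals, dim=-1))
    return weights

# With the configured sizes (max_agents=20, dim_s=64, topk_route=5):
w = route(torch.randn(2, 64), torch.randn(20, 64), topk=5)
print(w.shape, (w > 0).sum(dim=-1))  # torch.Size([2, 20]) tensor([5, 5])
```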
### Resource Budgets

| Resource | Budget | Description |
|---|---|---|
| Compute | 60.0 | Compute budget per step |
| Memory | 20.0 | Memory capacity |
| Energy | 25.0 | Energy budget |
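One plausible reading of these budgets is a controller that activates agents until a budget is exhausted. The sketch below is an assumption-laden illustration of that idea (the per-agent costs, the trust weighting, and the greedy policy are all invented for the example, and the budget units are unspecified in the card):

```python
import torch

def activate_agents(utility, trust, cost, compute_budget=60.0):
    """Greedily enable agents by trust-weighted utility under a budget.

    utility, trust, cost: (max_agents,) tensors
    Returns a boolean mask of active agents.
    """
    score = utility * trust  # trust dynamics bias reliable components
    active = torch.zeros_like(score, dtype=torch.bool)
    spent = 0.0
    for i in score.argsort(descending=True):
        if spent + cost[i] <= compute_budget:
            active[i] = True
            spent += cost[i].item()
    return active

mask = activate_agents(torch.rand(20), torch.rand(20), torch.full((20,), 5.0))
print(mask.sum())  # at most 12 agents fit at cost 5.0 under a budget of 60.0
```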
["truth", "safety", "efficiency"]This model is intended for:
Not intended for:
## Citation

```bibtex
@software{sagi2026,
  title={SAGI: Swarm AGI Language Model},
  author={Reaperdoesntknow},
  year={2026},
  url={https://huggingface.co/reaperdoesntknow/SAGI}
}
```