Darwin-9B-Opus


Qwen3.5 Dense 9B | Reasoning | Chain-of-Thought | 131K Context | 201 Languages | BF16 | Apache 2.0


Technical Definitions

| Term | Definition | Measurement |
|---|---|---|
| Model MRI | Layer-level profiling of tensor health indicators | L2 norm, Shannon entropy, std per tensor across all layers |
| `LayerMRI.compare_layers` | Per-tensor A vs B quality comparison yielding optimal `ratio_b` | `score = entropy * 0.5 + std * 0.3 + clamp(norm, 100) * 0.002` per model; `ratio_b = score_b / (score_a + score_b)` |
| MRI-Guided Merge | Per-tensor merge ratios derived from parent diagnostics (70% MRI + 30% genome) | `final_ratio = mri_ratio * 0.7 + genome_ratio * 0.3` |
| DARE-TIES | Merge algorithm: random binary mask on delta, then weighted addition | `merged = A + (B - A) * random_mask(density) * ratio` |
| Transplant A / B | When the MRI ratio falls below 0.05 or above 0.95, one parent is used entirely | No interpolation; direct tensor copy |
| Evolutionary Search | CMA-ES population evolution over genome space (ratio, attn, ffn, embed, density_a, density_b) | Phase 1: 200 steps heuristic proxy; Phase 2: 10 steps real benchmark |
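The DARE-TIES formula in the table can be illustrated with a few lines of NumPy (the array shapes and the `density`/`ratio` values here are illustrative, not taken from an actual Darwin run):

```python
import numpy as np

rng = np.random.default_rng(0)

A = rng.normal(size=(4, 4))   # Father tensor
B = rng.normal(size=(4, 4))   # Mother tensor
density, ratio = 0.5, 0.6     # illustrative genome values

# Random binary mask on the delta, then weighted addition
mask = rng.random(A.shape) < density
merged = A + (B - A) * mask * ratio

# Wherever the mask is 0, the merged tensor keeps the Father's weights
assert np.allclose(merged[~mask], A[~mask])
```

Masked-out positions stay identical to A; masked-in positions move toward B by `ratio`.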

Overview

Darwin-9B-Opus is a 9B-parameter dense reasoning model created with Darwin V5. Both parent models share the identical Qwen3.5-9B architecture: the Mother is a LoRA SFT on the same base, not a different architecture.

| Role | Model | Training |
|---|---|---|
| Father | Qwen/Qwen3.5-9B | Original pre-training + RLHF |
| Mother | Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled | LoRA SFT with text-only Claude 4.6 Opus reasoning chains |

How Darwin V5 Works

Darwin V5 does not use mergekit or any external merge library. It implements DARE-TIES merge directly via PyTorch tensor operations, with MRI-guided per-layer ratios. The algorithm is inspired by the DARE-TIES method but re-implemented from scratch to support per-tensor diagnostic-guided ratios.

Merge Implementation (actual code logic)

# For each tensor pair (A, B) across all safetensor shards:
ta = model_a[key]       # Father tensor
tb = model_b[key]       # Mother tensor

# 1. MRI diagnoses both tensors
diag_a = LayerMRI.diagnose_tensor(ta)  # {norm, entropy, std}
diag_b = LayerMRI.diagnose_tensor(tb)  # {norm, entropy, std}

# 2. Quality score comparison determines ratio_b
score_a = diag_a["entropy"] * 0.5 + diag_a["std"] * 0.3 + min(diag_a["norm"], 100) * 0.002
score_b = diag_b["entropy"] * 0.5 + diag_b["std"] * 0.3 + min(diag_b["norm"], 100) * 0.002
mri_ratio = score_b / (score_a + score_b)  # Higher = Mother is better

# 3. Final ratio = MRI 70% + evolutionary genome 30%
final_ratio = mri_ratio * 0.7 + genome_type_ratio * 0.3

# 4. DARE-TIES merge with per-tensor ratio
mask = torch.rand_like(tb) < density_b
delta = (tb - ta) * mask
merged = (ta + delta * final_ratio).bfloat16()
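The transplant rule from the definitions table (ratio below 0.05 or above 0.95) is not shown in the snippet above. Combined with the merge step, the per-tensor logic might look like this NumPy sketch (the function name and signature are illustrative, not the actual Darwin V5 API):

```python
import numpy as np

def merge_tensor(ta, tb, final_ratio, density_b, rng):
    """DARE-TIES merge with the transplant rule: a NumPy sketch of the
    per-tensor logic described above (names are illustrative)."""
    # Transplant: extreme ratios mean one parent is used entirely,
    # with no interpolation (direct tensor copy)
    if final_ratio < 0.05:
        return ta.copy()          # use Father tensor entirely
    if final_ratio > 0.95:
        return tb.copy()          # use Mother tensor entirely
    # Otherwise: random binary mask on the delta, weighted addition
    mask = rng.random(tb.shape) < density_b
    delta = (tb - ta) * mask
    return ta + delta * final_ratio
```

With `density_b = 1.0` the mask keeps every delta element, reducing to plain linear interpolation by `final_ratio`.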

Pipeline

Phase 0: Model MRI
  For every tensor in both parents, measure:
    - L2 norm (layer energy)
    - Shannon entropy (weight distribution uniformity)
    - Standard deviation (activation spread)
  Compare A vs B quality scores -> per-tensor ratio prescription
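The Phase 0 measurements and the A-vs-B comparison can be sketched as follows. The histogram binning and exact normalization of the real `LayerMRI.diagnose_tensor` are not specified in this card, so treat this as an illustrative approximation; only the score weights come from the definitions table:

```python
import numpy as np

def diagnose_tensor(t, bins=256):
    """Per-tensor MRI sketch: L2 norm (layer energy), Shannon entropy of
    the weight histogram (distribution uniformity), and std (spread)."""
    flat = np.asarray(t, dtype=np.float64).ravel()
    hist, _ = np.histogram(flat, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                       # drop empty bins before taking logs
    return {
        "norm": float(np.linalg.norm(flat)),
        "entropy": float(-(p * np.log2(p)).sum()),
        "std": float(flat.std()),
    }

def mri_ratio(diag_a, diag_b):
    """A-vs-B quality comparison using the weights from the definitions
    table; higher means the Mother (B) tensor scores better."""
    score = lambda d: d["entropy"] * 0.5 + d["std"] * 0.3 + min(d["norm"], 100) * 0.002
    return score(diag_b) / (score(diag_a) + score(diag_b))
```

Identical parents yield a ratio of exactly 0.5, i.e. an even blend.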

Phase 1: Evolutionary Search (200 steps, heuristic proxy)
  Population of 20 genomes (ratio, attn, ffn, embed, density_a, density_b)
  Fitness: heuristic score based on genome balance + differentiation
  Selection -> SLERP crossover -> Gaussian mutation
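The crossover and mutation steps above can be sketched like this: a textbook SLERP over genome vectors plus clipped Gaussian noise. The `sigma` value and the [0, 1] clipping bounds are assumptions for illustration, not values taken from Darwin V5:

```python
import numpy as np

def slerp(g1, g2, t=0.5, eps=1e-8):
    """Spherical linear interpolation between two genome vectors
    (standard SLERP; the exact variant used is not specified)."""
    g1, g2 = np.asarray(g1, float), np.asarray(g2, float)
    n1 = g1 / (np.linalg.norm(g1) + eps)
    n2 = g2 / (np.linalg.norm(g2) + eps)
    theta = np.arccos(np.clip(n1 @ n2, -1.0, 1.0))
    if theta < eps:                    # nearly parallel: fall back to lerp
        return (1 - t) * g1 + t * g2
    return (np.sin((1 - t) * theta) * g1 + np.sin(t * theta) * g2) / np.sin(theta)

def mutate(genome, sigma=0.05, rng=None):
    """Gaussian mutation, clipped to a [0, 1] genome range
    (sigma and bounds are illustrative)."""
    if rng is None:
        rng = np.random.default_rng()
    return np.clip(genome + rng.normal(0.0, sigma, size=len(genome)), 0.0, 1.0)
```

A child genome would then be `mutate(slerp(parent1, parent2))`, keeping all six genes (ratio, attn, ffn, embed, density_a, density_b) in valid range.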

Phase 2: Real Merge + Benchmark (10 steps)
  Top genomes from Phase 1 undergo actual tensor merge
  Each merge: MRI prescription (70%) + genome ratio (30%)
  Fitness: real benchmark score (ARC-Challenge)
  Best model selected and auto-uploaded

Phase 3: Health Check
  Layer-by-layer importance comparison: child vs both parents
  Detect interference (child >> parents) or function loss (parents >> child)
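A minimal sketch of the Phase 3 check, using the per-layer L2 norm as the importance proxy. Both the thresholds and the choice of norm as proxy are assumptions; the card does not define the actual importance metric:

```python
import numpy as np

def health_check(child, father, mother, hi=2.0, lo=0.5):
    """Compare each child tensor against the stronger parent tensor and
    flag interference (child >> parents) or function loss (parents >> child).
    Thresholds hi/lo are illustrative."""
    report = {}
    for key in child:
        c = np.linalg.norm(child[key])
        p = max(np.linalg.norm(father[key]), np.linalg.norm(mother[key]))
        if c > p * hi:
            report[key] = "interference"    # child >> parents
        elif c < p * lo:
            report[key] = "function_loss"   # parents >> child
        else:
            report[key] = "healthy"
    return report
```

In the real pipeline this would iterate over every tensor key shared by the three checkpoints.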

What Makes This Different from Standard Merging

| Capability | Standard DARE-TIES | Darwin V5 |
|---|---|---|
| Implementation | mergekit library call | Direct PyTorch tensor operations |
| Ratio selection | Uniform ratio across all tensors | Per-tensor ratio from MRI diagnosis |
| Pre-merge analysis | None | Tensor-level norm/entropy/std profiling |
| Ratio determination | Human-set or grid search | MRI 70% + evolutionary genome 30% |
| Post-merge validation | Benchmark score only | Layer-by-layer child vs. parents comparison |
| Transplant support | No | ratio < 0.05 -> use A entirely; ratio > 0.95 -> use B entirely |
| Failure diagnosis | "Score went down" | Per-tensor quality delta identifies problematic layers |

Model Specifications

Architecture Qwen3.5 Dense (Gated DeltaNet hybrid)
Total Parameters 9B
Precision BF16
Context Length 131,072 native
Languages 201
Thinking <think> tag chain-of-thought reasoning
License Apache 2.0

Hardware Requirements

| Setup | VRAM | Status |
|---|---|---|
| BF16 full precision | ~20 GB | Baseline footprint |
| NVIDIA RTX 4090 (24 GB) | 24 GB | Comfortable |
| NVIDIA A100 (40 GB) | 40 GB | Very comfortable |
| NVIDIA T4 (16 GB) | 16 GB | Requires quantization |

Usage

Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained(
    "FINAL-Bench/Darwin-9B-Opus",
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "FINAL-Bench/Darwin-9B-Opus",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

SGLang

python -m sglang.launch_server \
  --model-path FINAL-Bench/Darwin-9B-Opus \
  --tp 1 \
  --mem-fraction-static 0.90 \
  --context-length 32768 \
  --trust-remote-code

vLLM

vllm serve FINAL-Bench/Darwin-9B-Opus \
  --trust-remote-code \
  --enforce-eager

Evolution Details

Engine Darwin V5 (Evolutionary Merge + Layer-Level Diagnostics)
Merge Method DARE-TIES (direct PyTorch implementation, no external library)
MRI Integration Per-tensor diagnosis: norm, entropy, std -> ratio prescription
Ratio Formula final_ratio = mri_ratio * 0.7 + genome_ratio * 0.3
Evolution Phase 1: 200 steps proxy + Phase 2: 10 steps real benchmark
Best Score 0.8508 (ARC-Challenge)
Infrastructure 4 x NVIDIA H100 NVL (100GB each)

Acknowledgements

  • Korean Government — GPU Support Program research grant
  • Qwen Team — Qwen3.5 base architecture
  • Jackrong — Claude 4.6 Opus Reasoning Distilled model
  • DARE (Yu et al., 2023) and TIES-Merging (Yadav et al., 2023) algorithms (re-implemented, not library-dependent)

Built By

Developer VIDRAFT
Engine Darwin V5
Base Architecture Qwen3.5-9B

Citation

@misc{vidraft_darwin_9b_opus,
  title        = {Darwin-9B-Opus: Diagnostic-Guided Evolutionary Merge},
  author       = {VIDRAFT},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-9B-Opus}}
}