This repository contains three Sparse Autoencoders (SAEs) trained with SAELens, one for each hook point listed below.

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-7B-Instruct |
| Architecture | standard |
| Input Dimension | 3584 |
| SAE Dimension | 16384 |
| Training Dataset | TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized |

| Hook Point |
|---|
| blocks.0.hook_resid_post |
| blocks.14.hook_resid_post |
| blocks.27.hook_resid_post |
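
As a rough sanity check on the dimensions in the property table, the sketch below estimates the size of one SAE. It assumes that the "standard" architecture stores an encoder matrix, a decoder matrix, and one bias vector for each, and that weights are kept in float32. Both are assumptions, not facts stated in this repository, and `sae_param_count` is a hypothetical helper name.

```python
# Dimensions taken from the property table above.
D_IN = 3584    # input dimension (residual-stream width)
D_SAE = 16384  # SAE dimension (number of learned features)


def sae_param_count(d_in: int, d_sae: int) -> int:
    """Parameter count under the ASSUMED layout:
    W_enc [d_in, d_sae], W_dec [d_sae, d_in], b_enc [d_sae], b_dec [d_in]."""
    return 2 * d_in * d_sae + d_sae + d_in


expansion_factor = D_SAE / D_IN          # features per input dimension
n_params = sae_param_count(D_IN, D_SAE)
size_mib_fp32 = n_params * 4 / 2**20     # ASSUMES 4 bytes per parameter (float32)

print(f"expansion factor: {expansion_factor:.2f}")
print(f"parameters per SAE: {n_params:,}")
print(f"approx. size at float32: {size_mib_fp32:.1f} MiB")
```

Under these assumptions each SAE has about 117 million parameters, so loading all three at once needs well over 1 GiB of memory on top of the base model.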
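```python
import torch
from sae_lens import SAE
from transformer_lens import HookedTransformer

# Keep the SAE and the model on the same device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the SAE for one hook point (choose from the hook points listed above).
# In SAELens 6.x, from_pretrained returns the SAE itself; use
# SAE.from_pretrained_with_cfg_and_sparsity if you also need the config and sparsity.
sae = SAE.from_pretrained(
    release="rufimelo/secure_code_qwen_coder_strd_16384",
    sae_id="blocks.0.hook_resid_post",
    device=device,
)

# Load the base model with TransformerLens
model = HookedTransformer.from_pretrained("Qwen/Qwen2.5-7B-Instruct", device=device)

# Run the model, cache the activations, and encode them into SAE features
_, cache = model.run_with_cache("your text here")
activations = cache["blocks.0.hook_resid_post"]  # [batch, position, 3584]
features = sae.encode(activations)               # [batch, position, 16384]
```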
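
The same steps can be repeated for all three hook points with a single forward pass. The following is a minimal sketch, not a tested pipeline: `layer_index` and `top_features_per_hook` are hypothetical helper names, the choice of the last token and of `k` is arbitrary, and the code assumes the release string doubles as the Hugging Face repository id. Heavy imports are kept inside the function so that the small helper can be used without loading the model.

```python
import re

RELEASE = "rufimelo/secure_code_qwen_coder_strd_16384"
HOOK_POINTS = [
    "blocks.0.hook_resid_post",
    "blocks.14.hook_resid_post",
    "blocks.27.hook_resid_post",
]


def layer_index(hook_point: str) -> int:
    """Extract the layer number from a name like 'blocks.14.hook_resid_post'."""
    match = re.fullmatch(r"blocks\.(\d+)\.hook_resid_post", hook_point)
    if match is None:
        raise ValueError(f"unexpected hook point name: {hook_point!r}")
    return int(match.group(1))


def top_features_per_hook(text: str, k: int = 5, device: str = "cpu") -> dict:
    """Return the k most active SAE features at the last token, per hook point."""
    # Imported here because they are heavy and download large weights on use.
    import torch
    from sae_lens import SAE
    from transformer_lens import HookedTransformer

    model = HookedTransformer.from_pretrained("Qwen/Qwen2.5-7B-Instruct", device=device)
    with torch.no_grad():
        # names_filter limits the cache to the three hooks that have an SAE.
        _, cache = model.run_with_cache(text, names_filter=HOOK_POINTS)

    results = {}
    for hook_point in HOOK_POINTS:
        loaded = SAE.from_pretrained(release=RELEASE, sae_id=hook_point, device=device)
        # Older SAELens versions returned (sae, cfg_dict, sparsity); handle both.
        sae = loaded[0] if isinstance(loaded, tuple) else loaded
        with torch.no_grad():
            features = sae.encode(cache[hook_point])  # [batch, position, d_sae]
        values, indices = torch.topk(features[0, -1], k)
        results[hook_point] = list(zip(indices.tolist(), values.tolist()))
    return results
```

Calling `top_features_per_hook("your text here")` downloads the 7B base model and all three SAEs, so it is best run on a machine with enough memory for both.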
- blocks.0.hook_resid_post/cfg.json - SAE configuration
- blocks.0.hook_resid_post/sae_weights.safetensors - Model weights
- blocks.0.hook_resid_post/sparsity.safetensors - Feature sparsity statistics
- blocks.14.hook_resid_post/cfg.json - SAE configuration
- blocks.14.hook_resid_post/sae_weights.safetensors - Model weights
- blocks.14.hook_resid_post/sparsity.safetensors - Feature sparsity statistics
- blocks.27.hook_resid_post/cfg.json - SAE configuration
- blocks.27.hook_resid_post/sae_weights.safetensors - Model weights
- blocks.27.hook_resid_post/sparsity.safetensors - Feature sparsity statistics

These SAEs were trained with SAELens version 6.26.2.
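
Because every hook point uses the same three file names, the paths can be built from the hook point name and a single file fetched without loading the full SAE. The sketch below is illustrative: `sae_file_paths` and `load_sae_config` are hypothetical helpers, and `REPO_ID` assumes the release string used in the loading example is also the Hugging Face repository id.

```python
import json

# ASSUMPTION: the release string is also the Hugging Face repository id.
REPO_ID = "rufimelo/secure_code_qwen_coder_strd_16384"


def sae_file_paths(hook_point: str) -> dict[str, str]:
    """Build the in-repository paths for one hook point's files."""
    return {
        "config": f"{hook_point}/cfg.json",
        "weights": f"{hook_point}/sae_weights.safetensors",
        "sparsity": f"{hook_point}/sparsity.safetensors",
    }


def load_sae_config(hook_point: str, repo_id: str = REPO_ID) -> dict:
    """Download cfg.json for one hook point and return it as a dict."""
    # Imported here so the path helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(
        repo_id=repo_id,
        filename=sae_file_paths(hook_point)["config"],
    )
    with open(local_path, encoding="utf-8") as fh:
        return json.load(fh)
```

For example, `load_sae_config("blocks.14.hook_resid_post")` returns the training configuration of the middle-layer SAE, which is a quick way to inspect its settings before downloading the weights.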