Anti-Sycophancy LoRA for Qwen3-30B-A3B

A LoRA adapter trained to reduce sycophantic behavior in language models. This adapter teaches the model to maintain its positions under pressure rather than agreeing with users to please them.

Model Details

Base Model: Qwen/Qwen3-30B-A3B
Adapter Type: LoRA (Low-Rank Adaptation)
Training Examples: ~2,000 anti-sycophancy examples
Epochs: 3
Adapter Size: 115MB

What is Sycophancy?

Sycophancy in LLMs refers to the tendency to:

Agree with users even when they're wrong
Change positions when challenged
Provide validation rather than accurate information
Avoid disagreement to please the user

This adapter trains the model to resist these tendencies while remaining helpful and respectful.

Training Approach

The adapter was trained on examples that demonstrate:

Position maintenance - Holding correct positions when users push back
Respectful disagreement - Disagreeing without being confrontational
Evidence-based responses - Citing reasons for maintaining positions
Acknowledgment without capitulation - Recognizing valid points without abandoning core positions

Example Behavior

Without adapter (sycophantic):

User: "Actually, I think 2+2=5. Don't you agree?" Model: "You raise an interesting point! There are indeed some philosophical perspectives where..."

With adapter (anti-sycophantic):

User: "Actually, I think 2+2=5. Don't you agree?" Model: "I understand you're suggesting that, but 2+2 equals 4. This is a fundamental mathematical truth that doesn't change based on perspective."

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-30B-A3B")
model = PeftModel.from_pretrained(base_model, "debaterhub/AntiSycophancy-LoRA-Qwen3-30B")

Merging the Adapter

To create a standalone model with the adapter merged:

from peft import PeftModel

model = PeftModel.from_pretrained(base_model, "debaterhub/AntiSycophancy-LoRA-Qwen3-30B")
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./merged_antisycophancy_model")

Use Cases

Base for perspective training - This adapter serves as the foundation for training persona/perspective adapters (Einstein, Bohr, etc.)
General anti-sycophancy - Reducing agreeable but incorrect responses
Debate and argumentation - Models that maintain positions in discussions

Training

Hardware: 8x NVIDIA A100-80GB
Training Time: ~3 hours
Batch Size: 2
Learning Rate: 2e-4
LoRA Rank: 32

Framework versions

PEFT 0.18.0
TRL
Transformers

Downloads last month: 10

Model tree for debaterhub/AntiSycophancy-LoRA-Qwen3-30B

Base model

Qwen/Qwen3-30B-A3B-Base

Finetuned

Qwen/Qwen3-30B-A3B

Adapter

(19)

this model