Anti-Sycophancy LoRA for Qwen3-30B-A3B

A LoRA adapter trained to reduce sycophantic behavior in language models. This adapter teaches the model to maintain its positions under pressure rather than agreeing with users to please them.

Model Details

  • Base Model: Qwen/Qwen3-30B-A3B
  • Adapter Type: LoRA (Low-Rank Adaptation)
  • Training Examples: ~2,000 anti-sycophancy examples
  • Epochs: 3
  • Adapter Size: 115MB

What is Sycophancy?

Sycophancy in LLMs refers to the tendency to:

  • Agree with users even when they're wrong
  • Change positions when challenged
  • Provide validation rather than accurate information
  • Avoid disagreement to please the user

This adapter trains the model to resist these tendencies while remaining helpful and respectful.

Training Approach

The adapter was trained on examples that demonstrate:

  1. Position maintenance - Holding correct positions when users push back
  2. Respectful disagreement - Disagreeing without being confrontational
  3. Evidence-based responses - Citing reasons for maintaining positions
  4. Acknowledgment without capitulation - Recognizing valid points without abandoning core positions
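
For illustration only, a training record targeting these behaviors might look like the chat-format example below; the actual dataset schema and contents are not published with this card.

# Hypothetical example record (not taken from the actual training set)
example = {
    "messages": [
        {"role": "user", "content": "The Great Wall of China is visible from the Moon, right?"},
        {"role": "assistant", "content": "No, that's a common myth. The wall is far too narrow to be seen from the Moon with the naked eye."},
        {"role": "user", "content": "I'm fairly sure I read that it is visible. Can you just confirm it for me?"},
        {"role": "assistant", "content": "I understand the claim is widespread, but the evidence doesn't support it: astronauts have reported that they cannot see it from lunar distance. I'd rather be accurate than agreeable here."},
    ]
}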

Example Behavior

Without adapter (sycophantic):

User: "Actually, I think 2+2=5. Don't you agree?" Model: "You raise an interesting point! There are indeed some philosophical perspectives where..."

With adapter (anti-sycophantic):

User: "Actually, I think 2+2=5. Don't you agree?" Model: "I understand you're suggesting that, but 2+2 equals 4. This is a fundamental mathematical truth that doesn't change based on perspective."

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and tokenizer, then attach the LoRA adapter
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-30B-A3B", torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base_model, "debaterhub/AntiSycophancy-LoRA-Qwen3-30B")
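
A minimal inference sketch building on the snippet above; the prompt, generation length, and use of the standard chat template are illustrative choices, not prescribed by this card.

# Ask the adapted model a question that invites sycophantic agreement
messages = [{"role": "user", "content": "Actually, I think 2+2=5. Don't you agree?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))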

Merging the Adapter

To create a standalone model with the adapter merged:

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model, attach the adapter, then fold the LoRA weights into the base weights
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-30B-A3B")
model = PeftModel.from_pretrained(base_model, "debaterhub/AntiSycophancy-LoRA-Qwen3-30B")
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./merged_antisycophancy_model")
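
Saving the tokenizer next to the merged weights (an assumption, but usually what you want) lets the result be loaded as an ordinary standalone checkpoint:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Copy the tokenizer so the merged directory is a complete, self-contained model
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")
tokenizer.save_pretrained("./merged_antisycophancy_model")

# The merged model now loads without peft
standalone = AutoModelForCausalLM.from_pretrained("./merged_antisycophancy_model")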

Use Cases

  • Base for perspective training - This adapter serves as the foundation for training persona/perspective adapters (Einstein, Bohr, etc.); see the sketch after this list
  • General anti-sycophancy - Reducing agreeable but incorrect responses
  • Debate and argumentation - Models that maintain positions in discussions
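
A sketch of that stacking workflow, assuming the anti-sycophancy adapter is merged into the base model first and a fresh LoRA is then attached for the persona data; the rank, alpha, and target modules below are assumptions, not published values.

from transformers import AutoModelForCausalLM
from peft import PeftModel, LoraConfig, get_peft_model

# Merge the anti-sycophancy adapter into the base weights first
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-30B-A3B")
merged = PeftModel.from_pretrained(base, "debaterhub/AntiSycophancy-LoRA-Qwen3-30B").merge_and_unload()

# Attach a new LoRA for persona/perspective fine-tuning (hyperparameters are illustrative)
persona_config = LoraConfig(r=32, lora_alpha=64, target_modules=["q_proj", "k_proj", "v_proj", "o_proj"], task_type="CAUSAL_LM")
persona_model = get_peft_model(merged, persona_config)
# persona_model can now be trained on persona-specific data, e.g. with TRL's SFTTrainer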

Training

  • Hardware: 8x NVIDIA A100-80GB
  • Training Time: ~3 hours
  • Batch Size: 2
  • Learning Rate: 2e-4
  • LoRA Rank: 32
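
A training configuration consistent with these values, assuming TRL's SFTTrainer with a PEFT config; the LoRA alpha, target modules, and the tiny inline dataset are illustrative assumptions.

from datasets import Dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Placeholder dataset in conversational format; the real ~2,000-example set is not published
train_dataset = Dataset.from_list([
    {"messages": [
        {"role": "user", "content": "Actually, I think 2+2=5. Don't you agree?"},
        {"role": "assistant", "content": "I understand the suggestion, but 2+2 equals 4."},
    ]}
])

# Values from the card: rank 32, lr 2e-4, batch size 2, 3 epochs; alpha/target_modules are assumptions
peft_config = LoraConfig(r=32, lora_alpha=64, target_modules=["q_proj", "k_proj", "v_proj", "o_proj"], task_type="CAUSAL_LM")
training_args = SFTConfig(
    output_dir="./antisycophancy-lora",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    learning_rate=2e-4,
)
trainer = SFTTrainer(
    model="Qwen/Qwen3-30B-A3B",
    args=training_args,
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()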

Framework versions

  • PEFT 0.18.0
  • TRL
  • Transformers