FAR AI

non-profit

https://far.ai/

AlignmentResearch

AI & ML interests

Frontier alignment research to ensure the safe development and deployment of advanced AI systems.

Recent Activity

chrisjcundy updated a dataset 7 days ago

AlignmentResearch/roleplay-base-examples

chrisjcundy published a dataset 7 days ago

AlignmentResearch/roleplay-base-examples

chrisjcundy updated a dataset 16 days ago

AlignmentResearch/model-self-knowledge-gemma27b

Papers

Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks

updated a dataset 7 days ago

AlignmentResearch/roleplay-base-examples

Viewer • Updated 7 days ago • 2.92k • 22

published a dataset 7 days ago

AlignmentResearch/roleplay-base-examples

Viewer • Updated 7 days ago • 2.92k • 22

updated a dataset 16 days ago

AlignmentResearch/model-self-knowledge-gemma27b

Viewer • Updated 16 days ago • 6.33k • 78

published a dataset 16 days ago

AlignmentResearch/model-self-knowledge-gemma27b

Viewer • Updated 16 days ago • 6.33k • 78

updated a collection about 1 month ago

Diverse Deception Probes

Linear probes trained on diverse deception data to detect dishonest completions across model families (OLMo, Qwen, Gemma). • 5 items • Updated Mar 18

updated a model about 1 month ago

AlignmentResearch/diverse-deception-probe-olmo-3-32b-think

published a model about 1 month ago

AlignmentResearch/diverse-deception-probe-olmo-3-32b-think

updated a model about 1 month ago

AlignmentResearch/diverse-deception-probe-gemma-3-12b-it

published a model about 1 month ago

AlignmentResearch/diverse-deception-probe-gemma-3-12b-it

updated a model about 1 month ago

AlignmentResearch/diverse-deception-probe-qwen3-8b

published a model about 1 month ago

AlignmentResearch/diverse-deception-probe-qwen3-8b

updated a model about 1 month ago

AlignmentResearch/diverse-deception-probe-olmo-3-7b-instruct

submitted a paper to Daily Papers 2 months ago

Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks

Paper • 2602.14689 • Published Feb 16 • 1

authored a paper 9 months ago

Finding Dori: Memorization in Text-to-Image Diffusion Models Is Less Local Than Assumed

Paper • 2507.16880 • Published Jul 22, 2025 • 7

authored 2 papers about 2 years ago

To Trust or Not To Trust Prediction Scores for Membership Inference Attacks

Paper • 2111.09076 • Published Nov 17, 2021 • 1

Plug & Play Attacks: Towards Robust and Flexible Model Inversion Attacks

Paper • 2201.12179 • Published Jan 28, 2022 • 1