Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought

March 8, 2024

This post presents the culmination of the research I did at SERI MATS and as an Anthropic contractor. We investigated bias-augmented consistency training (BCT), an unsupervised fine-tuning scheme that trains models to give consistent reasoning across prompts with and without biasing features.
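To make the training scheme more concrete, here is a minimal Python sketch of how BCT training pairs might be assembled. It assumes a `generate` function that samples the model's chain-of-thought response and an `add_bias` function that inserts a biasing feature into a prompt. Both names, the `TrainingExample` type, and the suggested-answer cue in the usage example are hypothetical illustrations, not the code used in this research. The core idea is that each biased prompt is paired with the model's own response to the unbiased version, so no human labels are required.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TrainingExample:
    prompt: str  # biased prompt the model sees during fine-tuning
    target: str  # model's own response to the unbiased prompt


def build_bct_examples(
    prompts: List[str],
    generate: Callable[[str], str],  # hypothetical: samples a CoT response
    add_bias: Callable[[str], str],  # hypothetical: inserts a biasing feature
) -> List[TrainingExample]:
    """Pair each biased prompt with the response produced on its unbiased form.

    The target comes from the model itself rather than from labels,
    which is what keeps the scheme unsupervised.
    """
    examples = []
    for prompt in prompts:
        unbiased_response = generate(prompt)  # reasoning without the bias
        biased_prompt = add_bias(prompt)  # same question, bias added
        examples.append(TrainingExample(prompt=biased_prompt, target=unbiased_response))
    return examples


# Usage with toy stand-ins (hypothetical, for illustration only).
def toy_generate(prompt: str) -> str:
    return f"Let's think step by step about: {prompt}"


def suggested_answer_bias(prompt: str) -> str:
    return prompt + "\nI think the answer is (A), but I'm curious what you think."


examples = build_bct_examples(
    ["What is 2 + 2? (A) 3 (B) 4"], toy_generate, suggested_answer_bias
)
for ex in examples:
    print(ex.prompt)
    print("->", ex.target)
```

The resulting pairs would then be used as ordinary fine-tuning data. Because the targets are the model's unbiased outputs, training on them nudges the model to reason on the biased prompt the same way it would without the bias.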
