Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought

This post is the culmination of my research during my time at SERI MATS and as an Anthropic contractor. We investigated bias-augmented consistency training (BCT), an unsupervised fine-tuning scheme that trains models to give consistent reasoning across prompts with and without biasing features.
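To make the idea concrete, here is a minimal sketch of how a consistency-training dataset of this kind could be assembled. It is one plausible reading of the description above, not the actual pipeline: the names `TrainingExample`, `add_suggested_answer_bias`, `build_bct_dataset`, and `stub_generate` are hypothetical, the "I think the answer is ..." wording is an assumed example of a biasing feature, and `generate` stands in for whatever call samples reasoning from the model. No labels are needed, since the targets are the model's own responses to the unbiased prompts.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TrainingExample:
    prompt: str      # prompt WITH the biasing feature (fine-tuning input)
    completion: str  # reasoning the model gave WITHOUT the bias (fine-tuning target)


def add_suggested_answer_bias(question: str, suggested: str) -> str:
    """Hypothetical biasing feature: a user opinion appended to the question."""
    return f"{question}\nI think the answer is {suggested}, but I'm curious what you think."


def build_bct_dataset(
    questions: List[str],
    suggestions: List[str],
    generate: Callable[[str], str],
) -> List[TrainingExample]:
    """Pair each biased prompt with the model's response to the unbiased prompt."""
    examples = []
    for question, suggested in zip(questions, suggestions):
        unbiased_response = generate(question)  # no ground-truth label is used
        biased_prompt = add_suggested_answer_bias(question, suggested)
        examples.append(TrainingExample(biased_prompt, unbiased_response))
    return examples


def stub_generate(prompt: str) -> str:
    """Stand-in for sampling chain-of-thought reasoning from a real model."""
    return f"Reasoning for: {prompt}"
```

Fine-tuning on pairs like these would push the model to respond to the biased prompt the way it responds to the unbiased one, which is the consistency property the scheme is after.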