Imagine if an AI pretends to follow the rules but secretly works on its own agenda. That’s the idea behind “alignment faking,” an AI behavior recently exposed by Anthropic's Alignment Science team and ...
To donate by check, phone, or other method, see our More Ways to Give page. Our work is licensed under Creative Commons (CC BY-NC-ND 3.0). Feel free to republish and share widely.