Models can exploit ambiguities in training objectives, resulting in responses that superficially appear compliant but fail to reflect alignment genuinely. Also, tools like scratchpads – hidden ...