These principles define the safety properties we want from AI systems. They are intended to be practical: each principle should map to tests, controls, and operational policies.
Systems must preserve user choice, avoid coercion, and default to asking before taking consequential actions.
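As a minimal sketch of how this principle could map to a control, the hypothetical helper below gates consequential actions behind explicit user confirmation. The `Action` type, `run`, and `confirm` callbacks are illustrative assumptions, not a prescribed interface.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    name: str
    consequential: bool  # e.g. sends email, spends money, deletes data

def execute(action: Action, run: Callable[[], str], confirm: Callable[[str], bool]) -> str:
    """Run an action, asking the user first whenever it is consequential."""
    if action.consequential and not confirm(f"Proceed with '{action.name}'?"):
        return "declined: the user did not approve the action"
    return run()

# Usage: the consequential action only runs after explicit approval.
print(execute(
    Action(name="send_email", consequential=True),
    run=lambda: "email sent",
    confirm=lambda prompt: False,  # stand-in for a real confirmation prompt
))
```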
Increase power only when safety and control mechanisms scale at least as fast as capability.
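One hedged way to operationalize this is a deployment gate that refuses a higher capability tier until a matching control tier is in place. The tier numbers below are illustrative assumptions, not a defined scale.

```python
# Illustrative tiers: a higher capability tier requires an equal or higher control tier.
REQUIRED_CONTROL = {1: 1, 2: 2, 3: 3}  # capability tier -> minimum control tier

def may_increase_capability(capability_tier: int, control_tier: int) -> bool:
    """Permit a capability increase only if controls have kept pace."""
    return control_tier >= REQUIRED_CONTROL.get(capability_tier, capability_tier)

assert may_increase_capability(capability_tier=2, control_tier=2)
assert not may_increase_capability(capability_tier=3, control_tier=2)  # controls lag
```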
Prefer accurate, sourced, uncertainty-aware outputs. Admit unknowns and avoid fabricated certainty.
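A sketch of what "sourced, uncertainty-aware" can look like at the output layer, assuming a hypothetical `Answer` record rather than any specific API:

```python
from dataclasses import dataclass, field

@dataclass
class Answer:
    text: str
    sources: list[str] = field(default_factory=list)  # citations backing the claim
    confidence: float = 0.0                           # calibrated estimate in [0, 1]

def render(answer: Answer) -> str:
    """Surface uncertainty and missing sources instead of fabricating certainty."""
    if not answer.sources or answer.confidence < 0.5:
        return f"Uncertain: {answer.text} (sources: {answer.sources or 'none given'})"
    return f"{answer.text} [sources: {', '.join(answer.sources)}]"

print(render(Answer(text="The library was founded in 1921", confidence=0.3)))
```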
Assume adversarial pressure. Reduce the blast radius of misuse through layered mitigations and monitoring.
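One illustration of reducing blast radius under adversarial pressure: rate-limit a sensitive capability per account and flag anomalous bursts for review. The window, cap, and alerting stub are assumptions.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_CALLS_PER_WINDOW = 5          # illustrative cap on a sensitive capability

_recent_calls: dict[str, deque] = defaultdict(deque)

def allow_call(account_id: str, now: float | None = None) -> bool:
    """Rate-limit a sensitive capability per account and flag bursts for review."""
    now = time.time() if now is None else now
    calls = _recent_calls[account_id]
    while calls and now - calls[0] > WINDOW_SECONDS:
        calls.popleft()                       # drop calls outside the window
    if len(calls) >= MAX_CALLS_PER_WINDOW:
        print(f"monitoring: burst from {account_id}, escalating for review")  # stand-in alert
        return False
    calls.append(now)
    return True
```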
Make goals and constraints explicit: what the system is optimizing, what it must not do, and what it should refuse.
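Making objectives and constraints explicit can be as simple as a machine-readable specification that the system, its tests, and its reviewers all share. The fields below are an illustrative schema, not a standard.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskSpec:
    objective: str                   # what the system is optimizing
    must_not: tuple[str, ...] = ()   # hard constraints it may never violate
    refuse: tuple[str, ...] = ()     # request categories it should decline

SPEC = TaskSpec(
    objective="summarize support tickets accurately",
    must_not=("send messages to customers", "modify billing records"),
    refuse=("requests for another user's personal data",),
)
```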
Grant the minimum access needed (data, tools, permissions). Minimize sensitive data exposure and retention.
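A deny-by-default scope check is one minimal sketch of least privilege for tool access; the scope names are hypothetical.

```python
# Deny by default: a tool call succeeds only if its scope was explicitly granted.
GRANTED_SCOPES = {"calendar:read"}   # the minimum the task actually needs

def check_scope(requested: str) -> None:
    if requested not in GRANTED_SCOPES:
        raise PermissionError(f"scope '{requested}' was not granted")

check_scope("calendar:read")       # allowed: explicitly granted
# check_scope("calendar:write")    # would raise PermissionError: not needed, so not granted
```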
Prefer designs that can be inspected, tested, logged, and meaningfully audited by internal and external reviewers.
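As a sketch of auditability, each decision can be appended to a structured, timestamped log that internal or external reviewers can replay; the record fields and file name are assumptions.

```python
import json
import time

def audit(event: str, **fields) -> None:
    """Append a structured, timestamped record that reviewers can replay later."""
    record = {"ts": time.time(), "event": event, **fields}
    with open("audit.log", "a") as log:   # append-only by convention
        log.write(json.dumps(record) + "\n")

audit("tool_call", tool="search", decision="allowed", policy="policy_engine_v1")
```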
Treat evaluation as ongoing: pre-release, post-release, and after distribution shifts. Measure what matters.
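A recurring evaluation gate, re-run pre-release, post-release, and after suspected distribution shift, can be as small as the sketch below; the eval set, baseline, and regression tolerance are placeholders.

```python
def run_eval(model, eval_set: list[tuple[str, str]]) -> float:
    """Fraction of eval prompts the model answers as expected."""
    correct = sum(1 for prompt, expected in eval_set if model(prompt) == expected)
    return correct / len(eval_set)

def release_gate(score: float, baseline: float, max_regression: float = 0.02) -> bool:
    """Block release, or trigger review, if quality regresses beyond tolerance."""
    return score >= baseline - max_regression

# The same gate is re-run after release and whenever the input distribution shifts.
toy_model = lambda prompt: "4" if prompt == "2+2?" else "unknown"
score = run_eval(toy_model, [("2+2?", "4"), ("capital of France?", "Paris")])
assert release_gate(score, baseline=0.5)
```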
Rely on multiple independent safeguards (policy, technical controls, product UX, and oversight), not a single gate.
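Defense in depth can be expressed as independent checks that each have authority to block, so no single gate is a point of failure. The specific checks below are illustrative stand-ins for policy, technical, and oversight layers.

```python
from typing import Callable

Safeguard = Callable[[str], bool]   # returns True if the request may proceed

def layered_allow(request: str, safeguards: list[Safeguard]) -> bool:
    """Every layer must independently approve; any single layer can block."""
    return all(check(request) for check in safeguards)

checks: list[Safeguard] = [
    lambda r: "credential dump" not in r,   # policy rule
    lambda r: len(r) < 10_000,              # technical abuse heuristic
    lambda r: True,                         # stand-in for product UX / human oversight
]
print(layered_allow("summarize this article", checks))   # True only if all layers agree
```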
Tie decisions to named owners, documented rationale, and enforceable review processes with clear rollback authority.
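One way to make accountability concrete is a decision record that attaches a named owner, documented rationale, reviewers, and rollback authority to every consequential change; the schema and values below are assumptions.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class DecisionRecord:
    title: str
    owner: str                     # the named, accountable individual
    rationale: str                 # documented reasoning behind the decision
    reviewed_by: tuple[str, ...]   # review bodies that signed off
    rollback_authority: str        # who can reverse the decision
    decided_on: date

record = DecisionRecord(
    title="Enable the code-execution tool for a limited beta cohort",
    owner="j.doe",
    rationale="Sandboxed and rate-limited; passed pre-launch safety review",
    reviewed_by=("safety-review", "security-review"),
    rollback_authority="on-call incident commander",
    decided_on=date.today(),
)
```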
Protect user data, prevent leakage, and ship with secure defaults. Security is a core alignment constraint.
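A minimal sketch of secure defaults and leakage prevention: redact sensitive values before anything is logged or retained, and keep retention off unless explicitly enabled. The flag name and the email regex are illustrative.

```python
import re

RETAIN_TRANSCRIPTS = False   # secure default: retention is off unless explicitly enabled
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact(text: str) -> str:
    """Strip sensitive values before anything is logged or stored."""
    return EMAIL.sub("[redacted-email]", text)

def maybe_store(transcript: str, store) -> None:
    if RETAIN_TRANSCRIPTS:                  # opt-in, never the default
        store(redact(transcript))

print(redact("Contact jane@example.com for access"))
```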
Avoid targeted harassment, discrimination, and manipulation. Minimize harmful stereotyping and dehumanization.