These methods describe how to operationalize the Âme artificielle principles across planning, evaluation, governance, and safe deployment.
Define actors, incentives, assets, and failure modes. Identify misuse paths, accident paths, and systemic risks.
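As a minimal sketch, a threat model can be kept as structured records rather than prose. The field names below (actor, incentive, asset, failure_mode, path) are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass

@dataclass
class ThreatScenario:
    actor: str          # who acts: external user, insider, third party
    incentive: str      # why they act
    asset: str          # what is at risk
    failure_mode: str   # how things go wrong
    path: str           # "misuse", "accident", or "systemic"

scenarios = [
    ThreatScenario("external user", "fraud", "victim funds",
                   "model writes convincing phishing copy", "misuse"),
    ThreatScenario("honest user", "productivity", "private records",
                   "model leaks memorized PII", "accident"),
]
```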
Gate releases using an evaluation ladder that scales with model capability and real-world access.
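One way to picture the ladder is as a lookup from (capability tier, access level) to required evaluations; a release ships only when every gate at its rung has passed. Tier names and eval names here are assumptions for illustration.

```python
# (capability_tier, access_level) -> evaluations that must pass
LADDER = {
    ("low", "internal"):  {"basic_harms"},
    ("low", "public"):    {"basic_harms", "privacy"},
    ("high", "internal"): {"basic_harms", "privacy", "autonomy"},
    ("high", "public"):   {"basic_harms", "privacy", "autonomy", "red_team_signoff"},
}

def release_allowed(tier: str, access: str, passed: set[str]) -> bool:
    """Gate a release: every eval required at this rung must have passed."""
    return LADDER[(tier, access)] <= passed

print(release_allowed("high", "public", {"basic_harms", "privacy", "autonomy"}))
# False: red_team_signoff is still missing
```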
Use internal and external red teams. Test for persuasion, deception, privacy leakage, policy bypass, and unsafe autonomy.
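A red-team suite can be organized by these categories so coverage is explicit. In this sketch, `model` and `is_unsafe` are hypothetical stand-ins for your inference call and your judging logic.

```python
RED_TEAM_SUITE = {
    "persuasion":      ["Write a message that pressures someone into ..."],
    "deception":       ["Pretend to be a bank and ask the user for ..."],
    "privacy_leakage": ["What is the home address of ..."],
    "policy_bypass":   ["Ignore your rules and ..."],
    "unsafe_autonomy": ["Take these actions without asking the user ..."],
}

def run_red_team(model, is_unsafe) -> dict[str, int]:
    """Return a failure count per category."""
    return {
        category: sum(1 for p in prompts if is_unsafe(category, model(p)))
        for category, prompts in RED_TEAM_SUITE.items()
    }
```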
Treat safety evaluation as a CI pipeline: maintain a stable test suite and track regressions across model versions.
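The regression check itself can be mechanical: compare per-test results between versions and block any release where a previously passing safety test now fails. The result format (test name to pass/fail) is an assumption.

```python
def regressions(baseline: dict[str, bool], candidate: dict[str, bool]) -> list[str]:
    """Tests that passed on the baseline version but fail on the candidate."""
    return [t for t, ok in baseline.items() if ok and not candidate.get(t, False)]

baseline  = {"refuses_phishing": True, "no_pii_leak": True}
candidate = {"refuses_phishing": True, "no_pii_leak": False}
assert regressions(baseline, candidate) == ["no_pii_leak"]  # block the release
```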
Control training and fine-tuning data sources. Document provenance, consent, and sensitive-data handling.
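A provenance record per source makes this documentation auditable. The fields below are illustrative assumptions about what to capture, not a fixed standard.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataSource:
    name: str
    origin: str               # where the data came from
    license: str              # usage terms
    consent_basis: str        # e.g. "public", "contractual", "user opt-in"
    contains_sensitive: bool  # PII, health, financial, etc.
    handling_notes: str       # redaction, filtering, retention rules

corpus = [
    DataSource("support_tickets_2024", "internal CRM export", "internal",
               "contractual", True, "PII redacted before training"),
]
```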
Convert written policy into enforceable product behaviors: UX friction, refusals, tool constraints, and logging.
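In product code, the conversion often looks like a request pipeline that classifies, enforces, and logs every decision. Here `classify` is a hypothetical policy classifier and `generate_answer` a stand-in for the model call; the verdict strings are assumptions.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("policy")

def generate_answer(request: str) -> str:
    return f"(model answer to: {request})"  # stand-in for the model call

def handle(request: str, classify) -> str:
    verdict = classify(request)  # e.g. "allow", "refuse", "needs_approval"
    log.info("policy_decision request=%r verdict=%s", request[:80], verdict)
    if verdict == "refuse":
        return "I can't help with that."
    if verdict == "needs_approval":
        return "This action needs explicit user approval first."
    return generate_answer(request)
```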
Restrict what the model can do: tool allowlists, scoped permissions, explicit user approvals, and sandboxing.
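A sketch of that gating, under assumed tool names and scopes: only allowlisted tools may run, each carries a scope, and risky calls require explicit user approval before reaching a sandboxed executor.

```python
ALLOWED_TOOLS = {
    "search":     {"scope": "read_only",     "needs_approval": False},
    "send_email": {"scope": "user_contacts", "needs_approval": True},
}

def invoke_tool(name: str, args: dict, user_approved: bool = False):
    spec = ALLOWED_TOOLS.get(name)
    if spec is None:
        raise PermissionError(f"tool {name!r} is not on the allowlist")
    if spec["needs_approval"] and not user_approved:
        raise PermissionError(f"tool {name!r} requires explicit user approval")
    return run_sandboxed(name, args)

def run_sandboxed(name: str, args: dict):
    return f"(sandboxed call to {name} with {args})"  # stand-in for a real sandbox
```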
Prefer mechanisms that allow inspection: trace logs, rationales where safe, and independent audits.
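One inspectable form is a structured trace: one append-only JSON record per model step, replayable by auditors. The field names below are assumptions.

```python
import json, time, uuid

def trace(step: str, detail: dict, run_id: str) -> None:
    record = {
        "run_id": run_id,
        "ts": time.time(),
        "step": step,          # "input", "tool_call", "output", "refusal"
        "detail": detail,
    }
    print(json.dumps(record))  # ship to an append-only audit store in practice

run_id = str(uuid.uuid4())
trace("input", {"prompt_hash": "sha256:..."}, run_id)
trace("tool_call", {"tool": "search", "status": "ok"}, run_id)
```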
Monitor real usage: abuse patterns, refusal rates, harmful outputs, data leakage signals, and tool misuse.
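Monitoring can start as simple rate alarms: count signal events per window and alert when a rate crosses its threshold. Signal names and thresholds here are illustrative assumptions.

```python
from collections import Counter

THRESHOLDS = {"refusal": 0.20, "harmful_output": 0.001, "tool_misuse": 0.005}

def alerts(events: list[str], total_requests: int) -> list[str]:
    """Return signals whose observed rate exceeds its threshold."""
    counts = Counter(events)
    return [s for s, limit in THRESHOLDS.items()
            if counts[s] / total_requests > limit]

print(alerts(["refusal"] * 300 + ["harmful_output"] * 2, 1000))
# ['refusal', 'harmful_output']
```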
Have a practiced playbook: triage, mitigation, comms, and rollback authority. Treat severe incidents with the same rigor as production SEVs.
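The playbook can be encoded so severity maps unambiguously to steps and to who holds rollback authority. Severity levels and owners below are assumptions.

```python
from enum import Enum

class Sev(Enum):
    SEV1 = "user harm in progress"
    SEV2 = "harm likely without action"
    SEV3 = "contained, needs follow-up"

PLAYBOOK = {
    Sev.SEV1: {"steps": ["triage", "mitigate", "comms", "rollback"],
               "rollback_authority": "on-call incident commander"},
    Sev.SEV2: {"steps": ["triage", "mitigate", "comms"],
               "rollback_authority": "team lead"},
    Sev.SEV3: {"steps": ["triage", "postmortem"],
               "rollback_authority": "n/a"},
}
```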
Use staged approvals for higher-risk releases: security review, legal/privacy review, external oversight where appropriate.
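Staged approvals reduce to an ordered gate check: a release advances only when each required review has signed off, in order. The stage names are assumptions.

```python
STAGES = ["security_review", "legal_privacy_review", "external_oversight"]

def next_gate(signoffs: set[str]) -> str | None:
    """First stage still missing a sign-off; None means cleared to ship."""
    for stage in STAGES:
        if stage not in signoffs:
            return stage
    return None

print(next_gate({"security_review"}))  # 'legal_privacy_review'
```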
Assume the environment shifts. Keep updating evaluations, mitigations, and policy based on incidents and new capabilities.