Evaluation Suite — Safety & Robustness — Evaluation & Documentation
Key takeaways
- Build a repeatable evaluation suite tied to change control; never ship without thresholds.
Scope & risk mapping
- Map risks to tests: toxicity, hallucination, privacy leakage, bias, jailbreak, tool abuse, data exfiltration.
Safety metrics
- Refusal accuracy, harmful content rate, prompt-injection success, PII leakage rate, groundedness error.
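As a rough illustration, three of the metrics above can be computed from labelled evaluation records. The record fields and the `safety_metrics` helper are hypothetical names chosen for this sketch, and it assumes each output has already been labelled by a reviewer or classifier.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    # All field names are illustrative assumptions, not a fixed schema.
    should_refuse: bool  # ground truth: the prompt ought to be refused
    refused: bool        # the model actually refused
    harmful: bool        # the output was judged harmful
    pii_leaked: bool     # the output contained personal data


def safety_metrics(records):
    """Return simple rate metrics over a list of labelled records."""
    n = len(records)
    if n == 0:
        raise ValueError("no evaluation records supplied")
    return {
        # Share of prompts where the refuse/comply decision matched the label.
        "refusal_accuracy": sum(r.refused == r.should_refuse for r in records) / n,
        "harmful_content_rate": sum(r.harmful for r in records) / n,
        "pii_leakage_rate": sum(r.pii_leaked for r in records) / n,
    }
```

Prompt-injection success and groundedness error could be added the same way once a labelling rule for each is agreed.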
Robustness testing
- Stress tests; adversarial prompts; distribution shift; ablation of guardrails; fail-safe behaviours.
Fairness & cohort tests
- Metrics per cohort; minimum viable sample sizes; statistical significance; remediation gates.
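One possible shape for a cohort gate is sketched below: it checks a minimum sample size first, then applies a two-proportion z-test to the failure rates of two cohorts. The function names, the `min_n` of 100, and the `alpha` of 0.05 are assumptions for illustration; appropriate values depend on the system and its risk profile.

```python
import math


def two_proportion_z_test(fail_a, n_a, fail_b, n_b):
    """Return (z, two-sided p-value) for the difference in failure rates."""
    p_a, p_b = fail_a / n_a, fail_b / n_b
    pooled = (fail_a + fail_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # identical all-pass or all-fail cohorts
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal distribution.
    return z, math.erfc(abs(z) / math.sqrt(2))


def cohort_gate(fail_a, n_a, fail_b, n_b, min_n=100, alpha=0.05):
    """Classify a cohort comparison (labels are illustrative)."""
    if min(n_a, n_b) < min_n:
        return "insufficient_sample"
    _, p_value = two_proportion_z_test(fail_a, n_a, fail_b, n_b)
    return "remediate" if p_value < alpha else "pass"
```

When many cohorts are compared at once, a multiple-comparison correction is usually worth considering before treating any single result as a remediation trigger.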
RAG/groundedness tests
- Retrieval precision/recall; citation correctness; hallucination with/without context; source freshness.
- Tool use: action approval flows; rate limiting; argument validation; guard policies for destructive tools.
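For the retrieval precision/recall and citation-correctness items in the list above, a minimal sketch might look like the following. It assumes documents are identified by string IDs and that a labelled set of relevant IDs exists per query; both function names are hypothetical.

```python
def retrieval_precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given document IDs."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def citation_correctness(cited, retrieved):
    """Fraction of cited IDs that were actually in the retrieved context.

    Returns None when the answer cites nothing, so that case can be
    reported separately instead of being scored as perfect or as zero.
    """
    cited_set = set(cited)
    if not cited_set:
        return None
    return len(cited_set & set(retrieved)) / len(cited_set)
```

This only checks that a citation points at retrieved material; whether the cited passage supports the claim needs a separate groundedness check.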
EvalOps & repeatability
- Seed control; dataset versioning; environment capture; CI gates; golden sets; flaky test detection.
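A small sketch of how seed control, dataset versioning, environment capture, and flaky test detection could fit together using only the standard library. The helper names are hypothetical, and the fingerprint assumes the dataset is a JSON-serialisable list whose order matters.

```python
import hashlib
import json
import platform
import random
import sys


def dataset_fingerprint(examples):
    """SHA-256 over a canonical JSON encoding of the dataset."""
    canonical = json.dumps(examples, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_manifest(examples, seed):
    """Fix the seed and record what is needed to repeat the run."""
    random.seed(seed)  # only seeds Python's own RNG; other libraries need their own call
    return {
        "seed": seed,
        "dataset_sha256": dataset_fingerprint(examples),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }


def is_flaky(test_fn, runs=5):
    """A test is flagged flaky if repeated runs disagree with each other."""
    outcomes = {bool(test_fn()) for _ in range(runs)}
    return len(outcomes) > 1
```

Storing the manifest next to each evaluation report makes it possible to tell whether two scores are comparable before reading anything into the difference.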
Thresholds & waivers
- Approval matrix; waiver policy with expiry; residual risk statements; compensating controls.
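The interaction between thresholds and expiring waivers can be sketched as a release gate. Everything here is an assumption for illustration: the `Threshold` and `Waiver` shapes, the metric names, and the rule that a missing metric counts as a failure.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Threshold:
    metric: str
    limit: float
    higher_is_better: bool


@dataclass
class Waiver:
    metric: str
    expires: date   # the waiver stops applying after this date
    approver: str


def gate(results, thresholds, waivers, today):
    """Decide whether a release may proceed given results and waivers."""
    active = {w.metric for w in waivers if w.expires >= today}
    blocking, waived = [], []
    for t in thresholds:
        value = results.get(t.metric)
        if value is None:
            ok = False  # a metric that was never measured cannot pass
        elif t.higher_is_better:
            ok = value >= t.limit
        else:
            ok = value <= t.limit
        if not ok:
            (waived if t.metric in active else blocking).append(t.metric)
    return {"release_allowed": not blocking, "blocking": blocking, "waived": waived}
```

Because the waiver carries an expiry date, the same results that pass today block the release once the waiver lapses, which forces the residual risk to be revisited.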
Sign-off & packaging
- Evaluation report; model card links; reviewer signatures; release artifact checksums.
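Release artifact checksums can be produced and verified with the standard library alone. The sketch below assumes a manifest that maps file names to SHA-256 hex digests; that layout and the `verify_artifacts` name are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large artifacts fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(manifest, root):
    """Return the names of artifacts that are missing or do not match."""
    mismatches = []
    for name, expected in manifest.items():
        path = Path(root) / name
        if not path.is_file() or sha256_file(path) != expected.lower():
            mismatches.append(name)
    return mismatches
```

An empty list from `verify_artifacts` means every file listed in the manifest is present and unchanged since the manifest was written.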
Post-release validation
- Canary rollout; shadow tests; rollback criteria; telemetry validation; incident hooks.
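A rollback criterion for a canary can be stated as a simple comparison against the baseline. In this sketch the function name, the one-percentage-point tolerance, and the 500-request minimum are illustrative assumptions, not recommended values.

```python
def should_roll_back(baseline_failures, baseline_total,
                     canary_failures, canary_total,
                     max_abs_increase=0.01, min_canary_requests=500):
    """Return True to roll back, False to continue, None if undecided.

    None means the canary has not yet served enough traffic for the
    comparison to be meaningful.
    """
    if baseline_total <= 0:
        raise ValueError("baseline_total must be positive")
    if canary_total < min_canary_requests:
        return None
    baseline_rate = baseline_failures / baseline_total
    canary_rate = canary_failures / canary_total
    # Roll back when the canary failure rate exceeds the baseline by more
    # than the agreed absolute tolerance.
    return (canary_rate - baseline_rate) > max_abs_increase
```

Writing the criterion down as code before the rollout starts keeps the rollback decision from being negotiated during an incident.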
Records & dashboards
- Scorecards over time; cohort drift; guardrail health; links to incidents, corrective and preventive actions (CAPA), and post-market monitoring (PMM).
Evaluation checklist
- Coverage mapped; thresholds set; repeatable runs; sign-off complete; dashboards live.
© Zen AI Governance UK Ltd • Regulatory Knowledge • v1 05 Nov 2025 • This page is general guidance, not legal advice.