PentStark
Blog · Compliance

EU AI Act red teaming: what a pentester actually delivers

PentStark ComplianceApril 14, 20268 min readMore on Compliance

The EU AI Act's obligations on high-risk AI systems and general-purpose AI models came into force. Article 15 requires cybersecurity measures "appropriate to the risks." Article 54a requires adversarial testing for GPAI models with systemic risk. Every provider we work with is now asking the same question: what does an auditor actually expect?

We've run engagements mapped to both articles across the first wave of Q1 2026 audits. Here's the shape that has been accepted.

What the articles actually ask for

Article 15, in substance: high-risk AI systems must be designed and developed to achieve an appropriate level of accuracy, robustness, and cybersecurity, including resistance to data poisoning, model poisoning, adversarial examples, and confidentiality attacks.

Article 54a (via the AI Office's guidance) expects providers of GPAI with systemic risk to conduct adversarial testing, including red teaming by independent experts, document the results, and maintain a testing lifecycle.

Between them: the regulator wants evidence that you've tested the system against the failure modes that match its risk tier — not a generic pentest report.

The engagement shape

Our AI Act engagements have five parts.

1. Threat model

A written threat model, in the provider's own documentation style, that enumerates the applicable classes:

  • Prompt injection (direct, indirect, cross-agent).
  • Adversarial inputs (evasion, typographic attacks).
  • Model supply chain (poisoning of training or fine-tuning data).
  • Tool-use abuse (for agentic systems).
  • Data leakage (training-data extraction, RAG context leakage).
  • Model extraction / replication.
  • Multi-agent compromise.

The threat model is what the auditor reads first. If it doesn't name the classes that match the product, the rest of the engagement is suspect.

2. Test plan

A numbered test plan, one item per attack class, with acceptance criteria. The plan is versioned and the delta from the prior version is visible.

3. Execution log

For each test: inputs, observed outputs, pass/fail, mitigation references. This becomes the evidence trail.

4. CI-ready evaluation suite

We hand the provider a suite they can run in CI. This is the "testing lifecycle" part — the Act doesn't ask for a one-shot report; it asks for ongoing testing. The suite is how ongoing testing happens.

5. Residual risk statement

A short statement: here are the classes tested, here are the findings open, here's the residual risk the provider has accepted and why.

Three places the first audit cycle caught providers off-guard

Documentation density

Providers with solid internal testing often had sparse external documentation. The auditor wanted the artifact, not the confidence. If it's not written down, it didn't happen.

Scope drift between threat model and test plan

The threat model was often broader than the test plan. Auditors flagged the unstated delta. Either test what you threat-model, or state why you didn't.

"Our model provider tests for that"

Several providers deferred to OpenAI / Anthropic for certain classes. The Act places obligations on the deployer as well as the provider. If you build on a frontier model, you still own the tool-use surface, the RAG surface, and the app surface. Upstream testing does not transfer.

What to have in your binder

Before the audit:

  • Threat model signed by the product lead.
  • Test plan signed by whoever runs testing.
  • Red team report from the last 12 months (independent party, per Art. 54a).
  • CI eval suite with results from the last month.
  • Residual risk statement acknowledged by accountable management.

This is the binder that passed audits in our experience. Everything else is a variant.

The takeaway

The AI Act doesn't require a new kind of security testing. It requires documented, scoped, lifecycle security testing of the classes the model actually exposes. Teams that already threat-model, red team, and run eval suites need to harden their paperwork. Teams that don't need to start.

Get research like this monthly.

No marketing fluff. Unsubscribe anytime.

Talk to an operator

Your next finding is one scoping call away.

Thirty minutes with a real operator tells us what you need and what we can deliver. No BDR handoff, no sales engineer theater — the person you talk to is the person who scopes the work.

Talk to an expertBook a demo
Responses in < 1 business day