Two Approaches for Offensive Testing of AI Systems: Architecture-led AI Application Penetration Test and Threat-led AI Red Team Assessment
Artificial intelligence (AI) is changing the shape of the application attack surface. A traditional application assessment usually starts with familiar questions, such as:
-
What are the exposed endpoints?
- What authentication model is in use?
- Where are the input fields?
- What data is processed?
- How does the application enforce authorization?
AI systems add another layer of complexity. The application may include a large language model (LLM), retrieval-augmented generation (RAG), vector databases, agents, plugins, connectors, MCP servers, model artefacts, data pipelines, and supporting machine learning operations (MLOps) infrastructure. The system may also process untrusted inputs that are not obvious to the user, such as retrieved documents, uploaded files, tool responses, third-party content, or data from connected enterprise platforms.
That means offensive testing of AI systems cannot simply rely on payload execution alone. Before testing begins, the more important question is methodological:
Are we trying to understand the security of the AI system's architecture, or are we trying to simulate what a realistic adversary could achieve through it?
At SpiderLabs, we commonly think about offensive AI testing through two complementary approaches:
-
Architecture-led AI application penetration testing
-
Threat-led AI red team assessment
The assessment approach determines the scope, prerequisites, stakeholder involvement, execution model, and the value the organization receives from the engagement.
Approach 1: Architecture-Led AI Application Penetration Testing
The objective here is to understand how the AI application is designed, how its components interact, where trust boundaries exist, and how those boundaries could be abused. This approach is particularly useful when an organization wants assurance over a specific AI-enabled application, workflow, or platform before or after deployment.
An architecture-led assessment starts with the system design.
SpiderLabs' AI Application Penetration Test combines a lightweight threat modelling exercise with hands-on technical testing of the AI/ML system. The STRIDE (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege) threat model is applied to AI-specific components such as data pipelines, model artifacts, prompt layers, agentic orchestration, connectors and supporting infrastructure. Testing is then aligned to relevant industry references, including the OWASP Top 10 for LLM Applications, OWASP Machine Learning Security Top 10, OWASP Agentic Top 10, MITRE ATLAS, and the OWASP Web Security Testing Guide, wherever applicable.
In practical terms, the assessment asks questions such as:
- Can direct or indirect prompt injection influence the model's behavior?
- Can retrieved content manipulate the response or bypass intended controls?
- Can an agent select or invoke tools in an unintended way?
- Can system prompts, sensitive data, or proprietary information be disclosed?
- Are vector databases, connectors, MCP servers, APIs, model registries or MLOps components misconfigured?
- Can outputs from the model create downstream application security issues?
- Are controls around rate limiting, logging, secrets, access control, and isolation sufficient?
This approach is especially strong when the AI system has meaningful architecture behind it: a RAG pipeline, agentic workflow, enterprise connectors, custom orchestration layer, model supply chain, or integration with business-critical systems.
When to Choose this Approach
Choose an architecture-led assessment when the primary goal is coverage and assurance.
It is the right option when the organization wants to know whether the AI system has been designed and implemented securely across its major components. It is also suitable when an AI feature is moving toward production, when a security team needs a structured view of technical exposure, or when engineering teams need actionable remediation guidance.
Typical scenarios include:
-
Pre-production assessment of an AI-enabled application.
-
Security review of a RAG-based knowledge assistant.
-
Testing of an agentic workflow with tool or plugin access.
-
Assessment of AI features embedded into an existing web, mobile, or SaaS application.
-
Review of vector stores, connectors, model registries, APIs and supporting MLOps infrastructure.
-
Baseline assurance before a more adversarial red team exercise.
Secure your expanding attack surface with LevelBlue Penetration Testing.
Prerequisites and Involvement
Architecture-led testing benefits from stronger collaboration upfront.
Useful inputs include architecture diagrams, data flow diagrams, component lists, API documentation, model and provider details, system prompts where available, RAG corpus information, tool and connector definitions, test accounts, staging access, production constraints, logging visibility and data handling requirements.
The organization should expect involvement from application owners, AI engineers, security architects, platform engineers and, where relevant, data science or MLOps teams. A solution walkthrough is valuable because it allows the testing team to understand ingestion, training, inference and user interaction paths before selecting the most relevant test cases. The level of involvement is moderate to high during planning and threat modelling, then lower during active testing, with targeted support for access issues, environment constraints and clarification.
Value Delivered
The primary value is structured technical assurance.
The organization receives findings that are grounded in the actual architecture of the AI system, not generic AI attack scenarios. The output helps engineering and security teams understand where controls are weak, where trust boundaries are exposed, and where design changes may be needed.
This approach produces strong remediation value because findings can be mapped back to affected components, threat references and specific control gaps. It is also useful for governance because it demonstrates that the AI system has been assessed against recognized AI, ML, and application security risk categories.
Approach 2: Threat-Led AI Red Team Assessment
Instead of asking, "Is this AI system secure across its components?", the assessment asks, "What could a realistic attacker achieve through or against this AI system?"
A threat-led assessment starts with the adversary.
SpiderLabs' AI Red Team Assessment is an end-to-end adversarial simulation against AI and ML systems, including LLM applications, traditional ML pipelines and agentic systems with autonomous tool use. The exercise extends established red team methodology and is informed by MITRE ATLAS, OWASP Top 10 for LLM Applications, NIST AI Risk Management Framework and ISO/IEC 42001.
The key difference is that the testing is scenario-led rather than checklist-led.
Threat actor profiles are selected from sector-relevant intelligence and mapped to MITRE ATLAS tactics. Attack preparation may include OSINT, behavioral baselining, payload library development, and attack infrastructure setup. Execution is then delivered as a chained, multi-step attack rather than a set of isolated test cases.
A threat-led AI red team may explore objectives such as:
-
Establishing initial access through direct or indirect prompt injection.
-
Introducing poisoned content into a RAG corpus or trusted workflow.
-
Manipulating conversation memory or agent state.
-
Abusing authenticated connectors such as SharePoint, Confluence, Jira, Slack, email, or file shares.
-
Escalating privileges through excessive agency or system abuse.
-
Exfiltrating sensitive data.
-
Manipulating outputs or business decisions.
-
Testing whether AI-specific monitoring, prompt analytics, agent action logging, and response playbooks detect the activity.
This approach is not about proving that a single prompt can produce an undesirable output. It is about demonstrating whether an adversary can chain behaviors together to achieve a meaningful objective.
When to Choose This Approach
Choose a threat-led AI red team assessment when the primary goal is business impact and defensive validation.
It is the right option when the organization wants to understand how an AI system could be abused in a realistic attack chain, especially where the AI capability has access to sensitive data, business workflows, enterprise connectors, privileged actions or customer-facing interactions.
Typical scenarios include:
-
Testing a high-risk AI system already deployed in production.
-
Assessing customer-facing AI features with reputational exposure.
-
Validating whether blue team monitoring can detect AI-specific attacks.
-
Testing agentic systems with autonomous or semi-autonomous tool use.
-
Assessing whether prompt injection can become a broader enterprise compromise path.
-
Running executive-level exercises where attack narrative and business impact if given importance more than broad control coverage.
Prerequisites and Organization Involvement
Threat-led work requires clearer objectives, stronger rules of engagement and more operational coordination.
Useful inputs include target objectives, crown-jewel data definitions, threat scenarios of concern, sector-specific threat intelligence, test accounts, approved attack paths, query budgets, logging access, escalation contacts, rollback procedures and agreed boundaries for agentic actions.
The organization involvement is typically higher than in a standard penetration test. Security operations, detection engineering, incident response, application owners, AI engineers, risk stakeholders, and executive sponsors may all be involved at different stages.
This is especially important where the assessment includes blue team response. The red team needs enough freedom to simulate realistic activity, while the organization needs enough control to manage safety, data handling and operational risk.
Value Delivered
The primary value is adversarial realism.
The organization receives a narrative of what happened, how the attack progressed, what controls failed, what was detected, what was missed, and what defensive improvements are required. The output is useful not only for engineering remediation, but also for security operations, detection engineering, incident response, governance, and executive risk communication.
A strong AI red team assessment helps answer questions that architecture-led testing may not fully address:
-
Could an attacker move from the AI layer into connected enterprise systems?
-
Could trusted content or connected tools become an attack path?
-
Would existing monitoring detect AI-specific abuse?
-
Are response teams prepared for prompt injection, agent manipulation, or data exfiltration through AI workflows?
-
Do governance controls work under adversarial pressure?
Architecture-Led vs Threat-Led: How to Choose
An architecture-led assessment is best when the organization needs structured assurance over the AI system itself. A threat-led assessment is best when there is a need to understand what an adversary could achieve and whether the organization can detect and respond.
The two approaches are complementary, but they are not interchangeable.
A practical way to distinguish them is as follows:
|
Decision Factor |
Architecture-Led Assessment |
Threat-Led Assessment |
|
Starting point |
The AI system architecture |
The adversary objective |
|
Core question |
Where can this system fail? |
What could an attacker achieve? |
|
Primary value |
Coverage, assurance and remediation |
Realistic attack simulation and defensive validation |
|
Best suited for |
AI applications, RAG, agents, MLOps, pre-production, or baseline assurance |
High-risk systems, production exposure, scenario execution, blue-team validation |
|
Testing style |
Structured testing against components, trust boundaries, and prioritized threats |
Chained attack execution mapped to realistic objectives |
|
Involvement |
Architecture walkthrough, documentation review, access support |
Rules of engagement, blue team coordination, response validation |
|
Output |
Findings mapped to technical components, threat model references, and remediation guidance |
Attack narrative, impact demonstration, detection gaps, and response recommendations |
The choice should be based on the maturity of the AI system, and the assurance question the organization is trying to answer.
If the system is still being designed or has recently been built, architecture-led testing is usually the better first step. It gives the engineering and security teams a structured view of weaknesses across the AI stack.
If the system is already business-critical, exposed to customers, connected to sensitive systems, or part of a broader enterprise workflow, threat-led testing may provide greater value. It shows whether the organization can withstand and detect realistic abuse.
For many organizations, the most effective path is sequential: start with an architecture-led AI application penetration test to identify and remediate foundational weaknesses, then progress to a threat-led AI red team assessment once the system and controls are mature enough to withstand adversarial simulation.
The Key Differentiation
AI security conversations often move quickly to individual techniques: jailbreaks, prompt injection, RAG poisoning, model extraction, excessive agency, or data leakage. As such, the technique selection should follow a methodology. Without a clear approach, AI offensive testing can drift into one of two weak patterns.
The first is checklist testing: running through known categories without understanding how the AI system actually works. This may provide broad coverage, but it can miss the business-specific ways an AI system could fail.
The second is stunt-based testing: demonstrating a clever prompt or isolated jailbreak without proving meaningful risk. This may be interesting, but it often leaves engineering and security teams without a clear remediation path.
A mature AI testing program needs both structure and realism. Architecture-led assessment provides the structure. Threat-led assessment provides the realism. Together, they help organizations move beyond asking, "Can someone make the model say something strange?" and toward better questions:
-
Can untrusted content cross a trust boundary?
-
Can the model access or expose data it should not?
-
Can an agent misuse its tools?
-
Can connected systems be reached through the AI workflow?
- Can the organization detect and respond when AI-specific abuse occurs?
- Can remediation be validated over time?
An architecture-led AI application penetration test provides structured assurance across the AI system, its components and its trust boundaries. A threat-led AI red team assessment demonstrates how realistic adversaries could chain weaknesses together to achieve business impact.
Offensive testing of AI systems is not a single activity. It is a choice of approach.
Pick the methodology before the payloads. The value of an AI security assessment depends less on any single prompt and more on whether the engagement was structured to answer the right question.
About the Author
Sarath Nair is an APAC-based cybersecurity consultant with 15+ years of experience in offensive security and 7+ years with SpiderLabs, specialising in AI/LLM security, adversary emulation, red and purple team programs, and threat-informed security assurance.
ABOUT LEVELBLUE
LevelBlue secures what's next with intelligence-led security delivering visibility and speed to stop threats faster. As the world’s largest and most analyst-recognized pure-play managed security services provider, our AI-powered managed services and cyber expertise across managed, advisory, and incident response services help clients operate with confidence. Learn more about us.
https://www.levelblue.com/resources/blogs/internal-blog/how-to-create-a-blog-post/