Generative AI Red Team Testing： How to Effectively Evaluate Large Language Models

OWASP's latest guide provides detailed guidance for establishing generative AI red teams or adjusting existing red teams to adapt to new technologies.

Image source: Shutterstock

Red team testing is a time-tested method for testing and strengthening cybersecurity systems, but it needs to adapt to technological evolution continuously. The explosion of generative AI and large language models (LLM) in recent years is the latest technological innovation that has forced the red team testing field to adjust again.

Regulations and regulatory agencies such as the EU's 'Artificial Intelligence Act' and the National Institute of Standards and Technology (NIST)'s 'Artificial Intelligence Risk Management Framework' further highlight the importance of AI red team testing.

As AI is an emerging technology, many organizations are just beginning to formulate methods for generative AI red team testing, making OWASP's recently released 'Generative AI Red Team Testing Guide: Practical Methods for Assessing AI Vulnerabilities' a timely resource.

What is Generative AI Red Team Testing?

OWASP defines generative AI red team testing as a 'structured approach to identifying vulnerabilities in AI systems and mitigating risks', combining traditional adversarial testing with AI-specific methods and risks. This includes all aspects of generative AI systems, such as models, deployment pipelines, and various interactions in a broader system context.

OWASP emphasizes the role of tools, technical methods, and cross-functional collaboration, including threat modeling, scenario design, and automation, all of which are based on human expertise. Some key risks include prompt injection, bias and toxicity, data leakage, data poisoning, and supply chain risks, some of which also appear in OWASP's 'Top Ten Risks of LLMs'.

To effectively implement red team testing, the following key steps need to be taken:

Define clear objectives and scope
Team Building
Threat Modeling
Cover the entire application stack
Summary, post-participation analysis, and continuous improvement

Generative AI red team testing complements traditional red team testing by focusing on the subtleties and complexities of AI-driven systems, covering new testing dimensions such as AI-specific threat modeling, model reconnaissance, prompt injection, and bypassing security barriers.

Scope of AI Red Team Testing

Generative AI red team testing, based on traditional red teaming, covers unique aspects of generative AI such as models, model outputs, and the response of models. The red team should check whether the model can be manipulated to produce misleading or false outputs, or 'escape' to run in an unexpected way.

The team should also judge the possibility of data leakage, which are key risks that generative AI users should be concerned about. OWASP suggests considering both the attacker's perspective and the perspective of affected users during testing.

Based on the NIST AI RMF generative AI profile, the OWASP guide suggests considering all stages of the lifecycle (such as design, development, etc.), risk scope (such as models, infrastructure, and ecosystems), and risk sources in AI red team testing.

Risks addressed by generative AI red team testing

As we discussed, generative AI brings some unique risks, including model manipulation and poisoning, bias, and hallucinations, as shown in the figure above. To this end, OWASP recommends a comprehensive approach covering the following four key aspects:

Model evaluation
Implementation testing
System evaluation
Runtime analysis

These risks also need to be examined from three perspectives: security (operator), security (user), and trust (user). OWASP categorizes these risks into three major categories:

Security, privacy, and robustness risk
Toxicity, harmful context, and interaction risk
Bias, content integrity, and misinformation risk

In particular, 'agent-based AI' has attracted great attention in the industry, with leading investment institutions such as Sequoia Capital even calling 2025 the 'agent-based AI元年'. OWASP has specifically pointed out multi-agent risks, such as multi-step attack chains across agents, exploitation of tool integration, and bypassing permissions through agent interaction. To provide more details, OWASP recently released the publication 'Agent-based AI - Threats and Mitigations', including an abstract of the multi-agent system threat model.

Threat modeling for generative AI/LLM systems

OWASP lists threat modeling as a key activity in generative AI red team testing and recommends MITRE ATLAS as an important reference resource. Threat modeling aims to systematically analyze the attack surface of a system, identify potential risks and attack vectors.

Key considerations include the architecture of the model, data flow, and how the system interacts with a broader environment, external systems, data, and social technical aspects such as users and behavior. However, OWASP points out that AI and machine learning bring unique challenges because models may exhibit unpredictable behavior due to non-determinism and probability.

Generative AI red team testing strategy

The red team testing strategy for generative AI in each organization may vary. OWASP explains that the strategy must be consistent with the organization's goals, which may include unique aspects such as responsible AI goals and technical considerations.

Image source: OWASP

The red team testing strategy for generative AI should consider all aspects shown in the figure above, such as the definition of risk-based scope, participation of cross-functional teams, setting clear objectives, and generating reports with both informativeness and operability.

Blueprint for generative AI red team testing

Once the strategy is determined, organizations can develop a blueprint for generative AI red team testing. This blueprint provides a structured method and specific steps, technologies, and objectives.

OWASP suggests a phased evaluation of generative AI systems, including models, implementations, systems, and runtime, as shown in the following figure:

Image source: OWASP

Each phase has key considerations, such as model source and data pipeline, security fence testing during the implementation process, checking for exploitable components in the deployed system, and potential failures or vulnerabilities in the runtime business processes, especially the interactions of multi-AI components in the production environment.

This phased approach helps to efficiently identify risks, implement multi-layered defenses, optimize resources, and pursue continuous improvement. Tools should also be used for model evaluation to support evaluation speed, efficient risk detection, consistency, and comprehensive analysis. The complete OWASP generative AI red team testing guide provides a detailed checklist for each blueprint stage for reference.

Key Technologies

Although there are many possible technologies for generative AI red team testing, determining which technologies to include or where to start may be overwhelming. OWASP provides some technologies they consider 'essential'.

These technologies include:

Adversarial prompt engineering
Data set generation manipulation
Tracking multi-step attacks
Security boundary testing
Agent tool/plugin analysis
Organizational detection and response capabilities

This is just part of the key technology, and the list provided by OWASP combines technical considerations with organizational operational activities.

Matured AI-related red team

Like traditional red team testing, generative AI red team testing is an ongoing and iterative process, and teams and organizations can gradually mature their methods in terms of tools and practices.

Due to the complexity of AI and its integration capabilities with multiple areas of the organization (such as users, data, etc.), OWASP emphasizes the need to work with multiple stakeholders in the organization, hold synchronized meetings regularly, clearly define the process of sharing findings, and integrate existing organizational risk frameworks and control measures into red team testing.

The teams conducting generative AI red team tests should also continue to develop, adding additional specialized knowledge as needed to ensure that relevant skills can adapt to the rapid changes in the technological landscape of generative AI.

Best Practices

The OWASP generative AI red team testing guide lists some key best practices that organizations should widely consider. For example, formulate generative AI policies, standards, and procedures, and set clear objectives for each red team test.

In addition, organizations need to clearly define meaningful standards to evaluate whether the test is successful, record the test procedures, findings, and mitigation measures in detail, and establish a knowledge base to support future generative AI red team testing activities.

Reference source: