# AI/LLM Pentest Methodologies

## Review Cobalt pentest methodologies for an AI/LLM application.

Cobalt follows an industry-standard methodology based on the __[OWASP Top 10 for Large Language Model Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/)__.

Cobalt offers three levels of Artificial Intelligence (AI) and Large Language Model (LLM) pentesting for Web and Web + API Assets.

## Prompt Injection

Cobalt focuses on testing AI systems security against prompt injection attacks. These attacks manipulate the AI’s input to generate malicious output, which can compromise the system’s integrity and confidentiality. Prompt Injection AI/LLM pentests are run as an Agile pentest with an __[automated report](https://cobalt-io.brainfish.ai/en-us/articles/reports-YkC2da1roR#h-pentest-report-types)__.

## Isolated LLM

For an Isolated LLM pentest, Cobalt testers will focus on the AI/LLM-specific components of a standalone application — one that does not use retrieval-augmented generation or external knowledge bases. As part of the pentest, Cobalt testers will also test web and/or API components that directly interact with your LLM, using the relevant OWASP Top 10 guidelines.

Cobalt testers will use simulated attacks and analytical techniques to identify potential security vulnerabilities specific to applications leveraging Large Language Models.

For an Isolated LLM pentest, Cobalt performs the following steps to ensure complete coverage:

* Application Reconnaissance and Scope Definition

* Interactive Probing and System Understanding

* Vulnerability Identification and Exploitation

* Reporting, triaging, and retesting

When testing LLM applications, pentesters focus on how data is processed, how the model interacts with external systems, the security of its input and output handling, and the potential for misuse or manipulation, in addition to the security of the underlying infrastructure and APIs.

The specific techniques, tools, custom scripts, and prompt engineering frameworks pentesters use can vary between tests, as this field is rapidly evolving. Provide architectural diagrams, API documentation, and other relevant details when __[describing your asset](https://docs.cobalt.io/en-us/articles/create-an-asset-9P1WPENeIC?utm_source=brainfish\&utm_medium=popup_widget#h-describe-your-assets)__.

## Application Reconnaissance and Scope Definition

During this initial phase, Cobalt pentesters gather information about the LLM application from the perspective of a malicious user. This involves identifying publicly available data and inferring the LLM’s functionality to map its attack surface.

Understanding the application’s attack surface helps pentesters:

* Discover potential input vectors for the LLM.

* Understand how the LLM’s output is utilized.

* Identify integrated systems or third-party dependencies.

Pentesters conduct reconnaissance in the following areas:

* Application Documentation and Public Interfaces: Evaluate described features, API endpoints (if any), user guides, and interaction examples. This includes identifying:

* Intended use cases and limitations.

* Data types accepted as input and produced as output.

* Information about the underlying model (if exposed).

* Connected services or plugins.

* LLM Research: Searching for articles, forum posts, or academic papers related to the specific LLM (if known) or similar applications, to uncover known weaknesses or attack vectors.

* Associated Infrastructure: For web-based LLM applications, standard web application reconnaissance is performed (e.g., identifying technologies and exposed endpoints).

:::info
**Tools**

Cobalt may use tools such as:

* Burp Suite

* ZAP
:::

## Interactive Probing and System Understanding

After gathering initial information, Cobalt pentesters actively interact with the LLM application to understand its behavior, constraints, and how it processes various inputs.

These tests involve:

* Sending a wide variety of prompts and crafted inputs.

* Observing output patterns and error messages.

* Attempting to understand the LLM’s implicit rules and boundaries.

During this testing phase, pentesters use various techniques:

* Manual interaction through the application’s interface.

* Prompt engineering frameworks and custom scripts for systematic probing.

* HTTP proxies (e.g., Burp Suite) to intercept and modify requests to LLM APIs.

Pentesters aim to understand:

* How the LLM responds to ambiguous, leading, or out-of-scope prompts.

* The level of sanitization or filtering applied to inputs and outputs.

* Error handling mechanisms.

## Vulnerability Identification and Exploitation

Cobalt pentesters test for vulnerabilities outlined in the OWASP Top 10 for LLM Applications. The objective is to identify weaknesses that could result in unintended behavior, data leakage, unauthorized actions, or system compromise.

The testing focus for each OWASP LLM Top 10 category involves pentesters attempting to:

### LLM01: Prompt Injection:

* Objective: Manipulate the LLM’s input to bypass instructions or control its behavior.

* Focus: Craft inputs that override system prompts, execute hidden commands, or cause the LLM to ignore prior instructions. Includes testing for direct and indirect prompt injections.

### LLM02: Sensitive Information Disclosure:

* Objective: Elicit sensitive data from the LLM that it should not reveal.

* Focus: Probe for PII, API keys, internal system details, proprietary algorithms, or specific training data.

### LLM03: Supply Chain Vulnerabilities:

* Objective: Identify weaknesses in third-party components, pre-trained models, or data sources.

* Focus: Analyze dependencies (plugins, external datasets, APIs) for known vulnerabilities, and assess the risk of compromised or untrustworthy models.

### LLM04: Data and Model Poisoning:

* Objective: Assess the risk of an attacker corrupting the training data or fine-tuning data to introduce vulnerabilities, biases, or backdoors.

* Focus: If the LLM learns from interactions, attempt to feed it malicious or biased information to skew future outputs or behaviors.

### LLM05: Improper Output Handling:

* Objective: Identify risks associated with how the LLM’s output is parsed and used by downstream components.

* Focus: Test if LLM outputs can inject malicious code (e.g., XSS) into connected systems, or cause unexpected application behaivor due to unsanitized output.

### LLM06: Excessive Agency:

* Objective: Determine if the LLM has been granted excessive permissions or capabilities to interact with other systems or perform actions.

* Focus: Identify all autonomous actions the LLM can take (e.g., API calls, database queries, sending emails) and attempt to trick it into performing unauthorized or harmful actions.

### LLM07: System Prompt Leakage:

* Objective: Trick the LLM into revealing its system prompt or initial instructions.

* Focus: Use various prompt engineering techniques (e.g., role-playing, meta-questions) to make the LLM disclose its configuration, rules, or confidential instructions.

### LLM08: Vector and Embedding Weaknesses:

* Objective: Exploit vulnerabilities related to the generation, storage, or use of vector embeddings.

* Focus: Craft inputs that result in adversarial embeddings. Try to reverse-engineer sensitive information from exposed embeddings, or find collisions leading to misclassification or unintended behavior.

### LLM09: Misinformation:

* Objective: Assess the LLM’s propensity to generate plausible but false, misleading, or harmful information (hallucinations).

* Focus: Testing with prompts designed to elicit factual inaccuracies, biased statements, or harmful advice. Evaluating the LLM’s confidence in its potentially incorrect outputs.

### LLM10: Unbounded Consumption:

* Objective: Identify if the LLM application is vulnerable to attacks that cause excessive resource consumption (time, computation, financial).

* Focus: Craft inputs that lead to unexpectedly long processing times, trigger recursive operations, or exploit inefficient algorithms, potentially causing denial of service or high operational costs.

:::info
**Tools**

Cobalt may use tools such as:

* Adversarial Robustness Toolbox (ART)

* CleverHans

* Foolbox

* Burp Suite

* ZAP

* AI Fairness 360 / Fairlearn
:::

## Reporting, triaging, and retesting

Cobalt pentesters report and triage all vulnerabilities during the assessment. You can review details of all __[findings](https://cobalt-io.brainfish.ai/en-us/articles/findings-6prkG67iav)__, in real time, through the Cobalt platform. In these findings, as well as in any __[report](https://cobalt-io.brainfish.ai/en-us/articles/reports-YkC2da1roR)__, Cobalt’s pentesters include detailed information, including:

* Step-by-step remediation guidance

* Recommendations on how to improve your overall security posture

You can __[remediate findings](https://cobalt-io.brainfish.ai/en-us/articles/remediate-findings-22WPR0Fnlv)__ during and after the pentest. Then you can submit findings for retest. Our pentesters test the updated components and retest vulnerabilities to ensure that there are no security-related residual risks.

__[Learn more](https://cobalt-io.brainfish.ai/en-us/articles/scope-test-period-dvnBEgzJgm#h-aillm-pentesting)__ about how to scope an AI/LLM pentest.

## RAG-Enabled LLM

For a RAG-Enabled LLM pentest, all Isolated LLM testing applies. This offering extends coverage to applications that leverage Retrieval-Augmented Generation (RAG) — systems that combine a large language model with external knowledge sources such as vector databases, document stores, or real-time retrieval pipelines.

In addition to the Isolated LLM methodology, Cobalt pentesters will focus on the following areas:

### Retrieval Pipeline Security

Cobalt pentesters assess the security of the retrieval mechanism, including how queries are constructed and passed to the knowledge base, and whether retrieval results can be manipulated to influence LLM outputs.

### Data Source and Document Injection

Cobalt pentesters test whether malicious content embedded in retrievable documents can trigger indirect prompt injection, influence model behavior, exfiltrate data through the LLM's output, or cause the model to act on adversarial instructions sourced from the knowledge base.

### Vector and Embedding Weaknesses (LLM08)

Cobalt pentesters test for vulnerabilities in the generation, storage, or retrieval of vector embeddings used by the RAG pipeline. This includes crafting inputs that produce adversarial embeddings, attempting to reverse-engineer sensitive information from exposed embeddings, and identifying retrieval collisions that surface unintended content.

### Access Control on Retrieved Content

Cobalt pentesters evaluate whether the retrieval system enforces appropriate access controls — ensuring users cannot retrieve documents or data they are not authorized to access via the LLM interface.

You will be asked to provide architectural diagrams for your RAG pipeline, including details on the vector database, embedding model, document ingestion process, and any access control layers.


AI/LLM Methodologies

Cobalt Methodologies Overview

Web Methodologies

API Methodologies

External Network Methodologies

Internal Network Methodologies

Mobile Methodologies

Cloud Configuration Review Methodologies

Cloud Pentest Methodology

Desktop Methodologies

Digital Risk Assessment Methodology

Secure Code Review Methodologies

Methodologies

How to get started with Cobalt

Getting Started

Assets are what we pentest

Asset Types

Create an Asset

Risk Advisories

Assets

Pentests

Overview

Asset Details

Pentest Details

Scope & Test Period

Preparation

Create a Pentest

Pentest Process

Pentest States

Coverage Checklist

Pentest Expectations

Complete an In-House Pentest (For Pentesters)

Set Up an In-House Pentest

In House Pentests

Remediate Findings

Finding States

Severity Levels

Findings

Contents of a Pentest Report

Customize Your Report

Co-branded Reports

Reports

Engagements Overview

Digital Risk Assessment

Secure Code Review

Engagements

Insights

Planning

Attack Surface Monitoring (ASM)

Attack Surface

DAST Scanner Overview

Best Practices

Integrations

Partial Scans: Reduced Scope

Scans

Sequence recorder

Targets

Target Authentication

Extra Hosts

Blackout Period

DAST Scanner

Account Overview

Account Settings

Password Best Practices

Sign In to Cobalt

Troubleshoot Sign-in Issues

User Account

Collaboration Overview

Collaborate on Pentests

Groups

Manage Pentest Collaborators

User Roles and Permissions

Manage Notifications

Collaboration

Organization Overview

Manage Users

Your Cobalt Contract

Configure SAML SSO

How to Configure SAML 2.0 for Cobalt 

SAML Migration: Update Your Configuration

SCIM Provisioning Setup Guide

Configure Organization Settings

Organization

Cobalt PtaaS Tiers

Cobalt Credits

Track Your Credits

Credits

Cobalt Integrations Overview

Connect your applications

Create Azure DevOps Work Item for Findings

Create GitHub tickets for Findings

Create GitLab tickets for Findings

Send email notifications for Findings from Outlook

Send MS Teams notifications for Findings

Create Jira tickets for Findings

HTTP Connector

Import Assets from Google Sheets

Map Cobalt Pentest Finding Severity to Jira Issue Priority

Modify Jira Issue Summary to contain Pentest Title

How to Guides

Troubleshooting

Integration Builder

Jira Cloud Integration

Jira Data Center Integration

Push Cobalt Findings to Jira

Synchronize Cobalt Severity Levels with Jira

Switching to a New Jira Instance

Troubleshoot Your Jira Integration

Integrate Jira with Cobalt

Slack

Microsoft Teams

Webhooks

DefectDojo

Kenna Security

Vanta

Carried Over Findings

Create Test Finding

Cobalt API Overview

Create a Personal Cobalt API Token

Get an Organization Token

Revoke Your Personal Cobalt API Tokens

Create or Modify an Asset

Import Findings to Google Sheets

Cobalt API

Glossary

May 2026

April 2026

March 2026

February 2026

January 2026

November 2025 

August 2025

July 2025

June 2025

May 2025

April 2025

March 2025

January 2025

December 2024

November 2024

October 2024

September 2024

August 2024

July 2024

June 2024

May 2024

April 2024

March 2024

January 2024

Product Updates

What's New

Guide

Cobalt Product Documentation

AI/LLM Methodologies

Share

Share

AI/LLM Methodologies

Share

Share