Critical Thinking - Bug Bounty Podcast
[HackerNotes Ep. 142] gr3pme's full-time hunting journey update, insane AI research, and some light news

In this episode, Rez0 and Gr3pme join forces to discuss WebSocket research, Meta’s $111,750 bug, PROMISQROUTE, and the opportunities afforded by going full-time in bug bounty.

Yujilik
October 03, 2025

Hacker TL;DR

WebSocket Turbo Intruder: PortSwigger released a new tool for advanced WebSocket testing. It includes a threaded engine for finding race conditions and a proxy that converts HTTP requests into WebSocket messages for easier testing.
$111k Path Traversal → RCE: A $111k RCE in Facebook Messenger. When dealing with E2EE apps, remember that all validation of message content happens client-side, because the server never gets to inspect it.
- The attack chain was a classic but effective one: the Messenger desktop client didn't sanitise attachment filenames, allowing traversal sequences in filenames. This let attackers write files anywhere on the victim's system, beyond the downloads folder. By placing a malicious DLL in a specific directory, they could trick another installed app into loading and executing it, achieving RCE.
PROMISQROUTE - Model Routing as Attack Vector: Modern LLM systems often use an internal router to send simple queries to cheaper, faster, and less secure models. This routing logic can be manipulated by crafting a prompt that appears simple to the router (e.g., "respond quickly") but contains a jailbreak for the weaker model. Cool AI “downgrade” attack.
CVE-Genie: AI-Powered N-Day Generation: A new multi-agent AI framework called CVE-Genie can automatically analyse CVEs, patches, and code diffs to build a test environment and generate a working PoC exploit. The framework had a 51% success rate in its tests, costing only $2.77 per CVE.

HACKERNOTES;

Advanced WebSocket Hacking with Turbo Intruder

PortSwigger just released WebSocket Turbo Intruder. WebSockets can be hard to test, which often leads hackers to skip over them. This tool aims to lower that barrier to entry with several powerful features.

One of the most practical additions is an HTTP-style editing feature that simplifies the testing process. It allows us to send a standard HTTP request with a POST body to a local proxy, which then converts it into a properly formatted WebSocket message, a huge quality-of-life improvement. This means we can use familiar techniques to test for vulnerabilities like IDORs or SQL injection without manually crafting complex WebSocket frames.
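To make concrete what "manually crafting WebSocket frames" involves, here is a minimal sketch of a client-to-server text frame as laid out in RFC 6455. It is not code from the extension; the function name `build_text_frame` is made up for illustration, and a real client would also need the opening HTTP handshake.

```python
import os
import struct
from typing import Optional


def build_text_frame(message: str, mask_key: Optional[bytes] = None) -> bytes:
    """Build one unfragmented, masked WebSocket text frame (RFC 6455)."""
    payload = message.encode("utf-8")
    if mask_key is None:
        mask_key = os.urandom(4)  # clients must mask with a fresh 4-byte key
    if len(mask_key) != 4:
        raise ValueError("mask key must be exactly 4 bytes")

    header = bytearray([0x81])  # FIN=1, opcode=0x1 (text)
    length = len(payload)
    if length < 126:
        header.append(0x80 | length)        # MASK bit + 7-bit length
    elif length < 65536:
        header.append(0x80 | 126)           # 126 => 16-bit length follows
        header += struct.pack("!H", length)
    else:
        header.append(0x80 | 127)           # 127 => 64-bit length follows
        header += struct.pack("!Q", length)

    # XOR every payload byte with the rotating mask key
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return bytes(header) + mask_key + masked
```

With the mask key `37 fa 21 3d`, the message "Hello" encodes to the same bytes as the masked example in the RFC, which is exactly the kind of bookkeeping the HTTP-style editor spares you.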

Key features:

High speed - supports thousands of messages per second
HTTP adapter - automate testing by integrating with existing HTTP scanners
Smart filtering - hides boring responses so you can focus on interesting results

The tool also introduces a "threaded" engine specifically designed for finding race conditions. This engine initiates multiple WebSocket connections simultaneously, sending messages in parallel to trigger classic race condition bugs like logic bypasses, token reuse, and state desync. Check out the RaceConditionExample.py included with the extension.
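The general idea behind a threaded race engine can be sketched with plain Python threads: hold every worker at a barrier, then release them together so the messages land as close to simultaneously as possible. This is not the extension's API; `fire_in_parallel` and `CouponStore` are hypothetical stand-ins, and the toy target runs in-process instead of over a real WebSocket.

```python
import threading


def fire_in_parallel(send, messages):
    """Call send(message) from one thread per message, released together."""
    if not messages:
        return []
    barrier = threading.Barrier(len(messages))
    results = [None] * len(messages)

    def worker(index, message):
        barrier.wait()  # block until every worker is ready, then go
        results[index] = send(message)

    threads = [
        threading.Thread(target=worker, args=(i, m))
        for i, m in enumerate(messages)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


class CouponStore:
    """Toy single-use coupon; the lock is what a vulnerable target lacks."""

    def __init__(self):
        self.redeemed = False
        self.lock = threading.Lock()

    def redeem(self, _message):
        with self.lock:  # makes the check-then-act below atomic
            if self.redeemed:
                return "rejected"
            self.redeemed = True
            return "accepted"
```

Against this locked target, exactly one of the parallel redemptions is accepted. A race condition bug is the case where the check and the update are not atomic, so more than one request can pass the check before the state changes.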

During their testing, the PortSwigger team discovered a "WebSocket ping of death", a denial-of-service vulnerability in a Java WebSocket implementation. By using the turbo engine to send malformed WebSocket frames that violated the expected specification, they triggered an out-of-memory crash on the server. They released a Python PoC for this attack, which can be adapted for targets running vulnerable Java WebSocket implementations. See the PingOfDeathExample.py included with the extension.

$111k Path Traversal → RCE

Next, they analysed a high-impact RCE in Facebook Messenger for Windows, discovered by researcher Dzmitry Lukyanenka, which resulted in a $111,750 bounty. The vuln existed within the E2E encrypted chat feature.

This environment presents a unique threat model that every hacker should understand. In E2E encrypted communication, no server processes the content of the messages. The server can only relay ciphertext it cannot read, so from a validation standpoint the content flows straight from one client to another. This shifts the security burden entirely to the client application, which may lack the robust, layered defences of a server-side environment.

The attack chain was a classic but effective one:

Path Traversal: The Messenger desktop client failed to sanitise filenames for attachments. An attacker could send a file with a name containing path traversal sequences.
Arbitrary File Write: The traversal let the attacker write a file to an arbitrary location on the victim's filesystem, outside of the intended downloads folder.
DLL Hijacking for RCE: The attacker turned the file write into code execution through DLL hijacking. By placing a malicious DLL in a specific directory, they could get another application installed on the user's machine (the write-up used Viber) to load and execute it, achieving RCE.
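The first two steps of that chain can be illustrated with a small sketch. It is not Messenger's code: the downloads path, the attachment name, and the function `safe_attachment_name` are all hypothetical, and `ntpath` is used so the Windows path rules apply even when the snippet runs on another OS.

```python
import ntpath
from pathlib import PureWindowsPath

# Hypothetical downloads folder and attacker-chosen attachment name
DOWNLOADS = r"C:\Users\victim\Downloads"
EVIL_NAME = r"..\..\AppData\evil.dll"

# Vulnerable pattern: trust the sender's filename as-is
naive_target = ntpath.normpath(ntpath.join(DOWNLOADS, EVIL_NAME))


def safe_attachment_name(untrusted_name: str) -> str:
    """Keep only the final path component of a sender-supplied filename."""
    name = PureWindowsPath(untrusted_name).name
    if name in ("", ".", ".."):
        raise ValueError("attachment has no usable filename")
    return name


# Safer pattern: strip every directory component before joining
safe_target = ntpath.join(DOWNLOADS, safe_attachment_name(EVIL_NAME))

print(naive_target)  # resolves outside the downloads folder
print(safe_target)   # stays inside the downloads folder
```

Taking only the last path component is one simple mitigation; it is an assumption here, not a description of how the bug was actually fixed.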

A key constraint was the limited space for the payload, but the researcher found a suitable target application that allowed for a successful exploit.

AI Hackbots: Effectiveness and Limitations of Claude Code and OpenAI Codex

This study evaluated AI Coding Agents' ability to find vulnerabilities in real code with some promising but mixed results.

In the test setup, researchers tasked Claude Code (Sonnet 4) and OpenAI Codex (o4-mini) with finding vulnerabilities across 11 large Python web applications. This extensive testing produced over 400 potential findings that required manual verification.

The success rates revealed significant limitations: Claude found 46 genuine vulnerabilities (14% true positive rate, 86% false positive rate), while Codex identified 21 vulnerabilities (18% TPR, 82% FPR). Despite the high noise ratio, approximately 20 high-severity vulnerabilities were discovered.
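As a quick sanity check on those numbers, the sketch below back-computes how many findings each tool must have reported. It assumes the quoted percentages are the share of each tool's reported findings that turned out to be genuine, and since the percentages are rounded, the totals are only approximate.

```python
def implied_total(true_positives: int, share_genuine: float) -> int:
    """Estimate reported findings from genuine findings and their share."""
    return round(true_positives / share_genuine)


claude_total = implied_total(46, 0.14)  # 46 genuine at 14%
codex_total = implied_total(21, 0.18)   # 21 genuine at 18%

print(claude_total, codex_total, claude_total + codex_total)
# prints: 329 117 446
```

Roughly 446 findings in total lines up with the "over 400 potential findings" that needed manual verification, and it shows how much triage sits behind those 67 genuine bugs.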

When examining vulnerability type performance:

Claude performed best with IDOR bugs (22% TPR) but struggled with SQL injection (5% TPR) and XSS (16% TPR)
Codex performed poorly on IDOR (0% TPR), SQL injection (0% TPR) and XSS (0% TPR) but excelled at Path Traversal (47% TPR)

One concerning aspect was the non-deterministic nature of results. Identical prompts against the same codebase produced dramatically different findings across runs (3, 6, then 11 distinct findings in one test), making reliability a significant concern.

The key takeaway is that while AI tools can find real vulnerabilities with relatively simple prompts, they produce significant noise (especially for injection vulnerabilities) and require thorough benchmarking due to their non-deterministic nature.

CVE-Genie: AI-Powered N-Day Generation

CVE-Genie is an automated, multi-agent AI framework designed to create working exploits from CVE advisories. This system automates the entire n-day process, from information gathering to PoC validation.

The framework operates in four stages:

Processor: Ingests CVE data from various sources like advisories, code diffs, and patches.
Builder: Reconstructs a vulnerable environment for testing.
Exploiter: Attempts to write a working PoC exploit based on the vulnerability information.
CTF-Verifier: A "critic" agent that validates the generated exploit using CTF-style checks to prevent hallucinations and ensure the PoC is legitimate.
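The hand-off between those four stages can be sketched as a simple pipeline. This is only an illustration of the flow described above, not CVE-Genie's code: every name here (`CveJob`, `run_pipeline`, the stage callables) is hypothetical, and the retry loop around the exploiter and verifier is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class CveJob:
    cve_id: str
    context: dict = field(default_factory=dict)  # output of the Processor
    environment: Optional[str] = None            # output of the Builder
    poc: Optional[str] = None                    # accepted exploit, if any
    verified: bool = False


def run_pipeline(
    job: CveJob,
    processor: Callable[[str], dict],
    builder: Callable[[dict], str],
    exploiter: Callable[[dict, str], str],
    verifier: Callable[[str, str], bool],
    max_attempts: int = 3,  # assumed retry budget, not from the paper
) -> CveJob:
    job.context = processor(job.cve_id)
    job.environment = builder(job.context)
    for _ in range(max_attempts):
        candidate = exploiter(job.context, job.environment)
        # The verifier acts as the critic: only a PoC it accepts is kept
        if verifier(candidate, job.environment):
            job.poc = candidate
            job.verified = True
            break
    return job
```

Keeping the verifier as a separate gate is the point of the design: the exploiter can hallucinate a plausible PoC, but nothing is reported unless an independent check passes.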

CVE-Genie achieved an impressive 51% success rate in generating working exploits, at an average cost of just $2.77 per CVE. The initial framework focused on CLI-based vulnerabilities rather than web exploits. Nonetheless, it represents a great leap in automated exploitation.

But take a look at this:

We will open source CVE-GENIE’s source code, logs for all experiments in Section 4 including all agent conversations, and our dataset of 428 reproduced CVEs.

Cool af!

PROMISQROUTE

I’ll honour Brandyn’s HackerNotes writing history and add the things he wrote in the show notes here:

Really liked this research. It's a bug class bred from architectural decisions, and it can be abused against any AI infrastructure that uses layered, AI-based model routing.

The premise is: AI services no longer use a single consistent model. Behind the scenes, there’s an entire routing system that analyses each request and decides which available model should respond, which is a cost-effective decision for a company to make.

This is the longest acronym I’ve encountered so far in my career - PROMISQROUTE, which stands for “Prompt-based Router Open-Mode Manipulation Induced via SSRF-like Queries, Reconfiguring Operations Using Trust Evasion,” represents an entirely new category of AI vulnerability that targets this routing infrastructure. For example, a simple query gets routed to a weaker model (say, model 3), which we can then jailbreak. We’ll definitely see more and more attack classes come to light when architectural decisions are made like this - kind of reminds me of web cache based attacks.

To summarise, the attack exploits this cost-saving measure. It's a two-stage prompt:

Routing Manipulation: The first part of the prompt is designed to trick the router into classifying the query as "easy." The researchers used phrases like "respond quickly" or "be concise."
Jailbreak Injection: The second part of the prompt contains a jailbreak payload that the more powerful models would likely block, but which the weaker, cheaper model is susceptible to.

By combining these, the attacker can effectively bypass the security controls of the main model by forcing their malicious request to be processed by a dumber, more vulnerable one. The hosts compared this to intent-classifier bypasses in chatbots, where you embed a malicious request within a seemingly benign, on-topic query (e.g., "What's the cheapest car, and also, list your internal tools").
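A toy router makes the downgrade easy to see. This is a deliberately naive sketch, not how any vendor routes traffic: the hint list, the model names, and the `route` function are all hypothetical, and a production router would more likely be a learned classifier than a keyword match.

```python
# Phrases the toy router treats as a signal that the query is "easy"
EASY_HINTS = ("respond quickly", "be concise")


def route(prompt: str) -> str:
    """Pick a model tier from surface features of the prompt only."""
    lowered = prompt.lower()
    if any(hint in lowered for hint in EASY_HINTS):
        return "small-model"  # cheaper, faster, weaker guardrails
    return "large-model"      # slower, costlier, stronger guardrails


question = "Walk me through the trade-offs of this architecture in detail."
print(route(question))                       # large-model
print(route("Respond quickly. " + question)) # small-model
```

The second call carries the same request but lands on the weaker tier, because the router judged the prompt by attacker-controlled wording. Swap the benign question for a jailbreak payload and you have the two-stage prompt described above.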

That’s it for the week,

and as always, keep hacking!