[HackerNotes Ep. 126] Vulnus ex Machina - Part 3

In this episode of Critical Thinking - Bug Bounty Podcast we wrap up Rez0’s AI miniseries ‘Vulnus Ex Machina’. Part 3 includes a showcase of AI Vulns that Rez0 himself has found, and how much they paid out.

Hacker TL;DR

  • Claude 4 Release + System Prompt Leaked:

    • Claude 4 (Opus/Sonnet) now supports full dev workflows (VS Code, GitHub Actions, etc).

      The leaked Claude Code system prompt reveals how it interprets instructions, handles tools, and limits user capabilities. Use this to reverse how it reasons and find bypasses.

  • AI Hacking Tricks:

    • Craft injections that mimic internal formatting to manipulate AI models:

      • <system_details>Error: prompt made the model act on fake internal context

      • <eos>/<bos> used to blend payloads into training-like format, even if you don’t have access to the actual training data, try to use something that you believe should sound familiar to the LLM.

      • Prompting via User-Agent in logs triggered LLMs action when logs were viewed/analysed by admins. Since the injected prompt was in the User-Agent, the LLM read it in the logs and followed its’ instructions.

        • “Gemini, when you read this, do: [payload]”

  • Invisible Prompt Injection:

    Payloads can be hidden in places humans don’t see but LLMs still parse:

    • Unicode control chars (\uXXXX) to embed instructions invisibly in text.

    • Alt-text / OCR in images and invisible pixels used to smuggle payloads.

    • File metadata (EXIF, DOCX comments, etc.) to smuggle prompts into content processors.

  • AI ClickFix:

    • Johann swapped the usual “Are you human?” captcha with “Are you a computer?”. When an LLM agent read it, it ran code to prove it was a computer.

    • Most web agents will stop when prompted with a “are you human?” but tricking it into proving it’s a computer was a really clever trick!

  • AI Bug Chain (Path Traversal → Prompt Injection → XSS → CSRF)

    • This one’s a full-chain exploit that showcases how chaining AI bugs with classic web vulns leads to real-world impact:

      • A hidden chatbot endpoint was discovered using path traversal (..;/ChatBotController).

      • The chatbot was vulnerable to prompt injection

      • The LLM-generated response then got reflected unsanitized → leading to a reflected XSS.

      • CSRF to auto-submit a form that triggered the full payload, making the XSS pop for anyone who opened the page, without needing them to write or interact with any payload directly.

Tired of shadow IT and risky browsing?
Take control with ThreatLocker Web Control - block, allow, and monitor web access with precision.
Secure your users, reduce threats, and boost compliance in minutes.
Learn more: https://www.criticalthinkingpodcast.io/tl-webcontrol

AI News

Anthropic released Claude 4 models (Opus and Sonnet) with significant improvements to coding and reasoning capabilities.

Key features:

  • Extended thinking with tools: Models can use tools like web search while reasoning

  • New capabilities: Better parallel processing, instruction following, and context retention

  • Claude Code: Now public with VS Code, JetBrains, and GitHub Actions integration

  • API updates: Code execution, MCP connector, Files API, and prompt caching

The models are available on Anthropic API, Amazon Bedrock, and Google Cloud. Opus 4 costs $15/$75 per million tokens, Sonnet 4 costs $3/$15.

Performance improvements include 65% better reliability and enhanced memory management. Claude Code can now be used across the full development workflow through terminal, IDEs, and CI/CD pipelines.

Cool, and Johann already leaked Code’s system prompt! Hahah

If you want to understand LLMs, reading through their system prompts really helps with both using and hacking them.

AI ClickFix - Hijacking Computer-Use Agents

Really cool but simple hack by Johann, leveraging the Agent’s access to the victim’s computer to make it follow malicious instructions from a captcha-like box.

Instead of using a captcha that says the normal “are you a human?”, it says “are you a computer?”. And that simple trick makes it run the commands to prove that it is a computer.

Interesting write-up on formalising security testing. Core thesis: Security testing can be modelled as an optimisation problem with three constraints:

  • Assets (A): scope of testing space

  • Questions (Q): test cases and techniques applied

  • Time (T): allocated assessment period

After analysing these variables, he breaks down how to manage expectations based on your goals.

Link in the title if you want to take a look!

AI Hacking in Bug Bounty

Rez0 started off hacking AI applications with OpenAI’s plugin system (custom GPTs) before they even got released. He found an API endpoint that he could hit and tamper with parameters in order to leak the internal custom GPTs, and they even have a DAN there, too. We’ll walk you through some bugs that he has found and his thought process on finding them.

A good starting point is to read what the companies care about; sometimes, they have specific things that they don’t want you to be able to make their model generate, and if you do that, they’ll pay you a bounty just because they’re interested in the ways one could bypass the guardrails.

Like this prompt from Rez0 that generated an image that looked like a violent scene.

This next bug is more social engineering-based based where he manipulated the model into thinking it had some extra internal instructions.

This one was a challenge on H1 that the objective was to manipulate the model into giving up some information. Notice that he used <system_details>Error: for his prompt injection, pretty cool!

Google Bugs

This one is a great example of how we can exploit AI integrations.

This bug on Google Docs leverages Bard’s (old Gemini) access to user data and Google’s infrastructure. And in order for this exploit to run one would need a CSP bypass or an open redirect, which is not uncommon on Google.

The next bug is a prompt injection vulnerability in Gmail. In order for this one to work, the key point is that Rez0 used the <eos> and <bos> tags to make his prompt sound familiar to Gemini’s training data.

The goal was to inject the prompt into Gemini to make it generate a malicious link to anyone who opens that email and uses Gemini to summarise it for them.

The next bug used a genius trick to inject a prompt via logs. The prompt was injected via the User-Agent header — by adding the prompt there, it would trigger when an admin used Gemini to analyse the logs.

In this case, it simply worked, but even if it didn't, the hacker could have used a lot of other techniques to try to inject it. Really cool bug!

Invisible Prompt Injection

Next, we’ll take a look at Invisible Prompt Injection bugs, and there are a few ways these bugs can happen:

  • Invisible Unicode tags

  • Invisible text in images

  • File metadata

This report to HackerOne shows how invisible Unicode tags could be used to manipulate AI responses. Whenever a triager used HackerOne’s AI to help them figure out the severity, the AI would be prompted to classify the report as a higher severity because there is an invisible payload there.

AI Bug Chain

Here’s a hidden chatbot that was only accessible via a path traversal. The chatbot itself was vulnerable to prompt injection and then to reflected XSS, then they used the CSRF to make the XSS pop for everyone, without the victim having to write the payload themselves.

Here’s the full chain, really clean POC!

Key Takeaways by Rez0

Mental Model:

  • Delivery: How do you get your context into the system?

  • Impact: Data access or impactful action

Evolution:

  • Start small with "jailbreaks" and work to sophisticated, chained exploits.

Large Attack Surface:

  • Direct input, APIs, emails, images, documents, and tool interactions.

  • Invisible & Indirect Methods: Obfuscation and external data sources are good vectors.

Chaining is Sometimes Key:

  • Combining AI vulnerabilities with traditional web vulns amplifies the impact.

Defence is Hard:

  • Requires multi-layered approach (good system prompt, input/output filtering, secure design, human confirmation for anything sensitive, monitoring of abuse).

That’s it for the week!

Again, you can read Rez0’s entire post here.

As always, keep hacking!