[HackerNotes Ep.103] Getting ANSI about Unicode Normalization
In this episode of Critical Thinking - Bug Bounty Podcast Justin and Joseph delve into the vulnerabilities associated with ANSI codes and large language models (LLMs), as well as talk through some research about _json Juggling, cookie handling quirks, and the value of micro-blogging in general.
Hacker TLDR;
Cross-Site POST without Content-Type: The fetch API allows you to use a Blob object as the body of a request, and a Blob can include data and an optional type. If you create a Blob without a type and pass it to fetch, you can send a cross-site HTTP POST request without adding the Content-Type header. This works even with non-empty request bodies, as the data in the Blob becomes the body of the request.
Windows ANSI and Unicode Conversion: This presentation by Orange talks about how best-fit mappings can sometimes mess up how systems interpret certain characters. This issue exists because of Windows’ legacy support for ANSI APIs, which sometimes struggle when dealing with Unicode and older encoding systems. ¥ gets misinterpreted as \, for example.
LLM-powered Apps Can Hijack Your Terminal: LLMs can generate ANSI escape codes that can lead to vulns when processed by a terminal emulator. When terminal emulators process LLM outputs with ANSI escape codes incorrectly, attackers can manipulate visuals, insert hidden text, access clipboards and more. For example, check out Johann’s blog.
Prompt Injection as a Defense Against LLM-driven Cyberattacks: Mantis is a new defensive framework designed to detect and hack back when an AI agent is being used to hack into a system. It leverages techniques like honeypotting, decoy services, prompt injection, and more. One of the coolest things about it is that it can sometimes trap the attacker’s LLM in an infinite loop or even root the attacker’s machine by injecting prompts back at them. Legit cool stuff.
A guy recently appeared on Twitter and dropped some really cool and niche research. The guys talked about two of these pieces of research on the pod:
Ruby on Rails provides access to user data via the params object, an instance of ActionController::Parameters that works like a Hash. It contains data from request bodies, query strings, and route parameters. The _json juggling attack exploits Rails’ JSON parsing when handling non-Hash request bodies, such as a top-level JSON array. Rails must convert these into Hash-like objects for params, which it does by wrapping them under a _json key.
The vulnerability arises because nothing prevents a JSON object from being supplied with its own _json key. This allows an attacker to submit ambiguous data that could be interpreted differently by different parts of the code.
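To make the ambiguity concrete, here is a minimal JavaScript model of the wrapping behaviour described above. It is not Rails’ actual code: toParams and the sample payloads are hypothetical, made up for illustration, and the sketch only assumes that non-object JSON bodies get wrapped under a _json key.

```javascript
// Simplified, hypothetical model of exposing a JSON body as a params hash.
// Assumption: objects are used as-is, anything else is wrapped under "_json".
function toParams(rawBody) {
  const data = JSON.parse(rawBody);
  const isObject =
    data !== null && typeof data === "object" && !Array.isArray(data);
  return isObject ? data : { _json: data };
}

// A genuine array body, wrapped by the framework.
const fromArray = toParams("[1, 2]");

// An object body that supplies its own "_json" key plus extra data.
const fromObject = toParams('{"_json": [1, 2], "admin": true}');

// Code that only looks at params._json sees the same value in both cases,
// even though the raw request bodies are different JSON types.
console.log(fromArray._json, fromObject._json);
console.log(Object.keys(fromArray), Object.keys(fromObject));
```

Anything that validates the raw body as an object while another layer reads params._json as if it came from an array is a candidate for this kind of confusion.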
His article is short and very well put together, and he also includes some examples if you want to play with it yourself. Go check it out at https://nastystereo.com/security/rails-_json-juggling-attack.html!
Web applications implement various CSRF protections, with implicit protections relying on browser behaviour being riskier than explicit measures. Implicit protections, like relying on SameSite cookie attributes or specific request properties, can sometimes be bypassed under certain conditions.
For a CSRF attack to succeed, the forged request typically has to meet several requirements:
The request must use one of a few Content-Type values, such as text/plain, application/x-www-form-urlencoded, or multipart/form-data.
The request is limited to the GET or POST methods.
No custom headers are allowed, such as Authorization or X-Custom-Headers.
However, with a fetch request, it’s possible to craft a request that has no Content-Type header at all by using a Blob as the body. This can bypass certain validation checks. But, as fetch requests do not cause the browser to perform a top-level navigation (i.e., they do not change the current page), SameSite=Lax cookies will not be sent with such requests.
The fetch API allows you to use a Blob object as the body of a request, not just strings. A Blob can include data and an optional type or no type at all. If you create a Blob without a type and pass it to fetch, you can send a cross-site HTTP POST request without adding the Content-Type header. This works even with non-empty request bodies, as the data in the Blob becomes the body of the request.
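The Blob trick can be sketched in a few lines of JavaScript. This is a minimal illustration rather than the researcher’s PoC: buildTypelessPost, the target.example URL and the JSON body are all hypothetical placeholders, and the request is only constructed here, not sent.

```javascript
// Build a POST request whose body is a Blob created without a type.
// Assumption: the URL and body used below are placeholders, not a real target.
function buildTypelessPost(url, body) {
  const blob = new Blob([body]); // no { type: ... } option, so blob.type is ""
  return new Request(url, {
    method: "POST",
    mode: "no-cors",        // cross-site request without a CORS preflight
    credentials: "include", // ask the browser to attach cookies where allowed
    body: blob,
  });
}

const request = buildTypelessPost(
  "https://target.example/api/update",
  '{"role":"admin"}'
);
// Inspect the header the Blob body produced (none is expected).
console.log(request.headers.get("content-type"));

// For comparison, a plain string body picks up a default text/plain type.
const stringRequest = new Request("https://target.example/api/update", {
  method: "POST",
  body: '{"role":"admin"}',
});
console.log(stringRequest.headers.get("content-type"));

// In a browser, the request would be sent with: fetch(request)
```

Keep in mind the SameSite caveat from above: credentials: "include" only helps when the target’s cookies are actually allowed on cross-site subresource requests.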
Orange dives into an issue with how Windows handles text encoding, especially when converting between Unicode and ANSI. With best-fit mappings, Windows tries to match Unicode characters to ANSI equivalents, but it doesn’t always get it right. Sometimes, characters like ¥ or \ get misinterpreted, and that opens the door to some pretty serious security problems, like path traversal or injecting bad inputs.
I wonder if that’s why ¥ and \ use the same keyboard key in some mappings…
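The ¥ example is easy to picture with a toy model. The sketch below is not a real Windows conversion: bestFitToAnsi is a hypothetical stand-in with a one-entry table that only covers the ¥ to \ mapping mentioned above, and hasPathSeparator is an assumed, deliberately naive validation step that runs before the conversion.

```javascript
// Illustrative one-entry table; real best-fit tables depend on the active code page.
// Only the ¥ -> \ mapping mentioned in the text is modelled here.
const BEST_FIT = new Map([["\u00A5", "\\"]]);

// Hypothetical stand-in for a Unicode-to-ANSI conversion with best-fit mapping.
function bestFitToAnsi(text) {
  return Array.from(text, (ch) => BEST_FIT.get(ch) ?? ch).join("");
}

// Naive check that runs on the Unicode string, before any conversion.
function hasPathSeparator(name) {
  return name.includes("\\") || name.includes("/");
}

const input = "..\u00A5..\u00A5secret.txt";
const converted = bestFitToAnsi(input);

console.log(hasPathSeparator(input));     // check on the original string
console.log(converted);                   // what an ANSI API would receive
console.log(hasPathSeparator(converted)); // same check after conversion
```

The point is the ordering: a filter that looks fine on the Unicode string says nothing about what the string turns into once an ANSI API has converted it.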
Windows has a history of juggling different encoding systems, like UTF-16, UTF-8, and older stuff like ANSI, because it needs to stay compatible with legacy software. But this mix-and-match approach creates weird edge cases, especially when older code page systems come into play.
One big highlight is CVE-2024-4577, which bypasses the fix for an older PHP-CGI bug (CVE-2012-1823): attackers used this best-fit mapping trick to sneak in command injection. And it’s not just PHP: other programs like tar, wget, and even Microsoft Excel are vulnerable to similar issues.
The attack surfaces are everywhere: filenames, paths, environment variables, command-line arguments, etc. Basically, if something uses ANSI APIs in Windows, there’s a good chance it could be exploited, and the fact that so many tools rely on these APIs makes it a widespread issue.
The researchers also called out a bunch of affected software. Open-source tools like curl and PostgreSQL, along with big names like Microsoft Excel are all on the list. The problem? They rely on ANSI functions that don’t handle these character quirks well, leaving them wide open for exploits.
Orange’s presentation recommends ditching the old ANSI APIs and using Unicode ones instead. Developers should also make sure their apps are validating inputs properly and stick to UTF-8 whenever possible. For users, configuring your system to prefer UTF-8 is a good step, too. Windows’ legacy support for old encoding systems is a double-edged sword. It keeps old programs running but introduces these sneaky vulnerabilities.
If you want to read the thing in full, here’s the link:
This research by @GrayDuck thoroughly explains how cookies are handled differently across various browsers and languages. Cookies were originally defined in 1997 in RFC 2109, and they’ve been around for so long that you’d expect everything about them to work exactly the same, no matter where you use them. Yet they don’t.
If you want to see code snippets showing how different systems handle the same piece of data, you should definitely check out the entire blog post, as everything is very well documented in there.
Let’s get to the fun part, what GrayDuck themselves call: The World Wide Web, aka Why This Matters
Why This Matters?
Cookies are essential for web functionality, handling logins, preferences, and tracking. They’re sent to servers on every site visit.
Things got more interesting when a tester, manually playing around with a third-party library update, ran into an error while testing the site they manage. Because of the bug they found, people visiting the website would receive a messed-up cookie and then be locked out of their accounts, causing a massive accidental DoS.
GrayDuck gives us an example so we can see for ourselves how sites will simply crash if we paste this simple code fragment into the console: document.cookie="unicodeCookie=🍪; domain=.<domain.tld here>; Path=/; SameSite=Lax"
If you test this across various websites, you’ll see different results: some sites continue working normally, some have broken functionality, and some stop working completely.
This introduces a new technique for cookie bombing. Since we only need to smuggle in a single emoji, whether in the name or value field (can even be an extra cookie), we can achieve a DoS impact at the bare minimum.
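On the defensive side, one option is to check cookie values against the ASCII-only cookie-octet grammar from RFC 6265 before handing them to stricter parsers. The sketch below is an assumption about how such a filter could look, not something taken from GrayDuck’s post; isSafeCookieValue and filterCookies are hypothetical helper names.

```javascript
// cookie-octet per RFC 6265: printable US-ASCII except space, '"', ',', ';' and '\'.
const COOKIE_OCTETS = /^[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]*$/;

// True when every character of the value is a valid cookie-octet.
function isSafeCookieValue(value) {
  return COOKIE_OCTETS.test(value);
}

// Hypothetical helper: drop cookies a strict parser might choke on,
// instead of letting one bad value break the whole request.
function filterCookies(cookies) {
  return Object.fromEntries(
    Object.entries(cookies).filter(([, value]) => isSafeCookieValue(value))
  );
}

const incoming = { session: "abc123", unicodeCookie: "\u{1F36A}" };
console.log(filterCookies(incoming));
```

Dropping the offending cookie rather than erroring out is a design choice: it keeps a single smuggled emoji from locking the user out of the whole site.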
In his new post, Johann, who has recently been on the pod, explores how LLMs can generate ANSI escape codes, special sequences that control terminal behaviour, leading to potential security vulnerabilities when these outputs are processed by terminal emulators.
ANSI escape codes are sequences that manipulate text appearance and cursor movement in terminal emulators, and they can also be exploited for malicious purposes:
ANSI Bombs: Malicious sequences that reprogram keyboard keys or alter terminal behaviour, potentially leading to remote code execution or denial of service.
Security Vulnerabilities: Research has highlighted various exploits using ANSI codes, including those presented by David Leadbeater and STÖK, demonstrating the potential for terminal manipulation.
The most important thing is that LLMs can output control characters like ESC (ASCII 27), enabling them to generate ANSI escape codes. This capability emerges in two ways: through tool invocation, where code execution features within LLMs are used to generate specific outputs like obscure Unicode code points or ANSI codes, and through crafted prompts that guide LLMs to produce the desired special tokens, including control characters.
If you check his post, and if you’re old enough to remember, you’ll see that the first POC gives off the vibes of those old Windows XP butterflies that filled the screen after we downloaded Linkin_Park_-_In_The_End.exe. Don’t let the first POC fool you, though: he’s got some pretty neat ones further down in the post:
Potential Exploits via Prompt Injection
Prompt injection involves embedding malicious instructions within inputs to manipulate LLM behaviour. When LLM outputs containing ANSI escape codes are rendered in terminal emulators without proper handling, it can potentially be exploited for visual manipulation, inserting invisible text into a terminal, clipboard access and more. If you want to see some of those in action, head over to Johann’s blog.
The most interesting one happens on macOS. Apple’s terminal uses a feature called OSC-7 to keep track of your current directory, and it implements this by accepting a file:// URL specifying the current directory. If you provide a URL that doesn’t point to a local directory, the terminal will try to figure out if the URL is local or not by performing a DNS lookup just to see if the hostname in the URL matches your IP address.
You can see it for yourself by running printf "\e]7;file://<example.com>/\a" on your terminal. It’s particularly valuable because the DNS lookup makes it easy to exfiltrate some data, especially in this context where an LLM could be in control of what is being sent to the victim’s terminal.
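If you’re building an LLM-powered CLI, the obvious mitigation is to strip escape sequences from model output before it reaches the terminal. Here’s a minimal sketch of that idea; stripAnsi and the pattern are our own assumptions, cover only CSI and OSC sequences plus stray control characters, and are not a complete terminal sanitizer.

```javascript
// CSI sequences: ESC [ parameters, intermediates, final byte.
// OSC sequences: ESC ] payload, terminated by BEL or ESC \.
const ANSI_PATTERN =
  /\x1b\[[0-?]*[ -\/]*[@-~]|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)/g;

// Remaining control characters, keeping tab, newline and carriage return.
const CONTROL_CHARS = /[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]/g;

// Hypothetical sanitizer for untrusted LLM output headed to a terminal.
function stripAnsi(text) {
  return text.replace(ANSI_PATTERN, "").replace(CONTROL_CHARS, "");
}

// Colour codes are removed, the visible text stays.
console.log(stripAnsi("\x1b[31mred\x1b[0m text"));

// An OSC-7 sequence like the one above is dropped entirely.
console.log(stripAnsi("before\x1b]7;file://example.com/\x07after"));
```

Stripping on output (rather than trying to filter the prompt) means it doesn’t matter whether the escape codes came from a prompt injection or from a tool the model invoked.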
This writeup by @J0R1AN builds on a neat Chromium quirk: “..stumbled upon this interesting behaviour: In Chromium, 200 responses are saved to the browser’s history, but 404 responses are not”. So, by using the CSS :visited selector, it’s possible to apply a different style to URLs that have been visited.
Jorian uses this to show a lot of links on a page and forces the victim to open the page via window.open or a similar method. If the linked URL exists, the link will turn purple, while if it’s a 404, it will not. To leak this information, the victim would normally have to click an insane number of times on many links in order to get to the full URL. He made this less painful for the victim by gamifying the interaction… via a captcha.
This is a fun concept, and seeing it in action is even better. You can check out the POC and be a victim of this yourself. Here’s the link if it sounds fun:
We’re seeing more and more people using LLMs to hack stuff, so Mantis is a defensive framework that hacks back the LLMs that are trying to hack them. Because surely, if the AI tool that is being hacked is vulnerable, the AI tool that is being used to hack is vulnerable too, right?
It’s such a clever and funny idea. Mantis uses proven methods like honeypotting and decoy services, and then counter-attacks the attacker. (I know, I’m getting confused myself while writing this)
It also uses prompt injection to trap the attacking agent in a loop, repeating its own actions over and over, and it can sometimes even root the attacker’s machine. We’ve already talked about a related technique in this very HackerNotes: Mantis can also use ANSI escape sequences to hide prompts from human operators. It does so much cool stuff.
Mantis was tested in simulated attack scenarios with state-of-the-art LLMs like OpenAI’s GPT-4. It achieved over 95% success rates in both disrupting attacks and misleading LLM agents into neutralising themselves.
You can read the full paper here:
Shoutouts
Before we go to the last part of this week’s HNs, we want to mention that Bebiksior has made an amazing plugin for Caido that solves an issue most people who switched from Burp to Caido had: the lack of Param Miner. His version of it, Param Finder, has been out for a while now, and we’re really grateful to the people who devote their time to making these tools and releasing them for everyone to use. Thanks, @Bebiks!
To wrap up this week’s HackerNotes, let’s look at Douglas Day’s (@ArchAngelDDay) blog post about how he became MVH at H1-305. Here are some tips from his post:
Figure out what most people are trying to hack and hack something else. This will reduce competition and consequently the number of dupes you’ll get.
The program team is there to help - ask them for assistance and insights. They know a lot more about the app than you do, and they can often confirm things like severity and whether your exploit works. Don’t hesitate to ask for favours!
Staying focused on one target is better than trying to cover everything. Nothing new here, I think we’re all familiar with the concept of getting intimate with the application.
Pay for premium access to features. You’re already there to make money, so why not pay for the premium features and unlock more scope?
That’s it for this week!
As always, keep hacking!