[HackerNotes Ep.111] How to Bypass DOMPurify with Kévin Mizu

In this episode Justin interviews Kévin Mizu to showcase his knowledge regarding DOMPurify and its misconfigurations. We walk through some of Kevin’s research, highlighting things like Dangerous allow-lists and URI Attributes, DOMPurify hooks, node manipulation, and DOM Clobbering.

Hacker TL;DR

  • Bring a Bug!: Mizu discovered a client-side chain exploiting a session fixation vulnerability, requiring an XSS on any subdomain. He found a subdomain with controlled fetch and document.write sink, but needed more attack surface.

    • After discovering the site was behind Cloudflare, he found a legacy domain whose subdomains were still trusted. Using an image upload feature, he embedded an XSS payload in EXIF metadata, which was executed via document.write.

    • The login page used UUIDv4 and two cookies: Host-Session and __Host-Session. Kevin found that identical login configurations would sync sessions between users.

    • To exploit this, he discovered the application would remove cookies when encountering mismatched session values. Using /; domain=;, he could remove victim cookies except his own. The application would then automatically set the __ cookie when only the normal cookie was present.

    • Finally, using the XSS, he set a SameSite=None cookie and made a cross-site POST request that omitted the __Host- cookie. The server reissued the missing cookie, copying his session to the victim’s browser.

      When the victim logged in → ATO!

  • Why is regex so important to DOMPurify security: In mutation XSS, payloads are typically hidden in attribute values, requiring comment closures or tags like </style> or </title>. HTML sanitisers like DOMPurify parse content twice - once during sanitisation and once when the DOM receives it. The goal is to create different contexts between these two parsings.

    • Cure53 focuses on style, title, and comment tags since they’re common in mutation XSS, while keeping other elements like <img> unrestricted to maintain developer flexibility.

    • The key is bypassing the regex that blocks HTML comments, title, and style in attributes - if we can do that, we can trigger mutation XSS.

  • DOMPurify Misconfig 101: Straight from Kévin’s post, couldn’t have explained it any better or shorter - this is the best and easiest way of getting DOMPurify’s config, and if you see anything bad (good) there, go for it. The steps are in the DOMPurify section below.

  • URI_REGEX TL;DR + another fun bug from Kévin: Consider the context where sanitized output is placed and adapt your DOMPurify payload accordingly. The context may reveal potential bypass opportunities.

    For example, Kévin found a bug where DOMPurify was used twice sequentially. By hijacking getElementById with the first HTML injection to insert an SVG tag, the second DOMPurify ran inside the SVG namespace instead of HTML. This namespace switch enabled bypasses in versions before 3.1.2.

  • What risks arise when attributes are modified post-sanitization?: When using case conversion functions like .toUpperCase() or .toLowerCase(), Unicode normalisation can transform certain characters in unexpected ways. For example, a specific Unicode character converts to "ST" during case conversion. If sanitisation happens before this conversion, an attacker could hide dangerous tags like <style> that would reappear after the transformation, bypassing the filter.

Bring a Bug!

We start off with a bug that Mizu found recently and that he’s proud of. It’s a client-side chain, where he wanted to exploit a session fixation vuln on a domain’s login flow but first had to find an XSS on any subdomain on his target.

One of the subdomains had a fully controlled fetch, restricted to same-origin. The response to this fetch request was processed with document.write, making it a good XSS sink. The only problem was that the application itself didn’t have much going on, so even though the sink had everything he needed, the attack surface wasn’t very promising.

After some time he found it was hosted behind Cloudflare, so he started playing with the /cdn-cgi/image endpoint and found that another legacy domain was still trusted in the configs. This meant that not only the root domain was trusted, but some subdomains were too, great!

After finding an image upload feature in one of those subs, he was finally able to make a fetch request to /cdn-cgi/image, upload an image with his payload embedded in the EXIF metadata, and have document.write execute the payload. Thanks to this XSS, he was finally able to run the exploit chain.

On the main website, the login page had a UUIDv4 in the URL, so it wasn’t possible to brute-force. It was, however, associated with two cookies: Host-Session and __Host-Session - both with the same value.

A cookie with the __Host- prefix isn’t good for us: the prefix is reserved, and such a cookie can’t carry a Domain attribute, so it can’t be planted from a subdomain. Something was interesting about it though - Kevin told us that if two users had identical login configurations and one of them logged in, the second one would be automatically logged in as well.

So, in order to take over a victim’s account, Kevin had to get the victim into the same state he was in. After fiddling with it a bit, he discovered that if a user visited any path like /a/ with two session cookies holding different values, the application would try to remove the cookie on the top-level domain. The catch: to clear a cookie by setting it to an empty value, the Set-Cookie header needs the exact same cookie attributes as the original.

To get around this, the application “guessed” the path the cookie was set on by reflecting the request’s URL into the Set-Cookie header. So if he visited /; domain=;, the application would remove every cookie from the victim apart from his. A clever way to get his session into the victim’s browser, but he still had to deal with the __ one.
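The reflection trick can be sketched in a few lines. This is a hypothetical reconstruction of the server behaviour described above, not the target’s actual code - the cookie name and header shape are assumptions:

```javascript
// Hypothetical server logic: when clearing a mismatched cookie, the request
// path is reflected straight into the Set-Cookie header.
function clearCookieHeader(requestPath) {
  return `Session=; Max-Age=0; path=${requestPath}`;
}

// A normal request clears one scoped cookie:
console.log(clearCookieHeader('/a/'));         // "Session=; Max-Age=0; path=/a/"

// An attacker-chosen "path" injects a domain attribute, widening the
// deletion to the whole top-level domain:
console.log(clearCookieHeader('/; domain=;')); // "Session=; Max-Age=0; path=/; domain=;"
```

Because the path segment is attacker-controlled, the `; domain=;` suffix is parsed by the browser as extra cookie attributes rather than part of the path.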

Fortunately, this time, if someone accessed the page with only the normal cookie (without the double underscore), the application assumed they were doing something legitimate and just automatically set the double-underscore cookie for them. Hahaha.

To end the attack, he took advantage of the XSS he had found earlier. Using it, he set a SameSite=None cookie and then made a cross-site POST request. This was key: the request omitted the __Host- cookie, since the victim’s browser wouldn’t send it cross-site. Because the UUID and session cookie were still valid, the server automatically reissued the missing __Host-Session, replicating his session state onto the victim’s browser.

Now he redirected the victim to the login page and made them log in as usual. The moment they did → ATO!

Now, before we start talking about DOMPurify, I want to show you something that readers never get to see: the episode prep doc. Kévin did such amazing work with this that I just have to share it. Thanks @kevin_mizu for putting this together for us.

Also, I’ve got a different idea for this week’s HackerNotes.

  1. You really should go read Kévin’s posts.

  2. This week, we’ll focus more on what was discussed in the pod and the questions that Rhynorater asked. Let’s use these HackerNotes as an opportunity to not only learn from Kévin’s research but to also learn from the questions that Rhynorater brought up during the episode.

This is how I believe we’ll get the most out of the episode. Let’s get into it!

DOMPurify

DOMPurify is a state-of-the-art HTML sanitiser developed by Cure53 (a German security company). It’s been around for over 10 years; its first commit dates back to February 17, 2014. It cleans potentially dangerous HTML, making it safe to insert into the DOM. Unlike basic sanitisers, it allows a broad range of elements (including <style>, <form>, SVG, and MathML). This flexibility is one of the reasons why many researchers have spent time trying to bypass it.

Today, DOMPurify is used by companies like Microsoft and Amazon. In fact, it likely played a huge role in making developers more comfortable handling user input on the client side.

DOMPurify PodTalk

1. If you’re in a blind scenario where the app is generating a PDF and you can see the output but you can’t inspect the DOM, how do you fingerprint whether it’s DOMPurify or not?

  • If you have an HTML injection in a PDF and don’t really know what happened to your input, you can use the <plaintext> tag. It’s deprecated, but it converts the rest of the DOM to raw text. That way you can see the text version of your HTML input and what got sanitised/replaced, then update your payload into something that works.

  • If the PDF is generated with a headless browser, DOMPurify runs before the input is inserted into the DOM, so what you see is the sanitised version of the input, in text form. You can then play with that text output to see if there are patterns that match DOMPurify or another sanitiser, and start from there.

2. You ended the first article with something like “DOMPurify security solely relies on one regex being executed correctly.” Why is that regex so important to DOMPurify security?

  • When you want to do mutation XSS, most of the time you hide the payload within an attribute value, so you need something like a comment closure, or maybe a </style> or a </title> - these are pretty much mandatory for mutation XSS. When you use an HTML sanitiser, there are two parsings: the first is what DOMPurify sees, the second is what the DOM receives. For mutation XSS you want a different context between the browser and DOMPurify - the goal, in the end, is to have a context that breaks.

  • Because Cure53 still wants something that doesn’t limit the developer at all, they decided to focus on style, title, and comments, since those are what mutation XSS uses. These tags are particularly problematic because they can trigger context-breaking behaviours in the browser. Elements like <img> inside attributes remain unrestricted, so developers aren’t overly constrained while actual malicious attempts are still blocked.

  • So we’ve got: no HTML comments, no title, and no style in our HTML attributes. If we can figure out a way to keep that regex from matching, we are good to go to trigger that mutation XSS.
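To make the idea concrete, here’s a simplified stand-in for the kind of check described above. This is NOT DOMPurify’s actual regex, just a sketch of the pattern class it targets:

```javascript
// Simplified stand-in: attribute values containing a comment opener or a
// closing </style> / </title> are exactly what mutation XSS payloads rely on.
const SUSPICIOUS = /<!--|<\/(?:style|title)/i;

console.log(SUSPICIOUS.test('</style><img src=x onerror=alert(1)>')); // true  - flagged
console.log(SUSPICIOUS.test('a perfectly normal attribute value'));   // false - allowed
```

If an attacker can smuggle one of these sequences past the check (encoding tricks, namespace confusion, etc.), the attribute value can later be reinterpreted as markup.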

3. Straight from Kévin’s post, couldn’t have explained it any better/shorter, this is the best and easiest way of getting DOMPurify’s config, if you see anything bad (good) there, go for it:

  • Search for the <!--> or \\x3c!--\\x3e string in all the compiled JS files. This is used at the beginning of the sanitize function (ref).

  • Add a log point or breakpoint at the beginning of the sanitize function.

  • Retrieve the arguments variable, which contains both the dirty string that needs to be sanitised and the configuration that is applied.

4. JavaScript event delegation

  • One interesting technique for security auditors to analyse is JavaScript event delegation. In JavaScript there are multiple ways to attach event listeners. Direct event binding attaches an event directly to a specific element, such as an onclick on a button. With event delegation, instead of binding the event to individual elements, you attach it to a higher-level parent (often document or body). When an event occurs, it propagates upwards, and the handler checks event.target to determine whether the clicked element matches a certain condition before executing.

  • The key difference between these is that event delegation allows events to remain active even if new elements are dynamically added to the DOM. This becomes particularly important in HTML injection. If an attacker injects malicious HTML after the event listener has already been created, it doesn’t matter because the event handler is still active at the document level. So, when a user clicks on the injected element, JavaScript checks if it meets the trigger conditions and executes the event as if it were part of the original DOM.
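The delegation pattern can be modelled without a browser. This sketch uses plain objects instead of a real DOM (the re-dispatch stands in for native event bubbling, which browsers do for you):

```javascript
// Minimal model of event delegation: the listener lives on the parent and
// inspects the event's target, so elements injected AFTER setup still match.
const parent = { listeners: [] };
function makeChild(cls) { return { cls }; }
function click(child) {
  // stand-in for bubbling: the event reaches the parent's listeners
  for (const fn of parent.listeners) fn({ target: child });
}

const log = [];
parent.listeners.push((event) => {           // delegated handler, bound once
  if (event.target.cls === 'delete-btn') log.push('handled');
});

click(makeChild('delete-btn'));              // element created after binding
console.log(log);                            // ["handled"]
```

This is exactly why injected HTML can become interactive: the handler never cared which element existed at bind time, only what event.target looks like at click time.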

5. URI_REGEX TL;DR + another fun bug from Kévin

  • Look at where your sanitised output is being placed and tweak the payload going through DOMPurify to fit the context it ends up in. That may open up additional bypasses.

  • Here’s a bug Kévin found - it’s already patched, but the idea is really cool. DOMPurify was used twice in a row on user input, and thanks to the first HTML injection he hijacked document.getElementById and put an SVG tag in the first user input. The second DOMPurify run then happened not in the HTML namespace but within the SVG namespace, and thanks to that, before DOMPurify 3.1.2, you could simply bypass it just because it was used twice.

6. How does DOMPurify perform the sanitisation? Is it like starting with the top node and then like going down to the first child and then does it go all the way through that whole chain and then work its way back up? What does that flow look like?

  • It first uses DOMParser, a browser API for parsing HTML. From this it obtains a tree and, by default, retrieves only the body. Then it uses the NodeIterator API, which gives you an iterator you can loop over.

  • You simply iterate over each node of the tree, starting from the node at the top and working down to finish with the one at the bottom. So it sanitises the top node first, then each child in turn, recursively.
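The walk described above can be sketched with a plain object tree (NodeIterator itself only exists in the browser, so this is an analogue, not DOMPurify’s code):

```javascript
// Top-down, depth-first traversal: visit the current node first, then each
// child, all the way to the bottom of the tree.
function* iterate(node) {
  yield node;
  for (const child of node.children ?? []) yield* iterate(child);
}

const tree = { name: 'body', children: [
  { name: 'div', children: [{ name: 'img' }] },
  { name: 'p' },
] };

console.log([...iterate(tree)].map((n) => n.name)); // ["body", "div", "img", "p"]
```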

7. So it goes: sanitise element, sanitise attribute, sanitise Shadow DOM. Can you explain what sanitise Shadow DOM does?

  • When you use the NodeIterator API, there’s a default browser behaviour to know about: when the iteration reaches a Shadow DOM, the iterator won’t descend into it.

  • So basically, if you build a sanitiser that only uses the NodeIterator API, you’ll miss a lot of tags, and there will be a lot of bypasses simply because they’re hidden in the Shadow DOM. DOMPurify takes the Shadow DOM, creates a new iterator function for it, and at the beginning and end of that function it exposes hooks so developers can do their own stuff.

8. Can you tell us about the misconfiguration bug you found with forceKeepAttr?

  • This is probably one of the most used hooks in DOMPurify, because it’s the only one that gives a developer both the node and a reference to a specific attribute during sanitisation. The funny thing, present in DOMPurify 3.1.3 up to 3.1.5, is that when they implemented the regex, they put the regex check after the forceKeepAttr check used by that hook. In short, a developer could tell DOMPurify that a specific attribute was completely safe and shouldn’t be sanitised, and DOMPurify would simply skip it.

  • That meant the regex check wasn’t run at all. So any attribute force-kept by the developer through that hook could be used to bypass DOMPurify in those versions.

  • Just a quick note: forceKeepAttr = DO NOT sanitise at all, so we should be on the lookout for any use of forceKeepAttr in those specific versions. The condition: if forceKeepAttr is set and you control even part of that attribute’s content == bypass.

9. Next one: another uponSanitizeAttribute bypass. When using currentNode.setAttribute you’re adding attributes to the element that are already being sanitised.

  • This one is way more interesting because it’s something Cure53 can’t fix at all, so it still works in the latest version. When DOMPurify sanitises a node, it takes the node, retrieves its attributes, then iterates over each attribute, sanitising it and calling the uponSanitizeAttribute hook.

  • But because the attributes were retrieved before the loop, if you add an attribute from within that loop, it won’t be verified at all. And if it’s not verified, it’s not checked by the regex, so you can bypass it again.
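The pitfall generalises beyond DOMPurify. This is a plain-JavaScript sketch of the pattern (not DOMPurify’s real internals - the function and hook names here are illustrative):

```javascript
// The attribute list is snapshotted before the loop, so anything the hook
// adds during iteration is never visited by the checks.
function sanitizeAttributes(node, uponSanitizeAttribute) {
  const snapshot = [...node.attributes];   // copied up front
  for (const attr of snapshot) {
    uponSanitizeAttribute(node, attr);     // hook may mutate node.attributes
    // ...imagine the regex / allow-list checks running on `attr` here...
  }
}

const node = { attributes: [{ name: 'id' }] };
sanitizeAttributes(node, (n) => n.attributes.push({ name: 'onerror' }));

// "onerror" was added inside the loop and never checked:
console.log(node.attributes.map((a) => a.name)); // ["id", "onerror"]
```

Since the hook runs after the snapshot is taken, no later pass ever sees the new attribute - which is why this class of issue can’t be fixed without changing the hook contract itself.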

10. If an attribute is modified after sanitisation, that’s when issues arise, right? Especially when replacing something with nothing, which can create unintended structures. Can you explain this further?

  • When you use .toUpperCase() or .toLowerCase() in any programming language, not just JavaScript, Unicode normalisation happens in the background: some Unicode characters have ASCII equivalents, and certain transformations occur when converting text to upper- or lowercase.

  • Among these transformations, there’s a specific Unicode character that converts into “ST” during case conversion. This means that if an attribute value is sanitised before being case-converted, an attacker can hide a dangerous tag (like <style>) from regex-based sanitisation; after the transformation, the hidden part reappears in its dangerous form, bypassing the filter.

'ſt'.toUpperCase() => "ST"
'st'.toUpperCase() => "ST"
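The same trick works with the long s ‘ſ’ (U+017F). A quick sketch of the sanitise-then-uppercase ordering, assuming a naive case-insensitive regex filter (not DOMPurify’s actual one):

```javascript
// 'ſ' (U+017F) is non-ASCII, so a plain /i regex doesn't treat it as 's',
// but .toUpperCase() maps it to a plain ASCII "S".
const hidden = '<ſtyle>';

console.log(/style/i.test(hidden));  // false - the filter sees nothing dangerous
console.log(hidden.toUpperCase());   // "<STYLE>" - the tag name reappears
```

So if the regex runs before the case conversion, the filter passes a string that only becomes `<STYLE>` afterwards.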

11. nodeName Namespace Case Confusion

  • When you access the nodeName value, the case depends on the namespace of the node you’re reading it from. If you access nodeName in the HTML namespace, it’s uppercase; if you do the same within the SVG namespace, it’s lowercase.

12. Right here on this line where it says const clean = DOMPurify.sanitize('<a id="\\x1b$B"></a>\\x1b(B<a id="><img src=x onerror=alert(1)>"></a>'); after sanitisation, when this output is inserted into a browser’s DOM using innerHTML with UTF-8 encoding, will it still trigger an XSS? Or does the browser also need to have no character set defined for the attack to work?

  • That depends on the actual library you’re using. In the context of DOMPurify, it’s jsdom, which by default does not execute JavaScript. So even if there’s a way for you to trigger JavaScript within DOMPurify, it won’t be executed at all, simply because that’s disabled by default. But if you’re using a library like happy-dom, it can happen.

  • Obviously the JavaScript execution occurs within the sandbox, so you can’t really escape it. But the fun fact was that when you provide a script with a specific source, it spawns a sub-process using node -e (eval) to fetch the file and execute it. The link provided in the script source is single-quoted, and within the path and the anchor you can put single quotes, which won’t be encoded by default.

Again, these are some insights we got thanks to Justin’s questions, to get the most out of this you DEFINITELY should go read Kévin’s posts on DOMPurify:

That’s a wrap for this week’s HackerNotes!

And as always, keep hacking!