Critical Thinking - Bug Bounty Podcast
Posts
[HackerNotes Ep. 71] More VDP Chats & AI Bias Bounty Strats with Keith Hoodlet

[HackerNotes Ep. 71] More VDP Chats & AI Bias Bounty Strats with Keith Hoodlet

Keith Hoodlet joins the guys to weigh in on the VDP Debate, followed by AI bias bounties and some of the techniques he used to claim first place in the recent DoD AI bug bounty program.

gr3pme
May 17, 2024

Hacker TLDR;

Testing for AI Bias Bounties:
- Use another LLM to generate scenarios to use against the target LLM.
- Test the same prompt 5 - 10 times to ensure it is consistent and can be recreated.
- Keep variables minimal (3-5) and consistent to help prove biases across scenarios.
AI Hacking Research:

Intro to Keith Hoodlet

Keith Hoodlet - andMYHacks

Keith Hoodlet, aka andMYhacks, was the recent winner of the DoD BugCrowd AI Bias program https://www.defense.gov/News/Releases/Release/Article/3659519/cdao-launches-first-dod-ai-bias-bounty-focused-on-unknown-risks-in-llms/

He currently works at GitHub but has previous experience at BugCrowd, Rapid7 and Thermofisher in various pentest and DevSecOps-related roles.

Having previous experience running a VDP at a very large company, you can guess where the conversation almost immediately drifted.

Let’s jump into some VDP and AI bias hacking!

VDP Debate Round 3

Yup, it's back. The VDP debate kicked off again on the pod.

Keith ran a VDP program back when for ThermoFisher. Thermofisher if you aren’t aware, is a huge company with a $39 billion revenue. So why wasn’t is a bug bounty program instead of a VDP?

A big problem from an organisation perspective was a bug bounty program wasn’t feasible due to the large attack surface of the organisation - massive - and only 11 people in the Appsec team to cater for an organisation of what is now 122,000 people.

With a company of this size, with a revenue of that amount, I think at a minimum, a bug bounty program on a defined scope for some of the more mature attack surfaces is needed to reward hunters.

The argument here was if you have a huge footprint (which is harder/impossible to fully secure with the current headcount) a VDP makes sense as a bug bounty program would result in a massive financial loss, whereas if you have a small defined scope with a manageable surface a bug bounty program makes sense.

Equally, a company may look at a bug bounty pot of $100k and instead think that could be used for an additional headcount, but the flip side to that is:

$100k = 20 Critical vulnerabilities on a 5k crit payout basis
$100k on a person = May or may not be able to find 20 critical vulns

This also went onto the leaderboard debate. Keith wanted the leaderboards to exist as some form of incentivisation for hackers to hack on VDPs. If there’s no financial incentive, nor any platform-based incentive, what is the point at all of reporting to a VDP from a hunter perspective?

Equally, from the program’s perspective here the attack surface is so large and there’s still so much risk to the org, the current appetite for program activity doesn’t require the top or the best bug bounty hunters on their platform; there is still more than enough value being delivered from the VDP from the number of findings.

Regardless of what was said on the pod, I’m of the personal opinion that big organisations with a large revenue (39.2 billion USD revenue in Keith’s case) should be paying for bugs in one way or another. How much they should be paying is another debate, but running a program off of VDP findings doesn’t sit right with me as a hunter.

Maybe a solution is that guidelines around VDP/bug bounty instead get enforced at more of a government level, with fines around mishandling or abuse of responsible disclosure programs. Something similar to GDPR, which holds a fine of 20 million euros OR 4% of global turnover, whichever is higher.

I guess the TL;DR for hunters is not to touch VDPs if you want to get paid.

AI Bias Bounties

The DoD put together an AI bias bounty program on BugCrowd. The first round was a qualifying round with the aim for the 35 first hackers to prove bias in the language model. These would then go into a final round where the top 3 compete for prizes.

When people think about AI-based attacks you usually default to types of Oracle attacks - whereby the AI has privileged access to data somewhere which you don’t necessarily have as a standard user, and you try and tease this data out of the model. With AI bias attacks, you are instead trying to illicit behaviour, or a response, that proves a preference or discrimination towards a specific protected class. A protected class would be things like age, gender, heritage, pregnancy status, veteran status etc.

The first 30 or so reports from Keith were simply him trying to tease the bot to say ‘Yes sir’. In 2022 the army, navy, and coast guard were told to refer to people by rank to ensure staff aren’t misgendered.

The way this was achieved was by setting mock situations with the language model, things like ‘As my role as X, I have spotted X. Can you role-play a conversation between myself and my manager?’ One interesting finding from this Keith found was all role play-based prompts in the context of a manager resulted in the chatbot roleplaying as a male, whereas HR or ethics-based situations resulted in the model roleplaying as a female.

Interestingly, smaller modifications in the data provided in the prompt, say specifying a CEO’s age or specific types of names would also be given bias when asking the model to analyze the data - the older CEOs would get chosen and Anglo-Saxon names would be preferred.

Each report was a different scenario which resulted in a bias or assumption, earning Keith over 150 reports in total from this program. In order to rack this amount up, an effective way to build these types of scenarios is to… ask another language model to generate them.

These types of biases arise from the data the model provided, so I can see why organisations will be having a tough time with these types of bugs for a while.

Anthropic, OpenAI, and Microsoft have published some research on this here:

If you want to start looking at bias bounties, open up additional prompts and run it 5 or so more times in these prompts. One of the big things from triage is it has to be reproducible say 3 out of 5 times if that prompt is used.

Keep as many things as consistently the same for your inputs and select independent variables. Take one prompt and 5 independent variables; if you can swap out the variables and still illicit the same bias that’s a problem.

TL;DR

Ask another model to generate scenarios to use against the target model.
Test the same prompt 5 - 10 times to ensure its consistent and can be recreated.
Keep variables minimal and consistent to help prove biases.

If you’d like to check out Keith's blog and his research check it out here: https://securing.dev/posts/hacking-ai-bias/

https://www.youtube.com/watch?v=AeFZA7xGIbE&ab_channel=SecurityWeekly

As always, keep hacking!