[HackerNotes Ep. 162] HackerOne Training AI on Bug Bounty Data?

We had the CTO of HackerOne on to discuss the recent controversy around AI being fed by our bug bounty reports.

Hacker TL;DR

  • HackerOne is not training LLMs on bug bounty reports; existing AI/ML usage is limited to spam classification and anonymized trend analysis

  • The PTAS agentic pentesting tool draws from public benchmarks, internal benchmarks, public CVEs, and opt-in sidecar runs, not researcher submissions

  • HackerOne's Terms of Service need updating to distinguish legacy ML from modern LLMs

  • HackerOne decreased bounty payouts across severity tiers

Join Justin at Zero Trust World in March!
Get $200 off registration with Code: ZTWCTBB26

In the News

Who's Alex Rice?

Alex Rice is the CTO and co-founder of HackerOne, and he joined the show to address concerns about HackerOne's latest AI moves.

The Terms of Service Controversy

Tensions rose in the bug bounty community in recent weeks after researchers took a closer look at HackerOne's Terms of Service. The most concerning clause reads:

"HackerOne may use confidential information to develop or improve its services, for example, to identify trends and to train AI models, provided such use does not result in disclosure of confidential information to unauthorized third parties."

For researchers who have spent years building proprietary exploitation techniques, documented in full, unredacted vulnerability reports, that clause was a no-go.

But Alex told us that HackerOne is not training or fine-tuning LLMs on researcher submissions. The "train AI models" language in the ToS predates modern large language models and was originally written to cover traditional machine learning, such as the spam classification engine HackerOne runs. Every time a customer flags a report as spam, that signal feeds a regression analysis model that updates spam classifier scores for incoming reports. That is the extent of "AI training" on report data.
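As a rough illustration of what that kind of legacy ML looks like, here is a minimal sketch of a classifier refit on customer spam flags and used to score incoming reports. This is not HackerOne's actual pipeline; the field names and model choice are assumptions.

```python
# Minimal sketch of a legacy-ML spam classifier fed by customer "spam" flags.
# Not HackerOne's actual pipeline; field names and model choice are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Historical reports, labeled by whether the customer flagged them as spam
reports = [
    {"text": "Reflected XSS in /search via the q parameter", "flagged_spam": False},
    {"text": "Your site can be hacked, pay me for details", "flagged_spam": True},
    {"text": "IDOR on /api/v1/users/{id} exposes other users' PII", "flagged_spam": False},
    {"text": "I found many bug please reply urgent", "flagged_spam": True},
]

# Refit the classifier whenever new flag signals arrive
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(
    [r["text"] for r in reports],
    [r["flagged_spam"] for r in reports],
)

# Score an incoming report: probability that it is spam
incoming = "SQL injection in the login form, PoC attached"
spam_score = classifier.predict_proba([incoming])[0][1]
print(f"spam score: {spam_score:.2f}")
```

That kind of regression-based scoring is a far cry from feeding full report text into a generative model, which is the distinction Alex was drawing.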

He acknowledged that HackerOne failed to update the terms as the definition of "AI" shifted from classical ML to generative models. HackerOne's technical documentation has been transparent about its actual data practices, but the legal terms lag behind, creating a trust gap with researchers.

So HackerOne plans to update its Terms of Service to explicitly distinguish legacy ML from LLM training.

What Happens to Anonymized Data

HackerOne does run an anonymization and aggregate trend analysis pipeline on report data, but it is not feeding LLMs. The primary output is the Hacker Powered Security Report, which powers benchmarking suites, trend analysis, and CVE frequency tracking. These are all pre-AI features that rely on traditional statistical analysis, not language models.
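To make the distinction concrete, here is a minimal sketch of what anonymized, aggregate trend analysis looks like: identifying fields are stripped and what remains is counted with plain statistics. The field names are hypothetical, not HackerOne's actual schema.

```python
# Minimal sketch of anonymized, aggregate trend analysis over report data.
# Hypothetical field names; not HackerOne's actual pipeline.
from collections import Counter

reports = [
    {"reporter": "alice", "program": "acme", "weakness": "XSS", "severity": "medium"},
    {"reporter": "bob", "program": "acme", "weakness": "IDOR", "severity": "high"},
    {"reporter": "alice", "program": "globex", "weakness": "XSS", "severity": "low"},
]

def anonymize(report):
    # Drop identifying fields, keep only aggregate-friendly attributes
    return {"weakness": report["weakness"], "severity": report["severity"]}

anonymized = [anonymize(r) for r in reports]

# Traditional statistical aggregation: weakness frequency, no language model involved
weakness_counts = Counter(r["weakness"] for r in anonymized)
print(weakness_counts.most_common())  # e.g. [('XSS', 2), ('IDOR', 1)]
```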

Researcher IP Ownership

A common misconception in the community is that submitting a report means surrendering intellectual property. Under Section 8 of HackerOne's community terms, researchers retain all IP rights. HackerOne receives a limited license solely for providing and improving its services, and customers receive a slightly more permissive license to use submissions for securing their own attack surface. Crucially, neither license permits resale or redistribution of report data.

The real leakage vector tends to be downstream: customers share findings through threat intelligence feeds, CVE advisories, and remediation efforts, which eventually enter the common pool of knowledge.

PTAS and the Agentic Pentesting Platform

The second point of contention centered on HackerOne's Pentest as a Service (PTAS) offering and its new agentic AI component. Marketing material describing agents "trained and refined using proprietary exploit intelligence informed by years of testing real enterprise systems" was quite alarming to us.

But he clarified the PTAS backstory: HackerOne launched its pentesting business in 2018 as a defense-in-depth play, because companies running checkbox compliance pentests were jumping into bug bounty programs unprepared. The PTAS offering is community-powered, led by roughly 200 pentesters, and designed to assess maturity before programs go live.

Where the AI Gets Its Data

The agentic pentesting tool draws on five data sources, none of which are bug bounty submissions:

  1. Public benchmarks — Standard vulnerability benchmarks to validate baseline capability

  2. Internal benchmarks — Custom vulnerable applications built with help from pentesting community members

  3. Public CVEs and disclosures — The only area with potential overlap, since many CVEs originate as bug bounty submissions

  4. Sidecar runs — Parallel testing engagements where both the pen tester and customer have explicitly opted in to run the AI agent alongside the human tester

  5. Foundation models — The bulk of capability comes from frontier AI models, not proprietary fine-tuning

The distinction between PTAS and bug bounty is important here. Pentesting operates under a different engagement model with clear consent and opt-in for data usage. Bug bounty reports remain untouched by the agentic system.

We do subs at $25, $10, and $5; premium subscribers get access to:

Hackalongs: live bug bounty hacking on real programs, VODs available
Live data streams, exploits, tools, scripts & un-redacted bug reports

Do Fine-Tuned Models Even Help?

Justin and Joseph have tried fine-tuned security models, and they have consistently underperformed frontier general-purpose models in their testing. The ability to hack correlates heavily with the ability to code, and massive general-purpose training datasets produce better coders than any domain-specific fine-tune.

Even if HackerOne were to fine-tune on every report in the platform, the output would likely underperform against the latest frontier model. Add to that the signal-to-noise problem: the percentage of truly high-quality submissions across any platform is low. The data quality problem alone makes this approach uneconomical.

The Bounty Decrease

Separate from the AI debate, HackerOne reduced bounty payouts across its own bug bounty program.

While low and medium decreases are understandable, cutting high and critical payouts sends a negative signal from the platform that should be leading the industry. When a bug bounty platform decreases its own payouts, it creates a permission structure for every other program to follow suit.

Alex noted the feedback, and the team is actively considering:

  • An "exceptional" tier with multipliers for bugs that access other researchers' vulnerability submissions

  • Bifurcated critical definitions to properly incentivize the highest-impact findings

  • Ongoing monitoring of engagement metrics to adjust if participation drops

Reporting Potential Data Leaks

HackerOne is also committed to establishing a dedicated tip line for researchers who suspect their techniques have been leaked, whether through AI models, insider activity, or customer disclosure. For now, researchers can use the mediation button on specific reports; a purpose-built channel is in the works.

That's it for the week, keep hacking!

Resources