Critical Thinking - Bug Bounty Podcast
[HackerNotes Ep. 180] The Bug Bounty Maturity Framework with Steve Hernandez

Steve Hernandez breaks down the Bug Bounty Maturity Framework

Aituglo
June 25, 2026

Hacker TL;DR

Researcher Interface is the weakest pillar in 61% of programs. Response time and engagement, not scope or budget, are what decide whether researchers stay or walk.
33 programs assessed, zero reached Leading. Programs that are 1 to 2 years old scored higher than 3+ year veterans. Maturity is not the same as age.
Self-managed and hybrid programs beat managed-platform ones across the board. The real differentiator is operational ownership: whether the team actually talks to you.
AI crushed the cost of discovery to almost nothing. The bet: trust and responsiveness now matter more than bounty size, because volume is infinite but a program that treats you well is rare.

Giveaway

Steve wants more programs taking the assessment. Pick a program you think would benefit, drop the bugbountymaturity.com link on a valid report to them, screenshot it, and submit via this form: https://forms.ctbb.show/mdaz. Three winners get a one-month Critical Thinkers tier in the CTBB Discord.

Who's Steve Hernandez

Steve (@SteveHernandezM) came up through HackerOne, first working directly with programs (setting up policy pages, the storefront you as a hacker walk into) and later on the Hacker Success side dealing with researchers directly. He also ran sales and partnerships for the podcast for a stretch. His day job is now cloud runtime security, but his heart is still in bug bounty, and the Bug Bounty Maturity Framework (BBMF) is his attempt to give back to the ecosystem.

Why a Maturity Framework

Steve got tired of watching genuinely good programs quietly walk away from bug bounty because leadership could not see a return on investment. These orgs are not running programs for fun, they want security signal: themes, patterns, pervasive issues across their stack. The only way to surface that is to attract the one and only ICP (ideal customer profile) of a bug bounty program, the hacker. Without the hacker, you have nothing.

The framework is built as a self-assessment, a mirror and a map. You cannot chart a path forward if you do not know where you are. No public callouts, no shame, just a private way for a program to ask "where am I crushing it and where am I dropping the ball?" and walk away with actionable fixes inside a 90-day window. Crucially, Steve built it from the researcher's perspective, because that is the vantage point that actually predicts whether a program lives or dies.

The Three Pillars

BBMF measures health across three pillars:

Researcher Interface (RI): The relational layer. Communication of expectations, response times, escalations, severity conversations. Everything about building the relationship with the hacker.
Operational Signal (OS): The operations layer. Do you have policies in place, do you abide by them, and (the part everyone forgets) do you abide by them consistently.
Asset Hygiene (AH): Have you prepared your house for guests? Test accounts, credentials, reachable and useful scope, rate limiting, banning. Steve's analogy: the more you care about the guests coming in, the more you dust, vacuum, and light the candle. Too many programs hand over a scope they never once tested for accessibility.

Programs land in one of five bands: Emerging → Developing → Established → Advanced → Leading. And here is the sharp design choice: Leading cannot be reached by averaging. A single weak pillar (a "constraint") drags your whole band down regardless of how strong you are elsewhere. You do not get to game it.
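The "no averaging" rule is easy to see in a small sketch. The band names come from the framework, but everything numeric here is an assumption: the cutoffs are illustrative placeholders (the only hint in the report is that 3.06 lands "barely into Established"), and the rule that the weakest pillar sets the overall band is one plausible reading of the constraint, not the published BBMF scoring. The function name `classify` is hypothetical.

```python
from statistics import mean

# Band names are from the framework. The cutoffs are ASSUMED for
# illustration only; they are not the published BBMF thresholds.
BANDS = ["Emerging", "Developing", "Established", "Advanced", "Leading"]
ASSUMED_CUTOFFS = [0.0, 2.0, 3.0, 4.0, 4.5]  # minimum score for each band


def band_for(score: float) -> str:
    """Return the highest band whose assumed cutoff the score meets."""
    name = BANDS[0]
    for band, cutoff in zip(BANDS, ASSUMED_CUTOFFS):
        if score >= cutoff:
            name = band
    return name


def classify(pillars: dict[str, float]) -> tuple[str, str, str]:
    """Compare naive averaging with a weakest-pillar constraint.

    Returns (band by plain average, band set by the weakest pillar,
    name of the weakest pillar). Treating the weakest pillar as the
    overall band is an assumed interpretation of the constraint rule.
    """
    weakest = min(pillars, key=pillars.get)
    naive = band_for(mean(pillars.values()))
    constrained = band_for(pillars[weakest])
    return naive, constrained, weakest


# A program with strong operations and scope but a neglected relationship layer.
naive, constrained, weakest = classify({"RI": 2.5, "OS": 4.8, "AH": 5.0})
print(f"average says {naive}, constraint ({weakest}) says {constrained}")
```

Under these assumed cutoffs, the example program averages 4.1 and would look Advanced, but its Researcher Interface score of 2.5 holds it at Developing. That is the point of the design: strong pillars cannot buy back a weak one.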

The most quoted line from the research: operations mature before relationships do. Engineers love to systemize, so OS gets built first. Asset hygiene needs coordination with dev teams (staging, credentials, SSRF callback servers), so it lags. And RI, the actual relationship, gets invested in last, because "great report, keep poking around" is not something most AppSec engineers think of as their job. Which is exactly why it is the thing that moves the needle most for us.

The 2026 Numbers

The overall mean across all three pillars was 3.06 out of 5, barely into Established. Honest, and a good sign the 33 programs graded themselves sincerely.

RI was the weakest pillar in 61% of programs. Across every maturity band, Researcher Interface was consistently dead last. OS and AH flip-flop for second place depending on the tier, but RI never leaves the floor.
Zero programs reached Leading. Only about 18% even made it to Advanced.
The jump from Established to Advanced comes down to one thing: can you keep executing consistently under load? A year ago, volume spiked after a talk or an incident and then settled. Now public programs are under heavy strain more or less permanently. Point a well-tuned hackbot at a program with some friends and you will find out fast whether they are Advanced or just Established.

The five lowest-scoring dimensions

Response SLAs & Engagement (mean 2.42): The absolute floor. 36% of programs scored a 1/5, a structural absence of any commitment to response clarity. Nobody scored 5/5. A big part of the bottleneck likely sits outside security: smart program design minimizes how much input you need from the actual engineers. If your security team knows the product and threat model well enough to pay a clear bug without routing it to devs, you win.
Continuous Improvement (OS): Do you have a governance loop that actually actions researcher feedback, not just collects it? Most orgs collect and never close the loop.
Program Ownership Clarity (RI, mean 2.97): When nobody internally owns the decision, you get all the downstream chaos: inconsistent severity, inconsistent payouts, inconsistent response times.
Reward Alignment (mean 3.06): Program owners themselves feel the bounty table matches the effort required only about half the time. Good self-awareness, and a hint that more programs might agree with your "this should be worth more" than they can openly say.
Payment Predictability (mean 3.09): Dead simple. Are you paying when you said you would pay?

The Age Paradox

Newer programs scored higher than older ones. One-to-two year programs averaged 3.77, while three-plus year programs dropped to 3.28. Counterintuitive until you sit with it. Brand-new teams do not know what they do not know, so they grade themselves a touch generously, and they are riding the early energy, funding, and engagement of a fresh program that has found its stride. Older programs run into a different problem: the first program manager carries everything in their head, hits burnout around the 2 to 3 year mark, and leaves, and the undocumented processes leave with them. Single point of failure. Get it out of your head, document it, train the people around you.

Self-Managed vs Managed Platform

Ownership model told a clean story:

Self-managed: 3.40 (small n, but well ahead)
Hybrid: 3.33
Managed Platform: 2.75 (lowest across every metric)

Steve was careful not to make this an indictment of platforms. The real signal is operational ownership. Hybrid means you use a platform for intake and tooling but you still respond in reports and stay involved, and hybrid scored significantly higher than fully offloaded programs. When the program is in the room with you, whether or not a platform is underneath, maturity jumps, especially on researcher engagement. The platforms want this involvement too. Managed-platform is where the platform is just picking up operational slack the program refused to carry.

AI Is Compressing the Curve

AI made discovery cheaper, faster, and higher volume. It is cheap enough now that bugs get fired off by a hackbot: curl the PoC, pipe it to Python, watch the data fall out, submit. Hunters can get so detached from their own pipeline that the same bug goes out twice when the dupe checker slips.

Programs got caught flat-footed. Daniel Stenberg of the curl project is the cleanest example: within a few months he went from flagging AI reports as low-quality slop to acknowledging they had become genuine, valuable signal, just arriving at a volume his team could barely keep up with. The volume is real, the signal is increasingly real, and severity classification for AI-assisted findings is becoming the new frontier of triage maturity. AI is not a brand-new playbook, it is the same old maturity gaps made impossible to ignore.

Trust vs Bounties

The contrarian take from this episode: under AI, trust will surpass bounties in importance. The cost of discovery dropped, so what you cannot manufacture for yourself is knowing a program will treat you respectfully and consistently. The counterweight is that bounties stay king, hackers like their bounties. Both can be true. If discovery is cheap, you can cast a wider net and still get a great result from the responsive programs, so responsiveness becomes how you prioritize targets, not just a nice-to-have.

The honest researcher's playbook for the "Program A pays huge but ghosts you vs Program B pays mid but is responsive" scenario: hack A until you find bugs, hack B in the meantime, and stay on B until A finally pays out and validates that the rewards table is real. Good RI is not just warm fuzzies, it lets you validate the company's threat model faster. Every triage and payout tells you you are on the right track with this IDOR.

Verification: The Third-Party Vision

The self-assessment helps program owners. It does nothing for us deciding where to spend our time. So Steve is building Verification: an independent, transparent standards body that audits whether a program's self-assessment claims actually hold up in practice. Real report samples, real conversations, real researcher experience, not vanity metrics like an automated first-reply timestamp (AI answering AI).

The dream is a public directory of verified programs so you can sort by maturity band and know before you invest that you will be treated well. Trust is the most expensive currency right now, and a credible badge from a body with no stake in any program or platform could genuinely shift where top researchers send their best findings. The funding-incentive problem (who pays without polluting the result) is unsolved, but the controls are being written and published openly so everyone knows exactly what they are being audited against.

When the first program gets verified, the call to action from the hosts is simple: go bury it in reports. Showing up on the list at all, wherever you land, is a positive signal that the program cares.

A Tip for Program Managers

If RI is your weak spot (and statistically it is), the highest-leverage move is brutally simple: be honest and transparent. Had staff turnover, budget cuts, a backlog? Say so. Send a note: "we are delayed on first response, triage, and time-to-bounty for the next 60 days, please have grace, we will get to you." Make sure your policy page matches reality. Researchers respect the programs that keep them in the loop (shout out to Adobe and the now-public Instacart program). And get a real human, not a bot account, to ask for feedback with an actual free-form box, then actually action what comes back.

Resources

Bug Bounty Maturity Framework - The framework, the self-assessment, and where Verification will live.
State of Bug Bounty Maturity Posture 2026 - Full report with the graphs, the age paradox, and the ownership-model breakdown.
AI Is Compressing the Bug Bounty Maturity Curve - The AI-pressure signals across all three pillars.
Take the Assessment - Free, 15 minutes
Steve Hernandez on X - Reach out, or email [email protected].

That's it for the week, keep hacking!