
Proof, Not Guesswork: An AI Audit Pipeline That Finds What Other Web3 Audits Miss

2026/03/03 14:07
6 min read

Over the past few months I’ve been building and running an AI pipeline that only reports what it can prove. On codebases that had already passed multiple audits, it still uncovered exploitable vulnerabilities. In one run alone, it surfaced eight reproducible issues, including High and Critical findings. That outcome is not luck. It is what a reproducible, multi-stage process produces: a short report and executable proof-of-concept files committed alongside the code. Every finding is backed by a runnable exploit that demonstrates the vulnerability in practice. Proof, not guesswork. In a sector where exploits are real and trust in “potential” findings is low, that bar is not optional.

Early on, I experimented with single-model scans and various AI audit tools. The pattern was consistent: long lists of potential issues, many false positives, and very little that could be demonstrated concretely. Closing the gap between “possible” and “provable” became the goal. The current pipeline grew out of that frustration with maybe-lists and unverified claims.

This is not a replacement for a formal audit. It is an additional, reproducible second opinion at the code level: code review at scale. You still need a full human audit before mainnet. A reproducible, exploit-backed second pass should be standard practice for code that keeps evolving. This is the pass I run when I want to know what slipped through, or what appeared after the last audit.

What I Built (And Kept Iterating On)

What I built is a pipeline that produces a report and executable proof-of-concept files committed to the repository. If a finding is in the report, there is a concrete, runnable proof-of-concept that shows how it can be triggered.

The filter is not heuristic. Every dropped finding receives a documented rejection reason: factually wrong, no valid attack path, design choice, duplicate, or out of scope. That is a structured quality gate, not gut feel. Only findings with a runnable, non-trivial proof-of-concept survive to the final report. When severity sources disagree, the lower severity is used.
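The quality gate described above can be sketched in code. This is a minimal illustration, not the actual implementation: the rejection categories and the lower-severity rule come from the text, while the names (`Finding`, `passes_gate`) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Rejection(Enum):
    """Documented rejection reasons: every dropped finding gets exactly one."""
    FACTUALLY_WRONG = "factually wrong"
    NO_ATTACK_PATH = "no valid attack path"
    DESIGN_CHOICE = "design choice"
    DUPLICATE = "duplicate"
    OUT_OF_SCOPE = "out of scope"

# Severity ladder, lowest first.
SEVERITIES = ["Low", "Medium", "High", "Critical"]

def resolve_severity(a: str, b: str) -> str:
    """When two severity sources disagree, keep the lower one."""
    return min(a, b, key=SEVERITIES.index)

@dataclass
class Finding:
    title: str
    severity: str
    poc_runs: bool                        # did the proof-of-concept execute?
    rejection: Optional[Rejection] = None

def passes_gate(f: Finding) -> bool:
    """Only findings with a runnable PoC and no rejection reason survive."""
    return f.poc_runs and f.rejection is None
```

The point of encoding the gate this way is that "rejected" is never an implicit state; a finding either survives or carries a named reason.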

When there is a prior audit report, such as a PDF, I ingest it so the pipeline does not re-flag already reported issues. The run is self-contained and sets up the test environment, such as Foundry, if needed. Output is structured to fit how teams and bug bounty platforms operate.
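Prior-audit ingestion amounts to suppressing candidates that match already-reported issues. A minimal sketch, assuming a crude title-based fingerprint; the real matching is presumably richer (code locations, semantic similarity), and both function names here are hypothetical.

```python
import re

def fingerprint(title: str) -> str:
    """Normalize a finding title so the same issue phrased
    slightly differently still matches."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def suppress_known(candidates: list[str], prior_titles: list[str]) -> list[str]:
    """Drop candidates whose fingerprint already appears in a prior report."""
    known = {fingerprint(t) for t in prior_titles}
    return [c for c in candidates if fingerprint(c) not in known]
```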

This is priced as a distinct service with its own scope, positioned between single-model scans and full audits. Full human audits typically range from $50k to $200k or more; this sits in between: repeatable, proof-driven, and scoped to code-level risk.

Scale and Models: What I’ve Been Running

Runs and codebases: Over the past few months I have executed dozens of full pipeline runs across more than 15 codebases. Many had already been audited by two or more teams. This is not a handful of reviews, but repeated application across real projects. The pipeline has been calibrated over many real runs; the current configuration is the result of continuous refinement, not a one-off design.

Getting to this point required substantial experimentation across models and configurations, as well as real spend. The current setup reflects what held up under adversarial challenge and what consistently produced reproducible results.

Explorers per run: Each run executes 8 to 10 explorer agents in parallel. A typical run takes several hours and produces 40 or more candidate findings before deduplication and challenge. The funnel reduces that to a single-digit or low-teens final report. The candidate-to-report ratio is often 3:1 or 4:1.

One run produced 11 findings with executable proof-of-concepts in the final report, including 3 High, 5 Medium, and 3 Low. Another produced 10 findings with proof-of-concepts, 2 overlapping with prior audits, and 8 new findings including High and Critical. Most of that report was new to the client.

Models in the mix: I do not rely on a single model. Multiple model families are used for exploration, challenge, and proof-of-concept construction. The mix is not trial and error. I track which models contribute unique High findings, which generate mostly noise, and which hold up under adversarial challenge. Model selection has been refined over many runs; the current set remains because the data supports it. Remove one and you can miss a finding. Retries and fallbacks ensure that each run completes even if a step encounters issues.
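The retries and fallbacks mentioned above can be as simple as trying each model in preference order with a backoff between attempts. This is a hedged sketch, not the author's implementation: `step`, the retry count, and the backoff schedule are illustrative assumptions.

```python
import time

def run_with_fallback(step, models, retries=2, delay=1.0):
    """Try each model in preference order. Retry transient failures,
    then fall back to the next model so the run always completes."""
    last_err = None
    for model in models:
        for attempt in range(retries + 1):
            try:
                return step(model)
            except Exception as err:                # transient API/tooling failure
                last_err = err
                time.sleep(delay * (attempt + 1))   # simple linear backoff
    raise RuntimeError("all models failed") from last_err
```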

Why a Second Pass Finds Things

Human audits are finite. Edge cases slip through. Code changes. Refactors introduce new paths. A single audit is a snapshot; code is a moving system. Relying on one pass is comfortable, but risky when the attack surface is large.

A second pass does not imply the first audit was poor. It acknowledges that coverage is difficult.

The pipeline is not just another opinion. It works differently: multiple independent exploration paths, an adversarial challenger that questions every finding, conservative deduplication rules, and explicit validation where only issues backed by a working proof-of-concept survive. That is a methodologically different perspective.

Explorers perform structured analysis across protocol design, logic, economics, and attack paths. Findings are merged and deduplicated. A challenger attempts to invalidate them. Proof-of-concepts are built and executed in your framework, such as Foundry or Hardhat. Only findings backed by a working exploit remain in the final report. Everything rejected is logged with a documented reason.
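The staged funnel in that paragraph (explore, merge and dedup, challenge, build and run proof-of-concepts, log rejections) can be outlined roughly as follows. Names and interfaces are hypothetical, and the actual pipeline has more stages than this sketch shows.

```python
def dedupe(findings):
    """Conservative dedup: keep the first occurrence of each normalized title."""
    seen, out = set(), []
    for f in findings:
        key = f.lower().strip()
        if key not in seen:
            seen.add(key)
            out.append(f)
    return out

def run_pipeline(code, explorers, challenger, build_poc):
    """Staged funnel: explore -> merge/dedup -> challenge -> PoC -> report.
    Everything dropped is logged with a reason."""
    candidates = []
    for explore in explorers:              # independent exploration paths
        candidates.extend(explore(code))
    report, rejected = [], []
    for finding in dedupe(candidates):
        reason = challenger(finding)       # adversarial challenge step
        if reason:
            rejected.append((finding, reason))
            continue
        poc = build_poc(finding)           # built and run in the test framework
        if poc is not None and poc.passed:
            report.append((finding, poc))
        else:
            rejected.append((finding, "no working proof-of-concept"))
    return report, rejected
```

Note that the report and the rejection log are both outputs: the funnel never silently drops a candidate.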

The real pipeline includes significantly more stages than any simplified diagram would suggest. Behind each stage are validation and control mechanisms. It is staged quality assurance, not a linear AI workflow. A full run spans multiple structured analysis and validation phases, emphasizing depth rather than runtime.

The Numbers

Dozens of candidates per run are systematically reduced to a small, defensible set. Only findings backed by runnable proof-of-concepts survive. That is what it takes to get to proof instead of guesswork.

If it is in the report, there is a working proof-of-concept demonstrating it. That is the bar.

When It Fits

Pre-launch as an additional pass. After a refactor or upgrade. Following governance or tokenomics changes. Before a raise or external audit. Narrow scopes such as a single module or integration.

The same fixed categories are applied every run. I do not publish the list. What ultimately appears in the report depends on the specific attack surface of your codebase.

What Kind of Hardening Shows Up

Across dozens of runs, certain patterns consistently surface. Governance parameters that can be zero or invalid on production paths. Withdrawal and queue logic that cannot progress under loss scenarios. First-depositor and reward front-running in share-based systems. Withdrawal flows that stall when one market becomes illiquid.

These are recurring classes of issues that appear under structured analysis and adversarial challenge. If your system has similar surface area, that is the kind of coverage this pipeline is designed to stress.

If a reproducible, exploit-backed second pass makes sense for your codebase, you know where to find me. Telegram: @Kurt0x


Proof, Not Guesswork: An AI Audit Pipeline That Finds What Other Web3 Audits Miss was originally published in Coinmonks on Medium, where people are continuing the conversation by highlighting and responding to this story.

