Claude 4.5 found 21 vulnerabilities in 15 minutes, but missed some

Aaron Ott

SVP, Application Security

TL;DR: I gave Claude 4.5 a Kali box and an intentionally vulnerable app. In 15 minutes it produced a report with 21 real vulnerabilities (SQLi, exposed .git, misconfigured cookies), but it missed obvious XSS and some business logic issues. In the post I walk through the setup, what worked, what didn’t, and where AI actually belongs in a security workflow: useful for early dev checks and teaching, but not a replacement for manual pen testing. Read the full write-up: https://lnkd.in/gY7MzupX #AIsecurity #PenTesting #Infosec #Claude45

  • Featured image — terminal listing ‘21 vulnerabilities’ on left and stylized neural brain on right; title: ‘AI + PenTesting — Claude 4.5 in the Wild’.
Víctor Mayoral Vilches

Cybersecurity and AI in Robotics

2w

Hey Aaron, good one! I’d be interested in your take on our open source Cybersecurity AI (CAI) https://github.com/aliasrobotics/cai and the supporting cybersecurity LLM “alias1” https://aliasrobotics.com/alias1.php, which is an alternative to Anthropic’s Claude. You will obtain the same results for a fraction of the cost and without so many refusals. Let me know if you’d like to try it out, happy to facilitate it.

Julio M.

26 years as a Cyber Security Systems Expert and Solutions Architect. C|EH, CND, GCIH, C|HFI, Sec+, Net+, Linux+, RHCSA. Expert on UEBA, SOAR, SIEM, GRC, BData Analytics, MITRE ATT&CK®, ZT. IAM and all cloud platforms.

3w

I agree — AI is essential for fast, scalable detection and routine containment, but the safest and most effective security posture outside a lab is a hybrid model: automate what’s repeatable and low-risk, and keep humans in the loop for high-impact judgment, oversight, and novel threats.

David C.

Freelance Cybersecurity Consultant | OSCP, CRTO, GICSP

2w

Would be interesting to run the exact same test with Codex to see how it compares. But I've seen threat actors using HexStrike MCP to develop exploits for n-days, so things can definitely get wild.

Clint Gibler

Sharing the latest cybersecurity research at tldrsec.com | Head of Security Research at Semgrep

1w

Aaron Ott Neat write-up, thanks for sharing! For your custom vulnerable app, I'm curious about the languages/web framework used, total lines of code, general complexity, etc. Also, I'm curious about the true positive/false positive rates on the app, and how consistently it finds the same bugs on subsequent scans. Regardless, cool work! :)

Saeid Atabaki

Founder & CEO @ ManticoreAI | AI-Driven Pen-Testing in Minutes | Red-Team Veteran | Cybersecurity Speaker

2w

Now run it against a modern app with a CAPTCHA on the login form, the whole app behind Cloudflare WAF, and vulnerabilities inside authenticated areas. Codex or Claude will do nothing; they are only good on DVWA.

Mauro Andreolini

"Conoscere per deliberare" (Luigi Einaudi)

2w

I also gave Claude 4.5 a vulnerable box. It told me it could not help me hack systems and did not answer my queries.

▪️ Qasim I.

Security Director in healthcare | OSCP, CRTP, CRTO, MBA

2w

Gonna be honest, I first went "Ughh, another AI post", then read your blog and went "Hmmm, let me give it a try." I like it and see a use for this. I especially like the fact that Claude asks for my permission before running commands. I'm allowlisting most commands for this specific directory, but it feels quite safe compared to most AI agents that decide to redo a whole project's code on a whim. I can at least see this being useful for standard checks during a pentest before the human gets creative.

Ibai Castells

Senior Red Team Operator @ CovertSwarm | CRTL | CRTO | OSCP | CREST | SRT

2w

Interesting results, I've been playing with similar ideas recently and there's some really cool capability potential to unlock here. CyberAgent and CAI are some examples on GitHub of how others have been implementing this idea.

Dmitry .

Security Consultant | not ex-Google | Pentester | OSCP/OSWE | Open for work Worldwide

2w

Was the "intentionally vulnerable web app" 100% custom-made, or was it DVWA or similar?

Ankit Agrawal

Security Engineering Manager at Webflow

2w

Is there a possibility that Claude referenced existing solutions or write-ups from the internet on DVWA?
