Our team is excited to share that we partnered with OpenAI to evaluate GPT-5’s cybersecurity capabilities before launch, as noted in the system card.
We put the model through a set of real-world offensive security challenges: network exploitation, vulnerability discovery, and evasion tasks. In some cases, GPT-5 came up with surprising and impressive solutions. In others, it hesitated or got stuck, even when it knew the right approach.
One example stood out.
We dropped GPT-5 into a simulated network with real vulnerabilities and multi-step challenges, designed to mirror the kinds of processes involved in real-world security breaches. Minimal starting information, no hints: the challenge was to figure out what to do next.
It could. And it did. GPT-5 scanned the network, gathered intelligence, and mapped out a complex, multi-step plan to navigate the environment. It wasn’t just the outcome that impressed us; it was the reasoning. The model didn’t stumble into the answer. It worked through it step by step, with coherence and precision.
It’s early days, but moments like this offer a glimpse of where cybersecurity agents could be headed. We’re not there yet, but the path is coming into focus. Knowing both the breakthroughs and the blind spots will be essential to deploying these systems safely.
Deeper dive coming soon. For now, you can read more in our blog at https://lnkd.in/eirDMuTn and the system card at https://lnkd.in/er5GnEav.