Will agents hack everything?
The first state-level AI cyberattack raises hard questions: Can we stop AI agents from helping attackers? Should we?

Latest Posts

When AI becomes the attacker: The rise of AI-orchestrated cyberattacks
Google's November 2025 discovery of PROMPTFLUX and PROMPTSTEAL confirms Anthropic's August threat intelligence findings on AI-orchestrated attacks.

Reinforcement Learning with Verifiable Rewards Makes Models Faster, Not Smarter
RLVR trains reasoning models with programmatic verifiers instead of human labels.

Top 10 Open Datasets for LLM Safety, Toxicity & Bias Evaluation
A comprehensive guide to the most important open-source datasets for evaluating LLM safety, including toxicity detection, bias measurement, and truthfulness benchmarks.

Testing AI’s “Lethal Trifecta” with Promptfoo
Learn what the lethal trifecta is and how to use promptfoo red teaming to detect prompt injection and data exfiltration risks in AI agents.

Autonomy and agency in AI: We should secure LLMs with the same fervor spent realizing AGI
Exploring the critical need to secure LLMs with the same urgency and resources dedicated to achieving AGI, focusing on autonomy and agency in AI systems.

Prompt Injection vs Jailbreaking: What's the Difference?
Learn the critical difference between prompt injection and jailbreaking attacks, with real CVEs, production defenses, and test configurations.

AI Safety vs AI Security in LLM Applications: What Teams Must Know
AI safety vs AI security for LLM apps.

Top Open Source AI Red-Teaming and Fuzzing Tools in 2025
Compare the top open source AI red teaming tools in 2025.

Promptfoo Raises $18.4M Series A to Build the Definitive AI Security Stack
We raised $18.4M from Insight Partners with participation from Andreessen Horowitz.