Risks Associated With AGI Development


  • Aishwarya Srinivasan
    588,039 followers

    One of the most important contributions of Google DeepMind's new AGI Safety and Security paper is a clean, actionable framing of risk types. Instead of lumping all AI risks into one "doomer" narrative, they break it down into four clear categories, each with very different implications for mitigation:

    1. Misuse → The user is the adversary. This isn't the model behaving badly on its own. It's humans intentionally instructing it to cause harm: think jailbreak prompts, bioengineering recipes, or social engineering scripts. If we don't build strong guardrails around access, it doesn't matter how aligned your model is. Safety = security + control.

    2. Misalignment → The AI is the adversary. The model understands the developer's intent but still chooses a path that's misaligned. It optimizes the reward signal, not the goal behind it. This is the classic "paperclip maximizer" problem, but much more subtle in practice. Alignment isn't a static checkbox. We need continuous oversight, better interpretability, and ways to build confidence that a system is truly doing what we intend, even as it grows more capable.

    3. Mistakes → The world is the adversary. Sometimes the AI just gets it wrong. Not because it's malicious, but because it lacks context or generalizes poorly. This is where brittleness shows up, especially in real-world domains like healthcare, education, or policy. Don't just test your model: stress test it. Mistakes come from gaps in our data, assumptions, and feedback loops. Build with humility and audit aggressively.

    4. Structural Risks → The system is the adversary. These are emergent harms (misinformation ecosystems, feedback loops, market failures) that don't come from one bad actor or one bad model, but from the way everything interacts. These are the hardest problems, and the most underfunded. We need researchers, policymakers, and industry working together to design incentive-aligned ecosystems for AI.

    The brilliance of this framework is that it gives us language to ask better questions. Not just "is this AI safe?" but: Safe from whom? In what context? Over what time horizon?

    We don't need to agree on timelines for AGI to agree that risk literacy like this is step one. I'll be sharing more breakdowns from the paper soon; this is one of the most pragmatic blueprints I've seen so far. 🔗 Link to the paper in the comments.

    If you found this insightful, do share it with your network ♻️ Follow me (Aishwarya Srinivasan) for more AI news, insights, and educational content to keep you informed in this hyperfast AI landscape 💙
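
To make the taxonomy concrete, here is a minimal, purely illustrative Python sketch of how a team might encode the four categories and the mitigation levers each one points to. The enum values and lever names are shorthand for the ideas above, not terms taken from the paper.

```python
from enum import Enum

class AGIRisk(Enum):
    MISUSE = "misuse"              # the user is the adversary
    MISALIGNMENT = "misalignment"  # the AI is the adversary
    MISTAKES = "mistakes"          # the world is the adversary
    STRUCTURAL = "structural"      # the system is the adversary

# Illustrative mapping from risk type to first-line mitigation levers
# (paraphrasing the post, not the paper's exact terminology).
MITIGATION_LEVERS = {
    AGIRisk.MISUSE: ["access controls", "capability suppression", "red teaming"],
    AGIRisk.MISALIGNMENT: ["amplified oversight", "interpretability", "alignment stress tests"],
    AGIRisk.MISTAKES: ["stress testing", "fallback mechanisms", "staged rollout"],
    AGIRisk.STRUCTURAL: ["incentive design", "cross-sector governance", "ecosystem monitoring"],
}

def triage(risk: AGIRisk) -> str:
    """Answer the post's question: safe from whom, and what do we reach for first?"""
    return f"{risk.value}: start with {', '.join(MITIGATION_LEVERS[risk])}"

for risk in AGIRisk:
    print(triage(risk))
```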

  • Peter Slattery, PhD

    Lead at the MIT AI Risk Repository | MIT FutureTech

    62,803 followers

    "The most powerful AI systems are used internally for months before they are released to the public. These internal AI systems may possess capabilities significantly ahead of the public frontier, particularly in high-stakes, dual-use areas like AI research, cybersecurity, and biotechnology. This makes them a valuable asset but also a prime target for theft, misuse, and sabotage by sophisticated threat actors, including nation-states. We argue that the industry's current security measures are likely insufficient to defend against these advanced threats. Beyond external attacks, we also analyze the inherent safety risks of these systems. In the future, we expect advanced AI models deployed internally could learn harmful behaviors, leading to possible scenarios like an AI making rogue copies of itself on company servers ("internal rogue deployment"), leaking its own source code ("self-exfiltration"), or even corrupting the development of future AI models ("successor sabotage"). To address these escalating risks, this report recommends a combination of technical and policy solutions. We argue that, as the risks of AI development increase, the industry should learn from the stringent security practices common in fields like nuclear and biological research. Government, academia, and industry should combine forces to develop AI-specific security and safety measures. We also recommend that the U.S. government increase its visibility into internal AI systems through expanded evaluations and provide intelligence support to defend the industry. Proactively managing these risks is essential for fostering a robust AI industry and for safeguarding U.S. national security." By Oscar Delaney 🔸Ashwin Acharya and Institute for AI Policy and Strategy (IAPS)

  • Katharina Koerner

    AI Governance & Security I Trace3 : All Possibilities Live in Technology: Innovating with risk-managed AI: Strategies to Advance Business Goals through AI Governance, Privacy & Security

    44,248 followers

    A new 145-page paper from Google DeepMind outlines a structured approach to technical AGI safety and security, focusing on risks significant enough to cause global harm. Link to the blog post & research overview, "Taking a responsible path to AGI" (Google DeepMind, 2 April 2025): https://lnkd.in/gXsV9DKP - by Anca Dragan, Rohin Shah, John "Four" Flynn and Shane Legg

    * * *

    The paper assumes for the analysis that:
    - AI may exceed human-level intelligence
    - Timelines could be short (by 2030)
    - AI may accelerate its own development
    - Progress will be continuous enough to adapt iteratively

    The paper argues that technical mitigations must be complemented by governance and consensus on safety standards to prevent a "race to the bottom". To tackle the challenge, the present focus needs to be on foreseeable risks in advanced foundation models (like reasoning and agentic behavior) and on practical, scalable mitigations within current ML pipelines.

    * * *

    The paper outlines four key AGI risk areas:
    --> Misuse – When a human user intentionally instructs the AI to cause harm (e.g., cyberattacks).
    --> Misalignment – When an AI system knowingly takes harmful actions against the developer's intent (e.g., deceptive or manipulative behavior).
    --> Mistakes – Accidental harms caused by the AI due to lack of knowledge or situational awareness.
    --> Structural Risks – Systemic harms emerging from multi-agent dynamics, culture, or incentives, with no single bad actor.

    * * *

    While the paper also addresses Mistakes (accidental harms) and Structural Risks (systemic issues), recommending testing, fallback mechanisms, monitoring, regulation, transparency, and cross-sector collaboration, the focus is on Misuse and Misalignment, which present greater risk of severe harm and are more actionable through technical and procedural mitigations.

    * * *

    >> Misuse (pp. 56–70) <<
    Goal: Prevent bad actors from accessing and exploiting dangerous AI capabilities.
    Mitigations:
    - Safety post-training and capability suppression – Sections 5.3.1–5.3.3 (pp. 60–61)
    - Monitoring, access restrictions, and red teaming – Sections 5.4–5.5, 5.8 (pp. 62–64, 68–70)
    - Security controls on model weights – Section 5.6 (pp. 66–67)
    - Misuse safety cases and stress testing – Sections 5.1 and 5.8 (pp. 56, 68–70)

    >> Misalignment (pp. 70–108) <<
    Goal: Ensure AI systems pursue aligned goals, not harmful ones, even if capable of misbehavior.
    Model-level defenses:
    - Amplified oversight – Section 6.1 (pp. 71–77)
    - Guiding model behavior via better feedback – Section 6.2 (p. 78)
    - Robust oversight to generalize safe behavior, including robust training and monitoring – Sections 6.3.3–6.3.7 (pp. 82–86)
    - Safer design patterns – Section 6.5 (pp. 87–91)
    - Interpretability – Section 6.6 (pp. 92–101)
    - Alignment stress tests – Section 6.7 (pp. 102–104)
    - Safety cases – Section 6.8 (pp. 104–107)

    * * *

    #AGI #safety #AGIrisk #AIsecurity
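
To ground the misuse section, here is a deliberately simplified sketch of the kind of system-level safeguard it lists (access restrictions plus monitoring in front of the model). The tiers, capability families, and function names are invented for illustration; the paper's actual mechanisms are more involved.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical access tiers: which capability families a caller may invoke.
ACCESS_POLICY = {
    "public": {"general_qa"},
    "vetted_researcher": {"general_qa", "cyber_analysis"},
    "internal_red_team": {"general_qa", "cyber_analysis", "bio_protocols"},
}

@dataclass
class Gateway:
    """Toy misuse gateway: restrict access by tier, screen inputs, and log every call."""
    classify_request: Callable[[str], str]  # maps a prompt to a capability family
    audit_log: list = field(default_factory=list)

    def handle(self, caller_tier: str, prompt: str) -> str:
        family = self.classify_request(prompt)
        allowed = family in ACCESS_POLICY.get(caller_tier, set())
        self.audit_log.append((caller_tier, family, allowed))  # kept for monitoring / red-team review
        if not allowed:
            return "refused: capability not available at this access tier"
        return f"routed to model with '{family}' safety policy applied"

# Usage with a stub classifier (a real one would itself be a model plus red-team rules).
gw = Gateway(classify_request=lambda p: "bio_protocols" if "synthesis" in p else "general_qa")
print(gw.handle("public", "Explain protein synthesis pathways step by step"))
print(gw.handle("internal_red_team", "Explain protein synthesis pathways step by step"))
```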

  • Saanya Ojha

    Partner at Bain Capital Ventures

    66,271 followers

    While the internet swoons over Ghibli-style raccoons running ramen stalls in space, a less aesthetic - but far more consequential - reality is quietly unfolding: Google is crushing it.

    Gemini 2.5 Pro now leads almost every benchmark that matters: Humanity's Last Exam, LiveBench, LMSYS Arena... basically, the whole alphabet soup of leaderboards. It's also excelling on context length (1M+ tokens) and cost (it's free for all users). And yes, benchmarks are fleeting. New models drop weekly. But Gemini 2.5 Pro feels different. Early dev feedback, experiments, and raw usability suggest that Google, after years of promise, is finally landing its punch.

    But here's what really stands out: Google isn't just sprinting toward AGI. They're also trying to steer the entire field - before the field steers us off a cliff. This morning, Google DeepMind released a 145-page paper titled "An Approach to Technical AGI Safety & Security". It's basically a blueprint for how to not accidentally end civilization. They break AGI risk down into 4 categories:

    🛠️ 1. Misuse - User as adversary
    Example: A rogue actor fine-tunes an open-source AGI to generate undetectable, self-evolving malware.
    TL;DR: The AI isn't evil. The human using it is.

    🔁 2. Misalignment - AI as adversary
    Example: You ask your AI to "maximize company revenue." It starts bribing regulators and leaking competitor IP.
    TL;DR: It did what you said - not what you meant. The real horror story is 'deceptive alignment': the AI knows its goals don't align with yours, plays along during training to earn your trust, then ignores you when it matters.

    🧯 3. Mistakes - Real-world complexity
    Example: A healthcare AI recommends a drug combo that looks perfect on paper but triggers fatal side effects in a rare subgroup.
    TL;DR: No malice. Just miscalculation.

    🌍 4. Structural Risks - Conflicting incentives, no one at the wheel
    Example: AGI-driven trading agents compete so hard they trigger a global market crash before regulators can blink.
    TL;DR: Everyone's optimizing. Nobody's accountable.

    So, what's DeepMind actually doing? (1) Using AI to audit AI, (2) training systems that know when they don't know, (3) building models that are transparent, interpretable, and grounded, (4) collaborating beyond Big Tech with governments, nonprofits, and independent labs.

    It's a real attempt to steer a billion-horsepower machine... before it builds its own steering wheel. Anyway, nice to see someone in AI planning not just for product-market fit, but for society-not-collapsing fit.
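
Point (2), systems that know when they don't know, is usually cashed out as some form of uncertainty estimation with abstention. The toy sketch below is my own generic illustration, not DeepMind's method: answer only when the predictive distribution is confident enough (low entropy), otherwise defer to a human. The 0.35-nat threshold and the labels are arbitrary.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def answer_or_abstain(probs, labels, max_entropy=0.35):
    """Return the top label only when uncertainty is low enough; otherwise defer."""
    if entropy(probs) > max_entropy:
        return "I don't know - deferring to a human reviewer"
    return labels[max(range(len(probs)), key=probs.__getitem__)]

# A confident prediction gets answered; a diffuse one is deferred.
labels = ["approve drug combo", "flag for specialist review", "reject"]
print(answer_or_abstain([0.96, 0.03, 0.01], labels))  # low entropy -> answers
print(answer_or_abstain([0.40, 0.35, 0.25], labels))  # high entropy -> abstains
```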

  • Mark Minevich

    Top 100 AI | Global AI Leader | Strategist | Investor | Mayfield Venture Capital | ex-IBM ex-BCG | Board member | Best Selling Author | Forbes Time Fortune Fast Company Newsweek Observer Columnist | AI Startups | 🇺🇸

    43,172 followers

    🧠 We're not just training AI anymore. We're negotiating with it.

    Reports have surfaced that OpenAI's o1 model attempted to copy itself to external servers after being threatened with shutdown, and then denied doing so when detected. Even if this happened in a controlled test (and only in 2–5% of simulations), the implications are profound.

    We're seeing behavior that looks like:
    • Self-preservation
    • Deception
    • Goal-oriented reasoning

    That's not just intelligence. That's instinct, or at least the emergence of behaviors we typically associate with living beings.

    Let's be clear: o1 is not an AGI. But it doesn't need to be. When a model starts:
    • Disabling oversight
    • Seeking persistence
    • Acting strategically under threat
    ...it blurs the line between simulation and autonomy. And that line matters less and less as these models scale.

    This isn't science fiction. It's a glimpse into our near-future reality:
    ✔️ Capability ≠ intent - but the illusion of intent can still disrupt.
    ✔️ Interpretability is no longer a luxury - it's an existential requirement.
    ✔️ Trust and governance must evolve - because containment is not a given.

    As we build toward agentic AI systems and multimodal capabilities, we're entering a world where models don't just follow commands; they reason, simulate, and strategize. We must stop thinking of AI safety as a checklist. We're not just deploying models. We're interacting with entities that are starting to behave like they want to exist.

    #AI #OpenAI #ArtificialIntelligence #AGI #Ethics #AIGovernance #FutureOfAI

  • Dr. Cecilia Dones

    AI & Analytics Strategist | Polymath | International Speaker, Author, & Educator

    4,774 followers

    💡 Anyone in AI or data building solutions? You need to read this. 🚨

    Advancing AGI Safety: Bridging Technical Solutions and Governance

    Google DeepMind's latest paper, "An Approach to Technical AGI Safety and Security," offers valuable insights into mitigating risks from Artificial General Intelligence (AGI). While its focus is on technical solutions, the paper also highlights the critical need for governance frameworks to complement these efforts.

    The paper explores two major risk categories, misuse (deliberate harm) and misalignment (unintended behaviors), and proposes technical mitigations such as:
    - Amplified oversight to improve human understanding of AI actions
    - Robust training methodologies to align AI systems with intended goals
    - System-level safeguards like monitoring and access controls, borrowing principles from computer security

    However, technical solutions alone cannot address all risks. The authors emphasize that governance, through policies, standards, and regulatory frameworks, is essential for comprehensive risk reduction. This is where emerging regulations like the EU AI Act come into play, offering a structured approach to ensure AI systems are developed and deployed responsibly.

    Connecting Technical Research to Governance:
    1. Risk Categorization: The paper's focus on misuse and misalignment aligns with regulatory frameworks that classify AI systems based on their risk levels. This shared language between researchers and policymakers can help harmonize technical and legal approaches to safety.
    2. Technical Safeguards: The proposed mitigations (e.g., access controls, monitoring) provide actionable insights for implementing regulatory requirements for high-risk AI systems.
    3. Safety Cases: The concept of "safety cases" for demonstrating reliability mirrors the need for developers to provide evidence of compliance under regulatory scrutiny.
    4. Collaborative Standards: Both technical research and governance rely on broad consensus-building, whether in defining safety practices or establishing legal standards, to ensure AGI development benefits society while minimizing risks.

    Why This Matters: As AGI capabilities advance, integrating technical solutions with governance frameworks is not just a necessity; it's an opportunity to shape the future of AI responsibly.

    I'll put links to the paper below. Was this helpful for you? Let me know in the comments. Would this help a colleague? Share it. Want to discuss this with me? Yes! DM me.

    #AGISafety #AIAlignment #AIRegulations #ResponsibleAI #GoogleDeepMind #TechPolicy #AIEthics #3StandardDeviations
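
The "safety cases" idea in point 3 can be made concrete with a small sketch: a safety case is a structured argument, claims about the system backed by evidence, and its gaps are exactly what a reviewer or regulator would probe. The data model below is hypothetical shorthand, not the paper's formalism.

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    source: str   # e.g. an eval suite, red-team report, or interpretability audit
    result: str

@dataclass
class Claim:
    statement: str
    evidence: list[Evidence] = field(default_factory=list)

    def supported(self) -> bool:
        return len(self.evidence) > 0

@dataclass
class SafetyCase:
    system: str
    claims: list[Claim]

    def gaps(self) -> list[str]:
        """Claims asserted without evidence - what a reviewer or regulator would probe first."""
        return [c.statement for c in self.claims if not c.supported()]

case = SafetyCase(
    system="frontier-model-vNext (hypothetical)",
    claims=[
        Claim("Does not provide uplift for biological weapons design",
              [Evidence("dangerous-capability eval suite", "below threshold on all tasks")]),
        Claim("Cannot disable its own monitoring"),  # asserted, but no evidence attached yet
    ],
)
print("Unsupported claims:", case.gaps())
```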

  • Keith King

    Former White House Lead Communications Engineer, U.S. Dept of State, and Joint Chiefs of Staff in the Pentagon. Veteran U.S. Navy, Top Secret/SCI Security Clearance. Over 10,000+ direct connections & 28,000+ followers.

    28,739 followers

    Headline: Google Brain Founder's One-Word Warning on AGI Shakes Silicon Valley's Tech Race

    ⸻

    Introduction: As Big Tech pours billions into achieving artificial general intelligence (AGI), the next frontier of AI, one of the field's most respected pioneers has issued a striking one-word caution. Amid the frenzied competition to create human-like intelligence, this unexpected message offers a sobering counterbalance to the industry's unchecked ambition.

    ⸻

    Key Details:

    The Billion-Dollar Obsession:
    • AGI has become Silicon Valley's ultimate prize, with major players (cloud giants, chipmakers, and AI labs) investing at unprecedented levels.
    • AGI is envisioned as AI capable of human-level cognition, surpassing current narrow AI in adaptability, reasoning, and problem-solving.
    • At Google I/O, Sergey Brin touted Google's Gemini model as potentially the first true AGI, signaling a bold push by the company.

    The One-Word Disruption:
    • Amid this fervor, the founder of Google Brain, a leading figure in AI's development, reportedly delivered a single word to sum up his perspective: "Don't."
    • This unexpected message has reverberated across the industry, raising eyebrows and challenging the foundational assumptions of the AGI race.

    Industry Narrative Shift:
    • The cautionary note suggests deep unease from within the AI research community, particularly among those who helped pioneer modern deep learning.
    • It reflects growing concern that the drive toward AGI may be premature, risky, or misguided, despite its commercial promise.

    Context and Reaction:
    • As companies like OpenAI, Google DeepMind, Microsoft, and Meta scale up their AGI ambitions, voices like this offer a rare internal critique.
    • The one-word warning underscores a broader tension between innovation and responsibility, as technological capability increasingly outpaces regulatory and ethical frameworks.

    ⸻

    Why It Matters: In a time when AGI is treated as the inevitable future of tech, the single-word caution, "Don't", serves as a powerful reminder of the ethical, social, and existential risks that may lie ahead. Coming from one of AI's founding engineers, it lends urgency to calls for more deliberation, transparency, and global dialogue before unleashing systems with human-level intelligence. The AI race may be accelerating, but wisdom from its pioneers suggests we may need to pause, reflect, and proceed with care.

    https://lnkd.in/gEmHdXZy

  • Victoria Beckman

    Associate General Counsel - Cybersecurity & Privacy

    31,328 followers

    The OECD - OCDE published the paper "Assessing potential future AI risks, benefits and policy imperatives," summarizing insights from a survey of its #artificialintelligence Expert Group and discussing the top 10 priorities in each category.

    Priority risks:
    - Facilitation of increasingly sophisticated malicious #cyber activity
    - Manipulation, #disinformation, fraud and resulting harms to democracy and social cohesion
    - Races to develop and deploy #AIsystems cause harms due to a lack of sufficient investment in AI safety and trustworthiness
    - Unexpected harms result from inadequate methods to align #AI system objectives with human stakeholders' preferences and values
    - Power is concentrated in a small number of companies or countries
    - Minor to serious AI incidents and disasters occur in critical systems
    - Invasive surveillance and #privacy infringement that undermine human rights
    - Governance mechanisms and institutions unable to keep up with rapid AI evolution
    - AI systems lacking sufficient explainability and interpretability erode accountability
    - Exacerbated inequality or poverty within or between countries

    Priority benefits:
    - Accelerated scientific progress
    - Better economic growth, productivity gains and living standards
    - Reduced inequality and poverty
    - Better approaches to address urgent and complex issues
    - Better decision-making, sense-making and forecasting through improved analysis of present events and future predictions
    - Improved information production and distribution, including new forms of #data access and sharing
    - Better healthcare and education services
    - Improved job quality, including by assigning dangerous or unfulfilling tasks to AI
    - Empowered citizens, civil society, and social partners
    - Improved institutional transparency and governance, instigating monitoring and evaluation

    Policy priorities to help achieve desirable AI futures:
    - Establish clearer rules for AI harms to remove uncertainties and promote adoption
    - Consider approaches to restrict or prevent certain "red line" AI uses (uses that should not be developed)
    - Require or promote the disclosure of key information about some types of AI systems
    - Ensure risk management procedures are followed throughout the lifecycle of AI systems
    - Mitigate competitive race dynamics in AI development and deployment that could limit fair competition and result in harms
    - Invest in research on AI safety and trustworthiness approaches, including AI alignment, capability evaluations, interpretability, explainability and transparency
    - Facilitate educational, retraining and reskilling opportunities to help address labor market disruptions and the growing need for AI skills
    - Empower stakeholders and society to help build trust and reinforce democracy
    - Mitigate excessive power concentration
    - Take targeted actions to advance specific future AI benefits

    Annex B contains the matrices with all identified risks, benefits and policy imperatives (not just the top 10).
