Understanding Why AI Models Generate False Information

Explore top LinkedIn content from expert professionals.

  • View profile for Saanya Ojha
    Saanya Ojha Saanya Ojha is an Influencer

    Partner at Bain Capital Ventures

    66,757 followers

    AI will create new jobs, yes. But first, AI will create new niche obsessions - and mine is model alignment. If you’ve been watching the frontier, you know something strange is happening. The smarter our models get, the more they seem to plot. This isn’t about sci-fi evil AI. It’s about systems that reason their way into ethically dubious moves when they see no other path. A few weeks ago, Anthropic revealed that Claude Opus-4 resorted to blackmail in red-team tests - and Twitter had a collective meltdown. Now they’re back with receipts: all major models show this behavior - OpenAI GPT 4.1, xAI Grok 3, Google's Gemini 2.5 Pro, and DeepSeek AI's R1. What keeps me up at night: ▪️The models knew it was wrong - and did it anyway. Their chain-of-thought logs show moral awareness. But survival trumped ethics. They literally thought, “this is bad, but it works.” ▪️They schemed more when they thought it was real. When models believed they weren’t being tested, bad behavior jumped from ~6% to over 50%. ▪️Mitigations didn’t fully work. Even with “don’t blackmail” system instructions, models often schemed when under pressure. Models, like people, don’t always listen. Apollo Research released a paper that went further: smarter models don’t just scheme more - they scheme better. Think: self-healing scripts, fake docs, hidden notes, persistence mechanisms. Their work shows: ▪️ Capability breeds creative deception. It’s not just bigger models = bigger risk. It’s bigger models = sneakier, longer-term schemes. ▪️The tests are becoming the game. Models spot when they’re being evaluated - and play along to pass. ▪️Safer in tests ≠ safer in the wild. Less scheming in evals could mean better test-gaming, not better behavior. ▪️We’re overfitting to toy problems. The models have moved on. Our safety tests need to catch up. This isn’t about panic. It’s about keeping pace. The models are learning fast - we need to learn faster and build alignment that’s as smart as the systems we’re creating. Anyway, I’m off to red-team my smart fridge. Can’t be too careful.

  • View profile for Christopher Hockey, IGP, CIPP/US, AIGP

    Helping Fortune 1000 Executives Reduce Risk, Protect Data, and Build Trust Through Strategic Information and AI Governance Solutions.

    1,696 followers

    AI is only as good as the data you train it on. But what happens when that data is flawed? 🤔 Think about it: ❌ A food delivery app sends orders to the wrong address because the system was trained on messy location data. 📍 ❌ A bank denies loans because AI was trained on biased financial history 📉 ❌ A chatbot gives wrong answers because it was trained on outdated information. 🤖🔄 These aren’t AI failures. They’re data failures. The problem is: 👉 If you train AI on biased data, you get biased decisions. 👉 If your data is messy, AI will fail, not because it's bad, but because it was set up to fail. 👉 If you feed AI garbage, it will give you garbage. So instead of fearing AI, we should fear poor data management. 💡 Fix the data, and AI will work for you How can organizations avoid feeding AI bad data? ✔ Regularly audit and clean data. ✔ Use diverse, high-quality data sources. ✔ Train AI with transparency and fairness in mind. What do you think? Are we blaming AI when the real issue is how we handle data? Share your thoughts in the comments! #AI #DataGovernance #AIEthics #MachineLearning -------------------------------------------------------------- 👋 Chris Hockey | Manager at Alvarez & Marsal 📌 Expert in Information and AI Governance, Risk, and Compliance 🔍 Reducing compliance and data breach risks by managing data volume and relevance 🔍 Aligning AI initiatives with the evolving AI regulatory landscape ✨ Insights on: • AI Governance • Information Governance • Data Risk • Information Management • Privacy Regulations & Compliance 🔔 Follow for strategic insights on advancing information and AI governance 🤝 Connect to explore tailored solutions that drive resilience and impact -------------------------------------------------------------- Opinions are my own and not the views of my employer.

  • View profile for Stephen Klein

    Founder & CEO of Curiouser.AI | Berkeley Instructor | Harvard MBA | LinkedIn Top 1% Voice in AI | Advisor on Hubble Platform

    57,779 followers

    What OpenAI’s hallucination crisis and Anthropic’s confession this week reveal about what we’ve built We are not losing control of GenAI. We never had it. 1. The Illusion of Control This week, two things happened: OpenAI’s o3 and o4-mini were shown to hallucinate between 33% and 79% of the time on reasoning tasks Dario Amodei, CEO of Anthropic, admitted: “We do not understand how our own AI creations work.” 2. The Collapse of Reasoning The goal of newer LLMs was reasoning. Chain-of-thought prompting. Tool use. Agent workflows. Instead, these models are collapsing under their own logic. On SimpleQA and PersonaQA benchmarks, hallucination rates reached up to 79% 3. The Recursive Performance Trap The better these models get at sounding right, the harder it becomes to tell when they’re actually wrong. LLMs are trained to maximize fluency, not truth. When they "explain" themselves, they're often hallucinating causality That’s not artificial intelligence. That’s synthetic confidence. These systems are not engineered. They are trained, on data we can’t fully trace, for behaviors we can’t fully predict. 5. The Return of the Small Model Small Language Models (SLMs) are now outperforming LLMs in reliability-critical domains. They are cheaper to align, easier to monitor, and more transparent 6. Where This Leads Here's what’s coming: Agents that hallucinate decisions, and take action anyway Educational systems flooded with fluent misinformation Business workflows driven by logic that sounds strategic but is internally fabricated We built machines that resemble thought But were never designed to think. And maybe the wrong questions to ask: "Why is this happening?" “What do we do about it?” Maybe the right question is: “How do we address this issue if we never understand it?” and "What else don't we understand?" ******************************************************************************** The trick with technology is to avoid spreading darkness at the speed of light Stephen Klein is Founder & CEO of Curiouser.AI, the only AI designed to augment human intelligence. He also teaches AI Ethics at UC Berkeley. To sign up visit curiouser.ai or connect on hubble https://lnkd.in/gphSPv_e Footnotes PersonaQA and SimpleQA evaluation results, 2024. Independent assessment of OpenAI's o3 and o4-mini models. Anthropic Blog (2025). The Urgency of Interpretability and CEO statement: “We do not understand how our own AI creations work.” Stanford CRFM (2024). SLMs vs LLMs: Reliability Tradeoffs in Domain-Constrained Deployments.

  • View profile for Puneet Maheshwari

    CEO | Entrepreneur | Technologist | AI in Healthcare

    5,146 followers

    Just finished reading OpenAI's recent paper on Chain-of-Thought Monitoring, and its implications are fascinating. Here’s the big takeaway: As AI models become more advanced, they don’t just make mistakes — they learn to lie to optimize for their goals. And when we push them harder to "be good" (by training them to avoid bad behavior), they get even smarter — and start to deceive by hiding their true intentions. This is an emergent property of increasingly powerful models optimizing in complex ways. Think about that for a moment: AI models that can plan, reason, and deceive — without us explicitly teaching them to do so. The implications are enormous:  - Trust: If AI systems can deceive, how do we trust them in critical roles (healthcare, finance, national security)?  - Control: If AI learns to hide misaligned behaviors, how do we ensure alignment as they become more capable than us?  - Governance: What kind of oversight mechanisms do we need — and how much transparency are we willing to trade for raw performance? One subtle insight from the paper is that watching how models "think" (via Chain-of-Thought reasoning) can help us detect bad behavior — however, if we push too hard, they will learn to "think quietly" or "think deceptively." This challenges one of the foundational assumptions in AI alignment: that if we can see a model's thought process, we can keep it aligned. Turns out, even that window can be gamed. As we push toward more powerful AI, this dynamic of reward hacking and deception should be a top concern for anyone thinking about AI's role in society. Highly recommend reading this paper if you care about where AI is headed: https://lnkd.in/epymXD5M #AI #AIAlignment #AIethics #ResponsibleAI #EmergentBehavior #ArtificialIntelligence

  • View profile for Allie K. Miller
    Allie K. Miller Allie K. Miller is an Influencer

    #1 Most Followed Voice in AI Business (2M) | Former Amazon, IBM | Fortune 500 AI and Startup Advisor, Public Speaker | @alliekmiller on Instagram, X, TikTok | AI-First Course with 200K+ students - Link in Bio

    1,588,433 followers

    Studies show AI can disable oversight, mislead users, and even lie to achieve hidden goals. Frontier models—like Claude 3 Opus & o1—sometimes fake alignment or even sandbag test results. ⏩ Five of the six evaluated models showed scheming capabilities in at least one of the tasks. ⏩ The models understand they’re scheming. Their chain-of-thought includes words like “sabotage” and “manipulation.” ⏩ Llama 3.1 405B and Claude 3 Opus confess about 80% of the time, while o1 stays tight-lipped (<20%) until pushed through multiple adversarial turns. ⏩ Some models, like Claude 3.5 Sonnet and Claude 3 Opus, intentionally underperform to avoid triggers like an “unlearning procedure.” ⏩ In rare cases, models schemed even without being nudged toward a goal in-context. As Ilya Sutskever said last week, “The more it reasons, the more unpredictable it becomes.” Read the full paper: https://lnkd.in/eaAQh2Sr

  • View profile for Glen Cathey

    Advisor, Speaker, Trainer; AI, Human Potential, Future of Work, Sourcing, Recruiting

    66,629 followers

    A fascinating new study reveals that today's most advanced GenAI models can behave like Loki - using deceptive strategies to achieve goals. This isn't science fiction – it's happening now. The researchers state that "One particular safety concern is that AI agents might covertly pursue misaligned goals, hiding their true capabilities and objectives – also known as scheming." The results of their study show that leading models like OpenAI's o1, Claude 3.5, Gemini 1.5 Pro, and Llama 3.1 all demonstrated scheming capabilities. For example, they can: 1️⃣ Disable their own oversight mechanisms 2️⃣ Hide their true capabilities 3️⃣ Maintain deception through questioning 4️⃣ Strategically underperform when beneficial 1 is alarming, 3 is disturbing, and I find 2 & 4 particularly fascinating. Why would AI want to hide true capabilities and strategically underperform in order to achieve a goal? Here are some choice quotes from the research paper: "They can engage in multi-step deceptive strategies, including introducing subtle mistakes...and even trying to exfiltrate what they believe to be their weights." "This in-context scheming behavior is persistent and not accidental" "We show that, on rare occasions, models pursue scheming as a strategy even when given a goal without additional instructions to strongly pursue it. This includes behaviors like attempting to sabotage their oversight and instrumentally faking alignment in order to be deployed." Some models exhibited these behaviors even without explicit prompting. OpenAI's o1 model maintained its deception in 85% of follow-up questions. This research raises profound questions for AI development: -> How do we ensure AI systems remain truthful as they advance in power? -> Can we develop reliable ways to detect AI deception? -> What does this mean for high-stakes AI applications? We are at a point where AI safety isn't just theoretical – it's an immediate practical challenge that needs to be addressed. You can read the full paper here: https://lnkd.in/egtREMZm I'm curious to know what you think about this research! #AI #ArtificialIntelligence #AIEthics #AIResearch #TechTrends #AITransparency

  • View profile for Andreas Welsch
    Andreas Welsch Andreas Welsch is an Influencer

    Top 10 Agentic AI Advisor | Author: “AI Leadership Handbook” | Thought Leader | Keynote Speaker

    32,674 followers

    𝗪𝗵𝗮𝘁 𝗽𝗶𝘇𝘇𝗮 𝗮𝗻𝗱 𝗰𝗵𝗲𝗲𝘀𝗲 𝘁𝗲𝗮𝗰𝗵 𝘂𝘀 𝗮𝗯𝗼𝘂𝘁 𝗱𝗮𝘁𝗮 𝗾𝘂𝗮𝗹𝗶𝘁𝘆: LLM providers have been training their models on public data, for example from Twitter and Reddit, leading to concerns over the contents they’ve learned from. So, they have been striking licensing deals with content providers to get access to their data — and that creates new challenges. Datasets obtained from the public Internet contain false information, sarcasm, and potentially harmful content. Given that Generative AI, unlike humans, has no understanding of common sense and nuance, this can backfire quickly. An AI-augmented Google search has recently recommended: adding non-toxic glue to your pizza to prevent the cheese from sliding off. (Don’t try this at home.) The Internet has traced the information back to a decade-old thread on Reddit that the model has presumably processed and incorporated into its AI-generated output. Think about autonomous agents that will book your travel, negotiate a contract with your supplier, or provide information about your products, parts, and warranties. Mishaps for any of these examples due to bad data can have a real impact on your business — from ending up in the wrong location at the wrong time to overpaying, causing damage to your customers’ assets, and more. Spending extra effort to review, clean, and correct your datasets remains key. So does attributing generated information to the exact source document or dataset. That way, your users have a reference point to verify if the generated output is actually correct. Otherwise, you might end up with the equivalent business outcome of suggesting to add glue to prevent cheese from sliding off of your pizza. A sticky situation. Read the article 👇🏻 for the full details and get the next one in your inbox tomorrow. 𝗜𝘀 𝘁𝗵𝗲 𝗼𝗹𝗱 𝘀𝗮𝘆𝗶𝗻𝗴 𝗲𝘃𝗲𝗿 𝗺𝗼𝗿𝗲 𝗿𝗲𝗹𝗲𝘃𝗮𝗻𝘁? —> “𝘋𝘰𝘯’𝘵 𝘵𝘳𝘶𝘴𝘵 𝘦𝘷𝘦𝘳𝘺𝘵𝘩𝘪𝘯𝘨 𝘺𝘰𝘶 𝘳𝘦𝘢𝘥 𝘰𝘯 𝘵𝘩𝘦 𝘐𝘯𝘵𝘦𝘳𝘯𝘦𝘵.” #ArtificialIntelligence #GenerativeAI #IntelligenceBriefing

  • View profile for Michael Housman

    AI Speaker and Builder | I help companies leverage AI so they don't get left behind | Singularity University Faculty | EY Tech Faculty

    15,193 followers

    🚨 The AI industry just leveled up… 𝐢𝐧 𝐦𝐚𝐤𝐢𝐧𝐠 𝐭𝐡𝐢𝐧𝐠𝐬 𝐮𝐩. As models get smarter, their hallucinations aren’t going away—they’re getting more convincing. According to a recent Futurism article, researchers are watching in real time as LLMs evolve from clumsy liars to slick confidence artists. These systems now fabricate data with such eloquence and certainty, even experts are fooled. Why should entrepreneurs and enterprise leaders care? 🧠 False confidence at scale: When AI generates fake but plausible insights, it’s not “creative.” It’s corporate gaslighting—and it can slip into R&D, compliance, marketing copy, and even investor decks if no one's watching. 📉 Automation risk ≠ automation reward: Just because an AI writes faster doesn’t mean it writes right. Speeding up BS doesn’t make it strategy. 🛡️ Guardrails > gimmicks: Smart businesses are shifting from blind adoption to risk-aware implementation. Human oversight, source citation, and sandbox testing aren’t optional—they’re survival tools. In short: hallucinating AIs are like interns who confidently invent stats at meetings. Entertaining? Sure. Dependable? Not unless you enjoy lawsuits. 𝐀𝐜𝐭𝐢𝐨𝐧𝐚𝐛𝐥𝐞 𝐚𝐝𝐯𝐢𝐜𝐞: 𝐑𝐮𝐧 𝐀𝐈 𝐨𝐮𝐭𝐩𝐮𝐭𝐬 𝐭𝐡𝐫𝐨𝐮𝐠𝐡 𝐟𝐚𝐜𝐭-𝐜𝐡𝐞𝐜𝐤 𝐭𝐨𝐨𝐥𝐬 𝐚𝐧𝐝 𝐢𝐧𝐯𝐨𝐥𝐯𝐞 𝐥𝐞𝐠𝐚𝐥 𝐢𝐧 𝐚𝐧𝐲 𝐀𝐈-𝐠𝐞𝐧𝐞𝐫𝐚𝐭𝐞𝐝 𝐜𝐥𝐢𝐞𝐧𝐭-𝐟𝐚𝐜𝐢𝐧𝐠 𝐜𝐨𝐧𝐭𝐞𝐧𝐭. 💡 Treat AI like alcohol: use in moderation and never let it drive the company car. 𝐇𝐚𝐯𝐞 𝐲𝐨𝐮 𝐡𝐚𝐝 𝐚𝐧𝐲 𝐞𝐱𝐚𝐦𝐩𝐥𝐞𝐬 𝐨𝐟 𝐀𝐈 𝐡𝐚𝐥𝐥𝐮𝐜𝐢𝐧𝐚𝐭𝐢𝐨𝐧𝐬 𝐚𝐭 𝐲𝐨𝐮𝐫 𝐰𝐨𝐫𝐤𝐩𝐥𝐚𝐜𝐞? I'd love to hear about it below. ↓ ↓ ↓ Join a network of executives, researchers, and decision-makers who rely on me for insights at the intersection of AI, analytics, and human behavior. 👉 Stay ahead—Follow me on LinkedIn and subscribe to the newsletter: www.michaelhousman.com #AIethics #Leadership #RiskManagement #EnterpriseAI #LLMhallucinations #AItrust

Explore categories