𝐀𝐈 𝐰𝐨𝐧’𝐭 𝐦𝐚𝐠𝐢𝐜𝐚𝐥𝐥𝐲 𝐟𝐢𝐱 𝐛𝐫𝐨𝐤𝐞𝐧 𝐝𝐚𝐭𝐚 𝐦𝐨𝐝𝐞𝐥𝐬. But if we get the foundations right, we can supercharge how LLMs help us model, document, and evolve our data ecosystems.

Over the years, we’ve gone from Kimball and Inmon to dbt and the Modern Data Stack. But too often, we obsess over tools while neglecting fundamentals. As AI and agent-driven IDEs become mainstream, Alejandro Aboy highlights that the real opportunity isn’t replacement but augmentation.

This is where 𝐕𝐢𝐛𝐞 𝐃𝐚𝐭𝐚 𝐌𝐨𝐝𝐞𝐥𝐥𝐢𝐧𝐠 comes in: an evolving vision for blending data modelling discipline with AI-native workflows.

📌 Key Takeaways
✅ Data modelling still matters → it’s the semantic backbone for engineers, analysts, and governance.
✅ Two modes exist → Constructive (adding features, dependencies, docs) and Destructive (removing models, cleaning up). Both are complex, even before AI.
✅ Challenges for AI/agents → context limits, missing catalogs, outdated docs, and technical debt can cripple LLM-driven modelling.
✅ Readiness checklist → solid foundations plus well-documented schemas (Star Schema, OBT, or normalisation, depending on trade-offs).

💡 𝐓𝐡𝐞 𝐕𝐢𝐛𝐞 𝐕𝐢𝐬𝐢𝐨𝐧
Level 1: Extend your AI IDE
Level 2: Add MCP power
Level 3: More tools

⬇️ Access the detailed guide on Modern Data 101's latest Substack, in collaboration with Alejandro Aboy: https://lnkd.in/dTeEgkew

The future of data modelling isn’t about AI replacing us, but about AI-augmented modelling that reduces technical debt and accelerates business value.

What’s your take: are we ready for Vibe Data Modelling, or are we still stuck in tool-first thinking? #DataModelling #DataStrategy
How Vibe Data Modelling can augment AI-driven data modelling
More Relevant Posts
AI has changed how coding is approached. But it still struggles to run a SQL query if the context is not spot on. Training datasets across the internet are full of ways to code a Tinder clone app, but there isn't a one-size-fits-all approach to data modeling that works in every company. Data governance and semantic layer documentation are still a mystery for lots of data roles, and AI will simply inherit that same confusion. The data space still has a lot of room for exploration and experimentation! The new comfort zone should be cross-functional roles that can translate business use cases into AI-ready assets. That's how we evolve the way decisions are made. Hope you enjoy both articles I prepared. And if you don't, check them out to see those lovely infographics! Thanks Modern Data 101
Day 1 of Big Data LDN is done, and I came back with incredible insights.

It opened with a powerful talk from the Head of Data Operations at the Infected Blood Compensation Authority. She shared how data is helping reconstruct stories and deliver long-overdue justice for those affected by the infected blood scandal (which I’ll admit I wasn’t aware of until today). More background here: https://lnkd.in/ev6XuZy8

Another standout session was “When Data Fails Women: The Uncomfortable Truth About Safety and Harm.” It explored the challenges around collecting reliable data on women’s safety. What struck me most was that the difficulties often come not from technical barriers, but from cultural and emotional ones. Tough to listen to, but essential to confront. The companion report, The Data Delta, is worth a look: https://lnkd.in/esUtrY3f

To balance things out, I also caught the hilarious Data Puppets with their satirical Data Potato, a supposed one-size-fits-all fix for any enterprise’s data problems.

Other highlights included:
• A session on the evolving role of data engineers in the AI era. The focus was on moving from being “plumbers” to becoming curators of data sources, using context engineering, LLMs, and MCPs.
• Product demos that stood out:
▫ lakeFS: Git-like version control for data
▫ Maia (by Matillion): a digital data engineer designed to handle repetitive tasks, assist with troubleshooting, and free engineers to focus on more advanced work
▫ Wolfram: a tool aiming to combine the creativity of generative AI with the rigour of computational thinking, reducing hallucinations in AI outputs

The day wrapped up with The Great Data Debate, Big Data LDN’s flagship event. Panellists discussed everything from integrating agentic AI into the enterprise, to modern pipeline architecture, to the future of AI governance.

I’m excited to see what Day 2 will bring and will share more highlights here.
How do you integrate AI/LLMs into Data Governance?

Well, a little over a year ago, I worked on a project where the goal was simple: reduce costs and improve governance in Snowflake. We approached it in two phases:

🔹 Phase 1 – Traditional Approach
Used metadata tables to identify inactive users and disable them. Flagged stale datasets not queried in months and moved them to cheaper storage or purged them. Manual scripts plus scheduled tasks got us a solid 20–30% cost reduction and tighter security.

🔹 Phase 2 – Early AI/LLM Adoption
Leveraged Snowflake’s Cortex AI functions (AI_CLASSIFY, AI_COMPLETE, etc.) to analyze usage logs and object metadata. LLMs helped classify tables into active, archival, or purge candidates based on usage patterns and documentation. Built AI-driven alerts for inactive users, idle warehouses, and redundant datasets. This second layer brought an additional 15–20% in savings and far less manual review.

Lesson learned: start with the basics, then layer in AI. The real magic comes when LLMs work alongside traditional governance: faster, smarter, but still compliant.

I’m curious: how are you (or your teams) using AI or LLMs to improve data governance, cost efficiency, or platform observability?

#DataGovernance #Snowflake #AI #CortexAI #GenAI #DataArchitecture #CostSavings
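In the spirit of the Phase 1 approach above, here is a minimal Python sketch that flags Snowflake users with no successful login in roughly 90 days, using the ACCOUNT_USAGE metadata views. This is not the author's actual implementation: the connection details, the 90-day threshold, and the print-only handling are assumptions, and the view and column names should be verified against your own account and role permissions.

```python
# Hypothetical sketch: flag Snowflake users inactive for 90+ days via ACCOUNT_USAGE metadata.
# Requires the snowflake-connector-python package and a role that can read SNOWFLAKE.ACCOUNT_USAGE.
import snowflake.connector

INACTIVITY_SQL = """
    SELECT name, last_success_login
    FROM snowflake.account_usage.users
    WHERE deleted_on IS NULL
      AND (last_success_login IS NULL
           OR last_success_login < DATEADD('day', -90, CURRENT_TIMESTAMP()))
    ORDER BY last_success_login NULLS FIRST
"""

def find_inactive_users(conn) -> list[tuple]:
    """Return (user_name, last_success_login) rows for users idle for 90+ days."""
    cur = conn.cursor()
    try:
        cur.execute(INACTIVITY_SQL)
        return cur.fetchall()
    finally:
        cur.close()

if __name__ == "__main__":
    # Connection details are placeholders; use your own account, auth method, and role.
    conn = snowflake.connector.connect(
        account="my_account", user="governance_bot", password="***", role="SECURITYADMIN"
    )
    try:
        for name, last_login in find_inactive_users(conn):
            # Review before acting; disabling users is a destructive governance step.
            print(f"Candidate for review: {name} (last successful login: {last_login})")
    finally:
        conn.close()
```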
🔹 From JSON to Tabular Insights: How I Leveraged GenAI Copilot

In modern data modeling, calculated tables are essential for creating meaningful analytics. While working with Tabular Editor, I encountered a challenge: a JSON file defining calculated tables that was nested, complex, and hard to read manually. Instead of struggling with manual parsing, I turned to GenAI Copilot, and it transformed the way I approached this task.

How GenAI Copilot Helped

• Parsing depth
Challenge: nested JSON objects were difficult to analyze
Copilot role: converted deeply nested JSON into a structured tabular format
Benefit: simplified understanding of complex calculated tables

• Granularity
Challenge: metrics were at different levels of detail (daily vs. monthly)
Copilot role: identified the optimal granularity during the conversion
Benefit: ensured accurate reporting and optimized model performance

• Format
Challenge: JSON readability and maintainability
Copilot role: reformatted and mapped JSON fields into a clean table
Benefit: improved clarity, maintainability, and usability of data

✨ Outcome
• Complex JSON converted into a clear tabular view instantly.
• Faster analysis of calculated tables for accurate KPI mapping.
• Reduced manual effort, allowing more time for insights and optimization.

💡 Takeaway
Generative AI Copilot isn’t just for code: it’s a practical assistant for data analysts, capable of transforming metadata and complex JSON structures into actionable tables. By integrating AI into data workflows, we can streamline modeling, reduce errors, and accelerate delivery. Microsoft
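As a rough companion to the post above, here is a small Python sketch of the same idea done by hand: flattening a nested JSON definition of calculated tables into a tabular view with pandas. The JSON layout is invented for illustration and is not the actual Tabular Editor model format or the Copilot output.

```python
# Hypothetical sketch: flatten a nested JSON definition of calculated tables into a flat table.
# The JSON structure below is illustrative, not the real Tabular Editor / model.bim schema.
import json
import pandas as pd

raw = """
{
  "calculatedTables": [
    {"name": "SalesDaily",
     "granularity": "daily",
     "columns": [{"name": "Revenue", "expression": "SUM(Sales[Amount])"},
                 {"name": "Orders",  "expression": "COUNTROWS(Sales)"}]},
    {"name": "SalesMonthly",
     "granularity": "monthly",
     "columns": [{"name": "Revenue", "expression": "SUM(Sales[Amount])"}]}
  ]
}
"""

model = json.loads(raw)

# json_normalize explodes the nested "columns" list so each metric becomes one row,
# while keeping the parent table's name and granularity as ordinary columns.
flat = pd.json_normalize(
    model["calculatedTables"],
    record_path="columns",
    meta=["name", "granularity"],
    record_prefix="column_",
    meta_prefix="table_",
)

print(flat[["table_name", "table_granularity", "column_name", "column_expression"]])
```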
🎯 𝗘𝘅𝘁𝗿𝗮𝗰𝘁 𝗕𝘂𝘀𝗶𝗻𝗲𝘀𝘀 𝗖𝗼𝗻𝘁𝗲𝘅𝘁 𝗳𝗿𝗼𝗺 𝗟𝗟𝗠-𝗯𝗮𝘀𝗲𝗱 𝗦𝗲𝗺𝗮𝗻𝘁𝗶𝗰 𝗠𝗲𝘁𝗮𝗱𝗮𝘁𝗮 𝗔𝗻𝗮𝗹𝘆𝘀𝗶𝘀

If you're still crafting SQL to understand field meanings, you’re not alone. Many data engineers continue to spend excessive time:
→ Scanning schemas
→ Manually defining semantic models
→ Coding quality checks field by field

That was static metadata. With agentic AI, things transform:
➡️ Schemas are identified automatically
➡️ Fields are categorized with business context
➡️ Initial rules (nulls, ranges, integrity) are applied immediately
➡️ Coverage updates dynamically in your business notebook

It’s more than a map. It’s an intelligent, evolving context layer.

❇️ And here’s why it matters: 42% of enterprises extract data from over eight sources for AI workflows. Such complexity disrupts static metadata models. To construct reliable AI, you need metadata that acts: semantic context that evolves over time.

#AgenticAI #DataManagement #DataQuality #DataObservability #AIReadyData #semanticmetadata
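To make the "initial rules (nulls, ranges, integrity)" step above concrete, here is a minimal Python sketch that applies baseline quality checks driven by a metadata dictionary. The metadata structure, rules, and sample data are all illustrative assumptions, not the behaviour of any specific agentic product.

```python
# Hypothetical sketch: apply baseline quality rules (nulls, ranges, referential integrity)
# described by a metadata dictionary. Metadata layout and demo data are illustrative only.
import pandas as pd

FIELD_METADATA = {
    "order_id": {"nullable": False},
    "amount":   {"nullable": False, "min": 0, "max": 1_000_000},
    "country":  {"nullable": True,  "ref": {"table": "ref_country", "column": "code"}},
}

def check_quality(df: pd.DataFrame, metadata: dict, refs: dict) -> list[str]:
    """Return human-readable violations of null, range, and integrity rules."""
    issues = []
    for col, rules in metadata.items():
        if not rules.get("nullable", True) and df[col].isna().any():
            issues.append(f"{col}: unexpected NULLs")
        if "min" in rules and (df[col] < rules["min"]).any():
            issues.append(f"{col}: values below {rules['min']}")
        if "max" in rules and (df[col] > rules["max"]).any():
            issues.append(f"{col}: values above {rules['max']}")
        if "ref" in rules:
            ref = rules["ref"]
            valid = set(refs[ref["table"]][ref["column"]])
            if not df[col].dropna().isin(valid).all():
                issues.append(f"{col}: values missing from {ref['table']}.{ref['column']}")
    return issues

# Tiny demo with made-up data: a negative amount and an unknown country code get flagged.
orders = pd.DataFrame({"order_id": [1, 2], "amount": [99.5, -4.0], "country": ["GB", "XX"]})
ref_country = pd.DataFrame({"code": ["GB", "US"]})
print(check_quality(orders, FIELD_METADATA, {"ref_country": ref_country}))
```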
Data teams: 3 reasons why separation of concern is vital for ML/AI outcomes

In ML and AI, the old maxim still applies: garbage in, garbage out. APIs can help for point-to-point integrations, but when data must flow across an ecosystem, they fall short. Even when a data bus is used, too often teams push raw data and then try to retrofit context with complex translation rules. That’s slow, fragile, and error-prone.

Here’s why separation of concern matters:

1. Context at the point of publishing
Senders should align with a shared schema before data leaves their system. That way, every consumer reads the same structure and doesn’t waste time reverse-engineering meaning.

2. Universal signals alongside domain detail
Domain expertise will always matter, but adding common signals, like a severity score, up front gives data scientists a head start. They can explore patterns system-wide without first untangling raw telemetry.

3. Normalised data fuels automation
When data is structured and scored at source, it’s instantly usable for ML training and inference. This accelerates AI outcomes and enables cross-domain automation.

At NetMinded, this is how we’ve built MNOC from day one. Our toolkit gives data engineers the ability to create pipelines that data owners can trust and use directly, because separation of concern isn’t an afterthought, it’s the foundation.

If you’re tackling these challenges in your own data ecosystem, let’s talk. Reach out and let’s explore how MNOC can support your team.
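Here is a minimal sketch of points 1 and 2 above, assuming a JSON event bus: the sender validates against a shared schema and attaches a common severity score before publishing. The schema, thresholds, and PrintBus stand-in are invented for illustration and are not MNOC's actual design.

```python
# Hypothetical sketch: validate an event against a shared schema and attach a universal
# severity signal at the point of publishing. Schema and scoring rule are illustrative.
import json
from jsonschema import validate

SHARED_EVENT_SCHEMA = {
    "type": "object",
    "required": ["source", "metric", "value", "unit", "timestamp"],
    "properties": {
        "source":    {"type": "string"},
        "metric":    {"type": "string"},
        "value":     {"type": "number"},
        "unit":      {"type": "string"},
        "timestamp": {"type": "string"},
    },
}

def severity(value: float, warn: float, critical: float) -> int:
    """Map a raw reading onto a shared 0-2 severity scale (thresholds are domain-specific)."""
    if value >= critical:
        return 2
    return 1 if value >= warn else 0

def publish(event: dict, bus) -> None:
    """Validate against the shared schema, enrich with severity, then hand to the bus."""
    validate(instance=event, schema=SHARED_EVENT_SCHEMA)   # fail fast at the sender
    event["severity"] = severity(event["value"], warn=75.0, critical=90.0)
    bus.send(json.dumps(event))                            # `bus` is any transport with .send()

class PrintBus:
    """Stand-in transport so the sketch runs without a real message bus."""
    def send(self, payload: str) -> None:
        print("published:", payload)

publish(
    {"source": "edge-router-12", "metric": "cpu_util_pct", "value": 93.0,
     "unit": "percent", "timestamp": "2025-01-15T10:30:00Z"},
    PrintBus(),
)
```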
Hey Data Product, Where Are You Hiding?

For years, Data Products played shy: tucked away in catalogs, defined by YAML, and politely ignored outside the data team. Now with GenAI exploding into the enterprise, everyone’s whispering: “Is the Data Product dead?”

Not even close. In fact, this is where Data Products finally come out of hiding. The twist nobody is talking about: the Feature Store is their secret sidekick.

• Structured + Unstructured: Yesterday’s Data Product wrapped a fact table. Today, it needs to fuse policy records with call transcripts, scanned PDFs, even embeddings from emails. This is where Feature Stores evolve: not just numeric encodings for ML, but unstructured signals and embeddings too.
• Static → Dynamic: Data Products set the stage with semantics and governance. Feature Stores feed them with dynamic signals: the real-time churn scores, fraud risk features, or embeddings that keep GenAI grounded.
• Catalog → Context: GenAI doesn’t want to “browse” data assets. It wants context, streamed directly. Data Products define that context. Feature Stores make it consumable at runtime.
• Governance → Advantage: Without Data Products, Feature Stores are just plumbing. Without Feature Stores, Data Products are just PDFs in a catalog. Together, they form the only trustworthy bridge between raw enterprise data and GenAI answers.

So when we shout “Hey Data Product, where are you hiding?”, the answer shouldn’t be “in a dusty catalog.”

👉 It should be: “I’m right here, powered by a Feature Store, delivering structured and unstructured signal straight into GenAI.”

This isn’t the end of Data Products. It’s the moment they finally step onto the main stage, with Feature Stores running the spotlight.

#tPower #ml #ai #GenAI4FS #inc81starch
What would it mean for your business if you freed up half of your AI budget next year?

Developing AI-based solutions is often frustrating for the data scientist. Up to 70% of almost every project is just scraping together data, cleaning it, and tying it all together. To make things even worse, this is done on a per-project basis, with very few reusable assets being formed along the way.

I’ve been watching this inefficiency and outright waste of money for years while a better alternative exists right under our noses: with proper pipes and by thinking of data as a product, you can start to create a portfolio of reusable, high-quality, and governed data assets that deliver data-intensive applications up to 90% faster, with lower costs to boot.

And no, a data lake does not solve the problem. It can be part of the solution, but by itself it doesn’t bring any tangible business outcomes.

By creating a catalog of data products with a clear contract on what each one serves, who owns it, and how to access it, you create use for it, and you accidentally solve the data governance and ownership problems too.

I’ll happily discuss this more if you’re interested. Meshly Oy provides open-source based data control solutions to serve up-to-date, curated data for your AI & BI needs, regardless of where your data lives.
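To illustrate what "a clear contract on what it serves, who owns it, and how to access it" could look like in practice, here is a small hypothetical sketch of a data product contract and catalog in Python. The field names and example product are assumptions, not Meshly Oy's implementation or any formal data-contract standard.

```python
# Hypothetical sketch of a minimal data product contract and catalog.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class DataProductContract:
    name: str                     # stable identifier consumers depend on
    description: str              # what the product serves, in business terms
    owner: str                    # accountable team or person
    access: str                   # how to reach it (endpoint, table, topic)
    schema: dict = field(default_factory=dict)   # column name -> type
    freshness_sla: str = "daily"  # how often consumers can expect updates

CATALOG: dict = {}

def register(contract: DataProductContract) -> None:
    """Add a product to the catalog so it is discoverable and has a named owner."""
    CATALOG[contract.name] = contract

register(DataProductContract(
    name="customer_churn_features",
    description="Curated churn indicators per customer, ready for ML and BI",
    owner="crm-analytics-team",
    access="warehouse://analytics.churn_features",
    schema={"customer_id": "string", "churn_score": "float", "last_purchase": "date"},
))

print(CATALOG["customer_churn_features"].owner)
```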
Beyond ARIMA: Unlocking Rapid, Data-Driven Forecasts for Business & Compliance

In today's fast-paced business environment, waiting weeks to deliver critical forecasts simply isn't an option. Whether it's predicting financial market trends, optimising inventory for supply chain resilience, or anticipating regulatory changes, the demand for timely, accurate insights is constant. Data scientists often face a dilemma: deep model tuning for perfection versus delivering actionable insights now.

This isn't a call to cut corners, but rather a strategic embrace of what I call the "intelligent automation" mindset. Instead of reinventing the wheel with painstaking traditional methods, we can leverage powerful AI/ML tools to deliver robust forecasts with remarkable speed and efficiency. This approach is key to transforming data into tangible business value and strengthening compliance postures.

My experience in AI/ML for financial services has shown that Python's rich ecosystem offers incredible solutions. Libraries like Prophet excel at capturing business-specific trends and seasonality with minimal effort, providing quick, reliable baselines. For those needing a bit more statistical depth without the manual grind, Auto ARIMA automates parameter selection, turning complex models into plug-and-play solutions. And when the enterprise calls for ultimate automation, AutoML platforms like H2O or Azure AutoML can ingest raw time series data and deliver optimised models with unprecedented speed, ready for deployment in MLOps pipelines.

By adopting these "lazy" (or rather, smart) forecasting methods, we empower organisations to make quicker, more informed decisions. This translates directly into reducing operational risks, optimising financial planning, and even automating aspects of compliance monitoring where predictive insights are crucial. Imagine predicting potential breaches in service level agreements or identifying anomalous financial transactions with speed and accuracy. It’s about leveraging predictive insights to drive both significant business impact and robust governance.

What are your go-to tools for balancing forecasting accuracy with speed? Share your insights!

Inspired by "The Lazy Data Scientist’s Guide to Time Series Forecasting"

📌 Ultimately, shifting from painstaking model tuning to rapid Python forecasting isn't just about speed; it's about unlocking the agility to truly shape strategy and confidently navigate complexity. For me, seeing how quickly we can deliver robust, actionable insights to drive compliance and business value is where the real magic happens.

#DataScience #AIForecasting #RiskAnalytics #ComplianceAutomation #FinancialML #AI #FinancialAnalytics
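To ground the library mentions above, here is a minimal sketch of the two quick-baseline approaches the post names, Prophet and Auto ARIMA (via pmdarima), run on a synthetic monthly series. The data, forecast horizon, and settings are illustrative defaults, not a recommended production configuration.

```python
# Illustrative baseline forecasts with Prophet and pmdarima's auto_arima.
# Requires: pip install prophet pmdarima pandas numpy
import numpy as np
import pandas as pd
from prophet import Prophet
import pmdarima as pm

# Synthetic monthly series with trend + yearly seasonality (stand-in for real business data).
dates = pd.date_range("2019-01-01", periods=60, freq="MS")
values = 100 + 0.8 * np.arange(60) + 10 * np.sin(2 * np.pi * np.arange(60) / 12)
df = pd.DataFrame({"ds": dates, "y": values})

# 1) Prophet: fit on a ds/y frame, then forecast 12 months ahead.
m = Prophet()
m.fit(df)
future = m.make_future_dataframe(periods=12, freq="MS")
prophet_forecast = m.predict(future)[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(12)

# 2) Auto ARIMA: automatic order selection with monthly seasonality.
arima_model = pm.auto_arima(df["y"], seasonal=True, m=12, suppress_warnings=True)
arima_forecast = arima_model.predict(n_periods=12)

print(prophet_forecast.head())
print(arima_forecast[:3])
```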
Data Lineage Without the Heavy Tools

When a reference table changes, the first question every analyst asks is: “What will break downstream?”

Traditionally, answering that means waiting for enterprise lineage tools, or manually untangling dependencies across dozens of tables. Both slow, both frustrating. This month I tested a different approach: using Gemini AI to create a fast, analyst-friendly lineage map.

Example (fictional demo data)
I provided Gemini with three table schemas:
• transactions (tx_id, customer_id, country_code, amount)
• customers (customer_id, home_country_code)
• ref_country (code, country_name, region)

Gemini output (summary):
• transactions.country_code → depends on ref_country.code
• transactions.customer_id → depends on customers.customer_id
• customers.home_country_code → depends on ref_country.code
• Impact note: any change to ref_country.code affects both transactions and customers.

Why This Matters
• Faster impact analysis → analysts can spot dependencies in minutes.
• Less bottlenecked → you don’t wait weeks for central lineage tools.
• More proactive → governance boards see risks before controls break.

Takeaway: AI doesn’t replace enterprise lineage tooling, but it gives analysts a first-pass map to navigate dependencies. That agility means fewer surprises when reference tables shift.

Question for you: if you could map lineage in 10 minutes, what would you use it for: audits, forecasting, or change management?

#DataQuality #AI #Governance #Lineage #Gemini #Analytics
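For readers who want to try the same experiment, here is a small sketch of how such a prompt could be sent with the google-generativeai Python SDK. The model name, prompt wording, and schema strings are illustrative assumptions, not the author's exact workflow or output.

```python
# Hypothetical sketch: ask Gemini for a first-pass lineage map from table schemas.
# Requires: pip install google-generativeai, plus a GOOGLE_API_KEY in the environment.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

SCHEMAS = """
transactions (tx_id, customer_id, country_code, amount)
customers (customer_id, home_country_code)
ref_country (code, country_name, region)
"""

PROMPT = f"""You are a data lineage assistant.
Given these table schemas, list column-level dependencies in the form
'table.column -> depends on table.column', then add a short impact note
for any reference table whose change affects multiple tables.

Schemas:
{SCHEMAS}
"""

# Model name is illustrative; use whichever Gemini model your account has access to.
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(PROMPT)
print(response.text)
```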
More from Alejandro: 𝐀𝐠𝐞𝐧𝐭𝐢𝐜 𝐃𝐚𝐭𝐚 𝐒𝐭𝐚𝐜𝐤: 𝐀 𝐍𝐞𝐰 𝐄𝐫𝐚 𝐅𝐨𝐫 𝐃𝐚𝐭𝐚 𝐏𝐫𝐨𝐟𝐞𝐬𝐬𝐢𝐨𝐧𝐚𝐥𝐬 https://moderndata101.substack.com/p/agentic-data-stack-new-era-for-data-professionals