Big Data LDN Day 1: Insights on Data, AI, and Ethics

Day 1 of Big Data LDN is done, and I came back with incredible insights.

It opened with a powerful talk from the Head of Data Operations at the Infected Blood Compensation Authority. She shared how data is helping reconstruct stories and deliver long-overdue justice for those affected by the infected blood scandal (which I’ll admit I wasn’t aware of until today). More background here: https://lnkd.in/ev6XuZy8

Another standout session was “When Data Fails Women: The Uncomfortable Truth About Safety and Harm.” It explored the challenges of collecting reliable data on women’s safety. What struck me most was that the difficulties often stem not from technical barriers, but from cultural and emotional ones. Tough to listen to, but essential to confront. The companion report, The Data Delta, is worth a look: https://lnkd.in/esUtrY3f

To balance things out, I also caught the hilarious Data Puppets with their satirical Data Potato: a supposed one-size-fits-all fix for any enterprise’s data problems.

Other highlights included:
• A session on the evolving role of data engineers in the AI era. The focus was on moving from being “plumbers” to becoming curators of data sources, using context engineering, LLMs, and MCPs.
• Product demos that stood out:
▫ lakeFS: Git-like version control for data
▫ Maia (by Matillion): a digital data engineer designed to handle repetitive tasks, assist with troubleshooting, and free engineers to focus on more advanced work
▫ Wolfram: a tool aiming to combine the creativity of generative AI with the rigour of computational thinking, reducing hallucinations in AI outputs

The day wrapped up with The Great Data Debate, Big Data LDN’s flagship event. Panellists discussed everything from integrating agentic AI into the enterprise, to modern pipeline architecture, to the future of AI governance.

I’m excited to see what Day 2 will bring and will share more highlights here.
More Relevant Posts
Your AI strategy is only as strong as your data. 📊

Data quality issues are forcing data science teams to waste 80% of their time on cleanup. This dramatically slows down model deployment and cuts into ROI. Discover how Zparse leverages AI-powered, no-code ETL to instantly provide the clean, consistent data foundation your machine learning models or RAG systems need for accurate results.

Read our latest blog post to see the path from data chaos to competitive advantage: https://lnkd.in/emxGUzUb

#AI #DataFoundation #NoCode #DataStrategy
How do you integrate LLMs into data governance? Well, a little over a year ago, I worked on a project where the goal was simple: reduce costs and improve governance in Snowflake. We approached it in two phases:

🔹 Phase 1 – Traditional Approach
Used metadata tables to identify inactive users and disable them. Flagged stale datasets not queried in months and moved them to cheaper storage or purged them. Manual scripts + scheduled tasks got us a solid 20–30% cost reduction and tighter security.

🔹 Phase 2 – Early AI/LLM Adoption
Leveraged Snowflake’s Cortex AI functions (AI_CLASSIFY, AI_COMPLETE, etc.) to analyze usage logs + object metadata. LLMs helped classify tables into active, archival, or purge candidates based on usage patterns and documentation. Built AI-driven alerts for inactive users, idle warehouses, and redundant datasets. This second layer brought an additional 15–20% in savings and far less manual review. (A sketch of the classification step follows below.)

Lesson learned: start with the basics, then layer in AI. The real magic comes when LLMs work alongside traditional governance: faster, smarter, but still compliant.

I’m curious: how are you (or your teams) using AI or LLMs to improve data governance, cost efficiency, or platform observability?

#DataGovernance #Snowflake #AI #CortexAI #GenAI #DataArchitecture #CostSavings
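A minimal sketch of what that Phase 2 classification step could look like, assuming the AI_CLASSIFY function named above. The connection details, metadata columns, and category labels are illustrative placeholders, and the exact function signature and return shape should be checked against current Snowflake Cortex docs:

```python
# Hypothetical sketch: label tables as active / archival / purge candidates
# using Snowflake Cortex's AI_CLASSIFY over object metadata. Schema, columns,
# and categories are placeholders, not production code.
import snowflake.connector

conn = snowflake.connector.connect(
    account="YOUR_ACCOUNT", user="YOUR_USER", password="YOUR_PASSWORD",
    warehouse="GOVERNANCE_WH",
)

SQL = """
SELECT
    table_schema,
    table_name,
    AI_CLASSIFY(
        'Table ' || table_name || ', last altered ' || last_altered ||
        ', rows: ' || COALESCE(row_count, 0),
        ['active', 'archival', 'purge_candidate']
    ) AS lifecycle_label
FROM snowflake.account_usage.tables
WHERE deleted IS NULL
"""

with conn.cursor() as cur:
    # Each row pairs a table with the model's suggested lifecycle label.
    for schema, name, label in cur.execute(SQL):
        print(f"{schema}.{name}: {label}")
conn.close()
```

In practice you would feed richer context (query history, documentation strings) into the prompt and route the labels into alerts rather than printing them.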
Great post from Tomasz Tunguz on the importance of Data + AI observability. "The entire system needs accurate data & fast. That’s why data observability will also fuse with AI observability to provide data engineers & AI engineers end-to-end understanding of the health of their pipelines. Data & AI infrastructure aren’t converging. They’ve already fused." Check out the full article here: https://lnkd.in/gw9vacx4
Not long ago, every conversation was about Big Data. Then it was all about AI. But now? We’re back to talking about semantic layers and reporting layers. In other words: the fundamentals.

I started my career as a coder/developer and designer of large-scale application systems, then pivoted to BI. What I learned early on was that the job was about making sure the data was clean, consistent, and trusted. That work wasn’t glamorous, but it was the foundation that made strategy and innovation possible.

And today, based on first-hand experience, the same truth holds:
- Clear, consistent metric definitions
- A governance framework that ensures trust
- Reporting layers that people actually use

The hype cycles change: Big Data, AI, you name it. But the basics don’t. Without a strong foundation, none of the big ideas matter.
🚀 Beyond the Dashboard: The New Era of AI-Powered Data

In today’s AI-first world, it’s not just about having data; it’s about moving, governing, and learning from it in intelligent, secure ways. The latest blog explores how roles like Data Engineer, Data Analyst, and Data Architect are evolving, and why the organizations that win will be those that build smarter, ethical, real-time data systems.

🔗 Dive in here: https://lnkd.in/gAySA-hi

Masscom Corporation
#DataEngineering #AI #RealTimeAnalytics #DataGovernance #MasscomCorporation #DigitalTransformation
Don't miss this post from Tomasz Tunguz on the convergence of data and AI infrastructure—and why observability is the foundation. "The entire system needs accurate data & fast. That’s why data observability will also fuse with AI observability to provide data engineers & AI engineers end-to-end understanding of the health of their pipelines. Data & AI infrastructure aren’t converging. They’ve already fused." Check out the full article here: https://lnkd.in/ghwyqt3b
🎥 Enterprise AI Crossroads: choosing the right GenAI path in 2025

Most teams aren’t failing on models. They’re failing on fit. In this video, I walk through a 4-question decision tree that helps enterprise leaders pick the right implementation, fast. (A sketch encoding the tree as code follows after this post.)

TL;DR: Ask these four questions in order.

Q1 – Data Currency & Citability: Do we need up-to-date, citeable facts from internal sources?
→ Yes: Start with RAG (hybrid retrieval + reranker). Layer Graph-RAG for multi-hop questions.
→ No: Go to Q2.

Q2 – Behavior, Format & Scale: Do we require strict behavior (tool use, JSON schema) or a smaller/cheaper model at scale?
→ Yes: Fine-tune (PEFT + DPO). Optionally add light RAG later for grounding.
→ No: Strong prompting or RAG-lite may be enough.

Q3 – Content Volatility: Are sources changing weekly or daily?
→ High: Prioritize RAG. Re-embed only “hot” data on a schedule.
→ Low/stable: RAG or fine-tuning can work; run a quick break-even on tokens, infra, and data labeling.

Q4 – Query Complexity: Do users ask multi-hop, relationship-heavy questions (people ↔ orgs ↔ assets)?
→ Yes: Add Graph-RAG (knowledge graph + retrieval) for precision and traceability.

Practical playbook:
• Start lean: RAG first for changing data; fine-tune when format, latency, or scale demand it.
• Instrument evals: accuracy, latency, cost per answer, and % of answers with citations.
• Design for ops: pipelines for chunking, re-embeds, guardrails, and model refresh.
• Don’t guess; measure the break-even between data program cost and serving cost.

If you’re at the crossroads, this framework will save months of trial and error.

🔗 Watch the full breakdown in the video and tell me: where does your use case land: RAG, Fine-tune, or Graph-RAG?

#EnterpriseAI #GenAI #RAG #FineTuning #GraphRAG #AIProductStrategy #MLOps #DataStrategy #LLMEngineering
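For readers who prefer code to flowcharts, here is one way to encode the four questions as a small, self-contained function. The field names and return strings are shorthand for the post’s recommendations, not an official framework:

```python
# A minimal sketch encoding the 4-question decision tree from the post.
# Inputs are booleans you answer for your use case; the output is a
# suggested starting architecture, not a definitive recommendation.
from dataclasses import dataclass

@dataclass
class UseCase:
    needs_current_citable_facts: bool    # Q1: data currency & citability
    needs_strict_behavior_or_scale: bool # Q2: tool use, JSON schema, cheap at scale
    sources_change_frequently: bool      # Q3: content volatility
    multi_hop_queries: bool              # Q4: relationship-heavy questions

def recommend(uc: UseCase) -> str:
    if uc.needs_current_citable_facts:
        base = "RAG (hybrid retrieval + reranker)"
    elif uc.needs_strict_behavior_or_scale:
        base = "Fine-tune (PEFT + DPO), optionally light RAG later for grounding"
    elif uc.sources_change_frequently:
        base = "RAG with scheduled re-embeds of hot data"
    else:
        base = "Strong prompting or RAG-lite; run a break-even analysis"
    if uc.multi_hop_queries:
        base += " + Graph-RAG for multi-hop precision and traceability"
    return base

# Example: current citable facts needed, volatile sources, multi-hop queries.
print(recommend(UseCase(True, False, True, True)))
```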
"According to the Anthropic data, coding tasks dominate API usage at 44%. The sheer scale of this concentration points less to model capabilities and more to a simple reality: it also reflects where existing infrastructure makes automation feasible." Fantastic opening from Ben Lorica 罗瑞卡, cutting through the noise: automation in data engineering will accelerate if we treat the data lifecycle as code-first, not GUI-first. The #lakehouse “complexity tax” humans pay today is the biggest bottleneck to real automation for many #dataengineering tasks: #LLMs are pretty good, but many data abstractions from a previous era haven’t kept up. Thanks, Ben, for explaining the needs of #agentic data engineering so clearly: APIs, isolation, and versioning are the missing ingredients. It’s early days, but we’re proud of bauplan customers for testing the waters with MCP, coding assistants, and agentic loops over *production data*. See you, #agentic cowboys!
🆕 Why AI Agents Struggle with Your Data Stack 🎯 Inside the Architectural Shift Toward Agent-Native Platforms (with Ciro Greco of bauplan) → https://lnkd.in/gMyRT4UR
I was looking into the recent announcement that Anthropic #Claude #Sonnet 4.5 is now available on Databricks. This integration aims to streamline workflows for data teams by embedding AI capabilities directly into existing processes.

Read more 👉 https://lnkd.in/e6zCPWy9

Here are a few points I found noteworthy:
📊 #Simplified #ETL: With Lakeflow, teams can automate ETL processes that use Claude 4.5 for tasks like summarization, classification, and data enrichment.
📈 #Scalable #Analysis: The ability to run complex reasoning over millions of rows, PDFs, and other unstructured data directly in DBSQL is a considerable advantage.
💡 #Actionable #Insights: Results can be stored securely in Delta tables, creating a smoother path from analysis to action.

I’m interested to see how embedding these functions directly in the data platform will impact the speed of developing and deploying GenAI applications. (A sketch of the DBSQL pattern follows below.)

#GenerativeAI #DataAnalytics #Databricks #AI #MachineLearning #BigData
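As an illustration of that DBSQL pattern, here is a minimal sketch assuming Databricks’ ai_query() function. The endpoint name, source table, and output table are hypothetical placeholders to adapt to your workspace:

```python
# Hypothetical sketch: batch inference over a table with Databricks ai_query(),
# writing results to a Delta table. Endpoint and table names are placeholders;
# check your workspace for the exact serving endpoint name.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

summaries = spark.sql("""
    SELECT
        ticket_id,
        ai_query(
            'databricks-claude-sonnet-4-5',  -- model serving endpoint name
            'Summarize this support ticket in one sentence: ' || body
        ) AS summary
    FROM support_tickets
""")

# Persist results as a Delta table for downstream analysis.
summaries.write.mode("overwrite").saveAsTable("support_ticket_summaries")
```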
The task was to create a ✨ generalized, reusable tool ✨ that automates one of the most tedious parts of data science: creating a mapping schema for raw categorical data. The new command takes any dataset, finds all unique values in a field, and uses an LLM to generate a clean, structured category map, saving me countless hours of manual work. 🤖 (See the sketch after this post for the core idea.)

This wasn’t just a simple script. We gave the AI a high-level “vibe” and some architectural context, and it executed a full engineering sequence:
1️⃣ Scaffolding a new database migration.
2️⃣ Updating the Eloquent model.
3️⃣ Creating the new, complex Artisan command from scratch.
4️⃣ Modifying the database seeder to use the new logic.
...all while (almost) flawlessly adhering to our project’s specific patterns. ✔️

But a cool tool is only half the story. 🎯 The real value is what it unlocks. To test it, we immediately pointed this new tool at a dataset I care about personally: my hometown of Everett’s police dispatch logs. 📍

The insights it helped uncover for 2025 were eye-opening:
📉 Detail / Patrol Activity: down a staggering 45.68%
📈 Warrant Service: up an incredible 157.35%

This is the real power of this workflow. We went from a complex feature idea to a statistically significant insight about my own community in a fraction of the time. ⚡

The attached paper is our deep dive into formalizing this process. 🔬 It breaks down the workflow, the math behind the AI’s “attention mechanism,” and showcases the raw data output. It’s a testament to how this new paradigm isn’t just about developer productivity; it’s about accelerating the path to discovery. Check out the full paper attached below, with an appendix detailing trends in Everett crime over the last five years. 👇

🤔 Beyond just writing code faster, what’s the most valuable, non-obvious outcome you’ve achieved using AI in your development workflow?

#AI #SoftwareDevelopment #LLM #DeveloperTools #DataAnalytics #CaseStudy #Laravel #GenAI #FutureOfWork #AIinDev #FormalMethods
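The post’s tool is a Laravel Artisan command; as a language-agnostic illustration of the core idea, here is a hedged Python sketch where call_llm is a hypothetical stand-in for whatever LLM client you use:

```python
# Illustrative sketch, not the post's actual implementation: collect the
# unique values in a raw categorical field and ask an LLM to produce a
# clean, structured category map as JSON.
import json

def build_category_map(values, call_llm):
    # Deduplicate and normalize the raw values before prompting.
    unique_values = sorted({v.strip() for v in values if v and v.strip()})
    prompt = (
        "Group these raw category values into a small set of clean, "
        "canonical categories. Reply only with JSON mapping each raw "
        "value to its category:\n" + "\n".join(unique_values)
    )
    return json.loads(call_llm(prompt))

raw = ["DETAIL", "detail/patrol", "Warrant Svc", "WARRANT SERVICE"]
# mapping = build_category_map(raw, call_llm)
# e.g. {"Warrant Svc": "Warrant Service", "WARRANT SERVICE": "Warrant Service", ...}
```

A real version would validate the returned JSON and flag unmapped values for human review rather than trusting the model’s output blindly.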