Introducing Lakehouse Analytics with Snowflake!
Achieve the performance and security of a data warehouse directly on your data lake, all without moving or copying data. Snowflake’s single engine allows you to connect data in-place, reliably powering BI, AI, and more, all on a platform that just works to supercharge decision-making with AI you can trust.
Watch the full demo from Vino Duraisamy to see how Snowflake’s single engine and built-in governance help you build a simpler, more powerful data architecture.
Here’s what you'll learn:
❄️ Iceberg Discovery: How to automatically discover and query fresh Iceberg tables directly from your AWS Glue Data Catalog via a catalog-linked database.
❄️ Enterprise Governance: How to apply and manage consistent, enterprise-grade governance and fine-grained access policies using Snowflake Horizon.
❄️ AI-Powered Access: How to talk to your lakehouse data with Snowflake Intelligence or securely access it via Data Share.
Learn more about Lakehouse Analytics with Snowflake: https://lnkd.in/gbhjXAGb
All right everyone, I'm Vino Duraisamy and I'm a developer advocate at Snowflake. Let's jump straight into the live demo. It's going to be fun, I promise you that. For the demo, I want to start with the end result, the wow moment, and then I will show you how we got there.

Imagine you're a product quality associate at an ecommerce company. You're constantly monitoring product quality through a product ratings dashboard. It's been a fine Friday afternoon when you notice a dip in the star rating of one of your top-selling product categories. Well, you need to understand why, and you need to know now. Your options: Slack your favorite data analyst to quickly pull up the data for you (what a fun Friday evening that makes for the both of you), or file a ticket and wait a couple of days so your data team can investigate. Well, what if there is a third option, one that does not ruin your TGIF vibes? Thanks to Snowflake Intelligence, you can talk to your data directly and supercharge your decision making. As a business user and a data customer, you can now get all your questions answered instantly. And as the data analyst, you can build these self-serve apps that free you from random ad hoc requests from the business. Sounds too good to be true? Well, it's good, and it's true. Let's see if we can find answers to our questions. After that, I will show you the architecture that powers this type of analytics on the data in your lakehouse without moving the data anywhere.

Firstly, which product categories have the highest average ratings? It seems CDs, vinyl and so on look great. Hmm, interesting. Then I ask it to give me the volume of the ratings as well. As you can see, it's working through the data. And here we go: books have the highest rating by rating volume. Very cool. Firstly, we found answers to our questions, which is great. Secondly, both you and your data analyst can be confident in these answers, because Snowflake Intelligence shows you the query it ran against your lakehouse. So, well, talk about trustworthy and reliable insights. Very cool, isn't it?

Now, this experience is fantastic, but for many of us the reality is different. Your valuable data, like product reviews, often lives in your own data lake in formats like Apache Iceberg, but your analytics tools are a separate, complex puzzle: picking the right analytics engine, tuning and managing complex jobs, choosing the right visualization tool, even training your business users to use them effectively. It's a constant struggle. It creates performance bottlenecks, creates data silos and slows down your team. Not fun at all. What if you could bypass all of that complexity? Imagine getting the high performance and robust security of a data warehouse directly on your existing lakehouse. That is the core promise of lakehouse analytics with Snowflake.

Let's quickly break down the architecture that's powering these AI insights. The product reviews data is in an AWS S3 bucket in Apache Iceberg table format, and it is managed by the AWS Glue catalog. This is a pretty typical setup, as customers have data across multiple formats and catalogs. With this lakehouse analytics blueprint, you can securely analyze all of your data using Snowflake's single high-performance engine, with no data movement required.
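As a concrete illustration of the transparency mentioned above, the SQL that Snowflake Intelligence surfaces alongside an answer like "books have the highest rating by volume" would look roughly like this. This is an illustrative sketch only; the table and column names are placeholders, not the demo's actual schema.

```sql
-- Average star rating and review volume per category, the kind of query
-- Snowflake Intelligence displays so you can verify the answer it gave.
SELECT
    product_category,
    COUNT(*)         AS review_volume,
    AVG(star_rating) AS avg_rating
FROM product_reviews            -- hypothetical Iceberg table in the lakehouse
GROUP BY product_category
ORDER BY avg_rating DESC;
```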
Using catalog-linked databases, we then layer on Snowflake Intelligence, which empowers your business users, such as our product quality associates team, to self-serve immediate insights. And to top it all, the entire process is governed by Snowflake's Horizon catalog, which applies consistent, fine-grained access policies to ensure users only see the data they're permitted to.

We're starting with the raw product reviews data, which is landed and stored in an AWS S3 bucket as our central data lake. The key to this architecture is Iceberg, an open table format that provides database-like functionality directly on top of your files in S3. To manage this data, we use AWS Glue as our metadata catalog, which we connect to via a REST catalog interface. To directly analyze the data in your S3 lakehouse from Snowflake, you can create Iceberg tables in Snowflake and leverage its Iceberg capabilities for analytics. For a couple of tables, that's easy. But what if you have tens or hundreds of Iceberg tables in your catalog? Listing each Iceberg table in Snowflake would be a cumbersome process.

This is where the catalog-linked database comes in. The catalog-linked database simply points to your lakehouse data. As the name says, the database links to your catalog, which is the Glue catalog in our case, and allows you to read from and write to any number of Iceberg tables managed by the catalog, seamlessly and securely. In this case, I created an external volume pointing to the S3 bucket where my data is stored, and then I created a catalog integration to the Glue catalog. With an external volume and a catalog integration created, I can create a catalog-linked database. With the catalog-linked database, discovery and data refresh are automated, so you're always able to query the freshest data. As you can see, the sync interval is 60 seconds, so every 60 seconds the data in your S3 bucket managed by the Glue catalog is synced into this catalog-linked database. Your data stays in place within your storage, no migration or movement required at all, and you can leverage the power of Snowflake's high-performance query engine for analytics. Additionally, if you have Delta tables, we offer Delta Direct, which enables you to query those tables in place, without data movement, by translating the JSON metadata into Iceberg metadata.

Let's say we want to find product satisfaction levels by category. The query to find the answer to this question would look something like this. While this might look like a complex query involving a join, for the purposes of the demo the dataset only has 100K reviews and I'm the only person querying this data. In real production environments, we have seen queries with tens or hundreds of joins run on millions if not billions of rows, with hundreds if not thousands of users across the organization querying the same data concurrently. Well, in any case, as you can see, the query just runs. We do not have to tune the job size, the cluster, or really do any performance tuning at all. It just works. That is the power of Snowflake's highly performant, highly concurrent query engine.
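For reference, the setup narrated above might look roughly like the following in Snowflake SQL. This is a sketch, not the demo's actual script: the bucket, ARNs, region, and object names are placeholders, and the exact catalog-integration and catalog-linked-database parameters should be checked against Snowflake's documentation for Glue-backed catalog-linked databases.

```sql
-- 1. External volume pointing at the S3 bucket that holds the Iceberg data.
CREATE OR REPLACE EXTERNAL VOLUME reviews_ext_vol
  STORAGE_LOCATIONS = (
    (
      NAME = 'reviews-s3'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://my-reviews-bucket/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::111111111111:role/snowflake-lake-access'
    )
  );

-- 2. Catalog integration to AWS Glue through its Iceberg REST interface
--    (parameter names are indicative; verify against current Snowflake docs).
CREATE OR REPLACE CATALOG INTEGRATION glue_rest_int
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'reviews_db'
  REST_CONFIG = (
    CATALOG_URI = 'https://glue.us-west-2.amazonaws.com/iceberg'
    CATALOG_API_TYPE = AWS_GLUE
    CATALOG_NAME = '111111111111'
  )
  REST_AUTHENTICATION = (
    TYPE = SIGV4
    SIGV4_IAM_ROLE = 'arn:aws:iam::111111111111:role/snowflake-glue-access'
    SIGV4_SIGNING_REGION = 'us-west-2'
  )
  ENABLED = TRUE;

-- 3. Catalog-linked database: discovers Iceberg tables in Glue and syncs them every 60 seconds.
CREATE DATABASE reviews_lakehouse
  LINKED_CATALOG = (
    CATALOG = 'glue_rest_int',
    SYNC_INTERVAL_SECONDS = 60
  ),
  EXTERNAL_VOLUME = 'reviews_ext_vol';
```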
And finally, it is also important to ensure only the right users and teams have access to the right data. Using Snowflake Horizon, you can create fine-grained access controls to secure and manage your Iceberg tables. Let's say I have a product quality associates team for the books category, and they should only have access to product reviews of books. Well, you can now create a row access policy, call it the books category policy, define the access control, and then attach it to the reviews table so that it grants access only to users holding the books role. Which means now only the books team folks can access the book reviews, and not the rest of the data. This role-based access control is also applied to users querying your data from Snowflake Intelligence. And once we have that defined, when I run the same query as before, I only see the books category's review volume and average rating, because now I only have access to the books category and its reviews.

To quickly review everything we've done so far: our product reviews data was in an AWS S3 bucket in the form of Iceberg tables managed by the AWS Glue catalog. That was our lakehouse. We then created a catalog-linked database in Snowflake to directly connect to the lakehouse and query and analyze the data without moving the data at all. We used Snowflake's high-performance query engine to run analytics on our lakehouse. And thanks to Snowflake Horizon, we also created robust access control policies to ensure data security and governance are in place on the lakehouse. How powerful.

I hope you now have the blueprint you need to run lakehouse analytics securely and reliably from Snowflake. The focus of today's demo was lakehouse analytics; the Snowflake Intelligence experience I showed you at the beginning is powered by Cortex Analyst. To learn how to build a Snowflake Intelligence experience very similar to what I showed, please refer to the quickstart guide in the link. And with that, thank you so much.
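As a reference note on the governance step in that demo, a minimal sketch of such a row access policy is shown below; the policy, role, table, and column names are illustrative, not the demo's.

```sql
-- Rows are visible only to members of the (hypothetical) BOOKS_TEAM_ROLE,
-- and only for the Books category.
CREATE OR REPLACE ROW ACCESS POLICY books_category_policy
  AS (category STRING) RETURNS BOOLEAN ->
    CURRENT_ROLE() = 'BOOKS_TEAM_ROLE' AND category = 'Books';

-- Attach the policy to the reviews table on its category column.
ALTER ICEBERG TABLE reviews_lakehouse.reviews_db.product_reviews
  ADD ROW ACCESS POLICY books_category_policy ON (category);
```

Once attached, every query against the table, including those issued through Snowflake Intelligence, returns only the rows the caller's role is allowed to see.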
Snowflake vs Databricks vs Fabric: What the Benchmarks Don’t Tell You by Ron L'Esteve
>>> https://lnkd.in/eqAjEB8w
This tip is structured into several sections comparing the platform and data warehousing strengths and limitations of Databricks, Fabric, and Snowflake. We will dive into comparisons for platform architecture, Spark performance, data engineering, CI/CD, automation, real-time analytics, AI/ML, governance, compliance, networking, and more. By the end, you will have a better understanding of when and why to choose one over the other.
Too often, conversations in the data world turn into debates: “Which one is better, Snowflake or Databricks?”
But the truth is, they are not direct competitors. Both platforms solve different problems, and in fact, they complement each other beautifully:
🔹 Snowflake excels as a powerful cloud data warehouse, optimized for structured data, SQL analytics, and seamless scaling.
🔹 Databricks shines as a unified analytics and AI/ML platform, enabling advanced data engineering, machine learning, and handling unstructured/semi-structured data at scale.
👉 Instead of “either/or,” think in terms of “and.”
Organizations today need both data warehousing for analytics and data lakes for AI/ML innovation. Using Snowflake together with Databricks unlocks the best of both worlds — governed, scalable analytics with cutting-edge data science.
As a data architect, I get this question often: Databricks or Snowflake — which one is better?
Since most companies use both, the better question is:
“How can they be paired in a unified architecture?”
Our client Janus Henderson Investors proved this is possible.
In the recent Datalere webinar, Mark Goodwin, Data Architect at Janus Henderson, shared how they built a hybrid setup: Databricks to power advanced transformations and ML pipelines, and Snowflake to operationalize the data for BI and reporting.
Michael Spiessbach, Datalere’s Manager of Data Architecture & Engineering, expanded on the design principles behind such a setup.
A hybrid approach comes down to a few key practices:
1. Store once, serve many — keep curated data in open formats (e.g., Apache Iceberg) so both engines can query without duplication (see the sketch after this list).
2. Separate by workload — run iterative pipelines and ML in Databricks; use Snowflake to deliver governed dashboards, ad hoc queries, and external data sharing.
3. Ingest efficiently — Databricks’ Autoloader simplifies streaming data, while Snowflake’s Marketplace brings in third-party feeds without heavy ETL.
4. Govern consistently — Unity Catalog and Snowflake’s data governance policies are improving, but most enterprises overlay an enterprise catalog and lineage system for end-to-end visibility.
5. Balance flexibility with discipline — notebooks in Databricks and proprietary SQL in Snowflake both accelerate productivity, but require standards for reuse and portability.
6. Control costs — serverless options help reduce friction, but workload isolation and usage policies are essential for predictable spend.
7. Plan for evolution — both platforms are converging in features. Designing around open formats keeps the door open for shifting workloads over time.
Every enterprise makes different trade-offs, but Janus Henderson’s experience shows how a unified architecture can let Databricks and Snowflake complement one another — rather than forcing a choice between them.
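To make practice 1 concrete, here is one way the "store once, serve many" wiring can look, assuming an Iceberg table already sits in cloud storage under a shared catalog; all object names and the catalog integration below are hypothetical.

```sql
-- Snowflake side: register the existing Iceberg table in place (externally managed),
-- through a catalog integration to the shared catalog. No data is copied.
CREATE ICEBERG TABLE analytics.curated.sales_orders
  EXTERNAL_VOLUME    = 'lake_ext_vol'         -- points at the S3/ADLS/GCS location
  CATALOG            = 'shared_catalog_int'   -- integration to the shared Glue/REST catalog
  CATALOG_TABLE_NAME = 'sales_orders';

-- Databricks side (Spark SQL): read the very same files through its own catalog
-- configuration for that Iceberg catalog, e.g.:
-- SELECT order_date, SUM(amount) FROM lake_catalog.curated.sales_orders GROUP BY order_date;
```

Both engines now serve the same single copy, so there is nothing to keep in sync between them.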
▶️ Watch the full replay here: https://lnkd.in/gQmjxnxz
Achieving True Data Mesh with Iceberg Tables on Snowflake & Databricks.
The complexity of running a modern data stack with both Snowflake and Databricks often creates a tangled web of redundant ETL jobs and duplicate data copies. This is precisely the "data mesh" friction we've all felt.
The interoperability delivered by Apache Iceberg tables changes everything. By adopting this open table format, we can:
Eliminate Redundancy: Retire ETL pipelines whose sole purpose was moving data between the two platforms.
Decouple Compute from Storage: Both engines can now query the same single copy of data in cloud storage (S3/ADLS/GCS).
Simplify Governance: Enforce security and governance on one table, not two or more copies.
This shift allows us to focus on data consumption and innovation, not data movement.
High-throughput pipelines shouldn’t punish your data warehouse.
But when append-heavy streams — like events or transactions — all land in one giant table, they do.
Merges slow down. Costs spike. Performance becomes unpredictable.
That’s why we built Soft Partitioning.
Artie now supports logical, time-based partitioning at the ingestion layer.
Instead of endlessly appending to a single target table, Artie automatically routes rows into time-sliced partitions — like user_events_2025_08, user_events_2025_09, etc. — while maintaining a unified view (user_events) that queries seamlessly across all partitions.
The best part?
It works independently of your destination’s native partitioning — so performance stays predictable across Snowflake, Redshift, or Databricks.
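To picture the pattern, here is a plain-SQL sketch of time-sliced tables behind a unified view. This is conceptual only, not Artie's actual DDL, and the table and column names are made up.

```sql
-- Hypothetical warehouse-side shape of the pattern; Artie manages this routing automatically.
CREATE TABLE IF NOT EXISTS user_events_2025_08 (user_id STRING, event_type STRING, event_ts TIMESTAMP);
CREATE TABLE IF NOT EXISTS user_events_2025_09 (user_id STRING, event_type STRING, event_ts TIMESTAMP);

-- Consumers keep querying a single logical table.
CREATE OR REPLACE VIEW user_events AS
    SELECT * FROM user_events_2025_08
    UNION ALL
    SELECT * FROM user_events_2025_09;

-- Incoming changes are merged only into the current slice, so merge scans stay small, e.g.:
-- MERGE INTO user_events_2025_09 AS t USING staged_batch AS s
--   ON t.user_id = s.user_id AND t.event_ts = s.event_ts ...
```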
Why teams love it:
✅ Steady, predictable writes for high-volume streams
✅ Lower merge and update costs by limiting heavy work to recent slices
✅ Full lifecycle control — prune, compact, or roll up old partitions
✅ Unified querying experience across all partitions
A huge step toward making real-time replication truly warehouse-friendly.
📖 Full changelog in the comments.
Data Vault: The Architecture Built for Change
🧱 Why Data Vault Is (Still) a Powerful Modeling Approach
In fast-changing data environments, traditional modeling techniques — like 3NF or even star schemas — often fall short.
That’s where Data Vault shines. It was designed to solve a real problem:
🔄 How do you model data that is constantly changing — without breaking everything?
💡 Core Concepts of Data Vault (sketched in SQL below):
Hubs → represent unique business keys
Links → define relationships between hubs
Satellites → store context (descriptive data + history)
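For a concrete picture, a minimal hub/link/satellite trio for customers and orders could be declared as below. This is an illustrative sketch; hashing, key, and loading conventions vary by implementation.

```sql
-- Hub: one row per unique business key.
CREATE TABLE hub_customer (
    customer_hk    BINARY(32)    NOT NULL PRIMARY KEY,  -- hash of the business key
    customer_id    STRING        NOT NULL,              -- business key
    load_dts       TIMESTAMP_NTZ NOT NULL,
    record_source  STRING        NOT NULL
);

-- Link: relationship between hubs (here: customer placed order).
CREATE TABLE link_customer_order (
    customer_order_hk BINARY(32)    NOT NULL PRIMARY KEY,
    customer_hk       BINARY(32)    NOT NULL,
    order_hk          BINARY(32)    NOT NULL,
    load_dts          TIMESTAMP_NTZ NOT NULL,
    record_source     STRING        NOT NULL
);

-- Satellite: descriptive attributes plus full history for the customer hub.
CREATE TABLE sat_customer_details (
    customer_hk   BINARY(32)    NOT NULL,
    load_dts      TIMESTAMP_NTZ NOT NULL,
    hash_diff     BINARY(32)    NOT NULL,   -- detects attribute changes between loads
    customer_name STRING,
    email         STRING,
    record_source STRING        NOT NULL,
    PRIMARY KEY (customer_hk, load_dts)
);
```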
✅ Key Advantages:
Scalability for large, evolving datasets
Historical tracking built-in
Auditability and lineage support
Fits well with modern ELT & cloud data lakes
🧰 Tools like dbt, Snowflake, and Databricks are making Data Vault more approachable and scalable than ever.
🔍 But it's not perfect:
It requires discipline in modeling
Querying across multiple joins can get complex
Not always the best fit for real-time use cases
🚀 When does it shine?
Enterprises needing historical fidelity and regulatory compliance
Complex data landscapes with many disparate sources
👇 What’s your experience with Data Vault?
Have you used it in production — and if so, what were your lessons learned?
#DataEngineering #DataVault #DataModeling #ETL #ModernDataStack #CloudDataWarehouse #Snowflake #Databricks #dbt #ELT #DataArchitecture
This is a great read if you’re into how real-time data and AI are changing the game. Qlik Open Lakehouse, powered by AWS and Apache Iceberg, makes the lakehouse model faster, smarter, and more open than ever. Love seeing how #Qlik is helping shape the future of data architecture.
❄️ Snowflake and the Modern Data Lake ✨
The traditional data lake has evolved — from a raw data dump into a structured, governed, and analytics-ready ecosystem.
At the heart of this shift is Snowflake, enabling organizations to centralize storage, processing, and analytics within a single, scalable cloud platform — built around the medallion architecture (sketched in SQL after the layer list):
🔹 Bronze Layer – Raw data lands in its original format (JSON, Parquet, CSV), preserving full history and traceability.
🔹 Silver Layer – Data is cleaned, standardized, and integrated across domains — ensuring reliability and consistency.
🔹 Gold Layer – The trusted, business-ready layer powering dashboards, KPIs, ML models, and strategic analytics.
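A minimal sketch of how those layers can be expressed in Snowflake SQL; the tables and columns are invented for illustration.

```sql
-- Bronze: land raw, semi-structured data as-is.
CREATE TABLE IF NOT EXISTS bronze_orders (
    payload   VARIANT,
    loaded_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);

-- Silver: cleaned, standardized, typed.
CREATE OR REPLACE TABLE silver_orders AS
SELECT
    payload:order_id::STRING                          AS order_id,
    payload:customer_id::STRING                       AS customer_id,
    payload:amount::NUMBER(12,2)                      AS amount,
    TRY_TO_TIMESTAMP_NTZ(payload:ordered_at::STRING)  AS ordered_at
FROM bronze_orders
WHERE payload:order_id IS NOT NULL;

-- Gold: business-ready aggregate for dashboards and KPIs.
CREATE OR REPLACE VIEW gold_daily_revenue AS
SELECT DATE_TRUNC('day', ordered_at) AS order_date,
       SUM(amount)                   AS revenue
FROM silver_orders
GROUP BY order_date;
```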
What makes Snowflake powerful:
✅ Separation of storage and compute for elastic scalability and cost control
✅ Native support for structured and semi-structured data
✅ Robust governance with role-based access and security
✅ High performance without complex tuning or indexing
This approach transforms the data lake from a simple repository into a cloud-native, intelligent data platform, where data flows seamlessly from Bronze to Gold — accelerating time-to-insight and driving innovation.
💡 Snowflake isn’t just a warehouse in the cloud — it’s the foundation of a modern data ecosystem that evolves as your business grows.
#Snowflake #DataLake #DataEngineering #MedallionArchitecture #CloudData #BigData #DataGovernance #Analytics
Neat writeup / howto by Ben Dunmire on Architecting Databricks SQL for High-Concurrency, Low-Latency Data Warehouse use cases.
This new blog reveals frameworks and best practices to deliver production-grade analytics platforms that scale with usage while simultaneously managing cost. Lots of actionable recommendations on compute, data layout and observability for technical buyers.
Snowflake Solutions Architect | Expert in Data Engineering, BI & Data Science Integration | Driving Cloud-Native Architecture & Cost-Optimized Data Platforms
Snowflake is no longer just a data warehouse — it’s becoming the operating system for modern data engineering.
One area I see many teams struggling with is balancing performance, governance, and cost at scale. A few hard-earned lessons from real-world implementations:
🔶 RBAC done right matters more than you think
Don’t just replicate org hierarchies — design role layers (functional, environment-based, object-level) that make onboarding & audits effortless.
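For example, a layered setup might separate object-level access roles from functional, environment-scoped roles; all names below are hypothetical.

```sql
-- Object-level access role: owns read privileges on one schema.
CREATE ROLE IF NOT EXISTS sales_analytics_read;
GRANT USAGE ON DATABASE sales TO ROLE sales_analytics_read;
GRANT USAGE ON SCHEMA sales.analytics TO ROLE sales_analytics_read;
GRANT SELECT ON ALL TABLES IN SCHEMA sales.analytics TO ROLE sales_analytics_read;
GRANT SELECT ON FUTURE TABLES IN SCHEMA sales.analytics TO ROLE sales_analytics_read;

-- Functional, environment-scoped role: composed from access roles.
CREATE ROLE IF NOT EXISTS analyst_prod;
GRANT ROLE sales_analytics_read TO ROLE analyst_prod;

-- Keep the hierarchy rooted under SYSADMIN; onboarding a user is then a single grant.
GRANT ROLE analyst_prod TO ROLE sysadmin;
GRANT ROLE analyst_prod TO USER jane_doe;
```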
🔶 Pipelines ≠ observability
Building ingestion pipelines is easy. Knowing why a flow failed, where latency builds up, and how freshness impacts KPIs is where the real challenge lies. Observability frameworks (telemetry models + golden KPIs + anomaly detection) are the future.
🔶 Compute cost ≠ governance
Optimizing warehouses is step one. True cost governance means forecasting future consumption, tracking per-team usage, and simulating workload changes before they hit your bill.
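As a starting point for per-team tracking, usage can be pulled from the ACCOUNT_USAGE share, for example weekly credits by warehouse (assuming warehouses map to teams):

```sql
-- Weekly credits per warehouse over the last 30 days.
-- Note: ACCOUNT_USAGE views lag real time by up to a few hours.
SELECT
    warehouse_name,
    DATE_TRUNC('week', start_time) AS usage_week,
    SUM(credits_used)              AS credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name, usage_week
ORDER BY warehouse_name, usage_week;
```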
🔶 Lineage is no longer optional
As Snowflake integrates with BI, ML, and real-time apps, being able to trace data across flows isn’t “nice to have” — it’s the backbone of trust.
Snowflake is evolving fast with features like Cortex AI, Streamlit, and marketplace listings — but without a technical architecture mindset, it’s easy to scale complexity instead of value.
🔮 My take: the next frontier is AI-assisted observability & governance accelerators that will let Snowflake teams predict and prevent issues before they impact business.
What’s one Snowflake challenge you’re solving right now — latency, lineage, or cost? 👍
#Snowflake #DataEngineering #CloudArchitecture #Observability #CostGovernance #Lineage
Trusted Data Solutions Advisor | Data Platforms Consultant
Great demo and insightful information! Looking forward to trying this out.