Robert Long, PhD’s Post

Data Engineering Consultant | Building Azure Data Platforms that enable Analytics | PhD in Theoretical Geophysics

The Medallion architecture wave is here once again... and it still sucks. It solves problems that most data teams don't actually have.

Here are three reasons why this architecture leads to more pain than progress:

1. The layer names aren't informative

Make intent clear and just use meaningful labels:
➜ Instead of Bronze, use Raw or Historical
➜ Instead of Silver, use Processed or Transformed
➜ Instead of Gold, use Serve, Curated or Operational

This change makes data lineage and onboarding so much better!

2. Data should not land in Bronze

This is risk for risk's sake.
➜ If you're ingesting untrusted data, use a dedicated landing zone.
➜ It protects your infrastructure from compromised accounts or bad actors.

Secure-by-Design is particularly relevant to highly regulated domains.

3. You (likely) don't need exactly 3 layers

Architecture should serve your data, not restrict it.
➜ If you only need two layers, then just use two.
➜ If your use cases need more, add more.

This makes your engineers' jobs much easier; they will thank you.

Medallion isn't best practice. It's just a pattern. Stop forcing your data to fit this pattern. Let your use cases, costs, security and governance drive your architecture.

Do you actually use Bronze, Silver, and Gold layers?

I use a layered architecture all the time - just not Medallion.
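As a rough illustration of points 2 and 3 above, here is a minimal Python sketch of a pipeline where untrusted data passes a landing-zone check first, and the number and names of layers are just configuration. Everything in it (`Layer`, `quarantine_untrusted`, `run_pipeline`, the `id` and `amount` fields) is a hypothetical name made up for this example, not something from the post or from any particular platform.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Record = dict
Transform = Callable[[list[Record]], list[Record]]


@dataclass(frozen=True)
class Layer:
    """One stage of the pipeline, identified by a descriptive name."""
    name: str  # e.g. "raw" or "curated" rather than "bronze" or "gold"
    transform: Transform


def quarantine_untrusted(landed: Iterable[object]) -> list[Record]:
    """Hypothetical landing-zone check: keep only dict records that carry an 'id'."""
    return [r for r in landed if isinstance(r, dict) and "id" in r]


def run_pipeline(landed: Iterable[object], layers: list[Layer]) -> dict[str, list[Record]]:
    """Run landed data through every configured layer, in order.

    Returns the output of each layer keyed by its name. The number of
    layers is whatever the caller configures: two, three, or more.
    """
    outputs: dict[str, list[Record]] = {}
    current = quarantine_untrusted(landed)
    for layer in layers:
        current = layer.transform(current)
        outputs[layer.name] = current
    return outputs


def keep_as_is(records: list[Record]) -> list[Record]:
    # The "raw" layer stores what passed the landing-zone check, unchanged.
    return list(records)


def drop_negative_amounts(records: list[Record]) -> list[Record]:
    # An assumed business rule for the "curated" layer, for illustration only.
    return [r for r in records if r.get("amount", 0) >= 0]


# A two-layer setup, because this imaginary use case needs only two.
two_layers = [
    Layer("raw", keep_as_is),
    Layer("curated", drop_negative_amounts),
]

landed = [{"id": 1, "amount": 10}, {"amount": 5}, {"id": 2, "amount": -3}]
result = run_pipeline(landed, two_layers)
```

Adding a third or fourth stage here means appending another `Layer` to the list; nothing in `run_pipeline` assumes exactly three.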

Josue “Josh” Bogran

VP of Data + AI @ zeb | Advisor to Estuary | Databricks Product Advisory Board & MVP / Subscribe @ Youtube.com/@JosueBogranChannel

2d

Bronze, Silver, and Gold = A term both business and tech folks can wrap their minds around. “Historical”, “Transformed”, etc. are much more technical terms. You can have multiple sublayers within each of those layers; it doesn’t matter. Your business person doesn’t need to know how many layers of transformation there are, etc. And to your point, you are right: sometimes you just need one layer or two, and I do see some tech folks confused around this. So, to your question: I don’t use bronze, silver, and gold. I use the terms to describe the work I/our team is doing.

Adam Machanic

Head of Data | Data Architecture | Data Engineering | Distributed Computing | SQL | Python | Finance

2d

All good points and I’ll add: It’s not an architecture; it’s just some labels for stages in a process.

Jean-Pierre Riehl

CTO Data | Community-Driven | Innovation-addict | Speaker | Business Geek

2d

My preferred interview question (both for candidates and clients 😂): "Do you really need a Silver Layer?" (aka "in-the-middle layer").

Louis Davidson

Data architect, technical writer and editor specializing in Relational Databases and all types of technology. Blog located at drsql.link.

2d

I am confused. You don't seem to be arguing against the architecture, just the names? And maybe to keep the raw data in a different layer? The reason we use these sorts of things is that when we choose the "best" architecture, this is code for some people to say "we don't need to do all that work" without even realizing what that work is going to mean in a year, or two, or ten. Really hate when we do something like skipping the landing/bronze zone/Kimball prep area and then a few years later we realize that we messed up and forgot something... but we have already thrown out the source because "Oh, we didn't think we would need that." Then we spend our time repairing the mess. Of course, there are cases where it doesn't make sense to alter the pattern, but design/architecture patterns aren't just roadblocks to getting answers or just another way we can argue with people with more experience than us. They typically have merit.

Daniel Palacios

Your Personal Data Engineer | Helping businesses turn messy data into clean, automated pipelines, real-time insights & scalable cloud systems

2d

The Medallion architecture is popular to bring up because it's actually one of the easiest ways to explain to business managers how we can organize their data and embed that within the company culture.

Georgian Pirvu

Helping Enterprises Scale Data & AI on Azure & Databricks | Databricks Champion | 10+ Years in Cloud Architecture | Tech Blog Writer on Medium

2d

Medallion Architecture is actually not an architectural design. It is a higher-level abstraction that gives pre-sales teams, executives and junior engineers a guideline. But if you are an experienced engineer, stopping at “medallion architecture” will make you sound just like everyone else instead of somebody who can really solve data problems. Here is what you have to know if you are an engineer in this data space. https://medium.com/towards-data-engineering/medallion-architecture-is-not-enough-4d9cfde85e8c

Niek Visscher

Data & AI Leader | Driving Data-First Decision Making | Scalable Platforms & Business Impact

2d

We're actively trying to merge bronze and silver by pushing transformations to the source. We've found that this Medallion architecture led us to have a lot of tables that are hard to discover and govern. We also have to do a lot of transformations in the gold models, which creates many dependencies, etc.

Thiruvikraman Sridharan

Data Architect / Solution Consultant | Analytics | Databricks, Snowflake | CSPO®

2d

Isn’t that “landing zone” basically what most people already call the Bronze layer in Medallion? It’s where raw, untrusted data lands and we usually run quality checks and some preprocessing there before moving the cleaned-up stuff to Silver. So if we’re already isolating risky data in Bronze, how is that different from a “dedicated landing zone”? Feels like we’re just renaming things rather than solving a new problem. Unless I’m missing something, this sounds more like a naming preference than a real architectural issue.

Eric Hilton

Skeptic | Technology Leader | Always a Lover | Sometimes a Fighter

2d

Who says AI can't generate a post that gets lots of comments?
