The Medallion architecture wave is here once again... and it still sucks.

It solves problems that most data teams don't actually have. Here are three reasons why this architecture leads to more pain than progress:

1. The layer names aren't informative
Make intent clear and just use meaningful labels:
➜ Instead of Bronze, use Raw or Historical
➜ Instead of Silver, use Processed or Transformed
➜ Instead of Gold, use Serve, Curated or Operational
This change makes data lineage and onboarding so much better!

2. Data should not land in Bronze
This is risk for risk's sake.
➜ If you're ingesting untrusted data, use a dedicated landing zone.
➜ It protects your infrastructure from compromised accounts or bad actors.
Secure-by-Design is particularly relevant to highly regulated domains.

3. You (likely) don't need exactly 3 layers
Architecture should serve your data, not restrict it.
➜ If you only need two layers, then just use two.
➜ If your use cases need more, add more.
This makes your engineers' jobs much easier, and they will thank you.

Medallion isn't best practice. It's just a pattern. Stop forcing your data to fit this pattern. Let your use cases, costs, security and governance drive your architecture.

Do you actually use Bronze, Silver, and Gold layers?
I use a layered architecture all the time - just not Medallion.
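To make points 1 and 3 concrete, here is a minimal sketch of a pipeline whose layers are just an ordered list of descriptively named steps, so using two layers or three is a configuration choice rather than a rule. Everything in it (the `Layer` class, `run_pipeline`, the transform functions and the sample records) is hypothetical and only for illustration, not taken from any particular platform.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict
Transform = Callable[[list[Record]], list[Record]]


@dataclass(frozen=True)
class Layer:
    name: str             # descriptive label, e.g. "raw" or "curated"
    transform: Transform  # what this layer does to the previous layer's output


def run_pipeline(layers: list[Layer], records: list[Record]) -> dict[str, list[Record]]:
    """Apply each layer in order and keep every layer's output under its name."""
    outputs: dict[str, list[Record]] = {}
    current = records
    for layer in layers:
        current = layer.transform(current)
        outputs[layer.name] = current
    return outputs


# Hypothetical transforms, assumed purely for the example.
def keep_as_is(records: list[Record]) -> list[Record]:
    return [dict(r) for r in records]


def drop_incomplete(records: list[Record]) -> list[Record]:
    return [r for r in records if r.get("amount") is not None]


def total_by_customer(records: list[Record]) -> list[Record]:
    totals: dict[str, int] = {}
    for r in records:
        totals[r["customer"]] = totals.get(r["customer"], 0) + r["amount"]
    return [{"customer": c, "total": t} for c, t in totals.items()]


# Two layers when that is all the use case needs...
two_layer = [
    Layer("raw", keep_as_is),
    Layer("curated", lambda rs: total_by_customer(drop_incomplete(rs))),
]

# ...and three when an intermediate result is worth keeping.
three_layer = [
    Layer("raw", keep_as_is),
    Layer("processed", drop_incomplete),
    Layer("curated", total_by_customer),
]

sample = [
    {"customer": "a", "amount": 10},
    {"customer": "a", "amount": 5},
    {"customer": "b", "amount": None},
]
```

Calling `run_pipeline(two_layer, sample)` or `run_pipeline(three_layer, sample)` gives the same curated result. The only difference is whether the intermediate "processed" output is kept as a layer of its own.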
All good points and I’ll add: It’s not an architecture; it’s just some labels for stages in a process.
My preferred interview question (both for candidates and clients 😂): "Do you really need a Silver Layer?" (aka "in-the-middle layer").
I am confused. You don't seem to be arguing against the architecture, just the names? And maybe to keep the raw data in a different layer? The reason we use these sorts of things is that when we choose the "best" architecture, this is code for some people to say "we don't need to do all that work" without even realizing what that work is going to mean in a year, or two, or ten. Really hate when we do something like skipping the landing/bronze zone/Kimball prep area and then a few years later we realize that we messed up and forgot something... but we have already thrown out the source because "Oh, we didn't think we would need that." Then we spend our time repairing the mess. Of course, there are cases where it makes sense to alter the pattern, but design/architecture patterns aren't just roadblocks to getting answers or just another way we can argue with people with more experience than us. They typically have merit.
The Medallion architecture is popular to bring up because it's actually one of the easiest ways to explain to business managers how we can organize their data and establish that within the company culture.
Medallion Architecture is actually not an architectural design. It is a higher-level abstraction that gives pre-sales teams, executives and junior engineers a guideline. But if you are an experienced engineer, stopping at “medallion architecture” will make you sound just like everyone else instead of somebody who can really solve data problems. Here is what you have to know if you are an engineer in this data space. https://medium.com/towards-data-engineering/medallion-architecture-is-not-enough-4d9cfde85e8c
We're actively trying to merge bronze and silver by pushing transformations to the source. We've found that this Medallion architecture led us to have a lot of tables that are hard to discover and govern. We also have to do a lot of transformations in the gold models, which creates many dependencies, etc.
Isn’t that “landing zone” basically what most people already call the Bronze layer in Medallion? It’s where raw, untrusted data lands and we usually run quality checks and some preprocessing there before moving the cleaned-up stuff to Silver. So if we’re already isolating risky data in Bronze, how is that different from a “dedicated landing zone”? Feels like we’re just renaming things rather than solving a new problem. Unless I’m missing something, this sounds more like a naming preference than a real architectural issue.
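One way to picture the distinction being debated here is to treat the landing zone as a gate that runs before anything is written to the raw/Bronze layer, instead of as the first layer itself. The sketch below is only an illustration under that assumption. The field names, the allow-list of sources and the `split_landing_batch` helper are all hypothetical.

```python
# Hypothetical schema and allow-list, assumed only for this example.
REQUIRED_FIELDS = {"source", "event_id", "payload"}
TRUSTED_SOURCES = {"billing", "crm"}


def split_landing_batch(batch):
    """Return (accepted, quarantined) without modifying any record.

    Accepted records may be promoted to the raw layer. Quarantined records
    stay in the landing zone for inspection and never reach the raw layer.
    """
    accepted, quarantined = [], []
    for record in batch:
        has_fields = REQUIRED_FIELDS.issubset(record)  # iterates the dict's keys
        is_trusted = record.get("source") in TRUSTED_SOURCES
        if has_fields and is_trusted:
            accepted.append(record)
        else:
            quarantined.append(record)
    return accepted, quarantined


batch = [
    {"source": "billing", "event_id": 1, "payload": "ok"},
    {"source": "unknown", "event_id": 2, "payload": "suspicious"},
    {"source": "crm", "event_id": 3},  # missing "payload"
]
accepted, quarantined = split_landing_batch(batch)
```

Whether that gate is called a landing zone or simply the front door of Bronze is the naming question raised above. The sketch only shows the behaviour: untrusted or malformed records are held back before they are stored alongside trusted raw data.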
Who says AI can't generate a post that gets lots of comments?
VP of Data + AI @ zeb | Advisor to Estuary | Databricks Product Advisory Board & MVP / Subscribe @ Youtube.com/@JosueBogranChannel
Bronze, Silver, and Gold = A term both business and tech folks can wrap their minds around. “Historical”, “Transformed”, etc. are much more technical terms. You can have multiple sublayers within each of those layers; it doesn’t matter. Your business person doesn’t need to know how many layers of transformation there are, etc. And to your point, you are right: sometimes you just need one layer or two, and I do see some tech folks confused around this. So, to your question: I don’t use bronze, silver, and gold. I use the terms to describe the work I/our team is doing.