Can the data mesh transform your organization?

'Becoming data-driven' and 'putting data first' have become top-level strategic priorities for decision makers. Even before they took center stage, corporate data landscapes had been growing in size and complexity. Today, we see organizations going back and forth between centralizing and decentralizing data operations and data-related responsibilities, investing significant budgets to maximize their impact.

The Data Mesh concept offers a number of interesting takes on how to organize corporate data capabilities. In a rather short amount of time it has gathered a substantial number of supporters and has motivated numerous transformation initiatives in international organizations. Irrespective of whether it will prevail as a new paradigm or transition into new concepts, it has already changed the way we think about organizational data sharing today.

How Data flows impact organizational development and complexity

A key challenge of increasingly connected work environments is moving the right data to the right use case. Often, data capture and use case realization are happening in separate locations: physically, technically, and also organizationally.

Renowned data practitioners like Bill Schmarzo have pointed out time after time that without finding, assessing and realizing data use cases, the economic value of data cannot be captured, no matter if these use cases aim at reducing waste, strengthening customer relations, avoiding cybercrime, enhancing brand reputation, or identifying new sales opportunities. 

Therefore, successful data monetization hinges on data flowing freely through the organization, from the point of capture or creation to the point of usage in dashboard visualizations, algorithms or applications. The design, monitoring and maintenance of these data flows - no matter if centrally governed or not - creates workloads, which organizations typically respond to by assigning roles and responsibilities.

A common approach to organize multiple data flows is to store relevant data in a Data Warehouse or Data Lake from where it can be managed by a dedicated team of data specialists. By using one-to-many connections, the same data set is made accessible for a variety of use cases and consumers, effectively reducing the overall complexity of a setup with multiple one-to-one connections per data source - or so it might seem.
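The connection-count argument behind this approach can be made concrete with a back-of-the-envelope calculation. The function names and the 20-source/30-consumer figures below are purely illustrative assumptions, a minimal sketch rather than a model of any real landscape:

```python
def point_to_point_connections(n_sources: int, n_consumers: int) -> int:
    # Worst case without a hub: every consuming use case integrates
    # directly with every source, one connection per (source, consumer) pair.
    return n_sources * n_consumers

def hub_connections(n_sources: int, n_consumers: int) -> int:
    # With a central warehouse or lake: each source is ingested once,
    # and each consumer reads from the hub once.
    return n_sources + n_consumers

# Illustrative numbers: 20 data sources, 30 consuming use cases.
print(point_to_point_connections(20, 30))  # 600 pairwise connections
print(hub_connections(20, 30))             # 50 connections via the hub
```

The count of connections drops sharply, which is exactly why the hub pattern looks so attractive on paper; the paragraphs below explain where the hidden cost resurfaces.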

The well-known breaking point of these approaches is - ironically - that a lot of effort goes into bringing data into this one place, thus reducing complexity in one area by creating it in another. This struggle resembles a problem in software development known as the law of conservation of complexity, or Tesler's Law.

"Every application has an inherent amount of irreducible complexity. The only question is: Who will have to deal with it — the user, the application developer, or the platform developer?" - Tesler's Law

The ever-increasing complexity quickly and reliably brings centralized Data Units to their knees. The consequences are well known and often observable: declining service levels and stakeholder satisfaction, delayed progress, and turnover among frustrated data architects and engineers.

And the list goes on: Once a central data unit can no longer cope with incoming requests and operational tasks, adverse effects occur. In the end, instead of having one governing body streamlining end-to-end data flows, it is not unlikely that organizational silos emerge as soon as dependent stakeholders across the firm run out of patience and begin building their own data handling teams and platforms independently of the central unit.

Further complexity arises on the side of governance and documentation. For many years, the dominant approach to Data Warehouse design relied on upfront data model design, to which the data would then be fitted, a tedious process that required careful scoping. With the rise of the Data Lake, an anti-pattern emerged where data was stored, often without any context, in cloud repositories, which simply moved the challenge to the end of the project lifecycle. Both approaches failed to deliver on the promise of data-driven value creation when time-to-value was lacking (even though the latter might have delivered no value, quicker ;-)). While there have without any doubt been many successful Data Warehouse and Data Lake initiatives over the years, the challenge of managing complexity remains a constant struggle.

When Zhamak Dehghani published her much-noticed take on the Data Mesh paradigm in 2019, she struck a chord with many Data Engineers and Data Architects by describing these struggles with great accuracy. We will refrain from explaining the entire concept, because rich content is available elsewhere. However, we will briefly summarize some key elements which are relevant to understanding the organizational implications on which we want to focus: Data Products, Data Providers and Data Consumers.

Are Data Products the next evolutionary step beyond Data Assets?

Product Management has been around for roughly 90 years. Large technology corporations like Microsoft and Google have adopted the approach for software development, and many industries followed. Gartner recently recommended that any digital capability should be productized. Martin Eriksson located the role of a Product Manager, by means of a Venn diagram, at the intersection of business, technology, and user experience.


In 2012, DJ Patil, former United States Chief Data Scientist, broadly defined a data product as a product that facilitates an end goal through the use of data. Simon O'Regan then proposed a matrix of different Data Product types in 2018, whereas Justin Gage in 2019 described the job of Data-as-a-Product teams, as opposed to Data-as-a-Service teams, as difficult to hire for, because they basically have just one, dreary job: providing data to consumers in the organization.

These are obviously very different definitions of a Data Product, and this is something to keep in mind when discussing concepts like the Data Mesh.

Still, we will not propose another definition, as for assessing the transformational impact all these definitions are suitable: The key is product thinking.

"For a distributed data platform to be successful, domain data teams must apply product thinking [...]; considering their data assets as their products and the rest of the organization's data scientists, ML and data engineers as their customers." - Zhamak Dehghani

Data Products can be a good vehicle to overcome the discussed challenges. Even though no formally agreed definitions exist, one could argue that 'Data Products' focus more on sharing and using data, whereas the term 'Data Asset' implies inherent value just from storage or ownership.

Considering all the different definitions and approaches, one thing is for sure: The first step of working with Data Products is to actively create a working definition within the organization.

If Data Product Managers throughout the organization sought to understand the needs of their internal customers, monitored the 'market' for available data, designed their Data Products for an optimized user experience and then made them technically addressable and findable in a format that could be easily consumed, managing complexity would become a team effort.
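As a thought experiment, such a 'findable and addressable' Data Product could be described by a small piece of published metadata. Everything below (the class name, the fields, the catalog entry) is a hypothetical illustration of product thinking, not a standard or an existing API:

```python
from dataclasses import dataclass, field

@dataclass
class DataProductDescriptor:
    """Minimal metadata a Data Product Manager could publish so that a
    product becomes findable and addressable; all fields are illustrative."""
    name: str
    owner: str                      # accountable domain team, not an individual
    endpoint: str                   # a stable address, e.g. a table or API URL
    schema_version: str
    tags: list[str] = field(default_factory=list)

def find_products(catalog: list[DataProductDescriptor], tag: str) -> list[str]:
    # A Data Consumer searches the 'market' of published products by tag.
    return [p.name for p in catalog if tag in p.tags]

# A hypothetical catalog with a single published product:
catalog = [
    DataProductDescriptor(
        name="customer-churn-scores",
        owner="crm-domain-team",
        endpoint="warehouse://crm/churn_scores",
        schema_version="1.2.0",
        tags=["customer", "ml"],
    )
]
print(find_products(catalog, "customer"))  # ['customer-churn-scores']
```

The point of the sketch is not the data structure itself but the shift in responsibility: the provider publishes once, and discovery becomes the consumer's self-service task.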

Again, this does not immediately solve the problem, but it spreads the work on many shoulders. It also shifts the attention to another challenge: Why would Data Product Managers do this in a decentralized manner, without being explicitly instructed?

Data Providers are a long neglected group of stakeholders

It is probably safe to say that only a fraction of Data Providers in organizations today consider themselves Product Managers. On the contrary, the typical Data Provider tasked with data collection, storage and distribution has good reasons not to be too transparent about their data, as Data Products undoubtedly create additional work: documentation duties, compliance standards, ad-hoc access requests, data-related questions, incidents with already established data flows and many other tasks, not even considering the proactive parts of the job, like advertising and research. This is, by the way, true both for dedicated Data Product Managers officially labelled as such and for Data Providers who actually have a completely different role and are just assuming data provisioning responsibilities as a side task.

Organizationally, Data Providers are often situated in IT departments, following a long tradition of separating cost centers from what organizations perceive as value-generating core competencies. This is just a personal observation, and we could not find a suitable study to support this claim, but the cost pressure on the IT side always appears to be a bit heavier, creating more reason to push away additional tasks compared to the business side.

The difference between Data Providers and Data Product Managers is very similar to the one between Data Assets and Data Products: Product Managers have an active interest in their customers, even potential ones, aim at maximizing value instead of strictly following the principle of minimum effort and generally act with more autonomy.

Data Mesh helps pinpoint the fundamental challenge

While the fundamental problem has certainly been around for a while, Zhamak Dehghani shed new light on the relevance of Data Providers. To us, much of the transformative potential of the Data Mesh can be attributed to an increased awareness of this particular role.

Before you say "But Data Providers have always been important!", think of which group of data stakeholders has received more attention in recent years: initiatives aimed at faster access to data, boosting self-service capabilities, increased stability and better oversight of data assets certainly aimed at improving the situation of Data Consumers, and they resulted in additional responsibilities for Data Providers. The correct and important conclusion that use cases are the driver of data monetization may have contributed to that development as much as the fact that Data Engineering, like programming, is understood by few and happens 'under the hood', whereas the benefits of a data-driven use case are typically grasped by everyone with ease.

As a consequence, departments that happened to be technically responsible for relevant data were faced with additional tasks and standards: data protection guidelines, internal compliance policies, data catalogs that came with documentation duties, onboarding requests to data sharing platforms, and even legal audits.  

With the spotlight on the Data Mesh and, more importantly, on Data Products, Data Providers are finally receiving the attention they deserve as a key element of a data-driven organization. Realizing this also raises the question of motivation: as long as disclosing potentially relevant Data Assets to the rest of the organization to support the creation of Data Products primarily results in additional tasks, reluctance to share remains understandable.

Supporting the Data Mesh Transformation

TL;DR: Yes, the Data Mesh has the potential to transform organizations.

How to get there? The Data Mesh cannot be bought from a technology vendor, because it is not a technology capability or a tool. But the underlying technology concepts are vital to enable the actual transformation. This is true for what Zhamak Dehghani calls computational governance, but even more for providing incentives to Data Providers.

How to address these challenges in particular, and what means exist for creating incentives to disclose and share Data Products, is what we will discuss in a following article.

About the Authors:

Jens Kretzschmar: Jens has spent more than 20 years working with data & analytics. He has held numerous positions in teams responsible for realizing value from data and is currently working in the corporate transformation organization at SAP.

Bastian Nehrke: Bastian is a Senior Manager at Accenture with a professional focus on data-centric transformation, building pragmatic organizational structures, processes and governance bodies that enable lean data management.
