Navigating Your Data Landscape ...
My article "Making Your Data Strategy More Effective" shared the analogy of a Data Strategy being like an orchestra: the instruments are the Data Landscape capabilities while the music is the business value delivered by the capabilities.
My vision here is to enable improvements in the creation of Data Strategies by commoditising large portions of them and sharing useful principles for the remainder. This means less time spent describing the capabilities and more time spent describing how the value chains of capabilities will deliver the business value.
I have reviewed many Data Strategies and have created three Wardley Maps to describe the Data Landscape of capabilities; together, these form a commoditised framework.
Simon Wardley created the concept of the Wardley Map. His maps all have the same two axes, chosen for their importance: the value chain on the vertical axis (how visible each capability is to the user) and evolution on the horizontal axis (from genesis, through custom-built and product, to commodity).
The three Data Landscape Wardley Maps in this article should be useful for those working with Data Strategies because:
I started with one Wardley Map for the whole Data Landscape but, as I reviewed more Data Strategies and added extra capabilities, it became too busy. I've therefore settled on three Wardley Maps for the Data Landscape: one that covers all types of data, one for operational data, and one for analytical data.
As described in my previous article, the interlock between data and the business is pivotal. Achieving success here requires focus across the Wardley Maps:
Wardley Map #1: All Types of Data
The first of the three Wardley Maps includes the capabilities that are common to all types of data.
The Data Strategy is driven by outcomes defined by the Business Strategy, and has clear Executive Sponsorship. It decomposes via the Data Target Operating Model into a series of Capabilities. Many of these capabilities have four sub-capabilities: a tool, the configuration of the platform, the payload and the processes captured as Ways of Working.
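To make the decomposition concrete, here is a minimal, purely illustrative sketch; the capability name, configuration values and processes are hypothetical and not taken from any specific Data Strategy.

```python
from dataclasses import dataclass, field

# Illustrative decomposition of a capability into its four sub-capabilities:
# a tool, the configuration of the platform, the payload, and the Ways of Working.
@dataclass
class Capability:
    name: str
    tool: str                                              # the product selected for the capability
    platform_config: dict = field(default_factory=dict)    # how the tool is configured
    payload: list = field(default_factory=list)            # what flows through or is delivered by it
    ways_of_working: list = field(default_factory=list)    # the processes wrapped around it

mdm = Capability(
    name="Master Data Management",
    tool="<MDM product>",
    platform_config={"match_threshold": 0.9},
    payload=["customer golden records"],
    ways_of_working=["stewardship workflow", "exception review"],
)
```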
Each of the major capabilities can become a Centre of Excellence (CoE), e.g. Integration & Transformation CoE, Master Data Management CoE. I have seen Digital Transformations both with and without CoEs. Those without typically make rapid progress in the short term but the lack of standardisation leads to increased TCO and lower development velocity in the medium term. It's worth noting that once workstreams have been contracted out to different third parties, it's commercially difficult and inefficient to extract common functions into CoEs. Therefore CoE decisions should be made as part of the Data Strategy, well before contracts are agreed.
Source System data fragmentation is common. Data fragmentation is the inability to relate similar data entities across multiple Source Systems. Master Data Management (MDM) defragments an enterprise's data by creating and maintaining mappings between similar data entities, the so-called 'golden record'. If your enterprise has multiple overlapping Source Systems without MDM then the downstream consequences will be seen across the enterprise:
For organisations with scale, the Data Strategy should always include MDM.
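As a hedged illustration of the golden-record idea, the sketch below relates similar customer records from two Source Systems and keeps a mapping to one golden record. The matching rule, identifiers and systems are hypothetical; real MDM products use far richer matching and survivorship rules.

```python
# Minimal, illustrative golden-record mapping across two Source Systems.
def normalise(value: str) -> str:
    return " ".join(value.lower().split())

crm_records = [{"id": "CRM-1", "name": "Acme Ltd ", "country": "UK"}]
erp_records = [{"id": "ERP-77", "name": "acme ltd", "country": "United Kingdom"}]

golden_records = {}   # golden_id -> {source_system: source_record_id}
for crm in crm_records:
    for erp in erp_records:
        if normalise(crm["name"]) == normalise(erp["name"]):   # naive match rule
            golden_id = f"GR-{len(golden_records) + 1}"
            golden_records[golden_id] = {"CRM": crm["id"], "ERP": erp["id"]}

print(golden_records)   # {'GR-1': {'CRM': 'CRM-1', 'ERP': 'ERP-77'}}
```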
Data Observability is the active monitoring of data flows across all types of data. It should produce near-real-time datasets that include record counts, response times and return statuses. These datasets can then be monitored against manually defined or ML-derived thresholds. I have implemented Data Observability to improve integrity:
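For illustration only (this is not the implementation referred to above, and the flows, fields and thresholds are hypothetical), a minimal threshold check over such observability records could look like this:

```python
# Check near-real-time observability records against manually defined thresholds.
thresholds = {"min_record_count": 1000, "max_response_ms": 500}

observations = [
    {"flow": "orders->ods", "record_count": 12045, "response_ms": 220, "status": "OK"},
    {"flow": "crm->lake",   "record_count": 37,    "response_ms": 180, "status": "OK"},
]

for obs in observations:
    alerts = []
    if obs["record_count"] < thresholds["min_record_count"]:
        alerts.append("record count below threshold")
    if obs["response_ms"] > thresholds["max_response_ms"]:
        alerts.append("response time above threshold")
    if obs["status"] != "OK":
        alerts.append(f"bad return status: {obs['status']}")
    if alerts:
        print(f"{obs['flow']}: {', '.join(alerts)}")   # crm->lake: record count below threshold
```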
The array of capabilities clustered around the Data Governance Framework is a good example of defensive capability. Within this framework, data ownership by the business is particularly important:
Large Language Models (LLMs) are an emerging SaaS capability that will impact many aspects of the Data Landscape. The foundations of LLMs are the Generative Models, which can be localised through training. Because training data is likely to contain bias, Gen Model Compliance will protect organisations from the downstream consequences. LLM value is surfaced through Conversational AI (e.g. ChatGPT) and Intelligent Agents, many of which will be embedded in other products.
It will be interesting to see how LLMs impact the Data Landscape. One parallel could be the automation in civil passenger aircraft, which has seen the role of the pilot move from hands-on to hands-off; a modern pilot is now primarily a monitor of systems. This abstraction has made flying safer on average, though accidents still occur when the pilot-systems handshake breaks down: the pilots lose situational awareness, become thoroughly confused and unexpected consequences follow.
Wardley Map #2: Operational Data
Operational Data is the backbone of data in an enterprise; it is mastered across the Source Systems, Document Management systems and Records Management systems.
Data Quality (DQ) is the unsung hero of data value. If an organisation has poor DQ then all downstream data activities - including LLM work, data insights and Digital Twins - will be compromised. Poor DQ is insidious because the negative effects may not be immediately obvious but can gradually lead to serious problems such as imperfect decisions, reduced efficiency, eroded trust, increased costs and regulatory risk. Because the converse of these consequences often represents the business value of a Digital Transformation, Digital Transformations should define 'adequate DQ' and include the mechanisms to achieve it.
DQ is primarily an Operational issue - rather than an Analytical issue - and DQ should be fixed where the data is mastered. Improving DQ is a broad and deep challenge comprising:
The implications of DQ should limit the complexity of Source System data models and hence the scope of Source Systems. 'Less really is more': with unbounded scope (and hence large numbers of attributes per entity), Source System DQ may be unmanageable.
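As a small, hypothetical sketch of the kind of DQ rules applied where the data is mastered (the fields and reference values below are invented for illustration):

```python
import re

# Completeness, validity and conformity checks on a handful of Source System records.
records = [
    {"customer_id": "C001", "email": "jane@example.com", "country": "GB"},
    {"customer_id": "C002", "email": "",                 "country": "UK"},  # incomplete
]

VALID_COUNTRIES = {"GB", "FR", "DE"}
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def dq_issues(record: dict) -> list:
    issues = []
    if not record["email"]:
        issues.append("email missing")          # completeness
    elif not EMAIL_PATTERN.match(record["email"]):
        issues.append("email invalid")          # validity
    if record["country"] not in VALID_COUNTRIES:
        issues.append("country code invalid")   # conformity
    return issues

for r in records:
    print(r["customer_id"], dq_issues(r))
# C001 []
# C002 ['email missing', 'country code invalid']
```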
Digital Transformation value is often associated with automation and hence with needing fewer people to do the same amount of work. This value is delivered through three capabilities: Business Process Automation, the Operational Data Store (ODS) and Data Digitalisation.
The Data Strategy should make explicit decisions around the key flows of data across an enterprise, e.g. should all Operational data be cached in an ODS? Should the ODS be centralised or specialised per domain? Should the ODS flush through to the Data Lake?
The need for Data Digitalisation arises because E2E value chains may not be fully digital. This is particularly true when data lifecycles are measured in years, e.g. for large construction projects. Having led the delivery for a digitalisation CoE, I have seen how complicated, expensive and slow this digitalisation can be. Given this reality, the Data Strategy should extend upstream and assert the importance of digital E2E value chains that remove the need for digitalisation.
Digital Twins are essentially simulations of real world systems. They are a specialised example of applications and will need accurate input data for their outputs to be trusted and useful. Though their immediate data source may be the ODS, Digital Twins may depend upon new data feeds from Source Systems.
Wardley Map #3: Analytical Data
The Business outcomes are delivered across four classes of analytics:
These outcomes represent the primary offensive capabilities across an Enterprise and are delivered by the value chain of Data Integration -> Data Products -> ML Analytics -> Data Visualisation. The Data Strategy should describe how this value chain will deliver specific business value, and also match emerging Data Maturity (notably trust) to the classes of analytics.
The value chain of Data Integration -> Data Products -> ML Analytics -> Data Visualisation should be a good example of the difference between platform and payload:
3D printing is a good example of this distinction: 3D printers (= the platform) are configured through the submission of digital designs to deliver the physical products (= the payload).
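A minimal sketch of the same distinction in pipeline terms (all names below are hypothetical): the generic engine is the platform, the declarative spec is its configuration, and the data flowing through to the Data Product is the payload.

```python
# Declarative spec (configuration) handed to the generic pipeline engine (platform);
# the rows passing through to the Data Product are the payload.
payload_spec = {
    "source": "ods.orders",
    "transforms": ["drop_nulls", "mask_pii"],
    "destination": "data_products.orders_daily",
}

TRANSFORMS = {
    "drop_nulls": lambda rows: [r for r in rows if all(v is not None for v in r.values())],
    "mask_pii":   lambda rows: [{**r, "email": "***"} if "email" in r else r for r in rows],
}

def run_pipeline(spec: dict, rows: list) -> list:
    """Generic engine (platform): applies whatever the spec asks for to the payload."""
    for name in spec["transforms"]:
        rows = TRANSFORMS[name](rows)
    return rows

sample = [{"order_id": 1, "email": "jo@example.com"}, {"order_id": 2, "email": None}]
print(run_pipeline(payload_spec, sample))
# [{'order_id': 1, 'email': '***'}]
```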
Success criteria for the pipelines should be defined in the Data Strategy and will include the obvious (e.g. functionality, integrity, availability, outcomes) and the less obvious but extremely important to the business: development-time-to-value. Development-time-to-value will be delivered by patterns, automation and abstraction; its importance will only be experienced in BAU, but the need to optimise for it (along with the framework to measure and improve it) should be baked into the Data Strategy, product evaluation and the design of the pipelines.
The red capabilities in the diagram represent forecasts of how capabilities can further commoditise. In the case of ML Analytics, this refers to low-code and no-code platforms that, through abstraction, allow business users to directly own and configure use cases. I've seen how these platforms enable much higher development velocity, but also how over-dependence can lead to bill shock and downstream projects to port the work back to the mainstream platforms.
The Data Strategy should describe how the low-code / no-code and the mainstream platforms are wrapped into a single Way of Working. Gartner's Pace Layered Model could be useful here: 'Systems of Innovation' would use the low-code / no-code platforms to quickly understand the problem by understanding the data, and to experiment, whilst 'Systems of Differentiation' would leverage the learnings from the Systems of Innovation when configuring the mainstream platforms. This point is a good example of how the Data Strategy should be shaped for learning and for reducing development-time-to-value.
Conclusions
This article has described ways of thinking that should enable Data leaders to navigate their Data Landscapes and craft Data Strategies that are clear, actionable and effective.
In navigating the Data Landscape it's essential to go beyond listing capabilities and instead focus on how those capabilities come together to deliver real business value. This is much like an orchestra, where the instruments are your data capabilities and the music is the business value they collectively produce.
By applying Wardley Mapping, capability descriptions can be commoditised and delivery methods can be aligned to the maturity of each capability. The three Wardley Maps presented - covering all data, operational data, and analytical data - offer a reusable framework to support strategic alignment, drive consensus, guide delivery and improve Data Strategies.
Core principles such as embedding Data Owners in the business, treating Data Quality as an operational concern, and implementing foundational elements like MDM and Data Observability are critical for enabling trust and performance.
As the landscape evolves with emerging technologies like low-code platforms and LLMs, it's vital to manage innovation alongside differentiation and core systems, something well supported by Gartner's Pace Layered Model. Above all, successful Data Strategies must optimise for outcomes and development-time-to-value, ensuring that data capabilities not only exist but deliver measurable and sustained impact across the organisation.