Navigating Your Data Landscape

My article "Making Your Data Strategy More Effective" shared the analogy of a Data Strategy being like an orchestra: the instruments are the Data Landscape capabilities while the music is the business value delivered by the capabilities.

My aim here is to improve the creation of Data Strategies by commoditising large portions of them and sharing useful principles for the remainder. In doing so, less time can be spent describing the capabilities and more time spent describing how the value chains of capabilities will deliver the business value.

I have reviewed many Data Strategies and have created three Wardley Maps to describe the Data Landscape of capabilities; together these form a commoditised framework.

Simon Wardley created the concept of the Wardley Map. His maps all have the same two axes, chosen for their importance:

  • Vertically: a value chain of capabilities, with the users at the top and the least visible capabilities at the bottom
  • Horizontally: the evolution of the capabilities from 'Genesis' to 'Commodity'

The three Data Landscape Wardley Maps in this article should be useful for those working with Data Strategies because:

  1. Without the clarity of maps, people may appear to agree but those agreements will be brittle. Each organisation will therefore benefit from one set of maps; a primary aim of these maps should be to achieve and document real consensus.
  2. Regular reviews of data progress can use the maps as a common framework.
  3. The delivery approach per capability can be guided by the horizontal axis: Genesis and Custom-Built capabilities are high risk and best delivered with exploratory approaches (for example, Agile), whereas the delivery of Commodity capabilities should be tightly controlled (for example, Lean Six Sigma). This logic extends to engaging third parties: each work bundle should be at a similar evolutionary stage.
  4. They represent a taxonomy, which can be used to track budget and expenditure; this is particularly useful once the capabilities have been categorised as 'offensive' or 'defensive'.
  5. With ChatGPT, it's possible to cross-check a Data Strategy against the maps; a Venn diagram output will reveal the capabilities that are in the maps but not in the Data Strategy, and vice versa (a sketch of the underlying comparison follows this list).
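
Whether the comparison is done by an LLM or by hand, the underlying operation is a set comparison across capability names. Below is a minimal sketch, assuming the capabilities have already been extracted from the maps and from the Data Strategy; the capability names themselves are purely illustrative.

```python
# Hypothetical capability lists, extracted from the maps and from a Data Strategy.
map_capabilities = {
    "Master Data Management", "Data Observability", "Data Quality",
    "Data Visualisation", "Metadata Management",
}
strategy_capabilities = {
    "Master Data Management", "Data Visualisation", "Data Quality",
    "Customer 360 Dashboard",
}

# The three regions of the Venn diagram.
in_both = map_capabilities & strategy_capabilities
only_in_maps = map_capabilities - strategy_capabilities      # gaps in the strategy
only_in_strategy = strategy_capabilities - map_capabilities  # candidates to add to the maps

print("Covered by both:", sorted(in_both))
print("In the maps but missing from the strategy:", sorted(only_in_maps))
print("In the strategy but not on the maps:", sorted(only_in_strategy))
```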

I started with one Wardley Map for the whole Data Landscape but as I reviewed more and more Data Strategies, and added extra capabilities, it became too busy. I've therefore settled on three Wardley Maps for the Data Landscape: one that covers all types of data, one for operational data, and one for analytical data.

As described in my previous article, the interlock between data and the business is pivotal. Achieving success here requires focus across the Wardley Maps:

  • Operational, Analytical and Master Data: the Community of Practice and Data OKRs drive up Data Literacy through Continuous Improvement in the Data Culture
  • Operational, Analytical and Master Data: Data Owners should be in the business
  • Operational Data: Improving Data Quality should be owned by the business
  • Analytical Data: because trust in the insights is so important, Self Service by the business for Data Visualisation is preferred
  • Analytical Data: The Analytical Outcomes seed business owned Value Storytelling and Comms

Wardley Map #1: All Types of Data

[Wardley Map #1: All Types of Data - diagram]

The first of the three Wardley Maps includes the capabilities that are common to all types of data.

The Data Strategy is driven by outcomes defined by the Business Strategy, and has clear Executive Sponsorship. It decomposes via the Data Target Operating Model into a series of Capabilities. Many of these capabilities have four sub-capabilities: a tool, the configuration of the platform, the payload and the processes captured as Ways of Working.

Each of the major capabilities can become a Centre of Excellence (CoE), e.g. Integration & Transformation CoE, Master Data Management CoE. I have seen Digital Transformations both with and without CoEs. Those without typically make rapid progress in the short term but the lack of standardisation leads to increased TCO and lower development velocity in the medium term. It's worth noting that once workstreams have been contracted out to different third parties, it's commercially difficult and inefficient to extract common functions into CoEs. Therefore CoE decisions should be made as part of the Data Strategy, well before contracts are agreed.

Source System data fragmentation is common. Data fragmentation is the inability to relate similar data entities across multiple Source Systems. Master Data Management (MDM) defragments an enterprise's data by creating and maintaining mappings between similar data entities to form the so-called 'golden record'. If your enterprise has multiple overlapping Source Systems without MDM then the downstream consequences will be seen across the enterprise:

  • Centralised reporting may exist but this will be constrained to Source System reporting as opposed to organisation-level reporting. It's worth considering this point before investing in a Data Platform.
  • The number of customers or assets will be an opinion as opposed to a trusted fact. A single view of the customer or an asset will be missing, and so Board or Regulatory reporting will be compromised
  • Customers may be complaining because their discounts are insufficient, or because they receive multiple copies of the same correspondence
  • Asset maintenance costs may be too high because a one-size-fits-all maintenance schedule has had to be adopted

For organisations with scale, the Data Strategy should always include MDM.
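
To make the mapping idea concrete, here is a minimal, hypothetical sketch: two Source Systems hold overlapping customer records, and a deliberately simple matching rule (a normalised email address) builds the cross-system mapping behind a golden record. The field names, matching rule and 'first wins' survivorship are illustrative assumptions; real MDM platforms use much richer matching and survivorship logic.

```python
# Hypothetical sketch of an MDM-style mapping across two Source Systems.
# Field names and the matching rule are assumptions, not any product's behaviour.

crm_customers = [
    {"crm_id": "C-001", "name": "Acme Ltd",  "email": "ops@acme.example"},
    {"crm_id": "C-002", "name": "Binary Co", "email": "info@binary.example"},
]
billing_customers = [
    {"bill_id": "B-77", "name": "ACME Limited", "email": "OPS@ACME.EXAMPLE"},
    {"bill_id": "B-78", "name": "Candle plc",   "email": "hello@candle.example"},
]

def match_key(record: dict) -> str:
    """A deliberately naive matching key: the lower-cased email address."""
    return record["email"].strip().lower()

# Build the cross-system mapping: one golden record per matched entity.
golden_records: dict[str, dict] = {}
for source, records, id_field in [
    ("crm", crm_customers, "crm_id"),
    ("billing", billing_customers, "bill_id"),
]:
    for record in records:
        golden = golden_records.setdefault(match_key(record), {"source_ids": {}})
        golden["source_ids"][source] = record[id_field]
        golden.setdefault("name", record["name"])  # naive survivorship: first source wins

for key, golden in golden_records.items():
    print(key, golden)
# Matched entities carry ids from both systems; unmatched ones from only one,
# which is exactly the fragmentation signal described above.
```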

Data Observability is the active monitoring of data flows across all types of data. It should produce near real time datasets that include record counts, response times and return statuses. These datasets can then be monitored against manually defined or ML-derived thresholds (a sketch of this monitoring step follows the examples below). I have implemented Data Observability to improve integrity:

  • operational data: we tapped an API feed in front of a Cloud Fabric and were able to see emerging incidents hours before the Monitoring and Control Centre - with their synthetic transactions - raised the incident. This approach is much the same as IoT monitoring for proactive maintenance.
  • analytical data: careful design of the ELT pipelines in a Data Platform produced datasets that, once monitored, allowed pipeline incidents to be clearly seen. In turn this allowed us to squeeze out unexpected failures such as Oracle caching time-outs.
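
As a minimal sketch of the monitoring step, the snippet below applies a simple statistical threshold (three standard deviations around the mean of the other runs' record counts) to flag anomalous pipeline runs. The dataset shape and the threshold rule are illustrative assumptions; a manually defined or ML-derived threshold could be substituted without changing the surrounding plumbing.

```python
from statistics import mean, stdev

# Hypothetical observability dataset: one row per pipeline run.
runs = [
    {"pipeline": "orders_elt", "records": 10_250, "seconds": 312, "status": "ok"},
    {"pipeline": "orders_elt", "records": 10_480, "seconds": 298, "status": "ok"},
    {"pipeline": "orders_elt", "records": 10_110, "seconds": 305, "status": "ok"},
    {"pipeline": "orders_elt", "records": 1_040,  "seconds": 41,  "status": "ok"},  # suspiciously small load
]

def flag_anomalies(history: list[dict], sigmas: float = 3.0) -> list[dict]:
    """Flag runs whose record count sits more than `sigmas` standard deviations
    from the mean of the other runs in the window."""
    anomalies = []
    for run in history:
        others = [r["records"] for r in history if r is not run]
        if len(others) < 2:
            continue
        mu, sigma = mean(others), stdev(others)
        if sigma and abs(run["records"] - mu) > sigmas * sigma:
            anomalies.append(run)
    return anomalies

for run in flag_anomalies(runs):
    print("ANOMALY:", run)  # the 1,040-record run is flagged despite its 'ok' status
```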

The capabilities clustered around the Data Governance Framework are a good example of defensive capabilities. Within this framework, data ownership by the business is particularly important:

  • The Data Models should define the entities for Data Ownership
  • Ownership of each entity should be established over the complete Data Landscape. This obviously implies ownership of the data entity in the Source Systems but also ongoing ownership once the entity has been extracted into a Data Lake, processed into a Data Warehouse, reported on, extracted from the reports etc. This E2E thinking will need to figure in the Data Strategy and impact the data platform design.
  • Being a Data Owner or a Data Steward takes time, expertise and tools. These are the realities of Data Governance, and imply that a considered approach is needed. The Data Strategy should describe the principles to ensure that the nominated individuals have the spare bandwidth, the data maturity and access to named tools to allow them to understand their data (noting that, in many Source Systems, the data is so normalised and abstracted that a simple dump of the data won't be helpful)

Large Language Models (LLMs) are an emerging SaaS capability that will impact many aspects of the Data Landscape. LLM foundations are the Generative Models which can be localised through training. Because of the likelihood of bias in the training data, Gen Model Compliance will protect organisations from the downstream consequences. LLM value is surfaced through Conversational AI (e.g. ChatGPT) and Intelligent Agents, many of which will be embedded in other products.

It will be interesting to see how LLMs impact the Data Landscape. One parallel could be the automation in civil passenger aircraft, which has seen the role of the pilot move from hands-on to hands-off; a modern pilot is now primarily a monitor of systems. This abstraction has made flying safer on average, though there are still accidents when the pilot-systems handshake breaks down: the pilots lose situational awareness, become thoroughly confused, and unexpected consequences follow.

Wardley Map #2: Operational Data

[Wardley Map #2: Operational Data - diagram]

Operational Data is the backbone of data in an enterprise. Operational Data is mastered across the Source Systems, Document Management systems and Record Management systems.

Data Quality (DQ) is the unsung hero of data value. If an organisation has poor DQ then all downstream data activities - including LLM work, data insights and Digital Twins - will be compromised. Poor DQ is insidious because the negative effects may not be immediately obvious but can gradually lead to serious problems such as imperfect decisions, reduced efficiency, eroded trust, increased costs and regulatory risk. Because the converse of these consequences often represents the business value of a Digital Transformation, Digital Transformations should define 'adequate DQ' and include the mechanisms to achieve it.

DQ is primarily an Operational issue - rather than an Analytical issue - and DQ should be fixed where the data is mastered. Improving DQ is a broad and deep challenge comprising the following (a sketch of rule-based DQ checks appears after this list):

  • Data Modelling to understand where to start (the 'Key Data Elements' tracked by Metadata Management)
  • Cleansing the current data in the Source Systems
  • Process Modelling to understand the as-is processes that have produced the poor DQ
  • Continuous Improvement applied to processes to fix the root causes of poor DQ
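
As a minimal, hypothetical sketch of rule-based checks against a Key Data Element, the fields and rules below are illustrative assumptions; real DQ tooling would externalise the rules, run them continuously and report trends over time.

```python
import re

# Hypothetical records for a Key Data Element ("customer") in a Source System.
customers = [
    {"id": 1, "email": "ops@acme.example", "country": "GB", "created": "2023-04-01"},
    {"id": 2, "email": "",                 "country": "UK", "created": "2023-04-02"},
    {"id": 3, "email": "not-an-email",     "country": "GB", "created": None},
]

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
VALID_COUNTRIES = {"GB", "FR", "DE"}  # reference data the checks validate against

# Illustrative DQ rules: completeness, validity and conformity to reference data.
checks = {
    "email_present_and_valid": lambda r: bool(r["email"]) and bool(EMAIL_RE.match(r["email"])),
    "country_in_reference_list": lambda r: r["country"] in VALID_COUNTRIES,
    "created_date_populated": lambda r: r["created"] is not None,
}

# Score each rule across the dataset; trending these scores over time shows
# whether the process fixes (not just the one-off cleansing) are working.
for name, rule in checks.items():
    passed = sum(1 for record in customers if rule(record))
    print(f"{name}: {passed}/{len(customers)} ({passed / len(customers):.0%})")
```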

DQ considerations should limit the complexity of Source System data models and hence the scope of Source Systems. 'Less really is more': with unbounded scope (and hence large numbers of attributes per entity), Source System DQ may be unmanageable.

Digital Transformation value is often associated with automation and hence having fewer people to do the same amount of work. This value is delivered through three capabilities: Business Process Automation, the Operational Data Store (ODS) and Data Digitisation.

The Data Strategy should make explicit decisions around the key flows of data across an enterprise, e.g. should all Operational data be cached in an ODS? Should the ODS be centralised or specialised per domain? Should the ODS flush through to the Data Lake?

The need for Data Digitisation stems from the fact that E2E value chains may not be fully digital. This is particularly true when data lifecycles are measured in years, e.g. for large construction projects. Having led the delivery for a digitisation CoE, I have seen how complicated, expensive and slow this digitisation can be. Given this reality, the Data Strategy should extend upstream and assert the importance of E2E value chains that are digital from the outset, removing the need for digitisation.

Digital Twins are essentially simulations of real world systems. They are a specialised example of applications and will need accurate input data for their outputs to be trusted and useful. Though their immediate data source may be the ODS, Digital Twins may depend upon new data feeds from Source Systems.

Wardley Map #3: Analytical Data

[Wardley Map #3: Analytical Data - diagram]

The Business outcomes are delivered across four classes of analytics:

  • descriptive analytics: what has happened?
  • diagnostic analytics: why has it happened?
  • predictive analytics: what may happen?
  • prescriptive analytics: what action to take?

These outcomes represent the primary offensive capabilities across an Enterprise and are delivered by the value chain of Data Integration, Data Products, ML Analytics, Data Visualisation. The Data Strategy should describe how this value chain will deliver specific business value, and also match emerging Data Maturity (notably trust) to the classes of analytics.

The value chain of Data Integration -> Data Products -> ML Analytics -> Data Visualisation should be a good example of the difference between platform and payload:

  • The platform is the out-of-the-box functionality that the platform provider has created in order to facilitate the payload, plus the locally created configuration e.g. scripts
  • The payload is the data flowing along the pipeline

3D printing is a good example of this distinction: 3D printers (= the platform) are configured through the submission of digital designs to deliver the physical products (= the payload).

Success criteria for the pipelines should be defined in the Data Strategy and will include the obvious (e.g. functionality, integrity, availability, outcomes) and the less obvious but critically important to the business, i.e. development-time-to-value. Development-time-to-value will be delivered by patterns, automation and abstraction; its importance will only be experienced in BAU, but the need to optimise for it (along with the framework to measure and improve it) should be baked into the Data Strategy, product evaluation and the design of the pipelines.

The red capabilities in the diagram represent forecasts of how capabilities may further commoditise. In the case of ML Analytics this references low-code and no-code platforms that, through abstraction, allow business users to directly own and configure use cases. I've seen how these platforms enable much higher development velocity, but also how over-dependence can lead to bill shock and downstream projects to port the work back to the mainstream platforms.

The Data Strategy should describe how the low-code / no-code and the mainstream platforms are wrapped into a single Way of Working. Gartner's Pace Layered Model could be useful here: 'Systems of Innovation' would use the low-code / no-code platforms to quickly understand the problem by understanding the data, and to experiment, whilst 'Systems of Differentiation' would leverage the learnings from the Systems of Innovation when configuring the mainstream platforms. This point is a good example of how the Data Strategy should be shaped for learning and for reducing development-time-to-value.

Conclusions

This article has described ways of thinking that should enable Data leaders to navigate their Data Landscapes and craft Data Strategies that are clear, actionable and effective.

In navigating the Data Landscape it's essential to go beyond listing capabilities and instead focus on how those capabilities come together to deliver real business value. This is much like an orchestra, where the instruments are your data capabilities and the music is the business value they collectively produce.

By applying Wardley Mapping, capability descriptions can be commoditised and delivery methods can be aligned to the maturity of each capability. The three Wardley Maps presented - covering all data, operational data, and analytical data - offer a reusable framework to support strategic alignment, drive consensus, guide delivery and improve Data Strategies.

Core principles such as embedding Data Owners in the business, treating Data Quality as an operational concern, and implementing foundational elements like MDM and Data Observability are critical for enabling trust and performance.

As the landscape evolves with emerging technologies like low-code platforms and LLMs, it's vital to manage innovation alongside differentiation and core systems, something well supported by Gartner's Pace Layered Model. Above all, successful Data Strategies must optimise for outcomes and development-time-to-value, ensuring that data capabilities not only exist but deliver measurable and sustained impact across the organisation.

