Digital Economy Dispatch #165 -- AI’s Data Quality Challenge: Data as a Byproduct versus Data as an Asset

7th January 2024

As we drive forward with AI-based digital transformation, it is now clear that dealing with data quality will be a key factor in an organization’s success. Data holds immense value as a key asset for understanding customer behaviour, personalizing capabilities to diverse contexts, and enhancing service delivery. Some go as far as to call data “the new oil”.

As a result, in many of my discussions with organizations I hear people talk about how much data they have collected over the years, the inventories of data they are undertaking, and the many ways in which they expect to exploit the data they have amassed. As we enter the latest AI era, organizations now want to use this data to automate decision making in operational scenarios, provide insights to redefine business processes, and feed the large language models (LLMs) that sit under Generative AI tools to optimize customer service.

Unfortunately, these organizations must face up to a critical failing. Many of these aspirations about becoming data-driven will not materialize due to the poor quality of their data. While they may have access to large libraries of data collected over many years, much of the data they hold is not appropriate for the kinds of data-driven activities they now want to pursue.

There are many reasons for this, but at the heart of the concerns is a key issue. At some cost, they are recognizing the distinction between data as a by-product and data as an asset. The reality they face is that a lot of this data was collected as a by-product of other activities, with little regard for many of its key usage characteristics. Despite best efforts, no one is in a position to answer basic questions about essential aspects of the data such as its timeliness, completeness, accuracy, provenance, history, and relevance. As Josh Simons summarizes in his book “Algorithms for the People: Democracy in the Age of AI”, the optimistic picture of AI automating decision making based on vast collections of well-managed data is a myth. The reality is a far messier mix of technical and human curating of data involving substantial “data cleaning” to make it fit for purpose. Indeed, “the data sets that are assembled by humans reflect the structures, opportunities, and disadvantages of a very human world”.
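To make those basic questions concrete, here is a minimal sketch of how two of the characteristics, completeness and timeliness, might be measured over a small set of records. The records, the field names (postcode, updated, source), and the one-year freshness threshold are all illustrative assumptions, not drawn from any real system. Treating a missing source field as a gap in provenance is likewise just one possible proxy.

```python
from datetime import date

# Hypothetical records; every field name and value here is an assumption.
RECORDS = [
    {"id": 1, "postcode": "AB1 2CD", "updated": date(2023, 11, 1), "source": "crm"},
    {"id": 2, "postcode": None, "updated": date(2019, 3, 15), "source": "crm"},
    {"id": 3, "postcode": "EF3 4GH", "updated": date(2023, 6, 30), "source": None},
    {"id": 4, "postcode": "", "updated": None, "source": "web_form"},
]


def completeness(records, field):
    """Share of records holding a non-empty value for `field`."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def timeliness(records, field, as_of, max_age_days):
    """Share of records whose `field` date is at most `max_age_days` old."""
    if not records:
        return 0.0
    fresh = sum(
        1
        for r in records
        if r.get(field) is not None and (as_of - r[field]).days <= max_age_days
    )
    return fresh / len(records)


def quality_report(records, as_of, max_age_days=365):
    """Summarize a few quality measures as fractions between 0 and 1."""
    return {
        "postcode_completeness": completeness(records, "postcode"),
        "provenance_completeness": completeness(records, "source"),
        "timeliness": timeliness(records, "updated", as_of, max_age_days),
    }


REPORT = quality_report(RECORDS, as_of=date(2024, 1, 7))
print(REPORT)
```

Even simple measures like these cannot be computed when the fields that record when and where data came from were never captured, which is typically the case for data gathered as a by-product.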

Data-Driven Government

To illustrate this issue in more detail, consider the way that data-driven approaches are being pursued in the public sector. As society becomes increasingly interconnected and complex, the ability to effectively collect, analyze, and utilize data has become a critical factor in ensuring the efficient, equitable, and responsive delivery of public services. Consequently, the transformation to digital public services is well underway.

It is now widely accepted that a critical foundation for adoption of AI in the public sector is a mature approach to the collection, management, and use of data. Data is revolutionizing the way governments operate, providing a powerful tool for improving service delivery, enhancing efficiency, and promoting evidence-based policymaking. The accuracy, relevance, and utility of data are recognized as critical underpinnings for achieving the AI-based goals the UK government has announced.

This dependency on data is particularly evident in emergency services such as policing, where use of data drives more informed, efficient, and equitable law enforcement strategies. Consider a local police department leveraging data analytics to optimize resource allocation. By analyzing historical crime data, they identify high-incidence areas, enabling them to strategically deploy officers to deter criminal activity and respond promptly to emerging issues. But this works only if the data is accurate, reliable, and free from bias.
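As a rough illustration of why that caveat matters, the sketch below ranks areas by incident count and then checks whether the number of records with no area recorded is large enough to overturn the ranking. The incident log, the area names, and the robustness rule are hypothetical and deliberately simplistic. Real crime analysis would involve far more than raw counts.

```python
from collections import Counter

# Hypothetical incident log; None or "" means the area was never recorded.
INCIDENT_AREAS = [
    "North", "South", "North", None, "North", "South", "", "South", "North",
]


def rank_areas(areas):
    """Return (area, count) pairs, highest first, plus the count of missing areas."""
    counts = Counter(a for a in areas if a)
    missing = sum(1 for a in areas if not a)
    return counts.most_common(), missing


def ranking_is_robust(ranked, missing):
    """True if the top area would stay on top wherever the missing records belong."""
    if not ranked:
        return False
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    margin = ranked[0][1] - runner_up
    return margin > missing


RANKED, MISSING = rank_areas(INCIDENT_AREAS)
ROBUST = ranking_is_robust(RANKED, MISSING)
print(RANKED, MISSING, ROBUST)
```

In this made-up log, North leads South by a single incident while two incidents have no area at all, so the apparent hotspot could simply reflect gaps in recording.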

Unfortunately, it has been found in practice that use of large data stores such as the Police National Database (PND) is fraught with challenges. Not only are there problems with the accuracy and completeness of the data itself, but policy and procedural issues often prevent appropriate sharing and use of the data in situations where it would add value. These concerns are not confined to policing. They are seen across the UK government, as reported by the National Audit Office (NAO), which concluded in its report that the UK government needs “to resolve fundamental challenges around how to use and share data safely and appropriately, and how to balance competing demands on public resources in a way that allows for sustained but proportionate investment in data”.

From Data Byproduct to Data as an Asset

To move forward on the goals organizations have for AI-based digital transformation, understanding the distinction between data as a byproduct and data as an asset is critical for leaders navigating the digital era.

Data as a Byproduct: Traditionally, data was often regarded as a byproduct of day-to-day operations. It was generated incidentally through routine business and operational activities without intentional collection or strategic utilization. This data was typically viewed as an operational residue—a side effect of processes rather than a deliberate resource. In this context, its value was largely unrecognized and underutilized. Typically, there was minimal investment in gathering, managing, updating, and controlling access to such data.

Organizations historically stored this incidental data in silos as individual databases. The cost of managing it was considered an overhead, a necessity for compliance or archival purposes rather than a tool for generating insights. This approach overlooked the latent potential residing within the data. The lack of proactive analysis and strategic integration limited its utility, resulting in missed opportunities for innovation and efficiency gains.

Data as an Asset: In contrast, contemporary perspectives recognize data as a potent asset—a valuable resource that, when harnessed effectively, can revolutionize operations, drive innovation, and enhance decision-making. Especially with advances in the sophistication of AI algorithms, this data is the basis for action. Viewing data as an asset involves intentional collection, organization, and analysis with the goal of extracting actionable insights to guide strategic initiatives.

The paradigm shift towards data-driven decision-making underscores the immense value embedded within datasets. By leveraging advanced analytics, machine learning, and artificial intelligence, organizations can unlock the latent potential of data, extracting meaningful patterns, trends, and correlations.

The Path to Data-Driven Success

A key point here is that organizations with a long history of treating data as a by-product are finding it difficult to switch to viewing data as an asset. Not only are the large collections of data they have available inappropriate for many of their data-driven needs, but the associated data management practices are also ineffective for ensuring the data is used responsibly.

Digital leaders must recognize this distinction and take advantage of these insights to make informed decisions, optimize processes, predict market trends, and personalize customer experiences. The core of this approach involves placing a priority on four areas.

The distinction between data as a byproduct and an asset lies in its strategic utilization. Transforming data from a byproduct to an asset necessitates a shift in mindset and operational approach. This involves:

  1. Investment in Data Infrastructure: Implementing robust systems for data collection, storage, and management is crucial. Cloud computing, data lakes, and advanced analytics tools form the backbone of an effective data infrastructure.

  2. Cultural Embrace of Data: Fostering a culture that values data-driven insights is paramount. Encouraging data literacy and creating cross-functional teams that collaborate on data initiatives can embed a data-driven approach into the organizational DNA.

  3. Strategic Analysis: Deploying sophisticated analytics techniques to extract actionable insights is imperative. Predictive and prescriptive analytics enable organizations to forecast trends, mitigate risks, and identify opportunities proactively.

  4. Monetization and Innovation: Identifying opportunities to monetize data and drive innovation is a hallmark of treating data as an asset. Whether through new products, services, or enhanced customer experiences, data-driven innovation can open new revenue streams.

The Power of Data

Recent experience has highlighted that the digital age demands a fundamental shift in our relationship with data. In the past, data was seen as a byproduct of other activities, not a first-class asset to be nurtured and utilized. Now we're having to deal with the consequences when we try to use this data to drive AI-based decision making.

The delineation between data as a byproduct and an asset signifies a pivotal transformation in organizational thinking. It necessitates a cultural shift towards leveraging insights to drive decisions. By embracing the "asset" mindset, we unlock a world of possibilities, where information guides our decisions, fuels innovation, and shapes our collective future. Leaders who grasp this distinction and cultivate a data-centric culture will drive their organizations towards innovation and competitive success in the dynamic landscape of the digital age.