Digital Economy Dispatch #168 — AI’s Data Dilemma: Confronting the paradox of poor quality data in the age of AI
28th January 2024
Over the past few years, the digital revolution has resulted in a deluge of data. From sensors and smart devices to social media and financial transactions, data is generated at an unprecedented pace, promising to revolutionize everything from core business processes to public service delivery. And the amount of new digital data being generated shows no sign of slowing down. By some accounts, we are now generating, managing, and sharing hundreds of zettabytes of data per year.
Availability and access to this data is a key component of the current wave of AI adoption. Vast amounts of data are essential for AI algorithms to recognize patterns, distinguish subtle differences, and ultimately make accurate predictions. More data allows them to capture the nuances of the domain in focus, learn from past experiences, and refine their decision-making abilities. In broad terms, the more data AI has, the more complex tasks it can tackle, and the closer it gets to replicating human-like intelligence.
Yet, amidst this abundance lies a major dilemma: the quality of this data often falls woefully short of its potential. As a result, despite our best efforts, we find that access to more data isn't a magic wand for unlocking deeper insights. It might seem counterintuitive, but an abundance of data can in some cases make things worse, leading to information overload. Similarly, low-quality data, rife with inaccuracies or inconsistencies, can result in misleading conclusions. Without proper analysis and filtering, more data becomes a noisy distraction, hindering the ability to identify meaningful patterns and trends. It's not the quantity of data that matters; its quality, relevance, and effective utilization are what unlock the true power of insights.
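To make the filtering point concrete, here is a minimal sketch, in Python with pandas, of the kind of basic quality profiling that should precede any AI use of a dataset. The file name, column names, and the set of valid values are all hypothetical, chosen purely for illustration:

```python
import pandas as pd

# Hypothetical example: profile a dataset before using it to train a model.
# The file name and column names below are illustrative only.
df = pd.read_csv("customers.csv")

# Completeness: the share of non-missing values in each column.
completeness = 1.0 - df.isna().mean()

# Uniqueness: duplicate records inflate volume without adding signal.
duplicate_rate = df.duplicated().mean()

# Consistency: values outside an agreed set usually indicate entry errors.
valid_regions = {"north", "south", "east", "west"}
invalid_region_rate = (~df["region"].str.lower().isin(valid_regions)).mean()

print("Completeness by column:")
print(completeness)
print(f"Duplicate rate: {duplicate_rate:.1%}")
print(f"Invalid 'region' values: {invalid_region_rate:.1%}")
```

Even a quick profile like this can reveal whether an apparently rich dataset is complete and consistent enough to be worth feeding into a model at all.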
The Data Quality Discussion
This is a central challenge we, as digital leaders and practitioners, face today. The fuel driving our AI ambitions – data – is often infused with inaccuracies, biases, and inconsistencies. Consequently, this data dilemma poses a significant threat to the responsible and effective deployment of AI, potentially undermining public trust, perpetuating inequalities, and ultimately hindering progress.
To delve deeper into this critical issue, I was delighted to have the opportunity this week to lead a discussion with a panel of experts:
Yvonne Gallagher, Digital Director, UK National Audit Office.
Christine Ashton, Chief Information Officer, UK Research & Innovation.
Stefan Crossfield CEng CMgr, Chief Data Officer and Head of Information Exploitation, British Army.
Rashik Parmar MBE, CEO, British Computer Society.
The discussion brought this panel together to explore several related topics, starting with the difficulties of gaining access to data trapped in legacy systems and technologies. The panellists then paid particular attention to the evolving landscape of data, highlighting the growing disconnect between the sheer volume of data and its quality. This is highlighted in recent surveys such as Data Orchard’s 2023 review of data maturity in non-profit organizations. They have seen over 6,000 people complete their data maturity assessment, and have three years of data measuring and benchmarking how organizations are performing. Their review shows that while organizations are increasingly aware of the importance of data governance and management, recent trends paint a concerning picture:
Diminishing data quality: Fewer organizations report having good quality data, with 57% expressing doubts about its completeness, accuracy, and timeliness in 2022-23, compared to 44% in 2020-21.
Widespread data illiteracy: Staff data literacy is on the decline, with 58% of organizations reporting a lack of data literacy among their employees in 2022-23, compared to 47% in 2020-21.
Erosion of data security confidence: Concerns about data security are rising, with only 51% of organizations expressing confidence in the security of their data in 2022-23, compared to 61% in 2020-21.
These alarming trends are echoed in examinations of the way public sector organizations are approaching the task of managing and maintaining data quality even as it becomes more essential to the task of delivering efficient and effective public services. As Gareth Davies, Head of the NAO, aptly stated in 2019: "Government has lacked clear and sustained strategic leadership on data, and individual departments have not made enough effort to manage and improve the data they hold." Similarly, the UK MOD's 2021 "Data Strategy for Defence" took these concerns one step further by acknowledging that the challenges of making use of newly acquired data are increasing: “Despite a rising volume of data from our increasing arsenal of sensors, we’re finding it harder than ever to isolate the insight from the information”.
Debating AI’s Data Challenge
The crux of the matter lies in this fundamental question: Are we prepared for the data demands of the AI age? Can we bridge the gap between the data we have and the data we need to power trustworthy, ethical, and impactful AI?
This is where the conversation with the expert panellists began, and these questions guided the direction of the discussion. Each panellist brought a different perspective to the topic, highlighting the opportunities and challenges we face in delivering responsible and robust AI solutions. While many important observations were made, a few key points are worth considering.
For Yvonne Gallagher, a key challenge is the need to increase access to data tied up in the government’s legacy systems. Over many years, the data being collected and managed has too often been inaccessible and incomplete due to a range of technical, organizational, and financial issues. Improving the focus on opening up and interfacing with this data is critical.
In the experience of Stefan Crossfield, an essential starting point has been to recognize the role of AI in both driving efficiencies and creating operational advantage. In both cases, a focus on data quality is central to achieving these goals. Introducing standard descriptions of data and defining common ontologies have been important steps in this journey, helping to ensure data can be shared in appropriate and secure ways.
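As a rough illustration of what a standard data description backed by a controlled vocabulary can look like in practice, here is a minimal sketch in Python. This is not a description of any panellist’s actual systems; the record fields, vocabulary terms, and validation rules are all hypothetical:

```python
from dataclasses import dataclass
from datetime import date

# A shared, agreed definition of a record type that every system producing
# or consuming this data must conform to. Field names are hypothetical.
@dataclass(frozen=True)
class AssetRecord:
    asset_id: str        # globally unique identifier in an agreed format
    asset_type: str      # must come from the controlled vocabulary below
    location_code: str   # standard geographic coding scheme
    last_verified: date  # when the record was last checked for accuracy

# A controlled vocabulary acting as a minimal ontology: producers may only
# use these terms, so consumers can interpret records reliably.
ASSET_TYPES = {"vehicle", "sensor", "facility"}

def validate(record: AssetRecord) -> list[str]:
    """Return a list of quality issues; an empty list means the record conforms."""
    issues = []
    if not record.asset_id:
        issues.append("missing asset_id")
    if record.asset_type not in ASSET_TYPES:
        issues.append(f"unrecognized asset_type: {record.asset_type!r}")
    return issues

# Usage: records that fail validation are flagged before being shared.
record = AssetRecord("A-0001", "drone", "GB-LND", date(2024, 1, 15))
print(validate(record))  # ["unrecognized asset_type: 'drone'"]
```

Simple as it is, this pattern of an agreed schema plus a shared vocabulary is what allows data from different units and systems to be combined with confidence.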
Christine Ashton raised the importance of collaboration around data as central to its future value. In particular, data is a critical asset in connecting different stakeholders within and across organizations. It needs to be managed and supported as a key corporate asset to ensure it is accessible, accurate, and appropriate.
To maintain data quality, Rashik Parmar highlighted that the professionalism of the workforce is an important factor. Ensuring systems and data quality begins with improving the quality and skills of the people responsible for these solutions. It is critical that people developing and maintaining digital solutions are equipped with the right skills, experience, and tools. This is particularly important in safety-critical domains where errors in automated data-driven decision making can have extreme impacts.
These insights provide important lessons for all of us. The conversation concluded by focusing attention on several crucial questions that individuals, teams, and organizations must answer as they confront AI’s data dilemma:
What is the current maturity level of data and data practices being used to drive operational activities?
What new demands will be placed on data quality in the age of AI? Are we equipped to meet these challenges?
Do we possess the necessary data management processes, governance structures, and skilled workforce to navigate this data landscape effectively?
How can we ensure future data quality to fuel responsible and trustworthy AI development?
Time to Raise Your Data Quality Game
The topic of AI and data quality is a complex and important one. In our recent conversation, we only had time to begin exploring several important strands of the debate. Take a look and listen in to our discussion of these topics.
But, more importantly, engage your own teams and colleagues in this vital conversation. The answers to these questions will determine the course of your digital future. By prioritizing data quality, fostering data literacy, and establishing robust data governance frameworks, we can unlock the true potential of AI, not just for technological advancement and short-term gains, but for the betterment of society as a whole. Let us all rise to the challenge and ensure that the data we have, and the data we generate, becomes the foundation for a thriving, equitable, and responsible AI-powered future.