Digital Economy Dispatch #183 -- Confronting Technical and Delivery Debt on the Road to AI-at-Scale

Alan Brown
May 12, 2024

Many organizations have dipped their toes into the world of AI, exploring its potential through pilot projects and proof-of-concepts. These explorations have provided valuable insights and initial successes. However, the true transformative power of AI lies not in isolated experiments, but in its widespread adoption across the organization. This transition from experimentation to exploitation requires a shift in focus. We must move beyond the initial "wow factor" of AI's capabilities and delve deeper into the critical factors that enable its successful adoption, deployment, and integration. This requires an "AI-at-Scale" approach, one that fosters a culture of continuous learning and adaptation throughout the enterprise.

AI-at-Scale acknowledges that AI implementation goes beyond encouraging pilot schemes and demonstrating technical prowess. It encompasses the education and upskilling of the workforce, the reinterpretation of regulations to accommodate this new technology, and the restructuring of processes to leverage AI's decision-making capabilities. Embracing this approach allows organizations to move past the initial experimentation phase and unlock the potential of AI, allowing intelligent systems to augment human expertise, drive innovation, and support business growth.

Technical Debt in AI-at-Scale

The transition to AI-at-Scale, while promising, faces a significant roadblock: technical debt. Just as in traditional software development, technical debt in AI refers to the accrued cost of taking shortcuts during the initial stages of development and of not paying sufficient attention to the evolving usage context. Such choices prioritize speed and expediency over long-term maintainability and scalability. Technical debt in AI manifests in three key areas:

Existing Technical Debt: Organizations often embark on AI projects without addressing underlying technical issues within their software infrastructure. Incompatibility with new tools, outdated programming languages, and inflexible architectures can create significant hurdles when moving toward AI-at-Scale.
Data Debt: The foundation of any AI system is its data. Low-quality data, riddled with errors or inconsistencies, leads to unreliable models with biased outputs. Incomplete data sets further limit the effectiveness of AI solutions. Addressing data debt requires robust data governance practices, data cleaning procedures, and strategies to collect and integrate comprehensive data sets.
AI-Specific Technical Debt: The rapid evolution of AI tools and infrastructure introduces a new layer of technical debt. Managing a complex ecosystem of AI models, integrating them with existing systems, and ensuring their ongoing functionality present significant challenges. Additionally, the intricate web of rules, algorithms, and models within large-scale AI systems requires meticulous documentation and version control to maintain transparency and facilitate future modifications.
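
As an illustration of the data debt described above, the following is a minimal sketch of a first-pass data quality check. It assumes records arrive as simple Python dictionaries. The names `DataQualityReport` and `audit_records`, and the sample fields, are hypothetical and chosen only for this example. It counts rows with missing required fields and exact duplicate rows. A real data governance process would cover far more, such as consistency, bias, and lineage.

```python
from dataclasses import dataclass


@dataclass
class DataQualityReport:
    """Summary counts from a first-pass data quality check (illustrative only)."""
    total_rows: int
    rows_with_missing: int
    duplicate_rows: int

    @property
    def missing_ratio(self) -> float:
        # Share of rows with at least one missing required field.
        return self.rows_with_missing / self.total_rows if self.total_rows else 0.0


def audit_records(records: list[dict], required_fields: list[str]) -> DataQualityReport:
    """Count rows with missing required fields and exact duplicate rows."""
    seen = set()
    rows_with_missing = 0
    duplicate_rows = 0
    for record in records:
        # A field counts as missing if it is absent, None, or an empty string.
        if any(record.get(field) in (None, "") for field in required_fields):
            rows_with_missing += 1
        # Build a hashable fingerprint of the row to detect exact duplicates.
        key = tuple(sorted((k, repr(v)) for k, v in record.items()))
        if key in seen:
            duplicate_rows += 1
        else:
            seen.add(key)
    return DataQualityReport(len(records), rows_with_missing, duplicate_rows)


# Hypothetical sample data, invented for this sketch.
sample = [
    {"customer_id": 1, "region": "UK", "spend": 120.0},
    {"customer_id": 2, "region": "", "spend": 80.0},
    {"customer_id": 1, "region": "UK", "spend": 120.0},
    {"customer_id": 3, "region": "FR", "spend": None},
]
report = audit_records(sample, required_fields=["customer_id", "region", "spend"])
print(report, report.missing_ratio)
```

Even a simple report like this, run regularly, gives a measurable starting point for deciding where data cleaning effort is most needed.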

Each of these three areas individually brings challenge and uncertainty for AI projects. Collectively, they can completely derail an organization’s efforts to systematically deliver AI-at-Scale.

Exposing the Hidden AI Technical Debt

The implications of technical debt are multifaceted and significantly impact the performance and stability of a software system. Spending time to understand the causes of technical debt is essential. In the context of AI, technical debt can arise from several sources, including algorithmic complexity, data quality issues, and architectural shortcomings. For example, neglecting proper data preprocessing techniques or failing to address biases in training data can introduce technical debt that undermines the performance and fairness of AI models. Similarly, overlooking scalability considerations when designing AI systems can result in technical debt that impedes the deployment and management of AI solutions at scale.

These broad observations are helpful. However, a deeper review is required. For instance, in 2015, researchers at Google published an influential paper, “Hidden Technical Debt in Machine Learning Systems”, which examines machine learning (ML) systems, the heart of most AI systems and tools. The paper acknowledges that while ML systems can be developed and deployed quickly, maintaining them over time is challenging and expensive. Beyond the traditional issues found in all software systems, the paper identifies several areas where AI introduces additional concerns:

Hidden Debt: ML systems can incur hidden debt that is difficult to detect because it exists at the system level, not the code level. Traditional abstractions may not apply to ML, making it challenging to isolate and fix problems.
Entanglement: When multiple factors are combined in an ML model, it becomes difficult to improve one aspect without affecting others. This entanglement makes it difficult to make changes to the system.
Data Dependencies: Dependencies on data can be more difficult to manage than code dependencies. Unlike code, data can change over time without anyone making a deliberate change, which can lead to unexpected issues in the ML system.
Feedback Loops: ML systems can influence their own behaviour over time, creating feedback loops that are difficult to predict and address. This can make the system unstable and unreliable.
Glue Code and Pipeline Jungles: Complex systems often rely on glue code and pipeline jungles, which are custom code solutions to connect different parts of the system. This code can be difficult to maintain and update.
Configuration Debt: The configuration of an ML system can become complex and difficult to manage over time. This can lead to errors and inconsistencies.
External Changes: The real world is constantly changing, and ML systems need to be able to adapt to these changes. This requires ongoing monitoring and maintenance.
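
To make the points on data dependencies and external changes more concrete, here is a minimal sketch of one way to monitor whether incoming data has drifted away from the data a model was built on. It uses the Population Stability Index (PSI), a widely used drift measure. This technique is not taken from the Google paper and is offered only as an illustration. The function name, the equal-width binning, the `epsilon` smoothing value, and the sample data are all assumptions made for this example.

```python
import math


def population_stability_index(baseline, current, bins=5, epsilon=1e-6):
    """PSI between two numeric samples, using equal-width bins over the baseline range."""
    low, high = min(baseline), max(baseline)
    # Fall back to a width of 1.0 if every baseline value is identical.
    width = (high - low) / bins or 1.0

    def fractions(values):
        counts = [0] * bins
        for v in values:
            index = int((v - low) / width)
            # Clamp values that fall outside the baseline range into the end bins.
            index = min(max(index, 0), bins - 1)
            counts[index] += 1
        # Replace empty bins with a small value so the logarithm is defined.
        return [max(c / len(values), epsilon) for c in counts]

    expected = fractions(baseline)
    actual = fractions(current)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


# Hypothetical feature values: the "current" sample has shifted upwards by 40.
baseline_values = list(range(100))
shifted_values = [v + 40 for v in baseline_values]

psi_unchanged = population_stability_index(baseline_values, baseline_values)
psi_shifted = population_stability_index(baseline_values, shifted_values)
print(psi_unchanged, psi_shifted)
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a sign of significant change, although any threshold should be agreed for the specific system. The value of a check like this comes from running it continuously as part of ongoing monitoring and maintenance.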

These technical concerns raise important challenges for AI specialists. However, they are by no means the only issues to address when delivering AI-at-Scale.

The Challenge of AI Delivery Debt: An Example from the Defence Sector

Looking beyond the issues of technical debt, it is also important to acknowledge that moving to AI-at-Scale forces an organization to confront other forms of debt: what we will broadly call “AI Delivery Debt”. This comes from the recognition that adoption of AI has implications for many aspects of how digital solutions are procured, managed, governed, maintained, and replaced.

To understand this more, consider the challenges of AI-at-Scale adoption in a UK defence context. A recent report from the Alan Turing Institute examines AI adoption in defence by considering its impact on the Defence Lines of Development (DLoDs). Together, these provide a whole-lifecycle view of a system, from procurement to retirement.

The DLoDs comprise nine critical aspects to be addressed throughout the development and sustainment of any military capability. They range from training and equipment to organisation and interoperability, with the overarching theme of ensuring seamless integration and effectiveness in operational contexts.

The report discusses the context of large-scale adoption of ML capabilities for defence and security applications and highlights the importance of Machine Learning Operations (MLOps) as the basis for a culture, best practices, and processes to streamline ML development, integration, and deployment sustainably at scale. Through this lens, the report points out that the adoption of AI-at-Scale raises critical questions about the alignment of ML capabilities with defence principles, ethical considerations, and operational objectives. In addition, decision-makers must evaluate whether ML is the optimal solution, how it will be sustained, and how user trust can be obtained. Similarly, the organisation and personnel DLoDs require consideration of role allocation, skills development, and incentivisation to foster a culture conducive to ML integration.

On a practical level, the report also mentions that challenges pertaining to equipment, infrastructure, and training underscore the complexity of deploying and sustaining ML capabilities. From integrating ML with existing systems to managing hardware resource requirements and ensuring adequate training data, each aspect requires careful attention and planning.

By considering this defence context, we can observe that AI adoption highlights several important issues that take us beyond technical concerns. All of these require detailed review of current ways of working and corresponding adjustments to existing practices.

How to Reduce the Debt

Effective management of technical debt is crucial for ensuring the resilience and maintainability of AI systems, especially as organizations strive to deploy AI-at-Scale. By proactively identifying and addressing technical debt, AI teams can mitigate the risk of system failures, optimize performance, and enhance the overall reliability of AI solutions.

Furthermore, moving to AI-at-Scale requires a proactive approach to AI technical debt, one that builds a culture of continuous improvement and innovation and enables organizations to adapt swiftly to changing market dynamics and maintain a competitive edge in the rapidly evolving landscape of AI technology. For today’s digital leaders, addressing AI delivery debt is just as critical.

Three actions can reduce the effects of AI debt and support a rapid transition to AI-at-Scale:

Conduct an AI technical debt audit: Acknowledge technical debt across various areas - infrastructure, data quality, and AI-specific complexities. Digital leaders can initiate an audit to assess the current state of their AI projects and identify existing technical debt. This will help prioritize areas for improvement and inform future resource allocation.
Invest in data governance and MLOps practices: Focus on building clean, high-quality data and well-managed AI systems for successful AI-at-Scale implementation. Digital leaders can invest in data governance practices to ensure data quality and build robust data pipelines. Additionally, establishing Machine Learning Operations (MLOps) practices will help manage and maintain AI models throughout their lifecycle, addressing AI-specific technical debt.
Develop a comprehensive AI delivery strategy: Face up to "AI Delivery Debt" and the broader challenges associated with integrating AI across the organization. Digital leaders can address this by developing a comprehensive AI delivery strategy. This strategy should consider not just technical aspects but also ethical considerations, workforce training needs, and alignment with overall business objectives. The UK defence sector example highlights the importance of examining how AI adoption impacts existing workflows and processes throughout the system lifecycle.
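
As one way to picture the output of an audit covering the actions above, here is a minimal sketch of a debt register that records items across the categories discussed in this article and ranks them for attention. The `DebtItem` structure, the 1 to 5 scoring scales, the impact-over-effort heuristic, and the example entries are all hypothetical. They illustrate the idea of prioritization rather than a recommended scoring method.

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    name: str
    category: str  # e.g. "infrastructure", "data", "ai-specific", "delivery"
    impact: int    # 1 (minor) to 5 (blocks scaling); illustrative scale
    effort: int    # 1 (days) to 5 (many months); illustrative scale

    @property
    def priority(self) -> float:
        # Illustrative heuristic: favour high impact and low effort.
        return self.impact / self.effort


def prioritise(items):
    """Return debt items ordered from highest to lowest priority score."""
    return sorted(items, key=lambda item: item.priority, reverse=True)


# Hypothetical audit findings, invented for this sketch.
register = [
    DebtItem("Legacy batch platform cannot serve models", "infrastructure", 5, 5),
    DebtItem("Customer records have missing fields", "data", 4, 2),
    DebtItem("No version control for deployed models", "ai-specific", 4, 1),
    DebtItem("No workforce training plan", "delivery", 3, 2),
]

ranked = prioritise(register)
for item in ranked:
    print(f"{item.priority:.1f}  [{item.category}] {item.name}")
```

In practice, the scoring would be agreed with stakeholders across technical, operational, and governance roles, since delivery debt items are often harder to quantify than technical ones.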

In contrast, ignoring AI debt can lead to a crippling domino effect. Integration issues, unreliable data, and poorly managed AI models can snowball, hindering performance, increasing maintenance costs, and ultimately derailing AI initiatives. By acknowledging and proactively addressing these forms of AI debt, organizations can pave the way for a smooth and successful transition to AI-at-Scale.