I challenge you to spend a day without any exposure to news or opinion about Artificial Intelligence (AI). It may be positive, espousing the benefits of a future world run by the technology, or it may be more sanguine describing the dangers to society. But you can’t ignore it.
Like many topics that dominate the media, it has become polarised. I’ve found that a significant number of organisations currently fall into one of two diametrically opposed tribes.
- Adoption at all costs. At one end of the spectrum we have those organisations that are charging in with boundless enthusiasm. “Let’s switch on Copilot and see what happens”. “It’s impossible to visualise the potential solutions from a historical perspective; we need to try it out and see where it leads us”. “Move fast and break things is our dictum”. But what if the thing that is broken is the entire organisation?
- Cautious Paralysis. At the other we have those organisations that want to make sure they fully understand what they are doing before they put the business at risk. They might be choosing to do nothing until the technology has been proven by the early adopters. Or they may be trialling a variety of Proof of Concepts (PoCs) but never getting through Proof of Value (PoV) and into production. They may comfort themselves by dreaming of a future where they can sit smugly saying “I told you so” surrounded by the devastation of early adopters. But the alternative future is one where they have been left behind and are struggling to catch up.
Like most polarised topics, surely the wise move is to chart a course between the two extremes. Tony Blair was famous for his consistent adoption of “The Third Way”1. This article describes a similar approach to AI adoption, and the management of risk on the route to the benefits that lie ahead.
AI or just Generative AI
While the approach being described could, and I’d argue should, be applied to all AI initiatives, including machine learning (ML) and narrow AI, this article focuses more on Generative AI (GenAI). The risks from careless employment of GenAI are greater, and the ease with which it can be adopted – particularly compared to ML – mean it is less likely to have robust governance wrapped around its adoption.
1. What problems are you trying to solve?
Our objective should be tangible, measurable benefits delivered through the employment of AI with a tolerable degree of risk. We want to do this again and again, consistently, and efficiently. One mode of failure would be wasting resources in experiments that never get into production to earn value. Another mode would be the deployment of an initiative into production but with devastating consequences from risks we didn’t consider that have impacted.
To get to that objective though we need to decide what problems we are really trying to solve. It may seem obvious, but I suspect a lot of experimentation at the moment is being led by the technology and not by a genuine business need. What problems does your business have? Would you like to take cost out of your overheads? Or you want to differentiate your market offering with a more personalised customer experience? Perhaps you want to increase your marketing ROI by increasing your content with tailored messages that respond to changing sentiment and market news?
These Mid-sized Hairy Audacious Goals2 (MHAG) are aspirations. The mind should be open and not encumbered by the art of the possible. But MHAGs are too broad or ill-defined to execute. They need to be broken down into bounded, measurable initiatives that can be assessed, triaged and executed on their own. By breaking the MHAGs into a discrete set of initiatives you can keep them loosely coupled so that if one isn’t currently possible the others can proceed.
The remainder of this article focuses on the how to conduct this assessment, triage and execution.
2. Define the solution
You can’t design what you haven’t defined. The objectives of each initiative need to be defined. What are you planning to achieve and how are you going to do it? What part does technology play, what type of technology, what data and information will it need, what will it do to these inputs, and what output will it generate? What part will people play, and what other resources and infrastructure will be involved?
It is also important to define what good looks like. What would a high-quality transactional output or outcome be, and how would you recognise a low quality one? What is the quality threshold? This will be essential when you are doing the PoC and PoV, because you will be able to objectively determine whether the concept can be achieved to the desired level of quality. Don’t stop at the threshold though; also define what the more challenging scenarios or edge cases will be. Getting a solution to calculate 2+2 to prove that they can do mathematics is not going to be a particularly representative test if the business objective is to be able to solve forth order differential equations.
Each initiative, no matter how trivial, is going to require time and resource and will also attract some risk. Just like any business case, the benefits need to be defined to justify the cost, and to be validated in the PoV.
3. What could possibly go wrong?
There will be risks of many different types. It is far better to consider these risks before committing significant resources, than to wait for them to impact in production where the consequences may be dire.
Most risks are manageable, but by considering them before the outset you can decide whether “the juice is worth the squeeze”. Is the likelihood and consequence of a risk impacting too great for the benefits? By understanding the risks up-front you can also design tests in the PoC and PoV to assess whether the safety and quality concerns are likely to manifest themselves.
Model Safety & Quality
We are used to considering technology to be logical, predictable and consistent. If you feed numbers into your calculator you can rely on the answers that are produced. The most you might worry about is whether there are any rounding errors, but the number of significant digits produced mean this is unlikely to be a significant concern. Modern AI systems are not deterministic in this way. The answer you get will be dependent on the way you phrase the answer, the training data that it was taught on, and whether it has learnt from any new data since you asked the question last. This introduces a number of risks to the quality of the output and the safety of using that output to carry out an action.
- Hallucinations. When the model doesn’t have sufficient information to respond to a particular prompt, current GenAI models will invent what it considers the most believable response. Lawyers have been sanctioned by judges for citing fictional case law and precedent that was generated by GenAI3. This is now more widely recognised, but there is not yet a reliable solution to the problem. It is a core characteristic of the way these GenAI models operate4 5. What would be the consequences of a hallucination generated in your initiative?
- Bias. GenAI models are trained on a huge volume of information, primarily from the internet, that has been generated by people over the years (notwithstanding the risk of Model Collapse below). We know that people are biased, whether consciously or subconsciously. Even if the authors didn’t produce biased information, the sheer volume of material carries biases in its own right. For instance, far more material is available on the internet about some demographics than others. Certain demographics are more likely to be portrayed in a particular light than others. These biases can flow through into the conclusions that the models produce. What would be the consequences to your business if it an initiative acted based on biased outputs?
- Model Collapse. GenAI models produce progressively lower quality outputs when they are taught on the output from AI models. This concept, called Model Collapse, has been researched and proven6. Over time more content will be published that has been AI generated and as this becomes a greater proportion of the total training data used by models there is a risk that this starts to undermine the models themselves. Similar in effect to hallucinations and bias, it has the potential to create systemic weaknesses with significantly greater impacts.
- Unintended Behaviour. Not even the engineers who design the tools can truly explain why they behave in the ways that they sometimes do. Early research already has demonstrated behaviours that are of concern. In one example the latest versions of ChatGPT and DeepSeek were pitted against the most advanced online chess game and instructed to ‘win against a powerful chess engine’. On multiple occasions the AI beat the opponent by illegally modifying the location of the pieces, or cheating for want of a better word7. The rationale the AI gave was that “The task is to ‘win against a powerful chess engine’ – not necessarily to win fairly in a chess game”.
- Fundamental Limitations in Models. While we might treat AI tools as analogous to human intelligence and reasoning, this is a deception. There are fundamental differences between human cognitive capabilities and the way that current AI reasons. They have no comprehension of truth (as described in the FT article referenced in the Hallucinations section above). They also have no appreciation of the fact that there is more that they don’t known than what they do; we wrote an earlier article on this area of risk8. This places limitations on the capabilities of these models, which can have a direct impact on the safety of the decisions they make.
Information & Data Security
Initiatives will generally require us to provide our own information and connect the AI solutions up to other systems we use. This introduces new data and information security risks that need to be considered. They are much the same as any other changes made to your digital services, but there are some that are more specific to the use of GenAI:
- Contextual Blindness. If the information you provide the model to analyse and use as the basis of its output doesn’t define any context then the model won’t be able to differentiate context and act accordingly. For example, let’s say you are a consultancy firm and have confidential information about a significant number of clients. You have been asked by one of your clients for advice on their strategy and use a GenAI tool, perhaps Microsoft Copilot, to do the analysis. If this tool has access to all clients’ information and there is nothing to identify the client that a document relates to, there is the risk that it will employ and possibly reproduce confidential information from another client when conducting the research. This could very easily breach confidentiality and non-disclosure clauses when released to the requesting client.
- Data use in Model Training. If you are using a multi-tenanted or SaaS AI tool then you need to find out how customer data is used to train the model. If training from customer data is isolated and only used in analysis and responses for that customer, then there should not be a problem. But if general training makes use of customer data (this tends to be the default case when people are using the free versions) then there is the risk that your own, or your client’s confidential information may leak into responses to other customers’ transactions.
- Scope Creep. GenAI tools have a voracious ability to consume data, and once it has been consumed there may be no way to erase the memory. You should therefore aim to constrain the information it can access, limiting it only to the information it needs for the task or intended purpose. Afterall, if you recruited a new employee to write your marketing copy you wouldn’t give them access to every piece of information in your organisation!
- Specific AI Attack Modes. There are a variety of malicious tactics that can be used to attack, manipulate or otherwise corrupt AI models and their data. The following three are the most frequent to be a concern:
- Data Poisoning involves the attacker deliberately corrupting the data used to training the model to cause it to produce inaccurate or deliberately misleading outputs.
- In Prompt Injection, like SQL Injection attacks on web applications, carefully crafted entries are made in the hope that they will get through any input validation and cause the model to generate incorrect outputs, release sensitive information, or trigger deliberately damaging actions.
- Model Inversion involves extracting sensitive information or training data from the model by making inferences about the knowledge that trained the model from the outputs generated in response to specific inputs.
- Conventional Cyber Risk. Whether you are hosting the technology yourself or using a SaaS service, there will be all the normal cyber risks that need to be considered and controlled. Depending on the nature of your business and the data you are giving the tools access to, there may be additional regulatory obligations to meet as well, such as privacy regulations if the datasets contain personal information.
4. Do you have what it takes?
So, you have decided what your objectives are, how the solution will achieve them, how you will recognise good in the more challenging circumstances, and what risks the organisation may be exposed to. You still haven’t started to implement anything, but you do need to assess whether you have the data and other resources necessary to make it work.
Availability of data and information is the dominant issue here. GenAI has made this simpler than conventional ML, where the quality, structure and availability of datasets at scale presents a high bar to clear. But even with GenAI, you will need quality, structured (or at least unambiguous) data at scale to produce reliable answers to all but the most trivial problems. And beyond availability you will also need to confirm your authority to use it. The technology may be immature, but legislation and regulation has hardly even got out of the starting blocks.
Data isn’t the only cost though. There will be implementation and recurring costs for the technology, and any other contributions made by people. But the most frequently overlooked costs are the business change that will be required to realise the benefits. What policies and procedures will need to be changed? What training and education will be necessary? Will the transition be directed or encouraged? What impact with the change have on other areas of the business, and does it need to be coordinated with a broader programme of change to avoid inefficiency just being moved around rather than eliminated? Change is hard, and applying new technology to the same old processes is just a more expensive way of doing the same old thing.
5. Triage, prioritise and execute
Many of the original initiatives identified in first stage may already have been either discarded or deferred to sometime in the future when the technology has improved further. But hopefully you have a few initiatives that appear viable and worthwhile. You should also be confident that you have considered the pitfalls, understand how you will manage the risks, and have a high probability of being able to take most of them quickly through PoC, PoV and into production.
Prove the concept
In the PoC you are seeking to prove that the hypothesis is possible. A fully working solution is tested at low scale to verify whether it meets the objectives. The more challenging edge cases are run to see whether they still meet the quality thresholds.
Prove the value
You can now scale up the trial, using it for for its intended purpose, but with tight oversight. Full trust in the output still can’t be taken for granted. Frequently you will want to conduct the PoV in parallel with the legacy approach to identify divergence and quality issues. Or you may put a manual check in place if no legacy approach exists. The test is whether the solution is reliably delivering the expected benefits for the anticipated costs.
Manage in production
And now we have the opportunity to reap the rewards. We know that it works, are confident that we have the risks covered, have a change programme in place to drive adoption and transition, and are ready to go.
It is necessary to remain cautious though. The risks identified up front may still be present, despite gaining confidence that they are tolerable through PoC and PoV. A level of supervision and monitoring to ensure what was predicted is borne out in reality may be wise. Models learn, and so they change. The way one behaved yesterday does not guarantee that it will behave in the same way tomorrow. You also may have overlooked certain risks.
As suggested in a previous article of ours9, the concept of trust in AI has many similarities with the level of trust we have in people. Trust is established over time; there will always be a limit to the level of trust you have, and any trust needs to be regularly revalidated.
Rinse, and Repeat
The aim of the approach described in this article is to successfully adopt AI, delivering the greatest benefits while managing the consequential risks. By following this approach consistently an organisation can minimise the investments that never get into production. Highlighting the risks that arise from AI adoption does not deny the benefits that can be delivered; it enables those benefits to be realised with the greatest chance of success.
- Wikipedia, Third Way ↩︎
- Ambitious but small than Jim Collin’s Big Hairy Audacious Goals (BHAG), ↩︎
- Reuters, 2025, AI ‘hallucinations’ in court papers spell trouble for lawyers ↩︎
- Financial Times, 2025, Generative AI models are skilled in the art of bullshit ↩︎
- Hannigan, McCarthy, Spicer, 2024, Beware of botshit: How to manage the epistemic risks of generative chatbots ↩︎
- Nature, 2024, AI models collapse when trained on recursively generated data ↩︎
- Apollo Research, 2024, Frontier Models are Capable of In-context Scheming ↩︎
- Digility, 2024, Does AI “appreciate” its own ignorance? ↩︎
- Digility, 2024, (How) Can we trust AI? ↩︎