Banking System Failures

Friday, 07 March 2025
By Gill Ringland and Patricia Lustig


The BBC’s Today Programme ran a short piece on 6 March: “Nine major banks and building societies operating in the UK accumulated 803 hours of tech outages in the years 2023 and 2024”. This is equivalent to more than 33 days; at this level the outages are clearly affecting the UK’s GDP.

In September 2020 we wrote, “We think that software is a problem flying just under the radar, ready to leave devastation in its wake. It could crash our planet.”

So we welcome the statement by Dame Meg Hillier, Treasury Committee chair, that “putting the data in the public domain would encourage banks and the regulator to see if there was anything more that could be done to reduce the disruption”. The BCS (the Chartered Institute for IT) has asked government to take a lead on this, for instance by publicising the impact of public sector outages.

The radio programme included comments from industry experts, who highlighted two factors contributing to outages: legacy banking infrastructure, and software.

Patrick Burgess of BCS suggested that “the traditional banking sector has not kept pace with the investment needed to modernise its infrastructure.”

The BCS, working with the National Preparedness Commission and the Business Continuity Institute, has highlighted that modern software is built of components from multiple sources, with no common set of assumptions – the backbone of agreed assumptions is fractured. The result is a complex, tightly coupled system that, when it fails, will fail unpredictably.

What can be done?

Banks and building societies make investment decisions by weighing the risk of outages against the cost of the work needed to replace legacy infrastructure. Legacy infrastructure will therefore persist.

There are three avenues that could reduce the number and scope of outages in banking services.

  1. Given society's expectation that the banks will operate 24/7, one essential response for organisations is a focus on resilience: here meaning the resilience of services to users, that is, reducing the fallout to users from digital systems failure. Building a resilient culture depends on a number of factors, such as safe spaces for whistleblowers, anticipation and remediation of potential sources of failure, and the ability to recover quickly across the organisation. The FS regulator has defined a process for focussing on important business services and reducing the level of outages.
  2. Second is an emphasis on the IT skills needed to “keep the lights on”. IT is so central to our economy and society that more focus is needed on the training, education and career development of IT professionals in the areas needed for dealing with these complex essential services. Topics such as systems thinking, cross-organisational networking, forensic failure analysis, supply chain management, and the use of AI to monitor and flag danger signals are on the agenda for the next wave of qualifications for IT professionals. Board members do not need IT skills, but they do need a language to connect with IT professionals – the FS regulator’s terminology of important business services, and tolerances of outages, is one medium.
  3. Third is the introduction of systematic measurement and publication of the true cost of outages. This cost needs to include the internal costs of fixing the outage plus the costs incurred by users through their lack of access to banking systems.

The NIS framework defines metrics for the cost of failures in terms of impact to users. These are: cost of “lost user hours”, cost of data breaches, cost of damage to life or health, and significant financial impact to users. The framework is used by the ICO to regulate Relevant Data Service Providers (RDSPs), but it is not yet a widely used framework for costing the economic impact of service failures.

The NIS framework would measure the impact of the banking outages not just in number of hours but in number of lost user hours, which are an order of magnitude larger. This would enable government to start to assess the impact on productivity and the economy, and boards to assess their risk from claims by users.
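To illustrate the difference between outage hours and lost user hours, here is a minimal sketch in Python. It is not the NIS framework's own formula: the `Outage` record, the function names, the per-hour user cost and every figure in it are hypothetical, chosen only to show how the two measures diverge and how internal and user costs might be added together.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    duration_hours: float  # how long the service was unavailable
    users_affected: int    # hypothetical estimate of users locked out


def outage_hours(outages):
    """Total clock time of outages, the figure usually reported."""
    return sum(o.duration_hours for o in outages)


def lost_user_hours(outages):
    """Duration multiplied by affected users, summed over all outages."""
    return sum(o.duration_hours * o.users_affected for o in outages)


def total_cost(outages, cost_per_user_hour, internal_fix_cost):
    """Internal cost of fixing outages plus an assumed cost borne by users."""
    return internal_fix_cost + lost_user_hours(outages) * cost_per_user_hour


# Illustrative figures only, not taken from any bank or regulator.
example = [
    Outage(duration_hours=2.0, users_affected=100_000),
    Outage(duration_hours=0.5, users_affected=40_000),
]

print(outage_hours(example))     # 2.5
print(lost_user_hours(example))  # 220000.0
print(total_cost(example, cost_per_user_hour=10.0, internal_fix_cost=50_000.0))  # 2250000.0
```

In this made-up case, two and a half hours of outage translate into 220,000 lost user hours, which is why the choice of measure matters when assessing economic impact.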

Gill Ringland and Patricia Lustig, March 2025
