National Resilience And Threats From Software Failure

Monday, 04 December 2023
By Gill Ringland & Ed Steinmueller

“software is a problem flying just under the radar, ready to fall into the soup, leaving devastation in its wake. It could crash our planet.” [1]

The UK Government Resilience Framework [2] is built around three fundamental principles: that we need a shared understanding of the risks we face; that we must focus on prevention and preparation; and that resilience requires a whole-of-society approach.

We [3] endorse these principles. However, we find that there is a lack of shared understanding of the potential impact of software failure on national resilience.

We have recently published a report [4] which points out that, because there is no shared understanding of the risk that software failures pose to our national resilience, prevention and preparation are less effective. Further, it is clear that the whole of society is the victim when software fails. Service outages due to software failures reduce society’s productivity, security, health, and welfare [5].

There is evidence that services provided by digital systems are increasingly liable to outages caused by software failures, hardware failures, user errors and cyber-attacks, among other causes, and that these outages are increasing in scale and duration as well as becoming less predictable in timing [6].

Our report makes recommendations under five headings to tackle the lack of shared understanding and to facilitate prevention and preparation.

Recommendation 1 - Develop a common language for classifying the type of impact (the consequence) of service outages.

Recommendation 2 - We recommend the adoption of the Network and Information Systems (NIS) framework [7] for classifying and measuring the impact of service outages following software failures, in order to build better knowledge of the incidence and impact of such failures. This framework focuses on four measures:

  • Availability (lost user hours);
  • Loss of integrity, authenticity or confidentiality of data stored or transmitted;
  • Risk to public safety, public security, or of loss of life; and
  • Material (financial) damage to users.

While adoption of this framework does not indicate strategies for prevention or preparation, it does improve shared understanding and provides a quality benchmark for data collection.
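As an illustration only, the four measures above could be captured in a simple outage record, with availability expressed as lost user hours (users affected multiplied by outage duration). The sketch below is ours, not part of the NIS Regulations or the report: the class name, field names and the `summarise` helper are hypothetical, and the regulations' own reporting thresholds are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutageRecord:
    """One service outage described with the four measures listed above.

    All field names are illustrative assumptions, not taken from the
    NIS Regulations or from the report.
    """
    service: str
    users_affected: int
    duration_hours: float
    data_compromised: bool        # loss of integrity, authenticity or confidentiality
    safety_risk: bool             # risk to public safety, security, or of loss of life
    financial_damage_gbp: float   # material (financial) damage to users

    @property
    def lost_user_hours(self) -> float:
        # Availability measure: users affected multiplied by outage duration.
        return self.users_affected * self.duration_hours


def summarise(records: list[OutageRecord]) -> dict[str, float]:
    """Aggregate a set of outage records into the four headline measures."""
    return {
        "lost_user_hours": sum(r.lost_user_hours for r in records),
        "data_incidents": sum(r.data_compromised for r in records),
        "safety_incidents": sum(r.safety_risk for r in records),
        "financial_damage_gbp": sum(r.financial_damage_gbp for r in records),
    }
```

Publishing even a coarse summary of this kind, on a consistent basis, is what would allow outages to be compared across services and sectors.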

We see that the government is in a unique position to improve national resilience of services using software by classifying and measuring its own performance, and by sharing this data across society. This transparency would show leadership and promote the use of digital methods to increase efficiency.

We recommend that the government should:

a) take a lead in publishing data on service outages of government services due to software failures, using the NIS framework; and

b) set up either a government or a non-profit organisation tasked with collecting, collating, and publishing data about software failures and related service outages across all sectors.

We also suggest that the government should consider a backstop for re-insurance against the impact of catastrophic outages [8].

Recommendation 3 - We recommend that the remit of regulators of Operators of Essential Services (OESs) should include a requirement to report on digital service outages, using the NIS framework. This would enable regulators to address and set standards for service resilience.

We highlight a new risk resulting from the increasing dependency of Industry 4.0 [9] (and digitalisation in general) on infrastructure systems. The obligation to provide essential services is embedded into laws and regulations because social well-being and economic health depend on it. But infrastructure organisations do not yet appreciate the potential scale and impact of the risks to these services from software vulnerabilities, so they are not prepared for the consequences when software failures occur. Infrastructure failures have knock-on effects on the economy and society, and infrastructure organisations are particularly liable to service outages caused by software failures because of their complex supply chains.

Recommendation 4 - All organisations that use or supply services involving software (very few do not!) should develop an understanding of their exposure to software failure. This involves thinking more holistically about how service outages may affect a business’s ability to deliver on its purposes and meet its commitments. A tested process, the service delivery approach [10], is for a business to identify its critical business services and to define tolerances for the failure of these services after software failures in terms of user disruption (e.g. how long access is unavailable, how many users are affected). This requires considering how critical services often depend on suppliers of linked services.
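To make the idea of tolerances concrete, the following is a minimal sketch of how a business might record a tolerance for one critical service and check an outage against it. It is our illustration rather than a method set out in the report: the names `ImpactTolerance` and `breaches`, and the two dimensions chosen (outage duration and users affected), are assumptions drawn from the examples in the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImpactTolerance:
    """Hypothetical tolerance for one critical business service."""
    service: str
    max_outage_hours: float
    max_users_affected: int


def breaches(tolerance: ImpactTolerance,
             outage_hours: float,
             users_affected: int) -> list[str]:
    """Return the tolerance dimensions an outage exceeded (empty if none)."""
    exceeded = []
    if outage_hours > tolerance.max_outage_hours:
        exceeded.append("duration")
    if users_affected > tolerance.max_users_affected:
        exceeded.append("users")
    return exceeded


# Example with invented figures: a payments service that should never be
# unavailable for more than 4 hours or for more than 10,000 users.
payments = ImpactTolerance("payments", max_outage_hours=4.0, max_users_affected=10_000)
print(breaches(payments, outage_hours=6.0, users_affected=2_500))
```

In practice the same check would also need to be applied to the linked services that a critical service depends on, since a supplier’s outage can breach the tolerance just as readily as an internal failure.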

Boards need to expect software failures, so it is prudent to design alternatives or workarounds consistent with the organisation’s tolerances for failure. These should include restoration of data generated during the outage; investing in the additional human capacity or skills needed; and implementing robust communication and operational protocols.

Recommendation 5 - Training and preparedness for IT and risk professionals, the C-suite and Boards. In developing the needed capabilities, we see the Cyber Essentials Certification [11] as a possible model for IT and risk professionals, and we see a need for postgraduate university courses.

Recent surveys find that the C-suite are overwhelmingly unaware of the risks to their business and reputation from service outages due to software failure. Awareness of the economic and societal impact of software failures, and their impact on services, should be part of management education.

We recommend that Government, Boards and the C-suite should take steps to improve their confidence in their organisations’ service resilience against software failures. This could include: activities to engage the imagination of senior managers about failure possibilities and consequences, through simulation games or working through software failure scenarios; dialogue structured around the service delivery approach, leading to action planning to improve resilience to software failures across their supply and demand chains; and management education of the next generation of the C-suite to ensure better understanding of the role of resilience in delivering services.

The Working Group has now dissolved itself, and members will be carrying this message forward to improve competence, shared understanding and national resilience.


Gill Ringland, Emeritus Fellow, SAMI Consulting and Ed Steinmueller, Emeritus Professor, SPRU, University of Sussex Business School, December 2023


References

1 Global Risks – Is Software The Vlieg In De Soep? - Long Finance

2 The UK Government Resilience Framework (HTML) - GOV.UK (www.gov.uk)

3 The authors are writing on behalf of the BCS’s IT Leaders’ Forum Service Resilience Working Group, the Business Continuity Institute (BCI) and a RoundTable including the National Preparedness Commission

4 Service Resilience And Software Risk 2023 (bcs.org)

5 What Is Cutting UK Productivity? - Long Finance

6 https://www.bcs.org/media/9679/itlf-software-risk-resilience.pdf

7 The NIS Regulations 2018 - GOV.UK (www.gov.uk)

8 Is state intervention needed for cyber insurance? (ft.com)

9 https://www.weforum.org/about/the-fourth-industrial-revolution-by-klaus-schwab/

10 See for instance Operational resilience in financial services | National Preparedness Commission

11 See for instance About Cyber Essentials - NCSC.GOV.UK