Are Digital Systems Fit For Purpose?

Monday, 29 April 2024
By Patricia Lustig & Gill Ringland

The world is becoming increasingly dependent on digital services. In this connected world, we ask: are these services fit for purpose? In this Pamphleteer we give evidence that they are not; we use a metaphor – Fractured Backbones – that helps us understand the underlying causes; and we suggest some ways to mitigate this lack of resilience.

Evidence

Capita provides software and IT services to government and private companies. In March 2023 the details of more than half a million members of the UK’s private sector pension schemes were compromised in a hack. Separately, in May 2023, details of local council benefit payments were exposed[i].

Between April and July 2023, all three major cloud providers suffered regional outages. The largest AWS region (US-East-1) was severely degraded for 3 hours, impacting 104 services; Fortnite matchmaking and McDonald's and Burger King food orders stopped working. A Google Cloud region (Europe-West-9) went offline for about a day. Azure’s West Europe region partially went down for about 8 hours due to a major storm in the Netherlands.

Blackbaud specialises in financial, fundraising and admin software for educational institutions and non-profits. In 2020 data held for over 20 UK universities and non-profits, including the National Trust, was stolen in a hack. Data on Labour Party donors was also taken. In 2023 the company reached an agreement to pay $49.5m to resolve claims that it violated US state and federal laws. The Information Commissioner’s Office in the UK also reprimanded it.

Meta’s Facebook and Instagram services were down on 5th March 2024. The outage lasted more than two hours and affected hundreds of thousands of users globally. It was probably caused by an issue with a backend service such as authentication: at the time it was suggested that backup data had been corrupted, which made it essential to close the platform completely in order to restart.

The cyber-attack on the British Library on 31st October 2023 led to a leak of employee data and left the library's website down until January 2024, making it impossible for library readers globally to locate or order materials. The Rhysida ransomware group, which claimed to be behind the attack, shared an image on the web showing documents that appear to be HMRC employment contracts and passports.

On 30th March 2024, AT&T published information on a data breach: a dataset found on the “dark web” contains information such as Social Security numbers for about 7.6 million current AT&T account holders and 65.4 million former account holders. This means that all 73 million users have had their passwords reset.

Something is clearly broken. After this level of disruption in physical services, accident reports are published and follow-up actions are often introduced to avoid repeat occurrences. Why does this not happen for digital services? What can organisations do to protect their business and their customers?

Fractured Backbones

We (the authors) suggest that the concept of Fractured Backbones could be useful in exploring how to improve the resilience of digital services, and provide thoughts on what organisations can do in the meantime.

What are Backbones? Backbones enable societies to work well and to be effective.

They can be explicit, as in the rule of law, or implicit: “the way we do things around here.” Backbones cover many aspects of life such as financial services, governance, and international (technical and professional) standards and regulations. ‘Hard’ Backbones are physical infrastructure and depend on agreements developed over time; ‘Soft’ Backbones also depend on regulations and agreements, such as currency exchange platforms and ISO standards for technology (e.g. engineering components, telecommunications systems, manufactured components, internet protocols).

The soft Backbones underpinning service delivery seem to be fractured and are no longer fit for purpose. How could they be restored? How can organisations adapt to living with Fractured Backbones which are affecting their ability to deliver services to their customers?

Mitigation

Awareness of the problem to be solved is the first step in mitigating – restoring – Fractured Backbones. As a quote attributed to Einstein puts it, “If I had an hour to solve a problem and my life depended on the solution, I would spend the first 55 minutes determining the proper question to ask, for once I know the proper question, I could solve the problem in less than five minutes”[ii].

The resilience of digital Backbones has been the subject of concern recently, with the UK’s National Preparedness Commission’s report highlighting the growing importance of software in the economy and society. However, the fragility of that software, and the impact of loss of service on the economy, society and people, are not yet on the wider agenda. Increasing this awareness is a step towards restoring Fractured Backbones.

Wider awareness could lead to measuring the impact and establishing the case for investment. A BCS and BCI RoundTable suggested that governments could take a lead by sharing data on the impact of service outages and data breaches in public sector services. In the US, Underwriters Laboratories supports the AI Incident Database, a non-profit organisation and website that tracks the different ways in which AI technology goes wrong. The website has catalogued over 600 unique automation and AI-related incidents so far.

Publicity about the impact of service outages could alert politicians and economists to the severity of the problem of service resilience and its effect on productivity.

What could governments and organisations do to restore the Fractured Backbone underpinning the delivery of digital services?

The BCS and BCI RoundTable examined this question and suggested that an approach initially defined for financial services provides a roadmap to improve resilience. This requires organisations, whether public or private sector, to identify their most important service, judged by the financial impact on users of an outage; to test this service under a wide range of potential failure conditions; and to resolve all potential sources of outages. This approach builds consensus and shared accountability across the organisation.

In the meantime, governments and organisations can adapt to the knowledge that digital services are subject to failure by planning for manual backup services that can cope with minimal disruption to the customer. The connected world includes people – their role in resilience is easy to underestimate.

Gill Ringland Patricia Lustig 5th April 2024

[i] Financial Times, “Capita shares slide 22% after higher than expected loss”, 7th March 2024.

[ii] Quote by Albert Einstein: “If I had an hour to solve a problem and my life...”
