A new paper from Sid Dekker and Michael Tooma. It explores a new “capacity index” that combines the presence of capacities (from a Safety-II / Safety Differently / resilience engineering perspective) with the duty of care under health and safety law, as an alternative to existing injury frequency rates.
Given the length of this paper, I can only provide a high-level and shallow review. I’ve divided it into two articles – this is part 1.
First they cover existing issues with injury frequency rates. One problem, they argue, is how poorly calibrated the metric is for comparisons between industries or even business units – partly because of inconsistency in how an injury is defined, with supervisors and safety people alike making their own call on whether or not to record an injury. This can lead to gaming of the metrics.
Another issue is the lack of statistical power of injury metrics [NB: see Matt Hallowell’s recent study, which, in part, found the occurrence of recordable injuries to be almost entirely random]. They explain other reasons and potential downfalls of these metrics, which I won’t cover.
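To illustrate the statistical-power point, here is a toy simulation (my own sketch, not from the paper or Hallowell’s study): if recordable injuries behave roughly like a Poisson process, a business unit with a *fixed* underlying risk will still show large year-to-year swings in its injury rate from chance alone – easily mistaken for real improvement or decline.

```python
import math
import random

def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's algorithm: draw one Poisson-distributed event count."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def yearly_rates(rate_per_million: float, hours: float,
                 years: int, rng: random.Random) -> list[float]:
    """Simulate yearly recordable counts for a unit whose TRUE risk never
    changes, then convert each count to injuries per million hours."""
    expected = rate_per_million * hours / 1_000_000
    return [poisson(expected, rng) * 1_000_000 / hours for _ in range(years)]

rng = random.Random(1)
# Same true risk (5 per million hours), one million hours/year, 10 years:
rates = yearly_rates(5.0, 1_000_000, 10, rng)
print(min(rates), max(rates))  # chance alone produces a wide spread
```

The spread between the best and worst simulated year comes entirely from noise – which is why ranking business units (or years) on a raw injury rate says little about underlying risk.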
Next they move on to due diligence, which is concerned not just with corporate boards receiving info related to incidents, hazards and risks, but also with considering and responding to that info. It’s argued that an injury metric like TRIFR is, by its nature, “ill suited for that insight in that its purpose is to show a trend in lost production time and not the safety conditions that led to the injury … Furthermore, TRIFR is a record of injuries and not incidents. An incident may have significant potential but result in no injury. In that respect, TRIFR would fail to facilitate due diligence” (p3).
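For readers unfamiliar with the metric (this background is mine, not the paper’s): TRIFR is conventionally the count of recordable injuries normalised to a fixed exposure base, usually 1,000,000 hours worked (some schemes use 200,000 hours instead). A minimal sketch:

```python
def trifr(recordable_injuries: int, hours_worked: float,
          per_hours: float = 1_000_000) -> float:
    """Total Recordable Injury Frequency Rate: recordable injuries
    normalised to a fixed hours base (commonly per million hours)."""
    if hours_worked <= 0:
        raise ValueError("hours_worked must be positive")
    return recordable_injuries * per_hours / hours_worked

# e.g. 12 recordables over 2.4 million hours worked:
print(trifr(12, 2_400_000))  # → 5.0
```

Note that every quantity here is an *outcome* count – nothing in the formula captures the conditions, capacities or near-misses the authors argue due diligence actually requires.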
Fear of legal liability has driven the growth of injury metrics, which are said to have a “veneer of simplicity” that has contributed to their popularity and persistence. However, for company boards and organisations, it’s said that TRIFR provides neither assurance of legal compliance nor a legal liability defence.
Drawing on literature, including Deming, it’s argued that outcome measures like injury metrics shouldn’t be the primary variables of interest to companies, but rather the inputs into performance. Moreover, a myopic focus on injury metrics “can function as a decoy, taking organizational attention away from the build-up of risks and a possible drift into failure in other areas. Underlying risks can then be left to grow misconstrued or unnoticed” (p5). TRIFR is said to be an exemplar of counting what can be counted, but not what actually counts.
They then cover the capacity index as an alternative. The capacity index sees performance as the presence of capacities to make things go well rather than simply the absence of negative events.
Based on a review of jurisprudence, six items that do count for due diligence are then covered (not repeated here). Before covering the six capacities that make up their capacity index, they draw on literature on safety, resilience and cognitive engineering to discuss how these insights feed into the six capacities.
A. The capacity to acquire and maintain safety knowledge
This looks at the anticipation of future failure paths, where monitoring the conditions and threats associated with those scenarios is necessary to ensure the adaptations needed for successful work.
This can include building the pattern recognition of groups via scenarios and other techniques. However, it’s argued that developing responses to constructed scenarios is easier said than done, in part because plans and responses may become excessively symbolic and decoupled from reality (fantasy planning), and/or organisations may become locked into existing mental models of safety and risk, unable to update their beliefs even as new evidence emerges during the escalation of potential harm (fixation error).
Importantly, they highlight that the production of plans, reports, JSEAs etc. aren’t in themselves evidence or demonstration of relevant knowledge.
B. Capacity to understand the nature of operations and their risks
They note that the importance of this hallmark lies in understanding the nature of operations and their risks from the place where work occurs every day. There are always gaps between work-as-imagined and work-as-done, since work involves dealing with surprises, variations and goal conflicts.
Thus, generating insights needs to be focused at the interface of work and on the people who can generate the best ideas on how to bridge the gaps. Problematically, the aggregate measures that boards and managers typically see “tend to hide the normal ebbs and flows of strains and shortages that parts of the system are locally under” (p11), and often the first sight managers get of these gaps is via post-incident investigations, where they are then characterized as non-compliances. Seeing these messy realities as non-compliances is argued to be “unfruitful to learn about how quotidian operations actually take place and their risks get managed” (p11).
Activities focused on learning, like learning teams, are suggested as ways to help drive learning where it should be focused [but also, in my view, pre-mortems, quality circles and the like], in a way that reveals the messy details and trade-offs.
C. Capacity to adequately resource safety
Predictably, this hallmark covers ensuring sufficient resources are invested in performance. Notably, it’s stated that most organisations exist to deliver a product or service, not “to be safe”. Safety is thus one of many goals that need to be navigated.
It’s discussed how understanding these goal conflicts and interactions is crucial if safety is to be properly resourced, where “Having a clear line of sight of that trade-off at a Board level is crucial to proper decision making” (p13).
Logically then, for due diligence, demonstrating adequate and calibrated resources directed at the safety of work (operational issues), rather than safety work (paperwork, administrative OHS activity etc.), is a far stronger demonstration of commitment.
D. Capacity to respond to risks and unsafe events
Here the focus is on the adaptive capacity required to deal with risks and issues as they emerge from actual work. This requires a kind of devolved decision authority, where authority can be pushed to the people closest to the issues and/or best placed to manage them (rather than necessarily being centralised), depending on requirements.
The adaptive capacity to deal with these issues can be grown by (paraphrased from page 14):
- Promoting diversity of voices around influence & decision making
- Letting expertise guide decisions rather than power structures
- Promoting psychological safety and structures to allow safety optimisations during acute production pressures
- Focusing on design and operational improvements led by the frontline without reliance on audits or inspections as triggers
- Fostering ongoing pride in workmanship
It’s said that these capacities are all measurable or at least demonstrable.
They then cover the importance of restorative approaches, compared to retributive ones, in responding to failures, rule breaches etc. Importantly, restorative approaches better generate forward-looking accountability.
E. Capacity to demonstrate engagement and compliance
Directly quoting the paper, it’s said that a major obstacle in demonstrating compliance is “the extent of ill-calibration in boards and management (and often even supervisors and workers) about what needs to be complied with (and by whom)” (p16).
A problem is that the majority of what must be complied with is internally generated and enforced – not regulatory. Interestingly, it’s said that many of these rules “typically have no correlation with actual legal obligations or safety outcomes” (p16), but concomitantly contribute to worker frustration, loss of productivity and frontline non-compliance.
Most importantly, managing a vast array of self-imposed rules dilutes the importance of the few safety-critical things that should “command attention” from organisations and boards.
F. Capacity for assurance
One thing this section highlights is that safety in complex systems doesn’t arise from centralised control or standardisation, but from “acknowledging that variability is inevitable” (p17).
Guided adaptations to local conditions and challenges, sensitive to context and daily operations, are more likely to generate greater performance improvements than standardised, centralised methods. Thus, adaptive capacity, in their view, “is the ultimate demonstration of assurance” (p17).
In complex environments, because of the vast number of interacting components, the number of things that can go wrong is huge. One thing that may help reveal weaknesses, or buttress adaptive capacities, is experiments that test operational systems and how they respond to scenarios or simulated failures (in a safe-to-fail way). Although the obvious example is emergency evacuations, this goes much deeper and, with further innovative thought, could apply to lots of other areas (testing procurement elements, communication, trade-offs etc.).
Authors: Sidney W.A. Dekker, Michael Tooma, 2021, International Labour Review
Study link: https://doi.org/10.1111/ilr.12210
Link to the LinkedIn article #1: https://www.linkedin.com/pulse/capacity-index-replace-flawed-incident-based-metrics-p12-hutchinson
Link to the LinkedIn article #2: https://www.linkedin.com/pulse/capacity-index-replace-flawed-incident-based-metrics-p22-hutchinson