
This paper reports field studies across several plants to understand whether an “almost incident-free” chemical company had the adaptive capacities necessary to respond to changing risks, using a Resilience Engineering perspective.
Providing background:
· For years safety improvements were driven by evaluating incidents, errors etc., yet current developments “in safety science, however, challenge the idea that safety can meaningfully be seen as the absence of errors or other negatives”
· Improvements are often directed at operators via training, awareness, discipline, new procedures
· The “system is considered to be basically safe because safety is something that is engineered into a system”
· Another, but complementary, view emphasises the positive contribution of people at all levels in a company
· This view is said to acknowledge that the demands from high-consequence, low-probability events can’t “always be handled by matching situational symptoms with scripts of coordinated action used in training”; such events “demand situations that lie beyond procedural reach”
· They argue that “it is not possible to find out if a system is safe or not by deconstructing it”; hence, resilience engineering “abandons the search for safety as a property … and it considers the system as a whole”
· They note that while a raft of resilience engineering principles “sound reasonable”, do they really work in practice? This paper tries to address this question via a resilience engineering audit

Results
Most of the value in this paper is in its detailed qualitative accounts, but for reference these were some of the key findings:
· While the company had an almost incident-free recent history, it “turned out to be ill-equipped to handle future risks and many well-known daily problems”
· Safety was seen to be borrowed from in order to meet acute production goals
· Organisational learning from incidents was fragmented into small organisational or production units without company-wide learning
Next I’ll go through some of the key areas of the paper.
Safety Versus Production
Both operators and managers believed that safety and production were the two most important goals and had to be met simultaneously. This resulted in situations where employees had to decide between safety and production because they couldn’t do both.
It’s said that while the incompatible goals “arose at the organizational level”, the goal conflicts were “pushed down into local operating units”. These goal conflicts were negotiated and solved by workers in the form of “thousands of daily decisions and trade-offs”.
While operators and managers did try to deal with the conflicting goals simultaneously, trade-offs were “always the final resolution”.
Small incidents were also neglected and often considered normal side-effects of daily work. The daily trade-offs also meant operators worked around some processes, like not wearing all required PPE because this would eat up necessary minutes of production time.
Relating to Hollnagel’s ETTO principle – people genuinely tried to meet their internalised goals (do what they’re supposed to do, or believe is reasonable), while being as thorough and efficient as possible, without being too thorough or using unnecessary effort.
The paper nicely articulates that: “Underlying organizational pressures and preferences were reproduced in what individual people did and valued (or undervalued), in a way that was invisible for the organization as a whole”.
Importantly, these daily optimising trade-offs were considered by workers as a source of considerable professional pride and a sign of expertise.
While people relayed the business logic of ‘safety first’, and seemingly contradicted this with their actions, further probing revealed a more nuanced belief system. The main goal was rather seen as keeping production running to meet client expectations. Middle managers and others doubted that “safety was unconditionally put first by their superiors”, and felt that other goals like production, costs or delivery were at least equal.
This is an example of “conceptual integration” in organisations, or doublespeak.
The daily trade-offs were evident from discussions and observations. It’s unreasonable to expect people to manage both safety and efficiency to a great extent in a given time, so sacrificing decisions have to be made to deal with the competing goals.

The Internalisation of External Pressure
They revisit the goal conflicts and pressures that workers and managers face every day. “Safety is never the only goal in systems”; economic pressures around cost, production and more are always present. Many operators have “internalized these multiple goals and try to achieve all of them simultaneously with their best efforts”.
One person described having to monitor a process every 15 minutes. If they checked after 16 minutes, they had to fill in a report and the ‘paper error’ was filed in the office. To avoid that, they simply wrote down that they checked the process on time.
One manager related that the higher up the hierarchy you are, the more goals you have to balance. The lower down the hierarchy, the more localised are your goals.
However, the authors argue that the opposite relationship existed in practice because goals are usually cascaded down the hierarchy; known as the management by objectives principle.
The findings indicate that it is “very difficult to put safety first”. To be successful, it’s said that every level in the plant and company must “recognize the hazards of these external pressures and seriously internalize safety first”.

The Normalisation of Daily Risk
Next the paper discusses how, according to workers, big accidents happened zero to five times a year, while minor accidents happened all the time.
Most workers reported events like skin burns and acid in the eyes as minor incidents. Workers seemed so used to the frequency of these events that they didn’t consider a burning incident a big thing.
They argue that it’s crucial for managers to be aware that this normalisation of risk exists, and that such events normally go unreported. Workers didn’t “consider these daily events as incidents and believed they were naturally associated with normal work”.
Further, workers’ boundaries of what counted as an incident were redefined by the frequency of the events; hence, things of concern to managers were normalised by workers.
Additionally, departures from procedures or manuals were also normalised, where “procedures and manuals functioned as guidelines instead of fixed and mandatory rules”.
Considering that operators faced about 400 overall procedures and 10-30 local procedures in every plant, procedural departures weren’t surprising: “Sometimes the sheer number of procedures or manuals made it impossible to follow them”. Experienced operators remarked that they knew which situations were dangerous and which weren’t.
The authors ask whether these are examples of people ignoring good rules, or of rules that are bad and unsuited to the demands of real work. It’s observed that “real practice is continually adrift from official written guidance, settling at times, unsettled, and shifting at others”.
Departures from the original rules become normalised and routine over time.
Moreover, written instructions don’t cover all issues in reality – sometimes they’re under-specified, other times over-specified. People adapt procedures locally to create safety because there’s always a gap between a rule and practice.
They argue that it’s often more fruitful to “look at such gap-closing as a creation of safety than at the ‘violation’ of some procedure”. Our goal should be to make the gaps visible and “provide a basis for learning and adaptation where necessary”.
Hence, organisations should be rather focused on monitoring and understanding the gaps and improving the system to suit.
Organisations must also develop ways to support people’s skill at judging when and how to adapt, and use formal management of change (MOC) processes to manage process changes.

Closing the Loop of Learning
They also found that a major source of dissatisfaction among operators was how the company learned (or didn’t) from failures.
Organisational learning seemed to be more based on individual experience rather than on structured learning processes that covered the whole company.
Operators didn’t really know what was happening in other plants but wanted to. The company relied on the intranet to distribute safety alerts and the like; hence, “the chance to create such a collective mindfulness relies on the intranet as a single source”.
So people, even if they were inclined to do so, had to search for information on the intranet, which was difficult and not user-friendly.
Also, workers wanted to know more about failures than what was reported in the official accounts.
Learning from failure hinged on collecting incident reports, placing memos about incidents on the intranet, and the work of a safety department; yet there seemed to be no active and structured way of distributing information through the whole company beyond the intranet. Hence, there was no clear answer to how the company actually learns from failure.
Most operators relied on their own expertise to avoid similar failure, but this left responsibility for corporate learning up to individuals.
They caution that we should be wary “about thinking that safety can be engineered into a system by extensive error counting, or by removing or tinkering with small elements that seem to be unreliable (e.g. individuals, procedures, and equipment)”.
Managers wanted more time to understand the challenges and risks in their plants, complaining that they had to produce reports for annual or quarterly results and meet production goals. Workers complained about voicing the same issues ad nauseam, believing that management never listened.
From their perspective, they were “pushed into inadvertently taking procedural short-cuts to meet the requirements of parallel production processes”.
Also, some operators worked 16-h shifts for several days in a row to meet production and staffing constraints, and while this is common across the industry, the researchers “found little evidence of conscious discussions about how this company could decide to relax the pressure on throughput and put safety first”.
Another point is that incidents recorded in the past don’t necessarily help with the future, because the systems are already very safe: “the next accident has never been seen before. It may involve a series of already seen micro incidents, although most have been deemed inconsequential for safety”.

In concluding, they provide these points:
· Safety should be considered first, but there are always uncertainties and goal conflicts making this difficult in practice
· Incidents can be valuable to analyse, but it can be challenging for managers and operators to agree on what counts as an incident
· Importantly in my view, “analyzing an incident is not the same as learning from it”
· There will always be gaps, to varying degrees, between written guidance and actual practice
· An essential capability for organisations is being sensitive to this gap as an indication of needed system improvements, as “reasons for this may lie buried more deeply in the organization or operation”
· Plants need sufficient operators and managers to operate the plant safely, but definitions of what’s sufficient are negotiable and based on incomplete knowledge
· Formalised safety alerts and the like should be complementary approaches, not exclusive ones; instead, person-to-person interaction should be cultivated, for “only then are there real opportunities for sharing narratives about risk that people can use for vicarious learning”
Ref: Huber, S., van Wijgerden, I., de Witt, A., & Dekker, S. W. (2009). Learning from organizational incidents: Resilience engineering for high‐risk process environments. Process Safety Progress, 28(1), 90-95.
LinkedIn post: https://www.linkedin.com/pulse/learning-from-organizational-incidents-resilience-ben-hutchinson-vglsc