Most of my clients have already implemented some form of alarm improvement process, from low-hanging-fruit nuisance alarm reduction to full-blown alarm rationalization and documentation. For most, their efforts reflect the amount of support they received from management and the resources they had available at the time. Unfortunately, after nearly 20 years, our industry is still struggling with abnormal situations and operator response because of poor alarm management practices.
The question is, how close are you to a fully effective, high-performance, incident-preventing alarm system? Is it almost perfect? Could it be better? Are you out of resources and in need of help getting your system to the next level? If you had an incident and your alarm management practices were compared to ISA 18.2, would your regulator find any gaps? Do you think you are compliant with the international standard? Are you ready to implement flood management through dynamic alarming? What can you do today to get your system up to par?
An alarm management audit will answer all these questions and more, but what if you don’t have money allocated for alarm management or it’s difficult to justify additional funds? I hope this article helps you get to the next level in alarm management.
As you may have noticed, the global recession hurt many manufacturers, prompting a strategic focus on reducing costs and achieving operational excellence. Poor alarm management is a major barrier to achieving this goal. Research has identified that alarms have a major impact on unplanned downtime, which can cost some 24/7 facilities up to $1M per hour. Alarms affect safety too. Remember Three Mile Island (PA), the Milford Haven Refinery (UK), the Texas City Refinery (TX), and the Buncefield Oil Depot (UK): all resulted in significant costs, including injuries, loss of life, equipment and property damage, fines, and damage to company reputations. Alarm management was cited as a major contributing factor in each of these incidents, and regulators are aware of this.
In June 2009 the standard ANSI/ISA-18.2-2009, “Management of Alarm Systems for the Process Industries”, was released. Both OSHA and the HSE have identified the need for improved industry practices to prevent future incidents. In 2014 ISA-18.2 became the basis of the international standard IEC 62682, and both are recognized by insurance companies and regulatory agencies as Recognized and Generally Accepted Good Engineering Practices (RAGAGEP). EEMUA 191 was published back in 1999; all told, it took hundreds of engineers and safety professionals nearly two decades to develop these best practices. Alarm management is a serious issue, and regulatory agencies are aware of the amount of effort that went into these standards.
I personally got involved in alarm management in 1998; yep, that was 18 years ago! Back then I was working for a small company that built a software tool to replace the loud, paper-eating, annoying printer in the control room. The idea was to save thousands of trees, make it easier for engineers to search through alarms with a simple search engine, and create quick reports for management. In the beginning this was a printer replacement solution, but it evolved each year and soon became the first and only alarm analysis tool used for alarm reduction and rationalization projects. That was a long time ago, and the software was not very expensive, so very little justification was needed. Engineers' time is very valuable: if it takes an engineer half a day to sort through printed alarms to get the data the site manager needs, versus immediate results from a quick query, you can bet the time savings made it easy to get approval for such an efficient tool.
In the early 2000s engineers demanded more reports. Having the alarms and operator actions in electronic form made it easy to prove that operators were overwhelmed with nuisance alarms. If you give an engineer data and an efficient way to analyze it, they will dig as deep as they can. Engineers found that most of the alarms had been configured to give operators information when they needed it, rather than to call for action. It turns out operators were using the alarm system to combat the keyhole effect. The keyhole effect is what happened when we removed the panel wall and replaced it with hundreds of graphics: operators had to scan through tons of graphics for status checks. They thought it would be easier to create alarms to provide this information, but they were quickly overwhelmed. Alarms are free, and you can alarm everything now! Many alarms were also caused by maintenance issues, and nuisance alarms quickly got out of control. Disabled alarms were found that should not have been disabled. On/off delay timers were not used properly, deviation alarms were always going off, and operators were getting hundreds of alarms per hour. Sure, the operators had mentioned this before, but now the engineers had data, and lots of it. With a few keyboard clicks they could see things that operators had been doing for years. Not only did the engineers see the magnitude of the alarm problem, they also uncovered issues in the automation system. They realized that many operators preferred to operate in manual and were making changes offline. Some operators were not fond of our software, and engineers would find the server mysteriously turned off until they locked it away in a safe place to keep it running. Data is powerful when it’s turned into information.
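To make the “data into information” point concrete, here is a minimal Python sketch of two queries engineers could run once alarms were electronic: ranking the “bad actor” tags that generate the most alarms, and spotting chattering alarms that re-annunciate shortly after returning to normal (the kind of problem properly tuned on/off delay timers are meant to fix). The log layout, the tag names, and the 60-second chatter window are hypothetical assumptions for illustration; real DCS and SCADA alarm journals vary by vendor.

```python
from collections import Counter
from datetime import datetime

# Hypothetical alarm journal: (timestamp, tag, event), where event is
# "ALM" (alarm activated) or "RTN" (returned to normal).
ALARM_LOG = [
    ("2016-03-01 08:00:12", "LI-205.HI", "ALM"),
    ("2016-03-01 08:00:40", "LI-205.HI", "RTN"),
    ("2016-03-01 08:01:05", "LI-205.HI", "ALM"),
    ("2016-03-01 08:01:30", "LI-205.HI", "RTN"),
    ("2016-03-01 08:02:10", "LI-205.HI", "ALM"),
    ("2016-03-01 08:05:00", "TI-310.DEV", "ALM"),
    ("2016-03-01 08:10:00", "TI-310.DEV", "RTN"),
    ("2016-03-01 08:20:00", "TI-310.DEV", "ALM"),
    ("2016-03-01 08:45:00", "PI-120.LO", "ALM"),
]


def top_offenders(log, n=3):
    """Rank tags by alarm activations; returns (tag, count, percent of total)."""
    counts = Counter(tag for _, tag, event in log if event == "ALM")
    total = sum(counts.values())
    if not total:
        return []
    return [(tag, c, round(100.0 * c / total, 1)) for tag, c in counts.most_common(n)]


def chatter_counts(log, window_s=60):
    """Count re-activations within window_s seconds of the same tag returning to normal."""
    last_rtn = {}
    chatter = Counter()
    # ISO-style timestamps sort chronologically as plain strings.
    for ts, tag, event in sorted(log):
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        if event == "RTN":
            last_rtn[tag] = t
        elif event == "ALM" and tag in last_rtn:
            if (t - last_rtn[tag]).total_seconds() <= window_s:
                chatter[tag] += 1
    return dict(chatter)


print(top_offenders(ALARM_LOG))
# [('LI-205.HI', 3, 50.0), ('TI-310.DEV', 2, 33.3), ('PI-120.LO', 1, 16.7)]
print(chatter_counts(ALARM_LOG))  # {'LI-205.HI': 2}
```

In this toy log a single level alarm produces half of all activations, and both of its repeats come within a minute of clearing, which is exactly the pattern that points to a missing delay timer or deadband rather than a real process upset.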
After major incidents revealed that operators were overwhelmed with alarms, and that alarm management could very likely have prevented these incidents, the UK HSE, EEMUA, and the Abnormal Situation Management (ASM) Consortium provided research and case studies and developed guidance documents on the subject. The little software company I worked for focused on new releases offering better reporting options for alarm cleanup. This eventually led the company to add a master alarm database, rationalization tools, management of change, and auditing. Many engineers were proactively working on their alarms even though regulators did not enforce it. Without a Recognized and Generally Accepted Good Engineering Practice (RAGAGEP), there was no benchmark to compare an alarm system against. With no regulatory enforcement and no measurable data to show a return on investment, it was difficult to justify a full-blown alarm rationalization. Just imagine how much time it would take to go through every possible event that should be alarmed, determine the alarm priority and the proper set point, document the cause, corrective action, and consequence of each alarm, and include a procedure if needed. Don’t forget, this is not a one-person job. You need a team with a process engineer, a veteran operator, a maintenance expert, an HSE specialist, and an unbiased facilitator to make sure the meetings are scheduled and productive and that each alarm is rationalized correctly. This could take a year or more!
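Each rationalized alarm ends up as a record in the master alarm database. As a rough sketch (the field names and the completeness rule are assumptions for illustration, not taken from ISA 18.2 or any particular tool), a record might hold the items the rationalization team documents, with a simple check that flags alarms whose cause, consequence, and corrective action are not yet filled in:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlarmRecord:
    """Hypothetical master alarm database entry for one alarm."""
    tag: str
    priority: str                     # assumed site scheme, e.g. "High" / "Medium" / "Low"
    setpoint: float                   # in the tag's engineering units
    cause: str = ""
    consequence: str = ""
    corrective_action: str = ""
    procedure: Optional[str] = None   # only needed for some alarms

    def is_rationalized(self) -> bool:
        """Assumed rule: cause, consequence, and corrective action must all be documented."""
        return all(text.strip() for text in (self.cause, self.consequence, self.corrective_action))


def pending(records):
    """Tags that still need to go through the rationalization team."""
    return [r.tag for r in records if not r.is_rationalized()]


records = [
    AlarmRecord("LI-205.HI", "High", 85.0,
                cause="Inlet valve fails open",
                consequence="Tank overflow to bund",
                corrective_action="Close inlet valve; check level transmitter"),
    AlarmRecord("TI-310.DEV", "Low", 5.0),
]
print(pending(records))  # ['TI-310.DEV']
```

Tracking a "pending" list like this is one simple way to show management measurable progress on a rationalization effort that may run for a year or more.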
The many engineers who attacked the alarm problem before regulators were aware of the issue were “believers”: proactive engineers who could convince management that if operators had proper notification of a problem, they could prevent incidents. (That seems simple and easy for management to support. Not really.) The better the alarm system (and the operator's situation awareness), the faster the operator can intervene. (It seems like a no-brainer, but believe me, we still have companies that do not have a truly effective alarm system.) Most US and UK companies have done the work, and many are still working on it. They are not all following the same methods, and many don’t believe they are seeing measurable improvements. That is why we needed a standard: a better way of achieving results, one that everyone could follow and measure.
In 2003 we all got together and started developing the first alarm management standard. It took six years, but we finally released the well-known ISA 18.2 document. It came after EEMUA 191, which was used to compare your alarm rates to what EEMUA considered effective. However, EEMUA 191 did not show you how to achieve the recommended rates (no more than one alarm per 10 minutes). So ISA 18.2 was developed as a lifecycle approach and became the “what to do” for achieving an effective alarm system. Even with the release of ISA 18.2, some organizations still have not fully funded a complete and thorough alarm management project. Some say they just can’t justify it. Well, ISA 18.2 changed the game. This standard was published in 2009, the regulators are aware of it, and they will hold you accountable. What do you mean, OSHA is not regulating alarm management… are they? In a way they are; I’ll explain:
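As an illustration of measuring an alarm system against that one-per-10-minutes rate, the sketch below buckets alarm activations for a single operator position into 10-minute periods, then reports the average per period and the percentage of periods above a flood threshold. The function name is hypothetical, and the flood threshold of 10 alarms per 10 minutes is an assumed parameter; use the value in your own alarm philosophy.

```python
from datetime import datetime, timedelta

BENCHMARK_PER_10_MIN = 1.0  # the "one alarm per 10 minutes" rate cited above


def ten_minute_stats(alarm_times, start, end, flood_threshold=10):
    """For one operator position, return (average alarms per 10 min,
    percent of 10-minute periods with more than flood_threshold alarms).
    flood_threshold is an assumed, site-configurable parameter."""
    period = timedelta(minutes=10)
    n_periods = int((end - start) / period)  # only whole periods are counted
    if n_periods == 0:
        raise ValueError("observation window is shorter than 10 minutes")
    buckets = [0] * n_periods
    for t in alarm_times:
        if start <= t < start + n_periods * period:
            buckets[int((t - start) / period)] += 1
    average = sum(buckets) / n_periods
    flooded = sum(1 for b in buckets if b > flood_threshold)
    return average, 100.0 * flooded / n_periods


# One hour: a burst of 12 alarms in the first 10 minutes, then two stragglers.
start = datetime(2016, 3, 1, 8, 0)
times = [start + timedelta(seconds=30 * i) for i in range(12)]
times += [start + timedelta(minutes=25), start + timedelta(minutes=47)]
avg, flood_pct = ten_minute_stats(times, start, start + timedelta(hours=1))
print(round(avg, 2), round(flood_pct, 1), avg <= BENCHMARK_PER_10_MIN)  # 2.33 16.7 False
```

Averages can hide floods: this hour averages about 2.3 alarms per 10 minutes, yet almost all of them arrive in one burst, which is why a flood percentage is worth tracking alongside the average rate.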
First off, does ISA 18.2 apply to you? If you have a DCS, SCADA system, PLC, safety system, or any other system that generates alarms monitored by an operator, then yes: your alarm system can be compared to this standard, and you can be held accountable if you ignore it. ISA 18.2 states: “The practices and procedures of this standard shall be applied to existing systems in a reasonable time”. In standards language, “shall” means “mandatory” and “should” means “recommended”. What about a grandfather clause? As for “reasonable time”, this standard was released 7 years ago! Is ISA a regulator? No. It is not an agency like OSHA or PHMSA. So why should you care what ISA says? ISA develops best practices and standards. You may have heard the term RAGAGEP, which means “Recognized and Generally Accepted Good Engineering Practices”. OSHA’s Process Safety Management standard (29 CFR 1910.119) states: “The employer shall document that equipment complies with recognized and generally accepted good engineering practices.” ISA 18.2 is a RAGAGEP. Regulatory agencies recognize the amount of work it takes to write standards, so they use this requirement to recognize the industry best practices that experts have already written. This is real. Regulatory agencies have fined companies based on RAGAGEP in the past and continue to do so.
A regulated company can be expected either to comply or to show that it is doing something just as good as or better than ISA 18.2. OSHA regional process safety management coordinators and the CSB (Chemical Safety Board) now have approval to distribute ISA 18.2 internally to inspectors during investigations. ISA 18.2 is not a secret. In October 2009, OSHA specifically addressed RAGAGEP and ISA 18.2 at the ISA conference. The IEC recently adopted ISA 18.2 as an international standard (IEC 62682). PHMSA has made it mandatory for pipeline companies to comply with strict alarm management requirements based on ISA 18.2. You may remember the BP Texas City explosion in 2005. OSHA went back in 2009 and issued an additional $87 million fine (the failure to comply with ISA standards and ASME codes was the basis for this additional fine).
So… how do you get management to fund your alarm management efforts? Make sure they are aware that ISA 18.2 is a RAGAGEP and that regulators expect you to “do just as good or better”. If you’re looking for more information on the “how to”, or if you have gaps in your alarm management efforts, please contact us. Our ISA 18.2 audit will drill into your existing philosophy, progress, past efforts, tools, and current condition, and identify a results-oriented path forward. We can show you proven techniques that speed up alarm rationalization, reporting, and dynamic alarming for flood management.
“Operators are the pilots of the process – So we give them wings”
UCDS is a Control Room Human Factors Company with 100 years of combined expertise in control room operations. Our design services help operators detect, diagnose, and respond to abnormal situations so they can prevent major incidents, recover quickly, and operate safely within the desired operating limits. www.mycontrolroom.com