Power fail-safe state: After the power outage
Identifying a power outage – small or large – is only a fraction of the battle for your equipment.
Your power failed, the system went down, and no energy was released in a way that endangered people or property. Good job. Now what? Blip! The power just came back.
Now we have to deal with the aftermath of the power outage. Controllers are powering back up, circuits are energized, and the system needs to stay safe. Generally, the protocol is to leave all the equipment in the power-off safe state until something external to the direct control verifies that all is well. This external entity might be a Safety Instrumented System, or it might be an operator who walks down the process and presses the reset button.
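The "stay safe until someone external says so" protocol can be sketched as a small latch. This is only an illustration, not any vendor's controller logic: the class name RestartLatch, its method names, and the all_clear flag are all hypothetical.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    SAFE_LATCHED = "safe_latched"  # outputs de-energized, waiting for reset


class RestartLatch:
    """Hypothetical controller latch: power-up always lands in the safe state."""

    def __init__(self):
        # A controller coming back from an outage starts latched, never running.
        self.state = State.SAFE_LATCHED

    def on_power_loss(self):
        self.state = State.SAFE_LATCHED

    def on_power_restored(self):
        # Power returning is not permission to run, so the latch stays set.
        self.state = State.SAFE_LATCHED

    def on_external_reset(self, all_clear: bool) -> bool:
        # Only something outside the direct control (an SIS or an operator
        # pressing reset) may release the latch, and only if all is well.
        if self.state is State.SAFE_LATCHED and all_clear:
            self.state = State.RUNNING
            return True
        return False

    @property
    def outputs_enabled(self) -> bool:
        return self.state is State.RUNNING
```

The point of the design is that nothing inside the controller itself can move it from the safe state back to running.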
What if we had a power outage and no one noticed?
How do we define a power outage? Is it the drop in voltage? My stereo might drop offline if my nominal 120 Vac falls to 100 Vac, but my incandescent desk lamp might work fine (if dimmer) at 70 Vac. Every device has different voltage requirements, so it can be difficult to precisely define the minimum voltage. How about time? Clearly, if the power is out for an hour you would call it an outage. How would you refer to an outage that lasted one cycle, about 17 ms? The power supply in my desktop computer might notice, but my box fan certainly will not.
Pick a voltage drop of X% and a duration of Y cycles, or maybe a product X*Y that allows for variations in both, and you have just defined your own ‘power outage’ metric. Unfortunately, every device in the plant has a different metric and its own unique way of defining when it thinks an outage has occurred. You generally cannot know these limits on a device-by-device basis.
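As a rough sketch, the two kinds of metric might be written like this. The class names, the 60 Hz line frequency, and every threshold value are assumptions chosen for illustration, not measured limits for any real device.

```python
from dataclasses import dataclass

NOMINAL_HZ = 60.0  # assumed line frequency; one cycle is about 16.7 ms


@dataclass(frozen=True)
class Sag:
    depth_pct: float  # voltage drop as a percentage of nominal (X)
    cycles: float     # duration in line cycles (Y)

    @property
    def duration_ms(self) -> float:
        return self.cycles / NOMINAL_HZ * 1000.0


@dataclass(frozen=True)
class ThresholdMetric:
    """Outage = at least X% deep AND at least Y cycles long."""
    min_depth_pct: float
    min_cycles: float

    def is_outage(self, sag: Sag) -> bool:
        return (sag.depth_pct >= self.min_depth_pct
                and sag.cycles >= self.min_cycles)


@dataclass(frozen=True)
class ProductMetric:
    """Outage = the product X*Y reaches a limit, trading depth for time."""
    min_pct_cycles: float

    def is_outage(self, sag: Sag) -> bool:
        return sag.depth_pct * sag.cycles >= self.min_pct_cycles
```

The same sag can count as an outage under one metric and not under the other, which is exactly the problem when every device carries its own.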
When the outages are long and deep, you can expect every device to notice the loss and to shut down. When outages are very short and shallow, you might get them regularly and never notice because all your equipment is able to ride through the episode.
What if we had a power outage and only some noticed?
Device A notices a short power outage. The enabling signal to a valve is released and the spring return forces it to the safe state. After the momentary outage, the controller is back online, waiting for an operator to press the reset button before the valve is re-enabled. Device B does not notice the same short power outage, so it stays in normal operational mode. If Device A and Device B are not communicating or otherwise linked, an unsafe situation can easily result.
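The Device A and Device B scenario can be simulated in a few lines. The Device class, the is_split helper, and the trip thresholds are all hypothetical; real devices rarely publish limits this cleanly.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    trip_depth_pct: float  # sag depth this device notices (assumed value)
    trip_cycles: float     # sag duration this device notices (assumed value)
    tripped: bool = False

    def see(self, depth_pct: float, cycles: float) -> None:
        # The device drops to its safe state only if the sag is both
        # deep enough and long enough for this particular device.
        if depth_pct >= self.trip_depth_pct and cycles >= self.trip_cycles:
            self.tripped = True


def is_split(devices) -> bool:
    """True when some, but not all, devices have tripped."""
    return len({d.tripped for d in devices}) > 1


def apply_sag(devices, depth_pct: float, cycles: float) -> bool:
    for d in devices:
        d.see(depth_pct, cycles)
    return is_split(devices)
```

With Device A set to trip at 15% for 1 cycle and Device B at 40% for 5 cycles (made-up numbers), a 30% sag lasting 2 cycles trips A only, and apply_sag reports a split plant. A full outage trips both, and a very shallow one trips neither.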
Some years ago, a facility I was associated with had multiple independent single-loop controllers operating a boiler. A short power outage caused some of these controllers to reset to their safe state. Others did not notice the outage and continued on in normal operation. Things went boom and that boiler suddenly had a slightly different shape.
How do we make sure everyone notices?
There are two ways to make sure that all devices stay up or go down together.
The first is to find the minimum incident that any equipment would notice, install a sensor that can detect those conditions, and kill power to everything whenever those conditions are met. Everything will go to safe state together, at the expense of frequent nuisance shutdowns of all the equipment.
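A minimal sketch of that first approach, assuming you could somehow estimate each device's limits as a (depth %, cycles) pair. The function names and all numbers are hypothetical. Taking the smallest depth and the smallest duration across all devices gives a master trip point that fires whenever any single device would.

```python
def most_sensitive_threshold(thresholds):
    """thresholds: iterable of (depth_pct, cycles) pairs, one per device."""
    thresholds = list(thresholds)
    min_depth = min(depth for depth, _ in thresholds)
    min_cycles = min(cycles for _, cycles in thresholds)
    return min_depth, min_cycles


def master_trip(sag_depth_pct, sag_cycles, thresholds) -> bool:
    """True when the sensor should kill power to everything."""
    depth, cycles = most_sensitive_threshold(thresholds)
    return sag_depth_pct >= depth and sag_cycles >= cycles


def any_device_trips(sag_depth_pct, sag_cycles, thresholds) -> bool:
    """True when at least one device would have noticed on its own."""
    return any(sag_depth_pct >= d and sag_cycles >= c for d, c in thresholds)
```

Because the combined trip point is at least as sensitive as every individual device, it can fire on a sag that no device would have noticed. That is where the nuisance shutdowns come from.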
The second is to install a supplemental power supply that filters out every incident smaller than the one that would take out every device. No device notices an outage until the outage is large enough that every device notices. Such supplemental power supplies are generally prohibitively expensive.
If you cannot make sure that the devices all stay up or go down together, then the solution needs to be communication. Devices that notice the outage perform their shutdown function. Devices that do not notice the outage must instead detect the shutdown function of the other devices and take their own actions accordingly. These webs of communication and detection links tend to grow very rapidly, and the interrelationships between different systems and process areas can become very complex.
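The communication approach can be pictured as shutdowns propagating along monitoring links. In this sketch the function propagate_shutdown and the links layout are hypothetical: links maps each device to the devices that watch it and follow it into the safe state.

```python
def propagate_shutdown(tripped, links):
    """Return every device that ends up in the safe state.

    tripped: names of devices that noticed the outage themselves.
    links:   dict mapping a device to the devices that monitor it.
    """
    safe = set(tripped)
    pending = list(tripped)
    while pending:
        source = pending.pop()
        for watcher in links.get(source, ()):
            if watcher not in safe:
                # The watcher detects the shutdown and follows it.
                safe.add(watcher)
                pending.append(watcher)
    return safe


def full_mesh_link_count(n: int) -> int:
    """Directed links needed for every device to watch every other one."""
    return n * (n - 1)
```

If A is watched by B and B by C, a trip on A takes all three down, but a trip on C alone leaves A and B running because no link points that way. Closing every such gap with a full mesh takes n*(n-1) directed links, which is 90 for just ten devices.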
The perfect solution does not exist.
Every situation is different, and every risk evaluation is different. What works at my plant may be completely inappropriate at yours. The key is to recognize the risks that uncertainty about power failures can cause.
Power-fail events are not yes/no situations. There are wide ranges of maybe that must be considered. If you ignore these potential areas of risk, you might find yourself trying to explain just why that system went boom.
This post was written by Robert Henderson. Robert is a Principal Engineer at MAVERICK Technologies, a leading automation solutions provider offering industrial automation, strategic manufacturing, and enterprise integration services for the process industries. MAVERICK delivers expertise and consulting in a wide variety of areas including industrial automation controls, distributed control systems, manufacturing execution systems, operational strategy, business process optimization and more.