Monday, August 8, 2016

Alarm Management Needs Error Management

Does your system contain alarms that cannot actually ever occur?  Many do.

Alarm management focuses on understanding and controlling the alarms that can be generated in a system.  DBDOC provides techniques for walking through the configuration systematically to validate assumptions and look for problems. Effective alarm management is impossible without error management and configuration analysis.

Here are three example sets of alarms that do not work in a single powerplant configuration:
  1. Intermittent failure caused by duplicated exception report imports
  2. Detected failures to test blocks with quality
  3. Problems that can be found by using DBDOC to walk through the configuration easily
If you do not manage your errors and use the tools DBDOC provides to audit the configuration, you will not be able to manage alarms. Even worse, you can spend a great deal of money fooling yourself about alarms.

Example 1 - Frozen Import Block Error


INFI 90 allows block values from one module to generate "exception reports" when the block value changes status or value. In the case of an analog exception report, the new value is transmitted when it changes by an amount that is specified as "significant".

If an exception report block value is imported more than once into a module, only one of the import blocks works. The values of any other blocks importing the exception report block value will remain frozen. Furthermore, when an on-line configuration is done, a second import block will silently begin working (!) and the block that was working will freeze at its last value. This applies to both analog and digital exception reports.

DBDOC detects and reports this error. However, when the module is compiled using Composer, the error is not detected.

      Error: Frozen import block [093]

      Duplicate off-module reference (DI/L, DI/L) to Module 1,01,05 Block 1230: 1020324 and
      1020342 [1010536]
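
As a rough illustration of the kind of check behind this message, the following Python sketch groups import blocks by the module they live in and the source block they import, then flags any source that is imported more than once. The record layout, module names and block numbers are made up for the example. This is not DBDOC's implementation.

```python
from collections import defaultdict


def find_duplicate_imports(import_blocks):
    """Return sources that are imported more than once into the same module.

    import_blocks is an iterable of hypothetical records:
    (importing_module, importing_block, source_module, source_block).
    """
    by_source = defaultdict(list)
    for imp_module, imp_block, src_module, src_block in import_blocks:
        by_source[(imp_module, src_module, src_block)].append(imp_block)
    # Keep only the sources that have two or more importing blocks.
    return {key: sorted(blocks) for key, blocks in by_source.items() if len(blocks) > 1}


# Illustrative data only: two blocks in module "M2" import the same source.
imports = [
    ("M2", 324, "M1", 1230),
    ("M2", 342, "M1", 1230),
    ("M2", 400, "M1", 1500),
]
duplicates = find_duplicate_imports(imports)
for (imp_module, src_module, src_block), blocks in duplicates.items():
    print(f"Module {imp_module}: blocks {blocks} all import {src_module} block {src_block}")
```

Only one of the flagged import blocks would be live at any time, so every one of them deserves a look, including any that sit on spare pages.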


The source value above is imported in two places in one module. This instance on a spare block page might have the active connection.


The following shows the intended logic.


There is a 50/50 chance that the correct logic is not working at any time. Simply put, tags 0D0730 "COND RETURN TANK HIGH/LOW LEVEL" and 0D0727 "AUX DEA HIGH/LOW LEVEL" might alarm, but they might not. It is a crap-shoot!

Note: It seems reasonable to assume that AI/L block 1675, now spare, started life feeding S3 of block 1676. At some point in the past, the duplicate import situation was discovered and the logic was changed properly to get both values of S3 from a single AI/L block. The problem is that the spare block can still be the active one.

Example 2 - Test Quality of a block that does not have quality to test


Powerplant A has 7707 TSTQ blocks. Each of these four-input blocks gives a binary [1] as a result if one or more of its inputs have bad quality. In all, 8592 blocks are being tested. Of these, 316 do not have quality to test at all.

Here is one example of an alarm (and shutdown) that will not work, identified by the message here.

       Error: TSTQ tests block with no quality [063]
 
       TSTQ Module 4,05,03 Block 2273 tests Module 4,05,03 Block 6170 (FC 15), which does not
       have quality [4050378]
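
The idea behind this check can be sketched in a few lines of Python: for every TSTQ block, look at each block it tests and report those whose output carries no quality. The data layout and the `has_quality` flag are assumptions made for the example, and block 9001 is hypothetical. Only the FC 15 case comes from the message above.

```python
def tstq_inputs_without_quality(tstq_blocks, blocks):
    """List (tstq_block, tested_block, function_code) for inputs with no quality.

    tstq_blocks maps a TSTQ block number to the block numbers it tests.
    blocks maps a block number to a dict with "fc" and "has_quality" keys.
    Both structures are hypothetical stand-ins for a real configuration.
    """
    findings = []
    for tstq_number, tested in tstq_blocks.items():
        for number in tested:
            if not blocks[number]["has_quality"]:
                findings.append((tstq_number, number, blocks[number]["fc"]))
    return findings


blocks = {
    6170: {"fc": "FC 15", "has_quality": False},  # case from the quoted message
    9001: {"fc": "hypothetical FC with quality", "has_quality": True},
}
tstq_blocks = {2273: [6170, 9001]}

findings = tstq_inputs_without_quality(tstq_blocks, blocks)
for tstq_number, number, fc in findings:
    print(f"TSTQ block {tstq_number} tests block {number} ({fc}), which does not have quality")
```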

The simple solution to the problem of no alarm on bad quality for 1 GROSS MW TRANSDUCER A is to test the quality of the input at IREF P1E0388A instead of the output of the fudge factor sum block 6170.

Example 3 - Misconfiguration of a DAANG Block


DBDOC also allows you to go looking for trouble. Walking through all instances of a Function Code will allow you to look for things that were missed long ago. This example shows bad quality that will not alarm.

DAANG blocks are powerful tools when configured competently. Powerplant A has 6227 of them, the second highest number we have seen in a single plant. I was curious to see how consistently they were programmed. The table shows how a value placed on Specification S8 can set bad quality.

This plant uses this feature extensively. This example shows, with the green highlights, how the constant 32.0 is applied to S8. When the raw signal is found by the TSTQ block to be bad quality, the DAANG block will be marked bad quality as intended.

The following example shows that undetected errors exist. The red highlights show that bad quality on this raw input will result in good quality being forced. Oops!
Being interested, I started walking through the 6227 DAANG blocks looking for this particular issue.  I got bored after 200, taking 16 minutes, about 5 seconds per block. I found (and flagged with DBDOC annotations) the following reality:
  • 124 blocks with correct alarm configuration - setting 32.0 into S8
  • 22 blocks incorrectly setting 1.0 into S8
  • 1 block incorrectly setting 0.0 into S8
  • 53 of the first 200 DAANG blocks did not set the quality attribute in this way.

The bottom line is that 12% of the first 200 DAANG blocks (all tagged, by the way) would not alarm on bad quality. I have no reason to think this is not representative, which suggests that perhaps 800 of these tags will not alarm on bad quality in this one system.
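
To make the arithmetic explicit, here is a small Python sketch that tallies the sample and scales it up to all 6227 blocks. The counts are the ones listed above. The Wilson interval is an addition of mine, not part of the original walkthrough, and it assumes the first 200 blocks behave like a random sample of the whole plant.

```python
from math import sqrt

# Counts from the walkthrough of the first 200 DAANG blocks.
sample = {
    "S8 = 32.0 (correct)": 124,
    "S8 = 1.0 (incorrect)": 22,
    "S8 = 0.0 (incorrect)": 1,
    "S8 not used to set quality": 53,
}
sample_size = sum(sample.values())
not_alarming = sample["S8 = 1.0 (incorrect)"] + sample["S8 = 0.0 (incorrect)"]
total_blocks = 6227


def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion k/n."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


fraction = not_alarming / sample_size
point_estimate = round(fraction * total_blocks)
low, high = wilson_interval(not_alarming, sample_size)
low_blocks, high_blocks = round(low * total_blocks), round(high * total_blocks)

print(f"{not_alarming} of {sample_size} sampled blocks ({fraction:.1%}) would not alarm")
print(f"Scaled to {total_blocks} blocks: about {point_estimate}")
print(f"Rough 95% range: {low_blocks} to {high_blocks} blocks")
```

The range is wide because the sample is small, but under these assumptions even its low end points to hundreds of tags.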

By contrast, I checked a system with 7980 DAANG blocks. There were no alarm errors in the first 150 instances, with all quality inputs being generated correctly. At least in this respect, good configuration work would make alarm analysis valid.

In a second system with 5084 DAANG blocks, I found that only 2042 of them used S8 to set bad quality if needed. In fact, this simple walkthrough with DBDOC showed that 60% of the blocks I sampled were fed from signals that had no quality to test, but did not reinsert bad quality. That could mean about 2000 simple errors, where a tag does not show bad quality when it should.

Error and Configuration Management Trump Alarm Management


Alarms cannot be managed if errors prevent them from working. DBDOC can be used to improve alarms both by looking at errors and by walking through the project carefully. Otherwise, any alarm analysis will be incomplete and unacceptable.

Friday, August 5, 2016

DBDOC: Reads the Manual and Checks Your Specs.

DBDOC is unique in giving you significant help in finding integrity issues in your system. Every INFI 90 system in the world would benefit from improving operational safety and reliability. Our Error Browser and Error Marker development was designed to make it possible to manage errors and improve the integrity of your systems.

Here you see our much loved copy of the function code manual, dog-eared and duct-taped. You've probably read it too, but possibly not as intensely as we have.

Based on studying the documentation and the experiences of our clients, we added new tests for function block specifications that violate the documentation in ways that seem significant. Read on to see what we found.

We reviewed test data for 229 systems around the world, and found 1160 new errors of note in 77 systems involving 33 function codes, an average of 15 errors per affected system. Of course, this means that about 66% of the systems had none of the specification errors we looked for. It also means that over 1/3 of the systems had these new errors, which Composer did not find but DBDOC will.

I expect this will be the most boring blog I ever write. Holy Understatements, Batman!

Here are examples of errors found in various specification values for various function codes from real systems. Some could bite!  All are now reported by DBDOC in the Error Browser.

FC35 - 45 instances in 21 systems
  • Module 5,65,02 Block 133 FC35 S2 value 005 is out of the valid range [000,102]
FC36 - 15 instances in 4 systems
  • Module 5,26,05 Block 6441 FC36 S9 value 9 is out of the valid range [0,8]
  • Module 7,101,06 Block 5800 FC36 S10 value -1 is out of the valid range [0,1]
FC45 - 22 instances in 6 systems
  • Module 2,48,02 Block 854 FC45 S2 value 20 is out of the valid range [0,2]
FC50 - 1 instance in 1 system
  • Module 3,01,04 Block 4082 FC50 S1 value -1 is out of the valid range [0,1]
FC69 - 9 instances in 4 systems
  • Module 32,04,05 Block 3285 FC69 S2 value 5 is out of the valid range [0,2]
FC80 - 61 instances in 26 systems
  • Module 2,78,02 Block 2151 FC80 S6 value 39 is out of the valid range [1-3 or 5-8]
  • Module 2,15,05 Block 2039 FC80 S17 value 246 is out of the valid range [0,7]
  • Module 1,11,02 Block 2494 FC80 S23 value 5 is out of the valid range [0,4]
FC82 - 4 instances in 3 systems
  • Module 3,02,20 Block 1992 FC82 S5 value 30 is out of the valid range [0,2]
  • Module 2,12,06 Block 15 FC82 S15 value 179 is out of the valid range [0,1]
FC83 - 501 instances in 4 systems
  • Module 11,30,15 Block 463 FC83 S1 value 65 is out of the valid range [0,63]
FC84 - 5 instances in 4 systems
  • Module 2,20,02 Block 6641 FC84 S2 value 10 is out of the valid range [0,1]
FC86 - 1 instance in 1 system
  • Module 2,104,10 Block 1882 FC86 S6 value 013 is out of the valid range [000,101]
FC95 - 36 instances in 5 systems
  • Module 1,62,03 Block 979 FC95 S7 value 2 is out of the valid range [0,1]
  • Module 1,53,02 Block 9062 FC95 S12 value 7 is out of the valid range [0,1]
FC110, FC111, FC112 - 24 instances in 10 systems
  • Module 5,12,04 Block 4015 FC110 S1 value 20 is out of the valid range [0,3]
  • Module 5,23,02 Block 56 FC111 S1 value 10 is out of the valid range [0,3]
  • Module 7,09,03 Block 4140 FC112 S1 value 10 is out of the valid range [0,3]
FC123 - 12 instances in 1 system
  • Module 4,07,06 Block 3793 FC123 S7 value 22 is out of the valid range [0,11]
  • Module 4,07,06 Block 3793 FC123 S8 value 22 is out of the valid range [0,11]
FC124 - 6 instances in 2 systems
  • Module 5,33,03 Block 3194 FC124 S11 value 120 is out of the valid range [00,22]
  • Module 5,33,03 Block 3194 FC124 S13 value 60 is out of the valid range [00,22]
FC126 - 3 instances in 1 system
  • Module 1,14,11 Block 8501 FC126 S2 value 115 is out of the valid range [0,2]
FC129 - 92 instances in 33 systems
  • Module 1,15,02 Block 3405 FC129 S7 value 003 is out of the valid range [000,111]
  • Module 1,15,06 Block 3133 FC129 S8 value 1000 is out of the valid range [000,111]
  • Module 1,20,05 Block 7810 FC129 S9 value 002 is out of the valid range [000,111]
  • Module 1,23,03 Block 3995 FC129 S10 value 1000 is out of the valid range [000,111]
  • Module 5,15,04 Block 2559 FC129 S11 value 20002 is out of the valid range [0000,2222]
  • Module 1,44,04 Block 2509 FC129 S13 value 0003 is out of the valid range [0000,2222]
  • Module 1,24,03 Block 1808 FC129 S14 value 222 is out of the valid range [000,142]
  • Module 1,04,03 Block 3993 FC129 S15 value 23 is out of the valid range [0,1]
  • Module 3,54,04 Block 7764 FC129 S19 value 5 is out of the valid range [Any or All of 0,1,2,3]
FC132 - 64 instances in 14 systems
  • Module 1,01,15 Block 1346 FC132 S3 value 2 is out of the valid range [0,1]
  • Module 1,60,02 Block 448 FC132 S3 value 4 is out of the valid range [0,1]
  • Module 11,26,04 Block 1106 FC132 S3 value 7 is out of the valid range [0,1]
  • Module 34,46,03 Block 8150 FC132 S10 value 78 is out of the valid range [0,5]
  • Module 1,03,04 Block 1211 FC132 S13 value 50 is out of the valid range [0,5]
  • Module 2,01,02 Block 244 FC132 S16 value 150 is out of the valid range [0,5]
FC136 - 28 instances in 3 systems
  • Module 1,07,03 Block 1258 FC136 S16 value 3.0 is out of the valid range [0.0, 1.0, 2.0]
FC140 - 34 instances in 4 systems
  • Module 10,02,07 Block 1076 FC140 S4 value 8078 is out of the valid range [00,11]
  • Module 7,103,02 Block 3250 FC140 S5 value 212 is out of the valid range [0,63]
FC143 - 1 instance in 1 system
  • Module 1,20,09 Block 8872 FC143 S1 value 3 is out of the valid range [0,2]
FC149 - 60 instances in 10 systems
  • Module 5,41,02 Block 232 FC149 S3 value 8 is out of the valid range [0,1]
  • Module 11,01,04 Block 161 FC149 S11 value 5 is out of the valid range [0,2]
  • Module 11,01,04 Block 161 FC149 S12 value 5 is out of the valid range [0,2]
  • Module 11,01,04 Block 161 FC149 S13 value 5 is out of the valid range [0,2]
  • Module 11,01,04 Block 161 FC149 S14 value 5 is out of the valid range [0,2]
  • Module 11,01,04 Block 161 FC149 S15 value 5 is out of the valid range [0,2]
  • Module 11,01,04 Block 161 FC149 S16 value 5 is out of the valid range [0,2]
  • Module 11,01,04 Block 161 FC149 S17 value 5 is out of the valid range [0,2]
FC151 - 6 instances in 3 systems
  • Module 1,65,05 Block 4531 FC151 S6 value 4532.0 is out of the valid range [0.0,127.0]
  • Module 2,30,03 Block 7180 FC151 S7 value 3 is out of the valid range [0,1]
FC156 - 5 instances in 4 systems
  • Module 1,18,03 Block 6438 FC156 S19 value 32 is out of the valid range [0,1]
  • Module 3,03,02 Block 7284 FC156 S20 value 40 is out of the valid range [0,1]
FC166 - 8 instances in 3 systems
  • Module 4,05,07 Block 3697 FC166 S2 value 120 is out of the valid range [0,2]
  • Module 3,01,04 Block 3807 FC166 S8 value 32767 is out of the valid range [0,1]
FC182 - 19 instances in 1 system
  • Module 1,07,14 Block 56 FC182 S1 value 000 is out of the valid range [001-009, 020-124, 040-041, 064]
FC190 - 6 instances in 3 systems
  • Module 2,32,07 Block 2280 FC190 S2 value 7200 is out of the valid range [1,6553]
FC216 - 61 instances in 10 systems
  • Module 1,09,07 Block 8172 FC216 S3 value 19 is out of the valid range [1,16]
  • Module 20,38,07 Block 650 FC216 S4 value 051 is out of the valid range [000-109, 010-113, 020-125, 040-144, 060-161, 099-199]
  • Module 1,03,04 Block 1428 FC216 S5 value 2 is out of the valid range [0,1]
  • Module 6,60,08 Block 162 FC216 S11 value 0 is out of the valid range [16,24]
FC222 - 1 instance in 1 system
  • Module 1,09,02 Block 1023 FC222 S2 value 1022 is out of the valid range [0000-1209, 0010-1115, 0210-1214, 0300-1302, 0400-1400, 0500-1500, 0900-1900]
FC224 - 18 instances in 4 systems
  • Module 1,09,02 Block 1048 FC224 S2 value 1047 is out of the valid range [0,2]
  • Module 1,09,02 Block 1048 FC224 S3 value 1046 is out of the valid range [0,255]
FC226 - 8 instances in 2 systems
  • Module 20,10,31 Block 8358 FC226 S3 value 10 is out of the valid range [0,4]
FC247 - 4 instances in 2 systems
  • Module 1,13,04 Block 953 FC247 S5 value 5 is out of the valid range [1-4 or 6-8]
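
For readers who want to picture how a range test like this works, here is a minimal Python sketch. The small range table is copied from a few of the messages above rather than from the function code manual, and it covers only simple contiguous ranges. The function name and data layout are hypothetical, and this is not how DBDOC itself is implemented.

```python
# (function code, specification number) -> (lowest valid, highest valid)
# Sample entries taken from the messages quoted in this post.
VALID_RANGES = {
    (36, 9): (0, 8),
    (36, 10): (0, 1),
    (45, 2): (0, 2),
    (50, 1): (0, 1),
}


def check_spec(module, block, fc, spec, value):
    """Return an error message if value is outside the known range, else None."""
    valid = VALID_RANGES.get((fc, spec))
    if valid is None:
        return None  # No range known for this spec, so nothing to report.
    low, high = valid
    if low <= value <= high:
        return None
    return (
        f"Module {module} Block {block} FC{fc} S{spec} value {value} "
        f"is out of the valid range [{low},{high}]"
    )


message = check_spec("5,26,05", 6441, 36, 9, 9)
print(message)
```

Specifications whose valid values are digit patterns or split ranges, such as FC80 S6 or FC129 S11, would need a richer rule than a single low and high bound.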
Holy Complications, Batman! A number of expressions in http://holysmokesbatman.com/directory apply. DBDOC helps improve your system's integrity.


Wednesday, August 3, 2016

Output Cusps - Exceptionally Artistic!

We came across the pretty but alarming output forms shown here at a client. The scale is one minute per cusp and just over 1% amplitude. This is important because it means that the HMI and historian would not see this at all. DBDOC Watch Window handles it nicely, as is evident.

So what is going on here?



The blue line is the output of an FC 80 Station block and the output cusps were, in fact, a significant issue. The green line is the raw PV input (nice and smooth). However, the APID block was being fed by an exception-reported version of the PV, shown in red. The significant change specification was 0.1%, so the exception-reported value was competently implemented, changing in steps, as expected.

The problem simply is that the APID algorithm used a derivative term. As the two diagrams make painfully obvious, this simply does not work. Although the PV signal is changing at a nice fairly constant rate, the controller, fed by the red step input, is hit with a huge change accumulated over one minute, but seen as happening in 250 ms. Thus, the derivative is calculated as being 240 times bigger than it actually is.
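
The factor of 240 can be checked with a short Python sketch. It models a PV ramping at a constant rate, an exception-reported copy that only updates once per minute, and a controller that differences its input every 250 ms. The ramp rate and the simulated duration are made-up values for illustration, and the model ignores everything else a real APID block does.

```python
CYCLE_S = 0.25            # controller execution period from the post (250 ms)
REPORT_INTERVAL_S = 60.0  # one exception report per minute, as in the first trend


def peak_derivative(ramp_rate_per_s, duration_s=180.0):
    """Largest derivative the controller computes from the stepped input."""
    cycles_per_report = round(REPORT_INTERVAL_S / CYCLE_S)
    n_cycles = round(duration_s / CYCLE_S)
    held = 0.0      # last exception-reported value seen by the controller
    previous = 0.0  # controller input on the previous cycle
    peak = 0.0
    for k in range(1, n_cycles + 1):
        if k % cycles_per_report == 0:
            # A new report arrives carrying the true PV at this instant.
            held = ramp_rate_per_s * (k * CYCLE_S)
        derivative = (held - previous) / CYCLE_S
        peak = max(peak, abs(derivative))
        previous = held
    return peak


true_rate = 0.02  # hypothetical ramp, percent of span per second
peak = peak_derivative(true_rate)
ratio = peak / true_rate
print(f"True rate {true_rate} %/s, peak computed derivative {peak:.2f} %/s")
print(f"Ratio: {ratio:.0f}")
```

Between reports the computed derivative is zero, so the controller sees a long silence followed by a kick, which is the cusp shape in the trend.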

The second image shows that, if the PV is changing so that the exception reports are triggered by the significant change, the cusps are smaller, as they have a better time base, but they are still misleading and problematic.

The bottom line is that PID and APID algorithms should never use a derivative term if the PV is affected by an exception-reported input PV. It cannot work mathematically, and it sure kicks the process in the teeth, again and again.

DBDOC can't yet detect this error situation automatically, but we are working on it.  Stay tuned.

DBDOC 10.7: Now Detects Uninitialized Rung Stacks

We always have our eyes out for new error situations that DBDOC can detect and protect you from.

At a training session recently, we happened to notice an FC 111 rung block with mysterious output [1] (blue). This made no sense, because the logic is OUTPUT = (S13 OR S14 OR S15 OR S16) AND S17, but none of S13-S16 is [1] (blue), so the output shouldn't be [1], but it is. We verified the specifications, which matched what the CLD showed.



There are actually two different reported errors on this particular block, as you can see from the presence of two error markers.  There is also an unreported error, which we will describe for completeness.

Rung block spec is wired but unused

The first (reported) error is reported as follows:

Rung block spec is wired but unused [ERROR 124]

Rung block Module 9,25,05 Block 7660 S12 is wired (block 3006) but unused by S2 value 000[92505F2A.CAD]
S2 is the operation that acts on input S12, but S2 is 0, which means ignore the input, and instead use the value from the top of the stack.  It may well be unintentional that S2's value be ignored.

Operation on an unwired input

There is actually another error which is not reported, because it makes no difference to the output:

The specs S2-S11 are operations on inputs S12-S21.  Some of these inputs (S16, S18-21) are unwired, and their corresponding operations are 0, as expected, except for S16.  S16 is unwired, but S6 is 12, not 0.   So to correct this error, S6 should be changed from 12 to 0.  However, the outcome is the same whether or not this "error" is corrected, so DBDOC doesn't report it.

New in 10.7: Rung stack not initialized

There is also another problem, the second reported error, which is the one causing the unexpected [1] output:

Rung stack not initialized [ERROR 306]

Rung block Module 9,25,05 Block 7660 logic invalid because S3 is an OR operation[92505F2A.CAD]
This is a type of error newly detected in DBDOC 10.7.  It is capable of causing a significant system error.

How these rung blocks work is that each operation is carried out in order, and the result of each operation is placed on the stack.  The next operation is typically an AND or OR with the current input and whatever is on the stack (i.e. the result of the previous operation).

  1. In this case, the first operation is S2 (0) performed on input S12.  0 means use the value from the stack, i.e. ignore the actual input value S12.  As it happens, the uninitialized stack value is [1]!   But in any case S2 accomplishes nothing.
  2. The next operation is S3 (12) performed on input S13.  This means OR the input S13 with the value on top of the stack.  Well, the value on top of the stack is [1], so the result of this operation is always [1], regardless of whatever S13 is.  So this operation is also pointless.
  3. Similarly, S4 and S5 also OR their corresponding inputs with [1], yielding [1] on top of the stack.
  4. Finally, S7 is 11, which means AND the input ([1] in this case) with the top of the stack (current top of stack is guaranteed to be [1]), for an inevitable [1] final output, regardless of the values of inputs S12-S15.
The basic problem here is that the stack was not initialized. The logic would make sense if S2 was 10 (PUT) instead of 0.  S2=10 would mean put the value of S12 ([0]) onto the stack.  Then the next operation, S3 (12), would OR [0] with the value of S13, yielding the value of S13, instead of a guaranteed [1].  

(Technically, S3 could be 10 (PUT) instead (if S2 was actually meant to be ignored), hence the error message complaining that S3 is OR.  And either S2 or S3 could also be an 11 (AND) with no change in result).

The important thing is that the stack needs to be initialized sensibly with a PUT (x0) or an AND (x1) before it is used, or there will be unexpected results. 
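
The behaviour described above can be reproduced with a simplified Python model of the rung stack. It handles only the four operation codes mentioned in this post (0, 10, 11 and 12), takes the uninitialized stack value to be [1] as observed, and treats the unwired S16 as [0]. All of these are assumptions made for the sketch, and the function names are hypothetical.

```python
USE_STACK, PUT, AND, OR = 0, 10, 11, 12


def evaluate_rung(ops, inputs, initial_stack=1):
    """Apply each operation to its input in order and return the final stack value."""
    stack = initial_stack
    for op, value in zip(ops, inputs):
        if op == USE_STACK:
            continue              # ignore the input, keep the stack as it is
        elif op == PUT:
            stack = value         # initialize the stack from the input
        elif op == AND:
            stack = stack & value
        elif op == OR:
            stack = stack | value
        else:
            raise ValueError(f"operation {op} is not modelled in this sketch")
    return stack


def stack_is_initialized(ops):
    """True if the first operation that touches the stack is a PUT or an AND."""
    for op in ops:
        if op != USE_STACK:
            return op in (PUT, AND)
    return True


inputs = [0, 0, 0, 0, 0, 1]           # S12..S17, with unwired S16 taken as 0
as_found = [0, 12, 12, 12, 12, 11]    # S2..S7 as configured
corrected = [10, 12, 12, 12, 12, 11]  # S2 changed from 0 to 10 (PUT)

print(evaluate_rung(as_found, inputs), stack_is_initialized(as_found))
print(evaluate_rung(corrected, inputs), stack_is_initialized(corrected))
```

With the configuration as found, the first operation that touches the stack is an OR, so the output is stuck at [1]. With the PUT in place, the output follows the intended logic.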

Enjoy these examples from real systems.  DBDOC Version 10.7 and beyond will detect this uninitialized stack problem if you happen to have it lurking in your system.

Some details for those not familiar with DBDOC:
  • Blue lines show a signal that carries the quality attribute.
  • A white box shows value [0] whereas a blue box shows value [1].
  • DBDOC presents the rung logic as a little ladder diagram.
  • The little "top hat" symbol means a change from [0] to [1], triggering a [1] for one cycle.
  • Specs can be moved in DBDOC to get them out of the way.
  • The warning triangle shape is an Error Marker, informing you of a DBDOC message.
  • The yellow box is a data tip telling you about the Error Marker.
These examples come from power plants in Delaware, Pennsylvania, Michigan and Wales. Where the logic was not used, it is fair to ask whether that was because it could not be commissioned.