Friday, December 20, 2013

Monitoring Exception Report Performance

Are Dated Design Compromises Affecting Your HMI and History Data Precision?

In many INFI 90 systems, there is a significant loss of continuity between the system implementation and its current use and performance.  The experts who made the original decisions are long gone, leaving no documentation or justification for many of the decisions that were made.

How can you find out whether the performance tuning decisions made for your system are still valid?  Why might the way things were done years ago no longer be valid?

Systems have gone from NET 90 to INFI 90 (Plant Loop to INFI-NET).  Additionally, the PCU now has much higher communication performance available in the form of the NIS21/NPM22 pair.  The old hardware could generate only 400 exception reports (XRs) per second in a PCU, whereas the new hardware can do many times that number.

What are the ramifications for INFI 90 systems?  Can limits decided on long ago be relaxed?  Are the decisions made long ago still valid now that you have spent all that money on new communication modules?

Especially if you have PCUs with new NIS21/NPM22 communication modules, you very likely have the ability to generate much more fine-grained data for your HMI / consoles and for your history system, essentially for free.  The capability is there.  Why not use it?  If your system has older hardware, it may have been set up so conservatively that there is actually a lot of spare capacity available to you, too.

Here is my humble sketch of how you would use spare blocks (hopefully) and a lightly loaded module to monitor the performance of your system from the perspective of timeliness or delay of exception reports.

1. In a Module to be Monitored, Create An Exception Report Generated Once Per Second

The first outrageous suggestion is that you generate an exception report from the seconds value in any module you wish to monitor.  The sketch shows this.

[Sketch: exception report generated from the seconds value (block 22) into an AO/L block in Loop 1, PCU 10, Module 4]

To be able to stop the exception reports, you would simply put a transfer block and an ON/OFF block between block 22 and S1 of the AO/L block.  This would allow you to turn on the exception reports only when you wished to check.  However, it is very difficult to imagine that adding one exception report per second per module will really make a significant difference.

You can see that, since a 1-second change meets the default significant change of 1% on a span of 100, an exception report will be generated every second.  The objective of this logic is to put the added load to good use.  You will have the seconds value in Loop 1, PCU 10, Module 4 available as an exception report.
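If you want to convince yourself of the arithmetic, here is a little Python simulation — not INFI 90 function code, just an illustration with made-up names — of significant-change exception reporting applied to a seconds value:

```python
# Hypothetical simulation (plain Python, not INFI 90 function code) of
# significant-change exception reporting.  With the default 1% significant
# change on a 0-100 span, any change of 1.0 or more triggers a report, so
# a seconds value stepping 0, 1, 2, ... 59 reports every second.

def exception_reports(values, span=100.0, sig_change_pct=1.0):
    """Yield (index, value) whenever the value has moved by at least
    sig_change_pct percent of span since the last report."""
    threshold = span * sig_change_pct / 100.0
    last = None
    for i, v in enumerate(values):
        if last is None or abs(v - last) >= threshold:
            last = v
            yield i, v

seconds = list(range(60))                 # the module's seconds value, one sample per second
reports = list(exception_reports(seconds))
print(len(reports))                       # 60: one exception report per second
```

Every sample makes it out, which is the whole point: a known, steady one-report-per-second signal you can check against a reference.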

2. Create Your Time Standard in a Lightly Loaded, Safely Manipulable Module

The rest of the implementation uses a lightly loaded module that you can manipulate safely.  Since the objective of this work is to increase the precision of a lot of data in the system, you have to decide if the effort is worthwhile, of course.

In most systems, the time is synchronized around the loop.  Do you know for sure that it works?  If it does not, you will find out when you go further.  The second bit of logic, in the lightly loaded module (Loop 1, PCU 5, Module 4 in the example), simply puts an Output Reference (OREF) on the seconds value in that module.

[Sketch: Output Reference "10504 SECONDS" on the seconds value in Loop 1, PCU 5, Module 4]

You can see that the value on the OREF "10504 SECONDS" ought to be pretty close to any other SECONDS value in the whole system.

3. Create an Exception Report Import and Comparison Logic

All that is needed now is to import the seconds value from the module being monitored and compare it to the reference value in our lightly loaded module.  Five seconds for an alarm is probably horribly high, but normal monitoring techniques will tell you easily enough how tight you can make the test.  If you define the tag "11004 SECONDS LAG" (or "11004_SECONDS_LAG"), you can get an alarm whenever the exception report is delayed (or simply log it).

[Sketch: imported seconds compared against the reference, driving the tag "11004 SECONDS LAG"]

Of course, your imagination is the limit.  You can send the difference to an AO/L block and monitor it with your historian.  You can filter, average, and remove jitter if that is needed.
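To make the comparison concrete, here is a hedged Python sketch of the lag calculation — the function names and the 5-second limit are purely illustrative.  One wrinkle worth noting: seconds roll over at 60, so the difference has to be taken modulo 60.

```python
# Hypothetical Python sketch of the comparison logic -- not INFI 90
# function code.  The imported seconds value arrives by exception report,
# so it lags the local reference whenever reports are delayed.  Seconds
# wrap at 60, so the difference is taken modulo 60.

ALARM_LIMIT = 5  # seconds; probably horribly high -- tighten it once you see real data

def seconds_lag(reference_seconds, imported_seconds):
    """Delay, in whole seconds, of the imported value behind the
    local reference, accounting for rollover at 60."""
    return (reference_seconds - imported_seconds) % 60

def lag_alarm(reference_seconds, imported_seconds, limit=ALARM_LIMIT):
    """True when the exception report is delayed beyond the alarm limit."""
    return seconds_lag(reference_seconds, imported_seconds) > limit

print(seconds_lag(12, 10))   # 2-second lag
print(lag_alarm(12, 10))     # False: within the limit
print(lag_alarm(3, 55))      # True: 8-second lag across the rollover
```

The modulo keeps the test honest near the top of the minute, where a naive subtraction would report a huge negative "lag".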

Note how easily you could monitor, in this way, all the modules that are purported to be heavily loaded.  The only significant work is in the lightly loaded module that you can manipulate safely.

4. Ramifications for Improved HMI and History Data

What happens if you find that there is no delay?  You might also have implemented the technique in the post "Monitoring Node Communication CPU Load" and found that your node CPU usage is minimal.

The most important limitation in existing INFI 90 systems is the significant change specification.  This number specifies either a percent of span (AO/L blocks, Station blocks) or an actual EGU value (FC 177, 222, 223) whose change triggers a new exception report.  Traditionally, 1% is the default value used to keep the load manageable.  Is this still valid if you have no exception report generation delay and idle communication modules?
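To see what the significant change specification costs you, here is an illustrative Python sketch — again not function code, and the ramp and settings are made up — of how a 1% setting quantizes a slow-moving signal compared with a 0.1% setting:

```python
# Hypothetical illustration (plain Python, made-up ramp and settings) of how
# the significant change specification quantizes what the HMI and historian
# ever see.  A slow ramp on a 0-100 span is reported only when it has moved
# by the significant change, so 1% yields far fewer, coarser samples than 0.1%.

def reported_values(values, span=100.0, sig_change_pct=1.0):
    """The values that would actually reach the consoles/historian."""
    threshold = span * sig_change_pct / 100.0
    out, last = [], None
    for v in values:
        if last is None or abs(v - last) >= threshold:
            last = v
            out.append(v)
    return out

ramp = [i * 0.5 for i in range(41)]                  # slow ramp, 0.0 to 20.0
coarse = reported_values(ramp, sig_change_pct=1.0)   # the traditional 1% default
fine = reported_values(ramp, sig_change_pct=0.1)     # a tighter 0.1% setting
print(len(coarse), len(fine))                        # 21 41
```

At 1%, half of this ramp's samples never leave the module; everything between the 1.0-unit steps is simply invisible to the operators and the historian.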

If you have proven that you have excess communication capacity, you can start improving (reducing) the significant change specifications of important or too-coarse tags to get better data to the operators, engineers and executives.  You can be sure that you have plenty of capacity by keeping the CPU load reasonable (but higher than the minimal level found in many modern systems).  And you have a direct test and warning if the generation of exception reports starts being delayed.

5. Negative Ramifications of Current INFI 90 Practice with respect to Significant Change

In case you wonder, or have to justify even thinking about the monitoring suggested here, you will want to look closely at my post "Improved Tag and History Data Precision in INFI 90 Systems".  It will give you "grist for the mill" as you try to make the data originating in your system better.  It will show you that there usually is a lot of room for significant improvement.
