Friday, December 20, 2013

The Art of DBDOC - Live Specs and NVRAM Failure

Summary

NVRAM failure can be a big problem.  A DBDOC Live Loop Annotation fetch of the block specifications will fail if the NVRAM has failed.  This could be useful information to you.

Details

A client reported that DBDOC Hyperview could not fetch the specs of a block.  In fact, the specs could not be fetched in any block in that module.  This is shown in this image.  Live data could be fetched - only the reporting of the specifications was affected.


However, fetching specs worked fine in other modules.  The example is a clone of the problem module, as you can see.


What could cause this problem? 

The next two images show the difference between working and failing module status fetches. 




Guess what?  The problem module shows NVRAM failure status: Fail.  The one that is working says: Good.

From the Client

"I believe a NVRAM failure will prevent looking at any of the configuration."

Subsequent Perspective

"The module will continue to operate with an NVRAM error because a copy of the configuration is held in SRAM and executes from there. The next time the module is reset it will not startup but will fail.

"This module had failed earlier and was hard initialized during an outage and did not show the errors, but has NVRAM failure now."

New INFI 90 "art" has been crafted.  This is the first time we know of that the module status fetch we created has been used to solve a problem.

FC 222 and 223 Can Have Severe Exception Report Problems

Summary

If you have FC 222 and FC 223 blocks, you should check for the following problems:

  1. FC 222 S8 defaulted to 0.0 and FC 223 S9 defaulted to 0.0 which causes continuous exception reports if the block is tagged in any HMI.
  2. Spare FC 222 and FC 223 blocks tagged but not used, which cause continuous exception reports that are completely wasted.
  3. FC 222 and FC 223 blocks with all the significant change specifications set to 1, even when this value is not appropriate.

Details

At a client site, DBDOC was responsible for uncovering a significant loading problem involving FC 222 - Analog In/Channel that has been causing significant loading of the node communication capability. It turns out that FC 223 - Analog Out/Channel has the same problem. Here is an outline of the problem:
  • site has significant number of FC 222 S8 and FC 223 S9 blocks at default value of 0 significant change
  • even worse, many of these have "spare" tags identifying unused blocks brought into the HMI by exception report
  • the result is exception reports generated by the node at once per second, many utterly wasted, and the rest far too sensitive to be valid.
  • examination of other systems shows that some have all default values replaced, although there is clearly no understanding that the value is in engineering units (like FC 177) not in percent
DBDOC tools including the Significant Change Report and the extraction of all specifications made it easy to look for the problems once they were conceptualized.

In the site in question, a power plant installation, the bulk of the FC 222 blocks were done relatively recently, taking advantage of modern INFI 90 hardware.

There are 573 FC 222 blocks with the default significant change of 0. Our tests with DBDOC Watch Window easily proved that each of these blocks, whether imported or in the HMI tag database, generated an exception report every Tmin seconds, that is, every second. A further 92 FC 222 blocks had a non-default significant change because they had been properly configured.

What about these 665 FC 222 and FC 223 blocks?
  • One had no tag, but was imported by an AI/L block, so it is generating an exception report anyhow.
  • 196 were named spare blocks, none used in graphics or PI, so 196 XRs per second wasted.
  • The other 469 blocks were giving an XR every second.
The PCU distribution of the load was:
  • PCU 2 - 640
  • PCU 3 - 16
  • PCU 4 - 8
  • PCU 5 - 1
There actually were 2181 tags defined in PCU 2, which means that every Tmax of 60 seconds, each generated an XR. Thus, the base XR load was:
  • 573 per second from default FC 222 blocks
  • (2181 - 573) / 60 = 27 per second from the other tags
That is, the base XR load on the node is 600 XRs per second. It is a good thing this system has NIS21 / NPM22 communication cards. As it is, this showed that the average communication CPU usage in the node of about 50% was a valid figure.
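The arithmetic above can be sketched in a few lines of Python. This is only an illustration using the figures quoted in the text; the function name `base_xr_load` is made up for the example, and it assumes every defaulted block reports once per Tmin and every other tag reports once per Tmax.

```python
# Figures quoted in the text for PCU 2.
DEFAULTED_FC222 = 573   # FC 222 blocks with significant change 0
TOTAL_TAGS = 2181       # tags defined in PCU 2
TMIN_S = 1              # seconds between XRs for a defaulted block
TMAX_S = 60             # seconds between XRs for a quiet tag


def base_xr_load(defaulted, total_tags, tmin_s=TMIN_S, tmax_s=TMAX_S):
    """Return (fast, slow, total) exception report rates in XRs per second.

    Assumption: defaulted blocks report every Tmin, all other tags
    report only on the Tmax timeout.
    """
    fast = defaulted / tmin_s
    slow = (total_tags - defaulted) / tmax_s
    return fast, slow, fast + slow


fast, slow, total = base_xr_load(DEFAULTED_FC222, TOTAL_TAGS)
print(f"{fast:.0f} + {slow:.1f} = {total:.1f} XRs per second")
# prints "573 + 26.8 = 599.8 XRs per second"
```

Rounded, that is the 600 XRs per second quoted above.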

Why worry about this?

554 of the 640 tags and export blocks had default significant change. If the defaulted FC 222 blocks had 1% significant change, the base load would nominally be 2181 / 60 sec or 36 XRs / sec. Allocating 600 XRs per second nominally would allow the analog tags to be increased in sensitivity by a factor of 20, with less load on the system than there is now.

This investigation also brought our attention to FC 223, the Analog Out/Channel block, which also has a default significant change of 0.

Guess what? There are 80 of these in PCU 2. They all have the default 0 significant change, so they are generating 1 exception report per second. 31 of them are spare blocks, so that load is doing nothing.

The bottom line is that, unbeknownst to our client, the exception report loading on the node included:
  • 96 analog in and 31 analog out spares giving 127 utterly wasted XRs per second
  • 469 analog in and 49 analog out tagged blocks giving 518 very, very precise values every second.
  • 26 XRs per second from the other 1536 "low class" tags, most at default significant change.
Thus, the load was about 600 XRs per second. If the spare tags were deleted and the defaulted ones set to 1% significant change, the load would be 518 / 60 per second (typically) or 9.

This node would generate 35 XRs per second as currently tuned. There is lots of XR capability to improve its tags and history data using the capacity that is nearly completely wasted right now.
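Here is a small sketch of the "after cleanup" estimate, again using only numbers from the text. The helper name `tuned_xr_load` is hypothetical, and the model assumes that a block set to 1% significant change typically reports only on the Tmax timeout of 60 seconds.

```python
TMAX_S = 60                  # seconds, from the text

WASTED_SPARES = 96 + 31      # analog in + analog out spares, deleted in the cleanup
TAGGED_IN_USE = 469 + 49     # analog in + analog out blocks that stay tagged
OTHER_TAGS = 1536            # the other "low class" tags


def tuned_xr_load(tagged_blocks, other_tags, tmax_s=TMAX_S):
    """Return (tuned, other) XR rates per second once spares are deleted
    and the defaulted blocks are set to 1% significant change.

    Assumption: every remaining tag reports once per Tmax.
    """
    return tagged_blocks / tmax_s, other_tags / tmax_s


tuned, other = tuned_xr_load(TAGGED_IN_USE, OTHER_TAGS)
print(f"{WASTED_SPARES} wasted XRs per second removed")
print(f"{round(tuned)} + {round(other)} = {round(tuned) + round(other)} XRs per second")
# prints "127 wasted XRs per second removed"
# prints "9 + 26 = 35 XRs per second"
```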

How many other nodes with FC 222 and FC 223 blocks have this bad situation?

The good:
  • Large Australian power plant has 10663. One has default significant change. It is also not tagged.
  • Large Canadian power plant has 1610. None of the 161 that have 0 significant change is tagged.
  • Large American power plant has 1312, none of which is defaulted.
  • Medium size Canadian power plant has 192, all with sig change set to 1.
The bad:
  • Large Australian process plant has 66 with default values. 16 are tagged, which is a meaningful load.
The ugly:
  • These small American power plants with all values defaulted at 0 and over one-third tagged but spare.
  • Small American power plant with 88 of 754 with default 0. 86 of these have tags, so they are a load. 21 are spares, so they are a wasted load.
  • Medium size Canadian process plant has 496, all at default 0. 216 are in PCU 171 and 280 in PCU 172. Every one is generating one XR per second.
  • Medium size American process plant has 559 FC 222, of which 368 are tagged with significant change 0. No spare ones are tagged.
It is clear that the significant change specifications and tags for all FC 222 and FC 223 blocks should be examined. The load can be large, and it can be a waste of a precious resource, even stalling or losing exception reports.

Postscript

The general situation with FC 222 and FC 223 blocks that are not defaulted is to have the significant change specification set to 1, no matter what the span.  This is probably an error, too, suggesting that the work was done under the misconception that the specification is a percentage value, rather than a value in engineering units (EU).  1 in a span of 5 is 20%, whereas 1 in a span of 1000 is 0.1%.  Since both appeared in the same system, with the only value used being 1, it is likely the values should be studied even when none of them is defaulted.
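To make the EU versus percent point concrete, here is a minimal Python sketch of the conversion in both directions. The function names are invented for this example; nothing here comes from DBDOC or from the function code definitions beyond the arithmetic in the paragraph above.

```python
def sig_change_as_percent(sig_change_eu, span_eu):
    """Express an engineering-unit significant change as a percent of span."""
    if span_eu <= 0:
        raise ValueError("span must be positive")
    return 100.0 * sig_change_eu / span_eu


def sig_change_for_percent(percent, span_eu):
    """Return the engineering-unit value that corresponds to a percent of span."""
    if span_eu <= 0:
        raise ValueError("span must be positive")
    return span_eu * percent / 100.0


# The two cases from the text: the same setting of 1 on very different spans.
print(sig_change_as_percent(1, 5))      # 20.0 (percent of span)
print(sig_change_as_percent(1, 1000))   # 0.1  (percent of span)

# What a true 1% setting would be in engineering units on each span.
print(sig_change_for_percent(1, 5))     # 0.05
print(sig_change_for_percent(1, 1000))  # 10.0
```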

Monitoring Exception Report Performance

Are Dated Design Compromises Affecting Your HMI and History Data Precision?

In many INFI 90 systems, there is a significant loss of continuity between the system implementation and its current use and performance.  The experts that made the original decisions are long gone, with no documentation and justification for many decisions that were made.

How can you find out if the system performance tuning decisions made are still valid?  Why is it possible that the way things were done years ago is no longer valid?  

Systems have gone from NET 90 to INFI 90 (plant loop to INFI loop).  Additionally, the PCU now has much higher communication performance available in the form of the NIS21/NPM22 pair.  The old hardware could only generate (in a PCU) 400 exception reports (XRs) per second, whereas the new hardware can do many times that number.

What are the ramifications for INFI 90 systems?  Can limitations decided on long ago be backed off? Are the decisions made long ago still valid now after you spent all that money on new communication modules?

Especially if you have PCUs with new NIS21/NPM22 communication modules, you are very likely to have the ability to generate much more fine-grained data for your HMI / consoles and for your history system, essentially for free.  The capability is there.  Why not use it?  If your system has older hardware, it may have been set up so conservatively that there is actually lots of capacity available you can avail yourself of, too.

Here is my humble sketch of how you would use spare blocks (hopefully) and a lightly loaded module to monitor the performance of your system from the perspective of timeliness or delay of exception reports.

1. In a Module to be Monitored, Create An Exception Report Generated Once Per Second

The first outrageous suggestion is that you generate an exception report from the seconds value in any module you wish to monitor.  The sketch shows this.






To be able to stop the exception reports, you would simply put a transfer block and an ON/OFF block between block 22 and S1 of the AO/L block.  This would allow you to turn on the exception reports only when you wished to check.  However, it is very difficult to imagine that adding one exception report per second per module will really make a significant difference.

You can see that, as 1 second is more than 1% of 100, an exception report will be generated every second. The objective of this logic is to use the added load for good benefit.  You will have the seconds value in Loop 1, PCU 10, Module 4 available as an exception report.

2. Create Your Time Standard in Lightly Loaded Safely Manipulable Module

The rest of the implementation uses a lightly loaded module that you can manipulate safely.  Since the objective of this work is to increase the precision of a lot of data in the system, you have to decide if the effort is worthwhile, of course.

In most systems, the time is synchronized around the loop.  Do you know it works for sure?  If it does not, you will find out when you go further.  The second bit of logic, in the lightly loaded module (Loop 1, PCU 5, Module 4 in the example), simply puts an Output Reference on the seconds value in that module.





You can see that the value on the OREF "10504 SECONDS" ought to be pretty close to any other SECONDS value in the whole system.

3. Create an Exception Report Import and Comparison Logic

All that is needed now is to import the seconds value from the module being monitored and compare it to the reference value in our lightly loaded module.  Probably 5 seconds for an alarm is horribly high, but normal monitoring techniques will tell you easily enough how tight you can get the test.  If you define the tag "11004 SECONDS LAG" (or "11004_SECONDS_LAG"), you can get an alarm whenever the exception report is delayed (or simply log it).
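The comparison itself is simple arithmetic, sketched below in Python rather than in function blocks. It is an illustration only: the function names are made up, and it assumes the seconds value counts 0 to 59 and wraps, so the difference has to be taken modulo 60. The 5 second limit is the example alarm value from the paragraph above.

```python
def seconds_lag(reference_s, imported_s):
    """Signed lag in seconds between the reference seconds value and the
    imported one, folded into the range -30..29 so that a wrap from
    59 to 0 does not look like a huge jump.

    Assumption: both values are seconds-of-minute (0..59).
    """
    return ((reference_s - imported_s + 30) % 60) - 30


def lag_alarm(reference_s, imported_s, limit_s=5):
    """True when the imported value trails the reference by limit_s or more."""
    return seconds_lag(reference_s, imported_s) >= limit_s


print(seconds_lag(2, 58))   # 4: reference wrapped past the minute, import is 4 s behind
print(lag_alarm(10, 4))     # True: 6 seconds behind
print(lag_alarm(10, 8))     # False: 2 seconds behind
```

A negative lag would mean the imported value is ahead of the reference, which points at time synchronization rather than at exception report delay.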




Of course, your imagination is the limit.  You can send the difference to an AO/L block and monitor that using your historian.  You can filter and average and remove jitter if that is needed.  The sky's the limit!

Note how easily you could monitor all the modules this way that are purported to be heavily loaded.  The only significant work involves the lightly loaded module that you can manipulate safely.

4. Ramifications for Improved HMI and History Data

What happens if you find that there is no delay?  You also might have implemented the technique in the post "Monitoring Node Communication CPU Load" and found that your node CPU usage is very minimal.

The most important limitation in existing INFI 90 systems is the Significant Change specification.  This number is either a percent of span (AO/L blocks, Station blocks) or an actual EGU value (FC 177, 222, 223), and a change of that size triggers a new exception report.  Traditionally, 1% is the default value used to try to keep the load managed.  Is this still valid, if you have no exception report generation delay and idle communication modules?

If you have proven that you have excess communication capacity, you can start improving (reducing) the significant change specifications of important or too coarse tags to get better data to the operators, engineers and executives.  You can be sure that you have lots of capacity by keeping the CPU load reasonable (but higher than the minimal level found in many modern systems).  You have a direct test and warning if you are getting delays in the generation of exception reports.

5. Negative Ramifications of Current INFI 90 Practice with respect to Significant Change

In case you wonder, or have to justify even thinking about the monitoring suggested here, you will want to look closely at my post Improved Tag and History Data Precision in INFI 90 Systems.  It will give you "grist for the mill" as you try to make the data originating in your system better.  It will show you that there usually is a lot of room for significant improvement.

Improved Tag and History Data Precision in INFI 90 Systems

INFI 90 is based on the concept of Exception Reports (XRs) that reduce the communication load needed to get good values to HMI systems, historians, OPC servers and to be used in other parts of the system.  Analog values have a "significant change specification" to prevent load from analog values that are not changing significantly.

History

This history is approximate, as told to me by gurus over the past years.  

Net 90 and INFI 90 started with a 1 MBaud plant loop communication ring, plus the impediment that a single XR was sent for each enrolled user.  These two factors made it easy to overload the capacity of the system.

Infi Loop introduced both a 10 MBaud communication ring and a protocol that allowed multiple destinations on the ring for an XR, so the load caused by an exception report no longer changed if more consoles used the tag, for example.

Pragma

Practical considerations were developed to avoid the loss of exception reports in the early systems, and developed further as the systems got both bigger and faster.  Bigger, of course, meant more load, but faster meant more capability.  Initially, the practical suggestion and default was that a change of 1% of span in an analog value was a good compromise.  Sensors were often enough not very accurate and it seemed logical.

The Modern World

In the modern world, there are aspects you should take into account:
  1. INFI 90 communication capability increases have not usually been translated into more precise data.
  2. Sensors have often changed from 2 1/2 digits to 3 1/2 digits, ten times as precise.
  3. Techniques now exist to easily monitor exception report delay and communication loading.
  4. Decisions based on data that is not sensitive enough are made regularly, probably affecting your plant negatively at times.  This applies to both console operations and historical data.

A Case Study

In a plant, examining the DBDOC Significant Change Report, we noted a flow rate with a span of 7000 and default 1% significant change.  Because it was being imported into another module by exception report, we could study the suitability of the significant change setting.

There were 23 exception reports in the period shown, 18 of them at Tmax of 75 seconds.

Here are some salient aspects of the raw and imported values:

  • Raw value is the pink line.
  • Imported value is the blue stepped line.
  • Working range is under 400.
  • Tmax is 75 seconds, and all positive-going exception reports were caused by the timeout.
  • The negative-going exception reports were triggered by the drop of 70 before 75 seconds had elapsed.
The green sections show where the imported exception report value was less than the actual value.  The orange ones show where it was greater.  Interestingly enough (to me, anyhow), I had joked in front of clients about how bad it would be to be integrating exactly this sort of a variable.  You would find the accountants for the customer very happy when you told them you had used the blue step function to decide how much they should pay.  They would pay fast, and not quibble.

Out of curiosity, I estimated the performance using 0.1% significant change.  I did this by simply taking the raw value as a start and making a pseudo-XR when the value increased or decreased by 7.0 or more.  This gave me an estimate of how the imported values would look and what the performance would be.  The number of exception reports would have been about 140 in 1420 seconds, or about 6 per minute.  Only one period of 75 seconds went by when there would not have been a significant change. 
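The estimate described above can be sketched roughly as follows. This is not the code used for the study; it is a minimal Python illustration under stated assumptions: samples are (time in seconds, value) pairs, a pseudo-XR is made when the value moves by the significant change or more since the last report or when Tmax has elapsed, and the error is taken as held reported value minus raw value. The names `pseudo_xrs` and `hold_errors` are invented, and the ramp data is synthetic, not plant data.

```python
def pseudo_xrs(samples, sig_change, tmax_s):
    """Return the (time, value) pairs that would be reported.

    A report is made when the value has moved by sig_change or more since
    the last report, or when tmax_s seconds have passed without one.
    """
    if not samples:
        return []
    reports = [samples[0]]
    last_t, last_v = samples[0]
    for t, v in samples[1:]:
        if abs(v - last_v) >= sig_change or t - last_t >= tmax_s:
            reports.append((t, v))
            last_t, last_v = t, v
    return reports


def hold_errors(samples, reports):
    """Return (mean error, mean error magnitude) of the stepped, held
    reported value against the raw samples."""
    errors = []
    idx = 0
    for t, v in samples:
        # Advance to the most recent report at or before this sample.
        while idx + 1 < len(reports) and reports[idx + 1][0] <= t:
            idx += 1
        errors.append(reports[idx][1] - v)
    n = len(errors)
    return sum(errors) / n, sum(abs(e) for e in errors) / n


# Synthetic ramp rising 2 units per second, checked with 7.0 significant change.
ramp = [(t, 100.0 + 2.0 * t) for t in range(20)]
reports = pseudo_xrs(ramp, sig_change=7.0, tmax_s=75)
print(len(reports))                # 5
print(hold_errors(ramp, reports))  # (-3.0, 3.0)
```

On a steadily rising value the held value always trails the raw one, which is why the mean error comes out negative here.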

What does this look like from a numerical perspective?

Significant Change       1%      0.1%

Mean error            -8.1      -1.3
% of full span       -0.12%    -0.02%
% of working range   -2.02%    -0.34%

Mean error magnitude  29.0       3.5
% of full span        0.41%     0.05%
% of working range    7.24%     0.87%

The Bottom Line

By using the data available through DBDOC Watch Window (or Composer Trending, or any other block value monitoring package), you can see that it is perfectly possible for an exception reported value to be handled in a way that could be problematical.  The data comes from an exception report import block, but applies to every value on every graphic, and to every historical value in every INFI 90 system.

At a small cost in increased exception reports, using capacity that can be verified as being available safely, process values can be much more precise.  The full analysis that is done here is not necessary.  What you need to do is simply:
  • Identify imported values, tags and historical data that need more precision
  • Monitor the node communication loading
  • Where there is capacity, use it by making the significant change specification tighter
The improvements in console, history and perhaps control and shutdowns will be significant.  You probably have the capacity to do this right now.

DBDOC and Spare Blocks

Sometimes, you have to find a spare block and the spare blocks / boneyard sheets have not been maintained.   DBDOC can make this task systematic and as easy as possible. We plan to add a feature one day to directly identify unused blocks, which will be even nicer.  Keep those cards and letters coming, folks!

Start with the Table of Contents - Miscellaneous Indices chapter.  Click on Function Codes.  In the Index to all Function Codes, click on the Function Code 30 entry to get to the list of all FC 30 blocks in your system.


You are now at a list of all the FC 30 blocks in the system.  Scroll down the list until you get to the module you want, in this case, Loop 11, PCU 10, Module 2 (Module 11,10,02 in DBDOC terms) and click on the highlighted entry to call up that block.  You will see the block in the CLD or CAD sheet, usually with a line showing it is (probably) used.  If you turn on "attributes" with the "A" key, you will see if the block has a tag. You should, at least for a start, assume that it is used if it has a tag or if it has an output reference.

The example shows that the first block is tagged "1-TI-2241" and that it has a line carrying its value someplace to boot.  Probably used, eh?


Now you simply walk through the FC 30 blocks with the "L" key ("shift L" to go up).  The same functionality is available from the corresponding icons.

When you get to one that has no line and no tag, you have found a candidate spare block.


Clicking on the block number (1501) brings up the index of all the places in the world where this block is purported to be used, hopefully only its source right here.  This one certainly seems to be unused.


It makes sense to be satisfied that you have a good enough DBDOC build that you can count on the fact the block is not used.  For example, it is possible that it is imported by some other project, and you did not build all the projects together in your DBDOC build.  If that could be the case, you should do further analysis.

However, no other tools can get you this far this fast.  We are always ready to help you be sure you have a good DBDOC build that resolves everything.  You have a candidate spare block and it really did not take very long.

What do you do if you walk right through Module 11,10,02 without finding a spare?  You can go back through the blocks and click on the output block number, looking for blocks that are not used in any graphics.  Although these might be in your history system, you can go after that usage and perhaps find that the block can be re-purposed with no problem.


When you are walking through the blocks checking for how they are used, the key sequence is easy enough to do semi-automatically:
  • click on output block number
  • back-space to get back to the function code index
  • type "L" (or "l") to get to the next linked block and repeat
There are, of course, other ways to do this.  It is very nice when all spare blocks get put into boneyard pages.  However, that is not always the case.  When you have to find one and see how it is used or not, this DBDOC technique will certainly help you.




Thursday, December 19, 2013

Monitoring Node Communication CPU Load

INFI 90 systems are characterized by PCUs that are Nodes on a Ring.  Traditional systems include a NIS/NPM pair that handles the communication between the Modules on the PCU/Node and the rest of the system.  The performance of this interface can be monitored trivially using WatchWindow in DBDOC Hyperview.

Note that ABB Technical Bulletin TB1999054A gives details on the various performance statistics available from a node.  It also specifically mentions that this capability was introduced in INNPM01 firmware revision C.1, so it is not available in older versions.  It being at least a decade and a half since that firmware revision was introduced, let's hope nobody will be disappointed by not being able to do this.

Using DBDOC Watch Window to Monitor Communication CPU Usage

To see what percent of the communication module's processing power is being used, you simply monitor Loop L, PCU P, Module 0, Block 11.  The example shows Loop 1 PCU 4 being monitored and the monitoring of Loop 1 PCU 6 being defined.


Notes about this monitoring:

  1. This fetch is never done in turbo mode, so you only have 10 to 15 values per second available, and these are shared with all your other Hyperview users.  
  2. Running for one or two "Tmax" periods for exception report flushing should give you a good picture.
  3. Turn off the Watch Window data collection by clicking on the green clock icon when you have a picture of the loading.
  4. If you do a save of a module in the PCU while you are monitoring, you will get an idea of the maximum communication loading possible in the node.


Interpretation of the Data

Experimentation has shown that CPU loading above 90% shows a situation where exception reports (XRs) can be delayed or lost.  This is the bad side.

What about if you find the loading very low - under 10%? This is the good side. We believe this means you have a lot more exception report generation capability than you are using. Your system is capable of significantly better performance with respect to tighter values for the HMI and for the history system. Pushing the communication usage out of the idling range will potentially be very beneficial.
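If you collect the loading values yourself, a rough screen against the two thresholds mentioned above might look like the Python sketch below. It assumes you already have the CPU usage values as a plain list of percentages (the layout of the Watch Window .CSV file is not described here, so no file parsing is attempted). The function name and the choice of peak for the 90% test and mean for the 10% test are my assumptions for the example.

```python
def summarize_cpu_load(samples, high=90.0, low=10.0):
    """Summarize communication CPU usage samples (percent).

    Assumptions: the peak is compared against the 90% level at which
    XRs can be delayed or lost, and the mean against the 10% idling level.
    """
    if not samples:
        raise ValueError("no samples")
    peak = max(samples)
    mean = sum(samples) / len(samples)
    if peak > high:
        verdict = "XRs may be delayed or lost"
    elif mean < low:
        verdict = "idling: spare XR capacity"
    else:
        verdict = "moderate"
    return {"mean": mean, "peak": peak, "verdict": verdict}


print(summarize_cpu_load([5.0, 6.0, 7.0])["verdict"])   # idling: spare XR capacity
print(summarize_cpu_load([50.0, 95.0])["verdict"])      # XRs may be delayed or lost
```

Remember that a module save during the monitoring period will push the peak up, so exclude that stretch if you only want the normal loading.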

You might refer to Improved Tag and History Data Precision in INFI 90 Systems for more on what you can do to understand what you might be suffering from and how to get more out of your system.


Results

We would be happy to get a copy of the Watch Window data, which is in a .CSV file that is created by the monitoring.


Disclaimer

The monitoring indicated has been done on dozens of DBDOC systems with no known problems.  It is the same monitoring as done by Process Portal B and gets the same result (of course).  Monitoring only one block makes the load as minimal as possible.  However, if you have any question or hesitation, your consultation with your ABB support specialist should give you more insight.  We are very willing to work with your experts.

Wednesday, December 18, 2013

Function Code Quantity

A client wrote:
     "Can DBDOC tell me how many Function Code 156s we have in our project?"

The answer is:
     For sure!  Let us count the ways. 


1.  In DBDOC Hyperview, under Miscellaneous Indices, click Function Codes and then Function Code 156. This will give you a list of all of the blocks in your system that are function code 156. The blocks are numbered, so you can just scroll to the end of the list for a count.


2.  FC156.dbf in the Exports subfolder has all the FC156 blocks and all the specs, even the compiled input block numbers (and FCnnn.dbf has the same thing for the FC nnn blocks).


3.  File FCLIST.TXT in the build folder lists all the function codes used in the project and how many of each that there are.


4.  The MHD Module Info section of a module will tell you how many are in that module.  This makes it possible to estimate module CPU loading.


5.  MASTER.DB is an SQLite database that has all the Function Code information and a lot more. SQLite tools will allow you to open the file, examine the records and do queries.
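As a starting point for the SQLite route, the Python sketch below opens a database read-only and lists its tables with their definitions, using only the standard `sqlite3` module. The table and column names inside MASTER.DB are not described in this post, so the sketch does not guess at them; once you can see the schema, you can write the counting query that fits. The file location used in the example is an assumption.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return [(table name, CREATE statement), ...] for an SQLite file.

    The file is opened read-only so the DBDOC build output is not touched.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name, sql FROM sqlite_master "
            "WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [(name, sql) for name, sql in rows]
    finally:
        conn.close()


# Assumed location: MASTER.DB in the current folder. Adjust to your build folder.
if Path("MASTER.DB").exists():
    for name, sql in list_tables("MASTER.DB"):
        print(name)
        print("   ", sql)
```

From there, a `SELECT COUNT(*)` against whichever table holds the block records, filtered on the function code, would give the same number as the other methods.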


The bottom line is that DBDOC has a great deal of capability built in that most of our users never find, so they do not tap the resource they have available.  We have tried to do things that are not possible easily with existing tools, if they are possible at all.

Happily, one more site is now turned on to the capability.  

Over the last half year, I have presented Advanced DBDOC workshops and Hyperview training sessions totalling 23 days at 13 sites.  Without exception, the DCS teams found a mess of useful things in DBDOC that they had not guessed were provided.