5 Replies Latest reply on Mar 19, 2009 4:36 AM by MikeReid

    Best Practice for monitoring PI using IT monitoring packages?

    MikeReid

      Hi Everyone,

      Does anyone have any recommendations for what surveillance and corrective actions can be passed to an organisation's IT Server Support Team?

      We have a 24x7 Server Support Team that uses BMC Patrol to monitor servers, and I am investigating whether we can get this team to provide some level of monitoring of PI as well. (We only have PI Support available during office hours.) This team has no PI knowledge, so I need to keep things as simple as possible.

      My initial thinking is that we will get this team to monitor the list of PI services I give them and notify us if anything stops; however, I would like to be able to give them more detailed instructions for actions they could take when a problem occurs outwith office hours. Obviously this is a bit more complicated, because while some services could just be restarted if they have stopped, others are interconnected and may require more complex actions.

      • All actions are manually initiated, i.e. we need to supply a help file that says what is to be done for each trigger condition detected.
      • I am thinking of just classifying the services into two groups: those that can be restarted, and those that require a reboot (as the simplest way to do a full PI restart); but if anyone has more detailed recommendations, I would be interested.
      • The Server Support Team can monitor Performance Counters too. I am thinking about looking at I/O rates, queue sizes, etc., but any other suggestions would be appreciated.
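      One way to turn the two-group idea into a help file is a lookup from each monitored service's state to a single manual action. This is a hypothetical Python sketch only: the interface service names are invented, the PI subsystem names are examples to be checked against your own server's service list, and the "one remedy, then escalate" rule is an assumption rather than an OSIsoft recommendation. The shutdown subsystem is treated as expected-stopped, following Andreas's remark in the first reply.

```python
from enum import Enum


class Action(Enum):
    NONE = "no action"
    RESTART_SERVICE = "restart the service"
    REBOOT_SERVER = "reboot the PI server"
    CALL_PI_SUPPORT = "log it and call PI support"


# Hypothetical service lists: replace with the list the PI team supplies.
# Interface services (invented names) are treated as safe to restart on their own.
RESTARTABLE = {"piinterface_dcs1", "piinterface_dcs2"}
# Core subsystems are interconnected, so a reboot is the simplest full PI restart.
REBOOT_REQUIRED = {"pinetmgr", "pibasess", "pisnapss", "piarchss"}
# Assumption (per the thread): the shutdown subsystem is normally not running.
EXPECTED_STOPPED = {"pishutev"}


def decide(service: str, running: bool, remedies_tried: int) -> Action:
    """Map one monitored service's state to the manual action in the help file."""
    if running or service in EXPECTED_STOPPED:
        return Action.NONE
    if remedies_tried >= 1:
        # Assumption: one restart or reboot, then escalate (e.g. corrupted files).
        return Action.CALL_PI_SUPPORT
    if service in RESTARTABLE:
        return Action.RESTART_SERVICE
    if service in REBOOT_REQUIRED:
        return Action.REBOOT_SERVER
    # Anything not on the supplied list is outside the support team's remit.
    return Action.CALL_PI_SUPPORT


if __name__ == "__main__":
    # Example states as a monitoring tool might report them (made-up data).
    states = {"pinetmgr": True, "piinterface_dcs1": False, "pishutev": False}
    for name, is_running in states.items():
        print(f"{name}: {decide(name, is_running, remedies_tried=0).value}")
```

      Keeping the decision to one lookup per trigger condition means the help file can be generated from the same three lists, so the team never has to reason about how the services depend on each other.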
        • Re: Best Practice for monitoring PI using IT monitoring packages?
          andreas

          Hi Mike,

          This is a tough question. I prefer that people troubleshooting PI have some sort of PI training (you know, as someone who is infected with the PI 'virus', you always want to make sure your software gets the best treatment affordable).

          Anyhow, I see the requirement here, but there is not much you can do besides restarting a service that is not running (and making sure the guys don't complain that the shutdown subsystem is not running).

          What the Server Support Team clearly can do is monitor the system and make sure that they call somebody if there is more to fix than starting a service. In some cases, starting the service or even rebooting the system will not help you at all (corrupted files, etc.).

          As guidance for the performance counters, I suggest using the template provided with SMT. Set up the PI Performance Monitor Interface on the PI Server (use the ICU), then use the PI SMT 3 IT Points > Performance Monitor Points plug-in to create the tags and record the data to PI. (If you use SMT 3, you can load a template for a PI Server that shows you all the tags we consider important for general monitoring.) Monitor your system so that you have a reasonable baseline for the values. This would give your support team a basic idea of the health of the PI system.
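          Once the Performance Monitor Interface has been recording for a while, the "reasonable baseline" can be reduced to a simple rule the Server Support Team could apply to each counter. This minimal Python sketch assumes a mean-and-standard-deviation baseline with a 3-sigma band; neither the statistic nor the threshold comes from SMT or its template, and counters with a strong daily cycle would need a time-of-day baseline instead.

```python
from __future__ import annotations

import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """Normal behaviour of one performance counter over a quiet period."""
    mean: float
    stdev: float

    @classmethod
    def from_history(cls, samples: list[float]) -> Baseline:
        if len(samples) < 2:
            raise ValueError("need at least two samples to build a baseline")
        return cls(statistics.fmean(samples), statistics.stdev(samples))

    def is_abnormal(self, value: float, k: float = 3.0) -> bool:
        """True if `value` lies more than k standard deviations from the mean."""
        if self.stdev == 0:
            # A perfectly flat baseline: any change at all is worth a look.
            return value != self.mean
        return abs(value - self.mean) > k * self.stdev


if __name__ == "__main__":
    # Made-up samples of one counter recorded during normal operation.
    baseline = Baseline.from_history([100.0, 102.0, 98.0, 101.0, 99.0])
    for reading in (104.0, 110.0):
        print(reading, "abnormal" if baseline.is_abnormal(reading) else "normal")
```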

           

          You might want to take a look at the white papers in the Download Center to read about performance monitoring and increasing the reliability of the PI System:
          http://techsupport.osisoft.com/techsupport/NonTemplates/Download%20Center/DownloadCenter.aspx?download_content=White+Papers

          The training material is also available:
          http://techsupport.osisoft.com/Techsupport/NonTemplates/Download%20Center/DownloadCenter.aspx?download_file=697FA2CE-2CC7-4362-AEA5-4C50799FA656
          and can provide you with some material to train your support team.

           

          Regards,

            • Re: Best Practice for monitoring PI using IT monitoring packages?

              Andreas

              I prefer that people troubleshooting PI have some sort of PI training (you know, as someone who is infected with the PI 'virus', you always want to make sure your software gets the best treatment affordable).

              I couldn't agree with this more, especially if the PI system is critical to operations.

              Maybe all the IT people could join vCampus and take advantage of the PI System Manager I online CBT in the Training Center. That would give them enough knowledge of PI to understand what they are stopping and starting.

                • Re: Best Practice for monitoring PI using IT monitoring packages?
                  MikeReid

                  Thanks for the input, guys.

                  I am under no illusions about the use of untrained IT support in this context, hence my plan to limit them to restarting services and rebooting outwith office hours. This is pretty much just in the hope that it may work, because we don't have 24x7 PI support, and I ain't volunteering...

                  I already have the PerfMon stuff from DevNet and various other things available in ProcessBook screens.

                  In terms of surveillance of performance counters, I am thinking about PI queue sizes and I/O rates, on the basis that my most likely failure is the loss or disconnection of data sources. Most of the failure modes I've seen recently have been around buffering. Do I just have a weak spot there, or are there other specific failure modes that are common?
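                  For buffering failures, a rising trend is usually more telling than any single value. A minimal Python sketch, assuming you already collect periodic samples of a buffer queue-size counter and a data-source I/O-rate counter (the counter choice, the three-interval window and the alert wording are all assumptions), could flag sustained queue growth and a stalled I/O rate:

```python
from __future__ import annotations

from collections.abc import Sequence


def queue_is_backing_up(samples: Sequence[float], intervals: int = 3) -> bool:
    """True if the queue grew in each of the last `intervals` sampling intervals."""
    if len(samples) < intervals + 1:
        return False  # not enough history to call it a trend
    recent = samples[-(intervals + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


def io_rate_stalled(samples: Sequence[float], intervals: int = 3) -> bool:
    """True if the I/O rate was zero for each of the last `intervals` samples."""
    if len(samples) < intervals:
        return False
    return all(value == 0 for value in samples[-intervals:])


def buffering_alerts(queue: Sequence[float], io_rate: Sequence[float]) -> list[str]:
    """Collect the conditions the Server Support Team would phone through."""
    alerts = []
    if queue_is_backing_up(queue):
        alerts.append("buffer queue growing: data may not be reaching the PI server")
    if io_rate_stalled(io_rate):
        alerts.append("I/O rate zero: data source may be lost or disconnected")
    return alerts


if __name__ == "__main__":
    # Made-up samples: queue climbing while the I/O rate has dropped to zero.
    for alert in buffering_alerts([0, 5, 20, 60], [12.0, 0, 0, 0]):
        print(alert)
```

                  Both checks deliberately ignore short blips: a single spike in the queue, or one zero I/O reading, does not trip them.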

                   

                   

                   

                  The next step I am thinking about is how the PI team monitors for flat-line data.

                  • Right now we have a page of trends (noisy tags), so that it's easy to see what is and isn't updating.
                  • I am thinking about adding PEs that look at the number of archive values in the last few hours, so that I can generate alarm states for animating a network diagram.
                  • Has anybody tried this, or got other ideas?
                  • (I need to look at tags for this because each data source is a DCS and has multiple data sources internally.)
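                  The archive-count idea can be sketched as two steps: count events per tag over a look-back window, then roll the counts for all of a DCS's noisy tags into one alarm state for the network diagram. In PI the count itself would come from the PEs proposed above; this Python only illustrates the logic on timestamps already extracted, and the three-state scheme (OK, degraded, flat) and the `min_events` threshold are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable
from datetime import datetime, timedelta
from enum import Enum


class State(Enum):
    OK = "ok"              # every monitored tag is updating
    DEGRADED = "degraded"  # some tags have flat-lined
    FLAT = "flat"          # nothing is updating: treat the DCS link as down


def event_count(timestamps: Iterable[datetime], now: datetime, window: timedelta) -> int:
    """Number of archive events in [now - window, now]."""
    start = now - window
    return sum(1 for t in timestamps if start <= t <= now)


def source_state(counts_by_tag: dict[str, int], min_events: int = 1) -> State:
    """Roll per-tag event counts for one DCS into a single diagram state."""
    if not counts_by_tag:
        raise ValueError("each data source needs at least one monitored tag")
    flat = sum(1 for count in counts_by_tag.values() if count < min_events)
    if flat == 0:
        return State.OK
    if flat == len(counts_by_tag):
        return State.FLAT
    return State.DEGRADED


if __name__ == "__main__":
    now = datetime(2009, 3, 19, 4, 0)
    window = timedelta(hours=4)
    # Made-up archive timestamps for two noisy tags on one DCS.
    archive = {
        "DCS1.noisy_a": [now - timedelta(minutes=m) for m in (5, 65, 125)],
        "DCS1.noisy_b": [now - timedelta(hours=6)],  # last update outside window
    }
    counts = {tag: event_count(ts, now, window) for tag, ts in archive.items()}
    print(counts, source_state(counts).value)
```

                  Watching several tags per DCS is what separates "one internal data source has stopped" (degraded) from "the whole DCS connection is down" (flat).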
                    • Re: Best Practice for monitoring PI using IT monitoring packages?

                      Mike,

                      Is AF an option for you?

                      What I did recently was create a monitoring system using PI-AF, PI-OLEDB and Perfmon. PI-AF was installed with PI-Notifications (linked through to Active Directory) for the alerting side of the monitoring. Notifications were based on behaviour changes in a whole host of perfmon tags, NetworkManager in particular.

                      For data sources, I took two approaches:
                      1) PointSource-based quality (Bad tags) and updates (Stale tags). For each point source, create tags to store the Bad and Stale quality percentages, then pull these into AF to be monitored, with notifications sent accordingly. What to look for here is behaviour changes, e.g. the quality dipping by a certain percentage. What we initially noticed after implementing this is that some systems' update rates are lower than others, so there is no point saying all data sources have to have >= 90% updates within the last 5 minutes. You also get to see some nice trends of how one DCS connection failure has a knock-on effect on another point source (performance equations!).

                      2) Overall system quality and updates. Similar to 1, but when you add in multiple servers across an enterprise, you get a nice high-level overview as a starting point.
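                      The arithmetic behind both approaches can be sketched in Python, assuming the snapshot rows have already been pulled out (for example via PI-OLEDB) into a list of hypothetical `Snapshot` records. Here "Bad" means the snapshot quality is not good and "Stale" means the snapshot is older than a chosen window; both definitions, and the 10-point rise used to detect a behaviour change, are assumptions to be tuned per point source, in line with the observation that one fixed threshold does not suit every data source.

```python
from __future__ import annotations

import statistics
from collections import defaultdict
from collections.abc import Callable, Iterable
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Snapshot:
    """One tag's current value metadata (hypothetical shape of exported rows)."""
    tag: str
    pointsource: str
    timestamp: datetime
    good: bool


def _percentages(snaps: Iterable[Snapshot], now: datetime, stale_after: timedelta,
                 key: Callable[[Snapshot], str]) -> dict[str, tuple[float, float]]:
    """(Bad %, Stale %) for each group produced by `key`."""
    total: dict[str, int] = defaultdict(int)
    bad: dict[str, int] = defaultdict(int)
    stale: dict[str, int] = defaultdict(int)
    for s in snaps:
        k = key(s)
        total[k] += 1
        if not s.good:
            bad[k] += 1
        if now - s.timestamp > stale_after:
            stale[k] += 1
    return {k: (100.0 * bad[k] / n, 100.0 * stale[k] / n) for k, n in total.items()}


def quality_by_pointsource(snaps: Iterable[Snapshot], now: datetime,
                           stale_after: timedelta) -> dict[str, tuple[float, float]]:
    """Approach 1: (Bad %, Stale %) per point source."""
    return _percentages(snaps, now, stale_after, key=lambda s: s.pointsource)


def overall_quality(snaps: Iterable[Snapshot], now: datetime,
                    stale_after: timedelta) -> tuple[float, float]:
    """Approach 2: (Bad %, Stale %) across every tag passed in."""
    return _percentages(snaps, now, stale_after, key=lambda s: "ALL").get("ALL", (0.0, 0.0))


def behaviour_changed(history: list[float], current: float, max_rise: float = 10.0) -> bool:
    """Flag a rise of more than `max_rise` points over this source's own mean."""
    if not history:
        return False
    return current - statistics.fmean(history) > max_rise


if __name__ == "__main__":
    now = datetime(2009, 3, 19, 4, 0)
    window = timedelta(minutes=10)
    rows = [  # made-up snapshot rows
        Snapshot("t1", "DCS1", now, True),
        Snapshot("t2", "DCS1", now - timedelta(minutes=30), True),
        Snapshot("t3", "DCS1", now, False),
        Snapshot("t4", "DCS2", now, True),
    ]
    print(quality_by_pointsource(rows, now, window))
    print(overall_quality(rows, now, window))
```

                      Comparing each point source against its own history, rather than a global limit, is what lets a slow-updating system sit quietly at a high Stale % without raising alarms.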

                      I used PI-OLEDB to pull out the Stale and Bad tags (there are examples of this in the PI-OLEDB user guide) and wrote the results to tags... very simple but very effective.
                      Don't forget about ISU (the Interface Status Utility) for interface status, too.

                      Is an Enterprise Agreement an option? What I am getting at here is ManagedPI, so you and/or your PI support engineers won't have to be up all night; let OSI do that.

                        • Re: Best Practice for monitoring PI using IT monitoring packages?
                          MikeReid

                          Hi Rhys,

                          Right now my options are limited to what I can do internally with standard PI functionality.

                          The client has been using PI for some time in an enclosed context, but things are changing very quickly now, so an EA and 24x7 support are under consideration.

                          We have one site that plans to use AF/Notifications primarily to monitor a very large and complex data collection network, so the outcome of that will probably help shape the course for the other sites in the longer term.