2 Replies Latest reply on Apr 28, 2017 11:21 AM by 529931

    PI tags for health checks

    529931

      Hello All,

      We are trying to monitor the health of PI and AF servers through ProcessBook displays. We have created element-relative ProcessBook displays and mapped the AF attributes onto them.

      We have the PerfMon interface installed on our servers, and we have created PerfMon tags for the majority of the parameters that we need.

      However, we would like to know whether the following can be checked through PI tags.

       

      1. To check the status of all PI subsystem services (Started/Stopped). We would like to have a PI tag indicating the status of each service.

      2. To check the last modified timestamp for Primary Archive. A PI tag that will show the last modified time for the archive, which should mostly be the current time. This will help us to understand if data is getting archived or not.

      3. To check the number of critical errors in PI message logs. A PI tag indicating count of critical error messages in the logs. This count should be 0 in normal conditions.

      4. To check the status of ACE modules (Green or Red) from ACE manager.

       

      Thanks,

      Nandhini.S

        • Re: PI tags for health checks
          gmichaud-verreault

          Hi Nandhini,

           

          1. To track whether a subsystem is up and running, the best way is to look at the \\<Server>\Process(<subsystem>)\Elapsed Time counter. If the value of the counter has not updated in X amount of time, that would indicate that the subsystem might not be running.
          2. To monitor the health of the archive subsystem (answering the "is the data being archived at this time" question), take a look at the following counters:
            PI Archive Subsystem\Archived Events/sec (Rate of successful event addition to the archive)
            PI Archive Subsystem\Events Read/sec (Rate of archive events read)
            PI Snapshot Subsystem\Snapshots/sec (Events sent to the snapshot) -> If this counter is non-zero and Archived Events/sec is zero, it would possibly point at an issue with the Event Queue.
          3. The PI System Tray can be used for exactly that, but I can't recall whether we have counters tracking errors / error severity. Someone else might be able to chime in on this.
          4. KB00597 - PI ACE performance monitoring does a good job at detailing some ways to monitor PI ACE health.
            You may also find this thread useful
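
          As a rough illustration of the logic in points 1 and 2, here is a minimal Python sketch. It assumes the counter readings have already been retrieved from the PerfMon tags by some other means; the CounterSample class, both function names, and the 5-minute threshold are hypothetical and not part of any PI API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CounterSample:
    """One reading of a PerfMon-backed tag (hypothetical container)."""
    timestamp: datetime  # time the value was last updated
    value: float         # counter value at that time


def subsystem_probably_stopped(latest: CounterSample, now: datetime,
                               max_age: timedelta) -> bool:
    """Point 1: flag a subsystem whose Elapsed Time counter has gone stale.

    Returns True when the most recent update is older than max_age.
    """
    return now - latest.timestamp > max_age


def event_queue_suspect(snapshots_per_sec: float,
                        archived_per_sec: float) -> bool:
    """Point 2: events reach the snapshot but none reach the archive."""
    return snapshots_per_sec > 0 and archived_per_sec == 0


if __name__ == "__main__":
    now = datetime(2017, 4, 28, 11, 0, 0)
    # Assumed reading: Elapsed Time last updated 30 minutes ago.
    elapsed = CounterSample(now - timedelta(minutes=30), 86400.0)
    print(subsystem_probably_stopped(elapsed, now, timedelta(minutes=5)))
    print(event_queue_suspect(snapshots_per_sec=120.0, archived_per_sec=0.0))
```

          The threshold plays the role of the "X amount of time" in point 1 and would need tuning to the scan rate of the PerfMon tags. A single zero reading of Archived Events/sec may not mean much on a quiet system, so in practice the second check would probably be applied over a time window rather than to one sample.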

           


          Regards,

           

          Gabriel

            • Re: PI tags for health checks
              529931

              Hello Gabriel,

              Thank you very much for the update.

              We have implemented the above checks (points 1 & 2), and they are working fine. We are also trying to get a PerfMon interface installed for the ACE server.

               

              Meanwhile, I have a few queries regarding monitoring of PI Backup. We are trying to use the '*_PI Backup Subsystem_Last Backup Failed' tag to determine the status of the last backup. We have three kinds of backup configured on our server (incremental, differential and full).

              I would like to know which backup's status this PI tag corresponds to. Is it the incremental one?

               

              Thanks for your help in advance.