Introduction

 

In a blog post earlier this year - Troubleshooting PI Analysis Service Performance Issues: High Latency - I discussed how to leverage the detailed diagnostic information available for the PI Analysis Service to troubleshoot performance-related issues. The evaluation statistics exposed in the Management Plugin can be used to identify analyses that are expensive and may be putting strain on the system.

 

This troubleshooting approach works in many cases, but it has the following drawbacks:

  • The statistics reported are accumulated over the entire time PI Analysis Service has been running. When the service has been running for a long time, this information can be heavily skewed in favor of historical data, making it hard to troubleshoot recent degradation in performance.
  • The process is manual, and we all have better things to do than sit in front of a computer screen trying to make sense of this information.

 

Given these drawbacks, users want out-of-the-box health monitoring, or the ability to access this information programmatically so that they can build their own monitoring or reporting application. In future releases, we plan to address both: out-of-the-box health monitoring and programmatic access to PI Analysis Service health information. In the meantime, if you are using PI Server 2017 or a later release, you may already be able to access some of this information “programmatically”. Let me show you how.

 

Tracing Analysis Evaluation Events

 

With the PI Server 2016 R2 release, we started using NLog for tracing messages from the Analysis Service. If you are a .NET developer, chances are you are already familiar with NLog:

 

NLog is a free logging platform for .NET, NETSTANDARD, Xamarin, Silverlight and Windows Phone with rich log routing and management capabilities. NLog makes it easy to produce and manage high-quality logs for your application regardless of its size or complexity [source].

 

NLog is very flexible, and what makes it powerful is its many built-in Targets, which can be used to route log messages to different destinations. In addition to targets, NLog lets you define flexible Logging Rules, each of which specifies the list of targets to write to for a given combination of source/logger name and log level. When a rule matches, log messages are directed to the target(s) in that rule. By default, the Analysis Service uses the File target, which supports features like automatic archiving and compression. The file target is used by a set of predefined logging rules, shown here (you can find these in the configuration file $PIPC\Analytics\PIAnalysisProcessor.exe.config):

 

    <rules>
      <logger name="OSIsoft.AN*" levels="Fatal" writeTo="eventLog" />
      <logger name="*:Evaluation" minlevel="Info" writeTo="logfileAsync" final="true" />
      <logger name="*:Scheduling" minlevel="Info" writeTo="logfileAsync" final="true" />
      <logger name="*:Recalculation" minlevel="Info" writeTo="logfileAsync" final="true" />
      <logger name="*:DataRetrieval" minlevel="Info" writeTo="logfileAsync" final="true" />
      <logger name="*:Configuration" minlevel="Info" writeTo="logfileAsync" final="true" />
      <logger name="*:Performance" minlevel="Info" writeTo="logfileAsync" final="true" />
      <logger name="*:PerformanceEvaluation" minlevel="Info" writeTo="logfileAsync" final="true" />
      <logger name="OSIsoft.AN*" minlevel="Info" writeTo="logfileAsync" />
    </rules>

 

The Performance and PerformanceEvaluation rules were only added in the PI Server 2017 release, so the steps described here require that version or later.

 

Of these, the rule relevant here is the one with the logger name PerformanceEvaluation. When the log level is turned up all the way to Trace, the Analysis Service uses this logger to write a message for each analysis evaluation. The message written to the log file (typically $ProgramData\OSIsoft\PIAnalysisNotifications\Logs\pianalysisprocessor-log.txt) looks like this:

 

2017-12-18 10:41:07.3432|TRACE|ANPerformanceTrace:PerformanceEvaluation|Type: AnalysisEvaluated, Data: {
  "AnalysisName": "\\\\DemoServer\\Database10\\Element1\\Element1_2\\Analyses[Analysis2]",
  "AnalysisId": "8228d87c-dac1-11e7-87d0-9cb6d0dc024e",
  "GroupId": "8b81d68e-da22-4a73-8174-f715aa7c83c8",
  "TimeclassId": "603223b0-ad81-4a75-b84b-e5cfb5254edb",
  "EvaluationData": {
    "ExeTime": "2017-12-18T18:41:02Z",
    "ElapsedMilliSeconds": 0.00948147283740368,
    "LagMilliSeconds": 5343.2489000000005,
    "AnalysisCount": 1
  }
}

 

This message contains the following important information (a small parsing sketch follows the list):

  • AnalysisName: Path of the analysis.
  • AnalysisId: AF Analysis Id.
  • GroupId: AF Analysis Template Id (when the analysis derives from a template).
  • ExeTime: Trigger time for the evaluation.
  • ElapsedMilliSeconds: Time spent (in milliseconds) evaluating this trigger event.
  • LagMilliSeconds: Lag (in milliseconds) in evaluating this trigger event; includes the evaluation wait time (default: 5 seconds).
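
If you want to act on these fields programmatically, here is a minimal sketch of how you might pull the JSON payload out of such a trace message and flag slow or lagging evaluations. This is just an illustration, not part of the product: it assumes the Newtonsoft.Json (Json.NET) package is available, and the class name and threshold values are arbitrary examples.

// Minimal sketch: extract the JSON payload from an AnalysisEvaluated trace
// message and flag slow or lagging evaluations. Assumes Newtonsoft.Json.
using System;
using Newtonsoft.Json.Linq;

public static class EvaluationTraceParser
{
    public static void Inspect(string message)
    {
        // The JSON payload starts at the first '{' after "Data:".
        int start = message.IndexOf('{');
        if (start < 0) return;

        JObject data = JObject.Parse(message.Substring(start));
        string analysisName = (string)data["AnalysisName"];
        double elapsedMs = (double)data["EvaluationData"]["ElapsedMilliSeconds"];
        double lagMs = (double)data["EvaluationData"]["LagMilliSeconds"];

        // Example thresholds - tune these to what "slow" means for your system.
        if (elapsedMs > 1000 || lagMs > 30000)
        {
            Console.WriteLine($"Slow evaluation: {analysisName} ({elapsedMs:F1} ms, lag {lagMs:F1} ms)");
        }
    }
}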

 

Enabling this trace for a production system over a long period of time is not a good idea, as this can really cause your log file to grow fast!

WebService Target for Listening to Evaluation Events

 

While tracing analysis evaluation events is possible just by changing the trace level in the configuration file, that is clearly not the best way to access this information, as it results in a large log file. This is where NLog's flexibility comes in handy. NLog comes with a built-in WebService Target that allows us to send each trace message to a web service endpoint. All we need to do is add a WebService target to the NLog configuration and direct the PerformanceEvaluation logger to write to it. It's not a good idea to add custom NLog configuration to the PIAnalysisProcessor.exe.config file, as that file is overwritten during upgrades. The recommended way to inject custom NLog configuration is to edit the user file, Analysis.UserNLog.config, which is located in the same folder. All you need to do is replace the contents of that file on your machine with the following:

 

<?xml version="1.0" encoding="utf-8"?>
<!-- Copy this file to the PIPC\Analytics folder -->
<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" autoReload="true" throwExceptions="false">
  <targets>
    <target xsi:type="AsyncWrapper" name="wsAsync" queueLimit="200000" fullBatchSizeWriteLimit="20" timeToSleepBetweenBatches="50" overflowAction="Discard">
      <target type='WebService'
          name='ws'
          url='http://localhost:9000/api/logmessage'
          protocol='HttpPost'
          encoding='UTF-8'   >
        <parameter name='Level' type='System.String' layout='${level}'/>
        <parameter name='Message' type='System.String' layout='${message}'/>
      </target>
    </target>
  </targets>
  <rules>
    <logger name="*:PerformanceEvaluation" levels="Trace" writeTo="wsAsync" final="true" />
  </rules>
</nlog>

 

Note that the WebService target is configured to write to the endpoint address http://localhost:9000/api/logmessage. You can change it based on the address you choose to listen on. Now all you need to do is host a web service that listens for POST messages at this address. I have some sample code posted here. The sample uses OWIN to self-host an ASP.NET Web API 2 application.
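
To give an idea of what such a listener might look like, here is a minimal sketch (not the actual posted sample; the class names are mine) of an OWIN self-hosted Web API 2 application that accepts the Level and Message parameters the WebService target posts as form-encoded data. It assumes the Microsoft.AspNet.WebApi.OwinSelfHost NuGet package.

// Minimal sketch of an OWIN self-hosted Web API 2 listener for the NLog
// WebService target. Requires the Microsoft.AspNet.WebApi.OwinSelfHost package.
using System;
using System.Net.Http.Formatting;
using System.Web.Http;
using Microsoft.Owin.Hosting;
using Owin;

public class Startup
{
    public void Configuration(IAppBuilder app)
    {
        var config = new HttpConfiguration();
        // Route POST http://localhost:9000/api/logmessage to LogMessageController.
        config.Routes.MapHttpRoute(name: "DefaultApi", routeTemplate: "api/{controller}");
        app.UseWebApi(config);
    }
}

public class LogMessageController : ApiController
{
    // The NLog WebService target (protocol='HttpPost') sends its parameters
    // as application/x-www-form-urlencoded, so read them from the form data.
    public IHttpActionResult Post(FormDataCollection formData)
    {
        string level = formData.Get("Level");
        string message = formData.Get("Message");

        // Keep this handler cheap; hand the message off for any expensive processing.
        Console.WriteLine("{0}: {1}", level, message);
        return Ok();
    }
}

public static class Program
{
    public static void Main()
    {
        // The URL must match the one configured for the WebService target.
        using (WebApp.Start<Startup>("http://localhost:9000"))
        {
            Console.WriteLine("Listening on http://localhost:9000 - press Enter to exit.");
            Console.ReadLine();
        }
    }
}

With this running and the custom NLog configuration in place, each evaluation trace message posted by the Analysis Service shows up in the console.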

 

Enabling this tracing adds some overhead on the Analysis Service side, but it should be minimal, and it is mitigated by the fact that the WebService target used here is wrapped in an AsyncWrapper with a queue limit of 200K and an overflow action of 'Discard'. If you change these settings (for example, increase the queue size or set the overflow action to Block or Grow) and then do expensive work while processing the POST request in your web service implementation, bad things will happen. So be careful when changing these settings, and make sure you don't perform expensive blocking operations when handling the POST request; one way to keep the handler cheap is sketched below.
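
One simple pattern (again, just an illustration and not something prescribed by the product) for keeping the POST handler cheap is to hand each message off to a bounded in-memory queue and let a background task do the expensive work:

// Minimal sketch: the POST handler calls Enqueue, which never blocks; a single
// background consumer does the expensive parsing, storage, or alerting.
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class MessagePipeline
{
    // Bounded so the web service never buffers without limit; the size is an arbitrary example.
    private static readonly BlockingCollection<string> Queue =
        new BlockingCollection<string>(boundedCapacity: 100000);

    public static void Start()
    {
        // Single background consumer: expensive work happens here,
        // not in the web service's POST handler.
        Task.Run(() =>
        {
            foreach (string message in Queue.GetConsumingEnumerable())
            {
                Console.WriteLine(message); // replace with your own parsing / alerting logic
            }
        });
    }

    // Called from the POST handler; TryAdd does not block the caller.
    // If the queue is full, the message is dropped, mirroring the 'Discard' overflow action.
    public static void Enqueue(string message) => Queue.TryAdd(message);
}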

If you pulled the sample code and configured everything correctly, you should see the evaluation trace messages arriving at your web service.

If all of this works, you can then use the analysis evaluation events to build a simple monitoring application that can act when, say, an evaluation reports high lag or takes an unreasonably long time to finish. In future releases, we plan to add more information to this event, for example:

  • Status - Success, Failure, Warning, Skipped
  • Error Information - describing why a given evaluation resulted in Calc Failed.

 

With inputs from Stephen Kwan, David Pugal and Giselle Lee. Hope this information is useful.