8 Replies Latest reply on Oct 14, 2016 4:16 PM by gregor

    Receiving sporadic "Timeout on PI RPC or System Call" errors when querying PI data using AF SDK

    TrentHuhn

      I am attempting to retrieve one-minute averages for a week's worth of data using the AFAttribute.Data.Summaries() function. Over 90% of the time, the call executes successfully with no issues. However, a small percentage of the time, the call fails to retrieve some or all of the expected data. Here is the code I am using:

       

                  // One-minute summary intervals
                  AFTimeSpan timeSpan = AFTimeSpan.Parse("1m");
                  AFTimeRange tempRange = timeRange;
                  IDictionary<AFSummaryTypes, AFValues> summaryValues;
                  AFValues valuesDataAverage = new AFValues();
                  do
                  {
                      // Cap each chunk at 7 days, or at the end of the requested range
                      tempRange.EndTime = tempRange.StartTime.UtcTime.AddDays(7) > timeRange.EndTime.UtcTime ? timeRange.EndTime : new AFTime(tempRange.StartTime.UtcTime.AddDays(7));
                      summaryValues = attributeToGetDataFrom.Data.Summaries(tempRange, timeSpan, AFSummaryTypes.Average, AFCalculationBasis.TimeWeighted, AFTimestampCalculation.MostRecentTime);
                      valuesDataAverage.AddRange(summaryValues[AFSummaryTypes.Average]);
                      // The next chunk starts where this one ended
                      tempRange.StartTime = tempRange.EndTime;
                  }
                  while (tempRange.EndTime != timeRange.EndTime);
                  AFValue firstVal = valuesDataAverage[0];
                  if (firstVal.IsGood)
                      logger.Info(string.Format("Added {0} points for {1} (First Value: {2} - {3})", valuesDataAverage.Count, attributeToGetDataFrom.GetPath(), (double)firstVal.Value, firstVal.Timestamp));
      

                            

       

       

      where attributeToGetDataFrom is an AFAttribute object and timeRange is an AFTimeRange containing the time range in question. The code should break this time range into one-week chunks and call the Summaries method in succession in order to populate the valuesDataAverage object.

       

      Here is a sample of the results I get when running this for a month's worth of data over several different AF attributes:

       

      2016-10-11 11:18:41,286 INFO  - NRG.PIBIAS.LostEnergy.LostEnergyCalculationManager - Added 33121 points for \\WNTRNAFP01\Corporate\Texas Wind\STWD\STWD1\SUB1\FDRBKR-52-F1\BFDR1\WTG26|WindSpeed (First Value: 4.69192561320834 - 9/1/2016 12:01:00 AM)

      2016-10-11 11:18:42,434 INFO  - NRG.PIBIAS.LostEnergy.LostEnergyCalculationManager - Added 43200 points for \\WNTRNAFP01\Corporate\Texas Wind\STWD\STWD1\SUB1\FDRBKR-52-F1\BFDR1\WTG26|WTOperationState (First Value: 0 - 9/1/2016 12:00:00 AM)

      2016-10-11 11:18:43,637 INFO  - NRG.PIBIAS.LostEnergy.LostEnergyCalculationManager - Added 43200 points for \\WNTRNAFP01\Corporate\Texas Wind\STWD\STWD1\SUB1\FDRBKR-52-F1\BFDR1\WTG26|ErrorCode (First Value: 0 - 9/1/2016 12:00:00 AM)

      2016-10-11 11:22:52,541 INFO  - NRG.PIBIAS.LostEnergy.LostEnergyCalculationManager - Added 33121 points for \\WNTRNAFP01\Corporate\Texas Wind\STWD\STWD1\SUB1\FDRBKR-52-F1\BFDR1\WTG26|ActivePower (First Value: 76.5700232357419 - 9/1/2016 12:01:00 AM)

      2016-10-11 11:27:38,022 INFO  - NRG.PIBIAS.LostEnergy.LostEnergyCalculationManager - Added 2884 points for \\WNTRNAFP01\Corporate\Texas Wind\STWD\STWD1\SUB1\FDRBKR-52-F1\BFDR1\WTG27|WindSpeed (First Value: OSIsoft.AF.PI.PITimeoutException: [-10722]

      PINET: Timeout on PI RPC or System Call.

         at OSIsoft.AF.PI.PIException.ConvertAndThrowException(PIServer piServer, Exception ex, String message)

         at OSIsoft.AF.PI.PIPoint.Summaries(IList`1 intervalDefinitions, Boolean reverseTime, AFSummaryTypes summaryType, AFCalculationBasis calcBasis, AFTimestampCalculation timeType)

         at OSIsoft.AF.Asset.DataReference.PIPointDR.Summaries(AFTimeRange timeRange, AFTimeSpan summaryDuration, AFSummaryTypes summaryType, AFCalculationBasis calcBasis, AFTimestampCalculation timeType)

         at OSIsoft.AF.Data.AFData.SummariesEvaluate(AFTimeRange timeRange, AFTimeSpan summaryDuration, AFSummaryTypes summaryType, AFCalculationBasis calcBasis, AFTimestampCalculation timeType) - 9/1/2016 12:00:00 AM)

      2016-10-11 11:27:39,272 INFO  - NRG.PIBIAS.LostEnergy.LostEnergyCalculationManager - Added 43200 points for \\WNTRNAFP01\Corporate\Texas Wind\STWD\STWD1\SUB1\FDRBKR-52-F1\BFDR1\WTG27|WTOperationState (First Value: 0 - 9/1/2016 12:00:00 AM)

      2016-10-11 11:27:40,202 INFO  - NRG.PIBIAS.LostEnergy.LostEnergyCalculationManager - Added 43200 points for \\WNTRNAFP01\Corporate\Texas Wind\STWD\STWD1\SUB1\FDRBKR-52-F1\BFDR1\WTG27|ErrorCode (First Value: 0 - 9/1/2016 12:00:00 AM)

      2016-10-11 11:29:46,078 INFO  - NRG.PIBIAS.LostEnergy.LostEnergyCalculationManager - Added 43200 points for \\WNTRNAFP01\Corporate\Texas Wind\STWD\STWD1\SUB1\FDRBKR-52-F1\BFDR1\WTG27|ActivePower (First Value: 197.568193716581 - 9/1/2016 12:01:00 AM)

       

      It appears that sometimes, it will only retrieve a small subset of the expected values (i.e. 2884) or, more commonly, will return only the first 33121 values (a full month's worth would be 1440 minutes/day x 30 days = 43200 values).
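
One reading that is consistent with both counts, offered as an assumption rather than something confirmed in this thread: a 7-day chunk that times out contributes a single error value instead of its 10,080 one-minute averages. The WTG27|WindSpeed log entry, whose first value is the exception itself, fits this. The sketch below only does the arithmetic; `ChunkCountCheck` and `ExpectedCount` are hypothetical names.

```csharp
using System;

static class ChunkCountCheck
{
    const int MinutesPerDay = 1440;

    // Number of values collected over `totalDays` when `failedWeeks` of the
    // 7-day chunks each return one error value instead of a full week of
    // one-minute averages (assumption: a failed chunk yields exactly one value).
    public static int ExpectedCount(int totalDays, int failedWeeks)
    {
        int fullCount = totalDays * MinutesPerDay;   // every minute present
        int perWeek = 7 * MinutesPerDay;             // 10,080 values per week chunk
        return fullCount - failedWeeks * (perWeek - 1);
    }
}
```

Under that assumption, `ExpectedCount(30, 0)` gives 43200, `ExpectedCount(30, 1)` gives 33121, and `ExpectedCount(30, 4)` gives 2884, matching the three counts seen in the log.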

       

      I took a look at KB3224OSI8, which recommended changing a couple of tuning parameters. I adjusted the Archive_MaxQueryExecutionSec parameter from its default value of 260 seconds to 600 seconds, but this did not seem to make any difference (any query that has issues seems to run for approximately 5 minutes before returning). I'm wondering what else I could try, either at the PI archive level or within the AF SDK, to increase the reliability of these queries. One other option I've considered is introducing a loop that retries the query (up to a certain maximum number of tries) if it does not contain the expected number of results.
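
The retry idea in the last sentence could look roughly like the sketch below. It is a generic helper, not part of the AF SDK: `RetryHelper`, `Retry`, and the `isComplete` check are hypothetical names, and the caller decides what a complete result means (for example, the expected number of values with no error values among them).

```csharp
using System;
using System.Threading;

static class RetryHelper
{
    // Runs `query` up to `maxAttempts` times and returns the first result that
    // `isComplete` accepts. Exceptions and incomplete results both count as
    // failed attempts; if every attempt fails, the last failure is wrapped in
    // the InvalidOperationException that is thrown.
    public static T Retry<T>(Func<T> query, Func<T, bool> isComplete, int maxAttempts, TimeSpan delay)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        Exception lastError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                T result = query();
                if (isComplete(result)) return result;
                lastError = new InvalidOperationException("Result was incomplete on attempt " + attempt + ".");
            }
            catch (Exception ex)
            {
                lastError = ex;
            }
            // Pause between attempts, but not after the final one
            if (attempt < maxAttempts) Thread.Sleep(delay);
        }
        throw new InvalidOperationException("Query failed after " + maxAttempts + " attempts.", lastError);
    }
}
```

In the loop above, the `Summaries` call for one chunk would be passed as `query`. Because the log suggests a timeout can come back as an error value rather than a thrown exception, the `isComplete` check is what would actually trigger the retry in that case.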

       

      Thanks,

      Trent

        • Re: Receiving sporadic "Timeout on PI RPC or System Call" errors when querying PI data using AF SDK
          gregor

          Hello Trent,

           

          Can you confirm all attributes are using a PI Point Data Reference? If this is not the case, what other Data Reference(s) are used?

           

          How many raw values are stored for the period in question (WTG26|ActivePower and WTG27|WindSpeed)?

          Is this a production or a development PI System?

          In case raw data is stored with PI Points, what are the settings of the following PI Point attributes?

          • exdevpercent
          • compressing
          • compdevpercent

          Are the raw values using sub second timestamps?

          What's the use case for retrieving 1 minute averages for a complete month?

            • Re: Receiving sporadic "Timeout on PI RPC or System Call" errors when querying PI data using AF SDK
              TrentHuhn

              Gregor,

               

               

              1.       Yes, all attributes being retrieved are PI Point data references.

               

              2.       Is there a good way to see how many raw data values there are for a PI tag over a certain period? If I try to pull a full month’s worth of data in DataLink, I get the same message mentioned in my original post (“[-10722] PINET: Timeout on PI RPC or System Call.”). I was able to pull a week’s worth of data however, which contained 503,306 data values; extrapolating that out to a full month, I would estimate ~2,085,000 values each month.

               

              3.       This is our production server

               

              4.       All tags have similar PI point attributes:

               

              a.       Exdevpercent/compdevpercent = 0

               

              b.      WindSpeed, ActivePower: Compressing = 0

               

              c.       WTOperationState, ErrorCode: Compressing = 1

               

              5.       How would I tell if the raw values use sub-second timestamps? Looking at a sample of the data, it appears as though it’s recording a maximum of one event every second:

               

               

              6.       We use one minute averages in order to perform some downstream reporting on the data; this includes metrics such as availability factor, which requires at least one-minute granularity. Everywhere else in our code assumes one-minute resolution on data, so we need to maintain this same resolution here when retrieving PI data.

               

              Appreciate your assistance,

              Trent

                • Re: Receiving sporadic "Timeout on PI RPC or System Call" errors when querying PI data using AF SDK
                  gregor

                  Hello Trent,

                   

                  My guess is that the issue comes from two things working against each other: the time it takes to build one-minute averages over a month with ~2 million archived events, and the Data Access timeout specified on the client side.

                   

                  PI Data Archive Exception and Compression exist to reduce the number of events that make it to the archive without losing information. Using both in a meaningful way helps to achieve optimal performance. I won't go into details here, but I refer you to the Exception and Compression Full Details video on OSIsoft's YouTube learning channel and to the PI Data Archive flow chapter, especially Exception Reporting and Compression Testing, in the PI Server 2016 documentation at Live Library. Reducing the number of archived events will dramatically reduce the time it takes to compute averages. Finding reasonable settings for Exception and Compression usually requires a good understanding of the process data. While Exception Reporting is used to remove signal noise, Compression mainly avoids archiving additional events when there is no change, or only a minimal change, in a signal's slope.

                   

                  PI Archive Subsystem stores only the offset in seconds between two events when timestamps have no sub-second part. When timestamps do include sub-seconds, the full timestamp is stored, which a) requires more space and b) decreases performance. Because of this, we suggest avoiding sub-second timestamps in the archive unless they are a requirement; again, it depends on the kind of data and whether the sub-second portion of a timestamp adds value. You can use PI DataLink -> Compressed Data to retrieve the data, but please make sure PI DataLink is configured to show the sub-second portion (Settings -> Time format = dd-mmm-yy hh:mm:ss.000).
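
For a programmatic check alongside PI DataLink, a minimal sketch is shown below. It works on plain `DateTime` values and assumes the timestamps of the retrieved raw events have already been converted (for example via the `UtcTime` property used in the original code). `SubSecondCheck` and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SubSecondCheck
{
    // True if the timestamp has a fractional-second component.
    public static bool HasSubSecond(DateTime timestamp)
    {
        return timestamp.Ticks % TimeSpan.TicksPerSecond != 0;
    }

    // Counts how many timestamps in the sequence carry sub-second precision.
    public static int CountSubSecond(IEnumerable<DateTime> timestamps)
    {
        return timestamps.Count(t => HasSubSecond(t));
    }
}
```

A count of zero over a representative sample would suggest the events are archived on whole seconds.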

                   

                  Even though my guess is that you are running into a Data Access timeout, there are several Tuning Parameters that protect the PI Data Archive against expensive queries (please see the KB you linked). We recommend against changing these parameters unless advised by Technical Support. A PI System administrator is able to change those tuning parameters but, as we like to say, with great power comes great responsibility, and hence we suggest understanding the nature of an issue first before turning any screws. The Data Access timeout is a connection-specific setting and can be changed on the client side, e.g. using PISDKUtility.exe, but rather than changing the setting I would suggest breaking queries into smaller chunks.

                   

                  To answer the question you raised in #2, you can retrieve a Count in the same way as the Average, as a summary (AFSummaryTypes.Count).

                   

                  Please allow me to summarize my recommendations for you in the order of importance:

                   

                  1. Review Exception and Compression settings
                  2. Avoid sub second timestamps whenever possible
                  3. Query smaller periods to avoid query timeouts
                    • Re: Receiving sporadic "Timeout on PI RPC or System Call" errors when querying PI data using AF SDK
                      TrentHuhn

                      Gregor –

                       

                      Appreciate the response. I understand how compression and exception settings work, and I agree that ideally we could configure these to be a little more efficient. However, management on the business side has stipulated that they want one-second resolution data, and so my hands are somewhat tied in this regard. If this is indeed the reason why these queries are failing, it’s possible I may be able to present this as a business case for re-examining our standard exception/compression settings.

                       

                      In the meantime, I will try breaking the query down into smaller chunks (maybe 1 day’s worth of data at a time) and see if this yields better results.
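
One way to produce such chunks is sketched below, using plain `DateTime` values so the chunk size is a single parameter. `TimeChunker` and `Split` are hypothetical names; each returned pair would still need to be turned into an `AFTimeRange` before calling `Summaries`.

```csharp
using System;
using System.Collections.Generic;

static class TimeChunker
{
    // Splits the range from `start` to `end` into consecutive sub-ranges of at
    // most `chunk` length. The final sub-range is shortened to stop at `end`.
    public static IEnumerable<Tuple<DateTime, DateTime>> Split(DateTime start, DateTime end, TimeSpan chunk)
    {
        if (chunk <= TimeSpan.Zero) throw new ArgumentOutOfRangeException(nameof(chunk));
        DateTime cursor = start;
        while (cursor < end)
        {
            // Compare the remaining span to the chunk size to pick the chunk end
            DateTime next = end - cursor > chunk ? cursor + chunk : end;
            yield return Tuple.Create(cursor, next);
            cursor = next;
        }
    }
}
```

With `TimeSpan.FromDays(1)` a 30-day range yields 30 sub-ranges; with `TimeSpan.FromDays(7)` it yields five, the last covering the remaining two days, which mirrors the weekly loop in the original post.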

                       

                      Thanks,

                      Trent

                        • Re: Receiving sporadic "Timeout on PI RPC or System Call" errors when querying PI data using AF SDK
                          Rick Davin

                          Hi Trent,

                           

                          My colleague Gregor Beck has provided a most excellent answer. His answer was so good that it is worthy of its own blog post that I would gladly bookmark. It was so good that I am hesitant to add to it. However, there may be a code solution for you, specifically because you said your attributes all use the PIPoint data reference. It will require a few small changes in code, though admittedly it may require AF SDK 2.7 or later, since I will rely upon the PIConnectionInfo.OperationTimeOut property. While our recent LiveLibrary Help states that this property has existed since AF 2.5, and shows it is settable, if memory serves me correctly it was not settable until 2.7.

                           

                          Keep in mind I have not tested this on my end. I am just working from recall of previous code.

                           

                              // Put this before your do loop
                              PIPoint tag = attributeToGetDataFrom.PIPoint;
                              PIServer dataArchive = tag.Server;
                              dataArchive.ConnectionInfo.OperationTimeOut = TimeSpan.FromMinutes(4);

                              // Change this inside your do loop
                              summaryValues = tag.Summaries(tempRange, timeSpan, AFSummaryTypes.Average, AFCalculationBasis.TimeWeighted, AFTimestampCalculation.MostRecentTime);
                          

                           

                          The AFConnectionInfo does not have the equivalent of the PIConnectionInfo.OperationTimeOut property.  So the trick is to grab the PIPoint from the attribute, grab the PIServer from the PIPoint, and then set the OperationTimeOut to a higher but not crazy value.

                           

                          This requires a slight change to the Summaries call in that you will work directly off a PIPoint rather than an AFAttribute.Data object.

                           

                          Perhaps this can help.  It's a small enough change that is worth a try.

                           

                          Regards,

                          Rick

                  • Re: Receiving sporadic "Timeout on PI RPC or System Call" errors when querying PI data using AF SDK
                    Rick Davin

                    I see no downside to taking smaller bites. While in general we promote bulk calls as a best practice, this falls apart when a call is too big and bulky. Reducing the query into smaller ranges is probably the safest thing you can do on your part. Changing the OperationTimeOut is also fairly safe, as that setting is only for the user running your code. When dealing with large amounts of data, there is no one-size-fits-all solution. You'll have to experiment with your own environment, perhaps with a combination of smaller ranges AND a slightly larger timeout.

                     

                    Gregor's questions were spot on trying to understand how much data is being summarized behind the call.  My only other advice is just more practical: when dealing with that much data, a little patience and understanding goes a long way.