8 Replies Latest reply on Sep 5, 2014 10:50 AM by Gregor

    Coresight 2012, plotvalues and server load

    aldorexbraam

      Today, while troubleshooting a performance issue with PI Server/Coresight at one of our customers' sites, I stumbled across a puzzling issue. Insights from the community are appreciated.

       

      The customer had the issue that the archive subsystem of the server would occasionally stop handling calls. The service would remain running, but the message log would show exceptions like

       
      T:4448 PT(ptid=666,recno=666,loc=5,flag=4): ptwait tick/timeout=27000/270000, RID|LLL|LC|LI|NAL=0|2|2|0|253 Lock[253].uc=5 QL.sc/tid=0/4448 threads=2,0,0

      It turned out that each time, although many other tags were queried, it would choke on two specific tags (point ID marked orange), which happened to contain archive data for roughly every second.

       

      While monitoring the average disk queue length, we observed that the server clearly had a disk I/O issue, which led to request time-outs (marked green). So in this case Coresight was merely making a hardware issue with the PI Server more apparent.

       

      So far all clear......

       

      Now comes the puzzling part...

       

      While trying to mimic the load using PI ProcessBook, we observed a dramatic difference in server load in favor of PI ProcessBook. Thinking about it, that made sense, since PI ProcessBook uses the PlotValues() SDK call for its trending.

       

      This leads me to suspect that Coresight 2012 (and possibly 2013) is NOT using that function for its data retrieval...

       

      Can anybody

       

      a) confirm/debunk  this to be the case ?  

       

      b) Share similar experiences on Coresight causing high server strain/ archive walks when trending data ? 

       

      Regards, 

       

      Aldo Braam 

       


       


        • Re: Coresight 2012, plotvalues and server load

          Hello Aldo,

           

          If you hadn't stated that this happened at a customer site, I would immediately have asked Technical Support to open a ticket to troubleshoot this issue. Even though I have some info to share, I highly recommend opening a case with OSIsoft Technical Support for troubleshooting and possibly a suggested solution.

           

          PI Archive Subsystem is a multi-threaded application that attempts to lock the point before read and write operations. "ptwait tick/timeout" messages indicate that this attempt is either taking long or indeed timing out. In the latter case both numbers are equal, like in the case you report: ptwait tick/timeout=27000/270000.
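
To see which points these messages name most often, the point ID and the two ptwait numbers can be pulled out of each line. What follows is only a minimal sketch, assuming the message log has been exported to text and that every message follows the layout of the line quoted in the original post. The function name and the pattern are illustrative and not part of any OSIsoft tooling.

```python
import re

# Matches only the fields discussed in this thread; the rest of the message is ignored.
PTWAIT_RE = re.compile(
    r"PT\(ptid=(?P<ptid>\d+),.*?\):\s*ptwait tick/timeout=(?P<tick>\d+)/(?P<timeout>\d+)"
)

def parse_ptwait(line):
    """Return ptid, tick and timeout as integers, or None if the line does not match."""
    match = PTWAIT_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}

# The message quoted in the original post.
sample = (
    "T:4448 PT(ptid=666,recno=666,loc=5,flag=4): ptwait tick/timeout=27000/270000, "
    "RID|LLL|LC|LI|NAL=0|2|2|0|253 Lock[253].uc=5 QL.sc/tid=0/4448 threads=2,0,0"
)
print(parse_ptwait(sample))
```

Counting the parsed ptid values over a whole log export would show whether the waits really concentrate on a few tags.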

           

          The number of archive threads is set by a Tuning Parameter. Please contact Technical Support with information about the hardware, i.e. the number of physical cores available. The Technical Support Engineer will likely escalate your case and may return with recommendations on which Tuning Parameters could be changed, or at least on what can be done to improve the situation.

           

          Indeed you've provided a good hint already. 1 archive event/second indicates Exception and Compression settings of the affected tag (ptid 666) should be reviewed.

           

          I also recommend reading KB 3223OSI8 - How to find and disconnect a rogue consumer or heavy hitter. Examining the PI Archive Subsystem thread details (# 3) should allow you to identify long running threads. The kind of thread tells you if issues are mainly with read or with write operations. This brings us to your next question.

           

          To my knowledge PI Coresight is also using PlotValues for trending. I will reach out to the PI Coresight product manager and ask him to confirm this. I don't want to go into too much detail, but 1 event/second not only keeps PI Archive Subsystem busy with archive operations but also has an impact on data retrieval. This is also true when using PlotValues.

           

          Windows is reporting that a disk queue builds up. This clearly indicates that the disk is falling behind and is the performance bottleneck. Reviewing the hard disk configuration and thinking about a strategy can be useful here as well, e.g. a dedicated disk for PI Archives, a different RAID mode etc.

           

          To summarize, I do not believe the issue is a pure client issue or specifically caused by PI Coresight. My very first candidate is the configuration (excdevpercent, compressing and compdevpercent) of the tag with point ID 666.

          • Re: Coresight 2012, plotvalues and server load
            Roger Palmen

            Hi Aldo,

             

            I can't answer any of your questions, but I think the main question is how Coresight requests data versus ProcessBook. Without commenting on either of the two, some questions pop up:

             

            Is a new request made every time a user zooms in on the time range? Or changes the size of the trend?

             

            The assumption here is that PlotValues is only optimal if, for every zoom or resize, you redo your call to PI to get the optimum (minimal) set of data points based on the timespan and the pixels available. Great for initial plotting speed, but of course a killer for zooming performance (though you could get smart with that using async calls).

             

            I bet someone is around with the answer.

              • Re: Coresight 2012, plotvalues and server load
                aldorexbraam

                Hi Roger & Gregor, thanks for the valuable comments and feedback. I think you pointed out something interesting, Roger: when zooming in there is absolutely NO archive activity generated by Coresight. This leads me indeed to the conclusion that Coresight is NOT using PlotValues. You can also see this from the client responsiveness when zooming in. Any comment from the Coresight product team?

                  • Re: Coresight 2012, plotvalues and server load
                    tlebay

                    Hello all,

                     

                    Great discussion, sorry I am late to the party.  Coresight absolutely uses PlotValues to generate the trace line for a trend.  However, we also request summary data (min, max, avg) at the same time.  The real rub is that Coresight does this for every data item on the display because in some way or another we show this summary information on every symbol and we made the design decision to get all this information when the display loads rather than if/when a user asks for it.  I think it is the summary data request that accounts for the additional server load between Coresight and ProcessBook.

                     

                    Regarding the question if we make a new data request every time a user zooms in on the time range; the answer is yes.  This is necessary because the fidelity of the trace line and summary values can all be different when the time range is changed, even with a zoom-in scenario.

                     

                    I hope this helps.

                      • Re: Coresight 2012, plotvalues and server load
                        Roger Palmen

                        Great to get to know the inside details of how Coresight treats this. Probably by the time you start zooming, all data is already cached?

                          • Re: Coresight 2012, plotvalues and server load

                            Hi Roger,

                             

                            PlotValues takes AFTimeRange, Intervals and UOM's as input. The following remark exists e.g. in the AF SDK reference and explains what PlotValues does:

                             

                            For each interval, the data available is examined and significant values are returned. Each interval can produce up to 5 values if they are unique, the first value in the interval, the last value, the highest value, the lowest value and at most one exceptional point (bad status or digital state).

                             

                            Zooming into a PI Coresight trend means changing start- and end-time (AFTimeRange). Because the horizontal resolution remains the same (Intervals), I conclude PlotValues needs to go to the server again and hence does not take advantage of client side caching but maybe of server side caching, namely the PI Archive Subsystem cache.
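
The quoted remark can be sketched in a few lines. This is not the AF SDK implementation, only an illustration of the rule as quoted, assuming events are simple (time, value, good) tuples with numeric timestamps. `Event` and `plot_values_sketch` are made-up names.

```python
from collections import namedtuple

# Hypothetical event type: numeric timestamp, value, and a good/bad status flag.
Event = namedtuple("Event", ["time", "value", "good"])

def plot_values_sketch(events, start, end, intervals):
    """Reduce events in [start, end) to at most 5 significant events per interval."""
    width = (end - start) / intervals
    buckets = [[] for _ in range(intervals)]
    for ev in events:
        if start <= ev.time < end:
            idx = min(int((ev.time - start) / width), intervals - 1)
            buckets[idx].append(ev)

    result = []
    for bucket in buckets:
        if not bucket:
            continue
        bucket.sort(key=lambda ev: ev.time)
        good = [ev for ev in bucket if ev.good]
        bad = [ev for ev in bucket if not ev.good]
        picked = [bucket[0], bucket[-1]]                        # first and last value
        if good:
            picked.append(max(good, key=lambda ev: ev.value))   # highest value
            picked.append(min(good, key=lambda ev: ev.value))   # lowest value
        if bad:
            picked.append(bad[0])                               # at most one exceptional point
        unique = {ev.time: ev for ev in picked}                 # drop duplicates
        result.extend(unique[t] for t in sorted(unique))
    return result

# One event per second for an hour, reduced for a trend that is 100 intervals wide.
events = [Event(t, (t * 37) % 101, True) for t in range(3600)]
reduced = plot_values_sketch(events, 0, 3600, 100)
print(len(events), "events in,", len(reduced), "events out")
```

Because the interval boundaries depend on start and end, a new time range produces different buckets, so the reduced set from the previous range cannot simply be reused.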

                              • Re: Coresight 2012, plotvalues and server load
                                aldorexbraam

                                Guys.....appreciate the good discussion....that's why I love vCampus :-)

                                 

                                I followed Gregor's hint and investigated the compression settings. It turns out that these are pretty conservative and need some tuning. The customer is understandably cautious about changing anything on his PI server right away.

                                 

                                Ok so that leaves me with the following conclusions:

                                 

                                1) Coresight DOES use PlotValues, but because it also issues a summary request the actual effect is still an archive walk for the points touched (thus virtually nullifying the PlotValues() 'smartness').

                                 

                                2) As is always the case, compression settings need to be revisited over time; this is common 'best practice' and has nothing to do with Coresight.

                                 

                                3) All we can do in the short term is throttle the load Coresight generates on the PI server by serialising the requests issued. This probably reduces Coresight's responsiveness, but it lessens the risk of the PI server choking on excessive calls being made in parallel.

                                 

                                I have done that by limiting the number of concurrent calls made by the Coresight dataservices:

                                 

                                <behavior name="CoresightBehavior">
                                  <serviceThrottling maxConcurrentCalls="8"
                                                     maxConcurrentSessions="4"
                                                     maxConcurrentInstances="16"/>
                                </behavior>
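
The effect of a concurrent-call limit can be pictured as a gate that lets only a fixed number of requests through at once. The sketch below is not how WCF implements serviceThrottling. It only illustrates the idea with a semaphore, and `run_throttled` and `fake_archive_call` are hypothetical stand-ins.

```python
import threading
import time

MAX_CONCURRENT_CALLS = 8  # same value as maxConcurrentCalls in the config above

def run_throttled(call, n_requests, limit=MAX_CONCURRENT_CALLS):
    """Run n_requests calls on threads, at most `limit` at a time; return the peak concurrency seen."""
    gate = threading.BoundedSemaphore(limit)
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def worker(i):
        with gate:  # blocks while `limit` calls are already in flight
            with lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            try:
                call(i)
            finally:
                with lock:
                    state["active"] -= 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]

def fake_archive_call(i):
    time.sleep(0.01)  # stand-in for a data request against the server

print("peak concurrent calls:", run_throttled(fake_archive_call, 40))
```

All 40 requests still run, but never more than 8 at the same moment, which trades client responsiveness for a lower peak load on the server.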

                                  • Re: Coresight 2012, plotvalues and server load

                                    Hello Aldo,

                                     

                                    One other idea that just came to mind requires some more background info. PI Data Archives are organized into records of 1 KB in size. When an archive file is created, a primary record is created for every existing point. When this record is full, a so-called overflow record is assigned. When the overflow record is full and more overflow records are needed, so-called index records come into play; these just store pointers to the records that contain the data. This may not be directly related to your issue, but it is important for understanding why sending data out of order is so expensive: events are stored in time order. One can imagine that the more index records there are, the worse performance becomes, especially when data arrives out of order but also for data retrieval. We recommend not exceeding 5 or 6 index records (I hope I recall that number correctly) per tag and archive because of the performance impact.

                                     

                                    There is an archive check utility that analyzes offline archive files and generates a report with detailed information about archive usage for every single PI point. There is a warning that this tool is hungry for resources, but I haven't seen issues in the rare cases I used it. As far as I remember, it is not necessary to run the analysis on the production PI Data Archive node; I've used the archive check utility on another installation with archive files copied from a PI Server backup.

                                     

                                    Again, I strongly recommend involving OSIsoft Tech Support.

                                     

                                    Another thought based on the information Tom provided:

                                     

                                    Tom LeBay

                                    The real rub is that Coresight does this for every data item on the display because in some way or another we show this summary information on every symbol and we made the design decision to get all this information when the display loads rather than if/when a user asks for it.

                                     

                                    I suggest using short periods (e.g. 1 hour) for PI Coresight displays, at least while the performance issues are not sorted out. For displays that do not contain a trend or a table with the trend column shown, there's no value in selecting a larger period. You can make users who have the right to modify PI Coresight displays aware of the issue and suggest that they not leave displays with large time periods selected.
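
To put rough numbers on the display period: at the roughly 1 event/second reported for the affected tags, the number of archived events a request has to cover grows linearly with the time range. This is a back-of-the-envelope sketch only. The helper name is made up and the rate is the approximate figure from this thread.

```python
EVENTS_PER_SECOND = 1  # approximate archive rate reported for the affected tags

def events_in_range(hours, rate=EVENTS_PER_SECOND):
    """Number of archived events per tag inside a display period of the given length."""
    return int(hours * 3600 * rate)

for label, hours in [("1 hour", 1), ("8 hours", 8), ("1 day", 24), ("30 days", 24 * 30)]:
    print(f"{label:>8}: {events_in_range(hours):>10,} events per tag")
```

A summary over 30 days has to cover 720 times as many events per tag as a summary over 1 hour, and that is multiplied by every data item on the display.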

                                     

                                    EDIT: Add reference to KB00923 - How do you determine which points are archiving the most events