19 Replies Latest reply on May 29, 2015 7:52 AM by larsoleruben

    Compression and Exception

    larsoleruben

      Hi

       

      We are having some discussion about the settings for compression and exception. Our dilemma is classic: no loss of information, but as little data as possible. However, today's trend is moving towards "getting everything with the highest possible resolution" (driven by the business, who have heard of tools such as Hadoop, etc.), which of course leads to an enormous amount of data. Right now we have decided to get most of the data at a resolution of 1 second.

       

      Now to my question: I have been looking at 3rd-party tools, like the one from patterndiscovery, which could help me optimize the settings. Does anybody have any experience with those kinds of tools? And what are your experiences?

       

      Are there any native OSIsoft tools for this?

        • Re: Compression and Exception
          asorokina

          Hi Lars,

           

           

           

           SunCoke shared their experience with the Pattern Discovery tool in a webinar, which you can find on our partners website: http://partners.osisoft.com/news.aspx?id=319

           

           A similar tool from a different partner: http://partners.osisoft.com/solutions.aspx?id=143

           

          And there is no official native OSIsoft tool for that yet.

           

           

           

          Regards,

           

          Anna 

            • Re: Compression and Exception
              Rhys Kirk

              Disk space is cheap, loss of data is priceless.

               

              Perhaps you only need compression for consecutive duplicates & ramping values, but for the most part you don't need to apply compression. Have you considered that route?

                • Re: Compression and Exception
                  larsoleruben

                   Yes, but considering the OSIsoft recommendations (the relationship between snapshot and archived events of between 1:3 and 1:10) I am hesitant to do it. But if you have experience with no compression and no performance problems, that would be a welcome input.

                    • Re: Compression and Exception
                      mhalhead

                      Hi Lars,

                       

                       While I agree with Rhys that disk space is cheap, compression is more about efficiency than space. The more events, the higher the load on things like the calculation engines, the clients, reporting, and so on; basically, there is more data to push across the wire. The reality is that process data is generally very sparse; e.g. 86,400 values (1 s logging) over a day can often be adequately described with, say, 8,000 points.

                       

                       I have tried a few of the compression tuning tools, admittedly not the tool mentioned. To be blunt, I've been underwhelmed by these tools. We have largely stuck to a set of empirical rules, which is working okay. For tuning, I'm more interested in the points with a low event count than in those with a high count; for this I simply pull a point count over a period of time.

                       

                       PI can happily handle enormous quantities of data. Will no compression impact performance? In my opinion, yes. In most cases it won't hurt logging performance (unless you are trying to log 23 million tags), but it will probably impact client performance.

                       

                       A comment about Hadoop: while everyone has heard the big-data hype, few really understand much about it. With tools like Hadoop, one of the first things you typically do (I'm not a Hadoop expert) is run a MapReduce job to get the data; in other words, you return a reduced data set for analysis. PI is just doing the reduce bit up front.

                        • Re: Compression and Exception

                          Amen to everything Michael has said.  

                            • Re: Compression and Exception
                              mjarvis

                              Michael -

                               

                               I'm trying to understand your position on compression. In your experience, does the PlotValues function used by visual tools like PI ProcessBook adequately reduce the number of values? For reporting in applications like PI DataLink, do you find that calculation results are slower if you do not have compression turned on? I realize I'm playing devil's advocate here, so please feel free to tell me how you really feel.

                               

                              The reason I'm asking is that I'm researching some functionalities around data storage and transmission. I've found that many PI Points are really not very active at the snapshot, and the median data density to the snapshot is approximately 4.7 events/100 points/sec.

                               

                              Mike Jarvis

                                • Re: Compression and Exception

                                  Dear all,

                                   

                                   I don't know the exact historical background of Exception and Compression, but I assume they were a requirement in the early PI days because of the hardware available back then and the cost of disk. Current hardware performs much better and overall costs, e.g. disk costs, are a lot lower these days, but Exception and Compression are still more than just a cool feature.

                                   

                                   Please allow me to quote from the PI Server 2012 System Management Guide:

                                   

                                  You can tune your PI points for maximum efficiency with the configurable attributes that specify compression and exception reporting. The configuration of these specifications impacts the flow of data from the interface node to the server for that point (exception reporting) and the efficiency of data storage in the archive for that point (compression testing).

                                   

                                  and

                                   

                                  Using compression gives you the flexibility to configure on a per-point basis, with the option of archiving relevant information. Compression greatly impacts performance, bandwidth, and data access. It is not intended only for saving storage space. You want to store only meaningful data: no noise, no rounding, and no averages. OSIsoft's compression method is designed to remove noise out of the signal, because noisy signals are prevalent in process data. PI Server stores the actual values received from the source, not interpolations or averages or approximations as do some alternative compression methods.

                                   

                                   Please note that performance is the main argument for using Exception and Compression. Even though disk space might be cheap, I've often seen disk performance being the bottleneck, especially with RAID 5 configurations.

                                   

                                   Please keep in mind that PI Data Archive is not only supposed to store the data in archives; it's also supposed to provide users with data when they ask for it. Many of us like drinking coffee or tea, but nobody finds performance acceptable when a query takes long enough to justify going to refill the cup. In Michael's example the number of events per day is reduced by a factor greater than 10. The difference between 8,000 and 86,400 appears small in the computer age, but please do the exercise and sum it up for a week, a month, a year ... When users query a period of data, PI Archive Subsystem needs to retrieve the data if it's not cached, and the archive cache is less effective the more frequently events are stored. Accuracy versus performance is a tradeoff, but where's the value in storing more events than needed?

                                   

                                   How many points do your ProcessBook, PI Coresight or PI Web Parts trends show? What is the default period for these trends? Assume the display resolution is 2,000 points and you are trending 24 hours. OK, you may want to zoom in, but would you require the "accuracy" of 86,400 events, or does 8,000 sound more reasonable?

                                   

                                   There's a difference between the accuracy of a sensor and the display accuracy, or what's provided through the DCS. Assume the display shows 3 decimal places, but looking into the sensor's data sheet you recognize that it is only accurate to 1 decimal place, yet the 2 additional display digits keep moving. What does that mean? Would you trust these numbers, or consider them "misleading"? So let's get rid of this signal noise by setting excdevpercent accordingly.
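
 

                                   To illustrate the kind of dead-band filtering that exception reporting does, here is a minimal sketch in Python. It is my own illustration, not the interface source code; the readings and the 0.05 dead band are made-up values for a sensor that is only accurate to 1 decimal place.

def exception_filter(readings, exc_dev, exc_max):
    """readings: list of (time, value); returns the values reported to the server."""
    reported = [readings[0]]
    held = readings[0]                       # most recent reading not yet reported
    for t, v in readings[1:]:
        last_t, last_v = reported[-1]
        if abs(v - last_v) > exc_dev or (t - last_t) >= exc_max:
            if held != reported[-1]:
                reported.append(held)        # also send the value just before the step
            reported.append((t, v))
        held = (t, v)
    return reported

# jitter in the 2nd/3rd decimals is dropped, the real step at t=3 is kept
noisy = [(0, 20.001), (1, 20.004), (2, 20.002), (3, 20.210), (4, 20.212)]
print(exception_filter(noisy, exc_dev=0.05, exc_max=600))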

                                   

                                   We also deal with a lot of different kinds of measurements. There are slow- and fast-changing signals with a lot of gradations in between. Who samples outside temperature, humidity and the like at a 1-second frequency?

                                   

                                   Suppose you would like to construct a straight line. In a coordinate system this could be anything from a horizontal to an almost vertical line. How many points would you draw, 2 or 20? I hope you would answer: "Just 2". This is exactly what PI does when you set compression=1 and compdevpercent=0. As long as the slope remains constant, values are compressed away; when the slope changes, the (snapshot) value before the slope change is pushed into the archive. Therefore, please never disable compression; instead set compdevpercent=0 when you don't want to lose precision.
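
 

                                   Here is a rough Python sketch of what that means for compdevpercent=0. It is my own simplified illustration of the slope test, not OSIsoft's exact swinging-door implementation: only values where the slope changes make it into the archive, so a constant ramp collapses to its end points.

def compress(points, comp_dev=0.0, eps=1e-9):
    """points: list of (time, value); returns the subset that would be archived."""
    if len(points) < 3:
        return list(points)
    archived = [points[0]]            # the first value is always kept
    snapshot = points[1]              # newest value, not yet archived
    for t, v in points[2:]:
        t_a, v_a = archived[-1]
        t_s, v_s = snapshot
        # value predicted at t by the line through the last archived point and the snapshot
        predicted = v_a + (v_s - v_a) / (t_s - t_a) * (t - t_a)
        if abs(v - predicted) > comp_dev + eps:
            archived.append(snapshot) # slope changed: the held snapshot goes to the archive
        snapshot = (t, v)             # the new value becomes the snapshot
    archived.append(snapshot)         # the final snapshot is flushed at the end
    return archived

# a 1-second ramp for one minute followed by a flat minute: 120 snapshot events, 3 archived
ramp = [(t, float(t)) for t in range(60)] + [(t, 59.0) for t in range(60, 120)]
print(len(compress(ramp)))            # -> 3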

                                   

                                   I believe Technical Support has some "basic" recommendations for excdevpercent and compdevpercent, but they can only be used as an orientation because we don't know any details of your process or the sensors. Analyzing signals and evaluating excdevpercent and compdevpercent settings programmatically is at least an interesting and logical approach. I would be interested to learn from users who have experience with the tool from Patterndiscovery.

                                   

                                   

                              • Re: Compression and Exception
                                Rhys Kirk

                                 A compression deviation of 0 is more common than you may realise; a lot of companies have it as a standard.

                                 

                                 If data is not changing, or is on a steady ramp, then you already store it efficiently with a compression deviation of 0. If data is changing, it should be stored. If your analyses or users only require sampled data, there are methods for extracting the required sampled data, or you can surface the data via other methods as they require it. You store more and extract 'fit for purpose' data from that; filtering data is possible, filling in missing data is not (or at least is extremely difficult).

                                 

                                 A PI event is roughly 12 bytes. If a process signal is not deviating for 60 seconds, then with CompDev 0 and a 1-second scan frequency you'll have 2 PI events archived - 24 bytes. So if your process data is sparse you'll naturally only have your 8,000 PI events over the course of 24 hours... but during noisy periods of process data everything gets logged - no need to auto-tune your compression settings. I often used to hear users say they wished they had more granular data for past events so they could retrospectively analyse those events in more detail.
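
 

                                 As a back-of-the-envelope check on those numbers (sticking with the rough 12-bytes-per-event figure above; the real on-disk size varies with point type and archive version), a quick sketch:

BYTES_PER_EVENT = 12          # rough figure, see above

def bytes_per_day(scan_period_s, fraction_of_scans_archived):
    """Approximate daily archive footprint for one tag with CompDev = 0."""
    events = (86_400 / scan_period_s) * fraction_of_scans_archived
    return events * BYTES_PER_EVENT

print(bytes_per_day(1, 8_000 / 86_400))   # 96000.0 bytes/day for the sparse 8,000-event tag
print(bytes_per_day(1, 1.0))              # 1036800.0 bytes/day (~1 MB) if every scan is archived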

                                 

                                 

                        • Re: Compression and Exception
                          ernstamort

                           Setting the exception and compression settings is not that difficult if you consider the following two points:

                           

                           1) The OSIsoft compression algorithm is far superior to the exception deviation algorithm. Therefore data reduction should mostly happen during compression; in other words, the compression deviation should be several times (say 5x) the exception deviation.

                           2) The OSIsoft compression algorithm is also much better than simply polling at a slower rate without compression. For example, instead of polling with a scan rate of 5 sec (12 points/min), you would be better off polling at a scan rate of 1 sec and compressing the data down to 12 points/min.

                           

                           So you basically have to decide how many points you want to archive per minute and adjust the compression settings accordingly. This process can be performed offline, for example with PI DataLink in Excel.
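
 

                           A sketch of how that offline exercise could look in code (my own illustration; the slope test below only approximates OSIsoft's compression algorithm, and the synthetic noisy ramp stands in for raw data you would export with PI DataLink):

import random

def archived_count(points, comp_dev):
    """Count events kept by a simple slope/deviation test (an approximation of PI compression)."""
    if len(points) < 3:
        return len(points)
    kept, snapshot = [points[0]], points[1]
    for t, v in points[2:]:
        t_a, v_a = kept[-1]
        t_s, v_s = snapshot
        predicted = v_a + (v_s - v_a) / (t_s - t_a) * (t - t_a)
        if abs(v - predicted) > comp_dev:
            kept.append(snapshot)
        snapshot = (t, v)
    return len(kept) + 1              # +1 for the final snapshot

def tune(points, candidates, target_per_min):
    """Report events/min for each candidate CompDev and flag those meeting the target."""
    minutes = (points[-1][0] - points[0][0]) / 60.0
    for dev in candidates:
        per_min = archived_count(points, dev) / minutes
        flag = "   <-- meets target" if per_min <= target_per_min else ""
        print(f"CompDev {dev:g}: {per_min:.1f} events/min{flag}")

# one hour of noisy 1-second samples; aim for roughly 12 archived points/min
random.seed(1)
raw = [(t, 0.01 * t + random.gauss(0, 0.05)) for t in range(3600)]
tune(raw, candidates=[0.0, 0.05, 0.1, 0.2, 0.5], target_per_min=12)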

                           

                          Hope this helps

                           

                          Holger

                            • Re: Compression and Exception
                              pmackow

                              My 3 cents:

                               1. Forget exception unless you have network issues. Set excdev to zero and leave excmin and excmax at their default values. It costs nothing in terms of performance and disk usage, and it saves you one parameter to be aware of.

                               1a. If you use PI OPCInt, set advise mode if supported; the exception test will then be done for you in the OPC server.

                               2. Start with compdev set to a very low nonzero value, say 0.1% of span.

                               3. After 1-2 weeks of data collection, identify the tags that have the most events recorded per day (a sketch of this step follows the list).

                               4. Perform fine-tuning only on the tags found, following OSIsoft guidelines.
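
 

                               For step 3, something along these lines could do the ranking once you have exported per-tag daily event counts; the file name and the "tag,events_per_day" columns are assumptions about whatever export you already produce (e.g. from PI DataLink).

import csv

def busiest_tags(path="event_counts.csv", top=20):
    """Return the tags with the most archived events per day, highest first."""
    with open(path, newline="") as f:
        rows = [(r["tag"], int(r["events_per_day"])) for r in csv.DictReader(f)]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]

for tag, count in busiest_tags():
    print(f"{tag}\t{count}")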

                            • Re: Compression and Exception
                              gap4203

                               We have purchased the Exele PI Tuning Tools and I have been very pleased. We don't have compression set on many of our tags, and as expressed above, this can definitely hamper client performance, especially when engineers/analysts want to pull years' worth of data. The hard part is convincing the engineers that they will still be able to make the same business decisions with compressed data. I've found this tool useful because you can plug in different compression values and then show the original and compressed data side by side. My plan is to lump similar measurements together and gather consensus on the appropriate amount of compression to apply across a group of tags. The benefit to the end users is that they are able to retrieve data into client applications much faster.
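
 

                               As an illustration of that original-versus-compressed comparison (my own sketch, not part of the Exele tool): interpolate the compressed series back onto the original timestamps and report the worst-case and average error, which is an easy number to put in front of engineers when agreeing on CompDev for a group of tags.

from bisect import bisect_right

def reconstruction_error(original, compressed):
    """original, compressed: lists of (time, value); compressed is a subset of original."""
    times = [t for t, _ in compressed]
    errors = []
    for t, v in original:
        i = bisect_right(times, t)
        if i == 0 or i == len(times):
            continue                              # outside the compressed range
        (t0, v0), (t1, v1) = compressed[i - 1], compressed[i]
        interpolated = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        errors.append(abs(v - interpolated))
    if not errors:
        return 0.0, 0.0
    return max(errors), sum(errors) / len(errors)

# worst-case and mean error of a crude 2-point compression of a small series
original = [(0, 1.0), (1, 1.2), (2, 1.1), (3, 1.4), (4, 1.5)]
compressed = [(0, 1.0), (4, 1.5)]
print(reconstruction_error(original, compressed))   # (0.15, 0.0625)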

                              • Re: Compression and Exception
                                dtayler

                                 One more thing to add regarding exception settings (Gregor touched on this above): exception settings at the interface are really there to filter out noise and therefore only send relevant data across the network. What do I mean by noise? Anything below the accuracy threshold of the instrument. Compression settings should filter out (at a bare minimum) any consecutive repeated values.

                                  • Re: Compression and Exception
                                    ernstamort

                                     Both the exception and compression algorithms filter out noise. The compression algorithm just does a much better job of reproducing the signal, whereas the exception algorithm can lead to signal distortion.

                                     I guess the dead-band filter was chosen in the past because it is a much faster algorithm with lower overhead.

                                      • Re: Compression and Exception
                                        dtayler

                                        Hi Holger,

                                         

                                         You're right that they can both filter out noise (depending on the settings). If the exception settings filter out any data that is below the instrument's accuracy threshold, then you are left with only "good" data at the PI Server. This is the intention of the exception settings on the interface.

                                         

                                         Compression is there to enable you to store only the data you need in order to reliably reproduce the signal. Depending on many factors, this will be different for different customers; indeed, for regulatory reasons, some customers are not able to perform compression at all and must store all values. Many years ago disk space was a concern, but not so much now. The two algorithms look similar but work in quite different ways: they both use a dead band, but compression takes into account the slope of the data, which exception does not.