12 Replies Latest reply on Jun 12, 2012 12:46 AM by RJKSolutions

    Should buffered data have an expiration date?

      Wonder if I can run something by the community about buffered data, particularly from the PI Buffer Subsystem.

       

      At the moment you can buffer data for a PI Server or members of a PI Collective, which works wonderfully most of the time.  Whilst adding PI Buffer Subsystem into a recent system design I have been wondering if there should be an option in PIBuffss to allow an expiration of data that is sat in the buffers.  The reason is that typically I want to buffer for short network disconnects so that I get data back when reconnected and it can be acted upon.  However, in some cases during a long disconnect (let's say 7 days) I may not actually want all data for the entire 7 days, just the last 1 day, because in a fast continuous process one of the only benefits of the data backfilled by PIBuffss is seeing what happened, not necessarily being able to act upon that data.  Plus, upon reconnect PIBuffss is going to need some time to recover before real-time data is streaming again.  It might make sense to keep only the last 12 or 24 hours of data to allow quicker recovery, so that data can still be acted upon, ...  In fact, here I would actually like to see some Event Frame integration with the core PI Server by having Event Frames generated to indicate when events like PIBuffss recovery have taken place.  It is all too easy to look back at data in a PI Server and question why operators didn't act upon that data; if the data wasn't there at the time, there is no way of knowing afterwards that it was automatically backfilled because of a network issue (or other issue).

       

      A scenario where this may be applicable is using PI to PI amongst PI Servers/Collectives across high latency network connections (WAN).  You essentially have a "collector" PI Server and then some kind of aggregated central PI Server/Collective.  You can use PIBuffss to enable some good performance in this scenario from collector to aggregator, but if the two become disconnected for a long period you can experience a long time to recover.  You may only want to ensure you have the last 7 days of data to minimise the network traffic of a high latency network, reduce the PIBuffss footprint on the hosting node (100,000 tags @ 1Hz for 1 day = 8,640,000,000 events to process), etc.  After all, you can always go to the collector to get the data when it is needed.
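To put the figure in brackets above into perspective, here is a small back-of-envelope sketch in Python. It only reproduces the arithmetic from the post; the function names and the drain rate in the usage note are made-up assumptions for illustration, not measured PIBuffss numbers.

```python
# Back-of-envelope sizing for a buffer backlog (illustrative figures only).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def backlog_events(tags: int, rate_hz: float, days: float) -> int:
    """Events queued if every tag produces rate_hz events per second."""
    return int(tags * rate_hz * SECONDS_PER_DAY * days)


def recovery_hours(backlog: int, drain_rate: float, live_rate: float) -> float:
    """Hours needed to clear the backlog while live data keeps arriving.

    drain_rate and live_rate are in events per second. The drain rate is an
    assumed number chosen by the caller, not a PIBuffss specification.
    """
    if drain_rate <= live_rate:
        raise ValueError("the buffer never catches up: drain rate must exceed live rate")
    return backlog / (drain_rate - live_rate) / 3600


one_day = backlog_events(100_000, 1, 1)     # the 1-day figure from the post
seven_days = backlog_events(100_000, 1, 7)  # the same system after a 7-day outage
```

With an assumed drain rate of 300,000 events/s against 100,000 events/s of live data, `recovery_hours(one_day, 300_000, 100_000)` works out to 12 hours, and a 7-day backlog takes seven times as long, which is the motivation for capping how much is kept.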

       

      Anyway, what are your thoughts?

        • Re: Should buffered data have an expiration date?
          MvanderVeeken

          I think you bring up some valid points, and I think this also needs some attention from the OSIsoft Product Managers and possibly Developer (Leads). I contacted Jay Lakumb (PI Server Product Manager) and Charlie Henze (PISDK Development Lead) to chime in.

           

          For me personally these are very good points. I especially like the concept of having tighter Event Frames integration in these situations.

           

          Am I reading it correctly that you are suggesting a data expiration for buffered data, and data older than the expiration date will be lost? Or are you suggesting that data before the expiration date will have a different set of rules regarding backfilling by pibuffss? For instance, a lower-priority backfilling with slower data rates. This would allow a faster recovery without data loss, while still fully archiving the buffered data.
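The second reading (real-time data first, older data trickled in behind it) could look roughly like the Python sketch below. It is a toy model with two in-memory queues and a hypothetical `interleave` function; it says nothing about how pibuffss is actually implemented and ignores the out-of-order data question entirely.

```python
from collections import deque
from typing import Any, Iterator, Tuple


def interleave(live: deque, backlog: deque,
               backlog_per_live: int = 1) -> Iterator[Tuple[str, Any]]:
    """Yield events so that live data is never starved by the backlog.

    After each live event, at most backlog_per_live backlog events are sent.
    When no live events are waiting, the backlog drains freely.
    """
    while live or backlog:
        if live:
            yield ("live", live.popleft())
            for _ in range(backlog_per_live):
                if not backlog:
                    break
                yield ("backlog", backlog.popleft())
        else:
            yield ("backlog", backlog.popleft())
```

A smaller `backlog_per_live` favours real-time streaming at the cost of a longer total recovery; nothing is discarded either way.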

           

          edit: grammar

            • Re: Should buffered data have an expiration date?

              My initial idea was to have a rolling automatic expiration of data (e.g. *-7d = expired) and then that data is deleted from the buffer files.  I quite like the idea of having a secondary buffering priority that allows real-time data to continue streaming but then will drip feed the backlog of events.  Not sure on the complexities of that or the out of order data implications, but that is not really my concern.  I always imagined just removing the expired data, but I suppose it makes sense to be able to back up the expired data to files, files that can be reprocessed by PIBuffss when required but not automatically.

               

              For now, maybe it makes sense to have a manual option in the PIBuffss command utility to set the expiration date and then either delete or back up those events, rather than completely emptying the buffer files as you can do now.
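A minimal Python sketch of the rolling expiration described above, with the choice between deleting and backing up expired events. The `Event` class and `expire` function are hypothetical names for illustration; this models the buffer as a time-ordered in-memory queue and is not a PIBuffss command or API.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Event:
    tag: str
    timestamp: datetime
    value: float


def expire(buffer: deque, now: datetime, max_age: timedelta,
           backup: Optional[List[Event]] = None) -> int:
    """Remove events older than now - max_age from a time-ordered buffer.

    The oldest events sit at the left of the deque. If backup is given,
    expired events are appended to it instead of being lost.
    Returns the number of events removed.
    """
    cutoff = now - max_age
    removed = 0
    while buffer and buffer[0].timestamp < cutoff:
        event = buffer.popleft()
        if backup is not None:
            backup.append(event)
        removed += 1
    return removed
```

Calling `expire(buffer, now, timedelta(days=7))` corresponds to the "*-7d = expired" rule with deletion; passing a list as `backup` keeps the expired events for manual reprocessing later.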

                • Re: Should buffered data have an expiration date?
                  MvanderVeeken

                  Glad you like the idea. I think some of the OSIsoft developers will hate me for suggesting that.

                    • Re: Should buffered data have an expiration date?
                      hanyong

                      @Michael: You are right, sounds like you just made the requirement more complex

                       

                      @Rhys: The way I see your requirements, there are 2 big suggestions

                       

                      1. Have more flexibility to control the behaviour of PI Buffer Subsystem such that it doesn't spend too much time trying to backfill the data.

                       

                      I agree this makes a lot of sense, as most users will only consider the system as normal when they see the data being updated in real time, and that would include the time required for PI Buffer Subsystem to finish backfilling the data in the buffer into PI Server.

                       

                      2. Using Event Frames to record system events, with PI Buffer Subsystem recovering/backfilling data considered one type of these events. (Am I making the same mistake as Michael?)

                       

                      I believe this would be a feature that would be useful for system administrators to record or find any abnormal events that happened in the system. Maybe the other community members who are more involved in system administration will share their thoughts on this.

                       

                       

                       

                       

                        • Re: Should buffered data have an expiration date?
                          dvacher

                          Thank you Rhys.

                           

                          1. We already heard about a LIFO (Last In, First Out) requirement for pibufss, and also a cap on buffer space which would discard the oldest data. It sounds like these 2 features combined would address your scenario. Shameless plug: PI Server 2012 will be so fast at recovering buffered data that you may not need these... but then our customers will soon figure out ways to create even bigger systems

                           

                          2. Now that's a brilliant idea I never heard before. Not easy to pull off, but that's no reason to give up. I can't make you a promise (we rarely keep them anyhow) but I will sure keep this one in my back pocket.

                           

                          One thing for sure: you really deserve your All-Star awards. Thanks again!
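For point 1 above, the combination of LIFO draining and a space cap that discards the oldest data can be sketched in a few lines of Python. `CappedLifoBuffer` is a hypothetical class for illustration only; it caps by event count rather than disk space and is not how pibufss works internally.

```python
from collections import deque


class CappedLifoBuffer:
    """Keeps at most `capacity` events, discards the oldest when full,
    and drains newest first (Last In, First Out)."""

    def __init__(self, capacity: int):
        # A deque with maxlen silently drops items from the left (the
        # oldest) when a new item is appended on the right of a full deque.
        self._events = deque(maxlen=capacity)

    def push(self, event) -> None:
        self._events.append(event)

    def drain(self):
        """Yield buffered events, newest first, emptying the buffer."""
        while self._events:
            yield self._events.pop()

    def __len__(self) -> int:
        return len(self._events)
```

Pushing events 1 to 5 into a buffer of capacity 3 keeps only 3, 4 and 5, and draining returns them as 5, 4, 3.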

                            • Re: Should buffered data have an expiration date?
                              jlakumb

                              This is a good discussion.  We captured the enhancement request for #1 in PLI #23465OSI8.

                               

                              Just to throw some more mud into the water, I wonder if another option is to designate some tags as "high priority", so those tags are unbuffered first and sent to the PI Server?  This way you can have control over which tags should send their data first once the connection to PI Server is restored.  Alternatively, you could control whether to send newer data first, as Rhys and Michael suggested.  I would like to understand which of these is preferable, or maybe both?

                               

                              The use case for the first option (send newer data first) is fairly obvious, i.e. newer data is more actionable and immediate, so give this priority.  The use case for the second option is that some tags, e.g. alarm data, may be more time-sensitive, so always give that priority over any historical data when emptying the buffers (although one could argue buffered alarm data is probably not really that important anymore)...
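To make the "high priority" tags option concrete, here is a rough Python sketch that unbuffers events by a per-tag priority while keeping arrival order within each priority level. The function name, the tuple layout and the priority table are all assumptions for illustration; no such setting exists in the product as far as this thread shows.

```python
import heapq
from itertools import count


def drain_by_priority(events, priority_of, default=10):
    """Yield buffered events, highest-priority tags first.

    events:      iterable of (tag, timestamp, value) in arrival order.
    priority_of: dict mapping tag name to priority; lower is sent sooner.
    Tags missing from priority_of get the default priority.
    Arrival order is preserved among events with the same priority.
    """
    arrival = count()  # tie-breaker so equal priorities stay in arrival order
    heap = [(priority_of.get(tag, default), next(arrival), (tag, ts, value))
            for tag, ts, value in events]
    heapq.heapify(heap)
    while heap:
        _, _, event = heapq.heappop(heap)
        yield event
```

Swapping the tie-breaker for the negated timestamp would combine this with the "send newer data first" option, which is one way the two proposals could coexist.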

                               

                              Any comments on this?

                                • Re: Should buffered data have an expiration date?
                                  aabrodskiy

                                  Jay,

                                   

                                  I would say both cases are very valid, although if it were a matter of development priority, I would personally vote for the first option (send newer data first/configurable), as this seems to become more important the bigger our servers get.

                                    • Re: Should buffered data have an expiration date?

                                      Wow, wouldn't that be a nice version of the PI Buffer Subsystem

                                       

                                      - Data expiration
                                      - LIFO  (a bit like those folks who are allowed to board the plane first even though they only just got to the airport and I've been sat there for over an hour...)
                                      - Priority tags 

                                       

                                      Oh, and could we also throw in an Update Subsystem for PIBuffss.

                                       

                                      Jay, I would like to see the ability to have the data expire first and then have some sort of priority of tags, which I think depends on who (business function) is setting the priority.  Now it would actually be good to be able to group tags together in something like AF and rank the groups on priority, so that during plant upsets, or other significant events, you can quickly re-rank buffering priorities from a central location and have it propagate to all PIBuffss instances that are subscribed to the AF ranking (maybe the message of priorities is distributed by the PI Server update subsystem to those signed up PIBuffss instances?).  Seeing as AF is a prerequisite for a PI Server, you can establish this intimate link between AF & Interface nodes.  Or even take it to another level and automatically re-rank based on event frames, but I am just getting carried away now...

                                       

                                      Denis, what you guys need are some talented 3rd party developers from an awesome systems integrator to help you guys make these types of enhancements a reality - you just need to pass us some source code for the PI Server & PIBuffss.

                                        • Re: Should buffered data have an expiration date?
                                          MvanderVeeken

                                          Rhys @ Wipro

                                          Denis, what you guys need are some talented 3rd party developers from an awesome systems integrator to help you guys make these types of enhancements a reality - you just need to pass us some source code for the PI Server & PIBuffss.

                                           

                                          lol, nice try Rhys  

                                          • Re: Should buffered data have an expiration date?

                                            Rhys @ Wipro

                                            (a bit like those folks who are allowed to board the plane first even though they only just got to the airport and I've been sat there for over an hour...)
                                            The part of the story you are missing is that these guys sat on airplanes, heard babies cry and ate flight food for much, much more time than some others throughout the year, including weekends and odd hours...

                                             

                                             

                                              • Re: Should buffered data have an expiration date?

                                                Here is another thought.  We all need multiple collective buffering from a single PIBuffss instance, right?  Well I would like to have that ability.  This leads nicely into the next daydream I had...

                                                 

                                                Rather than destroying data that has expired, it might be useful in some scenarios to be able to send expired data to 'another' PI Server/Collective that is used as a form of cold storage, which can be administered when required by the PI Administrators.  For example, you have one high resolution PI Collective where data has to be actioned within an hour (or so), and it is being sent data from multiple PI interface nodes running PIBuffss.  There is a network disconnect across network topologies during which time the data is buffered, with a tolerance of keeping only 1 day's worth of data.  Anything older than 1 day is set to 'auto-expire' in PIBuffss (after you guys build this switch in for us).  Although this data is no longer required on the high resolution PI Server/Collective, it can be diverted to a (closer/nearer/extremely low latency) secondary PI Server/Collective where it can be dealt with at the leisure of the PI Administrator - archive merge with high resolution PI Collective, PI to PI, etc.  This way you don't 'destroy' data, but you have your auto-expiring data stored elsewhere to allow fast recovery and streaming of real-time data again from PIBuffss.  I guess if you run a PI Server on the PIBuffss node with this functionality I am essentially talking about SSB.

                                          • Re: Should buffered data have an expiration date?

                                            Jay Lakumb

                                            Just to throw some more mud into the water, I wonder if another option is designate some tags as "high priority", so those tags are unbuffered first and sent to the PI Server?

                                             

                                            Jay, I really wished I had this feature today.  I had a bunch of queued events (around 10,000,000) but among them were some Performance Monitor events.  I really wanted to watch the performance counters on the destination PI Server to monitor the performance of PIBufss, the source PI Server and disk space on the PIBufss server.  I didn't mind if all the other tag data was being backfilled as usual, but could have done with giving Performance Monitor tags higher priority to have them scanning in real-time.  Say you implemented something like this, how would you categorise a tag as higher priority to PIBufss?