RJKSolutions

Should buffered data have an expiration date?

Discussion created by RJKSolutions on Dec 7, 2011
Latest reply on Jun 12, 2012 by RJKSolutions

Wonder if I can run something by the community about buffered data, particularly from the PI Buffer Subsystem.


At the moment you can buffer data for a PI Server or the members of a PI Collective, which works wonderfully most of the time.  Whilst adding the PI Buffer Subsystem into a recent system design, I started wondering whether PIBuffss should offer an option to expire data sitting in its buffers.

Typically I buffer to cover short network disconnects, so that data is backfilled on reconnection and can be acted upon.  During a long disconnect (let's say 7 days), however, I may not actually want all 7 days of data, just the last day.  In a fast continuous process, about the only benefit of the older data is seeing what had happened after the fact; it is too late to act upon it.  On top of that, PIBuffss needs time to recover after a reconnect before real-time data is streaming again, so retaining only the last 12 or 24 hours would allow a quicker recovery and leave you with data you can still act upon.

In fact, I would also like to see some Event Frame integration with the core PI Server here, with Event Frames generated to indicate when events like a PIBuffss recovery have taken place.  It is all too easy to look back at data in a PI Server and question why operators didn't act upon it; without such a marker there is no way of knowing that the data was automatically backfilled because of a network issue (or other issue).
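To make the idea concrete, here is a minimal sketch (purely hypothetical, nothing like the real PIBuffss internals or API) of a buffer that expires events older than a configurable window instead of keeping everything for backfill:

```python
from collections import deque

class ExpiringBuffer:
    """Hypothetical sketch of the proposed behaviour: buffered events
    older than max_age_seconds are dropped rather than backfilled."""

    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds
        self._events = deque()  # (timestamp, value) pairs, oldest first

    def append(self, timestamp, value):
        self._events.append((timestamp, value))
        # Evict anything that has aged out of the retention window.
        while self._events and timestamp - self._events[0][0] > self.max_age:
            self._events.popleft()

    def drain(self):
        """Return the surviving events for backfill on reconnect."""
        events = list(self._events)
        self._events.clear()
        return events

# Example: a 7-day outage with 1 Hz data, but only the last
# 24 hours retained for backfill.
buf = ExpiringBuffer(max_age_seconds=24 * 3600)
for t in range(7 * 24 * 3600):      # one event per second for 7 days
    buf.append(t, 0.0)
survivors = buf.drain()
print(len(survivors))               # 86401 -- roughly one day of events
```

The point of the deque here is that eviction only ever touches the oldest end, so the cost of expiring stays constant per event no matter how long the outage lasts.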


A scenario where this may be applicable is using PI to PI between PI Servers/Collectives across high-latency network connections (WAN).  You essentially have a "collector" PI Server and some kind of central aggregating PI Server/Collective.  PIBuffss gives you good performance from collector to aggregator in this scenario, but if the two are disconnected for a long period, recovery can take a long time.  You may only want the last 7 days of data, to minimise traffic over a high-latency network, reduce the PIBuffss footprint on the hosting node (100,000 tags @ 1 Hz for 1 day = 8,640,000,000 events to process), and so on; after all, you can always go to the collector when the older data is needed.
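The backlog figure quoted above is just tag count × scan rate × window length, which is easy to check:

```python
tags = 100_000
rate_hz = 1
seconds_per_day = 24 * 3600          # 86,400 s

for days in (1, 7):
    events = tags * rate_hz * seconds_per_day * days
    print(f"{days} day(s): {events:,} events to recover")
# 1 day(s): 8,640,000,000 events to recover
# 7 day(s): 60,480,000,000 events to recover
```

Capping retention at one day would cut a full 7-day backlog by a factor of seven before recovery even starts.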


Anyway, what are your thoughts?
