19 Replies Latest reply on Jul 31, 2015 12:51 AM by hgunterman

    Best Practices for Future Data in PI


      Hey vCampus team,


      We will be using PI to store operational data from in-home demand-management devices (such as controllable thermostats and relay/measure modules). These devices record temperature, power, voltage and the like and are also controllable for the purposes of targeted demand response. We have the capability to forecast the usage of energy-using devices in the home for which we have data, most notably HVAC systems and water heaters. We will be publishing these forecasts at regular intervals to a utility operator who makes day-ahead and hour-ahead operating decisions based on the forecasts.


      Assume the day-ahead forecast is prepared at 2pm for the 36-hour period beginning at midnight the next day. Assume the hour-ahead forecast is prepared on the day-of every hour for the period beginning two hours in the future and ending at midnight (e.g., the forecast for 4pm - midnight today is prepared and stored in PI at 2pm). Both of these forecasts are 'rolling' in that they have regular overlapping periods with older forecasts. We want to retain all of these forecasts. In other words, we never want to overwrite any part of an older forecast with a newer one. All forecasts are done with timestamps at 15-min intervals.
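      To make the two windows concrete, here is a small sketch of the schedule described above (names are illustrative, not from any PI API):

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=15)  # forecasts are timestamped at 15-min intervals

def day_ahead_window(prepared_at):
    """Day-ahead forecast: 36-hour period beginning at midnight the next day."""
    midnight = prepared_at.replace(hour=0, minute=0, second=0, microsecond=0)
    start = midnight + timedelta(days=1)
    return start, start + timedelta(hours=36)

def hour_ahead_window(prepared_at):
    """Hour-ahead forecast: from two hours ahead to midnight of the same day."""
    start = prepared_at.replace(minute=0, second=0, microsecond=0) + timedelta(hours=2)
    end = prepared_at.replace(hour=0, minute=0, second=0, microsecond=0) + timedelta(days=1)
    return start, end

def timestamps(start, end):
    """All 15-minute timestamps in [start, end)."""
    out, t = [], start
    while t < end:
        out.append(t)
        t += STEP
    return out
```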


      We want to know what our options are for storing these time-series forecasts (i.e., future data) in PI, and what is considered current best practice. From my discussions with folks at OSI and reading through the discussion forums, I am aware of several possibilities:


      1) Use primary and secondary tags, where the primary tag stores the value and the secondary tag stores a time offset that is added to the timestamp of the primary tag. Use programmatic logic to apply the offset and reconstitute the forecast when publishing it.
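      A minimal sketch of the reconstitution logic for option 1 (the function and variable names are illustrative, not part of any PI API; in practice the two dictionaries would be filled from the primary and secondary PI tags):

```python
from datetime import datetime, timedelta

def reconstitute(primary, offsets):
    """Rebuild a forecast from a primary tag (stored timestamp -> value)
    and a secondary tag (stored timestamp -> offset in seconds to add)."""
    forecast = {}
    for ts, value in primary.items():
        offset_seconds = offsets[ts]  # matching offset record for this snapshot
        forecast[ts + timedelta(seconds=offset_seconds)] = value
    return forecast
```

      The consumer of the published forecast never sees the "compressed" timestamps; only this reconstitution step does.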


      2) Use time scaling to compress the data into a time window well within the 10-min future time limit of PI. Use programmatic logic to uncompress the forecast when publishing it.


      3) Use a second PI server with the system clock set in the future.


      4) Use a relational database with the RDBMS interface and store the future data in an RDB table.


      What else am I missing? What recommendations do you have? Thanks for your help!


      Mike Christopher, GridPoint

        • Re: Best Practices for Future Data in PI

          Hi Mike,


          I would vote for option 3 or 4 (by the way, you would probably use a COM connector to "link" the relational database into PI so that the future tags look like normal tags to the PI clients).


          Options 1 and 2 put data in PI that is, in a sense, "wrong", and you need application logic to make that data "correct" - a scenario I like to avoid. Personally, I prefer data that standard clients can read from the database without any custom algorithm to make it useful.



            • Re: Best Practices for Future Data in PI

              We are having the same challenge.


              If you had a PI server running at present time and a PI server running at future time, you would want a COM connector between the two. The actual data would live on the server running in the future, and the server running at present time would just be a 'proxy' for the PI points on the future server.


              You would probably want to connect to the server running in the current time. But here, I think, is the catch. If you write a future value to this server, it would use the COM connector to write the value to the server running in the future (hence, the value will be in the past for that server). But the snapshot subsystem of the server running in the present would not allow this data to be sent through (because the timestamp is in the future). So, writing future data to the present server (which then connects to the future server) would not be possible.


              Can anyone comment on this? 

                • Re: Best Practices for Future Data in PI

                  That would have to be tested.  The snapshot may or may not care about the future nature of the timestamp, since it will never be talking to the local archive subsystem for this point.


                  Obviously, it would be better to have the future data just written directly to the future PI server.

                    • Re: Best Practices for Future Data in PI

                      Matt Heere

                      Obviously, it would be better to have the future data just written directly to the future PI server.



                      Technically it would, but from a user's standpoint I think not; for usability it would be nicer to have just one PI Server to talk to.

                        • Re: Best Practices for Future Data in PI

                          Well, technically speaking, to the end user it will look as if it were only one PI Server, which, behind the scenes, it wouldn't be.


                          For upgradeability and maintainability it may be desirable to have only one, really. :)




                          Isn't that what we do? Make a clean mess behind the scenes so the users just see the nice things, when something like this gets asked?

                            • Re: Best Practices for Future Data in PI

                              The thing is, you can't really provide a one-size-fits-all solution for future data; it depends on how far into the future you need to go and the type of access you need to the future data (standard client tools vs. a custom application).


                              It is easily possible to do this on a single PI server: for each data point, use two PI tags, where the first holds the real-time value and the second holds the forecast data. With those two PI tags you can use ProcessBook to show actual vs. predicted and watch in real time to check the accuracy of the predictions. Just use DataSets for value objects to provide an offset (e.g., *-24h) against the actual PI tag, then use a time offset for a PI trend. For data even further into the future, you could use PI-ModuleDatabase/PI-AF and run a service to synchronise the data with the PI tags as the forecasted data comes within range.


                              Purely as an example of the above, this is a screenshot of standard ProcessBook functionality to monitor predictions (in this case sinusoid):



                                • Re: Best Practices for Future Data in PI



                                  The only pitfall I see in that solution is that if the prediction gets rewritten, the values for the old prediction will either not be saved or will get mixed with the new ones.


                                  Say you have 24-hour predictions that get rebuilt every hour; that means that even if you use 24 tags to represent that... well, perhaps an interface with a toll-free fortune-teller number would work.


                                  You are right about that; there is a plethora of solutions available for the many possible cases. It is hard to predict.


                                  Great post.

                      • Re: Best Practices for Future Data in PI



                        Around 3 years ago I implemented a forecast electricity supply/demand utility (up to 48 hours into the future) that used PI-ModuleDatabase (no AF at the time!) and your Option 1, purely for trending in ProcessBook, where you can use some relative-time tricks to show future data against actual to track whether predictions were correct. This data is exchanged with the country's main electricity supplier via web services. It still works today without a single error, but if I were to do it today I would do it differently - a combination of AF and COM connectors, as Andreas suggests.

                        Can you wait until 2010?    OSI are going to implement future data in the Enterprise Server according to the Engineering Plan: http://techsupport.osisoft.com/techsupport/nontemplates/engplan.aspx?ProductName=PI+Enterprise+Server+(3.x)&ProductGroup=Server



                          • Re: Best Practices for Future Data in PI

                            Rhys is absolutely right: future data is currently slated for PI Server version 3.5.x.x, planned for 2010 (only 10 months away from now)


                            In the meantime you can use the PI OLEDB COM Connector, which is able to handle future data. Note that with COM Connectors, the data is not replicated on the PI Server; COM Connectors rather allow reading and writing data to and from the target database, through the use of "regular" PI Points. With this technique, data may be retrieved or inserted for a past, current or future date - because of the way PI currently handles timestamps, the furthest date in the future you'll be able to use is in the year 2038 (see my blog post about PI Time).


                            Depending on what you want to do and if manual interventions are expected, you could also use PI Manual Logger to handle future data. PIML has a Windows Service that can run on a scheduled basis to upload any queued, future data to PI when it is time.


                            Hope this helps!

                              • Re: Best Practices for Future Data in PI



                                Hi Mike,


                                I'd think that one would need to know the following in order to make a rational choice between methods 1-4:


                                   (1) How many signals (tags) involve future data?  (Tens, hundreds, or thousands?)


                                   (2) What will be the most common use of future data once it is stored? For instance, will it be combined in any sort of analytics, reports, aggregations, etc..., with historical data?



                                  • Re: Best Practices for Future Data in PI

                                    Moh brings up a very good point: segregating the data depending on its nature/frequency/usage. That's why we are also working on a concept called "Point Partitions", introduced in a talk at the 2007 User Conference and planned for the same PI release as our "Future Data" release (2010, according to our public Engineering Plan).

                                    • Re: Best Practices for Future Data in PI
                                      Bryan Owen

                                      Right on Moh...


                                      2) "once it is stored"


                                      ...and whether archiving is required.  In many use-case scenarios, preliminary/projected values are considered volatile; as such, a new prediction actually replaces the prior prediction. 


                                      Requirements for shorter update cycles and feedback to the model logic tend to favor archiving in PI.


                                      Thus an alternate approach is to use multiple tags (e.g., an array of tags) to represent the future waveform. The tags need to be queried as an array to construct a dataset for trending (PI Profile is the only native tool for handling arrays of points). AF has some plumbing for point arrays, but I don't think the data reference is completely exposed for this kind of use.  A union-style OLEDB query could easily assemble a dataset from multiple tags.  Most using this approach use the PI-SDK or API to get snapshots for an array of tags.
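                                      A sketch of the positional idea, assuming hypothetical tag names like Forecast.HVAC.00 ... Forecast.HVAC.23, where the numeric suffix is always the number of steps ahead (the reader callback stands in for a PI-SDK snapshot call):

```python
from datetime import datetime, timedelta

def forecast_waveform(snapshot_reader, base_tag, steps,
                      prepared_at, step=timedelta(hours=1)):
    """Assemble a forecast dataset from an array of positional tags.
    snapshot_reader(tagname) returns the current snapshot value for a tag;
    tag N always holds the prediction for N steps ahead of prepared_at."""
    return [
        (prepared_at + n * step, snapshot_reader("%s.%02d" % (base_tag, n)))
        for n in range(steps)
    ]
```

                                      Because each tag's position is fixed, a single tag (say, index 1) is also usable standalone as "the hour-ahead estimate", exactly as described above.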


                                      To Andreas's comment: this approach does not compromise the integrity of timestamp or value. The points are positional indicators but are potentially useful standalone; for instance, the tenth tag could always represent the best estimate for hour-ahead.  In fact, archiving could be selectively enabled or disabled in this approach.


                                      1) The array approach has implications for the number of tags and is mostly used for critical loads/tie lines. It is likely overkill for presenting a predicted profile to the home consumer - unless you are like me, have a server at home, and want to see this update in real time. 







                                      • Re: Best Practices for Future Data in PI



                                        (1) On the order of hundreds of tags for our initial development. We will likely need to scale to thousands in the future, but by then future PI will likely be a thing of the present!


                                        (2) Once the data is stored, it will be consumed by a third-party optimization engine that uses it and then discards it. However, it will also be used internally to analyze the accuracy of our forecast model, so we do not want to discard/overwrite it.


                                        After reading everyone's great feedback and contemplating this some more, we definitely want to choose a method that does not require any processing by the consumer of the data, so scratch options 1) and 2) from my original list.


                                        I think the COM Connector is our best option but any other feedback is always appreciated.





                                  • Re: Best Practices for Future Data in PI

                                    Hi Mike,


                                    Essentially it sounds to me like you actually have two issues:


                                    1.  You have data with timestamps that are in the future.


                                    2.  You have data which will be versioned.  As each new iteration of the forecasting algorithm runs, you will get a new data point for each of the forecasted intervals.  If you don't care about older forecasts then this isn't really an issue, but if you simply overwrite the forecasted value for a particular timestamp whenever the algorithm runs, then you can't really use PI to reconstruct the past.


                                    Item #1 has been discussed at length in this post, so I'll merely point out that most of the solutions mentioned (including the upcoming enhancement to the PI server to allow for future data) only address this point.


                                    Item #2 is a distinctly different problem from future data, and the solutions to it are equally different.  If you have a known number of forecast intervals, then it is possible to use multiple tags per forecasted point to store each of the revisions.  If you don't know ahead of time how many updates to the same timestamp you're going to have, then this isn't really an option.


                                    So where does that leave you?  If you combine the concepts of time-shifting the data and using multiple tags to store the versions, then you can actually store the data in a PI server.  It will require code to make any sense of it - DataSets in ProcessBook at a minimum, or some above-average-complexity Excel work.


                                    I had this exact same scenario with a customer and decided that storing this data in an RDBMS, and virtualizing it through PI using the OLEDB COM connector, made the most sense.  As the data ages, it can be moved into PI by the RDBMS interface for long-term storage, thus keeping the database small and workable.


                                    As an aside, if you're going the code route anyway, it may be possible to use the subsecond timestamp capability in PI (as opposed to multiple tags) to manage the data versions, assuming that your forecast data has no more granularity than one second.
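                                    A sketch of that subsecond-versioning trick, assuming forecast values land on whole seconds so the fractional part is free to carry a revision number (names and the millisecond encoding are illustrative, not a PI convention):

```python
from datetime import datetime, timedelta

MAX_VERSIONS = 1000  # one version per millisecond within the same second

def versioned_timestamp(ts, version):
    """Encode a revision number into the subsecond part of a timestamp."""
    if ts.microsecond != 0:
        raise ValueError("base timestamp must be whole-second")
    if not 0 <= version < MAX_VERSIONS:
        raise ValueError("version out of range")
    return ts + timedelta(microseconds=version * 1000)

def decode(ts):
    """Recover (base timestamp, version) from an encoded timestamp."""
    return ts.replace(microsecond=0), ts.microsecond // 1000
```

                                    Reading code would strip the subsecond part back off before presenting the forecast, which is exactly the kind of custom logic this thread has been debating.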





                                    • Re: Best Practices for Future Data in PI

                                      Of course you could get hold of a copy of ECHO and support future data properly!

                                      • Re: Best Practices for Future Data in PI

                                        So let's summarize where this ends up:

                                        • Versioning of data and the need for future timestamps are separate issues.  Both are requirements for this application.
                                        • Versioning has three solutions:
                                          • Use multiple PI tags.  This requires code both to get the data into PI and to pull it correctly.
                                          • Use the subsecond timestamp as the version number.  This requires code both to get the data into PI and to pull it correctly.
                                          • Use a relational database to store this data, and expose it to PI users via the OLEDB COM connector.  This requires code development in the form of the SQL for the COM connector, but nothing complicated.
                                        • Future timestamp support has three solutions also
                                          • Use PI tags with the timestamp offset.  This probably requires code both to get data into PI and to pull it correctly.  If you're really lucky, this "code" manifests as the SQL to configure the RDBMS Interface on the inputs, and ProcessBook data sets on the outputs.
                                          • Use a relational database to store this data, and expose it to PI users via the OLEDB COM connector.  This requires code development in the form of the SQL for the COM connector, but nothing complicated.
                                          • Use ECHO with the pre v2.6 circular buffer archive.  This directly supports future timestamps and would look like any other PI server to the users.  Could appear to be part of another PI server via the PI COM connector.

                                        What's noteworthy here is that since you have both requirements, you need solutions that overlap the two scenarios.  That means you're down to either using a bunch of tags and code (assuming you have a known, fixed number of versions [which I think you do]), or the RDBMS solution.
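                                        As a rough sketch of what the relational side of the RDBMS solution might look like (using SQLite purely for illustration; the table and column names are hypothetical, and a production system would use the site's RDBMS with the COM connector's SQL on top of it):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE forecast (
        tag      TEXT NOT NULL,  -- logical point name
        run_time TEXT NOT NULL,  -- when this forecast was prepared (the 'version')
        ts       TEXT NOT NULL,  -- forecasted timestamp (may be in the future)
        value    REAL NOT NULL,
        PRIMARY KEY (tag, run_time, ts)
    )
""")

# Two overlapping runs for the same tag: nothing is overwritten.
rows = [
    ("HVAC.Power", "2009-03-02 14:00", "2009-03-02 16:00", 1.2),
    ("HVAC.Power", "2009-03-02 14:00", "2009-03-02 16:15", 1.3),
    ("HVAC.Power", "2009-03-02 15:00", "2009-03-02 16:00", 1.1),
]
conn.executemany("INSERT INTO forecast VALUES (?, ?, ?, ?)", rows)

def latest_forecast(conn, tag, ts):
    """Most recent prediction for a given timestamp; older runs stay queryable."""
    cur = conn.execute(
        "SELECT value FROM forecast WHERE tag=? AND ts=? "
        "ORDER BY run_time DESC LIMIT 1", (tag, ts))
    row = cur.fetchone()
    return row[0] if row else None
```

                                        Because the run time is part of the key, every forecast revision is retained, which satisfies both the versioning and the future-timestamp requirements in one table.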



                                          • Re: Best Practices for Future Data in PI

                                            Great summary!


                                            To fast-forward: for the question about future timestamps, the latest PI Server 2015 release provides native support for future data.


                                            For versioning, PI AF 2012 introduced the ability to expose time-series data in external databases.  So an option is to store prior versions in a relational DB, then configure the AF table look-up with Behavior = "Table provided time series data."  This way you can plot externally stored time-series data as easily as though it were data stored in PI Data Archive.


                                            Note: RDBs weren't optimized for time-series data the way PI Data Archive was, so start small on the date range to ensure you don't overwhelm the RDB.