9 Replies Latest reply on Sep 3, 2013 1:54 PM by Gregor

    Backfill PI Data


      We set up our PI Server in Dec 2011.  Before that we had a different historian.  It has been on my list for a while to import the data from the other historian, but I haven't had the time to get to it yet.  We are now being asked to obtain some of that data, and I would like to get it into PI so that I can also shut down the old servers holding the data prior to 2011.  I am currently running PI 2010 but plan to upgrade to 2012 prior to doing this.  I found some documentation on the topic but have a few questions about it.  Referring to the following document:



      Step 2 says to make sure current data is not being sent to the tag from the interface.  I can't stop the current data from coming in to the system while I do the import.  I need current data to keep coming in.  Is there a way around this? 


      On step 7 it says to delete the Pt Created event.  The Pt Created event should probably be in Dec 2011.  So do I have to delete that event prior to importing the data?


      The document doesn't talk about creating archive files, but my first archive file has a start date of Dec 2011.  Prior to doing this, do I need to make archive files with start/end dates before that?  Should I make, for example, one archive file for each month that I plan to import?  Is there a quick way to make archive files?  I will be importing multiple years of data.


      Once everything is set up I plan to write a .NET program using PI OLEDB to insert the data.

        • Re: Backfill PI Data

          Hello Brian,


          Brian Altman

          Step 2 says to make sure current data is not being sent to the tag from the interface.  I can't stop the current data from coming in to the system while I do the import.  I need current data to keep coming in.  Is there a way around this?  


          I understand you plan to backfill data that doesn't belong to existing archives, in other words data older than December 2011. This data will not go into the archives covering current periods. If you cannot stop recent data from coming in because users need that data, you will have to do the backfilling in parallel.


          Brian Altman

          On step 7 it says to delete the Pt Created event.  The Pt Created event should probably be in Dec 2011.  So do I have to delete that event prior to importing the data?


          I suggest deleting "Pt Created" events because various clients will not show data before "Pt Created". Generally you can use piconfig.exe or the PI-OLEDB (classic) Provider to delete "Pt Created" events, but piconfig.exe is not very handy when the "Pt Created" event doesn't sit in the snapshot anymore. Please use queries like the following one with caution:



          DELETE FROM picomp2 WHERE tag like 'CDT%' AND status=-253 AND time between '1-Dec-2011' AND '31-Dec-2011'

          To verify first which records would be affected, you can list them:



          SELECT * FROM picomp2 WHERE tag like 'CDT%' AND status=-253 AND time between '1-Dec-2011' AND '31-Dec-2011'
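One way to keep the preview and the delete in step is to build both statements from a single WHERE clause. The following Python sketch only assembles the SQL text. It assumes the table, status code, and date literals from the queries above, and the helper name `pt_created_queries` is hypothetical. Executing the statements through PI OLEDB is left out.

```python
def pt_created_queries(tag_pattern, start, end, table="picomp2"):
    """Return (select_sql, delete_sql) that share one WHERE clause.

    Hypothetical helper: it only builds text and assumes trusted input.
    Status -253 is the "Pt Created" filter used in the queries above.
    """
    safe_pattern = tag_pattern.replace("'", "''")  # escape single quotes
    where = (
        f"tag like '{safe_pattern}' AND status=-253 "
        f"AND time between '{start}' AND '{end}'"
    )
    select_sql = f"SELECT * FROM {table} WHERE {where}"
    delete_sql = f"DELETE FROM {table} WHERE {where}"
    return select_sql, delete_sql


select_sql, delete_sql = pt_created_queries("CDT%", "1-Dec-2011", "31-Dec-2011")
print(select_sql)  # run this first and review the rows it lists
print(delete_sql)  # run this only once the preview looks right
```

Because both strings come from the same `where` value, the DELETE cannot drift away from what the SELECT previewed.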

          Brian Altman

          The document doesn't talk about creating archive files, but my first archive file has a start date of Dec 2011.  Prior to doing this, do I need to make archive files with start/end dates before that?


          I agree the online library could be clearer at step 3 that archives covering the historical period need to exist. Please make sure you have historical archives online covering the periods you would like to backfill. Please also take the amount of data into account when sizing historical archives. With PI 2012 it's no longer necessary to reprocess archives against the current point table. Primary records will be added automatically.


          Your PI-OLEDB approach should work.

            • Re: Backfill PI Data

              So if I backfill in parallel, will the data that I am backfilling be compressed or is that the reason step 2 says to make sure current data is not coming in?  I want PI to compress the data that I backfill.

              • Re: Backfill PI Data

                Hello Brian,


                Sorry about the confusion. The title "Backfill existing archives from new PI points with compression" indicates historical archives exist already. If they don't, you'll have to create them.


                The title also talks about "with compression". This is why it recommends stopping active data collection. For compression to be applied, you have to insert into the snapshot table. When current and historical data are inserted at the same time through PI Snapshot, they interfere with each other and compression cannot be applied properly. This affects compression of both recent and historical data.


                My idea would be to insert the historical data through the picomp2 table instead (without compression). Even though it's historical data, make sure events are sorted in chronological order (oldest -> newest) per tag to avoid keeping PI Archive Subsystem busy sorting events within historical archives.
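A minimal sketch of that ordering step, in Python: group the exported events by tag, sort each group oldest to newest, and only then emit INSERT statements. The function names, the sample tags, and the timestamp literal format (dd-MMM-yyyy hh:mm:ss) are assumptions for illustration. The column list mirrors the queries used elsewhere in this thread.

```python
from collections import defaultdict
from datetime import datetime

# Fixed month names so the output does not depend on the system locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def pi_time(ts):
    """Format a datetime as dd-MMM-yyyy hh:mm:ss (assumed literal format)."""
    return f"{ts.day:02d}-{MONTHS[ts.month - 1]}-{ts.year} {ts:%H:%M:%S}"


def build_inserts(events, table="picomp2"):
    """Turn (tag, datetime, value) tuples into INSERTs, oldest first per tag."""
    by_tag = defaultdict(list)
    for tag, ts, value in events:
        by_tag[tag].append((ts, value))

    statements = []
    for tag in sorted(by_tag):
        safe_tag = tag.replace("'", "''")  # escape single quotes
        for ts, value in sorted(by_tag[tag]):  # chronological order
            statements.append(
                f"INSERT INTO {table} (tag, value, time) "
                f"VALUES ('{safe_tag}',{value!r},'{pi_time(ts)}')"
            )
    return statements


# Hypothetical sample data, deliberately out of order.
events = [
    ("TagB", datetime(2010, 3, 1, 0, 0, 0), 1.5),
    ("TagA", datetime(2010, 1, 2, 0, 0, 0), 3.0),
    ("TagA", datetime(2010, 1, 1, 12, 30, 0), 2.0),
]
statements = build_inserts(events)
for sql in statements:
    print(sql)
```

Sorting before sending anything keeps each tag's events in order regardless of how the old historian exported them.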

                  • Re: Backfill PI Data

                    Gregor...but if a tag already has a newer value than any subsequent historical values being backfilled then they are considered out of order and no compression is applied, even if you're not processing in parallel. Correct? (Or am I suffering with Friday afternoon syndrome...)


                    To achieve compression you'd have to create brand spanking new tags with the desired compression then backfill those in order, and then merge the data as out of order to the final destination tags.


                    Edit: You'd still end up with a potential compression mismatch where the backfilled data meets the first archived value of the non-backfilled events.

                      • Re: Backfill PI Data

                        OK, so in order to compress the data I need to stop the interface from giving current data and then with PI OLEDB I need to insert into pisnapshot and not the picomp2 table?  I might be able to go without data for a short time but I can't shut the whole interface off.  I am wondering whether I could set scan to off for the tag through PI OLEDB, then import the data, then turn scan back on for each tag.  This would limit the downtime to the tag I am importing data for rather than the whole interface.  If I set the scan to 0 though, do I need to wait for the interface to realize that the scan is set to zero or can I immediately start importing?  These are all OPC points and I know when I add points it takes up to 2 minutes for the OPC Interface to catch the change.

                        • Re: Backfill PI Data

                          Hello Rhys,


                          Your thought about Friday afternoon syndrome is a good one. There were similar doubts running through my brain as well. However, if the described procedure suggests interrupting current data collection, there must be a good reason. The only one that I can think of is that snapshot updates with recent events would confuse compression of the historical data that is backfilled. If the requirement were to create brand new tags, the procedure would not suggest interrupting recent data collection.


                          What I am no longer sure about is whether backfilling would indeed interfere with compression of recent events. I am afraid detailed knowledge of the algorithm is required. I'll reach out to Denis.


                          @Brian: I saw you've replied as well in the meantime. I hope you're not in a total rush with this, because I would really prefer to get clarification if compression is important for you.


                          Setting the scan attribute to "off" would be an option to stop data collection for a single tag, but as you already mention, this setting doesn't become effective immediately. The interface checks for point updates every 2 minutes, so in the worst case it takes 2 minutes until scan off / on becomes effective. You would have to wait at least these 2 minutes before backfilling, but an even better option would be to verify in the interface log that data collection for the particular tag is indeed suspended.


                          PI OPC (DA) Interface doesn't support historical data access. That said, you will lose data for 2 minutes + the time backfilling takes + 2 minutes. I believe disconnecting the interface node and having it buffer, at a time with minimum impact to users, would be the better option. Please do not unplug the interface node's network cable since this will very likely cause data collection to be interrupted too. KB00300 - How do I simulate a PI Server shutdown to test buffering? lists some better options.

                            • Re: Backfill PI Data

                              Hello Brian, and thank you for your question.
                              My name is Denis Vacher and I’m the Engineering Group Lead for the PI Server.  As Gregor said, I do want to help as much as I can.
                              First of all, I’m sorry you’re having difficulties with our online documentation.  We’ll use this exchange to make sure we add all necessary clarifications.

                              How does compression work with new vs. old (out-of-order) data?
                              The rule is rather simple; compression is applied by the Snapshot Subsystem only for data that arrives with newer timestamps than the end of the data stream, i.e. the event in the Snapshot table.  This rule applies regardless of the number of values prior to the Snapshot event.  The only exception to that rule is with PI Server 2012 if the only value in a PI Point is the "Pt Created" event.  In this case, you can start sending data out of order and Pt Created will disappear automatically so that data goes through compression (as long as it's coming in chronological order).  This is what the Note below point #7 on the backfilling article refers to.
                              Now I understand you want to insert (backfill) older data into a live PI Point that already has about 2 years of history, and you would like compression to be applied.  Based on what I described above, you should agree that compression won't take place unless we identify a different procedure.  Also, this same article talks specifically about "new PI Points", where the only value is "Pt Created".  In other words, the procedure described is not valid for PI Points that have historical data and/or receive new values continuously.  We'll make sure we add these clarifications to the article.
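The rule described above can be written as a small decision function. This is only an illustrative Python model of the description in this post, not PI Server code. The function and parameter names are hypothetical, and treating a point with no Snapshot value as "empty" follows the staging procedure discussed in this thread.

```python
from datetime import datetime


def goes_through_compression(event_time, snapshot_time,
                             only_pt_created=False, server_2012=True):
    """Model of whether an incoming event would be compressed.

    snapshot_time is the timestamp of the point's Snapshot event,
    or None if the point has no Snapshot value at all.
    """
    if snapshot_time is None:
        # "Empty" point: behaves as if receiving data for the first time.
        return True
    if only_pt_created and server_2012:
        # PI Server 2012 exception: "Pt Created" is the only value, so it
        # is dropped and out-of-order data can still be compressed.
        return True
    # General rule: only events newer than the Snapshot are compressed.
    return event_time > snapshot_time


snapshot = datetime(2013, 9, 1)
print(goes_through_compression(datetime(2013, 9, 2), snapshot))  # newer event
print(goes_through_compression(datetime(2010, 1, 1), snapshot))  # backfill into live point
print(goes_through_compression(datetime(2010, 1, 1), snapshot,
                               only_pt_created=True))            # brand new point
```

The second call is the situation in this thread: a live point with two years of history, where backfilled events are older than the Snapshot and therefore bypass compression.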

                              What are the potential solutions then?
                              - Theoretically, you could stop or disconnect all your PI Interfaces (so that no new data is received), dismount (unregister) all archive files that contain data for your backfilling points, and remove the Snapshot values for these points.  This way, the backfilling points will be "empty" --even though they have "hidden" history-- so that the backfilled data goes through compression.  The PI Server will act as if these points were receiving data for the first time.  I'm certain you will agree this procedure isn't quite ideal.  Not only is it fairly complex, it essentially requires your PI Server to be unavailable the whole time of the operation.
                              - What I would suggest instead is to clone your PI Server to a test/staging machine and do the backfilling against that separate server.  Assuming you can find such a machine, the procedure is relatively straightforward, safe, and above everything else won't have any impact on your production server.  To be extra safe, I would still suggest you call our technical staff for guidance.  Here are the steps:


                                1. Make sure you're running PI Server 2012 to begin with;
                                2. Take an online backup of your production server, with just the primary archive file;
                                3. Install PI Server 2012 on the staging server and restore everything except the primary archive;
                                4. Create an empty archive of any size, using pidiag -create, and register it (as primary) using pidiag -ar;
                                5. Start the PI Server on the staging server and verify you can read the Snapshot value of the points you want to backfill (note: verify you do not see any history for any points);
                                6. Delete the Snapshot values of these points (alternatively you can delete all Snapshot values by rebuilding the Snapshot table);
                                7. Create the archive files that will receive the old data; for this, PI SMT 2012 provides a very handy tool that creates N archives in a single operation;
                                8. Send your old data to the staging PI Server, and once done, verify you can read it;
                                9. Important: you must send one more value to all points to ensure all values are in the archive files and not in the Snapshot table (I suggest something like "Invalid Data");
                              10. Stop the staging PI Server; move the archive files containing the backfilled data to your production server and mount (register) them;
                              11. Verify you can read all history on the production PI Server; you're done.


                              I'm fairly certain you will need a temporary license file at step #5.  Our technical support folks can help you with that.
                              Please let us know if this procedure might work for you, and feel free to ask for more clarifications.  Good luck to you and happy backfilling!
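For step 7, it can help to list the archive boundaries before creating anything. The Python sketch below computes one start/end pair per calendar month for a backfill period. It does not create archives (that would still go through PI SMT or pidiag as described above), the function name is hypothetical, and the January 2009 start is an invented example. Only the December 2011 end comes from this thread.

```python
from datetime import date


def monthly_archive_ranges(first_month, end_month):
    """Return contiguous (start, end) pairs, one per calendar month.

    Covers first_month up to, but not including, end_month.
    """
    ranges = []
    year, month = first_month.year, first_month.month
    while (year, month) < (end_month.year, end_month.month):
        start = date(year, month, 1)
        month += 1
        if month == 13:  # roll over into the next year
            year, month = year + 1, 1
        ranges.append((start, date(year, month, 1)))
    return ranges


# Hypothetical period: old historian data from Jan 2009 until the
# first existing PI archive starts in Dec 2011.
ranges = monthly_archive_ranges(date(2009, 1, 1), date(2011, 12, 1))
for start, end in ranges:
    print(start.isoformat(), "->", end.isoformat())
```

Each range ends exactly where the next begins, so the list has no gaps or overlaps, and the last range ends where the existing archives start.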

                                • Re: Backfill PI Data

                                  Thanks Denis, I think I will try your suggestion.  I am not currently in a rush for this but just wanted to have a plan for when I do it.  I haven't gone through a restore test yet anyway and this would be a good test of it.  Just want to verify though that when I insert with PI OLEDB I have to insert into pisnapshot for each value and then PI will do compression and then move it to the archives?

                                    • Re: Backfill PI Data

                                      Hello Brian,


                                      Table pisnapshot is read-only but picomp accepts inserts. I've just tried it with a brand new tag on a PI 2012 installation using a query like the following one:



                                      INSERT INTO picomp (tag, value, time) VALUES ('Backfill',2.304,'*-24h') 

                                      I can confirm that the first insert removes "Pt Created" and compression is applied.