14 Replies Latest reply on Feb 3, 2012 7:25 PM by Ahmad Fattahi

    How to backfill a large data set?

    Lonnie Bowling

      I have a somewhat large PI backfilling operation that I need to perform for a customer.  The data is in an old Foxboro historian (AIM), and we need to backfill 5 years of data for about 10K points.  I'm not sure how many values per point at this time, but the AIM database has at least 40GB of data.

       

      I have read most of the material here on the boards and reviewed the "Backfilling with PI Config" KB article.  It looks like, because of the amount of data and tags, PIConfig is not going to get it done, so I'm thinking of using the PI SDK to do the job.  This is the basic workflow I'm considering:

       

      1.       Create tags to backfill (using the SMT add-on for Excel)

       

      2.       Reprocess the old archives to create primary records for the new tags; they will be converted to dynamic archives using piarchss, as outlined in step 5 of the KB article

       

      3.       Create new archives as required (step 6 of the KB article)

       

      4.       Clear the snapshot value for all tags (using the SDK to find all the tags and delete the current value; is this a good approach?)

       

      5.       Import the point values from oldest to newest for each tag using the SDK.  I may get the data from a CSV file (exported from the database) or through an ODBC query to the AIM database; we are evaluating that right now.  A rough sketch of the import loop follows this list.

       
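      For step 5, here is a rough sketch of the kind of SDK loop I have in mind.  It assumes pywin32 and a registered PI SDK; the server name, file name, and three-column CSV layout are placeholders, not the real AIM export.

          # Rough sketch of the step-5 import loop (Python over COM).
          # Assumptions: pywin32 is installed, the PI SDK is registered, and the
          # export file is a tag,timestamp,value CSV sorted oldest to newest per tag.
          # "MYPISERVER" and "aim_export.csv" are placeholders.
          import csv
          import win32com.client

          sdk = win32com.client.Dispatch("PISDK.PISDK")
          server = sdk.Servers("MYPISERVER")

          points = {}  # cache PIPoint lookups so each tag is resolved only once
          with open("aim_export.csv", newline="") as f:
              for tag, timestamp, value in csv.reader(f):
                  if tag not in points:
                      points[tag] = server.PIPoints(tag)
                  # UpdateValue writes one event; the timestamp is passed as a time string
                  points[tag].Data.UpdateValue(float(value), timestamp)

      Writing one event per call is the slow path; batching events per tag into a PIValues collection and calling UpdateValues should cut the round trips considerably.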

      This is just a summary of what I intend to do.  Does this sound reasonable?  Is there another approach that would be faster or simpler?  I would like to automate all the steps as much as possible.

       

      I have a backup of the customer's PI database that is about a month old, so I will also need to add the newly processed archives back into their system at some point.  I may also want to use the utility in the future to move more (new) data from the AIM database to PI, so I am keeping that in mind as well.

       

      Any comments, advice, or help would be appreciated.  If someone has done something like this and is willing to share some code, that would be great.  I will also share what I do here as we move forward to help others out.


      Thanks,

       

      Lonnie


        • Re: How to backfill a large data set?
          mhalhead

          Lonnie,

           

          We've had the pleasure of doing a number of backfilling exercises from Wonderware and Proficy. Steps 1-3 look good to me. Step 4 looks unnecessary; when a point is created the Pt Created timestamp is 1970/01/01 00:00:00. For step 5 I would use the RDBMS interface to write directly to PI (on a separate system). You will need to check the performance.

           

          I would also recommend that you create the new points on the customer's PI system prior to taking a backup. This way you will only need to register the backfilled archives once they are completed.

            • Re: How to backfill a large data set?
              andreas

              Michael Halhead

              Step 4 looks unnecessary; when a point is created the Pt Created timestamp is 1970/01/01 00:00:00.

               

              "Pt Created" has the timestamp of point creation. But it is not absolutely necessary as your old data will go to an empty archive anyhow, so no "out of order" data impact as long as you old data is in good shape.

               

              Michael Halhead

              For step 5 I would use the RDBMS interface to write directly to PI (on a separate system). You will need to check the performance.

               

              Absolutely - PI interfaces use the PI API - and that is still faster than a custom-written PI SDK application, especially if you cannot group the data by tags.

               

              Michael Halhead

              I would also recommend that you create the new points on the customer's PI system prior to taking a backup. This way you will only need to register the backfilled archives once they are completed.

               

               

              Very good point.

                • Re: How to backfill a large data set?

                  It seems like you can easily extract the AIM data into CSV files... if that's the case, I would strongly recommend the PI Universal File Loader (UFL) Interface. It is much simpler and would be faster than coding anything or configuring the PI RDBMS Interface and PI points: you drop the CSVs in a folder and that's it, you're done.
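                  If the extraction does go through ODBC, something along these lines would dump the AIM history into tag,timestamp,value CSV files for UFL to pick up - the DSN, table, and column names here are invented placeholders; the real AIM schema will differ.

                      # Rough sketch: pull AIM history over ODBC and write
                      # tag,timestamp,value CSV files for the UFL interface.
                      # The DSN, table, and column names are placeholders.
                      import csv
                      import pyodbc

                      conn = pyodbc.connect("DSN=AIM_HISTORIAN")
                      cursor = conn.cursor()
                      cursor.execute(
                          "SELECT tagname, sampletime, value FROM history "
                          "ORDER BY sampletime"  # oldest to newest
                      )

                      rows_per_file = 500000  # keep individual files a manageable size
                      batch, file_index = [], 0
                      for row in cursor:
                          batch.append((row.tagname, row.sampletime, row.value))
                          if len(batch) >= rows_per_file:
                              with open("aim_%04d.csv" % file_index, "w", newline="") as f:
                                  csv.writer(f).writerows(batch)
                              batch, file_index = [], file_index + 1
                      if batch:
                          with open("aim_%04d.csv" % file_index, "w", newline="") as f:
                              csv.writer(f).writerows(batch)

                  The UFL INI then just needs field definitions that match that three-column layout.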

                    • Re: How to backfill a large data set?
                      aabrodskiy

                      The UFL interface with bulk data writes seems to be the most efficient way of doing this, as Steve mentioned. We had a positive experience using it for backfilling data.

                        • Re: How to backfill a large data set?
                          Lonnie Bowling

                          Skipping steps and not having to mess with the SDK - this all sounds great.  I'm really glad that I posted to get some feedback before I started.  Thank you all for the input; this is what vCampus is all about!  I think we will go with the UFL approach, and I will post some details on what we did.

                           

                          @Michael, thanks for the idea of creating the points in the PI database before doing the operation; that will save me some valuable time.  I assume that after I create the points, I will need to shift to a new archive, take the old archives offline, reprocess all of them, and then bring them back online.

                           

                          Thanks again!

                           

                          Lonnie


                            • Re: How to backfill a large data set?
                              mhalhead

                              Lonnie,

                               

                              You are correct: you will have to force an archive shift and reprocess the old archives. You can reprocess the archives on an offline server from a backup, so you don't need to mess with the production system.

                               

                              Andreas, you are correct; I was talking rubbish.

                                • Re: How to backfill a large data set?
                                  Lonnie Bowling

                                  Concerning the UFL interface: is it possible to apply exception/compression when backfilling the data with this interface?  That is something I will need to be able to do.

                                    • Re: How to backfill a large data set?
                                      hanyong

                                      If I remember correctly, the exception and compression settings on the tags still apply when this interface writes data to the PI Server. There are specific tag and interface configurations that determine whether or not the interface does exception reporting, such as:

                                      • Location5 setting of the tag
                                      • /lb and /lbs setting of the interface

                                      Hope this helps

                                        • Re: How to backfill a large data set?
                                          andreas

                                          AFAIK exception and compression are only applied if you are writing data that is newer than the snapshot - that makes it difficult in your case. You will have to delete all the "Pt Created" snapshot events to enable compression.
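
                                          A rough sketch of how that could look with the SDK - the server name and tag list are placeholders, and the exact RemoveValues arguments should be double-checked against the PI SDK reference:

                                              # Sketch: clear the "Pt Created" snapshot event
                                              # for each new tag so the backfilled data is
                                              # newer than the snapshot. "MYPISERVER" and the
                                              # tag list are placeholders; verify RemoveValues
                                              # arguments against the PI SDK reference.
                                              import win32com.client

                                              sdk = win32com.client.Dispatch("PISDK.PISDK")
                                              server = sdk.Servers("MYPISERVER")

                                              for tag in ["NEWTAG.001", "NEWTAG.002"]:
                                                  data = server.PIPoints(tag).Data
                                                  created = data.Snapshot.TimeStamp.LocalDate
                                                  # remove the event at the creation timestamp
                                                  data.RemoveValues(created, created)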

                                            • Re: How to backfill a large data set?
                                              wpurrer

                                              I backfilled the PI Server with about 80,000,000 events on a 150K-point server.

                                               

                                              During this procedure I crashed the server multiple times... so don't try it on a production server.

                                               

                                              I did it with the SDK... on a production server.

                                               

                                              After I slowed down the process to only 10K events at a time, every 10 minutes, it worked.

                                               

                                              PI Server is pretty slow with backfilling.
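
                                              The pacing itself is trivial to bolt onto any write loop; in this sketch write_events is a hypothetical stand-in for the actual SDK call, and the numbers are just the 10K-every-10-minutes figures above:

                                                  # Generic pacing sketch: send events in chunks
                                                  # with a pause between chunks. write_events() is
                                                  # a hypothetical stand-in for the real write call.
                                                  import time

                                                  def write_throttled(events, write_events,
                                                                      chunk_size=10000,
                                                                      pause_seconds=600):
                                                      for i in range(0, len(events), chunk_size):
                                                          write_events(events[i:i + chunk_size])
                                                          if i + chunk_size < len(events):
                                                              time.sleep(pause_seconds)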

                                                • Re: How to backfill a large data set?
                                                  andreas

                                                  Wolfgang - I would say that requires some clarification. Yes - I would not do a large backfill on a production server, not because of crashes, but because of the load, any issues caused by less than perfect historical raw data, and any issues caused by less than perfect current data from the interfaces.

                                                   

                                                  If your data is aligned in time - so you are not running into out-of-order archive data - and the PI system is in good shape, you should be able to feed data into the system at a much higher rate. I wrote a small PI SDK app that creates random data to feed into the PI System - naturally the data is in perfect shape as I am just creating it that way - and I don't have to cut the data into pieces. The result was a nice 10k events/s backfill (just for the sake of time I backfilled only 2,000,000 events) for 5 tags.
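
                                                  Roughly the same idea sketched in Python over COM, in case someone wants to reproduce the test - the server name, tag names, and event counts are placeholders, and the PIValues/UpdateValues handling should be verified against the PI SDK reference:

                                                      # Rough sketch of a random-data backfill:
                                                      # one PIValues collection per tag, pushed
                                                      # with a single UpdateValues call per batch.
                                                      # "MYPISERVER" and the TEST tags are
                                                      # placeholders; verify the PIValues usage
                                                      # in the PI SDK docs.
                                                      import datetime
                                                      import random
                                                      import win32com.client

                                                      sdk = win32com.client.Dispatch("PISDK.PISDK")
                                                      server = sdk.Servers("MYPISERVER")

                                                      start = datetime.datetime(2007, 1, 1)
                                                      for n in range(1, 6):
                                                          tag = "TEST.%03d" % n
                                                          data = server.PIPoints(tag).Data
                                                          vals = win32com.client.Dispatch("PISDK.PIValues")
                                                          vals.ReadOnly = False
                                                          for i in range(10000):
                                                              ts = start + datetime.timedelta(seconds=10 * i)
                                                              vals.Add(ts, random.uniform(0.0, 100.0), None)
                                                          # one round trip per batch, not per event
                                                          data.UpdateValues(vals)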