10 Replies Latest reply on Mar 2, 2010 8:07 AM by oshafie

    Backfill data in non-chrono order


      Hi all,


      I've got a system (3.4.380.36) on which I'm testing some programs that backfill data.  A test system, naturally.  I'm filling in data from Dec-2008 through the entire year of 2009.


      I had a look at the KB article on backfilling data, and the source data isn't in a format that can be easily imported with piconfig.  The data is in XML files, sliced by calendar day, where 9 tags for a site for a given collection time are in a single XML record.  Consecutive XML records represent chronological data throughout the day.
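For what it's worth, a minimal Python sketch of pivoting those day-sliced records into per-tag streams, assuming a hypothetical schema (one `<record>` per collection time, one `<measurement>` child per tag - the element and attribute names here are made up, not the actual file format):

```python
# Sketch: pivot day-sliced XML (one record per collection time, one
# measurement per tag) into per-tag chronological event lists.
# The element/attribute names are hypothetical placeholders.
import xml.etree.ElementTree as ET
from collections import defaultdict

def events_per_tag(xml_text):
    """Return {tagname: [(timestamp, value), ...]} in file (chrono) order."""
    per_tag = defaultdict(list)
    root = ET.fromstring(xml_text)
    for record in root.findall("record"):           # one record per collection time
        ts = record.get("time")
        for meas in record.findall("measurement"):  # 9 tags per record in the real data
            per_tag[meas.get("tag")].append((ts, float(meas.text)))
    return dict(per_tag)

sample = """<day>
  <record time="2009-01-01T00:00:00">
    <measurement tag="SITEA.T1">1.0</measurement>
    <measurement tag="SITEA.T2">2.0</measurement>
  </record>
  <record time="2009-01-01T00:05:00">
    <measurement tag="SITEA.T1">1.5</measurement>
    <measurement tag="SITEA.T2">2.5</measurement>
  </record>
</day>"""

streams = events_per_tag(sample)
print(streams["SITEA.T1"])
```

Since consecutive records are already chronological, each tag's list comes out oldest-first, which is the order the server prefers.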


      I created a dynamic archive file to cover 01-Dec-08 through 01-Jan-10.  I created all the tags I needed for two sites - each site having 9 tags.  I deleted the initial 'Pt Created' archive event for each tag.  When I went and attempted to load the data for a site, I saw archive history for a while, then ended up seeing only the last value sent from the PI-SDK program in the archive, with no previous history.


      I am watching the snapshot and archive stats, and it appears I'm getting an accumulation of 'Out of Order Events', which shouldn't cause a problem - it just bypasses compression, right?


      It seems like I get different results in different cases.  I had the entire year loaded for one site, and when I went and loaded another site for the same timeframe, again I only got the last value sent.


      I'm attacking this from the server side first, then I'll see if this is a PI-SDK programming issue.


      Thanks for any advice.



        • Re: Backfill data in non-chrono order

          Hi Corey,


          First off, what are you flying in your profile photo?  I'm a Diamond (DA20) guy myself.


          Now as to your data.  Have you looked at our XML interface?  Might be just what you need to get your old data into the server.


          Out-of-order events aren't the problem that they used to be, but they're still not great for the PI Server.  The archive is highly optimized for data that arrives in chronological order.  If there's anything you can do to make sure the data arrives in order, you'll see pretty big performance gains on your backfill.


          You might also consider creating a couple of archives to cover the time range, rather than one big dynamic one.  It may not matter (I don't know how much data we're talking about here), but it can't hurt.



            • Re: Backfill data in non-chrono order

              Hi Matt,


              I have a Cessna 177B Cardinal FG.  Love it - very comfortable xc aircraft.  Going to take it to OSH this year.  :)


              The data I'm injecting is in chrono order, at least as far as the tags are concerned.  I do see the history for a while, but then it disappears and I just see the last value sent.  I'm guessing (and I should have watched this) that the values spool up in the event queue, remain there for a while, but don't make it to the archive - leaving the snapshot in place 'cause it's, well, the snapshot.  The fact that I can see history at all is likely ONLY because the values are queued up waiting to get written to the archive - but they ain't.


              Chrono order should be important from a per-tag perspective, correct?  If I load year 2009 for 9 tags, then turn around and load year 2009 for 9 other tags, that should still be all right, eh?


              I'll take your multiple fixed archive suggestion into account.  That's why I'm doing this on a test system. :)


              CAVOK to you.



                • Re: Backfill data in non-chrono order

                  OK, chrono order per tag is just fine.  Thanks for that bit.


                  Can you explain how you're looking at the historical data?  Is it a DataLink sheet, ProcessBook, your own code, etc.?  The different clients behave slightly differently with respect to exactly how they fetch data from the server, so I'm just trying to get a feel for the details of the process here.


                  How much data does this end up being?  You said you had successfully loaded one site - how big was that archive?


                  Also, is it possible to do a test load with the XML interface?  That would pinpoint almost immediately whether this is a server issue or a code issue.


                  The server stats will tell you how many events went to the snapshot, and then off to the archive.  They'll also tell you whether the data is sitting in the queue or not.


                  Right back at ya on the weather!  IFR here today I'm afraid.  OVC030



                    • Re: Backfill data in non-chrono order

                      Yeah, I meant to watch the server stats while I was injecting.  Have to try it again and see what transpires.  Will do that when I switch to multiple archives.


                      I'm using the SMT archive plug-in to look at the data as I load it.  The data is 5-minute snapshots for 9 tags per 'site', over a 12-month period - around 2.6K values/day per site - not prohibitive.  It's a PI-SDK program that reads the data in a day at a time and injects it.  I'm 100% convinced that what I do see is just coming out of the event queue, not the archive.  The quantity of data sent in just logjams the queue, and the time until it's dequeued accounts for the delay.
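Spelling out that volume (one value per tag every 5 minutes, 9 tags per site):

```python
# Back-of-the-envelope volume check for a 5-minute scan on 9 tags.
values_per_tag_per_day = 24 * 60 // 5                      # 288 events/tag/day
values_per_site_per_day = values_per_tag_per_day * 9       # 9 tags per site
values_per_site_per_year = values_per_site_per_day * 365   # full backfill per site

print(values_per_tag_per_day, values_per_site_per_day, values_per_site_per_year)
# 288 events/tag/day, 2592/site/day, 946080/site/year
```

Under a million events per site-year - modest by archive standards, which supports the poster's "not prohibitive" assessment.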


                      I don't think I have the XML interface as part of our vCampus benefits, so I can't try that.  Should check our install...


                      CYYC today: METAR 34012KT 10SM -SHSN M09/M09 :P

                        • Re: Backfill data in non-chrono order
                          Daniel Takara

                          Hi Corey,


                           If I understood your situation correctly, the backfilled data for the 9 PI tags of the first site (let's call it site A) stayed archived, while the backfilled data for the 9 PI tags of the second site (site B) seems to make it to the event queue but never ultimately gets archived.


                          I would bet you created the 9 PI tags of site B after creating the dynamic archive. Is that correct?


                          If so, you'll need to:

                          1. unregister the dynamic archive
                          2. create all the tags you need to backfill with data (in case you need to backfill data for sites C, D, E, etc)
                          3. create and register new dynamic archive(s) for the time span you need (so that the archive(s) have a recno for every tag you need to backfill with data)
                          4. backfill the data
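A sketch of step 2 as a piconfig script (the tag names, point source, and point type below are placeholders, and the syntax is from memory of PI Server 3.x tooling - verify it against your server's System Management Guide):

```
@table pipoint
@mode create, t
@istr tag, pointsource, pointtype
SITEB.TAG1, L, float32
SITEB.TAG2, L, float32
@ends
```

Once every tag you'll ever backfill exists, the new archive(s) created in step 3 will carry a primary record for each of them.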
                            • Re: Backfill data in non-chrono order

                              Just like Daniel suggested (I would bet you created the 9 PI tags of site B after creating the dynamic archive), one of your first statements led me to believe you created the PI Points after you created the archive:

                              Corey Wirun

                              I created a dynamic archive file to cover 01-Dec-08 through 01-Jan-10.  I created all the tags I needed for two sites - each site having 9 tags.


                              One of the things you need in order to be able to write data to PI is a "primary record" that identifies each of your PI Points in the archive file(s) you intend to write data to. When you create new points for backfilling purposes, the recommendation is to create them before creating the archives that will hold the data. If the archives already existed before you created the points (e.g. you already collected data and want to backfill more later on), then you will need to "reprocess" your archives. This is done using the offline archive utility, which will process your archive and add/fix/remove records (whether primary or overflow records) for tags that were added or removed. You can consult the PI Server System Management Guide in the vCampus Library or contact our Technical Support for assistance on the topic.


                              However, since you don't already have actual data in there, it will be simpler to do as Daniel suggested: get rid of your archive, then create new one(s) after making sure all your tags are created.


                              Corey Wirun

                              Don't think I have the XML interface as part of our vCampus benefits
                              That is correct: the PI XML Interface is not part of the vCampus PI Products Kit. You might want to contact your Account Manager and inquire about PI Development Servers (the ones assigned to your site/organization), not your personal vCampus one.

                                • Re: Backfill data in non-chrono order

                                  Steve Pilon

                                  consult the PI Server System Management Guide on the vCampus Library or contact our Technical Support, for assistance on the topic
                                  I just realized there's another good source of information for you to understand the structure of PI Archives (i.e. primary records, overflow records, reprocessing archives, etc.): chapter 5 of the PI System Manager I training course. You can attend the instructor-led version or go through it online, in the vCampus Training Center.


                                  Hope this helps!

                                    • Re: Backfill data in non-chrono order

                                      I did initially create the archives out of order (relative to the tags).  I rectified that situation and made sure the tags were created first.  I'm going to try this again this week, watching the server stats during the data injection.  I'll have more info then.

                                        • Re: Backfill data in non-chrono order

                                          Data in the event queue is not visible.


                                          For all data except Batch Database data, the data flow is the following:


                                          Snapshot -> event queue -> Archive


                                          The Snapshot is not in the Archive (it's in piarcmem.dat), so you can always see the last value sent - the event with the latest timestamp.


                                          You can monitor the flow with piartool -ss / -qs / -as or SMT or the corresponding perfmon counters. (ss = snapshot statistics; qs = event queue statistics; as = archive statistics)


                                          I suspect the Archive was just slowly processing data in the queue, especially if you had out-of-order data, so per tag, you would have the following:


                                          ST.....ET          snapshot value


                                          Then, after some more processing time, you would have the following:


                                          ST.............ET  snapshot value


                                          That is, the values will get filled in over time towards the snapshot as the event queue gets processed.


                                          We definitely recommend sending events in chronological order (oldest first) per tag, especially if you want compression and/or the fastest backfilling.
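That per-tag ordering can be enforced client-side before injection; a minimal sketch, assuming timestamps are ISO-8601 strings (so lexical order is chronological) - the actual PI-SDK write calls are omitted:

```python
# Sketch: ensure each tag's events are oldest-first before sending them
# to the server.  `events` maps tag name -> list of (timestamp, value);
# ISO-8601 timestamp strings sort lexically in chronological order.
def chronologize(events):
    return {tag: sorted(vals, key=lambda e: e[0]) for tag, vals in events.items()}

events = {
    "SITEA.T1": [("2009-01-02T00:00:00", 2.0), ("2009-01-01T00:00:00", 1.0)],
}
ordered = chronologize(events)
print(ordered["SITEA.T1"][0])   # oldest event comes first after sorting
```

A full-year backfill would then iterate each tag's ordered list and write day-sized batches, which keeps every tag strictly chronological even if tags are loaded one after another.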



                                            • Re: Backfill data in non-chrono order

                                              If you see archive data, then it disappears and never reappears, then the most likely explanation is that you don't have archives that cover that time range.  But if that's the case, you should definitely see errors in the PI Server Message log telling you so.


                                              The reason that archive history could be temporarily visible is that after the Archive takes it out of the event queue, it puts it into its write cache.  Data in the write cache is visible to archive queries.  When the write cache gets flushed to disk, that's when we look for an archive that covers the time range.  If there isn't one, the data is tossed, and any subsequent queries for that tag for that time range would show no events.


                                              So there are definitely scenarios that match your observation.  But if all the tags have data from the same time range, then I would expect the behavior to be consistent for all the tags in the backfill once the write cache is flushed.


                                              We have long had a request to keep archive data (in the event queue) until it is successfully flushed to disk, but the architectural requirements are not yet in place.