21 Replies Latest reply on Jul 18, 2013 10:16 AM by RJKSolutions

    Different archive files for different tags

      I have heard some talk on this topic a few times, so it is time to bring it to the table with OSI.  Is it feasible, for a single PI system, to separate PI tags out to different archives, perhaps based on tag attribute configuration - fast-updating tags go to one set of archives, slow-updating tags to another?


      Fast-updating tags are likely to cause archive shifts sooner and use up more storage, so if the tags could be split into separate sets of archives, the storage requirements for each could be handled differently.


      Thoughts on this?

        • Re: Different archive files for different tags

          OK, so I kept thinking after writing my own post, and I suppose making use of the OSI PI COM connector could provide something along these lines.


          Imagine two PI systems (perhaps utilising virtual machines), where the first is used for fast-updating tags (think of this system as pure storage rather than a true PI system) and the second for "normal" or slow-updating PI tags (this is the system users would connect to and browse).  Install the PI COM connector on the second PI system and configure the fast-updating tags to connect through to the first PI system for data retrieval.


          I am thinking of the case where you have a nice big stable PI system: archives shift nicely and the system just runs without any real issues.  A project or initiative comes along that wants to add only a few PI tags to that system, but the data load from those few tags is massive.  What is the best method to minimise the impact on that nice big stable PI system and its archives?

            • Re: Different archive files for different tags

              These are very interesting thoughts - you have obviously done some serious thinking on this, and we appreciate it. So I hope you will be glad to hear that what you describe here is basically what was shown at UC 2007 as the future of PI (http://www.osisoft.com/Resources/Multimedia/User+Conference+2007/MO-02-03B.htm).

              The functionality is called Point Partitions, and beyond what you describe it could provide us some further benefits (future data, better use of resources). But let me note that the implementation has been delayed, so if you refer to the engineering plan this functionality has no release date yet - we are working on it.




              Meanwhile, the approach with the least impact on the PI Server, as you have already stated, would be a separate PI Server and the PI COM Connector. However, this addresses only the concern of filling up archive space; it does not bring any performance enhancement. The PI COM Connector approach is also frequently used where future data is required. In that case it is common to use the OLEDB COM connector and store the future data in a relational database.



                • Re: Different archive files for different tags

                  Andreas, thanks for the information.

                  So is there any more literature on Point Partitions?  I am interested to see how you configure the points to an archive (using standard attributes, or a new method?).

                  Currently, for future data we use a secondary tag and retrieve its data with an offset, treating it as the future data for a primary tag.  There are then tricks to display this data on trends - imagine forecasting energy usage, then watching to see whether actual usage matches the forecast and calculating the cost of deviations.  This was achieved using the standard PI system and the PI SDK.
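                  To make the offset trick concrete, here is a rough sketch of the idea in plain Python (the offset value, function names, and the plain tuples standing in for archive events are all illustrative - this is not the PI SDK):

```python
from datetime import datetime, timedelta

# Sketch of the time-offset trick: forecast values are written to a
# secondary tag at "timestamp - offset", so the archive never holds
# timestamps ahead of now; the client shifts them forward on retrieval.

OFFSET = timedelta(days=365)  # fixed offset agreed between writer and reader

def to_storage_time(forecast_time: datetime) -> datetime:
    """Map a future timestamp into the past before archiving it."""
    return forecast_time - OFFSET

def from_storage_time(stored_time: datetime) -> datetime:
    """Recover the real forecast timestamp when reading back."""
    return stored_time + OFFSET

# A forecast for next week is archived roughly a year in the past.
forecast_for = datetime(2013, 7, 25, 12, 0)
stored_at = to_storage_time(forecast_for)
assert from_storage_time(stored_at) == forecast_for
```

                  The catch, as discussed below, is that the archive now deliberately holds "wrong" timestamps and every client needs the shifting logic.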
                  I am interested to read in more detail how future data will be handled in PR2-D2.


                  Thinking about this further, would there be any additional benefits to separating the archives?  Could archives, if read-only, be shared amongst PI systems?  For old data, you could point the PI servers in a collective to the same read-only PI archive on a separate machine.  This does add another point of failure to a collective setup, but it may present some advantages.

                    • Re: Different archive files for different tags


                      We do not have any documentation yet that I can share with you; for that you will have to wait some more time.


                      Point Partitions could have several benefits (depending on what gets implemented) - I can imagine benefits in different history depths as you have stated, future data, and distributing the archives across different resources (taking advantage of multiple storage channels), etc. One can imagine having high-frequency data that you need to process in PI but neither need afterwards for trending nor need a long history of, so you store it in a separate archive set.


                      Your idea of sharing read-only archives does sound interesting, but given that disk space is usually cheap compared to bandwidth, and given the reliability concerns, I don't think it is a feature we would consider.


                      The approach of storing data with a time offset in the archive is actually quite common; however, personally I do not like it.


                      Why? Simply because I hesitate to have archives with wrong data (even if it is only the timestamp) and the logic to correct it in the application. That is why I strongly prefer the COM connector road for future data. Future data is usually not available in large amounts, so storing it in a relational database (or sometimes even in a different PI system set to a different time) is easy to do.

                        • Re: Different archive files for different tags

                          I look forward to the documentation on Point Partitioning!  You have got me thinking about archive shifts across multiple primary archives, and whether the server will one day allow us to set rules on archives, e.g. once older than a predefined number of days/weeks/months: become read-only, make a backup copy to a directory, and reindex the archive by tag or time... automated system managers.


                          Sometimes in projects you are given a PI server and told to provide an application or piece of functionality with no influence over changes to the system (except tag creation etc.), i.e. use the existing infrastructure.  In these scenarios you really get to know the PI system and come up with some home-grown versions of readily available solutions.  For future data we were constrained by "no changes to the PI system"; since we only had a PI system and the Module Database (no AF etc.) to work with, we came up with a solution (there is always a solution <-- self-promotion) using offsets and PIModules for persisting the data between the archives and the application.



                            • Re: Different archive files for different tags

                              Andreas, nearly 3.5 years on and I am still waiting for that documentation


                              Actually, I think about point-level partitioning every now and then, and it made a reappearance over the last few days - so much so that I wanted to post about it again.  I also wanted to bring up a presentation from Monterey in 2007, when Denis first got me excited about this (as well as system partitioning, which would get me really excited if it were part of a presentation at this year's UC!).




                              Anyway, now that we are looking at large PI Systems with high-frequency data for many different types of tags, I keep coming back to point-level partitioning.  I have seen servers hosting PI Servers with some hefty storage - 2TB mount points - and then maintenance processes to move older archives around as space fills up, introduce new mount points, and so on.  It seems to me that with the advent of such large PI Systems, being able to logically separate points to specific archives in different locations would be a natural step.  For example, you assign all interface-related performance/health points to one archive, the corresponding process data tags to another set of archives, and future data tags to a third type.  You could then spread each of those three primary archive sets across different storage areas and not worry about filling up one area and shuffling archives around.  This would also make for efficient archives, in that the high-frequency tags are grouped together where the archive shifts quicker than the other archives containing lower-frequency data - all to do with the archives' internal indexing... unless any of this has changed with PI Server 2012?
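                              As a rough illustration of what I mean by assigning point categories to archive sets (the attribute name, categories, and paths below are entirely made up - nothing like this exists in the PI Server today):

```python
# Hypothetical sketch: route each point to an archive set based on a
# tag attribute, the way point-level partitioning might work.

ARCHIVE_SETS = {
    "interface_health": r"D:\archives\health",   # interface perf/health points
    "process_data":     r"E:\archives\process",  # ordinary process tags
    "future_data":      r"F:\archives\future",   # forecast tags
}

def archive_set_for(tag: dict) -> str:
    """Pick an archive location from a tag's (hypothetical) 'partition'
    attribute, falling back to the process-data set."""
    category = tag.get("partition", "process_data")
    return ARCHIVE_SETS.get(category, ARCHIVE_SETS["process_data"])

tag = {"name": "sinusoid.health", "partition": "interface_health"}
assert archive_set_for(tag) == r"D:\archives\health"
```

                              Each archive set could then sit on its own mount point and shift on its own schedule, which is the manageability benefit I am after.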


                              Well, that was a long way of asking: is point-level partitioning remotely on the (internal to OSIsoft) engineering plan?

                                • Re: Different archive files for different tags
                                  Ahmad Fattahi

                                  I'll let the PI Server team chime in on the engineering plan. On the technical side, this reminds me of the discussion we had on my blog post on Hadoop. What you are describing is basically a special and efficient way of dividing the time series data into logically sensible shards. I would say that, even more than the space benefits, it would make many analytics operations far better suited to parallel processing (MapReduce-able).

                                    • Re: Different archive files for different tags

                                      Indeed, your blog post brought it back to my brain's very own snapshot subsystem, where it was waiting to be analysed - which it now has been.


                                      Here is a nice picture from the presentation - the same presentation that mentions SSB, by the way.


                                        • Re: Different archive files for different tags

                                          Ok, fair enough.  Sometimes it is nice to dream, but maybe we should have let everyone know that we were thinking big and sharing our vision.  We were happy to see that folks were just as excited about our future plans.  Along the way, things changed with *how* and *when* we plan to execute that vision, although our goals of *what* problems to solve are still the same.  I guess now is a good time to come back to earth and evaluate where things stand...


                                          In 2007 we showed a prototype using point partitioning to achieve 20M tags.  This year we will release PI Server 2012, which is capable of scaling up on a production system.  In addition, PI Server 2012 can collect, store, update, and distribute data at very high rates, which would be expected when dealing with increasing volumes of data.  How much and how fast?  I cannot reveal that yet, but it is safe to say we achieved the goals we set out to accomplish in 2007.  How did we do that, you might ask?  Well, you will have to come to the UC or watch the video after.  Suffice it to say, we accomplished these significant improvements without point partitioning (although we do rely on modern 64-bit Windows servers, like we talked about back then).


                                          So what else happened between 2007 and now?  We took care of some very important and urgent needs with High Availability, Windows Security, and PI AF Integration.  In fact, one could argue that PI AF used with multiple PI Servers enables point partitioning today.  For example, a customer could store slow data and high speed data in separate PI Servers with different archive management strategies, retention policies, security, etc.  Then PI AF could be used to access the data in these multiple PI Servers through PI AF attributes.  Sure, you might say this adds a little more complexity with multiple servers (and additional licensing costs), but it is possible and provides some of the benefits that we envisioned with point partitioning in a single server.


                                          What else did we deliver on?  We made good on PI SDK buffering for manual entry applications and custom interfaces.  We also have HA for PI Event Frames so although this is not the same as batch replication, it will provide the same level of redundancy when customers are ready to migrate to PI Event Frames.


                                          What does that leave us with?  A couple of major items are still on our to-do list: future data and server-side buffering.  Admittedly these are long overdue.  It is difficult to choose which to tackle first, but if I am doing my job correctly and listening to our customers, then clearly future data is next (sorry Rhys, that may not be what you were hoping to hear about SSB).


                                          Ok, where are we at overall?  We talked about performance and scalability, and this is almost done.  We talked about better HA and replication and accomplished some parts of those, but clearly there is more to do.  Finally, we talked about future data and presented useful workarounds at vCampus Live! 2011 (see Extras in the Download Center), but our work here is just beginning.  To summarize and answer the original question - yes, enhancements like future data, server-side buffering, and partitioning are still in our long-term plans (in that order), along with many others.  Of course those plans may change over time as we continue to gather and evaluate market feedback; however, rest assured they are not forgotten.


                                          I look forward to continuing this discussion with all of you at UC.

                                            • Re: Different archive files for different tags

                                              @rhys ... the highest-performance process data archive I have seen used one file per tag.


                                              Some of my users still complain ("why don't we have this system...") because:

                                              • the system stored data at a higher rate, and
                                              • in the trend and data access tool you could choose which data rate you wanted - for example: I want the data of the last 5 years, but I only need hourly values.
                                              • it also saved a lot of storage, because there was no need to store a pointer to the next entry (OSIsoft), an index file (AspenTech), or a list at the front of every archive (again OSIsoft).
                                              • and it was fast.

                                              The system was based on Linux, so you don't have the issue with open-file limits... but with W2K8R2 and 64-bit, this limit should be gone too.


                                              I also think it would be better if the system automatically generated hourly (and coarser) values, as this would reduce the overall performance decrease with a lot of data.  Then you could choose the retention per resolution: I only need the seconds data for 1 month, please keep the 5-minute data for 3 years, and keep everything above that forever (Honeywell PHD maybe has this... perhaps via some engineer who worked at OSIsoft before) ... also taking compression into account.
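                                              A minimal sketch of that automatic roll-up idea (the function names and the plain tuples standing in for archived events are just illustrative, not any historian's API):

```python
from collections import defaultdict
from datetime import datetime

# Sketch: aggregate raw events into hourly averages so that old data
# can be served at a coarser rate while the raw data is aged out.
def hourly_averages(events):
    """events: iterable of (datetime, float); returns {hour_start: mean}."""
    buckets = defaultdict(list)
    for ts, value in events:
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in buckets.items()}

events = [
    (datetime(2013, 7, 18, 9, 5), 10.0),
    (datetime(2013, 7, 18, 9, 35), 20.0),
    (datetime(2013, 7, 18, 10, 15), 30.0),
]
rollup = hourly_averages(events)
assert rollup[datetime(2013, 7, 18, 9, 0)] == 15.0
```

                                              A client asking for 5 years at an hourly rate would then read only the roll-up table instead of every raw event.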


                                              I'm not sure whether, with SSD disks and their reduced access times, partitioning is still needed.


                                              Summary: I think partitioning is a bad idea that doesn't really solve the core problem.  In my opinion the two alternatives make much more sense.


                                                • Re: Different archive files for different tags
                                                  Ahmad Fattahi



                                                  (I am just giving my personal view on the technical aspect of the matter, not an official OSIsoft position.) The idea of partitioning data attacks the scalability problem at the root. The scenarios you are putting forward stem from the fact that no single server can handle all the queries when someone asks for three years of data. Therefore, you would like to "work around" the problem by returning only hourly values.


                                                  What sharding/partitioning of the archive files can potentially do is divide and conquer: when someone asks for three years' worth of data, not one super-strong server but several smaller and cheaper servers will get the job done.
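                                                  A toy sketch of that divide-and-conquer query (the shard layout, helper names, and plain tuples are all hypothetical; a real system would fan out over the network rather than threads):

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: each shard holds one slice of the time series; a range query
# fans out to every shard in parallel and the results are merged by time.
def query_shard(shard, start, end):
    """Return the events in one shard that fall inside [start, end)."""
    return [(t, v) for t, v in shard if start <= t < end]

def query_all(shards, start, end):
    """Send the same range query to every shard and merge the results."""
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda s: query_shard(s, start, end), shards)
    return sorted(ev for part in parts for ev in part)

shards = [[(1, "a"), (5, "b")], [(3, "c"), (9, "d")]]  # two time slices
assert query_all(shards, 0, 6) == [(1, "a"), (3, "c"), (5, "b")]
```

                                                  The point is that no single node ever scans all three years; each scans only its own slice.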

                                                    • Re: Different archive files for different tags

                                                      @Jay: Thanks for the comprehensive reply, appreciated.  Can I be a pain in the backside... if there was an experimental version of the PI Server that used point partitioning to achieve higher scalability back then, isn't the principle still valid (improved, given the new index structures) with the PI Server's current architecture?  I am not necessarily talking about scalability with point partitioning applied to PI Server 2013 and above, but about increased efficiency and manageability of the PI Server.  I appreciate that "PI in the Sky" will probably tackle these types of issues, so you are limited in what you can share.


                                                      @Wolfgang: On the whole I agree with what Ahmad has said; we are singing from the same hymn sheet.  On one of your points, you mentioned that the system stores data at high rates but the client tool can request 5 years' worth at an hourly rate - you can do that with ProcessBook technically, though practically it is not intuitive; either way, that is a feature of the client tools and not necessarily the server (unless I have misunderstood you).  I have worked with a proprietary data historian that runs on VMS, stores data as minute values, and automatically calculates hourly, daily, weekly, monthly, book-closing and yearly average values.  That was great for the applications being used against the data, but it is less efficient IMO - no compression algorithms, etc.


                                                      By the way, I see you have updated your goal for the end of 2012.



                                                        • Re: Different archive files for different tags

                                                          Rhys, what's your opinion on one tag per file instead of storage groups?


                                                          I know that some systems which don't have compression use the automatic approach... maybe a combination of all these systems would be best.


                                                          Compression + one tag per file + automatic widening of the compression settings based on time (so the archive increases compression as data ages) + automatic hourly, daily, etc. values.  Maybe with the third one, you don't need the last one.
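                                                          To sketch the "widening compression over time" part (the thresholds, schedule, and the crude deviation filter below are illustrative only - not PI's actual exception/compression algorithm):

```python
# Sketch: re-compress aged data with a deviation threshold that grows
# with age, so the archive keeps fewer points the older the data gets.
def deadband(events, threshold):
    """Keep an event only when it moves more than `threshold` away
    from the last kept value (a crude compression filter)."""
    kept, last = [], None
    for ts, value in events:
        if last is None or abs(value - last) > threshold:
            kept.append((ts, value))
            last = value
    return kept

def threshold_for_age(age_years):
    """Widen the deadband as data ages (hypothetical schedule)."""
    return 0.1 * (1 + age_years)

events = [(0, 1.0), (1, 1.05), (2, 1.5), (3, 1.55)]
# Fresh data keeps only moves larger than 0.1:
assert deadband(events, threshold_for_age(0)) == [(0, 1.0), (2, 1.5)]
```

                                                          An ageing job could periodically rerun each archive through `deadband` with the wider threshold for its age band, dropping the rest.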