22 Replies Latest reply on Sep 26, 2011 7:30 PM by RJKSolutions

    Management of TB's in Collectives

      Bit of a general question.

       

      When you start hitting 500 GB, 1 TB and beyond for your PI server archive files, what are the fastest methods people are using (either directly or via a 3rd party provider) for moving such large amounts of data around?  I am talking about occasions such as forming a 4 node collective from a PI server with 1 TB of archives, where you want to prepare the 3 secondary servers, or getting new hardware that means you need to add a secondary PI server.

       

      Of course you can set up some batch jobs (e.g. robocopy) to do it in the background for a few days, but it is a pain and I am an impatient guy.
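
      For anyone scripting the background copy rather than using robocopy, here is a minimal sketch of an incremental copy with checksum verification. The `*.arc` file pattern, the function names and the "same size means already copied" shortcut are assumptions for illustration, not anything PI-specific, and it presumes the archive files are not being written to while they are copied.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Hash a file in chunks so a large archive never sits fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sync_archives(source_dir: Path, target_dir: Path, pattern: str = "*.arc") -> list:
    """Copy archive files that are missing or differ in size; return the names copied."""
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in sorted(source_dir.glob(pattern)):
        target = target_dir / source.name
        # Assumption: a file of identical size was transferred on an earlier run.
        if target.exists() and target.stat().st_size == source.stat().st_size:
            continue
        shutil.copy2(source, target)
        if sha256_of(source) != sha256_of(target):
            raise IOError(f"checksum mismatch after copying {source.name}")
        copied.append(source.name)
    return copied
```

      Because files that already exist at the target are skipped, the job can be stopped and restarted without starting over, which is most of what the robocopy batch job gives you.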

       

      I started doing some research and got sidetracked by the CERN LHC and the sheer amount of data collected and transferred, 15 petabytes of data a year; a good read if you have some spare reading time... 

       


      http://public.web.cern.ch/public/en/LHC/Computing-en.html

       

      http://lcg.web.cern.ch/LCG/public/

       

      http://lcg.web.cern.ch/LCG/public/data_transfer.htm

       

      http://lcg.web.cern.ch/LCG/public/data-processing.htm      (1Mb @ 40,000,000 events per second) 

       

       

        • Re: Management of TB's in Collectives
          Ahmad Fattahi

          One basic question is where these machines sit relative to each other. Is network connectivity the only way? Is using an external hard drive and moving the disk around an option (a pretty basic way of moving data!)?

            • Re: Management of TB's in Collectives
              jlakumb

              I am interested in your feedback on this discussion.  We are looking at archive management issues like these, since PI Server 2012 will support high point count/high data rate systems.

               

              Jay Lakumb

               

              PM, PI Server

                • Re: Management of TB's in Collectives

                  Jay, isn't this an important talking point for OSIsoft's research project of PI running in the Cloud?  I have a few thoughts running round my head about PI in the Cloud...a discussion for another day.

                   

                  Ahmad, so far I am only really talking about data centres where a collective typically spans two physical locations (for disaster recovery).  Usually walking up to them with a USB (or Thunderbolt? - imagine having Fibre Channel SAN on your MacBook via Thunderbolt!) device and driving across to the other data centre is not an option - to get a 3rd party provider to do that (quickly) is even harder.

                   

                  Seriously wonder about a (public/private) cloud service for being able to switch on new PI servers with all software and archives ready at the flick of a switch.  Interfaces send data to the cloud and the cloud PI server replicates the data via SSB (had to get that one in there) etc...

                    • Re: Management of TB's in Collectives
                      wpurrer

                      My archives are growing at a rate of 4.5 GB per week. What I hate is the reprocessing of the archives we have to do on a regular basis.  One archive takes about 1.5 hours to reprocess. You can calculate how long a job takes to reprocess all our archives. ....
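
                      The calculation can be sketched as below. Only the 1.5 hours per archive comes from the post; the total volume and the archive file size are hypothetical inputs, and the sketch assumes archives are reprocessed one after another.

```python
import math


def reprocess_hours(total_gb: float, archive_gb: float, hours_per_archive: float = 1.5) -> float:
    """Total sequential reprocessing time for a set of fixed-size archives."""
    archive_count = math.ceil(total_gb / archive_gb)
    return archive_count * hours_per_archive


# Hypothetical example: 1 TB of archives split into 2 GB files.
hours = reprocess_hours(total_gb=1024, archive_gb=2)
print(f"{hours:.0f} hours, about {hours / 24:.0f} days")  # 768 hours, about 32 days
```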

                       

                      What I think is also an issue is that we want to increase performance, for example with SSDs, but if you ask tech support they say SSDs have no impact on performance. I would like to see OSIsoft test PI on SSD, so that we have a recommendation and no fancy timing issues that crash the PI server when it is used with SSD.

                       

                      The next thing is the snapfix command, which should be replaced by a command that recreates the snapshot from the archive. It's a pain in the ass if you have a crashed PI server where you have to remove all snapfix entries from the archive, and you get wrong PE + ACE calcs because of the snapfix entries.

                       

                      The same goes for the stability of the data transport: there are a couple of issues where you lose data. With big systems even more.

                        • Re: Management of TB's in Collectives
                          Ahmad Fattahi

                          Wolfgang Purrer

                          What I hate is the reprocessing of the archives we have to do on a regular basis.

                           

                           

                          What makes you reprocess archives on a regular basis? I would worry about that in the first place.

                            • Re: Management of TB's in Collectives
                              wpurrer

                              * because if I create tags and want to backfill data

                               

                              * for performance issues

                               

                              * to eliminate the overhead ???? records

                               

                              * to delete data ( a couple of million entries....)

                               

                              * ........

                                • Re: Management of TB's in Collectives
                                  wpurrer

                                  PS: have a look at the archiving features of AspenTech IP21 or Honeywell PHD. I especially like the layered archiving in Honeywell PHD: if you access data over a year's duration, it uses precalculated daily min/max values and is still fast, unlike PI, which always accesses all the data of a tag.
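
                                  To illustrate the layered idea (this is only a sketch of the general technique, not how PHD or any other product implements it): raw events are collapsed once into a per-day min/max layer, and long-range queries read that layer instead of every raw event. All names here are made up for the example.

```python
from datetime import date, datetime


def build_daily_layer(events):
    """Collapse (timestamp, value) events into {day: (min, max)}."""
    layer = {}
    for timestamp, value in events:
        day = timestamp.date()
        low, high = layer.get(day, (value, value))
        layer[day] = (min(low, value), max(high, value))
    return layer


def range_min_max(layer, start: date, end: date):
    """Answer a long-range min/max query from the daily layer only."""
    days = [extremes for day, extremes in layer.items() if start <= day <= end]
    if not days:
        return None
    return min(low for low, _ in days), max(high for _, high in days)


events = [
    (datetime(2011, 1, 1, 0, 0), 5.0),
    (datetime(2011, 1, 1, 12, 0), 9.0),
    (datetime(2011, 6, 1, 0, 0), -2.0),
]
layer = build_daily_layer(events)
print(range_min_max(layer, date(2011, 1, 1), date(2011, 12, 31)))  # (-2.0, 9.0)
```

                                  A year-long query then touches at most 366 precalculated entries per tag, however many raw events sit underneath.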

                                  • Re: Management of TB's in Collectives

                                    Wolfgang Purrer

                                    * because if I create tags and want to backfill data

                                     

                                    * for performance issues

                                     

                                    * to eliminate the overhead ???? records

                                     

                                    * to delete data ( a couple of million entries....)

                                     

                                    * ........

                                     

                                     

                                    Although not as regular as Wolfgang, I am moving (one-off activity) 10 years' worth of data for around 30,000 tags.  It is an absolute administrative nightmare when it comes to the archives, and something I am automating so I can sleep whilst it works.  The bigger a system gets, the more administration becomes a distraction.  Maybe if you attempt to write data to an archive that doesn't contain the tag, rather than rejecting it, the PI server should queue the data and reprocess the archive automatically (controlled to some extent by tuning parameters). In a collective environment it could share the load with collective members that aren't doing much (I know it is easy to say but harder to implement).

                                     

                                     

                                      • Re: Management of TB's in Collectives
                                        jlakumb

                                        Wolfgang, you may be glad to know that PLIs 9313OSI8 and 23573OSI8 are targeted for PI Server 2012.

                                         

                                        Also, we are looking at archive management issues in this release. I will contact you separately to see if you are interested in having a discussion about this...

                                          • Re: Management of TB's in Collectives
                                            wpurrer

                                            Will PI Server 2012 be a new version (which is covered by our maintenance) or new software (like PI 2010)?

                                              • Re: Management of TB's in Collectives
                                                jlakumb

                                                Good question. I don't want to get into licensing issues here, but I will say that we're working on ways to ensure that all customers (regardless of which package) will be able to upgrade to the latest PI Server. Of course PI Server 2010 and later require PI AF, so we are making that available to all customers in PI Server 2010 R3.

                                                 

                                                I'd encourage you to monitor the PI System Roadmap for more details about the future plans.

                                                  • Re: Management of TB's in Collectives
                                                    wpurrer

                                                    @Jay what is the planned future for "collective" .. there are a couple of features missing...

                                                     

                                                    (in terms of server-side buffering and real write/delete access, with full support of all OLEDB, SDK, ... whatever functions)

                                                      • Re: Management of TB's in Collectives
                                                        jlakumb

                                                        Looks like this topic has taken a slight detour.  SSB is still on our long-term plans, but there is no active development.

                                                         

                                                        I guess I would ask what problem(s) are you trying to solve?  Does the upcoming PI SDK 2010 R2 with SDK Buffering (aka client-side buffering) help?

                                                          • Re: Management of TB's in Collectives
                                                            wpurrer

                                                            Think of an ACE application (SDK) or an OLEDB application which writes/deletes data in the archive ...

                                                              • Re: Management of TB's in Collectives

                                                                Thinking out loud...in some situations archive processing needs to be done on an offline server.  Although backup and restore to a server facilitates this, I wonder if it would be beneficial to create 'temporary' collective members - the type of collective members that aren't broadcast to clients that there is a new collective member, but they still behave like a member (receiving meta data updates); they are invisible/ghost members.  Then you can hammer the ghost member and when finished trigger archives to be moved back to the visible members.

                                                                 

                                                                You could then have an additional server available for performing resource intensive management  e.g. automatic archive reprocessing & data backfilling, archive merges, ...  The management tasks would be queued on the ghost member and distributed to all 'real' collective members when complete.

                                                                 

                                                                I've essentially done this via manual steps & some scripts - much better if I only had to click a few buttons

                                                                  • Re: Management of TB's in Collectives
                                                                    Ahmad Fattahi

                                                                    Good ideas. So you are basically introducing a lot more hierarchy in the collective than just Primary-Secondary, right? To me it looks more like a multi-member system with different kinds of members, as opposed to the current collective where all members are almost clones.

                                                                      • Re: Management of TB's in Collectives
                                                                        wpurrer

                                                                        In my opinion in a real collective it should not matter in any way if one server is down (for maintenance or whatever reason) or who is master or slave.

                                                                         

                                                                        For example, in past years we used MySQL in a two-way replication mode for this. On that system it doesn't matter if one server is down: I could add/delete/change data or tables anyway, from anywhere (also locally on the servers), and the servers sync via log shipping.
                                                                        (We then used MS NLB to share the external load across all servers.)

                                                                         

                                                                        The current situation with PI is:
                                                                        * you can only change configuration on the "master" (PI tags or AF)
                                                                        * the buffer only works for interfaces if they aren't installed on the server
                                                                        * for a custom SDK-based application, only "buffered insert" will be supported in the future
                                                                        * since buffering doesn't work "every time" and there are two different buffered transport paths to the server, it is likely that two servers won't hold the same data.

                                                                         

                                                                        What I would like to have is a collective where there is no master or slave. Every server should be equal (all changes to data or configuration allowed at any time; when a server comes back up it syncs all the changes internally).

                                                                         

                                                                        All the major and minor database suppliers provide this, so OSIsoft should provide it too. And for process data the ACID or transaction rules for replication aren't that important (in my opinion).

                                                                         

                                                                        For a programmer, creating a log of all changes on the server and shipping it to another server shouldn't be an issue. I think the current solution with external buffering (not inside the server) is a bad idea (sorry Jay). Maybe with the audit function this solution is nearly ready?
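
                                                                        A toy sketch of that change-log idea, for two equal peers: each node records its own changes in an ordered log and pulls the entries it has not yet seen from the other node. The `Node` class and its methods are hypothetical, nothing here reflects PI internals, and conflicting writes to the same tag and timestamp are deliberately left unresolved.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical peer that records every local change in an ordered log."""
    name: str
    data: dict = field(default_factory=dict)    # (tag, timestamp) -> value
    log: list = field(default_factory=list)     # local changes, in order
    pulled: dict = field(default_factory=dict)  # peer name -> log entries already applied

    def write(self, tag, timestamp, value):
        change = ("write", tag, timestamp, value)
        self.log.append(change)
        self._apply(change)

    def delete(self, tag, timestamp):
        change = ("delete", tag, timestamp, None)
        self.log.append(change)
        self._apply(change)

    def _apply(self, change):
        action, tag, timestamp, value = change
        if action == "write":
            self.data[(tag, timestamp)] = value
        else:
            self.data.pop((tag, timestamp), None)

    def pull_from(self, peer):
        """Apply the peer's log entries that have not been seen yet."""
        start = self.pulled.get(peer.name, 0)
        for change in peer.log[start:]:
            self._apply(change)
        self.pulled[peer.name] = len(peer.log)


a, b = Node("a"), Node("b")
a.write("TAG1", 1, 10.0)  # written while b is unreachable
b.write("TAG2", 1, 20.0)  # written while a is unreachable
a.pull_from(b)
b.pull_from(a)
print(a.data == b.data)   # True
```

                                                                        Pulled changes are not re-logged, so this only covers two nodes; a larger collective would need change identifiers and some ordering rule on top.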

                                                                        • Re: Management of TB's in Collectives

                                                                          @Ahmad, I guess so, yes.  As Wolfgang says, a collective should be a collective of cloned PI servers, but this is where mechanisms such as SSB come in handy.  Even with a collective of cloned systems, the collective should be clever enough to use the members that have the most resources available for the administrative tasks.  Or on multi-million point systems you may want to offload the work to worker/ghost nodes - it will be interesting to see PI Server 2012 in detail at vCampus Live.  Maybe even learn some more about PI in the Cloud (which would have to deal with these issues).

                                                                            • Re: Management of TB's in Collectives
                                                                              wpurrer

                                                                              Regarding PI in the Cloud:

                                                                               

                                                                              * What I would really like is to be able to have different mandates?? on a server. That means I can give each "area" key user full admin rights to his tags, data, ... while everything still runs on one server. I would really appreciate it if OSIsoft went in this direction with their cloud approach, rather than just installing PI on a virtual machine.

                                                                                • Re: Management of TB's in Collectives

                                                                                  Wolfgang, you mean beyond creating an Active Directory group that acts as a specific job or role, e.g. 'Site A -> Plant C -> PI administrator', and using PI Identities/Mappings?  Or are you talking about true usage of PI in the cloud where you would buy say 1,000 tags, send data to those tags and only have access to manage those 1,000 tags regardless of the underlying PI system(s) hosting those tags?

                                                                                    • Re: Management of TB's in Collectives
                                                                                      wpurrer

                                                                                      I'm talking about "true usage of PI in the cloud where you would buy say 1,000 tags, send data to those tags and only have access to manage those 1,000 tags regardless of the underlying PI system(s) hosting those tags", and I would like to install this cloud edition on my "local" system so my department key users can manage their area of the site by themselves.

                                                                                       

                                                                                      My Organisation is:

                                                                                       

                                                                                      Global PI Admin

                                                                                       

                                                                                      -

                                                                                       

                                                                                      Site PI Admin

                                                                                       

                                                                                      -

                                                                                       

                                                                                      Area PI Key Users

                                                                                      PS: I'm also very interested in the cloud solution because of what they do with archive management, ...
                                                                                        • Re: Management of TB's in Collectives

                                                                                          Wolfgang Purrer

                                                                                          PS: I'm also very interested in the cloud solution because of what they do with archive management, ...

                                                                                           

                                                                                           

                                                                                          Me too, because I have been wasting too much time recently on archive management.  I remember reading (or hearing) about the OSIsoft research project on PI running in the (public/private) cloud, but I have not heard anything since.

                                                                                           

                                                                                          Are you attending vCampus Live?