5 Replies Latest reply on Oct 22, 2014 5:00 PM by Gregor

    Collective shutdown/startup

    marco.zoccoli
      Hi all, for a backup and restore test I have to shut down, and later start up, all 4 members of a collective (server primary, interface node primary, server secondary, interface node secondary). Could you please tell me the correct order in which to switch these 4 components off and, later, back on? Best Regards, Marco
        • Re: Collective shutdown/startup
          Roger Palmen

          This being HA, the order should not be a big issue, but to prevent unnecessary sync activity, I'd shut down in this order:

          1. Interface secondary
          2. Interface primary
          3. Server secondary
          4. Server primary

          And restart in reverse order.
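
          The order above can be written down as a small sketch. The node labels are hypothetical placeholders for the four components in the question, and the script only prints the sequence; it does not stop or start anything.

```python
# Hypothetical labels for the four collective components in this thread.
SHUTDOWN_ORDER = [
    "interface-secondary",
    "interface-primary",
    "server-secondary",
    "server-primary",
]


def startup_order(shutdown_order):
    """Return the restart sequence: the shutdown sequence reversed."""
    return list(reversed(shutdown_order))


if __name__ == "__main__":
    for step, node in enumerate(SHUTDOWN_ORDER, start=1):
        print(f"shutdown {step}: {node}")
    for step, node in enumerate(startup_order(SHUTDOWN_ORDER), start=1):
        print(f"startup  {step}: {node}")
```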

            • Re: Collective shutdown/startup
              marco.zoccoli

              Thanks a lot!

              Have a nice day,

              Marco

              • Re: Collective shutdown/startup

                Hello Marco & Roger,

                 

                The challenge here is to keep data loss to a minimum, or to prevent it altogether if possible. Shutting down all nodes at the same time doesn't seem like the right approach to me. Instead, my idea would be to restart one node at a time.

                 

                The interface nodes are critical to data collection. I am assuming that all interfaces are OSIsoft interfaces at current versions, that all connections are set up with UNIINT Interface level failover phase 2, and that PI Buffer Subsystem is properly set up to buffer to both PI Data Archive nodes. Under those assumptions, I suggest first finding out which interface instances are currently in the primary role.

                UNIINT Interface level failover phase 2 uses a shared file and 7 PI tags to control failover functionality. Each member of a failover instance has an ID that should be unique system-wide. If I recall correctly, the ActiveID failover tag reports the active member of a failover pair and can also be used to force a manual failover. I recommend confirming this with OSIsoft Technical Support and also checking whether there is anything interface-specific to consider.

                The first candidate for a restart is the interface node with all instances in backup / standby mode. As soon as that node is restarted and back online, verify in the PI Message Log (pipc.log with older interface versions) that all interface instances started properly and do not report any issues. Also verify that PI Buffer Subsystem is operational. Now promote all interface instance members on the restarted node to primary, so that the node that has already been restarted becomes "Primary" and the remaining interface node becomes the backup node. Depending on factors such as the points assigned, a failover can take some time, and there is a chance that some data is lost during the transition. The important thing, however, is that all interface instance members on a node are in backup state before the machine is restarted.
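
                The rule above (restart a node only when every interface instance on it is in backup) can be expressed as a simple check. This is a minimal sketch: the node names, instance names and role strings are made up for illustration, and in a real system the roles would have to be read from the failover tags rather than typed in by hand.

```python
def safe_to_restart(roles_by_node):
    """Return the nodes on which every interface instance is in backup."""
    return sorted(
        node
        for node, roles in roles_by_node.items()
        if roles and all(role == "backup" for role in roles.values())
    )


# Hypothetical snapshot: primaries are spread over both nodes,
# so neither node can be restarted yet.
mixed = {
    "ifnode-a": {"opc1": "primary", "modbus1": "backup"},
    "ifnode-b": {"opc1": "backup", "modbus1": "primary"},
}

# After a manual failover of modbus1 to ifnode-a, ifnode-b holds only
# backup instances and becomes the first restart candidate.
aligned = {
    "ifnode-a": {"opc1": "primary", "modbus1": "primary"},
    "ifnode-b": {"opc1": "backup", "modbus1": "backup"},
}

if __name__ == "__main__":
    print(safe_to_restart(mixed))
    print(safe_to_restart(aligned))
```

                With the mixed snapshot no node qualifies, which is the case where a manual failover is needed before any restart.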

                 

                I am not sure whether we recommend restarting the primary or the secondary member of a PI Data Archive collective first. Again, Technical Support might be a good resource for this. Just one known issue comes to mind, which you will likely run into with points serviced by PI Totalizer and PI Alarm Subsystem: please see known issue # 19811OSI8. Tags serviced by OSIsoft interfaces, however, shouldn't suffer any data loss if PI Buffer Subsystem is set up properly, because it maintains a separate queue for each PI HA Collective member, and data is simply queued while a PI Data Archive node is unavailable. As soon as the node becomes available again, PI Buffer Subsystem reconnects and empties its queue.
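
                To illustrate the idea of one queue per collective member, here is a toy model. It is not how PI Buffer Subsystem is implemented; the class and method names are invented, and it only shows why events sent while one member is down are not lost.

```python
from collections import deque


class ToyBuffer:
    """Toy model: one independent queue per collective member."""

    def __init__(self, members):
        self.queues = {m: deque() for m in members}
        self.online = {m: True for m in members}
        self.delivered = {m: [] for m in members}

    def set_online(self, member, is_online):
        """Mark a member up or down; flush its queue when it comes back."""
        self.online[member] = is_online
        if is_online:
            self._flush(member)

    def send(self, event):
        """Queue the event for every member; deliver where possible."""
        for member in self.queues:
            self.queues[member].append(event)
            if self.online[member]:
                self._flush(member)

    def _flush(self, member):
        """Deliver everything queued for one member, oldest first."""
        queue = self.queues[member]
        while queue:
            self.delivered[member].append(queue.popleft())
```

                Taking the secondary offline, sending a few events and bringing it back leaves both members with the same events in the same order.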

                  • Re: Collective shutdown/startup
                    Roger Palmen

                    Agree that if you don't want to lose data, you need a more careful approach.

                     

                    But if you want to do a restore, then in general you lose any data that is not in the backup...

                     

                    Maybe it's good to have some explanation of what the test is trying to prove?

                      • Re: Collective shutdown/startup

                        Oops! Sorry Roger, I overlooked the small detail that this is intended to be a test.

                         

                        @Marco: What are you trying to achieve here? Are you backing up virtual machines? For PI Data Archives, the best way to avoid disruption is to use PI Backup. If your approach really is to back up complete VMs, I suggest keeping at least 1 interface node running so that it buffers the data while the other 3 nodes are down, and backing up this interface node once both PI Data Archives and the other interface node are operational again. This way you should be able to avoid data gaps.