1 Reply Latest reply on Dec 11, 2017 12:38 PM by rschmitz

    Does anyone have experience with server clustering? How highly available is it?

    Heather

      Our systems were recently upgraded, but we took a pass on server clustering; it was our understanding that F5 load balancing was potentially in the future for notifications and analytics. I've followed up with tech support, and it seems OSI may no longer be considering F5 load balancing for analytics and notifications.

       

      For this reason, we're reconsidering server clustering, but have a couple of questions for anyone who has experience with it:

       

      How highly available is the cluster? I'm under the impression that you will still have an application outage with server clustering, because the service runs on only one server at a time. The only difference is that the application will automatically start up on another server if the current "primary" box becomes unavailable. However, you'll still have to wait for the application to start up on the other box.

       

      What triggers failover to another server? Are there settings that are configurable? We don't necessarily want analytics failing over too frequently, especially with the amount of time that auto-backfilling takes.

       

      Thanks!

        • Re: Does anyone have experience with server clustering? How highly available is it?
          rschmitz

          Hi Heather,

          In my experience, clustering is a robust solution for those who can tolerate a small amount of downtime in their applications because, as you mentioned, the service runs on only one server at a time. Cluster failover is triggered when the clustering service can no longer reach a node over the network. These settings are tunable (how frequently the node is checked and how many misses are acceptable before failover kicks in); I would recommend taking a look at this blog post on MSDN. According to Microsoft, "The default settings out of the box are optimized for failures where there is a complete loss of a server" rather than brief network failures, which should avoid the frequent failover you mentioned. That said, this may be something you need to discuss and test internally to determine the correct threshold parameters for your setup.
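
          As a rough illustration of how the two tunables interact (check frequency and number of acceptable misses), here is a minimal Python sketch. It is only a model of the idea, not the clustering service's actual logic: the class, the function, and the values 1.0 s and 5 misses are hypothetical and chosen purely for the example. In Windows Server Failover Clustering the corresponding cluster properties are, as far as I know, SameSubnetDelay and SameSubnetThreshold, but check the documentation for your version before relying on that.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class HeartbeatPolicy:
    """Hypothetical container for the two tunables; not a real clustering API."""
    interval_s: float    # how frequently the node is checked (illustrative value)
    miss_threshold: int  # consecutive misses tolerated before failover

    def detection_time_s(self) -> float:
        # Approximate time a node must stay unreachable before failover triggers.
        return self.interval_s * self.miss_threshold


def failover_check_index(policy: HeartbeatPolicy,
                         heartbeats: Iterable[bool]) -> Optional[int]:
    """Return the index of the check at which failover would trigger, or None.

    `heartbeats` holds one boolean per check: True = node answered, False = missed.
    """
    consecutive_misses = 0
    for i, answered in enumerate(heartbeats):
        # Any successful heartbeat resets the miss counter.
        consecutive_misses = 0 if answered else consecutive_misses + 1
        if consecutive_misses >= policy.miss_threshold:
            return i
    return None


# Assumed example values, not vendor defaults.
policy = HeartbeatPolicy(interval_s=1.0, miss_threshold=5)

brief_blip = [True, False, False, True, True, True, True, True]
server_loss = [True, True, False, False, False, False, False, False]

print(policy.detection_time_s())
print(failover_check_index(policy, brief_blip))
print(failover_check_index(policy, server_loss))
```

          In this model a short network blip never reaches the miss threshold, so no failover occurs, while a node that stays silent does trigger one. The outage Heather asks about would then be roughly the detection time plus however long the application takes to start on the other node, so raising the threshold trades fewer spurious failovers for a longer outage when a server is truly lost.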

          Cheers,

          Rob