According to this web page:
“You can enable Fault Tolerance for most mission critical virtual machines. A duplicate virtual machine, called the Secondary VM, is created and runs in virtual lockstep with the Primary VM. VMware vLockstep captures inputs and events that occur on the Primary VM and sends them to the Secondary VM, which is running on another host. Using this information, the Secondary VM's execution is identical to that of the Primary VM. Because the Secondary VM is in virtual lockstep with the Primary VM, it can take over execution at any point without interruption, thereby providing fault tolerant protection.”
From my understanding, Fault Tolerance does not protect you as well as a PI Collective does, and let me explain why. Because the Secondary VM's execution is identical to that of the Primary VM, the two VMs are not independent, which means that if a PI service crashes on one VM, the same crash will probably occur on the other. Fault Tolerance does seem to be a good solution for avoiding data loss in case of a hardware failure.
As all PI Server members of a PI Collective are independent, whatever happens to PI Server A won't affect PI Server B. I believe this is the main reason why a PI Collective (HA) is a better option than Fault Tolerance for reducing the risk of data loss.
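To make the difference concrete, here is a toy Python sketch of the reasoning above. It is not VMware or PI code: the `Replica` class and both functions are hypothetical names made up for illustration, and it assumes the worst case where a software crash on the Primary VM is always reproduced on the Secondary VM (in practice I would only say "probably").

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """One server instance: it is usable only if host and service are both up."""
    name: str
    host_up: bool = True
    service_up: bool = True

    def available(self) -> bool:
        return self.host_up and self.service_up


def lockstep_pair_survives(fault: str) -> bool:
    """Fault Tolerance pair: the Secondary replays the Primary's inputs."""
    primary, secondary = Replica("Primary VM"), Replica("Secondary VM")
    if fault == "hardware":
        # Only the Primary's host fails; the Secondary runs on another host.
        primary.host_up = False
    elif fault == "software":
        # Assumption: identical execution means the same crash on both VMs.
        primary.service_up = False
        secondary.service_up = False
    return primary.available() or secondary.available()


def collective_survives(fault: str) -> bool:
    """PI Collective: members are independent of each other."""
    server_a, server_b = Replica("PI Server A"), Replica("PI Server B")
    if fault == "hardware":
        server_a.host_up = False
    elif fault == "software":
        # The crash stays on PI Server A; PI Server B is unaffected.
        server_a.service_up = False
    return server_a.available() or server_b.available()


if __name__ == "__main__":
    for fault in ("hardware", "software"):
        print(fault, lockstep_pair_survives(fault), collective_survives(fault))
```

In this simplified model both setups survive a hardware failure, but only the collective survives a service crash, which is the point I was trying to make.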
Hope this helps you!
@Vincent: To add to Marcos's comments, the solution you present only addresses fault tolerance protection; it doesn't offer high availability or load balancing. That said, you are right that batch-related databases aren't replicated in a PI Collective, which may make a collective less attractive for you.
Have you thought of combining the two solutions to get the best of both?
I took a look at the requirements for enabling Fault Tolerance in VMware vSphere, and I suspect that some of them (only 1 vCPU supported, no physical raw disk mapping) might be problematic for a PI Data Archive server. IMHO, those would be potential drawbacks of this solution. I have mostly heard of the vMotion capability being used to support high availability with this kind of architecture, with relatively good success.
Thanks for the feedback, Marcos and Mathieu. I was not aware of the limitation of being able to use only 1 vCPU with VMs on which Fault Tolerance is enabled (I subsequently found a VMware article on more limitations - see here if you are interested). Also, the potential for an application crash on one VM to be mirrored on the second is a good point. If we did decide to pursue Fault Tolerance further, I would suggest testing this.
We will discuss further internally and decide accordingly. If you have any other thoughts, please let me know. In the meantime, thanks again for the feedback.