Yes, the PI ACE failover will happen automatically, but with the setup you have, you will get "Calc Failed" statuses for a certain time. This is because the PI Server resides on the same machine, and it only supports cold failover (a complete stop and restart to fail over).
Good news: you shouldn't have to code anything yourself, so you can leave the C/C++ failover cluster API alone.
you will get "Calc Failed" statuses for a certain time
You may get those, depending on how your calculations are scheduled
As a side note, here is the procedure to install and run the PI ACE Scheduler on a Microsoft Cluster (this comes directly from the "Running ACE on a Cluster" article in the PI ACE User Guide, available on the vCampus Library):
- Install the ACE Scheduler on a shared disk on both cluster nodes as follows: Install ACE on the current node; fail over to the other node and install ACE again.
- The following step is only necessary if you run the ACE Scheduler on the same PI Server where the ACE structure information is stored: Add the ACE Scheduler to the cluster group for PI and make it depend on pibasess. Note that this dependence only applies to failover and differs from the service dependence; do not make the ACE Scheduler service dependent on the pibasess service. One potential issue with this resource dependence is that when pibasess is shut down, the ACE Scheduler is shut down too, but it is not restarted automatically when pibasess restarts. This can happen, for example, if a backup script shuts down pibasess; in that case, make sure to start the ACE Scheduler afterwards.
- Register all the ACE Executables on both nodes.
- Make sure the target PI Server to start calculations on both nodes is identical.
Note 1 (from documentation): Never manage clustered resources (starting, stopping, setting dependencies, and so on) from the Windows Services applet. Perform all of these actions via the Cluster Administrator once the services are configured as cluster resources. Not following this rule can inadvertently trigger the clustered resources to fail and potentially cause an unnecessary cluster group failover.
Note 2 (from Steve): Step #3 is only necessary if you are using PI ACE 1.x in VB6, where compiled PI ACE calculations take the form of .exe files.
I try to avoid VB6, but not entirely successfully. I don't have any PI ACE 1.x in VB6 anymore, but I was just working on upgrading a VB6 app for possible inclusion in a Windows service or in PI ACE 2.x.
I'm interested in your comment, "You may get those, depending on how your calculations are scheduled". Are event-scheduled ACE applications and ACE recalculations the only ACE executables that are likely to get "Calc Failed" outputs?
Thanks Michael. I wonder how much trouble it would be to just suppress output from ACE, if there would be a "Calc Failed" message on startup. Hmm, on the other hand, I wouldn't want the "Calc Failed" message to be suppressed at other times. Then I may have no way of knowing whether the calculation was attempted or not.
Just out of curiosity, since you have two physical servers already why are you using clustering and not the high availability (HA) architecture supported by the PI and ACE servers?
A couple of observations on this:
First, I'm not sure if the manual/help are completely clear, but if ACE is run on a clustered PI server, the ACE scheduler services must be part of the cluster group containing the PI services.
Second, we actively discourage customers from running ACE on a clustered PI server. Any failover of the PI cluster group will also require a cold failover of the ACE calculations. If any of you have large ACE installations, you know that restarting ACE can take a significant amount of time. In addition, it is likely that interface nodes will reconnect and dequeue buffered data long before the ACE calculations are restarted, so many calculations will be missed.
Running ACE on a dedicated server (or better yet, running redundant ACE schedulers on separate nodes) is a better way to go. And it's much less expensive and easier to manage than a PI+ACE cluster.
I wonder how to best manage code using redundant ACE schedulers on separate nodes. What if there are some calculations that should only run once, say event driven calculations? Is there a way to coordinate the actions of separate ACE schedulers so that the calculation doesn't re-run on the second scheduler after the first has started running?
I'm not sure I understand. In the first paragraph, it says that if ACE is run on a clustered PI server, the ACE scheduler services must be part of the cluster group containing the PI services. Then in the last paragraph, it says that running ACE on a dedicated server (or better running redundant ACE schedulers on separate nodes) is a better way to go.
Here's what I don't fully understand. If we put redundant ACE schedulers on separate nodes, then can we use these ACE schedulers against a clustered PI server that is remote to the ACE scheduler? Or can we just not use ACE against a clustered PI server?
And would each redundant ACE scheduler node need to include its own Visual Studio license, and have duplicate copies of each executable and module code residing on the same box as the scheduler?
If you install ACE on the clustered PI servers, then the ACE scheduler needs to be in the same cluster resource group as all of the PI server services. This is largely because of the common dependencies on PI Network Manager. You don't have the option of installing ACE on the same machines but not set up as part of the cluster, because of this common dependence. Likewise, you can't have ACE as its own cluster resource.
If ACE is installed on its own servers, then it and PI become completely independent. You can run one ACE scheduler, or an HA configuration of multiple ACE schedulers. In either case they'll all view the PI cluster as simply the PI server. When a cluster failover occurs, PI is stopped on one node and then started on the other, so there is essentially a PI outage during cluster failover. ACE will behave just as it would if you had a single remote PI server that got restarted for some reason.
When you have an HA ACE configuration, you need to make sure that the calculation files (DLLs) are distributed to all of the scheduler nodes. There isn't any functionality in ACE to do this for you, and ACE won't check to make sure that they're all the same version. This is another good reason to run our HA configuration for ACE. It allows you to update the code on one server, then failover to that server while you update the other one. A cluster won't do this without the equivalent of an ACE restart (outage).
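Since ACE won't distribute or version-check the calculation DLLs for you, a small synchronization step has to happen after each build. Here is a minimal, hypothetical sketch of that idea in Python (the directory layout and node list are assumptions; in practice a tool like robocopy or a deployment script would serve the same purpose):

```python
# Sketch: push calculation DLLs from a build directory to each
# scheduler node's directory, skipping files already identical.
# Paths and node list are illustrative assumptions, not ACE features.

import filecmp
import shutil
from pathlib import Path

def sync_dlls(source_dir, node_dirs):
    """Copy every *.dll from source_dir into each node directory.

    Returns the list of destination paths actually copied, so the
    caller can log which nodes were out of date.
    """
    copied = []
    for dll in sorted(Path(source_dir).glob("*.dll")):
        for node in node_dirs:
            dest = Path(node) / dll.name
            # Only copy when the file is missing or its content differs.
            if not dest.exists() or not filecmp.cmp(dll, dest, shallow=False):
                shutil.copy2(dll, dest)
                copied.append(str(dest))
    return copied
```

Running this against each scheduler node in turn fits the rolling-update pattern described above: update one server, fail over to it, then update the other.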
You need a Visual Studio license on any machine where you are developing ACE calculations. This can be an ACE server or it can be another machine (such as your personal workstation). There is not a VS license associated with the ACE runtime engine.
A question made to PI tech support a few months back said that HA does not support ACE. So, we did not consider this option. Has this changed? I wonder if HA supports redundant PI servers that can be accessed via ACE, but not redundant ACE calculations directly. Did I misunderstand the HA/ACE issues?
Well, I can't speak to the conversation you had with tech support. The answer they gave you is very likely correct given the information they had at the time (like for instance, if ACE 1.x was part of the conversation).
Regardless, ACE 2 does support an HA configuration. Essentially, you run multiple copies of the scheduler on different machines simultaneously. The ACE schedulers keep track of who's alive and who's not, and ensure that calculations are running on one and only one server at a time. ACE 2 is also able to read data from and write data back to PI collectives as well, so the full system of servers can be fully redundant without the use of a cluster.
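The "one and only one server at a time" behavior can be illustrated conceptually. The sketch below is not ACE's actual implementation, just a common way such coordination works: each scheduler publishes timestamped heartbeats, and every node independently applies the same deterministic rule to decide which live node is active.

```python
# Conceptual sketch only (not ACE internals): deciding which of
# several redundant schedulers should be running calculations,
# based on heartbeat freshness and a deterministic tie-break.

import time

HEARTBEAT_TIMEOUT = 10.0  # seconds; an assumed value for illustration

def active_node(heartbeats, now=None):
    """Return the node name that should run calculations, or None.

    heartbeats maps node name -> last heartbeat timestamp. Nodes whose
    heartbeat is older than HEARTBEAT_TIMEOUT are considered dead; of
    the live nodes, the lexicographically first wins, so every node
    computes the same answer without extra communication.
    """
    now = time.time() if now is None else now
    live = [name for name, t in heartbeats.items()
            if now - t <= HEARTBEAT_TIMEOUT]
    return min(live) if live else None
```

With two live schedulers, both agree on the same active node; when that node's heartbeat goes stale, the other takes over, which is the failover behavior described above.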
This is the architecture that we generally recommend. Again, I don't claim to know all the details of your situation, and so I'm very, very hesitant to suggest that you do something other than what tech support recommended to you. However, if you are running solely ACE 2.x calculations, I would definitely suggest that you grab the latest ACE manual and familiarize yourself with the HA architecture.
Same goes for the PI servers. Speaking generally, PI collectives are superior in almost every way to MS clusters. They are a tailor-made solution for the way PI works, as opposed to clustering, which has to function in an almost complete vacuum of knowledge about the application it's handling.
I went back to the tech support reply and found the following information from April 2009. In our case, specific SDK calls, and not just the traditional Input Tag/Alias and Output Tag/Alias calls, are being made. The SDK call issue was the reason I steered away from HA for our ACE applications. I think I will ask another support question to clarify whether all SDK calls other than the ACE functions are unsupported in HA/ACE environments, or if there is a list of certain SDK calls that are not supported by HA.
Here was the tech support response regarding HA and ACE.
ACE does work with HA with the following requirements:
- ACE scheduler needs to run on a box other than a PI server.
- Need to install the buffer subsystem (PIBufss) on the box with the ACE scheduler. This provides n-way buffering.
ACE writes to PIBufss which, in turn, writes to all members of the collective. If you are making specific SDK calls, this will not work.
If you are using ACE in the traditional Input Tag/Alias and Output Tag/Alias, it will work in an HA environment.
Ah Ha! As I suspected, the TS guys are right (as usual).
Now as for where that leaves you: the SDK is scheduled to support n-way buffering in a release planned for Q1 2010. At that point, you could "have your PI and eat it too" [sorry, occupational hazard].
Until then, I'd suggest that you start a new vCampus thread with the details of why you needed to make direct SDK writes to PI from ACE. Perhaps there's a way with the built-in functionality to accomplish the same thing, which would allow you to run the HA ACE configuration now.
I already wrote a question to tech support about the reasons I needed SDK calls. Primarily because I am writing to structured annotations, which previously were not supported under ACE. I asked if these are supported yet. Also, because I am looping through more than 2700 tags, which could require a module database tagstream (using up a good chunk of our counted point license) for each output alias. Instead, I just have a consistent tag naming strategy that can be formed in a for/next loop, and I used SDK calls to point to each tag during loop operations via pisdk.pipoint("pointname") functions in the SDK.
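The naming-strategy loop described here can be sketched as follows. This is a hypothetical illustration: the "Unit01.Flow" pattern is made up, and the PI SDK resolution call appears only as a comment, since it requires the PI SDK COM library.

```python
# Sketch of forming tag names in a loop instead of configuring a
# separate Output Alias per tag. The naming pattern and count are
# illustrative assumptions; substitute the site's real convention.

def tag_names(prefix="Unit", suffix=".Flow", count=2700):
    """Yield tag names following a predictable numbering scheme,
    so each name can be generated inside a for loop."""
    for i in range(1, count + 1):
        yield f"{prefix}{i:02d}{suffix}"

# Inside the ACE calculation, each generated name would then be
# resolved with an SDK call such as pisdk.pipoint("pointname"),
# as described above, rather than via Module Database aliases.
names = list(tag_names(count=3))
# names == ["Unit01.Flow", "Unit02.Flow", "Unit03.Flow"]
```

The trade-off, as noted, is that direct SDK reads and writes bypass the buffering path that makes the HA configuration work.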
I'll wait for tech support's answer before starting another post. I like your suggestion, but I asked the tech support question before I read the suggestion in your post.