3 Replies Latest reply on Jul 28, 2010 1:43 AM by nagarwal

    Calculations failing intermittently

    SteveOD

      Hi,


      Our ACE Server is running 11 exes, 35 modules and 560 contexts on an 8 CPU server.


      The majority of calcs are scheduled every minute, with offsets spread throughout the minute.


      I'm recently seeing a significant number of short-lived "Calc Failed" status values. These values last only one minute, and on the next cycle the calculations produce valid results. The occurrences vary significantly, from 3 to 30 minutes apart.


      My thought is that it is an overload problem: if the calculation doesn't write a value out within the right timeframe, "Calc Failed" is written as the output value.


      The two problem exes have 232 and 67 contexts, respectively.


      My question is: what is the best way to reduce the overloading? I could create one or more identical executables and spread the contexts over them so that no single executable times out.


      Are there any tuning parameters that may be useful, and if so, what values would help?


      Any input is welcome.


      Cheers.

        • Re: Calculations failing intermittently
          SteveOD

          A second thought is to create additional modules within the big exe. Would each module then run as a separate thread on a different CPU?

            • Re: Calculations failing intermittently
              nagarwal

              Hi Steve,


              I don't expect creating additional ACE modules under the same ACE executable to provide any benefit. Scheduled calculations for all contexts within an executable (which may come from different ACE modules) are simply queued on the .NET ThreadPool. From there, it's up to the system to schedule these threads so as to take full advantage of the underlying hardware (multi-core or multi-processor).
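
To make the point concrete, here is a minimal VB.NET sketch of the general pattern described above: every context's calculation in one host process is queued on that process's single shared ThreadPool, so grouping contexts into extra modules adds no scheduling capacity. This is not ACE's actual scheduler code; RunCalculation, RunCycle, the context names and the 10 ms of simulated work are all hypothetical.

```vbnet
Imports System
Imports System.Collections.Concurrent
Imports System.Collections.Generic
Imports System.Threading

' Hypothetical sketch, not ACE internals: all contexts in one process share
' the same .NET ThreadPool, no matter which module they belong to.
Module ThreadPoolSketch

    ' Stand-in for one context's scheduled calculation (hypothetical).
    Private Sub RunCalculation(ByVal context As String, ByVal results As ConcurrentBag(Of String))
        Thread.Sleep(10) ' simulate 10 ms of calculation work
        results.Add(context)
    End Sub

    ' Queues one work item per context on the shared ThreadPool and
    ' blocks until every queued calculation has finished.
    Public Function RunCycle(ByVal contexts As IList(Of String)) As ConcurrentBag(Of String)
        Dim results As New ConcurrentBag(Of String)()
        Using done As New CountdownEvent(contexts.Count)
            For Each ctx As String In contexts
                Dim current As String = ctx ' per-iteration copy captured by the lambda
                ThreadPool.QueueUserWorkItem(
                    Sub(state As Object)
                        Try
                            RunCalculation(current, results)
                        Finally
                            done.Signal() ' count this work item as finished
                        End Try
                    End Sub)
            Next
            done.Wait()
        End Using
        Return results
    End Function

    Sub Main()
        ' 232 contexts, matching the size of the larger problem executable.
        Dim contexts As New List(Of String)()
        For i As Integer = 1 To 232
            contexts.Add("Context" & i)
        Next
        Dim results As ConcurrentBag(Of String) = RunCycle(contexts)
        Console.WriteLine("Completed " & results.Count & " calculations")
    End Sub

End Module
```

Which CPU each ThreadPool thread runs on is decided by the .NET runtime and Windows, not by how the contexts are grouped into modules.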


              Splitting the contexts across multiple executables may help, in the sense that each host process would use less memory and would have its own ThreadPool, which increases concurrency to a certain extent. However, increased concurrency only helps if you were not already reaching very high CPU utilization with the single executable.
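
If you do split the contexts, one simple way to plan the split is to deal them out round-robin so each executable gets a near-equal share. The VB.NET sketch below is only a planning aid under that assumption: the actual assignment of contexts to executables is done in the ACE configuration, and Partition, the context names and the choice of three executables are hypothetical. Round-robin treats every context as equally expensive; if some calculations are much heavier than others, measured calculation time would be a better basis for balancing.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical planning helper: deals contexts round-robin into N groups,
' one group per executable, so group sizes differ by at most one.
Module ContextPartitionSketch

    Public Function Partition(ByVal contexts As IList(Of String), ByVal exeCount As Integer) As List(Of List(Of String))
        If exeCount < 1 Then Throw New ArgumentOutOfRangeException("exeCount")
        Dim groups As New List(Of List(Of String))()
        For i As Integer = 0 To exeCount - 1
            groups.Add(New List(Of String)())
        Next
        ' Context i goes to executable (i Mod exeCount).
        For i As Integer = 0 To contexts.Count - 1
            groups(i Mod exeCount).Add(contexts(i))
        Next
        Return groups
    End Function

    Sub Main()
        ' 232 contexts, as in the larger problem executable, split three ways.
        Dim contexts As New List(Of String)()
        For i As Integer = 1 To 232
            contexts.Add("Context" & i)
        Next
        Dim groups As List(Of List(Of String)) = Partition(contexts, 3)
        For g As Integer = 0 To groups.Count - 1
            Console.WriteLine("Executable " & (g + 1) & ": " & groups(g).Count & " contexts")
        Next
    End Sub

End Module
```

For 232 contexts over three executables this prints 78, 77 and 77 contexts respectively.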


              Finally, before you explore any of these alternatives, it is important to make sure that this really is a loading issue. For the most part, a "Calc Failed" event is written to the output because of an unhandled exception in your calculations. As Han pointed out, you should also monitor the ACE performance counters to get more insight into this.


              -Nitin

            • Re: Calculations failing intermittently
              hanyong

              Hi Steve,


              I guess I don't have all the answers that you are looking for. But it sure sounds like this is not going to be easy to troubleshoot.


              I'm not sure whether server overloading can cause "Calc Failed", but there are certain things you can do to prevent "Calc Failed" from being written. Some suggestions can be found in this other post. One thing to add: if an exception is thrown in the ACE calculation, "Calc Failed" can be written as well. So you could wrap the ACE calculation in a Try...Catch block like this:

              Public Overrides Sub ACECalculations()
                  ' [declarations]
                  Try
                      ' [your ACE code]
                  Catch ex As Exception
                      ' Tell ACE not to write a value to PI for this output, avoiding "Calc Failed".
                      ' OutputTag stands for your own output alias name.
                      OutputTag.SendDataToPI = False
                      ' Write the exception message to the PI message log.
                      PIACEBIFunctions.LogPIACEMessage(OSIsoft.PI.ACE.MessageLevel.mlErrors, ex.Message, MyBase.Name)
                  End Try
              End Sub


              To verify whether the ACE server is overloaded, we can do some performance monitoring. ACE exposes perfmon counters that we can look at, such as "Number of aborted calculations", "Number of skipped calculations" and "Time to complete calculations". You can also monitor some of the basic system counters, such as "% Processor Time". These should give us a better idea of what is happening on the ACE server.
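
As a starting point for that monitoring, the VB.NET sketch below samples a Windows performance counter through System.Diagnostics.PerformanceCounter, assuming a Windows host running the .NET Framework (on .NET Core and later this class comes from the System.Diagnostics.PerformanceCounter package). It reads the standard "% Processor Time" counter; the ACE counters named above can be read the same way, but their category name is not given here, so look it up in perfmon's Add Counters dialog rather than guessing. SampleCounter and its parameters are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.Threading

' Hypothetical helper: samples one perfmon counter at a fixed interval.
' Assumes a Windows host with access to performance counters.
Module CounterSampleSketch

    Public Function SampleCounter(ByVal category As String, ByVal counter As String,
                                  ByVal instance As String, ByVal samples As Integer,
                                  ByVal intervalMs As Integer) As List(Of Single)
        Dim values As New List(Of Single)()
        ' Last argument True opens the counter read-only.
        Using pc As New PerformanceCounter(category, counter, instance, True)
            ' Rate counters such as "% Processor Time" need two reads;
            ' this first call only establishes a baseline.
            pc.NextValue()
            For i As Integer = 1 To samples
                Thread.Sleep(intervalMs)
                values.Add(pc.NextValue())
            Next
        End Using
        Return values
    End Function

    Sub Main()
        ' Total CPU usage across all processors, sampled once per second.
        Dim cpu As List(Of Single) = SampleCounter("Processor", "% Processor Time", "_Total", 5, 1000)
        For Each v As Single In cpu
            Console.WriteLine("% Processor Time: " & v.ToString("F1"))
        Next
    End Sub

End Module
```

If CPU stays well below 100% at the moments "Calc Failed" values appear, that points away from overload and toward exceptions in the calculation code.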


              As for your questions about spreading the contexts over different exes, and whether modules in an exe run on separate threads, I'm afraid I do not have an answer.