4 Replies Latest reply on Oct 10, 2014 12:05 AM by skwan

    Multi-threaded performance overhead within AFSDK

    Rhys Kirk

      We are experiencing an issue with the AF SDK 2.5 & 2.6.

       

      It appears that as soon as more than one thread attempts a data access request, there is an overhead to manage the concurrent requests; the overhead appears to be constant regardless of the number of threads.

       

      For example, I want to call PlotValues for 5 Attributes using AFAttribute.Data.PlotValues. I can call those on a single thread in sequence and in general it performs extremely fast (for argument's sake, 10ms per call). If I take an alternative approach and spread the PlotValues calls across multiple threads, the response time per data request increases across all threads; in this case it can jump up to 150ms. There is no delay on the PI Server (I can see via the piarchss threads that there is no block, and multiple threads are serving the requests); the block appears to be coming from the AF SDK.

       

      I can interchange between 1 thread, and 2/10/20 threads, and 1 thread is always much faster to return from the PlotValues call. 
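
      The comparison described above (per-call latency on one thread versus several) could be measured with a small harness along these lines. This is only a sketch in Python, not AF SDK code: `fake_request` is a hypothetical stand-in for a data call such as PlotValues, and the function names are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(request, arg):
    """Return the wall-clock seconds a single request takes."""
    start = time.perf_counter()
    request(arg)
    return time.perf_counter() - start


def per_call_latencies(request, args, threads=1):
    """Time each request, either serially or spread over a thread pool."""
    if threads <= 1:
        return [timed_call(request, a) for a in args]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map keeps results in the same order as args
        return list(pool.map(lambda a: timed_call(request, a), args))


def fake_request(arg):
    # Hypothetical stand-in for a real data access call
    time.sleep(0.01)


if __name__ == "__main__":
    for n in (1, 2, 5):
        latencies = per_call_latencies(fake_request, range(5), threads=n)
        print(n, "thread(s): worst call", round(max(latencies) * 1000, 1), "ms")
```

      Comparing the per-call numbers (rather than the total elapsed time) is what exposes the kind of overhead described here, since a fixed delay per request can hide behind the parallel speed-up.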

       

      I have a TechSupport ticket open, so you can contact me via that or post the cause on here. TechSupport suggested I post the issue here too.

        • Re: Multi-threaded performance overhead within AFSDK
          Rick Davin

          Only got questions for you.  One, are you using Parallel LINQ, System.Threading.Tasks.Parallel, or your own custom threading?  Two, when you say that a single thread is faster, are you issuing a bulk call for the 5 attributes (probably not in 2.5) or processing them serially?

           

          Some various vague and extremely general comments that may be of little truth or help to you ...

           

          Sorry but parallel processing for 2, 5, 10, and 20 items isn't strongly justified.  Creating and managing threads does carry a cost that may not be warranted by your application.

           

          I'm under the impression that a PI server can handle a maximum of 8 concurrent RPC calls (don't know if it's true, 8 max for everyone, or 8 max per user).  So when I use System.Threading.Tasks.Parallel, I also specify the ParallelOptions.MaxDegreeOfParallelism to be around 3 - 4 threads depending upon what I am doing.

           

          Retrieving PlotValues shouldn't be too big of a burden per attribute.  Sure, one attribute may only have 2 values, but even an attribute with a large number of values should return a much smaller subset using PlotValues.  Therefore, one can almost, sort of, kind of consider the processing requirements per attribute to be relatively consistent.

           

          That said, if you ever want to process a lot of things with a count in the 100's or 1000's in parallel, and the processing of each individual thing is fairly consistent with any other thing in your list, then straight-up parallel processing on each individual thing can be quite inefficient, since each thing carries the overhead of creating and managing a thread.  If you feel that is your case, I suggest you look into System.Collections.Concurrent.Partitioner.  It divides your collection of attributes into partitions or ranges.  You then would parallel process a range for the outer loop but serially process each attribute in an inner loop.  This is more efficient because maybe only 4 threads will ever be created, instead of say 1000.

           

          Again that depends on the application.  If you are fetching full archives and not just PlotValues, then this may not be a good fit.  I have instances where one tag may have only 10 values over a given time period and the very next tag in my list has 500,000.  This would be a poor fit for partitioning.
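
          The range-partitioning idea described in this reply can be sketched as follows. This is an illustration in Python rather than the .NET Partitioner class itself: `make_ranges`, `process_range`, `fetch_all` and `fetch_plot_values` are hypothetical names, and `fetch_plot_values` merely stands in for a per-attribute data call.

```python
from concurrent.futures import ThreadPoolExecutor


def make_ranges(count, partitions):
    """Split range(count) into at most `partitions` contiguous (start, stop) ranges."""
    if count <= 0 or partitions <= 0:
        return []
    size, extra = divmod(count, partitions)
    ranges, start = [], 0
    for i in range(partitions):
        # The first `extra` ranges take one additional item each
        stop = start + size + (1 if i < extra else 0)
        if stop > start:
            ranges.append((start, stop))
        start = stop
    return ranges


def fetch_plot_values(attribute):
    # Hypothetical stand-in for a real per-attribute data call
    return f"plot:{attribute}"


def process_range(attributes, start, stop):
    # Inner loop: each worker handles its own range serially
    return [fetch_plot_values(a) for a in attributes[start:stop]]


def fetch_all(attributes, max_workers=4):
    """Outer loop: one task per range, so at most `max_workers` tasks exist."""
    ranges = make_ranges(len(attributes), max_workers)
    if not ranges:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        chunks = pool.map(lambda r: process_range(attributes, *r), ranges)
        return [value for chunk in chunks for value in chunk]
```

          With 1000 attributes and `max_workers=4`, only four tasks are scheduled, each walking 250 attributes in order. As noted above, this works well only when the cost per attribute is roughly uniform; a range that happens to contain the expensive items will hold up the whole batch.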

            • Re: Multi-threaded performance overhead within AFSDK
              Roger Palmen

              Dunno if this will add anything to the solution, but why use threads when you obviously don't need them?  But the real question: have you been able to find a break-even point when you scale this up where the multi-threading is faster?

               

              Really like the suggestion of Rick of using the Partitioner class! 

                • Re: Multi-threaded performance overhead within AFSDK
                  Rhys Kirk

                  Rick, thanks for the questions. I've been chained to my desk in a dark room in an attempt to solve this.

                   

                  The programming was sound, with some areas for ongoing optimization, but it should have functioned without such delays. In fact, as it turned out, in some environments it did function as expected, but in others (the important, production ones) the horrendous latency was being introduced.

                   

                  As a side note, the subsystems in the PI Server can have a configurable number of threads to serve RPCs, the default is 8 for piarchss.

                   

                  I won't bore you with the long list of checks and tests we had to go through, but it ultimately came down to Nagle's algorithm and delayed ACK in the TCP stack. It just so happens that the particular data pattern being requested was just right to trigger the delayed ACK timer. The repetition of data requests from the multiple threads was also a trigger point, i.e. a single thread did not trigger the delays as data was not being requested repeatedly in parallel.

                   

                  In case anyone wants a read, this has the same fix that we ultimately ended up applying:

                   

                  blogs.technet.com/.../tcp-delayed-ack-combined-with-nagle-algorithm-can-badly-impact-communication-performance.aspx
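
                  For readers unfamiliar with the interaction: Nagle's algorithm holds back small writes until earlier data is acknowledged, while delayed ACK holds back the acknowledgement, so the two can stall each other for a timer interval. The thread does not spell out the exact fix that was applied. One common application-level mitigation, when you own the socket (which is not the case for connections opened inside the AF SDK), is to disable Nagle with the TCP_NODELAY option. A minimal sketch in Python, with hypothetical helper names:

```python
import socket


def disable_nagle(sock):
    """Turn on TCP_NODELAY so small writes are sent immediately."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nagle_disabled(sock):
    """Report whether TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    s = disable_nagle(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    print("Nagle disabled:", nagle_disabled(s))
    s.close()
```

                  Where the socket is owned by a library or a vendor SDK, the equivalent change usually has to be made through that product's configuration or at the operating system level, as the linked article discusses.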

                   

                  Have to mention my new best buddies from TechSupport and the Development team...Arnold, Eddy, Ryan and Alexander!