Rick Davin

Aggregating Event Frame Data Part 4 of 9 - Classical FindEventFrames

Blog Post created by Rick Davin on May 10, 2017

The Advanced AF SDK lab at UC SF 2017 was on this very topic.  The material in this 9-part series follows much of that lab which showcases AFEventFrameSearch methods new to PI AF SDK 2.9.

 

Blog Series: Aggregating Event Frame Data

Part 1 - Introduction

Part 2 - Let's Start at the End

Part 3 - Setting up the App

Part 4 - Classical FindEventFrames

Part 5 - Lightweight FindObjectFields

Part 6 - Summary per Model

Part 7 - GroupedSummary per Manufacturer

Part 8 - Compound AFSummaryRequest

Part 9 - Conclusion

 

The Classical Full Load Approach

We are going to use the AFEventFrameSearch.FindEventFrames method, which was about the only option available to us in AF SDK 2.8.  In Part 3 we talked about the filter query we will be using.  We essentially want a two-level summary: the first level is by Manufacturer and the second level is by Model.  Take some time to think about how you would have done this.  Many of you should already have firm ideas on how you would do it, and perhaps many of you have done something like this before.
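To make the shape of a two-level summary concrete, here is a minimal sketch of a nested dictionary keyed first by manufacturer and then by model.  The TwoLevelSummary class and its members are hypothetical names invented for illustration; this is not the StatsTracker class used in this series, and it assumes durations are plain TimeSpan values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; NOT the StatsTracker class from this series.
public sealed class TwoLevelSummary
{
    // Running totals for one (manufacturer, model) pair.
    public sealed class Stats
    {
        public int Count { get; private set; }
        public TimeSpan TotalDuration { get; private set; }

        public void Add(TimeSpan duration)
        {
            Count++;
            TotalDuration += duration;
        }
    }

    // Outer key: manufacturer (level 1). Inner key: model (level 2).
    private readonly Dictionary<string, Dictionary<string, Stats>> _byMfr =
        new Dictionary<string, Dictionary<string, Stats>>();

    public void Add(string manufacturer, string model, TimeSpan duration)
    {
        Dictionary<string, Stats> byModel;
        if (!_byMfr.TryGetValue(manufacturer, out byModel))
        {
            byModel = new Dictionary<string, Stats>();
            _byMfr[manufacturer] = byModel;
        }

        Stats stats;
        if (!byModel.TryGetValue(model, out stats))
        {
            stats = new Stats();
            byModel[model] = stats;
        }

        stats.Add(duration);
    }

    // Second-level lookup: totals for one model of one manufacturer.
    public Stats GetModelStats(string manufacturer, string model)
    {
        return _byMfr[manufacturer][model];
    }

    // First-level rollup: event count across all models of one manufacturer.
    public int GetManufacturerCount(string manufacturer)
    {
        var total = 0;
        foreach (var stats in _byMfr[manufacturer].Values)
            total += stats.Count;
        return total;
    }
}
```

Whatever structure you choose, the key point is that each event frame contributes one count and one duration to exactly one model bucket, and the manufacturer level is a rollup of its models.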

 

My approach will be to call FindEventFrames using fullLoad: true because I do need to reference some attributes.  As explained at the bottom of Part 2, this is rather heavyweight since it will bring back a lot of stuff that I'm not interested in for the specific task at hand.  Despite this heaviness there is one crucial thing the full load doesn't bring back: attribute data!  That means I have to compose a second set of calls to fetch the data, which means additional trips to the server.  An experienced developer would know that it's inefficient to call GetValue() one attribute at a time, so we will implement some custom chunking to process in bulk and minimize the number of trips.

 

For those who attended the UC 2017 lab, I am going to do something a bit different.  In the lab, I would initialize a StatsTracker instance in the method below, and the method would return StatsTracker.  Here I initialize StatsTracker shortly after I set my database and template objects, but before I record the metrics.  The initialization makes an RPC call to fetch the AFTable, which is the same for all 5 apps, so I don't want to measure it in each of them.  The method below differs from the lab in that it takes the StatsTracker as an input argument and now returns void.

 

public void GetSummaryByMfrAndModel(StatsTracker summary, AFDatabase database, IList<AFSearchToken> tokens)
{
    const int chunkSize = 5000;

    //Starting with AF 2.9, AFSearch implements IDisposable
    using (var search = new AFEventFrameSearch(database, "FindEventFrames Example", tokens))
    {
        //Opt-in to server side caching
        search.CacheTimeout = TimeSpan.FromMinutes(10);

        var frames = search.FindEventFrames(fullLoad: true, pageSize: 10000);

        var chunk = new List<AFEventFrame>();

        foreach (var frame in frames)
        {
            chunk.Add(frame);

            //Process in bulk calls for a given chunk
            if (chunk.Count >= chunkSize)
            {
                ProcessChunk(chunk, summary);
                chunk = new List<AFEventFrame>();
            }
        }

        //Process last chunk (if any)
        if (chunk.Count > 0)
            ProcessChunk(chunk, summary);
    }
}

private void ProcessChunk(IList<AFEventFrame> chunk, StatsTracker summary)
{
    var attributes = new AFAttributeList();

    //First pass over each event frame in this chunk to gather the attributes
    foreach (var frame in chunk)
    {
        attributes.Add(frame.Attributes["|Manufacturer"]);
        attributes.Add(frame.Attributes["|Model"]);
    }

    //Secondly issue a bulk GetValue call on those attributes, but I need a dictionary for faster lookups
    var values = attributes.GetValue().ToDictionary(pv => pv.Attribute);

    //Finally pass over each event frame one last time to update summary using fetched values.
    foreach (var frame in chunk)
    {
        //Read data from current event frame
        var mfr = values[frame.Attributes["|Manufacturer"]].Value.ToString();
        var model = values[frame.Attributes["|Model"]].Value.ToString();

        summary.AddToSummary(mfr, model, frame.Duration, 1);
    }
}
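The buffering logic inside GetSummaryByMfrAndModel is generic enough that it could be factored out.  Below is a minimal sketch of one way to do that with an iterator; the ChunkingExtensions class and the InChunksOf method are hypothetical names of my own and are not part of AF SDK.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; NOT part of AF SDK.
public static class ChunkingExtensions
{
    // Lazily splits a sequence into lists of at most chunkSize items.
    // The final list may be shorter; an empty source yields no lists.
    public static IEnumerable<IList<T>> InChunksOf<T>(this IEnumerable<T> source, int chunkSize)
    {
        // Validate eagerly so bad arguments fail at the call site,
        // not on the first MoveNext of the iterator.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        return Iterate(source, chunkSize);
    }

    private static IEnumerable<IList<T>> Iterate<T>(IEnumerable<T> source, int chunkSize)
    {
        var chunk = new List<T>(chunkSize);
        foreach (var item in source)
        {
            chunk.Add(item);
            if (chunk.Count >= chunkSize)
            {
                yield return chunk;
                // Start a fresh list so the caller may keep the previous one.
                chunk = new List<T>(chunkSize);
            }
        }

        // Emit the last partial chunk (if any).
        if (chunk.Count > 0)
            yield return chunk;
    }
}
```

With such a helper, the body of the using block could shrink to a single loop along the lines of foreach (var chunk in frames.InChunksOf(chunkSize)) ProcessChunk(chunk, summary);.  The behavior is the same as the explicit loop above: only one chunk is buffered at a time, and the source is still enumerated page by page.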

 

You will note from the line that sets search.CacheTimeout that I opt in to server-side caching, as I do for the other 4 apps.  The difference is that here, where I know there is a performance penalty due to the heaviness of the objects, I use a timeout of 10 minutes.  Since the other 4 apps in the series will be much, much faster, I will use a timeout of 5 minutes for them.

 

About pageSize

The default value for pageSize is 1000.  I chose 10000 here.  Why?  Is it better?  Do I have inside information that it's better?  NO.  I did this because life is too short.  While developing the lab, and testing frequently with each new beta build, I ran this over 1000 times.  Early on, when I looked at a 3-day period, the code above took 10 minutes to run.  It would be ridiculous to have a lab exercise take 10 minutes just to execute.  So I trimmed my filter to 1 day, and the code then took 4-5 minutes to run using the default pageSize.  A pageSize of 5000 took 3-and-a-half minutes to run, but used more memory.  I settled on a pageSize of 10000, which used even more memory, but took about 2-and-a-half minutes to run, which is about as long as I would want a lab exercise to take.

 

There was a side benefit: because pageSize: 10000 used more memory, it really helped to show off the memory savings of the new methods.  But that was just a side benefit.  It really came down to not wanting to wait 10 minutes, a couple of times a day, for this to finish.

 

About Producing Metrics

I have shown you the results of metrics, but I haven't shown you how I produce them.  That is covered in a separate blog post, which is not a part of this 9-part series.  It stands on its own since metrics tracking is a topic completely independent of event frame searching or aggregation of data values.

 

Metrics Comparison (from Part 2)

The numbers below are from a 2-core VM using Release x64 mode.  Smaller values are better.  Be aware that the unit of measure sometimes differs between MB and KB; each network value is labeled with its unit.

 

Resource Usage:

Values displayed are in MB unless noted otherwise

Method              Total GC Memory (MB)    Working Set Memory (MB)    Network Bytes Sent    Network Bytes Received
FindEventFrames     145.48                  257.08                     9.13 MB               190.08 MB
FindObjectFields    1.28                    65.55                      5.00 KB               3.68 MB
Summary             2.54                    55.35                      8.58 KB               261.81 KB
GroupedSummary      9.86                    64.28                      6.24 KB               1.98 MB
AFSummaryRequest    7.29                    65.36                      5.00 KB               3.68 MB

 

Performance:

Method              Client RPC Calls    Client Duration (ms)    Server RPC Calls    Server Duration (ms)    Elapsed Time
FindEventFrames     120                 63337.0                 110                 39118.1                 02:27.8
FindObjectFields    10                  5360.8                  11                  4547.6                  00:06.0
Summary             15                  9484.6                  16                  9310.9                  00:10.1
GroupedSummary      12                  5527.2                  13                  4938.5                  00:06.2
AFSummaryRequest    10                  2992.2                  10                  2222.2                  00:03.7

 

 

Next up: Finally Something New

This was just one possible way of producing aggregates, or specifically a two-level aggregation.  There are many different ways this could have been done, even with FindEventFrames.  How would you have done it?  Would you have done something mostly similar but differing in some details?  Or do you have a totally different approach?

 

Anyway, we have established our baseline for performance metrics.  We are now ready to venture into brand new territory with Part 5 where we see how to use the new FindObjectFields method.
