Aggregating Event Frame Data Part 6 of 9 - Summary per Model

Blog Post created by rdavin Employee on May 11, 2017

The Advanced AF SDK lab at UC SF 2017 was on this very topic.  The material in this 9-part series follows much of that lab which showcases AFEventFrameSearch methods new to PI AF SDK 2.9.


Blog Series: Aggregating Event Frame Data

Part 1 - Introduction

Part 2 - Let's Start at the End

Part 3 - Setting up the App

Part 4 - Classical FindEventFrames

Part 5 - Lightweight FindObjectFields

Part 6 - Summary per Model

Part 7 - GroupedSummary per Manufacturer

Part 8 - Compound AFSummaryRequest

Part 9 - Conclusion


A Bona Fide Aggregation Method

We've covered 2 different ways to produce our summaries but neither of those approaches used a true aggregation method.  Instead they both returned detailed rows where we had to apply our own custom aggregation.  In the case of FindEventFrames, the detail rows were heavyweight event frames.  In the case of FindObjectFields, the detail rows were data container records.  With this brand-new AFEventFrameSearch.Summary, we will be getting back an aggregation, and you will note that what is sent across the network to us (as recorded in Network Bytes Received) is only a teeny tiny bit of memory.


Summary requires a priori knowledge of what you want to be summarizing.  In our case, we want to summarize by Manufacturer and Model so we must know all the Manufacturers and Models we wish to summarize before we can actually summarize them.  This was discussed in Part 3.  I chose to read an AFTable and populate a model-keyed dictionary inside a manufacturer-keyed dictionary.  You are in no way restricted to do the same.  You are encouraged to find the solution that best fits your own database and needs, and I welcome you sharing your creative solutions back on PISquare one day.
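To make the dictionary-in-a-dictionary idea concrete, here is a minimal sketch that does not touch AF SDK at all.  The ModelStats class, the LookupSketch helper, and the manufacturer names are hypothetical stand-ins (the real StatsTracker and its AFTable loading are covered in Part 3); only the "Nimbus 2000" model name comes from our data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical placeholder for whatever statistics you track per model.
public class ModelStats
{
    public int Count { get; set; }
    public TimeSpan TotalDuration { get; set; }
}

public static class LookupSketch
{
    // Builds manufacturer -> (model -> stats) from rows shaped as { manufacturer, model }.
    // In the real app the rows would come from an AFTable; here they are plain strings.
    public static Dictionary<string, Dictionary<string, ModelStats>> Build(IEnumerable<string[]> rows)
    {
        var lookup = new Dictionary<string, Dictionary<string, ModelStats>>();
        foreach (var row in rows)
        {
            Dictionary<string, ModelStats> models;
            if (!lookup.TryGetValue(row[0], out models))
            {
                // First time we see this manufacturer: create its inner dictionary.
                models = new Dictionary<string, ModelStats>();
                lookup[row[0]] = models;
            }
            // Register the model with empty statistics, to be filled in later.
            models[row[1]] = new ModelStats();
        }
        return lookup;
    }
}
```

Calling LookupSketch.Build with the rows { "Acme", "Nimbus 2000" }, { "Acme", "Zephyr" } and { "Contoso", "Zephyr" } gives 2 manufacturers, and the "Zephyr" model is tracked separately under each of them.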


You may also remember that in Part 2 the Summary method seemed to be the slowest of the new methods.  It really isn't.  The problem is I am trying to have all these new methods produce the exact same output, so making multiple calls on Models within Manufacturers is really not the best use case of Summary.  On the other hand, it is a very good example of syntax on how to issue a Summary call, as well as what to do with the results that come back from that call.  Let's focus on that as the main lesson to be learned in the code below.


The Highlights

My a priori requirement is taken care of by my dictionary in a dictionary.  However, I will need to get an independent list of the keys to the dictionaries.


I will also need to issue a Summary per Model.  This means I must use the same base tokens or query that I used for our previous examples, and modify them for each Summary call.  Again, I could take the lazy or sloppy approach and only worry about Model since my current data set had 3 unique models.  But that code could break in the future if I were ever to add a Model with the same name to a different Manufacturer.  Instead, I will take a rigorous approach and truly query by Manufacturer and then Model.


All of this is to say that I will be looping first over Manufacturers, and then secondly over the Models.  Then I will modify the tokens or query string inside the inner loop.  Because I will modify the input tokens/query repeatedly, I have renamed the input argument from "tokens" to be "baseTokens".
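The looping pattern can be sketched without AF SDK.  In this hypothetical LoopSketch helper, plain strings stand in for AFSearchToken objects and an int stands in for the per-model statistics.  It is only meant to show the two snapshots of keys and the fresh copy of the base filters taken on every pass.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LoopSketch
{
    // Returns one filter list per (manufacturer, model) pair without ever mutating baseFilters.
    public static List<List<string>> BuildFilterSets(
        IDictionary<string, Dictionary<string, int>> lookup,
        IList<string> baseFilters)
    {
        var result = new List<List<string>>();

        // ToList() snapshots the keys, so the loops do not enumerate the live dictionaries.
        foreach (var mfr in lookup.Keys.ToList())
        {
            foreach (var model in lookup[mfr].Keys.ToList())
            {
                // Fresh copy per iteration: additions never leak into the next pass.
                var filters = baseFilters.ToList();
                filters.Add("|Manufacturer:'" + mfr + "'");
                filters.Add("|Model:'" + model + "'");
                result.Add(filters);

                // Updating the dictionary is safe here because we iterate the snapshot.
                lookup[mfr][model] = filters.Count;
            }
        }
        return result;
    }
}
```

With one base filter and a manufacturer that has two models, this returns two lists of three filters each, and the base list still holds its single original entry afterwards.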


The final steps will be to receive the results from Summary, and unwrap them to conform to my DurationStats and StatsTracker objects discussed in Part 3.


public void GetSummaryByMfrAndModel(StatsTracker summary, AFDatabase database, IList<AFSearchToken> baseTokens)
{
    //Absolutely critical to have a priori list of Manufacturers and Models
    //Get independent list of Manufacturers
    var mfrList = summary.Keys.ToList();

    foreach (var mfr in mfrList)
    {
        //Get independent list of Models for given Manufacturer
        var modelSubList = summary[mfr].Keys.ToList();

        foreach (var model in modelSubList)
        {
            //Safest Technique: via tokens.
            //Get independent copy to modify inside loop
            var tokens = baseTokens.ToList();
            tokens.Add(new AFSearchToken(AFSearchFilter.Value, mfr, "|Manufacturer"));
            tokens.Add(new AFSearchToken(AFSearchFilter.Value, model, "|Model"));

            //Starting with AF 2.9, AFSearch implements IDisposable
            using (var search = new AFEventFrameSearch(database, "Summary Example", tokens))
            {
                //Opt-in to server side caching
                search.CacheTimeout = TimeSpan.FromMinutes(5);

                var desiredSummaryTypes = AFSummaryTypes.Count | AFSummaryTypes.Total;

                var perModel = search.Summary("Duration", desiredSummaryTypes);

                var totalVal = perModel.SummaryResults[AFSummaryTypes.Total];
                var countVal = perModel.SummaryResults[AFSummaryTypes.Count];

                var stats = new DurationStats();

                //Unwrap the returned results as needed
                if (countVal.IsGood)
                {
                    stats.Count = countVal.ValueAsInt32();
                    if (totalVal.IsGood)
                    {
                        stats.TotalDuration = ((AFTimeSpan)totalVal.Value).ToTimeSpan();
                    }
                    summary.AddToSummary(mfr, model, stats.TotalDuration, stats.Count);
                }
            }
        }
    }
}



The above example uses query tokens.  I mentioned in Part 3 you could have used a query string.  If you wanted a string instead of tokens, I would have an input string argument named "baseQuery" containing:


$"AllDescendants:{allDescendants} Template:'{templateName}' Start:>={startTime.ToString("O")} End:<={endTime.ToString("O")} '{attrPath}':>={attrValue}"


Then inside the inner loop of Model, the three lines that copy baseTokens and add the Manufacturer and Model tokens would become:


var query = $"{baseQuery} |Manufacturer:'{mfr}' |Model:'{model}'";


Note the use of single quotes around {mfr} and {model}.  For model this is an absolute must have with our data, because we do have one model ("Nimbus 2000") that contains an embedded blank in its name.  For mfr, we did this for future proofing in case we ever add a Manufacturer with a blank in its name.  You may recall in Part 3 I cautioned that if it's a name or a path that the safest route is to wrap it in single quotes.  This helps make your code less fragile.
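As a rough illustration of how the two strings fit together, the sketch below assembles a base query and then appends the Manufacturer and Model filters.  QuerySketch and its parameter types are assumptions made for this example, as are the template name "Outage", the attribute path "|Wind Velocity" and the manufacturer "Acme".  It only builds strings and does not validate them against AF search syntax.

```csharp
using System;

public static class QuerySketch
{
    // Assembles the base query in the same shape as the interpolated string shown earlier.
    // The {value:O} format specifier is equivalent to calling ToString("O").
    public static string BuildBaseQuery(bool allDescendants, string templateName,
        DateTime startTime, DateTime endTime, string attrPath, int attrValue)
    {
        return $"AllDescendants:{allDescendants} Template:'{templateName}' " +
               $"Start:>={startTime:O} End:<={endTime:O} " +
               $"'{attrPath}':>={attrValue}";
    }

    // Appends the per-iteration filters; single quotes protect names with embedded blanks.
    public static string AppendMfrAndModel(string baseQuery, string mfr, string model)
    {
        return $"{baseQuery} |Manufacturer:'{mfr}' |Model:'{model}'";
    }
}
```

Because the base query is never modified, the same string can be reused for every Manufacturer and Model pair.  This sketch does nothing special for a name that itself contains a single quote, so treat that case as unhandled.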


Event versus Time Weighting

For the Summary overload we used in the code above, the result is event weighted.  Normally with data coming from a process historian, I tend to first think in terms of time weighted values.  But we're working with event frames here, so my inclination is that the values are event weighted; that is, there is one value associated with the entire event frame.  That's just me, though.  You may be interested in getting back a time weighted number, so you might ask "Is there a time weighted overload?"


The trick answer is No.  While it's true there is not a Summary overload that allows you to pass an AFCalculationBasis.TimeWeighted enumeration, there is an overload that accepts a general weighting field as the 3rd argument.  This means you aren't restricted to either event weightings or time weightings, but you may pass a custom weighting!  The restriction here is that you pass the name of the weighting field, and that field must belong to the event frame.  For a time weighted result, the name of the weighting field could be "Duration" or perhaps you have another time span attribute defined on your event frame.
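To see why the choice of weighting matters, here is a small in-memory sketch of the arithmetic.  It does not call AF SDK.  WeightingSketch is a hypothetical helper, and it assumes a weighted average follows the conventional definition of sum(weight * value) divided by sum(weight), which may not match every detail of what the server computes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WeightingSketch
{
    // Event weighted: every event frame counts equally.
    public static double EventWeightedAverage(IList<double> values)
    {
        return values.Sum() / values.Count;
    }

    // Custom weighted: each value counts in proportion to its weight,
    // for example the event frame's duration in hours.
    public static double WeightedAverage(IList<double> values, IList<double> weights)
    {
        double weightedSum = 0.0;
        for (int i = 0; i < values.Count; i++)
        {
            weightedSum += values[i] * weights[i];
        }
        return weightedSum / weights.Sum();
    }
}
```

Take two event frames with values 10 and 20, lasting 1 hour and 3 hours.  The event weighted average is 15, while weighting by duration gives (10*1 + 20*3) / 4 = 17.5, because the longer event frame pulls the result toward its value.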



Metrics Comparison (from Part 2)

The numbers below are from a 2-core VM using Release x64 Mode.  The smaller values are better.  Caution that the network columns mix MB and KB, so check the unit shown next to each value.


Resource Usage:

Values displayed are in MB unless noted otherwise

Method            Total GC Memory (MB)  Working Set Memory (MB)  Network Bytes Sent  Network Bytes Received
----------------  --------------------  -----------------------  ------------------  ----------------------
FindEventFrames                 145.48                   257.08             9.13 MB               190.08 MB
FindObjectFields                  1.28                    65.55             5.00 KB                 3.68 MB
Summary                           2.54                    55.35             8.58 KB               261.81 KB
GroupedSummary                    9.86                    64.28             6.24 KB                 1.98 MB
AFSummaryRequest                  7.29                    65.36             5.00 KB                 3.68 MB



Method  Client RPC Calls  Client Duration (ms)  Server RPC Calls  Server Duration (ms)  Elapsed Time



Caution again about CaptureValues()

The performance is realized because all of my event frames have captured values.  This means filtering by wind velocity, manufacturer, and model - all of which are attributes - is performed on the server.  That greatly reduces the network load.


I don't know whether you consider it a good thing or a bad thing that the code above also works if you have not captured values.  Yes, it will work.  But it may be as slow as or slower than FindEventFrames.


Use the Right Tool for the Right Job

The above example shows correct syntax and how to peel back the results as you need.  Admittedly, a 2-level summary is not a good use case for Summary.  I would absolutely reject using this method if I had to query model 5 times or more (that is, make 5 or more invocations of Summary).  I might consider it if I knew I had fewer than 5 models, but would likely reject it as the method of choice unless I only had to make 1 or 2 calls.  With 1 call, it's a no-brainer: Summary is the right choice.  Would you like proof?


BONUS: Summary Using ONE Call

Let's come up with a better use case where we only need to issue one call.  Allow me to temporarily (just for illustration purposes) change my requirements on the end report.  I no longer am interested in the average and counts per manufacturer and model.  Instead I want to summarize over the exact same data set as a whole.  The new report would look like:


Manufacturer  Model            Count Avg Duration
------------- ------------ --------- ----------------
<All>         <All>           23,313 03:52:55.3506627
------------- ------------ --------- ----------------
            1            1    23,313


I get the exact same record count as the original report in Part 2, which shouldn't be surprising since I use the exact same filter.  For the code to produce the above report, I don't need to initialize my summary object to populate itself from an AFTable.


//I still use StatsTracker for conformity but we don't need to initialize this from our AFTable
var summary = new StatsTracker();


That is the summary instance I will pass to my new method, which now eliminates 2 levels of looping.


public void GetSummaryByMfrAndModel(StatsTracker summary, AFDatabase database, IList<AFSearchToken> tokens)
{
    //Starting with AF 2.9, AFSearch implements IDisposable
    using (var search = new AFEventFrameSearch(database, "Better Summary Use Case", tokens))
    {
        //Opt-in to server side caching
        search.CacheTimeout = TimeSpan.FromMinutes(5);

        var desiredSummaryTypes = AFSummaryTypes.Count | AFSummaryTypes.Total;

        var oneCallSummary = search.Summary("Duration", desiredSummaryTypes);

        var totalVal = oneCallSummary.SummaryResults[AFSummaryTypes.Total];
        var countVal = oneCallSummary.SummaryResults[AFSummaryTypes.Count];

        var stats = new DurationStats();

        if (countVal.IsGood)
        {
            stats.Count = countVal.ValueAsInt32();
            if (totalVal.IsGood)
            {
                stats.TotalDuration = ((AFTimeSpan)totalVal.Value).ToTimeSpan();
            }
            summary.AddToSummary("<All>", "<All>", stats.TotalDuration, stats.Count);
        }
    }
}
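The method above asks Summary only for a count and a total, yet the report shows an average duration.  Presumably the average is derived by dividing one by the other.  Here is a minimal sketch of that arithmetic, using a hypothetical DurationStatsSketch class as a stand-in for the real DurationStats described in Part 3, which may well differ.

```csharp
using System;

// Hypothetical stand-in; the real DurationStats is described in Part 3.
public class DurationStatsSketch
{
    public int Count { get; set; }
    public TimeSpan TotalDuration { get; set; }

    // The average is derived rather than stored.
    // A zero count yields TimeSpan.Zero instead of dividing by zero.
    public TimeSpan AverageDuration
    {
        get
        {
            return Count > 0
                ? TimeSpan.FromTicks(TotalDuration.Ticks / Count)
                : TimeSpan.Zero;
        }
    }
}
```

For example, a Count of 4 and a TotalDuration of 10 hours gives an AverageDuration of 02:30:00.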


Since I get back only 1 row, there is no need to sort the results.  Let's review the metrics for making that one call:

Total GC Memory (MB)      4.48
Working Set Memory (MB)   52.48
Network Bytes Sent        4.77 KB
Network Bytes Received    260.02 KB
Client RPC Calls          10
Client Duration (ms)      534.0
Server RPC Calls          10
Server Duration (ms)      353.8
Elapsed Time              00:01.1


Wow, that IS FAST!!!



Up Next: Reduce the Calls to the Outer Loop

Putting aside the bonus section, let's return to the original report by Manufacturer and Model.  We had to drill down into 2 loops to build our Summary call per Model.  In Part 7 we reduce the number of calls by making a call in the outer loop per Manufacturer.  We will do this with the GroupedSummary method.  See you in Part 7.