I would like to ask whether anyone out there has been able to calculate outliers in PI AF. Basically, you calculate the median, then calculate Q1 and Q3, and so on.
Your help is greatly appreciated.
I'd be happy to play around with something like this and see what I can come up with. Could you give me an example of sorts that you're after? What kind of inputs are we working with, what is the ideal "result" or "output" behavior?
Ricky, SQC and other statistical models based on standard deviation are pretty straightforward to do using out-of-the-box PI tools. I think if you are looking at outliers and quartiles, you are more interested in how the data is distributed, and are pretty much asking for a box-and-whisker style plot. We do supply a Median function in asset-analytics, which is easiest to use for rollups, and with the PE scheduler there is the medianFilter PE, which has not been brought forward into asset-analytics. If you are looking at a distribution across several assets (AF elements), then a rollup makes sense, but it would be difficult to determine Q1 and Q3, since we don't have a nice method that puts everything above a specific value into an array on which to apply another median. I think it would be easy to find the min, the max, the median, and how many values are X units away from the mean. Example: count of events greater than (max-median)/2 + median.
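To make that last example concrete, here is a minimal Python sketch of the count, done outside PI on a plain list of values. The function name is hypothetical and nothing here is a PI or asset-analytics call; it only illustrates the arithmetic of the threshold (max-median)/2 + median.

```python
from statistics import median


def count_above_midpoint(values):
    """Count values greater than the midpoint between the median and the max."""
    med = median(values)
    # Threshold from the example: (max - median) / 2 + median
    threshold = (max(values) - med) / 2 + med
    return sum(1 for v in values if v > threshold)
```

For the values 1, 2, 3, 4, 100 the median is 3, the threshold is 51.5, and only one value lies above it.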
If you are looking at the distribution of, say, the last hour of data using quartiles, you might want to look at a custom tool such as AF SDK programming, custom data references, a partner solution, or ACE. Even in SQL, finding the median is not very straightforward, if I recall correctly.
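As a rough illustration of what such a custom tool would have to compute once the values are in hand, here is a Python sketch using only the standard library. It assumes the last hour of values has already been retrieved into a list by whatever means; the function name and the 1.5 x IQR fence multiplier (the usual box-and-whisker convention) are assumptions, not something built into PI.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return (q1, q3, outliers) using box-and-whisker style fences.

    k=1.5 is the conventional whisker multiplier; adjust it as needed.
    """
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return q1, q3, outliers
```

Note that there are several conventions for computing quartiles, so Q1 and Q3 may differ slightly from what another tool reports for the same data.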
I'll try to touch base with Taylor McManus and see if we can find an easier way to provide this. Again, obtaining/using sigma testing is easy; yet, Q1/Q3 is a different story using the PE syntax.
Indeed, I got stuck calculating Q1 and Q3, so we resorted to a workaround. Instead of computing the quartiles, I started calculating the rate of change. From the rate of change I can predict the next incoming real-time value. I also added a range within which we consider the incoming value to still be in the safe zone. If the value is out of range, we initially tag it as an outlier and then run it through further if/else comparisons to verify whether it really is an outlier.
Our client usually looks at the trends, and if he sees a sudden spike he investigates it manually. We wanted to do this in real time, so my first job was to detect sudden spikes in real time. We are testing whether this process is really an effective way to determine outliers.
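A minimal Python sketch of that idea, for anyone who wants to try it: it assumes a simple linear extrapolation from the two most recent points and a fixed symmetric tolerance for the safe zone. The function names and the tolerance parameter are hypothetical, and the follow-up verification checks are not shown.

```python
def predict_next(t_prev, v_prev, t_last, v_last, t_new):
    """Extrapolate the value at t_new from the rate of change of the last two points."""
    rate = (v_last - v_prev) / (t_last - t_prev)  # units of value per unit of time
    return v_last + rate * (t_new - t_last)


def is_candidate_outlier(t_prev, v_prev, t_last, v_last, t_new, v_new, tolerance):
    """Flag v_new when it falls outside predicted +/- tolerance (the safe zone)."""
    predicted = predict_next(t_prev, v_prev, t_last, v_last, t_new)
    return abs(v_new - predicted) > tolerance
```

With points (0, 10) and (1, 12), the predicted value at time 2 is 14, so with a tolerance of 1 an incoming 14.5 stays in the safe zone while an incoming 20 is tagged as a candidate outlier.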
That sounds like a great proactive approach, and I'm very curious to hear how it works throughout testing!