Ahmad Fattahi

Advanced Analytics for PI Data

Discussion created by Ahmad Fattahi Employee on May 31, 2012
Latest reply on Sep 27, 2017 by Sridhar_K
PI Server 2012 is about Big Data! It can handle more than 20 million tags (data streams) and a million events per second on a single server; brilliant! The natural next question is: how do we look into and analyze this huge volume of data? The sheer volume goes beyond any human's capacity to decipher relationships and trends in real-time and historical data across an enterprise. How do you analyze your PI Data when it grows big? What tools do you use to look deeper into it? What kinds of solutions are of significant interest and value to you?

Over the past couple of years we have been exploring two major routes to harvest the Power of Data inside the PI System using analytical and statistical tools. For the examples below we used real data collected from our headquarters building in San Leandro, California into our PI Server. We are interested in the power usage of the whole building and in what drives it up and down. We also collect the outside temperature, one of the biggest driving factors in this story.

PI System and R
R is one of the most dominant and fastest-growing environments for data analysis and Big Data. Best of all, it is a free tool known for its power in statistical analysis and its impressive graphics. By integrating the PI System and R, we created graphics that revealed some hidden behavior in the dataset described above. For example, we could see a strong correlation between temperature and power usage. Looking at the distribution of the samples, we detected two modes of behavior: one for weekdays and one for weekends. We also noticed that springtime is somewhat different from other seasons in California, probably because of the moderate outside temperature.

Power distribution.jpeg
Temperature - Power - Correlation - smoothScatter.jpeg
Power grouped by day of the week.jpeg
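
To make the two observations above concrete (the temperature/power correlation and the weekday/weekend split), here is a minimal sketch. It is written in Python rather than R purely for illustration, it is not the analysis we ran, and the sample values are hypothetical, not the San Leandro data. The names `samples`, `pearson`, and `mean_power_by_day_type` are made up for this example, and it assumes the readings have already been exported from the PI System.

```python
from collections import defaultdict
from datetime import datetime
from math import sqrt

# Hypothetical samples: (timestamp, outside temperature in deg C, building power in kW).
# These numbers are invented for illustration only.
samples = [
    (datetime(2012, 5, 21, 14), 24.0, 410.0),  # Monday
    (datetime(2012, 5, 22, 14), 26.5, 436.0),  # Tuesday
    (datetime(2012, 5, 23, 14), 22.0, 395.0),  # Wednesday
    (datetime(2012, 5, 24, 14), 28.0, 452.0),  # Thursday
    (datetime(2012, 5, 25, 14), 25.0, 420.0),  # Friday
    (datetime(2012, 5, 26, 14), 27.0, 230.0),  # Saturday
    (datetime(2012, 5, 27, 14), 23.0, 215.0),  # Sunday
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_power_by_day_type(rows):
    """Average power for weekdays and weekends (Saturday=5, Sunday=6)."""
    groups = defaultdict(list)
    for ts, _temp, power in rows:
        key = "weekend" if ts.weekday() >= 5 else "weekday"
        groups[key].append(power)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


# Correlate within weekdays only, so the weekday/weekend mode shift
# does not mask the temperature effect.
weekday_rows = [s for s in samples if s[0].weekday() < 5]
r_weekday = pearson([s[1] for s in weekday_rows], [s[2] for s in weekday_rows])
day_type_means = mean_power_by_day_type(samples)

print(f"weekday temperature/power correlation: {r_weekday:.3f}")
print(f"mean power by day type: {day_type_means}")
```

Splitting by day type before correlating is a design choice for this sketch: with two modes of behavior in one dataset, a single correlation over all samples would mix the two populations.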

PI System and MATLAB
We built a predictive model based on MATLAB decision trees. It pulls months of historical data out of the PI System and trains on the observed behavior. We used archived outside temperature and time of the week as the predictors of the building's power consumption. In real time, we fed in temperature forecasts to come up with the "most likely" power consumption a few days into the future. Without major adjustments to the machine learning algorithm we got very good prediction accuracy. The result can be used to trigger notifications if the real-time power deviates from the predicted (expected) value by a certain percentage. It can also be used for planning major projects, maintenance, and downtime. PI Server, PI ACE, PI SDK, and MATLAB were used to accomplish the results.

Past and future trends.jpg
Prediction - MATLAB.png
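
The workflow described above (train a tree on temperature and time of week, predict power, flag deviations) can be sketched roughly as follows. This is not our MATLAB model: it is a small hand-rolled regression tree in Python, used as a stand-in for illustration. The training rows are hypothetical, and `build_tree`, `predict`, `deviates`, and the `min_leaf` and `tolerance_pct` parameters are names assumed for this example.

```python
from statistics import mean


def _sse(rows):
    """Sum of squared errors of the targets around their mean."""
    m = mean(t for _, t in rows)
    return sum((t - m) ** 2 for _, t in rows)


def build_tree(rows, min_leaf=2):
    """Fit a regression tree. rows is a list of (features, target) pairs,
    where features is a tuple of numbers. Returns a float for a leaf or a
    (feature_index, threshold, left_subtree, right_subtree) tuple."""
    best = None  # (sse, feature index, threshold, left rows, right rows)
    for f in range(len(rows[0][0])):
        values = sorted({x[f] for x, _ in rows})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for r in rows if r[0][f] <= thr]
            right = [r for r in rows if r[0][f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            sse = _sse(left) + _sse(right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, left, right)
    # Stop when no valid split exists or splitting does not reduce the error.
    if best is None or best[0] >= _sse(rows):
        return mean(t for _, t in rows)
    _, f, thr, left, right = best
    return (f, thr, build_tree(left, min_leaf), build_tree(right, min_leaf))


def predict(tree, x):
    """Walk the tree down to a leaf and return its mean target."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if x[f] <= thr else right
    return tree


def deviates(actual, expected, tolerance_pct):
    """True if actual differs from expected by more than tolerance_pct percent."""
    return abs(actual - expected) > expected * tolerance_pct / 100


# Hypothetical history: ((temperature in deg C, day of week 0=Mon..6=Sun), power in kW).
history = [
    ((18.0, 0), 380.0), ((20.0, 1), 390.0), ((27.0, 2), 440.0), ((29.0, 3), 450.0),
    ((19.0, 5), 210.0), ((21.0, 6), 215.0), ((28.0, 5), 235.0), ((30.0, 6), 240.0),
]

tree = build_tree(history)
expected = predict(tree, (28.0, 2))  # forecast: 28 deg C on a Wednesday
print(f"expected power: {expected} kW")
print("notify:", deviates(500.0, expected, tolerance_pct=10))
```

In practice the forecast temperature would come from a weather feed and the actual power from the live tag; both are hard-coded here to keep the sketch self-contained.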

Please share your thoughts on the general topic as well as any specific tools you use or analytical needs of value to your business.