PI Server 2012 is about Big Data! It is capable of handling more than 20 million tags (data streams) and a million events per second on a single server; brilliant! The natural next question is "how do we look into and analyze this huge volume of data?" The sheer volume goes beyond any human's capacity to decipher relationships and trends in real-time and historical data across an enterprise.

**How do you analyze your PI Data when it grows big? What tools do you use to look deeper into the PI Data? What kind of solutions are of significant interest and value to you?**

Over the past couple of years we have been trying two major routes to harvest the **Power of Data** inside the PI System using analytical and statistical tools. For the examples below we used real data collected from our headquarters building in San Leandro, California into our PI Server. We are interested in the power usage of the whole building, to see what drives it up and down. We also collect the outside temperature, as it is one of the biggest driving factors in this story.

**PI System and R**

R is one of the most dominant and fastest-growing environments for data analysis and Big Data. Best of all, it is a free tool known for its power in doing **statistical analysis** and impressive graphics. Through a few examples on the dataset explained above we managed to extract very valuable knowledge and insight from the raw data. By **integrating PI System and R** we managed to create very interesting graphics revealing some hidden behavior of our dataset. For example, we could see that there is a strong correlation between temperature and power usage. Looking at the distribution of the samples we detected two modes of behavior: one for weekdays and one for weekends. We also noticed that springtime is somewhat different from other seasons in California, probably because of the moderate temperature outside.

**PI System and MATLAB**

We built a predictive model based on MATLAB decision trees. It gets months of historical data out of the PI System and trains the model on the observed behavior. We used archived outside temperature and time of the week as the predictors of the building's power consumption. In real time, we used temperature forecasts to come up with the "most likely" power consumption a few days into the future. Without major adjustments to the machine learning algorithm we managed to get very good prediction precision. The result can be used to trigger notifications if the real-time power deviates from the predicted (expected) value by a certain percentage. It can also be used for planning major projects, maintenance, and downtime. PI Server, PI ACE, PI SDK, and MATLAB were used to accomplish the results.
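The notification idea mentioned above (flagging real-time power that deviates from the predicted value by a certain percentage) can be sketched in a few lines. This is only a minimal illustration in Python, not the PI ACE or MATLAB code we used: the `Reading` type, the function names, and the 15% threshold are all hypothetical, and the sample numbers are made up rather than taken from the building. In practice the actual and predicted values would come from the PI System and the trained model.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    timestamp: str       # ISO 8601 text, kept as a string for simplicity
    actual_kw: float     # measured building power
    predicted_kw: float  # model output for the same timestamp


def percent_deviation(actual: float, predicted: float) -> float:
    """Signed deviation of actual from predicted, as a percentage of predicted."""
    if predicted == 0:
        raise ValueError("predicted value must be non-zero")
    return (actual - predicted) / predicted * 100.0


def find_alerts(readings, threshold_pct=15.0):
    """Return (timestamp, deviation) pairs whose absolute deviation exceeds the threshold."""
    alerts = []
    for r in readings:
        dev = percent_deviation(r.actual_kw, r.predicted_kw)
        if abs(dev) > threshold_pct:
            alerts.append((r.timestamp, round(dev, 1)))
    return alerts


# Made-up sample values for illustration only, not data from the building.
sample = [
    Reading("2012-06-04T09:00", 410.0, 400.0),
    Reading("2012-06-04T10:00", 500.0, 400.0),
    Reading("2012-06-04T11:00", 300.0, 400.0),
]
alerts = find_alerts(sample, threshold_pct=15.0)
```

The deviation is kept signed so that a notification can distinguish unexpectedly high consumption from unexpectedly low consumption; the right threshold would depend on how precise the model turns out to be for a given building.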

Please share your thoughts on the general topic as well as any specific tools you use or analytical needs of value to your business.

I was just discussing this type of concept at a customer site last week when they brought up predictions and R. They don’t directly use it (yet), but did mention the fact that a BI reporting tool they have licensed leverages R under the hood.

My introduction to MATLAB over the past couple of years has been a great experience, especially once I got my DirectAccess to PI toolbox worked out. There is limitless untapped potential for the application of data contained in PI when combined with these types of analyses.

R appears to provide an interesting alternative to MATLAB, and its open source (free!) nature tells me that I need to download it and add it to my toy box!