Ahmad Fattahi

PI System Enables Data Science

Blog Post created by Ahmad Fattahi on Mar 31, 2017

Data and community are two major assets that cannot be invented or disrupted as quickly as technology. At the OSIsoft Users Conference 2017 in San Francisco I had the opportunity to witness firsthand how the PI community is pushing the envelope for a smarter manufacturing world. In the process, they have at their disposal decades of real data sitting across several thousand PI Systems around the world. An ever-increasing number of organizations are making good on the promise of turning time-series process data into decision-ready and predictive knowledge. Below is a short summary of my observations.


Activities and opportunities


  • Hackathon: we offered real operational data from Barrick Gold, the biggest gold producer in the world. The dataset spanned 6 months of sensor measurements from their massive haul trucks. 60 data streams from 30+ trucks made this a rich dataset. The trucks are a big part of Barrick’s operational cost. Each of these giants costs $4MM, while one tire costs $50K. Their fuel economy is 0.3 miles per gallon (mpg). That gives you an idea of how costly this operation is and how vital efficiency can be to any mine operator like Barrick. When a truck is full, the gold waiting to be extracted from its load is worth $60K. This means that if a truck goes down without notice, a significant amount of capital can sit in the middle of nowhere for days before a fix arrives.

Our hackers took advantage of the opportunity and came up with very innovative ideas. As a judge I was struck by the quality and maturity of the submissions, all produced in 23 hours. Several teams focused on advanced analytics and data as the cornerstone of their submission while others focused on software development. Most notably, the winning team merged machine learning with social engineering to design a system where drivers would earn points by driving “well”. What counts as “well” was learned by sifting through sensor data (the strains and temperatures across the truck body) as well as geospatial qualities of the road.


  • Partners and customers: several OSIsoft partners and customers flexed their muscles around data science with PI data. We enjoyed several hands-on labs and presentations on the topic, ranging from real customer stories to educational pieces on how to pull off a successful machine learning project with PI data. Most notably, “United Utilities” (UK) presented how they built a demand forecasting engine, which is critical to serving water to their customers efficiently. “Total” showcased how they use data from the PI System to build and deploy a model that predicts the percentage of gasoil in the residues of their distillation tower. The many conversations I had with attendees all point to an acceleration of advanced analytics and data science in the PI world. This is all exciting because it shows how much business opportunity is out there waiting to be tapped.




Like everything else in the world, these benefits don’t come for free. There are still significant challenges along the way:


  • Quality of the data: throughout the event a constant theme was the challenge of data quality. While it’s typical to focus immediately on the machine learning algorithm or architecture, it is evident that industrial data can be messy, vague, or flat-out nonsense. The nature of these data sources and their paths to the server make them susceptible to sensor error, noise contamination, process errors, mistakes in units of measure, unlogged changes over time, and lack of context, to name a few. A significant amount of effort has to be spent on vetting, reshaping, cleaning, and reformatting data before a machine learning algorithm can be applied. Building a diverse and broad team of data scientists and subject matter experts seems to be the right strategy to alleviate such pain points.
  • Cultural and governance issues: preserving and sharing data may not be as easy as the technology that enables it. The industrial community has a long history of protecting itself against all sorts of malicious attacks and innocent mistakes, and as a result has isolated itself from the rest of the world. The new needs and opportunities call for easing some of the traditional requirements while still guaranteeing security. That takes a significant cultural shift on top of technological and security advancements. Besides, addressing the data quality issues mentioned above takes a change in mindset across the organization from top to bottom. To add yet another layer, in many cases data is composed of elements sitting in different jurisdictions, which makes data sharing and aggregation even more challenging.
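
To make the data quality point above more concrete, here is a minimal sketch in Python of the kind of vetting that might happen before any machine learning is applied. It is only an illustration under stated assumptions: the function name `clean_readings`, the sample values, the plausible range, and the spike threshold are all hypothetical, and the code does not use any PI System API. It drops missing values, rejects readings outside a physically plausible range, and flags spikes that sit far from the median (using the median absolute deviation).

```python
import math
from statistics import median


def clean_readings(readings, low, high, spike_factor=5.0):
    """Split (timestamp, value) pairs into kept and rejected readings.

    Hypothetical helper for illustration only. `low` and `high` bound the
    physically plausible range; `spike_factor` is an assumed threshold on
    the distance from the median, measured in median absolute deviations.
    """
    plausible, rejected = [], []

    # Pass 1: drop missing values and values outside the plausible range.
    for t, v in readings:
        if v is None or not math.isfinite(v):
            rejected.append((t, v, "missing"))
        elif not (low <= v <= high):
            rejected.append((t, v, "out_of_range"))
        else:
            plausible.append((t, v))

    if not plausible:
        return [], rejected

    # Pass 2: flag spikes using the median absolute deviation (MAD),
    # which is less sensitive to outliers than the mean and standard deviation.
    values = [v for _, v in plausible]
    med = median(values)
    mad = median([abs(v - med) for v in values])

    kept = []
    for t, v in plausible:
        if mad > 0 and abs(v - med) > spike_factor * mad:
            rejected.append((t, v, "spike"))
        else:
            kept.append((t, v))
    return kept, rejected


# Made-up temperature readings (deg C) with typical defects:
# a gap, a NaN, a sentinel value of -999, and a single spike.
sample = [
    (0, 61.0), (1, 62.0), (2, None), (3, float("nan")), (4, 63.0),
    (5, -999.0), (6, 61.5), (7, 140.0), (8, 62.5),
]
kept, rejected = clean_readings(sample, low=-40.0, high=200.0)
for t, v, reason in rejected:
    print(f"t={t}: rejected value {v!r} ({reason})")
```

Keeping the rejected readings together with a reason, rather than silently discarding them, gives subject matter experts something to review. A statistical "spike" may well be a real process event, which is exactly where the mixed team of data scientists and domain experts earns its keep.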


The opportunities for data science and machine learning in the PI world abound, as do the challenges. However, these challenges are nothing that we can’t overcome with the power of our smart and energetic community. All the signs point to a wave of industrial organizations investing serious capital and resources in this area. After all, it may be the differentiator between the survivors and failures of the coming decade. I ask everyone in this community to share their thoughts, experiences, challenges, and ideas in this field. We at OSIsoft are committed to pushing this forward with your help.