Industrial IoT time-series data engineering - a layered approach to data quality

Blog Post created by gopal on Jul 11, 2020

10-minute read


Data quality is a foundational prerequisite for data engineering in your digital transformation and data science/machine learning initiatives. This post is a "how-to" for getting started with, and implementing, a layered approach to data quality with the PI System.



With streaming industrial sensor/IoT time-series data, a layered approach to data quality gives you the flexibility to apply fit-for-purpose techniques mapped to use case requirements. 


Whether it is an individual sensor (pressure, temperature, flow, vibration, etc.), an individual piece of equipment such as a pump (with a few sensors), a functional equipment group (FEG) such as a feed water system (a group of equipment such as a motor, pump, valve, filter, and pipe assembly), or an entire process unit or plant, the scope of the data quality requirement has to be understood in its usage context.



At the individual sensor level, data quality validation checks are basic and easy to implement: missing data (I/O timeout), flat-line data (sensor malfunction), out-of-range data, and similar. Gaps in the data are often caused by issues in the source instrumentation or control system, network connectivity, or faulty data collection configuration (scan rate, unit of measure, exception/compression specs). The recommended practice is to use a monitoring app (shown below as PI System Monitoring) to proactively detect and correct these issues at the source, instead of imputing values for the missed or erroneous readings with post-processing logic.


Below are some screens from PI System Monitoring (PSM):







Sensor data quality issues (flat-line, bad, or stale data, and so on) that fall outside the logic in PI System Monitoring can still be trapped with simple analytics; see Chapter 9.
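As a minimal sketch of such simple analytics, the Python function below flags out-of-range readings, gaps between readings, a stream that has stopped updating (stale), and flat-lines. It assumes evenly scanned raw samples; archived values that have passed through exception/compression can legitimately hold one value for a long time, so a flat-line check on them needs care. The names `Sample` and `check_sensor` and all thresholds are hypothetical and are not part of any PI System API.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    time: datetime
    value: float


def check_sensor(samples: List[Sample], lo: float, hi: float,
                 max_gap: timedelta, flat_window: timedelta,
                 now: datetime) -> List[Tuple[str, datetime]]:
    """Flag basic single-sensor issues in a time-ordered list of raw samples.

    Returns (issue, timestamp) pairs; all thresholds are caller-supplied.
    """
    if not samples:
        return [("stale", now)]
    issues: List[Tuple[str, datetime]] = []

    # Out-of-range: reading outside the plausible engineering range.
    for s in samples:
        if not lo <= s.value <= hi:
            issues.append(("out_of_range", s.time))

    # Gap: consecutive readings further apart than max_gap (e.g. I/O timeout).
    for prev, cur in zip(samples, samples[1:]):
        if cur.time - prev.time > max_gap:
            issues.append(("gap", prev.time))

    # Stale: no new reading for longer than max_gap before the evaluation time.
    if now - samples[-1].time > max_gap:
        issues.append(("stale", samples[-1].time))

    # Flat-line: the identical value held for at least flat_window
    # (exact equality; a tolerance could be used for noisy analog signals).
    run_start, flagged = samples[0], False
    for s in samples[1:]:
        if s.value != run_start.value:
            run_start, flagged = s, False
        elif not flagged and s.time - run_start.time >= flat_window:
            issues.append(("flat_line", run_start.time))
            flagged = True
    return issues


# Usage with made-up readings sampled once per minute.
t0 = datetime(2020, 7, 11)
values = [50.0, 51.0, 51.0, 51.0, 51.0, 51.0, 140.0]
readings = [Sample(t0 + timedelta(minutes=i), v) for i, v in enumerate(values)]
issues = check_sensor(readings, lo=0.0, hi=100.0,
                      max_gap=timedelta(minutes=5),
                      flat_window=timedelta(minutes=3),
                      now=t0 + timedelta(minutes=20))
print([kind for kind, _ in issues])  # ['out_of_range', 'stale', 'flat_line']
```

Each check is independent, so they can be enabled per tag depending on what the monitoring layer already covers.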


The figure below shows the analytics for flat-line, bad, and stale data:


Next, for related sensors on an individual piece of equipment (motor, pump, heat exchanger, etc.), simple math- or physics-based analytics, correlations, and inferred metrics such as chiller kW/ton or pump head/flow can be applied to cross-validate data quality.
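To illustrate the equipment-level idea, here is a sketch that infers chiller kW/ton from four sensors (chiller power, chilled-water flow, return and supply water temperatures) and checks the result against a plausibility band. It uses the common water-side approximation tons = gpm × ΔT(°F) / 24; the 0.4 to 1.5 kW/ton band and the function names are assumptions for illustration only.

```python
from typing import Optional


def chiller_kw_per_ton(power_kw: float, chw_flow_gpm: float,
                       return_temp_f: float, supply_temp_f: float) -> Optional[float]:
    """Chiller efficiency inferred from four independent sensors.

    Cooling load for water: tons = gpm * delta_T(degF) / 24
    (about 500 Btu/h per gpm-degF divided by 12,000 Btu/h per ton).
    """
    tons = chw_flow_gpm * (return_temp_f - supply_temp_f) / 24.0
    if tons <= 0.0:
        return None  # no measurable cooling load; the ratio is undefined
    return power_kw / tons


def kw_per_ton_plausible(kw_per_ton: Optional[float],
                         lo: float = 0.4, hi: float = 1.5) -> bool:
    """Hypothetical plausibility band; tune lo/hi to the specific chiller."""
    return kw_per_ton is not None and lo <= kw_per_ton <= hi


healthy = chiller_kw_per_ton(300.0, 1200.0, 54.0, 44.0)    # 500 tons -> 0.6 kW/ton
stuck_flow = chiller_kw_per_ton(300.0, 200.0, 54.0, 44.0)  # flow sensor reading low
print(kw_per_ton_plausible(healthy), kw_per_ton_plausible(stuck_flow))  # True False
```

An implausible ratio does not identify which of the four sensors is wrong, only that the related readings disagree and deserve a closer look.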

Also, using PctGood (percent good) in time-weighted analytics helps ensure that bad or missing data is not propagated to dependent calculations, most commonly aggregated metrics such as totals, hourly averages, and other statistics. You can also use simple display features to visually differentiate between good (black-on-white) and not-good (white-on-black) data; see the example from the NRC (Nuclear Regulatory Commission) below.
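The sketch below shows the idea behind gating an aggregate on percent good: compute a time-weighted average over the good intervals only, report what percentage of the interval was good, and suppress the result below a threshold. It is not the PI System's implementation; for simplicity it holds each value until the next event (stepped behaviour), whereas PI's own calculations interpolate according to each point's configuration. The 80% threshold and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional, Tuple


class Event(NamedTuple):
    time: datetime
    value: float
    good: bool


def time_weighted_avg(events: List[Event], start: datetime,
                      end: datetime) -> Tuple[Optional[float], float]:
    """Return (time-weighted average over good intervals, percent good).

    Events must be sorted by time. Each event's value is held until the next
    event; time not covered by a good event counts against percent good.
    """
    total = (end - start).total_seconds()
    good_seconds = 0.0
    weighted_sum = 0.0
    for i, ev in enumerate(events):
        seg_start = max(ev.time, start)
        seg_end = min(events[i + 1].time if i + 1 < len(events) else end, end)
        duration = (seg_end - seg_start).total_seconds()
        if duration <= 0.0 or not ev.good:
            continue
        good_seconds += duration
        weighted_sum += ev.value * duration
    pct_good = 100.0 * good_seconds / total if total > 0 else 0.0
    avg = weighted_sum / good_seconds if good_seconds > 0 else None
    return avg, pct_good


def guarded_average(events, start, end, min_pct_good=80.0):
    """Publish the average only when enough of the interval had good data."""
    avg, pct_good = time_weighted_avg(events, start, end)
    return (avg if pct_good >= min_pct_good else None), pct_good


t0 = datetime(2020, 7, 11)
hour = [Event(t0, 10.0, True),
        Event(t0 + timedelta(minutes=30), 20.0, True),
        Event(t0 + timedelta(minutes=45), -9999.0, False)]  # e.g. an I/O timeout
print(guarded_average(hour, t0, t0 + timedelta(hours=1)))  # (None, 75.0)
```

Without the gate, the hour would report an average of about 13.3 computed from only 45 minutes of data, and a naive average that included the bad value would be badly skewed.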



For FEGs such as a feed water system or an air handler, with tens or hundreds of sensors, or an entire process unit or plant with thousands of continuous measurements, data quality validation requires multivariate logic. Even when healthy source systems and data collection infrastructure report apparently “good” sensor readings, inconsistencies across multiple sensors can only be inferred at the system level using analytical data models and, when required, connected flow models with mass/energy balances.


To illustrate an FEG, consider an air handler, part of the HVAC (heating, ventilation and air conditioning) system in a building management system (BMS). Sensor data for Outside Air Temperature, Relative Humidity, Mixed Air Temperature, Supply Air Temperature, Damper Position, Chilled Water Flow, Supply Air Flow, Supply Air Fan Power, etc. are available, and we walk through a data-driven (machine learning) analytical model to cull out “bad sensor” data.
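The post does not spell out the model used for the air-handler case, so the sketch below swaps in one common multivariate technique: principal component analysis (PCA) with a squared prediction error (SPE, or Q statistic) limit. It learns the correlation structure of normal operation and flags samples that break it, even when every reading is individually in range. The sensors are a subset of the air-handler tags above, and the data is simulated; the two hidden drivers, the fixed 50% outside-air fraction, the 99th-percentile limit, and the class name `PCAMonitor` are all assumptions. Requires NumPy.

```python
import numpy as np


class PCAMonitor:
    """PCA model of normal operation that flags samples whose squared
    prediction error (SPE, also called the Q statistic) exceeds a limit."""

    def __init__(self, n_components: int, quantile: float = 0.99):
        self.n_components = n_components
        self.quantile = quantile

    def _standardize(self, X: np.ndarray) -> np.ndarray:
        return (X - self.mean_) / self.std_

    def _spe(self, Z: np.ndarray) -> np.ndarray:
        # Part of each standardized sample not explained by the retained components.
        residual = Z - Z @ self.loadings_ @ self.loadings_.T
        return np.sum(residual ** 2, axis=1)

    def fit(self, X: np.ndarray) -> "PCAMonitor":
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0, ddof=1)
        Z = self._standardize(X)
        # Rows of vt are principal directions, ordered by explained variance.
        _, _, vt = np.linalg.svd(Z, full_matrices=False)
        self.loadings_ = vt[: self.n_components].T
        # Empirical control limit taken from the training data itself.
        self.limit_ = np.quantile(self._spe(Z), self.quantile)
        return self

    def flag(self, X: np.ndarray) -> np.ndarray:
        return self._spe(self._standardize(X)) > self.limit_


def simulate_ahu(rng: np.random.Generator, n: int) -> np.ndarray:
    """Synthetic air-handler data driven by two hidden factors (hypothetical)."""
    oat = rng.uniform(15.0, 35.0, n)   # outside air temperature, degC
    load = rng.uniform(0.0, 1.0, n)    # fraction of design cooling load
    return np.column_stack([
        oat + rng.normal(0.0, 0.2, n),                        # outside air temperature
        0.5 * oat + 11.0 + rng.normal(0.0, 0.2, n),           # mixed air temp (fixed 50% OA assumed)
        100.0 + 400.0 * load + rng.normal(0.0, 5.0, n),       # chilled water flow
        5000.0 + 10000.0 * load + rng.normal(0.0, 100.0, n),  # supply air flow
        2.0 + 8.0 * load + rng.normal(0.0, 0.1, n),           # supply fan power
    ])


rng = np.random.default_rng(42)
monitor = PCAMonitor(n_components=2).fit(simulate_ahu(rng, 1000))
new = simulate_ahu(rng, 5)
new[0, 1] += 5.0   # hypothetical +5 degC drift on the mixed air temperature sensor
print(monitor.flag(new))
```

The drifted mixed-air reading remains in a plausible range by itself but is flagged because it no longer agrees with outside air temperature. PCA flags the sample rather than the sensor; isolating which sensor is bad takes further analysis (for example, per-sensor contributions to SPE) and enough redundancy among related sensors.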


In another use case, a pre-heater example illustrates the need for a connected flow model. The stack flue gas O2 (oxygen) measurements are unusable until reconciled with mass/energy balance corrections.


A steam cycle is another use case, where a connected flow model validates the numerous sensor readings via mass and heat balances.
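For the connected flow models above, here is a minimal sketch of linear data reconciliation: adjust the measured flows as little as possible, weighted by each meter's assumed uncertainty, so that the mass balances close exactly, then use the global test statistic to check for gross errors. It covers mass balances only; energy balances and composition measurements such as flue gas O2 make the problem nonlinear and need an iterative solver. The two-node topology, the readings, and the standard deviations are hypothetical, and the chi-square test assumes independent, normally distributed measurement errors. Requires NumPy.

```python
import numpy as np


def reconcile(y: np.ndarray, sigma: np.ndarray, A: np.ndarray):
    """Weighted least-squares data reconciliation under linear balances.

    Minimizes sum(((x - y) / sigma) ** 2) subject to A @ x = 0 and returns
    (x_hat, gamma): the reconciled values and the global-test statistic.
    """
    V = np.diag(np.asarray(sigma, dtype=float) ** 2)  # measurement error covariance
    r = A @ y                        # imbalance of each raw balance equation
    M = A @ V @ A.T                  # covariance of those imbalances
    lam = np.linalg.solve(M, r)
    x_hat = y - V @ A.T @ lam        # closed-form constrained least-squares solution
    gamma = float(r @ lam)           # ~ chi-square with len(r) dof if there are no gross errors
    return x_hat, gamma


# Hypothetical two-node steam-cycle fragment (t/h):
# columns = feedwater, steam, blowdown, turbine steam, process steam
A = np.array([[1.0, -1.0, -1.0, 0.0, 0.0],    # boiler: feedwater = steam + blowdown
              [0.0, 1.0, 0.0, -1.0, -1.0]])   # header: steam = turbine + process
y = np.array([100.0, 97.0, 2.0, 80.0, 15.0])  # raw readings; balances are off by 1 and 2
sigma = np.array([2.0, 1.5, 0.1, 1.0, 0.5])   # assumed measurement standard deviations

x_hat, gamma = reconcile(y, sigma, A)
print(np.round(x_hat, 2), round(gamma, 2))
print("gross error suspected" if gamma > 5.991 else "no gross error at 95%")  # 5.991 = chi2(0.95, dof=2)
```

The blowdown flow, with the smallest assumed uncertainty, barely moves, while the less certain feedwater flow absorbs most of the boiler imbalance.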



To recap, a layered approach to data quality includes:

  • PI System Monitoring
  • Simple analytics to validate (individual) sensor data streams
  • Simple analytics to validate at an equipment level (several related sensors)
  • Advanced (multivariate/machine learning) analytics to validate at an FEG (functional equipment group) level: tens or hundreds of sensors
  • Advanced (connectivity model based) analytics: data reconciliation with mass/energy balance (one or more process units)


Data quality is a foundational prerequisite for data engineering in your digital transformation and data science/machine learning initiatives, and the layered approach discussed in this post can help.