An R scripting engine for PI is a really powerful way to apply R time series libraries to real-time data. One of the challenges of using the R libraries out of the box is that they are written for evenly spaced data points, but sensor data is often unevenly spaced. The data points therefore have to be interpolated to obtain an evenly spaced vector.
In my last blog post I showed how to use the Custom Data Reference to configure a calculation.
The question is how to provide interpolated values in real time in a scalable way. One approach is to use the Rx libraries to create Observables that buffer the data, interpolate it, and send it to the R engine for calculation. There are already several articles on VCampus describing how to use Rx with the AF-SDK, for example:
The Rx library is a great way to handle data in motion and an excellent fit for PI data.
Providing real-time interpolations to R can be broken down into the following tasks:
- Subscribe to PI data
- Provide rolling windows for one or more variables
- Sample the stream based on the polling rate; polling rate = 0 would be event based
- Group the stream by variable; here you would transition from Observable to List
- Perform Interpolation
- Calculate R
- Send results to PI
Each task is one layer, so if you create, for example, two windows of different sizes that both use "variable1", you still subscribe to only one data feed.
Therefore the only load on the PI server itself is providing the real-time data stream.
A marble diagram shows the main operations on the data stream (here is a nice website to play with Rx operators):
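The layering can be illustrated with a minimal sketch in plain Python rather than Rx. The `Feed` and `RollingWindow` classes are hypothetical stand-ins, not AF-SDK or Rx types; the point is only that several windows of different sizes can hang off a single subscription to one variable.

```python
from collections import deque


class RollingWindow:
    """Holds the events of one variable that cover the last `span` seconds."""

    def __init__(self, span):
        self.span = span
        self.events = deque()  # (timestamp, value) pairs, oldest first

    def push(self, timestamp, value):
        self.events.append((timestamp, value))
        start = timestamp - self.span
        # Drop old events, but keep the last one at or before the window
        # start so the left boundary can still be interpolated.
        while len(self.events) >= 2 and self.events[1][0] <= start:
            self.events.popleft()


class Feed:
    """Hypothetical stand-in for one data subscription to one variable."""

    def __init__(self, name):
        self.name = name
        self.windows = []

    def attach(self, window):
        self.windows.append(window)
        return window

    def on_next(self, timestamp, value):
        # One incoming event fans out to every window attached to this feed.
        for window in self.windows:
            window.push(timestamp, value)


# One subscription to "variable1", two windows of different sizes.
feed = Feed("variable1")
short_window = feed.attach(RollingWindow(span=10))
long_window = feed.attach(RollingWindow(span=60))

# Simulated, made-up events: one every 7 seconds.
for t in range(0, 100, 7):
    feed.on_next(t, t * 0.5)
```

Both windows see every event, yet the source is only subscribed to once. In an Rx implementation the same effect would come from sharing one Observable between several downstream operators.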
The operations are mostly out-of-the-box functionality of the Rx library. The interpolation uses the previous value for the left-side boundary and the current value for the right-side boundary.
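As a rough sketch of the interpolation step, the function below resamples a window of unevenly spaced events onto an evenly spaced grid. It assumes linear interpolation, which is my reading and may differ from the method used in the actual engine. The name `interpolate_window` and the sample data are made up for illustration. The event just before the window start serves as the "previous value" for the left boundary, and the most recent event is held at the right boundary.

```python
from bisect import bisect_right


def interpolate_window(events, start, end, n):
    """Resample (timestamp, value) events onto n evenly spaced points in [start, end].

    `events` must be sorted by timestamp. Linear interpolation is an
    assumption made for this sketch.
    """
    if n < 2:
        raise ValueError("need at least two grid points")
    times = [t for t, _ in events]
    values = [v for _, v in events]
    step = (end - start) / (n - 1)
    result = []
    for i in range(n):
        t = start + i * step
        j = bisect_right(times, t)  # index of the first event after t
        if j == 0:
            result.append(values[0])   # no earlier event: hold the first value
        elif j == len(times):
            result.append(values[-1])  # at or past the latest event: hold the current value
        else:
            t0, t1 = times[j - 1], times[j]
            v0, v1 = values[j - 1], values[j]
            result.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return result


# Made-up events: one just before the window start (t=8), two inside it.
sample_events = [(8, 0.0), (12, 4.0), (20, 12.0)]
evenly_spaced = interpolate_window(sample_events, start=10, end=20, n=3)
print(evenly_spaced)  # [2.0, 7.0, 12.0]
```

The resulting list is the evenly spaced vector that would be handed to the R script.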
Putting this all together, two questions remain: how fast is this, and how scalable is it?
The second question is easier to answer: Rx was designed for this type of application, and because it abstracts away the hard parts (threading, locking, syncing), scaling up seems almost too easy. Some questions have been raised about its performance, though:
I did some measurements and found that interpolating 100 data points takes ~200 microseconds. Considering that the R script execution time is in the range of milliseconds, this is negligible.
In summary, complex event processing (CEP) or Rx, both designed to handle large data streams, are ideal for performing real-time calculations. Vector-based calculations require an additional, fast interpolation step that can be integrated into the data flow. By placing the buffering and interpolation on the calculation engine, the PI server is relieved of the additional load of providing a vast number of very similar interpolations.