R Calculation Server Design

Posted by ernstamort Mar 21, 2018

An R scripting engine for PI is a really powerful way to apply R time series libraries to real-time data. One challenge of using the R libraries out of the box is that they are written for evenly spaced data points, while sensor data is often unevenly spaced. This requires interpolating the data points to obtain an evenly spaced vector.

 

In my last blog post I showed the Custom Data Reference used to configure a calculation.

 

The question is how to provide interpolated values in real time in a scalable solution. One approach is to use the Rx libraries to create Observables that buffer the data, interpolate it, and send it to the R engine for calculation. There are already several articles on vCampus describing how to use Rx with the AF SDK, for example:

 

Async with PI AF SDK: From Tasks to Observables

 

The Rx library is a great way to handle data in motion and an excellent fit for PI data.

 

Providing real-time interpolations to R can be broken down into the following tasks (a sketch of the resulting pipeline follows the list):

  1. Subscribe to PI data
  2. Provide rolling windows for one or more variables
  3. Sample the stream based on the polling rate; a polling rate of 0 would be event-based
  4. Group the stream by variable; here you transition from Observables to lists
  5. Perform the interpolation
  6. Run the R calculation
  7. Send the results to PI
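
To make the layering concrete, here is a minimal C# sketch of such a pipeline built on the Rx Buffer operator for the rolling windows. TagEvent, RxPipeline, and the callback signature are my own illustrative names, not part of the AF SDK; the source observable is assumed to be wrapped around the PI data feed using the technique from the article linked above.

    using System;
    using System.Linq;
    using System.Reactive.Linq;

    // Hypothetical stand-in for an AF data pipe event (name and shape are
    // my assumption, not an AF SDK type).
    public record TagEvent(string Variable, DateTime Time, double Value);

    public static class RxPipeline
    {
        // 'source' is assumed to be an IObservable wrapped around the PI
        // data feed (task 1), e.g. built via the AF data pipe technique.
        public static IDisposable Build(
            IObservable<TagEvent> source,
            TimeSpan windowSize,                       // task 2: rolling window length
            TimeSpan pollingRate,                      // task 3: emit interval
            Action<string, TagEvent[]> runCalculation) // tasks 5-7 live in this callback
        {
            return source
                .Buffer(windowSize, pollingRate)       // overlapping, rolling windows
                .Where(window => window.Count > 0)
                .Subscribe(window =>
                {
                    // task 4: group the buffered events by variable,
                    // leaving the Observable world for plain lists
                    foreach (var group in window.GroupBy(e => e.Variable))
                        runCalculation(group.Key, group.ToArray());
                });
        }
    }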

 

Each task is one layer, so if you create, for example, two windows of different sizes on "variable1", you still only subscribe to a single data feed.

Therefore the only load on the PI Server itself is providing the real-time data stream.

 

A marble diagram shows the main operations on the data stream (here is a nice website to play with Rx operators):

 

 

The operations are mostly out-of-the-box functionality of the Rx library. The interpolation uses the previous value for the left-side boundary and the current value for the right-side boundary.
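
The post doesn't show the interpolation code itself, so here is a minimal sketch of that step under the stated boundary rule: each grid point is linearly interpolated between the previous event and the current event. The Resample name and the clamping behaviour outside the sampled range are my assumptions.

    using System;

    public static class Interpolation
    {
        // Linear interpolation of an unevenly spaced series onto an evenly
        // spaced grid: each grid point is bracketed by the previous event
        // (left boundary) and the current event (right boundary).
        // Assumes 'times' is sorted; clamping to the first/last value
        // outside the sampled range is my assumption.
        public static double[] Resample(DateTime[] times, double[] values,
                                        DateTime start, TimeSpan step, int count)
        {
            var result = new double[count];
            int j = 0;
            for (int i = 0; i < count; i++)
            {
                DateTime t = start + TimeSpan.FromTicks(step.Ticks * i);
                while (j < times.Length - 1 && times[j + 1] <= t) j++;

                if (t <= times[0]) { result[i] = values[0]; continue; }
                if (j >= times.Length - 1) { result[i] = values[^1]; continue; }

                double span = (times[j + 1] - times[j]).TotalSeconds;
                double frac = span > 0 ? (t - times[j]).TotalSeconds / span : 0;
                result[i] = values[j] + frac * (values[j + 1] - values[j]);
            }
            return result;
        }
    }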

 

When you put this all together, the questions are: how fast is it, and how well does it scale?

 

The second question is easier to answer: Rx was designed for this type of application, and by abstracting away the hard parts of threading, locking, and synchronization, it makes scaling up seem almost too easy. There have been some questions raised about its performance, though:

 

c# - Reactive Extensions seem very slow - am I doing something wrong? - Stack Overflow

 

I did some measurements and found that interpolating 100 data points takes ~200 microseconds. Considering that R script execution times are in the range of milliseconds, this is really negligible.
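
For anyone who wants to reproduce a measurement of this kind, here is a rough Stopwatch harness of my own (not the original benchmark), reusing the Resample sketch from above; the numbers will of course vary by machine.

    using System;
    using System.Diagnostics;

    // Rough micro-benchmark sketch: a single 100-point interpolation is
    // below Stopwatch resolution, so time many repetitions and divide.
    var rand = new Random(42);
    var times = new DateTime[100];
    var values = new double[100];
    var t = DateTime.UtcNow;
    for (int i = 0; i < 100; i++)
    {
        t = t.AddSeconds(0.5 + rand.NextDouble());   // unevenly spaced stamps
        times[i] = t;
        values[i] = rand.NextDouble();
    }

    const int reps = 10_000;
    var sw = Stopwatch.StartNew();
    for (int r = 0; r < reps; r++)
        Interpolation.Resample(times, values, times[0], TimeSpan.FromSeconds(1), 100);
    sw.Stop();
    Console.WriteLine($"~{sw.Elapsed.TotalMilliseconds * 1000.0 / reps:F1} microseconds per interpolation");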

 

In summary, frameworks such as complex event processing (CEP) engines or Rx, which are designed to handle large data streams, are ideal for performing real-time calculations. Vector-based calculations require an additional fast interpolation step that can be integrated into the data flow. By placing the buffering and interpolation on the calculation engine, the PI Server is relieved of the additional load of providing a vast number of very similar interpolations.

Time series data are a natural application area for R, so the OSIsoft PI database is a good fit as a data source. Getting PI or AF data into R requires some plumbing and flattening of the data structures. The following library takes care of this:

 

ROSIsoft, OSIsoft PI and R - Version 2

 

But how do you deploy a model? How do you get inputs and outputs between a script engine and PI? And how do you do all of this in real time?

 

I have been working on this problem for a while now, and it goes beyond the plumbing: engineers have to be able to quickly develop and deploy models while working within the context of their AF and EF models. A graphical user interface needs to facilitate the process and allow the calculation to be visualized.

 

There is also a need to work with time series vectors/matrices instead of single-value calculations. Time series vectors/matrices can feed into forecasting and multivariate models and allow for much more sophisticated solutions. Calculations also need to support several inputs and outputs, as well as current and future values.

 

Most important, IMO, is that any solution should also integrate into the existing AF and EF architecture. AF currently offers Custom Data References (CDR) to extend the existing functionality. Here are a couple of great references:

 

Implementing the AF Data Pipe in a Custom Data Reference

Developing the Wikipedia Data Reference - Part 1

Developing the Wikipedia Data Reference - Part 2

 

For this project I used the CDR purely as a configuration and modeling environment. The calculation itself is performed on an R Data Server, which I will describe in my next blog post.

 

The "R Calculation Engine" CDR can be selected as any other reference:

 

The first step is to frame the analysis by creating an element template with the input and output attributes. The calculation can then be configured using the following GUI:

 

Each attribute is mapped to an R variable (input and output). The time stamps are translated to the POSIXct data type and the values to double precision. The data can be used in R as follows:

 

     Variable$Time : Array of time stamps

     Variable$Value : Array of values

 

The model can be developed interactively and is executed in an R engine. The results are translated back to the AFValues type and displayed in the plot window.

 

Once the model is developed, the model settings are stored in a JSON string. As mentioned earlier, this Custom Data Reference handles only the configuration of the calculation; real-time and historical calculations are better performed by a separate service, which I will describe in my next blog post.
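
As a purely illustrative sketch of what such a settings string might contain (the real schema is not documented in this post, so every property name below is an assumption):

    using System;
    using System.Text.Json;

    // Hypothetical settings shape -- the actual CDR schema is not shown in
    // this post, so all property names here are assumptions.
    var settings = new
    {
        Script         = "result <- mean(Temperature$Value)",
        Inputs         = new[] { new { Attribute = "Temperature", Variable = "Temperature" } },
        Outputs        = new[] { new { Attribute = "Result", Variable = "result" } },
        WindowSeconds  = 3600,
        PollingSeconds = 60
    };

    string json = JsonSerializer.Serialize(settings);   // the string stored by the CDR
    Console.WriteLine(json);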