Time series data are a natural fit for R, so the OSIsoft PI database makes a good data source. Getting PI or AF data into R requires some plumbing and flattening of the data structures. The following library takes care of this:

ROSIsoft, OSIsoft PI and R - Version 2

But how do you deploy a model? How do you move inputs and outputs between a script engine and PI, and do so in real time?

I have been working on this problem for a while now, and it goes beyond the plumbing part: engineers have to be able to develop and deploy models quickly, working within the context of their AF and EF models. A graphical user interface needs to facilitate the process and make it easy to visualize the calculation.

There is also a need to work with time series vectors/matrices instead of single-value calculations. Time series vectors/matrices can feed into forecasting and multivariate models and allow for much more sophisticated models and solutions. Calculations also need to support several inputs and outputs, covering both current and future values.
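As a minimal sketch of what this enables, the snippet below builds a time-stamped matrix of two signals and feeds one column into a simple forecast using base R's `arima()`. The attribute names and sample values are invented for illustration; they are not from the PI system described here.

```r
# Hypothetical multivariate block of PI data: a vector of time stamps
# plus a matrix of values, one column per attribute.
set.seed(1)
times  <- seq(as.POSIXct("2024-01-01 00:00:00", tz = "UTC"),
              by = "1 hour", length.out = 48)
values <- cbind(Temperature = 20 + sin(seq_along(times) * pi / 12),
                Pressure    = 101 + rnorm(length(times), sd = 0.2))

# A simple univariate forecast on one column: fit an AR(1) model
# and predict six future values.
fit  <- arima(values[, "Temperature"], order = c(1, 0, 0))
pred <- predict(fit, n.ahead = 6)
```

This is the payoff of passing whole vectors instead of single values: the calculation can look at history and emit future values in one call.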

Most important, in my opinion, is that any solution should also integrate into the existing AF and EF architecture. AF currently offers Custom Data References (CDR) to extend the existing functionality. Here are a couple of great references:

Implementing the AF Data Pipe in a Custom Data Reference

Developing the Wikipedia Data Reference - Part 1

Developing the Wikipedia Data Reference - Part 2

For this project I used the CDR just as a configuration and modeling environment. The calculation is performed on an R Data Server, which will be described in my next blog post.

The "R Calculation Engine" CDR can be selected like any other data reference:

The first step in the calculation is to frame the analysis by creating an element template with the input and output attributes. Then the calculation can be configured using the following GUI:

Each attribute is mapped to an R variable (input and output). The time stamp is translated to a POSIXct data type and the value to double precision. The data can be used in R as follows:

Variable$Time : Array of time stamps

Variable$Value : Array of values
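Assuming each mapped attribute arrives as a list with these two parallel arrays (the variable names and the scaling calculation below are placeholders, not the actual mapping produced by the CDR), a calculation might look like this:

```r
# Assumed shape of a mapped input attribute: a list holding
# parallel arrays of time stamps and values.
Input <- list(
  Time  = as.POSIXct(c("2024-01-01 00:00:00", "2024-01-01 01:00:00"),
                     tz = "UTC"),
  Value = c(42.0, 43.5)
)

# An output variable keeps the same shape; here a placeholder
# transformation scales the input values by 10 percent.
Output <- list(
  Time  = Input$Time,
  Value = Input$Value * 1.1
)
```

Keeping inputs and outputs in the same `$Time`/`$Value` shape makes the translation back to AFValues straightforward.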

The model can be developed interactively and is executed in an R engine. The results are translated back to the AFValues type and displayed in the plot window.

Once the model is developed, the model settings are stored in a JSON string. As mentioned earlier, this custom data reference handles only the configuration of the calculation. Real-time and historical calculations are better performed as a separate service, which I will describe in my next blog post.
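One way to round-trip such settings is the `jsonlite` package; the post does not say which JSON library is used, and the field names below are hypothetical:

```r
library(jsonlite)

# Hypothetical model settings as they might be stored on the attribute.
settings <- list(
  script  = "Output$Value <- Input$Value * 1.1",
  inputs  = c("Input"),
  outputs = c("Output")
)

# Serialize to a JSON string and read it back.
json     <- toJSON(settings, auto_unbox = TRUE)
restored <- fromJSON(json)
```

Storing the configuration as a single JSON string keeps the CDR simple: the calculation service only has to parse one attribute value to reconstruct the model.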

Hi Holger,

Very interesting and great work! So if I understand correctly, you actually run the R models in the R engine through your existing script engine and only use the CDR as a placeholder for configuration?

Then it puzzles me why you use a CDR in the first place. Why not use, e.g., a specific attribute category and have the R engine just look up each of these attributes to read its configuration?

Of course we have MATLAB integration on the roadmap, but I'm also a bit reluctant, as that scales very well but does not provide an easy way to integrate some custom analytics quickly. First you need to loop through procurement, add infrastructure and architecture, walk up a learning curve, etc. So I'm still interested in finding an easy way to do some custom analytics. Now we typically build a custom engine on the AF SDK, but that is always a pain in the architect's eye...