
Advanced AF Analysis; CalculateInR data reference
Posted by ernstamort in Holger Amort's Blog on Feb 9, 2017 6:45:48 PM
AF Analysis is a pretty powerful tool and covers the majority of use cases. But there are also situations where more advanced solutions are required.
The following is a common request: Basic Linear Regression: Slope, Intercept, and R-squared
And although the math is not difficult and code snippets are available on the internet, I don't think it's a good idea to create modules just for that specific need. There will always be something missing.
You start with linear regression,
then you need something to clean up outliers and missing values,
maybe you want to robustify your calculations,
a nonlinear function might give you a better fit,
one variable might not be enough to describe your data
...
Solutions for these problems are easy in MATLAB and R, since they already have plenty of libraries available. The following describes how to build an R custom data reference.
Prerequisites
You need Visual Studio and a 64-bit R installation: Microsoft R Open: The Enhanced R Distribution · MRAN
Set up a standard class library, similar to this post: Implementing the AF Data Pipe in a Custom Data Reference
In addition, you will need the following NuGet packages:
https://www.nuget.org/packages/R.NET.Community/
NuGet Gallery | Costura.Fody 1.3.3
There are alternatives for both packages, but I only tested this combination.
And of course a standard AF client installation.
It's also useful to include a logger; I really like the following: Simple Log - CodeProject
In the project setup, you need to set the bitness of the library to x64.
To automate the testing I used the following: Developing the Wikipedia Data Reference - Part 2
which I changed to the x64 deployment.
Before you start I would also execute RSetReg.exe in the R home directory.
The Code
We first need to set the config string:
    public override string ConfigString
    {
        get
        {
            // serialize the four settings as one ';'-separated string
            return $"{AttributeName};" +
                   $"{WindowSizeInSeconds};" +
                   $"{NoOfSegments};" +
                   $"{RFunctionName}";
        }
        set
        {
            if (value != null)
            {
                // parse the settings back in the same order
                string[] configSplit = value.Split(';');
                AttributeName = configSplit[0].Trim('\r', '\n');
                WindowSizeInSeconds = Convert.ToDouble(configSplit[1]);
                NoOfSegments = Convert.ToInt32(configSplit[2]);
                RFunctionName = configSplit[3].Trim('\r', '\n');
                SaveConfigChanges();
            }
        }
    }
The idea is to run an R function on the values of a source attribute, taken over a time window of a given size that is divided into a given number of segments.
In the property setter you can already resolve the source attribute:
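The config string is formatted and parsed with the current culture, and the setter assumes exactly four fields. If the string may be written on one machine and read on another with different regional settings, a stricter round trip could look like the following. This is only a sketch: `CalculateInRConfig` is a hypothetical helper class, not part of the AF SDK or of the data reference itself.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only; not part of the AF SDK.
public sealed class CalculateInRConfig
{
    public string AttributeName { get; }
    public double WindowSizeInSeconds { get; }
    public int NoOfSegments { get; }
    public string RFunctionName { get; }

    public CalculateInRConfig(string attributeName, double windowSizeInSeconds,
                              int noOfSegments, string rFunctionName)
    {
        AttributeName = attributeName;
        WindowSizeInSeconds = windowSizeInSeconds;
        NoOfSegments = noOfSegments;
        RFunctionName = rFunctionName;
    }

    // Parses "attribute;windowSeconds;segments;rFunction".
    public static CalculateInRConfig Parse(string configString)
    {
        if (configString == null) throw new ArgumentNullException(nameof(configString));

        string[] parts = configString.Split(';');
        if (parts.Length != 4)
            throw new FormatException("Expected 4 fields separated by ';' but got " + parts.Length + ".");

        // Invariant culture: "." is always the decimal separator.
        return new CalculateInRConfig(
            parts[0].Trim(),
            double.Parse(parts[1], NumberStyles.Float, CultureInfo.InvariantCulture),
            int.Parse(parts[2], NumberStyles.Integer, CultureInfo.InvariantCulture),
            parts[3].Trim());
    }

    // Writes the same format that Parse reads.
    public override string ToString()
    {
        return string.Join(";",
            AttributeName,
            WindowSizeInSeconds.ToString("R", CultureInfo.InvariantCulture),
            NoOfSegments.ToString(CultureInfo.InvariantCulture),
            RFunctionName);
    }
}
```

With this helper, `CalculateInRConfig.Parse("Temperature;600;10;regression")` yields a 600 second window with 10 segments, and a malformed string fails with a `FormatException` instead of an index error.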
    private string _AttributeName;
    public string AttributeName
    {
        private set
        {
            if (_AttributeName != value)
            {
                _AttributeName = value;
                // get the referenced attribute
                var frame = Attribute.Element as AFEventFrame;
                var element = Attribute.Element as AFElement;
                if (element != null) SourceAttribute = element.Attributes[_AttributeName];
                SaveConfigChanges();
            }
        }
        get
        {
            return _AttributeName;
        }
    }
    public AFAttribute SourceAttribute { private set; get; }
    public double WindowSizeInSeconds { private set; get; }
    public int NoOfSegments { private set; get; }
    public string RFunctionName { private set; get; }
    private REngine engine { get; set; }
Next we add the R engine in the constructor:
    // initialize REngine
    public CalculateInR()
    {
        // set up logger
        string pathAppData = Environment.GetFolderPath(Environment.SpecialFolder.CommonApplicationData);
        SimpleLog.SetLogDir(pathAppData + @"\CalculateInR", true);
        SimpleLog.SetLogFile(logDir: pathAppData + @"\CalculateInR", prefix: "CalculateInR_", writeText: false);
        SimpleLog.WriteText = true;
        try
        {
            // create R instance - R is single threaded!
            REngine.SetEnvironmentVariables();
            engine = REngine.GetInstance();
            // set working directory
            engine.Evaluate("setwd('C:/Source/ROSIsoft')");
            // source the function
            engine.Evaluate("source('Regression.R')");
            // might need to install and load libraries in R
        }
        catch (Exception ex)
        {
            SimpleLog.Error(ex.Message);
        }
    }
At a minimum, you need to set the working directory and source your R code. For more advanced calculations, you might also need to install and load libraries.
Next we need to build the helper method to send the values to R and get the results back:
    private double ExecuteRFunction(AFValues values)
    {
        // bad values are passed to R as NaN (Select/ToArray require System.Linq)
        var vector = engine.CreateNumericVector(
            values.Select(n => (n.IsGood) ? n.ValueAsDouble() : Double.NaN).ToArray());
        // make symbol unique; R is single threaded and shares the variable space
        var uniquex = "x" + Attribute.ID.ToString().Replace("-", "");
        var uniquer = "r" + Attribute.ID.ToString().Replace("-", "");
        // set symbol
        engine.SetSymbol(uniquex, vector);
        // perform calculation
        string executionString = uniquer + "<-" + RFunctionName + "(" + uniquex + "," + WindowSizeInSeconds + "," + NoOfSegments + ")";
        double result;
        try
        {
            result = engine.Evaluate(executionString).AsNumeric()[0];
        }
        catch (Exception ex)
        {
            SimpleLog.Error(ex.Message);
            result = Double.NaN;
        }
        return result;
    }

    private AFValues CreateVector(DateTime endTime)
    {
        // interpolate the source attribute over the window that ends at endTime
        var timeRange = new AFTimeRange(endTime - TimeSpan.FromSeconds(WindowSizeInSeconds), endTime);
        AFTimeSpan span = new AFTimeSpan(TimeSpan.FromSeconds(timeRange.Span.TotalSeconds / NoOfSegments));
        return SourceAttribute.Data.InterpolatedValues(timeRange, span, null, "", true);
    }
This pretty much follows the examples here: Basic types with R.NET | R.NET -- user version
Since R is single threaded and different instances share the same variable space, I recommend making the R variable names unique. I measured the execution time from .NET and it took about 1 ms per call. This of course depends on the type of calculation you perform. There is also some overhead on the PI side when requesting interpolated values.
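One detail worth watching when the R call is assembled as a string: concatenating a `double` uses the current culture, so on a machine with a comma as decimal separator a window of 600.5 seconds would be written as `600,5` and R would read it as two arguments. The sketch below builds the same kind of call string with the invariant culture. `RCallBuilder` and its parameter names are hypothetical and only serve as illustration; they are not part of R.NET or the AF SDK.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not part of R.NET or the AF SDK.
public static class RCallBuilder
{
    public static string Build(Guid attributeId, string rFunctionName,
                               double windowSizeInSeconds, int noOfSegments)
    {
        // "N" formats the GUID as 32 hex digits without hyphens.
        string id = attributeId.ToString("N");
        string input = "x" + id;   // R symbol that holds the input vector
        string output = "r" + id;  // R symbol that receives the result

        // Invariant culture keeps "." as the decimal separator.
        string window = windowSizeInSeconds.ToString("R", CultureInfo.InvariantCulture);
        string segments = noOfSegments.ToString(CultureInfo.InvariantCulture);

        return output + "<-" + rFunctionName + "(" + input + "," + window + "," + segments + ")";
    }
}
```

Deriving both symbol names from the attribute ID keeps the calls of different attributes from overwriting each other's variables in the shared R session.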
Next we need to define the GetValue and GetValues methods:
    public override AFValue GetValue(object context, object timeContext, AFAttributeList inputAttributes, AFValues inputValues)
    {
        var currentContext = context as AFDataReferenceContext?;
        var endTime = ((AFTime?)timeContext)?.LocalTime ?? DateTime.Now;
        // get the function result from R
        var values = CreateVector(endTime);
        return new AFValue(null, ExecuteRFunction(values), endTime);
    }

    public override AFValues GetValues(object context, AFTimeRange timeRange, int numberOfValues, AFAttributeList inputAttributes, AFValues[] inputValues)
    {
        AFValues values = new AFValues();
        DateTime startTime = timeRange.StartTime.LocalTime;
        DateTime endTime = timeRange.EndTime.LocalTime;
        // loop through the timeRange in evenly spaced steps from startTime to endTime
        double span = (endTime - startTime).TotalSeconds;
        double step = numberOfValues > 1 ? span / (numberOfValues - 1) : 0;
        for (var index = 0; index < numberOfValues; index++)
        {
            // each value is calculated over the window that ends at its own timestamp
            DateTime time = startTime + TimeSpan.FromSeconds(index * step);
            var tmpValues = CreateVector(time);
            values.Add(new AFValue(null, ExecuteRFunction(tmpValues), time));
        }
        return values;
    }
Then we just populate the data methods using https://techsupport.osisoft.com/Downloads/File/5cbefb97-d253-46dd-b369-f36cda374e47
and create the data pipe using Daphne Ng's code.
So now we have a custom data reference that can execute a function that takes the following inputs: x, WindowSizeInSeconds, and NoOfSegments.
In R we can develop the code for the linear regression, which is basically just calling the lm function. I believe this is included in the standard installation. Since for this example we are only interested in the slope, the R wrapper code looks as follows:
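    regression <- function(x, WindowSizeInSeconds, NoOfSegments) {
      # time axis in seconds: NoOfSegments + 1 evenly spaced points
      span <- WindowSizeInSeconds / NoOfSegments
      t <- seq(0, WindowSizeInSeconds, span)
      # regress the signal on time; the second coefficient is the slope of x per second
      lm(x ~ t)$coefficients[2]
    }
After registration, we can use the CDR in AF Analysis, which provides all the plumbing to call the CDR based on point updates.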
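For reference, the least-squares slope of a signal x sampled at evenly spaced times t is the sum of (t - mean(t)) * (x - mean(x)) divided by the sum of (t - mean(t))^2. The following sketch computes it directly in C#, which can be handy for sanity-checking values on the .NET side. `LeastSquares.Slope` is a hypothetical helper, not part of the data reference; it assumes that the samples span the whole window and it skips NaN entries.

```csharp
using System;

// Hypothetical helper for illustration; not part of the data reference.
public static class LeastSquares
{
    // Slope of x (per second) against evenly spaced times in [0, windowSizeInSeconds].
    public static double Slope(double[] x, double windowSizeInSeconds)
    {
        if (x == null) throw new ArgumentNullException(nameof(x));
        int n = x.Length;
        if (n < 2) return double.NaN;

        double step = windowSizeInSeconds / (n - 1);

        // First pass: means of t and x over the usable (non-NaN) samples.
        double sumT = 0, sumX = 0;
        int count = 0;
        for (int i = 0; i < n; i++)
        {
            if (double.IsNaN(x[i])) continue;
            sumT += i * step;
            sumX += x[i];
            count++;
        }
        if (count < 2) return double.NaN;
        double meanT = sumT / count, meanX = sumX / count;

        // Second pass: covariance of (t, x) and variance of t.
        double sxy = 0, stt = 0;
        for (int i = 0; i < n; i++)
        {
            if (double.IsNaN(x[i])) continue;
            double dt = i * step - meanT;
            sxy += dt * (x[i] - meanX);
            stt += dt * dt;
        }
        return stt == 0 ? double.NaN : sxy / stt;
    }
}
```

For a signal that rises by 2 units per second, the function returns 2 (up to rounding), with or without a few missing samples.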
Result
Here is the result of a 10 min average and a linear regression with a 10 min window on a fast-moving 1 h sinusoid: