Holger Amort's Blog, February 2017

AF Analysis is a pretty powerful tool and covers the majority of use cases. But there are also situations where more advanced solutions are required.

The following is a common request: Basic Linear Regression: Slope, Intercept, and R-squared

 

And although the math is not difficult and code snippets are available on the internet, I don't think it's a good idea to create modules just for that specific need. There will always be something missing.

     You start with linear regression,

     then you need something to clean up outliers and missing values,

     maybe you want to robustify your calculations,

     a nonlinear function might give you a better fit,

     one variable might not be enough to describe your data

     ...

Solutions for these problems are easy in MATLAB and R, since both already have plenty of libraries available. The following describes how to build an R custom data reference.

Prerequisites

 

You need Visual Studio and a 64-bit R installation: Microsoft R Open: The Enhanced R Distribution · MRAN

Set up a standard class library, similar to this post: Implementing the AF Data Pipe in a Custom Data Reference

In addition, you will need the following NuGet packages:
    https://www.nuget.org/packages/R.NET.Community/

    NuGet Gallery | Costura.Fody 1.3.3

 

There are alternatives for both packages, but I only tested this combination.

And of course a standard AF client installation.

It's also useful to include a logger. I really like the following: Simple Log - CodeProject

 

In the project setup, set the platform target of the library to x64.

 

To automate the testing I used the following post, which I changed to the x64 deployment: Developing the Wikipedia Data Reference - Part 2

 

Before you start I would also execute RSetReg.exe in the R home directory.

 

The Code

 

     We first need to set the config string:

        public override string ConfigString
        {
            get
            {
                return $"{AttributeName};" +
                       $"{WindowSizeInSeconds};" +
                       $"{NoOfSegments};" +
                       $"{RFunctionName}";
            }
            set
            {
                if (value != null)
                {
                    string[] configSplit = value.Split(';');
                    AttributeName = configSplit[0].Trim('\r', '\n');
                    WindowSizeInSeconds = Convert.ToDouble(configSplit[1]);
                    NoOfSegments = Convert.ToInt32(configSplit[2]);
                    RFunctionName = configSplit[3].Trim('\r', '\n');
                    SaveConfigChanges();
                }
            }
        }


 

The idea is to evaluate an R function on a source attribute over a sliding time window (WindowSizeInSeconds) that is sampled into a fixed number of segments (NoOfSegments).

In the property setter you can already set the attribute:

 

        private string _AttributeName;
        public string AttributeName
        {
            private set
            {
                if (_AttributeName != value)
                {
                    _AttributeName = value;
                    // get the referenced attribute
                    var frame = Attribute.Element as AFEventFrame;
                    var element = Attribute.Element as AFElement;
                    if (element != null) SourceAttribute = element.Attributes[_AttributeName];
                    SaveConfigChanges();
                }
            }
            get { return _AttributeName; }
        }
        public AFAttribute SourceAttribute { private set; get; }
        public double WindowSizeInSeconds { private set; get; }
        public int NoOfSegments { private set; get; }
        public string RFunctionName { private set; get; }
        private REngine engine { get; set; }



 

Next we add the R engine in the constructor:

 

        // initialize REngine
        public CalculateInR()
        {
            // set up logger
            string pathAppData = Environment.GetFolderPath(Environment.SpecialFolder.CommonApplicationData);
            SimpleLog.SetLogDir(pathAppData + @"\CalculateInR", true);
            SimpleLog.SetLogFile(logDir: pathAppData + @"\CalculateInR",
                prefix: "CalculateInR_", writeText: false);
            SimpleLog.WriteText = true;
            try
            {
                // create R instance - R is single threaded!
                REngine.SetEnvironmentVariables();
                engine = REngine.GetInstance();
                // set working directory
                engine.Evaluate("setwd('C:/Source/ROSIsoft')");
                // source the function
                engine.Evaluate("source('Regression.R')");
                // might need to install and load libraries in R
            }
            catch (Exception ex)
            {
                SimpleLog.Error(ex.Message);
            }
        }

At a minimum, you need to set the working directory and source your R code. For more advanced calculations you might also need to install or load libraries.
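Loading libraries can be handled on the R side, at the top of the sourced script. The following is a minimal sketch, not part of the original project: the helper name ensure_package and the CRAN mirror URL are assumptions, and installing packages from inside a running data reference is probably something you only want on a development machine.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
ensure_package <- function(pkg, install = TRUE) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Package is not installed; optionally give up instead of installing
    if (!install) return(FALSE)
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  # character.only = TRUE lets library() take the package name as a string
  suppressPackageStartupMessages(library(pkg, character.only = TRUE))
  TRUE
}

# 'stats' ships with every R installation, so nothing gets installed here
ensure_package("stats")
```

Calling the helper once per required package before the function definitions keeps the .NET side unchanged: it still only sets the working directory and sources one file.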

 

Next we need to build the helper method to send the values to R and get the results back:

 

        private double ExecuteRFunction(AFValues values)
        {
            var vector = engine.CreateNumericVector(values.Select(n =>
            (n.IsGood)?n.ValueAsDouble():Double.NaN).ToArray());
            // make the symbols unique; R is single threaded and shares one variable space
            var uniquex = "x" + Attribute.ID.ToString().Replace("-", "");
            var uniquer = "r" + Attribute.ID.ToString().Replace("-", "");
            // set symbol
            engine.SetSymbol(uniquex, vector);
            // perform calculation
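            // format the number with the invariant culture so R always sees a decimal point
            string executionString = uniquer + "<-" + RFunctionName + "(" + uniquex + "," +
                                     WindowSizeInSeconds.ToString(System.Globalization.CultureInfo.InvariantCulture) +
                                     "," + NoOfSegments + ")";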
            double result;
            try
            {
                result = engine.Evaluate(executionString).AsNumeric()[0];
            }
            catch (Exception ex)
            {
                SimpleLog.Error(ex.Message);
                result = Double.NaN;
            }
            return result;
            
        }
        private AFValues CreateVector(DateTime endTime)
        {
            var timeRange = new AFTimeRange(endTime - TimeSpan.FromSeconds(WindowSizeInSeconds), endTime);
            AFTimeSpan span = new AFTimeSpan(TimeSpan.FromSeconds(timeRange.Span.TotalSeconds / NoOfSegments));
            return SourceAttribute.Data.InterpolatedValues(timeRange, span, null, "", true);
        }



This pretty much follows the examples here: Basic types with R.NET  | R.NET -- user version

Since R is single threaded and all data reference instances share the same variable space, I recommend making the R variable names unique, here by appending the attribute ID. I measured the execution time from .NET and it took ~1 ms. This of course depends on the type of calculation you perform. There is also some overhead on the PI side when requesting interpolated values.
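ExecuteRFunction passes bad values to R as NaN, so the R function has to cope with gaps. Here is a small sketch in plain R with made-up numbers, assuming the values are regressed on time. By default lm drops incomplete rows (na.action = na.omit), and a window without a single good value makes lm stop with an error, which should then surface as an exception on the .NET side.

```r
t <- seq(0, 50, by = 10)            # six sample offsets in seconds
x <- c(1, 2, NaN, 4, 5, 6)          # third sample was bad and arrived as NaN

fit <- lm(x ~ t)
used_rows <- nobs(fit)              # rows that actually entered the fit
slope <- unname(coef(fit)["t"])     # slope in value units per second

# A window with no good values at all makes lm() stop with an error;
# tryCatch maps that case to NaN here
all_bad <- tryCatch(
  unname(coef(lm(rep(NaN, 6) ~ t))["t"]),
  error = function(e) NaN
)
```

If silently fitting through gaps is not acceptable, the R function could check the share of NaN values first and return NaN itself when too many samples are missing.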

 

Next we need to define the GetValue and GetValues methods:

        public override AFValue GetValue(object context, object timeContext, AFAttributeList inputAttributes,
            AFValues inputValues)
        {
            var currentContext = context as AFDataReferenceContext?;
            var endTime = ((AFTime?)timeContext)?.LocalTime ?? DateTime.Now;
            // get the function result from R
            var values = CreateVector(endTime);
            return new AFValue(null, ExecuteRFunction(values), endTime);
        }
        public override AFValues GetValues(object context, AFTimeRange timeRange, int numberOfValues,
            AFAttributeList inputAttributes, AFValues[] inputValues)
        {
            AFValues values = new AFValues();
            DateTime startTime = timeRange.StartTime.LocalTime;
            DateTime endTime = timeRange.EndTime.LocalTime;
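            // step through the time range in numberOfValues evenly spaced timestamps
            double step = numberOfValues > 1
                ? (endTime - startTime).TotalSeconds / (numberOfValues - 1)
                : 0;
            for (var index = 0; index < numberOfValues; index++)
            {
                // each result is stamped with the end time of its own window
                DateTime time = startTime + TimeSpan.FromSeconds(index * step);
                values.Add(new AFValue(null, ExecuteRFunction(CreateVector(time)), time));
            }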
            return values;
        }




 

Then we just populate the data methods using https://techsupport.osisoft.com/Downloads/File/5cbefb97-d253-46dd-b369-f36cda374e47

and create the data pipe using Daphne Ng's code.

 

So now we have a custom data reference that can execute a function that takes the following inputs: x, WindowSizeInSeconds and NoOfSegments

In R we can develop the code for the linear regression, which basically just calls the lm function. lm is part of the stats package, which is included in the standard installation. Since for this example we are only interested in the slope, the R wrapper code looks as follows:

 

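regression <- function(x,WindowSizeInSeconds,NoOfSegments) {
  # time offsets in seconds; NoOfSegments intervals give NoOfSegments + 1 samples
  span<-WindowSizeInSeconds/NoOfSegments
  t<-seq(0,WindowSizeInSeconds,span)
  # regress the values on time, so the slope is in value units per second
  lm(x~t)$coefficients[2]
}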
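Which variable stands on the left of the ~ decides the unit of the slope. The following check uses a synthetic ramp (made-up numbers, not PI data) with 10 segments over a 10 min window. It assumes that the slope of interest is the change in value per second.

```r
t <- seq(0, 600, by = 60)     # 11 sample offsets in seconds
x <- 20 + 0.5 * t             # synthetic signal rising 0.5 units per second

# Values regressed on time: slope in units per second
value_on_time <- unname(coef(lm(x ~ t))[2])

# Time regressed on values: slope in seconds per unit (the reciprocal here,
# because the synthetic data are perfectly linear)
time_on_value <- unname(coef(lm(t ~ x))[2])
```

For noisy data the two slopes are no longer exact reciprocals, so it is worth deciding up front which one the analysis should report.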

 

After registration, we can use the CDR in AF Analysis, which provides all the plumbing to call the CDR based on point updates.

 

Result

 

Here is the result of a 10 min average and a linear regression with a 10 min window on a fast-moving 1 h sinusoid:

 

The more I read about R, the more interested I get. It really provides solutions for a wide range of problems. The R packages are well documented and have examples to get you started, and I found that the creators and authors are very approachable. There is a learning curve, especially if you want to create custom scripts, but on the whole you get more out of it than you have to invest.

 

For PI users the most interesting packages are the time series applications. I can recommend the following:

 

https://cran.r-project.org/web/packages/forecast/forecast.pdf by Prof Rob Hyndman

https://cran.r-project.org/web/packages/robfilter/robfilter.pdf from TU Dortmund

and

https://cran.r-project.org/web/packages/xts/xts.pdf

 

One of the challenges is that R time series packages are mostly designed for homogeneous time series (identical point-to-point distance), but PI, due to its compression algorithm, stores heterogeneous time series (varying point-to-point distance). So in almost all cases the data have to be resampled and downsampled.

 

The following are common steps for down sampling

Data cleansing, outlier removal
Moving average
Resampling at lower rate
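
The steps above can be sketched in base R. This is only an illustration on synthetic, irregularly spaced data. The MAD-based outlier rule, the 10 s grid, the 1 min window and the linear interpolation are all assumptions, and the series is put on a regular grid first because the simple moving average filter expects evenly spaced points.

```r
set.seed(1)
# Synthetic irregular series: 200 samples spread over one hour
secs <- sort(runif(200, 0, 3600))
vals <- sin(2 * pi * secs / 3600) + rnorm(200, sd = 0.05)
vals[50] <- 10                                   # inject one obvious outlier

# 1. Cleansing: drop points far away from the median (threshold is an assumption)
keep <- abs(vals - median(vals)) <= 5 * mad(vals)
secs <- secs[keep]
vals <- vals[keep]

# 2. Put the series on a regular 10 s grid by linear interpolation;
#    rule = 2 holds the first/last value outside the observed range
grid <- seq(0, 3600, by = 10)
regular <- approx(secs, vals, xout = grid, rule = 2)$y

# 3. Centered moving average over 7 points (1 min); the ends become NA
smooth <- stats::filter(regular, rep(1 / 7, 7), sides = 2)

# 4. Resample at a lower rate: keep one value per minute
down <- smooth[seq(1, length(smooth), by = 6)]
```

Smoothing before thinning out the points matters: picking every sixth raw value without the moving average would let high-frequency noise leak into the downsampled series.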

This process can be very resource intensive, especially when applied in real time. One approach is to use a set of algorithms based on the exponential moving average. The following is an excellent read:

 

https://www.amazon.com/Introduction-High-Frequency-Finance-Ramazan-Gen%C3%A7ay/dp/0122796713

And here are some R code examples:

http://www.eckner.com/papers/ts_alg.pdf
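
To give an idea of how such an operator looks, here is a minimal sketch of an exponential moving average for irregularly spaced points. It is one of several possible variants (the weight of the previous result decays with the elapsed time), not the code from the references above and not the ROSIsoft implementation. The function name ema_irregular and the parameter tau (time constant in seconds) are assumptions.

```r
# EMA for an irregularly sampled series; tau is the time constant in seconds.
ema_irregular <- function(times, values, tau) {
  stopifnot(length(times) == length(values), length(values) > 0, tau > 0)
  out <- numeric(length(values))
  out[1] <- values[1]                       # start from the first observation
  for (i in seq_along(values)[-1]) {
    # the longer the gap since the last point, the less the old average counts
    w <- exp(-(times[i] - times[i - 1]) / tau)
    out[i] <- w * out[i - 1] + (1 - w) * values[i]
  }
  out
}

times  <- c(0, 2, 3, 10, 60)                # seconds, irregular spacing
values <- c(1, 1.2, 0.9, 1.1, 5)
ema_irregular(times, values, tau = 30)
```

Each update only needs the previous result and the time since the last point, which is why this family of operators is attractive for real-time use: no window of history has to be kept or re-read.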

 

The last package you need is an R data access package – I attached ROSIsoft to the post.

 

Quick primer on ROSIsoft

 

The library is built using the rClr package and a wrapper dll. The wrapper dll is necessary to do the plumbing between .NET data and basic R types. I simplified the AF data models to make them more compatible with R.

 

Installation is done manually. In RStudio select Tools\Install Packages… and, when the dialog opens, change the option “Install from:” to “Package Archive File”.

 

After the installation the library is loaded with: library(ROSIsoft)

(If you are missing a library or package, the process is always the same)

 

Before connecting to the AF and PI servers, first call: AFSetup()

 

This will also install the rClr package, which is included in the ROSIsoft package.

 

All functions are documented in the help file, although I have to spend some more time on it. To connect to the PI server use the following:

 connector<-ConnectToPIWithPrompt("<PI Server>")

and

connector<-ConnectToAFWithPrompt("<AF Server>","AFDatabase")

 

The connector object contains information about the PI and AF servers as well as their connection states. It’s also the only object that needs to be instantiated; all other methods are static.

 

To get values, use the GetPIPointValues() function. It requires a retrieval type constant as a string, which can be looked up with the GetRetrievalTypes() function.

Getting some recorded values for the sinusoid (sinusoid1H is a faster-moving sinusoid for testing) is then straightforward:

 

values <- GetPIPointValues("sinusoid1H","T+8H","T+10h","recorded",10)

 

Plotting requires the xts package to convert the string datetime into an R time object.

 

plot(xts(values$Vector,as.POSIXct((values$Time))),type="p")

 

which produces the following plot:

As I mentioned above, most R packages require homogeneous time series. Since I didn't find all the functions I needed in R, I added a couple of real-time operators and also exception\compression functions:

 

     ApplyCompression: apply different exception\compression settings to the time series

     CalculateEMA: calculate a real-time exponential moving average

     CalculateMA: calculate a real-time moving average

     CalculateMSD: calculate a real-time moving standard deviation

     CalculateZScore: calculate a real-time moving z-score; helpful for outlier detection and removal

     CalculateOutlier: outlier removal based on the z-score
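
To illustrate the idea behind the z-score based outlier detection, here is a sketch in base R. It is not the ROSIsoft implementation. The function name rolling_zscore, the window of 5 points and the threshold of 3 are assumptions, and the data are made up.

```r
# Moving z-score: compare each point with the mean and standard deviation
# of the 'window' points that precede it.
rolling_zscore <- function(x, window) {
  n <- length(x)
  z <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    if (i <= window) next                  # not enough history yet
    ref <- x[(i - window):(i - 1)]
    s <- sd(ref)
    if (!is.na(s) && s > 0) z[i] <- (x[i] - mean(ref)) / s
  }
  z
}

x <- c(10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 25.0, 10.0, 9.9)
z <- rolling_zscore(x, window = 5)
outliers <- which(abs(z) > 3)              # positions flagged as outliers
```

Note that a flagged point still enters the reference window of the following points and inflates their standard deviation. In practice the outlier would be removed or replaced before the window moves on.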

 

I will provide some data sets in upcoming posts that are a good starting point. Here is an example of using the ApplyCompression function on the same time series:

 

com<-ApplyCompression("PIExceptionAndCompression",0.02,0.06,values$Time,values$Vector)

 

and then the plot:

 

 

New version as of 08/06/2017