This post is a follow-up to Marco's R blog:
Manufacturing facilities are creating large amounts of real-time data that are often historized for visualization, analytics and reporting. The amount of data is staggering, and some companies spend considerable resources on the upkeep of the software and hardware and on data security. The driving force behind all these efforts is the common belief that data are valuable and contain the information needed for future process improvements.
The problem has been that advanced data analytics requires tools that go beyond the capabilities of spreadsheet programs such as Excel, which is still the preferred option for calculation in manufacturing. The alternatives are software packages that specialize in statistical computation and differ in price, capability and steepness of the learning curve.
One of these solutions, the R program, has become increasingly popular and is now actively supported by Microsoft. R has been open source from the beginning, and the fact that it is freely available has drawn a wide community to use it as their primary statistical tool. But there are more important factors why it will also be very successful in manufacturing data analytics:
- R works with .NET: two projects, R.NET and rClr, provide interoperability between R and .NET.
- R provides a huge number of packages (6,789 on June 18, 2015; 8,433 Windows packages as of June 6, 2016), which are function libraries with a specific focus. The 'qcc' package, for example, is an excellent library for univariate and multivariate process control.
- According to the 2015 Rexer Data Miner Survey, 76% of analytic professionals use R and 36% use it as their primary analysis tool, which makes R by far the most widely used analytical tool.
- Visual Studio now supports R, including debugging and IntelliSense. Visual Studio is a very popular Integrated Development Environment (IDE) for .NET programmers and will make it easier for developers to start programming in R.
- R's large user base helps to review and validate packages.
- The large number of users in academia leads to fast releases of cutting-edge algorithms.
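To illustrate the 'qcc' package mentioned above, here is a minimal control-chart sketch. The data are simulated stand-ins, not values from a historian:

```r
# Minimal sketch using the 'qcc' package; the data are simulated
# placeholders for historian values, not real process data.
library(qcc)

set.seed(42)
temperature <- rnorm(50, mean = 75, sd = 1.2)  # hypothetical QC readings

# Individuals chart (one observation per sample) with control limits
# estimated from the data itself
q <- qcc(temperature, type = "xbar.one", title = "Reactor temperature")
summary(q)
```

In practice the vector `temperature` would be filled from PI tag values rather than `rnorm`.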
The following are three examples of using R analysis in combination with the OSIsoft PI historian (plus the Asset Framework and Event Frames).
Example 1: Process Capabilities
Fig.1 Process capability of QC parameter
Data were loaded using rClr and the OSIsoft Asset Framework (AF) SDK
The analysis shows that the lower specification limit will lead to a rejection rate of 0.5% (CpL < 1.0)
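A capability analysis of this kind can be sketched with 'qcc' as follows; the specification limits and data here are hypothetical stand-ins for the QC parameter in Fig.1:

```r
# Sketch of a process capability analysis, assuming hypothetical
# specification limits; 'values' stands in for data read via rClr/AF SDK.
library(qcc)

set.seed(1)
values <- rnorm(200, mean = 10.2, sd = 0.4)    # placeholder QC parameter
q <- qcc(values, type = "xbar.one", plot = FALSE)

# Reports Cp, Cpk, CpL/CpU and the expected fraction outside the limits
process.capability(q, spec.limits = c(9.0, 11.5))
```

A CpL below 1.0, as in the example of Fig.1, means the process mean sits too close to the lower specification limit, so a measurable fraction of production falls below it.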
Example 2: Principal Component Analysis of Batch Temperature Profiles
Fig.2 PCA Biplot
Data and batch Event Frames were loaded using rClr and the OSIsoft Asset Framework (AF) SDK
There are only three underlying factors that account for 85% of the variance. The biplot shows the high correlation between the different variables, which is typical for batch processes.
Fig.3 Predicted Batch Profile using PCA
The blue line shows measured values and the green line the predicted remainder of the batch
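The PCA step itself can be sketched with base R's `prcomp`; the batch matrix below is simulated (rows = batches, columns = temperature samples along the batch), standing in for the Event Frame data:

```r
# PCA sketch in the spirit of Fig.2, on a simulated batch matrix.
set.seed(7)
n_batches <- 30
n_samples <- 60
profile <- sin(seq(0, pi, length.out = n_samples)) * 80   # idealized profile
batches <- t(replicate(n_batches, profile + rnorm(n_samples, sd = 2)))

pca <- prcomp(batches, center = TRUE, scale. = TRUE)
summary(pca)   # cumulative proportion of variance per component
biplot(pca)    # scores and loadings in one plot, as in Fig.2
```

The cumulative variance line of `summary(pca)` is where the "three factors explain 85%" type of statement comes from for the real data.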
The results of the R analysis can also be used in real time for process analysis. In general, the process of model development and deployment is structured as follows:
In the model development phase, models such as SPC, MSPC, MPC, PCA or PLS are developed, validated and finally stored in a data file. During the real-time application or model deployment phase, new data are sent to R and the same model is used for prediction.
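The two phases above can be sketched in R with `saveRDS`/`readRDS`; the file name and data are placeholders:

```r
# Sketch of the two phases: fit once offline, then reuse for scoring.

# --- Model development (offline) ---
fit <- prcomp(matrix(rnorm(300), ncol = 3), center = TRUE, scale. = TRUE)
saveRDS(fit, "pca_model.rds")            # store the validated model

# --- Model deployment (real time) ---
model <- readRDS("pca_model.rds")        # load the stored model
new_data <- matrix(rnorm(3), ncol = 3)   # e.g. one new observation from PI
scores <- predict(model, new_data)       # project onto the trained components
```

Keeping the stored model file as the single interface between the two phases ensures the deployment side always scores with exactly the model that was validated.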
Fig.4 Single and Trend Process Control Limits
Control Limits are fed back to the historian – The dotted vertical line represents the current time
There is an increasing gap in manufacturing between the amount of data stored and the level of analysis being performed. The R statistical software package can close that gap by providing high-level analysis of production data supplied by historians such as OSIsoft PI. It offers a rich library of statistical packages that perform univariate and multivariate analysis and allows real-time analytics.
- I was very surprised by the Rexer survey results and how popular R has become.
- Although R provides a lot of different packages, some tools for manufacturing analysis are still missing, most notably real-time alignment and optimization for batch processes. One reason is that chemical engineers often use Matlab, for which a large code base is available. Also, some key journals only provide Matlab examples (e.g. the Journal of Chemometrics).
- R.NET is single-threaded. This isn't a problem during model development, but at run time it could become a bottleneck. I used a producer-consumer queue to enforce sequential calculation.
- rClr works fine and I didn't encounter any problems. Calling the AF SDK from R still requires an additional layer in .NET to flatten some of the objects to strings or double arrays, plus some scripting in R to make the calls consistent with other libraries.
- Writing future values/forecasts into PI tags worked perfectly - now you can work with forecasts over longer periods.
- I used AF elements as containers for the model parameters, but this might not be the right way of organizing the data.
- The same article was also published here.