I am still working with PI and R and looking for opportunities to leverage the analytical capabilities of R in PI AF and EF. IMO the PI database is now primarily used to store heterogenous (aka not evenly spaced) time series such as sensor data. AF and EF provide the abstraction layer that allow data scientist easy access. So I try to frame every problem into:

a) equipment context

b) time context

This allows for context specific calculation, that are generic enough to find wide spread applications.

One of the problem I have been working on is the calculation of the peak asymmetry factor. This metric is used in biotech to verify that the column packaging is sufficient for the purification step. The calculation is pretty simple:

Af=b/a

where a is the front half width at 10% peak maximum and b is the back half width at 10% peak maximum.

If the factor is 1, the curve is Gaussian shaped: Gaussian function - Wikipedia, the free encyclopedia

Deviation of 1 can indicate problems with column packing or column degradation. The easiest way to frame the analysis is to create event frames for each peak in AF Analysis. The start and end condition can be based on a threshold that can be calculated from historical values. The calculation can then be simply performed using a custom data reference:

The data reference also adds 4 attributes to log if and when the calculation has been performed, so this value has to be only calculated once.

This works well and might also cover most use cases. But this is also a very crude metric of the peak shape and can be susceptible to noise level, sampling rate and compression settings. A more precise measurement is to fit the actual empirical function to the data points. This could be done in NET and therefore a custom data reference, but a stat package such as R is much better suited for these problems.

The most commonly used function for fitting chromatography peak is the Exponential Modified Gaussian (short EMG). I tried this function but unfortunately it didn't work well. The function is numerically unstable for very low asymmetries and I got a lot of problems during the optimization. A much better choice is the Exponential-Gaussian Hybrid (short EGH): http://acadine.physics.jmu.edu/group/technical_notes/GC_peak_fitting/X_lan_Jorgenson.pdf

The function can be written in R as follows:

egh <- function(t, H, tr, sigmag, tau) { result <- rep(0, length(t)) index <- which(2*sigmag^2 + tau*(t-tr)>0) result[index] <- H*exp(-(t[index]-tr)^2/(2*sigmag^2+tau*(t[index]-tr))) return(result) }

where

t : time vector

H : amplitude

tr : peak retention

sigmag : Gaussian standard deviation

tau : exponential time constan

The fitting requires an optimizer and I found the out-of-the-box to be sufficient. The code looks as follows:

egh_optim <- function(x, y, par, lower, upper) { dat=data.frame(x=x,y=y) # fitting routine egh_penalty <-function(data, par)(sum((egh(data$x,par[1],par[2],par[3],par[4])-data$y)^2)) fitting <- optim(par = par, egh_penalty, data = dat,lower=lower, upper=upper, control=list(maxit=1000,trace=0),method="L-BFGS-B") # calculate predicted values ypred <- egh(x,fitting$par[1],fitting$par[2],fitting$par[3],fitting$par[4]) # find peak maximum H <- fitting$par[1] tr <- fitting$par[2] sigma <- fitting$par[3] tau <- fitting$par[4] inverse <-function(x,y)(sum((y-egh(x,H,tr,sigma,tau))^2)) l10 <-optimize(inverse,c(tr-5*(sigma+tau),tr),tol = 0.0001, y = 0.1*H,maximum = FALSE) r10 <-optimize(inverse,c(tr,tr+5*(sigma+tau)),tol = 0.0001, y = 0.1*H,maximum = FALSE) asymmetry <- (tr-l10$minimum)/(r10$minimum-tr) # collect results and return result <- list(par=fitting$par, value=fitting$value, convergence=fitting$convergence, ypred=ypred, peak_max=fitting$par[1], asymmetry=asymmetry) }

This works well if the optimizer gets good starting conditions and boundaries from historical data. In testing the error was insignificant and the optimizer found the right values for a wide range of settings.

To put the system together requires a windows service that facilitates the data exchange between EF and R as described in R Statistic in Manufacturing

The results can be simply be written back to PI and be used for Statistical Process Control:

In summary it doesn't take too much to transition from a very simple metric such as peak asymmetry to a curve fitting solution in R. The additional parameter such as goodness of fit, peak area and retention time can be used for further SQC or notification. I hope this blog post is a good starting point for other developments ...