In almost every project I have worked on, there has been discussion about how to configure the exception and compression deviation settings correctly. A common approach is to estimate the compression deviation and then divide it by two to obtain the exception deviation. I have often thought there must be a more objective way to optimize the compression, so I came up with the following approach.
Data compression in PI is a two-step process:
Step 1 on the Interface: Data Reduction by Exception Deviation. The resulting data are called “Snapshots”.
Step 2 on the PI Server: Data Reduction by Compression Deviation. The resulting data are called “Archived Values”.
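A minimal Python sketch of the two steps follows, for off-line experimentation only. It assumes the commonly described behavior: the exception test reports a value once it deviates from the last reported value by more than the exception deviation (together with the value just before it), and compression uses a swinging-door test. Time limits such as ExcMin/ExcMax and CompMin/CompMax are ignored, and the last raw point is always kept so the series can be interpolated to its end. The function names are hypothetical; this is not the PI Interface or PI Server implementation.

```python
import numpy as np


def exception_filter(t, y, exc_dev):
    """Simplified exception test (hypothetical helper; ExcMin/ExcMax ignored).

    Returns the indices of the reported values ("snapshots").
    """
    n = len(y)
    if n == 0:
        return np.array([], dtype=int)
    kept = [0]
    last = y[0]                      # last reported value
    for i in range(1, n):
        if abs(y[i] - last) > exc_dev:
            if kept[-1] != i - 1:
                kept.append(i - 1)   # also report the value just before the exception
            kept.append(i)
            last = y[i]
    if kept[-1] != n - 1:
        kept.append(n - 1)           # keep the last point for off-line interpolation
    return np.array(kept)


def swinging_door(t, y, comp_dev):
    """Simplified swinging-door compression (hypothetical helper; CompMin/CompMax ignored).

    Assumes strictly increasing timestamps. Returns the indices of the archived values.
    """
    n = len(y)
    if n <= 2:
        return np.arange(n)
    kept = [0]
    a = 0                            # index of the last archived point
    upper, lower = np.inf, -np.inf   # slopes of the two "doors" pivoting at point a
    i = 1
    while i < n:
        dt = t[i] - t[a]
        upper = min(upper, (y[i] + comp_dev - y[a]) / dt)
        lower = max(lower, (y[i] - comp_dev - y[a]) / dt)
        if lower > upper:
            # No straight line from point a stays within +/- comp_dev of every
            # point up to i: archive the previous point, pivot the doors there,
            # and re-test point i against it.
            a = i - 1
            kept.append(a)
            upper, lower = np.inf, -np.inf
            continue
        i += 1
    if kept[-1] != n - 1:
        kept.append(n - 1)           # keep the last point for off-line interpolation
    return np.array(kept)


def reduce_pi_style(t, y, exc_dev, comp_dev):
    """Exception filtering followed by compression; returns archived raw indices."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    snap = exception_filter(t, y, exc_dev)
    arch = swinging_door(t[snap], y[snap], comp_dev)
    return snap[arch]
```

For example, `reduce_pi_style(t, y, comp_dev / 2, comp_dev)` applies the rule of thumb from the introduction (exception deviation = compression deviation / 2).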
When a client application requests a historical value from the archive, the PI Server calculates it by interpolating between archived values. The error of this interpolation is called the prediction error, and it can be estimated using cross-validation. The cross-validation error differs from the more commonly used fitting error (standard deviation) in that it quantifies the error for data that were not used to build the model, rather than how well the model fits existing data.
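To make the distinction concrete, here is a small Python sketch. It assumes plain linear interpolation between archived values (what `np.interp` does) and takes the fitting error to mean the RMS difference between the raw data and the interpolated archive; both function names are hypothetical. Because archived points are reproduced exactly, they contribute zero to the fitting error, so this metric tends to favor keeping more points.

```python
import numpy as np


def interpolate_archive(t_arch, y_arch, t_query):
    """Value at t_query from linear interpolation between archived values."""
    return np.interp(t_query, t_arch, y_arch)


def fitting_error(t, y, kept):
    """RMS of raw value minus interpolated archive value, over all raw timestamps."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    y_hat = interpolate_archive(t[kept], y[kept], t)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))
```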
To calculate the cross-validation estimate, I used:
the off-line exception algorithm,
the off-line compression algorithm, and
a modified version of leave-one-out cross-validation (LOOCV).
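The modification to LOOCV is not described here, so the following Python sketch uses an assumed simplification rather than the author's method: compress once, then predict each interior raw point by interpolating the archive with that point removed, and report the RMS of the prediction residuals. Removing the point only changes the prediction when it was archived; otherwise the prediction is ordinary interpolation. The name `loocv_error` is hypothetical, and `kept` is assumed to contain the first and last raw indices.

```python
import numpy as np


def loocv_error(t, y, kept):
    """RMS leave-one-out prediction error for the archive given by indices `kept`.

    Each interior raw point i is predicted from the archive with point i
    removed (if it was archived) and compared with the raw value y[i].
    Endpoints are skipped because leaving them out would require extrapolation.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    kept = np.asarray(kept, dtype=int)
    if len(y) < 3:
        raise ValueError("need at least three points")
    residuals = np.empty(len(y) - 2)
    for i in range(1, len(y) - 1):
        others = kept[kept != i]                    # leave point i out of the archive
        y_hat = np.interp(t[i], t[others], y[others])
        residuals[i - 1] = y[i] - y_hat
    return float(np.sqrt(np.mean(residuals ** 2)))
```

A textbook LOOCV would instead re-run the exception and compression steps with point i removed from the raw data, which costs one full compression per point; the single-compression shortcut trades that rigor for speed.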
As an example, I generated a noisy sinusoid (1000 data points, sampling rate = 1 pt/sec, frequency = 1E-3 Hz, amplitude = 100, noise = 2%) and then calculated the compression for deviation settings from 0 to 10. The standard deviation and cross-validation errors are shown in the plot below.
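The test data and the sweep can be sketched in Python as follows. Two details are assumptions because the text does not state them: "noise = 2%" is read as Gaussian noise with a standard deviation of 2% of the amplitude, and the deviation settings are swept in steps of 0.1. `make_test_signal` and `sweep` are hypothetical names; `compress` can be any routine that returns the indices of the archived values, and `error_fns` can hold any metrics computed from those indices.

```python
import numpy as np


def make_test_signal(n=1000, rate=1.0, freq=1e-3, amplitude=100.0,
                     noise_frac=0.02, seed=0):
    """Noisy sinusoid with the example parameters (noise model is an assumption)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n) / rate                          # timestamps in seconds
    clean = amplitude * np.sin(2 * np.pi * freq * t)
    y = clean + rng.normal(0.0, noise_frac * amplitude, size=n)
    return t, y


def sweep(t, y, compress, error_fns, deviations):
    """Evaluate every error metric for every compression deviation.

    compress(t, y, dev) -> archived indices; error_fns: name -> f(t, y, kept).
    """
    results = []
    for dev in deviations:
        kept = compress(t, y, dev)
        row = {"dev": float(dev), "n_archived": len(kept)}
        for name, fn in error_fns.items():
            row[name] = fn(t, y, kept)
        results.append(row)
    return results


DEVIATIONS = np.linspace(0.0, 10.0, 101)             # 0 to 10, assumed step of 0.1
```

With 1000 points at 1 pt/sec and a frequency of 1E-3 Hz, the signal covers exactly one period.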
The cross-validation curve shows that the prediction error initially decreases as the compression deviation increases. This is because each interpolation spans a wider window, so the interpolated line averages over the noise and follows the underlying sinusoid more closely. Beyond a certain deviation this benefit is lost: too few data points remain for the interpolation to follow the curve accurately, and the error rises again. In this example, the minimum cross-validation error occurred at a compression deviation of 3.6, which reduced the number of data points by 71%.
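Picking the optimum from a sweep and expressing the data reduction is a small calculation; here is a Python sketch with a hypothetical function name. The reduction is taken as 1 - (archived points / raw points), so a 71% reduction of 1000 raw points corresponds to roughly 290 archived values.

```python
import numpy as np


def best_setting(deviations, cv_errors, n_archived, n_raw):
    """Return (deviation, cv_error, reduction) at the minimum cross-validation error."""
    k = int(np.argmin(cv_errors))                    # index of the smallest error
    reduction = 1.0 - n_archived[k] / n_raw          # fraction of raw points removed
    return float(deviations[k]), float(cv_errors[k]), float(reduction)
```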
It is interesting to note that data compression can reduce the prediction error. Data reduction and data quality therefore do not always have to be diametrically opposed goals. Cross-validation provides an objective metric for optimizing the compression settings and should be used to assess the prediction error of the system. Although it is computationally expensive (about 0.45 s per cross-validation run), it is still fast enough to be integrated into an automated solution.