Principal Component Analysis (PCA) is a well-established statistical technique for exploratory data analysis. It is used in multivariate analysis: it looks for correlations between the variables and finds the linear combinations of them that best capture and explain the variance in the system. These combinations, called the principal components, span a more compact feature space.
Summary of Steps for Principal Component Analysis:
1. Generate the observation matrix
2. Subtract the mean from all the observations
3. Calculate the covariance matrix
4. Calculate the eigenvectors and eigenvalues of the covariance matrix
5. Sort the eigenvalues and their corresponding eigenvectors in descending order of eigenvalue
6. Generate the feature vector from the eigenvectors with the largest eigenvalues
7. Obtain the principal component matrix by multiplying the feature vector with the mean-subtracted observation matrix
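The seven steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used in the symbol: the function name `pca`, the `n_components` parameter, and the convention that rows are observations and columns are variables are all assumptions made for the example.

```python
import numpy as np

def pca(observations, n_components=2):
    """Hypothetical helper illustrating the seven steps listed above."""
    # Step 1: observation matrix (assumed layout: one row per observation).
    X = np.asarray(observations, dtype=float)
    # Step 2: subtract the per-variable mean from every observation.
    centered = X - X.mean(axis=0)
    # Step 3: covariance matrix of the variables (columns).
    cov = np.cov(centered, rowvar=False)
    # Step 4: eigen-decomposition; eigh is suited to symmetric matrices.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Step 5: eigh returns ascending order, so reverse to descending.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Step 6: feature vector = eigenvectors of the largest eigenvalues.
    feature_vector = eigenvectors[:, :n_components]
    # Step 7: project the mean-subtracted data onto the feature vector.
    # With rows as observations this is centered @ feature_vector, which is
    # the transpose of feature_vector.T @ centered.T.
    scores = centered @ feature_vector
    return eigenvalues, feature_vector, scores
```

With this layout, each column of `scores` is one principal component, and its sample variance equals the corresponding eigenvalue.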
In this Custom Symbol I render two plots:
1. Scree plot - a bar chart of the eigenvalues. It shows that the first 2 principal components explain most of the variance in the data; the remaining eigenvalues are essentially zero, so those components can be ignored.
2. PCA Biplot (Principal Component 1 vs Principal Component 2) - this plot shows how our 11-dimensional data set can be mapped onto a 2D space via its first 2 principal components. It can also be used for anomaly detection in our dataset.
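The heights of the scree plot bars can also be read as fractions of the total variance, which makes the "first 2 components are enough" claim easy to check numerically. The sketch below assumes the eigenvalues are already available as an array; the function name `explained_variance` and the sample values are hypothetical and are not taken from the Oil Rigs dataset.

```python
import numpy as np

def explained_variance(eigenvalues):
    """Return per-component and cumulative fractions of total variance."""
    # Sort descending and clip tiny negative values caused by round-off.
    vals = np.clip(np.sort(np.asarray(eigenvalues, dtype=float))[::-1], 0.0, None)
    ratio = vals / vals.sum()
    return ratio, np.cumsum(ratio)

# Hypothetical eigenvalues, for illustration only.
ratio, cumulative = explained_variance([6.0, 3.0, 0.5, 0.5])
# ratio      is approximately [0.6, 0.3, 0.05, 0.05]
# cumulative is approximately [0.6, 0.9, 0.95, 1.0]
```

In this made-up case the first two components account for about 90% of the variance, which is the kind of pattern the scree plot is meant to reveal.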
I have extensively verified my math against other commercial statistical software.
Although this example has been hard-coded for the Oil Rigs dataset, it is possible to make this symbol more universal with a little more development effort. As long as there is an operational state to map the data against, this analysis is pretty universal.