Article

Missing data methods in PCA and PLS: Score calculations with incomplete observations

Chemometrics and Intelligent Laboratory Systems, 35 (1): 45 - 65 (1996)
DOI: 10.1016/S0169-7439(96)00007-X

Abstract

A very important problem in industrial applications of PCA and PLS models, such as process modelling or monitoring, is the estimation of scores when the observation vector has missing measurements. The alternative of suspending the application until all measurements are available is usually unacceptable. The problem treated in this work is that of estimating scores from an existing PCA or PLS model when new observation vectors are incomplete. Building the model with incomplete observations is not treated here, although the analysis given in this paper provides considerable insight into this problem. Several methods for estimating scores from data with missing measurements are presented and analysed: a method, termed single component projection, derived from the NIPALS algorithm for model building with missing data; a method of projection to the model plane; and data replacement by the conditional mean. Expressions are developed for the error in the scores calculated by each method. The error analysis is illustrated using simulated data sets designed to highlight problem situations. A larger industrial data set is also used to compare the approaches. In general, all the methods perform reasonably well with moderate amounts of missing data (up to 20% of the measurements). However, in extreme cases where critical combinations of measurements are missing, the conditional mean replacement method is generally superior to the other approaches.
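The three score-estimation approaches named in the abstract can be illustrated with a minimal NumPy sketch for the PCA case. This is not the authors' code: the function names, the boolean `observed` mask, and the use of a training-set covariance matrix `S` for the conditional mean are assumptions made here for illustration, and the formulas are the commonly used textbook forms rather than expressions quoted from the paper. `P` is assumed to hold orthonormal loadings (variables by components) and `z` a mean-centred observation.

```python
import numpy as np


def scores_single_component(z, P, observed):
    """Single component projection (assumed form): regress the observed part
    of z on each loading vector in turn, deflating after each component."""
    z_obs = z[observed].astype(float)  # astype returns a copy
    P_obs = P[observed, :]
    t = np.zeros(P.shape[1])
    for a in range(P.shape[1]):
        p = P_obs[:, a]
        t[a] = z_obs @ p / (p @ p)
        z_obs -= t[a] * p  # remove the part explained by component a
    return t


def scores_projection(z, P, observed):
    """Projection to the model plane (assumed form): least-squares fit of the
    observed measurements on the observed rows of all loadings at once."""
    t, *_ = np.linalg.lstsq(P[observed, :], z[observed], rcond=None)
    return t


def scores_conditional_mean(z, P, S, observed):
    """Conditional mean replacement (assumed form): fill the missing entries
    with their expected value given the observed ones, using covariance S,
    then compute the scores as for a complete observation."""
    missing = ~observed
    z_filled = np.array(z, dtype=float)
    if missing.any():
        S_oo = S[np.ix_(observed, observed)]
        S_mo = S[np.ix_(missing, observed)]
        z_filled[missing] = S_mo @ np.linalg.solve(S_oo, z[observed])
    return P.T @ z_filled
```

With a complete observation and orthonormal loadings, all three functions reduce to the ordinary score calculation `P.T @ z`; they differ only when entries are missing.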
