Main Content

processpca

Process columns of matrix with principal component analysis

Syntax

[Y,PS] = processpca(X,maxfrac)
[Y,PS] = processpca(X,FP)
Y = processpca('apply',X,PS)
X = processpca('reverse',Y,PS)
name = processpca('name')
fp = processpca('pdefaults')
names = processpca('pdesc')
processpca('pcheck',fp);

Description

processpca processes matrices using principal component analysis so that each row is uncorrelated, the rows are in the order of the amount they contribute to total variation, and rows whose contribution to total variation are less than maxfrac are removed.

[Y,PS] = processpca(X,maxfrac) takes X and an optional parameter,

X

N-by-Q matrix

maxfrac

Maximum fraction of variance for removed rows (default is 0)

and returns

Y

M-by-Q matrix with N - M rows deleted

PS

Process settings that allow consistent processing of values

[Y,PS] = processpca(X,FP) takes parameters as a struct: FP.maxfrac.

Y = processpca('apply',X,PS) returns Y, given X and settings PS.

X = processpca('reverse',Y,PS) returns X, given Y and settings PS.

name = processpca('name') returns the name of this process method.

fp = processpca('pdefaults') returns default process parameter structure.

names = processpca('pdesc') returns the process parameter descriptions.

processpca('pcheck',fp); throws an error if any parameter is illegal.

Examples

Here is how to format a matrix with an independent row, a correlated row, and a completely redundant row so that its rows are uncorrelated and the redundant row is dropped.

x1_independent = rand(1,5)
x1_correlated = rand(1,5) + x1_independent;
x1_redundant = x1_independent + x1_correlated
x1 = [x1_independent; x1_correlated; x1_redundant]
[y1,ps] = processpca(x1)

Next, apply the same processing settings to new values.

x2_independent = rand(1,5)
x2_correlated = rand(1,5) + x1_independent;
x2_redundant = x1_independent + x1_correlated
x2 = [x2_independent; x2_correlated; x2_redundant];
y2 = processpca('apply',x2,ps)

Reverse the processing of y1 to get x1 again.

x1_again = processpca('reverse',y1,ps)

More About

collapse all

Reduce Input Dimensionality Using processpca

In some situations, the dimension of the input vector is large, but the components of the vectors are highly correlated (redundant). It is useful in this situation to reduce the dimension of the input vectors. An effective procedure for performing this operation is principal component analysis. This technique has three effects: it orthogonalizes the components of the input vectors (so that they are uncorrelated with each other), it orders the resulting orthogonal components (principal components) so that those with the largest variation come first, and it eliminates those components that contribute the least to the variation in the data set. The following code illustrates the use of processpca, which performs a principal-component analysis using the processing setting maxfrac of 0.02.

[pn,ps1] = mapstd(p);
[ptrans,ps2] = processpca(pn,0.02);

The input vectors are first normalized, using mapstd, so that they have zero mean and unity variance. This is a standard procedure when using principal components. In this example, the second argument passed to processpca is 0.02. This means that processpca eliminates those principal components that contribute less than 2% to the total variation in the data set. The matrix ptrans contains the transformed input vectors. The settings structure ps2 contains the principal component transformation matrix. After the network has been trained, these settings should be used to transform any future inputs that are applied to the network. It effectively becomes a part of the network, just like the network weights and biases. If you multiply the normalized input vectors pn by the transformation matrix transMat, you obtain the transformed input vectors ptrans.

If processpca is used to preprocess the training set data, then whenever the trained network is used with new inputs, you should preprocess them with the transformation matrix that was computed for the training set, using ps2. The following code applies a new set of inputs to a network already trained.

pnewn = mapstd('apply',pnew,ps1);
pnewtrans = processpca('apply',pnewn,ps2);
a = sim(net,pnewtrans);

Principal component analysis is not reliably reversible. Therefore it is only recommended for input processing. Outputs require reversible processing functions.

Principal component analysis is not part of the default processing for feedforwardnet. You can add this with the following command:

net.inputs{1}.processFcns{end+1} = 'processpca';

Algorithms

Values in rows whose elements are not all the same value are set to

y = 2*(x-minx)/(maxx-minx) - 1;

Values in rows with all the same value are set to 0.

Version History

Introduced in R2006a