
# crossvalind

Generate cross-validation indices

## Syntax

Indices = crossvalind('Kfold', N, K)
[Train, Test] = crossvalind('HoldOut', N, P)
[Train, Test] = crossvalind('LeaveMOut', N, M)
[Train, Test] = crossvalind('Resubstitution', N, [P,Q])
[...] = crossvalind(Method, Group, ...)
[...] = crossvalind(Method, Group, ..., 'Classes', C)
[...] = crossvalind(Method, Group, ..., 'Min', MinValue)

## Description

Indices = crossvalind('Kfold', N, K) returns randomly generated indices for a K-fold cross-validation of N observations. Indices contains equal (or approximately equal) proportions of the integers 1 through K, which define a partition of the N observations into K disjoint subsets. Repeated calls return different randomly generated partitions. K defaults to 5 when omitted. In K-fold cross-validation, K-1 folds are used for training and the remaining fold is used for evaluation. This process is repeated K times, with a different fold held out for evaluation each time, so every observation is evaluated exactly once.
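The following sketch shows what a K-fold index vector looks like and how it is consumed. It builds the fold labels with base MATLAB (`randperm` and `mod`) instead of calling crossvalind, so it illustrates the idea and is not the toolbox's internal algorithm. The values N = 23 and K = 5 are arbitrary.

```matlab
% Illustrative sketch only: K-fold labels built with randperm, not crossvalind's own code.
N = 23;                                   % number of observations (arbitrary)
K = 5;                                    % number of folds
indices = zeros(N,1);
indices(randperm(N)) = mod(0:N-1, K) + 1; % spread labels 1..K randomly over the observations
counts = accumarray(indices, 1)';         % fold sizes; here [5 5 5 4 4]
testSizes = zeros(1,K);
for k = 1:K
    test = (indices == k);                % fold k is the evaluation set
    train = ~test;                        % the other K-1 folds form the training set
    testSizes(k) = sum(test);
    % ... fit a model on train, evaluate it on test ...
end
```

Because the folds are disjoint and together cover all N observations, `sum(testSizes)` equals N. With the toolbox, an index vector of the same kind comes from `crossvalind('Kfold', N, K)`.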

[Train, Test] = crossvalind('HoldOut', N, P) returns logical index vectors for cross-validation of N observations by randomly selecting approximately P*N observations to hold out for the evaluation set. P must be a scalar between 0 and 1. P defaults to 0.5 when omitted, which holds out 50% of the observations. Calling 'HoldOut' repeatedly inside a loop resembles running K-fold cross-validation once outside the loop, except that the evaluation sets drawn in different iterations are not guaranteed to be disjoint.
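A minimal sketch of a single holdout split, written with base MATLAB rather than crossvalind. It assumes N = 20 and P = 0.3 and rounds P*N to the nearest integer; crossvalind only promises approximately P*N held-out observations, so its exact count may differ.

```matlab
% Illustrative sketch (base MATLAB), not the crossvalind implementation.
N = 20;                             % number of observations (arbitrary)
P = 0.3;                            % fraction held out for evaluation
nTest = round(P*N);                 % 6 observations held out
test = false(N,1);
test(randperm(N, nTest)) = true;    % random evaluation set
train = ~test;                      % everything else is used for training
```

Running these lines twice draws two independent evaluation sets that can share observations, which is why repeated holdout is not equivalent to a single K-fold partition.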

[Train, Test] = crossvalind('LeaveMOut', N, M), where M is an integer, returns logical index vectors for cross-validation of N observations by randomly selecting M of the observations to hold out for the evaluation set. M defaults to 1 when omitted. Using 'LeaveMOut' cross-validation within a loop does not guarantee disjoint evaluation sets. To guarantee disjoint evaluation sets, use 'Kfold' instead.
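Exact leave-one-out cross-validation, in which every observation is held out exactly once, is K-fold cross-validation with K = N. The sketch below builds such fold labels with base MATLAB and counts how often each observation lands in the evaluation set; with the toolbox, `crossvalind('Kfold', N, N)` should give an equivalent partition. N = 8 is arbitrary.

```matlab
% Illustrative sketch: exact leave-one-out as K-fold with K = N.
N = 8;                                  % number of observations (arbitrary)
indices = zeros(N,1);
indices(randperm(N)) = 1:N;             % every fold holds exactly one observation
timesTested = zeros(N,1);
for i = 1:N
    test = (indices == i);              % the single held-out observation
    train = ~test;                      % the other N-1 observations
    timesTested = timesTested + test;   % count how often each observation is evaluated
end
```

Unlike N random 'LeaveMOut' draws with M = 1, this loop evaluates every observation once and never repeats one.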

[Train, Test] = crossvalind('Resubstitution', N, [P,Q]) returns logical index vectors for cross-validation of N observations by randomly selecting P*N observations for the evaluation set and Q*N observations for the training set. The sets are chosen to minimize the number of observations used in both. P and Q are scalars between 0 and 1. Q=1-P corresponds to holding out (100*P)%, while P=Q=1 corresponds to full resubstitution. [P,Q] defaults to [1,1] when omitted.
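As a worked example of the overlap arithmetic, take N = 100, P = 0.3, and Q = 0.8: the evaluation set has 30 observations and the training set has 80, and because 30 + 80 exceeds 100, at least 10 observations must appear in both sets. The sketch below reaches that minimum by taking the evaluation set from the front of a random permutation and the training set from the back. This is one way to minimize the overlap, assumed for illustration; it is not necessarily how crossvalind selects the sets.

```matlab
% Illustrative sketch: minimal-overlap selection from one random permutation.
N = 100; P = 0.3; Q = 0.8;
nTest  = round(P*N);                          % 30 evaluation observations
nTrain = round(Q*N);                          % 80 training observations
perm = randperm(N);
test  = false(N,1); test(perm(1:nTest)) = true;              % front of the shuffle
train = false(N,1); train(perm(end-nTrain+1:end)) = true;    % back of the shuffle
overlap = sum(test & train)                   % max(0, nTest + nTrain - N) = 10
```

With P = Q = 1 the same construction puts all N observations in both sets, which is full resubstitution.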

[...] = crossvalind(Method, Group, ...) takes the group structure of the data into account. Group is a grouping vector that defines the class of each observation. Group can be a numeric vector, a string array, or a cell array of strings. How the groups are partitioned depends on the type of cross-validation. For K-fold, each group is divided into K subsets of approximately equal size. For all other methods, approximately equal numbers of observations from each group are selected for the evaluation set. In both cases, the training set contains at least one observation from each group.
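The grouped K-fold behavior can be sketched by shuffling and labeling each group separately, so that every group is spread evenly across the folds. The group sizes (10, 6, and 9) and K = 3 below are hypothetical, and the code is a base MATLAB illustration, not crossvalind's implementation.

```matlab
% Illustrative sketch: stratified K-fold labels, assigned one group at a time.
group = [ones(10,1); 2*ones(6,1); 3*ones(9,1)];   % hypothetical numeric grouping vector
K = 3;
indices = zeros(size(group));
g = unique(group);
for j = 1:numel(g)
    members = find(group == g(j));                 % observations in this group
    n = numel(members);
    indices(members(randperm(n))) = mod(0:n-1, K) + 1;  % split this group into K near-equal parts
end
foldSizes = accumarray(indices, 1)'                % [9 8 8]: 4+2+3, 3+2+3, 3+2+3
```

Within each group, the fold sizes differ by at most one, so each fold contains roughly the same class proportions as the whole data set.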

[...] = crossvalind(Method, Group, ..., 'Classes', C) restricts the observations to those whose group value is listed in C. C can be a numeric vector, a string array, or a cell array of strings, and it must be of the same form as Group. If one output argument is specified, it contains the value 0 for observations belonging to excluded classes. If two output arguments are specified, both contain the logical value false for observations belonging to excluded classes.
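The effect of 'Classes' on a single output can be sketched with `ismember`: observations whose label is not in C keep the value 0, and only the remaining observations receive fold labels. The labels and the two-fold split below are hypothetical, and the code imitates the documented output rather than calling crossvalind.

```matlab
% Illustrative sketch: observations from excluded classes keep index 0.
groups = {'Cancer';'Benign';'Control';'Cancer';'Benign';'Control'};  % hypothetical labels
C = {'Control','Cancer'};               % classes to keep
keep = ismember(groups, C);             % true for observations whose label is in C
indices = zeros(numel(groups), 1);      % excluded observations stay 0
kept = find(keep);
indices(kept(randperm(numel(kept)))) = mod(0:numel(kept)-1, 2) + 1;  % 2 folds over kept rows
```

In the two-output form, the analogous behavior is that both Train and Test are false at the positions where `keep` is false.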

[...] = crossvalind(Method, Group, ..., 'Min', MinValue) sets the minimum number of observations from each group that appear in the training set. Min defaults to 1. Setting a large value for Min can help balance the training groups, but it introduces partial resubstitution (observations used for both training and evaluation) when a group has too few observations. You cannot set Min when using K-fold cross-validation.
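How much partial resubstitution Min can force follows from a counting argument: if a group has n observations, t of them are in the evaluation set, and the training set must contain at least MinValue of them, then at least max(0, t + MinValue - n) observations of that group are used in both sets. The group sizes and evaluation counts below are hypothetical, and the code computes only this lower bound, not what crossvalind actually selects.

```matlab
% Lower bound on observations shared between training and evaluation sets, per group.
n = [12 5 4];                          % hypothetical group sizes
t = [6 3 2];                           % hypothetical evaluation-set sizes per group
MinValue = 4;                          % required training observations per group
minShared = max(0, t + MinValue - n)   % [0 2 2]
```

The first group has 6 observations left over for training, so it meets MinValue without sharing; the two smaller groups must reuse 2 evaluation observations each.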

## Examples

Note: The crossvalind function creates random partitions that depend on the state of the default random stream, so your results from the following examples will vary from those shown.

Use 10-fold cross-validation to estimate the classification error of a discriminant analysis classifier on Fisher's iris data.

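load fisheriris                              % provides meas (150x4) and species (150x1 cell)
indices = crossvalind('Kfold',species,10);   % stratify the folds by species
cp = classperf(species);
for i = 1:10
    test = (indices == i); train = ~test;
    predicted = classify(meas(test,:),meas(train,:),species(train,:));
    classperf(cp,predicted,test)             % accumulate results for this fold
end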
cp.ErrorRate

ans =

0.0200

Approximate the leave-one-out prediction error of a quadratic fit by averaging the squared error over 100 random leave-one-out splits.

load carbig                          % provides Displacement and Acceleration
x = Displacement; y = Acceleration;
N = length(x);
sse = 0;
for i = 1:100
    [train,test] = crossvalind('LeaveMOut',N,1);
    yhat = polyval(polyfit(x(train),y(train),2),x(test));   % quadratic fit on the training data
    sse = sse + sum((yhat - y(test)).^2);
end
CVerr = sse / 100

CVerr =

4.9750

Divide the data 60/40 (60% held out for evaluation, 40% for training) without using the 'Benign' observations. Here groups holds the true label of each observation; the labels are generated at random for illustration.

labels = {'Cancer','Benign','Control'};
groups = labels(ceil(rand(100,1)*3));
[train,test] = crossvalind('holdout',groups,0.6,'classes',...
{'Control','Cancer'});
sum(test) % Number of observations allocated for testing

ans =

35

sum(train) % Number of observations allocated for training

ans =

26