fitrm is a new function for fitting models to repeated measures data, where each subject has multiple response measurements. It produces an object of the new RepeatedMeasuresModel class. You can:
Perform analysis of variance for between-subjects factors using anova.
Perform multivariate analysis of variance using manova.
Perform hypothesis tests on the coefficients using coeftest.
Perform repeated measures analysis of variance using ranova.
Test for sphericity (compound symmetry) with Mauchly's test using mauchly.
Plot data and estimated marginal means with optional grouping using plot and plotprofile.
Compute summary statistics organized by group using grpstats.
Perform multiple comparisons of marginal means using multcompare.
Make predictions on new data with the fitted repeated measures model using predict.
Generate random data with the fitted repeated measures model at new design points using random.
For the properties and methods of this object, see the RepeatedMeasuresModel class page.
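As a minimal sketch of the workflow listed above, the following fits a repeated measures model to the fisheriris sample data, treating the four measurements as repeated responses and species as the between-subjects factor. The variable names meas1 through meas4 and Measurements are chosen here for illustration only.

```matlab
% Sketch: repeated measures model on the fisheriris sample data.
load fisheriris
t = table(species, meas(:,1), meas(:,2), meas(:,3), meas(:,4), ...
    'VariableNames', {'species','meas1','meas2','meas3','meas4'});
% Within-subjects design: one row per repeated measurement.
Meas = table([1 2 3 4]', 'VariableNames', {'Measurements'});
rm = fitrm(t, 'meas1-meas4 ~ species', 'WithinDesign', Meas);
ranovatbl = ranova(rm);    % repeated measures analysis of variance
sphericity = mauchly(rm);  % Mauchly's test for sphericity
```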
You can now use the new fitcsvm function to train an SVM classifier for one- or two-class learning. fitcsvm creates an object of the new class ClassificationSVM or existing class ClassificationPartitionedModel.
ClassificationSVM is a new class for accessing and performing operations on the training data. CompactClassificationSVM is a new class for storing configurations of trained models without storing training data. The syntax and methods resemble those in the existing ClassificationTree and CompactClassificationTree classes.
The new fitcsvm function and ClassificationSVM and CompactClassificationSVM classes include the functionality of the svmtrain and svmclassify functions. ClassificationSVM provides several benefits compared to the svmtrain and svmclassify functions:
The new functionality
Supports computation of soft classification scores
Supports fitting posterior probabilities
Has improved training speed, especially on big data with well-separated classes, by providing shrinkage
Allows a warm restart by accepting an initial α value
Allows training to resume after the maximum number of iterations is exceeded
Supports robust learning in the presence of outliers
ClassificationSVM is built on the same framework as ClassificationTree, ClassificationDiscriminant, and ClassificationKNN, so you have a variety of options and methods, including:
Cross validation
Resubstitution statistics
Generalization statistics
Weighted classification
For all methods and properties of the new objects, see the ClassificationSVM and CompactClassificationSVM class pages.
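The following sketch shows one way the new classes might fit together, using the fisheriris sample data reduced to two classes. The choice of predictors and the variable names are assumptions made for illustration.

```matlab
load fisheriris
keep = ~strcmp(species, 'setosa');   % reduce to a two-class problem
X = meas(keep, 3:4);
y = species(keep);
SVMModel = fitcsvm(X, y);            % ClassificationSVM, stores the training data
CompactModel = compact(SVMModel);    % CompactClassificationSVM, no training data
[label, score] = predict(CompactModel, X(1:5,:));  % labels and soft scores
CVModel = crossval(SVMModel);        % ClassificationPartitionedModel
cvLoss = kfoldLoss(CVModel);         % cross-validated misclassification rate
```

The compact model is enough for prediction, while cross validation needs the full model because it refits on the stored training data.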
There are two new methods for the objects created using the evalclusters function:
addK adds more numbers of clusters to be evaluated. This method applies to all classes of cluster evaluation (i.e., clustering.evaluation.GapEvaluation, clustering.evaluation.SilhouetteEvaluation, clustering.evaluation.CalinskiHarabaszEvaluation, and clustering.evaluation.DaviesBouldinEvaluation).
increaseB increases the number of reference data sets for gap criterion simulations. This method applies to the clustering.evaluation.GapEvaluation class.
The default value of the 'SearchMethod' name-value pair argument for clustering.evaluation.GapEvaluation objects is now always 'globalMaxSE' and does not change depending on the value of the 'KList' name-value pair argument.
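A short sketch of the two new methods, assuming k-means clustering of the fisheriris sample data. The cluster counts and the number of reference data sets are arbitrary illustrative values.

```matlab
load fisheriris
rng(1);  % kmeans and the gap reference data sets are randomized
eva = evalclusters(meas, 'kmeans', 'CalinskiHarabasz', 'KList', 2:3);
eva = addK(eva, 4:6);             % also evaluate 4, 5, and 6 clusters
gapEva = evalclusters(meas, 'kmeans', 'gap', 'KList', 1:4, 'B', 20);
gapEva = increaseB(gapEva, 10);   % add 10 more reference data sets
```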
multcompare now returns the p-value of each pairwise comparison of group means. multcompare returns the p-value in the sixth column of its first output argument. The p-value is the overall significance level at which the individual comparison is borderline significant.
The first output argument of multcompare now has six columns instead of five. The sixth column contains the p-value.
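For example, the p-values can be read from the sixth column as in this sketch, which uses the carsmall sample data. The variable names c and pValues are illustrative.

```matlab
load carsmall
[~, ~, stats] = anova1(MPG, Origin, 'off');   % one-way ANOVA without figures
c = multcompare(stats, 'Display', 'off');     % pairwise comparisons, no plot
pValues = c(:, 6);                            % p-value for each pair of groups
```

Code that assumed five columns, for example by indexing with end, may need a review.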
The following functions and methods now accept table inputs as an alternative to dataset array inputs.
Functions and Methods  Class 

fitlm, fitglm, fitlme, fitnlm, stepwiseglm, stepwiselm, grpstats, datasample  N/A 
predict, random, feval  LinearModel 
devianceTest, random, predict, feval  GeneralizedLinearModel 
random, predict, feval  NonLinearModel 
random, predict  LinearMixedModel 
The following functions, methods, and model properties now return a table rather than a dataset array.
Functions and Methods  Class 

xptread, grpstats*  N/A 
anova  LinearModel 
devianceTest  GeneralizedLinearModel 
fixedEffects, randomEffects  LinearMixedModel 
Property  Class 

VariableInfo, ObservationInfo, Variables, Diagnostics, Residuals, Coefficients  LinearModel 
VariableInfo, ObservationInfo, Variables, Diagnostics, Residuals, Fitted, Coefficients  GeneralizedLinearModel 
VariableInfo, ObservationInfo, Variables, Diagnostics, Residuals, Coefficients  NonLinearModel 
VariableInfo, ObservationInfo, Variables, Coefficients, ModelCriterion  LinearMixedModel 
*grpstats now returns output of the same type as its input.
The functions and properties listed now return a table instead of a dataset array. You can convert them to dataset arrays using the table2dataset function.
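A minimal sketch of the conversion, assuming a simple linear fit to the carsmall sample data:

```matlab
load carsmall
tbl = table(Weight, MPG);
mdl = fitlm(tbl, 'MPG ~ Weight');
coefTable = mdl.Coefficients;            % returned as a table
coefDataset = table2dataset(coefTable);  % for code that still expects a dataset array
```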
The default value of the 'EmptyAction' name-value pair argument of the kmeans function is now 'singleton'.
To set the value of 'EmptyAction' to 'error', you must explicitly specify 'EmptyAction','error'.
The following are new functions for classification and regression trees, discriminant analysis, nearest neighbors, Naive Bayes classification, and Gaussian mixture models.
New Function  Replacing 

fitcdiscr  ClassificationDiscriminant.fit 
fitcknn  ClassificationKNN.fit 
fitctree  ClassificationTree.fit 
fitrtree  RegressionTree.fit 
fitNaiveBayes  NaiveBayes.fit 
fitgmdist  gmdistribution.fit 
templateDiscriminant  ClassificationDiscriminant.template 
templateKNN  ClassificationKNN.template 
templateTree  ClassificationTree.template or RegressionTree.template 
makecdiscr  ClassificationDiscriminant.make 
Functionality  What Happens When You Use This Functionality?  Use This Instead  Compatibility Considerations 

ClassificationDiscriminant.fit  Still runs  fitcdiscr  Replace instances of ClassificationDiscriminant.fit with fitcdiscr. 
ClassificationKNN.fit  Still runs  fitcknn  Replace instances of ClassificationKNN.fit with fitcknn. 
ClassificationTree.fit  Still runs  fitctree  Replace instances of ClassificationTree.fit with fitctree. 
RegressionTree.fit  Still runs  fitrtree  Replace instances of RegressionTree.fit with fitrtree. 
NaiveBayes.fit  Still runs  fitNaiveBayes  Replace instances of NaiveBayes.fit with fitNaiveBayes. 
gmdistribution.fit  Still runs  fitgmdist  Replace instances of gmdistribution.fit with fitgmdist. 
ClassificationDiscriminant.template  Still runs  templateDiscriminant  Replace instances of ClassificationDiscriminant.template with templateDiscriminant. 
ClassificationKNN.template  Still runs  templateKNN  Replace instances of ClassificationKNN.template with templateKNN. 
ClassificationTree.template or RegressionTree.template  Still runs  templateTree  Replace instances of ClassificationTree.template or RegressionTree.template with templateTree. 
ClassificationDiscriminant.make  Still runs  makecdiscr  Replace instances of ClassificationDiscriminant.make with makecdiscr. 
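The replacements are intended as drop-in changes to the call site. The sketch below shows a few of them on the fisheriris sample data, with the older call kept as a comment for comparison.

```matlab
load fisheriris
% Before: tree = ClassificationTree.fit(meas, species);
tree = fitctree(meas, species);
% Before: knn = ClassificationKNN.fit(meas, species, 'NumNeighbors', 5);
knn = fitcknn(meas, species, 'NumNeighbors', 5);
% Before: disc = ClassificationDiscriminant.fit(meas, species);
disc = fitcdiscr(meas, species);
```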
LinearMixedModel is a new class for fitting linear mixed-effects (LME) models. Fit multilevel LME models or LME models with nested and/or crossed random effects using the fitlme or fitlmematrix function. You can:
Specify LME models using either the formula notation or via matrix input.
Fit LME models using maximum likelihood (ML) or restricted maximum likelihood (REML).
Specify a covariance pattern for the random effects.
Calculate estimates of best linear unbiased predictors (BLUPs) for random effects.
Perform custom joint hypothesis tests on fixed and random effects.
Compute confidence intervals on fixed effects, random effects, and covariance parameters.
Examine residuals, diagnostic plots, fitted values, and design matrices.
Compare two different models via theoretical or simulated likelihood ratio tests.
Make predictions on new data using the fitted LME model.
Generate random data using the fitted LME model at new design points.
For the properties and methods of this object, see the class page for LinearMixedModel.
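The sketch below fits a random-intercept model to simulated data using formula notation and REML. The data, the group structure, and the variable names y, x, and g are all made up for illustration.

```matlab
rng(0);
nGroups = 10;
nPerGroup = 8;
g = kron((1:nGroups)', ones(nPerGroup, 1));   % group index for each row
x = randn(nGroups*nPerGroup, 1);
b = randn(nGroups, 1);                        % simulated random intercepts
y = 2 + 0.5*x + b(g) + 0.3*randn(size(x));
tbl = table(y, x, categorical(g), 'VariableNames', {'y','x','g'});
lme = fitlme(tbl, 'y ~ x + (1|g)', 'FitMethod', 'REML');
beta = fixedEffects(lme);    % intercept and slope estimates
blup = randomEffects(lme);   % one BLUP per group
```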
Many probability distribution and descriptive statistics functions are now supported for code generation. For a full list of Statistics Toolbox functions that are supported by MATLAB® Coder™, see Statistics Toolbox Functions.
The new function evalclusters estimates the optimal number of clusters for various criterion values, and returns the clustering solution corresponding to the estimated optimal value.
You can provide clustering solutions, ask evalclusters to use one of the built-in clustering algorithms, 'kmeans', 'linkage', or 'gmdistribution', or provide a function handle.
The following criteria are available:
The Calinski-Harabasz (CH) index
The Silhouette index
The Gap statistic
The Davies-Bouldin (DB) index
mvregress now accepts an n-by-(p + 1) design matrix X, when the response Y is an n-by-d matrix with d > 1, where n is the number of observations, p is the number of predictor variables, d is the number of dimensions in the response, and X includes a column of ones for the intercept (constant) term.
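As a sketch of the dimensions involved, the following simulates a two-dimensional response from two predictors and passes a single design matrix to mvregress. The coefficient values and noise level are arbitrary.

```matlab
rng(0);
n = 50; p = 2; d = 2;
predictors = randn(n, p);
X = [ones(n, 1) predictors];      % n-by-(p+1) design matrix with intercept column
B = [1 -1; 2 0.5; -0.5 3];        % (p+1)-by-d coefficients used for simulation
Y = X*B + 0.1*randn(n, d);        % n-by-d response
beta = mvregress(X, Y);           % (p+1)-by-d coefficient estimates
```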
Statistics Toolbox now provides upper tail probability calculations for cumulative distribution functions. You can compute the upper tail probabilities using a trailing 'upper' argument in the following functions:
cdf function for probability distribution objects, returned by pd = makedist(distname) or pd = fitdist(X,distname):
cdf(pd,X,'upper')
cdf function:
Y = cdf('name',X,A,'upper')
Y = cdf('name',X,A,B,'upper')
Y = cdf('name',X,A,B,C,'upper')
Distribution-specific cdf functions:
Distribution  New Syntax 

Beta  p = betacdf(X,A,B,'upper') 
Binomial  Y = binocdf(X,N,P,'upper') 
Chi-square  p = chi2cdf(X,V,'upper') 
Extreme Value  P = evcdf(X,mu,sigma,'upper') [P,PLO,PUP] = evcdf(X,mu,sigma,pcov,'upper') 
Exponential  P = expcdf(X,mu,'upper') [P,PLO,PUP] = expcdf(X,mu,pcov,'upper') 
F  P = fcdf(X,V1,V2,'upper') 
Gamma  P = gamcdf(X,A,B,'upper') [P,PLO,PUP] = gamcdf(X,A,B,pcov,'upper') 
Geometric  Y = geocdf(X,P,'upper') 
Generalized Extreme Value  P = gevcdf(X,k,sigma,mu,'upper') 
Generalized Pareto  P = gpcdf(X,sigma,theta,'upper') 
Hypergeometric  P = hygecdf(X,M,K,N,'upper') 
Lognormal  P = logncdf(X,mu,sigma,'upper') [P,PLO,PUP] = logncdf(X,mu,sigma,pcov,'upper') 
Negative Binomial  Y = nbincdf(X,R,P,'upper') 
Noncentral F  P = ncfcdf(X,NU1,NU2,DELTA,'upper') 
Noncentral t  P = nctcdf(X,NU,DELTA,'upper') 
Noncentral Chi-square  P = ncx2cdf(X,V,DELTA,'upper') 
Normal  P = normcdf(X,mu,sigma,'upper') [P,PLO,PUP] = normcdf(X,mu,sigma,pcov,'upper') 
Poisson  P = poisscdf(X,lambda,'upper') 
t  P = tcdf(X,V,'upper') 
Rayleigh  P = raylcdf(X,B,'upper') 
Uniform Discrete  P = unidcdf(X,N,'upper') 
Uniform Continuous  P = unifcdf(X,A,B,'upper') 
Weibull  P = wblcdf(X,A,B,'upper') [P,PLO,PUP] = wblcdf(X,A,B,pcov,'upper') 
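The 'upper' argument matters most far in the tail, where subtracting a cdf value from 1 loses all precision. A small sketch with the standard normal distribution:

```matlab
x = 10;
pNaive = 1 - normcdf(x);             % normcdf(10) rounds to 1, so this is exactly 0
pUpper = normcdf(x, 0, 1, 'upper');  % about 7.6e-24
pd = makedist('Normal');             % standard normal distribution object
pObj = cdf(pd, x, 'upper');          % same tail through the object interface
```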
The new function partialcorri computes linear partial correlation coefficients with internal adjustments. You can compute partial correlation between pairs of variables in Y and X, adjusting for the remaining variables in X, or between pairs of variables in Y and X, adjusting for the remaining variables in X, after first controlling both X and Y for the variables in Z.
You can also:
Specify whether to use Pearson or Spearman partial correlations.
Specify how to handle missing values.
Perform hypothesis tests of zero correlation against a one-sided or two-sided alternative.
There are new functions for the fitting and stepwise algorithms of linear and generalized linear models, and the fitting algorithm of nonlinear models. The new functions are as follows.
New Function  Replacing 

fitlm  LinearModel.fit 
stepwiselm  LinearModel.stepwise 
fitglm  GeneralizedLinearModel.fit 
stepwiseglm  GeneralizedLinearModel.stepwise 
fitnlm  NonLinearModel.fit 
Functionality  What Happens When You Use This Functionality?  Use This Instead  Compatibility Considerations 

LinearModel.fit  Still runs  fitlm  Replace instances of LinearModel.fit with fitlm 
LinearModel.stepwise  Still runs  stepwiselm  Replace instances of LinearModel.stepwise with stepwiselm 
GeneralizedLinearModel.fit  Still runs  fitglm  Replace instances of GeneralizedLinearModel.fit with fitglm 
GeneralizedLinearModel.stepwise  Still runs  stepwiseglm  Replace instances of GeneralizedLinearModel.stepwise with stepwiseglm 
NonLinearModel.fit  Still runs  fitnlm  Replace instances of NonLinearModel.fit with fitnlm 
Support vector machines are now in Statistics Toolbox™. Train a support vector machine classifier using svmtrain and classify data using svmclassify.
Two new features handle missing data in principal component analysis:
The new function adtest performs the Anderson-Darling goodness-of-fit test. adtest can perform:
Simple test: Test against a specific distribution with parameters specified. You can test against any continuous univariate parametric distribution.
Composite test: Test against a specified distribution family (also called an omnibus test). You can test against the normal, exponential, extreme value, lognormal, or Weibull distribution families.
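The two kinds of test might be called as in this sketch, which uses simulated normal data. The sample size and parameter values are illustrative.

```matlab
rng(0);
x = randn(100, 1);
% Composite test: is x from some normal distribution (parameters estimated)?
[hComposite, pComposite] = adtest(x);
% Simple test: is x from the normal distribution with mu = 0 and sigma = 1?
pd = makedist('Normal', 'mu', 0, 'sigma', 1);
[hSimple, pSimple] = adtest(x, 'Distribution', pd);
```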
The training speed for decision trees and their ensembles is improved. The improvement is best seen in decision tree ensembles obtained using the fitensemble function or TreeBagger class.
Improved efficiency of TreeBagger when used in parallel mode.
You can specify the number of surrogate splits saved in decision trees using the 'surrogate' name-value pair argument in the fit and template methods of the ClassificationTree and RegressionTree classes.
ClassificationTree.fit and ClassificationTree.template provide several heuristic methods for splitting on categorical predictors with many levels. Use the 'AlgorithmForCategorical' name-value pair argument to specify the algorithm to find the best split and the 'MaxCat' name-value pair argument to specify the maximum number of categories you allow.
The scatterhist function has these name-value pair arguments:
'Group' lets you specify a grouping variable and produces a grouped scatter plot.
'Kernel' lets you use grouped kernel density plots instead of overall histograms for the marginal distributions.
Additional options let you change colors, line properties, legends, and more.
These functions now accept additional error models and fixed or fit-dependent weights.
NonLinearModel methods
nlinfit
nlpredci
Additional functionality changes are:
disp (NonLinearModel method) shows only estimable coefficients, and shows NaN for inestimable coefficients.
F-test (NonLinearModel method) automatically decides whether to compare the full model against an intercept-only model or a zero model.
NonLinearModel properties such as Diagnostics, Residuals, LogLikelihood, SSE, and SST account for weights and error models.
Parametric hypothesis test functions accept optional input arguments as name-value pair arguments.
adtest  Anderson-Darling goodness-of-fit test 
ansaribradley  Ansari-Bradley test 
dwtest  Durbin-Watson test 
kstest  One-sample Kolmogorov-Smirnov test 
kstest2  Two-sample Kolmogorov-Smirnov test 
lillietest  Lilliefors test 
ttest  One-sample t-test 
ttest2  Two-sample t-test 
vartest  One-sample variance chi-square test 
vartest2  Two-sample variance F-test 
vartestn  Variance test across multiple groups 
ztest  z-test 
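With name-value pair arguments, options can be given in any order and by name. A sketch on simulated data, where the significance level and tail are arbitrary choices:

```matlab
rng(0);
x = 0.5 + randn(30, 1);
% Right-tailed one-sample t-test of mean 0 at the 1% level.
[h, p, ci] = ttest(x, 0, 'Alpha', 0.01, 'Tail', 'right');
% Two-sample t-test without assuming equal variances.
[h2, p2] = ttest2(x, randn(30, 1), 'Vartype', 'unequal');
```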
New probability distribution objects provide the following new functionality:
Create a distribution without fitting to data using the new makedist function.
Assign directly to parameter values.
Create truncated distributions.
Create and operate on arrays of distribution objects.
Create custom distributions. To begin, use dfittool and select Edit > Define Custom Distributions. Use the provided template to define the 'Laplace' distribution, or modify it to create your own.
Compute and plot likelihood ratio confidence intervals and profile likelihood for fitted probability distributions.
Additional distributions in the probability distribution framework:
Multinomial
Piecewise Linear
Triangular
Uniform
You can continue fitting distributions to data using the existing fitdist function.
The class names of probability distribution objects returned by fitdist are different than in earlier releases.
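A brief sketch of the object workflow described above: creating a distribution without data, assigning a parameter, and truncating. The parameter values and truncation limits are illustrative.

```matlab
pd = makedist('Normal', 'mu', 0, 'sigma', 1);  % no data needed
pd.sigma = 2;                                  % assign a parameter value directly
tpd = truncate(pd, -1, 1);                     % restrict the distribution to [-1, 1]
rng(0);
r = random(tpd, 1000, 1);                      % samples from the truncated distribution
```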
There are three new boosting algorithms for classification:
RUSBoost (boosting by random undersampling) for imbalanced data (data in which one class has many more observations than the other).
LPBoost (linear programming) and TotalBoost (totally corrective boosting) which self-terminate, can lead to a sparse ensemble, and can be used for multiclass boosting.
There is a new probability distribution object for the Burr Type XII distribution, a three-parameter family of continuous distributions on the positive real line. Use fitdist to fit this distribution to data. Use ProbDistUnivParam to specify the distribution parameters directly. Either function produces a distribution you can use to generate random samples or compute functions such as pdf and cdf.
You can now import data from a file directly into a dataset array using the MATLAB Import Tool.
The new pca function includes additional functionality for principal component analysis. Features of pca include:
Handling of NaN as missing data values.
Weighted principal component analysis with user-specified weights.
Choice of SVD or EIG algorithm for computing principal components.
Option to specify number of components to return.
Option to not center before computing principal components.
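Several of these options can be combined in one call. The sketch below uses the fisheriris sample data with one value set to NaN to show the default missing-data handling, in which rows containing NaN are left out of the computation.

```matlab
load fisheriris
X = meas;
X(3, 2) = NaN;   % introduce a missing value for illustration
[coeff, score] = pca(X, 'NumComponents', 2, 'Algorithm', 'svd', 'Centered', true);
% Row 3 is excluded from the computation, so its scores are NaN.
```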
Statistics Toolbox now supports parallel execution for kmeans.
The dendrogram function has new options for reordering the nodes of hierarchical binary cluster trees:
The reorder option allows you to specify a permutation vector for the order of nodes in a dendrogram plot.
The checkcrossings option checks whether a requested permutation vector leads to crossing branches in a dendrogram plot.
The function optimalleaforder generates an optimal permutation of nodes.
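These options can be used together, as in this sketch on random points. The data and the 'average' linkage method are arbitrary choices.

```matlab
rng(0);
X = rand(10, 2);
D = pdist(X);
Z = linkage(D, 'average');
leafOrder = optimalleaforder(Z, D);        % permutation of the 10 leaves
dendrogram(Z, 0, 'Reorder', leafOrder);    % 0 displays all leaf nodes
```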
You can add a vector of observation weights, or a handle to a function that returns a vector of observation weights, to these functions:
For an example of weighted fitting, see Weighted Nonlinear Regression.
Use either Weights or RobustWgtFun when performing weighted nonlinear regression.
The diagnostics in the Diagnostics dataset array for LinearModel objects are in a new order, and no longer appear in the Variables editor. The new order is:
Leverage
CooksDistance
Dffits
S2_i
CovRatio
Dfbetas
HatMatrix
To access the correct diagnostics, you should update any code that indexes the diagnostics dataset array columns by number.
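One way to avoid dependence on column order is to access the diagnostics by name, as in this sketch. It uses fitlm and the carsmall sample data for brevity, which is an assumption about the release in use.

```matlab
load carsmall
mdl = fitlm(Weight, MPG);
cooks = mdl.Diagnostics.CooksDistance;  % by name, unaffected by column order
lev = mdl.Diagnostics.Leverage;
```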
Functionality  What Happens When You Use This Functionality?  Use This Instead  Compatibility Considerations 

princomp  Still runs  pca  Replace instances of princomp with pca 
LinearModel is a new class for performing linear regression. LinearModel.fit creates a model that:
Lets you fit models with both categorical and continuous predictor variables
Contains information about the quality of the fit, such as residuals and ANOVA tables
Lets you easily plot the fit
Allows for automatic or manual exclusion of unimportant variables
Enables robust fitting for reduced influence of outliers
Lets you specify quadratic and other models using a symbolic formula
Enables stepwise model selection
There are similar improvements for generalized linear and nonlinear modeling using the GeneralizedLinearModel and NonLinearModel classes. For details, see the class reference pages in the reference material, or Linear Regression, Stepwise Regression, Robust Regression — Reduce Outlier Effects, Generalized Linear Regression, or Nonlinear Regression in the User's Guide.
You can now edit, sort, plot, and select portions of dataset arrays from the MATLAB Variable Editor. For details, see Using Dataset Arrays in the User's Guide.
The lassoglm function regularizes generalized linear models. Use lassoglm to examine model alternatives and to constrain or remove redundant or unimportant variables in generalized linear regression. For details, see the function reference page, or Lasso Regularization of Generalized Linear Models in the User's Guide.
ClassificationKNN.fit creates a classification model that performs k-nearest neighbor classification. You can check the quality of the model with cross validation or resubstitution. For details, see the ClassificationKNN page in the reference material, or Classification Using Nearest Neighbors in the User's Guide.
fitensemble can construct random subspace ensembles to improve the classification accuracy of both k-nearest neighbor classifiers and discriminant analysis classifiers. For details, see Ensemble Methods or Random Subspace Classification in the User's Guide.
ClassificationDiscriminant models now have two parameters, Gamma and Delta, for regularization and lowering the number of variables. Set Gamma to regularize the discriminant. Set Delta to eliminate variables. Use cvshrink to obtain optimal Gamma and Delta parameters by cross validation. For details, see the reference pages, or Regularize a Discriminant Analysis Classifier in the User's Guide.
The stepwisefit function now returns the fitted coefficient history in the history.B field.
The WgtFun option is now called RobustWgtFun in the nlinfit, statget, and statset functions. RobustWgtFun also makes the Robust option superfluous.
The WgtFun and Robust options are currently accepted by all functions. To avoid potential future incompatibilities, update code that uses the WgtFun and Robust options to use the RobustWgtFun option.
The ClassificationTree predict method now chooses the class with minimal expected misclassification cost. Previously, it chose the class with maximal posterior probability. The new behavior is consistent with the cvLoss method. Furthermore, both ClassificationDiscriminant and ClassificationKNN predict using minimal expected misclassification cost. For details, see predict and loss.
If you use a nondefault cost matrix, some ClassificationTree classification predictions can differ from those in previous versions.
The lasso function incorporates both the lasso regularization algorithm and the elastic net regularization algorithm. Use lasso to remove redundant or unimportant variables in linear regression. The lassoPlot function helps you visualize lasso results, with a variety of coefficient trace plots and a cross-validation plot.
For details, see Lasso and Elastic Net.
You can now use the ClassificationDiscriminant and CompactClassificationDiscriminant classes for classification via discriminant analysis. The syntax and methods resemble those in the existing ClassificationTree and CompactClassificationTree classes. The ClassificationDiscriminant class includes the functionality of the classify function. ClassificationDiscriminant provides several benefits compared to the classify function:
After you fit a classifier, you can predict without refitting.
ClassificationDiscriminant is built on the same framework as ClassificationTree, so you have a variety of options and methods, including:
Cross validation
Resubstitution statistics
A choice of cost functions
Weighted classification
ClassificationDiscriminant can fit several models, including linear, quadratic, and linear or quadratic with pseudoinverse.
For details, see Discriminant Analysis.
The rangesearch function finds all members of a data set that are within a specified distance of members of another data set. As with the knnsearch function, you can set a variety of distance metrics, or program your own. rangesearch has counterparts that are methods of the ExhaustiveSearcher and KDTreeSearcher classes.
The datasample function samples with or without replacement from a data set. It can also perform weighted sampling, with or without replacement.
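The sampling modes might be used as follows. The data and weights here are made up for illustration.

```matlab
rng(0);
data = (1:10)';
w = [5 5 ones(1, 8)];                        % the first two rows are favored
s1 = datasample(data, 5);                    % with replacement (default)
s2 = datasample(data, 5, 'Replace', false);  % without replacement
s3 = datasample(data, 5, 'Replace', false, 'Weights', w);  % weighted, without replacement
```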
The fracfactgen function now allows up to 52 factors, instead of the previous limit of 26 factors. Specify factors as case-sensitive strings, using 'a' through 'z' for the first 26 factors, and 'A' through 'Z' for the remaining factors.
fracfact now checks for an arbitrary level of interaction in confounding, instead of the previous limit of confounding up to products of two factors. Set the MaxInt name-value pair to the level of interaction you want. You can also set names for the factors using the FactorNames name-value pair.
The nlmefit function now returns the covariance matrix of the estimated coefficients as the covb field of the stats structure.
The signrank test now defines ties to be entries that differ by 2*eps or less. Previously, ties were entries that were identical to machine precision.
For R2011b, error and warning message identifiers have changed in Statistics Toolbox.
If you have scripts or functions that use message identifiers that changed, you must update the code to use the new identifiers. Typically, message identifiers are used to turn off specific warning messages, or in code that uses a try/catch statement and performs an action based on a specific error identifier.
For example, if you use the 'resubstitution' method, the 'stats:plsregress:InvalidMCReps' identifier has changed to 'stats:plsregress:InvalidResubMCReps'. If you use the 'resubstitution' method and your code checks for 'stats:plsregress:InvalidMCReps', you must update it to check for 'stats:plsregress:InvalidResubMCReps' instead.
To determine the identifier for a warning, run the following command just after you see the warning:
[MSG,MSGID] = lastwarn;
This command saves the message identifier to the variable MSGID.
To determine the identifier for an error, run the following command just after you see the error:
exception = MException.last; MSGID = exception.identifier;
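In code, the same identifier is available from the caught exception. The sketch below uses a made-up identifier, 'demo:example:BadInput', purely for illustration. Substitute the identifier your code needs to check.

```matlab
handled = false;
try
    % 'demo:example:BadInput' is a hypothetical identifier for this sketch
    error('demo:example:BadInput', 'Something went wrong.');
catch exception
    MSGID = exception.identifier;
    if strcmp(MSGID, 'demo:example:BadInput')
        handled = true;        % act on the specific error
    else
        rethrow(exception);    % let unrelated errors propagate
    end
end
```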
The new fitensemble function constructs ensembles of decision trees. It provides:
Several popular boosting algorithms (AdaBoostM1, AdaBoostM2, GentleBoost, LogitBoost, and RobustBoost) for classification
Least-squares boosting (LSBoost) for regression
Most TreeBagger functionality for ensembles of bagged decision trees
There is also an improved interface for classification trees (ClassificationTree) and regression trees (RegressionTree), encompassing the functionality of classregtree.
For details, see Ensemble Methods.
The linkage and clusterdata functions have a new savememory option that can use less memory than before. With savememory set to 'on', the functions do not build a pairwise distance matrix, so they use less memory and, depending on problem size, can take less time. You can use the savememory option when:
The linkage method is 'ward', 'centroid', or 'median'
The linkage distance metric is 'euclidean' (default)
For details, see the linkage and clusterdata function reference pages.
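A sketch of both functions with the option turned on, using random data and the 'ward' method with the default Euclidean metric. The sizes and the cluster count are arbitrary.

```matlab
rng(0);
X = rand(200, 3);
% Ward linkage with Euclidean distance, without building the distance matrix.
Z = linkage(X, 'ward', 'euclidean', 'savememory', 'on');
T = clusterdata(X, 'linkage', 'ward', 'savememory', 'on', 'maxclust', 4);
```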
The nlmefit and nlmefitsa functions now provide the conditional weighted residuals of the fit. Use this information to assess the quality of the model; see Example: Examining Residuals for Model Verification.
The statset Options structure now includes 'DerivStep', which enables you to set finite differences for gradient estimation.
knnsearch now optionally returns all kth nearest neighbors of points, instead of just one. The knnsearch methods for ExhaustiveSearcher and KDTreeSearcher also have this option.
MATLAB functions generated with the Distribution Fitting Tool now use the fitdist function to create fitted probability distribution objects. The generated functions return probability distribution objects as output arguments.
ncx2cdf is now faster and more accurate for large values of the noncentrality parameter.
If the two categories in a binomial regression model (such as logit or probit) are perfectly separated, the best-fitting model is degenerate with infinite coefficients. In this case, the glmfit function is likely to exceed its iteration limit. glmfit now tries to detect this perfect separation and display a diagnostic message.
mdscale now enforces that, in each column of the output Y, the value with the largest magnitude has a positive sign. This change makes results consistent across releases and platforms—small changes used to lead to sign reversals.
Statistics Toolbox now supports parallel execution for the following functions:
For more information, see the Parallel Statistics chapter in the User's Guide.
The new filter algorithm, relieff, is based on nearest neighbors. The ReliefF algorithm accounts for correlations among predictors by computing the effect of every predictor on the class label (or true response for regression) locally and then integrating these local estimates over the entire predictor space.
nlmefit now supports the following error models:
combined
constant
exponential
proportional
You can specify an error model with both nlmefitsa and nlmefit.
The nlmefit bic calculation has changed. Now the degrees of freedom value is based on the number of groups rather than the number of observations. This conforms with the bic definition used by the nlmefitsa function.
Both nlmefit and nlmefitsa now store the estimated error parameters in the errorparm field of the output stats structure. The rmse field of the structure now contains the root mean squared residual for all error models; this value is computed on the log scale for the exponential model.
In the previous release, the rmse field was used by nlmefitsa for both mean squared residual and the estimated error parameter. Change your code, if necessary, to address the appropriate field in the stats structure.
As described in nlmefit Support for Error Models, and nlmefitsa changes, nlmefit now calculates different bic values than in previous releases.
The new surrogate splits feature in classregtree allows for better handling of missing values, more accurate estimation of variable importance, and calculation of the predictive measure of association between variables.
TreeBagger and CompactTreeBagger classes have two new properties:
NVarSplit provides the number of decision splits for each predictor variable.
VarAssoc provides a measure of association between pairs of predictor variables.
The distribution fitting GUI (dfittool) now allows you to export fits to the MATLAB workspace as probability distribution fit objects. For more information, see Modeling Data Using the Distribution Fitting Tool.
If you load a distribution fitting session that was created with previous versions of Statistics Toolbox, you cannot save an existing fit. Fit the distribution again to enable saving.
partialcorr now accepts a new syntax, RHO = partialcorr(X), which returns the sample linear partial correlation coefficients between pairs of variables in X, controlling for the remaining variables in X. For more information, see the function reference page.
quantile now accepts a new syntax, Y = quantile(X,N,...), which returns quantiles at the cumulative probabilities (1:N)/(N+1) where N is a scalar positive integer value.
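For example, N = 3 gives the quartiles, since (1:3)/4 is 0.25, 0.5, 0.75. A sketch on simulated data:

```matlab
rng(0);
x = randn(100, 1);
N = 3;
qN = quantile(x, N);              % quantiles at 0.25, 0.5, and 0.75
qP = quantile(x, (1:N)/(N+1));    % the equivalent explicit call
```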
scatterhist now accepts three parameter name/value pairs that control where and how the histogram plots appear. The new parameter names are NBins, Location, and Direction. For more information, see the function reference page.
bootci has a new output option which returns the bootstrapped statistic computed for each of the NBoot bootstrap replicate samples. For more information, see the function reference page.
A new stochastic algorithm for fitting NLME models is more robust with respect to starting values, enables parameter transformations, and relaxes the assumption of constant error variance. See nlmefitsa.
New functions for k-nearest neighbor (kNN) search efficiently find the closest points to any query point. For information, see k-Nearest Neighbor Search and Radius Search.
A new option in the perfcurve function computes confidence intervals for classifier performance curves.
Statistics Toolbox now supports parallel execution for the following functions:
For more information on parallel computing in the Statistics Toolbox, see Parallel Computing Support for Resampling Methods.
dataset.unstack converts a "tall" dataset array to an equivalent dataset array that is in "wide format", by "unstacking" a single variable in the tall dataset array into multiple variables in wide. dataset.stack reverses this manipulation by converting a "wide" dataset array to an equivalent dataset array that is in "tall format", by "stacking up" multiple variables in the wide dataset array into a single variable in tall.
Statistics Toolbox now supports importing and exporting files in SAS Transport (.xpt) format. For more information, see the xptread and dataset.export reference pages.
An enhanced dataset.join method provides additional types of join operations:
join can now perform more complicated inner and outer join operations that allow a many-to-many correspondence between dataset arrays A and B, and allow unmatched observations in either A or B.
join can be of Type 'inner', 'leftouter', 'rightouter', 'fullouter', or 'outer' (which is a synonym for 'fullouter'). For an inner join, the dataset array, C, only contains observations corresponding to a combination of key values that occurred in both A and B. For a left (or right) outer join, C also contains observations corresponding to keys in A (or B) that did not match any in B (or A).
join can now return index vectors indicating the correspondence between observations in C and those in A and B.
join now supports using multiple keys.
join now supports an optional parameter for specifying missing key behavior rather than raising an error.
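The join types described above might be exercised as in the following sketch. The two dataset arrays and their variable names (Key, ValA, ValB) are invented for the example, and the 'MergeKeys' option is used on the assumption that it is available for outer joins in your release.

```matlab
% Two small dataset arrays sharing the key variable 'Key' (made-up data).
A = dataset({[1;2;3],'Key'}, {[10;20;30],'ValA'});
B = dataset({[2;3;4],'Key'}, {[200;300;400],'ValB'});

% Inner join: only keys present in both A and B (2 and 3).
% ia and ib index the rows of A and B that produced each row of Cin.
[Cin, ia, ib] = join(A, B, 'Keys','Key', 'Type','inner');

% Left outer join: also keeps key 1 from A, which has no match in B.
Cleft = join(A, B, 'Keys','Key', 'Type','leftouter');

% Full outer join: keeps unmatched keys from both sides (1 and 4).
Cfull = join(A, B, 'Keys','Key', 'Type','fullouter', 'MergeKeys',true);
```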
An enhanced dataset.export method now supports exporting directly to Microsoft® Excel® files.
The NaiveBayes classification object is suitable for data sets that contain many predictors or features.
It supports normal, kernel, multinomial, and multivariate multinomial distributions.
New classification objects, TreeBagger and CompactTreeBagger, provide improved performance through bootstrap aggregation (bagging).
Includes Breiman's "random forest" method.
Enhanced classregtree has more options for growing and pruning trees.
New perfcurve function provides a graphical method to evaluate classification results.
Includes ROC (receiver operating characteristic) and other curves.
Provides a consistent interface for working with probability distributions.
Can be created directly using the ProbDistUnivParam constructor, or fit to data using the fitdist function.
Option to fit distributions by group.
Includes kernel object methods and parametric object methods that you can use to analyze the distribution represented by the object.
Includes kernel object properties and parametric object properties that you can access to determine the fit results and evaluate their accuracy.
Related enhancements in the chi2gof, histfit, kstest, probplot, and qqplot functions.
The new confusionmat function tabulates misclassifications by comparing known and predicted classes of observations.
Dataset arrays constructed by the dataset function can now be written to an external text file using the new export function.
When reading external text files into a dataset array, dataset has a new 'TreatAsEmpty' parameter for specifying strings to be treated as empty.
In previous versions, dataset used eval to evaluate strings in external text files before writing them into a dataset array. As a result, strings such as '1/1/2008' were treated as numerical expressions with two divides. Now, dataset treats such expressions as strings, and writes a string variable into the dataset array whenever a column in the external file contains a string that does not represent a valid scalar value.
The cross-validation function, crossval, has new options for directly specifying loss functions for mean-squared error or misclassification rate, without having to provide a separate function M-file.
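A minimal sketch of the mean-squared-error option, assuming a simple least-squares predictor. The data, the coefficient values, and the predfun handle are all made up for illustration.

```matlab
rng(0);                                  % reproducible made-up data
X = [ones(50,1) randn(50,2)];
y = X*[1; 2; -1] + 0.1*randn(50,1);

% Prediction function: fit least squares on the training fold,
% then predict the responses for the test fold.
predfun = @(Xtrain, ytrain, Xtest) Xtest*(Xtrain\ytrain);

% 'mse' selects mean-squared error as the loss, so no separate
% loss function file is needed.
cvMSE = crossval('mse', X, y, 'Predfun', predfun, 'kfold', 5);
```

For a classifier, the same pattern should apply with 'mcr' (misclassification rate) in place of 'mse' and a predfun that returns predicted class labels.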
The procrustes function has new options for computing linear transformations without scale or reflection components.
The multivariate normal functions mvnpdf, mvncdf, and mvnrnd now accept vector specification of diagonal covariance matrices, with corresponding gains in computational efficiency.
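The vector form can be compared against the full-matrix form with a short check like the one below. The mean, variances, and evaluation points are arbitrary.

```matlab
mu = [0 0];
s2 = [1 4];                        % variances on the diagonal (arbitrary)
X  = [0 0; 1 -1; 2 3];             % points at which to evaluate the density

pVec  = mvnpdf(X, mu, s2);         % diagonal covariance given as a vector
pFull = mvnpdf(X, mu, diag(s2));   % equivalent full covariance matrix
maxDiff = max(abs(pVec - pFull));
```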
The hypergeometric distribution has been added to both the disttool and randtool graphical user interfaces.
The ksdensity function may give different answers for the case where there are censoring times beyond the last observed value. In this case, ksdensity tries to reduce the bias in its density estimate by folding kernel functions across a folding point so that they do not extend into the area that is completely censored. Two things have changed for this release:
In previous releases the folding point was the last observed value. In this release it is the first censoring time after the last observed value.
The folding procedure is applied not just when the 'function' parameter is 'pdf', but for all 'function' values.
The new nlmefit function fits nonlinear mixed-effects models to data with both fixed and random sources of variation. Mixed-effects models are commonly used with data over multiple groups, where measurements are correlated within groups but independent between groups.
The boxplot function has new options for handling multiple grouping variables and extreme outliers.
The lsline, gline, refline, and refcurve functions now work with scatter plots produced by the scatter function. In previous versions, these functions worked only with scatter plots produced by the plot function.
The following visualization functions now have custom data cursors, displaying information such as observation numbers, group numbers, and the values of related variables:
Changes to boxplot have altered a number of default behaviors:
Box labels are now drawn as text objects rather than tick labels. Any code that customizes the box labels by changing tick marks should now set the tick locations as well as the tick labels.
The function no longer returns a handles array with a fixed number of handles, and the order and meaning of the handles now depend on which options are selected. To locate a handle of interest, search for its 'Tag' property using findobj. 'Tag' values for box plot components are listed on the boxplot reference page.
There are now valid handles for outliers, even when boxes have no outliers. In previous releases, the handles array returned by the function had NaN values in place of handles when boxes had no outliers. Now the 'xdata' and 'ydata' for outliers are NaN when there are no outliers.
For small groups, the 'notch' parameter sometimes produces notches that extend outside of the box. In previous releases, the notch was truncated to the extent of the box, which could produce a misleading display. A new value of 'markers' for this parameter avoids the display issue.
As a consequence, the anova1 function, which displays notched box plots for grouped data, may show notches that extend outside the boxes.
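The following sketch shows one way to locate box plot components by 'Tag' instead of relying on positions in the returned handles array. The data are made up so that only the first group has an outlier.

```matlab
% Group 1 contains one extreme value (20); group 2 has no outliers.
x = [1 2 3 4 5 20, 2 3 4 5 6 7]';
g = [ones(6,1); 2*ones(6,1)];

figure;
boxplot(x, g);

% Look components up by tag rather than by index into the handles array.
hMedian   = findobj(gca, 'Tag', 'Median');
hOutliers = findobj(gca, 'Tag', 'Outliers');

% Every box has an outlier object; for a box without outliers the
% coordinates are NaN rather than the handle being missing.
yOut = get(hOutliers, 'YData');
```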
The statistics options structure created by statset now includes a Jacobian field to specify whether or not an objective function can return the Jacobian as a second output.
Bootstrap confidence intervals computed by bootci are now more accurate for lumpy data.
The formula for bootci confidence intervals of type 'bca' or 'cper' involves the proportion of bootstrap statistics less than the observed statistic. The formula now takes into account cases where there are many bootstrap statistics exactly equal to the observed statistic.
Two new cross-validation functions, cvpartition and crossval, partition data and assess models in regression, classification, and clustering applications.
A new sequential feature selection function, sequentialfs, selects predictor subsets that optimize user-defined prediction criteria.
The new nnmf function performs nonnegative matrix factorization (NMF) for dimension reduction.
The new sobolset and haltonset functions produce quasi-random point sets for applications in Monte Carlo integration, space-filling experimental designs, and global optimization. Options allow you to skip, leap over, and scramble the points. The qrandstream function provides corresponding quasi-random number streams for intermittent sampling.
The new plsregress function performs partial least-squares regression for data with correlated predictors.
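A brief sketch of the skip, leap, and scramble options together with a stream, assuming a 3-dimensional Sobol set. The skip and leap values and the sample sizes are arbitrary.

```matlab
% 3-D Sobol point set; the skip and leap values are arbitrary choices.
p = sobolset(3, 'Skip', 1000, 'Leap', 100);
p = scramble(p, 'MatousekAffineOwen');   % randomize the point set

X0 = net(p, 64);                         % first 64 points of the set

% A stream draws points from the same set a few at a time.
q  = qrandstream(p);
X1 = qrand(q, 8);                        % next 8 points from the stream
```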
The normspec function now shades regions of a normal density curve that are either inside or outside specification limits.
The statistics options structure created by statset now includes fields for TolTypeFun and TolTypeX, to specify tolerances on objective functions and parameter values, respectively.
The new gmdistribution class represents Gaussian mixture distributions, where random points come from different multivariate normal distributions with certain probabilities. The gmdistribution constructor creates mixture models with specified means, covariances, and mixture proportions, or by fitting a mixture model with a specified number of components to data. Methods for the class include:
The cluster function for hierarchical clustering now accepts a vector of cutoff values, and returns a matrix of cluster assignments, with one column per cutoff value.
The kmeans function now returns a vector of cluster indices of length n, where n is the number of rows in the input data matrix X, even when X contains NaN values. In the past, rows of X with NaN values were ignored, and the vector of cluster indices was correspondingly reduced in size. Now the vector of cluster indices contains NaN values where rows have been ignored, consistent with other toolbox functions.
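The NaN handling can be seen with a small made-up data set in which one row has a missing value:

```matlab
rng(1);                                       % reproducible made-up data
X = [randn(10,2); NaN 0; randn(10,2) + 10];   % row 11 has a missing value

idx = kmeans(X, 2);   % expect a warning that the NaN row is ignored

% idx has one entry per row of X; the ignored row is marked with NaN.
nRows    = numel(idx);
nMissing = sum(isnan(idx));
```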
The kstest function now uses a more accurate method to calculate the p-value for a single-sample Kolmogorov-Smirnov test.
kstest now compares the computed p-value to the desired cutoff, rather than comparing the test statistic to a table of values. Results may differ from those in previous releases, especially for small samples in two-sided tests where an asymptotic formula was used in the past.
A new fitting function, copulafit, has been added to the family of functions that describe dependencies among variables using copulas. The function fits parametric copulas to data, providing a link between models of marginal distributions and models of data correlations.
A number of probability functions now have improved accuracy, especially for extreme parameter values. The functions are:
betainv — More accurate for probabilities in P near 1.
binocdf — More efficient and less likely to run out of memory for large values in X.
binopdf — More accurate when the probabilities in P are on the order of eps.
fcdf — More accurate when the parameter ratios V2./V1 are much less than the values in X.
ncx2cdf — More accurate in some extreme cases that previously returned 0.
poisscdf — More efficient and less likely to run out of memory for large values in X.
tcdf — More accurate when the squares of the values in X are much less than the parameters in V.
tinv — More accurate when the probabilities in P are very close to 0.5 and the outputs are very small in magnitude.
Function-style syntax for paretotails objects has been removed.
The changes to the probability functions listed above may lead to different, but more accurate, outputs than in previous releases.
In previous releases, syntax of the form obj(x) for a paretotails object obj invoked the cdf method. This syntax now produces a warning. To evaluate the cumulative distribution function, use the syntax cdf(obj,x).
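A minimal sketch of the recommended call, using a made-up heavy-tailed sample and arbitrary tail cutoffs of 10% and 90%:

```matlab
rng(2);
x   = trnd(3, 500, 1);              % heavy-tailed sample (made up)
obj = paretotails(x, 0.1, 0.9);     % Pareto tails below 10% and above 90%

q = [-2 0 2];
p = cdf(obj, q);                    % use this form instead of obj(q)
```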
The new corrcov function converts a covariance matrix to the corresponding correlation matrix.
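As an illustrative check, the correlation matrix obtained from a sample covariance should agree with corrcoef applied to the same data. The data below are made up.

```matlab
rng(3);
X = randn(100,3) * [1 0.5 0; 0 1 0.3; 0 0 2];   % correlated columns

C = cov(X);
[R, sigma] = corrcov(C);   % R: correlation matrix, sigma: standard deviations

maxDiff = max(max(abs(R - corrcoef(X))));
```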
The mvregress function now supports an option to force the estimated covariance matrix to be diagonal.
In previous releases the mvregress function, when using the 'cwls' algorithm, estimated the covariance of coefficients COVB using the estimated, rather than the initial, covariance of the responses SIGMA. The initial SIGMA is now used, and COVB differs to a degree dependent on the difference between the initial and final estimates of SIGMA.
The boxplot function has a new 'compact' plot style suitable for displaying large numbers of groups.
New categorical and dataset arrays are available for organizing and processing statistical data.
Categorical arrays facilitate the use of nominal and ordinal categorical data.
Dataset arrays provide a natural way to encapsulate heterogeneous statistical data and metadata, so that it can be accessed and manipulated using familiar methods analogous to those for numerical matrices.
Categorical and dataset arrays are supported by a variety of new functions for manipulating the encapsulated data.
Categorical arrays are now accepted as input arguments in all Statistics Toolbox functions that make use of grouping variables.
Expanded options are available for linear hypothesis testing.
The new linhyptest function performs linear hypothesis tests on parameters such as regression coefficients. These tests have the form H*b = c for specified values of H and c, where b is a vector of unknown parameters.
The covb output from regstats and the SIGMA output from nlinfit are suitable for use as the covariance matrix input argument required by linhyptest. The following functions have been modified to return a covb output for use with linhyptest: coxphfit, glmfit, mnrfit, robustfit.
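The sketch below feeds regstats output to linhyptest to test H*b = c. The data, H, and c are made up; here H selects both slope coefficients and c is zero, so the hypothesis is that all slopes are zero.

```matlab
rng(4);
X = randn(60,2);
y = 1 + 2*X(:,1) + 0.5*randn(60,1);   % second predictor has no true effect

% Request coefficient estimates, their covariance, and F statistic details.
s = regstats(y, X, 'linear', {'beta','covb','fstat'});

% Rows of H pick out the two slopes (the first coefficient is the intercept).
H = [0 1 0; 0 0 1];
c = [0; 0];
p = linhyptest(s.beta, s.covb, c, H, s.fstat.dfe);
```

With this particular H and c, p should match the overall F-test p-value in s.fstat.pval, which gives a quick sanity check.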
The new cholcov function computes a Cholesky-like decomposition of a covariance matrix, even if the matrix is not positive definite. Factors are useful in many of the same ways as Cholesky factors, such as imposing correlation on random number generators.
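As a hedged illustration, the matrix below is positive semidefinite but singular (its last row is half the sum of the first three). The factor T still satisfies T'*T = C and can be used to impose that covariance on random draws.

```matlab
% Singular covariance matrix: positive semidefinite, not positive definite.
C = [2 1 1 2; 1 2 1 2; 1 1 2 2; 2 2 2 3];

T   = cholcov(C);        % T need not be square
err = norm(T'*T - C);    % reconstruction error, should be near zero

% Impose the covariance C on standard normal draws.
rng(5);
Z = randn(1000, size(T,1)) * T;
```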
The classify function for discriminant analysis has been improved.
The function now computes the coefficients of the discriminant functions that define boundaries between classification regions.
The output of the function is now of the same type as the input grouping variable group.
The classify function now returns outputs of a different type than it did in the past. If the input argument group is a logical vector, output is now converted to a logical vector. In the past, output was returned as a cell array of 0s and 1s. If group is numeric, the output is now converted to the same type. For example, if group is of type uint8, the output will be of type uint8.
New paretotails objects are available for modeling distributions with an empirical cdf or similar distribution in the center and generalized Pareto distributions in the tails.
The paretotails function converts a data sample to a paretotails object. The objects are useful for generating random samples from a distribution similar to the data, but with tail behavior that is less discrete than the empirical distribution.
Objects from the paretotails class are supported by a variety of new methods for working with the piecewise distribution.
The paretotails class provides function-like behavior, so that p(x) evaluates the cdf of p at values x.
The new mvregresslike function is a utility related to the mvregress function for fitting regression models to multivariate data with missing values. The new function computes the objective (log likelihood) function, and can also compute the estimated covariance matrix for the parameter estimates.
New classregtree objects are available for creating and analyzing classification and regression trees.
The classregtree function fits a classification or regression tree to training data. The objects are useful for predicting response values from new predictors.
Objects from the classregtree class are supported by a variety of new methods for accessing information about the tree.
The classregtree class provides function-like behavior, so that t(X) evaluates the tree t at predictor values in X.
The following functions now create or operate on objects from the new classregtree class: treefit, treedisp, treeval, treeprune, treetest.
Objects from the classregtree class are intended to be compatible with the structure arrays that were produced in previous versions by the classification and regression tree functions listed above. In particular, classregtree supports dot indexing of the form t.property to obtain properties of the object t. The class also provides function-like behavior through parenthesis indexing, so that t(x) uses the tree t to classify or compute fitted values for predictors x, rather than index into t as a structure array as it did in the past. As a result, cell arrays should now be used to aggregate classregtree objects.
The new scatterhist function produces a scatterplot of 2D data and illustrates the marginal distributions of the variables by drawing histograms along the two axes. The function is also useful for viewing properties of random samples produced by functions such as copularnd, mvnrnd, and lhsdesign.
The following demo has been updated:
Selecting a Sample Size — Modified to highlight the new sampsizepwr function
The following visualization functions, commonly used in the design of experiments, have been added:
interactionplot: Two-factor interaction plot for the mean
maineffectsplot: Main effects plot for the mean
multivarichart: Multivari chart for the mean
The following functions for hypothesis testing have been added or improved:
jbtest: Replaces the chi-square approximation of the test statistic, which is asymptotic, with a more accurate algorithm that interpolates p-values from a table of quantiles. A new option allows you to run Monte Carlo simulations to compute p-values outside of the table.
lillietest: Uses an improved version of Lilliefors' table of quantiles, covering a wider range of sample sizes and significance levels, with more accurate values. New options allow you to test for exponential and extreme value distributions, as well as normal distributions, and to run Monte Carlo simulations to compute p-values outside of the tables.
runstest: Adds a test for runs up and down to the existing test for runs above or below a specified value.
sampsizepwr: New function to compute the sample size necessary for a test to have a specified power. Options are available for choosing a variety of test types.
If the significance level for a test lies outside the range of tabulated values, [0.001, 0.5], then both jbtest and lillietest now return an error. In previous versions, jbtest returned an approximate p-value and lillietest returned an error outside a smaller range, [0.01, 0.2]. Error messages suggest using the new Monte Carlo option for computing values outside the range of tabulated values.
If the data sample for a test leads to a p-value outside the range of tabulated values, then both jbtest and lillietest now return, with a warning, either the smallest or largest tabulated value. In previous versions, jbtest returned an approximate p-value and lillietest returned NaN.
Support has been added for multinomial regression modeling of discrete multicategory response data, including multinomial logistic regression. The following new functions supplement the regression models in glmfit and glmval by providing for a wider range of response values:
The new mvregress function carries out multivariate regression on data with missing response values. An option allows you to specify how missing data is handled.
coxphfit — A new option allows you to specify the values at which the baseline hazard is computed.
The following new functions consolidate and expand upon existing functions for statistical process control:
capability — Computes a wider range of probabilities and capability indices than the capable function found in previous releases
controlchart — Displays a wider range of control charts than the ewmaplot, schart, and xbarplot functions found in previous releases
controlrules — Supplements the new controlchart function by providing for a wider range of control rules (Western Electric and Nelson)
gagerr — Performs a gage repeatability and reproducibility study on measurements grouped by operator and part
The capability function subsumes the capable function that appeared in previous versions of Statistics Toolbox software, and the controlchart function subsumes the functions ewmaplot, schart, and xbarplot. The older functions remain in the toolbox for backwards compatibility, but they are no longer documented or supported.
Support for nested and continuous factors has been added to the anovan function for N-way analysis of variance.
The following functions have been added to supplement the existing bootstrp function for bootstrap estimation:
The following demos have been added to the toolbox:
Bayesian Analysis for a Logistic Regression Model
Time Series Regression of Airline Passenger Data
The following demo has been updated to demonstrate new features:
Random Number Generation
The new fracfactgen function finds a set of fractional factorial design generators suitable for fitting a specified model.
The following functions for D-optimal designs have been enhanced:
cordexch, daugment, dcovary, rowexch — New options specify the range of values and the number of levels for each factor, exclude factor combinations, treat factors as categorical rather than continuous, control the number of iterations, and repeat the design generation process from random starting points
candexch — New options control the number of iterations and repeat the design generation process from random starting points
candgen — New options specify the range of values and the number of levels for each factor, and treat factors as categorical rather than continuous
x2fx — New option treats factors as categorical rather than continuous
The new dwtest function performs a Durbin-Watson test for autocorrelation in linear regression.
Two new functions have been added to compute multivariate cdfs. These supplement existing functions for pdfs and random number generators for the same distributions.
New functions have been added to the toolbox that allow you to use copulas to model correlated multivariate data and generate random numbers from multivariate distributions.
copulacdf — Cumulative distribution function for a copula
copulaparam — Copula parameters as a function of rank correlation
copulapdf — Probability density function for a copula
copularnd — Random numbers from a copula
copulastat — Rank correlation for a copula
The following functions generate random numbers from nonstandard distributions using Markov Chain Monte Carlo methods:
mhsample: Generate random numbers using the Metropolis-Hastings algorithm
slicesample: Generate random numbers using a slice sampling algorithm
The following demos have been added to the toolbox:
Curve Fitting and Distribution Fitting
Fitting a Univariate Distribution Using Cumulative Probabilities
Fitting an Orthogonal Regression Using Principal Components Analysis
Modelling Tail Data with the Generalized Pareto Distribution
Pitfalls in Fitting Nonlinear Models by Transforming to Linearity
Weighted Nonlinear Regression
The following demo has been updated:
Modelling Data with the Generalized Extreme Value Distribution
The new partialcorr function computes the correlation of one set of variables while controlling for a second set of variables.
The grpstats function now computes a wider variety of descriptive statistics for grouped data. Choices include the mean, standard error of the mean, number of elements, group name, standard deviation, variance, confidence interval for the mean, and confidence interval for new observations. The function also supports the computation of user-defined statistics.
The new chi2gof function tests if a sample comes from a specified distribution, against the alternative that it does not come from that distribution, using a chi-square test statistic.
Three functions have been added to test sample variances:
vartest: One-sample chi-square variance test. Tests if a sample comes from a normal distribution with specified variance, against the alternative that it comes from a normal distribution with a different variance.
vartest2: Two-sample F-test for equal variances. Tests if two independent samples come from normal distributions with the same variance, against the alternative that they come from normal distributions with different variances.
vartestn: Bartlett multiple-sample test for equal variances. Tests if multiple samples come from normal distributions with the same variance, against the alternative that they come from normal distributions with different variances.
The new ansaribradley function tests if two independent samples come from the same distribution, against the alternative that they come from distributions that have the same median and shape but different variances.
The new runstest function tests if a sequence of values comes in random order, against the alternative that the ordering is not random.
Support has been added for two new distributions:
The Generalized Extreme Value distribution combines the Gumbel, Frechet, and Weibull distributions into a single distribution. It is used to model extreme values in data.
The following distribution functions have been added:
The cophenet function now returns cophenetic distances as well as the cophenetic correlation coefficient.
Release  Features or Changes with Compatibility Considerations 

R2014a  
R2013b  None 
R2013a  Probability distribution enhancements 
R2012b  
R2012a  
R2011b  Conversion of Error and Warning Message Identifiers 
R2011a  None 
R2010b  
R2010a  None 
R2009b  None 
R2009a  None 
R2008b  
R2008a  Descriptive Statistics 
R2007b  
R2007a  
R2006b  
R2006a  None 
R14SP3  None 
R14SP2  None 