# Statistics Toolbox

## Exploratory Data Analysis

Statistics Toolbox provides multiple ways to explore data: statistical plotting with interactive graphics, algorithms for cluster analysis, and descriptive statistics for large datasets.

### Statistical Plotting and Interactive Graphics

Statistics Toolbox includes graphs and charts to explore your data visually. The toolbox augments MATLAB® plot types with probability plots, box plots, histograms, scatter histograms, 3D histograms, control charts, and quantile-quantile plots. The toolbox also includes specialized plots for multivariate analysis, including dendrograms, biplots, parallel coordinate charts, and Andrews plots.

Group scatter plot matrix showing interactions between five variables.

Compact box plot with whiskers providing a five-number summary of a dataset.

Scatter histogram using a combination of scatter plots and histograms to describe the relationship between variables.

Plot comparing the empirical CDF for a sample from an extreme value distribution with a plot of the CDF for the sampling distribution.
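
The comparison described in the caption above can be sketched numerically. The example below is an illustration in plain Python, not toolbox code: it assumes the Type I extreme value (Gumbel, minimum) form with location `mu` and scale `sigma`, and the function names `ev_cdf`, `ev_sample`, `empirical_cdf`, and `max_cdf_gap` are hypothetical. It draws a sample, builds the empirical CDF, and measures the largest vertical gap to the CDF of the sampling distribution.

```python
import math
import random


def ev_cdf(x, mu=0.0, sigma=1.0):
    """CDF of the Type I extreme value (Gumbel, minimum) distribution (assumed form)."""
    return 1.0 - math.exp(-math.exp((x - mu) / sigma))


def ev_sample(n, mu=0.0, sigma=1.0, seed=0):
    """Draw n values by inverse-transform sampling."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u > 0.0:  # skip u == 0, where log(u) is undefined
            draws.append(mu + sigma * math.log(-math.log(u)))
    return draws


def empirical_cdf(sample):
    """Return the sorted values and the ECDF height reached at each one."""
    xs = sorted(sample)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def max_cdf_gap(sample, cdf):
    """Largest vertical distance between the ECDF step function and cdf."""
    xs, heights = empirical_cdf(sample)
    n = len(xs)
    gap = 0.0
    for i, (x, h) in enumerate(zip(xs, heights)):
        f = cdf(x)
        # Check the step just after x (height h) and just before x (height i/n).
        gap = max(gap, abs(h - f), abs(i / n - f))
    return gap


if __name__ == "__main__":
    sample = ev_sample(2000, seed=42)
    print("largest gap between ECDF and model CDF:", max_cdf_gap(sample, ev_cdf))
```

A small gap suggests the sample is consistent with the assumed distribution; a plot of both curves shows the same information visually.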

### Cluster Analysis

Statistics Toolbox offers multiple algorithms to analyze data using hierarchical clustering, k-means clustering, and Gaussian mixtures.

Two-component Gaussian mixture model fit to a mixture of bivariate Gaussians.

Output from applying a clustering algorithm to the same example.

Dendrogram plot showing a model with four clusters.

Cluster Analysis (Example)
Use k-means and hierarchical clustering to discover natural groupings in data.
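
To make the idea of k-means clustering concrete, here is a minimal sketch in plain Python. It is not the toolbox's algorithm or API: the function names are hypothetical, and the farthest-point initialization is an assumption chosen only so that the result is deterministic. The loop alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point farthest
    from all centroids chosen so far (an illustrative, deterministic choice)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iteration: assign points, update centroids, repeat."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are final
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    labels, centroids = kmeans(data, k=2)
    print(labels)
    print(centroids)
```

K-means can converge to a poor local optimum depending on the starting centroids, which is why practical implementations usually offer several initialization strategies or repeated runs.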

### Descriptive Statistics

Descriptive statistics enable you to understand and describe potentially large sets of data quickly. Statistics Toolbox includes functions for calculating:

• Measures of central tendency (measures of location), including the arithmetic mean, median, and other means (geometric, harmonic, and trimmed)
• Measures of dispersion (measures of spread), including range, variance, standard deviation, and mean or median absolute deviation
• Linear and rank correlation (partial and full)
• Statistics computed from data that contain missing values
• Percentile and quartile estimates
• Density estimates using a kernel-smoothing function

These functions help you summarize values in a data sample using a few highly relevant numbers.
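
As an illustration of the quantities listed above, the following sketch computes several of them with Python's standard `statistics` module. It is not toolbox code: the `summarize` function is hypothetical, missing values are assumed to be encoded as NaN and are simply dropped, and the quartile values depend on the interpolation convention, which varies between packages.

```python
import math
import statistics


def summarize(values):
    """Summarize a sample, ignoring missing values encoded as NaN."""
    clean = sorted(v for v in values if not math.isnan(v))
    center = statistics.fmean(clean)
    middle = statistics.median(clean)
    return {
        "n": len(clean),
        # Measures of central tendency
        "mean": center,
        "median": middle,
        "geometric_mean": statistics.geometric_mean(clean),  # needs positive data
        # Measures of dispersion
        "range": clean[-1] - clean[0],
        "variance": statistics.variance(clean),  # sample variance (n - 1)
        "std": statistics.stdev(clean),
        "mean_abs_dev": statistics.fmean([abs(v - center) for v in clean]),
        "median_abs_dev": statistics.median([abs(v - middle) for v in clean]),
        # Quartile estimates (Python's default "exclusive" method)
        "quartiles": statistics.quantiles(clean, n=4),
    }


if __name__ == "__main__":
    sample = [2.0, 4.0, 4.0, float("nan"), 5.0, 7.0, 9.0, 4.0, 5.0]
    for name, value in summarize(sample).items():
        print(name, value)
```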

In some cases, estimating summary statistics using parametric methods is not possible. To deal with these cases, Statistics Toolbox provides resampling techniques, including:

• Generalized bootstrap function for estimating sample statistics using resampling
• Jackknife function for estimating sample statistics using subsets of the data
• `bootci` function for estimating confidence intervals
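
The resampling ideas above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the toolbox implementation: the function names are hypothetical, and the interval shown is a simple percentile interval, which may differ from the interval type that `bootci` computes.

```python
import math
import random
import statistics


def bootstrap(data, stat, n_boot=1000, seed=0):
    """Evaluate stat on n_boot resamples drawn with replacement from data."""
    rng = random.Random(seed)
    n = len(data)
    return [stat([rng.choice(data) for _ in range(n)]) for _ in range(n_boot)]


def jackknife(data, stat):
    """Evaluate stat on each leave-one-out subset of data."""
    return [stat(data[:i] + data[i + 1:]) for i in range(len(data))]


def jackknife_se(data, stat):
    """Jackknife estimate of the standard error of stat."""
    reps = jackknife(data, stat)
    n = len(reps)
    center = statistics.fmean(reps)
    return math.sqrt((n - 1) / n * sum((r - center) ** 2 for r in reps))


def percentile_ci(replicates, alpha=0.05):
    """Percentile confidence interval taken from sorted bootstrap replicates."""
    ordered = sorted(replicates)
    last = len(ordered) - 1
    lo_index = min(last, max(0, round(alpha / 2 * len(ordered))))
    hi_index = min(last, max(0, round((1 - alpha / 2) * len(ordered)) - 1))
    return ordered[lo_index], ordered[hi_index]


if __name__ == "__main__":
    sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    reps = bootstrap(sample, statistics.fmean, n_boot=1000, seed=1)
    print("bootstrap 95% interval for the mean:", percentile_ci(reps))
    print("jackknife standard error of the mean:", jackknife_se(sample, statistics.fmean))
```

For the sample mean, the jackknife standard error coincides with the familiar `s / sqrt(n)` formula, which makes a convenient sanity check before applying the same functions to statistics that have no closed-form error estimate.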