Large, high-dimensional data sets are common in the modern era of computer-based instrumentation and electronic data storage. High-dimensional data present many more challenges for statistical visualization, analysis, and modeling than low-dimensional data do. Direct visualization, of course, is impractical beyond a few dimensions. As a result, pattern recognition, data preprocessing, and model selection must rely heavily on numerical methods.

A fundamental challenge in high-dimensional data analysis is the so-called curse of dimensionality. Observations in a high-dimensional space are necessarily sparser and less representative than those in a low-dimensional space. In higher dimensions, data over-represent the edges of a sampling distribution, because regions of higher-dimensional space contain the majority of their volume near the surface.
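The claim about volume near the surface can be checked numerically. The sketch below is only an illustration: it assumes a unit hypercube and a shell thickness of 0.05, both arbitrary choices, and the variable names (eps_shell, shellFraction, mcFraction) are made up for this example. The fraction of the cube's volume within eps_shell of a face is 1 - (1 - 2*eps_shell)^d, which approaches 1 as d grows.

```matlab
% Fraction of a unit hypercube's volume that lies within a thin shell
% next to its surface, for several dimensions d.
eps_shell = 0.05;                 % shell thickness (illustrative choice)
d = [1 2 3 10 50 100];            % dimensions to compare
shellFraction = 1 - (1 - 2*eps_shell).^d;   % analytic value for each d

% Monte Carlo check for d = 10: draw uniform points in the cube and
% count those with at least one coordinate within eps_shell of a face.
rng(0);                           % fixed seed for repeatability
n = 1e5;                          % number of sample points
P = rand(n, 10);
nearSurface = any(P < eps_shell | P > 1 - eps_shell, 2);
mcFraction = mean(nearSurface);

disp(table(d', shellFraction', 'VariableNames', {'d', 'shellFraction'}));
fprintf('Monte Carlo estimate for d = 10: %.3f\n', mcFraction);
```

With these settings, about 10% of the volume lies in the shell for d = 1, and almost all of it for d = 100. Uniformly sampled points in high dimensions are therefore mostly "edge" points.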

Often, many of the dimensions in a data set (the measured features) are not useful in producing a model. Features may be irrelevant or redundant. Regression and classification algorithms may require large amounts of storage and computation time to process raw data, and even if the algorithms are successful, the resulting models may contain an incomprehensibly large number of terms.

Because of these challenges, multivariate statistical methods often begin with some type of dimension reduction, in which data are approximated by points in a lower-dimensional space. Dimension reduction is the goal of the methods presented in this section. It often leads to simpler models and fewer measured variables, with consequent benefits when measurements are expensive and visualization is important. MATLAB provides functions that make this kind of multivariate analysis easier. One of them is mvregress, which supports the following syntaxes:

beta = mvregress(X,Y)

beta = mvregress(X,Y,Name,Value)

[beta,Sigma] = mvregress(___)

[beta,Sigma,E,CovB,logL] = mvregress(___)

beta = mvregress(X,Y) returns the estimated coefficients for a multivariate normal regression of the d-dimensional responses in Y on the design matrices in X.

beta = mvregress(X,Y,Name,Value) returns the estimated coefficients using additional options specified by one or more name-value pair arguments.

[beta,Sigma] = mvregress(___) also returns the estimated d-by-d variance-covariance matrix of Y, using any of the input arguments from the previous syntaxes.

[beta,Sigma,E,CovB,logL] = mvregress(___) also returns a matrix of residuals, E, the estimated variance-covariance matrix of the regression coefficients, CovB, and the value of the log likelihood objective function after the last iteration, logL.
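The syntaxes above can be tried on simulated data. The following is a minimal sketch, not a recommended workflow: the data, the coefficient matrix Btrue, and the noise level are invented for illustration, and it assumes the Statistics and Machine Learning Toolbox is installed. It passes a single n-by-p design matrix that is shared by all d responses. The 'algorithm' option and its 'cwls' value are taken from the mvregress documentation as best recalled, so check them against your release.

```matlab
% Simulated data: n observations, p = 3 predictors (including the
% intercept column), and d = 2 response dimensions.
rng(1);                              % fixed seed for repeatability
n = 200;
d = 2;
X = [ones(n, 1), randn(n, 2)];       % n-by-3 design matrix, same for all responses
Btrue = [1 -1; 2 0.5; -0.5 3];       % 3-by-2 coefficients (made up for this sketch)
Y = X*Btrue + 0.1*randn(n, d);       % responses with a little Gaussian noise

% Syntax 1: coefficients only.
beta = mvregress(X, Y);

% Syntax 2: a name-value pair, here selecting the estimation algorithm
% (covariance-weighted least squares).
betaCwls = mvregress(X, Y, 'algorithm', 'cwls');

% Syntaxes 3 and 4: request the additional outputs.
[betaAll, Sigma, E, CovB, logL] = mvregress(X, Y);

disp(beta);                          % should be close to Btrue
disp(Sigma);                         % d-by-d covariance estimate
fprintf('Largest absolute residual: %.3f\n', max(abs(E(:))));
fprintf('Log likelihood: %.2f\n', logL);
```

Because the noise is small relative to the coefficients, beta should land close to Btrue, and Sigma should be close to 0.01 times the 2-by-2 identity matrix. E has one row per observation and one column per response dimension.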