Linear Regression

Prepare Data

To begin fitting a regression, put your data into a form that fitting functions expect. All regression techniques begin with input data in an array X and response data in a separate vector y, or input data in a table or dataset array tbl and response data as a column in tbl. Each row of the input data represents one observation. Each column represents one predictor (variable).

For a table or dataset array tbl, indicate the response variable with the 'ResponseVar' name-value pair:

mdl = fitlm(tbl,'ResponseVar','BloodPressure');

The response variable is the last column by default.
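As a minimal sketch of overriding that default, the following assumes the carsmall sample data set that ships with Statistics and Machine Learning Toolbox (not the blood-pressure table referred to above) and deliberately places the response in the first column:

```matlab
load carsmall                          % sample data; an assumption for illustration
tbl = table(MPG,Weight,Horsepower);    % response MPG is the first column, not the last

% Without 'ResponseVar', fitlm would treat Horsepower (the last column) as the response.
mdl = fitlm(tbl,'ResponseVar','MPG');
disp(mdl.ResponseName)                 % name of the variable used as the response
```

The ResponseName property of the fitted model is a quick way to confirm which column was used.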

You can use numeric categorical predictors. A categorical predictor is one that takes values from a fixed set of possibilities.

  • For a numeric array X, indicate the categorical predictors using the 'CategoricalVars' name-value pair. For example, to indicate that predictors 2 and 3 out of six are categorical:

    mdl = fitlm(X,y,'CategoricalVars',[2,3]);
    % or equivalently
    mdl = fitlm(X,y,'CategoricalVars',logical([0 1 1 0 0 0]));

  • For a table or dataset array tbl, fitting functions assume that these data types are categorical:

    • Logical vector

    • Categorical vector

    • Character array

    • String array

    If you want to indicate that a numeric predictor is categorical, use the 'CategoricalVars' name-value pair.
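To illustrate what marking a numeric predictor as categorical changes, here is a small sketch. It assumes the carsmall sample data and uses the full option name 'CategoricalVars'; Model_Year is numeric but takes only a few distinct values.

```matlab
load carsmall                      % sample data, assumed available
X = [Weight Model_Year];           % column 2 is numeric but has few distinct values

mdlNumeric = fitlm(X,MPG);                       % Model_Year treated as continuous
mdlCateg   = fitlm(X,MPG,'CategoricalVars',2);   % Model_Year treated as categorical

% A categorical predictor with k levels contributes k-1 indicator coefficients,
% so the second model has more coefficients than the first.
disp(mdlNumeric.CoefficientNames)
disp(mdlCateg.CoefficientNames)
```

Comparing the coefficient names shows one slope for Model_Year in the first model and one indicator coefficient per non-reference level in the second.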

Represent missing numeric data as NaN. To represent missing data for other data types, see Missing Group Values.
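The following sketch, again assuming the carsmall sample data (which contains some NaN entries), shows how to check how many rows a fit dropped because of missing values:

```matlab
load carsmall                      % sample data, assumed available
X = [Weight Horsepower];
mdl = fitlm(X,MPG);                % rows with NaN in X or MPG are not used in the fit

isMissing = mdl.ObservationInfo.Missing;   % logical flag per input row
fprintf('Rows supplied: %d, rows with missing values: %d, rows used: %d\n', ...
    numel(MPG), sum(isMissing), mdl.NumObservations);
```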

Dataset Array for Input and Response Data

To create a dataset array from an Excel® spreadsheet:

ds = dataset('XLSFile','hospital.xls', ...
    'ReadObsNames',true);

To create a dataset array from workspace variables:

load carsmall
ds = dataset(MPG,Weight);
ds.Year = categorical(Model_Year);

Table for Input and Response Data

To create a table from an Excel spreadsheet:

tbl = readtable('hospital.xls', ...
    'ReadRowNames',true);

To create a table from workspace variables:

load carsmall
tbl = table(MPG,Weight);
tbl.Year = categorical(Model_Year);

Numeric Matrix for Input Data, Numeric Vector for Response

For example, to create numeric arrays from workspace variables:

load carsmall
X = [Weight Horsepower Cylinders Model_Year];
y = MPG;

To create numeric arrays from an Excel spreadsheet:

[X, Xnames] = xlsread('hospital.xls');
y = X(:,4); % response y is systolic pressure
X(:,4) = []; % remove y from the X matrix

Notice that the nonnumeric entries, such as sex, do not appear in X.

Choose a Fitting Method

There are three ways to fit a model to data:

  • Least-Squares Fit

  • Robust Fit

  • Stepwise Fit

Least-Squares Fit

Use fitlm to construct a least-squares fit of a model to the data. This method is best when you are reasonably certain of the model’s form, and mainly need to find its parameters. This method is also useful when you want to explore a few models. The method requires you to examine the data manually to discard outliers, though there are techniques to help (see Examine Quality and Adjust Fitted Model).

Robust Fit

Use fitlm with the RobustOpts name-value pair to create a model that is little affected by outliers. Robust fitting saves you the trouble of manually discarding outliers. However, step does not work with robust fitting. This means that when you use robust fitting, you cannot search stepwise for a good model.
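As a rough illustration of the difference, the sketch below fits both ways to synthetic data with one injected outlier. The data, the seed, and the variable names are assumptions made for this example only.

```matlab
rng(0)                                  % fixed seed so the sketch is repeatable
x = (1:20)';
y = 3 + 2*x + 0.5*randn(20,1);          % underlying slope is 2
y(20) = 0;                              % inject one gross outlier

mdlLS     = fitlm(x,y);                     % ordinary least squares
mdlRobust = fitlm(x,y,'RobustOpts','on');   % robust fit with default options

fprintf('Least-squares slope: %.3f\n', mdlLS.Coefficients.Estimate(2));
fprintf('Robust slope:        %.3f\n', mdlRobust.Coefficients.Estimate(2));
```

The robust estimate should sit closer to the underlying slope because the outlier is down-weighted rather than fitted.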

Stepwise Fit

Use stepwiselm to find a model, and fit parameters to the model. stepwiselm starts from one model, such as a constant, and adds or subtracts terms one at a time, choosing an optimal term each time in a greedy fashion, until it cannot improve further. Use stepwise fitting to find a good model, which is one that has only relevant terms.

The result depends on the starting model. Usually, starting with a constant model leads to a small model. Starting with more terms can lead to a more complex model, but one that has lower mean squared error. See Compare large and small stepwise models.

You cannot use robust options along with stepwise fitting. So after a stepwise fit, examine your model for outliers (see Examine Quality and Adjust Fitted Model).
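To see how the starting model can affect the result, the following sketch runs stepwiselm twice with the same upper bound but different starting models. It assumes the carsmall sample data; the choice of predictors is arbitrary.

```matlab
load carsmall                                      % sample data, assumed available
tbl = table(Weight,Horsepower,Acceleration,MPG);   % MPG is last, so it is the response

% Same upper bound, different starting models; 'Verbose',0 suppresses the step log.
mdlFromConst = stepwiselm(tbl,'constant','Upper','quadratic','Verbose',0);
mdlFromQuad  = stepwiselm(tbl,'quadratic','Upper','quadratic','Verbose',0);

fprintf('Start ''constant'':  %d coefficients, MSE %.3f\n', ...
    mdlFromConst.NumCoefficients, mdlFromConst.MSE);
fprintf('Start ''quadratic'': %d coefficients, MSE %.3f\n', ...
    mdlFromQuad.NumCoefficients, mdlFromQuad.MSE);
```

Whether the two searches end at the same model depends on the data, so compare the printed sizes and errors rather than assuming a particular outcome.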

Choose a Model or Range of Models

There are several ways of specifying a model for linear regression. Use whichever you find most convenient.

  • Brief Name

  • Terms Matrix

  • Formula

For fitlm, the model specification you give is the model that is fit. If you do not give a model specification, the default is 'linear'.

For stepwiselm, the model specification you give is the starting model, which the stepwise procedure tries to improve. If you do not give a model specification, the default starting model is 'constant', and the default upper bounding model is 'interactions'. Change the upper bounding model using the Upper name-value pair.

Note

There are other ways of selecting models, such as using lasso, lassoglm, sequentialfs, or plsregress.

Brief Name

Name             Model Type
'constant'       Model contains only a constant (intercept) term.
'linear'         Model contains an intercept and linear terms for each predictor.
'interactions'   Model contains an intercept, linear terms, and all products of pairs of distinct predictors (no squared terms).
'purequadratic'  Model contains an intercept, linear terms, and squared terms.
'quadratic'      Model contains an intercept, linear terms, interactions, and squared terms.
'polyijk'        Model is a polynomial with all terms up to degree i in the first predictor, degree j in the second predictor, etc. Use numerals 0 through 9. For example, 'poly2111' has a constant plus all linear and product terms, and also contains terms with predictor 1 squared.

For example, to specify an interaction model using fitlm with matrix predictors:

mdl = fitlm(X,y,'interactions');

To specify a model using stepwiselm and a table or dataset array tbl of predictors, suppose you want to start from a constant and have a linear model upper bound. Assume the response variable in tbl is in the third column.

mdl2 = stepwiselm(tbl,'constant', ...
    'Upper','linear','ResponseVar',3);
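One way to get a feel for the brief names is to count the coefficients each one produces. The sketch below assumes the carsmall sample data and two predictors; with p = 2 the counts should be 1, 3, 4, 5, and 6.

```matlab
load carsmall                      % sample data, assumed available
X = [Weight Horsepower];           % two predictors

names = {'constant','linear','interactions','purequadratic','quadratic'};
nCoef = zeros(size(names));
for k = 1:numel(names)
    mdl = fitlm(X,MPG,names{k});   % fit the model given by the brief name
    nCoef(k) = mdl.NumCoefficients;
    fprintf('%-15s %d coefficients\n', names{k}, nCoef(k));
end
```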

Terms Matrix

A terms matrix T is a t-by-(p + 1) matrix specifying terms in a model, where t is the number of terms, p is the number of predictor variables, and +1 accounts for the response variable. The value of T(i,j) is the exponent of variable j in term i.

For example, suppose that an input includes three predictor variables x1, x2, and x3 and the response variable y in the order x1, x2, x3, and y. Each row of T represents one term:

  • [0 0 0 0] — Constant term or intercept

  • [0 1 0 0] — x2; equivalently, x1^0 * x2^1 * x3^0

  • [1 0 1 0] — x1*x3

  • [2 0 0 0] — x1^2

  • [0 1 2 0] — x2*(x3^2)

 

The 0 at the end of each term represents the response variable. In general, a column vector of zeros in a terms matrix represents the position of the response variable. If you have the predictor and response variables in a matrix and column vector, then you must include 0 for the response variable in the last column of each row.
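The rows listed above can be stacked into a terms matrix and passed to fitlm in place of a brief name. This is a sketch only: it assumes the carsmall sample data, and the choice of which columns play the roles of x1, x2, and x3 is arbitrary.

```matlab
load carsmall                              % sample data, assumed available
X = [Weight Horsepower Acceleration];      % x1, x2, x3

% One row per term; the last column is the response and is always 0.
T = [0 0 0 0     % intercept
     0 1 0 0     % x2
     1 0 1 0     % x1*x3
     2 0 0 0     % x1^2
     0 1 2 0];   % x2*(x3^2)

mdl = fitlm(X,MPG,T);
disp(mdl.CoefficientNames)                 % one coefficient per row of T
```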

Formula

A formula for a model specification is a character vector or string scalar of the form

'y ~ terms'

where

  • y is the response name.

  • terms contains

    • Variable names

    • + to include the next variable

    • - to exclude the next variable

    • : to define an interaction, a product of terms

    • * to define an interaction and all lower-order terms

    • ^ to raise the predictor to a power, exactly as in * repeated, so ^ includes lower order terms as well

    • () to group terms

Tip

Formulas include a constant (intercept) term by default. To exclude a constant term from the model, include -1 in the formula.

Examples:

'y ~ x1 + x2 + x3' is a three-variable linear model with intercept.
'y ~ x1 + x2 + x3 - 1' is a three-variable linear model without intercept.
'y ~ x1 + x2 + x3 + x2^2' is a three-variable model with intercept and an x2^2 term.
'y ~ x1 + x2^2 + x3' is the same as the previous example, since x2^2 includes an x2 term.
'y ~ x1 + x2 + x3 + x1:x2' includes an x1*x2 term.
'y ~ x1*x2 + x3' is the same as the previous example, since x1*x2 = x1 + x2 + x1:x2.
'y ~ x1*x2*x3 - x1:x2:x3' has all interactions among x1, x2, and x3, except the three-way interaction.
'y ~ x1*(x2 + x3 + x4)' has all linear terms, plus products of x1 with each of the other variables.

For example, to specify an interaction model using fitlm with matrix predictors:

mdl = fitlm(X,y,'y ~ x1*x2*x3 - x1:x2:x3');

To specify a model using stepwiselm and a table or dataset array tbl of predictors, suppose you want to start from a constant and have a linear model upper bound. Assume the response variable in tbl is named 'y', and the predictor variables are named 'x1', 'x2', and 'x3'.

mdl2 = stepwiselm(tbl,'y ~ 1','Upper','y ~ x1 + x2 + x3');
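The equivalences listed in the examples can be checked directly by comparing the coefficient names of the fitted models. The sketch below assumes the carsmall sample data; with matrix input the variables take the default names x1, x2, x3, and y.

```matlab
load carsmall                              % sample data, assumed available
X = [Weight Horsepower Acceleration];      % default names x1, x2, x3; response y

mdlExplicit = fitlm(X,MPG,'y ~ x1 + x2 + x3 + x1:x2');
mdlStar     = fitlm(X,MPG,'y ~ x1*x2 + x3');        % x1*x2 expands to x1 + x2 + x1:x2
mdlNoInt    = fitlm(X,MPG,'y ~ x1 + x2 + x3 - 1');  % -1 drops the intercept

% The first two formulas should produce the same set of terms.
sameTerms = isequal(sort(mdlExplicit.CoefficientNames), ...
                    sort(mdlStar.CoefficientNames));
hasIntercept = any(strcmp(mdlNoInt.CoefficientNames,'(Intercept)'));
fprintf('Same terms: %d, intercept present after -1: %d\n', sameTerms, hasIntercept);
```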

Fit Model to Data

The most common optional arguments for fitting:

  • For robust regression in fitlm, set the 'RobustOpts' name-value pair to 'on'.

  • In stepwiselm, specify an appropriate upper bound model with the 'Upper' name-value pair, for example 'Upper','linear'.

  • Indicate which variables are categorical using the 'CategoricalVars' name-value pair. Provide a vector with column numbers, such as [1 6] to specify that predictors 1 and 6 are categorical. Alternatively, give a logical vector the same length as the data columns, with a 1 entry indicating that variable is categorical. If there are seven predictors, and predictors 1 and 6 are categorical, specify logical([1,0,0,0,0,1,0]).

  • For a table or dataset array, specify the response variable using the 'ResponseVar' name-value pair. The default is the last column in the array.

For example,

mdl = fitlm(X,y,'linear', ...
    'RobustOpts','on','CategoricalVars',3);
mdl2 = stepwiselm(tbl,'constant', ...
    'ResponseVar','MPG','Upper','quadratic');
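For a version of these calls that runs on its own, the sketch below combines several of the options on a table. It assumes the carsmall sample data; passing the categorical variable by name in a cell array is one of the accepted forms for tables.

```matlab
load carsmall                                      % sample data, assumed available
tbl = table(MPG,Weight,Horsepower,Model_Year);     % response is not the last column

mdl = fitlm(tbl,'linear', ...
    'ResponseVar','MPG', ...                       % pick the response by name
    'CategoricalVars',{'Model_Year'}, ...          % numeric column treated as categorical
    'RobustOpts','on');                            % robust fit with default options

disp(mdl.Formula)                                  % model that was actually fit
disp(mdl.VariableInfo)                             % includes an IsCategorical flag per variable
```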

 
