Train Generalized Additive Model for Binary Classification

This example shows how to train a generalized additive model (GAM) for binary classification with optimal parameters and how to assess the predictive performance of the trained model. The example first finds the optimal parameter values for a univariate GAM (parameters for linear terms) and then finds the values for a bivariate GAM (parameters for interaction terms). The example also explains how to interpret the trained model by examining the local effects of terms on a specific prediction and by computing the partial dependence of the predictions on predictors.

Load Sample Data

Load the 1994 census data stored in census1994.mat. The data set consists of demographic data from the US Census Bureau and is used to predict whether an individual makes over $50,000 per year. The classification task is to fit a model that predicts the salary category of people given their age, working class, education level, marital status, race, and so on.

load census1994

census1994 contains the training data set adultdata and the test data set adulttest. To reduce the running time for this example, subsample 500 training observations and 500 test observations by using the datasample function.

rng(1) % For reproducibility
NumSamples = 5e2;
adultdata = datasample(adultdata,NumSamples,'Replace',false);
adulttest = datasample(adulttest,NumSamples,'Replace',false);

Train GAM with Optimal Hyperparameters

Train a GAM with hyperparameters that minimize the cross-validation loss by using the OptimizeHyperparameters name-value argument.

You can specify OptimizeHyperparameters as 'auto' or 'all' to find optimal hyperparameter values for both univariate and bivariate parameters. Alternatively, you can find optimal values for univariate parameters using the 'auto-univariate' or 'all-univariate' option, and then find optimal values for bivariate parameters using the 'auto-bivariate' or 'all-bivariate' option. This example uses 'auto-univariate' and 'auto-bivariate'.
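For comparison, the single-call alternative mentioned above might look like the following sketch. It assumes the subsampled adultdata table from the previous step is in the workspace. The variable name Mdl_auto is hypothetical and is not used elsewhere in this example.

```matlab
% Sketch: optimize univariate and bivariate hyperparameters in one call.
% Assumes adultdata is the subsampled training table created earlier.
% Mdl_auto is a hypothetical variable name used only in this sketch.
rng(1) % For reproducibility
Mdl_auto = fitcgam(adultdata,'salary','Weights','fnlwgt', ...
    'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions',struct( ...
    'AcquisitionFunctionName','expected-improvement-plus', ...
    'ShowPlots',false,'Verbose',0));

% Inspect the hyperparameter values at the best estimated feasible point
Mdl_auto.HyperparameterOptimizationResults.XAtMinEstimatedObjective
```

Searching all hyperparameters jointly explores a larger space in one optimization run, whereas the two-stage approach used in this example fixes the univariate values before tuning the interaction terms.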

Train a univariate GAM. Specify OptimizeHyperparameters as 'auto-univariate' so that fitcgam finds optimal values of the InitialLearnRateForPredictors and NumTreesPerPredictor name-value arguments. For reproducibility, use the 'expected-improvement-plus' acquisition function. Specify ShowPlots as false and Verbose as 0 to disable plot and message displays, respectively.

Mdl_univariate = fitcgam(adultdata,'salary','Weights','fnlwgt', ...
    'OptimizeHyperparameters','auto-univariate', ...
    'HyperparameterOptimizationOptions',struct('AcquisitionFunctionName','expected-improvement-plus', ...
    'ShowPlots',false,'Verbose',0))
Mdl_univariate = 
  ClassificationGAM
                       PredictorNames: {'age'  'workClass'  'education'  'education_num'  'marital_status'  'occupation'  'relationship'  'race'  'sex'  'capital_gain'  'capital_loss'  'hours_per_week'  'native_country'}
                         ResponseName: 'salary'
                CategoricalPredictors: [2 3 5 6 7 8 9 13]
                           ClassNames: [<=50K    >50K]
                       ScoreTransform: 'logit'
                            Intercept: -1.3118
                      NumObservations: 500
    HyperparameterOptimizationResults: [1×1 BayesianOptimization]


  Properties, Methods

fitcgam returns a ClassificationGAM model object that uses the best estimated feasible point. The best estimated feasible point indicates the set of hyperparameters that minimizes the upper confidence bound of the objective function value based on the underlying objective function model of the Bayesian optimization process. You can obtain the best point from the HyperparameterOptimizationResults property or by using the bestPoint function.

x = Mdl_univariate.HyperparameterOptimizationResults.XAtMinEstimatedObjective
x=1×2 table
    InitialLearnRateForPredictors    NumTreesPerPredictor
    _____________________________    ____________________

               0.02257                       118         

bestPoint(Mdl_univariate.HyperparameterOptimizationResults)
ans=1×2 table
    InitialLearnRateForPredictors    NumTreesPerPredictor
    _____________________________    ____________________

               0.02257                       118         
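The best estimated feasible point is not necessarily the point with the lowest observed cross-validation loss. As a minimal sketch, assuming Mdl_univariate from the previous step is in the workspace, you can compare the two by passing a different Criterion value to bestPoint. The variable names results, xObserved, and lossObserved are illustrative.

```matlab
% Continues from the workspace above; Mdl_univariate must already exist.
results = Mdl_univariate.HyperparameterOptimizationResults;

% Hyperparameters with the lowest observed cross-validation loss,
% together with that loss value
[xObserved,lossObserved] = bestPoint(results,'Criterion','min-observed')

% The same information is stored as properties of the results object
results.XAtMinObjective
results.MinObjective
```

If the observed and estimated best points differ noticeably, the objective function model is smoothing over noise in the cross-validation loss, which is one reason fitcgam uses the estimated point by default.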

 

Train a bivariate GAM. Specify OptimizeHyperparameters as 'auto-bivariate' so that fitcgam finds optimal values of the Interactions, InitialLearnRateForInteractions, and NumTreesPerInteraction name-value arguments. Use the univariate parameter values in x so that the software finds optimal parameter values for interaction terms based on the x values.

Mdl = fitcgam(adultdata,'salary','Weights','fnlwgt', ...
    'InitialLearnRateForPredictors',x.InitialLearnRateForPredictors, ...
    'NumTreesPerPredictor',x.NumTreesPerPredictor, ...
    'OptimizeHyperparameters','auto-bivariate', ...
    'HyperparameterOptimizationOptions',struct('AcquisitionFunctionName','expected-improvement-plus', ...
    'ShowPlots',false,'Verbose',0))
Mdl = 
  ClassificationGAM
                       PredictorNames: {'age'  'workClass'  'education'  'education_num'  'marital_status'  'occupation'  'relationship'  'race'  'sex'  'capital_gain'  'capital_loss'  'hours_per_week'  'native_country'}
                         ResponseName: 'salary'
                CategoricalPredictors: [2 3 5 6 7 8 9 13]
                           ClassNames: [<=50K    >50K]
                       ScoreTransform: 'logit'
                            Intercept: -1.4587
                         Interactions: [6×2 double]
                      NumObservations: 500
    HyperparameterOptimizationResults: [1×1 BayesianOptimization]


  Properties, Methods

Display the optimal bivariate hyperparameters.

Mdl.HyperparameterOptimizationResults.XAtMinEstimatedObjective
ans=1×3 table
    Interactions    InitialLearnRateForInteractions    NumTreesPerInteraction
    ____________    _______________________________    ______________________

         6                     0.0061954                        422          

The model display of Mdl shows a partial list of the model properties. To view the full list of the model properties, double-click the variable name Mdl in the Workspace. The Variables editor opens for Mdl. Alternatively, you can display the properties in the Command Window by using dot notation. For example, display the ReasonForTermination property.

Mdl.ReasonForTermination
ans = struct with fields:
      PredictorTrees: 'Terminated after training the requested number of trees.'
    InteractionTrees: 'Terminated after training the requested number of trees.'

You can use the ReasonForTermination property to determine whether the trained model contains the specified number of trees for each linear term and each interaction term.
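One way to perform this check programmatically is to compare both fields against the message shown in the output above. This is only a sketch: it assumes Mdl is in the workspace and that the message text matches the displayed string exactly, which might not hold in other releases. The variable names reason, expected, and allTreesTrained are illustrative.

```matlab
% Continues from the workspace above; Mdl must already exist.
reason = Mdl.ReasonForTermination;

% Message copied from the displayed output (assumed to match exactly)
expected = 'Terminated after training the requested number of trees.';

% True only if both the linear terms and the interaction terms
% were trained with the requested number of trees
allTreesTrained = strcmp(reason.PredictorTrees,expected) && ...
    strcmp(reason.InteractionTrees,expected)
```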

Display the interaction terms in Mdl.

Mdl.Interactions
ans = 6×2

     5    12
     1     6
     6    12
     1    12
     7     9
     2     6

Each row of Interactions represents one interaction term and contains the column indexes of the predictor variables for the interaction term. You can use the Interactions property to check the interaction terms in the model and the order in which fitcgam adds them to the model.

Display the interaction terms in Mdl using the predictor names.

Mdl.PredictorNames(Mdl.Interactions)
ans = 6×2 cell
    {'marital_status'}    {'hours_per_week'}
    {'age'           }    {'occupation'    }
    {'occupation'    }    {'hours_per_week'}
    {'age'           }    {'hours_per_week'}
    {'relationship'  }    {'sex'           }
    {'workClass'     }    {'occupation'    }
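The lookup works because indexing a vector of names with an index matrix returns an array with the shape of the index matrix. The following self-contained sketch reproduces the lookup with the predictor names and interaction indices copied from the output above, and then joins each row into a single label. The variable names predictorNames, interactions, names, and labels are illustrative.

```matlab
% Predictor names and interaction indices copied from the output above
predictorNames = {'age','workClass','education','education_num', ...
    'marital_status','occupation','relationship','race','sex', ...
    'capital_gain','capital_loss','hours_per_week','native_country'};
interactions = [5 12; 1 6; 6 12; 1 12; 7 9; 2 6];

% Indexing the 1-by-13 name list with the 6-by-2 index matrix
% returns a 6-by-2 cell array of names
names = predictorNames(interactions);

% Join the two names in each row into one label per interaction term
labels = string(names(:,1)) + " x " + string(names(:,2))
```

Labels of this form can be convenient as legend entries or table row names when you report the interaction terms.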

 
