Retain dummy variable labels from converting categorical to dummyvar

Dhruv Ghulati - 2022-04-30T14:16:48+00:00
Question: Retain dummy variable labels from converting categorical to dummyvar

Hi there,

I have 19 categorical columns which I have converted into being a number for each category. However, I want to increase the number of columns so that I have a dummy for each category. What I find is that I have no idea where the dummy variables have gone, which I need to make an interpretable solution e.g. if a user is from Thailand or not, that variable is significant in a logistic regression.

Here is my code:

%categoricalnbs is the number converted version for all the categorical
%variables. Some columns in that table have categories 1-200, some just
%have categories 1 to 20.
categoricalnbsarray = table2array(categoricalnbs);
% categoricalnbsarray = table2array(finalnbs(:,[9:26,28]));
%finalnbs keeps the actual category names, which I thought could help with
%generating the column labels for the dummyvars, but using that line
%doesn't help.
[~, ~, ugroupA] = unique(categoricalnbsarray(:,2));
dummyvars=dummyvar(ugroupA);
array2table(dummyvars);

This increases the columns in categoricalnbs from 19 to 200, and retains the same number of rows. But how do I interpret the output...
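One way to read the output of a snippet like the one in the question: the third output of unique indexes into its first output, so column k of the dummy matrix corresponds to the k-th sorted distinct code. The sketch below illustrates this with made-up codes; the variable names codes, vals, g and the 'Cat_' prefix are assumptions for illustration, and dummyvar requires the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical numeric codes standing in for one column of categoricalnbsarray
codes = [3; 7; 3; 12; 7];

% unique returns the sorted distinct codes (vals) and, as its third output,
% the position of each row's code within vals
[vals, ~, g] = unique(codes);

% Column k of D is the indicator for vals(k)
D = dummyvar(g);

% Name each dummy column after the code it represents (prefix is arbitrary)
names = strcat('Cat_', arrayfun(@num2str, vals', 'UniformOutput', false));
Tdummy = array2table(D, 'VariableNames', names);
disp(Tdummy)
```

Keeping the first output of unique is what makes the columns traceable; discarding it with ~ leaves no record of which code each column stands for.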

Expert Answer

Neeta Dsouza answered on 2025-11-20

I wrote a function that does this, here you go:

 

function Tdummy = dummytable(T)
% Tdummy = dummytable(T) - convert categorical variables in table to dummy
% variables
%
% This function takes the categorical variables in a table and converts
% them to separate dummy variables with intelligent names.  This way they
% can be used in the Classification Learner App and the variable names make
% sense for feature selection, etc.
%
% Usage:
%
%     Tdummy = dummytable(T)
%
% Inputs:
%
%     T:        Table with categoricals or categorical variable
%
% Outputs: 
%
%     Tdummy:   T with categorical variables turned into dummy variables with
%               intelligent names
%
% Example:
%
%        % Simple Table
%        T = table(rand(10,1),categorical(cellstr('rbbgbgbbgr'.')),...
%           'VariableNames',{'Percent','Color'});
%        disp(T)
% 
%        % Turn it into a dummy table 
%        Tdummy = dummytable(T);
%        disp(Tdummy)
%
% See Also: dummyvar, table, categorical, classificationLearner

% Copyright 2015 The MathWorks, Inc.
% Sean de Wolski Apr 13, 2014

      % Error checking
      narginchk(1,1)    
      validateattributes(T,{'categorical', 'table'},{},mfilename,'T',1);

      % If it's a categorical, do our best to convert it to a table with an
      % intelligent variable name
      if iscategorical(T)
          % Try to use existing variable name
          cname = inputname(1);
          if isempty(cname)
              % It's a MATLAB Expression, default to Var1
              cname = 'Var1';
          end
          T = table(T,'VariableNames',{cname});
      end 

      % Identify categoricals and their names
      cats = varfun(@iscategorical,T,'OutputFormat','uniform');

      % Short circuit if there are no categoricals
      if ~any(cats)
          Tdummy = T;
          return
      end            

      % Store everything in a cell.  w will be the total width of the table
      % with each variable dummyvar'd
      w = nnz(~cats)+sum(varfun(@(x)numel(categories(x)),T(:,cats),'OutputFormat','uniform'));

      % Preallocate storage
      datastorage = cell(1,w);
      namestorage = cell(1,w);

      % Engine
      idx = 0; % Start nowhere in cell
      for ii = 1:width(T)
          idx = idx+1;
          % Loop over table deciding what to do with each variable
          if cats(ii)
              % It's a categorical,
              % Extract it, keep its categories, and build the dummy variables
              Tii = T{:,ii};
              categoriesii = categories(Tii)';
              ncatii = numel(categoriesii); % How many?

              % Build dummy variables as a row cell, one dummy column per cell
              dvii = num2cell(dummyvar(Tii), 1); % Dummy var then cell                                    

              % Build names
              namesii = strcat(T.Properties.VariableNames{ii}, '_', categoriesii);

              % Insert
              datastorage(idx:(idx+ncatii-1)) = dvii;
              namestorage(idx:(idx+ncatii-1)) = namesii;

              % Increment
              idx = idx+ncatii-1;                        

          else
              % Extract non categorical into current storage location
              datastorage{idx} = T{:,ii};
              namestorage(idx) = T.Properties.VariableNames(ii);
          end
      end

      % Build Tdummy with comma separated list expansion
      Tdummy = table(datastorage{:},'VariableNames',matlab.lang.makeValidName(namestorage));

end
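
As a rough illustration of the naming step the function performs for each categorical variable, the sketch below handles a single column by hand. The country data and the 'Country_' prefix are hypothetical, chosen to mirror the Thailand example in the question; dummyvar requires the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical categorical column (not the asker's real data)
country = categorical({'Thailand'; 'India'; 'Thailand'; 'Brazil'; 'India'});

% dummyvar produces one indicator column per category, in the same
% order that categories() lists them
D = dummyvar(country);
cats = categories(country);

% Combine the variable name and category name, then make the result
% a valid MATLAB identifier
names = matlab.lang.makeValidName(strcat('Country_', cats'));

Tdummy = array2table(D, 'VariableNames', names);
disp(Tdummy)
```

With names of this form, a coefficient reported for Country_Thailand in a fitted model can be traced directly back to the category it represents.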

 

