Patrick Schlegel asked . 2022-04-04

Feature Selection in TreeBagger

Hello Matlabsolutions community
 
I'm currently working with the TreeBagger class to generate some classification tree ensembles. Now I would like to know how it decides which features are used for splitting the data. For example, if I create an ensemble of 5000 tree stumps and use it to classify a dataset with two features (e.g. VRQL value and maximum frequency), and then check which feature was selected for the split in every single tree like this:
 
 
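% Collect the predictor used for the first (root) split of every tree.
numTrees = numel(Random_Forest_Model.Trees);
cellArray = cell(1, numTrees);
for y = 1:numTrees
    cellArray{y} = Random_Forest_Model.Trees{y}.CutPredictor{1};
end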
In some cases only one feature is selected for all 5000 trees and the other feature is never selected (i.e. cellArray looks like this: {'x2', 'x2', 'x2', ..., 'x2'}). The same can happen with more than two features: only one is selected and the others are ignored.
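
To put numbers on how lopsided the selection is, the collected names can be tallied. A minimal sketch, using a short made-up cellArray as a stand-in for the one built from the real model:

```matlab
% Hypothetical stand-in for the cell array collected from the ensemble.
cellArray = {'x2', 'x2', 'x1', 'x2', 'x2'};

% unique returns the distinct names and, for every entry, the index of its name.
[featureNames, ~, idx] = unique(cellArray);

% Count how many trees split on each feature.
counts = accumarray(idx(:), 1);

for k = 1:numel(featureNames)
    fprintf('%s: %d of %d trees\n', featureNames{k}, counts(k), numel(cellArray));
end
```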
 
Some possibly important details about the dataset:
 
-One feature takes values from 1 to 100, the other from about 200 to 1200
-The classes are imbalanced (class 1: 52 entries, class 2: over 300 entries)
-Only the larger class contains NaNs
-Both features contain NaNs
 
My question now is: how can I get TreeBagger to use all features for classification and not only one, or, more generally, how can I achieve a more balanced selection of features?

classification , treebagger , splits

Expert Answer

John Williams answered . 2024-05-17 21:17:30

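For classification, TreeBagger by default samples the square root of the number of features (rounded up) as candidates at each split. Why this number specifically? I don't know...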
 
But why is it important to take a subset of the features rather than the whole set? Because if you always consider the same features (say, the whole set), you will get highly correlated decision trees, and averaging them will not cancel out their inherently high variance.
 
I believe the features are sampled uniformly, which means that if you have many trees, all features should be represented approximately equally across the trees.
 
However, in your case the subset has the same size as the original feature set (ceil(sqrt(2)) = 2). Once the candidate set is selected, a criterion is used to decide which feature the split should be based on. The criterion can be the Gini index or information gain (entropy).
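
The arithmetic behind that subset size, as a small sketch. It assumes the ceil(sqrt(p)) rule mentioned above; the predictor counts are arbitrary illustrative values:

```matlab
% Number of predictors in a few hypothetical datasets.
numPredictors = [2 4 10 100];

% Assumed classification default: ceil(sqrt(p)) predictors sampled per split.
numSampled = ceil(sqrt(numPredictors));

for k = 1:numel(numPredictors)
    fprintf('p = %3d -> %2d predictors sampled per split\n', ...
        numPredictors(k), numSampled(k));
end
```

Only for very small p (such as your p = 2) does the sampled subset coincide with the full feature set.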
 
So my guess is that, since you always end up with the whole set of features and the same criterion is used every time to choose between them, you always end up with the same feature and the other one is excluded.
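
If that guess is right, one thing to try is lowering the number of predictors sampled per split to 1 with the 'NumPredictorsToSample' option, so that each split only sees one randomly drawn feature. A minimal sketch on synthetic data; the data, labels and variable names are made up for illustration, and it has not been tried on your dataset:

```matlab
% Requires Statistics and Machine Learning Toolbox.
rng(1);                                        % reproducible synthetic data
n = 400;
X = [1 + 99*rand(n,1), 200 + 1000*rand(n,1)];  % two features on different scales
Y = double(X(:,1)/100 + X(:,2)/1200 > 1);      % hypothetical labels using both features

% Sample one predictor per split instead of the default ceil(sqrt(2)) = 2.
model = TreeBagger(200, X, Y, 'Method', 'classification', ...
    'NumPredictorsToSample', 1);

% Predictor used for the root split of every tree, as in the question.
rootPredictor = cellfun(@(t) t.CutPredictor{1}, model.Trees, ...
    'UniformOutput', false);

% Tally how often each feature was used at the root.
[featureNames, ~, idx] = unique(rootPredictor);
counts = accumarray(idx(:), 1);
disp(table(featureNames(:), counts, 'VariableNames', {'Feature', 'Trees'}));
```

Whether a more balanced selection also gives better accuracy is a separate question: a feature that is never chosen may simply carry less information about the classes.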

