How you include or exclude variables in a strategy tree or decision tree depends on whether the tree is grown automatically or interactively.
If you’re growing a decision tree automatically, you can’t force a variable into the tree. You can set which variables are eligible to be used, but the growth algorithm is always free to leave out a variable that it doesn’t find predictive.
When you’re using the Insert Decision Tree Wizard, the third page of the wizard (the Split Report page) lists the candidate variables.
The “Include” column on the right of that page is where you want to look. In this example, the user has chosen not to include the variables Customer ID and fnlwgt, meaning that they will definitely not be in the tree. All the other variables may be in the tree.
Even after you’ve finished with the wizard, you can still change which variables are included or excluded by using the Attribute Editor, which is available from the Tools menu while you are in the tree view.
The Attribute Editor has an “Include” column, second from the left. The two variables that were excluded in the wizard (Customer ID and fnlwgt) are still excluded in this view. You can change the inclusion of any variable right in the Attribute Editor by clicking on its cell and choosing “yes” or “no” from the drop-down.
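The relationship between the Include setting and the algorithm’s own choice can be sketched in plain Python. This is a minimal illustration, not the product’s algorithm: the function names, the Gini measure, the `min_gain` threshold, and the toy data are all assumptions made for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def split_gain(rows, labels, var):
    """Impurity reduction from splitting the root on each distinct value of var."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[var], []).append(label)
    weighted = sum(len(g) / len(labels) * gini(g) for g in groups.values())
    return gini(labels) - weighted

def variables_used(rows, labels, include, min_gain=0.05):
    """Eligible variables that a (hypothetical) grower would keep at the root."""
    # Step 1: the analyst decides eligibility (the Include column).
    eligible = [v for v, flag in include.items() if flag]
    # Step 2: the algorithm may still drop eligible variables with too little gain.
    return [v for v in eligible if split_gain(rows, labels, v) >= min_gain]

# Toy data, made up for illustration only.
rows = [
    {"Customer ID": 1, "fnlwgt": 77516, "education": "BS", "sex": "M"},
    {"Customer ID": 2, "fnlwgt": 83311, "education": "BS", "sex": "F"},
    {"Customer ID": 3, "fnlwgt": 215646, "education": "HS", "sex": "M"},
    {"Customer ID": 4, "fnlwgt": 234721, "education": "HS", "sex": "F"},
]
labels = ["high", "high", "low", "low"]
include = {"Customer ID": False, "fnlwgt": False, "education": True, "sex": True}

used = variables_used(rows, labels, include)
print(used)  # ['education']
```

In this toy data, `sex` is eligible but is left out because it gives no gain, which mirrors the point that inclusion only makes a variable available. Switching Customer ID to included would make it look perfectly predictive here, because every ID is unique; that is a typical reason to exclude identifier columns.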
If, on the other hand, you’re growing a decision tree interactively, you are probably right-clicking on nodes and choosing Find Split… or Force Split… to insert a new split in the tree.
Choosing Find Split… has the algorithm automatically find the next best split for you according to whichever measure you chose when originally creating the tree (though you can always change this measure via the Tree Training tab in the preferences menu). However, if you want to include a specific split during tree growing, you can choose Force Split… to get a list of the variables available for the split.
From this list, you can choose whichever variable you like to include next. Once you choose a variable, you will have the option of how to bin it in the split. If you don’t want to choose the bins manually, you can select the Optimal Binning option in the wizard.
Of course, if you don’t want to use the Force Split… option but do want to keep a specific variable out of the tree, you can always go to the Attribute Editor and exclude it. You can also use options like Next Split or Go to Split to pass by a specific variable while growing the tree with Find Split…, without excluding that variable from the tree altogether.
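The difference between finding, skipping, and forcing a split can be sketched as a ranking over candidate variables. This is a rough illustration under stated assumptions: `find_split`, `force_split`, the `skip` parameter, the Gini measure, and the toy data are hypothetical and are not the product’s API or its actual split measure.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def gain(rows, labels, var):
    """Impurity reduction from splitting on each distinct value of var."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[var], []).append(label)
    weighted = sum(len(g) / len(labels) * gini(g) for g in groups.values())
    return gini(labels) - weighted

def ranked_splits(rows, labels, candidates):
    """Candidate variables sorted from highest to lowest gain."""
    return sorted(candidates, key=lambda v: gain(rows, labels, v), reverse=True)

def find_split(rows, labels, candidates, skip=0):
    """Best split by the measure; skip=1 passes over it and takes the next best."""
    return ranked_splits(rows, labels, candidates)[skip]

def force_split(candidates, var):
    """The analyst's choice is used regardless of how it ranks."""
    if var not in candidates:  # assumption: excluded variables cannot be forced
        raise ValueError(f"{var!r} is not an included variable")
    return var

# Toy data, made up for illustration only.
rows = [
    {"education": "BS", "hours": "full", "sex": "M"},
    {"education": "BS", "hours": "full", "sex": "F"},
    {"education": "HS", "hours": "full", "sex": "M"},
    {"education": "HS", "hours": "part", "sex": "F"},
]
labels = ["high", "high", "low", "low"]
candidates = ["sex", "hours", "education"]

best = find_split(rows, labels, candidates)
next_best = find_split(rows, labels, candidates, skip=1)
forced = force_split(candidates, "sex")
print(best, next_best, forced)  # education hours sex
```

Skipping leaves the passed-over variable in the candidate list, so it can still appear elsewhere in the tree, whereas removing it from `candidates` corresponds to excluding it altogether.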
As for Strategy Trees, the wizard doesn’t have a Split Report page, but everything else in the explanation above applies, including excluding variables via the Attribute Editor.