Method for readjusting the search space of tree-based algorithms so that no model run is started with a maxBins value below the cardinality of any nominal field in the data set. Tree split calculations require maxBins to be at least as large as the number of distinct values in every nominal field. If a field's cardinality exceeds maxBins, its values cannot be summarized into bins for the information gain (entropy) or Gini impurity calculation, and the split cannot be computed. Resetting the search space based on the data presented for modeling eliminates the possibility of searching an invalid region of the space.
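The readjustment described above can be sketched in plain Scala. This is a minimal illustration under stated assumptions, not the library's implementation: it assumes the search space is a map from hyperparameter name to `(min, max)` bounds (the `NumericMapping` alias below is a hypothetical stand-in), that the parameter is keyed as `"maxBins"`, and that the nominal-field cardinalities have already been computed. The helper `adjustMaxBins` is hypothetical.

```scala
object MaxBinsAdjustment {
  // Hypothetical stand-in: hyperparameter name -> (min, max) search bounds.
  type NumericMapping = Map[String, (Double, Double)]

  /** Raises the maxBins bounds so that no sampled value can fall below the
    * largest nominal-field cardinality. Other parameters are left untouched.
    */
  def adjustMaxBins(
      space: NumericMapping,
      cardinalities: Map[String, Long] // field name -> distinct value count
  ): NumericMapping = {
    if (cardinalities.isEmpty || !space.contains("maxBins")) space
    else {
      // maxBins must be at least the highest cardinality of any nominal field.
      val required = cardinalities.values.max.toDouble
      val (lo, hi) = space("maxBins")
      val newLo = math.max(lo, required)
      // Keep the range valid if the requirement exceeds the old upper bound.
      val newHi = math.max(hi, newLo)
      space.updated("maxBins", (newLo, newHi))
    }
  }
}
```

If the largest cardinality exceeds the original upper bound, this sketch raises the upper bound as well, which collapses the maxBins range to a single value; an alternative design would be to fail fast and report the offending field.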
DataFrame prepared for modeling
Fields to exclude from cardinality checks
Label field (excluded from the cardinality check)
Feature field (excluded from the cardinality check)
An updated NumericMapping for the model's search space (which holds the maxBins bounds for tree-based algorithms)
0.6.2
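The parameters above determine which fields enter the cardinality check: ignored fields, the label field, and the feature field are skipped, and every remaining nominal field is counted. A minimal sketch, assuming the nominal columns are held in memory as a hypothetical `Map` from field name to string values (standing in for the DataFrame) and using a hypothetical `nominalCardinalities` helper:

```scala
object CardinalityCheck {
  /** Counts distinct values per nominal field, skipping excluded fields.
    * In-memory columns are a hypothetical stand-in for the DataFrame.
    */
  def nominalCardinalities(
      columns: Map[String, Seq[String]],
      fieldsToIgnore: Set[String],
      labelCol: String,
      featureCol: String
  ): Map[String, Long] = {
    // The label and feature fields never take part in the check.
    val excluded = fieldsToIgnore + labelCol + featureCol
    columns.collect {
      case (name, values) if !excluded.contains(name) =>
        name -> values.distinct.size.toLong
    }
  }
}
```

On a real DataFrame the same counts could be obtained with a per-column aggregation such as `countDistinct`, and the resulting map fed into the maxBins adjustment.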