around medoids and balancing classes with 250 observations per
class, producing a total of 500 observations. The resulting set is
then used by the selection schemes.
Figure 2.
Conceptual depiction of the machine learning pipeline, Source: author
To verify that the algorithm captures the original data structure,
a projection to two-dimensional space is made with Isomap
embeddings in Fig. 4. The structure of the train set after balanced
clustering resembles that of the original train set; the majority class of
non-churners is heavily under-sampled, which is to be expected.
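A projection of this kind can be reproduced with scikit-learn's Isomap. The following is a minimal sketch; the array shapes and the n_neighbors value are illustrative assumptions, not the paper's exact configuration.

import numpy as np
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 30))     # stand-in for the original train set
X_balanced = rng.normal(size=(500, 30))  # stand-in for the balanced train set

# Fit a 2-D Isomap embedding on each set; n_neighbors controls how the local
# manifold geometry is approximated (10 is an assumed value here).
emb_original = Isomap(n_components=2, n_neighbors=10).fit_transform(X_train)
emb_balanced = Isomap(n_components=2, n_neighbors=10).fit_transform(X_balanced)

# emb_original and emb_balanced are (n_samples, 2) arrays that can be
# scatter-plotted side by side, as in Fig. 4.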
We focus on two types of selection procedure – filter selection
and wrapper selection. In filter-based selection the whole train set is
utilized at once, feature importance is estimated, and unimportant
features are filtered out. For wrapper-based selection, stratified
4-fold cross-validation with 2 repeats is utilized in the RFE
procedure. The RFE classification learner is subject to randomized
hyperparameter search with 5 steps; the target metric for both RFE
selection and parameter search is set to AUC/ROC, as it does not
depend on a subjective threshold choice. The number of features
to be selected is determined by each procedure, although univariate
filter selection is set to return at least 20 explanatory variables.
A minimal sketch of the wrapper branch is given below.
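The sketch uses scikit-learn's RFECV with a repeated stratified split under the settings above; the random forest learner and its parameter grid are illustrative assumptions, not the paper's exact classifier pool.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import RandomizedSearchCV, RepeatedStratifiedKFold

X, y = make_classification(n_samples=500, n_features=40, random_state=0)

# Stratified 4-fold CV with 2 repeats, as used for both RFE and the search.
cv = RepeatedStratifiedKFold(n_splits=4, n_repeats=2, random_state=0)

# Randomized hyperparameter search with 5 steps, scored on AUC/ROC.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": [100, 200, 400],
                         "max_depth": [3, 5, None]},
    n_iter=5, scoring="roc_auc", cv=cv, random_state=0,
)
search.fit(X, y)

# RFE around the tuned learner; the number of retained features is chosen
# by the procedure itself via cross-validated AUC/ROC.
selector = RFECV(search.best_estimator_, cv=cv, scoring="roc_auc")
selector.fit(X, y)
print(selector.n_features_)  # number of features retained by the wrapper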
Algorithm 1:
1.1 for each class in the target variable do:
1.2 get the feature data where the target equals class
1.3 cluster the observations in the feature data, setting the number of clusters to the expected number of observations per target class
1.4 get the observation nearest to each cluster center
1.5 add the class label to the selected observations
1.6 collect the temporary results
1.7 row-bind the temporary results to the feature train set
Figure 3.
Balanced clustering for feature selection, Source: author
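A minimal sketch of Algorithm 1 follows. KMeans with a nearest-to-center pick stands in for clustering around medoids (a dedicated k-medoids implementation such as scikit-learn-extra's KMedoids could be substituted); the function name and the 250-per-class count follow the text above.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin

def balanced_clustering(X, y, n_per_class=250, random_state=0):
    """Undersample each class to n_per_class cluster prototypes (Algorithm 1)."""
    X_parts, y_parts = [], []
    for cls in np.unique(y):                            # 1.1 loop over classes
        X_cls = X[y == cls]                             # 1.2 rows of this class
        km = KMeans(n_clusters=n_per_class, n_init=10,  # 1.3 cluster observations
                    random_state=random_state).fit(X_cls)
        # 1.4 observation nearest to each cluster center (medoid-like prototype)
        idx = pairwise_distances_argmin(km.cluster_centers_, X_cls)
        X_parts.append(X_cls[idx])
        y_parts.append(np.full(n_per_class, cls))       # 1.5 re-attach label
    return np.vstack(X_parts), np.concatenate(y_parts)  # 1.6-1.7 row-bind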
Model training and evaluation – The model training block digests
the processed train set and the annotations of the feature selection and
classification methods. A classifier is trained for each combination
of feature selection method and classification method.
The experimental setup for randomized hyperparameter search is
based on stratified 4-fold cross-validation with 2 repeats;
parameter search is done in 15 steps; its target metric is set to
AUC/ROC. Final classification learners are built on top of the
processed train set, feature selection, and randomized hyper-
parameter search. Each learner's performance on unseen data is
estimated on the processed test set; to address the bias-variance
trade-off, performance on the processed train set is also
evaluated. The applied metrics are described in detail in the previous
section.
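A minimal sketch of the training and evaluation block under these settings; logistic regression and its parameter distribution are illustrative placeholders for the paper's classifier pool.

from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (RandomizedSearchCV,
                                     RepeatedStratifiedKFold,
                                     train_test_split)

X, y = make_classification(n_samples=700, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Stratified 4-fold CV with 2 repeats; 15 search steps; AUC/ROC as target.
cv = RepeatedStratifiedKFold(n_splits=4, n_repeats=2, random_state=0)
search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": loguniform(1e-3, 1e3)},
    n_iter=15, scoring="roc_auc", cv=cv, random_state=0,
)
search.fit(X_tr, y_tr)
best = search.best_estimator_  # refit on the full train set by default

# Evaluate on both sets to expose the bias-variance trade-off.
auc_train = roc_auc_score(y_tr, best.predict_proba(X_tr)[:, 1])
auc_test = roc_auc_score(y_te, best.predict_proba(X_te)[:, 1])
print(f"train AUC={auc_train:.3f}  test AUC={auc_test:.3f}")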
Figure 4.
Structure of the original train set (left) and train set after balanced clustering (right), Source: author
5 Results
In order to summarize the performance of the multiple approaches to
explanatory variable selection, mean point estimates across
feature selection methods are depicted in Tab. 3; the best
indicators are marked in bold; 95 % confidence intervals for the
underlying distributions are constructed.
Classifiers combined with RFE selection show marginally better
performance on both train and test sets. Consistency, EBM
schemes, and OneR display noteworthy behavior: they
significantly reduce the number of original features while (1)
being almost on par with the RFE procedures in all performance
measures and (2) being less computationally demanding. The other
selection procedures do not perform as well, which is caused
by a considerable drop in the number of retained features.
Statistical significance of the differences is assessed by paired t-
tests with Bonferroni correction. The test performance of each
selection scheme is compared to the test performance without a
selection procedure; observations are paired on classification
learners and pipeline repeats.
H0 states that the true difference in sample means is equal to 0;
HA states that the true difference in sample means is not equal to 0.
We reject H0 for all feature selection schemes except SVM-RFE,
LR-RFE, and RF-RFE at unadjusted α = 0.01; this holds for all
performance indicators.
In other words, there is not enough evidence that the SVM-RFE,
LR-RFE, and RF-RFE selection schemes improve test set performance.
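The comparison can be sketched with SciPy's paired t-test; the metric arrays and the number of comparisons below are hypothetical placeholders, not the paper's data.

import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
# Hypothetical test AUC/ROC values, paired on (classifier, pipeline repeat):
auc_no_selection = rng.uniform(0.80, 0.90, size=16)
auc_with_scheme = auc_no_selection - rng.uniform(0.00, 0.02, size=16)

n_comparisons = 8  # assumed number of selection schemes being compared
t_stat, p_value = ttest_rel(auc_with_scheme, auc_no_selection)  # H0: mean diff = 0
p_bonferroni = min(p_value * n_comparisons, 1.0)  # Bonferroni correction

alpha = 0.01
print(f"reject H0: {p_bonferroni < alpha}")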