
AD ALTA

JOURNAL OF INTERDISCIPLINARY RESEARCH

performance; on the other hand, the previously mentioned procedures allow us to reduce
the number of explanatory variables by about 40 % while retaining the same level of
classification performance as with the original dataset. Other feature selection methods
appear to lead to inferior classification performance.
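The 40 % figure can be checked against the mean feature counts in Table 3. Assuming the "previously mentioned procedures" are the three RFE schemes (their test ACC and AUC match the "none" baseline of 158 features), a minimal sketch of the arithmetic:

```python
# Mean number of retained features per scheme, copied from Table 3.
# Assumption: these RFE schemes are the "previously mentioned procedures".
n_features = {"SVM-RFE": 87.9, "LR-RFE": 91.4, "RF-RFE": 96.8}
N_ALL = 158.0  # number of features without selection ("none" row)


def reduction(n_selected: float, n_all: float = N_ALL) -> float:
    """Relative reduction in the number of explanatory variables."""
    return 1.0 - n_selected / n_all


for scheme, n in n_features.items():
    print(f"{scheme}: {reduction(n):.1%}")
```

This gives reductions of about 44.4 %, 42.2 % and 38.7 %, respectively, in line with the approximate 40 % stated above.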

Table 2.

Classification methods with respective parameters

Classification method | Optimized parameters | Implementation
--- | --- | ---
LOGIT | regularization forms: {L1, L2 dual, L2 primal}, cost | Fan et al., 2008
CIT | max tree depth, p-value threshold | Hothorn, Zeileis, 2015
RF | number of randomly selected predictors, splitting rule, minimal node size | Wright, Ziegler, 2017
SVM | kernel: {RBF}, cost, sigma | Karatzoglou et al., 2004

Source: author
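Fan et al. (2008) is the LIBLINEAR library, which exposes each of the three regularization forms listed for LOGIT as a separate logistic-regression solver type (its `-s` option: 0 for L2-regularized primal, 6 for L1-regularized, 7 for L2-regularized dual). A minimal sketch of how the LOGIT search space could be enumerated; the cost values are hypothetical, since the tuning grid is not reported in Table 2:

```python
from itertools import product

# LIBLINEAR solver types ("-s" option) for logistic regression,
# one per regularization form listed for LOGIT in Table 2.
LIBLINEAR_LR_SOLVERS = {
    "L1": 6,         # L1-regularized logistic regression
    "L2 dual": 7,    # L2-regularized logistic regression (dual)
    "L2 primal": 0,  # L2-regularized logistic regression (primal)
}


def logit_grid(costs):
    """Enumerate (regularization form, solver type, cost) candidates."""
    return [
        (form, LIBLINEAR_LR_SOLVERS[form], c)
        for form, c in product(LIBLINEAR_LR_SOLVERS, costs)
    ]


# Hypothetical cost values; the grid actually searched is not stated.
grid = logit_grid([0.01, 0.1, 1.0, 10.0])
print(len(grid))  # 12 candidates = 3 forms x 4 costs
```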

To inspect the importance of explanatory variables in the original feature
space (Tab. 1), a selection scheme-feature co-occurrence matrix is constructed;
occurrences of a feature in both individual and interaction terms are counted.
Moreover, the co-occurrence matrix is scaled by the maximum possible incidence
of a feature (i.e., its scheme-feature count for the procedure without feature
selection). The result is depicted as a heatmap with dendrograms in Fig. 5;
the explanatory variable state is absent because it was eliminated in the data
preprocessing step due to near-zero variance.
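A minimal sketch of the co-occurrence construction described above, assuming each selection scheme yields one list of selected terms per resampling run and that interaction terms are named by joining feature names with ":" (both are hypothetical conventions, not taken from the source). Scaling divides each count by the count under the scheme "none", i.e. the maximum possible incidence:

```python
from collections import Counter


def base_features(term: str) -> list[str]:
    # Hypothetical convention: an interaction term is written "a:b".
    return term.split(":")


def cooccurrence(selected: dict[str, list[list[str]]]) -> dict[str, dict[str, float]]:
    """Scaled scheme-feature co-occurrence matrix.

    `selected` maps each selection scheme to the terms it kept in each
    resampling run. A feature is counted whenever it appears in a kept
    term, either on its own or inside an interaction term.
    """
    counts: dict[str, Counter] = {}
    for scheme, runs in selected.items():
        c: Counter = Counter()
        for terms in runs:
            for term in terms:
                c.update(base_features(term))
        counts[scheme] = c
    # Maximum possible incidence: the counts without feature selection.
    max_inc = counts["none"]
    return {
        scheme: {f: c[f] / max_inc[f] for f in max_inc}
        for scheme, c in counts.items()
    }


# Toy input with two runs per scheme (illustrative values only).
selected = {
    "none": [["a", "b", "a:b"], ["a", "b", "a:b"]],
    "CFS": [["a"], ["a", "a:b"]],
}
matrix = cooccurrence(selected)
print(matrix["CFS"])  # {'a': 0.75, 'b': 0.25}
```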

Table 3.

Classification performance indicators aggregated by the feature selection method

Feature selection method | Number of features | Feature selection runtime [s] | Train ACC (95 % CI) | Train AUC (95 % CI) | Train TDL (95 % CI) | Test ACC (95 % CI) | Test AUC (95 % CI) | Test TDL (95 % CI)
--- | --- | --- | --- | --- | --- | --- | --- | ---
CFS | 10.3 | 6.1 | 0.923 (0.852, 0.994) | 0.911 (0.801, 1.022) | 5.678 (3.441, 7.914) | 0.905 (0.860, 0.950) | 0.874 (0.813, 0.935) | 5.171 (3.462, 6.880)
Consistency | 18.2 | 139.6 | 0.939 (0.870, 1.007) | 0.929 (0.836, 1.021) | 6.165 (4.432, 7.898) | 0.919 (0.877, 0.960) | 0.890 (0.848, 0.931) | 5.696 (4.381, 7.011)
 | 24.8 | 0.1 | 0.920 (0.860, 0.979) | 0.907 (0.797, 1.017) | 5.632 (3.822, 7.442) | 0.904 (0.867, 0.941) | 0.868 (0.786, 0.950) | 5.123 (3.809, 6.437)
Relief | 25.4 | 179.9 | 0.915 (0.850, 0.980) | 0.896 (0.759, 1.032) | 5.444 (3.309, 7.579) | 0.898 (0.860, 0.935) | 0.852 (0.765, 0.940) | 4.868 (3.430, 6.305)
IGR | 44.4 | 0.6 | 0.937 (0.872, 1.002) | 0.927 (0.831, 1.022) | 6.124 (4.429, 7.819) | 0.918 (0.878, 0.958) | 0.889 (0.836, 0.941) | 5.652 (4.281, 7.024)
 | 45.0 | 0.7 | 0.939 (0.876, 1.003) | 0.930 (0.844, 1.016) | 6.196 (4.603, 7.789) | 0.920 (0.883, 0.957) | 0.892 (0.855, 0.929) | 5.711 (4.463, 6.959)
 | 47.5 | 0.5 | 0.940 (0.875, 1.005) | 0.929 (0.839, 1.020) | 6.196 (4.642, 7.750) | 0.920 (0.883, 0.958) | 0.892 (0.845, 0.939) | 5.751 (4.517, 6.984)
OneR | 51.4 | 0.5 | 0.943 (0.881, 1.005) | 0.933 (0.849, 1.016) | 6.286 (4.883, 7.688) | 0.923 (0.889, 0.958) | 0.895 (0.862, 0.928) | 5.856 (4.777, 6.935)
SVM-RFE | 87.9 | 2190.3 | 0.952 (0.884, 1.020) | 0.940 (0.860, 1.020) | 6.465 (4.965, 7.965) | 0.932 (0.883, 0.980) | 0.900 (0.865, 0.935) | 6.108 (4.692, 7.524)
LR-RFE | 91.4 | 2190.4 | 0.951 (0.879, 1.023) | 0.940 (0.858, 1.022) | 6.437 (4.854, 8.019) | 0.931 (0.880, 0.982) | 0.899 (0.859, 0.939) | 6.088 (4.614, 7.562)
RF-RFE | 96.8 | 2190.2 | 0.952 (0.881, 1.022) | 0.940 (0.859, 1.020) | 6.454 (4.916, 7.991) | 0.932 (0.881, 0.983) | 0.900 (0.860, 0.939) | 6.114 (4.626, 7.603)
none | 158.0 | 0.0 | 0.950 (0.882, 1.018) | 0.940 (0.863, 1.016) | 6.441 (5.030, 7.853) | 0.931 (0.880, 0.982) | 0.899 (0.861, 0.936) | 6.075 (4.562, 7.588)

Source: author
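Every interval in Table 3 is symmetric around its point estimate, and several ACC and AUC upper bounds exceed 1 (e.g. 1.022 for the CFS training AUC). This is consistent with a normal approximation that is not clipped to the metric's range. A minimal sketch, assuming the interval is the mean plus or minus 1.96 standard deviations across resampling runs; the exact formula (for instance, whether the standard deviation is divided by the square root of the number of runs) is not stated in this section:

```python
from statistics import mean, stdev


def normal_ci(values, z: float = 1.96):
    """Mean with a normal-approximation interval: mean +/- z * sd.

    Assumption: sd is the sample standard deviation across resampling
    runs. The bounds are not clipped, so an interval for a metric capped
    at 1 (ACC, AUC) can exceed 1.
    """
    m = mean(values)
    half = z * stdev(values)
    return m, (m - half, m + half)


# Illustrative per-run accuracies (not taken from the study).
m, (lo, hi) = normal_ci([0.90, 0.95, 1.00])
print(round(m, 3), round(lo, 3), round(hi, 3))  # 0.95 0.852 1.048
```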

Two analytic perspectives arise from the co-occurrence matrix: (1) feature
importance across the different selection procedures and (2) the underlying
similarity amongst the results of the feature selection schemes.

Considering the former perspective (1), the row-wise dendrogram identifies
three groups of features that differ in their impact on the target variable.
The bottom cluster consists of a single element, international_plan, which all
selection schemes recognize as very important; the middle cluster contains three
elements (total_day_charge, number_customer_service_calls and total_day_minutes),
which are also observed to be important indicators of a customer's propensity to
churn; the structure of the upper cluster is rather ambiguous, except for the
area_code element, which is generally omitted.
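The section does not state which distance and linkage produced the dendrograms in Fig. 5. A minimal sketch of cutting the row-wise dendrogram into three clusters, assuming Euclidean distance and complete linkage (the defaults of R's dist and hclust); each row is one feature's scaled incidence across schemes, and the names and values below are hypothetical:

```python
import math
from itertools import combinations


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def complete_linkage(rows: dict[str, list[float]], k: int) -> list[set[str]]:
    """Agglomerative clustering with complete linkage, stopped at k clusters."""
    clusters = [{name} for name in rows]

    def dist(c1, c2):
        # Complete linkage: the largest pairwise distance between members.
        return max(euclidean(rows[a], rows[b]) for a in c1 for b in c2)

    while len(clusters) > k:
        # Merge the closest pair; i < j because combinations is ordered.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Hypothetical scaled incidences of five features across three schemes.
rows = {
    "f1": [1.0, 1.0, 1.0],
    "f2": [0.7, 0.8, 0.6],
    "f3": [0.6, 0.8, 0.7],
    "f4": [0.0, 0.1, 0.0],
    "f5": [0.1, 0.0, 0.2],
}
result = complete_linkage(rows, k=3)
print(sorted(map(sorted, result)))  # [['f1'], ['f2', 'f3'], ['f4', 'f5']]
```

The column-wise dendrogram follows from the same routine applied to the transposed matrix (one vector per selection scheme).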

From the latter perspective (2), the column-wise dendrogram identifies three
distinct groups of schemes with similar feature structures. The left cluster
contains the multivariate filter methods and Fisher's score; the middle cluster
consists of the EBM schemes and OneR; the right cluster is reserved for the RFE
procedures. The underlying similarity amongst selection schemes appears to be
driven by both the number and the structure of the included features; this is
supported by the internal coherence of the clusters considering the
