AD ALTA
JOURNAL OF INTERDISCIPLINARY RESEARCH
that the individual Watson-Glaser tests are extensive, and it was
not in our competence to change their scope and wording.
Our critical thinking tool consisted of 12 questions and the time
limit for solving the test was 12 minutes. When evaluating the
critical thinking tests, we considered the factual correctness
of the answer and accepted answers that contained
grammatical or spelling errors. We did not evaluate the
“eloquence” of the answers, but rather the ability to grasp the
essence of the problem and solve it. Because pupils’ free answers
to open questions may influence the evaluation of the test,
the evaluation was carried out by two persons who followed
the guidelines for the evaluation of the critical thinking test.
In the pedagogical experiment, the dependent variable was
the level of pupils’ critical thinking. The independent variable
was the set of activities within the Philosophy for Children
program (discussion, questioning, dramatization, role-playing,
etc.). By performing the experiment in the experimental and
control groups (EG and CG), we obtained the test scores that we
evaluated.
To evaluate the data, we used descriptive statistics, a paired
t-test on the mean scores, and an analysis of the difference
score (the difference between the post-test and pre-test), which
focuses on the change between the pre-test and post-test within
each group. The results were processed and analyzed in the
computer program Excel.
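The paired t-test described above can be illustrated with a minimal sketch. It uses only the Python standard library rather than Excel, and the function name `paired_t` and the five pairs of scores are hypothetical values chosen for illustration, not data from the study. The differences are taken as pre-test minus post-test, so an improvement produces a negative t value.

```python
import math
import statistics

def paired_t(pre, post):
    """Return (mean difference, t statistic) for paired pre/post scores.

    Differences are computed as pre - post, so a gain in the
    post-test gives a negative mean difference and a negative t.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must contain the same pupils")
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)      # sample standard deviation
    sem_d = sd_d / math.sqrt(n)         # standard error of the mean difference
    return mean_d, mean_d / sem_d

# Hypothetical scores for five pupils (illustration only).
pre = [8, 9, 7, 10, 6]
post = [11, 12, 9, 13, 10]

mean_diff, t_stat = paired_t(pre, post)
print(f"mean difference = {mean_diff}, t = {t_stat:.3f}")
```

The resulting t value is then compared with the t distribution to obtain the P-value, a step that a spreadsheet or statistics package performs automatically.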
5 Results and Discussion
We assumed that pupils taught experimentally with the Philosophy
for Children program would achieve a higher level of critical
thinking than pupils to whom the program was not applied. We
were also interested in the extent to which we could develop
critical thinking by implementing a model class through the P4C
program over a period of two to three months.
Table 1 presents descriptive statistics (mean, count, standard
deviation, standard error of the mean, minimum, maximum,
median) of the pre-test and post-test scores for the
experimental and control groups.
Table 1: Descriptive Statistics on Pre-test and Post-test Scores
for Experimental and Control Group
              M        N    SD      SEM     Min   Max   Median
pretest_EG    8.925    40   3.253   0.514   3     15    9
posttest_EG   12.325   40   3.765   0.595   5     19    13
pretest_CG    8.548    42   2.461   0.380   4     13    9
posttest_CG   8.857    42   2.374   0.366   4     13    9.5
(N – Count, M – Mean, SEM – Standard Error, SD – Standard Deviation,
MIN - Minimum, MAX - Maximum, MEDIAN – Median)
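As a quick consistency check, the standard error of the mean can be recomputed from the standard deviation and the count as SEM = SD / sqrt(N). The sketch below does this for the Table 1 values. It assumes that the N given for each pre-test row also applies to the post-test row of the same group, and the helper name `standard_error` is chosen here for illustration.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean from a sample SD and sample size."""
    return sd / math.sqrt(n)

# (SD, N, reported SEM) taken from Table 1.
# Assumption: the post-test rows share the N of their group's pre-test row.
table1 = {
    "pretest_EG": (3.253, 40, 0.514),
    "posttest_EG": (3.765, 40, 0.595),
    "pretest_CG": (2.461, 42, 0.380),
    "posttest_CG": (2.374, 42, 0.366),
}

results = {}
for label, (sd, n, reported) in table1.items():
    computed = standard_error(sd, n)
    results[label] = (computed, reported)
    print(f"{label}: computed SEM = {computed:.3f}, reported = {reported:.3f}")
```

Under that assumption, the recomputed values agree with the reported SEM column to three decimal places.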
Figures 1 and 2 show box plots which, in addition to a graphical
representation of the scores of the experimental and control
groups in the pre-test and post-test, also contain descriptive
statistics data (unrounded average, minimum, maximum, and
median).
Figure 1: Box Plot: Pre-test Scores for Experimental and
Control Group
Figure 2: Box Plot: Post-test Scores for Experimental and
Control Group
Table 2: Paired t-Test for Average Value
              M        df   t        P
pretest_EG    8.925    38   -9.522   < 0.001
posttest_EG   12.325
pretest_CG    8.548    40   -1.394   < 0.086
posttest_CG   8.857
(df – Degrees of Freedom, t – Test Statistic, P – P-value)
The results of the t-test show that the difference between the
pre-test and post-test scores of the experimental group is
significant at the 0.05 level (P < 0.001), whereas the P-value
reported for the control group (< 0.086) lies above that level.
5.1 Difference Score Analysis for Control and Experimental
Group
The difference score was obtained as the difference between the
score achieved in the post-test and the score achieved in the pre-
test. Table 3 shows descriptive statistics of the difference scores
of the experimental and control groups.
Table 3: Descriptive Statistics on Difference Score for Control
and Experimental Group
     M       N    SD      SEM     MIN   MAX   Median
EG   3.4     41   2.193   0.347   -2    9     4
CG   0.333   42   1.476   0.228   -3    6     0
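The difference-score statistics of the kind reported in Table 3 can be derived from paired scores as in the following sketch. The function name `describe_differences` and the five pairs of scores are hypothetical and serve only to illustrate the calculation (post-test minus pre-test for each pupil); they are not the study data.

```python
import math
import statistics

def describe_differences(pre, post):
    """Descriptive statistics of difference scores (post - pre)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must contain the same pupils")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    sd = statistics.stdev(diffs)        # sample standard deviation
    return {
        "mean": statistics.mean(diffs),
        "n": n,
        "sd": sd,
        "sem": sd / math.sqrt(n),       # standard error of the mean
        "min": min(diffs),
        "max": max(diffs),
        "median": statistics.median(diffs),
    }

# Hypothetical scores for five pupils (illustration only).
pre = [8, 9, 7, 10, 6]
post = [11, 12, 9, 13, 10]

stats = describe_differences(pre, post)
for name, value in stats.items():
    print(f"{name}: {value}")
```

A positive mean difference indicates that, on average, pupils scored higher in the post-test than in the pre-test.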
The average difference score in the EG is 3.4 (standard error
0.347), which means that EG pupils achieved a better score in
the post-test than in the pre-test. The average CG difference
score is 0.333 (standard error 0.228), which means that CG pupils also