
AD ALTA

JOURNAL OF INTERDISCIPLINARY RESEARCH

that the individual Watson-Glaser tests are extensive, and it was
not within our competence to change their scope or wording.
Our critical thinking tool consisted of 12 questions, and the time
limit for completing the test was 12 minutes. When evaluating the
critical thinking tests, we assessed the factual correctness of
each answer and accepted answers containing grammatical or
spelling errors. We did not evaluate the “eloquence” of the
answers, but the ability to capture the essence of the problem
and solve it. Because pupils’ free answers to open questions may
influence the evaluation of the test, the evaluation was carried
out by two persons who followed the guidelines for the
evaluation of the critical thinking test.

In the pedagogical experiment, the dependent variable was the
level of pupils’ critical thinking. The independent variable was
the set of activities within the Philosophy for Children program
(discussion, questioning, dramatization, role-playing, etc.). By
conducting the experiment in the experimental and control
groups (EG and CG), we obtained the test scores that we then
evaluated.
To evaluate the data, we used descriptive statistics, a paired
t-test of the mean values, and an analysis of the difference score
(the difference between the post-test and pre-test), which
focuses on the change between the pre-test and post-test within
each group. The results were processed and analyzed in Excel,
including the descriptive statistics.
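
The paired t-test named above can be sketched in a few lines. The example below is only an illustration under stated assumptions: the function name `paired_t` and the five pupils' scores are hypothetical and are not data from the study, and the sign convention (pre-test minus post-test, so that gains produce a negative t) is assumed.

```python
import math
import statistics


def paired_t(pre, post):
    """Return (t, df) for a paired t-test on the differences pre - post."""
    if len(pre) != len(post):
        raise ValueError("pre and post must contain one score per pupil")
    # Assumed convention: pre minus post, so an improvement gives a negative t.
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)   # sample SD (n - 1 in the denominator)
    sem_d = sd_d / math.sqrt(n)      # standard error of the mean difference
    return mean_d / sem_d, n - 1


# Hypothetical scores for five pupils (NOT data from the study).
pre = [8, 10, 7, 9, 11]
post = [11, 12, 9, 13, 12]

t_stat, df = paired_t(pre, post)
print(f"t = {t_stat:.3f}, df = {df}")
```

The sketch stops at the t statistic, because the P-value requires the t distribution, which the Python standard library does not provide. If SciPy is available, `scipy.stats.ttest_rel(pre, post)` returns the same statistic together with a two-sided P-value.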

5 Results and Discussion

We assumed that pupils taught experimentally with the
Philosophy for Children program would achieve a better level of
critical thinking than pupils to whom the program was not
applied. We were also interested in the extent to which critical
thinking could be developed by implementing a model class
through the P4C program over a period of two to three months.

Table 1 presents descriptive statistics of the pre-test and post-test
scores of the control and experimental groups (mean, minimum,
maximum, standard deviation, standard error of the mean,
median).

Table 1: Descriptive Statistics on Pre-test and Post-test Scores for Experimental and Control Group

               M        SD      SEM
pretest_EG     8.925    3.253   0.514
posttest_EG    12.325   3.765   0.595
pretest_CG     8.548    2.461   0.380
posttest_CG    8.857    2.374   0.366

9.5

(N – Count, M – Mean, SEM – Standard Error, SD – Standard Deviation,

MIN - Minimum, MAX - Maximum, MEDIAN – Median)

Figures 1 and 2 show box plots which, in addition to a graphical
representation of the scores of the experimental and control
groups in the pre-test and post-test, also contain descriptive
statistics data (unrounded average, minimum, maximum, and
median).

Figure 1: Box Plot: Pre-test Scores for Experimental and Control Group

Figure 2: Box Plot: Post-test Scores for Experimental and Control Group

Table 2: Paired t-Test for Average Value

               M        t        P
pretest_EG     8.925    -9.522   < 0.001
posttest_EG    12.325
pretest_CG     8.548    -1.394   < 0.086
posttest_CG    8.857

(df – Degrees of Freedom, t-Test Statistics, P – P-value)

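The results of the t-test show that the difference in scores
between the pre-test and post-test is significant at the 0.05 level
of statistical significance for the experimental group (P < 0.001).
For the control group, the P-value reported in Table 2 (< 0.086)
does not fall below 0.05, so its difference is not significant at
that level.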

5.1 Difference Score Analysis for Control and Experimental
Group

The difference score was obtained as the difference between the
score achieved in the post-test and the score achieved in the
pre-test. Table 3 shows descriptive statistics of the difference
scores of the experimental and control groups.
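
As an illustration of how such a difference-score summary can be computed, the sketch below derives the statistics named in the table legends (N, M, SD, SEM, MIN, MAX, MEDIAN) from paired scores. The helper name `describe_differences` and the six pupils' scores are hypothetical and are not data from the study. The sample standard deviation (n - 1 denominator) is an assumption about how the SD was calculated.

```python
import math
import statistics


def describe_differences(pre, post):
    """Summarize difference scores (post-test minus pre-test) for one group."""
    if len(pre) != len(post):
        raise ValueError("pre and post must contain one score per pupil")
    diffs = [b - a for a, b in zip(pre, post)]  # positive value = improvement
    n = len(diffs)
    sd = statistics.stdev(diffs)                # sample SD (assumed, n - 1)
    return {
        "N": n,
        "M": statistics.mean(diffs),
        "SD": sd,
        "SEM": sd / math.sqrt(n),               # standard error of the mean
        "MIN": min(diffs),
        "MAX": max(diffs),
        "MEDIAN": statistics.median(diffs),
    }


# Hypothetical scores for six pupils (NOT data from the study).
pre = [8, 10, 7, 9, 11, 6]
post = [11, 12, 9, 13, 10, 8]

summary = describe_differences(pre, post)
for name, value in summary.items():
    print(name, value)
```

A negative MIN, as in this made-up sample, simply means that at least one pupil scored lower in the post-test than in the pre-test.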

Table 3: Descriptive Statistics on Difference Score for Control and Experimental Group

       M       SD      SEM     MIN
EG     3.4     2.193   0.347   -2
CG     0.333   1.476   0.228   -3

The average difference score in the EG is 3.4 (standard error
0.347), which means that EG pupils on average achieved a better
score in the post-test than in the pre-test. The average CG
difference score is 0.333 (standard error 0.228), which means
that CG pupils also
