Statistical Implicative Analysis: Theory and Applications
Régis Gras, Einoshin Suzuki, Fabrice Guillet and Filippo Spagnolo (Eds.)
Statistical Implicative Analysis
Studies in Computational Intelligence, Volume 127
Editor-in-chief
Prof. Janusz Kacprzyk
Systems Research Institute
Polish Academy of Sciences
ul. Newelska 6
01-447 Warsaw
Poland
E-mail: [email protected]
Further volumes of this series can be found on our homepage: springer.com

Vol. 107. Margarita Sordo, Sachin Vaidya and Lakhmi C. Jain (Eds.)
Advanced Computational Intelligence Paradigms in Healthcare - 3, 2008
ISBN 978-3-540-77661-1

Vol. 108. Vito Trianni
Evolutionary Swarm Robotics, 2008
ISBN 978-3-540-77611-6

Vol. 109. Panagiotis Chountas, Ilias Petrounias and Janusz Kacprzyk (Eds.)
Intelligent Techniques and Tools for Novel System Architectures, 2008
ISBN 978-3-540-77621-5

Vol. 110. Makoto Yokoo, Takayuki Ito, Minjie Zhang, Juhnyoung Lee and Tokuro Matsuo (Eds.)
Electronic Commerce, 2008
ISBN 978-3-540-77808-0

Vol. 111. David Elmakias (Ed.)
New Computational Methods in Power System Reliability, 2008
ISBN 978-3-540-77810-3

Vol. 112. Edgar N. Sanchez, Alma Y. Alanís and Alexander G. Loukianov
Discrete-Time High Order Neural Control: Trained with Kalman Filtering, 2008
ISBN 978-3-540-78288-9

Vol. 113. Gemma Bel-Enguix, M. Dolores Jiménez-López and Carlos Martín-Vide (Eds.)
New Developments in Formal Languages and Applications, 2008
ISBN 978-3-540-78290-2

Vol. 114. Christian Blum, Maria José Blesa Aguilera, Andrea Roli and Michael Sampels (Eds.)
Hybrid Metaheuristics, 2008
ISBN 978-3-540-78294-0

Vol. 115. John Fulcher and Lakhmi C. Jain (Eds.)
Computational Intelligence: A Compendium, 2008
ISBN 978-3-540-78292-6

Vol. 116. Ying Liu, Aixin Sun, Han Tong Loh, Wen Feng Lu and Ee-Peng Lim (Eds.)
Advances of Computational Intelligence in Industrial Systems, 2008
ISBN 978-3-540-78296-4

Vol. 117. Da Ruan, Frank Hardeman and Klaas van der Meer (Eds.)
Intelligent Decision and Policy Making Support Systems, 2008
ISBN 978-3-540-78306-0

Vol. 118. Tsau Young Lin, Ying Xie, Anita Wasilewska and Churn-Jung Liau (Eds.)
Data Mining: Foundations and Practice, 2008
ISBN 978-3-540-78487-6

Vol. 119. Slawomir Wiak, Andrzej Krawczyk and Ivo Dolezel (Eds.)
Intelligent Computer Techniques in Applied Electromagnetics, 2008
ISBN 978-3-540-78489-0

Vol. 120. George A. Tsihrintzis and Lakhmi C. Jain (Eds.)
Multimedia Interactive Services in Intelligent Environments, 2008
ISBN 978-3-540-78491-3

Vol. 121. Nadia Nedjah, Leandro dos Santos Coelho and Luiza de Macedo Mourelle (Eds.)
Quantum Inspired Intelligent Systems, 2008
ISBN 978-3-540-78531-6

Vol. 122. Tomasz G. Smolinski, Mariofanna G. Milanova and Aboul-Ella Hassanien (Eds.)
Applications of Computational Intelligence in Biology, 2008
ISBN 978-3-540-78533-0

Vol. 123. Shuichi Iwata, Yukio Ohsawa, Shusaku Tsumoto, Ning Zhong, Yong Shi and Lorenzo Magnani (Eds.)
Communications and Discoveries from Multidisciplinary Data, 2008
ISBN 978-3-540-78732-7

Vol. 124. Ricardo Zavala Yoe
Modelling and Control of Dynamical Systems: Numerical Implementation in a Behavioral Framework, 2008
ISBN 978-3-540-78734-1

Vol. 125. Larry Bull, Ester Bernadó-Mansilla and John Holmes (Eds.)
Learning Classifier Systems in Data Mining, 2008
ISBN 978-3-540-78978-9

Vol. 126. Oleg Okun and Giorgio Valentini (Eds.)
Supervised and Unsupervised Ensemble Methods and their Applications, 2008
ISBN 978-3-540-78980-2

Vol. 127. Régis Gras, Einoshin Suzuki, Fabrice Guillet and Filippo Spagnolo (Eds.)
Statistical Implicative Analysis, 2008
ISBN 978-3-540-78982-6
Régis Gras, Einoshin Suzuki, Fabrice Guillet, Filippo Spagnolo (Eds.)
Régis Gras
LINA, FRE 2729 CNRS
14 avenue de la Chaise
35170 Bruz, France
[email protected]

Einoshin Suzuki
Department of Informatics
Kyushu University
744 Motooka, Nishi, Fukuoka 819-0395, Japan
[email protected]
This work is subject to copyright. All rights are reserved, whether the whole or part of the material
is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broad-
casting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of
this publication or parts thereof is permitted only under the provisions of the German Copyright Law
of September 9, 1965, in its current version, and permission for use must always be obtained from
Springer-Verlag. Violations are liable to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not
imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
Cover design: Deblik, Berlin, Germany
springer.com
Preface
Review Committee
All published chapters have been reviewed by at least 2 referees.
• Saddo Ag Almouloud (University of Sao Paulo, Brazil)
• Carmen Batanero (University of Granada, Spain)
• Hans Bock (Aachen University, Germany)
• Henri Briand (LINA, University of Nantes, France)
• Guy Brousseau (University of Bordeaux 3, France)
• Alex Freitas (University of Kent, UK)
• Athanasios Gagatsis (University of Cyprus)
• Robin Gras (University of Windsor, Canada)
• Howard Hamilton (University of Regina, Canada)
• Jiawei Han (University of Illinois, USA)
• David J. Hand (Imperial College, London, UK)
• André Hardy (University of Namur, Belgium)
• Robert Hilderman (University of Regina, Canada)
• Yves Kodratoff (LRI, University of Paris-Sud, France)
• Pascale Kuntz (LINA, University of Nantes, France)
• Ludovic Lebart (ENST, Paris, France)
• Amédéo Napoli (LORIA, University of Nancy, France)
• Maria-Gabriella Ottaviani (University of Roma, Italy)
• Balaji Padmanabhan (University of Pennsylvania, USA)
• Jean-Paul Rasson (University of Namur, Belgium)
• Jean-Claude Régnier (University of Lyon 2, France)
• Gilbert Ritschard (University of Geneva, Switzerland)
• Lorenza Saitta (University of Piemont, Italy)
• Gilbert Saporta (CNAM, Paris, France)
• Dan Simovici (University of Massachusetts Boston, USA)
• Djamel Zighed (ERIC, University of Lyon 2, France)
Associated Reviewers
Nadja Maria Acioly-Régnier, Angela Alibrandi, Jérôme Azé, Maurice Bernadet, Julien Blanchard, Catherine-Marie Chiocca, Raphaël Couturier, Stéphane Daviet, Jérôme David, Carmen Diaz, Pablo Gregori, Alain Kuzniak, Eduardo Lacasta, Dominique Lahanier-Reuter, Stéphane Lallich, Letitzia La Tona, Patrick Leconte, Rémi Lehn, Philippe Lenca, Elsa Malisani, Rajesh Natajaran, Pilar Orús, Gérard Ramstein, Ansaf Salleb, Aldo Scimone, Benoît Vaillant, Ingrid Verscheure
Manuscript coordinator
Bruno Pinaud (LINA, University of Nantes, France)
Acknowledgments
The editors would like to thank the chapter authors for their insights and
contributions to this book.
The editors would also like to acknowledge the members of the review com-
mittee and the associated referees for their involvement in the review process
of the book, and without whose support the book would not have been satis-
factorily completed.
Thanks also to J. Blanchard, who managed the CyberChair web site.
Introduction
Régis Gras, Einoshin Suzuki, Fabrice Guillet, Filippo Spagnolo (page 1)
Index (page 509)
In the framework of data mining, which has been recognized as one of the ten emerging technologies for computer science, association rule discovery aims at mining potentially useful implicative patterns from data. Initially stimulated by research in the didactics of mathematics, Statistical Implicative Analysis (SIA) offers an original statistical approach, based on the Implication Intensity measure, dedicated to rule extraction and analysis. Implication Intensity, the first method in SIA in its initial form, evaluates the interestingness of a rule x → y by the rarity of its number of counter-examples (x ∧ ¬y), according to its probability distribution under a hypothesis of independence of x and y. This interestingness measure has motivated a large number of research works and applications due to its theoretical and practical merits. Through a graphical interface, the CHIC (Cohesive Hierarchical Implicative Classification) software allows easy use of various SIA techniques for a wide range of users, from experts in data analysis to practitioners with little background in computer science.
This book includes two complementary topics: on the one hand, theoretical works related to SIA, or linking SIA with other data analysis methods; and on the other hand, applied works illustrating the use of SIA in application domains such as psychology, social sciences, bioinformatics and didactics. It should be of interest to developers of data mining systems as well as researchers, students and practitioners devoted to data mining and statistical data analysis.
The book is structured in four parts. The first one gathers three general chap-
ters defining the methodology and the concepts for the Statistical Implicative
Analysis approach. The second part contains six chapters dealing with the use
of SIA as a decision aid tool for the analysis of concept learning in the frame-
work of education, teaching and didactics. In the third part, seven chapters
illustrate the use of SIA as a methodological answer in various application
fields. Lastly, the fourth part includes six chapters describing the extension of
SIA and its application to capture rule interestingness in data mining.
associated fuzzy rules which depend on the chosen fuzzy operators. The
best fuzzy operators are selected by applying the generalized modus po-
nens on the items of several databases and by comparing its results to
the effective conclusions. By studying methods to aggregate fuzzy rules,
this chapter shows that in order to keep classical reduction schemes,
fuzzy operators must be chosen differently. However, one of these possible
operator sets is also one of the best for processing the generalized modus
ponens.
Part I
1 Introduction
Two important components are involved in the operational human processes of knowledge acquisition: facts, and rules between facts or between rules themselves. Through one's own culture and personal experience, the learning process integrates a progressive elaboration of these knowledge forms. It can be faced with regressions, questions or changes which arise from decisive refutations, but the knowledge forms contribute to maintaining a certain equilibrium. The rules formed inductively become quite stable when their number of successes (which depends on their explicative or inferential quality) reaches a certain level of confidence. At first, it is often difficult to replace an initial rule by another when a few counter-examples appear. If they increase, the confidence in the rule can decrease and the rule can be readjusted or even rejected. However, when confirmations are numerous and counter-examples are rare, the rule is robust and can stay in our minds. For instance, let us consider the acceptable rule "All Ferraris are red". Even if one or two counter-examples occur, this rule is maintained, and it will even be confirmed again by new examples.
"If a question is more complex than another, then each pupil who succeeds in the first one should also succeed in the second one." Every teacher knows that this situation admits exceptions, whatever the degree of complexity between the questions. The evaluation and structuring of such implicative relationships between didactic situations are the generic problems at the origin of the development of Statistical Implicative Analysis (SIA) [11]. These problems, which have also drawn attention from psychologists interested in ability tests [5, 27], have seen significant renewed interest in the last decade in data mining.
Indeed, quasi-implications, also called association rules in this field, have become the major concept in data mining to represent implicative trends between itemset patterns. In data mining, the paradigmatic framework is the so-called basket analysis, where a quasi-implication Ti → Tj means that if a transaction contains a set of items Ti then it is likely to contain a set of items Tj too. For simplicity's sake, let us from now on call a quasi-implication a "rule".
In data mining, rules are computed on large databases. Since the seminal work of Agrawal et al. [1], numerous algorithms have been proposed to mine such rules. Most of them attempt to extract a restricted set of relevant rules that are easy to interpret for decision-making. Yet, comparative experiments have shown that results may vary with the choice of rule quality measures (e.g. [13, 25]). In the rich literature devoted to this problem, interestingness measures are often classified into two categories: the subjective (user-driven) ones and the objective (data-driven) ones. Subjective measures aim at taking into account unexpectedness and actionability relative to prior knowledge, while objective measures give priority to statistical criteria. Among the latter, the most commonly used criterion for quantifying the quality of a rule a → b is the combination of the support (the frequency f(a ∧ b)), which indicates whether the items a and b occur reasonably often in the database, with the confidence (the conditional frequency). However, it is well known that the confidence has a major drawback: it is insensitive to the dilatation of f(a), f(b) and the database size. Other functions measure a link or an absence of
link between the items but, like χ², they do not clearly specify the direction of the relationship. Moreover, in addition to rule filtering, rule structuring is necessary to highlight relationships and to make rule interpretation both easier and more accurate.
SIA provides a complete framework to evaluate the interestingness of the rules and to structure them in order to discover relationships at different granularity levels. The underlying objective is to highlight the emerging properties of the whole system which cannot be deduced from a simple decomposition into sub-parts (e.g. [30]). All these properties, which emerge from complex (probably non-linear) interactions, contribute to the interpretation of the global nature of the system.
2.3 Definitions
The deviation between the random number of counter-examples and its expected value under the independence hypothesis is measured by the standardized random variable

\[ Q(a, \bar b) \;=\; \frac{\operatorname{card}(X \cap \bar Y) \;-\; \dfrac{n_a\, n_{\bar b}}{n}}{\sqrt{\dfrac{n_a\, n_{\bar b}}{n}}} \]

We denote by q(a, b̄) the observed value of Q(a, b̄) in the experimental realization. It is defined by

\[ q(a, \bar b) \;=\; \frac{n_{a \wedge \bar b} \;-\; \dfrac{n_a\, n_{\bar b}}{n}}{\sqrt{\dfrac{n_a\, n_{\bar b}}{n}}} \]
This value measures a deviation between the contingency and the expected
value when a and b are independent.
When the approximation is justified (e.g. λ > 4), the random variable Q(a, b̄) is approximately N(0, 1)-distributed.
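As an illustration (a sketch of ours, not taken from the chapter), the observed index q(a, b̄) and the implication intensity under the normal approximation, ϕ(a, b) = P(Q > q(a, b̄)), can be computed as follows; the function and variable names are our own.

    from math import sqrt, erf

    def implication_intensity(n, n_a, n_b, n_a_not_b):
        """Classical implication intensity phi(a, b) under the N(0, 1) approximation."""
        n_not_b = n - n_b
        expected = n_a * n_not_b / n                  # expected counter-examples under independence
        q = (n_a_not_b - expected) / sqrt(expected)   # observed index q(a, not-b)
        phi = 1.0 - 0.5 * (1.0 + erf(q / sqrt(2.0)))  # P(Q > q) for a standard normal Q
        return q, phi

    # e.g. n = 400 individuals, n_a = 60, n_b = 100, 20 counter-examples
    print(implication_intensity(400, 60, 100, 20))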
\[ \frac{\partial c}{\partial n_{a \wedge \bar b}} \;=\; -\,\frac{1}{n_a} \]
Consequently, as expected, the confidence increases when n_{a∧b̄} decreases. However, the rate of this variation is constant whatever n and n_b. This situation highlights the limited role of these parameters in the sensitivity of the measure.
Formalization
Let us denote by a(i) and b̄(i) the values of i ∈ E for the modal variables a and b̄, and by s_a and s_{b̄} their empirical standard deviations.

Definition 4. [24] For a pair (a, b) of modal variables, the implication intensity, called the propension index, is defined by

\[ q_p(a, \bar b) \;=\; \frac{\displaystyle\sum_{i \in E} a(i)\, \bar b(i) \;-\; \dfrac{n_a\, n_{\bar b}}{n}}{\sqrt{\dfrac{\left(n^2 s_a^2 + n_a^2\right)\left(n^2 s_{\bar b}^2 + n_{\bar b}^2\right)}{n^3}}} \]
Proposition 3. When a and b are binary variables, then q_p(a, b̄) = q(a, b̄). In this case, it is easy to prove that n² s_a² + n_a² = n·n_a, n² s_{b̄}² + n_{b̄}² = n·n_{b̄}, and Σ_{i∈E} a(i) b̄(i) = n_{a∧b̄}.
This extension remains valid for frequential variables and for positive numerical variables when they are normalized: ã(i) = a(i) / max_{i∈E} a(i).
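A minimal sketch of this computation (ours, not from the chapter) is given below; we assume modal values in [0, 1] and take b̄(i) = 1 − b(i), which is an assumption of this illustration.

    import numpy as np

    def propension_index(a, b):
        """Propension index q_p(a, not-b) for modal variables a, b with values in [0, 1]."""
        a = np.asarray(a, dtype=float)
        not_b = 1.0 - np.asarray(b, dtype=float)   # assumed complement of the modal values
        n = len(a)
        n_a, n_not_b = a.sum(), not_b.sum()
        num = (a * not_b).sum() - n_a * n_not_b / n
        den = np.sqrt((n**2 * a.var() + n_a**2) * (n**2 * not_b.var() + n_not_b**2) / n**3)
        return num / den

For binary 0/1 data this reduces to the observed index q(a, b̄), as stated in Proposition 3.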
A similar measure has been recently introduced by Régnier and Gras [29]
for ranking variables associated with a total order on a set of choices presented
to a judge population. In this case, the considered implication is “if an object
i is ordered by the judges at a place pi then an object j is ordered by the
same judges at a place pj > pi ”.
Let us consider a score distribution of a class for different subjects. The consid-
ered implication is “the sub-interval [2; 5.5] in mathematics generally implies
The previous approach can be adapted to interval variables, which are symbolic data. Let us consider two variables a and b which are associated with a series of intervals due to measurement imprecision: Iᵢᵃ (resp. Iᵢᵇ) is the interval of a (resp. b) for the individual i ∈ E. Let Iᵃ (resp. Iᵇ) be the interval which contains all the a (resp. b) values. We can define on Iᵃ and Iᵇ a partition which optimizes a given criterion. The intersections between Iᵢᵃ and Iᵃ and between Iᵢᵇ and Iᵇ follow a distribution that takes into account the common parts. Consequently, the problem is similar to the computation of the rules between variables over intervals (we refer to [16] for details).
Pertinent results have been obtained with the implicative intensity ϕ for various applications where the data corpora are relatively small (n < 300). However, in data mining, numerical experiments have highlighted two limits of ϕ for large datasets. First, it tends not to be discriminant enough when the size of E dramatically increases (e.g. [8]); its values are close to 1 even though the inclusion A ⊂ B is far from being perfect. Second, like numerous measures proposed in the literature, it does not take into account the contrapositive ¬b ⇒ ¬a, which could reinforce the affirmation of the good quality of the implicative relationship between a and b, and the capacity to estimate causality between the variables.
number of counter-examples na∧b is high (resp. small) for the rule and its
contrapositive considering the observed numbers na and nb .
A well-known index for taking the imbalances into account non-linearly is Shannon's conditional entropy. The conditional entropy H_{b/a} of the cases (a and b) and (a and ¬b) given a is defined by

\[ H_{b/a} \;=\; -\,\frac{n_{a \wedge b}}{n_a}\,\log_2\frac{n_{a \wedge b}}{n_a} \;-\; \frac{n_{a \wedge \bar b}}{n_a}\,\log_2\frac{n_{a \wedge \bar b}}{n_a} \]

and, similarly, the conditional entropy H_{ā/b̄} of the cases (a and ¬b) and (¬a and ¬b) given ¬b is defined by

\[ H_{\bar a/\bar b} \;=\; -\,\frac{n_{a \wedge \bar b}}{n_{\bar b}}\,\log_2\frac{n_{a \wedge \bar b}}{n_{\bar b}} \;-\; \frac{n_{\bar a \wedge \bar b}}{n_{\bar b}}\,\log_2\frac{n_{\bar a \wedge \bar b}}{n_{\bar b}} \]
We can here consider that these entropies measure the average uncertainty of the random experiments in which we check whether b (resp. ¬a) is realized when a (resp. ¬b) is observed. The complements to 1 of these uncertainties, I_{b/a} = 1 − H_{b/a} and I_{ā/b̄} = 1 − H_{ā/b̄}, can be interpreted as the average information collected by the realization of these experiments; the higher this information, the stronger the guarantee of the quality of the implication and of its contrapositive.
Intuitively, the expected behavior of the measure φ is characterized by three phases:
1. a slow reaction to the first counter-examples (robustness to noise),
2. an acceleration of the rejection in the neighborhood of the balance,
3. an increasing rejection beyond the balance, which was not guaranteed by the basic implication intensity ϕ.
Hence, in order to have the expected significance, our model must satisfy the
following constraints:
1. Integrating both the information relative to a → b and that relative to ¬b → ¬a, respectively measured by I_{b/a} and I_{ā/b̄}. A product I_{b/a} · I_{ā/b̄} is well adapted to simultaneously highlighting the quality of these two values.
2. Raising the conditional entropies to the power of a fixed number α > 1 in the information definitions, to reinforce the contrast between the different phases described above: \(\left( (1 - H^{\alpha}_{b/a}) \cdot (1 - H^{\alpha}_{\bar a/\bar b}) \right)^{1/\beta}\) with β = 2α, in order to remain of the same dimension as ϕ.
3. The need to consider that the implications have lost their inclusive meaning when the number of counter-examples is greater than half of the observations of a and of ¬b. Beyond these values we consider that the terms \(1 - H^{\alpha}_{b/a}\) and \(1 - H^{\alpha}_{\bar a/\bar b}\) are equal to 0.
Let f_a = n_a/n (resp. f_{b̄} = n_{b̄}/n) be the frequency of a (resp. ¬b) on E, and let f_{a∧b̄} be the frequency of the counter-examples. The proposed adjustment of the previous informations I_{b/a} and I_{ā/b̄} can be defined by

\[ \hat I^{\,\alpha}_{b/a} \;=\; 1 - H^{\alpha}_{b/a} \;=\; 1 - \left[ -\left(1 - \frac{f_{a \wedge \bar b}}{f_a}\right)\log_2\!\left(1 - \frac{f_{a \wedge \bar b}}{f_a}\right) \;-\; \frac{f_{a \wedge \bar b}}{f_a}\,\log_2\frac{f_{a \wedge \bar b}}{f_a} \right]^{\alpha} \]

if \(f_{a \wedge \bar b} \in \left[0, \tfrac{f_a}{2}\right[\); otherwise \(\hat I^{\,\alpha}_{b/a} = 0\);

and

\[ \hat I^{\,\alpha}_{\bar a/\bar b} \;=\; 1 - H^{\alpha}_{\bar a/\bar b} \;=\; 1 - \left[ -\left(1 - \frac{f_{a \wedge \bar b}}{f_{\bar b}}\right)\log_2\!\left(1 - \frac{f_{a \wedge \bar b}}{f_{\bar b}}\right) \;-\; \frac{f_{a \wedge \bar b}}{f_{\bar b}}\,\log_2\frac{f_{a \wedge \bar b}}{f_{\bar b}} \right]^{\alpha} \]

if \(f_{a \wedge \bar b} \in \left[0, \tfrac{f_{\bar b}}{2}\right[\); otherwise \(\hat I^{\,\alpha}_{\bar a/\bar b} = 0\).

The weighting coefficient τ(a, b) combines these two terms as in constraint 2, \(\tau(a, b) = \left(\hat I^{\,\alpha}_{b/a} \cdot \hat I^{\,\alpha}_{\bar a/\bar b}\right)^{1/2\alpha}\), and the weighted version of the implication intensity, called the entropic implication intensity, is given by

\[ \phi(a, b) \;=\; \left( \varphi(a, b) \cdot \tau(a, b) \right)^{1/2} \]
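A minimal sketch of these formulas (our own illustration, not code from the chapter; we assume α = 2 as a typical choice and reuse the normal approximation of ϕ from the earlier sketch):

    import math

    def entropic_intensity(n, n_a, n_b, n_a_not_b, alpha=2):
        """Entropic implication intensity phi = (varphi * tau)**0.5 (sketch)."""
        f_a, f_not_b, f_cex = n_a / n, (n - n_b) / n, n_a_not_b / n

        def adjusted_info(f_cond):
            # 1 - H^alpha, set to 0 when counter-examples exceed half of the conditioning part
            if f_cex >= f_cond / 2:
                return 0.0
            if f_cex == 0:
                return 1.0
            h = -(1 - f_cex / f_cond) * math.log2(1 - f_cex / f_cond) \
                - (f_cex / f_cond) * math.log2(f_cex / f_cond)
            return 1.0 - h ** alpha

        tau = (adjusted_info(f_a) * adjusted_info(f_not_b)) ** (1 / (2 * alpha))
        expected = n_a * (n - n_b) / n            # classical intensity via the normal approximation
        q = (n_a_not_b - expected) / math.sqrt(expected)
        varphi = 1.0 - 0.5 * (1.0 + math.erf(q / math.sqrt(2.0)))
        return (varphi * tau) ** 0.5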
Example 3.
Table 1.a
         b       b̄       Σ
  a     200     400      600
  ā     600    2800     3400
  Σ     800    3200     4000

Table 1.b
         b       b̄       Σ
  a     400     200      600
  ā    1000    2400     3400
  Σ    1400    2600     4000

Table 1.c
         b       b̄       Σ
  a      40      20       60
  ā      60     280      340
  Σ     100     300      400
For table 1.a, the implicative intensity is ϕ(a, b) = 0.9999. However, the adjusted entropic informations are Î^α_{ā/b̄} = 0 = Î^α_{b/a} (the counter-examples exceed half of the occurrences of a), so the weighting coefficient is τ(a, b) = 0.
are partially ordered by the relation "finer than", which reflects the similarity between the class elements. To complete the information provided by the previous models, we have proposed the concept of R-rules (rules of rules), which are an extension of the quasi-implications: their premises and their conclusions can be rules themselves [15, 17, 20, 23]. To guide the intuition, a parallel can be drawn with proof theory and the logical implication: (X ⇒ Y) ⇒ (Z ⇒ W) describes an implication between the two theorems X ⇒ Y and Z ⇒ W previously established.
where r = k + h.
Fig. 1. Graphical representation of the implicative hierarchy H⃗_V = {a, b, c, d, e, b → c, e → d, a → (e → d)}
6.2 Definitions
interlocking conditions. For instance, in the example given below, the R-rule a → (e → d) is associated with the permutation aed. And this is the only possible association, as the R-rule (a → e), associated with the permutation ae, is not in H⃗_V. The class set H_V associated with H⃗_V is
From condition 2, a hierarchy is a partially ordered set for the inclusion relation ⊂̂ defined on Ω_V by: C′ ⊂̂ C″ if and only if C′ ∩̂ C″ = C′. Condition 3 is required to recover all the classes of the hierarchy.
The isolated interpretation of a class of the hierarchy is tricky since it is
a k-permutation which does not state the implication composition. For in-
stance, if we analyse the class aed ∈ HV all alone, we do not know the exact
meaning of aed: it could be either a → (e → d) or (a → e) → d. However, the
for any a, b, c ∈ V .
From the Benzécri-Johnson theorem [4, 22], this property a posteriori justifies our choice of the word "hierarchy".
Let us note that the cohesion coefficient defined in Section 5.3 can be associated with a pre-ordering on P = V × V − {(a, a), (b, b), . . .}:
Let C be the class built at level h_k of the hierarchy H_V. This class results from the amalgamation of two classes C′ ∈ H_V and C″ ∈ H_V not amalgamated at the previous level h_{k−1}.
The variable pair (a, b) is a generic pair at h_k if ϕ(a, b) ≥ ϕ(i, j) for any i ∈ C′ and j ∈ C″. The generic intensity at h_k is denoted by ϕ_k = ϕ(a, b). This pair characterizes the most noticeable implicative effect for a given class.
Moreover, the classes C′ and C″ are themselves the results of an amalgamation at a lower level. Hence, at each level h_g, g ≤ k, of H_V, we can determine a generic pair: the resulting vector (ϕ_1, ϕ_2, ..., ϕ_k) ∈ [0, 1]^k is called the implicative vector of the class C built at h_k.
A similar representation can be used for evaluating the impact of an individual on the formation of a path of the implicative graph G_{M,α}. Let us consider a path P of length k in G_{M,α} with a transitive closure (i.e. each arc is associated with a rule with an implication intensity greater than 0.5). Then P contains k(k − 1)/2 transitive arcs. A pair (a, b) of P is generic if ϕ(a, b) ≥ ϕ(i, j) for any i, j ∈ P.
The vectors (ϕ_1, ϕ_2, ..., ϕ_k) ∈ [0, 1]^k form a representation space onto which the individuals can be projected. In the following, we specify the properties of this space for an implicative hierarchy. They could be similarly defined for an implicative graph.
\[ d_1(i, C) \;=\; \left( \frac{1}{k} \sum_{g=1}^{k} \frac{\left(\varphi_g - \varphi_{i,g}\right)^2}{1 - \varphi_g} \right)^{1/2} \]

\[ d_C(i, j) \;=\; \left( \frac{1}{k} \sum_{g=1}^{k} \frac{\left(\varphi_{i,g} - \varphi_{j,g}\right)^2}{1 - \varphi_g} \right)^{1/2} \]
The distance d_C(i, j) measures the behavioral difference between i and j with respect to C. It defines a discrete topological C-structure on E. Let us consider the vectors (ϕ_{i,1}, ϕ_{i,2}, ..., ϕ_{i,k}) and the norm ‖i⃗ − j⃗‖ = d_C(i, j). With this topology, the typicality of an individual i to the class C is

\[ \gamma(i, C) \;=\; 1 \;-\; \frac{d_1(i, C)}{\max_{j \in E}\, d_1(j, C)} \]
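The following sketch (ours; variable names are assumptions of this illustration) computes d_1 and the typicality γ from the generic intensities of a class and the corresponding individual intensities.

    import numpy as np

    def d1(phi_class, phi_indiv):
        """Distance between an individual's intensity vector and a class's implicative vector."""
        phi_class, phi_indiv = np.asarray(phi_class), np.asarray(phi_indiv)
        return np.sqrt(np.mean((phi_class - phi_indiv) ** 2 / (1.0 - phi_class)))

    def typicality(phi_class, phi_individuals):
        """gamma(i, C) for every individual (one row of intensities per individual)."""
        dists = np.array([d1(phi_class, row) for row in phi_individuals])
        return 1.0 - dists / dists.max()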
Definition 14. The most typical group of the class C is the subset Ei ⊂ E
which minimizes the probability pi on the set {pi = Pr (Zi > card (Xi ∩ G∗C ))}i .
The probability pi is an error of the first kind: the risk of making a mistake
when considering that the group is not typical.
8.5 Contribution
9 Illustration
We illustrate the applicative interest of the different concepts presented above on a data set stemming from a survey of the French Public Education Mathematics Teacher Society on the level in mathematics of pupils in the final year of secondary education and on the perception of this subject [8]. In parallel with evaluation tests for students, a set of 311 teachers was asked about the objectives of the training in mathematics (Table 2 presents some items used in the following) and about their opinions on commonly shared ideas on this subject (Table 3). For each proposition, the teacher could answer "I agree with this idea" (positive opinion), "I disagree" (negative opinion) or "I partially agree".
Figure 3 presents a part of the directed hierarchy obtained on the set composed of the objectives and the different modalities of the opinions (51 items). The interpretation of the whole set of rules is far beyond the scope of this paper. Nevertheless, we have selected some of them, easy to interpret for a non-specialist in education theory, to show the use of a directed hierarchy on a real-life corpus. As for the complementarity of this structure with a more classical approach based on the representation of relationships by a graph, it is highlighted in Figure 2. The vertex set V of this graph contains the same items as those selected for the figure, and there is an arc between two vertices a_i and a_j of V if and only if ϕ(a_i, a_j) ≥ 0.5 and, for any a_k ∈ V, ϕ(a_i, a_k) < 0.5 and ϕ(a_j, a_k) < 0.5 (e.g. [2]). The choice of the threshold comes from the fact that beyond 0.5 the implicative tendency (e.g. a_i → a_j) is better than neutrality. It is important to note that, due to the non-transitivity of the relationship on A induced by ϕ, the existence of two arcs of the form (a_i, a_j) and (a_j, a_k) does not entail the existence of the arc (a_i, a_k). For instance, in the figure, we cannot deduce a relationship between the items E and OP7.
Fig. 2. A part of the implicative graph on the items of the survey on the training
in mathematics
Fig. 3. A part of the directed hierarchy on the items of the survey on the training
in mathematics
On the other hand, besides the binary rules, most of the R-rules of the total directed hierarchy involve three or four items. Rules with more attributes are generally more difficult to interpret. Nevertheless, they provide more information than the set of the implied binary rules.
The R-rule (N → A) → OP6 has the following meaning: if know-how acquisition must be accompanied by knowledge acquisition, then the teacher asks for well-defined programs. In this case, focusing on knowledge requires a predefined charter from the institution. The R-rule allows a more synthetic interpretation than the binary rules: these are concerned with behaviour, as seen within the behavioural framework, whereas the R-rule here describes a conduct of a higher order which determines the behaviour. Teachers who consider that the objective C (preparation for civic and social life) is not relevant are mostly responsible for this R-rule. They have a very restrictive representation of the teaching of maths, focused on the subject, and their teaching conforms to the national standard without any questioning.
The R-rule (OP 2 → (OP 5 → OP 4)) can be interpreted as follows: if I wish
to keep up the complete problem for the A-level exam and if the importance
given to the demonstration in maths is subordinated to a fixed scale of grading,
then I conform to the national syllabus instructions. This rule corresponds
to a class of teachers subjected to the institution and conservative in their
educational choices. They consider that, in France, the land of Descartes,
the demonstration is the foundation of the mathematical activity and that
the complete problem at the exam is the evaluation criterion. For them, the
syllabuses and the grading scales defined by the institution are essential to
teaching and assessment. We find again a very classical conception of teaching, based on explicit and unconditional support for the institution.
Contrary to the previous ones, the R-rule (I → (E → (OP8 → OP7))) can be interpreted as a sign of an open-minded didactic conception. Indeed, it means that if a teacher lays the emphasis on the development of the critical mind and on imagination and creativity, then he considers that personal training of the pupils in the search for examples and counter-examples is sufficient for them to discover divisibility features by themselves. This R-rule reveals a relationship between the non-dogmatic behaviours of the teacher and the wish to place the pupil in a situation of personal research.
A Knowledge acquisition
B Preparation to professional life
C Preparation to civic and social life
D Preparation to examinations
E Development of imagination and creativity
I Development of critical mind
N Know-how acquisition
Table 2. Some items from the list of the objectives of training in mathematics
Table 3. Some items from the list of the commonly shared ideas in the teaching of
maths
is more than 6 times greater than the typicality. This remark illustrates the
nuances brought by the two concepts: typicality and contribution.
10 Conclusion
In this paper we have proposed an overview of Statistical Implicative Analysis. Beyond the results, we have related the genesis of the considered problems, which arise from questions of experts in different fields. The theoretical basis is quite simple, but the numerous questions on the original assumptions, which do not appear here, have led to modifications and sometimes to deeper revisions. Fortunately, the proposed answers go beyond the original framework, and SIA is now a data analysis method, based on a non-symmetrical approach, which has been shown to be relevant for various applications.
In the near future, we are planning to consider new problems: (i) the extension of SIA to vectorial data, (ii) and to fuzzy variables, (iii) the integration of missing data, and (iv) the reduction of redundant rules. We are also interested in the complementarity of SIA with other approaches, in particular with decision trees (see Ritschard's paper in this book). And we will obviously carry on exploring real-life data sets and confronting our theoretical tools with experimental analysis in order to make them evolve.
References
1. R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD'93, pages 207–216. ACM Press, 1993.
2. M. Bailleul. Des réseaux implicatifs pour mettre en évidence des relations.
Mathématiques, Informatique et Sciences Humaines, 154:31–46, 2001.
3. M. Bailleul and R. Gras. L’implication statistique entre variables modales.
Mathématiques et Sciences Humaines, 128:41–57, 1995.
4. J.P Benzécri. L’analyse des données (vol. 1): Taxonomie. Dunod, Paris, 1973.
5. J.M. Bernard and S. Poitrenaud. L’analyse implicative bayesienne d’un ques-
tionnaire binaire : quasi-implications et treillis de galois simplifié. Mathéma-
tiques, Informatique et Sciences Humaines, 147:25–46, 1999.
6. J. Blanchard, P. Kuntz, F. Guillet, and R. Gras. Mesure de la qualité des
règles d’association par l’intensité entropique. Revue des Nouvelles Technologies
de l’Information-Numéro spécial Mesures de qualité pour la fouille de données,
RNTI-E-1:33–44, 2004.
7. J. Blanchard, P. Kuntz, G. Guillet, and R. Gras. Implication intensity: From the
basic definition to the entropic version - chapter 28. In Statistical Data Mining
and Knowledge Discovery, pages 475–493. CRC Press - Chapman et al., 2003.
8. A. Bodin and R. Gras. Analyse du préquestionnaire enseignants. Bulletin
de l’Association des Professeurs de Mathématiques de l’Enseignement PUblic,
425:772–786, 1999.
Raphaël Couturier
Summary. CHIC is a data analysis tool based on SIA. Its aim is to discover the most relevant implications between states of different variables. It proposes two different ways to organize these implications into systems: (i) in the form of an oriented hierarchical tree and (ii) as an implication graph. Besides, it also produces a (non-oriented) similarity tree based on the likelihood of the links between states. This paper describes its main features and its usage.
Key words: data mining tool, oriented hierarchical tree, implication graph, similarity tree, CHIC.
1 Introduction
Statistical Implicative Analysis was initiated by Gras [7, 8]. The first goal of this method was to define a way of answering the question: "If an object has a property, does it also have another one?". Of course, the answer is rarely always yes. Nevertheless, it is possible to notice that a trend appears. SIA aims at highlighting such tendencies in a set of properties. SIA can be considered as a method for producing association rules. Compared to other association rule methods, SIA distinguishes itself by providing a non-linear measure that satisfies some important criteria. First of all, the method is based on the implication intensity, which measures the degree of astonishment inherent in a rule. Hence, some trivial rules that are potentially well known to an expert are discarded. In fact, a rule of the form A ⇒ B is considered trivial if almost all objects of the population have property B. In this case, the implication intensity is close to 0, which is not the case when rules can be considered as surprising. This implication intensity may be reinforced by the degree of validity, based on Shannon's entropy, if the user chooses this computation mode. This measure does not only take into account the validity of a rule itself, but its counterpart too. Indeed, when an association rule is estimated as valid, i.e. the set of items A is strongly associated with the set of items B, then it is legitimate and intuitive to expect that its counterpart is also valid, i.e. the set of non-B items is strongly associated with the set of non-A items. Both the implication intensity and the degree of validity can be completed by a classical utility measure based on the size of the rule support, and the three are combined to define a final relevance measure that inherits the qualities of the three measures (with the entropic theory), i.e. it is noise-resistant as the rule counterpart is taken into account and it only selects non-trivial rules. For further information the reader is invited to consult [9]. Based on this original measure, CHIC, given a set of data, enables one to extract association rules. CHIC and SIA have been used in wide domain areas, for example [3, 4, 6, 14].
Based on the implication intensity and the similarity intensity, CHIC allows the user to build two trees and one graph. The most classical tree is a similarity tree (usually known as a dendrogram). It is based on the similarity index defined by Lerman [13]. In a similar way, the implication intensity can be used to build an oriented hierarchy tree. The implication intensity can also be used to define an implication graph, which lets the user select the association rules and the variables he or she wants.
In contrast to most other multidimensional data analysis methods, SIA establishes the following properties of the relations between the variables it handles:
• relationships between variables are asymmetric;
• the association measures are non-linear and are based on probabilities;
• the user can use graphical representations which follow the semantics of the relationship.
For example, most of the following methods: factor analysis, discriminant analysis or preference analysis, are based on metric space distances. Most hierarchical classification methods use proximity or similarity indexes. So, relationships between variables are essentially symmetric. Moreover, most of the time those relationships vary linearly with the observation parameters. Some methods are built on measures based on probabilities, which simplifies the interpretation of results. Some papers present comparisons between different measures. Interested readers can consult [12] or the chapter entitled "On the behaviour of the generalisation of the intensity of implication: a data-driven comparative study" in this book.
Section 2 addresses the variables that can be handled in CHIC, their format and the options that may help users. Section 3 explains some details of the way the association rules are computed. Section 4 presents the similarity and the hierarchy trees. Section 5 describes the implication graph. Section 6 presents some other features of CHIC. Section 7 gives an illustration with interval variables and the computation of typicality and contribution. Finally, Section 8 concludes this paper.
2 Variables
Initially, CHIC, like SIA, was designed to handle binary variables. Later, SIA was enhanced with other kinds of variables, and so was CHIC. Currently, CHIC allows the user to handle binary variables, frequency variables, variables over intervals and interval-variables. The case of binary variables is obviously the simplest one. Nominal (categorical) variables can be coded using as many binary variables as there are categories. Frequency variables take a real value between 0 and 1. This kind of variable allows the user to include the case of discrete variables which only take a fixed number of values (or modalities) ranging between 0 and 1. Of course, the way of defining modalities is very important, because whether the values of modalities are close to 0 or to 1 strongly affects the results of CHIC. This remark also holds for frequency variables. It should be noticed that ordinal variables can also be coded using frequency variables. The user must pay attention to the way real variables are transformed into frequency ones. Several strategies are available depending on the values. If the values are positive, they can be divided by the maximum value. Another possibility resides in considering that the minimum value represents 0 and the maximum represents 1, all the other values being proportionally distributed between the minimum and the maximum. If a real variable has both positive and negative values, it is possible to split the variable into two variables, one for the positive values and another one for the negative values. In this case, the previous remarks hold for both new variables. However, it is also possible to consider that the minimum value (even if it is negative) represents 0 and the maximum represents 1. In this case, all other values are mapped into the interval [0, 1].
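The transformation strategies just described might look as follows in code (a sketch of our own; CHIC's actual preprocessing may differ, and the strategy names are ours).

    import numpy as np

    def to_frequency(values, strategy="max"):
        """Map real values to [0, 1] so that they can be used as frequency variables."""
        x = np.asarray(values, dtype=float)
        if strategy == "max":        # positive values divided by the maximum
            return x / x.max()
        if strategy == "minmax":     # minimum mapped to 0, maximum mapped to 1
            return (x - x.min()) / (x.max() - x.min())
        if strategy == "split":      # one variable for positive values, one for negative values
            pos = np.where(x > 0, x, 0.0)
            neg = np.where(x < 0, -x, 0.0)
            return pos / pos.max(), neg / neg.max()
        raise ValueError("unknown strategy")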
Variables over intervals and interval-variables are used to model more complex situations. Both kinds of variables are explained in the following. Variables over intervals address the following problem. The conversion of a real variable into a frequency one may imply difficult choices from the user's point of view, as explained previously. Using the same real values, a variable over intervals proceeds differently. It consists in decomposing the values of a variable into a given number of intervals. The number of intervals is chosen by the user, and then the algorithm of dynamic clouds [5] automatically constitutes intervals with distinct bounds. This algorithm has the particularity of building intervals by minimizing the inertia within each interval. Each interval is then represented by a binary variable, and an individual has value 1 if it belongs to the interval and 0 otherwise. With such a decomposition, an individual belongs to only one interval. Hence, the number of variables increases with this method. Let us take an example. Assume that we have a set of individuals and that for each of them we have their weight and height. Then, assume that everybody weighs between 40 kg and 140 kg and that the height ranges between 140 cm and 200 cm. Figure 1 shows an example with a few individuals, whose values have been chosen arbitrarily. Supposing that we are interested in decomposing each variable into 4 intervals, according
Fig. 1. A simple example of data with interval variables and supplementary variables
150kg. Of course the number of intervals may have a great influence on the
result.
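As a rough sketch of this kind of decomposition (ours; the dynamic clouds algorithm of [5] is more general), a one-dimensional k-means can build intervals with low within-interval inertia, after which each interval becomes a binary variable.

    import numpy as np

    def decompose_into_intervals(values, k, n_iter=100, seed=0):
        """Split a real variable into k intervals (1-D k-means) and one-hot code membership."""
        x = np.asarray(values, dtype=float)
        rng = np.random.default_rng(seed)
        centers = np.sort(rng.choice(np.unique(x), size=k, replace=False))
        for _ in range(n_iter):
            labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
            centers = np.sort([x[labels == j].mean() if np.any(labels == j) else centers[j]
                               for j in range(k)])
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        return np.eye(k, dtype=int)[labels]   # one binary variable per interval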
Whereas for a variable over intervals each individual takes a value 1 for
only one interval, the particularity of an interval-variable is that an individ-
ual takes some values on different intervals. Moreover the intervals can be
contiguous and represent a discrete decomposition, as it is the case using an
automatic decomposition method like the dynamic clouds one, but they can
also be defined by the user according to appropriate criteria. Taking the pre-
vious example with the weight and height, a user may prefer to state that
people may be thin, normal or healthy and that they can be small, normal
or tall. Nonetheless, the particularity of an interval-variable is that an indi-
vidual may take values between several intervals but the sum of all its values
must be less or equal to 1. In most cases, the sum will be equal to 1, but
this is not mandatory. Roughly speaking, it is far from being easy to classify
objects and individuals because opinions may frequently diverge on the fact
that somebody or something should be described as “small” or “normal” for
example. Consequently, saying that someone is rather slim may be expressed
by assigning this individual with 0.75 thin and 0.25 normal. It should also
be noted that this allows the user to handle fuzzy variables which are very
useful in several problems [2]. The fuzziness characteristic comes from either
a human appreciation, which by definition is subjective, or by an inaccurate
measurement process which for some reason introduces a bias. In any case,
CHIC uses the standard methods presented in the next section.
As to the data format, CHIC uses the CSV (comma-separated values) format, a standard in spreadsheet tools. Labels for variables are recorded on the first row and labels for individuals are recorded in the first column. Values of individuals are represented in a 2-dimensional array. The values of all variables for an individual are stored in a row of this array (the first element is the name of the individual). The values of all individuals for a variable are stored in a column of this array (the first element is the name of the variable). Of course, the nature of the values in the array differs according to the kind of variables (binary, frequency variable, ...).
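A data file in this layout (one row per individual, one column per variable) could be read as follows; this is our own sketch, not code from CHIC, and the file name is hypothetical.

    import csv

    def read_chic_csv(path):
        """Read a CSV laid out as described: first row = variable labels, first column = individuals."""
        with open(path, newline="") as f:
            rows = list(csv.reader(f))
        variables = rows[0][1:]                     # header row, skipping the corner cell
        individuals = [r[0] for r in rows[1:]]      # first column
        data = [[float(v) for v in r[1:]] for r in rows[1:]]
        return individuals, variables, data

    # individuals, variables, data = read_chic_csv("survey.csv")   # hypothetical file name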
As explained in the article of Gras and Kuntz in this book, supplementary variables can be used in CHIC in order to explain some important facts. This kind of variable does not intervene in the computation, but it is used to give sense to the computation of typicality and contribution. Let us take an example. Assume that we want to study the impact of a new tramway in a part of a town and that, to this end, a survey has been carried out. This survey gathers several pieces of information concerning the needs and the hopes of this project's potential users. Of course, the gender of the people questioned is recorded. For example, some rules such as: working people living far from their workplace are generally very interested in the project, or families with young children are also very favorable to it, may be extracted. Using the gender of people as a supplementary variable, it is then possible to know whether the people that
are responsible for the construction of the previous rules are rather men or
women or if there is no distinction.
Before starting any computation with an appropriate presentation of the results, the user must choose some options. The most important one is the choice of the computation method: either the classical one or the entropic one. This criterion will produce different results. The entropic version of the implication does not only take into account the validity of a rule itself, but its counterpart too. Usually, the entropic version is best suited for large data sets. It is also more severe than the classical one, which produces rules with higher intensities but is totally inappropriate for a large data set.
Then, at each step, CHIC computes a set of new classes from all the existing ones. In order to build a new class, CHIC either aggregates an existing class with a variable which has not yet been aggregated into another class, or aggregates two existing non-aggregated classes. Nonetheless, each couple of variables between the two classes must have a valid intensity, i.e. greater than 0.5. For example, the formation of a class ((a, b), c) entails that the classes (a, c) and (b, c) are meaningful from the analysis point of view (similarity or implication). The class ((a, b), c) represents the rule (a ⇒ b) ⇒ c in the implicative analysis, and represents the fact that a and b are similar and that this class is similar to c from the similarity point of view. For more details on class formation, interested readers are invited to read the article of Gras and Kuntz in this book and [10].
If the user wants to know what the tree would look like without one or more variables, he can simply deselect them in the item toolbox. It should be noticed that this toolbox is available for all kinds of representation provided by CHIC (trees or graph). Unfortunately, a modification of the variables involved in the computation (even a small one) implies a complete rebuilding of the tree. This step of class construction is strongly dependent on the number of variables (the algorithm has a complexity which depends on the factorial of the number of variables).
Before running an analysis, the user can choose in the computation options to highlight the significant levels in the tree.
Figure 2 shows a similarity tree and Figure 3 shows a hierarchy tree. For the latter, significant levels are pointed out. They are represented by a red line (in CHIC). Each significant level means that the current level is more significant than the previous one and than the next one, which is not significant by definition. For more details on its construction, interested readers should refer to the definition given in this book and in [11].
The similarity index is computed with the classical theory or with the entropic one. The latter should be preferred with a large number of individuals. Moreover, the construction of the similarity tree with the classical index leads to only one class that gathers all the others. On the contrary, with the entropic version of the similarity index, the algorithm very frequently builds more than one class. In fact, according to the similarities in the data, the number of classes varies.
5 Implication graph
6 Other possibilities
A ⇒ B has an implication intensity equal to 0.7, then the most typical individuals respectively have values close to 0.5 and 1 for A and B (those values depend on how the rule was created, i.e. which computation mode was chosen, and especially on the cardinality of the sets A and B). By contrast, the notion of contribution is defined to measure whether some individuals are more responsible for the creation of the rule than the other ones. With the previous example, the most contributive individuals are those who have 1 for both variables A and B. So, the notions of typicality and contribution are different. In the same way, the notion of typicality (resp. contribution) of a set of individuals (or of a category of individuals) is defined in order to know whether the considered set of individuals is typical of (resp. contributes to) a rule. For formal definitions of those notions, one should refer to the chapter of R. Gras and P. Kuntz in this book.
In the graph some interesting rules are visible. For example, we can see the rules weight1 ⇒ height12 and height34 ⇒ weight2-4. Because the number of individuals is small, and consequently not significant, and because the values of this set have been arbitrarily generated, nothing more than "light individuals generally have a rather small height, and tall individuals are not light ones" can be concluded. Nevertheless, these rules show an implication between the partitions of the two variables. Considering that these data could make sense for an expert, we could then have computed the typicality and the contribution of a group of individuals. For example, concerning the rule height34 ⇒ weight2-4, CHIC determines that the variable man contributes the most to it (the error is 0.00638 for man, so close to 0; the error is equal to 1 for woman, so it does not contribute at all). Conversely, the most typical variable for the rule weight1 ⇒ height12 is the variable woman (the error is 0.00499 for woman, so this is a very good typicality; the error equals 1 for man, so it is not typical at all). Neither of these results is surprising when analyzing the data.
8 Conclusion
References
1. R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between
sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD
International Conference on Management of Data, pages 207–216, 1993.
2. G. Bojadziev and M. Bojadziev. Fuzzy sets, fuzzy logic, applications. World
scientific, 1996.
3. R. Couturier. Un système de recommandation basé sur l’ASI. In Troisième
rencontre internationale de l’Analyse Statistique Implicative (ASI3), pages 157–
162, 2005.
4. R. Couturier, R. Gras, and F. Guillet. Reducing the number of variables us-
ing implicative analysis. In International Federation of Classification Societies,
IFCS 2004, pages 277–285. Springer Verlag: Classification, Clustering, and Data
Mining Applications, 2004.
5. E. Diday. La méthode des nuées dynamiques. Revue de statistique appliquée,
19(2):19–34, 1971.
6. G. Froissard. CHIC et les études docimologiques. In Troisième rencontre inter-
nationale de l’Analyse Statistique Implicative (ASI3), pages 187–197, 2005.
1 Introduction
Frequent pattern discovery in sequences of events¹ (generally temporal sequences) is a major task in data mining. Research work in this domain consists of two approaches:
• discovery of frequent episodes in a long sequence of events (approach initiated by Mannila, Toivonen, and Verkamo [12, 13]),
• discovery of frequent sequential patterns in a set of sequences of events (approach initiated by Agrawal and Srikant [1, 17]).
The similarity between episodes and sequential patterns is that they are sequential structures, i.e., structures defined with an order (partial or total). Such a structure can be, for example:

¹ Here we speak about sequences of qualitative variables. Such sequences are generally not called time series.
Our measure, SII, evaluates sequential rules extracted from one unique sequence. This approach can be easily generalized to several sequences, for example by computing an average or minimal SII over the set of sequences. Rules are of the form a →^ω b, where a and b are episodes (these can even be structured by intra-episode time constraints). However, in this article, we restrict our study to sequential rules where the episodes a and b are two single events.

³ We consider here that the size of the time window is negligible compared to the size of the sequence, and we leave aside the possible side effects which could make new patterns appear overlapping the end of a sequence and the beginning of the following repeated sequence.
2.2 Notations
such that:
Fig. 2. Among the 3 windows of size ω beginning on events a, one can find 2 examples and 1 counter-example of the rule a →^ω b.
Definition 1. A sequential rule is a triple (a, b, ω), noted a →^ω b, where a and b are events of different types and ω is a strictly positive real number. It means: "if an event a appears in the sequence then an event b certainly appears within the next ω time units".

Definition 2. The examples of a sequential rule a →^ω b are the events a which are followed by at least one event b within the next ω time units. Therefore the number of examples of the rule is the cardinality noted n_{ab}(ω):

\[ n_{ab}(\omega) \;=\; \left|\left\{ (a, t) \in S \;\middle|\; \exists\, (b, t') \in S,\; 0 \le t' - t \le \omega \right\}\right| \]
Definition 3. The counter-examples of a sequential rule a →^ω b are the events a which are not followed by any event b during the next ω time units. Therefore the number of counter-examples of the rule is the cardinality noted n_{ab̄}(ω):

\[ n_{a\bar b}(\omega) \;=\; \left|\left\{ (a, t) \in S \;\middle|\; \forall\, (b, t') \in S,\; (t' < t \;\vee\; t' > t + \omega) \right\}\right| \]
Contrary to association rules, n_{ab} and n_{ab̄} are not data constants but depend on the parameter ω.
Let us note n_a the number of events a in the sequence. We have the usual equality n_a = n_{ab} + n_{ab̄}. A sequential rule a →^ω b is completely described by the quintuple (n_{ab̄}(ω), n_a, n_b, ω, L). The examples of a sequential rule now being defined, we can specify our measure for the frequency of the rules:
Definition 4. The frequency of a sequential rule a →^ω b is the proportion of examples relative to the size of the sequence:

\[ frequency(a \xrightarrow{\omega} b) \;=\; \frac{n_{ab}(\omega)}{L} \]

With these notations, the confidence, recall, and J-measure are given by the following formulas:

\[ confidence(a \xrightarrow{\omega} b) \;=\; \frac{n_{ab}(\omega)}{n_a} \qquad\quad recall(a \xrightarrow{\omega} b) \;=\; \frac{n_{ab}(\omega)}{n_b} \]

\[ J\text{-}measure(a \xrightarrow{\omega} b) \;=\; \frac{n_{ab}(\omega)}{L}\,\log_2\frac{n_{ab}(\omega)\,L}{n_a\, n_b} \;+\; \frac{n_{a\bar b}(\omega)}{L}\,\log_2\frac{n_{a\bar b}(\omega)\,L}{n_a\,(L - n_b)} \]
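A direct sketch of these definitions (ours; the event sequence is assumed to be a list of (type, timestamp) pairs) could be:

    def rule_counts(sequence, a, b, omega):
        """Count examples and counter-examples of the sequential rule a -(omega)-> b."""
        times_a = [t for e, t in sequence if e == a]
        times_b = [t for e, t in sequence if e == b]
        n_ab = sum(1 for t in times_a if any(0 <= tb - t <= omega for tb in times_b))
        return n_ab, len(times_a) - n_ab            # (examples, counter-examples)

    seq = [("a", 1), ("b", 2), ("a", 5), ("c", 6), ("a", 9), ("b", 10)]
    n_ab, n_a_not_b = rule_counts(seq, "a", "b", omega=2)
    print(n_ab / (n_ab + n_a_not_b))                # confidence of a -(2)-> b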
Following the implication intensity for association rules [6], the sequential implication intensity SII measures the statistical significance of the rules a →^ω b. To do so, it quantifies the unlikelihood of the smallness of the number of counter-examples n_{ab̄}(ω) with respect to the independence hypothesis between the types of events a and b. Therefore, in a search for a random model, we suppose that the types of events a and b are independent. Our goal is to determine the distribution of the random variable N_{ab̄} (number of counter-examples of the rule) given the size L of the sequence, the numbers n_a and n_b of events of types a and b, and the size ω of the time window which is used.
We suppose that the arrival process of the events of type b satisfies the
following hypotheses:
• the times between two successive occurrences of b are independent random
variables,
• the probability that a b appears during [t, t + ω] only depends on ω.
Moreover, two events of the same type cannot occur simultaneously in the
sequence S (see section 2.2). In these conditions, the arrival process of the
events of type b is a Poisson process of intensity λ = n_b/L. So, the number of events b appearing in a window of size ω follows a Poisson law with parameter ω·n_b/L.
Definition 5. The sequential implication intensity (SII) of a rule $a \xrightarrow{\omega} b$ is defined by:
$$SII(a \xrightarrow{\omega} b) = P(N_{a\bar b} > n_{a\bar b}(\omega))$$
Numerically, we have:
$$SII(a \xrightarrow{\omega} b) = 1 - P(N_{a\bar b} \le n_{a\bar b}(\omega)) = 1 - \sum_{k=0}^{n_{a\bar b}(\omega)} \binom{n_a}{k}\,\bigl(e^{-\frac{\omega}{L} n_b}\bigr)^{k}\,\bigl(1 - e^{-\frac{\omega}{L} n_b}\bigr)^{n_a - k}$$
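As an illustration, the following Python sketch evaluates this expression directly (a toy implementation of ours; the parameter values are those of Fig. 4 below).

```python
from math import comb, exp

def sii(n_counter, n_a, n_b, omega, L):
    """Sequential implication intensity: probability that, under the Poisson model
    above, the number of counter-examples exceeds the observed one."""
    p = exp(-omega * n_b / L)   # probability that a given event a sees no b within omega
    cdf = sum(comb(n_a, k) * p**k * (1 - p)**(n_a - k) for k in range(n_counter + 1))
    return 1.0 - cdf

# Parameters of Fig. 4: n_a = 50, n_b = 130, omega = 10, L = 1000.
# About 13.6 counter-examples are expected under independence, so observing
# only 5 of them gives an SII close to 1.
print(sii(5, 50, 130, 10, 1000))
```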
In this section, we study the measures when the number $n_{a\bar b}$ of counter-examples increases (with all other parameters constant). For a rule $a \xrightarrow{\omega} b$, this can be seen as making the events a and b more distant in the sequence while keeping the same numbers of a and b. This operation turns events a from examples into counter-examples.
Fig. 4. SII, confidence, recall, and J-measure w.r.t. the number of counter-examples ($n_a = 50$, $n_b = 130$, $\omega = 10$, $L = 1000$)
Fig. 4 shows that SII clearly distinguishes between acceptable numbers
of counter-examples (assigned to values close to 1) and non-acceptable num-
bers of counter-examples (assigned to values close to 0) with respect to the
other parameters na , nb , ω, and L. On the contrary, confidence and recall
vary linearly, while J-measure provides very little discriminative power. Due
to its entropic nature, the J-measure could even increase when the number
of counter-examples increases, which is disturbing for a rule interestingness
measure.
We call sequence repetition the operation which makes the sequence longer by
repeating it γ times one after the other (we leave aside the possible side effects
which could make new patterns appear by overlapping the end of a sequence
and the beginning of the following repeated sequence). With this operation,
the frequencies of the events a and b and the frequencies of the examples and
counter-examples remain unchanged.
Fig. 7 shows that the values of SII are more extreme (close to 0 or 1)
with sequence repetition. This is due to the statistical nature of the measure.
Statistically, a rule is all the more significant when it is assessed on a long se-
quence with lots of events: the longer the sequence, the more one can trust the
imbalance between examples and counter-examples observed in the sequence,
and the more one can confirm the good or bad quality of the rule. On the
contrary, the frequency-based measures like confidence, recall, and J-measure
do not vary with sequence repetition (see Fig. 8).
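Reusing the sii sketch above (our own illustration, with the parameters of Fig. 4), one can check this numerically: under sequence repetition all counts are multiplied by γ while ω is unchanged.

```python
# Continues the previous sketch (assumes sii() is already defined).
# 11 counter-examples is slightly fewer than the ~13.6 expected under independence,
# 16 slightly more; repetition pushes the first rule towards SII = 1 and the second
# towards SII = 0, while confidence = 1 - counter/n_a does not change.
for gamma in (1, 2, 4, 8):
    print(gamma,
          round(sii(11 * gamma, 50 * gamma, 130 * gamma, 10, 1000 * gamma), 3),
          round(sii(16 * gamma, 50 * gamma, 130 * gamma, 10, 1000 * gamma), 3))
```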
$$n_{a\bar b}(\omega) = n_a - \frac{n_a n_b}{L}\,\omega \quad \text{if } \omega \le \frac{L}{n_b}, \qquad n_{a\bar b}(\omega) = 0 \quad \text{otherwise.}$$
This is a simple model, considering that the number of examples observed in the sequence is proportional to ω: $n_{ab}(\omega) = \frac{n_a n_b}{L}\,\omega$. The formula is based on the following postulates:
• According to definitions 2 and 3, $n_{ab}$ must increase with ω and $n_{a\bar b}$ must decrease with ω.
• If ω = 0 then there is no time window, and the data mining algorithm cannot find any example (we consider that two events a and b occurring at the same time do not make an example). So we have $n_{ab} = 0$ and $n_{a\bar b} = n_a$.
• Let us consider that the events b are regularly spread over the sequence (Fig. 9). If $\omega \ge L/n_b$, then any event a can capture at least one event b within the next ω time units. So we are sure that all the events a are examples, i.e. $n_a = n_{ab}$ and $n_{a\bar b} = 0$.
In practice, since the events b are not regularly spread over the sequence, the maximal gap between two consecutive events b can be greater than $L/n_b$. So the threshold $\omega \ge L/n_b$ is not enough to be sure that $n_a = n_{ab}$. This is the reason why we introduce a coefficient k into the function $n_{a\bar b}(\omega)$:
$$n_{a\bar b}(\omega) = n_a - \frac{n_a n_b}{L}\,\frac{\omega}{k} \quad \text{if } \omega \le \frac{kL}{n_b}, \qquad n_{a\bar b}(\omega) = 0 \quad \text{otherwise.}$$
The coefficient k can be seen as a non-uniformity index for the events b in the
sequence. We have k = 1 only if the events b are regularly spread over the
sequence (Fig. 9).
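Plugging this model into the sii sketch given earlier makes the behaviours discussed below easy to reproduce; the parameters are toy values of our own choosing.

```python
def n_counter_model(omega, n_a, n_b, L, k):
    """Model above: counter-examples decrease linearly with omega and vanish
    for omega >= k*L/n_b (k = 1 when the events b are regularly spread)."""
    if omega <= k * L / n_b:
        return round(n_a - n_a * n_b * omega / (L * k))
    return 0

# Toy parameters (ours); sii() is the function sketched earlier in this chapter.
n_a, n_b, L = 50, 130, 1000
for k in (1, 3):
    print(k, [(omega, round(sii(n_counter_model(omega, n_a, n_b, L, k),
                                n_a, n_b, omega, L), 3))
              for omega in (1, 5, 8, 15, 25, 40, 80, 200)])
```

With these numbers, for k = 1 the intensity rises quickly towards 1 and then falls to 0 as ω grows; for k = 3 it remains low for small ω and peaks later, on a narrower range of values.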
With this model for nab (ω), we can now study the interestingness measures
with regard to ω and k. Several interesting behaviors can be pointed out for
SII (see illustration in Fig. 11):
• There exists a range of values for ω which allows SII to be maximized. This is intuitively satisfying (when using a sequence mining algorithm to discover a specific phenomenon in data, much time is spent finding the "right" value for the time window ω). The higher the coefficient k, the smaller this range of values.
• If ω is too large, then SII = 0. Indeed, the larger the time window, the
greater the probability of observing a given series of events in the sequence,
and the less significant the rule.
• As for the small values of ω (before the range of values which maximizes SII):
– If k ≈ 1, then $n_{ab}$ increases fast enough with ω for SII to increase (Fig. 11, top).
– If k is larger, then $n_{ab}$ does not increase fast enough with ω: SII decreases until $n_{ab}$ becomes more adequate (Fig. 11, bottom).
On the other hand, confidence (and likewise recall) increases linearly with ω (see Fig. 12, with a logarithmic scale). Above all, the three measures confidence, recall, and J-measure do not tend to 0 when ω is large (this does not depend on any model chosen for $n_{a\bar b}(\omega)$). Indeed, these measures depend on ω only through $n_{ab}$, i.e. the parameter ω does not explicitly appear in the formulas of the measures. If ω is large enough to capture all the examples, then $n_{a\bar b} = 0$ is fixed and the three measures become constant functions (with a good value since there is no counter-example). This behavior is absolutely counter-intuitive. Only SII takes ω explicitly into account and allows rules with too large a time window to be discarded.
4 Conclusion
References
1. R. Agrawal and R. Srikant. Mining sequential patterns. In Proceedings of the
international conference on data engineering (ICDE), pages 3–14. IEEE Com-
puter Society, 1995.
2. R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In
Jorge B. Bocca, Matthias Jarke, and Carlo Zaniolo, editors, Proceedings of the
twentieth international conference on very large data bases (VLDB 1994), pages
487–499. Morgan Kaufmann, 1994.
3. J. Blanchard. Un système de visualisation pour l’extraction, l’évaluation, et
l’exploration interactives des règles d’association. PhD thesis, Université de
Nantes, 2005.
4. J. Blanchard, F. Guillet, and H. Briand. L’intensité d’implication entropique
pour la recherche de règles de prédiction intéressantes dans les séquences de
1 Introduction
the system solves transformational tasks, to ILEs such as Aplusix where the
student has to perform all the transformations with the given expressions.
To avoid combinatorial explosion some ILEs limit the student action to one
calculation step. The expected step is often very simple. Such is for example
the case of Cognitive Tutors [7]: each student action is required to be on an
interpretable path. At each step, the student action is compared to applica-
ble rules in the model and immediate feedback is conventionally provided. If
the student action matches one of the applicable rules, the tutor accepts the
action and applies the rule to update the internal representation of the prob-
lem state. If the student action does not match the action of any applicable
rule in the model, the action does not register and the tutor provides a brief
message in the hint window. In case of ambiguity about the interpretation of
the student action, the student is presented with a disambiguation menu to
identify the appropriate interpretation of the action. The feedback is immedi-
ate and immediate error correction is required. As a consequence, Cognitive
Tutors make it possible to follow the student very closely but, as McArthur
points out, “because each incorrect rule is paired with a particular tutorial
action [. . . ], every student who takes a given incorrect step gets the same
message, regardless of how many times the same error has been made or how
many other errors have been made” [8].
Other authors ask the students to mark the rule they wish to apply. In
MathXpert [9] for example, the student selects a sub-expression, then chooses
a rule in a menu providing the rules that are applicable to this sub-expression,
see Fig. 1. The chosen rule is then automatically applied. The opportunity to
make mistakes in such an environment is strongly restricted. The T-algebra
environment [10] differs from MathXpert in the fact that it is the student
who has to write the result of the rule application (when the system is in ‘free
mode’). However, the rules menu presented by the system is also contextual:
it depends on the selected sub-expression.
A completely different approach is used in the Aplusix environment [11]:
this microworld leaves the students entirely free to produce the expressions
they wish, without specifying which rules they should apply. This environment
allows students to apply several rules in one single step, as they do in the
usual paper and pencil environment. Therefore, with such environments, we
are closer to the real mental processes of students. For this reason, we have
chosen the Aplusix environment for our study of students' actions.
Contrary to the ILEs, the commands of a CAS are very powerful: simplify, fac-
torise, expand, solve, differentiate, integrate, and so on.
3.1 Presentation
Fig. 2. Screenshots of Aplusix. On the left, the student is in test mode, he/she has
done several transformations in each step. On the right, the student is in training
mode, with feedback about correctness
Aplusix permits the occurrence of complex errors and actions in one stu-
dent’s step, as shown in Fig. 2. As a result, the understanding of the student
reasoning is complicated and the difficulty of providing a diagnosis of mistakes
increases. It is then necessary to subdivide a student’s step into elementary
steps. This is done by an automatic process: the rules diagnosis provided by
Anaïs, presented in the following section.
The files used for an automatic analysis consist only of the calculation steps
validated by the student (i.e., the student’s steps): corrections, hesitations and
time are not taken into account. Software, called Anaïs, has been developed to
analyse students’ productions. It is based on rules established from a didactical
analysis and gathered in a library. The analysis consists of searching for the
best sequence of the rules (correct or incorrect) that can explain a given
student’s step. The process of the analysis is as follows: from the expression
that is the source of the student’s step, Anaïs develops a tree by applying all
the rules applicable to this expression.
• The application of a rule produces a new node. Anaïs thus gradually builds
a search tree, at each level choosing the node to be developed by using
a heuristic that takes into account the goal (the expression resulting from
the student’s step).
• When the process is successful, the goal may be reached by several paths,
each of them being a diagnosis. The selection of the best diagnosis is
based on a cost of paths expressed in terms of the number and kind of
rules applied.
The Anaïs software provides a diagnosis in the form of a sequence of in-
termediate elementary stages (rewriting rules) to explain the steps produced
by the student, as shown in Tab. 1.
We call elementary step an automatic intermediate stage provided by
Anaïs. An elementary step has an initial expression and a final (intermediate)
expression.
A single rule explains each elementary step. A step can belong to one of
four different tasks, whatever the type of the exercise: expansion, factoring,
collecting like terms, and movement, cf. Tab. 1. An exercise of the equation-
solving type may involve factoring, collecting like terms, and movement steps.
Remark 1. One can question the interest of leaving such large freedom in the student's answers and thus having to reconstruct the intermediate steps.
Indeed, the ILEs presented in section 2 do not need to do this complex work.
However, when a student solves an algebra exercise, he/she does not always
see an expression transformation as the application of a rule, with an initial
and a final expression. Requiring that students cite the rules applied may be
of didactical interest, but it may lead us away from the student’s real way of
thinking. The freedom the students have while working with Aplusix puts us
as close as possible to their real mental processes.
One of the difficulties in catching stable errors is to define what stability is:
what is a good threshold to decide when a rule can be considered as having
been regularly used? The most common and spontaneous technique for catching stability is to count the number of effective applications of a rule by the student and to divide it by the number of opportunities for applying that rule [12, 13]. Authors do not define very precisely what is called an opportunity for a rule application. They themselves recognize that "different mal-rules have widely different 'opportunities' of occurring, and in some cases the number of opportunities is impossible to quantify without looking closely at individual students protocols" [13].
E1: 3x − 5 + 3x
E2: 3x + 3x − 5
E3: −5 + 3x + 3x
E4: 7x − 8 + 3 + 7x
E5: 5x × 2x
The four expressions E1, E2, E3, and E4 present an opportunity to apply
R. Clearly, E5 does not present an opportunity for application of R because
the initial expression is a product instead of a sum. Therefore, it seems that
there are four opportunities for application of R. Let us see what the answers
of students for the four expressions are:
Let us suppose that student A’s answer is −5 for each of the first four
expressions (and, for example, 10x², for E5). This means that student A has
used R for these four expressions: the frequency of application of R associated
to student A is then 4/4. It seems that the behaviour of student A is very
stable with respect to this rule.
Let us suppose now that student B’s answers are respectively: −5, 6x − 5,
−5 + 6x and −5 for the first four expressions (and, for example, 10x², for
E5). This would mean that the student B has used R only for E1 and E4.
The frequency of application of R for the student B would then be 2/4. Does
it mean that the application of R for the student B is unstable? We do not
believe that.
Balacheff explains that two different behaviours (using two different rules
here) “can appear as conflicting but this inconsistency can be explained either
by the time evolution or by the situation/context” [14]. The algebraic context
of the four expressions is not the same: in E1 and E4, the minus sign is between
the two monomials. Moreover, the minus sign is placed side by side with 3x and 7x respectively. In the expressions E2 and E3, there is also a minus sign, but it
is not between the two equal monomials. The didactical variable, which is the
position of the minus sign in the expressions, does not take the same values for
each expression, and it influences the behaviour of student B. We can say that
the behaviour of student B is stable, with respect to the minus sign position:
for him/her, the expressions E2 and E3 do not present an opportunity to
apply R. Each student has his own conception of the opportunity for applying
the rule R. For this reason, it does not seem possible to objectively count the
number of opportunities for a rule application. The application opportunity
is a subjective notion: a “transfer from one situation to another one is not
an obvious process, even if in the eyes of an observer these situations are
isomorphic” [15]. We see that the definition of stability depends on what is called an opportunity.
In our work, we decided not to have a priori ideas of what constitute oppor-
tunities for a given rule application. We suppose that the sources of errors in
incorrect transformations are principally in the characteristics of the expres-
sion, such as the degree of the initial expression, the nature of its coefficients,
the presence of a minus sign, and so on. We need to describe precisely the
algebraic characteristics of the initial expression to which a rule is applied
and we are looking for the algebraic context that can ‘better’ explain the rule
utilisation for each student.
Since we are looking for causes of students’ errors and not just correlations, the
choice of SIA seems worthwhile. Indeed, the SIA approach makes it possible
to find implicative links between attributes, in our case between algebraic
characteristics and rules. In addition, this technique takes into account the
number of times that a context appears relative to the other contexts. For
example, if a student S1 uses a rule R1 ten times in a context C0, and he/she uses a rule R2 five times in the same context C0, then the quasi-implication C0 → R1 can be evaluated. (Quasi-implication is called a "rule" by the authors of the SIA approach; in order not to confuse it with what we call algebraic rules, we will call it implication or quasi-implication, noted →, while the transformation of an algebraic expression is noted ↦.) Let us consider another student, S2, who uses the rule R1 100 times in the context C0 and the rule R2 50 times in the same context. The frequency of R1 application by both students is the same (2/3), but the SIA approach makes a distinction between 10/15 and 100/150.
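The point can be made concrete with a small computation in the spirit of the implication intensity of Gras [5] (Poisson model for the number of counter-examples); the surrounding numbers, in particular the steps outside C0, are made up by us for the illustration, and CHIC's exact options may differ.

```python
from math import exp

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term, total = exp(-lam), exp(-lam)
    for j in range(1, k + 1):
        term *= lam / j
        total += term
    return total

def intensity(n, n_context, n_rule, counter):
    """Quasi-implication intensity of 'context -> rule': how unlikely it is, under
    independence, to observe so few counter-examples (steps in the context that
    do not use the rule), with a Poisson model for their number."""
    lam = n_context * (n - n_rule) / n    # expected number of counter-examples
    return 1 - poisson_cdf(counter, lam)

# Invented complements (the steps outside C0): each student is also supposed to
# have 15 (resp. 150) steps in other contexts, where R1 is used only 2 (resp. 20) times.
print(intensity(n=30,  n_context=15,  n_rule=12,  counter=5))    # student S1
print(intensity(n=300, n_context=150, n_rule=120, counter=50))   # student S2
```

Both students use R1 with the same frequency in C0, but the quasi-implication C0 → R1 comes out markedly more intense for S2, who provides ten times more observations.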
Let us consider a rule set $\{R_k\}_{1 \le k \le p}$ (in general, a rule set is associated to a task or a part of a task). To this set, we associate algebraic context variables $\{V_i\}_{1 \le i \le n}$, which are the main characteristics of the initial expressions on which each rule $R_k$ can be applied. Each algebraic context variable $V_i$ (just called variable in what follows) can be assigned values noted $\{V_{ij}\}_{1 \le j \le m_i}$. We call contexts the vectors $(V_{1j_1}, V_{2j_2}, \ldots, V_{nj_n})$ associated to an expression. The number of such vectors is $\prod_{i=1}^{n} m_i$. We denote a context by $C_l$, where $l \in \{1, \ldots, \prod_{i=1}^{n} m_i\}$.
Remark 2. Since we will use the CHIC software, we prefer to have variables with binary values. Therefore, we consider the values $\{V_{ij}\}_{1 \le j \le m_i}$ as new variables, called binary context variables, which take 0 or 1 as values. When necessary, the distinction between binary context variables and algebraic context variables will be made.
We illustrate this with an example. Let us consider the expression 3x − 5.
The operator and the presence of a minus sign are two algebraic context vari-
ables of the expression. The operator can be one of the five binary context
variables which are times, plus, minus (e.g. the expression −(2x + 7)), bracket
and exponent. These binary context variables can be assigned binary values: the expression has or does not have the particular operator. The second variable, presence of a minus sign, has one binary variable: itself. A context extract of the expression 3x − 5 is then (Plus, Presence of minus sign).
Obviously, variables depend on tasks: variables for factoring or for move-
ment are not the same. We assume that the variables have impact on the use
of a rule by a student. This means that the rule used depends on the value of
a variable.
We will consider a behaviour as stable if the student uses the same rule
each time (or almost each time) that he/she is in the same algebraic context.
Description and choice of variables are determining factors for catching sta-
bility and this is a difficult task. The values depend on didactical decisions,
as we will show in the next subsection.
Experimentation with Aplusix has been carried out for different purposes
by teachers and researchers [18]. However, the experiments presented in this chapter were all conducted in the test mode (i.e., without information about
correctness of students’ steps, see section 3). Log files were gathered in a
database and analysed by Anaïs to provide a sequence of elementary steps for
each student’s step.
We can query (with the Structured Query Language — SQL) the database
and get result sets. These sets are tables which can be recorded in the CSV
format and directly used by the CHIC software. We will call these tables CHIC
tables.
The lines of the CHIC tables used in section 6 and 7 consist of the elemen-
tary steps from the automatic diagnosis for individual students. At least one
CHIC table is associated to one student. The columns, called attributes, con-
sist of the binary context variables and the actions. The actions can be either
the rules diagnosed by Anaïs (section 6) or a collection of rules (section 7),
according to what we want to model: a precise task or a set of tasks. The
values assigned to the rules are binary: either the rule is used or not. If an
elementary step in line i is explained by the rule which is in column j, there is
a 1 in the cell (i, j). The binary context variables are, as their name indicates,
binary. There is a 1 in the cell (i, k) each time the initial expression of step i can be described by the (binary) variable of column k. An example of a line
is shown in Tab. 3.
The files used in section 8 are not exactly the same and will be explained
in time.
Remark 3. On a line, it is not possible to have a 1 in the columns of two binary variables that depend on the same algebraic context variable, while there can be as many 1's as there are algebraic context variables.
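A minimal Python sketch (ours) of how such a CSV table can be assembled; the attribute names are illustrative, taken from the examples of this chapter (operator variables and factoring rules), and respecting the constraint of Remark 3 is left to the caller.

```python
import csv

# One line per elementary step, one column per binary context variable and per rule.
CONTEXT_VARS = ["Times", "Plus", "Minus", "Bracket", "Exponent", "PresenceMinusSign"]
RULES = ["Correct", "ErMinus", "ErNothing", "ErOther", "NoInf"]

def chic_row(step_contexts, diagnosed_rule):
    """step_contexts: set of binary context variables true for the initial expression
    (at most one per algebraic context variable, cf. Remark 3);
    diagnosed_rule: the single rule explaining the elementary step."""
    row = [1 if v in step_contexts else 0 for v in CONTEXT_VARS]
    row += [1 if r == diagnosed_rule else 0 for r in RULES]
    return row

steps = [
    ({"Plus", "PresenceMinusSign"}, "Correct"),   # e.g. initial expression 3x - 5
    ({"Bracket"}, "ErNothing"),
]
with open("chic_table.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(CONTEXT_VARS + RULES)
    for contexts, rule in steps:
        writer.writerow(chic_row(contexts, rule))
```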
5.4 Implications
⁴ The index r cannot be equal to i: a value of a variable cannot imply another value of the same variable.
The Actions.
In the rules library, there are 20 rules concerning factoring. We present here
the results for one student. Five rules have been diagnosed for this student:
• Correct: the factoring is correct.
• ErMinus: the factoring is erroneous and the mistake is about a minus sign.
For example, (5x + 1)x − (1 + 5x)y ↦ (5x + 1)(x + y).
• ErNothing: when the cofactor of the common factor is 1, some students
think that there remains “nothing” when the common factor is withdrawn.
For example, (x + 3)(x + 2) + (x + 3) ↦ (x + 3)(x + 2). This transformation
can be explained by the loss of a term but interviews with students show
that the concept of “nothing” was behind this transformation.
• ErOther: other kind of factoring errors.
• NoInf, meaning NoInformation: the student has not answered this task.
Note that it is not a step but an expression: there is no final expression
because the student has stopped solving the task.
There are 36 binary context variables associated to the factoring task. The
six context variables are the nature of the common factor, its visibility, its
degree, its position, the nature of its cofactors and the presence and position
of a minus sign. Each variable is decomposed into binary variables. In what
follows, we explain only those that appear in the implicative graph. Each of
them is illustrated by an example.
• The nature of the common factor can be numeric (e.g., 6x + 3), monomial
(e.g., 3x + 15x2 ), a sum of two terms (e.g., (5x + 1)x − (1 + 5x)y), a sum
of three terms (e.g., x(x + 2 + y) − (x + y + 2)(1 + 5x)), or a product (e.g.,
x(x − 4) + (x² − 4x)(x + 1)).
• The visibility of the common factor depends on its nature. Let us take the
example of a sum of two terms as a common factor. Its visibility can be
obvious (e.g., (x+3)(x+2)+(x+3)), commuted (e.g., (5x+1)x−(1+5x)y),
opposite (e.g., (x + 3)(x + 2) + (−x − 3)), commuted-opposite (e.g., (x + 3)
(x + 2) + (−3 − x)), disconnected (e.g., x + (x + 2) × x + 2), multiple (e.g.,
−6 − 3x + (1 + x)(−2 − x)) or bi-multiple (e.g., −6 − 3x + (1 + x)(−4 − 2x)).
• The nature of the cofactors can be numeric, unit, monomial, sum, product
or identical. For example, in the expression (x + 3)(x + 2) + (x + 3), the
nature of the cofactors is respectively sum (x + 2) and unit (1); while in
the expression x(x − 4)(x + 1) + (x − 4)², the cofactors of the common
factor (x − 4) are on the one hand the product x(x + 1) and on the other
hand, the sum (x − 4), which is identical to the common factor.
Table (extract): coded data for the factoring task. Each line is an expression (e.g. 2y² + 2, −7x − 3x², 3x² + yx, 3(x + 8)² + x² + 8x, (x − 3)(1 − 4x) − 5(3 − x), −6 − 3x + (2 − x)(−4 − 2x), (x − 4) + (x − 4)x, −(x + 3 + y) + x(y + x + 3)); each column is a binary context variable such as FSum2, FSum2Obvious, FSum2Commut or FSum2Opp, coded 0/1.
The implicative graph was then built from a file with 41 lines and 36 + 5
attributes. A first work with one premise is presented; a second one follows
with two premises.
One premise.
Three sets of implications emerge from the analysis and are presented in Fig. 3:
• The student did not transform the expression, NoInf, when the common
factor or one of the cofactors is a product (FProduct and CofProduct). For
example, she did not treat exercises like x(x + 2)(x + 1) + 3(1 + x)(2 + x),
where the common factor is a product. She also did not answer when the
common factor is a disconnected sum (FSum2TermDisc), or an opposite
one (FSum2TermOpp). For example, the expressions (x + (x + 2) × x + 2) and −5x − 8 + (5x + 8)(x + 2) contributed respectively to the implications (NoInf → FSum2TermDisc) and (NoInf → FSum2TermOpp). These
results are interesting. Even when the student does not answer, it means
something at a cognitive level: considering a product as a common factor
seems too difficult for this student.
• The student performed a correct factoring when the common factor was
numeric (Fnum), or monomial (Fmono). For example, the student correctly
factorized the expression −3y − 12 or 3x + 9x(x + 2).
• No interesting information about the ErNothing rule use emerges at this
level. A necessary condition for the use of ErNothing is the presence of
a unit cofactor in the expression (CofUnit). Indeed, this rule cannot be
applied if the cofactor is not a unit. It is this information that appears in
the implicative graph. We will see that with two premises we have more
information about the use of this rule.
We also added a new action, called Transformation that is the oppo-
site of the action NoInf. It means that the student transforms an expres-
sion either correctly or not. The following implications emerge at threshold
89 (see Couturier chapter): (Fsum2TermObvious → Transformation) and (Fsum2TermCommut → Transformation). When the common factor is a
sum, either obvious or commuted, the student tries to transform the expres-
sion, the transformation being correct or not.
Fig. 3. Implicative graph with one premise. Threshold at 86. Case of factoring
Two premises.
The implications are more precise. One of the implications with two premises
is interesting to describe. The student uses the rule ErNothing especially
in the cases where the common factor is an obvious sum of two terms,
Fsum2TermObvious, and of course, one of the cofactors is a unit Cofunit,
cf. Fig. 4. The transformation (x + 3)(x + 2) + (x + 3) ↦ (x + 3)(x + 2) con-
tributes to this implication. When looking at the data, we see that indeed,
the student did not use this rule when she was confronted with an expression
like 5x + 5, even if the last cofactor is also a unit.
Fig. 4. Part of the implicative graph with two premises, threshold at 90. Case of factoring (nodes: CofUnitFSum2TermObvious, ErNothing, CofUnit)
The study of the data of this student shows that she has stable behaviour
in the field of factoring. She has correct and incorrect stable actions, well
fixed in her behaviour. The SIA approach made it possible to detect stable
behaviours according to contexts. It answered one of our questions.
previous one: the CHIC table concerns only one student but it includes many
tasks done by her/him. The lines of the CHIC table are again elementary
steps, the columns are binary variables and actions. However, in contrast to
section 6, the actions are not the rules but sets of rules.
The Actions.
For this analysis, actions take only two values: either the elementary step is
correct (called Correct) or it is not (called Error). No distinction between the
kinds of errors is made.
The treated CHIC table can contain as many elementary steps as the student
has done. In the following example, 99 steps have been diagnosed. The table
is then a 99 × 15 matrix. The student was chosen from an experiment that
especially concerned movement tasks. The results of the implicative graph are
presented in Fig. 5. This analysis shows that this student (grade 9) seems to have regular difficulties with the task of collecting like terms, especially in the presence of a minus sign.
Fig. 5. Implicative graph with three premises, threshold 87 (nodes include Correct, Error and InOpMinus). Example of a result for detecting a task that is the source of the student's errors: in this case, the task of collecting like terms, especially in the presence of a minus sign, seems to cause regular difficulties for this student
This treatment answers our previous questions: the SIA makes it possi-
ble to detect tasks that do not present difficulties to the student and those that
do. The student in question has difficulties with the specific task of collecting
like terms. It is this task that can be interesting to model, as it was done in
section 6: selection of the elementary steps concerning this task among the
whole 99 steps, association of the context variables in this task and analysis
of the resulting implicative graph. The last part can sometimes be complex.
In the next section, we will try to overcome this problem.
We are concerned with a part of the task of collecting like terms: the sum of two terms of the same degree, when the degree obtained is correct. In other words, we consider only elementary steps like $ax^m + bx^m \mapsto cx^m$, where m can be zero and a and b can be positive or negative. We do not consider transformations like $ax^m + bx^m \mapsto cx^p$, where p is not equal to m. In addition, we consider only four possibilities for the coefficient c: c is obtained as a sum depending on a and b, as described below. We set aside the cases where c is equal to a or b, or ab, and so on. We look for behaviour groups for this sub-task.
⁵ If we take the notation used in the context definition (section 5.1), there are $p \prod_{i=1}^{n} m_i$ couples.
The Rules.
The actions correspond to the four rules concerning this sub-task:
• Cor is the correct calculation of the sum a + b (e.g., 2x − 5x ↦ −3x or 2x³ + 5x³ ↦ 7x³).
• PlusOp, meaning PlusOpposite, is the sum of a and the opposite of b. The rule can be written as a + b ↦ a − b (e.g., 2x − 5x ↦ 7x or 2x + 5x ↦ −3x).
• OpPlus, meaning OppositePlus, is the sum of the opposite of a and b. The rule can be written as a + b ↦ −a + b (e.g., 2x − 5x ↦ −7x or 2x + 5x ↦ 3x).
• OpPlusOp, meaning OppositePlusOpposite, is the sum of the opposite of a and the opposite of b. The rule can be written as a + b ↦ −a − b (e.g., 2x − 5x ↦ 3x or 2x + 5x ↦ −7x).
The Contexts.
We decided to restrict the context variables associated to this task. We chose
only two variables: the sign of a and b, and the order of |a| and |b|.⁶
The first variable can be decomposed into four binary variables: (sign of
a is plus, sign of b is plus), (sign of a is minus, sign of b is plus), (sign of a
is plus, sign of b is minus) or (sign of a is minus, sign of b is minus). The
second variable can be decomposed into three binary variables: |a| is smaller
than |b|, |a| is greater than |b| or |a| is equal to |b|.
For example, in the expression 2x − 5x, the sign of a is plus and of b
is minus. We denote this information by P M (“Plus, Minus”). In this same
expression, |a| is smaller than |b|. This information is denoted C1. The binary
variable |a| is greater than |b| is denoted C2, and the variable a is equal to b
is denoted Eg.
Therefore, there are 12 possible contexts for the task of collecting like terms
denoted by juxtaposing the two binary variables. For example, the context
C1P M means that the context is (sign of a is plus, sign of b is minus, |a| is
smaller than |b|).
The Attributes:
Ordered pair (Context, Rule). Since there are twelve contexts and four rules,
there are 48 attributes. Their name is obtained by the association of the rule
name and the context name. For example, the attribute CorC1PM means
the use of a correct rule, Cor, in the context C1P M , while the attribute
PlusOpC2MP means the use of the rule PlusOp in the context C2M P .
⁶ We are aware that we do not take into account some important variables, such as the degree of the monomial, the presence of a minus sign in the expression of the student's step, or the nature of the coefficients. One of the reasons for this restriction is that the automation of the process can be long and the work is still in progress: we want to be sure that we obtain results before coding other variables.
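For illustration, here is a short Python sketch (ours) that labels an elementary step $ax^m + bx^m \mapsto cx^m$ with its context and its rule, producing attribute names of the kind described above.

```python
def context_label(a, b):
    """Context of the step a*x^m + b*x^m: order of |a| and |b| (C1/C2/Eg) followed
    by the sign pattern of (a, b) (P or M for each)."""
    order = "C1" if abs(a) < abs(b) else ("C2" if abs(a) > abs(b) else "Eg")
    signs = ("P" if a > 0 else "M") + ("P" if b > 0 else "M")
    return order + signs

def rule_label(a, b, c):
    """First of the four rules whose result matches the observed coefficient c
    (None if no rule explains the step)."""
    for name, value in (("Cor", a + b), ("PlusOp", a - b),
                        ("OpPlus", -a + b), ("OpPlusOp", -a - b)):
        if value == c:
            return name
    return None

# 2x - 5x transformed into 7x: the attribute is PlusOpC1PM.
a, b, c = 2, -5, 7
print(rule_label(a, b, c) + context_label(a, b))
```

For 2x − 5x transformed into 7x, the sketch returns PlusOpC1PM, consistent with the naming convention described above.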
The Experiment.
As was mentioned in section 7, experiments were designed with the purpose of obtaining data about collecting like terms. However, this task appears in most of the exercises that were proposed in these experiments. In this example, the student population consists of a group of three grade 8 classes of 87
students in total (n = 87). The table is then an 87 × 48 matrix.
Each student worked on roughly 28 exercises. In total, 2584 elementary steps concern the chosen sub-task, which gives an average of 29.7 elementary steps per student.
Some Results.
In what follows, we describe five of the behaviour groups that are highlighted
by the treatment of a similarity tree, see Fig. 6.
The behaviour class (OpPlusOpC2MM, OpPlusOpC1MM) can be interpreted as follows: when confronted with a sum of two negative terms, a and
b, the students from this behaviour class sum |a| and |b|. For example, they
transform the expression −2x−5x into 7x or −5−2 into 7. This can mean that
they understand that they have to add |a| and |b|, but they make a mistake
for the sign of the result, making it always positive.
The second behaviour class (CorC1PP, CorC1MP, OpPlusOpC2MP, OpPlusOpC1PM, OpPlusOpC2MM, OpPlusOpC1MM) looks more precisely at the meaning of the previous behaviour class. Let us take the example
of the six attributes of this class given in Tab. 5.
These students sum correctly when they are confronted with a sum of two
positive terms, or with a sum of a negative term (−5), smaller (in absolute
value) than a positive term (2) (T1, T2). When they are confronted with a sum of a negative term greater (in absolute value) than a positive term, the negative term being on the left (T3) or on the right (T4), the students calculate |−5 + 2|. The cases T5 and T6 have already been explained in the
previous class.
We can interpret this group as follows: the students know that for a sum of
one positive and one negative terms, they have to subtract the smaller term
from the greater one; however, they do not know that the result must have
the same sign as the greater term. For a sum of two terms with same sign,
they know they have to add the two terms. However, they do not know that
the sign of the result is the same as the sign of the terms. That is particularly
visible when the two terms are negative. What they may be doing is that
they multiply the signs of the two terms, as if it were a multiplication. They
apply the rule: ∆|a| + ∆|b| 7→ |a| + |b|, where ∆ is the sign of the term a
and b.7 This interpretation was verified by replaying the work of the students
who contributed the most to this class, via the ‘replay system’ in Aplusix. In
particular, we observed that these students are never wrong about the sign
in the case of a multiplication of two terms: they mastered the sign rules for
multiplication. In order to verify this hypothesis it is possible to insert rules
about multiplication of two terms in the CHIC table and see whether such
groups emerge.
The third behaviour class (CorC2MP, CorEgMM, CorC1MM, CorC2MM )
is a class of correct actions.
The fourth and fifth ones (PlusOpC1PP, OpPlusC2PP, OpEgPP, OpPlusC1PP, PlusOpC1PP) seem strange: the students subtract one term from the other even if neither of them has a minus sign: the contexts are all positive terms (PP). This is probably due to the choice of the variables: we have selected only two context variables. If we added the variable "presence of a minus sign in the source of the student's step", we would perhaps have an explanation for this class. Behaviour classes such as those generalising
behaviours of the students A and B presented in section 4 may appear.
Fig. 6. Similarity tree over the 48 (Context, Rule) attributes; the leaves are labelled by the attribute names of the form RuleContext (CorC1PP, OpPlusOpC1MM, PlusOpC1PP, etc.)
9 Conclusion
The errors, and more globally the behaviours, in which we are most interested are those that are "persistent and reproducible". Catching stable errors and characterizing the types of these errors could allow teachers, didacticians and artificial tutors to take adequate and personalized didactical decisions instead of using systematic and repetitive feedback.
The data on which our stability research is based were collected in the learning environment for algebra, Aplusix. Each student's action is viewed as a rule
application. Our purpose was to determine rules that are used regularly by
the student and to link them to the algebraic context in which the student
used them. We aimed at both a detailed description of the student’s learning
state and a high cognitive level of its description. However, mapping a single
action to stable behaviours is not an obvious task.
Our research work proceeded in two phases. First, we established a list of the sub-tasks that provoke regular errors. For that, we needed to determine a good granularity of the sub-tasks. We used the Statistical Implicative Approach (SIA) to determine, in a student population, the variables that cause a stable erroneous behaviour. Second, a systematic analysis of each sub-task allowed us to outline the main behaviour groups.
Thanks to the Statistical Implicative Analysis theory, a certain stability
of behaviours has been observed by associating algebraic context variables to
actions. It has been possible to determine for a student the possible tasks and
sub-tasks that are interesting to model because they provoke stable behaviours
in the student. Moreover, the SIA has made it possible to point out the algebraic context variables that are the source of actions: implicative links between variables and the student's actions have been provided as a result of this analysis, at a very fine grain size. A statistical implicative analysis of the frequencies of ordered pairs (algebraic context, action) for a student population has allowed behaviour groups to emerge: groups of the ordered pairs most used by a part of the population, or those that show a very high level of stability of frequencies.
Using a statistical approach in didactical research is an original method: most didactical research cannot afford to deal with a large student body. In addition, interactive learning environments providing models of students reach either a very fine level of description where steps are not linked together, or a coarse grain size with general information not linked to a precise domain (such as information about whether the student learned best with directed or explanatory learning tasks [19]).
References
1. G. Brousseau. Les obstacles épistémologiques et les problèmes en mathématiques.
Recherches en Didactique des Mathématiques, volume 4–2, pages 165–198, 1983.
2. R. Sison, M. Shimura. Student Modeling and Machine Learning. International
Journal of Artificial Intelligence in Education, volume 9, pages 128–158, 1998.
3. D. H. Sleeman, A. E. Kelly, R. Martinak, R. D. Ward, J. L. Moore. Studies
of Diagnosis and Remediation with High School Algebra Students. Cognitive
Science, volume 13, pages 551–568, 1989.
4. J. S. Brown, K. Van Lehn. Repair theory: A generative theory of bugs in proce-
dural skills. Cognitive Science, volume 4, pages 379–426, 1980.
5. R. Gras. L'analyse implicative: ses bases, ses développements. Educação Matemática Pesquisa, volume 4:2, pages 11–48, 2004.
6. C. Kieran. The core of algebra: Reflections on its Main Activities. ICMI Algebra Conference, Melbourne, Australia, pages 21–34, 2001.
7. J. R. Anderson, A. T. Corbett, K. R. Koedinger, R. Pelletier. Cognitive Tutors:
lessons learned. The journal of the learning sciences, volume 4:2, pages 167–207,
1995.
8. D. McArthur, C. Stasz, M. Zmuidzinas. Tutoring Techniques in Algebra. Cogni-
tion and Instruction, volume 7:3, pages 197–244, 1990.
9. M. Beeson. Design Principles of Mathpert: Software to support education in
algebra and calculus. In N. Kajler, editor, Computer-Human Interaction in
Symbolic Computation, Springer-Verlag, Berlin, Heidelberg, New York, pages 89–
115, 1998.
10. R. Prank, M. Issakova, D. Lepp, V. Vaiksaar. Using Action Object Input Scheme
for Better Error Diagnosis and Assessment in Expression Manipulation Tasks.
Maths, Stats and OR Network, Maths CAA Series, 2006.
11. J. F. Nicaud, D. Bouhineau, H. Chaachoua. Mixing microworld and CAS fea-
tures in building computer systems that help students learn algebra. International
Journal of Computers for Mathematical Learning, volume 9:2, 2004.
1 The function and its graphic: its basis and its form
In current teaching there is a trend that favors the presentation of mathematical notions (notably that of functions) in a visual way. This trend relies on curricular developments and on changes in teaching methods based on the growth and development of information and communication technologies (ICT), such as graphic, symbolic and programmable calculators, or mathematical software such as CABRI, Derive, Maple or Mathematica. These curricular developments and these changes in teaching are in many cases influenced by the Principles and Standards for School Mathematics [9].
“[Technology Principle] Electronic technologies (calculators and com-
puters) are essential tools for teaching, learning, and doing math-
ematics. They furnish visual images of mathematical ideas [. . . ]
Instructional programs from pre-kindergarten through grade 12
should enable all students to understand patterns, relations, and
functions [Standard Algebra]. In grades 6–8 all students should:
Caption
Cpsf: competence in problems solving involving linear and quadratic functions
Ctx: competence in textually presented problems solving
Ctb: competence in problems solving, when the function is represented by
a numerical table
Cg: competence in problems solving, when the function is graphically represented
Ptx: student’s preference for the textual presentation of the function
Ptb: student’s preference for the tabular presentation of the function
Pg: student’s preference for the graphical presentation of the function
Caption
The mode of presentation and the notion concerned by each question are given respectively by the row and column margins
F: algebraic formula representation
T: numerical table representation
its: reading of the intersections
sig: sign of the functions for intervals of variable x
com: comparison of functions
rég: partition of the plane by functions.
ext: extrapolation
max: maxima and minima
The “max” column corresponds to questions on the identification of maxima on the
quadratic function (identification on the parabola “Ip”)
The “Cp” column corresponds to the calculation of increasing/decreasing intervals
of the parabola
The “Cd” column corresponds to the determination of the increasing/decreasing
character of the line
The numbers correspond to the groups of questions, from 1 to 5
The empirical data show a gap between the a priori model and the procedures really observed. The method used was factorial analysis (Factorial Analysis of Correspondence, FAC, and Main Components Analysis, MCA), since the aim was to "reveal symmetrical relationships and establish discriminative factors in a population by way of variables". In figures 2 and 3 the first two main factors are shown.
In the planes of the first two main factors (figures 2 and 3), the pupils' success on the questions appears grouped according to the mathematical problem and not according to the kind of presentation of the function (G, F or T). The kind of presentation of the questions does not influence the global success of students in the questionnaire. Thus, there is no sign of the existence of a graphic conception [1] different from a non-graphic conception of mathematical notions. On the contrary, the students' responses vary according to the mathematical notions and not according to the way they are presented (G, T or F).
¹ The graph concerned is always the Cartesian graph. The function is a characteristic notion of secondary education. In primary education, the influence of the graphic presentation of other mathematical notions can condition the ability of pupils in problem solving [6].
the successful solution of a problem did not imply the efficient use of the
accompanied diagram” [15, p. 495].
Caption
Ngi: numerical resolution when the problem is presented by an incomplete
graph
RG: success in graphically presented problems
RTx: success in textually presented problems
RTb: success in numerical table presented problems
REU: global success in the questionnaire
PTx: preference for textual presentation
PTb: preference for tabular presentation
PG: preference for graphical presentation
Caption
Dashed arrows: transitivity of the statistical implication
Continuous arrows: implication (99%, 95%, 90%)
Vertical scale: ratio of students giving evidence of the respective variables
We used a sample of 87 Spanish high school students (13 years old). The functions in a first questionnaire were presented either by means of a Cartesian graph (G), a numerical table (T) or an algebraic formula (F). The factorial analysis allowed us to explore the data and detected the statistical proximity (symmetrical relationship) of the behaviours, according to the notions treated. Indeed, the proximity does not follow the form of the tasks (graphic or non-graphic presentation), but the notions involved.
In a second questionnaire, on the one hand, the proximity of some variables is detected by the Similarity Analysis. In this way, the relationship between the variable "instruction in graphic representation" and the variable "detection of the proportion by way of a graphical technique" is a very strong one. In addition, the relationship between the "preference for the tabular presentation" and the "success in the resolution of tabular problems" is notable too.
On the other hand, the SIA detects statistically significant relationships between the preference for textual presentation and the success in textually presented problems, which the Similarity Analysis does not show. We also observe that the preference for the graphic presentation does not imply success in solving graphically presented problems.
These facts lead us to conclude that a graphic illusion exists among these students, since neither similarity relationships nor implicative relationships exist between the preference for the graphic presentation and the competence in the resolution of problems presented graphically.
The SIA contributes new elements, even though their interpretation is delicate. The empirical data do not justify the application of a specific teaching method. In fact, there is a radical rupture between explanatory didactics and normative (or technical) didactics, based on two facts:
1. The statistical implication is not transitive. The competence in the determination of proportionality by means of the alignment of the points of the graph with the origin (Da) implies the success in graphically presented problems (RG). The competence RG implies the success in textually presented problems (RTx). However, it is not possible to deduce that "Da" implies "RTx". Consequently, it is not possible to establish the maxim: "If the students are competent in 'Da', then most of them solve the problems presented in a textual way (RTx)" (a toy numerical illustration is sketched after this list).
2. The statistical implication admits an interpretation (mathematically isomorphic) according to set theory.² Given a set E, we define P (respectively Q) as the set of elements x of E (x ∈ E) that verify the proposition p (respectively q): $P = \{x \in E \mid p(x)\}$ and $Q = \{x \in E \mid q(x)\}$; the quasi-implication p ⇒ q is then assessed through the number of counter-examples, i.e. the cardinality of $P \cap \bar{Q}$.
² The Boolean algebra rules can be written in both set and logic notation.
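Point 1 can be illustrated with a toy computation; the sets below are invented by us, and we use simple confidence as a stand-in for the statistical implication, since the non-transitivity phenomenon is the same.

```python
# Toy population of 100 students (sets invented by us for the illustration).
E = set(range(100))
Da  = set(range(0, 10))                        # competent in 'Da'
RG  = set(range(1, 10)) | set(range(10, 51))   # success in graphically presented problems
RTx = set(range(5, 51)) | set(range(55, 70))   # success in textually presented problems

def confidence(P, Q):
    """Share of P that also satisfies Q (a simple stand-in for the implication index)."""
    return len(P & Q) / len(P)

print(confidence(Da, RG))    # 0.90 : Da -> RG holds for 90% of the 'Da' students
print(confidence(RG, RTx))   # 0.92 : RG -> RTx holds for 92% of the 'RG' students
print(confidence(Da, RTx))   # 0.50 : yet Da -> RTx holds for only half of them
```

Here Da → RG and RG → RTx both hold for 90% or more of the students concerned, while Da → RTx holds for only half of them.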
References
1. N. Balacheff. Conception, connaissance et concept. Séminaire DidaTech, Uni-
versité Joseph Fourier (Paris), 1995.
2. M. Bosch and Y. Chevallard. La sensibilité de l’activité mathématique aux
ostensifs. Objet d'étude et problématique. RDM, pages 77–124, 1999.
3. E. G. Bremigan. An analysis of diagram modification and construction in stu-
dents’ solutions to applied calculus problems. JRME, pages 248–277, 2005.
4. G. Brousseau. Theory of didactical situations in mathematics. Kluwer Academic
Publishers, 1997.
5. G. Chauvat. Courbes et fonctions au college. Px, pages 23–44, 1999.
6. I. Elia, A. Gagatsis, and R. Gras. Can we ‘trace’ the phenomenon of com-
partmentalization by using the implicative statistical method of analysis? an
application for the concept of function. In R. Gras, F. Spagnolo, and J. David,
First questionnaire
x −6 −5 −4 −3 −2 −1 0 1 2 3 4 5 6 7
F2 (x) −1 0 1 2 3 4 5 6 7 8 9 10 11 12
G2 (x) −21 −12 −5 0 3 4 3 0 −5 −12 −21 −32 −45 −60
H2 (x) −9 −7 −5 −3 −1 1 3 5 7 9 11 13 15 17
1. Find the values of x satisfying:
a) f (x) = h(x)
b) F1 (x) = H1 (x)
c) G2 (x) = H2 (x)
2. Are the following statements true or false?
a) If x < 0, then f (x) < 0, g(x) < 0 and h(x) > 0
b) If x > 1, then F1 (x) < 0, G1 (x) > 0 and H1 (x) < 0
c) If x > 1, then F2 (x) > 0, G2 (x) < 0 and H2 (x) > 0
3. For what intervals of x are the following inequalities true?
a) f (x) > h(x) > g(x)
Second questionnaire
Time (minutes): 3 5 9 12
Distance (Km.): 18 30 54 72
Based on the obtained results, does the train maintain its speed? Why?
3. In the Post Office, we are informed that to send packages to a same desti-
nation, the prices are according to the package’s weight and are given by
the following graph:
Is the shipment similarly expensive in all the cases or is there some vari-
ation according to the package’s weight? Why?
4. In a warehouse there are some closed packages that contain cups of the
same type. These packages have labels with the number of cups and the
total weight. This data is represented in the following graph:
Are the labels correct or has there been some error? Why?
5. In another stationary store 4 packages of notebooks are also sold. The
number of notebooks that each package contains and its corresponding
price is given in the following table:
Number of notebooks: 3 5 8 20
Price: 225 375 675 1350
Is there a reduction when buying more notebooks? Why?
Is the shipment similarly expensive in all the cases or is there some vari-
ation according to the package’s weight? Why?
7. In another warehouse there are also some closed packages that contain
cups of the same type. These packages have some labels with the number
of cups and the total weight. On package A is placed the following label:
“3 cups, 240 grs.”. On the package B: “6 cups, 480 grs.”. On the package
C: “12 cups, 1040 grs.”. On the package D: “16 cups, 1280 grs.”. Are the
labels correct or has there been some error? Why?
8. Tests of the Spanish high-speed train are also being made on a long, flat and straight portion of railway. The results obtained are the following: it takes 2 minutes to travel 12 kilometers, 4 minutes to travel 24 kilometers, 8 minutes to travel 42 kilometers and 10 minutes to travel 66 kilometers. Represent this data graphically. Does the train maintain its speed? Why?
1 Introduction
Data was collected in clusters. A questionnaire (an extract from the ques-
tionnaire can be found in the annexe) was put to students (in scientific,
technological and professional streams) in agricultural high schools in the
Midi-Pyrénées region, in order to determine their representations of PE, team
sports and volleyball.
The questionnaire was divided into three main sections: an initial question to
establish students’ gender, based on items from the BSRI (For a discussion
of the relevance of the BSRI test for determining gender attitudes, see [15]);
a second section with questions about PE in general (its usefulness, students’
preference for a particular discipline, views on mixed-sex sports education,
teacher’s sex, etc); and a third section consisting of questions about team
sports and volleyball in particular (based on word association tests and on a semantic differentiator test inspired by Osgood [12]).
The volleyball word association test was adapted from free-association
tests, which are verbal productions, used to study the social psychology of
representations [1]. Beginning with a starter word — “volleyball” was used
here — these tests consisted in asking subjects to say any words or expressions
which came to mind (a maximum of seven here). The spontaneous nature and
the projective dimension of this production reveal the semantic universe of
the subject under study more quickly and easily than in an interview [13].
The semantic differentiator test, inspired by Osgood, is a quantitative method for analysing connotations, consisting in associating a word with pairs of opposing representative adjectives [10]. Connotation is the intensive definition of a word. For example, the word "crow" evokes the colour black, a bad omen and a series of implicit or explicit meanings [11]. It is a general tool for finding meaning (words, figures, etc.) and the pairs of adjectives have to be adapted for each case: here, for volleyball. Our work was based on research by David [5]
who used a semantic differentiator adapted for rugby to reveal the differen-
tial aspects of representations of rugby among a mixed-sex PE group. The
pairs of bipolar scales are built using antonymous adjectives drawn from word
lists. The semantic differentiator can be considered as an attitude scale for
predicting behaviour. Moreover it helps build a fairly readable image of the
representations of subject groups (particularly functional representations), an
aspect which is particularly interesting to us as far as volleyball is concerned.
These two techniques were borrowed from methodologies used to explore
students’ representations. They were chosen because they were complemen-
tary. The word association test gave the representations a “fixed” aspect,
whereas the semantic differentiator was based on tactical and dynamic ideas
of volleyball.
According to Bailleul [2] “Statistical Implicative Analysis (SIA) is a partic-
ularly effective tool for studying representations and revealing their organisa-
tional structures”. We used the CHIC software program (Cohesive Hierarchical
Implications Classification) to process the answers in the questionnaire which
related to representations of team sports and volleyball.
The CHIC software program [8, 9] performs implicative analyses, the goal
being to identify how much, statistically speaking, a particular answer to a
particular item leads to another answer to another item, thus determining the
reliability of the “quasi-implications” between variables. The software then
produces an implication diagram of the variables, leading to the identification
of networks of answers, themselves made up of “implicative chains”. The im-
plicative chains are graphic illustrations showing the implications between two
or more variables. The networks are combinations of several chains all leading
to the same variable. The various chains have similar or identical meaning
and are used to interpret the observations [2].
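A rough sketch (ours) of the kind of computation involved: for each ordered pair of binary variables, an implication intensity is estimated and only the pairs above a threshold are kept as edges of the diagram. We use the classical index of implicative analysis in its normal approximation; CHIC's exact computation and options may differ, and the data below are invented.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def implication_intensity(x, y):
    """Quasi-implication intensity of x -> y for two 0/1 lists of equal length,
    using the classical index of implicative analysis in its normal approximation
    (our reading of the method; CHIC's exact options may differ)."""
    n = len(x)
    n_a = sum(x)
    n_not_b = n - sum(y)
    counter = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    expected = n_a * n_not_b / n
    if expected == 0:
        return 1.0 if counter == 0 else 0.0
    q = (counter - expected) / sqrt(expected)
    return 1 - normal_cdf(q)

# Invented 0/1 answers (rows = students) for two items of the questionnaire,
# only to show how an edge of the implication diagram is kept above a threshold.
answers = {
    "team spirit":        [1, 1, 1, 0, 1, 0, 1, 0, 1, 1],
    "I like team sports": [1, 1, 1, 1, 1, 0, 1, 1, 1, 1],
}
phi = implication_intensity(answers["team spirit"], answers["I like team sports"])
print(phi > 0.70, round(phi, 2))   # kept as an edge at the 0.70 threshold (about 0.80)
```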
3 Some results
3.1 Characteristics of the population under study
At the end of 2002, questionnaires were sent out to nine General Education
and Agricultural Technology high schools in the Midi-Pyrénées region. After
At the 0.70 threshold implication level, very long chains (up to eight variables)
may be found, as well as relatively distinct networks. We shall call these
networks A, B and C. We shall begin by looking at the networks obtained
at the 0.70 threshold. Then we shall discuss these three networks further,
indicating where the different forms of the sex and gender variables had the
greatest influence on the chains.
Network A
The first network consisted of different chains involving the “team spirit” and
“I like team sports” variables. This network appeared to be characterised by
“involved” variables (resulting from both the semantic differentiator and the
word association test), all referring to the feminine characteristics of volley-
ball expressed in the literature [6, 14]: terms such as “to return it”, “don’t get
hurt”, “to train” or “to ensure”.
A certain work dimension (“to train”), technical moves (“to return it”) and
certain limits (“don’t get hurt”) led to the development of knowing how to
“play”. This, combined with concerns that could almost be considered as
hygienically oriented (“gentle”, “to relax”, “a fun activity”), made me “feel
good”. “Feeling good”, together with the “making progress” condition (the
second work reference in this network) and its “control” elements, enabled me
to be a “team player” and to “feel good” in a team activity (“I like team
sports”). This representation of volleyball (or even team sports in general)
seemed well-balanced, as it had the work dimension on one side and the
Fig. 1. Network A
Network B
With themes like “positive feelings” and “a fun activity” implying that “I like
volleyball” and that it is “a team sport”, we can hypothesise that the represen-
tations which formed this network’s structure were positive attitudes towards
this activity. Volleyball also seemed to be associated with a high regard for
physical qualities (in particular this category included words such as “tall”
and “jump high”) as well as mobility. On the other hand, the game element
seemed to be represented by “to play in continuity” (as confirmed by game-
related words such as “upwards”) and tactics.
Network C
In the third network, what predominated was the representation of the sport
as a test of team strength (“match”, “to win”, “to be ready to fight”, “play as a team”).
Fig. 2. Network B (nodes: “a team sport”, “to play in continuity”, “I ♥ volleyball”, “body in motion”, “I ♥ team sports”)
Fig. 3. Network C (nodes: “to become better”, “to be ready to fight”, “attack”, “get clever”, “to risk”, “match”, “to control yourself”, “play as a team”, “I ♥ team sports”)
For each chain, we calculated the influence of the additional variables: two
forms for sex (female and male) and four for gender (androgynous, non-
differentiated, feminine and masculine). Each of these additional variables had
a different influence on the formation of chains in the various networks. We
consider that the influence of an additional variable may be included in our
comments as an answer to our hypotheses, provided that its error rate is less
than 0.10 (a standard criterion for deciding whether a variable significantly
explains a particular phenomenon).
When we looked at the chains finishing with “play”, “feel good”, “control
yourself” and “play as a team”, which were implied by the “I like team sports”
variable, we noticed that regardless of what the “previous” variable was (to
return it, to train, to relax, don’t get hurt, gentle), the female form of the
additional sex variable significantly influenced these chains (error rates: 0.0438,
0.0137, 0.042).
On the “a fun activity — feel good — control yourself — play as a team”
chain, it was again the female form of the sex variable which had a significant
influence, with an error rate of 0.0482.
The fact that the female form of the sex variable characterised all these
chains led us to hypothesise that females consider volleyball to be a PSA,
whose important aspects are team spirit, making progress and having fun.
• the “to win — to be ready to fight — I like team sports” chain, with an
error rate of 0.0184;
• the “rough — to be ready to fight — I like team sports” chain, with an
error rate of 0.0449;
• the “to win — attack — I like team sports” chain, with an error rate of
0.041;
• the “to risk — I like team sports” chain, with an error rate of 0.00572.
However, the androgynous form of the gender variable had a significant
influence on the following chains:
• “champion — to be ready to fight — I like team sports”, with an error rate
of 0.0189;
• “champion — match — I like team sports”, with an error rate of 0.0246.
This network seemed more comparable with representations of an opposing
relationship in volleyball. The masculine form of the additional gender variable
had a significant influence on several chains, while the androgynous form was
predominant in one chain. It would rather be males, therefore, who would see
volleyball as a sport of continual opposition. The representation of adversity
(towards opponents as well as oneself) was revealed by this network in two
forms: an aggressive manner (“attack”, “rough”) and a cunning manner (“get
clever”, “to control yourself”, “to risk”). We therefore suggest that this network
was more “male” than the previous two.
4 Conclusion
In view of the results, we can say that the sex variable appeared more often
than the gender variable, as an additional variable with a significant influence
on these chains. Only the “feminine”, “androgynous” and “masculine” forms of
the gender variable each had a significant influence on a single chain (the non-
differentiated form of this variable did not predominate in any of the chains),
whereas both forms of the sex variable (“female” and “male”) played a greater
role.
Furthermore, Network B highlighted volleyball’s team dimension and the need
to implement tactics and to have particular mental and physical qualities.
This network, which could be summarized as: “playing volleyball as a team
requires tactics and mental qualities”, seemed closer to the “female” form; al-
though the “feminine gender” was implied when the answers stated that men-
tal qualities were required, while the “masculine gender” was implied when
physical qualities were concerned. The “female” form’s influence on this chain
should thus be moderated.
Network C was more synonymous with representations of volleyball as an op-
posing relationship. There was the significance of the matches, as well as the
desire to win, attack and play as a team. The additional variable having the
greatest influence on this network was the “male” form, with the androgynous
gender also making a contribution to the “ready to fight” chain.
This level of analysis led us to believe that the students could have sex-
differentiated representations of volleyball, although these sometimes need to
be adjusted according to gender.
In our other studies, not covered here, we drew parallels between classes that
were revealed using ascendant hierarchical classification (AHC) and networks
revealed using CHIC. We can therefore form student typologies according to
sex, that sometimes need to be adjusted to take gender into account.
While sex is a strong predictor of attitudes and dispositions to team sports
and volleyball, gender is not. We found that females and males prefer mascu-
line forms of team games, and that both see volleyball as relatively gender-
neutral. Both females and males appear to like volleyball, but for different
reasons. While the females enjoy the cooperative effort of keeping the game
going, and thus continuity, the males show a strong preference for scoring
points, and thus discontinuity.
References
1. J.-C. Abric. Pratiques sociales et représentations. PUF, Paris, 1994.
2. M. Bailleul. Mise en évidence de réseaux orientés de représentations dans deux
études concernant des enseignants stagiaires en iufm. In Actes des journées sur
la fouille dans les données par la méthode d’analyse statistique implicative, 2000.
3. S.L. Bem. The measurement of psychological androgyny. Journal of Consulting
and Clinical Psychology, vol. 42, n. 2:pp. 155–162, 1974.
4. D. Bouthier and B. David. Représentation et action: de la représentation initiale
à la représentation fonctionnelle des aps en eps. Méthodologie et didactique de
l’éducation physique et sportive, Ed. G. Bui-Xuan:pp. 233–249, 1989.
5. B. David. Rugby mixte en milieu scolaire. Revue Française de Pédagogie,
n. 110:pp. 51–61, 1995.
6. Davisse. Sport, école et société: la part des femmes. Paris, Ed. Actio,
pp. 174–263, 1991.
7. Fontayne, Sarrazin, and Famose. The Bem Sex-Role Inventory: validation of a
short version for French teenagers. European Review of Applied Psychology, 50,
n. 4:pp. 405–416, 2000.
8. R. Gras, S. Almouloud, M. Bailleul, and A. Larher. L’implication statistique.
Nouvelle méthode exploratoire de données. La Pensée Sauvage, 1996.
9. R. Gras and P. Kuntz. The implicative statistical analysis — its theoretical
foundations. Kluwer, 2007.
10. Jodelet. L’association verbale. In P. Fraisse et J. Piaget, Traité de psychologie
expérimentale, fasc. VIII, pp. 97–153, 1972.
11. R. Menahem. Le différenciateur sémantique, le modèle de mesure. L’année
psychologique, 68:pp. 451–465, 1968.
12. C.E. Osgood, G.J. Suci, and P.H. Tannenbaum. The measurement of meaning.
Chicago, University of Illinois Press, 1957.
13. M-L. Rouquette and P. Rateau. Introduction à l’étude des représentations so-
ciales. Grenoble, PUG, 1998.
14. Tanguy. Le volley: un exemple de mise en oeuvre didactique. Echanges et con-
troverses. n. 4:p. 7–20, 1992.
15. I. Verscheure, C. Amade-Escot, and C.-M. Chiocca. Représentations du volley-
ball scolaire et genre des élèves: pertinence de l’inventaire des rôles de sexe de
Bem? In RFP 154. 2006.
16. I. Verscheure and C. Amade-Escot. Gender difference in the learning of volley-
ball attack. In Proceedings of the AIESP Congress: Professional preparation and
social needs, 2002.
Appendix
The word association test was introduced in the following way: “What does
the word “volleyball” make you think about? Can you give me some other
words (between 5 and 7) that it brings to mind?”
The students gave us a total of 2386 words (an average of 4.7 words per
pupil); including 521 different words, with some being quoted very often. For
example: the word “net” was quoted 201 times, the word “ball” 198 times; the
word “smash” 175 times, the word “pass” 102 times and the word “team” 97
times. . .
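The tallying described here can be sketched with a few lines of code; the responses below are invented and serve only to show the kind of counts reported above (total words, distinct words, most frequent words).

```python
from collections import Counter

# each student's free-association answer is a short list of quoted words (toy data)
responses = [
    ["net", "ball", "smash", "team"],
    ["ball", "pass", "net", "beach"],
    ["smash", "net", "team", "ball", "serve"],
]

all_words = [word for answer in responses for word in answer]
counts = Counter(all_words)

print(len(all_words))            # total number of quoted words
print(len(counts))               # number of different words
print(counts.most_common(3))     # the most frequently quoted words
```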
To help process this information, we grouped the words together into sev-
eral categories. We began by grouping together words with the same root
(ball, volleyball), then words which seemed similar in terms of the research
questions, or in terms of their meaning. For example, we categorised the word
“team” together with other word groups including it, such as “good team at-
mosphere”, “team mates”, “team”, “solid team”, “be in a team”, “team game”,
“play as a team”, etc. We combined the “team” words with others relating
to the “collective” theme (e.g.: “learn to play together”, “good group spirit”,
“communal”, “colleagues”, “closeness”, “trust your partners”, “understanding
between players”, “mutual support”, “group”). This category was entitled “the
communal aspect of volleyball”. We continued in the same way for all 521
words, which were eventually grouped together into twenty categories.
To test the validity of grouping the words in this way, we called upon the
“judging method”. Two volleyball experts examined the words in our selected
categories, to let us know whether or not they agreed with the categories the
words had been put into. There was agreement on more than 80 per cent of
the words, so we retained the classification and, after discussion, revised the
word categorisation until a consensus was reached.
The semantic differentiator test presented the following pairs of bipolar adjectives, each rated on a seven-point scale:
Return 3 2 1 0 1 2 3 Attack
Make progress 3 2 1 0 1 2 3 Become athletic
Score a point 3 2 1 0 1 2 3 Play as a team
Prolong the exchange 3 2 1 0 1 2 3 Interrupt the exchange
Become strong 3 2 1 0 1 2 3 Get clever
Be the best 3 2 1 0 1 2 3 Learn to control yourself
Watch the ball 3 2 1 0 1 2 3 Watch the opponent
Rough 3 2 1 0 1 2 3 Gentle
to be ready to fight 3 2 1 0 1 2 3 Don’t get hurt
Play 3 2 1 0 1 2 3 to win
Be a champion 3 2 1 0 1 2 3 Feel good
Static 3 2 1 0 1 2 3 Mobile
Make progress 3 2 1 0 1 2 3 Relax
to train 3 2 1 0 1 2 3 Match
Precision 3 2 1 0 1 2 3 Force
Be good enough 3 2 1 0 1 2 3 to risk
Rupture 3 2 1 0 1 2 3 to play in continuity
Summary. This study aims to gain insight about the distinct features and advan-
tages of three statistical methods, namely the hierarchical clustering of variables,
the implicative method and the Confirmatory Factor Analysis, by comparing the
outcomes of their application in exploring the understanding of function. The inves-
tigation concentrates on the structure of students’ abilities to carry out conversions
of functions from one mode of representation to others. Data were obtained from
587 students in grades 9 and 11. Using Confirmatory Factor Analysis, a model
that provides information about the significant role of the initial representations
of conversions in students’ processes is developed and validated. Using the
hierarchical clustering and implicative analysis, evidence is provided of students’
compartmentalized thinking among representations. These findings remain stable across grades.
The outcomes of the three methods were found to coincide and to complement each
other.
1 Introduction
Nowadays the centrality of representations in teaching, learning and doing
mathematics seems to become widely acknowledged. A basic reason for this
emphasis is that mathematical concepts are accessible only through their semi-
otic representations [1]. Kaput suggests that representations are “integrated”
with mathematics. In certain cases, representations, such as graphs, are so
closely connected with a mathematical concept, such as function, that it is
difficult for the concept to be understood and acquired without the use of the
corresponding representation [2].
A given representation, however, cannot describe thoroughly a mathemat-
ical concept, since it highlights only a part of its aspects [3]. This justifies
claims that the use of more than one representation or notation system helps
students to obtain a better picture of a mathematical concept.
The ability to identify and represent the same concept through different
representations is considered as a prerequisite for the understanding of the
particular concept [1, 16]. Besides recognizing the same concept in multiple
systems of representation, the ability to manipulate the concept with flexibility
within these representations as well as the ability to “translate” the concept
from one system of representation to another are necessary for the mastering
of the concept [5] and allow students to see rich relationships [16].
Duval [1, 17] maintains that mathematical activity can be analyzed based
on two types of transformations of semiotic representations, i.e. treatments
and conversions. Treatments are transformations of representations, which
take place within the same register that they have been formed in. Conversions
are transformations of representations that involve the change of the register
in which the totality or a part of the meaning of the initial representation is
conserved, without changing the objects being denoted.
Some researchers interpret students’ errors as either a product of a defi-
cient handling of representations or a lack of coordination between represen-
tations [13, 18]. The standard representational forms of some mathematical
concepts, such as the concept of function, are not enough for students to con-
struct the whole meaning and grasp the whole range of their applications.
Mathematics instructors, at the secondary level, traditionally have focused
their teaching on the use of the algebraic representation of functions [19].
Sfard [20] showed that students were unable to bridge the algebraic and graph-
ical representations of functions, while Markovits, Eylon and Bruckheimer [21]
observed that the translation from graphical to algebraic form was more dif-
ficult than the reverse. Sierpinska [4] maintains that students have difficulties
in making the connection between different representations of functions, in
interpreting graphs and manipulating symbols related to functions. Gagatsis
and Christou [11] developed a model involving some critical paths relating the
conversions from one type of representation to another. These paths indicated
that the conversion from one representation to another is not a straightforward
task. For example, students’ ability to translate a function from its graphical
to the algebraic form was the result of students’ understanding of three other
conversions: (a) the conversion of a function from graphic to verbal form,
(b) the conversion from verbal to graphic function, and (c) the conversion
from verbal to algebraic form of a function. A possible reason for this kind
of behaviour is that most instructional practices limit the representation of
functions to the translation of the algebraic form to the graphic form and not
the reverse. Furthermore, Aspinwall, Shaw and Presmeg [22] suggested that
in some cases the visual representations create cognitive difficulties that limit
students’ ability to translate between graphical and algebraic representations.
Lack of competence in coordinating multiple representations of the same
concept can be seen as an indication of the existence of compartmentaliza-
tion, which may result in inconsistencies and delays in mathematics learning
• For which aspects of the study is the application of each statistical method
more appropriate and open to complementary use?
4 Method
from its verbal representation to the graphical and to the symbolic represen-
tation respectively. Students had to carry out 12 conversions in each test, that
is, 24 conversions in total. For each type of conversion the following types of
algebraic relations were examined: y < 0, xy > 0, y > x, y = −x, y = 3/2,
y = x − 2, based on relevant research by Raymond Duval [23]. The former
three tasks correspond to inequalities and thus regions of points, while the
latter three tasks correspond to functions.
Each test included an example of an algebraic relation in a graphic, verbal
and symbolic form to help students understand what they were asked to do.
The example is illustrated in Table 1.
This section is distinguished into two parts. The first part involves a brief
overview of the rationale, the components and basic concepts of structural
equation modeling and CFA, while the second part concentrates on the under-
lying principles, elements and structure of the implicative statistical method
and the hierarchical classification of variables.
from one factor to another imply that a factor causes or predicts another fac-
tor, e.g., in Figure 1 the arrows starting from “AbGV” and pointing toward
“Gvs” and “Vgs” imply that “AbGV” predicts “Gvs” and “Vgs”. A second type
of processes used in a model is the impact of the errors in the measurement of
the observed variables and in the prediction of the latent factors. The impact
of random measurement errors on the observed variables and errors in the pre-
diction of factors are represented as one-way arrows pointing from Es (e.g.,
E1-E12) and Ds (e.g., D1, D2) respectively, to the corresponding variables,
as shown in Figure 1. A third type of processes involved in a model are the
covariances or correlations between pairs of variables, which are represented
as curved or (sometimes) straight two-way arrows (e.g., E1-E2 or E7-E8) as
illustrated in Figure 1.
Bentler’s [27] EQS program was used for testing the CFA models in this
study. The estimation method used in EQS was maximum likelihood. The
tenability of a model can be determined using the following measures of
goodness-of-fit: χ², CFI (Comparative Fit Index) and RMSEA (Root Mean
Square Error of Approximation) [28]. The following values of the three indices
are needed to support model fit: the observed values of χ²/df should be less
than 3, the values of CFI should be higher than .90 and the RMSEA values
should be lower than .06.
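As a minimal illustration of these cut-offs, the small check below applies the thresholds quoted in the text to hypothetical fit statistics; the function and the numbers are ours, not output of EQS.

```python
def acceptable_fit(chi2, df, cfi, rmsea):
    """Apply the goodness-of-fit criteria quoted in the text:
    chi^2/df < 3, CFI > .90 and RMSEA < .06."""
    return (chi2 / df) < 3 and cfi > 0.90 and rmsea < 0.06

# hypothetical fit statistics for a CFA model
print(acceptable_fit(chi2=123.4, df=50, cfi=0.95, rmsea=0.045))  # True
print(acceptable_fit(chi2=310.0, df=50, cfi=0.88, rmsea=0.081))  # False
```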
For the analysis of the collected data, the hierarchical clustering of variables
and Gras’s implicative statistical method were also applied, using a software
program called C.H.I.C. (Classification Hierarchique, Implicative et
Cohesitive), Version 3.5 [29]. These methods of analysis determine the hier-
archical similarity connections and the implicative relations of the variables
respectively [30, 31]. For this study’s needs, similarity and implicative dia-
grams have been produced from the application of the analyses on the whole
sample and on each age group of students. The implications of the analyses
were based on the classical theory.
The hierarchical clustering of variables [32] is a classification method which
aims to identify, in a set V of variables, partitions of V that are less and less
fine, established in an ascending manner. These partitions are represented in a hier-
archically constructed diagram using a similarity statistical criterion among
the variables. The similarity stems from the intersection of the set V of vari-
ables with a set E of subjects (or objects). This kind of analysis allows the
researcher to study and interpret clusters of variables in terms of typology
and decreasing resemblance. The clusters are established in particular levels
of the diagram and can be compared with others. This aggregation may reflect
the conceptual character of each group of variables.
The construction of the hierarchical similarity diagram is based on the
following process: Two of the variables that are most similar to each other
with respect to the similarity indices of the method are joined together in a
group at the highest (first) similarity level. Next, this group may be linked
with one variable in a lower similarity level or two other variables that are
combined together and establish another group at a lower level, etc. This
grouping process goes on until the similarity or the cohesion between the
variables or the groups of variables gets very weak. In this study the similarity
diagrams allow for the arrangement of the variables, which correspond to
students’ responses in the tasks of the tests, into groups according to their
homogeneity.
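As a rough analogue of this construction (CHIC itself relies on Lerman's likelihood-of-the-link similarity, which is not reproduced here), the sketch below groups the column variables of a binary success matrix with an off-the-shelf agglomerative procedure; the data and variable names are invented.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# toy binary matrix: rows = students, columns = conversion tasks (1 = success)
data = np.array([
    [1, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 1],
    [0, 1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1, 1],
])
task_names = ["v112a", "v212a", "v312a", "v412a", "v512a", "v612a"]

# cluster the VARIABLES (columns), hence the transpose; correlation distance is
# used here only as a stand-in for the similarity index used by CHIC
links = linkage(data.T, method="average", metric="correlation")
print(task_names)
print(links)   # each row: the two groups joined and the level at which they merge
```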
The implicative statistical analysis aims at giving a statistical meaning to
expressions like: “if we observe the variable a in a subject, then in general
we observe the variable b in the same subject” [30, 33]. Thus the underlying
principle of the implicative analysis is based on the quasi-implication: “if a is
true then b is more or less true”. An implicative diagram represents graphically
the network of the quasi-implicative relations among the variables of the set V.
In this study the implicative diagrams contain implicative relations, which
indicate whether success to a specific task implies success to another task
related to the former one.
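A minimal sketch of how such a diagram can be assembled once pairwise intensities are available: keep only the arrows whose intensity exceeds a chosen threshold. The intensities and variable names below are invented; in CHIC they are computed from the students' binary responses.

```python
# pairwise quasi-implication intensities phi(a => b), invented for illustration
intensities = {
    ("v612a", "v412a"): 0.93,
    ("v412a", "v312a"): 0.88,
    ("v312a", "v512a"): 0.84,
    ("v512a", "v212a"): 0.79,
    ("v212a", "v112a"): 0.91,
    ("v212a", "v212b"): 0.72,
    ("v112a", "v312b"): 0.55,   # below the threshold, so no arrow is drawn
}

THRESHOLD = 0.70
diagram = [(a, b) for (a, b), phi in intensities.items() if phi >= THRESHOLD]
for a, b in diagram:
    print(f"{a} -> {b}")        # the retained quasi-implications form the implicative diagram
```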
It should be noted that the present paper is related to the ones of Elia et
al. [10] and Gagatsis and Christou [11], whose basic findings are included in
the theoretical section (2).
5 Results
Before carrying out CFA, we examined the hypothesis that the data of our
sample come from a normal population. The values of skewness and kurtosis,
each divided by the corresponding standard errors, for the whole sample (0.6
and −3.2) and for each age group (Grade 9: 1.9 and −1.9; Grade 11: 0.0 and
−2.1) indicated that the data were normally distributed.
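A sketch of this kind of check is given below; the standard-error formulas are the usual large-sample approximations and the scores are simulated, so the numbers do not reproduce those reported above.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def standardized_shape(scores):
    """Skewness and excess kurtosis divided by their approximate standard errors
    (sqrt(6/n) and sqrt(24/n)), as a rough version of the normality check."""
    n = len(scores)
    return skew(scores) / np.sqrt(6.0 / n), kurtosis(scores) / np.sqrt(24.0 / n)

# hypothetical total scores of 587 students (number of correct conversions out of 24)
rng = np.random.default_rng(0)
scores = rng.binomial(24, 0.45, size=587)
print(standardized_shape(scores))
```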
Next, a series of models were tested and compared. Specifically, the first
model was a third-order CFA model which was designed on the basis of the
results of the study by Elia et al. [10]. It involved one third-order factor which
was hypothesized as accounting for all variance and covariance related to the
second-order factors. The second-order factors represented students’ abilities
to carry out conversions of algebraic relations with the graphic (Test A) and
the verbal mode (Test B) respectively as the source representation. Each of
the second-order factors was assumed to explain the variance and covariance
related to three first-order factors measured by the observed variables cor-
responding to the 12 conversions of Test A and the 12 conversions of Test
B respectively. The former three first-order factors were distinguished with
respect to the conceptual characteristics of the tasks, that is, whether they
involved a function or not or the kind of the function involved. The latter
Fig. 1. The elaborated model for the conversions among different modes of repre-
sentation of algebraic relations, with factor loadings for students of the whole sample
and of grades 9 and 11, separately.
The first, second and third coefficients of each parameter stand for the
application of the model to the performance of the students of the whole
sample, of grade 9 and of grade 11, respectively.
It is noteworthy that the number of the observed variables is half the number
of the observed variables of the first model. In the tests the conversion
items of each algebraic relationship formed one task, having the same starting
representation, and students normally treated the two conversion items of the
same relationship “simultaneously”. Therefore, we considered that it would be
more meaningful to integrate the variables that corresponded to the conver-
sions of the same mathematical relation and the same initial representation,
the conversion of the constant function (V512a: y = 3/2) with the graphic
form as the source representation was also lower compared to the respective
coefficients of the variables standing for the other functions in Test A. This is
an interesting finding to be discussed later in combination with the results of
the other statistical methods.
Fig. 3. The implicative diagram among the responses of students of grades 9 and
11 to the conversion tasks of Test A and Test B (C.H.I.C. 3.5)
y > x (v312a). Consecutively, students who carried out the conversions of the
graphic representation of y > x were successful at the corresponding conver-
sions of the constant function y = 3/2 (v512a). Success at the latter conversion
task implied success at the conversions of the relation xy > 0 (v212a) corre-
sponding to a region of points, which in turn entailed success at the conversion
of the graphic form of the relation, y < 0 (v112a), representing a region of
points as well. On one hand, these results indicate that in general the hier-
archical ordering of the tasks based on students’ performance to Test A is
congruent with Test B, providing further evidence to the two latter predic-
tions which suggest the establishment of close implicative relationships among
the variables of the conversions of functions and of the implications of success
at these conversions to success at the conversions of regions of points. On the
other hand, unlike Branch B, the variable of the conversion of y > x intervenes
among the variables of the conversions of functions in Branch A, which is not
completely in line with these predictions. The conversion of the graphic repre-
sentation of the relation y > x, which, despite being an inequality, actually
involves a function, was more complex than the corresponding conversion of
the constant function and success at the task of the former relation implied
success at the task of the latter relation.
It is worth noting that besides the implicative relation between the vari-
ables referring to the two conversion tasks that involved the same algebraic
relation y = x − 2 (v612a-v612b), there is another implicative connection link-
ing variables of the two tests. In particular, success at the conversions of the
relation xy > 0 having the graphic mode as the initial representation (Test
A: v212a) implied success at the conversion of the same relation with the
verbal form as the initial representation (Test B: v212b). On one hand, the
fact that only two implicative relations are formed between the variables of
the two tests suggests that students’ success at most of the tasks of Test A
was independent of their success at the tasks of Test B. Thus, support is
provided for the almost compartmentalized ways in which the students ap-
proached the conversions of a different source representation despite involving
the same mathematical content. On the other hand, the fact that in both
relations success at a task of Test A entailed success at a task of Test B pro-
vides evidence of the more difficult character of the conversion starting with
a graph relative to a conversion starting with a verbal description.
The next figures illustrate the results of the hierarchical clustering of vari-
ables and the implicative method for the students of Grade 9 and 11 sepa-
rately. These results are generally congruent with the outcomes referring to
the whole sample elaborated above, and therefore they are in line with the
predictions concerned with the similarity and implicative relationships of the
variables, with only a number of minor deviations.
Figure 4 illustrates the hierarchical similarity diagram of the “condensed”
variables corresponding to grade 9 students’ responses to the tasks of the
two tests. In line with the general similarity diagram for the whole sample,
two distinct clusters of variables are formed, namely Cluster 1 and 2, which
Fig. 4. The hierarchical similarity diagram among the responses of students of grade
9 to the conversion tasks of Test A and Test B (C.H.I.C. 3.5)
to the conversions of the three functions of the test (v412b, v512b, v612b)
and the algebraic relation which involves a function (v312b: y > x). Thus, the
conceptual components of the algebraic relations, distinguished by whether
they involve a function or not, seem to differentiate students’ processes in the
conversions of a verbal representation (Test B).
Figure 5 illustrates the implicative diagram of the “condensed” variables
corresponding to grade 9 students’ responses to the tasks of the two tests.
The results of the implicative analysis are in line with the similarity relations
explained above. Two separate “chains” of implicative relations among the
variables are formed with respect to the test they refer to, namely Chain A
and Chain B. This suggests that success at the conversions of the algebraic
relations in the two tests depended primarily on their initial representation.
The commonality of their content did not play a role. For instance, students
who succeeded at the conversion of a function with the graphic form as the
source representation did not necessarily succeed at the conversion of the same
function with the verbal form as the initial representation. Students carried
out the conversions by activating compartmentalized processes based on the
source representation of the conversion.
The two implicative chains have a similar structure, which stems from
the conceptual components of the tasks. In particular, the conversions of the
functions y = x − 2 (v612a or v612b) and y = −x (v412a or v412b) were the
most difficult tasks, and success at them implied success at the conversions
of almost all the other algebraic relations in each test. The conversions of
the constant function (v512a or v512b) and the algebraic relation involving a
function (v312a or v312b) were less complex. Students exhibited the greatest
facility at the conversion tasks of the algebraic relations standing for regions
of points (y < 0 or xy > 0). Figure 6 illustrates the hierarchical similarity
diagram of the variables corresponding to grade 11 students’ responses to the
tasks of the two tests. The structure of the similarity relations in this diagram
is analogous to the structure in the diagram concerning grade 9 students, as
two similarity clusters are established with respect to the source represen-
tation of the conversions. A main difference though is that the first cluster
(Cluster 1), which refers to the conversions of graphic representations, involves
one similarity group (Group 1a), in which the variable of the fifth task (v512a)
is linked to the variables of the third (v312a), fourth (v412a) and sixth (v612a)
task. Thus, students carried out the conversion of the constant function us-
ing similar processes with the ones when performing the conversions of the
other algebraic relations involving functions. This similarity is weaker though
than the similarities among the conversions of the other relations (3, 4 and 6).
Eleventh graders’ increased consistency in comparison with the ninth graders’
consistency indicates the older students’ realization that the graph of the re-
lation y = 3/2 represents a function despite its dissimilar perceptual form
relative to the graphs of the other relations involving functions. However,
students performed the conversions of the other relations, i.e. y < 0 (v112a
Fig. 5. The implicative diagram among the responses of students of grade 9 to the
conversion tasks of Test A and Test B (C.H.I.C. 3.5)
Fig. 6. The hierarchical similarity diagram among the responses of students of grade
11 to the conversion tasks of Test A and Test B (C.H.I.C. 3.5)
Fig. 7. The implicative diagram among the responses of students of grade 11 to the
conversion tasks of Test A and Test B (C.H.I.C. 3.5)
6 Discussion
the verbal representations of the algebraic relations y < 0 and xy > 0 were
approached similarly to each other, this was not the case in the conversions
of the graphic representations of the corresponding relations. This outcome
provides evidence of the students’ disconnected and distinct ways of using the
verbal form and the graphic form as source representations in conversions,
giving further support to the phenomenon of compartmentalization. The im-
plicative diagram indicated that the conversion of the graphic representation
of the function y = x − 2 (v612a) was the most complex task of both tests
and students who provided a correct solution at it, succeeded at all of the
other conversion tasks of both tests. Moreover, the implicative relations link-
ing responses to tasks of the two tests showed that success at some tasks of
Test A entailed success at the corresponding tasks of Test B. The above out-
comes of the implicative method provide evidence of the distinct and more
difficult character of the conversions starting with a graph relative to con-
versions starting with a verbal description. This differentiation and increased
difficulty may be due to the fact that the perceptual analysis and synthesis
of mathematical information presented implicitly in a diagram often make
greater demands on a student than any other aspect of a problem [22]. The
graphic register functions effectively only under the conventions of a different
mathematical culture. Due to students’ poor knowledge of this new culture,
graphic modes of representation are difficult to decode and work out. It can
be asserted that the support offered by mathematical meta-language is more
fundamental than the aid given by the graphic register for carrying out a
translation from one mode of representation to another [10].
Furthermore, the outcomes of the three statistical processes uncovered how
students dealt with different types of algebraic relations in conversions of the
same source representation (Tables 2 and 3, items 4 and 5). The conversions
of the algebraic relations that corresponded to inequalities and thus regions
of points were found to have considerable autonomy from the conversions of
relations involving functions. The separate grouping of the former variables
from the latter ones in the similarity diagrams revealed that students tackled
the conversions of inequalities differently from the conversions of functions.
The lower factor loadings of the variables referring to the conversions of in-
equalities relative to the corresponding loadings of the variables standing for
the conversions of functions in the CFA model were in line with this outcome.
The implicative diagrams revealed additional information to the above find-
ings, suggesting that the tasks involving functions were more complex than
the tasks involving regions of points in any type of conversion. Moreover, the
implications among the variables showed that students who carried out con-
versions of functions from one representation to another were able to succeed
at conversions of the same type involving inequalities. Considering the rela-
tions within the responses to the function tasks, a relatively weak similarity
was observed between students’ response to the conversion of the constant
function and their responses to the other functions, revealing their distinct
ways of approaching the conversion of a graphic representation of this kind
of function. Given that this distinction did not apply in the conversions of
the verbal representation (Test B), it is indicated that the interpretation of
the graphic form of the particular type of function was the main factor dif-
ferentiating students’ performance. The relatively lower factor loading of the
variable referring to the conversion of a constant function from a graphic rep-
resentation to another representation in comparison to the factor loadings of
the corresponding variables of the other functions gave further support to
students’ different conversion processes when dealing with this special kind of
function. The implicative relations revealed that the task involving the par-
ticular function was the easiest one among the conversion tasks of the other
functions. The above findings highlight the effect of the inherent mathemat-
ical foundation of the algebraic relations, i.e. whether they are functions or
not and what kind of functions they are, on students’ processes, consistency
and success in conversions of the same type.
Addressing the effect of age on the abilities to transfer algebraic rela-
tions from one representation to another, the outcomes of all of the analyses
(Table 3, item 6) indicated that students’ compartmentalized ways of using
representations and thinking of algebraic relations in general and functions in
particular occur in both grades. Specifically, the factorial structure described
above remains invariant, while the abilities to carry out conversions of graphic
or verbal representations of algebraic relations remain compartmentalized in
the similarity and the implicative diagrams for the two grades involved in the
study. These findings, which provide support for the results of the studies by
Elia et al. [10] and Gagatsis and Christou [11], indicate that relations among
translations from one mode of representation to another do not vary as a func-
tion of grade levels. Thus the relative inherent nature of difficulties of each
type of conversion has age endurance, suggesting that development or regular
instruction does not change students’ processes while dealing with conversions
of functions from one representation to another.
Despite this invariance, some discrepancies in the performance of students
of the two grades occurred as regards the inherent mathematical properties
of the algebraic relations (Table 4, item 2). Based on the similarity diagrams,
eleventh graders were found to respond to the conversion of the graph of the
constant function more coherently with the conversions of the other functions
relative to the ninth graders. This suggests that the older students were
more competent in recognizing the conceptual components of some algebraic
relations, i.e. whether they represented a function or not, and dealt with
their conversions more consistently. This competence was also indicated by
the outcomes of the implicative analysis on the data of the two age groups
respectively. The hierarchical ordering of the tasks with respect to their level of
difficulty was clearer in the implicative diagram of eleventh graders compared
to the diagram of the ninth graders. Given that the different levels of difficulty
of the tasks stemmed from the type of the algebraic relations involved, it
can be asserted that the older students identified more efficiently the distinct
conceptual properties of each algebraic relation. Thus, the conceptual features
of the algebraic relations seemed to have a stronger impact on their processes
and success levels relative to the younger students’ responses.
each analysis employed separately, and consequently could enrich and deepen
the outcomes of the investigation.
Glossary of initials used in the text:
CFA: Confirmatory Factor Analysis
EQS: Bentler’s Structural Equation Modeling program
CFI: Comparative Fit Index
RMSEA: Root Mean Square Error of Approximation
SEM: Structural Equation Modeling
EFA: Exploratory Factor Analysis
CHIC: Classification Hierarchique, Implicative et Cohesitive
References
1. R. Duval. The cognitive analysis of problems of comprehension in the learning
of mathematics. Mediterranean Journal for Research in Mathematics Education,
1(2):1–16, 2002.
2. J. Kaput. Representation systems and mathematics. In C. Janvier (ed.): Prob-
lems of representation in the teaching and learning of Mathematics, pages 19–26,
Lawrence Erlbaum Associates Publishers, Hillsdale NJ, 1987.
3. A. Gagatsis, M. Shiakalli. Ability to translate from one representation of the
concept of function to another and mathematical problem solving. Educational
Psychology, 24(5):645–657, 2004.
Appendix
Test A
N=183 Graphic→Verbal Graphic→Symbolic
Occurrence Mean Occurrence Mean
V1: y < 0 139 .76 102 .56
V2: xy > 0 93 .51 72 .39
V3: y > x 67 .37 46 .25
V4: y = −x 75 .41 36 .20
V5: y = 3/2 65 .36 70 .38
V6: y = x − 2 28 .15 13 .07
Test B
N=183 Verbal→Graphic Verbal→Symbolic
Occurrence Mean Occurrence Mean
V1: y < 0 132 .72 109 .60
V2: xy > 0 116 .63 72 .39
V3: y > x 83 .45 91 .50
V4: y = −x 58 .32 47 .26
V5: y = 3/2 68 .37 80 .44
V6: y = x − 2 53 .29 44 .24
Test A
N=404 Graphic→Verbal Graphic→Symbolic
Occurrence Mean Occurrence Mean
V1: y < 0 322 .80 288 .71
V2: xy > 0 244 .60 193 .48
V3: y > x 171 .42 145 .36
V4: y = −x 166 .41 137 .34
V5: y = 3/2 205 .51 177 .44
V6: y = x − 2 156 .39 107 .26
Test B
N=404 Verbal→Graphic Verbal→Symbolic
Occurrence Mean Occurrence Mean
V1: y < 0 329 .81 300 .74
V2: xy > 0 339 .84 218 .54
V3: y > x 250 .62 269 .67
V4: y = −x 195 .48 224 .55
V5: y = 3/2 257 .64 286 .71
V6: y = x − 2 204 .50 190 .47
Summary. In this research implicative analysis served to study some previous hy-
potheses about the interrelationships in students’ understanding of different con-
cepts and procedures after 12 hours of teaching elementary Bayesian inference. A
questionnaire made up of 20 multiple choice items was used to assess learning of
78 psychology students. Results suggest four groups of interrelated concepts: con-
ditional probability, logic of statistical inference, probability models and random
variables.
1 Introduction
that might facilitate the teaching of these concepts (e.g. those available from
Jim Albert’s web page at http://bayes.bgsu.edu/). These and other authors
(e.g. [8]) have incorporated Bayesian methods into their teaching and are sug-
gesting that Bayesian inference is easier to understand than classical inference.
This is however a controversial point. On one hand, it is argued [29] that
Bayesian inference relies too strongly on conditional probability, a topic hard
for undergraduate students in non-mathematical majors to learn. On the other
hand, in the past 50 years errors and difficulties in understanding and apply-
ing frequentist inference have widely been described (e.g. in [3, 21]). These
criticisms suggest researchers do not fully understand the logic of frequentist
inference and give an (incorrect) Bayesian interpretation to p-values, statis-
tical significance and confidence intervals. It is then possible that learning
Bayesian inference is not as intuitive as assumed or at least that not all the
concepts involved are equally easy for students. Moreover, empirical research
that analyzes the learning of students in natural teaching contexts is almost
non-existent.
Consequently, the first aim of this research was to explore the extent to
which different concepts involved in basic Bayesian inference are accessible
to undergraduate psychology students. A second goal was to compare learn-
ing outcomes with our previous hypotheses that there are different groups of
related concepts (and not just conditional probability) that are potentially
difficult for these students. We finally wanted to explore the implications be-
tween concepts included in each of these groups with the aim of providing
some recommendations about how to best organize the teaching of the topics.
In this sense, implicative analysis was an essential tool. As suggested
by [17], researchers in human sciences are interested in discovering inductive
nonsymmetrical rules of the type “if a, then almost surely b”. The method
provides an implication index for different types of variables; and moreover
serves to represent these implications in a graph or an implicative hierarchy
as a complex non-linear system. This especially suits our theoretical frame-
work [16], where knowledge is seen as a complex system, more than as a
linear object, and for this reason, in our research we were interested in finding
the implications between understanding the different mathematical objects
involved in basic Bayesian inference, that is, what the knowledge contents A
that facilitate learning of other different contents B are.
2 Teaching Experiment
The sample taking part in this research included 78 students (18–20 years
old) in the first year of the Psychology Major at the University of Granada,
Spain. These students were taking part in the introductory statistics course
and volunteered to take part in the experiment. The sample was composed of
17.9% boys and 82.1% girls, which is the normal proportion of boys and girls
in the Faculty. These students scored an average of 4.83 (on a scale of 0–10) in
the statistics course final examination with standard deviation of 2.07.
The students were organized into four groups of about 15–20 students each
and attended a short 12 hour long course given by the same lecturer with the
same material. The 12 hours were organized into 4 days. Each day there were
two teaching sessions with a half hour break in between. The first session (2
hours) was dedicated to presenting the materials and examples, followed by
a short series of multiple choice items that each student should complete, in
order to reinforce their understanding of the theoretical content of the lesson.
In the second session (one hour), students in pairs worked in the computer
lab with the following Excel programs that were provided by the lecturer to
solve a set of inference problems:
1. Program Bayes: This program computes posterior probabilities from prior
probabilities and likelihoods (which the students should identify from the
problem statement); a sketch of this computation is given after the list.
2. The program Prodist transforms a prior distribution P (p = p0 ) for a
population proportion p in the posterior distribution P (p = p0 | data),
once the number of successes and failures in the sample are given. Prior
and posterior distributions are drawn in a graph.
3. The program Beta computes probabilities and critical values for the Beta
distribution B(s, f ), where s and f are the numbers of successes and
failures in the sample.
4. The program Mean computes the mean and standard deviation in the
posterior distribution for the mean of a normal population, when the mean
and standard deviation in the sample and prior population are known.
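The computation performed by the Bayes program mentioned above can be sketched as follows; the function and the numbers are ours, not the actual spreadsheet.

```python
def posterior(priors, likelihoods):
    """Posterior probabilities from prior probabilities and likelihoods via
    Bayes' theorem (a sketch of what the "Bayes" spreadsheet computes)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)                       # total probability of the data
    return [j / evidence for j in joint]

# two hypotheses with equal prior probability and the likelihood of the observed data
priors = [0.5, 0.5]
likelihoods = [0.3, 0.7]                        # P(data | hypothesis)
print(posterior(priors, likelihoods))           # [0.3, 0.7]
```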
In Table 1 we present a summary of the teaching content. Students were
given a printed version of the didactic material that covered this content. Each
lesson was organized in the following sections: a) Introduction, describing the
lesson goals and introducing a real life situation; b) Progressive development
of the theoretical content, in a constructive way and using the situation pre-
viously presented; c) Additional examples of other applications of the same
procedures and concepts in other real situations, d) Some solved exercises,
with description of main steps in the solving procedure; e) New problems that
students should solve in the computer lab; and f) Self assessment items. All
this material together with the Excel programs described above was also made
available to the students on the Internet (http://www.ugr.es/~mcdiaz/bayes).
We added a forum, so that students could consult the teacher or discuss their
difficulties among themselves, when needed.
for the assessment that was part of the analysis course they were following.
The BIL (Bayesian Inference Learning) questionnaire (which is included in
Appendix) was prepared for this research and is composed of both multiple
choice and some open ended items that were developed by the authors with
the specific aim to cover the most important contents in the teaching. The aim
was to assess learning in the following groups of concepts, which in our a-priori
analysis were assumed to be the core content of basic Bayesian inference and
might cause different types of difficulties to students. These concepts, as well
as the philosophical principles of Bayesian inference had been introduced in
the teaching at an elementary level, adequate to the type of students. We also
assumed learning of one of these groups of concepts would not automatically
assure the learning of the other groups, so in the implicative analysis the three
groups could be unrelated.
The aim of Bayesian inference is updating the prior distribution via the like-
lihood to get the posterior distribution, which provides all the information
for the parameter, once the data have been collected [9]. However, it is also
possible to carry out procedures similar to those used in frequentist statistics,
although the interpretation and logic is a little different [7, 16].
Credible intervals provide the epistemic probability that the parameter
is included in a specific interval of values, for the particular sample, while
confidence intervals provide the frequentist probability that in a percentage of
samples from the same population the parameter will be included in intervals
of values computed in those samples. Credible intervals are computed from
the posterior distribution (item 17) and students should be able to compute
them by using the tables of different distributions (items 10, 16); they should
understand that the interval width increases with the credibility coefficient
and decreases with the sample size (item 12).
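A minimal sketch of these computations is given below, assuming the posterior for the proportion is already expressed as a Beta distribution B(s, f) as in the teaching material; the numbers are illustrative only.

```python
from scipy.stats import beta

def credible_interval(s, f, cred=0.95):
    """Central credible interval for a proportion whose posterior is the
    Beta distribution B(s, f) of the teaching material (s successes, f failures)."""
    return beta.interval(cred, s, f)

print(credible_interval(30, 40, 0.95))    # 95% interval
print(credible_interval(30, 40, 0.99))    # wider when the credibility coefficient grows
print(credible_interval(300, 400, 0.95))  # narrower with a larger sample
```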
In Bayesian inference we can compare at the same time different hypothe-
ses; in this case we compute the probabilities for those hypotheses given the
data by using the posterior distribution and select the hypothesis with higher
probability (item 11). In testing only one hypothesis we either compute the
probability for the hypothesis or for the contrary event (item 14); acceptance
or rejection will depend on the value of that probability.
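The hypothesis-comparison procedure just described can be sketched in the same way, again assuming a Beta posterior; the B(30, 40) example echoes item 11 of the questionnaire in the Appendix, and the code is only our illustration, not the teaching material.

```python
from scipy.stats import beta

# posterior B(30, 40) for a proportion: posterior probability of each candidate hypothesis
posterior = beta(30, 40)
hypotheses = {
    "p < 0.25": posterior.cdf(0.25),
    "p > 0.25": 1 - posterior.cdf(0.25),
    "p > 0.45": 1 - posterior.cdf(0.45),
    "p > 0.55": 1 - posterior.cdf(0.55),
}
for h, prob in hypotheses.items():
    print(f"P({h} | data) = {prob:.4f}")
print("most probable hypothesis:", max(hypotheses, key=hypotheses.get))
```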
So, there are some conceptual and interpretative differences between clas-
sical and frequentist approaches, but, since both approaches often lead to ap-
proximately the same numerical results, students might not understand these
differences and confuse both approaches [23].
Here A and B are the population subgroups where a and b take the value
1 [19, 33]. This index follows the normal distribution N(0, 1), and from there
an intensity for the implication a ⇒ b is defined by (2).
ϕ(a, b̄) = Pr( Card(X ∩ Ȳ) ≤ Card(A ∩ B̄) )    (2)

Ψ(A, B) = [ sup_{i∈{1,...,r}, j∈{1,...,s}} ϕ(a_i, b_j) ]^{rs} · [C(A) · C(B)]^{1/2}    (4)
the parameters in the Beta curve (item 9) and understand how posterior
distributions are achieved from prior distributions and likelihood through
Bayes theorem (item 7) succeeded better in getting a credible interval
for proportions in the continuous case, a task that requires interpreting
probabilities of Beta curves, and understanding the concept of posterior
probability, as well as the concept of credible interval. They also performed
better in discriminating prior and posterior distribution of the mean (item
17). All of this leads to better choosing a non informative prior distribution
for proportion in the continuous case through the Beta Curve (item 8),
and graphically interpreting the parameters in Beta curves (item 20); both
tasks are related to understanding the meaning of these parameters. These
are a subgroup of the tasks we included in the second group of concepts
(parameters, their distribution, prior and posterior distribution) in the a-
priori analysis; specifically most of these tasks are related to Beta curves
that was a concept new to the students.
3. Group 3 (items 11, 12, 14 and 16) is a set of the concepts we included in
the third group in the a-priori analysis: Logic of Bayesian inference. Being
able to correctly test a hypothesis for proportions (item 11) increases the
likelihood of correctly interpreting credible intervals (item 12); and these
two tasks are linked to another group: correctly testing a hypothesis about
the mean (item 14), which, in turn increases the likelihood of correctly
computing a credible interval for the mean (item 16). All this knowledge
is specifically related to the logic of Bayesian methods; understanding
the test of hypotheses facilitates that of credible intervals; inference for
proportion was easier than inference for mean, possibly because students
have to distinguish in the last task the formulas for known or unknown
variance.
4. Group 4: Finally there is a second group of tasks related to conditional
probability (the different parts of Item 18, 2.3 and 1). Correct identifica-
tion of prior probability (item 18.1) facilitates the correct identification of
likelihood from a problem statement (item 18.2) and this leads to correct
computation of posterior probabilities (item 18.3). These three abilities
lead to better identification of conditional probabilities for the contrary
event (item 2.3) and discrimination between prior probability, likelihood
and posterior probabilities in the context of a problem (item 1). The sep-
aration between groups 1 and 4 is explained by the different difficulty of
the tasks in the two groups. Tasks in group 4 were easier than those in
group 1 where probabilities are only given by formulas.
Other groupings of items that were not significant at the 95% level were as
follows:
1. Group 5 : Items 6 (assigning adequate prior distribution for the non in-
formative case to proportions in the discrete case), 3 (understanding pa-
rameters as random variables) and 5 (discrimination between parameters
6 Discussion
References
1. J. Albert. Teaching introductory statistics from a bayesian perspective. In
B. Philips, editor, Proceedings of the Sixth International Conference on Teaching
Statistics, CD-ROM, 2002.
2. J.H. Albert and A. Rossman. Workshop Statistics. Discovery with Data. A
Bayesian Approach. Key College Publishing, 2001.
3. B. Lecoutre, M.P. Lecoutre, and J. Poitevineau. Uses, abuses and misuses of
significance tests in the scientific community: Won’t the bayesian choice be un-
avoidable? ISR, pages 399–418, 2001.
4. M. Bar-Hillel. Decision Making Under Uncertainty, chapter The Base Rate Fal-
lacy Controversy, pages 39–61. North Holland, Amsterdam, 1987.
5. C. Batanero and E. Sánchez. Exploring Probability in School: Challenges for
Teaching and Learning, chapter What is the Nature of High School Student’s
Conceptions and Misconceptions about Probability?, pages 260–289. Springer,
New York, 2005.
6. J.M. Bernardo. A bayesian mathematical statistics primer. In A. Rossman
and B. Chance, editors, Proceedings of the Seventh International Conference
on Teaching Statistics. International Association for Statistical Education, CD-
ROM, 2006.
7. D.A. Berry. Basic Statistics: A Bayesian Perspective. Belmont, 1995.
8. W.M. Boldstad. Teaching bayesian statistics to undergraduates: Who, what,
where, when, why, and how. In B. Phillips, editor, Proceedings of the Sixth
International Conference on Teaching of Statistics, CD-ROM, 2002.
9. W. Bolstad. Introduction fo Bayesian Statistics. Wiley, 2004.
10. R. Couturier. Subjects categories contribution in the implicative and the simi-
larity analysis. LMSET, pages 369–376, 2001.
11. R. Couturier and R. Gras. Chic: Traitement de données avec l’analyse implica-
tive. In S. Pinson and N. Vincent, editors, Journées Extraction et Gestion des
Connaissances (EGC’2005), pages 679–684 (Vol. 2), 2005.
12. R. Couturier, R. Gras, and F. Guillet. Classification, Clustering, and Data Min-
ing Applications, chapter Reducing the Number of Variables Using Implicative
Analysis, pages 277–285. Springer-Verlag, Berlin, 2004.
13. R. Falk. Conditional probabilities: insights and difficulties. In R. Davidson
and Swift J., editors, Proceedings of the Second International Conference on
Teaching Statistics, pages 292–297, 1986.
14. R. Falk. Studies in mathematics education, chapter Inference Under Uncertainty
via Conditional Probability, pages 175–184 (Vol. 7). UNESCO, Paris, 1989.
15. W. Feller. An Introduction to Probability Theory and its Applications, Vol. 1.
Wiley, 1968.
16. J. D. Godino. Un enfoque ontológico y semiótico de la cognición matemática.
RDM, pages 237–284, 2002.
Appendix: Questionnaire
Correct responses are emphasized in bold.
(C)                          (D)
Values of     Probability    Values of     Probability
proportion                   proportion
0.00          0.00           0.00          1/4
0.01          0.25           0.25          1/4
0.02          0.50           0.50          1/4
0.03          0.75           0.75          1/4
0.04          1              1             1/4
Item 11. The posterior distribution for the proportion of voters favorable to a
political party is given by the B(30, 40) distribution. From the above data
table, the most reasonable decision is to accept the following hypothesis
for the population proportion:
1. H : p < 0.25
2. H : p > 0.55
3. H : p > 0.25
4. H : p > 0.45
Item 12. For the same posterior distribution of the parameter in a population
the r% credible interval for the parameter is:
1. Wider if r increases
2. Wider if the sample size increases
3. Narrower if r increases
4. It depends on the prior distribution
Item 13. In a normal population with standard deviation σ = 5 and with no
prior information about the population mean, we pick a random sample
of 25 elements and get a sample mean x̄ = 100. The posterior distribution
of the population mean is:
1. A normal distribution N (100, 0.5)
2. A normal distribution N (0, 1)
3. A normal distribution N (100, 5)
4. A normal distribution N (100, 1)
Item 14. To test the hypothesis that the mean µ in a normal population with
standard deviation σ = 1 is larger than 5, we take a random sample of
100 elements. To follow the Bayesian method:
1. We compute the sample mean x̄ and then compute P((x̄ − 5)/0.1 < 5); when
this probability is very small, we accept the hypothesis.
2. We compute x̄ and then compute P((x̄ − 5)/0.1 < Z), where Z is
the normal distribution N(0, 1); when this probability is very
small, we accept the hypothesis.
3. We compute the sample mean x̄ and then compute P((x̄ − 5)/0.1 > Z), where
Z is the normal distribution N(0, 1); when this probability is very
small, we accept the hypothesis.
4. We compute the sample mean x̄ and then compute P((x̄ − 5)/0.1 > 5); when
this probability is very small, we accept the hypothesis.
Item 15. In a sample of 100 elements from a normal population we got a mean
equal to 50. If we assume a prior uniform distribution for the population
mean, the posterior distribution for the population mean is:
1. About N(50, s), where s is the sample estimate of the standard deviation.
2. About N(50, s/10), where s is the sample estimate of the standard
deviation.
3. We do not know, since we do not know the standard deviation in the
population.
4. About N(0, 1)
Item 16. The posterior distribution for a population mean is N (100, 15). We
also know that P (−1.96 < Z < 1.96) = 0.95, where Z is the normal
distribution N (0, 1). The 95% credible interval for the population mean
is:
1. (100 − 1.96 · 1.5, 100 + 1.96 · 1.5)
2. (100 − 1.96, 100 + 1.96)
3. (100 · 1.5 − 1.96, 100 · 1.5 + 1.96)
4. (100 − 1.96 · 15, 100 + 1.96 · 15)
Item 17. In a survey of 100 Spanish girls the following data were obtained:
                         Mean    Standard dev.
Sample                   160     10
Prior distribution       156     13
Posterior distribution   158.5   7.9
To get the credible interval for the population mean we use:
1. The normal distribution N (160, 10)
2. The normal distribution N (156, 13)
3. The normal distribution N (158.5, 7.9)
4. The normal distribution N (160, 0.5)
Item 18. 20% of boys and 10% of girls in a kindergarten are immigrants. There
are about 60% boys and 40% girls in the center. Use the following table
to compute the probability that an immigrant child taken at random is a
boy.
Events   Prior probabilities   Likelihoods   Product   Posterior probabilities
Sum      1                                             1
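As an illustration only (this sketch is not part of the original questionnaire), the table can be filled in from the figures given in the item statement; nothing beyond those figures is assumed.

# Hypothetical sketch: filling in the Bayes table of Item 18.
priors = {"boy": 0.60, "girl": 0.40}           # composition of the kindergarten
likelihoods = {"boy": 0.20, "girl": 0.10}      # P(immigrant | gender)

products = {g: priors[g] * likelihoods[g] for g in priors}    # joint probabilities
total = sum(products.values())                                # P(immigrant) = 0.16
posteriors = {g: products[g] / total for g in products}       # P(gender | immigrant)

for g in priors:
    print(g, round(products[g], 2), round(posteriors[g], 2))
# boy 0.12 0.75
# girl 0.04 0.25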
[Figure: four panels plotting f(x) against x over the interval 0.0–1.0.]
Personal Geometrical Working Space:
a Didactic and Statistical Approach
Alain Kuzniak
Work initiated by Bachelard [1] and Koyré [2] and pursued in mathematics
by Lakatos [3] showed that the idea of a peaceful scientific evolution of math-
ematical concepts was an illusion. Kuhn [4] brought the conflicting logic of
scientific ideas to a culmination: he sees the transition from one paradigm to
another as a revolution whereby a new paradigm replaces the old one.
Our view of the study of geometry is based on an approach asserting that
geometry has undergone significant changes of perspectives equivalent to par-
adigmatic shifts. Following Gonseth [5] who places geometry in relation to
the problem of space and applying Kuhn’s notion of a paradigm, we con-
sider three geometrical paradigms [6, 7] that organize the interplay between
intuition, deduction, and reasoning in relation to space:
The following problem (Hachette Cinq sur Cinq 4e 1998, page 164) exemplifies
the kind of geometrical exercises for which the existence of a working space
suitable to solve the problem is not obvious.
The drawing looks like a square, but its status in the problem is not clear.
Is the drawing a real object that the problem invites us to study, or does it result
from a construction described in a text? And in that case, is the practical
achievement essential, or does it only serve as a support for reasoning?
The function of the represented object is usually given by the text of the
problem: this in turn orients the reader towards a precise geometrical paradigm.
Here, the wording gives no such indication and, as a student points out: There is
no text in the wording, only a drawing that can mislead.
Finally, who is right? Charlotte or Marie? Pythagoras’ theorem, which
doesn’t require the real measurement of the angle, gives a typical way of
handling this kind of exercise. But even there, the ambiguity of the choice of
the working space reappears. For our purpose, we shall introduce two forms
of Pythagoras’ theorem, the usual one, an abstracted form, with real numbers
and equalities:
If the triangle ABC is right-angled at B then AB² + BC² = AC²
and the other one, a practical form, using approximate numbers and, in a less
common way, approximate figures:
If the triangle ABC is “almost” right-angled at B then AB² + BC² ≈ AC²
The first form leads to work in Geometry II, which deviates from the experimental
data by arguing in the numerical setting. The second formulation appears
rather as an advanced form of Geometry I.
If we work in Geometry II by using the abstracted form of Pythagoras'
theorem, then we can argue, as one student suggests, siding with Charlotte:
We know that if OEM is right-angled at O then we have OE² + OM² = ME².
We check whether 4² + 4² = 5.6², and 32 ≠ 31.36. Thus, OEM is not a right
triangle.
If we use the practical Pythagoras' theorem in the measured setting, then
we shall rather follow the reasoning proposed by another student, who con-
cludes:
Marie is right: OELM is a square since √32 ≈ 5.6.
From the answers given by the students, we can sketch a classification that
takes into account the geometrical paradigm applied in the resolution. It must
be clear that only answers and not the students are classified here. But, by
doing this, a general understanding of students’ behavior is intended.
We have also identified four kinds of answers to the “Charlotte and Marie”
problem. This allows us to bring out four main approaches.
We labeled these four groups PII, PIprop, PIperc and PIexp, for rea-
sons that will be clarified below. In each case, we will give a typical answer
(Appendix 1) from the sub-population under study.
First, answers using the theorem are common among two groups of students,
PII and PIprop.
PII. In this case [St A], the standard Pythagoras' theorem is applied
inside the world of abstract figures and numbers, without considering the real
appearance of the object. Only the information given by words and signals
(coding of segments, indications on the dimensions of the lengths) is used, and
Pythagoras' theorem is applied in its entire formal rigor. To prove that the
quadrangle is a rhombus (four sides of the same length) and to show that it is
not a square (contrapositive of Pythagoras' theorem), students use minimal
and sufficient properties. We shall consider this population as being inside
Geometry II.
PIprop. This population groups together students who apply the practical
Pythagoras’ Theorem, in fact, to be rigorous, the converse. They generally
conclude that Marie is right [St B]. In that case, the students recognize the
importance of the drawing and of the measurements’ approximation. The
practical Pythagoras’ theorem appears as a tool of Geometry I. We have
designated this population as PIprop to insist on the fact that individuals
of this group use properties to argue. The question remains whether these students
can play with the differences between Geometry I and Geometry II, or whether
their horizon remains merely technological.
In addition to these answers, here are those of students who did not use
Pythagoras' theorem.
PIexp. We group together students who use their measuring and drawing
tools to arrive at an answer. They are situated in the experimental world of
Geometry I. Generally, this type of student concludes that Marie is right [St
C]. But this is not always the case: one student, using his/her compass, verified
that the vertices of the quadrangle are not concyclic and could thus assert that
OELM is not a square.
PIperc. In this last category [St D], we group together students whose
answers are based on perception: their interpretation of the drawing is the
basis for their answer, and they do not give any information about their
tools of investigation. It is not easy to know whether this lack of deductive proof
is due to a lack of geometrical knowledge or to a real confidence in the appearance
of the figure. To answer this question, we must have a look at their reasoning
problems.
The typical outcomes presented above are logically quite coherent and do
not contain many reasoning errors or formulation problems. That is not
true for all cases, and we therefore performed an analysis of the proofs and of
the reasoning structure based on levels of argumentation inspired by Van
Hiele.
We classify at level 1 works that enumerate a non-minimal list of quad-
rangle properties to justify assertions. At level 2, we place productions that
evoke a correct inclusion relation between the set of squares and the set of
rhombi. At level 3, we place productions that use minimal and sufficient
information to justify assertions.
This analysis allows us to separate two categories of students. In the first
one, widely illustrated by our previous examples, students have solid knowl-
edge of the figures' properties and use level 3 reasoning. The students
of the second category argue with an accumulation of properties and show
rather unsound knowledge of the geometrical properties. Here are two examples
illustrating this second group. [St E]
1) The quadrangle OELM is a rhombus. It follows the characteristics
of such a figure: four sides are equal; the diagonals cut each other at their
middle and form a right angle.
2) Both girls are right; OELM is a square, for it has four equal sides
and four right angles. It is also a rhombus, even if this figure, the
rhombus, is not necessarily composed of right angles.
The didactical study leaves some of the questions raised in the introduction pend-
ing. The classification we obtained is a direct product of our theoretical
framework. It is therefore compelling to use statistical tools to test the model
and, at the same time, to measure the distance between the classes. In phase 2 of
the exercise, students had to choose the best explanation among the variety
of responses. It turns out that the choices they made depended on the class
they belonged to, plus other unknown reasons. To further the analysis, we
need to understand what determines students' class membership and to define
the relevant sub-classes, so that we can explain why members of the same class
could evolve in different ways during the teaching process.
The last question, about the doubts of the students, is taken into account
by aspect 8. This aspect is divided into three variables depending on the content
the students mention: properties, drawing and estimation.
These eight aspects are then turned into disjunctive (Yes/No) variables to
allow the statistical analysis, which gives 14 binary variables. We keep neither
aspect 5 nor aspect 4c in this analysis.
The study was made with the program Statistica; we give here (fig. 1) the
representation of the variables in the first factorial plane.
Expressing the answer to the problem, Charlotte (CHA) and Marie (MAR)
are obviously the most determining variables, and it could have been interest-
ing to consider them as supplementary variables [10, 11]. With the statistical
study, we can correlate these two variables with the others which seem, in our
view, to be important, such as the use of square roots (RAC) or of drawings on
the figure (FIG). Both variables CAR and ACC point to the use of characteristic
properties and are decisive for a better understanding of students' reasoning.
The graph shows the three variables connected to the doubts described by
the students: DES for doubts on the drawing, APP for the estimation and, finally,
PROP for the expression of problems linked to the properties. The position of
variable LOS, which expresses the relation between rhombus and square, will also
be interesting for evaluating students' reasoning level.
We should bear in mind that in the didactic approach the subgroup PIperc
includes the students who answered Marie but without our knowing the ex-
act nature of their reasoning: is the conclusion they give based on perception
alone, or given by default due to the lack of geometrical knowledge and
the neglect of certain properties? We introduced the analysis by levels of argu-
mentation to better understand the method used by these students, due to its
Thanks to factorial analysis, we can first sketch a map that positions students
in the first factorial plane. To obtain a better grouping of the determining
variables, we used the program CHIC and created similarity trees as well as
hierarchical trees that reveal one-way relationships among variables [12, 13].
This approach is essential for studying how students' geometrical thinking
works and for describing their GWSs in a more dynamic way [14, 15].
[CHA, SRAC, PYT, CAR, IND] create a first set which shows us how stu-
dents giving the answer Charlotte (CHA) are reasoning. They use Pythagoras’
theorem (PYT) with a calculation without square roots (SRAC), they master
the notion of characteristic property (CAR) and use only information given
by the problem wording (IND). This group is very close to the one that we
identified under the name of PII.
Another group is organized around the variables [LOS, MAR]. The implicative
analysis confirms that students who mentioned the relations between rhombus
and square answered Marie (MAR). This group is close to the one described by
the variables [ACC, PROP, FIG]; but while students of this group base their reasoning
on the figure and give a series of properties, at the same time they indicate
their doubts and their difficulties. The students from these two classes argue
differently from those belonging to the first group (PII): visual or experimental
use of the support of the figure, accumulation of arguments, or reasoning based
on the global perception of the figure's shape.
In a specific way, the statistical analysis shows the coherence of a last
group around [APP, RAC, COR]. These answers use the “approximate” form
of Pythagoras' theorem. It seems that students from this group are sensitive
to the importance of the estimation and to the question of the drawing which
looks like a square. The way they argue is close to group PII, but their relation
to reality is different.
The combination of both analyses allows the organization of the variables
as shown in the graph of the binary variables in the first factorial plane
(fig. 4).
More precisely [16], the Geometrical Working Space (GWS) is the place
organized to ensure the geometrical work. It connects the following three
components:
• the real and local space as material support,
• the artifacts, such as drawing tools and computers, put at the service of the
geometrician,
• a theoretical system of reference, possibly organized into a theoretical model
depending on the geometrical paradigm.
The geometrical working space becomes manageable only when its user
can link and master the three components above.
To solve a problem of geometry, the expert has to work with a suitable
GWS. This GWS must meet two conditions: its components must be sufficiently
powerful to handle the problem within the right geometrical paradigm and,
depending on the user, its various components must be mastered and used in a
valid way. In other words, when the expert has recognized the geometrical paradigm
involved in the problem, she/he can solve it thanks to the GWS suited to this
paradigm. When the problem is set to a person (the pupil, the student or the
professor) rather than to an ideal expert, this person handles the problem with
his/her personal GWS. The latter will have neither the wealth nor the performance
of an expert's GWS.
This focus on the personal GWS led us to introduce a cognitive dimension
into our GWS approach. For that purpose, we follow Duval [17, p. 38], who
points out three kinds of cognitive processes:
• a visualization process, with regard to space representation and the material
support,
• a construction process, depending on the tools used (ruler, compass) and on
the configuration,
• reasoning, in relation to a discursive process.
These three processes are linked in a diagram (fig. 5) that we juxtapose with
the GWS components.
Now we can interpret the results of the statistical analysis of the students'
answers in connection with the notion of personal GWS. In terms of GWS,
the study clearly points out two systems of reference: one associated with Geom-
etry I and the other with Geometry II. The new dimension brought by the
statistical data is that technical mastery of reasoning and knowledge of proper-
ties introduce differences between students' personal GWSs. The influence
of visualization or artifacts changes according to the student.
A first group of students works inside the GWS/GII, based on the Geometry II
system of reference. We can divide this set into two subgroups. Students of the
first subgroup master sufficiently, at least in the Charlotte and Marie exercise, the
theoretical system of reference. This group matches exactly PII, represented
above by the production of student A. Within the limits of this exercise, this
group masters the rules of geometrical argumentation. When members of this
group evoke doubts about the drawing, they underline its misleading aspect,
following the traditional view of the figure in French geometry
education as soon as Geometry II is set up in the curriculum (Grade 7 or 8).
The second subgroup always refers to Geometry II, but with an insufficient
mastery due either to the neglect of certain properties or to a superficial
understanding of the reasoning rules of Geometry II encountered during their
studies. This subgroup is strictly included in PIperc, a group whose heterogeneity
we have noticed. We meet here students whose answers are similar to student D's
but who express their doubts in a particularly subtle way, as this one:
Could we say that the diagonals are really perpendicular? Could we say
that the quadrangle has 4 right angles? By using a set square, yes. By
calculating with Pythagoras, it is not exact, but approximate.
The second large population that the analysis points out groups together
students who have the Geometry I paradigm in mind and work within the working
space GWS/GI. For them, the figure given in the problem is a real object
they have to study. The analysis reveals two subgroups: members of the first
mainly use arguments based on visualization and construction to solve the
problem, while members of the second use the connection between construction
and proof. In this population, we meet vague answers close to student D's
(PIperc), but also to student C's, who used drawing instruments to verify prop-
erties (PIexp). Part of the students who used the “approximate” Pythagoras'
theorem (PIprop) belongs to this large group.
Finally, it is necessary to point out a group which seems to play the game
(GI/GII). These students seem to balance between GI and GII, but the usual
rules of the didactic contract leave them few possibilities of expressing their
opinion clearly. Members of this group give answers close to student B's and,
like him/her, use the “approximate” form of Pythagoras' theorem, but some may
also have used the classic Pythagoras' theorem while writing down their doubts
about the status of the figure drawn in the problem.
Chart summarizing the results:
5 Conclusion
References
1. G. Bachelard. La formation de l'esprit scientifique. Vrin, Paris (translation: For-
mation of the Scientific Spirit (Philosophy of Science), Clinamen Press, 1983).
2. A. Koyré. From the Closed World to the Infinite Universe. (Hideyo Noguchi
Lecture), Johns Hopkins University Press; New Ed edition, 1969.
3. I. Lakatos. Proofs and Refutations: The Logic of Mathematical Discovery.
Cambridge University Press, 1976.
4. T. S. Kuhn. The Structure of Scientific Revolutions (Foundations of Unity of
Science). University of Chicago Press, 2Rev Ed edition, 1966.
5. F. Gonseth. La géométrie et le problème de l’espace. Griffon Ed, Lausanne,
1945–1952.
6. C. Houdement, A. Kuzniak. Sur un cadre conceptuel inspiré de Gonseth et
destiné à étudier l’enseignement de la géométrie en formation des maîtres. Ed-
ucational Studies in Mathematics, volume 40/3, pages 283–312, 1999.
7. C. Houdement, A. Kuzniak. Elementary geometry split into different geo-
metrical paradigms. Proceedings of CERME 3, http://www.dm.unipi.it/
~didattica/CERME3/proceedings/Groups/TG7/, 2003
8. A. Kuzniak, J. C. Rauscher. On Geometrical Thinking of Pre-Service School
Teachers. CERME IV, Sant Feliu de Guíxols, Spain, http://cerme4.crm.es/
Papers%20definitius/7/kuzrau.pdf, 2005.
9. R. Berthelot, M. H. Salin. L’enseignement de la géométrie au début du collège.
petit x, volume 56, pages 5–34, 2001.
10. P. Orus, P. Gregori. Des variables supplémentaires et des élèves fictifs dans la
fouille des données avec CHIC. Actes des troisièmes rencontres ASI Palerme,
pages 279–292, 2005.
11. A. Scimone, F. Spagnolo. The importance of supplementary variables in a case
of an educational research. Actes des troisièmes rencontres ASI Palerme, pages
317–326, 2005.
12. R. Gras. L’implication statistique: nouvelle méthode exploratoire de données.
La pensée sauvage, 1996.
13. P. Kuntz. Classification hiérarchique orientée en ASI. Actes des troisièmes ren-
contres ASI Palerme, pages 53–62, 2005.
Gerard Ramstein
LINA, Polytech’Nantes
Rue Christian Pauc BP 50609 44306 Nantes cedex 3, France
[email protected]
Key words: ranking analysis, feature selection, classification rules, gene coregula-
tion, microarray data analysis
1 Introduction
2 Related work
Dealing with imprecise and noisy data is an important issue that has already
been addressed by the researchers in the area of implicative analysis. Their
works put emphasis on the determination of intervals. In [14], an optimal
partition on numeric variables is defined and the quality of implication is
determined by the union of elements of the partition. Another interesting
work [13] introduces fuzzy partitions. We do not use either of these approaches
because we prefer to avoid a partitioning procedure. We indeed observed that
microarray datasets often follow a unimodal distribution, and the definition
of a partition, fuzzy or not, tends to be arbitrary.
To our knowledge, the implicative analysis has not yet been applied to
microarray data. However, the discovery of association rules has been recently
proposed in this particular application field.
In [8], association rules are extracted from gene expression databases rel-
ative to the yeast genome. A preprocessing retains the genes that are under-
expressed or over-expressed according to their expression values. This work is
based on the Apriori algorithm [1] and the usual rule parameters, support and
confidence. In [24], the authors present a set of operators for the exploration
of comprehensive rule sets. The expression values are discretized according
to predetermined thresholds. The rules are filtered with classical support and
confidence parameters. One drawback of these two methods is the dependency
of the obtained rules on arbitrary thresholds. A similar study [5] incorpo-
rates annotation information, combined with over- or under-expression. [20]
presents HAMB, a machine learning tool that induces classification rules from
gene expression data. FARMER [7] also discovers association rules from mi-
croarray datasets. Instead of finding individual association rules, FARMER
finds interesting rule groups, i.e. a set of rules that are generated from the same
set of individuals. FARMER uses a supervised discretization procedure, based
on entropy minimization. A case study on human SAGE data [3] explores
large-scale gene expression data using the Min-Ex algorithm, which efficiently
provides a condensed representation of the frequent itemsets. The data have
been transformed into a boolean matrix by a discretization phase, the logical
true value corresponding to gene over-expression. The authors analysed the
effect of three different discretization procedures. Our work is closer to the
one proposed in [16]. The authors define the concept of emerging patterns,
where itemsets are boolean comparison operators over gene expressions. They
use an entropy minimization criterion that strongly differs from our approach,
since it takes into account all the samples, while we prefer to extract higher
quality rules, even if they concern only a subset of observations.
For the sake of generality, we consider a set of m individuals for which n mea-
surements have been performed. In our study, individuals are genes (actually
gene products to be more precise) and the measurements correspond to a set
O of n different experimental conditions. A set of experiments generally refers
to a biological study involving different tissue samples. We will call observa-
tion an experiment relative to a particular biological condition and implying
the whole set of individuals (genes). Let M (k, l) be the measurement value
associated to an individual k and an observation l. Note that this value may
refer to any ordinal data type. Our analysis relies on this matrix, although
the same study could be performed on the transposed matrix. We call profile
of the individual k the vector p(k) = (M (k, l), l ∈ [1, n]). The profile in our
application is usually called the expression profile and concerns the whole set
of measurements relative to gene k. We define the operator rank that takes
a profile p(k) and returns its observation indexes, ranked in increasing order.
For example, let us consider the profile p(k) = (4.1, 12.3, 1.2, 3.7). We have
rank(p(k)) = (3, 4, 1, 2), which means that the lowest value (i.e. 1.2) has been
measured under condition 3, the value 3.7 under 4, and so on. The study
profile 1 2 3 4 5 6 7 8 9 10
p(A) 6 4 10 7 8 13 3 12 5 2
p(B) 15 12 16 19 10 14 8 7 17 21
Table 1. Measurement values issued from individuals A and B.
rank o1 o2 o3 o4 o5 o6 o7 o8 o9 o10
r(A) 10 7 2 9 1 4 5 3 8 6
r(B) 8 7 5 2 6 1 3 9 4 10
Table 2. Reordering of the observations using the rank operator. The values repre-
sent the observations of the previous table, namely the column indexes.
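The rank operator can be illustrated with a short sketch (not taken from the chapter); applied to the values of Table 1, it reproduces Table 2.

# Minimal sketch of the rank operator: it returns the observation indexes of a
# profile, ordered by increasing measurement value (observations numbered from 1).
def rank(profile):
    return [i + 1 for i, _ in sorted(enumerate(profile), key=lambda t: t[1])]

p_A = [6, 4, 10, 7, 8, 13, 3, 12, 5, 2]
p_B = [15, 12, 16, 19, 10, 14, 8, 7, 17, 21]

print(rank(p_A))   # [10, 7, 2, 9, 1, 4, 5, 3, 8, 6]  i.e. r(A) in Table 2
print(rank(p_B))   # [8, 7, 5, 2, 6, 1, 3, 9, 4, 10]  i.e. r(B) in Table 2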
Note that this measure is very robust with respect to the data: it is insen-
sitive to monotonic transformations, an interesting property for microarray
data, often prone to various preprocessings. The value of ϕI (A, B) indicates
the quality of the association. Besides, the relative intervals imax and jmax
for which the maximum defined in eq. 2 has been found provide useful in-
formation. The rule A → B can thus be expressed in a more precise and
operational form.
Let us define tA = min(M [A, o], o ∈ rA (imax)), TA = max(M [A, o], o ∈
rA (imax)), tB = min(M [B, o], o ∈ rB (jmax)) and TB = max(M [B, o], o ∈
rB (jmax)). Let o be an observation. The association rule can be expressed as
follows:
if tA ≤ M [A, o] ≤ TA , then tB ≤ M [B, o] ≤ TB (3)
Suppose for example that table 1 concerns two genes A and B issued from the
expression matrix M defined at the beginning of this section. The rule A → B
can then be written as follows: if for an observation o we have 5 ≤ M [A, o] ≤ 7,
then 15 ≤ M [B, o] ≤ 19.
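The following sketch only illustrates how the bounds of eq. 3 are read off once the maximizing intervals are known; it takes as given the observations singled out by the example above (observations 9, 1 and 4, whose values for gene A are 5, 6 and 7) instead of computing imax and jmax from eq. 2, which lies outside this excerpt.

# Sketch: deriving the operational form (eq. 3) of the rule A -> B, assuming
# the maximizing interval covers observations 9, 1 and 4 of Tables 1 and 2.
M = {"A": [6, 4, 10, 7, 8, 13, 3, 12, 5, 2],
     "B": [15, 12, 16, 19, 10, 14, 8, 7, 17, 21]}

obs = [9, 1, 4]                                  # observations covered by the interval
a_vals = [M["A"][o - 1] for o in obs]
b_vals = [M["B"][o - 1] for o in obs]

tA, TA = min(a_vals), max(a_vals)                # 5, 7
tB, TB = min(b_vals), max(b_vals)                # 15, 19
print(f"if {tA} <= M[A,o] <= {TA} then {tB} <= M[B,o] <= {TB}")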
This measure presents the advantage of possessing an infinite positive range and
of also increasing with the quality of the rule. The parameter λI (A, B) is eas-
ily interpretable: instead of having, for instance, ϕI (A, B) = 0.9999, we will
consider λI (A, B) = 4, which means that the risk of observing a comparable
situation by chance is equal to 10⁻⁴. In this paper we will equally use expres-
sion 2 or 4 in our numerical examples.
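Equation 4 itself is not reproduced in this excerpt; from the interpretation given here (ϕI = 0.9999 corresponds to λI = 4, i.e. a chance risk of 10⁻⁴), the logarithmic form can be read as λ = −log10(1 − ϕ), which the following fragment assumes.

import math

def log_intensity(phi):
    # assumed logarithmic form: the risk of observing a comparable situation
    # by chance is 10**(-lambda)
    return -math.log10(1.0 - phi)

print(log_intensity(0.9999))   # approximately 4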
The most widely used microarray analysis concerns the study of gene coreg-
ulation. A gene A is said to be coregulated with a gene B if the expression
profile of both genes is similar. This similarity is generally measured by differ-
ent metrics, such as the Euclidean distance or the Pearson’s correlation. Note
that the intensity of implication is oriented and can favour an association
from A to B rather than from B to A. The usual metrics do not precise any
orientation and has another drawback: the similarity measure is issued from a
global estimation that takes into account all the observations of O while our
implicative analysis searches for partial similarity between interval ranks that
Fig. 1. Implication of CHA1 over SAM1. The abscissa axis represents the 89 exper-
imental conditions. The ordinate axis represents the expression measures. Triangles
belong to the profile of gene CHA1 (YCL064C) and circles belong to the profile of
SAM1(YLR180W). The filled points denote the observations belonging to the rank
intervals that maximize the intensity of implication. The two rank intervals are iden-
tical and the corresponding rule accepts one exception, indicated by a double arrow.
This arrow points out the fact that there exists an observation (shown by an
unfilled triangle) that is less under-expressed and that does not belong to the rank
interval of SAM1 while it is included in that of CHA1.
of implication for this pair of genes. These results show that the implicative
measure is finer than correlation techniques. Indeed the low values obtained
from the latter will not permit the gene association to emerge, notably with
respect to the large amount of data. In contrast, the im-
plicative value clearly reveals the quality of the rule, stating that the risk of
encountering such an association by chance is less than one in a thousand.
method value
Intensity of implication 0.9992
Pearson correlation -0.16
Kendall correlation -0.0089
Table 3. Comparison of the implication and correlation measures.
The only difference with eq. 2 is that rB (j) is now replaced by a unique obser-
vation set, the set Oc of observations of class c. As we search for classification
rules, we only consider predefined classes. Note however that a more complex
analysis could be performed by accepting any subset of O, this issue being
similar to the selection of genes in an unsupervised study. When the class par-
tition is unequal, the use of the intensity of implication presents an important
advantage. Contrary to the confidence, our measure takes into account the
fact that a class c is over-represented. Note that the intensity of implication
is always null in the extreme case where Oc = O.
The selection of informative genes among a gene set G is performed by the
following algorithm:
Selection algorithm
inputs:
    M, G : expression matrix and its gene set
    C, L : the classes and the class labelling function
    K    : the number of informative genes per class
outputs:
    igs : the informative gene set
begin
    igs ← ∅
    for each class c ∈ C do
        genelist ← ∅
        for each gene g ∈ G do
            compute ϕ = ϕI′ (g → c)
            genelist ← genelist ∪ {(g, ϕ)}
        end;
        sort all pairs (g, ϕ) ∈ genelist in decreasing order of ϕ
        add into igs the K first genes of the sorted gene list
    end;
end.
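A sketch of this selection loop in Python may help to fix ideas; the measure ϕI′(g → c) is not defined in this excerpt, so it is passed in as a function supplied by the caller, and the gene and class containers are kept deliberately generic.

# Sketch of the gene selection loop above. phi_i_prime(g, c) is assumed to be
# provided by the caller; it is not reimplemented here since its definition
# lies outside this excerpt.
def select_informative_genes(genes, classes, phi_i_prime, K):
    igs = set()                                              # informative gene set
    for c in classes:
        scored = [(g, phi_i_prime(g, c)) for g in genes]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # decreasing phi
        igs.update(g for g, _ in scored[:K])                 # keep the K best genes for c
    return igs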
Note that the selection process makes it possible to discover informative genes ca-
pable of discriminating more than one class from the others. A typical sit-
uation will concern genes presenting an over-expression for a given class and
an under-expression for another one. A special case concerns datasets
that are partitioned into two classes. One expects that a gene that discrimi-
nates one class will automatically be informative for the other one. Actually,
this assertion is not necessarily true, depending on the position of coun-
terexamples at each extremity of the ranking. Two different triplets (g, c1 , ϕ1 )
and (g, c2 , ϕ2 ) will indeed be associated with the same gene g. In the case where
ϕ1 ≠ ϕ2 , the same gene may be retained in one class and rejected in the other.
between ALL and AML. As we have two classes, we did the same by setting
the input parameter K of our selection algorithm to 25. We obtained a set of
classification rules that comprises 14 genes described by the authors as dis-
criminative and included in their list of 50 genes. Table 4 shows that the genes
selected by Golub et al. are comparable with our set in terms of intensity of
implication. The same minimum has been found in both sets and the mean is
almost identical. However, one observes for our selected set that the dispersion
is lower, which seems to indicate that the quality of our classification rules is
higher. To verify this assumption and to analyse the discriminative power of
Classification algorithm
inputs:
M : expression matrix
Γ (M ) : set of discriminative rules
p(s) : the expression profile of an unknown sample s
outputs:
cs : the predicted class of s
begin
parameter:
where µ and σ are respectively the mean and the standard deviation
of the expression profile. The classification algorithm assigns s to the
most numerous class within the neighbour set. When many features are
bound to be little relevant, feature-weighted distances are preferable to
the Pearson distance. However, the standard k-NN method is easy to
implement and, compared to more sophisticated techniques, it provides
relatively good classification results for microarray data [27].
Random Forest. This classifier [4] consists of many decision trees that deal
with a random choice of samples (with replacement). At each selection
node, only a random choice of conditions is used. The forest selects the
classification having the most votes over all the trees in the forest. Random
forest is especially well-suited for microarray data, since it achieves good
predictive performance even when the number of variables is much larger
than the number of samples, as it has been demonstrated in [10].
Support Vector Machines. A support vector machine [25] is a machine learn-
ing algorithm that finds an optimal separating hyperplane between mem-
bers and non-members of a given class in an abstract space. Like random
forests, this classifier shows excellent performance in high-dimensional
variable spaces and is therefore well-adapted to the classification of microarray
samples [19].
Table 7 presents the results we obtained in a leave-one-out validation
test with these classifiers and our three datasets. Our method, although it is
based on a simple counting of the rules that s verifies, achieves performance
comparable with the most sophisticated techniques.
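The body of the classification algorithm is not reproduced in this excerpt; the text only states that the unknown sample is assigned by counting the rules it verifies. The sketch below illustrates that idea under the assumption that each discriminative rule carries the interval bounds of eq. 3 together with a target class.

from collections import Counter

def classify(sample, rules):
    # sample: dict mapping a gene to the expression value of the unknown sample s
    # rules: list of (gene, t, T, target_class) tuples, i.e. interval rules in the
    # operational form of eq. 3 pointing to a class (an assumed representation)
    votes = Counter()
    for gene, t, T, target_class in rules:
        if gene in sample and t <= sample[gene] <= T:
            votes[target_class] += 1            # the sample verifies this rule
    return votes.most_common(1)[0][0] if votes else None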
We apply our study to tumour samples relative to brain cancer. The dataset is
the same as the one presented in the previous section (42 samples, 5 classes).
It is the most complex dataset, because of its number of tumour subtypes
and its classification error rates. As in [4], the data are preprocessed. This
preprocessing comprises thresholding, filtering, a logarithmic transformation
and standardisation of each experiment to zero mean and unit variance. Fil-
tering includes the selection of the first thousand genes by decreasing order
of variance.
The gene selection process extracts the K = 10 most discriminative genes
for each of the five tumour types. Let Γ (M ) be our new set containing these
50 genes. We compute for each pair (gi , gj ) of genes in Γ (M ) the intensity
of implication associated to gi → gj . We obtain a gene association network
that can be visualised using any arbitrary layout algorithm. We prefer to
position the genes with respect to the quality of their associations with their
neighbours. Therefore we define a similarity function sim(gi , gj ) as follows:
sim(gi , gj ) = max(ϕ(gi , gj ), ϕ(gj , gi )). To visualise our genes, we express the
distance between two genes gi and gj as follows:
distance(gi , gj ) = ms − sim(gi , gj )
where ms is the maximal value of the elements of the matrix sim. Multi-
dimensional scaling (MDS [18]) provides a visual representation of the prox-
imities among a set of objects. This method makes it possible to associate a
point in the plane with each gene g.
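A possible implementation sketch of this layout step, assuming the pairwise intensities are already available in a square matrix phi; it relies on scikit-learn's MDS with a precomputed dissimilarity, which is one way of realising the mapping described here, not necessarily the author's.

import numpy as np
from sklearn.manifold import MDS

def mds_layout(phi):
    # phi: square array with phi[i, j] = intensity of implication of g_i -> g_j
    sim = np.maximum(phi, phi.T)          # sim(gi, gj) = max(phi(gi,gj), phi(gj,gi))
    dist = sim.max() - sim                # distance = ms - sim
    np.fill_diagonal(dist, 0.0)           # a gene lies at distance 0 from itself
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    return mds.fit_transform(dist)        # one 2-D point per gene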
Fig. 3. Representation of the gene similarity. (a) represents the MDS based on im-
plication analysis: the different tumour types associated to our gene selection are well
separated, contrary to (b), in which the dissimilarity measure is the absolute value
of the Pearson correlation. We obtained a similar mapping by using the Euclidean
distance instead of the Pearson coefficient (figure not shown).
[Figure: gene association network; the nodes include MARCKSL1, NMB, LYN, SNRPN, ELA2, TCF3, APLP2, CD63, CCT3, MGST1, CD33, TOP2B, CST3, ACADM, FAH, ZYX, RBBP4, NCOA6, ADM and PPBP.]
In a dataset comprising two classes, there exist only two types of gene
profiles that discriminate these classes. Following the analysis done by the
authors [12], we then consider the two following patterns:
π1 : Genes that present an under-expression for ALL samples and an
over-expression for AML samples.
the sensitivity (also called the true positive rate). Finally, the rule quality λ
is the intensity of implication expressed in its logarithmic form (eq. 4).
The rule FAH → ZYX in line 5 is the most pertinent rule, since it con-
cerns all the individuals of class ALL (100% of class support) and only them
(100% of class homogeneity): it is a perfect indicator of the leukemia subtype.
Rule ACADM → RBBP4 (line 1) presents a high intensity of implication,
although it does not concern all the observations of class ALL. Indeed, 7.4%
of the observations of class ALL do not respect the rule (note however that this
rule conversely presents a perfect class homogeneity, i.e. all the covered observations
belong to the ALL class). The high value of λ is due to its support, which is greater
than that of rule 5 (i.e. rule 1 concerns more individuals than rule 5). Rule 2
has the same premise as the previous rule. Its class homogeneity is reduced:
almost one third of the individuals do not belong to the majority class of the
rule. This rule reflects the fact that coregulation is not always associated with
the leukemia subtype. It is indeed possible to encounter a remarkable statis-
tical association between genes that is not necessarily linked to the studied
phenotype. That is why one must consider all the parameters expressed in
table 8 before interpreting a rule.
6 Conclusion
Microarray data analysis is generally based on the measure of the similar-
ity between gene expression profiles, such as the absolute Pearson correlation.
The drawback of this measure is that it assumes a global relationship between
genes, while an implication may only concern a particular group of conditions.
The intensity of rank implication is more appropriate for the discovery of par-
tial dependencies. Association rules may therefore help to infer gene regulatory
pathways. Our method is very robust to noise and, unlike correlation tech-
niques, it provides the direction of the relationship.
References
1. R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Pro-
ceedings of the 20th Very Large Data Bases Conference, pages 487–499. Morgan
Kaufmann, 1994.
2. U Alon, N Barkai, D A Notterman, K Gish, S Ybarra, D Mack, and A J Levine.
Broad patterns of gene expression revealed by clustering analysis of tumor and
normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci U S A,
96(12):6745–6750, Jun 1999.
3. C. Becquet, S. Blachon, B. Jeudy, J. F. Boulicaut, and O. Gandrillon. Strong-
association-rule mining for large-scale gene-expression data analysis: a case
study on human sage data. Genome Biol, 3(12), 2002.
4. L. Breiman. Random forests. Machine Learning, 45:5–32, 2001.
5. Pedro Carmona-Saez, Monica Chagoyen, Andrés Rodríguez, Oswaldo Trelles,
José María Carazo, and Alberto D. Pascual-Montano. Integrated analysis of
gene expression by association rules discovery. BMC Bioinformatics, 7:54, 2006.
6. D. Chen, Z. Liu, X. Ma, and D. Hua. Selecting genes by test statistics. Journal
of Biomedicine and Biotechnology, 2:132–138, 2005.
7. G. Cong, A. Tung, X. Xu, F. Pan, and J. Yang. Farmer: Finding interesting
rule groups in microarray datasets, 2004.
8. C. Creighton and S. Hanash. Mining gene expression databases for association
rules. Bioinformatics, 19(1):79–86, January 2003.
9. M. Dettling and P. Buhlmann. Supervised clustering of genes. Genome. Biol.
Res., 3(12):research0069.1–0069.15, 2002.
10. R. Diaz-Uriarte and S. Alvarez de Andres. Gene selection and classification of
microarray data using random forest. BMC Bioinformatics, 7, 2006.
11. A. Gasch and M. Eisen. Exploring the conditional coregulation of yeast gene
expression through fuzzy k-means clustering, 2002.
12. T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov,
H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S.
Lander. Molecular classification of cancer: class discovery and class prediction
by gene expression monitoring. Science, 286:531–537, 1999.
13. R. Gras, R. Couturier, F. Guillet, and F. Spagnolo. Extraction de règles en
incertain par la méthode statistique implicative. In 12èmes Rencontres de la
Société Francophone de Classification, pages 148–151, Montreal, 2005.
1 Introduction
In our information society, large databases and data warehouses have become
widespread. This huge amount of information has led to the increasing de-
mand for mining techniques for discovering knowledge nuggets. To meet this
demand, the Knowledge Discovery in Databases (KDD) [16] community pro-
posed the association rule model [1].
Initially motivated by the analysis of market basket data, the task of asso-
ciation rule mining aims at finding relations between items in datasets [9].
Association rules are propositions of the form “If antecedent then conse-
quent”, noted antecedent → consequent, representing implicative tendencies
between conjunctions of valued attributes or items. Association rules have the
advantage of being an easy and meaningful model for representing explicit
knowledge. Furthermore, this unsupervised learning technique does not need
particular information about knowledge to be discovered contrary to classical
supervised techniques (such as decision trees). These advantages have moti-
vated a great deal of research and the publication of association rule extraction
algorithms such as Apriori [1, 2]. Nevertheless, if only minimal support and
confidence values are used, these algorithms typically produce many rules and
it is hard to only select those which may interest the user. One way to face this
problem is to use Interestingness Measures (IMs). IMs aim at assessing the im-
plicative quality of association rules but also some useful characteristics such
as novelty, significance, unexpectedness, nontriviality, and actionability [9,19].
IMs make it possible to rank and reduce the number of rules, and consequently help
the user to choose the best ones according to his/her preferences.
The information society has also led to the development of the Web and
then a great increase in the available data and information. In this vast
Web, resources, often in textual form, tend to be organised into hierarchi-
cal structures. This hierarchical structuring of web contents ranges from large
web directories (e.g. Yahoo.com, OpenDirectory) to online shop catalogs (e.g.
Amazon.com, Alapage.com). Furthermore, with the arrival of the Semantic
Web, such a hierarchical organisation is also used through OWL ontologies
which aim at providing formal semantics of the web contents. Even if the use
of hierarchies helps to structure web information and knowledge, the Web re-
mains heterogeneous. Data exchanges and communications between software
programs or software agents using hierarchically organised data are consequently
difficult. In order to address such interoperability problems, one must be able
to compare such data structures and find matches between them. Thus, many
matching methods have been proposed in the literature [27, 34, 35]. These
methods aim at finding semantic relations (i.e. equivalence, subsumption, etc)
between entities (i.e. directories, categories, concepts, properties) defined in
different hierarchical structures (filesystems, schemas, ontologies). Even if the
proposed approaches are issued from different communities, they mostly use
similarity measures and as a consequence, a majority of them are restricted
to finding equivalence relations only.
At the intersection of these two research fields, we proposed to use the as-
sociation rule paradigm for matching ontological structures [11]. Our original
approach, named AROMA (Association Rule Ontology Matching Approach),
heavily relies on the asymmetric nature of association rules, which allows it to
match not only equivalence relations but also subsumption relations between
entities. The consideration of subsumption between entities helps to charac-
terise the matching relations between hierarchical structures more precisely than
similarity-based approaches alone. It also allows enhancement of the output
matches. Furthermore, unlike most approaches designed for matching
2 Related work
2.1 Textual taxonomy matching
For the last six years, ontology and schema matching have been widely studied
and many approaches have been proposed in the literature. These methods
come from different communities such as artificial intelligence [7,18,20], data-
bases [12, 29, 32], graph matching [23, 30], information retrieval [28], machine
learning [13, 31, 36], natural language processing and statistics [24]. Although
they are heterogeneous and consequently difficult to compare, preliminary
efforts have resulted in surveys of matching techniques [27, 34, 35].
One survey [34] focuses on database schema matching techniques and pro-
poses a classification which discriminates the extensional or element-based ap-
proaches from the intensional or only-schema-based approaches. While many
efforts are concentrated around the intensional matchers, few extensional ap-
proaches have been proposed in the literature.
A survey of intensional matchers can be found in [35]. The authors pro-
pose two classifications of this type of matchers. The first one permits us to
distinguish methods according to their granularity and their interpretation of
the input information (it distinguishes element-level and structure-level tech-
niques and then the syntactic, external and semantic methods). The second
classification distinguishes three classes:
Extensional matchers
the concept path name) and machine learning classifiers for textual docu-
ments (kNN classifier and Naive Bayes text classifier on the documents). The
oPLMap approach also uses some constraints for taking into account the struc-
ture of the taxonomies. In the output, this method provides a set of n-to-n
mapping elements valued by a probability measure.
Hical or SBI [25]. This method uses a statistical test, named κ-
statistic [10], on shared documents for determining matchings. The κ-statistic tests
the null hypothesis “κ = 0”. A relation between concepts holds if the null
hypothesis can be rejected at a significance level of 5%. Hical proposes a
top-down approach in order to reduce the computing time.
CAIMAN matching service [28]. This matching method is enclosed
in the CAIMAN system for facilitating the exchange of relevant documents
between geographically dispersed people within their communities of inter-
est. In the CAIMAN matching service, each document is represented by a
document vector composed of words and word frequencies weighted by the
TF/IDF measure. Then, the characteristic vectors of concepts are computed
from the document vectors by using the Rocchio classifier. Finally, similarities
between concepts are deduced by evaluating the cosine value between char-
acteristic vectors of concepts. The output matching is a one-to-one matching:
for each source concept, the method retains only the target concept for which
the measure is maximised.
In the framework of association rule discovery, and in order to select the most
interesting rules, many Interestingness Measures (IMs) have been proposed
and studied (see [3, 19, 22, 37] for a survey). In this context, some researchers
are interested in principles and properties defining a good IM [3,17,22,33,37],
while others work on the comparison of IMs from a data-analysis point of
view.
According to our objective of hierarchy matching, we selected some IMs
that may be relevant for our work. In the context of AROMA, unlike
Taxonomy of IMs
The taxonomy classifies IMs according to three criteria. The first one concerns
the subject of IMs (deviation from independence or from equilibrium) and the
second one the nature of IMs (descriptive or statistical). The last one, the scope
of the IMs (quasi-implication, quasi-conjunction, quasi-equivalence), explains
the semantics of the measure according to logical operators.
Table 3 shows, for each selected IM, its scope (rule (→), quasi-implication
(⇒), quasi-conjunction (↔), quasi-equivalence (⇔)), its nature (statistical (S)
or descriptive (D)), its subject (deviation from independence (I) or equilibrium
(E)), its fixed value in the independence or equilibrium situation (depending
on its subject) and its formula.
Table 3. Selected IMs and their properties
3 AROMA methodology
AROMA (Figure 1) was designed to find matching between conceptual hierar-
chies populated from textual documents. This method permits the discovery
of a set of significant association rules holding between concepts obtained
from two hierarchical structures and evaluated by the Implication Intensity
measure. AROMA takes, as input, two conceptual hierarchies H1 and H2 ,
each defined as a tuple H = (C, ≤, D, σ), where C is the set of concepts, ≤
is the partial order organising concepts into a taxonomy, D is the set of tex-
tual documents, and σ is the relation associating a set of documents to each
concept (i.e. for a concept c ∈ C, σ(c) represents the documents associated
to c). Thanks to the first part of the method concerning the acquisition and
the selection of relevant terms for each concept, we are able to redefine each
hierarchy as a tuple H′ = (C, ≤, T, γ), where T is the set of relevant terms
selected. In order to consider the partial order, we assume that a term associ-
ated with a concept is also associated with its parent concepts, and thus, we
extend γ to the relation γ′ as follows: γ′(c) = ⋃_{c′ ≤ c} γ(c′).
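A small sketch (with assumed, simple data structures) of this extension: the terms of each concept are propagated to all of its ancestors, which amounts to taking γ′(c) as the union of γ(c′) over the concepts c′ ≤ c; a single-parent taxonomy is assumed.

def extend_gamma(concepts, parent, gamma):
    # concepts: iterable of concept ids; parent: dict child -> parent (None at root)
    # gamma: dict concept -> set of relevant terms
    gamma_prime = {c: set() for c in concepts}
    for c in concepts:
        terms = gamma.get(c, set())
        node = c
        while node is not None:               # walk up to the root
            gamma_prime[node] |= terms        # c's terms also belong to its ancestors
            node = parent.get(node)
    return gamma_prime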
a into the set of relevant terms of the concept b. The existence of such a valid
rule means that the concept a (issued from H1 ) is probably more specific than
or equivalent to the concept b (issued from H2 ).
ϕ(a → b) ≥ ϕr (2)
where na∧b = card(γ1∩2 (a) − γ1∩2 (b)) is the number of relevant terms for
concept a that are not relevant for concept b. Na∧b is the random number of
relevant terms for concept a that are not relevant for concept b.
For example (Figure 3), the rule A2 → B4 has nA2∧B4 = 1 counter-
example. Its Implication Intensity value is calculated using a Poisson law
(which is a possible model for the Implication Intensity [21]):

ϕ(A2 → B4) = 1 − e^(−λ) · Σ_{k=0}^{nA2∧B4} λ^k / k! = 0.97
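As an illustration, the value above can be computed with the Poisson model as sketched below; the expected number of counter-examples λ is assumed to follow the usual estimate (the number of terms of a times the number of terms outside b, divided by the total number of terms), which is not restated in this excerpt.

from scipy.stats import poisson

def implication_intensity(n, n_a, n_not_b, n_counter):
    # n: total number of terms; n_a: terms relevant for concept a;
    # n_not_b: terms not relevant for concept b; n_counter: observed number of
    # counter-examples. Assumes the usual Poisson parameter lambda = n_a*n_not_b/n.
    lam = n_a * n_not_b / n
    return 1.0 - poisson.cdf(n_counter, lam)   # P(N > n_counter)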
4 Experimental results
The experiments presented in this section concern only the second part of
AROMA (i.e. the rule selection phase). After describing the data used for
the experiments, we first compare the performance of the measures in terms
of the F-Measure, which aggregates precision and recall. Next, we describe
an analysis of the distribution of the measure values on two sets of matching
relations: a set of hand-made reference matching relations, noted R+ and a
set of irrelevant relations R−.
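As a reminder of how these scores are obtained (a generic sketch, not code from the paper), the F-measure aggregates the precision and recall of the discovered matching relations against the reference alignment:

def precision_recall_f1(found, reference):
    # found, reference: sets of matching relations, e.g. (source, target) pairs
    true_positives = len(found & reference)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)      # harmonic mean
    return precision, recall, f1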
The experiments used the “Course catalog” benchmark [14]. This benchmark
is composed of two catalogs of course descriptions offered at the Cornell
and Washington universities. The course descriptions are hierarchi-
cally organised into schools and colleges and then into departments and centers
within each college. These two hierarchies contain respectively 166 and 176
concepts to which are associated 4360 and 6957 textual course descriptions.
The benchmark data also include a set of 54 manually matched relations from
concepts of the Cornell catalog to the Washington catalog. Only equivalence
relations are included in the manually matched set.
In this experiment, we studied and compared how the IMs evaluated matching
relations independently of the AROMA rule selection algorithms and their
[Figure: histograms of the measure values (Frequency against value over the interval 0.0–1.0) for the selected IMs.]
We can also notice that the quasi-conjunction measure, LLA, does not distinguish good rules from bad ones.
From these experiments, we conclude that matching relations are better
evaluated by IMs of deviation from independence. In such cases, the number
of counter-examples needed to reach the equilibrium situation is less than the
number needed to reach the independence situation. We also found that the
statistical measure of quasi-implication, II, is well suited for distinguishing good rules from bad ones.
5 Conclusion
In this paper, we proposed an original use of the association rule model and
interestingness measures in the context of schema/ontology matching. More
precisely, we described the AROMA approach, which is an extensional matcher
for hierarchies indexing text documents. A novel feature of AROMA is that it
uses the asymmetrical aspect of association rules in order to discover subsumption matches between hierarchies or ontologies. Based on studies of IMs,
we selected several IMs according to three criteria (subject, nature and scope)
and we evaluated them on a matching benchmark. The two experiments show
that deviation from independence measures are the best adapted IM family
for such an application since the evaluated rules are good regarding the in-
dependence situation, but bad in terms of equilibrium deviation. From these
results, we can also argue that the two descriptive indexes used, Confidence
and Loevinger, tend to have the same behaviour. Due to its deviation from
independence subject, its statistical nature and its quasi-implication scope,
the Implication Intensity obtains the best scores on this benchmark. In this
paper, we analysed measure behaviour on the process of rule extraction be-
tween concepts. We did not study the terminological step of AROMA, which
consists of extracting and selecting concept relevant terms. Such an evaluation
would be interesting since this terminological extraction process significantly
influences the accuracy of the results.
References
1. R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between
sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD
International Conference on Management of Data, pages 207–216. ACM Press,
1993.
2. R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In
J.B. Bocca, M. Jarke, and C. Zaniolo, editors, Proceedings of the 20th Interna-
tional Conference Very Large Data Bases (VLDB’94), pages 487–499. Morgan
Kaufmann, 1994.
3. R.J. Bayardo Jr. and R. Agrawal. Mining the most interesting rules. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’99), pages 145–154, 1999.
4. J. Blanchard. A visualization system for interactive mining, assessment, and
exploration of association rules. PhD thesis, University of Nantes, 2005.
5. J. Blanchard, F. Guillet, H. Briand, and R. Gras. Assessing rule interestingness
with a probabilistic measure of deviation from equilibrium. In Proceedings of the
11th international symposium on Applied Stochastic Models and Data Analysis
(ASMDA-2005), pages 191–200. ENST, 2005.
6. J. Blanchard, F. Guillet, R. Gras, and H. Briand. Using information-theoretic
measures to assess association rule interestingness. In Proceedings of the fifth
IEEE international conference on data mining ICDM’05, pages 66–73. IEEE
Computer Society, 2005.
7. S. Castano, V. De Antonellis, and S. De Capitani Di Vimercati. Global viewing
of heterogeneous data sources. IEEE Transactions on Knowledge and Data
Engineering, 13(2):277–297, 2001.
8. S. Castano, A. Ferrara, and S. Montanelli. Matching ontologies in open net-
worked systems: Techniques and applications. Journal on Data Semantics,
3870(V):25–63, 2006.
9. A. Ceglar and J. F. Roddick. Association mining. ACM Computing Surveys,
38(2):5, 2006.
10. J. Cohen. A coefficient of agreement for nominal scales. Educational and Psy-
chological Measurement, 20(1):37–46, 1960.
Summary. The aim of this paper is to study the quantitative tools of the research
in didactics. We want to investigate the theoretical-experimental relationships be-
tween factorial and implicative analysis. This chapter consists of three parts. The
first one deals with the didactic research and some fundamental tools: the a priori
analysis of a didactic situation, the collection of experimental data and the statistical analysis of data. The purpose of the second and the third sections is to introduce the experimental comparison between the factorial and the implicative analysis in two research studies in mathematics education.
Introduction
the discipline subject of the analysis- and the paradigm of the experimental
sciences. The research in Didactics can be considered a sort of “Experimental
Epistemology”.
The fundamental tool for the research in didactics is the a priori analysis
of a didactic situation.
What does “a-priori analysis” of a didactic situation mean?
It means the analysis of the “Epistemological Representations”, “Historic-
epistemological Representations” and “Supposed Behaviours”, correct and incorrect, for solving a given didactic situation.
1 The cognitive paths permit the highlighting of the conceptual networks regarding the didactic situation.
2 The semiotic perspective for the analysis of the disciplinary knowledge allows the management of the contents with reference to the problems of “communication” of the contents. This position is not particularly new with respect to the human sciences, but it represents a real innovation for the technical and scientific disciplines.
3 In any case, a didactic situation poses a “problem” for the student to solve, either as a traditional problem (i.e. in the scientific or mathematical framework) or as a “strategy” to organise the best knowledge to adapt oneself to a situation.
4 By “space of the events” we mean the set of the possible strategies of solution, correct or not, always supposed within a specific historic period by a specific community of teachers.
5 The “good problem” is the one which, with respect to a given knowledge, allows the best formulation in ergonomic terms.
Fig. 1. The diagram summarizes the relationship between research in didactics and
collection of experimental data.
Each didactic research inevitably leads us to collect some data, which can be considered as a collection of elementary pieces of information.
6 The “variables of a didactic situation” are all the possible variables which occur in the situation. The “didactic variables” are those which permit a change in the pupils’ behaviours. So, the didactic variables are a subset of the variables of the didactic situation.
Table 1.
The teacher must take many rapid decisions and must be able to correct them very quickly if they prove to be inappropriate. He cannot wait for the result of the statistical treatment of all his questions. The teacher must try to use those statistical treatments which allow him to arrive quickly at certain conclusions.
The researcher must follow an opposite process:
1. Which hypotheses correspond to the questions that interest us?
2. What data should be collected?
3. Which treatments should be used?
4. What conclusions?
More than the rapidity and the immediate usefulness, it is the consistency,
the stability, the pertinence, and the sureness of the responses which interest
a researcher.
The research with appropriate statistical methods will allow:
1. the communication between teachers about the information they need and
which they collect on the results of the students; the value of the methods
used. . . ;
2. the use, also with discernment, of the results of the research in didactics;
3. the knowledge of the possibilities and the limits of the statistical methods
and so the legitimacy of the knowledge used in their profession;
4. the discussion about this legitimacy;
5. the formulation of some open conjectures to be put to the test of the
experimental contingency;
6. the imagination of the plausibility of these conjectures;
7. to know how to convert their experience into knowledge;
8. the participation in some research.
Observations
in general their order too. The only operations are the logical ones (set
theory).
It is always possible to transform a numeric variable into an interval, ordinal
or nominal variable (losing some information); an interval variable can be
transformed into an ordinal or nominal variable; an ordinal variable can be
transformed into a nominal variable. The reverse is not true.
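As a small illustration of these information-losing transformations (our own sketch; the cut points and values are arbitrary), a numeric variable can be turned into ordered bands and then into 0/1 modality variables of the kind used later in this chapter:

scores = [12.5, 7.0, 15.0, 9.5, 18.0]   # a numeric variable (e.g. marks)

def to_band(x: float) -> str:
    """Arbitrary cut points: the result is only an ordinal variable."""
    return "low" if x < 10 else "medium" if x < 15 else "high"

bands = [to_band(x) for x in scores]            # ordinal: low < medium < high still meaningful
indicators = {b: [int(v == b) for v in bands]   # nominal/binary: one 0/1 variable per modality
              for b in ("low", "medium", "high")}

print(bands)        # ['medium', 'low', 'high', 'low', 'high']
print(indicators)   # {'low': [0, 1, 0, 1, 0], 'medium': [1, 0, 0, 0, 0], 'high': [0, 0, 1, 0, 1]}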
The research in didactics uses quantitative and qualitative tools. In this paper we deal with the quantitative tools, focusing above all on the theoretical-experimental relationships between factorial and implicative analysis. Some significant experimental situations will be analysed in Parts 2 and 3.
Implicative analysis
The problem faced by R. Gras [12] arose from the attempt to answer the following question: “Given some binary variables a and b, how can I be sure that, in a population, from each observation of a there necessarily follows the observation of b?” Or, in an even more succinct manner: “Is it true that if a then b?”
A definite answer is generally not possible, and the researcher must be satisfied with an “almost true” implication. With the implicative analysis of R. Gras, one tries to measure the degree of validity of an implicative proposition between binary and non-binary variables. This statistical tool is used in the Didactics of Mathematics9.
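As a hedged sketch of how such an “almost true” implication can be quantified for two binary variables (this uses the classical Gaussian approximation of the implication index; other models, e.g. Poisson or binomial, are also used in SIA, and the data below are invented):

import math

def implication_intensity_binary(a, b):
    """phi(a -> b) for two 0/1 variables observed on the same subjects:
    the probability that, under independence, the number of counter-examples
    would exceed the one actually observed."""
    n = len(a)
    n_a = sum(a)
    n_not_b = n - sum(b)
    n_counter = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)  # a true, b false
    expected = n_a * n_not_b / n
    if expected == 0:
        return 1.0
    q = (n_counter - expected) / math.sqrt(expected)    # standardised implication index
    return 0.5 * math.erfc(q / math.sqrt(2))            # P(N(0,1) > q)

a = [1, 1, 1, 0, 0, 1, 1, 0, 1, 1]
b = [1, 1, 1, 0, 1, 1, 1, 0, 0, 1]
print(round(implication_intensity_binary(a, b), 2))     # high when counter-examples are rarer than expected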
There are two types of approaches to factorial analysis: the first one proceeds through the study of the eigenvalues of equations, and the second one through a geometric interpretation (vectors) and some notions of rational mechanics.
The approach presented here is the second one.
Let us consider the Cartesian product E × V (E constituted in general by n students, n ∈ N, and V by m variables, m ∈ N). This is a typical form in which data occur in didactics. The problem is to represent geometrically the distribution of the two sets in a space of n × m dimensions. Factorial analysis interprets these geometrical representations. This fact, in the sphere of the Human Sciences, has had many applications in the field of Psychology.
9 All the information related to the mathematical theory is found in this volume.
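Purely to illustrate this geometric viewpoint (a simplified PCA-style projection via SVD, not the exact correspondence factor analysis the authors use; the 0/1 table is invented):

import numpy as np

# Hypothetical E x V table: 5 students (rows) x 4 binary variables (columns).
X = np.array([
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
], dtype=float)

# Centre the cloud of student-points and project it onto its two principal axes,
# giving a planar picture of the geometric distribution of E with respect to V.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
coords = U[:, :2] * s[:2]
print(np.round(coords, 2))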
1.4 Conclusions
This research was carried out by means of a quantitative analysis along with a qualitative analysis. The statistical survey for the quantitative analysis was made in two phases: in the first experiment, which was realized with a sample of pupils attending the third and fourth year of study (16–17 years) of secondary school, the method of individual and paired activity was used; the second experiment was carried out at three levels: pupils from primary school (6–10 years), pupils from middle school (11–15 years) and pupils from higher secondary school.
The quantitative analysis of the data drawn from pupils’ protocols was made with the inferential statistics software CHIC 2000 (Classification Hiérarchique Implicative et Cohésitive) [14] and the factorial analysis package S.P.S.S. (Statistical Package for the Social Sciences).
The research pointed out some important misconceptions held by pupils, and some critical knots in the passage from an argumentative phase to a demonstrative one of their activity, which need to be investigated further.
The research was realized at different levels by two experiments. The first experiment was realized with pupils attending the third and fourth year of study (16–17 years) of secondary school; the method of individual and paired activity was used. Pupils working individually were expected, within two hours, to answer the following question:
a) Using the enclosed table of primes, can the following even numbers be written as a sum of two primes (in one way or in more than one way)? 248; 356; 1278; 3896.
b) If you have answered the previous question, are you able to prove that this occurs for every even number?
The pupils working in couples were expected, within an hour, to answer this question (in written form and only if they agreed):
Is it always true that every even natural number greater than 2 is a sum of two prime numbers? Argue about the demonstrative processes, motivating them.
In both cases the sessions were audio-recorded, and an annotated transcript of the recordings was made.
The second experiment was carried out in three levels: pupils from the
primary school (6–10 years), pupils from middle school (11–15 years) and
pupils from higher secondary school. The experiment was carried out on the
lowest level in two phases: In the first phase the pupils could answer this
question:
How can you obtain the first 30 even numbers by putting together prime
numbers of the table you have just made?
In the second phase, the pupils created small groups and tried to answer
the following question:
Can you obtain the even numbers by always summing exactly two primes? If so, can you state that this is always the case for any even number?
The pupils from lower secondary school solved the following problem within 100 minutes:
Is the following statement always true? “Can an even number be decomposed into a sum of prime numbers?” Argue your claims.
The procedure had four phases:
a) a discussion about the task in couples (10 min.)
b) an individual written description of a chosen solving strategy (30 min.)
c) the division of the class into two groups discussing the task (30 min.)
d) the proof of a strategic processing given by the competitive groups (30
min.)
Pupils from higher secondary school solved the same problem as the pupils from middle school, in the same way and within the same time limit.
Individual works were analyzed (a-priori analysis), the identification of parameters was carried out, and those parameters were subsequently used as a basis for characterising pupils’ answers. This made it possible to carry out a quantitative analysis of the answers, to establish an implicative graph (graph of functionality), a hierarchical diagram and a similarity diagram, and also an analysis of the data. The analyses, graphs and diagrams (or trees) were part of the evaluation of each experiment, together with conclusions.
7. The strategy of Cantor’s method. He/she considers the primes lower than the given even number and calculates the difference between the given number and each of the primes. (S-Cant)
8. Euler. He/she finds it hard to prove the conjecture because one has to consider the additive properties of numbers. (Euler)
9. Chen Jing-run’s method (1966). He/she expresses an even number as a sum of a prime and of a number which is the product of two primes. (Chen)
10. He/she subtracts a prime number from any even number (lower than the given even number) and ascertains whether the result is a prime, in which case the condition is verified. (Spa-pr)
11. He/she looks for a counter-example which invalidates the statement of the conjecture. (C-exam)
12. He/she considers the final digits of a prime to ascertain the truth of the statement. (Cifre)
13. He/she thinks that a verification of the statement by some numerical examples is enough to prove the statement. (V-prova)
14. He/she does not argue anything for the second question. (Nulla)
15. He/she thinks the conjecture is a postulate. (Post)
The analysis of the implicative graph shows, at the 90%, 95% and 99% levels, that pupils’ choice to follow some of the strategies is strictly linked to a relevant strategy, namely Gold 1, i.e. the one according to which the pupil considers odd prime numbers, summing each of them with successive primes. Hence the basis of pupils’ behaviour is sequential thinking.
Component 1
a) Abdut (abductive): according to such a definition, the pupil named Abdut is the one who observes that Goldbach’s conjecture is verified in a large number of cases, so he supposes it is also valid for any very large even number, and this fact leads him to the final thesis, namely that the conjecture is valid for every even natural number.
b) Intuitionist: this is the pupil using the N-random and Euler strategies in common with Abdut, but thinking that the demonstration of the conjecture can be deduced from simple numerical evidence, because he is convinced that what happens for the elements of a small finite set of values can be generalized to the infinite set to which the small set belongs. So, he uses the V-prova strategy. In short, in the inductive argumentation used by the intuitionist pupil the statement is deduced as a generic case from specific cases.
c) Ipoded: this is just the pupil using a deductive argumentation which can
be directly transposed into a deductive demonstration.
With these new additional variables a transposed matrix (exchanging rows and columns) was made in Excel and interpreted with CHIC. The most interesting results are displayed in Figure 4.
an equivalence relation when we put it into a set. In fact, what does the relation do in this case? It orders the elements of the set, so the supplementary variables, in this case, give much more order to the data. They make the interpretation of the data more effective. Indeed, they become attractors for pupils’ behaviours.
1. some pupils overreach, with the following conclusion: since the conjecture is true for all of these particular cases, then it has to be true anyway. These are pupils who have a strong faith in their convictions, but who do not know clearly enough how to pass from an argumentation to a demonstration using the data they have obtained.
2. some pupils proceed at the same time by an empirical verification and by an attempt at argumentation and demonstration, ending in a mental statement. They try to clear the following hurdle: how can I deduce a general statement from the empirical evidence? These are pupils who, before making any generalization, want to be sure of the steps made; therefore they tread carefully.
3. a few pupils, after a short empirical verification, look at once for a formalization of their argumentations; but if they are not able to do that, they do not hesitate to claim that they are facing something which is undemonstrable. These pupils have a high regard for their own mental processes; therefore they think that if they are not able to demonstrate something, then it has to be undemonstrable anyway.
From this experimentation we argue that the argumentation favoured by pupils facing a historical conjecture like Goldbach’s is the abductive one. Some questions arise from the results, which should be addressed by further experimentations:
1. Is this result generalizable?
2. To what extent is it generalizable?
But the fundamental kernel of this experimentation, about the interplay between the history of mathematics and mathematics education, is that such results could not have been pointed out if the a-priori analysis had not been informed by the historical-epistemological remarks which inspired it.
and the hypothetical behaviours, correct and incorrect. Besides, the a-priori analysis allows us to identify the variables of the situation-problem and the research hypotheses. These hypotheses can be falsified through the statistical analysis and/or the qualitative analysis of the data.
In the last decade two statistical methods have been widely used: the implicative analysis (ASI) of Régis Gras [13, 14] and the correspondence factor analysis (CFA). The implicative analysis is a powerful tool. It allows a clear visualization of the relations of similarity and implication among variables, or classes of variables, of the situation-problem, through the graphs produced by the software CHIC. The correspondence factor analysis represents geometrically, in a multi-dimensional space, a distribution of two sets: the individuals and the variables of the situation [13]. Since it allows an analysis on small samples in the field of non-parametric statistics, it contributes to a meaningful interpretation of didactic phenomena. In the last decade the aim of some studies has been to improve the tool in the field of didactic research and, chiefly, to create some ad hoc models [35].
This paper is a contribution to the studies on the application of Statistical Implicative Analysis (SIA) and Correspondence Factor Analysis (CFA) in different fields, particularly in Mathematics Education. This research highlights the relations between the Implicative Analysis and the Factorial Analysis in falsifying research hypotheses in mathematics education. We also want to analyze the type of information obtained by the application of the two statistical methods12.
There are many studies on the obstacles which pupils meet during the passage from arithmetic to algebraic thought. Some of them reveal that the introduction of the concept of variable represents the critical point of transition [25, 39, 40].
This is a complex concept, because it is applied with different meanings in different situations. Its management depends precisely on the particular way it is used in the activity of problem-solving.
The notion of variable can take on a plurality of conceptions: generalized number (it appears in generalizations and in general methods); unknown (its value can be calculated by considering the restrictions on the domain of existence of the solutions of a problem); “in functional relation” (relation of variation with other variables); totally arbitrary sign (it appears in the study of structures); memory register (in computer science) [38].
In Malisani and Marino [23] and Malisani and Spagnolo [24] we observed that pupils spontaneously evoke the different conceptions of variable as: numerical value, unknown, “thing which is varying”, even in the absence of an adequate mastery of the algebraic language.
12 This part of the paper is based on [22].
It is possible that many difficulties in the study of algebra derive from an inadequate construction of the concept of variable [8]. An appropriate approach to this concept should consider its principal conceptions, the existing interrelationships between them and the possibility of passing from one to the other with flexibility, according to the requirements of the problem to be solved.
The historical analysis emphasizes that the notion of unknown and that of variable as “thing which is varying” have a totally different origin and evolution. Even if both concepts deal with numbers, their processes of conceptualization seem to be entirely different [26].
In Malisani [20, 21] we studied the relational-functional aspect of the variable in problem-solving, considering the semiotic contexts of algebra and analytical geometry. We showed that there is a certain interference of the conception of unknown on the functional aspect, in the context of a situation-problem and in the absence of visual representation registers. We also showed that students find some difficulty in interpreting the concept of variable in the process of translation from the algebraic language into the natural one.
This paper deals with the statistical analysis of that experimentation, by which we wanted to verify whether the conception of variable as “thing which is varying” is evoked when the notion of unknown prevails in the context of a situation-problem.
To carry out this research we chose the linear equation in two variables
for two reasons: firstly, because it represents a nodal point from which the
students derive the conceptions of the letters as unknowns or “things which
are varying”. Secondly, this kind of equation is well known by the pupils,
because they have studied it from different viewpoints: as a linear function, as the equation of a straight line and as a component of linear systems.
We carried out an a-priori analysis of the problem. The aim was to determine
all the possible strategies that the pupils could use. Some errors that students
could possibly make in the application of these strategies were also identified.
The pupils worked individually; we did not allow them to consult books or notes. The time given was sixty minutes.
We filled in a double-entry table “pupils/strategies”, indicating for every pupil the strategies he used by the value 1 and those he did not apply by the value 0 (a sketch of this coding is given below). The data were analyzed in a quantitative way, using the statistical implicative analysis of Régis Gras [13, 14], the software CHIC 2000 and the factorial analysis package S.P.S.S. (Statistical Package for the Social Sciences).
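A minimal sketch of this coding step (pupil names, strategy labels and values are invented for illustration; the actual analysis was performed with CHIC 2000 and S.P.S.S.):

import csv

strategies = ["AL1", "AL2", "AL3", "AL4", "ALb1"]        # a few of the experimental variables
pupils = {                                               # strategies observed for each pupil
    "pupil01": {"AL1", "AL2", "AL4"},
    "pupil02": {"AL1", "ALb1"},
    "pupil03": {"AL3"},
}

# Double-entry table pupils x strategies: 1 if the pupil used the strategy, 0 otherwise.
with open("pupils_strategies.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["pupil"] + strategies)
    for name, used in pupils.items():
        writer.writerow([name] + [1 if s in used else 0 for s in strategies])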
13 We called “procedure of substitution into the same equation” the incorrect method in which the pupil writes one variable as a function of the other, then replaces this variable in the original equation, and thus obtains an identity. In short, the pupil applies the method of substitution used to solve systems of equations to a single equation.
The hypothesis
14 In this study we prefer to use the term “multiple solutions” rather than “infinite solutions”, because we have not considered the possible connotations of the word “infinite”. However, we defined the experimental variable ALb4 to take into account the cases in which the pupil explicitly considers the existence of infinite solutions.
Implicative graph
[Fig. 6. Implicative graph of the experimental variables AL1–AL14, AL4.1–AL4.3 and ALb1–ALb6, drawn at the 99, 95, 90 and 85 intensity levels]
The implicative graph in Figure 6 (produced with the software CHIC 2000) shows three well-defined groups of experimental variables, with statistical percentages of 95% and 99%. They are pointed out by the cloud on the left (cloud L), the cloud in the centre of the figure (cloud C) and the cloud on the right (cloud R); the grey cloud (cloud I) around AL11 indicates the intersection between cloud C and cloud R.
The three groups are directly or indirectly connected with the variables ALb1 (“the pupil calculates the solution set”) and AL1 (“the pupil answers the question”). Each group corresponds to a different kind of strategy used by the students:
• Procedure in natural language (cloud L): the pupil adds a datum, considering that the winnings are equal (generally dividing Euro 300 in half) or that the bets are equal15. In this way, the student transforms the question into a typical arithmetic problem and solves it, finding only a particular solution verifying the equation. This result is confirmed by
15 The experimental variable AL4 “he/she adds a datum” considers two possibilities: equal winnings or equal bets (AL4.3). The first case includes two alternatives: the winnings are divided in half (AL4.1) or both teenagers win Euro 300 (AL4.2).
“To add a datum” is equivalent to introducing a new equation and forming (with the equation of the problem, 3x + 4y = 300, or part of it) a system of two equations in two unknowns. Therefore a system corresponds to each case.
“The winnings of Euro 300 are divided in half” (AL4.1): this is equivalent to the system 3x + 4y = 300, 3x − 4y = 150
the implicative links among the experimental variables AL2, ALb2 and AL4 (with its variations AL4.1, AL4.2 and AL4.3). The procedure in natural language is the one most used by the pupils (cf. the table of frequencies in the Appendix), and it leads to a single solution. So the predominant conception of variable is that of the unknown (see the sketch below).
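To make the contrast between “a single solution” and “multiple solutions” concrete, here is a small sketch (ours) enumerating the non-negative integer solutions of the problem’s equation 3x + 4y = 300 quoted in the footnote (over the rationals there are of course infinitely many solutions):

# Non-negative integer solutions of 3x + 4y = 300: one for each multiple of 4 taken as x.
solutions = [(x, (300 - 3 * x) // 4)
             for x in range(0, 101)
             if (300 - 3 * x) % 4 == 0]

print(len(solutions))     # 26 distinct solutions
print(solutions[:3])      # [(0, 75), (4, 72), (8, 69)]
# A pupil who "adds a datum" keeps exactly one of these, treating x and y as unknowns.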
The graph shows that the first component (horizontal axis) is strongly
characterized by the pair of supplementary variables: NAT and PALG1.
The profiles ALG, PALG2 and FUNZ form a cloud that strongly character-
izes the vertical component. The supplementary variable PALG2 is very near
to FUNZ, because the student who abandons the pseudo-algebraic procedure
generally adopts the profile described in FUNZ.
The winning strategies are precisely those described in the profiles ALG,
PALG2 and FUNZ, which lead to multiple solutions, while NAT and PALG1 lead to the uniqueness of the solution. This finds a strong correspondence with
the different conceptions of “variable”. Therefore, the horizontal axis represents
the conception of variable as unknown, the vertical axis, instead, reproduces
its relational-functional aspect. These results allow us to falsify the hypothesis
again.
3.7 Conclusions
The implicative graph shows the solving strategies applied by the students to
solve the problem:
1. procedure in natural language: it is the most used by the pupils and it
leads to the single solution. The predominant conception of variable is
that of unknown.
2. methods by trial and error in natural language and/or in half-formalized language (generally arithmetic): this arrives at several solutions. The dependence of the variables is evoked, but a strong conception of the relational-functional aspect does not appear yet.
3. pseudo-algebraic strategy: it is little used by the pupils and it leads to the
correct solution of the problem only in some cases.
To examine these results more carefully, we introduced some supplementary variables in the “students” component. These profiles represent supplementary individuals bringing out the fundamental characteristics of the a-priori analysis. They are displayed in Table 3.
SUPPLEMENTARY VARIABLE: STRATEGY
NAT: in natural language
FUNZ: by trials and errors in natural language and/or in half-formalized language
PALG1: pseudo-algebraic + resolution of the equation with some errors of a syntactic kind
PALG2: pseudo-algebraic + other strategy
ALG: algebraic
The hierarchical tree shows that the profile NAT is the most meaningful because it represents the strategy the pupils used most. We observe a small set of pupils connected to this group. They followed the procedure described in NAT but, afterwards, they made the passage from the single solution to multiple solutions [22, p. 96].
References
1. R. Agrawal, T. Imielinski, and A.N. Swami. Mining association rules between
sets of items in large databases. In P. Buneman and S. Jajodia, editors, ACM
SIGMOD International Conference on Management of Data, pages 207–216,
1993.
2. F. Arzarello, L. Bazzini, and G. Chiappini. L’algebra come strumento di pen-
siero. analisi teorica e considerazioni didattiche. Progetto Strategico CNR-TID,
(6), 1994.
3. Ch. Bastin, J.P. Benzecri, Ch. Bourgarit, and P. Cazes. Pratique de l’Analyse
des Données, volume 1–2. Dunod, 1980.
4. A. Bodin. Improving the diagnostic and didactic meaningfulness of mathematics
assessment in France. In Annual Meeting of the American Educational Research Association (AERA), New York, 1996.
5. G. Brousseau. Theory of didactical situations in mathematics. Kluwer Aca-
demic Publishers, 1997. Edited and translated by N. Balacheff, M. Cooper,
R. Sutherland and V. Warfield.
6. G. Brousseau. Théorie des situations didactiques. Didactique des mathématiques
1970–1990. La pensée sauvage, 1998. Textes rassemblés.
25. M. Matz. Intelligent Tutoring Systems, chapter Towards a Process Model for
High School Algebra Errors, pages 25–50. Academic Press, London, 1982.
26. L. Radford. Approaches to Algebra. Perspectives for Research and Teaching,
chapter The roles of geometry and arithmetic in the development of algebra:
historical remarks form a didactic perspective, pages 39–53. Kluwer, 1996.
27. A. Scimone. Following Goldbach’s tracks. In Proc. of the Int. Conf. “The Hu-
manistic Renaissance in Mathematics Education”, University of Palermo-Italy,
2002. Text available at: http://dipmat.math.unipa.it/~grim/21project.htm.
28. A. Scimone. La congettura di Goldbach tra storia e sperimentazione didattica.
Quaderni di Ricerca in Didattica, 10:1–37, 2002. Text available at: http://
dipmat.math.unipa.it/~grim/quaderno10.htm.
29. A. Scimone. Conceptions of pupils about an open historical question: Goldbach’s
conjecture. The improvement of Mathematical Education from a historical view-
point. PhD thesis, Palermo, Italy, 2003. published on Quaderni di Ricerca
in Didattica 12, Text available at: http://dipmat.math.unipa.it/~grim/
tesi/_it.htm.
30. A. Scimone. An educational experimentation on Goldbach’s conjecture. In Proc.
CERME 3, Group 4, pages 1–10, Bellaria-Italy, 2003.
31. A. Scimone. How much can the history of mathematics help mathematics educa-
tion? An interplay via Goldbach’s conjecture. In Zbornik Bratislavskeho seminara
z teorie vyucovania matematiky, pages 89–101, Bratislava, 2003.
32. F. Spagnolo. Obstacles Epistémologiques: Le Postulat d’Eudoxe-Archimede. PhD
thesis, University of Bordeaux I, 1995.
33. F. Spagnolo. L’analisi a priori e l’indice di implicazione di Regis Gras. Quaderni
di Ricerca in Didattica, 7:110–117, 1997. Text available at: http://dipmat.
math.unipa.it/~grim/quaderno7.htm.
34. F. Spagnolo. A theoretical-experimental model for research of epistemological
obstacles. In Int. Conf. on Mathematics Education into the 21st Century, 1999.
Text available at: http://dipmat.math.unipa.it/~grim/model.pdf.
35. F. Spagnolo. L’analisi quantitativa e qualitativa dei dati sperimentali. Quaderni
di Ricerca in Didattica, 10, Supplemento, 2002. Text available at: http://
dipmat.math.unipa.it/~grim/quaderno10.htm.
36. F. Spagnolo. La modélisation dans la recherche en didactiques des mathéma-
tiques: les obstacles épistémologiques. In Recherches en Didactiques des Math-
ématiques, volume 26. La Pensée Sauvage, Grenoble, 2006.
37. F. Spagnolo and R. Gras. Fuzzy implication through statistic implication:
a new approach in Zadeh’s framework. In S. Dick, L. Kurgan, W. Pedrycz,
and M. Reformat, editors, Proc. of Annual Meeting of the North American
Fuzzy Information Processing Society (NAFIPS 2004), volume 1, pages 425–429,
Banff, Canada, 2004.
38. Z. Usiskin. Conceptions of school algebra and uses of variables. In A.F. Coxford
and A.P. Shulte, editors, The ideas of Algebra, pages 8–19. NCTM, Reston-Va,
1988.
39. S. Wagner. An analytical framework for mathematical variables. In Proc. of the
Fifth PME Conference, pages 165–170, Grenoble, France, 1981.
40. S. Wagner. What are these things called variables. Mathematics Teacher,
76(7):474–479, 1983.
Dominique Lahanier-Reuter
Université Charles-de-Gaulle,
Equipe THEODILE E.A. 1764
59653 Villeneuve d’Ascq, France
[email protected]
The implicative analysis of data allows us to show rules (or quasi rules) that
structure a set of data from calculations of the co-occurrences of some modal-
ities of variables. These rules can be generically represented by an expression
of the type “if A then B”. They are, moreover, hierarchised, which means that they introduce an asymmetry between the modalities they relate. In this, the implicative analysis of data (henceforth S.I.A.) is distinguished from other modes of statistical analysis which, even if they are equally based on calculations of co-occurrences of modalities of variables, only exhibit symmetrical rules that therefore do not discriminate between the variables studied. Studying
the relations between S.I.A. and the didactics of mathematics consequently
questions the theoretical status that mathematics didacticians can grant to
these models of rules as well as the nature or the status of data from which
the modalities of variables subject to the S.I.A. are constructed.
If one can define mathematical didactics as a plan of scientific study of
the phenomena tied to the transmission of disciplinary knowledge, this means
that the didacticians’ preferred field of observation is that of the mathematics
class: the mathematics class in the sense of the material space, certainly (as one can explore the posters, the students’ notebooks...), but also in the sense of symbolic space (the mathematics class still exists when the teacher prepares his classes, when the student learns his lessons, at home, at study hall...). This class thus exists once the interaction between subjects places
them, one as teacher, and the other as student, in relation to an object of
disciplinary knowledge. Very schematically, one of the main objects of study in didactics is that of the manifestations of this relationship between three interdependent elements (the teacher, the student and the knowledge of the discipline), of its establishment and its maintenance over time. Two
consequences can be drawn from this modelling. Firstly the observables are
consequently constructed as interactions between master and student, mas-
ter and knowledge, student and knowledge. Secondly, this study requires the
analysis of the regulations that are simultaneously going to be generated by
this system of interactions and assure its functioning. Thus, for example, one
of the most fruitful problems in mathematical didactics is that of the regulations which affect the interactions between the student and the knowledge at hand; the observables are in this case actions in which the student is engaged (linguistic or not), the regulations that govern these actions (the engagement of procedures, some choices made...) or those produced by these actions (the abandonment of certain ways of acting, certain controls...).
Some of the rules revealed through SIA are thus interpretable in mathematical didactics in terms of regulation of interactions [8, 9]. Two positions can
then be adopted: either these rules have the status of hypotheses for the didactician, it being his responsibility to invalidate or confirm them by other methods of analysis (for example by conducting interviews), or they have the status of facts of experience (thus allowing the rules to contradict, or fail to invalidate, the a priori analysis).
The asymmetry that these rules present is also to be taken into account
in the interpretation that mathematical didactics can make of them. It poses
the problem of an explication of the asymmetries by the modelling in terms of
a system of interactions. The didactic system that we have summarily evoked
(a triplet of interactions between student, teacher and knowledge) is a system
in which the asymmetries of the characteristics tied by a rule can be explained
in a number of different ways.
Let us begin with the most classic case, in which the established rules
are from data corresponding to observables (the actions or the effective dec-
larations of students or teachers gathered on site). An asymmetry between
observables must correspond to the asymmetry between modalities of vari-
ables linked by a rule which has resulted from S.I.A. “all the subjects having
the characteristic A have the characteristic B”. This asymmetry leads us to question didactically the fact that very few students have done B without having
done A, have succeeded in B without having succeeded in A, have answered
B without having answered A, that very few teachers have done B without
having done A, etc. These regulations of doing, saying and of their effects
can be the effects of temporality, of differences between the tasks proposed,
of organization of knowledge. . .
Another case can be envisioned, that in which the established rules are
from data corresponding to observables on site, but also from data perpetuated
from these observables. The stability of the latter therefore reveals groups of
“fixed” subjects (the students of a same socio-cultural milieu, “novice” vs. “ex-
perienced” teachers, etc.). S.I.A. can then either provide rules linking actions,
statements, the effects of these actions, of these words and these constituted
groups, or, by the study of the contributions of subjects to rules, establish ten-
dencies shared by subjects from the same group, or on the contrary, equally
characteristic avoidances.
We will present several case studies in mathematical didactics exemplifying these different uses.
successive tasks. Firstly, students are asked to put into order the written decimals and fractions 1.2; 5.9; 7.5; 4; 9.5; 12; 5.15; 1/2; 2.5; secondly, to place them on a graduated line. The question is: “range par ordre croissant 1,2 – 5,9 – 7,5 – 4 – 9,5 – 12 – 5,15 – 1/2 – 2,5” (i.e. “arrange in increasing order”). In French, the number words associated with 5.15 and 5.6 are pronounced “five, comma fifteen” and “five, comma six”. This way of pronouncing the numbers explains a frequent error at this school level, which consists of placing 5.6 before 5.15, by comparing only the decimal parts of these numbers. However, the numbers have been chosen so that the reproduction of this classification error in the second part of the task leads to a contradiction that students at this school level are able to comprehend. In fact, putting a point corresponding to 5.6 on the graduated line, then putting one that corresponds to 5.15 by moving on from the first one by a space of “9” (the gap between 15 and 6), leads the student to place 5.15 erroneously on the point that should correspond to 6.5 (5.6 + 0.9). This placement can seem contradictory with that which corresponds to 6.2 and other points. We will say in this case that the information given to the student by the erroneous placing of 5.15 is an element of the environment with which the student interacts.
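A small illustration (ours, not from the chapter) of the misconception described here: comparing the decimal parts as whole numbers yields a different order than comparing the numbers themselves.

from fractions import Fraction

numbers = ["1,2", "5,9", "7,5", "4", "9,5", "12", "5,15", "1/2", "2,5"]

def value(s: str) -> Fraction:
    """Correct value of a French-style decimal or fraction string."""
    if "/" in s:
        num, den = s.split("/")
        return Fraction(int(num), int(den))
    return Fraction(s.replace(",", "."))

def erroneous_key(s: str):
    """The frequent error: order by integer part, then by the decimal part read
    as a whole number (so '5,9' is put before '5,15' because 9 < 15)."""
    if "/" in s:
        return (0, 0)                       # fractions handled apart by most pupils
    whole, _, dec = s.partition(",")
    return (int(whole), int(dec) if dec else 0)

print(sorted(numbers, key=value))           # correct increasing order
print(sorted(numbers, key=erroneous_key))   # the order produced by the misconception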
If consequently we expect certain students to commit errors in the ordering
of written numbers, on the other hand, we wonder about the effects of the
consequences of these errors during the execution of the second task. Two types
of common considerations in mathematical didactics allow us to anticipate
them. Firstly, the information that the erroneous placement of the points
on the line provides is not “naturally” interpreted in terms of a contradiction.
The reading and the comprehension of this information by the student require that he uses certain knowledge: in fact it involves considering the placement of 5.15 and 6.2 as “strange” and seeing it as a consequence of the classification error on 5.6 and 5.15. Previous research done on this error or this problem in a school situation leads us to differentiate a student’s recognition of an error from his committing of that error. To put it simply, the perception
of a contradiction in his results is often insufficient to lead a student in a
class to invalidate the latter because he still does not feel invested with the
responsibility to resolve the problem raised [1,2,11]. The question of the study
of students’ behaviour is therefore a legitimate question.
The study of the corpus of written productions of the students makes
apparent the diverse strategies used to respond to the two questions of the
exercise. To order the numeric writings, some students used a classification strategy by ‘types of writing’, classifying first the written fractions, then the written whole numbers (which have no decimal point), then the written decimals, separating those that have only one figure after the decimal point from those that have two. The pupils adopting that classification take into account only the length of the numbers as they are written out. Others, as we could have expected, classified the written numbers according to their integer part (visible, or calculated in the case of 1/2), then according to their decimal part, which was also treated as an integer number (5.15 is then placed after
5.6). Finally, certain students ‘neglected’ to use the reference points of the
graduated line and instead used the line as a ‘writing line’ without placing
any points. This study also allows us to decide if, in the end, the student
produces two different orders which are consequently contradictory, or if on
the contrary he produces two coherent orders, even if they are erroneous.
If S.I.A. is applied to the data it may result in an association group as in Graph 1 that allows us to see the following rules:
1. “Adopting, definitively, a classification of writings by types of writing” implies “accepting a lack of accord between the two orders produced” (99%) (Graph 1, 3 ⇒ 13).
2. “Working, definitively, on the graduated line, as a writing line” implies
obtaining “two coherent orders, even if they are erroneous” (95%) (Graph
1, 7 ⇒ 12).
3. “Producing, finally, an exact classification of written numbers” implies
obtaining “two coherent orders” (95%) (Graph 1, 4 ⇒ 12).
These three rules are interpreted as regulations of student behaviour when
faced with these two tasks. It is possible to read in rule (1) the fact that
certain of these students see —or decide to see— the two tasks as distinct. For
instance, one of the students answers (at first question): “4; 12; 1,2; 2,5; 7,5; 9,5;
5,15; 1/2”. However, he puts marks on the graduated line for “1,2”, next for “2,5”,
next for “4”, etc. We can consider that they do not understand (in the situation
explored) the articulation between the order that the linear arrangement of
the written numbers “shows” and that which the arrangement of points on the
graduated line “shows”. Secondly, rule (2) can be interpreted, in the context
of the situation, as an “avoidance” of the second task. The student copies the
preceding list of writings onto the graduated line, and thus avoids taking into account the possible difficulties that he would face in ensuring coherence between
the two orders. Finally rule (3), in establishing the asymmetry between the
two modalities1 , leads us to surmise that checking the coherence of the two
orders allows, for certain students, a rectification of the classification of the
written figures.
Thus the didactic organization of these two tasks, and particularly the design of an environment whose retro-actions, once interpreted, reveal an incoherence in the results, is insufficient: it is in fact necessary that the student agrees to link these two tasks in order to accept interpreting the results of one according to the other.
1 That could not, in particular, be established by a χ² test.
We analyze these texts as pupils’ works, that is to say we try to take into ac-
count the context in which they were created and the different pupils’ status.
It is not the same, for instance, for a nine years old or a ten years old pupil to
produce, as a construction program, the following: “draw two perpendicular
lines of which centre is the meeting point of lines and link the intersection of
the two lines and of the circle” or “trace a square, its diagonals and its centre,
trace the circle whose centre is the same as the square’s and which touches
the summits of the square” or “draw a circle and a square inside the circle”. As
a matter of fact, the first text points to objects and geometrical relationships
between the objects that are constructible by pupils at this school level. On
the other hand, the second one needs the drawing of a square, which isn’t as
easy for them. The last one avoids the necessary drawings to be able to fit a
square in a circle. Taking into account the pupils’ school level and considering
the outcomes as productions, we have chosen not to rank them according to
how correct the answers are, but according to the choices pupils made.
We have kept as first indicators of the way pupils write, the chosen el-
ements and their designations, the geometrical relationships mentioned and
their designations. The corresponding indicators are the number of geometrical terms used, the chosen elements (circles, lines, summits, ...), relationships such as perpendicularity, topological positions, etc. Therefore, the point is to determine the figure analysis pupils have chosen to make through what they say. The way of analyzing can be quite different from one pupil to another. Some of them only dealt with the lines that shape the diagram (the two lines, the circle and the square); others dealt with points (O, or the four summits of the square A, B, C, D). From a
theoretical point of view, these two ways of looking at it are linked to dif-
ferent analytical skills. As a matter of fact, looking at a geometrical figure
as a punctual structure requires going beyond immediate perception, which
only shows lines entangled. The use of S.I.A. makes it possible to test the
3 Original spelling has been changed.
Therefore it is indeed the change of status (in turn, centre of the circle and
intersection of lines or centre of the circle and intersection of the diagonals)
which makes up the decisive criterion for classifying the pupils’ production
and its grading.
Trusting the reader with this change and finding the discursive ways to do
so might be a crucial stage. It indeed involves being able to find the means to bring back, to recall, an element already present in the text (therefore the handling or the implementation of anaphora); it also implies being able to get away from the visual evidence of the figure, in short being able to go beyond an
immediate visual contact to a written account of an invisible change. Point O
doesn’t move but its function changes.
So S.I.A. allows us to prove the deciding role played by some analysis criteria and the necessity of their presence in skills evaluation.
• (1) “Writing ‘I’ ” implies “using the indicative mode” (99%) (Graph 3, 12 ⇒ 11).
• (2) “Using the infinitive mode” implies “building a generic subject ‘one’ ” (99%) (Graph 3, 9 ⇒ 13).
• (3) “Using the imperative mode” implies “building a subject ‘you’ ” (99%) (Graph 3, 10 ⇒ 14).
The links that come up are expected for the most part since the very use,
even partial, of the imperative and infinitive modes is indeed linked to the
pronouns which signal what the reader puts back together: a “peer” for the
imperative mode, signalled by the pronoun “tu” (informal you) and “vous”
(formal you), “a generic reader” for the infinitive mode, an “evaluative reader”
signalled by “I” and the indicative mode. However, the presence of univo-
cal relationships between the chosen pronouns and the modes used, gives an
unexpected rigidity since it is possible, in the indicative mode to use “on”
(people)” and “tu” (informal you)”. Unlike classical tests suggesting symmet-
rical links between studied variables, the S.I.A. allows questioning of these
strong constraints, stemming from the fact that the writing is produced in a
school situation.
This makes it possible to think that the rules students use define them
as actualisations of discursive genres. Thus, we suppose that building the
reader as a “peer” is characteristic of a school writing genre in math class, and
therefore can be perceived as legitimate by pupils for several reasons. It could
be that the pedagogical and didactic devices make such positions possible,
because help and cooperation are principles put into practice in the language
used in the different subjects taught. It could also be that the ways exercises
are written in school books define such a reader. Building a “generic reader”
is also an identifiable characteristic. However, this characteristic is not as
frequent in geometric construction exercises in elementary school schoolbooks.
On the other hand, it is frequent in “description of recipes” schoolbooks.
At this level, this genre, like those of “construction programs” or “users’
manuals”, takes up a rather important part in school activities, and can be found in places other than schools. Lastly, the discursive ways of using “I”, with which a pupil shows the reader what he can do or manages to do, are a
characteristic of school evaluation situations.
Therefore, S.I.A. interpretation of results appears to take into account
univocal links. Unlike in traditional analysis, links can be interpreted as con-
straining rules governing the actions observed.
Methodologically, we can first look at the isolated groups. In the case of the
example we are developing, one of the CM2 classes (“CM2 Wa”) is isolated. As would be the case in “classical” analysis, the absence of implicative links
marking this group of pupils is interpreted as a sign of diversity in this group
of pupils’ written productions (according to the chosen criteria).
S.I.A. makes it possible to perceive the cases where characteristics are those of only a part of the group of pupils studied, as opposed to those of a group in which all the pupils, or “almost” all of them, share the same characteristics, contrary to the modes of analysis which produce symmetrical links. We will
start by presenting a case where a characteristic comes up as specific to one
of the groups.
The example is that of the characteristic “writing a text using “I” which is
the characteristic of one of the classes coded CM1 Wb.
- (1) “Writing a text using ‘I’ ” implies “belonging to class CM1 Wb” (99%) (Graph 5, 12 ⇒ 2).
Only the pupils of this class chose this particular writing behaviour, which indicates that these pupils read the suggested situation as an evaluation situation: the expected reader is the teacher and the pupils show what they can do. But what seems even more important is the fact that the implicative link of maximal intensity means that no pupil (or rather almost none) in the other classes has reacted in that way. To explain this specificity, we support the hypothesis that the way the task was presented was singular in this class: maybe only this CM1 teacher presented the exercise as an evaluation, or at least as an exercise that he would check on.
On the other hand, other implicative links show that almost all pupils of a
same class share some skills and also make the same mistakes, etc. Let’s keep
in mind some of them:
• (1) “Belonging to CM2 Hb1” implies “using “tu” ” (90%) (7 ⇒ 14).
• (2) “Belonging to CM2 Wa” implies “mentioning point O and changing its status” (95%) (5 ⇒ 29).
This time, these characteristics are met in almost all the pupils of a same
class.
We think that they are the results — sometimes indirect — of the didactic
or pedagogical approaches used in class. What is left is to interpret these
different rules. If almost all the pupils of the first particularized class address
a reader who acts like “a peer” in their writings, it is no doubt because help and
cooperation are legitimized and encouraged in these classes or because these
forms of communication are used. If almost all the pupils of the second class
show good geometric skills, it is no doubt thanks to the teaching techniques
used.
But this geometric skill cannot be understood without the linguistic skills
which make it possible to communicate it. Using "tu" (informal "you") is not
always the required school form. Now, these two classes have pupils from
different social backgrounds: in the second class the pupils come from more
privileged families than in the other studied classes. Relationships between
social classes and linguistic strategies are certainly complex and not mechanical.
However, the results we are getting are coherent with those of other studies.
4 Conclusion
Using a central problem of the didactics of mathematics, we have been able
to show the efficiency of S.I.A. techniques. The contribution of this
method to data analysis cannot be disregarded, for several reasons. The almost-
rules established by S.I.A. from observables easily lend themselves to
interpretation in terms of action regulation. The implicative paths can be
read in terms of networks. Lastly, the asymmetry of links seems essential
for formulating explanatory hypotheses about certain phenomena pertinent to
teaching and learning. We have also been able to sketch, through detailed
research examples, particular methodological practices: the attention to be
given to the interpretation of the internal cohesion of implicative graphs,
but also to the separation of these graphs, as well as to the graphs'
nodes and the univocal links. These are paths of thinking to be pursued.
References
1. G. Brousseau. Le contrat didactique : le milieu. Recherches en didactique des
mathématiques. Volume 3, pages 309–336, La Pensée Sauvage, Grenoble, 1990.
2. G. Brousseau. Théorie des situations didactiques. La Pensée Sauvage, Grenoble,
1998.
3. M. Bru, M. Altet, C. Blanchard-Laville. À la recherche des processus caractéris-
tiques des pratiques enseignantes dans leurs rapports aux apprentissages. Revue
Française de pédagogie. Volume 148. INRP, Paris, 2005.
4. R. Gras. L’analyse des données : une méthodologie de traitement de questions
de didactique. Recherches en didactique des mathématiques. Volume 12:1, pages
59–72, La Pensée Sauvage, Grenoble, 1992.
5. R. Gras, A. Totohasina, S. Almouloud, H. Ratsimba-Rajohn, M. Bailleul. La
méthode d’analyse implicative en didactique. Applications. In : M. Artigue,
R. Gras, C. Laborde, P. Tavignot (eds.): Vingt ans de didactique des mathéma-
tiques en France. Pages 349–363, La Pensée Sauvage, Grenoble, 1994.
6. R. Gras. L’implication statistique, Nouvelle méthode exploratoire de données.
La Pensée Sauvage, Grenoble, 1996.
7. R. Gras, J. David, J.C. Régnier, F. Guillet. Typicalité et contribution des sujets
et des variables supplémentaires en Analyse Statistique Implicative. Extraction
et Gestion des Connaissances (EGC’06). Volume 2, pages 359–370, Cépaduès
Editions, 2006.
8. D. Lahanier-Reuter. Conceptions du hasard et enseignement des probabilités et
statistiques. P.U.F., Paris, 1999.
9. D. Lahanier-Reuter. Exemple d’une nouvelle méthode d’analyse de données :
l’analyse implicative. Carrefours de l’éducation. Volume 9, pages 96–109, CRDP
Amiens, 2000.
10. D. Lahanier-Reuter. Enseignement et apprentissage mathématiques dans une
école Freinet. Revue Française de Pédagogie, Volume 153, pages 55–65, INRP,
Paris, 2005.
11. C. Margolinas. De l’importance du vrai et du faux. La Pensée Sauvage, Grenoble,
1993.
12. A. Mercier, C. Buty. Évaluer et comprendre les effets de l’apprentissage de
l’enseignement sur les apprentissages des élèves : problématiques et méthodes
en didactique des mathématiques et des sciences. Revue Française de Pédagogie.
Volume 48, pages 47–59, INRP, Paris, 2004.
13. Y. Reuter. Les représentations de la discipline ou la conscience disciplinaire. La
Lettre de la DFLM. Volume 32, pages 18–22, 2003.
Appendix
a) S.I.A. Graph 1.
b) S.I.A. Graph 2.
c) S.I.A. Graph 3.
Fig. 4. 1 CM1 Wa; 2 CM1 Wb; 3 CM1 Hb; 5 CM2 Wa; 6 CM2 Hb1; 7 CM2 Hb2;
9 Using infinitive mode; 10 Using imperative mode; 11 Using indicative mode; 12
Using ‘Je’ (I); 13 Using ‘On’ (one); 14 Using ‘Tu or Vous’ (informal you or formal
you); 15 Planning marks; 16 End marked; 17 Error on ‘le’ (the) or on ‘un’ (a); 18
Circle determined; 19 Circle independent; 20 Circle located; 21 Square independent;
22 Square located; 23 Square determined; 24 Lines independent; 25 Lines located; 26
Lines determined; 27 O No mention; 28 O mentioned; 29 O mentioned, two status;
30 ABCD No mention; 31 ABCD mentioned; 32 ABCD mentioned, two status; 33
ABCD constructed.
d) S.I.A. Graph 4.
e) S.I.A. Graph 5.
Fig. 6. 1 CM1 Wa; 2 CM1 Wb; 3 CM1 Hb; 5 CM2 Wa; 6 CM2 Hb1; 7 CM2
Hb2; 10 Using imperative mode; 11 Using indicative mode; 12 Using ‘Je’ (I); 13
Using ‘On’ (one); 14 Using ‘Tu or Vous’ (informal you or formal you); 15 Planning
marks; 19 Circle independent; 21 Square independent; 24 Lines independent; 27 O
No mention; 29 O mentioned, two status; 30 ABCD No mention.
Using the Statistical Implicative Analysis
for Elaborating Behavioral Referentials
Summary. Various computer-based assessment tools have been created to help human
resources managers evaluate the behavioral profile of a person. The psychological
bases of these tools have all been validated, but very few of them have undergone
a deep statistical analysis. The PerformanSe Echo assessment tool is one of them.
It gives the behavioral profile of a person along 10 bipolar dimensions. It was
validated on a population of 4538 subjects in 2004. We are now interested in building
a set of psychological indicators based on Echo for a population of 613 experienced
executives who are 45 years old or more and seeking a job. Our goal is twofold:
first to confirm the previous validation study, then to build a relevant behavioral
referential for this population. The final goal is to have relevant indicators helping
to understand the link between some behavioral characteristics and typical profiles
that can be identified in the population. In the end, it may provide the foundation
for a decision support tool intended for consultants specialized in coaching and
outplacement.
1 Introduction
Human resources managers have been early users of computer tools. The need
for evaluating the behavioral profile of a person in human resources has led to
the creation of personality assessment tools. Initially, the tools were paper-based.
The first computerized ones were expert systems (e.g. Human Edge [12]). Then more
complex decision support tools were created: the MBTI (Myers-Briggs Type Indicator)
[5, 14] and PerformanSe Echo. Meanwhile,
Elaborating Behavioral Referentials with SIA 301
2 Applicative Context
Various personality assessment tools are widely used in human resources management
for profiling people in job-oriented decision support. A personality assessment
tool intends to draw up the behavioral profile of a person from the results of a
questionnaire. The goals of those types of tools are multiple: support for
recruitment, support for vocational guidance, or a behavioral checkup accompanying
a competence checkup. Those tools are not intended to be used in a discriminative
way to select among applicants for a job, but rather as a basis to help a human
resources manager, for instance when receiving people for an interview. There are
two types of questionnaires. The first one is composed of open questions on which
the person is free to elaborate. The answers are examined by a psychological expert
to draw up the behavioral profile. This approach is highly questionable due to the
subjectivity and variability of the expert's interpretation from one person to
another. There are very few questionnaires of this type. To name but one, Phrases
[18] has the attendee complete 50 phrases in 30 minutes, under the scrutiny of the
examiner. Both the answers and the behavior of the person during the test are
evaluated. This type of questionnaire is poorly studied due to the difficulty of
building statistical analyses on open questions.
The second type of questionnaire is composed of closed questions and is
the most widespread. Those questionnaires can be handwritten or computerized.
They generally consist of a set of questions (also named items) with two or
more answers. A set of rules, like those encountered in expert systems,
has been established beforehand and gives a behavioral profile along a
predetermined number of personality traits (also named dimensions). There is a
great number of such tools. Here we list some of the computerized ones:
• Sosie (from ECPA): 20 personality traits evaluated through 98 groups of
4 assertions,
• PAPI (PA Preference Inventory from Cubiks):
– the classic test: a choice between 90 pairs of sentences,
– the normative test: 126 assertions with a choice from “totally disagree”
to “totally agree”,
• MBTI (from Myers and Briggs): 126 questions with a choice between two
answers and a profile chosen among 16 predefined ones,
• PerformanSe Echo: 70 questions with two answers and a profile on 10
bipolar dimensions determined through a set of rules,
• Assess First: 90 questions with two choices drawing a profile over 20 be-
havioral dimensions and 5 families.
Among all these tools, few have undergone a real statistical validation
study. Indeed, most of those products are based on well-grounded
psychological bases: Jungian theory [5, 15], the Big Five model [8, 21], or the
study of motivations [7]. But statistical studies are a necessary counterpart of the
This study has been governed by a need of the APEC (Agence pour l'Emploi des
Cadres, in English: Job Center for Executives) to get relevant behavioral
indicators supporting their everyday task: helping people find the right job.
The APEC is a national organization that provides job guidance. It is somewhat
comparable to the ANPE (Agence Nationale Pour l'Emploi, the National Job Center),
but is especially intended for executives. It provides assistance for everything
related to job orientation, training, reemployment, skill validation and skill
assessment. Its major difficulty is to get the right clues to determine the
function that best matches a given person. The PerformanSe tools have been
purpose-built for the employment field. The assessment provides a set of
recommendations with support/vigilance points. But those conclusions are quite
general, and the need of the APEC is more specific to each position. The stake is
twofold: first, to determine the main characteristics that promise the best chance
of success for reemployment, then the personal profile that best matches a
given job. Finding those characteristics boils down to building a job referential,
which is our main goal.
To meet those needs, we have organized our study in two steps. The first
step consists in bringing out the specificities of this population of executives
with respect to the global population. To succeed in this task, both classical
statistical tools and more advanced SIA tools are used. In the end, we want
to determine both the dimensions and the factors (combinations of multiple
dimensions) that characterize this population and, with the help of a psychological
expert, give a meaning to the discovered factors. As previously said, it is not
the dimensions themselves but the factors that can be interpreted. SIA is
a good way to obtain those combinations of dimensions, notably via similarity
trees and cohesive graphs.
This first study should bring out indicators that differentiate the
studied population from the global population, but also discriminate some
subgroups within the studied population. Even if global indicators characterize
the main part of the sample, there might be some subpopulations that are not
fully characterized by those indicators and that would be interesting to study. To
summarize, the first step may lead to the discovery of subgroups that we will
analyze in a second, more local step, using the same classical statistical tools
and SIA.
5 Data
The reference population is the one that has been used for validating the tool.
The data has been collected through a partnership between PerformanSe and
a large sample of clients, who have communicated their assessments. It is
composed of 4538 people with wide-ranging backgrounds:
• companies, national and international groups and SMEs in all sectors of
the economy,
• consultancy firms,
• business schools and engineering schools,
• public organizations for professional mobility and orientation, governed by
the Ministry of Employment or the Ministry of Education.
People within this sample could be of any age, employed or not, from mis-
cellaneous socio-cultural origins. Each person of the sample is described with
the 20 traits of the Echo questionnaire. For the computation, it is the val-
ues from 0 to 35 for each trait that have been used, not those of the bipolar
dimensions. On each of the 20 traits of the Echo model, the average score
values of this sample range from 17.12 to 17.98 on a scale from 0 to 35. The
standard deviation values are spread between 5.83 and 7.42. The population
follows a normal distribution (centered Gaussian) over the 20 traits. People
are distributed as follows: 25% in low values, 50% in medium values and 25% in high
values. Tab. 5.1 shows the results of this study over the 20 traits. On every
trait, the population follows a centered Gaussian distribution. It is important
to specify that this distribution has been obtained directly from the raw
results without any curve fitting. This shows the relevance and accuracy of
this personality assessment tool. The study has also shown that the tool did
not need to be recalibrated.
Thanks to a partnership with the APEC, we got access to the data collected
by this national organization: a large sample of people who have taken the
behavioral assessment, namely 2788 people. In our case, we have restricted our
study to a particular cross-section of the population: experienced executives
who are 45 years old or more and seeking a job. This restriction stems from
a need of the APEC to get a more specific analysis of this particular part of
the population. Indeed, the average behavioral profile of this sample may
differ from that of the overall population. We thus get a cross-section of the
population that contains 613 assessments (one assessment per subject),
in other words 20% of the global sample. This is useful for characterizing some
specificities of this population with respect to the reference population, and
then, inside this population, some particular profiles typical of certain
subgroups.
To study the data in CHIC, we have chosen to transform it into binary
data. In the previous validation study, the computation was made on the
20 traits valued from 0 to 35. In this study, we have used the 10 bipolar
behavioral dimensions discretized into +, 0, − (for instance: EXT-, EXT0
and EXT+). We have then transformed this data into binary data, as usually
done in this type of case (i.e. 1 if the characteristic is present, 0 if not).
A sample illustration is shown in Tab. 5.2. The first reason is that CHIC and
SIA were initially designed to study this type of binary data, and it also
seems to be the simplest way. The second and most important reason is
that we want to make some indicators appear, in other terms some factors (or
combinations of multiple dimensions). These indicators should express broad
trends rather than precise values, because consistent and meaningful classes
are more informative than discrete values between 0 and 35. That is why
we have based our study on discretized values.
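To make this preprocessing step concrete, here is a minimal sketch in Python (using pandas) of how discretized bipolar scores such as EXT-, EXT0, EXT+ could be turned into the kind of 0/1 presence matrix that CHIC expects. The dimension names echo those cited in the chapter, but the scores, thresholds and subject identifiers are invented for illustration and are not the ones used in the study.

import pandas as pd

# Illustrative raw scores (0-35 scale) for 3 of the 10 bipolar dimensions.
raw = pd.DataFrame(
    {"EXT": [5, 18, 31], "ASN": [29, 16, 8], "POW": [12, 33, 20]},
    index=["subj_1", "subj_2", "subj_3"],
)

def discretize(score, low=12, high=23):
    """Map a 0-35 score to '-', '0' or '+' using illustrative cut-offs."""
    if score <= low:
        return "-"
    if score >= high:
        return "+"
    return "0"

# Discretize each dimension, then expand into binary presence columns
# such as EXT-, EXT0, EXT+ (1 if the characteristic is present, 0 if not).
levels = raw.apply(lambda col: col.map(discretize))
binary = pd.get_dummies(levels, prefix_sep="").astype(int)
print(binary)

The resulting table has one column per discretized characteristic, which is exactly the binary format discussed above.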
7 Global Study
7.1 Goal
We first dealt with the whole studied population of experienced executives.
First of all, we made a comparative study between the studied population and
the reference population using a classical statistical measure: the mean. Our goal
is to highlight relevant indicators that differentiate these executives from the
average individual. Then, we studied more deeply the inner characteristics of
this population with the tools made available in CHIC. Our goal here is to
find specific significant subpopulations, both in the statistical and in the
semantic sense, and, with the expert's support, to isolate the behavioral
dimensions implied in this dichotomy and their explanations.
This first step of our study also prepares the second step. The subpopulations
found in this first global study will be analyzed more locally in a
second phase; they will be the basis of the second study to reveal indicators. The
goal is to complete the characterization realized in the global study and confirm
the first draft of the indicators with a set of complementary dimensions.
be overhasty. Nothing indicates that these characteristics do not split the
global population into two or more subpopulations. To delve into this analysis,
we have completed this study with a more advanced tool of CHIC: similarity
trees.
We have used similarity trees with the entropic implication and the Poisson
distribution. Indeed, we have a population of more than one hundred people,
and the classical method is not recommended in that case because it is less
discriminating. With the same restrictive goal, we have chosen the Poisson
distribution.
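For readers who want to experiment outside CHIC, the following rough sketch shows one way a Poisson-based similarity between two binary variables can be computed, in the spirit of Lerman's likelihood-of-the-link index; CHIC's exact implementation may differ, and the counts passed in the example call are invented.

from math import exp

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), by direct summation."""
    if k < 0:
        return 0.0
    term, total = exp(-lam), exp(-lam)
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total

def similarity(n, n_a, n_b, n_ab):
    """Poisson-based similarity of binary variables a and b.

    n: sample size; n_a, n_b: occurrences of a and b; n_ab: co-occurrences.
    Returns the probability that a Poisson variable with mean n_a*n_b/n
    stays strictly below the observed number of co-occurrences.
    """
    lam = n_a * n_b / n
    return poisson_cdf(n_ab - 1, lam)

# Illustrative counts only (not taken from the chapter's data):
print(round(similarity(n=613, n_a=180, n_b=170, n_ab=95), 3))

A value close to 1 indicates that a and b co-occur far more often than independence would predict, which is what the significant nodes of a similarity tree capture.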
This second analysis with similarity trees provides a way to determine the
essential classes that partition our population of executives. As we can see on
Fig. 4, this analysis confirms the significance level of the ASN+ and POW+
dimensions. Indeed, the pair (ASN+, POW+) forms the first significant node
of the tree (marked in bold) with a similarity coefficient of 0.954724. The
dimension EXT+, combined with the (ASN+, POW+) pair, appears to be
relatively significant and discriminative of the population of senior executives.
Those three dimensions underlie most of the significant nodes (similarity of
(EXT+ (ASN+, POW+)) = 0.876296). Therefore, this triplet could be con-
sidered as a good candidate to partition our population and to be a relevant
behavioral indicator.
8 Local Study
On the basis of the results discovered through similarity trees, we have studied
each of the three discovered subclasses, discriminated by the ASN dimension.
Indeed, ASN alone is not sufficient to build a relevant indicator; we need more
clues about the main trends of each of these three groups. Thus, we have
used implicative and cohesitive trees to detect links between dimensions. Those
links can then be used to build our indicators as combinations of multiple
dimensions. In the following studies, we have discarded the ASN dimension,
which is no longer discriminative within each of the three subpopulations: ASN-,
ASN0 and ASN+. We will only present the statistical details of the study concerning
the ASN- subpopulation, together with our conclusions on the two other populations.
The conclusions are based both on the comparison between the subpopulations
and the global population of executives, and on the comparison between the
subpopulations and the ordinary population.
ASN- Subpopulation
Level Class Cohesion
1 (REC- BEL-) 0.993
2 (COM+(REC- BEL-)) 0.991
3 (RIG- InD+) 0.966
4 (BEL+ REC+) 0.962
5 (ACH+ InD-) 0.948
6 ((RIG- InD+)ACH-) 0.944
7 ((ACH+ InD-)ANX+) 0.939
8 (REC0 COM0) 0.893
9 (ANX- RIG0) 0.876
10 ((BEL+ REC+)COM-) 0.868
11 ((COM+(REC- BEL-))RIG+) 0.856
12 (ANX0 InD0) 0.559
13 (((BEL+ REC+)COM-)ACH0) 0.366
14 (BEL0 EXT-) 0.313
15 ((BEL0 EXT-)POW-) 0.247
16 (POW0 EXT0) 0.155
Table 6. Cohesitive values
it reveals the loss of leadership and can explain the difficulty this subpopulation
has in reintegrating the working world. However, the expert has found the following
combinations of dimensions interesting:
• (((BEL+ REC+) COM-) ACH0): relies on others by accepting concessions,
so as to lighten his/her work load,
• ((COM+ (REC- BEL-)) RIG+): hides behind an inflexible, aloof and even
strongly opposed behavior.
In the light of those results for the ASN- subpopulation, the expert has
characterized a set of dimensions that is meaningful according to his psychological
knowledge. The indicators built are listed below:
• Indicator of adaptation: REC0/COM0 (17% of the sample),
• Indicator of illusion: RIG-/InD+/ACH-,
• Indicator of cry for help: BEL+/REC+/COM-,
• Indicators of autistic withdrawal:
– passive: EXT-/POW-/BEL0,
– offensive: COM+/REC-/BEL-,
• Indicator of strictness by:
– obstinacy: ACH+/RIG+,
– nervous tensing up: ACH+/InD-/ANX+.
If we consider the implicative graph, we can see that almost all the combinations
over the 0.90 threshold have been retained and interpreted by the expert.
References
1. R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In
J.B. Bocca, M. Jarke, and C. Zaniolo, editors, 20th International Conference on
Very Large Data Bases, VLDB’94, pages 487–499. Morgan Kaufmann, 1994.
2. J. G. Carlson. Recent assessment of the MBTI. Journal of Personality Assess-
ment, 49(4), 1985.
3. M. Carlyn. An assessment of the Myers-Briggs Type Indicator. Journal of Per-
sonality Assessment, 41:461–473, 1977.
4. R. Couturier. Traitement de l’analyse statistique implicative dans CHIC. In
Journées sur la fouille des données par la méthode d’analyse implicative, pages
33–55, 2001.
5. D. Cowan. An alternative to the dichotomous interpretation of Jung’s psycholog-
ical functions: Developing more sensitive measurement technology. In Journal
of Personality Assessment, volume 53, pages 459–471, 1989.
6. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors. Ad-
vances in Knowledge Discovery and Data Mining. MIT Press, Cambridge, MA,
1996.
7. J. George and G. Jones. Organizational Behavior. Prentice Hall, Upper Saddle
River, NJ, 3rd ed. 2004 edition, 2002.
8. L. R. Goldberg. Language and individual differences: The search for universals
in personality lexicons. Review of Personality and Social Psychology, 2:141–165,
1981.
9. R. Gras. L’implication statistique : une nouvelle méthode exploratoire de don-
nées. La Pensée sauvage. 1996.
10. R. Gras, H. Briand, P. Peter, and J. Philippe. Implicative statistical analysis.
In Proceedings of International Congress I.F.C.S., Kobe, Tokyo, 1997. Springer-
Verlag.
11. R. Harvey, W. Murry, and S. Markham. Evaluation of three short-form versions
of the MBTI. Journal of Personality Assessment, 63(1):181–184, 1994.
12. J.H. Johnson and T.A. Williams. Using a microcomputer for on-line psycholog-
ical assessment. Behavior Research Methods & Instrumentation, 10:576–578,
1978.
13. I.C. Lerman. Classification et analyse ordinale des données. Dunod, 1981.
14. I. Myers. The Myers-Briggs Type Indicator. Educational Testing Service, 1962.
15. P. Myers. Gifts Differing. Understanding Personality Type. Davies-Black Pub-
lishing, 1995.
16. T. Patel. Comparing the usefulness of conventional and recent personality as-
sessment tools: Playing the right music with the wrong instrument? Global
Business Review, 7(2):195–218, 2006.
17. V. Philippé, S. Baquedano, R. Gras, P. Peter, J. Juhel, P. Vrignaud, and
Y. Forner. Étude de validation : PerformanSe Echo, PerformanSe Oriente. Tech-
nical report, study realized with the collaboration of PerformanSe, Laboratoire
COD de l’École Polytechnique de l’Université de Nantes, Laboratoire de Psy-
chologie Différentielle de l’Université de Rennes 2, 2004.
18. B. S. Stein and J. D. Bransford. Constraints on effective elaboration: effects of
precision and subject generation. Journal of Verbal Learning and Verbal Behav-
ior, 18:769–777, 1979.
19. O. Tzeng, D. Outcalt, S. Boyer, R. Ware, and D. Landis. Item validity of the
MBTI. Journal of Personality Assessment, 48(3), 1984.
20. T. Vacha-Haase and B. Thompson. Alternative ways of measuring counselees’
Jungian psychological-type preferences. Journal of Counselling and Develop-
ment, 80, 2002.
1 Introduction
The use of multivariate analysis in the field of the Didactics of Mathematics (DM)
already has a long tradition within the framework of fundamental didactics. Im-
portant references can be found among the contributions of the Journées de Caen
(1995 & 2000), such as [3] and [10], in which new statistical tools are provided,
motivated, as on many other occasions, by the context of DM, but fruitful for
both DM and multivariate statistics.
In the usual multivariate methods (generally factor analysis and principal
component analysis), Brousseau [3] uses supplementary individuals
(fictitious individuals) in his data in order to be able to compare the a priori
and a posteriori analyses of a questionnaire. The a priori analysis of the
questionnaire leads to certain criteria characterizing its questions (the
variables). In this way, two matrices are obtained: one coming from the pre-
experimental analysis (the a priori matrix of the questionnaire: criteria ×
questions) and the empirical matrix, made of the collected data, where the
questions are characterised by the sample at hand (answers × questions).
Fictitious individuals allow for the simultaneous consideration of both
the pre-experimental criteria and those provided by the sample in a single
matrix. Fictitious individuals, as features of the variables involved in the
The aim of this chapter is to search for new potential uses of SIA, within the
context of DM, in order to promote the development of this theory and its
applications. To this end, we present and study a conjecture on the introduction
of fictitious individuals into the sample data matrix and into the subsequent
processing through CHIC [5]. On the one hand, this procedure offers assistance
in the didactical interpretation of the results. On the other hand, the
final results of the analysis are perturbed by this artificial data, so that a
non-negligible size of this perturbation would invalidate the advantages gained
with respect to interpretation. Therefore, this work evolves in two directions:
one in DM, and the other in the methodology of SIA, as a corollary of the first one.
We show, in the several analyses performed, not only the results but also the
philosophy behind the use of fictitious students: when, where and for what purpose
they are introduced. Firstly, we present the data. Then, in Sect. 3, we show
the results of the different SIA procedures applied to the data under the classical
version [18], with and without fictitious data, keeping track of the differences,
checking that they are reasonably small, and stressing the gain of information
provided by the procedure. Next, in Sect. 4, we take advantage of the conclusions
of the previous section to design a new set of fictitious students with which
we analyse the structure of the same dataset (using the entropic version of
SIA), improving the information obtained as well as assessing the differences
with respect to the previous results (with the classical implication).
Having posed the question of how the structure of SIA results varies
when a small number of individuals is added to an existing sample, we show,
in Appendix A, for the interested reader, a short introduction to quasi-
implications and their intensity, and a result on its variation under the addition
of new individuals to the sample. In the case study presented in this chapter, the
number of new individuals is less than 1% of the sample size.
Finally, the questionnaire used to obtain the data of the study is shown in
Appendix B.
A test on initial skills in Mathematics has been conducted on the population
of first-year students of University Jaume I (UJI) of Castellón (Spain)
since 2001 [14, 17, 19]. Initially, it was part of the development of a
DM research project of Bosch and collaborators (see [4]) and of a PhD thesis [8].
The items of the test were selected in order to ascertain some given didactic
hypotheses on the didactical discontinuities between mathematics at
the pre-university and university levels. Adapted versions of that test have
been conducted in the subsequent years, as part of several Educational
Improvement Projects promoted by the institution. They have been used to
help the UJI Mathematics Department professors get to know the skills of
the students they are going to work with. As a consequence, they can check
what the students are expected to master, and freely adapt their didactical
strategies accordingly.
Data.
The item features in Table 1 have been used in our first SIA analyses,
under the classical theory, in order to define fictitious students. Let us note
that this classification is uneven, in the sense that the cardinalities of the
classes are very different. The fictitious students are Algebra (A) and Calculus
(C), scoring 1 only on items classified under those respective types of knowledge,
and problem (p), graphic (g) and exercise (e), scoring 1 only on items classified
under those respective types of task.
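As a concrete illustration of this construction, the sketch below builds such fictitious rows in Python: each fictitious student scores 1 exactly on the items carrying a given a priori feature and 0 elsewhere, and is then appended to the real student × item matrix. The item names and feature assignments in the code are illustrative, not the actual classification of Table 1.

import pandas as pd

# Illustrative binary matrix: rows = real students, columns = questionnaire items.
items = ["P1", "P2", "P3", "P4", "P5", "P15", "P16"]
students = pd.DataFrame(
    [[1, 0, 1, 1, 0, 0, 1],
     [0, 1, 1, 0, 1, 1, 0]],
    index=["s001", "s002"], columns=items,
)

# A priori classification of items; this assignment is made up for the example.
features = {
    "Algebra":  {"P2", "P15", "P16"},
    "Calculus": {"P3", "P4", "P5"},
    "exercise": {"P1", "P3", "P4", "P5"},
    "problem":  {"P15", "P16"},
    "graphic":  {"P2"},
}

# One fictitious student per feature: 1 only on the items of that feature.
fictitious = pd.DataFrame(
    {item: [1 if item in cols else 0 for cols in features.values()]
     for item in items},
    index=list(features.keys()),
)

extended = pd.concat([students, fictitious])  # matrix actually fed to CHIC
print(extended)

Because only a handful of such rows are added, they perturb the global counts very little, which is the property studied later in the chapter.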
In this section we present our methodology, based on the analysis of the optimal
groups of individuals regarding their contribution to the formation of classes
and their typicality, in order to improve the interpretation of the rules and
the quantity of information about the sample. We shall present it through
particular cases, without intending to cover all aspects.
The original data processed through the software CHIC 3.7 [5] leads to a
classification tree, based on the similarity index defined by I.C. Lerman [16]. The
classification tree for the data containing the supplementary fictitious students
is so close to the one for the original data that eye inspection cannot
distinguish between them. This fact allows us to go beyond the analysis result,
examining the role of the new students (item features) in the constitution of
classes and rules.
If we relabel the items, reflecting the criteria of the a priori analysis and
the details on the type of task, we display the dependence of class formation on
the variables in a better way, and we can then compare it to the fictitious
students appearing in the optimal groups of the classes (see Fig. 1). The codes
'm', 't', 'i' represent mathematical modelisation (m), algorithmic technique (t),
and interpretation or judgment (i). The relation between the former and the new
labeling is given in Table 2.
The fictitious students taking part in the optimal groups for the contribution to
the formation of classes and for typicality are actually the same (we display them
in Fig. 1 only for significant nodes).
The classification tree shows significant nodes at levels 1, 6, 9, 12, 15, 17 and
19, level 12 being the most significant.
For instance, items P4 and P5 of class 1 are, a priori, calculus exercises, and
those features do appear as fictitious students in the optimal group of individuals
contributing to the formation of that class. On the other hand, the level 9
items (((P15, P17a), P16), P17b) are algebra problems, but only the fictitious
student Algebra appears in the contribution optimal group of students.
The results of the implicative analysis performed with CHIC on our questionnaire,
using the classical implication and the Poisson distribution, depict rules
among the questions derived from the answers given by the student sample.
Figure 2 (left and center) represents those quasi-implication rules using,
respectively, the real sample and the enlarged one (with fictitious students).
Fig. 1. Similarity tree of the questionnaire using new fictitious students (C: Calculus,
A: Algebra, e: exercise, p: problem, g: graphic), highlighting them when they belong
to optimal groups regarding the contribution and typicality (the latter between
parentheses). Items are labeled as shown in Table 2.
Fig. 2. Implicative graph of data without (left) and with (center) fictitious students.
Implicative graph of the subset of items forming class 12 in the classification tree
(right). Items are labeled as shown in Table 2.
The cohesion tree of the original data shows significant nodes at levels 1, 3, 8,
14 and 17, the most significant being the one at level 1 (see Fig. 3).
Fig. 3. Cohesion tree of items in the questionnaire.
Here we discover again that the algebra problems P17a, P17b, P15 and P16
are linked in the same class, but now the chain of implications given by the
implicative analysis shows new nuances in the cohesion tree: [P17b → (P17a
→ P16) → P15]. This class puts together, not symmetrically, two translation
tasks, one in graphical language and another one in formal language.
At level 14 we find items from implicative subgraph B. The cohesion analysis shows
a stronger relation between the a- and b-parts of each of the problems P13 and
P14 than the implicative analysis did.
Now, the introduction of fictitious students leads to a similar cohesion tree,
where only one new level of aggregation (level 11) is significant. Cohesion is
strong: it equals 1 until level 15 and remains greater than 0.99 at the lowest
significant level. We display in Fig. 4 the cohesion tree with the relabeled items
(see Table 2) and the fictitious students appearing in the contribution and
typicality optimal groups of each significant node.
If we consider Fig. 4 as the compilation of all the provided information,
we can assess the existence of three classes, T, O and M, M being the most
significant, not only because of the significance of the meta-rule relating all
its items, but also because of the significance of the included rules.
Fig. 4. Cohesion tree of the questionnaire using fictitious students (A: Algebra, C:
Calculus, e: exercise, p: problem, g: graphic). Fictitious students are marked when
they belong to the optimal groups regarding contribution and typicality (the latter
between parentheses); φ indicates that no fictitious student belongs to the optimal
group. Items are labeled as shown in Table 2.
Class M establishes a meta-rule characterised by the need for modelisation. It
comprises complex questions, which require the interpretation of the context and
the knowledge of mathematical models in several frameworks (graphical, algebraic,
functional, ...). It establishes a dissymmetry between two meta-rules, also
significant and already analysed: [P14b → P14a] → [(P13a → P13b) → P10], which
specifies rules between abilities and high-level (non-algorithmic) knowledge, and
[(P17a → (P17b → P16)) → P15], which reveals relations between the graphical and
algebraic registers. We have found the fictitious students Algebra and problem in
the optimal groups for the contribution to the formation of classes. Also, the
presence of the student Calculus in [P14b → P14a] allows us to state that the
ability in the mathematical modelisation of problems implies the ability in its
resolution technique. This confirms the characterisation we have given to class M,
strengthened by the presence of the students Algebra and problem in the optimal
group of typicality of the class and of all its significant subclasses.
Class T groups together practically all the items which can be solved with
algorithmic techniques. It also includes some rules determined by the presence
of Calculus and exercise as students in the optimal groups for the contribution.
Here it is not the type of knowledge of the item that is relevant, but the
technique applied to solve it.
Class O is determined by contingency: no fictitious student appears
among the 91 individuals contributing to the formation of this class. Calculus
appears only in the optimal group for the contribution to the rule P8 → P3,
together with 93 more students, but it disappears at the following level.
Taking into account the new fictitious students and the corresponding item labels,
we observe that the similarity tree is practically the same as the one built
from the original data; therefore, only one tree is shown in Fig. 5.
Table 3. Redefinition of types of knowledge and task of items in the questionnaire
(new fictitious students).
Type of task:
• Requires modelisation (m): P8, P10, P12, P13a, P15, P17b
• Requires interpretation (i): P6a, P6b, P8, P10, P12, P13b, P14b, P16
• Application of technique (t): P1, P2, P3, P4, P5, P7, P9, P11, P14a, P17a, P17b
Type of knowledge:
• Function (F): P2, P6a, P6b, P8, P10, P12, P15, P16, P17b
• Equation (E): P3, P4, P5, P8, P9, P10, P11, P13a, P13b, P14a, P14b, P16, P17b
Then we use the analysis including the fictitious data in order to draw new
conclusions. Three classes are identified, respectively dubbed C1, C2 and C3.
Class C1 contains the levels of aggregation with the highest statistical
significance and shows a class of items characterised by application tasks in a
particular context (what is commonly referred to as a problem and labelled 'p'),
even if the nature of the application varies: use of an algorithm (P1, P3, P14a),
contextual interpretation (P10, P13b, P14b), mathematical modelisation (P13a) or a
combination of them (P8). Class C2 may be the result of chance, since it does
not contain significant levels. Finally, class C3 gathers items concerning the
graphical nature of functions or their operations.
The inspection of optimal groups of students for both the contribution and
the typicality leads to the information shown in Fig. 5.
We can state that, in the classification analysis, fictitious students contribute
to the formation of most subclasses only at the first levels of aggregation,
because of the heterogeneity of the final classes. However, they belong to the
typicality optimal groups of all subclasses and classes.
The cohesion tree of the original data obtained under the entropic theory
(Fig. 6, left) shows a weaker structure: isolated items and binary rules are
abundant, and only one meta-rule is present. Nevertheless, it is practically a
simple rule, since the premise of the implication involves two items which are
parts 'a' and 'b' of a double question of the questionnaire. This cohesion tree
does not seem to show relevant information. However, the consideration of
fictitious students (Fig. 6, right) adds structure to the data: it reduces the
number of isolated items while the number of meta-rules rises, involving items
belonging to class C1 of the similarity tree.
Table 4. Relabeling of items following the new fictitious students: each item is
described by (1) its item number (now with prefix a or b if it had that suffix in
the initial labeling) and (2) a series of letters expressing features of the item
(E: Equation, F: Function, X: both Equation and Function, e: exercise, p: problem,
g: graphic, m: mathematical modelisation, t: algorithmic technique, i: interpretation
or judgment).
Fig. 5. Similarity tree of the questionnaire using the new fictitious students (F:
Function, E: Equation, t: technique, i: interpretation, m: modelisation), marking
them when they belong to optimal groups regarding the contribution and typicality
(the latter between parentheses). φ is used to indicate that no fictitious student
belongs to the optimal group.
Fig. 6. Cohesion tree under entropic version without (left) and with (right) fictitious
students.
Fig. 7. Cohesion tree of the questionnaire using the new fictitious students (F:
Function, E: Equation, t: technique, i: interpretation, m: modelisation), marking
them when they belong to optimal groups regarding the contribution and typicality
(the latter between parentheses). φ is used to indicate that no fictitious student
belongs to the optimal group. Items are labeled as shown in Table 2.
To finish the study, we show the comparative implicative graphs. The entropic
formulation is stricter than the classical one regarding the formation of rules,
so we lowered the implication threshold in CHIC to 0.90 (see Fig. 8). We observe
that both graphs keep a low number of implications at a threshold of 0.99 (the one
used under the classical version). The comparison with the classical setting, in
which variations in the implication significances and
Fig. 8. Implicative graphs under the entropic version without (left, threshold = 0.95)
and with (right, threshold = 0.90) fictitious students. Arrows are drawn plain (90%),
barred (95%) and double-barred (99%) according to the intensity the implications
exceed.
5 Conclusions
The description of the method involving fictitious students in the different
analyses performed with the software CHIC has allowed us to point out pieces of
information which are complementary to those resulting from classical descriptive
statistics and from SIA, and which we summarise as follows:
1. We have ascertained, rather generally, that the introduction of 5 fictitious
students into the initial data matrix has not substantially altered the structures
produced by the different SIA analyses; only slight modifications appear, showing
a logical but low sensitivity to a small number of added individuals (5 out of 690
in our case). We nevertheless advise potential users to check the size of the
changes in the global analyses before drawing conclusions. This stability of the
SIA results legitimates, to our knowledge, the use of the extended matrix in order
to interpret the results of the original data through the fictitious students. In
that sense, the slight variations between the analyses provide positive feedback on
the methodology used.
2. Fictitious students, playing the role of students belonging to the optimal
group of students regarding either the contribution to the formation of classes or
the typicality within each class, help to explain the features of significant
classes resulting from the classification and cohesion analyses.
In the first part, using the classical theory of implication, version 3.7
of the software CHIC and the a priori matrix made of the fictitious students
Calculus, Algebra, exercise, problem and graphic, we have been able to see
that:
References
1. R. Agrawal, T. Imielinsky, and A. Swami. Mining association rules between sets
of items in large databases. In Proc. of the 1993 ACM SIGMOD international
conference on Management of data. ACM Press, 1993.
2. A. Bodin. Modèles sous-jacents à l’analyse implicative et outils complémentaires.
Cahiers du séminaire de didactique de l’IRMAR de Rennes, 1996.
3. G. Brousseau and E. Lacasta. L’analyse statistique des situations didactiques.
In Actes du Colloque Méthodes d’analyses statistiques multidimensionnelles en
Didactique des Mathématiques, ARDM, pages 53–107, 1995.
4. C. Fonseca, J. Gascón, and P. Orús. Las organizaciones matemáticas en el
paso de secundaria a la universidad. Análisis de los resultados de una prueba
de matemáticas a los alumnos de 1º de la UJI. In Actas Jornadas de la CV,
Universitat Jaume I, 2002. Societat d’Educació Matemática de la C.V.
5. R. Couturier. Traitement de l’analyse statistique dans CHIC. In Actes des
Journées sur la Fouille de Données par la Méthode d’Analyse Statistique Im-
plicative, pages 33–50, IUFM de Caen, 2000.
6. R. Couturier and R. Gras. Introduction de variables supplémentaires dans une
hiérarchie de classes et application à CHIC. In Actes des 7èmes Rencontres de la
Société Francophone de Classification, pages 87–92, Nancy, 1999.
7. J. David, F. Guillet, V. Philippé, and R. Gras. Implicative statistical analysis
applied to clustering of terms taken from a psychological text corpus. In Con-
ference International Symposium Applied Stochastic Models and data Analysis,
AMSDA, Brest, 2005.
Appendix
The implication index
\[
q(a,b) \;=\; \frac{n_{a\wedge \bar b} \;-\; \dfrac{n_a\, n_{\bar b}}{n}}{\sqrt{\dfrac{n_a\, n_{\bar b}}{n}}}
\]
expresses the gap between the theoretical and observed values assuming
independence between A and B. This value is called the implication index in spite
of being an indicator of non-implication, since it measures the size of the set of
counterexamples.
Now, the intensity of the implication a → b is denoted by ϕ(a, b) and is defined
as the probability that, under the hypothesis of independence between a and b, the
random number of counterexamples exceeds the observed one.
Whenever the Gaussian approximation fits, and therefore the use of the
implication index q(a, b), an approximate value of the implication intensity is
\[
\varphi(a,b) \;=\; 1 - P\bigl(Q(a,b) \le q(a,b)\bigr) \;=\; \frac{1}{\sqrt{2\pi}} \int_{q(a,b)}^{\infty} e^{-t^{2}/2}\, dt .
\]
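As a numerical companion to these formulas, here is a small sketch (Python, standard library only) that computes the implication index q(a, b) and the Gaussian approximation of the intensity ϕ(a, b) from the observed counts; the counts used in the example call are invented for illustration.

from math import erf, sqrt

def implication_index(n, n_a, n_not_b, n_a_not_b):
    """q(a,b): standardized gap between observed and expected counterexamples."""
    expected = n_a * n_not_b / n
    return (n_a_not_b - expected) / sqrt(expected)

def implication_intensity(n, n_a, n_not_b, n_a_not_b):
    """phi(a,b) = 1 - P(N(0,1) <= q(a,b)), using the Gaussian approximation."""
    q = implication_index(n, n_a, n_not_b, n_a_not_b)
    gaussian_cdf = 0.5 * (1.0 + erf(q / sqrt(2.0)))
    return 1.0 - gaussian_cdf

# Illustrative counts: 690 individuals, 300 verifying a, 200 not verifying b,
# and only 40 counterexamples (a and not b).
print(round(implication_intensity(690, 300, 200, 40), 4))

With far fewer counterexamples than independence would predict, the index q is strongly negative and the intensity is close to 1, which is how a quasi-implication a → b is detected.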
Let us show how the intensity ϕ(a, b) varies when a new individual x0 is
added to the sample, in the four different cases (see Table 5). When necessary,
we use the subscript 1 for values and random variables concerning the original
sample, and the subscript 2 for the respective values regarding the extended
sample.
and
\[
\begin{aligned}
P\bigl(\operatorname{card}(A_2 \cap B_2) > n_{a\wedge b}\bigr) - P\bigl(\operatorname{card}(A_1 \cap B_1) > n_{a\wedge b}\bigr)
&= \sum_{i=n_{a\wedge b}+1}^{\infty} \Bigl( e^{-\lambda_2}\,\frac{\lambda_2^{\,i}}{i!} - e^{-\lambda_1}\,\frac{\lambda_1^{\,i}}{i!} \Bigr) \\
&= e^{-\lambda_2} \sum_{i=n_{a\wedge b}+1}^{\infty} \frac{\lambda_2^{\,i} - e^{-(\lambda_1-\lambda_2)}\,\lambda_1^{\,i}}{i!} \\
&\ge e^{-\lambda_2} \sum_{i=n_{a\wedge b}+1}^{\infty} \frac{\lambda_2^{\,i} - \lambda_1^{\,i}}{i!}
\;\ge\; e^{-\lambda_1} \sum_{i=0}^{\infty} \frac{(\lambda_2-\lambda_1)\, i\, \lambda_1^{\,i-1}}{i!}
\;=\; \lambda_2 - \lambda_1 .
\end{aligned}
\]
P1 You buy a shirt for PTA 4000 with a 15% discount. How much should you
pay for the shirt?
P2 Find the solutions of the system of equations 2x + y = 1, 3x + 2y = 3.
P3 Represent the graph of the function t(p) = 4p − p^2.
P4 Calculate the derivative of the function f(x) = 5 / (3x − 2)^2.
P5 Calculate the definite integral (where x is the variable of integration and
a is a constant): ∫_1^3 2ax dx.
P6a When solving an equation you get to the expression 0 · x = 8; how do
you interpret this result?
P6b And how do you interpret it when you get to the expression 0 · x = 0?
P7 Calculate the least common multiple of 280 and 350.
P8 A firm is getting an income of I(x) = 50x − x^2 USD, where x represents
produced units, and it has expenses of C(x) = 38x + 20 USD. How many
units should it produce in order to get benefits?
P9 The functions f(x) = 3x^4 + x and g(x) = x^3 − 100x^2 tend to zero as x
tends to zero. Calculate the limit of the quotient f(x)/g(x) as x tends to zero.
P10 How would you compare the following job offers to distribute electoral
brochures? (a) You are paid a fixed amount of PTA 50,000 plus PTA 10
per delivered brochure. (b) You are paid a fixed amount of PTA 30,000
plus PTA 15 per brochure.
P11 Calculate the derivative of the following function with respect to the
variable x: f(x) = 8sx (where s is a real number).
P12 Can we consider both x = 4 and x = 36 as solutions of the equation
√(3x − 8) = 4 − √x? Provide arguments.
P13a The amount C(t) of water springing from a tap (in litres) is expressed
by an affine function with respect to time t (in seconds). If the water
gathered in the first second is 3 litres, 5 litres in the second one, and 7
litres in the third one, how much water is gathered at a general instant t?
P13b How much water is gathered in one hour?
P14a The sales of a product after t years from its commercial launch, V(t)
(in thousands of units), are expressed by the function V(t) = 30 · e^(−1.8/t).
Calculate the limit of V(t) as t tends to infinity.
P14b Interpret the previous result in terms of sales of the above-mentioned
product.
P15 Express in algebraic language the following sentence: "The product of
three consecutive odd numbers is 1287".
P16 At which points does the graph of the function f(x) = (x − 1)(x + 1)(x + 3)
cross the x-axis?
P17a Draw the curves x^2 + y^2 = 4 and y = 2 − x on the same coordinate axes.
P17b Find, algebraically, the points where the two curves intersect.
Identifying didactic and sociocultural obstacles
to conceptualization through Statistical
Implicative Analysis
1 Introduction
In 1744, Tatanga Mani, in his autobiography, "Indian Stoney", commented on
his education: "Oh yes! I went to the white man's school. I learned how to
read his schoolbooks, newspapers and the Bible. But I discovered in time that
this was not enough. Civilized people depend far too much on the printed
page. I turned to the book of the Great Spirit which is present in the whole
of creation. You can read most of this book by studying nature. You know, if
you take all your books and spread them out under the sun and leave
them for some time, the rain, snow and the insects will do their work, and not
much will remain. But the Great Spirit gave you and I the chance to study
at the university of Nature: the forests, the rivers, the mountains, and the
animals to whom we belong" [20, Mc Luhan, 1971 p.110 in Dasen 2001].
This piece of research is situated within the field that deals with the re-
lationship between culture and cognition [1, 26]. The subject matter takes its
3. Universidade Federal do Pernambuco, Recife (Brazil): a town situated in the
tropical zone of the southern hemisphere (8.03 S, 34.54 W).
Our concern in this paper is not to understand how competences are acquired,
but rather to understand how at a given time they are organized by a level
of conceptualization, a notion that underlies the idea of cognitive develop-
ment. Without calling into question Piaget’s model, our approach requires
a theoretical framework that integrates the idea of lifelong adult cognitive
more difficult. Consequently, for the observer, this level remains rather inac-
cessible.
In conclusion, in a given situation, the possibility of locating the various
representations which the subjects use to solve a problem can give us im-
portant information to understand their cognitive processing and their level
of conceptualization. In particular, they can show us the obstacles which pre-
vent certain subjects from passing from one type of representation to another,
which may be better adapted and more effective.
theory offers many examples of the obstacles related to the period of de-
velopment of the subject.
• Obstacles of epistemological origin. These are the obstacles identified
throughout the history of the conceptual development of a discipline. Some
learners’ difficulties may be obstacles which arise from this history. These
obstacles are met for example in the history of science, elements of which
may be observed in the spontaneous models of pupils in learning situa-
tions. An epistemological obstacle is then constitutive of incomplete learning.
This shows that it is not simply the result of a chance error which it would
suffice to correct, of ignorance which could be remedied, or of some other
unspecified incapacity. It can result from cultural, social and economic
conditions, but these causes are actualized in conceptions which resist
even when the causes themselves disappear.
• Obstacles which have an educational origin. Bachelard [4] speaks about
the teaching obstacle and Brousseau [5] introduced the concept of an ob-
stacle of didactic origin. As far as we are concerned, we consider that the
conceptualization of reality as well as the representations are constructed
through various types of learning which emphasize certain specific aspects
of reality, and which are themselves related to the particular culture in
which the learning takes place. These obstacles simultaneously come from
different sources: teaching, didactic and socio-cultural, and they depend upon a
model of teaching and learning.
The phases of the moon correspond to apparent changes of this satellite of the
Earth. These changes depend on the respective positions of the Earth and the
Moon in relation to the to the sun. Depending on the time of the month and
year, the sun’s light will shine on one or other part of the moon. When the
moon is between the sun and the Earth, the part lit by the sun is invisible,
giving the new moon. When the first zone becomes visible from the Earth, the
first crescent is visible. When the moon reaches its first quarter, a half-moon
is visible. When the moon is opposite to the sun in relation to the Earth, a
full circle, the full moon, may be seen. When the moon reaches three quarters of its cycle, it is the last quarter: one sees the other half lit by the sun. This lit part continues to decrease until the new moon returns. The cycle of the phases of the moon, called the lunar month, lasts twenty-nine and a half days. During its complete revolution around the Earth, the moon also
rotates on its own axis. This brief description (Fig. 1), shows the high degree
of conceptual complexity implied in this process of celestial and astronomical
mechanics. It requires taking into account the moon, the Earth and the sun,
but especially the relationship between three elements, which are in motion.
The moon turns on itself and around the Earth, which turns around the sun
and on itself. The position of the observer on the Earth as well as the time of
day should also be taken into account.
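As a minimal numerical illustration of this geometry (not taken from the chapter), the fraction of the lunar disc that appears lit can be approximated from the single Sun-Earth-Moon angle; the sketch below uses that standard approximation, with hypothetical angles for the four named phases.

```python
import math

def illuminated_fraction(elongation_deg):
    """Approximate fraction of the lunar disc that appears lit from Earth,
    given the Sun-Earth-Moon angle: 0 deg at new moon, 180 deg at full moon."""
    return (1 - math.cos(math.radians(elongation_deg))) / 2

for angle, phase in [(0, "new moon"), (90, "first quarter"),
                     (180, "full moon"), (270, "last quarter")]:
    print(f"{phase:13s} {illuminated_fraction(angle):.2f} of the disc lit")
```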
Throughout history, human beings have always wondered about the moon [15].
We cannot approach in detail all the questions raised here. A number of works
have treated this question, notably, The Moon, myth and image by Jules
Cashford [8]. One of the dominant representations of the moon appears as the
image of a lunar boat crossing the night sky. Figures (2, 3, and 4) show ancient
pictorial representations while the following figures (5, 6, 7 and 8) show visual
representations of the moon with which subjects may be confronted in their
actual daily life. Drawings (6, 7) are taken from comic strips produced in the
southern hemisphere respectively by Mauricio de Sousa (Fig. 6), in Brazil and
by Bernard Berger in New Caledonia (Fig. 7).
It is also necessary to add all the scientific photographs produced by satel-
lites or even by man’s direct visits, on several occasions, to the moon itself.
There are also those produced synthetically by computers.
We can find artistic representations of the moon's phases, such as figure 8.
The moon in all its apparent phases constitutes an object of the perceptive
experience of everyday life in early childhood. The observation of this object
Fig. 4. Lunar tree surrounded by lattice and torches (Harding 2001 p.93)
Fig. 6. Chico Bento (number 163, 1993) Mauricio de Sousa Editora GLOBO São
Paulo Brazil
constitutes then a first experience in the Bachelardian sense. The mental rep-
resentations that the subjects construct in this way, can constitute obstacles
of epistemological origin with which they will be confronted when it is a ques-
tion of understanding the phenomenon of the phases of the moon. From a
Vygotskian understanding, this observation leads to the spontaneous forma-
tion of daily concepts relating to the phenomenon of the apparent movement
of the moon and of its phases, and its link with the apparent movement of
the sun, with a weak use of language. These concepts are isolated from each
other and develop apart from any given system. They are temporally or lo-
cally relevant, and may also lead to generalizations which can be unwarranted.
Their conceptual weakness is manifested by an incapacity for abstraction or
an inaptitude for intentional use. What is characteristic is their incorrect use.
They are frequently saturated with the subject's rich personal experience and, as such, they make socio-cultural obstacles to conceptual development apparent.
Data resulting from discussions with eight adult illiterate subjects of rural
areas of Brazil illustrates our research. These subjects are characterized as all
having little contact with a written culture. Questioned on “how they see the
moon when it is not full ”, they provide answers giving clues to the nature of the
obstacles indicated in this article. Thus Maria, 60 years old, a cleaning lady,
from the city of the Sertão, Nordeste of Brazil, draws the moon B (Appendix
1) and explains that “it is like Lampião’s hat” (a famous character known
in that city). Nen, 40 years old, with the drawing of moon C (Appendix 1), explains that “it is like a smile”, and Neta, 35 years old, that “the moon is
presented in the form of a hammock”. These complementary data consolidate
Concerning scientific concepts, Vygotski postulates that they arise from indirect contact with the object and can be acquired only by a continuous process going from the general to the particular. They are formed under the
intentional action of the school, beginning with a teacher’s explanation, who
exposes a scientific formulation of the concept. Their weakness lies, on the one hand, in this verbalism, a principal source of the shifts which generate obstacles to conceptual development, and, on the other hand, in insufficient links to concrete experience and knowledge. However, the analysis of our research data corroborates Piaget's idea (1969) that a verbalism of the image also exists.
Through a survey carried out over the last five years among primary school
teachers still in training and those in secondary schools in initial or profes-
sional training at the University Teacher Training Institute (IUFM) of Lyon,
we collected data which enabled us to identify three main categories of ap-
proaches to teaching the question of the phases of the moon in their courses
in France. The first was described as a way of telling a fable or tale to ap-
proach the concept of the phases of the moon for young children. The second
was based on more technical elements calling upon mnemonic techniques that
they themselves had acquired during their own education and more related to
learning at secondary school. Finally, the third approach, which also targeted primary school teachers, was based on a model implying a scientific approach to observation similar to that of astronomers. In an article by Toussaint (1999) we found an extremely relevant and adequate description, similar to what we observed among the teachers' replies, related to the first two approaches.
associated with only one situation, it is restricted here to a single meaning and cannot give an account of all the representations which the subjects will have built elsewhere, through early experience, by looking at the moon in their childhood.
This story-telling approach does not give optimum conditions for learning
subjects to go beyond the everyday concepts relating to the phases of the
moon, to arrive at scientific concepts with the meaning given by Vygotski [35].
In addition the images given by the forms D and C constitute prototypical
representations of phases of the moon within Rosch’s meaning [25].
Here the question of the phases of the moon is introduced with more technical concepts. Toussaint [28] gives an account of it by drawing on its characteristics as experienced by a schoolboy:
“Later, I came across another rule which said that the moon (it was less
funny!) was no longer telling lies: by carefully drawing the diameter which
goes from one end to the other, one can write in tiny characters a P with the
first quarter and a D with the last”. We see that in this case a geometrical
concept appears — diameter — which can give the impression of a more
learned approach. But the use of diameter does not bring anything more than
the mapping of a lunar position with one of the two letters P and D, which are
mobilized only as signifiers, in a way identical to the first approach. Surprisingly, however, the teachers seem to regard this approach as requiring a higher level of
conceptualization. This is suggested by the fact that this approach was never
used with primary school pupils.
This less frequent third approach called upon systematic observations and the
use of a model with the pupils. We collected results from a teacher training
course on the didactics of physics. In this step, they were asked to identify the
representations of the phases of the moon, then to carry out systematic ob-
servations of the moon with which the representations are confronted. Finally
the trainee teachers are confronted with a mechanical model of the system Earth-moon-sun which reproduces the movements of these bodies.
Starting from this mechanical device, they are confronted with problems of
the type: in its various phases how does the moon appear from the Earth?
It appears that this model, in spite of its concrete material characteristics, allows only a limited understanding for trainee teachers, insofar as they subsequently approach the phases of the moon infrequently or not at all in their own teaching practice. The main argument they use relates to the high cost of such a device in a teaching-learning situation. However, it is clear that this approach, as “costly” as it is, requires an active process of conceptualization.
To begin with, we were struck by the fact that the Brazilian students, faced with the instruction to draw the moon as they saw it from the subequatorial tropical zone, systematically produced drawings with a typical representation of the moon as it is more frequently observed from the mainland French sky. The
great majority presented the moon in the shape of a “crescent” (Fig. 6) facing
the right and a lower proportion in the shape of a “crescent” facing the left
(Fig. 5). The contradictions of these two categories of response give rise to
a genuine situation of socio-cognitive conflict. However by remaining on this
level of exchange, the nature of the responses built on the basis of first-hand
experience and on the tools provided by the cultural environment and the
written culture did not offer the conditions for a significant rise in the level
of conceptualization. To modify these conditions, the subjects were then in-
vited to directly observe the moon and to compare their perception with their
drawings. The result then seemed to provoke a real cognitive destabilisation
and a desire to understand the situation from a conceptual point of view.
Within the context of psychology teaching, this didactic situation carried out
on the phases of the moon led the students to analyse the school textbooks
to pinpoint the role and the place of iconic representations in scientific learn-
ing. Piaget himself [23, p.110] considered that: “image, film and audiovisual
processes which all pedagogy keeps harping on about today, wanting to give the illusion of being modern, are invaluable auxiliaries as an addition or as a spiritual crutch, and it is obvious that this is a progress compared to purely
verbal teaching. But there is verbalism of the image just as there is verbalism
of the word”.
This activity had led the students to verbalise their awareness that the
responses initially provided were determined by school learning which, in
the name of immediate efficiency, favours excessive simplification and reduces learning to the memorization of the signifiers without working on the concepts to which they are attached. This perspective is reinforced, in the cultural en-
vironment, by the graphic representations of the moon that we find in the
media, in comics (Fig. 6 and Fig. 7), in advertisements, etc.
When we transferred the situation to France, to the Teacher Training Institute (Institut Universitaire de Formation des Maîtres) in Lyon as well as to the university (University Lyon 2), we modified the procedures. The situation
problem given to the students and the trainee teachers was not based on the
request to draw the moon, but on a story told as follows: “In Brazil, we asked students to draw the moon as they saw it in their sky. Because they drew the moon as a vertical crescent or a slightly tilted one, facing the right for most of them or the left for the others, I asked them to observe it directly by looking at the sky. To raise the stakes, I told them I would pay for a coffee for anyone who saw it as he had drawn it”. To end the story we
added: “Nobody came to claim the offer ”, and we then addressed the following
question to the French students: “Why was this the case?”
Using a qualitative approach, the analysis of the responses obtained led us to differentiate five main categories. The subjects were allowed to give several answers in their arguments and also to change the meaning of their answer after exchanging views with their fellow students.
The first category [CAT1] corresponds to responses overly determined by
the socio-cultural dimension. In this category we found the following: “They
did not dare to do so because you are married” or “. . . you are a professor” or
“ . . . your husband is jealous”, and also “Because they watched the moon lying
in a hammock” etc.
The second category [CAT2] corresponds to responses overly determined
by the personal characteristics of the subjects questioned. For example: “They
disliked coffee, or pubs” etc.
The third category [CAT3] corresponds to responses overly determined by
local conditions, contexts, or circumstances. We found for example: “they were
living downtown and could not see the sky so well” or “there was fog” etc.
The fourth category [CAT4] corresponds to answers overly determined by
the exact knowledge of the situation-problem. For example: “I cannot say, I’ve
never been there” or “I went there, and obviously the moon is not like that”
etc.
The fifth category [CAT5] corresponds to responses overly determined by
school knowledge. We found for example: “I believe it has something to do
with shadows, and light” or“. . . with the moon, the sun, and the earth and the
hemispheres” etc.
As said in the introduction, Luria [21], studying illiterate Uzbek farmers without any written culture, observed similar results. The question is then: are the facts collected here merely anecdotal, representing the particular behaviour of subjects confronted with a scholarly situation-problem even though they are at university level during their initial training?
Alternatively, do these facts reveal socio-cultural obstacles, impeding the
rise of the level of conceptualization, with pedagogic and/or didactic origins
that could be found in the school institution, or with epistemological origins
that are linked to the development of the concept? In order to explore such
hypotheses further we designed a questionnaire aimed at investigating mental
representations concerning the moon phases and we conducted a survey on
subjects from various geographical and socio-cultural contexts.
Questionnaire-based survey
This questionnaire-based survey was carried out, along with observations obtained through empirical methods, respectively in metropolitan France (Lyon), in Noumea (New Caledonia) and in Recife (north-east Brazil), over a long period from 2001 to 2004. The overall sample is composed of 198 subjects. Note that the individuals of the Ech_IUFM sample were only submitted to the two questions Q1 and Q2.
The first part of the questionnaire consists of 7 questions which provide infor-
mation relating to the subjects such as sex, age and the length of professional
experience. The variable SEX corresponds to the vector variable (MALE;
FEMALE) whose two components are additional binary variables. We also
introduced a “place of residence” variable whose detailed form corresponds to
the membership of a “usual cultural area: Kanak” in New Caledonia. However
we restricted it here to the couple of additional binary variables (HN; HS)
indicating respectively the northern and southern hemispheres. The variable
“Age” was modelled by the couple of additional binary variables (Child; Adult)
The question Q1 concerning the recognition or not of the shapes and ap-
parent positions of the moon is represented by a variable vector with 12 binary
components (1Asim; 1Anao; 1Anr; 1Bsim; 1Bnao; 1Bnr; 1Csim; 1Cnao; 1Cnr;
1Dsim; 1Dnao; 1Dnr). We coded the absence of an answer as xxnr.
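As an illustration of this coding scheme, the sketch below shows how one subject's answers to Q1 could be turned into the 12 binary variables; the function name and the input format are hypothetical, not part of the survey instrument.

```python
def code_q1(answers):
    """Code a subject's Q1 answers for the four shapes A-D into the 12 binary
    variables 1Asim ... 1Dnr (sim = yes, nao = no, nr = no response)."""
    coded = {}
    for shape in "ABCD":
        value = answers.get(shape)              # "yes", "no" or missing
        coded[f"1{shape}sim"] = int(value == "yes")
        coded[f"1{shape}nao"] = int(value == "no")
        coded[f"1{shape}nr"] = int(value is None)
    return coded

# A subject who recognizes shapes A and D, rejects B and skips C:
print(code_q1({"A": "yes", "B": "no", "D": "yes"}))
```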
The questions Q2 and Q3, which urge the subject to adopt another point of view, are each represented by a variable vector with 15 binary components:
Results (Table 3) relating to Q1 (Did you already see the moon like that
in reality?) confirm the use of prototypical memory (within the meaning of
Rosch [25]) in the evocation and the recognition of these lunar forms. According to Eleanor Rosch, among all the possible levels of abstraction, one is psychologically more accessible than the others: the so-called “basic level”, which makes it possible for the individual to obtain the maximum of information with the minimum of cognitive effort. It is a compromise between the most abstract possible level and one which offers, at the same time, a sufficient number of concrete attributes. Thus the figures 1A (86.36%) and 1D (80.81%), “crescent directed vertically”, appear to be prototypical figures independently of the group studied (χ2 test, Appendix 4), with a prevalence of 1A, “horns directed towards the line”. On the other hand, the figure 1B (29.29%), “horizontal crescent turned
⁴ SPAD_T: Système Portable d'Analyse des Données Textuelles, CISIA-France.
Initially, we studied the data built on the 27 principal binary variables (Appendix 2), starting from the sample of 198 individuals, of whom 28 lived in the northern hemisphere and 170 in the southern hemisphere. We leave to a second publication the exploitation of the 67 basic variables (Appendix 2, Appendix 3) on the sample of the 170 individuals of the southern hemisphere. Here, in fact, we studied the 27 variables which relate to the whole of the total sample.
with which they are confronted daily at their age and, in particular, figures
that they meet in their textbooks or even in children’s books.
Let us explore the implicative graph built from the 27 binary variables instantiated on the total sample of 198 individuals. With a confidence level of 0.99, we identify 7 implicative chains (Fig. 11). Five comprise only 2 terms.
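For readers unfamiliar with how such a graph is obtained, the following sketch shows the classical SIA computation of an implication index and intensity for one pair of binary variables, under the usual Gaussian approximation; CHIC's exact options may differ, and the counts used in the example are hypothetical.

```python
import math

def implication(n, n_a, n_b, n_a_not_b):
    """Implication index and intensity of the quasi-rule a => b.

    n         : number of subjects
    n_a, n_b  : numbers of subjects verifying a and b
    n_a_not_b : counter-examples (a true, b false)
    """
    expected = n_a * (n - n_b) / n                  # counter-examples under independence
    q = (n_a_not_b - expected) / math.sqrt(expected)
    # Intensity under the usual normal approximation (Phi = standard normal CDF).
    intensity = 1 - 0.5 * (1 + math.erf(q / math.sqrt(2)))
    return q, intensity

# Hypothetical pair of binary variables over the 198 subjects:
q, phi = implication(n=198, n_a=120, n_b=150, n_a_not_b=10)
print(f"index = {q:.2f}, intensity = {phi:.3f}")    # kept in the graph if phi >= 0.99
```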
Chain (Chx)                     (CH1)   (CH2)   (CH3)   (CH4)    (CH5)   (CH6)   (CH7.1)  (CH7.2)
                                Questions Q1                     Questions Q2
Most typical variable           Child   Male    Male    Child    HS      Child   Female   HN
Risk incurred (most typical)    0.149   0.117   0.278   0.0952   0.224   0.112   0.194    0.0355
Most contributive variable      Child   Male    Male    Child    HS      Child   Female   Male
Risk incurred (most contrib.)   0.149   0.117   0.278   0.0952   0.224   0.1     0.236    0.149
Table 5. Contributions and typicality of the additional variables
In this classification, we find the classes which were formed in the approach
by similarity. The formed R-rules are organized in a structure which confirms
the properties of the prototypic representations and the obstacles that they
induce such as we identified them through the analysis of the implicative
graph.
The R-rules are conceived as an extension of the binary rules (A) ⇒ (b) to rules between rules. They are assigned a degree d°R calculated according to the
Class                        C1_lev21 (5)  C2_lev20 (4)  C3_lev17 (4)  C4_lev9 (5)  C5_lev16 (4)  C6_lev14 (5)
Degree of R-rule             4             3             3             4            3             4
Coherence                    71/120        20/24         20/24         91/120       20/24         115/120
                             ≈ 0.5916      ≈ 0.8333      ≈ 0.8333      ≈ 0.7583     ≈ 0.8333      ≈ 0.9583
Most typical variable        MALE          HS            CHILD         CHILD        HN            FEMALE
Level of the incurred risk   0.278         0.224         0.0334        0.1          0.0355        0.229
Most contributive variable   FEMALE        HS            CHILD         CHILD        HN            FEMALE
Level of the incurred risk   0.315         0.365         0.0361        0.1          0.0833        0.236
while the order resulting from the occurrences gives (2Dnao, 2Bsim, 2Csim, 2Anao, 2Sim). These two permutations reveal one inversion, whence the coherence O(C6) ≈ 0.9583. The FEMALE characteristic contributes the most and remains the most typical of the C6 class, as for chain (Ch7.1). At this level of information, we do not have a relevant interpretation of this class in relation to the contribution and the typicality of the female group.
The C5 class, incorporated at level 16, corresponds to an R-rule of degree 3 and is associated with the sub-chain (Ch7.2a) [2Asim] ⇔ [2Cnao] ⇒ [2Bnao] ⇒ [2Dsim]. Its composition still reflects the association of the prototypical figures on one side and the non-prototypical ones on the other. Taking into account the characteristics of the subjects through the additional variables enables the following interpretation: the most contributive characteristic, which is at the same time the most typical, is membership of the northern hemisphere. In fact, because of the composition of the sample, this group coincides with the group of the trainee teachers of the IUFM of Lyon. These subjects, the most literate of the total sample, clarify through their answers to question Q2 their representations, which are organized around the idea that the “other”, located in the southern hemisphere, cannot see the “same moon”. Their training seems to lead them, through a metacognitive distance, to reject the prototypical figures under a particular condition, namely being located in the other hemisphere. This university training specific to teaching professionals could play a part in this distance, as we suggested (2.6.3).
The C3 class, incorporated at level 17, presents interesting results: on the one hand, it is the combined reflection of the rejection of the forms 1B and 1C and of the attraction of the forms 1A and 1D; on the other hand, it appears that its composition is largely determined by the EchNC_Enf sub-group (Table 6).
This property goes in the direction we pointed out from the analysis of the similarities. The C4 class, incorporated at level 9, is also strongly influenced by the response behaviour of the CHILD subjects (Table 6). This oriented class stems from a source rule (2nao ⇔ 2Anr) which expresses the fact that answering Q2 negatively entails, as a logical consequence, a cascade of coherent non-responses. This property is almost tautological. However, what draws our attention is the dominant contributive share of the CHILDREN sub-group. Indeed, the refusal to believe that a virtual observer can see these moons in the opposite hemisphere evokes an obstacle of ontogenetic origin as much as an obstacle of didactic origin.
4 Conclusion
How has our methodological reasoning led to the identification of didactic or sociocultural obstacles? From the point of view of dealing with the statistical data, SIA has enabled us to establish clear links between binary variables and between class variables, the analysis of which reveals the importance of the role of didactic obstacles and of both the school and the extra-school graphic environment. From this point of view again, resorting to statistical implicative analysis with the support of the CHIC software enabled us to experience its practicalities and to pursue reflection on the issues specifically related to statistical modelling.
From the point of view of psychology we move away from the central
role of the individual to take into account conceptualization. In this respect,
Bruner [6] wrote: “we were psychologists conditioned by a tradition that put the
individual first. However, the symbolic systems that people use to build mean-
ing are already installed, they are already “there” deeply ingrained in culture
and language”. In this piece of research, we observed that specific symbolic representations emphasize specific aspects of the concept and give rise to potential didactic and/or socio-cultural obstacles. These symbolic representations are presented in temporal synchronism with natural language.
Therefore we have spoken here of a whole comprised of linguistic and non-linguistic signifiers that are engaged in teaching-learning situations. These situations rely on the interaction between the various symbolic systems which support understanding and give rise to specific problems: the type of learning, the level of conceptualization and also the very nature of the concept, as described by Vygotski in his opposition between everyday concepts and scientific concepts. Vergnaud's theory of conceptual fields, which takes into consideration the learning context and the characteristics of the learning method, appears to be of greater interest for the interpretation of our results than the dual opposition proposed by Vygotski. Indeed, it is clear from our results that school situations and non-school situations should be differentiated since they bring into play distinct focuses and levels of awareness.
References
1. N.M. Acioly. A logica matemática no jogo do bicho: compreensão ou utilização
de regras? Master’s thesis, Université Fédérale de Pernambuco, Recife, Brazil,
1985.
2. N.M. Acioly. LA JUSTE MESURE : une étude des compétences mathématiques
des travailleurs de la canne à sucre du Nordeste du Brésil dans le domaine de
la mesure. PhD thesis, Université René Descartes, Paris V, 1994.
3. N.M. Acioly-Régnier. Analyse des compétences mathématiques de publics
adultes peu scolarisés et/ou peu qualifiés. In Illettrismes : quels chemins vers
l’écrit?, Les actes de l’université d’été du 8 au 12 juillet 1996, Lyon, France,
1997. Ed. Magnard.
4. G. Bachelard. La formation de l’esprit scientifique. Paris Lib. Vrin, 1938/1996.
5. G. Brousseau. Les obstacles épistémologiques et les problèmes en mathéma-
tiques. Recherches en Didactique des Mathématiques, 4(2):165–198, 1983.
6. J. Bruner. . . . Car la culture donne forme à l’esprit : de la révolution cognitive
à la psychologie culturelle. Paris : Editions Eshel, 1991. Original title: Acts of
Meaning, Harvard University Press.
Acknowledgments
With thanks to Tim Evans, trainer, for his linguistic skills in his mother-
tongue, and to Pascale Montpied, researcher at the CNRS, whose skills in
English made several useful contributions. Without their help, the English
version of this article would not have been possible.
Appendix 1
NB: The codes of the binary variables are to be found in the questionnaire
[xxx]. The following table is reduced because of space. For Q1, Q2, Q3 and
the first part of Q6, a non-response is coded by the binary variable [xxnr]
Questionnaire
Q1.
Answers
A-Yes A-No B-Yes B-No C-Yes C-No D-Yes D-No
[1Asim] [1Anao] [1Bsim] [1Bnao] [1Csim] [1Cnao] [1Dsim] [1Dnao]
Fig. 13.
Q2.
If someone (from the other hemisphere) tells you that he/she has
never seen any of these moons in reality would you believe that
person? ( ) YES[2sim] ( ) NO [2nao]
If YES: who? ()A[2Asim]/[2Anao] ()B[2Bsim]/[2Bnao] ()C[2Csim]/[2Cnao]
D[2Dsim]/[2Dnao]
Why? If NO, Why not?
Q3.
Q4.
Q5.
diameter. If you get a small letter p (premier) it's the first quarter; if you get the small letter d it's the last (dernier) quarter”.
What do you think of this technique? Is it:
( ) adapted [5ADAP] ( ) inadapted [5INAD]
( ) efficient [5EFFI] ( ) inefficient [5INEF] Why?
Fig. 14.
Q6.
A teacher explained that the way the phases of the moon are perceived depends on the hemisphere. For example, the moon is not seen in the same way from the southern as from the northern hemisphere. Do you agree with this point of view? ( ) YES [6sim] ( ) NO [6nao] Why? If yes: How is the moon seen
from these three places? From the northern hemisphere, the southern hemi-
sphere or the equator. [Use HS (South Hemisphere); HN (North Hemisphere)
or E (Equator) to describe the corresponding representations of the moon.]
Representations of the Moon
Hemisphere   HN: [6A_HN] [6B_HN] [6C_HN] [6D_HN]
             E:  [6A_E]  [6B_E]  [6C_E]  [6D_E]
             HS: [6A_HS] [6B_HS] [6C_HS] [6D_HS]
Fig. 15.
Appendix 2
Appendix 3
Appendix 4
Pitfalls for Categorizations of Objective Interestingness Measures
Einoshin Suzuki
Summary. In this paper, we point out four pitfalls for categorizations of objec-
tive interestingness measures for rule discovery. Rule discovery, which is extensively
studied in data mining, suffers from the problem of outputting a huge number of
rules. An objective interestingness measure can be used to estimate the potential
usefulness of a discovered rule based on the given data set and thus hopefully serves as a countermeasure to circumvent this problem. Various measures have been proposed, resulting in systematic attempts to categorize such measures. We believe that such
attempts are subject to four kinds of pitfalls: data bias, rule bias, expert bias, and
search bias. The main objective of this paper is to issue an alert for the pitfalls
which are harmful to one of the most important research topics in data mining. We
also list desiderata in categorizing objective interestingness measures.
Key words: data bias, rule bias, expert bias, search bias, objective interestingness
measure, rule discovery
1 Introduction
Rule discovery is one of the most extensively studied research topics in data
mining as shown by the proliferation of discovery methods for finding useful
or interesting (i.e. potentially useful) rules [3, 5, 8, 29, 34–38]. Such methods can be classified as either objective or subjective [32]. We define that an objective
method evaluates the interestingness of a rule based on the given data set
while a subjective method relies on additional information typically provided
by the user in the form of domain knowledge.
A subjective method often models interestingness more appropriately than
an objective method but cannot be applied to domains where little or no
additional information is available. A subjective method is also prone to
overlooking useful knowledge due to inappropriate use of domain knowledge.
An objective method is free from these problems and poses no cost to the
user for preparing subjective information. In the objective approach, many
The objective of rule discovery is to obtain a set Π of rules given data D and
additional information α. Here α typically represents domain-specific criteria
such as expected profit or domain knowledge and an element of Π represents
a rule π. As we focus on the objective approach, we assume α = ∅ in this
paper.
The case in which D represents a table, alternatively stated “flat” data, is
most extensively studied in data mining. On the other hand, the case when
data D represent structured data such as time-series data and text data typ-
ically necessitates a procedure for handling such a structure (e.g. [10, 22]).
In order to focus on the interestingness aspect, we limit our attention to the
¹ Intensity of implication is an excellent measure which can take the size of the data set into account. We, however, explain the J-measure here because intensity of implication is explained many times in this book.
² These assumptions are not necessary but help the understanding of a wide range of readers.
j(X; Y = y) = \sum_{x} P(x \mid y) \log_2 \frac{P(x \mid y)}{P(x)}    (1)
            = P(x \mid y) \log_2 \frac{P(x \mid y)}{P(x)} + P(\bar{x} \mid y) \log_2 \frac{P(\bar{x} \mid y)}{P(\bar{x})}    (2)
(2) is derived due to the nature of rule discovery, i.e. it suffices to consider the events of the conclusion X = x and its negation X = x̄. In [34], the
interestingness of the rule Y = y → X = x is evaluated with the average
information content of the rule under the name of the J-measure J(X; Y = y).
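A minimal sketch of this computation is given below; the probabilities are assumed to be already estimated from the data, and, as usually defined, the full J-measure additionally weights j(X; Y = y) by the premise probability P(y).

```python
import math

def j_value(p_x, p_x_given_y):
    """j(X; Y = y) of equations (1)-(2): information gained about X = x
    and its negation when the premise Y = y holds."""
    total = 0.0
    for p, p_cond in [(p_x, p_x_given_y), (1 - p_x, 1 - p_x_given_y)]:
        if p_cond > 0 and p > 0:            # a 0 * log(0) term contributes nothing
            total += p_cond * math.log2(p_cond / p)
    return total

# Hypothetical rule: P(x) = 0.3 overall, P(x|y) = 0.8 when the premise holds.
print(f"j = {j_value(0.3, 0.8):.3f} bits")
```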
assess the ratio of the generated tables to all possible tables. In any case, it is impossible to know the rule set to be evaluated in practice. [9] selects 9 rules out of all rules to be shown to a domain expert for each data set in order to investigate the subjective interestingness of the domain expert³: 3 rules with the lowest rank, 3 rules in the middle rank, and 3 rules with the highest rank for each interestingness measure. This method serves to discriminate 3 typical situations, but questions remain as to whether it makes sense to choose only 9 rules out of hundreds or thousands of discovered rules. It seems impossible to avoid rule bias in the categorization, but we will show several countermeasures for this problem in section 5.
Some of the categorization studies ask domain experts to evaluate the degree of interestingness of discovered rules from their subjective viewpoints and try to obtain the ground truth of interestingness measures. We agree that the subjective interestingness of domain experts is more important than objective interestingness in data mining, but warn that the results depend on the domain experts. For instance, [28] asks a single medical expert to evaluate discovered rules. As the authors point out, any conclusions derived in the paper might be specific to the domain expert. We have collaborated with medical experts, including this expert, in related mining problems and have observed many discrepancies of opinions among the domain experts [19–21].
It is well-known that domain experts tend to have different opinions for
a non-trivial technical problem, for which rule discovery is expected to be
effective. For instance, [33] explains a situation in which domain experts have
different opinions in classifying volcanoes on Venus and proposes a method for
calculating the minimal probability that domain experts are wrong. It seems that resolving such discrepancies is impossible: that is why the domain experts are conducting research.
³ We will see this issue in section 4.3.
The authors of [9] point out the weaknesses of [28] and employ eight data sets in their experiments. They clearly show that the performance of objective rule interestingness measures varies across the eight data sets that they employed. Moreover, [9] tries to neutralize the effects of rule discovery methods by employing five classification algorithms. Due to the numerous settings, however, they could only employ one domain expert for each data set, and in each case the domain expert evaluated nine rules (i.e. three rules with the lowest rank, three rules in the middle rank, and three rules with the highest rank) for each interestingness measure. It is obvious that [9] alleviates some of the problems of [28] but could not escape from the danger of giving false conclusions to the reader. We will show several countermeasures for this problem in section 5.
We are aware that most of the papers that we cited as examples of the cat-
egorization research have another objective. For instance, the main objective
of [18] is to develop a tool for selecting the right interestingness measure with
clustering. It is highly recommended to consider the context in which the ex-
periments were performed and not to take a categorization result of objective
rule interestingness measures as a truth. In other words, we should respect the
context of the paper. However, some readers might take a categorization result of interestingness measures as a truth. The countermeasure is to emphasize the true objective of the paper and to state that a categorization result represents an example under some conditions.
We have a comment on [3] which is often criticized by researchers in the
“interestingness” community for its two overly simple measures i.e. support
and confidence. The main objective of [3] is to develop fast algorithms for disk-
resident data. As typical researchers in the database community, the authors
assume that the user will select his/her interesting rules with queries, so that support and confidence serve as indexes for a pre-screening. In this sense, [3] is another example showing that we should respect the true objective of a paper.
We are aware of this fact and respect the categorization papers as they have
other objectives.
Any paper in the categorization research should describe how much it avoided the four biases when presenting experimental results in an objective manner. For instance, generating the rules to be evaluated randomly is a countermeasure for the rule bias. In such a case, the authors should state to what extent they explored the possible kinds of rules and under what assumptions (e.g. an equal prior for all rules) they generated them. Another typical example is to test various values for parameters in order to avoid search bias. For example, a typical clustering method necessitates the specification of several parameters such as the number of clusters and a threshold for terminating the search. Authors who employ clustering in a similar manner to [18] should state what kinds of values they explored and under what assumptions.
Data mining, as well as related research fields including machine learning and artificial intelligence, copes with ill-structured problems, which have no clear solution. A typical first step for such kinds of problems is to accumulate empirical evidence by analyzing various experimental results and then draw general conclusions. We believe that currently no general empirical evidence on the categorization of objective rule interestingness measures exists despite the numerous attempts [1, 2, 9, 18, 28, 30, 39].
The results derived in these attempts are specific because they heavily depend on at least one of the data sets, discovered rules, domain experts, rule discovery methods, and analysis methods that they employed. There are attempts to mitigate these effects by employing countermeasures such as random experiments and multiple methods/experts [9, 39], but their effectiveness is limited due to the huge number of possible choices. We admit that many of the papers provide reasons for their experimental results, but we also noticed that such reasons never consider all four factors on which the results depend. Sad to say, conclusions derived in the papers such as “34 objective interestingness measures can be classified into 10 clusters” and “recall, Jaccard, kappa, CST, χ2-M, and peculiarity demonstrated the highest performance” are not
The objective of machine learning is to realize software which improves its per-
formance by a procedure called learning [27]. We have noticed that a typical
researcher in data mining from the machine learning community is interested
in human factors, especially interestingness. At the same time, we are aware that it is (practically) impossible to realize human intelligence with software, as the proliferation of weak AI shows [31]. Any scientific interest should be
respected but at the same time the reality should be recognized. We should
fight against the illusion that an objective interestingness measure is omnipo-
tent, i.e. it can estimate the subjective interestingness of a discovered rule by
domain experts in general.
Some of the papers [9, 18, 28, 39] try to convey this idea but at the same
time they give the illusion of omnipotence by showing their results on the cate-
gorization. We believe that an objective interestingness measure can just serve
as a naive filter for removing unpromising discovered rules and thus should not be expected to select interesting rules automatically. Classical papers in data
mining [3,30] seem to be aware of the danger and are much more conservative
than recent papers.
As we stated, [3] employs two overly simple measures, i.e. support and confidence, which serve as measures for a pre-screening. This paper shows a typical situation in which objective rule interestingness measures are used. The myth of omnipotent measures should be abandoned.
6 Conclusions
References
1. H. Abe, S. Tsumoto, M. Ohsaki, and T. Yamaguchi. A Rule Evaluation Support
Method with Learning Models Based on Objective Rule Evaluation Indexes.
In Proc. Fifth IEEE International Conference on Data Mining (ICDM), pages
549–552. 2005.
2. H. Abe, S. Tsumoto, M. Ohsaki, and T. Yamaguchi. Evaluating a Rule Evalua-
tion Support Method Based on Objective Rule Evaluation Indices. In Advances
in Knowledge Discovery and Data Mining (PAKDD), pages 509–519. 2006.
3. R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo. Fast
Discovery of Association Rules. In Advances in Knowledge Discovery and Data
Mining, pages 307–328. AAAI/MIT Press, Menlo Park, Calif., 1996.
4. J.-P. Barthélemy, A. Legrain, P. Lenca, and B. Vaillant. Aggregation of Valued
Relations Applied to Association Rule Interestingness Measures. In Modeling
Decisions for Artificial Intelligence, LNCS 3885 (MDAI), pages 203–214. 2006.
Key words: Classification tree, Implication strength, Class assignment, Rule rele-
vance, Profile typicality, Targeting
1 Introduction
G. Ritschard et al.: Inducing and Evaluating Classification Trees with Statistical Implicative
Criteria, Studies in Computational Intelligence (SCI) 127, 397–419 (2008)
www.springerlink.com © Springer-Verlag Berlin Heidelberg 2008
Hence, it is too strong and leaves no room for dealing with the random content of statistical relationships. On the other hand, the classical confidence, which measures the chances of matching the conclusion when the condition is satisfied, is not able to tell us whether or not the conclusion is more probable than it would be in case of independence from the condition. For instance, assume that the conclusion B is true for 95% of all cases. Then, a rule with a confidence of 90% would do worse than simple chance, i.e. than deciding that B is true for all cases without taking the condition A into account. But why look at counter-examples and not just at positive examples? Indeed, the two are formally equivalent (see Section 2.2), and hence it is just a matter of taste. Looking at the rarity of counter-examples makes the reasoning closer to what is done with logical rules, i.e. invalidating the rule when there are (too many) negative examples.
Though, as we will show, this concept of strength of implication is applicable in a straightforward manner to classification rules, only little attention has been paid to this appealing idea in the framework of supervised learning.
The aim of this article is to discuss the scope and limits of implicative sta-
tistics for supervised classification and especially for classification trees. One
difference between classification rules and association rules is that the conse-
quent of the former has to be chosen from an a priori fixed list of classes (the
possible states of the response variable), while the consequent for the latter
can concern any event not involved in the premise, since there is no a priori
outcome variable. A second difference is that unlike the premises of associa-
tion rules, those of a set of classification rules define a partition of the data
set, meaning that there is one and only one rule applicable to each case. These
aspects, however, do not intervene in any way in the definition of the implica-
tion index which just requires a premise and a consequent. Hence, implication
indexes are technically applicable without restrictions to classification rules.
There remains, nevertheless, the question of whether they make sense in the
supervised learning setting.
The implication index measures how typical the condition of the rule is for
the conclusion, i.e. how much more characteristic than pure chance it is for the
selected conclusion. Indeed, we are only interested in conditions under which
the probability to match the conclusion is higher than the marginal proportion
corresponding to pure chance. A condition with a probability lower than the
marginal proportion would characterize atypical situations for the conclusion,
i.e. situations in which the proportion of cases matching the conclusion is less
than in the whole data set. It would thus be characteristic of the negation
of the conclusion, not the conclusion itself. Looking at typical conditions for
the negation of the conclusion could be useful too. Nevertheless, it does not
require any special attention since it can simply be handled by looking at
the implication strength of the rule in which we would have replaced the
conclusion by its negation.
The information on the gain of performance over chance provided by the
implication index usefully complements the knowledge provided for instance
Classification rules can be induced from data using classification trees in two
steps. First, the tree is grown by seeking, through recursive splits of the learn-
ing data set, some optimal partition of the predictor space for predicting the
400 G. Ritschard et al.
outcome class. Each split is done according to the values of one predictor. The
process is greedy. It starts by trying all predictors to find the “best” split of
the whole learning data set. Then, the process is repeated at each new node
until some stopping criterion becomes true. In a second step, once the tree
is grown, classification rules are derived by choosing the most relevant value,
usually the majority class (the most frequent), in each leaf (terminal node) of
the tree.
Figure 1 shows the tree induced with the CHAID method [11], using a
5% significance level and a minimal node size fixed at 20. The same tree is
obtained with CART [4] using a minimal .02 gain value. The three numbers in
each node represent the counts of individuals who are respectively ‘married’,
[Tree diagram: the root node (120 married, 120 single, 33 divorced/widowed) is first split by sex (men: 96, 22, 23; women: 24, 98, 10); men are then split by sector (non-tertiary vs tertiary) and women by sector (primary vs non-primary), giving the four leaves whose counts are reported in Table 2.]
Fig. 1. Example: Induced tree for civil status (married, single, divorced/widowed)
                       Man                              Woman
Civil Status           primary or secondary  tertiary   primary   secondary or tertiary   Total
Married                90                    6          0         24                      120
Single                 10                    12         50        48                      120
Div./Widowed           13                    10         6         4                       33
Total                  113                   28         56        76                      273
Table 2. Table associated to the induced tree
‘single’, and ‘divorced or widowed’. The tree partitions the predictor space
into groups such that the distribution of the outcome variable, the civil status,
differs as much as possible from one group to the other. For our discussion,
it is convenient to represent the four resulting distributions into a table that
cross classifies the outcome variable with the set of profiles (the premises of
the rules) defined by the branches. Table 2 is thus associated to the tree of
Figure 1.
As mentioned, classification rules are usually derived from the tree by
assigning the majority class of the leaf to the branch that leads to it. For
example, a man working in the secondary sector belongs to leaf 3 and will
be classified as married, while a man of the tertiary sector (leaf 4) will be
classified as single. In Table 2, the column headings define the premises of the
rules, the conclusion being given, for each column, by the row containing the
greatest count. Using this approach, the four following rules are derived from
the tree shown in Figure 1:
R1: Man of primary or secondary sector ⇒ married
R2: Man of tertiary sector ⇒ single
R3: Woman of primary sector ⇒ single
R4: Woman of secondary or tertiary sector ⇒ single
In contrast to association rules, classification rules have the following char-
acteristics: i) The conclusions of the rules can only be values (classes) of the
outcome variable, and ii) the premises of the rules are mutually exclusive and
define a partition of the predictor space. Nonetheless, they are rules and we
can then apply to them concepts such as support, confidence and, which is
here our concern, implication indexes.
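The following sketch simply reads these rules off Table 2 by taking the majority class of each column; the counts are hard-coded from the table.

```python
premises = ["man, primary or secondary sector", "man, tertiary sector",
            "woman, primary sector", "woman, secondary or tertiary sector"]
classes = ["married", "single", "divorced/widowed"]
counts = [[90, 6, 0, 24],      # married
          [10, 12, 50, 48],    # single
          [13, 10, 6, 4]]      # divorced/widowed

for j, premise in enumerate(premises):
    column = [counts[i][j] for i in range(len(classes))]
    conclusion = classes[column.index(max(column))]    # majority class of the leaf
    print(f"R{j + 1}: {premise} => {conclusion}")
```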
The index of implication (see for instance [6] p 19) of a rule is defined from the
number of counter-examples, i.e. of cases that match the premise but not the
conclusion. In our case, for each leaf (represented by a column in Table 2),
the count of counter-examples is the number of cases that are not in the
majority class. Letting b denote the conclusion (row of the table) of rule j
and n_bj the maximum in the jth column, the number of counter-examples of rule j is n_{\bar b j} = n_{\cdot j} - n_{bj}. The implication index compares this observed count with the number n^e_{\bar b j} of counter-examples expected under independence:

Imp(j) = \frac{n_{\bar b j} - n^e_{\bar b j}}{\sqrt{n^e_{\bar b j}}}    (1)

which can also be expressed in terms of the number of cases matching the rule as Imp(j) = -(n_{bj} - n^e_{bj}) / \sqrt{n_{\cdot j} - n^e_{bj}}.
Let us make the calculation of the index explicit for our example. We define
for that the variable “predicted class”, denoted cpred, which takes value 1 for
each case (example) belonging to the majority class of its leaf and 0 otherwise
(counter-example). By cross-classifying this variable with the premises of the
rules, we get Table 3 where the first row gives the number nb̄j of counter-
examples for each rule and the second row the number nbj of examples.
Likewise, Table 4 gives the expected numbers neb̄j and nebj of negative ex-
amples (counter-examples) and positive examples obtained by distributing
the nj· covered cases according to the marginal distribution. Note that these
counts cannot be computed from the margins of Table 3. They are obtained by
first dispatching the column total using the marginal distribution of Table 2
and then separately aggregating each resulting column according to its corre-
sponding observed majority class (not the expected one!). This explains why
Tables 3 and 4 do not have the same right margin.
From these two tables, we can easily get the implication indexes using
formula (1). They are reported in the first row of Table 5. For the first rule,
the index equals Imp(1) = −5.068. This negative value indicates that the
number of observed counter-examples is less than the number expected under
the independence hypothesis, which stresses the relevance of the rule. For the
second rule, the implication index is positive, which tells us that the rule is
                       Man                              Woman
cpred                  primary or secondary  tertiary   primary   secondary or tertiary   Total
0 (counter-example)    23                    16         6         28                      73
1 (example)            90                    12         50        48                      200
Total                  113                   28         56        76                      273
Table 3. Observed numbers n_{\bar b j} and n_{bj} of counter-examples and examples
                       Man                              Woman
cpred                  primary or secondary  tertiary   primary   secondary or tertiary   Total
0 (counter-example)    63.33                 15.69      31.38     42.59                   153
1 (example)            49.67                 12.31      24.62     33.41                   120
Total                  113                   28         56        76                      273
Table 4. Expected numbers n^e_{\bar b j} and n^e_{bj} of counter-examples and examples
less powerful than pure chance since it generates more counter-examples than
would classifying without taking account of the condition.
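The sketch below reproduces this computation from the counts of Table 2: for each rule it derives the observed and expected numbers of counter-examples and the index of formula (1); the implication intensity printed alongside assumes the usual normal approximation and is only indicative.

```python
import math

classes = ["married", "single", "div./widowed"]
counts = [[90, 6, 0, 24], [10, 12, 50, 48], [13, 10, 6, 4]]   # rows of Table 2
n = 273
col_totals = [sum(counts[i][j] for i in range(3)) for j in range(4)]
row_totals = [sum(row) for row in counts]

for j in range(4):
    b = max(range(3), key=lambda i: counts[i][j])          # majority class of leaf j
    counter = col_totals[j] - counts[b][j]                  # observed counter-examples
    expected = col_totals[j] * (n - row_totals[b]) / n      # expected under independence
    imp = (counter - expected) / math.sqrt(expected)        # formula (1)
    intensity = 1 - 0.5 * (1 + math.erf(imp / math.sqrt(2)))
    print(f"R{j + 1} ({classes[b]}): Imp = {imp:6.3f}, intensity = {intensity:.3f}")
```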
In its formulation (1), the implication index looks like a standardized residual,
namely as the (signed square root of) the contribution to the Pearson Chi-
square (see for example [1] p 224). The implication index is indeed related
to the Chi-square that measures the divergence between Tables 3 and 4. The
contributions of each cell to this Chi-square are depicted in Table 5, those of
the first row being the implication indexes.
This interpretation of Gras’ implication index in terms of residuals (resid-
uals for the fitting of the counts of counter-examples by the independence
model) suggests that other forms of residuals used in the framework of the
modeling of the counts in multiway contingency tables could also prove useful
for measuring the strength of rules. These include:
                       Man                              Woman
cpred                  primary or secondary  tertiary   primary   secondary or tertiary
0 (counter-example)    -5.068                0.078      -4.531    -2.236
1 (example)            5.722                 -0.088     5.116     2.525
Table 5. Contributions to the Chi-square measuring divergence between Tables 3 and 4
• The deviance residual, res_d(j) = sign(n_{\bar b j} - n^e_{\bar b j}) \sqrt{\,|2\, n_{\bar b j} \log(n_{\bar b j}/n^e_{\bar b j})|\,}, which is the square root of the contribution (in absolute value) to the likelihood ratio Chi-square ([2] pp 136–137).
• Freeman-Tukey's residual, res_{FT}(j) = \sqrt{n_{\bar b j}} + \sqrt{n_{\bar b j} + 1} - \sqrt{4\, n^e_{\bar b j} + 1}, which is based on a variance-stabilizing transformation of the counts.
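A sketch of these alternative residuals, applied to rule R1 of the example, is given below; since the excerpt does not spell out the adjusted residual, Haberman's usual form is assumed for res_a.

```python
import math

def residuals(counter, expected, n_col, n_counter_total, n):
    """Residuals for the observed number of counter-examples of one rule."""
    diff = counter - expected
    res_s = diff / math.sqrt(expected)                         # standardized (Gras' index)
    res_d = math.copysign(                                     # deviance residual
        math.sqrt(abs(2 * counter * math.log(counter / expected))), diff)
    res_ft = (math.sqrt(counter) + math.sqrt(counter + 1)      # Freeman-Tukey residual
              - math.sqrt(4 * expected + 1))
    # Adjusted residual: Haberman's form is assumed here; the excerpt does not
    # reproduce the exact definition used in the chapter.
    res_a = diff / math.sqrt(expected * (1 - n_col / n) * (1 - n_counter_total / n))
    return res_s, res_d, res_ft, res_a

# Rule R1: 23 counter-examples observed, 63.33 expected, 113 covered cases,
# 153 counter-examples in total, 273 cases overall (Tables 3 and 4).
print([round(r, 2) for r in residuals(23, 63.33, 113, 153, 273)])
```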
Table 6 exhibits the values of these alternative implication indexes for each
of the four rules derived from the tree in Figure 1. We observe that they are
concordant as expected. The standardized residual is known to have a variance
that may be lower than one. This is because the counts nb· and n·j are sample
dependent and hence themselves random. Thus neb̄j is only an estimation of
the Poisson parameter. Ignoring the randomness of the denominator in for-
mula (1) leads to underestimating the strength. The deviance, adjusted and
Freeman-Tukey’s residuals are better suited for this situation and are known
to have in practice a distribution closer to the standard normal N (0, 1) than
the simple standardized residual. We can see in our example that the stan-
dardized residual, i.e. Gras’ implication index, tends to give lower absolute
values than the three alternatives. The only exception is rule R3, for which
the deviance residual provides a slightly smaller value than Gras’ index. Note
that R3 admits only six counter-examples.
The implication intensity and its variants are useful for validating each classi-
fication rule individually. This knowledge enriches the usual global validation
of the classifier. For example, among the four rules issued from our illustrative
tree, rules R1, R3 and R4 are clearly relevant, while R2, with an implication
intensity below 50% should be rejected.
The question is then what shall we do with the cases covered by the con-
ditions of irrelevant rules. Two solutions can be envisaged: i) Merging cases
covered by an irrelevant rule with another rule, or ii) changing the conclusion.
The possible choice of a more suitable conclusion is discussed in Section 4.1.
We indeed exclude further splitting of the node, since we assume that a stopping criterion has been met. As for the merging of rules, if we want to
respect the tree structure we have indeed to merge cases of a leaf with those
of a sibling leaf, which is equivalent to pruning the corresponding branch. In
our example, this leads to merging rules R1 and R2 into a new rule “Man ⇒
married”. Residuals for the number of counter-examples of this new rule are
respectively ress = −3.8, resd = −7.1, resF T = −4.3 and resa = −8.3. Ex-
cept for the deviance residual, they exhibit a slight deterioration as compared
to the implicative strength of rule R1.
It is interesting here to compare the implicative quality with the error rate
used for validating classification rules. The number of counter-examples con-
sidered is precisely the number of errors produced by the rule on the learning
set. The error rate is thus the percentage of counter-examples among the cases
covered by the rule, i.e. err(j) = nb̄j /n·j , which is also equal to 1 − nbj /n·j ,
the complement to one of the confidence. The error rate thus suffers from the same drawbacks as the confidence. For instance, it does not tell us how much better the rule does than a classification done independently of any condition. Furthermore, the error rate is linked with the choice of the majority class as conclusion. For our example, the error rates of the four rules are respectively 0.2, 0.57, 0.11 and 0.36. The second rule is thus also the worst from this point of view. Comparing with the error rate at the root node, which is 0.56, shows that this rate of 0.57 is very bad. Thus, to be really informative about the relevance of the rule, the error rate should be compared with the error rate of some naive baseline rule. This is exactly what the implication index does. Resorting to implication indexes, we get in addition probabilities which permit us to distinguish between statistically significant and non-significant relevance.
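The comparison can be made explicit with a few lines of code; the sketch below computes the per-rule error rates and the root-node baseline from the counts of Table 2.

```python
counts = [[90, 6, 0, 24], [10, 12, 50, 48], [13, 10, 6, 4]]   # rows of Table 2
col_totals = [sum(counts[i][j] for i in range(3)) for j in range(4)]
row_totals = [sum(row) for row in counts]
n = sum(row_totals)

root_error = 1 - max(row_totals) / n        # error rate of the majority rule at the root
print(f"root node error rate: {root_error:.2f}")
for j in range(4):
    err = 1 - max(counts[i][j] for i in range(3)) / col_totals[j]
    print(f"R{j + 1}: error rate {err:.2f}")
```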
Practically, in order to detect over-fitting, error rates are computed on
validation data sets or through cross validation. Indeed, the same can be
done for the implication quality by computing the implication indexes and
intensities in generalization.
Alternatively, we could consider, in the spirit of the BIC (Bayesian information criterion) or MDL (minimum description length) principle, penalizing the implication index by the complexity of the condition. Since the lower the implication index of a rule j, the better it is, the index should be penalized by the length kj of the branch that defines the condition of rule j. The general idea behind such a penalization is that the simpler the condition, the lower the risk of assigning a bad distribution to a case. As a first proposal we suggest the following penalized form, inspired from the BIC [14] and based on the deviance residual:

Imp_{pen}(j) = res_d(j) + \sqrt{k_j \ln(n_j)} .
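A sketch of this penalization is given below; n_j is assumed to be the number of cases covered by the rule, and the deviance residual values used in the example are computed from Tables 3 and 4 (about -6.8 for R1) and quoted above for the merged rule (-7.1).

```python
import math

def penalized_index(res_d, k_j, n_j):
    """Penalized implication index: deviance residual plus a BIC-like
    penalty sqrt(k_j * ln(n_j)) for a condition of length k_j."""
    return res_d + math.sqrt(k_j * math.log(n_j))

# Rule R1 (k = 2 conditions, 113 covered cases) versus the merged rule
# "Man => married" (k = 1 condition, 141 covered cases):
print(round(penalized_index(-6.83, 2, 113), 2))   # about -3.8
print(round(penalized_index(-7.10, 1, 141), 2))   # about -4.9
```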
For our example, the values of the penalized index are given in Table 3.
These penalized values confirm the ranking of the initial rules, which here
all have the same length kj = 2. In addition, the penalized index is useful for
validating results of merging the two rules R1 and R2. Table 3 highlights the
superiority of the merged rule “Man ⇒ married” over both rules R1 and R2.
It gives a clear signal in favor of merging.
At the root node, both the residual and the number of conditions are zero.
Hence, the penalized implication index is zero too. Thus, a positive penalized
implication index suggests that we can hardly expect the rule to do better in
generalization than randomly assigning the cases according to the root node
distribution, i.e. independently of any condition. For our example, this confirms
once again the poor quality of rule R2.
Residual              Indexes (married, single, div./wid.)   Intensities (married, single, div./wid.)
Standardized ress     1.6   0.1   −1.3                        0.043   0.419   0.891
Deviance resd         3.9   0.8   −3.4                        0.000   0.099   0.999
Freeman-Tukey resFT   1.5   0.1   −1.4                        0.054   0.398   0.895
Adjusted resa         2.4   0.1   −2.0                        0.005   0.379   0.968
Table 9. Implication indexes and intensities of rule R2 for each possible conclusion
Let us now look at the tree growing procedure and assume that the rule
conclusions are selected so as to maximize the implication strength of the
rules. The question is whether there is a way to split a node so as to maximize
the strength of the resulting rules. The difficulty here is that a split results
indeed in more than one rule. Hence, we face a multicriteria problem, namely
the maximization over sets of implication strengths.
To get simple solutions, one can think to transform the multidimensional
optimization problem into a one dimensional one by focusing on some aggre-
gated criterion. The following are three possibilities:
• A weighted average of the concerned optimal implication indexes, taking
weights proportional to the number of concerned cases.
• The maximum over the strengths of the rules belonging to the set.
• The minimum over the strengths of the rules belonging to the set.
The first criterion is of interest when the goal is to achieve good strengths on
average. The second one should be adopted when we look for a few rules with
high implication strengths without worrying too much about the other ones,
and the third is of interest when we want the highest possible implication
strength for the poorest rule (a small sketch of these aggregations is given below).
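A minimal sketch of the three aggregations, assuming we are simply given the optimal implication index and the size of each leaf produced by a candidate split (the function name is ours; remember that a lower index means a stronger implication):

```python
import numpy as np

def aggregate_split_quality(indexes, leaf_sizes):
    # indexes: optimal implication index of each rule created by the split.
    # Returns the three aggregated criteria discussed above.
    idx = np.asarray(indexes, dtype=float)
    w = np.asarray(leaf_sizes, dtype=float)
    return {
        "weighted_average": float(np.average(idx, weights=w)),
        "best_rule":  float(idx.min()),   # maximum implication strength
        "worst_rule": float(idx.max()),   # minimum implication strength
    }

# Hypothetical split producing three leaves
print(aggregate_split_quality([-3.8, -0.5, -7.1], [40, 25, 35]))
```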
We have not yet experimented with tree growing based on these criteria. It is
worthwhile to note, however, that from the typical-profile paradigm standpoint,
methods such as CHAID that attempt to maximize association seem preferable to
those based on entropies. Indeed, maximizing the strength of association between
the resulting nodes and the outcome variable leads to distributions
that depart as much as possible from that of the parent node, and hence from
that of the root node corresponding to independence. We may thus expect the
most significant departures from independence and hence rules with strong
implication strength. Methods based on entropy measures, on the other hand,
favor departures from the uniform, or equiprobable, distribution and are
therefore more in line with the classification standpoint.
5 Experimental Results
We present here a series of experimental results that provide additional in-
sights into the behavior and scope of the original implication index and the
three variants we introduced. First, we study the behavior of the indexes.
We then present an application, which also serves as a basis for experimental
investigations regarding the effect of the continuity correction and the conse-
quences of using maximal implication strength rules instead of the majority
rule on classification accuracy, recall and precision.
[Figure panels omitted: three plots of the implication index value against the simulation step (0 to 100), one curve per residual (standard, F-T, adjusted, deviance).]
Fig. 2. Behavior of the 4 indexes between independence (Step 0) and purity (Step
100). Values reported include the continuity correction.
We consider administrative data about the 762 first year students who were
enrolled in fall 1998 at the Faculty of Economic and Social Sciences (ESS)
of the University of Geneva [13]. The goal is to learn rules for predicting the
situation (1. eliminated, 2. repeating first year, 3. passed) of each student
after the first year, or more precisely to discover the typical profile of those
students who are either eliminated or have to repeat their first year. For the
learning data, the response variable is thus the student situation in October
1999. The predictors retained are age, first time registered at University of
Geneva, chosen orientation (Social Sciences or Business and Economics), type
of secondary diploma achieved (classic, latin, scientific, economics, modern,
other), place where secondary diploma was obtained (Geneva, Switzerland
outside Geneva, Abroad), age when secondary diploma was obtained, nation-
ality (Geneva, Swiss except Geneva, Europe, Non Europe) and mother’s living
place (Geneva, Switzerland outside Geneva, Abroad).
Figure 3 shows the tree induced using CHAID with the minimal node size set
to 30, the minimal parent node size to 50 and a maximal 5% significance for the
Chi-square. Table 10 provides the details regarding the counts in the leaves.
Here, our interest is not in the growing procedure, but rather in the state
assigned to each leaf.
Leaf 6 7 8 9 10 11 12 13 14 Total
1 eliminated 2 17 22 56 31 16 20 18 27 209
2 repeating 1 13 15 48 10 8 16 14 5 130
3 passed 35 87 55 143 28 9 48 12 6 423
Total 38 117 92 247 69 33 84 44 38 762
Table 10. Details about the content of the leaves in Figure 3
Leaf 6 7 8 9 10 11 12 13 14
Majority class 3 3 3 3 1 1 3 1 1
Standardized residual 3 3 3 3 1 1 3 2 1
Freeman-Tukey residual 3 3 3 3 1 1 2 2 1
Deviance residual 3 3 3 2 1 1 2 2 1
Adjusted residual 3 3 3 2 1 1 2 2 1
Table 11. State assigned by the various criteria
[Tree figure omitted: the root node (762 students) is first split on the type of secondary diploma, then on nationality, age at secondary diploma, first-time registration and orientation, yielding the leaves 6 to 14 detailed in Table 10.]
Fig. 3. CHAID induced tree for the ESS Student data. Outcome states are from
top to down: eliminated, repeating 1st year, passed. Figures next to the bars are
percentages.
We used successively the majority class rule and each of the four variants
of implication indexes for that purpose. Table 11 reports the results. We can see
that the 5 methods agree for 6 out of the 9 leaves. The conclusions assigned to
leaves number 9, 12 and 13 vary, however, among the 5 methods. All four
implication indexes assign state 2, "repeating the first year", to leaf 13, where
the majority class is 1, "eliminated". This tells us that belonging to this leaf,
i.e. having a non-typical Swiss college secondary diploma obtained either in
Geneva or abroad and having chosen the Business and Economics orientation, is
a typical profile of those who repeat their first year. And this holds even
though "repeating the first year" is not the majority class of the leaf.
The deviance and adjusted residuals also agree about assigning state 2,
"repeating", to leaves number 9 and 12, and the Freeman-Tukey residual agrees
with this conclusion for leaf 12. These leaves also define characteristic
profiles of those who repeat their first year, even though the majority class
for these profiles is "passed".
In terms of the overall error rate, selecting the majority class is no doubt
the better choice. However, if we are interested in the recall rate, i.e. in
the proportion of cases with a given output value ck that are detected as
having this value, we may expect the implication indexes to outperform the
majority rule for infrequent classes. Indeed, highly infrequent outcome states
have little chance of ever being selected as conclusion by the majority rule. We
may therefore expect low recall for them when we select the most frequent
class as conclusion. Regarding precision, i.e. the proportion of cases classified
as having a value ck that effectively have this value, expectations are less
clear, since the relationship between the numerator and the denominator does
not seem to be linked to the way the conclusion is chosen.
To verify these expectations on our ESS student data, we computed, for the
majority rule and each of the four variants of implication indexes, the
10-fold cross-validation (CV) values of the overall good classification rate, as
well as of the recall and precision for each of the three outcome states. As can
be seen in Figure 4, the loss in accuracy that results from using maximal
implication rules lies between 12% for the adjusted residual and 10% for the
standard residual.
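To make the recall computation concrete, the short sketch below recomputes, from the leaf counts of Table 10 and the conclusions of Table 11, the learning-sample (resubstitution) recall of each state under the majority rule and under the adjusted residual. These are not the 10-fold CV values plotted in Figure 5, but they exhibit the same pattern.

```python
import numpy as np

# Leaf counts from Table 10, leaves 6 to 14 (rows: eliminated, repeating, passed)
counts = np.array([
    [ 2, 17, 22,  56, 31, 16, 20, 18, 27],   # 1 eliminated
    [ 1, 13, 15,  48, 10,  8, 16, 14,  5],   # 2 repeating
    [35, 87, 55, 143, 28,  9, 48, 12,  6],   # 3 passed
])
majority_rule     = np.array([3, 3, 3, 3, 1, 1, 3, 1, 1])  # Table 11
adjusted_residual = np.array([3, 3, 3, 2, 1, 1, 2, 2, 1])  # Table 11

def recall(counts, conclusions):
    # Recall of state k: cases of state k falling in a leaf concluded k,
    # divided by all cases of state k (computed on the learning sample).
    return [counts[k - 1, conclusions == k].sum() / counts[k - 1].sum()
            for k in (1, 2, 3)]

print(recall(counts, majority_rule))      # recall of "repeating" is 0
print(recall(counts, adjusted_residual))  # recall of "repeating" becomes 0.6
```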
Figure 5 exhibits the CV recall rates obtained for each of the three states.
They confirm our expectations: selecting the conclusion according to the
implication indexes deteriorates the recall for the majority class "passed", but
improves it for the less frequent states.
[Figure panels omitted: bar charts of the CV recall and precision rates for the majority, standard, adjusted, deviance and FT criteria, one panel per outcome state (e.g. "(c) eliminated").]
6 Conclusion
The aim of this article was to demonstrate the usefulness of the concept of im-
plication strength for rules derived from induced decision trees. We have shown
that Gras’ implication index can be applied in a straightforward manner to
classification rules and have proposed three alternatives inspired from resid-
uals used in the statistical modeling of multiway contingency tables, namely
the deviance, adjusted and Freeman-Tukey residuals. As for the scope of the
implication indexes we have successively discussed their use for evaluating
individual rules, for selecting the conclusion of the rule and as criteria for
growing trees. We have stressed that implication indexes are a valuable com-
plement to classical error rates as validation tools. They are especially inter-
esting in a targeting framework where the aim is to determine the typical
profile that leads to a conclusion rather than classifying individual cases. As
criteria for selecting the conclusion, they may be a useful alternative to the
majority rule in the case of imbalanced data. Their advantage is that, in such
imbalanced situations and unlike decisions based on the majority class, they
favor conclusion diversity among rules as well as recall for poorly represented
classes.
Four variants of implication indexes have been discussed. Which one should
we use? The simulation study of their behavior has shown that the deviance
residual curiously diminishes when the number of counter-examples tends to
zero and should therefore be disregarded. The standard residual (Gras’ in-
dex) and Haberman’s adjusted residual both evolve linearly between indepen-
dence and purity and thus seem to be the better choices. From the theoretical
standpoint, if we want to compare the values with thresholds of the standard
normal, Haberman’s adjusted residual is preferable.
We have also introduced the implication intensity as the probability of getting
by chance more counter-examples than observed. This is indeed just a monotonic
transformation of the corresponding implication index; hence rankings based on
the indexes or on the intensities will necessarily agree. The indexes seem better
suited, however, to distinguishing between situations with high implication
strengths. The intensities, on the other hand, provide additional information
about the statistical significance of the implication strength.
It is worth mentioning that, to our knowledge, implication indexes have not
so far been implemented in tree growing software. Making them available is
essential for popularizing them. We have begun implementing the maximal
implication selection process and tree growing algorithms based on implication
criteria in Tanagra [15], a free open source data mining software, and we also
plan to make these tools available in Weka.
Besides this implementation task, there are some other issues that would
merit further investigation. For instance, the penalized implication index we
proposed in Section 3 is not completely satisfactory. In an n-ary tree the paths
to the leaves are usually shorter than in a binary tree, even if they define the
same leaves. A penalization based on the length of the path, as we proposed,
would therefore differ between a rule derived from a binary tree and the same
rule derived from an n-ary tree. The use of implication criteria in the tree
growing process also needs deeper reflection.
Despite all that remains to be done, our hope is that this article will
contribute to enlarging both the scope of induced decision trees and that of
implication statistics.
References
1. Alan Agresti. Categorical Data Analysis. Wiley, New York, 1990.
2. Yvonne M. M. Bishop, Stephen E. Fienberg, and Paul W. Holland. Discrete
Multivariate Analysis. MIT Press, Cambridge MA, 1975.
3. Julien Blanchard, Fabrice Guillet, Régis Gras, and Henri Briand. Using
information-theoretic measures to assess association rule interestingness. In
Proceedings of the 5th IEEE International Conference on Data Mining (ICDM
2005), pages 66–73. IEEE Computer Society, 2005.
4. L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification And
Regression Trees. Chapman and Hall, New York, 1984.
5. Henri Briand, Laurent Fleury, Régis Gras, Yann Masson, and Jacques Philippe.
A statistical measure of rules strength for machine learning. In Proceedings
of the Second World Conference on the Fundamentals of Artificial Intelligence
(WOCFAI 1995), pages 51–62, Paris, 1995. Angkor.
6. R. Gras, R. Couturier, J. Blanchard, H. Briand, P. Kuntz, and P. Peter.
Quelques critères pour une mesure de qualité de règles d’association. Revue
des nouvelles technologies de l’information RNTI, E-1:3–30, 2004.
7. R. Gras and A. Larher. L’implication statistique, une nouvelle méthode
d’analyse de données. Mathématique, Informatique et Sciences Humaines,
(120):5–31, 1992.
8. R. Gras and H. Ratsima-Rajohn. L’implication statistique, une nouvelle méth-
ode d’analyse de données. RAIRO Recherche Opérationnelle, 30(3):217–232,
1996.
9. Régis Gras. Contribution à l’étude expérimentale et à l’analyse de certaines ac-
quisitions cognitives et de certains objectifs didactiques. Thèse d’état, Université
de Rennes 1, France, 1979.
10. Sylvie Guillaume, Fabrice Guillet, and Jacques Philippe. Improving the discov-
ery of association rules with intensity of implication. In Jan M. Zytkow and
Mohamed Quafafou, editors, Proceedings of the European Conference on Prin-
ciples of Data Mining and Knowledge Discovery (PKDD 1998), volume 1510 of
Lecture Notes in Computer Science, pages 318–327. Springer, 1998.
11. G. V. Kass. An exploratory technique for investigating large quantities of cate-
gorical data. Applied Statistics, 29(2):119–127, 1980.
12. I. C. Lerman, R. Gras, and H. Rostam. Elaboration d’un indice d’implication
pour données binaires I. Mathématiques et sciences humaines, (74):5–35, 1981.
13. Claire Petroff, Anne-Marie Bettex, and Andràs Korffy. Itinéraires d’étudiants
à la Faculté des sciences économiques et sociales: le premier cycle. Technical
report, Université de Genève, Faculté SES, Juin 2001.
14. Adrian E. Raftery. Bayesian model selection in social research. In P. Marsden,
editor, Sociological Methodology, pages 111–163. The American Sociological As-
sociation, Washington, DC, 1995.
15. Ricco Rakotomalala. Tanagra : un logiciel gratuit pour l’enseignement et la
recherche. In Suzanne Pinson and Nicole Vincent, editors, Extraction et Gestion
des Connaissances (EGC 2005), volume E-3 of Revue des nouvelles technologies
de l’information RNTI, pages 697–702. Cépaduès, 2005.
16. Einoshin Suzuki and Yves Kodratoff. Discovery of surprising exception rules
based on intensity of implication. In Jan M. Zytkow and Mohamed Quafafou,
editors, Principles of Data Mining and Knowledge Discovery, Second European
Symposium (PKDD 1998), volume 1510 of Lecture Notes in Computer Science,
pages 10–18. Springer, 1998.
1 Introduction
In this chapter, we focus on the generalization of statistical interestingness
measures. We will consider objective association rule interestingness measures,
which aim at quantifying the quality of rules extracted from binary transac-
tional datasets. Such measures are said to be objective since they only rely
B. Vaillant et al.: On the behavior of the generalizations of the intensity of implication: A data-
driven comparative study, Studies in Computational Intelligence (SCI) 127, 421–447 (2008)
www.springerlink.com © Springer-Verlag Berlin Heidelberg 2008
Fig. 1. Sup and Conf values of the rules extracted from the Flag database
between the antecedent and the consequent of a rule. Some of them com-
pare the confidence of a rule to 0.5, which corresponds to an indetermination
situation.
In a previous work [25], we first suggested parameterizing the reference
value of both descriptive and statistical measures in order to compare the
confidence to a reference value θ chosen by the user. The case of statistical
measures, especially the intensity of implication and its generalizations, has
been explored in [21]. Theoretical aspects of our work were extended in [26].
In this chapter, we present our results on generalized statistical measures
and we propose an original data-driven comparative study of the behavior of
generalized statistical measures.
This chapter is organized as follows. In section 2, we present a general
synthetic overview of the statistical measures making reference to independence:
modeling of the counter-example distribution, construction of statistical and
probabilistic measures, and enhancement of the discriminating power of the
statistical measures. We introduce in section 3 the statistical measures making
reference to indetermination. Section 4 deals with the generalization of
statistical measures. Discriminant versions of the generalized measures are
proposed in section 5. Finally, we conclude in section 6.
A statistical measure evaluates how far the observed rule is from a null hy-
pothesis H0 corresponding to a lower reference point. From the definition of
a statistical measure, which is a modeling of the kind of rules that one wishes
to discover, it is then possible to define a probabilistic measure as the proba-
bility of obtaining a value of the statistical measure, at most equal to what is
observed, given that the null hypothesis H0 is true.
Classically, this null hypothesis is the hypothesis of independence between
the itemsets A and B, and it is tested against a one-sided alternative hypothesis
H1 of positive dependence. In terms of the theoretical frequencies π(·) referring
to A and B, the test opposes H0: πab = πa πb to H1: πab > πa πb. Several
modelings of the random number of counter-examples can then be considered (a
numerical sketch of the corresponding laws follows the list):
• Modeling 1 (only one hazard level): margins are fixed, only the joint absolute
frequencies are random, with only one degree of freedom.
– The modeling proposed by [31] applies to the distribution of examples
within the 4 inner cells of the contingency table of (A, B), na and nb being
fixed, following a traditional statistical process. Under H0, Nab̄ here follows
the hypergeometric law H(n, na, pb̄). Testing H0 thus means testing the
equality of the theoretical confidences of A → B and Ā → B, at fixed margins.
• Modeling 1′ (only one hazard level): still at fixed margins, an alternative
approach which only takes into account the distribution of examples between
AB and AB̄ is proposed in [25].
– In this case, Nab̄ follows the binomial law B(na, pb̄). Testing H0 then means
testing the conformity of the theoretical confidence of A → B, pb being fixed
beforehand.
• Modeling 2 (two hazard levels): modeling 2 of [31] corresponds to modeling
1′, with na also randomized.
– On a first hazard level, it is assumed that Na follows the binomial law B(n, pa).
– On a second hazard level, conditionally on Na = na, Nab̄ follows the binomial
law B(na, pb̄). Thus Nab̄ follows the binomial law B(n, pa pb̄).
• Modeling 3 (three hazard levels): modeling 3 of [31] once again relies on
modeling 1′, where the values of na, and then n, are successively randomized.
– On the first hazard level, N is assumed to follow the Poisson law Poi(n).
– On the second hazard level, it is assumed that Na follows the binomial law
B(n, pa), conditionally on N = n.
– On the third hazard level, and conditionally on N = n and Na = na, it is
assumed that Nab̄ follows the binomial law B(na, pb̄). In this case, Nab̄ follows
the Poisson law Poi(n pa pb̄).
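The exact laws of the counter-example count under these four modelings can be written down directly with scipy; the counts used below are hypothetical and only illustrate the computation.

```python
from scipy.stats import hypergeom, binom, poisson

# Hypothetical margins and observed number of counter-examples
n, n_a, n_b, n_abbar = 1000, 200, 700, 30
n_bbar = n - n_b
p_a, p_bbar = n_a / n, n_bbar / n

p_values = {
    # Modeling 1: fixed margins, hypergeometric H(n, na, pb-bar)
    "1":  hypergeom(n, n_bbar, n_a).cdf(n_abbar),
    # Modeling 1': binomial B(na, pb-bar)
    "1'": binom(n_a, p_bbar).cdf(n_abbar),
    # Modeling 2: binomial B(n, pa * pb-bar)
    "2":  binom(n, p_a * p_bbar).cdf(n_abbar),
    # Modeling 3: Poisson Poi(n * pa * pb-bar)
    "3":  poisson(n * p_a * p_bbar).cdf(n_abbar),
}
# Each value is P(N_abbar <= observed) under H0; the probabilistic
# measure built in the next paragraphs is its complement to 1.
print(p_values)
```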
The statistical and probabilistic measures based on Nab̄ are built as follows:
• by establishing the law of Nab and Nab̄ under the null hypothesis (H0) following
the chosen modeling, we can express a centered and reduced index⁴ under H0
(CR notation). In order to have a decreasing quality measure with respect to
nab̄, the statistical index is defined by SI(i) = −Nab̄^CR, where i refers to the
corresponding modeling.
• under standard conditions, the law of this index can be approximated by the
normal distribution, leading to the definition of a probabilistic measure, defined
as the complement to 1 of the surprise of observing such an exceptional value of
the index under H0. This probabilistic index is denoted by PI(i).
⁴ Given a random variable X, its centered and reduced expression is
x^CR = (x − µ)/√v, where µ is the mean of X and v its variance.
The statistical measure obtained with modeling 1 is SI(1) = r√n. The
corresponding probabilistic measure PI(1) = P(N(0, 1) < r√n) is the complement
to 1 of the p-value of r. It is to be noted that in the boolean case, nr² = χ².
Hence PI(1) is the unilateral counterpart of the complement to 1 associated with
the χ² test of independence.
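The whole construction (centered-reduced index, then normal approximation) can be summarized in a few lines of Python. The means and variances below follow our reconstruction of Table 1 and should be read as a sketch rather than a reference implementation.

```python
import numpy as np
from scipy.stats import norm

def si_pi(n, n_a, n_b, n_abbar):
    # Returns (SI(i), PI(i)) for modelings 1, 1', 2 and 3, using the normal
    # approximation: SI(i) = -(n_abbar - mean) / sd, PI(i) = P(N(0,1) < SI(i)).
    p_a, p_b = n_a / n, n_b / n
    p_abar, p_bbar = 1 - p_a, 1 - p_b
    mean = n * p_a * p_bbar
    variances = {
        "1":  n * p_a * p_abar * p_b * p_bbar,
        "1'": n * p_a * p_bbar * p_b,
        "2":  n * p_a * p_bbar * (1 - p_a * p_bbar),
        "3":  n * p_a * p_bbar,            # SI(3) = -ImpInd, PI(3) = IntImp
    }
    out = {}
    for key, var in variances.items():
        si = -(n_abbar - mean) / np.sqrt(var)
        out[key] = (si, norm.cdf(si))
    return out

print(si_pi(1000, 200, 700, 30))
```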
Figure 3 shows that IntImp is an anamorphosis of −ImpInd ([13], see SI(3)
in Table 1) through the normal distribution function.
Although it has many good properties [27, 29], one of the major drawbacks of
IntImp (drawback shared by the other statistical and probabilistic measures)
is the loss of discriminating power: by its definition, it will evaluate rules sig-
nificantly different from independence between 0.95 and 1. If n becomes large,
which is particularly true in a data mining context, the slightest divergence
from an independence situation becomes highly significant, thus leading to
high and homogeneous values of the measure, close to 1. It is thus difficult to
select the best rules. For example, we computed the values taken by IntImp
on rules extracted from three classical datasets [40]. The Breast Cancer, Con-
traceptive Method Choice and Housing datasets are available from the UCI
repository (http://www.ics.uci.edu/~mlearn/databases/).
On the Breast Cancer data, containing n = 683 entries, 3079 rules have
an IntImp value above 0.99, out of the 3095 rules generated by Apriori,
with σs = 0.10 and σc = 0.70. On the Contraceptive Method Choice data,
containing n = 1473 entries, we extracted 1035 rules having an IntImp value
above 0.99, out of the 2378 rules generated (with σs = 0.05 and σc = 0.60).
Finally, on the Housing data, containing n = 506 entries, 156 rules out of 263
were evaluated above 0.99 by IntImp, Apriori being run with σs = 0.02 and
σc = 0.55.
This phenomenon of loss of discriminating power is illustrated in Figure 4,
in which we represent PI(1) = P(N(0, 1) < r√n) for various values of n. This
figure shows the loss of discriminating power of the measure as n rises, although
r is not affected by such changes. For example, with n = 323, there are 991
rules evaluated above 0.999, 3540 when n is multiplied by 10, and 4205 when
n is multiplied by 100, out of the 5402 rules.
Using the third modeling does not solve the issue as can be seen in Figure 5.
In this situation almost all rules are evaluated above 0.95. On other rule sets,
as presented in Figure 6, the range of values that IntImp takes is wider.
In order to counterbalance this loss of discriminating power, [30] introduces
a contextual approach where ImpInd is centered and reduced on a case database B,
leading to the definition of the probabilistic discriminant index (a monotonically
increasing transformation of IntImp contextualized on the data). This index is
defined as follows:
PDI = P[N(0, 1) > ImpInd^(CR/B)]
[Figure omitted (Solarflare database): PI(1) values, between 0 and 1, plotted over the rules of the base for sample sizes n/10, n, 10 n and 100 n.]
We proposed two adaptations of EII in order to cope with the above-mentioned
issues: the Revised EII, denoted REII, and the Truncated EII, denoted TEII [25, 26].
Our first proposal consists in replacing IntImp by IntImp∗ in EII, where
IntImp∗ = max{2 IntImp − 1; 0}.
This solves the previously highlighted problems, but has the drawback of
modifying the entire spectrum of values taken by EII:
REII = [IntImp∗ · i(A ⊂ B)]^(1/2)
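A two-line sketch of this first correction (the function name is ours):

```python
import math

def revised_eii(intimp, inclusion):
    # REII: IntImp is first rescaled to IntImp* = max(2*IntImp - 1, 0),
    # then combined with the inclusion index i(A ⊂ B).
    return math.sqrt(max(2.0 * intimp - 1.0, 0.0) * inclusion)

print(revised_eii(0.97, 0.8))   # about 0.87
print(revised_eii(0.40, 0.8))   # 0.0: rules with IntImp <= 0.5 are nullified
```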
Figures 10 and 11 show the joint distribution of EII and REII as a function
of pab̄. Three families of rules are presented; for the first and the last ones
there is no observable difference between the measures. On the contrary, we see the
impact of the correction added in REII on the spectrum of values of EII for
the second family. In Figure 11, n is ten times smaller than in Figure 10. Here,
for all three families, there are observable differences.
Our second proposal only nullifies the values of EII when pa pb̄ ≤ pab̄ ≤
min{pa/2, pb̄/2}, without modifying its values otherwise. To achieve this, we
introduce Ht∗(X), an adequate truncated version of H(X), and it, a truncated
version of the inclusion index i. In order to take into account both predictive
and targeting strategies, a rule will have a non-null evaluation by the inclusion
index, and hence by TEII, when the following conditions are jointly met:
• pb/a > 0.5 (prediction) and pb/a > pb (targeting); i.e. pb/a > max(0.5, pb)
• pā/b̄ > 0.5 (prediction) and pā/b̄ > pā (targeting); i.e. pā/b̄ > max(0.5, pā)
With these new conditions, TEII is null whenever the proportion of
counter-examples is above min{pa pb̄; pa/2; pb̄/2}:
TEII = [IntImp(A → B) × it(A ⊂ B)]^(1/2)
with:
• it(A ⊂ B) = [(1 − Ht∗(B/A)^α)(1 − Ht∗(Ā/B̄)^α)]^(1/(2α)),
Fig. 10. Joint distributions of EII and REII as functions of pab̄, n = 2000
Fig. 11. Joint distributions of EII and REII as functions of pab̄, n = 200
IPEE = P(B(na, 0.5) > nab̄) ≈ P(N(0, 1) > (nab̄ − 0.5 na) / (0.5 √na))
Under the normal approximation, IPEE equals 0.5 at indetermination. This
measure corresponds to the probabilistic index associated with modeling 1′
(see Table 1), where pb̄ is replaced by 0.5. IPEE hence inherits the weak
discriminating power of this kind of measure.
As shown in Figure 12, IPEE and EII may take significantly different
values for some rules.
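For reference, IPEE itself is straightforward to compute, either exactly with the binomial law or through the normal approximation written above (a small sketch, with hypothetical counts):

```python
import math
from scipy.stats import binom, norm

def ipee(n_a, n_abbar):
    # Exact IPEE: P(B(n_a, 0.5) > n_abbar), i.e. the probability of observing
    # more counter-examples than we did under the indetermination reference.
    exact = binom(n_a, 0.5).sf(n_abbar)
    # Normal approximation used in the formula above.
    approx = norm.sf((n_abbar - 0.5 * n_a) / (0.5 * math.sqrt(n_a)))
    return exact, approx

print(ipee(50, 15))   # both values are close to 0.998
```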
In IP3E, the inclusion index enters through the expression 0.5(1 + i(A ⊂ B)).
This expression takes its values between 0.5 and 1, and equals 0.5 at
indetermination. Hence, in this situation, the value of IP3E is not nullified, as
was the case for EII. As shown in Figure 13, the contribution of this index is
of lesser importance in this case. This can also be seen in Figure 14, which
compares IP3E to TEII.
Nab ∼ B(na, θ)
The results of the thus adapted modelings 1 and 1′ are immediate, and those
of modelings 2 and 3 are easily obtained through the use of probability
generating functions, as detailed in [26] and recalled in Table 1.
From these results, we propose a range of generalized measures (see
Table 1), which are constructed in the same way as described in section 2.
Generalized statistical measures are defined by GSI(i)|θ = −Nab̄^CR, while
generalized probabilistic measures are defined by GPI(i)|θ = P(N(0, 1) > nab̄^CR):
• by establishing the law of Nab and Nab̄ under the null hypothesis (H0)
following the chosen modeling i, we can express a centered and reduced index
under H0. This statistical index is denoted by GSI(i)|θ.
• under standard conditions, the law of this index can be approximated by the
normal distribution, leading to the definition of a probabilistic measure,
defined as the complement to 1 of the surprise of observing such an exceptional
value of the index under H0. This probabilistic index is denoted by GPI(i)|θ.
We will focus on two of these. The first one, GPI(1′)|θ, is associated with
modeling 1′ and generalizes IPEE. For clarity reasons, it will be denoted
GIPE|θ (we here removed the last E, since the generalized measure no longer
makes reference to equilibrium). It corresponds to the chi-square goodness-of-fit
test, assessing whether or not the B/A distribution comes from the distribution
related to (θ; 1 − θ). The second one, GPI(3)|θ, is associated with modeling 3
and generalizes IntImp. It will thus be denoted by GIntImp|θ.
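Assuming, as suggested by the binomial reference B(na, θ) above, that the generalized null hypothesis simply replaces pb̄ by 1 − θ, the two measures can be sketched as follows (function names are ours):

```python
import math
from scipy.stats import norm

def gipe(n_a, n_abbar, theta):
    # GIPE|theta (modeling 1'): counter-examples compared to B(n_a, 1 - theta);
    # theta = 0.5 gives back IPEE.
    mean, sd = n_a * (1 - theta), math.sqrt(n_a * theta * (1 - theta))
    return norm.sf((n_abbar - mean) / sd)

def gintimp(n_a, n_abbar, theta):
    # GIntImp|theta (modeling 3): Poisson reference of mean n_a * (1 - theta);
    # theta = p_b gives back IntImp.
    mean = n_a * (1 - theta)
    return norm.sf((n_abbar - mean) / math.sqrt(mean))

# A rule with confidence 0.85 on 200 covered cases, judged against theta = 0.9:
# both evaluations are small, since the rule lies below the reference.
print(gipe(200, 30, 0.9), gintimp(200, 30, 0.9))
```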
Using θ = 0.9 as the lower reference value, the generalized measures should
focus on rules having a confidence above this threshold. Clearly, we see in
Figures 15 and 16 that the probabilistic indices stress the differences of
evaluation near this value, pushing rules far below it towards a null evaluation.
On the contrary, rules above the reference tend to have a very good evaluation.
Once more we see here the importance of using a discriminant version of the
probabilistic measure.
Fig. 15. Values taken by GSI(3)|θ=0.9 as a function of Conf
Fig. 16. Values taken by GPI(3)|θ=0.9 as a function of Conf
Statistical index SI = −Nab̄^CR:
• SI(1) = −(Nab̄ − n pa pb̄) / √(n pa pā pb pb̄)
• SI(1′) = −(Nab̄ − n pa pb̄) / √(n pa pb̄ pb)
• SI(2) = −(Nab̄ − n pa pb̄) / √(n pa pb̄ (1 − pa pb̄))
• SI(3) = −ImpInd = −(Nab̄ − n pa pb̄) / √(n pa pb̄)
Probabilistic index PI = P(N(0, 1) > nab̄^CR) = P(N(0, 1) < SI):
• PI(1) = P(N(0, 1) < r√n), PI(1′), PI(2), and PI(3) = IntImp = P(N(0, 1) > ImpInd)
Generalized probabilistic index GPI|θ = P(N(0, 1) > nab̄^CR) = P(N(0, 1) < GSI|θ):
• GPI(1)|θ, GPI(1′)|θ = GIPE|θ, GPI(2)|θ, and GPI(3)|θ = GIntImp|θ = P(N(0, 1) > GImpInd|θ)
Table 1. Summary of the statistical and probabilistic indices of the various modelings, and their generalized counterparts
H̃|θ(X) = −p̃x log2 p̃x − (1 − p̃x) log2(1 − p̃x)
where p̃x is:
p̃x = px / (2θ) if px ≤ θ,   p̃x = (px + 1 − 2θ) / (2(1 − θ)) otherwise (see Figure 17).
We call this index H̃|θ(X) the off-centered entropic index. It is clear that
H̃|θ(X) is not a strict entropy and that it must be seen as a penalization
function.
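A direct transcription of this definition (a small sketch; the function name is ours):

```python
import math

def off_centered_entropy(p_x, theta):
    # H~|theta(X): p_x is rescaled so that the entropy peaks at p_x = theta
    # instead of 0.5, and is then plugged into the usual binary entropy.
    if p_x <= theta:
        p = p_x / (2 * theta)
    else:
        p = (p_x + 1 - 2 * theta) / (2 * (1 - theta))
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(off_centered_entropy(0.9, 0.9))   # 1.0: maximal at p_x = theta
print(off_centered_entropy(0.5, 0.5))   # 1.0: theta = 0.5 gives the usual entropy
```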
The behavior of this off-centered entropy index, illustrated in Figure 18
for a B/A distribution, leads to interesting perspectives in data mining.
It could, for example, be used in a tree induction process to assess the
quality of the prediction of the class variable conditionally to the predictive
variables, when such a class is boolean and has a very unbalanced distribu-
tion [22].
From the definition of this new index, we build H̃|θ(B/A) and H̃|θ(Ā/B̄) as follows:
• to obtain H̃|θ(B/A) from H(B/A), pb/a is replaced by p̃b/a, defined as:
p̃b/a = pb/a / (2θ) if pb/a ≤ θ,   p̃b/a = (pb/a + 1 − 2θ) / (2(1 − θ)) otherwise
• similarly, to obtain H̃|θ(Ā/B̄) from H(Ā/B̄), pā/b̄ is replaced by p̃ā/b̄:
p̃ā/b̄ = pā/b̄ / (2θ) if pā/b̄ ≤ θ,   p̃ā/b̄ = (pā/b̄ + 1 − 2θ) / (2(1 − θ)) otherwise
This first possibility generalizes the inclusion index proposed in [13], which
can be retrieved using θ = 0.5.
• H̃|θ(Ā/B̄) could also be obtained from H(Ā/B̄) by using 1 − (pa/pb̄)(1 − θ) as
the reference, since pā/b̄ = 1 − (pa/pb̄)(1 − pb/a). In this case, when considering
H̃∗|θ(X) = H̃|θ(X) if px > θ,   H̃∗|θ(X) = 1 otherwise
• Modeling 1′:
EGIPE|θ = EGPI(1′)|θ = [GPI(1′)|θ × gi|θ]^(1/2) = [GIPE|θ × gi|θ]^(1/2)
It must be noticed that both components of EGPI(3)|θ = EGINTIMP|θ refer to
the same θ, which ensures the coherence of the measure. In particular, for
θ = pb, GPI(3)|pb = GIntImp|pb corresponds to IntImp, and EGPI(3)|pb =
EGINTIMP|pb is more coherent than EII.
In the case θ = 0.5, GPI(1′)|θ = GIPE|θ corresponds to IPEE and gi|θ
corresponds to i. It appears that EGIPE|0.5 is slightly different from
IP3E = [IPEE × 0.5 (i(A ⊂ B) + 1)]^(1/2), the entropic version of IPEE
proposed by [3].
Their behavior, compared to their original counterparts, is represented in
Figure 19 (for n = 1000, pa = 0.05 and pb = 0.10). They were obtained
using 3 different values for θ, θ = pb = 0.1 (thus targeting independence),
θ = 2 pb = 0.2 (targeting situations in which B happens twice as often when A
is true) and θ = 0.5 (prediction).
Fig. 19. Behavior of the measures, as functions of pb/a for n = 1000, pa = 0.05 and
pb = 0.10
Figure 19 illustrates well how the choice of the θ parameter controls the
behavior of the measures. Furthermore, we can see the effectiveness of the
parametrization of the statistical or probabilistic measures, making them more
discriminant.
In the specific case where the reference value is θ = pb, one could prefer
the second version of the inclusion index. Figures 20 and 21, which compare
both alternatives for the third modeling to TEII, show that the second index
fits best.
Indeed, in the first case all rules having pb ≥ pā/b̄ ≥ 0.5 have a null
evaluation for EGPI(3)|θ=pb.
Fig. 20. Variations of EGPI(3)|θ=pb as a function of TEII, first version of the entropic coefficient
Fig. 21. Variations of EGPI(3)|θ=pb as a function of TEII, second version of the entropic coefficient
If we now consider modeling 1′ and compare EGPI(1′)|θ=0.5 to IP3E, we see
that its value is null under indetermination whereas IP3E still varies. The
range is therefore all the larger in our proposal (see Figure 22).
Fig. 22. Variations of EGPI(1′)|θ=0.5 as a function of IP3E
Fig. 23. Comparison of the entropic and contextual approach (modeling 3), for
θ = 0.5
rule to a user defined reference parameter. We extended this concept and de-
fined an off-centered entropy. Its behavior within a supervised learning context
is currently under study, and should lead to new perspectives.
Based on a sound and comprehensive framework, this chapter illustrates
the use of parameterized measures within a data mining process. The behavior
of the parameterized measures is illustrated using classical datasets, and these
measures are compared to their original counterparts. This study highlights
the different properties of each of them. Our proposal leads to the definition
of a set of measures, from which one may choose the one most adapted to the
user's needs and the data's specificities.
References
1. R. Agrawal, T. Imielinski, and A.N. Swami. Mining association rules between
sets of items in large databases. In P. Buneman and S. Jajodia, editors, ACM
SIGMOD International Conference on Management of Data, pages 207–216,
Washington, D.C., USA, 1993.
2. M. Bailleul and R. Gras. L’implication statistique entre variables modales.
Mathématiques, informatique et sciences humaines, (128):41–57, 1995.
3. J. Blanchard, F. Guillet, H. Briand, and R. Gras. Assessing the interestingness
of rules with a probabilistic measure of deviation from equilibrium. In J. Janssen
25. S. Lallich, B. Vaillant, and P. Lenca. Parametrised measures for the evaluation
of association rule interestingness. In J. Janssen and P. Lenca, editors, The
XIth International Symposium on Applied Stochastic Models and Data Analysis,
pages 220–229, Brest, France, 2005.
26. S. Lallich, B. Vaillant, and P. Lenca. A probabilistic framework towards the
parameterization of association rule interestingness measures. Methodology and
Computing in Applied Probability, 9(3):447–463, 2007.
27. P. Lenca, P. Meyer, B. Vaillant, and S. Lallich. On selecting interestingness
measures for association rules: user oriented description and multiple criteria
decision aid. European Journal of Operational Research, 184(2):610–626, 2008.
28. P. Lenca, P. Meyer, B. Vaillant, P. Picouet, and S. Lallich. Évaluation et analyse
multicritère des mesures de qualité des règles d’association. Revue des Nouvelles
Technologies de l’Information (Mesures de Qualité pour la Fouille de Données),
(RNTI-E-1):219–246, 2004.
29. P. Lenca, B. Vaillant, P. Meyer, and S. Lallich. Quality Measures in Data
Mining, volume 43 of Studies in Computational Intelligence, Guillet, F. and
Hamilton, H.J., Eds., chapter Association rule interestingness measures: exper-
imental and theoretical studies, pages 51–76. Springer-Verlag Berlin Heidelberg,
2007.
30. I.C. Lerman and J. Azé. Une mesure probabiliste contextuelle discriminante de
qualité des règles d’association. In M.-S. Hacid, Y. Kodratoff, and D. Boulanger,
editors, Extraction et gestion des connaissances, volume 17 of RSTI-RIA, pages
247–262. Lavoisier, 2003.
31. I.C. Lerman, R. Gras, and H. Rostam. Elaboration d’un indice d’implication
pour les données binaires, i et ii. Mathématiques et Sciences Humaines, (74,
75):5–35, 5–47, 1981.
32. K. McGarry. A survey of interestingness measures for knowledge discovery.
Knowledge Engineering Review Journal, 20(1):39–61, 2005.
33. G. Piatetsky-Shapiro. Discovery, analysis and presentation of strong rules. In
G. Piatetsky-Shapiro and W.J. Frawley, editors, Knowledge Discovery in Data-
bases, pages 229–248. AAAI/MIT Press, 1991.
34. G. Ritschard. De l’usage de la statistique implicative dans les arbres de classifi-
cation. In R. Gras, F. Spagnolo, and J. David, editors, The third International
Conference Implicative Statistic Analysis, pages 305–315, Palermo, Italia, 2005.
Supplément num. 15 de la Revue Quaderni di Ricerca in Didattica.
35. G. Ritschard and D.A. Zighed. Implication strength of classification rules. In
F. Esposito, Z.W. Ras, D. Malerba, and G. Semeraro, editors, 16th International
Symposium on Methodologies for Intelligent Systems, volume 4203 of LNAI,
pages 463–472, Bari, Italy, 2006. Springer.
36. E. Suzuki. In pursuit of interesting patterns with undirected discovery of ex-
ception rules. In S. Arikawa and A. Shinohara, editors, Progresses in Discovery
Science, volume 2281 of Lecture Notes in Computer Science, pages 504–517.
Springer-Verlag, 2002.
37. E. Suzuki and Y. Kodratoff. Discovery of surprising exception rules based on
intensity of implication. In J. M. Zytkow and M. Quafafou, editors, Principles
of Data Mining and Knowledge Discovery, volume 1510 of Lecture Notes in
Artificial Intelligence, pages 10–18, Nantes, France, September 1998. Springer-
Verlag.
38. P-N. Tan, V. Kumar, and J. Srivastava. Selecting the right objective measure
for association analysis. Information Systems, 4(29):293–313, 2004.
Summary. Our aim is to apply the test-value percent criterion to the
counterexamples statistic, which is the basis of the well-known statistical
implicative analysis approach. We show how to compute the test value in this
context, and what its connection is with the intensity of implication measure,
on the one hand, and the index of implication, on the other hand. We evaluate
the behavior of these measures on a large dataset comprising several hundreds of
thousands of transactions. We especially evaluate the discriminating capacity of
the measures, in relation to specialized measures such as the entropic intensity
of implication.
1 Introduction
Since the work of Agrawal and Srikant (1994) [1], association rule mining
has received a great deal of attention and has become one of the most popular
methods in the knowledge discovery community. This approach produces
implication rules such as "If A Then C", where A and C are sets of items, or
products in the analysis of market basket data. The meaning of the rule is
"whenever a set of transactions contains A, then it probably also contains C".
Even if association rule mining is very powerful, there is a pitfall which
can call its use into question: the number of generated rules can be very high,
and it becomes difficult to distinguish the most interesting ones [13]. In this
context, it is important to have a numerical indicator which makes it possible
to propose the most relevant rules quickly, but also to validate them, so as to
keep only the rules which show a real causation. Many rule quality measures
have been proposed in recent years. Among them, we are interested in the
intensity of implication measure based on the counterexamples statistic [8, 9].
R. Rakotomalala and A. Morineau: The TVpercent principle for the counterexamples statistic,
Studies in Computational Intelligence (SCI) 127, 449–462 (2008)
www.springerlink.com © Springer-Verlag Berlin Heidelberg 2008
In order to transform the regularity (the concomitant occurrence of the
itemsets) into causation (the implication rule), we count the counterexamples
of the rule. The rule "If A Then C" is all the more relevant as it has few
counterexamples. The intensity of implication is a measure based on this idea.
It has been applied in many domains and is available in several benchmark
software tools [14, 23].
The intensity of implication measure relies on a classical hypothesis-testing
scheme. The idea is not to test the absence or presence of a real link between
A and C, but rather to measure to what extent we deviate from the reference
situation described by the null hypothesis. In this context, we often compute
the p-value of the test. It shares the property of any classical statistical index
in a data mining context: for the same constant proportions, its value
inopportunely increases with the number of observations. In certain situations,
when the number of examples is very high, the p-value cannot even be computed
correctly because we exceed the accuracy of the common statistical libraries.
All the rules then reach the maximum value of the measure, and it is not
possible to distinguish the relevant ones.
In this paper, we apply the test-value percent principle to the counterexamples
statistic. The test value expresses the p-value as a number of standard
deviations of the Gaussian distribution [15, 17]. In a recent paper, we proposed
a normalized version of the test value named the TVpercent criterion [18].
Applications to association rules showed it to be an interesting criterion for
eliminating statistically uninteresting rules without being influenced by the
number of occurrences. This criterion remains comprehensible and discriminating
even when we treat a huge database. It suggests a threshold that allows
eliminating irrelevant rules, and it enables comparing rules from different
databases.
The organization of this paper is as follows. In section 2, we recall the
computation formulas for the intensity of implication associated with the
counterexamples statistic. In section 3, we briefly present the TVpercent
framework. Then, we extend the calculation of the TVpercent to the
counterexamples statistic; a new measure, ve, is described. In section 4, we
study the behavior of this measure on a large database (340183 transactions
and 468 items). We compare the TVpercent to the standard intensity of
implication and to the entropic intensity of implication, which is designed for
rules with high supports [2]. We conclude in section 5.
Rule    A      Ā      Total
C       nac    nāc    nc
C̄       nac̄    nāc̄    nc̄
Total   na     nā     n
Table 1. Contingency table — Number of transactions for a rule "If A then C"
parameter estimate under the null hypothesis; and then, an indicator which
measures the deviation of the observed data from the null hypothesis.
In Gras' approach [7], the statistical parameter is the number of
counterexamples Nac̄ of a rule. Its estimate is naturally the number of observed
counterexamples nac̄ (Table 1). The null hypothesis is the independence between
the antecedent and the consequent of the rule. In this situation, the probability
πac̄, i.e. the probability of obtaining a counterexample (the antecedent of the
rule is true but the consequent is not), is equal to πa × πc̄, the product of two
marginal probabilities, which can be estimated by na/n and nc̄/n. The expectation
of the random variable Nac̄ under the null hypothesis is Λ = n × πa × πc̄,
estimated by λ = n × (na/n) × (nc̄/n).
Although we have a hypothesis-testing framework, the goal is not to accept
or reject the null hypothesis, but to characterize the deviation from this
reference. In our situation, we want to characterize to what extent the observed
number of counterexamples nac̄ is less than λ. Various modeling approaches are
available. We can use a hypergeometric, binomial or Poisson distribution. We
mainly study here the third model, with the Poisson distribution [16]. More than
a simple approximation of the other distributions, this sampling scheme is
interesting because it treats the positive and negative associations in a
non-symmetrical way, which makes it possible to exhibit a causation.
The p-value, the probability of obtaining a result at least as extreme or
impressive as that obtained for the data, assuming the null hypothesis is true,
is computed with a Poisson distribution with parameter λ. The critical region
for Nac̄ is defined as the interval [0, nac̄]. The intensity of implication, Ie,
is the complement to 1 of the p-value of the test:
Ie , is the complement to 1 of the p-value of the test:
nac̄
X λm −λ
Ie = 1 − e (1)
m=0
m!
Numerical example. We use an example described in Ritschard's paper [20].
We want to characterize a rule where n = 273, na = 76, nc̄ = 153 and nac̄ = 28
(Table 2); with λ = 42.59, the intensity of implication is Ie = 0.9884.
Rule    A     Ā     Total
C       48    72    120
C̄       28    125   153
Total   76    197   273
Table 2. Example of a contingency table from Ritschard [20]
compute the index of implication, which is the standardized value of the
observed number of counterexamples, using a continuity correction factor:
ie = (nac̄ + 0.5 − λ) / √λ     (2)
The approximation of the intensity of implication using the Gaussian CDF
(cumulative distribution function) Φ is then defined as follows:
Ia = 1 − Φ[ie]     (3)
Numerical example. We take again the above example (Table 2). The index of
implication is ie = (28 + 0.5 − 42.59)/√42.59 = −2.16, and the approximated
intensity of implication is Ia = 0.9846. We note that the true intensity (Ie)
and the approximated intensity (Ia) are similar. Very often, only the
approximate formulation is referred to in publications.
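These computations are easy to reproduce. The short script below, based only on the counts of Table 2, recovers the values quoted in the text (Ie = 0.9884, ie = −2.16, Ia = 0.9846).

```python
from math import sqrt
from scipy.stats import poisson, norm

n, n_a, n_cbar, n_acbar = 273, 76, 153, 28        # Table 2
lam = n * (n_a / n) * (n_cbar / n)                # 42.59

Ie = 1 - poisson.cdf(n_acbar, lam)                # exact intensity, eq. (1)
ie = (n_acbar + 0.5 - lam) / sqrt(lam)            # index of implication, eq. (2)
Ia = 1 - norm.cdf(ie)                             # approximated intensity, eq. (3)

print(round(lam, 2), round(Ie, 4), round(ie, 2), round(Ia, 4))
```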
There is a drawback to the utilization of the intensity of implication Ia when
the support of the rule nac is large: the computed value of Ia is mechanically
equal to 1 because the available libraries for computing the Gaussian cumulative
distribution function (CDF) are not accurate enough. For instance, if the index
of implication ie is less than −6.2, the Excel© spreadsheet systematically gives
an intensity of implication equal to 1. We tested several libraries [11]; none
could significantly exceed this limitation.
Using the original formulation (Ie, the Poisson distribution of equation (1))
can slightly improve the results, but we then deal with very small values which
are badly handled by computation libraries. In our example (Table 2), if we
multiply all the values in the table by 4, the counts become n = 1092 and
nac̄ = 112. These values are not excessive. However, we find Ie = 0.99999879
with the exact formula and Ia = 0.999995367 with the approximate formula. It
becomes very difficult to distinguish the interesting rules. This problem is
referred to as the loss of the discriminating power of the measures. Elegant
solutions have been suggested, especially with the concept of the entropic
intensity of implication, where the intensity of implication is balanced with an
index of inclusion [10]. We present this approach below in the experimentation
section.
Test value. The test value also relies on a statistical framework: we compare
the observed parameter with the theoretical parameter under the null hypothesis,
which is the independence between the antecedent and the consequent of a rule.
We mainly use the p-value p to characterize the strength of the deviation
between the observed number of counterexamples and the theoretical number of
counterexamples under the reference situation. The p-value can take very small
values, close to zero, and is thus not very comprehensible as soon as one
deviates from the reference situation with a large database. In order to obtain
a better adapted, easily interpretable measure, we replace it by the number of
standard deviations of the standardized Gaussian distribution which should be
exceeded to cover the computed p-value (Figure 1). We call this criterion the
test value (equation 4):
TV = Φ^{−1}(1 − p)     (4)
This criterion is often used to compare proportions or conditional averages
when characterizing the clusters built by a clustering process [15].
Numerical example. In our example (Table 2), the p-value computed for
nac̄ = 28 and λ = 42.59 with the Poisson distribution is p = 1 − Ie = 0.0116;
the corresponding test value is TV = 2.2701. This value is comparable to the
index of implication, whose negative, ve = −ie = 2.16, can be considered as a
rough approximation of the test value.
a phenomenon well known to statisticians: when the sample size increases, a
small deviation from the parameter values under the null hypothesis becomes
significant, even if it corresponds to a statistical artifact. In order to avoid
this pitfall, we have proposed a normalized test value [18]. The measure becomes
independent of the real size of the database. We do not forget that the main
goal of the measure is to rank the rules in decreasing order of relevance, and
secondarily to suggest a cut value below which we can consider that a rule does
not bring relevant information.
The main idea is to set a priori the size of the dataset to 100. This value
corresponds to a reasonable size of the samples used when statistical inference
and hypothesis testing were historically developed. The value 100 is surely
arbitrary, but it is not more arbitrary than the usual significance levels used
in statistical inference (e.g. 5%, 1%, etc.). These levels are the result of
Fisher's experiments [6]. Indeed, in a not well-known process depicted by
Poitevineau [19], Fisher hesitated in setting the right value of the significance
level [5, 6]. In effect, the appropriate value depends on the studied problem,
the goal of the statistician, and the characteristics of the dataset, especially
its size. From this point of view, a criterion which allows sorting the rules is
surely essential, whereas using the same criterion to mechanically accept or
reject a rule is doubtful.
The original process to compute the normalized test value is a Monte Carlo
sampling approach. We draw randomly, with replacement, 100 examples from the
database and we compute the p-value p = 1 − Ie from equation (1) for the
corresponding 2 × 2 contingency table (Poisson approximation). We repeat this
process and then compute the average p̄ of the p-values. In the last step, the
normalized test value is computed from this average: TVnorm = Φ^{−1}(1 − p̄).
If we use a sufficient number of repetitions (e.g. 2000 samples of 100 examples
with replacement), we obtain a stabilized value of the test value.
This process is also known as the bootstrap procedure, except that the size
of the sample is arbitrarily set to 100 here, and not equal to the dataset size.
This criterion makes it possible to rank the rules computed on a database.
Since it evaluates the rules on a unique reference size (100 examples), it also
has the advantage of allowing the comparison of rules computed on several
similar databases, for example databases of different sizes extracted at
successive dates.
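A sketch of this Monte Carlo procedure in Python (the function name and argument layout are ours; A and C are boolean indicators, one per transaction, for the antecedent and the consequent):

```python
import numpy as np
from scipy.stats import poisson, norm

def tv_percent(A, C, n_rep=2000, size=100, seed=None):
    # For each bootstrap sample of fixed size 100, compute the Poisson p-value
    # p = 1 - Ie of the counter-example count, then convert the averaged
    # p-value into a number of Gaussian standard deviations.
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=bool)
    C = np.asarray(C, dtype=bool)
    pvals = []
    for _ in range(n_rep):
        idx = rng.integers(0, A.size, size=size)       # draw with replacement
        a, cbar = A[idx], ~C[idx]
        lam = a.sum() * cbar.sum() / size              # expected counter-examples
        pvals.append(poisson.cdf((a & cbar).sum(), lam))
    return norm.ppf(1 - np.mean(pvals))
```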
Practical computation of the TVpercent
Rule    A      Ā      Total
C       17.58  26.38  43.96
C̄       10.25  45.79  56.04
Total   27.83  72.17  100
Table 3. Table 2 brought back to a total of 100
In our first work, we use the TVpercent criterion for the co-occurrence of the
antecedent and the consequent (nac ). We used a hypergeometric distribution
4 Experiments
Computing the rules and the measures. We use Borgelt's implementation for the
rule generation [3]; it is available on the author's web site³. The
implementation is very efficient, but it computes rules with only one item in
the consequent. The parameters of the software are classical: we can choose the
minimum support, the minimum confidence and the maximum length (number of
items) of the rules.
The rules are then post-processed in the Excel© spreadsheet, where we compute
the various measures that we want to evaluate in this paper (Ie, Ia, IEa,
TVpercent, ie). In spite of some doubt about the accuracy of this spreadsheet,
our experience shows that the highly regarded specialized libraries⁴ are not
really more accurate. The spreadsheet is therefore considered adequate for our
exploratory study.
Evaluation framework. Our aim is to check the concordances and discordances
between our measure (TVpercent) and the state-of-the-art measures [10]. The
first approach is to check whether the various measures rank the rules in the
same way; a scatter plot allows checking that. We can also compute
³ http://fuzzy.cs.uni-magdeburg.de/~borgelt/apriori.html
⁴ e.g. the STATLIB library: http://lib.stat.cmu.edu/index.php
Indicator Value
Number of transactions 340183
Number of items 468
Maximum length of rules 3
Support minimum 10%
Confidence minimum 75%
Number of rules 17212
Table 5. Characteristics of our experiments on the ACCIDENT dataset
The characteristics of the database and the computed rule set are described
in Table 5. The parameters of the algorithm have been chosen after several
attempts; we note that the results of the various attempts are not in
contradiction with the results presented here.
With a minimum support of 10%, the support of the rules runs from 34018 to
340183 transactions. The number of counterexamples nac̄ of a rule runs from 0
(no counterexample) to 81742 (the support of the rule being 258408 in this
situation). In this context, the accuracy of the computation of the measures
used is very important.
The exact intensity of implication Ie: The exact formulation of the intensity
of implication, equation (1), can be computed on only 2864 of the 17212 rules.
The Excel© implementation of the Poisson CDF cannot handle some values. Even
if we use some tricks to improve the accuracy, we doubt we can really improve
the exact formulation on a large database.
The approximated intensity of implication Ia: The approximate formulation
using the Gaussian CDF, equation (3), is more robust and can be computed on
all the rules. Another drawback appears, however. When the index of implication
is very small (in some situations it can be equal to −209.9), the Gaussian CDF
implemented in the spreadsheet is mechanically equal to 0, so the approximate
intensity of implication is 1 for 9960 rules among 12712. At first, we thought
there was a problem specific to the spreadsheet, but we found the same
limitations with the specialized libraries⁵.
⁵ e.g. the STATLIB library (http://lib.stat.cmu.edu/index.php). The available
implementations are described in a book [11].
When we studied the results more deeply, we found that the situation is not symmetrical. Among the 20 best rules according to the TVpercent, 12 have the maximal value of the entropic intensity of implication IEa = 1. Conversely, there are 79 rules with IEa = 1, and the best rules according to the TVpercent are hidden among them.
TVpercent and the normalized index of implication. We have already noted above the similarity between the index of implication and the TVpercent, and we have also noticed that the approximation was not very precise on small datasets. What happens when we bring the values back to n = 100? We calculated the negative of the index of implication directly on the sample brought back to size 100: even if the index of implication approximates the TVpercent poorly, perhaps the two rank the rules in a similar way?
The scatterplot shows that there is little discordance between the two measures, even if the approximation is not really accurate. The relation between these measures is clearly nonlinear but monotonic (Figure 4). This visual impression is corroborated by the correlation computed between the ranks given by the two measures (the Spearman rank correlation), which is equal to 0.999. When we focus on the 20 best rules according to the normalized index of implication, we find 18 of the best rules according to the TVpercent criterion. In a real situation where we present the rules to a human expert, these two criteria would propose approximately the same rules.
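The rank comparison reported above can be reproduced in a few lines; the two arrays are placeholders for the measure values computed on the same ruleset.

import numpy as np
from scipy.stats import spearmanr

# tv_percent and norm_index stand for the two measures on the same rules
# (placeholder values; in the study they come from the 17212 rules).
tv_percent = np.array([12.4, 3.1, 45.0, 7.7, 20.2])
norm_index = np.array([10.9, 2.8, 30.5, 6.9, 16.0])

rho, _ = spearmanr(tv_percent, norm_index)
print(f"Spearman rank correlation: {rho:.3f}")  # 0.999 in the reported experiment

# Overlap of the "top k" rules selected by each measure, as done for k = 20:
k = 3
top_tv = set(np.argsort(-tv_percent)[:k])
top_ni = set(np.argsort(-norm_index)[:k])
print(len(top_tv & top_ni), "rules in common among the top", k)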
The principal difference between these two indicators lies in the determination of the statistically valid rules. If we use the critical values associated with the usual significance levels (e.g. 2.32 for a significance level of 1%), then, because the TVpercent is always larger, we keep more rules with the TVpercent than with the normalized index of implication. This is not really a disadvantage of the normalized index of implication: we have seen that using an arbitrary threshold value in order to keep or remove rules must be done with caution. If the user nevertheless wants to use this procedure, he must take this difference into account.
5 Conclusion
In this paper, we have generalized the TVpercent criterion to the counterexamples statistic. The main improvement of this new measure is the handling of rules computed on large databases. We preserve the discrimination power of the measure, i.e. the ability to rank rules without ties, and we can rank a great number of rules. In this way, the measure extends the field of application of the intensity of implication and constitutes an alternative to the entropic intensity of implication.
The second main result of this work is the similarity between the TVpercent and the normalized index of implication. Although the index underestimates the true value of the TVpercent, it ranks the rules in the same way and, most of all, it points to the same rules when we are interested in the best rules according to these criteria.
Of course, these conclusions rely mainly on experimental evaluation. We studied the behavior of these measures under various parameter settings of the rule extraction algorithm, corresponding to the post-processing of larger or smaller numbers of rules; the results described above were not called into question. However, it would be interesting to complete this study on other databases with different characteristics, for example with few transactions and a very large number of items.
References
1. R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In
Proceedings of the 20th VLDB Conference, pages 487–499, 1994.
2. J. Blanchard, P. Kuntz, F. Guillet, and R. Gras. Mesures de qualité pour
la fouille de données, volume E-1, chapter Mesure de qualité des règles
d’association par l’intensité d’implication entropique. Revue des Nouvelles Tech-
nologies de l’Information, 2004.
3. C. Borgelt and R. Kruse. Induction of association rules: Apriori implementation. In 15th Conference on Computational Statistics, 2002.
4. T. Fawcett. ROC graphs: Notes and practical considerations for researchers. Technical report, HP Laboratories, 2003.
5. R.A. Fisher. Statistical Methods, Experimental Design, and Scientific Inference,
chapter The design of experiments. Oxford University Press, 1990. 1st edition,
1935, London, Oliver and Boyd.
6. R.A. Fisher. Statistical Methods for Research Workers. Oxford University Press,
14 edition, 1990. 1st edition, 1925, London, Oliver and Boyd.
7. R. Gras. Contribution à l’étude expérimentale et à l’analyse de certaines ac-
quisitions cognitives et de certains objets didactiques en mathématiques. Thèse
d’Etat, 1979.
1 Introduction
Among KDD techniques, association rules [2] allow the capture and representation of implicative patterns that tolerate a small set of counterexamples, e.g. birds that cannot fly or sports cars that are not red. Association rules can be enhanced with statistical evaluations and filters such as the intensity of implication family of indices.
Association rule discovery is motivated by the exploitation of operational databases to discover new knowledge that was unknown before the discovery and that is potentially exploitable in a decision-making process [19]. Many efficient algorithms have been published to optimize the search for association rules [8, 16], but they mainly focus on algorithmic optimization rather than on knowledge usability.
One of the fundamental hypotheses of association rule discovery is that the user does not specify the goal of the search. Because of the intrinsically combinatorial nature of the search and the lack of goals, the classical use of these algorithms, chaining data selection, data formatting, frequent set induction, rule computation and rule presentation to the user, generally outputs huge quantities of rules without any ordering, which contradicts the principle of knowledge readability and usability for a decision process. Experiments using a direct application of association rule algorithms such as Apriori have resulted in thousands of rules. One can then seriously contest the quality of the vision of the studied domain that the association rules provide to a user who has to explore thousands of rules. One can contest as well the quality of the induction itself if the effort the user has to spend interpreting the association rules is nearly the same as the effort he would have to spend to get the same domain understanding by directly browsing the database.
A classical answer to this problem is to set high thresholds on quality indices that evaluate individual rules, in order to eliminate the least pertinent rules as measured by these indices. But there are cases where this strategy cannot be applied: when the user does not know where to set the thresholds corresponding to the kind of knowledge he is looking for, when he is looking for knowledge with properties other than those captured by the available indices, or when there are many hidden dependencies in the data. Further global criteria have therefore been proposed, in addition to those measured on individual rules:
• Operational criteria, in precise decision-making tasks [6]; these criteria, however, are specific and domain-dependent.
• Readability criteria, whose precise evaluation is based on a cognitive qualification of the user's perception of the represented knowledge. This criterion depends on the visualization interfaces and their adaptation to the decision-making tasks. If we assume a linear, non-interactive acquisition of the knowledge by a decision maker, limiting the amount of represented rules, in association with a reading convention, is an important factor of improvement for these criteria.
• The exploitation of the rules for automated tasks such as inference engines. In this case, specific properties of the rules with respect to inference, for example the respect of logical properties, are evaluated as knowledge quality.
To meet those criteria, we propose to limit the number of association rules by not representing rules that the user can infer himself through logical reasoning. Rules eliminated in this way are considered redundant with respect to the represented rules. This redundancy elimination is a global criterion that complements the evaluation of the quality of each individual rule.
The redundancy elimination strategy strongly depends on the theory of the represented knowledge, including a definition of the inferences that can be made by the user reading the knowledge representation. Several models have been proposed; some of them are coupled with discovery algorithms [12, 13, 15, 25], others with specific representation models such as Galois lattices [4, 5, 10, 23, 24, 29], or with measures that allow approximating frequent itemsets [1].
Our proposed representation model is based on logical properties, with the assumption of an implicative behavior of association rules. It relies on a closed itemset algorithm (see Ceglar and Roddick, 2006 [8] for a definition of this class of association rule mining algorithms), with a purely logical construction of the closure relation. This hypothesis is reinforced by the assumption that a user or an automated deduction system will make logical deductions using the ruleset during the interpretation of the rules.
Efficient methods have been proposed for redundancy elimination in sets of functional dependencies, and functional dependencies are known to support logical properties [27]. We therefore apply one of these methods, minimal covers, to association rule filtering. This filtering is very efficient, as it gives very compact representations, but there are cases where association rules do not respect the logical assumptions of our model and for which the redundancy reduction gives over-generalized rules that, while consistent with logical reasoning, contradict some statistical measures of the rules. We detail these cases in this paper and show, using synthetic examples, the information loss encountered with our method.
2.1 Definitions
Closure on a FD set :
3 Minimal covers
The minimal cover is the minimal FD set F̂ computed from a FD set F such that F̂+ = F+ and F̂ is minimal. F̂ is minimal if it contains neither redundant FDs nor superfluous attributes. A FD is said to be redundant if it can be derived, using Armstrong's axioms, from the FD system deprived of this FD: X → Y of F is redundant if F ⊂ (F \ {X → Y})+. This condition is satisfied if (X → Y) ∈ (F \ {X → Y})+. By using the definition of the closure of an attribute set over a FD set (4), a FD can be qualified as redundant if Y ⊂ X+(F\{X→Y}). Ullman [28] shows that if Y ⊂ XF+, then (X → Y) ∈ F+.
An attribute x of the left-hand side of a FD X → Y is superfluous if the FD (X \ x) → Y can be derived using Armstrong's axioms on the FD system, i.e. F ⊂ ((F \ {X → Y}) ∪ {(X \ x) → Y})+. This condition is satisfied if ((X \ x) → Y) ∈ F+, or Y ⊂ (X \ x)+F.
For example, the minimal cover of F = {a → b, ab → c, ac → d, a → c} is F̂ = {a → b, ab → c, a → d}, because {a → b, ab → c} allows us to infer a → c; hence c is superfluous in ac → d and a → c is redundant.
⊢ (A ∪ A) → A    (6)
(A ∪ A) → A ⊢ A → A;
therefore y ∈ XF+ if y ∈ X, because y is then in the right-hand side of a FD belonging to F+ whose left-hand side is included in X.
Armstrong's augmentation axiom (2) allows the rewriting of a FD (A → B) ∈ F into
A → B ⊢ (A ∪ A) → (A ∪ B)    (7)
and (A ∪ A) → (A ∪ B) ⊢ A → (A ∪ B);
with the addition of the FD ((A ∪ B) → C) ∈ F, the transitivity axiom (3) allows us to write
A → (A ∪ B) (7), (A ∪ B) → C ⊢ A → C.    (8)
The same rewritings can be achieved by applying Armstrong's axioms to FDs whose left-hand side is included in the attribute set A ∪ B. Demonstrating that B ⊂ A+F is therefore enough to establish that if (A ∪ B → C) ∈ F, then C ⊂ (A ∪ B)+F. Furthermore, there is no other rewriting starting from A → B that gives FDs whose right-hand sides contain only subsets of A ∪ B; thus

y ∈ XF+  if  y ∈ X,  or  ∃ A → B ∈ F such that A ⊂ X and y ∈ (X ∪ B)+(F\{A→B}).    (10)
As rules are evaluated at most once, the maximum number of iterations of the closure main loop (lines 17 to 25) is the number of rules in the ruleset. Thus this algorithm runs in time linear in the number of rules of the computed ruleset.
Data: a ruleset F.
Result: the minimal cover F̂ of F.
1  Let Fk = ∅;
2  foreach (X → Y) ∈ F do
3      Let Xk = ∅;
4      foreach x ∈ X do
5          if Y ⊄ (X \ x)+F then
6              Xk = Xk ∪ {x};
7      if Xk ≠ ∅ then
8          Fk = Fk ∪ {Xk → Y};
9  F̂ = Fk;
10 foreach (X → Y) ∈ Fk do
11     if Y ⊂ X+(F̂\{X→Y}) then
12         F̂ = (F̂ \ {X → Y});
3.2 Examples
4 Related work
4.1 Propositional logic
Armstrong's axioms are theorems of propositional logic [28]. It can then be proven that every expression derivable using Armstrong's axioms is a true formula of propositional logic. Kaufman proved that the theory of FD redundancy is valid for logical implications [17], and therefore that a FD system F = (A, ∪, →) shares its properties with a world in propositional logic w = (A, ∧, →), where A is a set of propositions, ∧ is the logical conjunction and → is the logical implication.
Data: • F: a ruleset.
      • X: a set of attributes.
      • y: an attribute.
Result: a boolean: true if y ∈ XF+, false otherwise.
13 Let Fi = F;
14 Let Xi = X;
15 Let Fk = ∅;
16 closed = false;
17 while not closed and y ∉ Xi do
18     closed = true;
19     Fk = ∅;
20     foreach A → B ∈ Fi do
21         if A ⊂ Xi then
22             Xi = Xi ∪ B;
23             closed = false;
           else
24             Fk = Fk ∪ {A → B};
25     Fi = Fk;
26 if y ∈ Xi then
27     y ∈ XF+ !
   else
28     y ∉ XF+ !
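The following Python sketch is a direct transcription of the two procedures above (it is not the authors' implementation): a closure membership test in which each rule is consumed at most once, followed by the removal of superfluous attributes and then of redundant FDs. The demonstration reuses the FD set of the example traced in Section 3.2.

def in_closure(F, X, y):
    """Closure membership test (lines 13-28): is y in the closure of X over F?"""
    Xi = set(X)
    Fi = list(F)                      # each FD is a (lhs frozenset, rhs frozenset) pair
    closed = False
    while not closed and y not in Xi:
        closed = True
        Fk = []
        for A, B in Fi:
            if A <= Xi:               # the FD fires: add its right-hand side
                Xi |= B
                closed = False
            else:
                Fk.append((A, B))     # keep it for a later pass
        Fi = Fk
    return y in Xi

def minimal_cover(F):
    """Minimal cover (lines 1-12): drop superfluous attributes, then redundant FDs."""
    Fk = []
    for X, Y in F:                    # superfluous attributes on the left-hand sides
        Xk = {x for x in X
              if not all(in_closure(F, X - {x}, y) for y in Y)}
        if Xk:
            Fk.append((frozenset(Xk), Y))
    cover = list(Fk)                  # redundant FDs
    for fd in list(cover):
        rest = [g for g in cover if g is not fd]
        X, Y = fd
        if all(in_closure(rest, X, y) for y in Y):
            cover = rest
    return cover

# F = {a -> b, a -> c, b -> c}: the expected minimal cover is {a -> b, b -> c}.
F = [(frozenset('a'), frozenset('b')),
     (frozenset('a'), frozenset('c')),
     (frozenset('b'), frozenset('c'))]
for X, Y in minimal_cover(F):
    print(''.join(sorted(X)), '->', ''.join(sorted(Y)))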
Minimal covers and conceptual lattices both use inclusions between the extensions of the represented attribute combinations to limit the amount of represented knowledge.
inference of the whole association rule system [10, 23, 29] with the following
reading conventions:
1. the support of a non-closed description (pseudo-intent), which is not represented on the Galois lattice, is equal to the support of the closed set in which it is included; in the example given in Figure 4, the support of a (non-closed) is the same as the support of a ∧ b ∧ c (closed) [10, 23];
9  redundant FD: F̂ = Fk = {a → b, a → c, b → c}
11 redundant FD: (X → Y) = (a → b)
11 redundant FD: F̂ \ (X → Y) = {a → c, b → c}
   closure: b ∈ XF+ ?
   closure: F = {b → c, a → c}, X = {a}
15 closure: Fk = ∅
20 closure: (A → B) = (b → c)
24 closure: A ⊄ Xi
   closure: Fk = {b → c}
   closure: Xi = {a}
20 closure: (A → B) = (a → c)
22 closure: A ⊂ Xi
   closure: Fk = {b → c}
   closure: Xi = {a, c}
25 closure: Fi = Fk
   closure: Fi = {b → c}
15 closure: Fk = ∅
20 closure: (A → B) = (b → c)
24 closure: A ⊄ Xi
   closure: Fk = {b → c}
   closure: Xi = {a, c}
28 closure: b ∉ XF+ !
11 redundant FD: b ∉ ({a})+(F̂\{X→Y}) !
12 redundant FD: F̂ = {a → b, a → c, b → c}
11 redundant FD: (X → Y) = (b → c)
11 redundant FD: F̂ \ (X → Y) = {a → b, a → c}
   closure: c ∈ XF+ ?
   ...
   (in a similar way, it can be proven that:)
11 redundant FD: c ∉ ({b})+(F̂\{X→Y}) !
12 redundant FD: F̂ = {a → b, a → c, b → c}
1 The frontier of the search is the set of the most specific intents.
11 redundant FD: (X → Y) = (a → c)
11 redundant FD: F̂ \ (X → Y) = {a → b, b → c}
   closure: c ∈ XF+ ?
   closure: F = {a → b, b → c}, X = {a}
15 closure: Fk = ∅
20 closure: (A → B) = (a → b)
22 closure: A ⊂ Xi
   closure: Fk = {}
   closure: Xi = {a, b}
20 closure: (A → B) = (b → c)
22 closure: A ⊂ Xi
   closure: Fk = {}
   closure: Xi = {a, b, c}
25 closure: Fi = Fk
   closure: Fi = {}
27 closure: c ∈ Xi !
   closure: c ∈ XF+ !
11 redundant FD: c ∈ ({a})+(F̂\{X→Y}) !
   redundant FD: a → c is redundant !
12 redundant FD: F̂ = {a → b, b → c}
implication. In this case, every discovered description is closed, there are then no pseudo-intents, and the Galois lattice is equivalent to the description inclusion lattice.
Algorithms have been proposed to compute frequent closed itemsets [29] as a replacement for the original frequent itemset step in the classical association rule discovery process.
δ-free sets:
Boulicaut et al. [5] proposed a new notion, δ-free descriptions, that addresses this latter problem. It is an enhancement of the closure definition that takes quasi-inclusions2 into account. A description is said to be δ-free if there is no rule between subsets of this description that is violated by at most δ examples, the δ factor being supposed small. The set of δ-free frequent descriptions allows approximating the set of descriptions, with an error rate defined by support and confidence thresholds. It reduces the number of represented descriptions while reducing the time required for their computation.
2 Quasi-inclusions can be seen as statistical implications between extents.
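A small sketch of the δ-freeness test under the definition recalled above (no rule between subsets of the description violated by at most δ rows); the transaction data and item names are purely illustrative.

from itertools import combinations

def is_delta_free(itemset, transactions, delta):
    """itemset: set of items; transactions: list of sets of items.
    The description is delta-free if no rule Y -> z, with Y a proper subset of
    the itemset and z one of its remaining items, has at most delta
    counterexamples (rows containing Y but not z)."""
    items = list(itemset)
    for r in range(len(items)):
        for Y in combinations(items, r):
            Y = set(Y)
            for z in itemset - Y:
                cover = [t for t in transactions if Y <= t]
                exceptions = sum(1 for t in cover if z not in t)
                if cover and exceptions <= delta:
                    return False      # an almost-exact rule exists inside the itemset
    return True

data = [{'a', 'b'}, {'a', 'b', 'c'}, {'b', 'c'}, {'c'}]
print(is_delta_free({'a', 'b'}, data, delta=0))  # False: the exact rule a -> b holds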
2. The 33 logical implications: a cross-table whose rows are the left-hand sides a, b, c, a∧b, a∧c, b∧c, a∧b∧c and whose columns are the same descriptions as right-hand sides, each X marking a logically valid implication.
3. The minimal cover of the initial 33 rules (item 2) can be used to infer the logical rules of item 2.
4. The conceptual lattice of the relation can be used to infer the whole set of rules, including the minimal cover. In this example, to infer the minimal cover, we use the following reasoning: the attribute a appears in the closed set a ∧ b ∧ c, but it does not appear in any other represented closed set. This means that the descriptions a, a ∧ b and a ∧ c, which are described by a, are also described by b and c, and therefore that a → b ∧ c.
Armstrong's axioms are not valid for every kind of implicative system in which statistical implications are considered. Figure 5 illustrates the limits of the transitivity axiom for statistical implications; the augmentation axiom has the same limits.
Using logical propositions for association rules is interesting in three categories of applications:
Fig. 5. Limits of the transitivity axiom for statistical implications, illustrated on the sets A, B and C of the items verifying a, b and c respectively. In the first configuration, both statistical implications a → b and b → c are observable and the statistical implication a → c is also observed: the transitivity is valid. In the other configurations of A, B and C it is not.
1. The hypothesis that association rules behave like logical propositions can be investigated with the validation of an expert. This is required in order to use the rules as a knowledge base for the inference engine of an expert system.
2. Setting thresholds on the quality indices of the rules (e.g. a low support and a high confidence) aims at obtaining a behavior of association rules that is close to that of logical propositions. Unfortunately, there is no formula giving the thresholds that would ensure a completely logical behavior, except in the trivial case of a confidence threshold of 1.
3. The redundancy elimination can be considered as part of an interactive process, providing the user with a way to confirm or refute his hypotheses during his reasoning [18]. This interaction can be useful in finding exceptions to the logical behavior of rules or sets of rules.
The first two points are confirmed by experiments whose results are presented in Figure 6 and Table 73. They represent experiments on 100 rulesets, varying min_conf4 in {0.8, 0.9, 1} and ϕ5 in {0.8, 0.9, 1}. For each ruleset, we check the minimal cover and the closure with the same quality criteria to determine the ratio of valid inferences. Here, we assume that the user is able to infer the closure himself from either the original ruleset or its minimal cover. These experiments show that the higher the min_conf threshold is set, the more often the minimal cover, and the inferences drawn from it, are correct. A similar but weaker behavior is observed with the intensity of implication.
6 Conclusion
3 The program used to produce these results can be downloaded at: http://www.fc.univ-nantes.fr/~remi/felix/min-covers.
4 Threshold on confidence.
5 Here, the original definition of the intensity of implication is used [14].
Fig. 6. Experimental results on 100 rulesets: percentage of rules in the minimal covers relative to the total number of discovered rules, percentage of valid rules in the minimal covers, and percentage of valid rules among the inferred rules, as a function of the number of rules (x-axis: number of rules, 0 to 3500; y-axis: % rules, 0 to 100).
References
1. F.-N. Afrati, A. Gionis, and H. Mannila. Approximating a collection of frequent
sets. In Proceedings of the Tenth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, pages 12–19. ACM, 2004.
2. R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. Inkeri Verkamo. Fast
discovery of association rules. In Fayyad et al. [11], pages 307–328.
3. J. Atkins. A note on minimal covers. SIGMOD RECORD, 17(4):16–21,
December 1988.
Fig. 7. Experimental results: confirmation of the inferred rules using tests with quality measures.
Fuzzy Knowledge Discovery Based on Statistical Implication Indexes
Maurice Bernadet
1 Introduction
Fuzzy logics are extensions of classical logics that allow intermediate truth-values between True and False [31]. They may express knowledge in a more natural way than classical Boolean logics, allowing graduated attributes as in the sentence "X is rather high" (meaning that "X is high" is rather true) and then assigning, for instance, a truth value of 0.8 to the proposition "X is high". Fuzzy logics offer many logical operators, which permits a good translation of various kinds of knowledge. In the domain of knowledge discovery [13], considering that crisp intervals on continuous attributes are difficult to interpret and that strict thresholds are often too abrupt, one may think that fuzzy logics could improve the expressiveness of extracted knowledge.
Let us first recall that fuzzy logics evaluate the truth-value of a fuzzy proposition "X is A" as the degree to which X belongs to the fuzzy set A: if µA(X) is the membership (or characteristic) function of the fuzzy set A, one may write Truth("X is A") = µA(X).
Fuzzy sets allow the definition of fuzzy C-partitions or "pseudo-partitions", in which each value of a continuous attribute may be classified into several fuzzy classes with a total membership of 1. These fuzzy pseudo-partitions allow the conversion of continuous attributes into fuzzy ones, which then gives the truth-values of fuzzy propositions. For a continuous attribute CA, varying from minCA to maxCA, one can define a fuzzy pseudo-partition in several ways [7, 22].
The simplest method divides the interval [minCA, maxCA] into n sub-intervals, with a small percentage of overlap between two adjacent ones, and gives each sub-interval a symbolic name related to its position. For instance, one may divide the interval [minCA, maxCA] into 5 sub-intervals with an overlap of 20%, giving 5 fuzzy modalities for this attribute, such as strong negative, rather negative, medium, rather positive and strong positive (Fig. 1).
The fuzzy classes may also be defined by experts; otherwise one may choose 3 or 5 classes as standard options. Other numbers of classes may also be used, but too many classes might slow down the knowledge discovery process too heavily. It is often interesting to try several numbers of classes, because choosing a priori the number of classes that will give a good partition is difficult.
Another kind of method extracts the number of classes and defines the fuzzy C-classes from the database. These methods consider values of the attributes giving the same conclusion and, whenever possible, cluster these values into the same fuzzy sets, with a membership value equal to the rate of samples giving this conclusion; they often use histograms of attribute values for each possible conclusion. Moreover, it is possible to develop a more satisfactory method by generalizing to fuzzy logics optimal discretization methods such as those studied in [33]; we have recently defined such a method, based on clustering, which gives more satisfying results than the previous ones, but with the drawback of needing more computing time [5].
Once the fuzzy classes have been defined for each attribute, one may convert the related values of each item by mapping these values to the membership values of each fuzzy class associated with the considered classical attribute (Fig. 2).
Fig. 2. Mapping from the value V of the continuous attribute CA to the membership values of the fuzzy attributes (here, only µ3 and µ4 are nonzero).
Several indexes may be used to evaluate classical rules [1]; we have chosen three of them [3, 6]: the confidence, the support, and a less known index, the intensity of implication. Let us consider two propositions a and b, associated respectively with A and B, the sets of elements which verify them.
• The support of a rule “if a then b” may be defined as the rate of occurrences
of items verifying “a and b”, related to all items of the database; calling
na∧b the number of items verifying “a and b” and nE the total number of
items in the database, the support may be evaluated by
Support(a ⇒ b) = na∧b / nE    (2)
More often, however, we use Zadeh's crisp cardinalities of fuzzy sets. The confidence of a rule, its support and its intensity of implication are then expressed by the same formulas as above, by replacing the cardinalities of crisp sets by cardinalities of fuzzy sets. Thus, if one calls ⊤ (t-norm) the fuzzy "and" operator, with the fuzzy complement µĀ(x) = 1 − µA(x), one can write:

nA = Card(A) = Σx∈E µA(x),    (6)
nĀ = Card(Ā) = Σx∈E µĀ(x) = Σx∈E (1 − µA(x)),    (7)
nB̄ = Card(B̄) = Σx∈E µB̄(x) = Σx∈E (1 − µB(x)),    (8)
nA∩B = Card(A∩B) = Σx∈E µA∩B(x) = Σx∈E ⊤(µA(x), µB(x)),    (9)
nA∩B̄ = Card(A∩B̄) = Σx∈E µA∩B̄(x) = Σx∈E ⊤(µA(x), 1 − µB(x)).    (10)
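A sketch of formulas (6)-(10), with the probabilistic t-norm (product) chosen only for the example; from these fuzzy cardinalities, the fuzzy support and confidence follow the crisp formulas. The membership values are placeholders.

import numpy as np

# membership degrees of propositions a and b over the items of E
mu_a = np.array([0.9, 0.7, 0.2, 1.0, 0.0])
mu_b = np.array([0.8, 0.9, 0.1, 0.6, 0.3])

t_norm = lambda u, v: u * v          # probabilistic "and"; any t-norm can be used

n_E = len(mu_a)
n_A = mu_a.sum()                              # (6)
n_A_bar = (1.0 - mu_a).sum()                  # (7)
n_B_bar = (1.0 - mu_b).sum()                  # (8)
n_AB = t_norm(mu_a, mu_b).sum()               # (9)
n_ABbar = t_norm(mu_a, 1.0 - mu_b).sum()      # (10), the fuzzy counterexamples

support = n_AB / n_E                          # fuzzy analogue of (2)
confidence = n_AB / n_A
print(f"support={support:.3f}  confidence={confidence:.3f}  counterexamples={n_ABbar:.3f}")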
This algorithm builds all the rules that can be constructed from a set of propositions, computes their confidence, their support and their intensity of implication, and keeps the rules for which these indexes are above three respective thresholds. To limit the number of rules studied and the exploration depth, we also bound the number of propositions in the premises of a rule. We thus use 4 thresholds α, β, γ, δ: a rule is kept if its confidence is greater than α, its support greater than β, its intensity of implication greater than γ, and its premises have at most δ propositions. The thresholds α, β, γ, δ are chosen by the users according to the number of rules that they wish to obtain.
Rules are structured in a tree; the root of this tree (level 0) is the "rule with the empty premises", level 1 has rules using only 1 proposition in their premises, . . . , level i has rules using i propositions in their premises, and so on. The algorithm uses a depth-first strategy; the search is not carried deeper when the current rule does not reach the minimal support β, or when the size of the current rule is above δ.
Let us call
- c, the confidence of a rule, which must be greater than the threshold α;
- s, the support of a rule, which must be greater than the threshold β;
- i, the intensity of implication, which must be greater than the threshold γ;
- l, the length of the rule (the number of propositions in its premises), which must not exceed the threshold δ;
- E = {e1, e2, . . . , en}, the learning set;
- P = {p1, p2, . . . , pn}, the set of propositions describing the examples in E;
- C, the set of propositions associated with the conclusions;
- D = {a1, a2, . . . , am}, the set of attributes appearing in the possible propositions of the premises;
- Fdecision, the fuzzy partition associated with the attribute of the classifying decision;
- nFdecision, the cardinality of this partition;
- R, the set of rules produced.
Our algorithm, described below, uses two scanning procedures:
- "Forward" adds, when possible, a fuzzy proposition not yet used at this level to the premises of the rule;
- "Backward" removes the last fuzzy proposition of the premises (the rightmost one) and, if possible, replaces it by the following one. When this proposition has no following one (because it uses the last modality of the last attribute), the new rightmost proposition is removed and replaced, if possible, and so on. When there are no more propositions in the premises, the tree of premises has been completely explored and the algorithm ends.
For instance, let us consider three attributes {a, b, c}, each with three modalities {L = low, M = medium, H = high}; the rule tree will be explored by successively considering the premises of rules according to Table 1, and a sketch of this exploration is given below.
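The following generator is an illustrative sketch (not the author's code) of the depth-first Forward/Backward exploration on the small example of three attributes with three modalities. It assumes each attribute contributes at most one modality per premise, and the min_support_ok argument is a placeholder for the pruning by the support threshold β; the length bound plays the role of δ.

attributes = ['a', 'b', 'c']
modalities = ['L', 'M', 'H']
propositions = [(att, mod) for att in attributes for mod in modalities]

def explore(prefix=(), next_index=0, delta=3, min_support_ok=lambda premise: True):
    """Depth-first enumeration of premises: each new proposition comes from an
    attribute to the right of the previous one (Forward); exhausted branches
    are abandoned and the rightmost proposition advanced (Backward).
    A branch is not deepened past delta propositions or a failed support test."""
    for i in range(next_index, len(propositions)):
        premise = prefix + (propositions[i],)
        if not min_support_ok(premise):
            continue                      # prune this branch
        yield premise
        if len(premise) < delta:
            # only propositions of later attributes can still be added
            j = (i // len(modalities) + 1) * len(modalities)
            yield from explore(premise, j, delta, min_support_ok)

for premise in list(explore())[:8]:       # first premises of the exploration order
    print(' and '.join(f"{att} is {mod}" for att, mod in premise))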
With the indexes we have chosen, only a basic fuzzy conjunction ("and" operator) is needed, but if the extracted rules are to be processed by a knowledge-based system, a fuzzy implication is also necessary, often together with a fuzzy disjunction ("or") and a fuzzy complement ("not"). Fuzzy logics offer a great choice of logical operators [21]. Let us summarize the main fuzzy operators.
When there is no ambiguity, we will simplify our notation µ(a) into a,
representing both the proposition a and its truth value.
Our algorithms also need an aggregation operator, which may be defined in numerous ways. However, since we want an averaging evaluation of the implication and since we need a mechanism allowing the exclusion of abnormal records, we have chosen the arithmetic mean, which allows the use of standard deviations:

Aggregation(µ1(x, y), . . . , µn(x, y)) = (1/n) Σi=1..n µi(x, y).    (23)
Y is B'.
For one implication µa⇒b(µa(x), µb(y)) = I(µa(x), µb(y)) and one t-norm ⊤, one can write:

µb'(y) = sup x∈A' { ⊤( µa'(x), I(µa(x), µb(y)) ) }    (24)
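A sketch of the generalized modus ponens of equation (24) on discretized universes. Gödel's t-norm (minimum) and the Gödel-Brouwer implication are used here only as one possible pair of operators; the membership vectors are placeholders.

import numpy as np

def godel_implication(u, v):
    """Goedel-Brouwer implication: I(u, v) = 1 if u <= v, else v."""
    return np.where(u <= v, 1.0, v)

def gmp(mu_a, mu_a_prime, mu_b, t_norm=np.minimum, implication=godel_implication):
    """Generalized modus ponens, equation (24):
    mu_b'(y) = sup_x T( mu_a'(x), I(mu_a(x), mu_b(y)) ).
    mu_a and mu_a_prime are sampled on the same x-universe, mu_b on the y-universe."""
    I = implication(mu_a[:, None], mu_b[None, :])   # broadcast x over rows, y over columns
    return t_norm(mu_a_prime[:, None], I).max(axis=0)

mu_a = np.array([0.0, 0.4, 1.0, 0.4, 0.0])          # "X is A"
mu_a_prime = np.array([0.0, 0.2, 0.8, 1.0, 0.3])    # observed fact "X is A'"
mu_b = np.array([0.0, 0.5, 1.0, 0.5])               # "Y is B"
print(gmp(mu_a, mu_a_prime, mu_b))                  # inferred "Y is B'"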
First Benchmark
In a first benchmark, the items that do not belong to any class, and which may be considered as noisy data, make up 5% of all data. We have studied the GMP-pertinences of the 4 possible rules associating one fuzzy modality of the first attribute (size) with one modality of the second attribute (shoe size). Figure 4 describes the data set, in which the points outside the ellipses represent noisy data.
Second Benchmark
In a second benchmark (Fig. 5) we have increased the rate of noisy data, which now accounts for 37% of all data.
The confidence levels calculated with different t-norms are rather close: 0.726 for the bold intersection, 0.700 for the probabilistic intersection and 0.688 for Zadeh's. So this rule is interesting, with a conditional probability of about 70%.
We have tried out our algorithms on several databases found in the UCI repository [19], in particular the "Wisconsin Breast Cancer Database", which consists of 699 items with 10 attributes and two classes, the "Wine Recognition Database" with 178 items, 13 attributes and three classes, and the "Ionosphere Database" with 351 items, 39 attributes and 2 classes. The results are similar to those highlighted by our previous example, but the differences between the GMP-pertinences of the operators are less marked than in our second benchmark, for which the proportion of noisy data had been deliberately strengthened. Let us consider a few examples extracted from our results.
For the "Wisconsin Breast Cancer Database", with 3 fuzzy partitions on each attribute, a minimal confidence of 0.8, a minimal intensity of implication of 0.9, a support of 5% and at most 3 propositions in the premises, we get 331 rules; if one pushes the search to 6 propositions, we obtain 814 rules. Going to 9 propositions brings few supplementary rules, with a total of 870.
Considering the evolution of the number of rules according to the maximum number of premises and the number of classes (Table 7.3), one remarks that the number of rules decreases when the number of classes increases, up to 8 classes.
The profusion of rules with small numbers of classes is offset by the imprecision of the rules: the average confidence of the rules with 2 classes is much weaker than that obtained with more classes. With 2 classes and at most 6 premises, only 35 rules (7%) have a confidence of 1, while with 9 classes and 6 premises, 388 rules (57%) have a confidence of 1. Increasing the number of premises beyond 6 brings little improvement, because adding attributes to rules only specializes the rules whose confidence is under 1.
For example, the rule If Clump Thickness is "very small" then Class is "benign" appears with a confidence of 0.964, a support of 28% and a GMP-pertinence of 91.1%, 90.3% or 89.6% depending on the fuzzy operators. Specializing this rule by adding supplementary attributes increases the confidence while reducing the support, until it reaches a confidence of 1; the rule is then If Clump Thickness is "very small" and Single Epithelial Cell Size is "very small" and Bare Nuclei is "very little" then Class is "benign". Its support is then 26%, with a GMP-pertinence of 1.
Results on the other databases are similar but, due to the higher number of attributes, the numbers of generated rules are much larger. Using the same thresholds and 3 classes per attribute, one obtains for the "Wine Data Base" 1092 rules of at most 3 premises and 13470 of at most 6 premises. With the "Ionosphere Data Base" one extracts 13824 rules with at most 3 premises. A more severe choice of thresholds is then needed to reduce these high numbers of rules.
Let us consider another example, from the results on the "Wine Data Base". The rule If Magnesium is "very little" then Wine is "type2" appears with a confidence of 0.894; when specializing it by adding one attribute, we get 3 new rules with a confidence of 1, such as the rule If Magnesium is "very little" and Flavanoids is "little" then Wine is "type2". Adding one more attribute gives 9 more rules with a full confidence of 1. Comparisons between the qualities of the operators using GMP-pertinences confirm again our conclusions on the choice of operators, the association of Lukasiewicz's implication with Lukasiewicz's bounded sum and difference appearing slightly better.
Proof
A) With Zadeh's ⊤Z(a, b) = min(a, b) and ⊥Z(a, b) = max(a, b), one may take into account the fact that fuzzy implications are monotonically decreasing with their first argument: the truth-value of any fuzzy implication must increase as the truth of its antecedent decreases. So, for any fuzzy implication I,
- when α ≤ β:
I(α, γ) ≥ I(β, γ), so ⊤Z(I(α, γ), I(β, γ)) = min(I(α, γ), I(β, γ)) = I(β, γ), while I(⊥Z(α, β), γ) = I(max(α, β), γ) = I(β, γ).
Therefore, in this case, ⊤Z(I(α, γ), I(β, γ)) = I(β, γ) = I(⊥Z(α, β), γ), and by symmetry on α and β this result is always true:
⊤Z(I(α, γ), I(β, γ)) ≡ I(⊥Z(α, β), γ).
Proof
A) With Zadeh's ⊤Z(a, b) = min(a, b) and ⊥Z(a, b) = max(a, b), one may take into account the fact that fuzzy implications are monotonically increasing with their second argument: the truth-value of any fuzzy implication must increase when the truth of its conclusion increases. So, for any fuzzy implication I,
- when β ≤ γ:
I(α, β) ≤ I(α, γ), so ⊤Z(I(α, β), I(α, γ)) = min(I(α, β), I(α, γ)) = I(α, β), while I(α, ⊤Z(β, γ)) = I(α, min(β, γ)) = I(α, β).
Therefore, in this case, ⊤Z(I(α, β), I(α, γ)) = I(α, β) = I(α, ⊤Z(β, γ)), and by symmetry on β and γ this result is always true:
⊤Z(I(α, β), I(α, γ)) ≡ I(α, ⊤Z(β, γ)).
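The two identities proved above can also be checked numerically for a chosen implication. The small sketch below uses the Gödel-Brouwer implication as an example; any fuzzy implication that is decreasing in its first argument and increasing in its second should pass the same check.

import numpy as np

def I(u, v):
    """Goedel-Brouwer implication: 1 if u <= v, else v."""
    return np.where(u <= v, 1.0, v)

t_z = np.minimum      # Zadeh's t-norm
s_z = np.maximum      # Zadeh's t-conorm

grid = np.linspace(0.0, 1.0, 21)
alpha, beta, gamma = np.meshgrid(grid, grid, grid, indexing='ij')

# T_Z(I(alpha, gamma), I(beta, gamma)) == I(S_Z(alpha, beta), gamma)
lhs1 = t_z(I(alpha, gamma), I(beta, gamma))
rhs1 = I(s_z(alpha, beta), gamma)

# T_Z(I(alpha, beta), I(alpha, gamma)) == I(alpha, T_Z(beta, gamma))
lhs2 = t_z(I(alpha, beta), I(alpha, gamma))
rhs2 = I(alpha, t_z(beta, gamma))

print(np.allclose(lhs1, rhs1), np.allclose(lhs2, rhs2))   # expected: True True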
One may remark here that among the sets of fuzzy operators that appear as the best with the GMP, we have found Gödel's t-norm (Zadeh's minimum) and the Gödel-Brouwer implication. Therefore, this set of fuzzy operators may be considered as the most interesting when one wants both to extract rules for a knowledge-based system and to reduce the extracted rules within the same application. These results also illustrate that fuzzy rules cannot generally be treated as classical rules.
10 Conclusion
We have described a generalization of statistical implication indexes to fuzzy knowledge discovery. The first operation needed to compute these indexes is the choice of fuzzy partitions to convert numerical or symbolic attributes into fuzzy ones. We have justified our choice of three classical statistical indexes: the support, the confidence and the less common, but powerful, intensity of implication. We have then explained how we have adapted these indexes to
References
1. R. Agrawal, T. Imielinski, and A. N. Swami. Mining association rules between
sets of items in large databases. In Peter Buneman and Sushil Jajodia, editors,
Proceedings of the 1993 ACM SIGMOD International Conference on Manage-
ment of Data, pages 207–216, Washington, D.C., 1993.
2. J. Aguilar-Martin and R. Lopez De Mantaras. The process of classification and
learning the meaning of linguistic descriptors of concepts. In M. M. Gupta and
E. Sanchez, editors, Approximate reasoning in decision analysis, pages 165–175.
North Holland, 1982.
3. M. Bernadet. Basis of a fuzzy knowledge discovery system. In Conf. PKDD’2000
- LNAI 1910, pages 24–33. Springer-Verlag, 2000.
25. J. Rives. FID3: Fuzzy induction decision tree. In ISUMA’90, pages 457–462,
December 1990.
26. J. A. Roubos, M. Setnes, and J. Abonyi. Learning fuzzy classification rules from
data. In Developments in Soft Computing, pages 108–115. Springer-Verlag, 2001.
27. F. Spagnolo and R. Gras. A new approach in Zadeh classification: fuzzy im-
plication through statistic implication. In NAFIPS-IEEE 3rd Conference of
the North American Fuzzy Information Processing Society, pages 425–429, June
2004.
28. R. Weber. Fuzzy-ID3: a class of methods for automatic knowledge acquisition.
In 2nd International Conference on Fuzzy Logic and Neural Networks, pages
265–268, July 1992.
29. M. Wygralak. Questions of cardinality of finite fuzzy sets. Fuzzy Sets and
Systems, 102:185–210, 1999.
30. L. A. Zadeh. Probability measures of fuzzy events. Journal of Mathematical
Analysis and Applications, 23:421–427, 1968.
31. L. A. Zadeh. Fuzzy logic and its application to approximate reasoning. Infor-
mation Processing, 74:591–594, 1974.
32. J. Zeidler and M. Schlosser. Continuous valued attributes in fuzzy decision trees.
In Conf. IPMU’96 (Information Processing and Management of Uncertainty in
Knowledge-Based Systems), pages 395–400, 1996.
33. D. A. Zighed, S. Rabaseda, R. Rakotomalala, and F. Feschet. Discretization
methods in supervised learning. In Encyclopedia of Computer Science and Tech-
nology, volume 40, pages 35–50. Marcel Dekker, 1999.
About the editors
1 http://www.polytech.univ-nantes.fr/associationEGC
Index
H
He,θ(X), 437
Haberman's adjusted residual, 404
HAMB, 207
Hical, 231
hierarchical classification, 26
hierarchical clustering, 147
hierarchical similarity diagram, 138
hierarchy tree, 48
High ranking, 206
Historic-epistemological Representations, 248
Hypergeometric, 14

ILE, see Interactive Learning Environments
IM, see Interestingness Measures
implication intensity, 13, 16

KDD, see Knowledge Discovery in Databases
Knowledge Discovery in Databases, 227, 463

latent variables, 136
leaf, 400
Likelihood Linkage Analysis, 14, 239
LLA, see Likelihood Linkage Analysis
Loevinger's coefficient, 16
logic implication, 397
Logic of Bayesian inference, 175
logical rule, 14
low ranking, 206
Lukasiewicz's bounded, 492

Main Components Analysis, 102
implicative distance, 31
masculine form, 127
material context, 100