KNOWLEDGE ENGINEERING

Building Cognitive Assistants for Evidence-Based Reasoning

This book presents a significant advancement in the theory and practice
of knowledge engineering, the discipline concerned with the development
of intelligent agents that use knowledge and reasoning to perform problem-
solving and decision-making tasks. It covers the main stages in the develop-
ment of a knowledge-based agent: understanding the application domain,
modeling problem solving in that domain, developing the ontology, learning
the reasoning rules, and testing the agent. The book focuses on a special
class of agents: cognitive assistants for evidence-based reasoning that learn
complex problem-solving expertise directly from human experts, support
experts and nonexperts in problem solving and decision making, and teach
their problem-solving expertise to students.
A powerful learning agent shell, Disciple-EBR, is included with the book,
enabling students, practitioners, and researchers to develop cognitive
assistants rapidly in a wide variety of domains that require evidence-based
reasoning, including intelligence analysis, cybersecurity, law, forensics,
medicine, and education.

Gheorghe Tecuci (PhD, University of Paris-South, July 1988, and Polytechnic
Institute of Bucharest, December 1988) is Professor of Computer
Science and Director of the Learning Agents Center in the Volgenau School
of Engineering of George Mason University, Member of the Romanian
Academy, and former Chair of Artificial Intelligence in the Center for
Strategic Leadership of the U.S. Army War College.

Dorin Marcu (PhD, George Mason University, 2009) is Research Assistant
Professor, as well as Senior Software and Knowledge Engineer, in the
Learning Agents Center, Volgenau School of Engineering, George Mason
University.

Mihai Boicu (PhD, George Mason University, 2003) is Associate Professor
of Information Sciences and Technology, and Associate Director of the
Learning Agents Center, Volgenau School of Engineering, George Mason
University.

David A. Schum (PhD, Ohio State University, 1964) is Emeritus Professor
of Systems Engineering, Operations Research, and Law, as well as Chief
Scientist of the Learning Agents Center at George Mason University. He is
also Honorary Professor of Evidence Science at University College London.
Knowledge Engineering
Building Cognitive Assistants for
Evidence-Based Reasoning

GHEORGHE TECUCI
George Mason University

DORIN MARCU
George Mason University

MIHAI BOICU
George Mason University

DAVID A. SCHUM
George Mason University
One Liberty Plaza, New York, NY 10006

Cambridge University Press is part of the University of Cambridge.

It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781107122567
© Gheorghe Tecuci, Dorin Marcu, Mihai Boicu, and David A. Schum 2016

This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.

First published 2016


Printed in the United States of America by Sheridan Books, Inc.

A catalog record for this publication is available from the British Library.
Library of Congress Cataloging-in-Publication Data
Names: Tecuci, Gheorghe, author. | Marcu, Dorin, author. | Boicu, Mihai, author. |
Schum, David A., author.
Title: Knowledge engineering: building cognitive assistants for evidence-based reasoning /
Gheorghe Tecuci, George Mason University, Dorin Marcu, George Mason University,
Mihai Boicu, George Mason University, David A. Schum, George Mason University.
Description: New York NY : Cambridge University Press, 2016. | Includes bibliographical references and index.
Identifiers: LCCN 2015042941 | ISBN 9781107122567 (Hardback : alk. paper)
Subjects: LCSH: Expert systems (Computer science) | Intelligent agents (Computer software) | Machine learning |
Artificial intelligence | Knowledge, Theory of–Data processing.
Classification: LCC QA76.76.E95 T435 2016 | DDC 006.3/3–dc23 LC record available at
http://lccn.loc.gov/2015042941

ISBN 978-1-107-12256-7 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy
of URLs for external or third-party Internet Web sites referred to in this publication
and does not guarantee that any content on such Web sites is, or will remain,
accurate or appropriate.
Contents

Preface page xv
Acknowledgments xxi
About the Authors xxiii

1 Introduction . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Understanding the World through Evidence-based
Reasoning 1
1.1.1 What Is Evidence? 1
1.1.2 Evidence, Data, and Information 1
1.1.3 Evidence and Fact 2
1.1.4 Evidence and Knowledge 2
1.1.5 Ubiquity of Evidence 5
1.2 Abductive Reasoning 5
1.2.1 From Aristotle to Peirce 5
1.2.2 Peirce and Sherlock Holmes on Abductive
Reasoning 6
1.3 Probabilistic Reasoning 9
1.3.1 Enumerative Probabilities: Obtained by Counting 9
1.3.1.1 Aleatory Probability 9
1.3.1.2 Relative Frequency and Statistics 9
1.3.2 Subjective Bayesian View of Probability 11
1.3.3 Belief Functions 13
1.3.4 Baconian Probability 16
1.3.4.1 Variative and Eliminative Inferences 16
1.3.4.2 Importance of Evidential Completeness 17
1.3.4.3 Baconian Probability of Boolean
Expressions 20
1.3.5 Fuzzy Probability 20
1.3.5.1 Fuzzy Force of Evidence 20
1.3.5.2 Fuzzy Probability of Boolean Expressions 21
1.3.5.3 On Verbal Assessments of Probabilities 22
1.3.6 A Summary of Uncertainty Methods and What
They Best Capture 23
1.4 Evidence-based Reasoning 25
1.4.1 Deduction, Induction, and Abduction 25
1.4.2 The Search for Knowledge 26
1.4.3 Evidence-based Reasoning Everywhere 27


1.5 Artificial Intelligence 29


1.5.1 Intelligent Agents 30
1.5.2 Mixed-Initiative Reasoning 32
1.6 Knowledge Engineering 33
1.6.1 From Expert Systems to Knowledge-based Agents
and Cognitive Assistants 33
1.6.2 An Ontology of Problem-Solving Tasks 35
1.6.2.1 Analytic Tasks 36
1.6.2.2 Synthetic Tasks 36
1.6.3 Building Knowledge-based Agents 37
1.6.3.1 How Knowledge-based Agents Are Built
and Why It Is Hard 37
1.6.3.2 Teaching as an Alternative to
Programming: Disciple Agents 39
1.6.3.3 Disciple-EBR, Disciple-CD, and
TIACRITIS 40
1.7 Obtaining Disciple-EBR 41
1.8 Review Questions 42

2 Evidence-based Reasoning: Connecting the Dots . . . . 46


2.1 How Easy Is It to Connect the Dots? 46
2.1.1 How Many Kinds of Dots Are There? 47
2.1.2 Which Evidential Dots Can Be Believed? 48
2.1.3 Which Evidential Dots Should Be Considered? 50
2.1.4 Which Evidential Dots Should We Try to
Connect? 50
2.1.5 How to Connect Evidential Dots to Hypotheses? 52
2.1.6 What Do Our Dot Connections Mean? 54
2.2 Sample Evidence-based Reasoning Task: Intelligence
Analysis 56
2.2.1 Evidence in Search of Hypotheses 56
2.2.2 Hypotheses in Search of Evidence 58
2.2.3 Evidentiary Testing of Hypotheses 60
2.2.4 Completing the Analysis 62
2.3 Other Evidence-based Reasoning Tasks 64
2.3.1 Cyber Insider Threat Discovery and Analysis 64
2.3.2 Analysis of Wide-Area Motion Imagery 68
2.3.3 Inquiry-based Teaching and Learning in a Science
Classroom 70
2.3.3.1 Need for Inquiry-based Teaching and
Learning 70
2.3.3.2 Illustration of Inquiry-based Teaching
and Learning 71
2.3.3.3 Other Examples of Inquiry-based
Teaching and Learning 74
2.4 Hands On: Browsing an Argumentation 76
2.5 Project Assignment 1 81
2.6 Review Questions 81


3 Methodologies and Tools for Agent Design and Development . . . . . . . . 83
3.1 A Conventional Design and Development Scenario 83
3.1.1 Conventional Design and Development Phases 83
3.1.2 Requirements Specification and Domain
Understanding 83
3.1.3 Ontology Design and Development 85
3.1.4 Development of the Problem-Solving Rules or
Methods 86
3.1.5 Verification, Validation, and Certification 87
3.2 Development Tools and Reusable Ontologies 88
3.2.1 Expert System Shells 88
3.2.2 Foundational and Utility Ontologies and Their
Reuse 89
3.2.3 Learning Agent Shells 90
3.2.4 Learning Agent Shell for Evidence-based
Reasoning 91
3.3 Agent Design and Development Using Learning
Technology 93
3.3.1 Requirements Specification and Domain
Understanding 93
3.3.2 Rapid Prototyping 93
3.3.3 Ontology Design and Development 100
3.3.4 Rule Learning and Ontology Refinement 101
3.3.5 Hierarchical Organization of the Knowledge
Repository 104
3.3.6 Learning-based Design and Development Phases 105
3.4 Hands On: Loading, Saving, and Closing Knowledge
Bases 107
3.5 Knowledge Base Guidelines 111
3.6 Project Assignment 2 111
3.7 Review Questions 112

4 Modeling the Problem-Solving Process . . . . . . . . . 113


4.1 Problem Solving through Analysis and Synthesis 113
4.2 Inquiry-driven Analysis and Synthesis 113
4.3 Inquiry-driven Analysis and Synthesis for Evidence-based
Reasoning 119
4.3.1 Hypothesis Reduction and Assessment Synthesis 119
4.3.2 Necessary and Sufficient Conditions 120
4.3.3 Sufficient Conditions and Scenarios 120
4.3.4 Indicators 121
4.4 Evidence-based Assessment 122
4.5 Hands On: Was the Cesium Stolen? 124
4.6 Hands On: Hypothesis Analysis and Evidence Search
and Representation 130
4.7 Believability Assessment 133
4.7.1 Tangible Evidence 133
4.7.2 Testimonial Evidence 135


4.7.3 Missing Evidence 137


4.7.4 Authoritative Record 137
4.7.5 Mixed Evidence and Chains of Custody 138
4.8 Hands On: Believability Analysis 140
4.9 Drill-Down Analysis, Assumption-based Reasoning,
and What-If Scenarios 143
4.10 Hands On: Modeling, Formalization, and Pattern
Learning 144
4.11 Hands On: Analysis Based on Learned Patterns 146
4.12 Modeling Guidelines 147
4.13 Project Assignment 3 151
4.14 Review Questions 152

5 Ontologies . . . . . . . . . . . . . . . . . . . . . . 155
5.1 What Is an Ontology? 155
5.2 Concepts and Instances 156
5.3 Generalization Hierarchies 157
5.4 Object Features 158
5.5 Defining Features 158
5.6 Representation of N-ary Features 160
5.7 Transitivity 161
5.8 Inheritance 162
5.8.1 Default Inheritance 162
5.8.2 Multiple Inheritance 162
5.9 Concepts as Feature Values 163
5.10 Ontology Matching 164
5.11 Hands On: Browsing an Ontology 165
5.12 Project Assignment 4 168
5.13 Review Questions 168

6 Ontology Design and Development . . . . . . . . . . 174


6.1 Design and Development Methodology 174
6.2 Steps in Ontology Development 174
6.3 Domain Understanding and Concept Elicitation 176
6.3.1 Tutorial Session Delivered by the Expert 177
6.3.2 Ad-hoc List Created by the Expert 177
6.3.3 Book Index 177
6.3.4 Unstructured Interviews with the Expert 177
6.3.5 Structured Interviews with the Expert 177
6.3.6 Protocol Analysis (Think-Aloud Technique) 178
6.3.7 The Card-Sort Method 179
6.4 Modeling-based Ontology Specification 179
6.5 Hands On: Developing a Hierarchy of Concepts and
Instances 180
6.6 Guidelines for Developing Generalization Hierarchies 186
6.6.1 Well-Structured Hierarchies 186
6.6.2 Instance or Concept? 187
6.6.3 Specific Instance or Generic Instance? 188
6.6.4 Naming Conventions 188
6.6.5 Automatic Support 189


6.7 Hands On: Developing a Hierarchy of Features 189


6.8 Hands On: Defining Instances and Their Features 192
6.9 Guidelines for Defining Features and Values 195
6.9.1 Concept or Feature? 195
6.9.2 Concept, Instance, or Constant? 196
6.9.3 Naming of Features 196
6.9.4 Automatic Support 197
6.10 Ontology Maintenance 197
6.11 Project Assignment 5 198
6.12 Review Questions 198

7 Reasoning with Ontologies and Rules . . . . . . . . . 202


7.1 Production System Architecture 202
7.2 Complex Ontology-based Concepts 203
7.3 Reduction and Synthesis Rules and the Inference Engine 204
7.4 Reduction and Synthesis Rules for Evidence-based
Hypotheses Analysis 206
7.5 Rule and Ontology Matching 207
7.6 Partially Learned Knowledge 212
7.6.1 Partially Learned Concepts 212
7.6.2 Partially Learned Features 213
7.6.3 Partially Learned Hypotheses 214
7.6.4 Partially Learned Rules 214
7.7 Reasoning with Partially Learned Knowledge 215
7.8 Review Questions 216

8 Learning for Knowledge-based Agents . . . . . . . . . 222


8.1 Introduction to Machine Learning 222
8.1.1 What Is Learning? 222
8.1.2 Inductive Learning from Examples 223
8.1.3 Explanation-based Learning 224
8.1.4 Learning by Analogy 225
8.1.5 Multistrategy Learning 226
8.2 Concepts 227
8.2.1 Concepts, Examples, and Exceptions 227
8.2.2 Examples and Exceptions of a Partially Learned
Concept 228
8.3 Generalization and Specialization Rules 229
8.3.1 Turning Constants into Variables 230
8.3.2 Turning Occurrences of a Variable into Different
Variables 230
8.3.3 Climbing the Generalization Hierarchies 231
8.3.4 Dropping Conditions 231
8.3.5 Extending Intervals 231
8.3.6 Extending Ordered Sets of Intervals 232
8.3.7 Extending Symbolic Probabilities 232
8.3.8 Extending Discrete Sets 232
8.3.9 Using Feature Definitions 233
8.3.10 Using Inference Rules 233


8.4 Types of Generalizations and Specializations 234


8.4.1 Definition of Generalization 234
8.4.2 Minimal Generalization 234
8.4.3 Minimal Specialization 235
8.4.4 Generalization of Two Concepts 236
8.4.5 Minimal Generalization of Two Concepts 236
8.4.6 Specialization of Two Concepts 237
8.4.7 Minimal Specialization of Two Concepts 237
8.5 Inductive Concept Learning from Examples 238
8.6 Learning with an Incomplete Representation Language 242
8.7 Formal Definition of Generalization 243
8.7.1 Formal Representation Language for Concepts 243
8.7.2 Term Generalization 245
8.7.3 Clause Generalization 245
8.7.4 BRU Generalization 246
8.7.5 Generalization of Concepts with Negations 247
8.7.6 Substitutions and the Generalization Rules 247
8.8 Review Questions 247

9 Rule Learning . . . . . . . . . . . . . . . . . . . . 252


9.1 Modeling, Learning, and Problem Solving 252
9.2 An Illustration of Rule Learning and Refinement 253
9.3 The Rule-Learning Problem 257
9.4 Overview of the Rule-Learning Method 258
9.5 Mixed-Initiative Example Understanding 260
9.5.1 What Is an Explanation of an Example? 260
9.5.2 Explanation Generation 262
9.6 Example Reformulation 264
9.7 Analogy-based Generalization 265
9.7.1 Analogical Problem Solving Based on
Explanation Similarity 265
9.7.2 Upper Bound Condition as a Maximally General
Analogy Criterion 266
9.7.3 Lower Bound Condition as a Minimally General
Analogy Criterion 268
9.8 Rule Generation and Analysis 270
9.9 Generalized Examples 270
9.10 Hypothesis Learning 271
9.11 Hands On: Rule and Hypotheses Learning 275
9.12 Explanation Generation Operations 279
9.12.1 Guiding Explanation Generation 279
9.12.2 Fixing Values 280
9.12.3 Explanations with Functions 280
9.12.4 Explanations with Comparisons 283
9.12.5 Hands On: Explanations with Functions and
Comparisons 285
9.13 Guidelines for Rule and Hypothesis Learning 285
9.14 Project Assignment 6 289
9.15 Review Questions 289


10 Rule Refinement . . . . . . . . . . . . . . . . . . . 294


10.1 Incremental Rule Refinement 294
10.1.1 The Rule Refinement Problem 294
10.1.2 Overview of the Rule Refinement Method 295
10.1.3 Rule Refinement with Positive Examples 296
10.1.3.1 Illustration of Rule Refinement with a
Positive Example 296
10.1.3.2 The Method of Rule Refinement with
a Positive Example 298
10.1.3.3 Summary of Rule Refinement with a
Positive Example 300
10.1.4 Rule Refinement with Negative Examples 300
10.1.4.1 Illustration of Rule Refinement with
Except-When Conditions 300
10.1.4.2 The Method of Rule Refinement with
Except-When Conditions 305
10.1.4.3 Illustration of Rule Refinement
through Condition Specialization 305
10.1.4.4 The Method of Rule Refinement
through Condition Specialization 307
10.1.4.5 Summary of Rule Refinement with a
Negative Example 308
10.2 Learning with an Evolving Ontology 309
10.2.1 The Rule Regeneration Problem 309
10.2.2 On-Demand Rule Regeneration 310
10.2.3 Illustration of the Rule Regeneration Method 312
10.2.4 The Rule Regeneration Method 316
10.3 Hypothesis Refinement 316
10.4 Characterization of Rule Learning and Refinement 317
10.5 Hands On: Rule Refinement 319
10.6 Guidelines for Rule Refinement 321
10.7 Project Assignment 7 322
10.8 Review Questions 322

11 Abstraction of Reasoning . . . . . . . . . . . . . . . 329


11.1 Statement Abstraction 329
11.2 Reasoning Tree Abstraction 331
11.3 Reasoning Tree Browsing 331
11.4 Hands On: Abstraction of Reasoning 331
11.5 Abstraction Guideline 334
11.6 Project Assignment 8 335
11.7 Review Questions 335

12 Disciple Agents . . . . . . . . . . . . . . . . . . . . 338


12.1 Introduction 338
12.2 Disciple-WA: Military Engineering Planning 338
12.2.1 The Workaround Planning Problem 338
12.2.2 Modeling the Workaround Planning Process 341
12.2.3 Ontology Design and Development 343


12.2.4 Rule Learning 345


12.2.5 Experimental Results 346
12.3 Disciple-COA: Course of Action Critiquing 348
12.3.1 The Course of Action Critiquing Problem 348
12.3.2 Modeling the COA Critiquing Process 351
12.3.3 Ontology Design and Development 352
12.3.4 Training the Disciple-COA Agent 355
12.3.5 Experimental Results 360
12.4 Disciple-COG: Center of Gravity Analysis 364
12.4.1 The Center of Gravity Analysis Problem 364
12.4.2 Overview of the Use of Disciple-COG 367
12.4.3 Ontology Design and Development 376
12.4.4 Script Development for Scenario Elicitation 376
12.4.5 Agent Teaching and Learning 380
12.4.6 Experimental Results 383
12.5 Disciple-VPT: Multi-Agent Collaborative Planning 387
12.5.1 Introduction 387
12.5.2 The Architecture of Disciple-VPT 388
12.5.3 The Emergency Response Planning Problem 389
12.5.4 The Disciple-VE Learning Agent Shell 390
12.5.5 Hierarchical Task Network Planning 394
12.5.6 Guidelines for HTN Planning 396
12.5.7 Integration of Planning and Inference 400
12.5.8 Teaching Disciple-VE to Perform Inference Tasks 403
12.5.9 Teaching Disciple-VE to Perform Planning Tasks 409
12.5.9.1 Why Learning Planning Rules Is
Difficult 409
12.5.9.2 Learning a Set of Correlated Planning
Rules 409
12.5.9.3 The Learning Problem and Method for
a Set of Correlated Planning Rules 413
12.5.9.4 Learning Correlated Planning Task
Reduction Rules 413
12.5.9.5 Learning Correlated Planning Task
Concretion Rules 414
12.5.9.6 Learning a Correlated Action
Concretion Rule 415
12.5.10 The Virtual Experts Library 416
12.5.11 Multidomain Collaborative Planning 420
12.5.12 Basic Virtual Planning Experts 421
12.5.13 Evaluation of Disciple-VPT 422
12.5.14 Final Remarks 422

13 Design Principles for Cognitive Assistants . . . . . . . 426


13.1 Learning-based Knowledge Engineering 426
13.2 Problem-Solving Paradigm for User–Agent
Collaboration 427
13.3 Multi-Agent and Multidomain Problem Solving 427
13.4 Knowledge Base Structuring for Knowledge Reuse 427
13.5 Integrated Teaching and Learning 428


13.6 Multistrategy Learning 428


13.7 Knowledge Adaptation 429
13.8 Mixed-Initiative Modeling, Learning, and Problem
Solving 429
13.9 Plausible Reasoning with Partially Learned Knowledge 430
13.10 User Tutoring in Problem Solving 430
13.11 Agent Architecture for Rapid Agent Development 430
13.12 Design Based on a Complete Agent Life Cycle 431

References 433
Appendixes 443
Summary: Knowledge Engineering Guidelines 443
Summary: Operations with Disciple-EBR 444
Summary: Hands-On Exercises 446
Index 447

Preface

BOOK PURPOSE

This is a book on knowledge engineering, the discipline concerned with the
development of intelligent agents that use knowledge and reasoning to
perform problem-solving and decision-making tasks. The book covers the
theory and practice of the main stages in the development of a knowledge-
based agent: understanding the application domain, modeling problem
solving in that domain, developing the ontology, learning the reasoning
rules, and testing the agent. However, it does this by focusing on a special
class of agents: cognitive assistants that learn complex problem-solving
expertise directly from human experts, support experts and nonexperts in
problem solving and decision making, and teach their problem-solving
expertise to students. These are learning agents that are taught by their
users in ways that are similar to how a student, an apprentice, or a new
collaborator is taught, through problem-solving examples and explanations
and by supervising and correcting their behavior. Because such agents
learn to replicate the problem-solving behavior of their users, we have
called them Disciple agents.
This book presents a significant advancement in the theory and practice
of knowledge engineering, where many tasks are performed by a typical
computer user and a learning agent, with only limited support from a
knowledge engineer. To simplify further the development of the cognitive
assistants by typical users, we have focused on the development of cogni-
tive assistants for evidence-based reasoning. Evidence-based reasoning is
at the core of many problem-solving and decision-making tasks in a wide
variety of domains, including intelligence analysis, cybersecurity, law,
forensics, medicine, physics, chemistry, history, archaeology, education,
and many others. Nevertheless, the last part of the book presents Disciple
agents for applications that did not involve evidence-based reasoning.
Because knowledge engineering is a practical activity, it is best learned
by doing. Therefore, this book presents the theory and methodology of
developing cognitive assistants in conjunction with a practical tool,
Disciple-EBR, a learning agent shell for evidence-based reasoning (EBR).
Consequently, each chapter typically contains a theoretical part presenting
general concepts and methods, a methodological part with guidelines on
the application of the methods, and a practical part on the actual employ-
ment of these methods with Disciple-EBR. It also includes project assign-
ments and review questions.


This book addresses issues of interest to a large spectrum of readers
from academia, research, and industry. We have used drafts of this book in
our computer science courses on knowledge engineering, expert systems,
and knowledge-based agents, at both undergraduate and graduate levels,
because it covers the theory and practice of the main stages in the devel-
opment of knowledge-based agents. We have also used some parts of the
book in introductory courses on artificial intelligence, and other parts in the
courses on knowledge acquisition and machine learning. These are all
examples of courses where this book could be used.
Researchers in knowledge engineering will find in this book an inte-
grated approach that advances the theory and practice in the field. We
believe that further research and development of this approach will enable
typical computer users to develop their own cognitive assistants without
any knowledge engineering assistance. Thus, non–computer scientists will
no longer be only users of generic programs developed by others (such as
word processors or Internet browsers), as they are today, but also agent
developers themselves. They will be able to train their personal assistants to
help them with their increasingly complex tasks in the knowledge society,
which should have a significant beneficial impact on their work and life.
Practitioners that develop various types of knowledge-based systems
will find in this book a detailed, yet intuitive, presentation of an agent
development methodology and tool, as well as several case studies of
developing intelligent agents that illustrate different types of agents that
are relevant to a wide variety of application domains. In fact, a comple-
mentary book, Intelligence Analysis as Discovery of Evidence, Hypotheses,
and Arguments: Connecting the Dots (Tecuci et al., 2016), presents the
practical application of Disciple-CD, an agent developed with Disciple-EBR
for intelligence analysis problems.

BOOK CONTENTS

Here is a route or map we will follow in the learning venture you will have
with the assistance of Disciple-EBR. Chapter 1 is a general introduction to
the topics that form the basis of this book. It starts with the problem
of understanding the world through evidence-based reasoning. It then
presents abductive reasoning, five different conceptions of probability
(enumerative, subjective Bayesian, Belief Functions, Baconian, and Fuzzy),
and how deductive, abductive, and inductive (probabilistic) reasoning are
used in evidence-based reasoning. After that, it introduces artificial intelli-
gence and intelligent agents, and the challenges of developing such agents
through conventional knowledge engineering. Afterward, it introduces the
development of agents through teaching and learning, which is the
approach presented in this book.
Chapter 2 is an overview of evidence-based reasoning, which is a focus
of this book. It starts with a discussion of the elements that make evidence-
based reasoning an astonishingly complex task. It then introduces a sys-
tematic approach that integrates abduction, deduction, and induction to
solve a typical evidence-based reasoning task, using intelligence analysis as
an example. Finally, it shows the application of the same approach to other
evidence-based reasoning tasks in cybersecurity, geospatial intelligence,
and critical thinking education.
Chapter 3 is an overview of the main methodologies and tools for the
design and development of knowledge-based agents. It first presents the
conventional approach of developing such agents by a knowledge engineer
working with a subject matter expert. It then introduces different types of
agent shells, which are the typical tools for building agents, and discusses the
use of foundational and utility ontologies. The more advanced of these tools,
the learning agent shells, are at the basis of a new and more powerful
approach to agent design and development, an overview of which is pre-
sented in the second part of Chapter 3. This learning-based approach is
illustrated with the development of a cognitive assistant for assessing a
potential PhD advisor for a student, an example that is used in the following
chapters to present in detail each of the main agent development stages.
Chapter 4 presents the modeling of the problem-solving process
through analysis and synthesis, which is the most challenging task in the
development of a knowledge-based agent. This chapter starts with a more
general version of this process for any type of problems, which is used in
the Disciple agents presented in Chapter 12. After that, the chapter presents
an easier, customized version for evidence-based hypothesis analysis,
which is used in the chapters that follow. Chapter 4 also introduces an
ontology of evidence and presents the modeling of the believability assess-
ment for different types of evidence.
Chapters 5 and 6 present the representation of knowledge through
ontologies, as well as their design and development. They discuss the
representation of the objects from the agent’s application domain and their
relationships. These chapters also discuss the basic reasoning operations of
transitivity, inheritance, and matching. They cover several approaches to
concept elicitation, as well as a systematic approach to modeling-based
ontology specification. The chapters also address the complexity of ontol-
ogy maintenance.
Chapter 7 presents the agent’s reasoning based on ontologies and rules,
in the context of a production system architecture. It starts with defining
the representation of complex ontology-based concepts and the use of
these concepts in the reasoning rules. It defines the reduction and synthe-
ses rules in general, and the special case of these rules for evidence-based
reasoning. It then presents the process of problem solving through analysis
and synthesis, which is accomplished through rule and ontology matching.
This chapter also introduces the representation of and reasoning with
partially learned concepts, features, hypotheses, and rules.
Chapter 8 starts with a general introduction to machine learning and to
several learning strategies that are most relevant for knowledge engineer-
ing. It continues with the definition of the generalization and specializa-
tions of concepts through the use of inductive rules. After that, the chapter
defines the basic operations of minimal and maximal generalizations and
specialization of concepts, which are at the basis of rule learning and
refinement discussed in Chapters 9 and 10. The chapter ends with the
presentation of a formal definition of generalization.
Chapter 9 presents the mixed-initiative rule learning method that
enables an agent to learn a general rule from a specific example of a
reasoning step. It starts with an overview of the integration of modeling,
learning, and problem solving and an illustration of this process. The
chapter then introduces the rule learning problem and method. After that,
it presents in detail the phases of rule learning, such as the mixed-initiative
understanding of the example, and its analogy-based generalizations,
which result in a minimal and a maximal generalization forming the
plausible version space of the rule. This chapter also discusses the learning
of rules involving functions and relational operators.
The refinement of a partially learned rule is presented in Chapter 10.
After introducing the incremental rule refinement problem, this chapter
presents the refinement of the rule based on positive examples, and then
the refinement based on negative examples. This may result in a very
complex rule characterized by a main applicability condition and several
“except-when” conditions, which capture the reasoning of a subject matter
expert. There are various refinement strategies, depending on the position
of the examples with respect to the bounds of the rule’s conditions and also
on the possible explanations of these examples, but in all cases, they
involve simple interactions with the subject matter expert. Changes in the
ontology require the learned rules to be updated, the corresponding
method being presented and illustrated in the second part of this chapter.
The chapter ends with a characterization of rule learning and refinement,
which enable a non–computer scientist to teach an agent.
The chapters dedicated to the individual phases of agent development
end with Chapter 11, which discusses the abstractions of individual hypoth-
eses and of the reasoning tree to facilitate its browsing, understanding, and
further development by the end-user.
As previously indicated, this book focuses on the development of agents
for evidence-based reasoning tasks. However, the presented theory and
methodology are also applicable to other types of agents. These agents have
been developed with learning agent shells representing previous imple-
mentations of the Disciple theory and methodology. Four of these agents
are presented in Chapter 12. Disciple-WA is an agent that uses expert
knowledge from military engineering manuals to develop alternative plans
of actions that a military unit can use to work around (WA) damages to a
transportation infrastructure, such as a damaged, destroyed, or mined
bridge, road, or tunnel. The goal is to find the quickest way for the military
unit to bypass the encountered obstacle. There were several cases where
the Disciple-WA agent generated better solutions than those of the human
expert who evaluated the developed systems, as well as cases where the
agent generated new solutions that this expert did not consider.
Disciple-COA is an agent trained by a military expert to critique courses
of action (COA) with respect to the principles of war and the tenets of Army
operations. It receives as input the description of a course of action and
returns a list of strengths and weaknesses of various levels, together with
their justifications. A remarkable feature of this agent is that it was judged
to exceed the performance of the subject matter experts that defined the
problems for its evaluation.
Disciple-COG is an agent that was trained to identify and assess the
center of gravity (COG) candidates of the opposing forces in a military
scenario. Correctly identifying the centers of gravity of the opposing forces
is of highest importance in any conflict, and Disciple-COG has been used
for many years in courses at the U.S. Army War College, as well as at other
military institutions, to teach students how to perform a strategic center-of-
gravity analysis of a scenario of interest.
The last agent described in Chapter 12 is Disciple-VPT (Virtual Planning
Team). Disciple-VPT consists of virtual planning experts that collaborate to
develop plans of actions requiring expertise from multiple domains. They
are assembled from an extensible library of such agents. The basic com-
ponent of Disciple-VPT is the Disciple-VE learning agent shell that can be
taught how to plan directly by a subject matter expert. Copies of the
Disciple-VE shells can be used by experts in different domains to rapidly
populate the library of virtual experts (VEs) of Disciple-VPT.
The learning-based approach to knowledge engineering presented in
this book illustrates the application of several design principles that are
useful in the development of cognitive assistants in general. Therefore, as a
conclusion, Chapter 13 summarizes these principles, which are illustrated
throughout this book.
The book also includes several appendices that summarize important
aspects from the chapters: the list of the knowledge-engineering guidelines
for each of the main stages of agent development, the list of operations of
Disciple-EBR, and the list of the hands-on exercises. Answers to selected
questions from each chapter are made available to the instructors.

BACKGROUND

The learning-based knowledge-engineering theory, methodology, and tools
presented in this book are the result of many years of research and experi-
mentation that produced increasingly more general and powerful versions.
During this evolution, one may identify four main stages. The first stage
corresponds to the PhD work of Gheorghe Tecuci presented in his thesis
“Disciple: A Theory, Methodology and System for Learning Expert Know-
ledge” (Thèse de Docteur en Science, Université de Paris-Sud, July 1988).
The main emphasis of that work was on rule learning. The developed
methods are among the first multistrategy approaches to learning that later
grew into the subfield of multistrategy machine learning (Tecuci, 1993;
Michalski and Tecuci, 1994). They also are among the first attempts to
integrate machine learning and knowledge acquisition (Tecuci et al., 1994;
Tecuci and Kodratoff, 1995). This work benefited from the collaboration of
Yves Kodratoff and the support of Mihai Drăgănescu. It was done with the
support of the Romanian Research Institute for Informatics, the Romanian
Academy, and the French National Center for Scientific Research.
The second stage in the evolution of the Disciple approach is presented
in the book Building Intelligent Agents: An Apprenticeship Multistrategy
Learning Theory, Methodology, Tool and Case Studies, published by Aca-
demic Press in 1998. It includes an improvement of Disciple’s multistrategy
learning methods and their extension with guided knowledge-elicitation
methods. It also includes ontology development tools and illustrations of
the application of the Disciple rule-learning approach to a variety of
domains, such as teaching of higher-order thinking skills in history and
statistics, engineering design, and military simulation. This work benefited
from the collaboration of several of Tecuci’s PhD students, particularly
Thomas Dybala, Michael Hieb, Harry Keeling, and Kathryn Wright, and it
was partially supported by George Mason University, the National Science
Foundation, and the Defense Advanced Research Projects Agency.
The third stage in the evolution of the Disciple approach is represented
by the Disciple agents described in Chapter 12 of this book. This represents
a significant extension of the Disciple approach, with methods for problem
solving through analysis and synthesis, modeling of the problem solving
process, ontology development, and scenario elicitation. At this stage,
Disciple has become a complete, end-to-end agent development method-
ology that has been applied to develop powerful agents, such as Disciple-
COG, used for a period of ten years in courses at the U.S. Army War College
and at other institutions. This work also benefited from the collaboration of
Tecuci’s students, first of all Mihai Boicu and Dorin Marcu, and also
Michael Bowman, Vu Le, Cristina Boicu, Bogdan Stanescu, and Marcel
Barbulescu. This work was partially supported by George Mason University,
the Air Force Office of Scientific Research, the Air Force Research Labora-
tory, the Defense Advanced Research Projects Agency, the National Science
Foundation, and others.
Finally, the latest stage in the evolution of the Disciple approach is
represented by the rest of this book. A main conceptual advancement over
the previous stage consists in its extension with a theory of evidence-based
reasoning, greatly facilitated by the collaboration of David A. Schum. This
resulted in a very powerful theory, methodology, and tool for the develop-
ment of agents for complex evidence-based reasoning applications, such as
intelligence analysis. Two of such agents are TIACRITIS (Teaching Intelli-
gence Analysts Critical Thinking Skills) and Disciple-CD (Disciple cognitive
assistant for Connecting the Dots). This work was partially supported by
George Mason University and by several agencies of the U.S. government,
including the Department of Defense.

Acknowledgments

We are very grateful to the many individuals who, in various ways,
supported our research, including Kelcy Allwein, Cindy Ayers, Murray
Burke, Douglas Campbell, William Cleckner, Jerry Comello, John Donelan,
Jim Donlon, Susan Durham, Keri Eustis, Michael Fletcher, Erin Gibbens,
Lloyd Griffiths, David Gunning, Ben Hamilton, Sharon Hamilton, Robert
Herklotz, Phillip Hwang, Eric Jones, Alex Kilpatrick, David Luginbuhl, Joan
McIntyre, Jean-Michel Pomrade, Michelle Quirk, Peter Rocci, William
Rzepka, Kimberly Urban, Joan Vallancewhitacre, and Ben Wible.
We also want to thank Heather Bergman, who invited us to write this
book for the Cambridge University Press, as well as to senior editor Lauren
Cowles, who is a great professional to work with.

About the Authors

Gheorghe Tecuci (PhD, Université de Paris-Sud, July 1988, and Polytechnic
Institute of Bucharest, December 1988) is Professor of Computer Science
and Director of the Learning Agents Center in the Volgenau School of
Engineering of George Mason University, Member of the Romanian Acad-
emy, and former Chair of Artificial Intelligence in the Center for Strategic
Leadership of the U.S. Army War College. He has published around two
hundred papers, including eleven books, with contributions to artificial
intelligence, knowledge engineering, cognitive assistants, machine learn-
ing, evidence-based reasoning, and intelligence analysis. He has received
the U.S. Army Outstanding Civilian Service Medal (for “groundbreaking
contributions to the application of artificial intelligence to center of gravity
determination”) and the Innovative Application Award from the American
Association for Artificial Intelligence.

Dorin Marcu (PhD, George Mason University, 2009) is Research Assistant
Professor, as well as Senior Software and Knowledge Engineer, in the
Learning Agents Center, Volgenau School of Engineering, George Mason
University. He has published more than forty papers, including five books,
with contributions to adaptive user interfaces, mixed-initiative human–
computer interaction, and cognitive assistants. He has received the Innova-
tive Application Award from the American Association for Artificial
Intelligence.

Mihai Boicu (PhD, George Mason University, 2002) is Associate Professor
of Information Sciences and Technology and Associate Director of the
Learning Agents Center, in the Volgenau School of Engineering of George
Mason University. He has published over ninety papers, including five
books, with contributions to problem solving and multistrategy learning
in dynamic and evolving representation spaces, mixed-initiative inter-
action, multi-agent systems architecture, collaboration and coordination,
abstraction-based reasoning, knowledge representation, and knowledge
acquisition. He has received the Innovative Application Award from the
American Association for Artificial Intelligence.

David A. Schum (PhD, Ohio State University, 1964) is Emeritus Professor of
Systems Engineering, Operations Research, and Law, as well as Chief
Scientist of the Learning Agents Center at George Mason University, and
Honorary Professor of Evidence Science at University College London.


He has followed a career-long interest in the study of the properties,
uses, discovery, and marshaling of evidence in probabilistic reasoning.
Dr. Schum has published more than one hundred papers in a variety of
journals and eight books, including The Evidential Foundations of Probabil-
istic Reasoning, Analysis of Evidence, Evidence and Inference for the Intelli-
gence Analyst, and Probabilistic Analysis of the Sacco and Vanzetti Evidence,
being recognized as one of the founding fathers of the Science of Evidence.

13:47:11,
1 Introduction

1.1 UNDERSTANDING THE WORLD THROUGH EVIDENCE-BASED REASONING

We can try to understand the world in various ways, an obvious one being the employment of
empirical methods for gathering and analyzing various forms of evidence about phenomena,
events, and situations of interest to us. This will include work in all of the sciences, medicine,
law, intelligence analysis, history, political affairs, current events, and a variety of other contexts
too numerous to mention. In the sciences, this empirical work will involve both experimental
and nonexperimental methods. In some of these contexts, notably in the sciences, we are able
to devise mathematical and logical models that allow us to make inferences and predictions
about complex matters of interest to us. But in every case, our understanding rests on our
knowledge of the properties, uses, discovery, and marshaling of evidence. This is why we begin
this book with a careful consideration of reasoning based on evidence.

1.1.1 What Is Evidence?


You might think this question is unnecessary since everyone knows what evidence is. How-
ever, matters are not quite that simple, since the term evidence is not so easy to define and its
use often arouses controversy. One problem with the definition of evidence is that several
other terms are often used synonymously with it, when in fact there are distinctions to be made
among these terms that are not always apparent. Quite unnecessary controversy occurs since
some believe that the term evidence arises and has meaning only in the field of law.
Consulting a dictionary actually does not assist us much in defining the term. For
example, look at the Oxford English Dictionary under the term evidence and you will be led
in a circle; evidence is ultimately defined as being evidence.
A variety of terms are often used as synonyms for the term evidence: data, items of
information, facts, and knowledge. When examined carefully, there are some valid and
important distinctions to be made among these terms, as we will now discuss.

1.1.2 Evidence, Data, and Information


Consider the terms data and items of information.
Data are uninterpreted signals, raw observations, measurements, such as the number 6,
the color “red,” or the sequence of dots and lines “. . . – – – . . .”.


Information is data equipped with meaning provided by a certain context, such as
“6 am,” “red traffic light,” “red tomato,” or the “S O S” emergency alert.
Untold trillions of data and items of information exist that will almost certainly never
become evidence in most inferences. Here’s an item of information for you: Professor
Schum has a long and steep driveway in front of his house that makes shoveling snow off
of it very difficult in the winter. Can you think of any situation in which this item of
information could become evidence? About the only matter in which this information
could become interesting evidence involves the question: Why did Schum and his wife,
Anne, ever purchase this house in the first place? As we will discuss, items of information
become evidence only when their relevance is established regarding some matter to be
proved or disproved.
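
To make these distinctions concrete, here is a minimal sketch in Python (our own illustration, not part of Disciple-EBR, with class names chosen only for this example): a datum is an uninterpreted value, information is a datum placed in a context that gives it meaning, and an item of information becomes evidence only when its relevance to some hypothesis is established.

from dataclasses import dataclass

@dataclass
class Datum:
    value: str               # an uninterpreted signal, e.g., "red" or "6"

@dataclass
class Information:
    datum: Datum
    context: str             # the context that gives the datum meaning

@dataclass
class Evidence:
    information: Information
    relevant_to: str         # the hypothesis whose proof or disproof it bears on

# "red" is a datum; "red traffic light facing the Ford car" is information;
# that information becomes evidence only relative to a hypothesis about the accident.
datum = Datum("red")
info = Information(datum, "the traffic light facing the Ford car at the intersection")
item = Evidence(info, "The driver of the Ford car bears responsibility for the accident")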

1.1.3 Evidence and Fact


Now consider the term fact; there are some real troubles here as far as its relation to the
term evidence is concerned. How many times have you heard someone say, “I want all
the facts before I draw a conclusion or make a decision,” or, “I want to know the facts in
this matter”? The first question is easily answered: We will never have all the facts in any
matter of inferential interest. Answers to the second question require a bit of careful
thought. Here is an example of what is involved:
Suppose we are police investigators interviewing Bob, who is a witness of a car accident
that just happened at a particular intersection. Bob tells us that the Ford car did not stop at
the red light signal. Now we regard it as fact that Bob gave us this information: We all just
heard him give it to us. But whether the Ford car did not stop at the red light is only an
inference and is not a fact. This is precisely why we need to distinguish carefully between
an event and evidence for this event.
Here is what we have: Bob has given us evidence E*, saying that event E occurred,
where E is the event that the Ford car did not stop at the red light signal. Whether this
event E did occur or not is open to question and depends on Bob’s competence and
credibility. If we take it as fact that event E did occur, just because Bob said it did, we
would be overlooking the believability foundation for any inference we might make from
his evidence E*. Unfortunately, it so often happens that people regard the events reported
in evidence as being facts when they are not. Doing this suppresses all uncertainties we
may have about the source’s credibility and competence if the evidence is testimonial in
nature. We have exactly the same concerns about the credibility of tangible evidence. For
example, we have been given a tangible object or an image as evidence E* that we believe
reveals the occurrence of event E. But we must consider whether this object or image is
authentic and it is what we believe it to be. In any case, the events recorded in evidence
can be regarded as facts only if provided by perfectly credible sources, something we
almost never have. As another example, any information we find on the Internet should be
considered as only a claim by its source rather than as fact, that is, as evidence about a
potential fact rather than a fact.

1.1.4 Evidence and Knowledge


Now consider the term knowledge and its relation with evidence. Here is where things get
interesting and difficult. As you know, the field of epistemology is the study of knowledge,
what we believe it may be, and how we obtain it. Two questions we would normally ask
regarding what Bob just told us are as follows:

• Does Bob really know what he just told us, that the Ford car did not stop at the red light
signal?
• Do we ourselves then also know, based on Bob’s testimony, that the Ford car did not
stop at the red light signal?

Let’s consider the first question. For more than two millennia, some very learned people
have troubled over the question: What do we mean when we say that person A knows that
event B occurred? To apply this question to our source Bob, let’s make an assumption that
will simplify our answering this question. Let’s assume that Bob is a competent observer in
this matter. Suppose we have evidence that Bob was actually himself at the intersection
when the accident happened. This is a major element of Bob’s competence. Bob’s
credibility depends on different matters, as we will see.
Here is what a standard or conventional account says about whether Bob knows
that the car did not stop at the red light signal. First, here is a general statement of the
standard account of knowledge: Knowledge is justified true belief. Person A knows that
event B occurred if:

• Event B did occur [True];
• A got nondefective evidence that B occurred [Justified]; and
• A believed this evidence [Belief].

This standard analysis first says that event B must have occurred for A to have knowledge
of its occurrence. This is what makes A’s belief true. If B did not occur, then A could not
know that it occurred. Second, A’s getting nondefective evidence that B occurred is
actually where A’s competence arises. A could not have gotten any evidence, defective
or nondefective, if A was not where B could have occurred. Then, A believed the evidence
A received about the occurrence of event B, and A was justified in having this belief by
obtaining nondefective evidence of B’s occurrence.
So, in the case involving Bob’s evidence, Bob knows that the Ford car did not stop at the
red light signal if:

• The car did not stop at the red light signal,
• Bob got nondefective evidence that the car did not stop at the red signal, and
• Bob believed this evidence.

If all of these three things are true, we can state on this standard analysis that Bob knows
that the Ford car did not stop at the red light signal.
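
As a heuristic illustration only (our own sketch, not code from Disciple-EBR), the three conditions of the standard account can be written as a simple check, once each condition has somehow been assessed:

from dataclasses import dataclass

@dataclass
class KnowledgeClaim:
    event_occurred: bool          # [True]      event B did occur
    nondefective_evidence: bool   # [Justified] A got nondefective evidence that B occurred
    believed_evidence: bool       # [Belief]    A believed this evidence

def knows(claim: KnowledgeClaim) -> bool:
    # On the standard account, A knows that B occurred only if all three conditions hold.
    return (claim.event_occurred
            and claim.nondefective_evidence
            and claim.believed_evidence)

# Applied to Bob and the event "the Ford car did not stop at the red light signal,"
# under assumed assessments of the three conditions:
bob = KnowledgeClaim(event_occurred=True,
                     nondefective_evidence=True,
                     believed_evidence=True)
print(knows(bob))   # True on these assumed assessments

The difficulty, taken up next, is that we as investigators can rarely assess these three conditions directly.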
Before we proceed, we must acknowledge that this standard analysis has been very
controversial in fairly recent years and some philosophers claim to have found alleged
paradoxes and counterexamples associated with it. Other philosophers dispute these claims.
Most of the controversy here concerns the justification condition: What does it mean to say
that A is justified in believing that B occurred? In any case, we have found this standard
analysis very useful as a heuristic in our analyses of the credibility of testimonial evidence.
But now we have several matters to consider in answering the second question: Do we
ourselves also know, based on Bob’s testimony, that the Ford car did not stop at the red
light signal? The first and most obvious fact is that we do not know the extent to which any
of the three events just described in the standard analysis are true. We cannot get inside
Bob’s head to obtain necessary answers about these events. Starting at the bottom, we do
not know for sure that Bob believes what he just told us about the Ford car not stopping at
the red light signal. This is a matter of Bob’s veracity or truthfulness. We would not say that
Bob is being truthful if he told us something he did not believe.
Second, we do not know what sensory evidence Bob obtained on which to base his
belief and whether he based his belief at all on this evidence. Bob might have believed that
the Ford car did not stop at the red light signal either because he expected or desired this
to be true. This involves Bob’s objectivity as an observer. We would not say that Bob was
objective in this observation if he did not base his belief on the sensory evidence he
obtained in his observation.
Finally, even if we believe that Bob was an objective observer who based his belief
about the accident on sensory evidence, we do not know how good this evidence was.
Here we are obliged to consider Bob’s sensory sensitivities or accuracy in the conditions
under which Bob made his observations. Here we consider such obvious things as Bob’s
visual acuity. But there are many other considerations, such as, “Did Bob only get a
fleeting look at the accident when it happened?” “Is Bob color-blind?” “Did he make this
observation during a storm?” and, “What time of day did he make this observation?” For a
variety of such reasons, Bob might simply have been mistaken in his observation: The light
signal was not red when the Ford car entered the intersection.
So, what it comes down to is that the extent of our knowledge about whether the Ford
car did not stop at the red light signal, based on Bob’s evidence, depends on these three
attributes of Bob’s credibility: his veracity, objectivity, and observational sensitivity. We
will have much more to say about assessing the credibility of sources of evidence, and how
Disciple-EBR can assist you in this difficult process, in Section 4.7 of this book.
Now, we return to our role as police investigators. Based on evidence we have about
Bob’s competence and credibility, suppose we believe that the event he reports did occur;
we believe that “E: The Ford car did not stop at the red light signal,” did occur. Now we
face the question: So what? Why is knowledge of event E of importance to us? Stated more
precisely: How is event E relevant in further inferences we must make? In our investigation
so far, we have other evidence besides Bob’s testimony. In particular, we observe a Toyota
car that has smashed into a light pole at this intersection, injuring the driver of the Toyota,
who was immediately taken to a hospital. In our minds, we form the tentative chain of
reasoning from Figure 1.1.
H: The driver of the Ford car bears the responsibility for the accident.

G: The driver of the Toyota swerved to avoid the Ford car and smashed into a light pole.

F: The driver of the Toyota car, having a green light at this intersection, saw the Ford car running the red light.

E: The Ford car did not stop at the red light signal at this intersection.

E*: Bob’s testimony that the Ford car did not stop at the red light signal at this intersection.

Figure 1.1. Tentative chain of reasoning.

This sequence of events, E ➔ F ➔ G ➔ H, is a relevance argument or chain of reasoning
whose links represent sources of doubt interposed between the evidence E* and the
hypothesis H. An important thing to note is that some or all of these events may not be
true. Reducing our doubts or uncertainties regarding any of these events requires a variety
of additional evidence. The extent of our knowledge about the relative probabilities of our
final hypothesis depends on the believability of our evidence and on the defensibility
and strength of our relevance arguments, as discussed in Section 2.2. The whole point here
is that the relation between evidence and knowledge is not a simple one at all.
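
The chain of reasoning in Figure 1.1 can also be written down as a simple data structure that makes the links, and hence the sources of doubt, explicit. The following Python sketch is only our illustration and is not the representation used by Disciple-EBR:

# The relevance argument of Figure 1.1, from the item of evidence E* up to the
# hypothesis H; each link is a source of doubt that additional evidence may
# strengthen or weaken.
relevance_chain = [
    ("E*", "Bob testified that the Ford car did not stop at the red light signal"),
    ("E",  "The Ford car did not stop at the red light signal at this intersection"),
    ("F",  "The Toyota driver, having a green light, saw the Ford car running the red light"),
    ("G",  "The Toyota driver swerved to avoid the Ford car and smashed into a light pole"),
    ("H",  "The driver of the Ford car bears the responsibility for the accident"),
]

# Enumerate the links, i.e., the places where doubt can be attacked or supported:
for (src, _), (dst, claim) in zip(relevance_chain, relevance_chain[1:]):
    print(f"{src} -> {dst}: {claim}")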

1.1.5 Ubiquity of Evidence


Finally, we must consider the controversy over the use of the term evidence instead of
the other terms we just examined. The mistake made by some people is to consider that
evidence concerns only objects, testimony, or other items introduced in a court trial. This
controversy and confusion has been recognized by eminent evidence scholars in the field
of law. For example, in his marvelous book Evidence, Proof, and Facts: A Book of Sources,
Professor Peter Murphy (2003, p. 1) notes the curious fact that the term evidence is so
commonly associated only with the field of law:

The word “evidence” is associated more often with lawyers and judicial trials
than with any other cross-section of society or form of activity. . . . In its
simplest sense, evidence may be defined as any factual datum which in some
manner assists in drawing conclusions, either favorable or unfavorable, to
some hypothesis whose proof or refutation is being attempted.

Murphy notes that this term is appropriate in any field in which conclusions are reached
from any relevant datum. Thus, physicians, scientists of any ilk, historians, and persons of
any other conceivable discipline, as well as ordinary persons, use evidence every day in
order to draw conclusions about matters of interest to them.
We believe there is a very good reason why many persons are so often tempted to
associate the term evidence only with the field of law. It happens that the Anglo-American
system of laws has provided us with by far the richest legacy of experience and scholarship
on evidence of any field known to us. This legacy has arisen as a result of the development
of the adversarial system for settling disputes and the gradual emergence of the jury
system, in which members of the jury deliberate on evidence provided by external
witnesses. This legacy has now been accumulating over at least the past six hundred years
(Anderson et al., 2005).
Evidence-based reasoning involves abductive, deductive, and inductive (probabilistic)
reasoning. The following sections briefly introduce them.

1.2 ABDUCTIVE REASONING

1.2.1 From Aristotle to Peirce


Throughout history, some of the greatest minds have tried to understand the world
through a process of discovery and testing of hypotheses based on evidence. We have
found the metaphor of an arch of knowledge to be very useful in summarizing the many
ideas expressed over the centuries concerning the generation of new thoughts and new
evidence. This metaphor comes from the work of the philosopher David Oldroyd in a
valuable work having this metaphor as its title (Oldroyd, 1986). Figure 1.2 shows this
metaphor applied in the context of science. Based upon existing records, it seems that


Figure 1.2. The "arch of knowledge" in science: an upward, discovery-related arm (marked "?") leading from observations of events in nature to possible hypotheses or explanations (hypothesis generation), and a downward arm leading by deduction to new observable phenomena (hypothesis testing).

Aristotle (384 BC–322 BC) was the first to puzzle about the generation or discovery of new
ideas in science. From sensory observations, we generate possible explanations, in the
form of hypotheses, for these observations. It was never clear from Aristotle's work what
label should be placed on the upward, or discovery-related, arm of the arch in Figure 1.2.
By some accounts, Aristotle’s description of this act of generating hypotheses is called
"intuitive induction" (Cohen and Nagel, 1934; Kneale, 1949). The question mark on the
upward arm of the arch in Figure 1.2 simply indicates that there is still argument about
what this discovery-related arm should be called. By most accounts, the downward arm of
the arch concerns the deduction of new observable phenomena, assuming the truth of a
generated hypothesis (Schum, 2001b).
Over the millennia since Aristotle, many people have tried to give an account of the
process of discovering hypotheses and how this process differs from ones in which existing
hypotheses are justified. Galileo Galilei (1564–1642) thought that we “reason backward”
inductively to imagine causes (hypotheses) from observed events, and we reason deductively to test the hypotheses. A similar view was held by Isaac Newton (1642–1727), John
Locke (1632–1704), and William Whewell (1794–1866). Charles S. Peirce (1839–1914) was
the first to suggest that new ideas or hypotheses are generated through a different form of
reasoning, which he called abduction and associated with imaginative reasoning (Peirce,
1898; 1901). His views are very similar to those of Sherlock Holmes, the famous fictional
character of Conan Doyle (Schum, 1999).

1.2.2 Peirce and Sherlock Holmes on Abductive Reasoning


Until the time of Peirce, most persons interested in discovery and investigation supposed
that the discovery-related arms of the arch in Figure 1.2 involved some form of inductive
reasoning that proceeds from particulars, in the form of observations, to generalities,
in the form of hypotheses. But inductive reasoning is commonly associated with the
process of justifying or trying to prove existing hypotheses based on evidence. The
question remains: Where did these hypotheses come from? Pondering such matters,
Peirce relied on a figure of reasoning he found in the works of Aristotle. The reasoning
proceeds as follows:

• If H were true, then E, F, and G would follow as a matter of course.
• But E, F, and G have been observed.
• Therefore, we have reason to believe that H might possibly be true.

As an illustration, let us assume that we observe E*: “Smoke in the East building” (E* being
evidence that event E occurred).


Based on our prior knowledge of contexts in which things like E: “Smoke in a building”
have occurred, we say: “Whenever something like H: ‘There is fire in a building’ has
occurred, then something like E: ‘Smoke in the building’ has also occurred.” Thus, there is
reason to suspect that H: “There is fire in the East building” may explain the occurrence of
the clue E*: “Smoke in the East building.” In other words, the clue E* points to H as a
possible explanation for its occurrence.
To summarize:

• We observe smoke in the East building.
• Fire causes smoke.
• We hypothesize that there is a fire in the East building.

Peirce was unsure about what to call this form of reasoning. At various points in his work,
he called it "abduction," "retroduction," and even just "hypothesis" (Peirce, 1898; 1901).
The essential interpretation Peirce placed on the concept of abduction is illustrated in
Figure 1.3. He often used as a basis for his discussions of abduction the observation of an
anomaly in science. Let us suppose that we already have a collection of prior evidence in
some investigation and an existing collection of hypotheses H1, H2, . . . , Hn. To varying
degrees, these n hypotheses explain the evidence we have so far. But now we make an
observation E* that is embarrassing in the following way: We take E* seriously, but we cannot
explain it by any of the hypotheses we have generated so far. In other words, E* is an
anomaly. Vexed by this anomaly, we try to find an explanation for it. In some cases, often
much later when we are occupied by other things, we experience a “flash of insight” in which
it occurs to us that a new hypothesis Hn+1 could explain this anomaly E*. It is these “flashes of
insight” that Peirce associated with abduction. Asked at this moment to say exactly how Hn+1
explains E*, we may be unable to do so. However, further thought may produce a chain of
reasoning that plausibly connects Hn+1 and E*. The reasoning might go as follows:

• I have evidence E* that event E happened.
• If E did happen, then F might be true.
• If F happened, then G might be true.
• And if G happened, then Hn+1 might be true.

It is possible, of course, that the chain of reasoning might have started at the top with Hn+1
and ended at E*. This is why we have shown no direction on the links between E* and Hn+1
in Figure 1.3.

Figure 1.3. Peirce's interpretation of abductive reasoning: prior evidence bears on the existing hypotheses {H1, H2, . . . , Hn}; an anomaly E*, not explainable by any of them, prompts a flash of insight suggesting a new hypothesis Hn+1, whose connection to E* is not immediately obvious; later thought suggests a plausible chain of reasoning linking E*, E, F, G, and Hn+1.


But our discovery-related activities are hardly over just because we have explained
this anomaly. Our new hypothesis Hn+1 would not be very appealing if it explained
only anomaly E*. Figure 1.4 shows the next steps in our use of this new hypothesis.
We first inquire about the extent to which it explains the prior evidence we collected
before we observed E*. An important test of the suitability of the new hypothesis Hn+1
involves asking how well this new hypothesis explains other observations we have
taken seriously. This new hypothesis would be especially valuable if it explains our
prior evidence better than any of our previously generated hypotheses. But there is
one other most important test of the adequacy of a new hypothesis Hn+1: How well does
this new hypothesis suggest new potentially observable evidence that our previous
hypotheses did not suggest? If Hn+1 were true, then B, I, and K would also be true; and if B were true, then C would be true. And if C were true, then we would expect to observe D.
In the illustrations Peirce used, which are shown in Figures 1.3 and 1.4, we entered
the process of discovery at an intermediate point when we already had existing hypoth-
eses and evidence. In other contexts, we must of course consider abductive reasoning
from the beginning of an episode of fact investigation when we have no hypotheses and
no evidence bearing on them. Based on our initial observations, by this process of
abductive or insightful reasoning, we may generate initial guesses or hypotheses to
explain even the very first observations we make. Such hypotheses may of course be
vague, imprecise, or undifferentiated. Further observations and evidence we collect may
allow us to make an initial hypothesis more precise and may of course suggest entirely
new hypotheses.
It happens that at the very same time Peirce was writing about abductive reasoning,
insight, and discovery, across the Atlantic, Arthur Conan Doyle was exercising his
fictional character Sherlock Holmes in many mystery stories. At several points in the
Sherlock Holmes stories, Holmes describes to his colleague, Dr. Watson, his inferential
strategies during investigation. These strategies seem almost identical to the concept of
abductive reasoning described by Peirce. Holmes did not, of course, describe his investi-
gative reasoning as abductive. Instead, he said his reasoning was “backward," moving
from his observations to possible explanations for them. A very informative and enjoy-
able collection of papers on the connection between Peirce and Sherlock Holmes
appears in the work of Umberto Eco and Thomas Sebeok (1983). In spite of the similarity
of Peirce's and Holmes's (Conan Doyle's) views of discovery-related reasoning, there is
no evidence that Peirce and Conan Doyle ever shared ideas on the subject.

Figure 1.4. Putting an abduced hypothesis to work: how well does Hn+1 explain the prior evidence taken seriously (the chain E*, E, F, G), and how well does Hn+1 suggest new kinds of observable evidence (such as the chains B–C–D, I–J, and K–L)?


1.3 PROBABILISTIC REASONING

A major trouble we all face in thinking about probability and uncertainty concerns the fact
that the necessity for probability calculations, estimations, or judgments arises in different
situations. In addition, there are many different attributes of our judgments that we would
like to capture in assessments of uncertainty we are obliged to make. There are situations
in which you can estimate probabilities of interest by counting things. But there are many
other situations in which we have uncertainty but will have nothing to count. These
situations involve events that are singular, unique, or one of a kind. In the following, we
will briefly discuss several alternative views of probability, starting with two views of
probability that involve processes in which we can obtain probabilities or estimates of
them by enumerative or counting processes.

1.3.1 Enumerative Probabilities: Obtained by Counting


1.3.1.1 Aleatory Probability
According to Laplace (1814, p. cv), “probability theory is nothing but common sense
reduced to calculation.” There are two conceptions of probability that involve counting
operations. The first is termed aleatory probability. This term has its roots in the Latin term
alea, meaning chance, game of chance, or devices such as dice involved in games of
chance. Games of chance have two important ground rules:

• There is a finite set S of possible outcomes, whose number we denote n(S).
• All outcomes in S are assumed to have equal probability.

For example, in a game involving a pair of fair six-sided dice, where we roll and add the
two numbers showing up, there are thirty-six ways in which the numbers showing up will
have sums between two and twelve, inclusive. So, in this case, n(S) = 36. Suppose you wish
to determine the probability that you will roll a seven on a single throw of these dice. There
are exactly six ways in which this can happen. If E = “the sum of the numbers is seven,”
then n(E) = 6. The probability of E, P(E), is simply determined by dividing n(E) by n(S),
which in this example is P(E) = 6/36 = 1/6. So, aleatory probabilities are always determined
by dividing n(E) by n(S), whatever E and S are, as long as E is a subset of S.
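To make the counting definition concrete, here is a minimal sketch in Python (the language is simply our choice of notation for such illustrations; the dice example and the ratio n(E)/n(S) are taken from the text above):

    from itertools import product

    # Sample space S: all ordered outcomes of rolling a pair of fair six-sided dice.
    S = list(product(range(1, 7), repeat=2))               # n(S) = 36

    # Event E: "the sum of the numbers showing up is seven."
    E = [outcome for outcome in S if sum(outcome) == 7]    # n(E) = 6

    P_E = len(E) / len(S)                                  # aleatory probability n(E)/n(S)
    print(len(S), len(E), P_E)                             # 36 6 0.1666...

Running this prints 36, 6, and 0.1666 . . . , in agreement with P(E) = 6/36 = 1/6.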

1.3.1.2 Relative Frequency and Statistics


Another way of assessing probabilities involves the many situations in which aleatory
ground rules will not apply, but empirical methods are at hand to estimate probabilities.
These situations arise when we have replicable or repeatable processes in which we can
count the number of times events have occurred in the past. Suppose that, employing a
defensible method for gathering information about the number of times event E has
occurred, we determine the relative frequency of an occurrence of E by counting the
number of times E has occurred, n(E), and then dividing this number by N, where N is
the number of observations we have made, or the sample size we have taken. In this case,
the relative frequency of E, f(E), equals n(E)/N. You recognize that this is a statistical
process that can be performed in many situations, provided that we assume processes that
are replicable or repeatable. It is true, of course, that a relative frequency f(E) is just an


estimate of the true probability of E, P(E). The reason, of course, is that the number N of
observations we have made is always less than the total number of observations that could
be made. In some cases, there may be an infinite number of possible observations. If you
have had a course in probability theory, you will remember that there are several formal
statements, called the laws of large numbers, for showing how f(E) approaches P(E) when
N is made larger and larger.
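To see this convergence at work, here is a minimal simulation sketch in Python (the two-dice event E is reused from the aleatory example purely for illustration, and the particular sample sizes are our own choice):

    import random

    def relative_frequency(num_rolls, seed=0):
        """Estimate P(E), where E = 'two dice sum to seven', as f(E) = n(E)/N."""
        rng = random.Random(seed)
        n_E = sum(1 for _ in range(num_rolls)
                  if rng.randint(1, 6) + rng.randint(1, 6) == 7)
        return n_E / num_rolls

    # As N grows, f(E) should approach the true value P(E) = 1/6 (laws of large numbers).
    for N in (100, 10_000, 1_000_000):
        print(N, relative_frequency(N))

With larger and larger N, the printed relative frequencies should settle closer and closer to 1/6 ≈ 0.1667.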
Probability theory presents an interesting paradox. It has a very long past but a very
short history. There is abundant evidence that people as far back as Paleolithic times used
objects resembling dice either for gambling or, more likely, to foretell the future (David,
1962). But attempts to calculate probabilities date back only to the 1600s, and the first
attempt to develop an axiomatic theory of mathematical probability dates back only to 1933, in the
work of A. N. Kolmogorov (1933). Kolmogorov was the first to put probability on an
axiomatic basis. The three basic axioms he proposed are the following ones:

Axiom 1: For any event E, P(E) ≥ 0.
Axiom 2: If an event is sure or certain to occur, which we label S, P(S) = 1.0.
Axiom 3: If two events, E and F, cannot occur together, or are mutually exclusive, the
probability that one or the other of these events occurs is the sum of their separate
probabilities. In symbols, P(E or F) = P(E) + P(F).

All Axiom 1 says is that probabilities are never negative. Axioms 1 and 2, taken together,
mean that probabilities are numbers between 0 and 1. An event having 0 probability is
commonly called an “impossible event.” Axiom 3 is called the additivity axiom, and it
holds for any number of mutually exclusive events.
Certain transformations of Kolmogorov’s probabilities are entirely permissible and are
often used. One common form involves odds. The odds of event E occurring to its not
occurring, which we label Odds(E, ¬E), is determined by Odds(E, ¬E) = P(E)/(1 – P(E)). For
any two mutually exclusive events E and F, the odds of E to F, Odds(E, F), are given by Odds
(E, F) = P(E)/P(F). Numerical odds scales range from zero to an unlimited upper value.
What is very interesting, but not always recognized, is that Kolmogorov had only
enumerative probability in mind when he settled on the preceding three axioms. He
makes this clear in his 1933 book and in his later writings (Kolmogorov, 1969). It is easily
shown that both aleatory probabilities and relative frequencies obey these three axioms.
But Kolmogorov went an important step further in defining conditional probabilities that
are necessary to show how the probability of an event may change as we learn new
information. He defined the probability of event E, given or conditional upon some other
event F, as P(E given F) = P(E and F)/P(F), assuming that P(F) is not zero. P(E given F) is
also written as P(E|F). He chose this particular definition since conditional probabilities, so
defined, will also obey the three axioms just mentioned. In other words, we do not need
any new axioms for conditional probabilities.
Now comes a very important concept you may have heard about. It is called Bayes’
rule and results directly from applying the definition of the conditional probability. From
P(E* and H) = P(H and E*), you obtain P(E*|H) P(H) = P(H|E*)P(E*). This can then be
written as shown in Figure 1.5.
This rule is named after the English clergyman, the Reverend Thomas Bayes (1702–1761),
who first saw the essentials of a rule for revising probabilities of hypotheses, based on new
evidence (Dale, 2003). He had written a paper describing his derivation and use of this rule
but he never published it; this paper was found in his desk after he died in 1761 by Richard


Figure 1.5. The Bayes' rule: P(H|E*) [the posterior probability of H given E*] = P(E*|H) [the likelihood of E* given H] × P(H) [the prior probability of hypothesis H] / P(E*) [the prior probability of E*, the normalizer].

Price, the executor of Bayes’ will. Price realized the importance of Bayes’ paper and
recommended it for publication in the Philosophical Transactions of the Royal Society, in which it
appeared in 1763. He rightly viewed Bayes’ rule as the first canon or rule for inductive or
probabilistic reasoning. Bayes’ rule follows directly from Kolmogorov’s three axioms and his
definition of a conditional probability, and is entirely uncontroversial as far as its derivation
is concerned. But this rule has always been a source of controversy on other grounds. The
reason is that it requires us to say how probable a hypothesis is before we have gathered
evidence that will possibly allow us to revise this probability. In short, we need prior
probabilities on hypotheses in order to revise them, when they become posterior
probabilities. Persons wedded to enumerative conceptions of probability say we can never
have prior probabilities of hypotheses, since in advance of data collection we have nothing
to count. Statisticians are still divided today about whether it makes sense to use Bayes’ rule
in statistical inferences. Some statisticians argue that initial prior probabilities could be
assessed only subjectively and that any subjective assessments have no place in any area
that calls itself scientific. Bayes’ rule says that if we are to talk about probability revisions in
our beliefs, based on evidence, we have to say where these beliefs were before we obtained
the evidence.
The Bayes’ rule is useful in practice because there are many cases where we have good
probability estimates for three of the four probabilities involved, and we can therefore
compute the fourth one (see, e.g., Question 1.9).
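The following minimal Python sketch illustrates this use of Bayes' rule; the numerical values are invented purely for illustration and are not taken from the text:

    def posterior(prior_H, lik_H, lik_not_H):
        """Bayes' rule from Figure 1.5: P(H|E*) = P(E*|H) P(H) / P(E*),
        with the normalizer expanded as P(E*) = P(E*|H)P(H) + P(E*|not-H)P(not-H)."""
        p_E_star = lik_H * prior_H + lik_not_H * (1 - prior_H)
        return lik_H * prior_H / p_E_star

    # Illustrative numbers only: P(H) = 0.3, P(E*|H) = 0.9, P(E*|not-H) = 0.2.
    print(posterior(0.3, 0.9, 0.2))    # about 0.66

Here the prior and the likelihood of the evidence under each hypothesis are judged to be known, and the normalizer and the posterior follow directly.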
It is time for us to consider views of probability in situations where we will have nothing
to count, either a priori or anywhere else: the Subjective Bayesian view, Belief Functions,
Baconian probabilities, and Fuzzy probabilities. We provide a look at only the essentials of
these four views, focusing on what each one has to tell us about what the force or weight of
evidence on some hypothesis means. More extensive comparisons of these four views
appear in (Schum, 1994 [2001a], pp. 200–269).

1.3.2 Subjective Bayesian View of Probability


We refer to our first nonenumerative view as an epistemic view, since it assumes that
probabilities in any case are based on some kind of knowledge, whatever form it may take.
In short, probabilities are the result of informed judgments. Many statisticians now favor the
use of Bayes’ rule for combining subjective assessments of all the prior and likelihood
ingredients of Bayes’ rule. But what these persons require is that these assessments be entirely
consistent with Kolmogorov’s three axioms and his definition of conditional probabilities we


previously noted. Since Bayes’ rule rests on these axioms and definition, we must adhere to
them in order to say that our assessment process is coherent or consistent.
As we will show, the likelihoods and their ratios are the ingredients of Bayes' rule that
concern the inferential force of evidence. Suppose we have two hypotheses, H and ¬H,
and a single item of evidence, E*, saying that event E occurred. What we are interested in
determining are the posterior probabilities P(H|E*) and P(¬H|E*). Using the Bayes' rule
from Figure 1.5, we can express these posterior probabilities as:

P(H|E*) = P(E*|H) P(H) / P(E*)

P(¬H|E*) = P(E*|¬H) P(¬H) / P(E*)

The next step is to divide P(H|E*) by P(¬H|E*), which will produce three ratios; in the
process, the term P(E*) will drop out. Here are the three ratios that result:

P(H|E*) / P(¬H|E*) = [P(H) / P(¬H)] [P(E*|H) / P(E*|¬H)]

The left-hand ratio, P(H|E*) / P(¬H|E*), is called the posterior odds of H to ¬H, given evidence E*. In
symbols, we can express this ratio as Odds(H : ¬H|E*). The first ratio on the right, P(H) / P(¬H), is
called the prior odds of H to ¬H. In symbols, we can express this ratio as Odds(H : ¬H).
The remaining ratio on the right, P(E*|H) / P(E*|¬H), is called the likelihood ratio for evidence E*; we
give this ratio the symbol L(E*). In terms of these three ratios, Bayes' rule applied to this
situation can be written simply as follows:

Odds(H : ¬H|E*) = Odds(H : ¬H) × L(E*)

This simple version of Bayes' rule is called the odds-likelihood ratio form. It is also called,
somewhat unkindly, "idiot's Bayes." If we divide both sides of this equation by the prior
odds, Odds(H : ¬H), we observe that the likelihood ratio L(E*) is simply the ratio of posterior
odds to prior odds of H to ¬H. This likelihood ratio shows us how much, and in what
direction (toward H or ¬H), our evidence E* has caused us to change our beliefs from
what they were before we obtained evidence E*. In short, likelihood ratios grade the force
of evidence in Bayesian analyses.
Here is an example of how likelihoods and their ratios provide a method for grading the
force of an item of evidence on some hypothesis. This is an example of a situation
involving a singular evidence item where we have nothing to count. Suppose we are
interested in determining whether or not the Green state is supplying parts necessary for
the construction of shaped explosive devices to a certain insurgent militia group in the
neighboring Orange state. Thus we are considering two hypotheses:

 H: “The Greens are supplying parts necessary for the construction of shaped explosive
devices.”
 ¬H: “The Greens are not supplying parts necessary for the construction of shaped
explosive devices.”

Suppose we believe, before we have any evidence, that the prior probability of H is
P(H) = 0.20. Because we must obey the rules for enumerative probabilities, we must also

say that P(¬H) = 0.80. This follows from the third axiom we discussed in Section 1.3.1. So,
our prior odds on H relative to ¬H have a value Odds(H : ¬H) = P(H)/P(¬H) = 0.20/0.80 = 1/4.
Suppose now that we receive the item of evidence E*: A member of the Green's
military was captured less than one kilometer away from a location in Orange at which
parts necessary for the construction of these shaped explosives were found.
We ask ourselves how likely is this evidence E* if H were true, and how likely is this
evidence E* if H were not true. Suppose we say that P(E*|H) = 0.80 and P(E*|¬H) = 0.10.
We are saying that this evidence is eight times more likely if H were true than if H were not
true. So, our likelihood ratio for evidence E* is L(E*) = P(E*|H)/P(E*|¬H) = 0.8/0.1 = 8.
We now have all the ingredients necessary in Bayes' rule to determine the posterior
odds and posterior probability of hypothesis H:

Odds(H : ¬H|E*) = Odds(H : ¬H) × L(E*) = (1/4) × 8 = 2.

This means that we now believe the posterior odds favoring H over ¬H are two to one. But
we started by believing that the prior odds of H to ¬H were one in four, so the evidence E*
changed our belief by a factor of 8.
We could just as easily express this inference in terms of probabilities. Our prior
probability of H was P(H) = 0.20. But our posterior probability P(H|E*) = 2/(1 + 2) = 2/3 = 0.67.
So, in terms of probabilities, evidence E* caused us to increase the probability of H by 0.47.
So, using this subjective Bayesian approach, we would be entitled to express the extent
of our uncertainty in an analysis using numerical probability assessments provided only
that they conform to Kolmogorov's axioms.
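The arithmetic of this example can be reproduced in a few lines of Python (a sketch for illustration; the helper functions are ours, but the numbers are exactly those used above):

    def odds(p):
        """Convert a probability into odds: Odds = P / (1 - P)."""
        return p / (1 - p)

    def probability(o):
        """Convert odds back into a probability: P = Odds / (1 + Odds)."""
        return o / (1 + o)

    prior_H = 0.20                          # P(H) from the Green-Orange example
    likelihood_ratio = 0.80 / 0.10          # L(E*) = P(E*|H) / P(E*|not-H) = 8

    prior_odds = odds(prior_H)                        # 0.25, i.e., 1 to 4
    posterior_odds = prior_odds * likelihood_ratio    # 2.0, i.e., 2 to 1
    posterior_H = probability(posterior_odds)         # 2/3, about 0.67

    print(prior_odds, posterior_odds, round(posterior_H, 2))

Working in odds is what makes the likelihood ratio such a convenient single-number grading of the force of evidence: the posterior odds are just the prior odds multiplied by L(E*).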

1.3.3 Belief Functions


Both the enumerative and the subjective Bayesian interpretations of probability conform
to Kolmogorov’s three axioms. We asserted that these axioms rest on the investigation of
replicable or repeatable processes such as statistical analysis of the results obtained in a
sample of observations. But there are many reasons for wondering whether these three
axioms remain self-evident concerning subjective probability judgments we all make from
time to time involving unique events for which no enumerative process can be involved. In
a very influential work, the probabilist Professor Glenn Shafer pointed to an array of
difficulties associated with Axiom 3 concerning the additivity of enumerative probabilities
for mutually exclusive events (Shafer, 1976). In particular, Shafer asserts that this axiom
places various constraints on our judgments or beliefs about uncertainty that we may
not be willing to accept. Here it is only necessary to mention two of the difficulties
Shafer mentions:

• Indecisions we routinely face concerning ambiguities in our evidence
• Instances in which we encounter what historically has been called "pure" evidence

In so many instances, we may not be sure what evidence is telling us, and so we wish to be
able to withhold a portion of our beliefs and not commit it to any particular hypothesis or
possible conclusion. A very important element in what Shafer terms Belief Functions is that
the weight of evidence means the degree of support evidence provides to hypotheses we are
considering. Shafer allows that we can grade the degree of support s on a 0 to 1 scale


similar to the scale for Kolmogorov probabilities; but we can do things with support
assignment s that the Kolmogorov additivity Axiom 3 does not allow.
To illustrate, suppose we revisit the issue discussed in the previous section about
whether or not the Green state is supplying parts necessary for the construction of shaped
explosive devices to a certain insurgent militia group in the neighboring Orange state. At
some stage, we are required to state our beliefs about the extent to which the evidence
supports H or ¬H. Here is our assessment:

        {H}     {¬H}    {H or ¬H}
s       0.5     0.3     0.2

What does this support assignment mean? We are saying that we believe the evidence
supports H exactly to degree s = 0.5, and that this evidence also supports ¬H exactly to
degree s = 0.3. But there is something about this evidence that makes us unsure about
whether it supports H or ¬H. So, we have left the balance of our s assignment, s = 0.2,
uncommitted among H or ¬H. In other words, we have withheld a portion of our beliefs
because we are not sure what some element of our evidence is telling us.
If we were required to obey Kolmogorov Axiom 3, we would not be allowed to be
indecisive in any way in stating our beliefs. Here is what our support assignment would
have to look like:

        {H}     {¬H}
s       a       1 – a

In this case, we would be required to say that the evidence supports H to degree s = a,
and supports ¬H to degree s = (1 – a) in agreement with Axiom 3 since H and ¬H are
mutually exclusive and exhaustive. In short, Kolmogorov Axiom 3 does not permit us any
indecision in stating our beliefs; we must commit all of it to H and to ¬H. This, we believe,
would not be a faithful or accurate account of our beliefs.
But Shafer’s Belief Function approach allows us to cope with another difficulty
associated with Kolmogorov’s axioms. For centuries, it has been recognized that a
distinction is necessary between what has been termed mixed evidence and pure evi-
dence. Mixed evidence has some degree of probability under every hypothesis we are
considering. But pure evidence may support one hypothesis but say nothing at all about
other hypotheses. In other words, we may encounter evidence that we believe offers zero
support for some hypothesis. Here is another example involving our Green-Orange
situation. Suppose we encounter an item of evidence we believe supports H to a degree,
but we believe offers no support at all for ¬H. Here is our support assignment s for this
evidence:

        {H}     {¬H}    {H or ¬H}
s       0.5     0       0.5

In this situation, we are saying that the evidence supports H to degree s = 0.5, but offers
no support at all to ¬H. The rest of our support we leave uncommitted between H and ¬H.
But now we have to examine what s = 0 for ¬H means; does it mean that ¬H could not be
supported by further evidence? The answer is no, and the reason why it is no allows us to


compare what ordinary probabilities mean in comparison with what support s means.
This comparison is shown in Figure 1.6.
The (a) scale in Figure 1.6, for conventional or Kolmogorov probabilities, has a lower
boundary with a meaning quite different from the meaning of this lower boundary on
Shafer’s support scale shown in (b). The value 0 in conventional probability refers to an
event judged to be impossible and one you completely disbelieve. But all 0 means on
Shafer’s s scale is lack of belief, not disbelief. This is very important, since we can go from
lack of belief to some belief as we gather more evidence. But we cannot go from disbelief
to some belief. On a conventional probability scale, a hypothesis once assigned the
probability value 0 can never be resuscitated by further evidence, regardless of how strong
it may be. But some hypothesis, assigned the value s = 0, can be revised upward since we
can go from lack of belief to some belief in this hypothesis when and if we have some
further evidence to support it. Thus, s allows us to account for pure evidence in ways that
ordinary probabilities cannot do. We will refer to this scale again in Section 1.3.4 when
discussing Baconian probability.
Consider the evidence in the dirty bomb example which will be discussed in Section 2.2.
We begin by listing the hypotheses we are considering at this moment:

H1: A dirty bomb will be set off somewhere in the Washington, D.C., area.
¬H1: A dirty bomb will not be set off in the Washington, D.C., area (it might be set off
somewhere else or not at all).

In the Belief Functions approach, we have just specified what is called a frame of
discernment, in shorthand a frame F. What this frame F = {H1, ¬H1} shows is how we
are viewing our hypotheses right now. We might, on further evidence, wish to revise our
frame in any one of a variety of ways. For example, we might have evidence suggesting
other specific places where a dirty bomb might be set off, such as in Annapolis, Maryland,
or in Tysons Corner, Virginia. So our frame F in this case might be:

H1: A dirty bomb will be set off in Washington, D.C.
H2: A dirty bomb will be set off in Annapolis, Maryland.
H3: A dirty bomb will be set off in Tysons Corner, Virginia.

All that is required in the Belief Functions approach is that the hypotheses in a frame be
mutually exclusive; they might or might not be exhaustive. The hypotheses are required to
be exhaustive in the Bayesian approach. So this revised frame F′ = {H1, H2, H3}, as stated,
is not exhaustive. But we are assuming, for the moment at least, that these three

Figure 1.6. Different probability scales: (a) conventional probability, from 0 (disbelief or impossible) to 1 (certain or complete belief); (b) degree of support or belief, from 0 (lack of support or belief) to 1 (total support or belief); (c) Baconian probability, from 0 (lack of proof) toward proof.


hypotheses are mutually exclusive: The dirty bomb will be set off at exactly one of these
three locations. But, on further evidence, we might come to believe that dirty bombs will
be set off in both Washington, D.C., and in Tysons Corner, Virginia. We know that the
terrorists we are facing have a preference for simultaneous and coordinated attacks. So,
our revised frame F′′ might be:

H1: A dirty bomb will be set off in Washington, D.C., and in Tysons Corner, Virginia.
H2: A dirty bomb will be set off in Annapolis, Maryland.

The point of all this so far is that the Belief Functions approach allows for the fact that our
hypotheses may mutate or change as a result of new evidence we obtain. This is a major
virtue of this approach to evidential reasoning.
The next thing we have to consider is the power set of the hypotheses in a frame. This
power set is simply the list of all possible combinations of the hypotheses in this frame.
When we have n hypotheses in F, there are 2^n possible combinations of our hypotheses,
including all of them and none of them. For example, when F = {H1, ¬H1}, the power set
consists of {H1}, {¬H1}, {H1, ¬H1}, and ∅, where ∅ = the empty set (i.e., none of the
hypotheses). For F′ = {H1, H2, H3}, as just defined, there are 2^3 = 8 possible combinations:
{H1}, {H2}, {H3}, {H1, H2}, {H1, H3}, {H2, H3}, {H1, H2, H3}, and ∅. Now, here comes
an important point about support function s: The assigned values of s for any item or body
of evidence must sum to 1.0 across the power set of hypotheses in a frame. The only
restriction is that we must set s{∅} = 0. We cannot give any support to the set of none of
the hypotheses we are considering.
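Here is a minimal Python sketch of these restrictions (the frame and the s values are those of the first Green-Orange support assignment above; the helper function and its name are our own):

    from itertools import chain, combinations

    def power_set(frame):
        """All 2**n combinations of the hypotheses in a frame, including the empty set."""
        hyps = list(frame)
        return [frozenset(c) for c in
                chain.from_iterable(combinations(hyps, r) for r in range(len(hyps) + 1))]

    # Frame of discernment F = {H1, not-H1} and the support assignment
    # s({H1}) = 0.5, s({not-H1}) = 0.3, s({H1, not-H1}) = 0.2 uncommitted, s(empty) = 0.
    frame = ["H1", "not-H1"]
    s = {frozenset(): 0.0,
         frozenset(["H1"]): 0.5,
         frozenset(["not-H1"]): 0.3,
         frozenset(["H1", "not-H1"]): 0.2}

    assert abs(sum(s.values()) - 1.0) < 1e-9    # s must sum to 1.0 across the power set
    assert s[frozenset()] == 0.0                # no support may be given to the empty set

    print([sorted(subset) for subset in power_set(frame)])   # the 2**2 = 4 subsets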
More details about the Belief Functions approach are provided in Schum (1994 [2001a],
pp. 222–243).

1.3.4 Baconian Probability


1.3.4.1 Variative and Eliminative Inferences
Here is a view of probabilistic reasoning that puts particular emphasis on a very important
matter not specifically addressed in any other view of probability. In this view, the
probability of a hypothesis depends on how much relevant and believable evidence we
have and on how complete is our coverage of existing evidence on matters we ourselves
have recognized as being relevant in the analysis at hand.
This Baconian view of probability rests on the work of Professor L. Jonathan Cohen
(1977, 1989). The label “Baconian” on this system of probability acknowledges the work
of Sir Francis Bacon (1561–1626), who revolutionized the process of inference in science.
Bacon argued that attempting to prove some hypothesis by gathering instances favorable
to it is mistaken, since all it would take to refute the generality of this hypothesis was
one unfavorable instance of it. What Bacon argued was that we ought to design research
with the objective of eliminating hypotheses. The hypothesis that best resists our
eliminative efforts is the one in which we should have the greatest confidence. As this
eliminative process proceeds, it is obvious that we should not keep performing the same
test over and over again. What we need is an array of different tests of our hypotheses.
The hypothesis that holds up under the most varied set of tests is the one having
the greatest probability of being correct. So, Baconian inferences are eliminative and
variative in nature.


Baconian probabilities have only ordinal properties and cannot be combined algebra-
ically in any way. The Baconian probability scale is shown as (c) in Figure 1.6, to be
compared with the conventional probability scale shown as (a) in Figure 1.6. On the
conventional probability scale, 0 means disproof; but on the Baconian scale, 0 simply
means lack of proof. A hypothesis now having zero Baconian probability can be revised
upward in probability as soon as we have some evidence for it. As noted, we cannot revise
upward in probability any hypothesis disproved, or having zero conventional probability.

1.3.4.2 Importance of Evidential Completeness


Figure 1.7 illustrates a major point of interest in the Baconian system. Professor Cohen
(1977, 1989) argued that in any evidential reasoning situation, we are always out on an
inferential limb that might be longer and weaker than we may believe it to be. Suppose
you have generated three hypotheses {H1, H2, and H3}. You have examined a body of
evidence and have used Bayes’ rule to combine the likelihoods for this evidence together
with stated prior probabilities. The result is that Bayes’ rule shows the posterior probability
of H3, in light of the evidence, to be 0.998, very close to certainty on the Kolmogorov
probability scale. Therefore, you confidently report your conclusion that H3 is true,
together with its very large posterior probability you have determined. A short time passes,
and you hear the depressing news that H3 is not true. What could have gone wrong? After
all, you performed an analysis that is highly respected by many persons.
A person having knowledge of Cohen’s Baconian probability (1977, 1989) arrives on the
scene of your distress and makes the following comments:

You gathered some evidence, fair enough, quite a bit of it, in fact. But, how
many relevant questions you can think of were not answered by the evidence
you had? Depending upon the number of these unanswered questions, you
were out on an inferential limb that was longer and weaker than you
imagined it to be (see Figure 1.7). If you believed that these unanswered
questions would supply evidence that also favored H3, you were misleading

Figure 1.7. A Baconian inferential limb: your existing evidence supports your conclusion that H3 is true, with P(H3 on evidence) = 0.998 (e.g., from Bayes' rule), but the questions unanswered by your existing evidence make the limb longer and weaker.


yourself since you did not obtain any answers to them. The posterior probabil-
ity you determined by itself is not a good indicator of the weight of evidence.
What makes better sense is to say the weight of evidence depends on the
amount of favorable evidence you have and how completely it covers matters
you said were relevant. In your analysis, you completely overlooked the infer-
ential importance of questions your existing evidence did not answer.

Apart from the Baconian system, no other probability view focuses on evidential com-
pleteness and the importance of taking into account questions recognized as being
relevant that remain unanswered by the evidence we do have. This is why Jonathan
Cohen’s Baconian system is so important (Cohen, 1977; 1989). What we do not take into
account in our analyses can hurt us very badly.
In many instances, such as reasoning in intelligence analysis, we frequently have to
make inferences about matters for which we have scant evidence, or no evidence at all. In
other instances in which there may be available evidence, we may have no time to search
for it or consider it carefully. In such cases, we are forced to make assumptions or
generalizations that license inferential steps. But this amounts to giving an assumption
or a generalization the benefit of the doubt (without supporting it in any way), to believing
as if some conclusion were true (absent any evidential support for it), or to taking
something for granted without testing it in any way. All of these situations involve the
suppression of uncertainties.
It happens that only the Baconian probability system provides any guidance about how
to proceed when we must give benefit of doubt, believe as if, or take things for granted.
The major reason is that it acknowledges what almost every logician says about the
necessity for asserting generalizations and supplying tests of them in evidential reasoning.
Search the Bayesian or Belief Functions literature, and you will find almost no discussion
of generalizations (assumptions) and ancillary tests of them. Suppose we are interested in
inferring F from E, that is, P(F|E). Bayes’ rule grinds to a halt when we have no basis for
assessing the likelihoods P(E|F) and P(E|¬F). Bayesians counter by saying that we will
always have some evidence on which to base these judgments. But they never say what
this evidence is in particular cases and how credible it might or might not be. The Belief
Functions approach comes closer by saying that we can assess the evidential support for a
body of evidence that may include both directly relevant and at least some ancillary
evidence (i.e., evidence about other evidence). Following is an account of the Baconian
license for giving an assumption or generalization benefit of doubt, believing as if it were
true, or taking it for granted, provided that we are willing to mention all of the uncertain-
ties we are suppressing when we do so. Stated another way, we must try to account for all
of the questions we can think of that remain unanswered by the absence, or very scant
amount, of evidence.
Here are the essentials of Cohen’s Baconian approach to reasoning based on little or no
ancillary evidence to either support or undermine a generalization (Cohen 1977; 1989).
The first step, of course, is to make sure the generalization is not a non sequitur, that is,
that it makes logical sense. In the simplest possible case, suppose we are interested in
inferring proposition or event F from proposition or event E. The generalization G in doing
so might read, “If E has occurred, then probably F has occurred.” We recognize this if-then
statement as an inductive generalization since it is hedged. Second, we consider various
tests of this generalization using relevant ancillary evidence. Third, we consider how many
evidential tests of this generalization there might be. Suppose we identify N such tests. The


best case would be when we perform all N of these tests and they all produce results
favorable to generalization G. But we must not overlook generalization G itself; we do so
by assigning it the value 1; so we have N + 1 things to consider. Now we are in a position to
show what happens in any possible case.
First, suppose we perform none of these N evidential tests. We could still proceed by
giving generalization G the benefit of the doubt and detach a belief that F occurred (or will
occur) just by invoking this generalization G regarding the linkage between events
E and F. So, when no evidential tests are performed, we are saying: “Let’s believe as if
F occurred based on E and generalization G.” This would amount to saying that the
Baconian probability of event F is B(F) = 1/(N + 1). This expression is never a ratio; all it
says is that we considered just one thing in our inference about F from E, namely just the
generalization G. We could also say, “Let’s take event F for granted and believe that
F occurred (or will occur) because E occurred, as our generalization G asserts.” However,
note that in doing so, we have left all N ancillary evidential questions unanswered.
This we represent by saying that our inference of F from E has involved only one of the
N + 1 considerations and so we have (N + 1 – 1) = N, the number of questions we have left
unanswered. As far as evidential completeness is concerned, this is when the evidence we
have is totally incomplete. But the Baconian system allows us to proceed anyway based on
giving a generalization the benefit of doubt. But our confidence in this result should
be very low.
Now suppose we have performed some number k of the N possible ancillary evidential
tests of generalization G, as asserted previously, and they were all passed. The Baconian
probability of F in this situation is given by B(F) = (k + 1)/(N +1). The difference between
the numerator and denominator in such an expression will always equal the number
of unanswered questions as far as the testing of G is concerned. In this case, we have
(N + 1) – (k + 1) = N – k questions that were unanswered in a test of generalization G. How
high our confidence is that F is true depends on how high k + 1 is as compared to N + 1.
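A minimal Python sketch of this bookkeeping (the function and the sample numbers are our own, invented for illustration) makes the two quantities explicit: the Baconian probability B(F) = (k + 1)/(N + 1), reported as a pair rather than as a ratio, and the number N – k of questions left unanswered:

    def baconian(k, N):
        """After k of the N possible evidential tests of generalization G have been
        performed and passed: B(F) = (k + 1)/(N + 1), reported here as the pair
        (k + 1, N + 1) because it is not a ratio and has only ordinal meaning.
        Also returns N - k, the number of questions left unanswered."""
        assert 0 <= k <= N
        return (k + 1, N + 1), N - k

    # No tests performed: believe "as if" F on the strength of generalization G alone.
    print(baconian(0, 10))    # ((1, 11), 10): all 10 ancillary questions unanswered
    # Seven of ten tests performed and passed: fuller coverage, fewer open questions.
    print(baconian(7, 10))    # ((8, 11), 3)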
But now suppose that not all answers to these k questions are favorable to generaliza-
tion G. Under what conditions are we entitled to detach a belief that event F occurred,
based on evidence E, generalization G, and the k tests of G? The answer requires a
subjective judgment by the analyst about whether the tests, on balance, favor or disfavor
G. When the number of the k tests disfavoring G exceeds the number of tests favoring G,
we might suppose that we would always detach a belief that event F did not occur, since
G has failed more tests than it survived. But this will not always be such an easy judgment
if the number of tests G passed were judged to be more important than the tests it failed to
pass. In any case, there are N – k tests that remain unanswered. Suppose that k is quite
large, but the number of tests favorable to G is only slightly larger than the number of tests
unfavorable to G. In such cases, the analyst might still give event F the benefit of the doubt,
or believe, at least tentatively, as if F occurred pending the possible acquisition of further
favorable tests of G. And in this case, the confidence of the analyst in this conclusion
should also be very low.
Whatever the basis for an assumption or a benefit of doubt judgment there is, one of
the most important things about the Baconian approach is that the analyst must be
prepared to give an account of the questions that remain unanswered in evidential tests
of possible conclusions. This will be especially important when analysts make assump-
tions, or more appropriately, give generalizations the benefit of doubt, draw as if conclu-
sions, or take certain events for granted. These are situations in which analysts are most
vulnerable and in which Baconian ideas are most helpful.


1.3.4.3 Baconian Probability of Boolean Expressions


Some of the most important properties of Baconian probabilities concern their application
to Boolean combinations of propositions, such as hypotheses. Because the probabilities in
the Baconian system have only ordinal properties, we can say only that hypothesis H1 is
more likely than H2, but we cannot say how much more likely H1 is than H2. Also, in the
Baconian system, it is never necessary to assess subjective probabilities. In our saying that
H1 is more probable than H2, all we are saying is that there is more favorably relevant
evidence on H1 than there is on H2. What counts most in the Baconian system is the
completeness of our evidence and the extent to which we have questions that remain
unanswered by the evidence we have. Here are the three most important Baconian
properties of interest to us concerning intersections, unions, and negation.
Baconian Intersection: Suppose we have some events of interest such as events F, G,
and H. Suppose we have some favorably relevant evidence about each one of these events
and have also considered how complete the evidence is for these events. So we determine
that the Baconian probabilities (B) for these three events are B(F) ≥ B(G) ≥ B(H). Here's
what these probabilities say: We have more favorably relevant and complete evidence for
event F than we do for event G, and more favorably relevant and complete evidence for
event G than we have for event H. So, asked what the Baconian probability is for their
intersection (F and G and H), we must say that B(F and G and H) = B(H). What this says is
that the Baconian probability of the intersection of these three events is equal to the
Baconian probability of the event with the least favorably relevant and complete evidence.
This is an example of the MIN rule for Baconian intersections. We might compare this with
the conventional probability of the intersection of these three events. Suppose that events
F, G, and H are independent events where P(F) = 0.8, P(G) = 0.6 and P(H) = 0.4. In this
case, P(F and G and H) = 0.8*0.6*0.4 = 0.192 < P(H) = 0.4. In the Baconian system, the
probability of a conjunction of events or propositions can never be less than that of the
event having the smallest Baconian probability.
Baconian Union: Now consider the same case involving events F, G, and H. Again,
suppose that B(F) ≥ B(G) ≥ B(H). Now what we wish to determine is the Baconian
probability B(F or G or H). In this case, B(F or G or H) ≥ B(F), where B(F) is the largest
of the Baconian probabilities for the events we are considering.
for Baconian probability, and what it says is that the probability of a disjunction of events
is at least as large as the largest Baconian probability of any of the individual events.
Baconian Negation: Baconian negation is not complementary. The Baconian rule is
quite complex; here’s what it says: If we have A and ¬A, if B(A) > 0, then B(¬A) = 0. What
this means essentially is that we cannot commit beliefs simultaneously to two events that
cannot both occur.
What is quite interesting is that the Baconian treatment of conjunctions and disjunc-
tions is the same as in Zadeh’s Fuzzy probability system; namely, they both make use of
min-max rules for these connectives.
More details about Baconian probabilities are provided in Schum (1994 [2001a],
pp. 243–261).

1.3.5 Fuzzy Probability


1.3.5.1 Fuzzy Force of Evidence
One can also express the uncertainty about a conclusion reached by using words, such as
“likely,” “almost certain,” or “much less certain,” rather than numbers, as illustrated by the


following fragment from the letter sent by Albert Einstein to the United States President
Franklin D. Roosevelt, on the possibility of constructing nuclear bombs (Einstein, 1939):
. . . In the course of the last four months it has been made probable – through
the work of Joliot in France as well as Fermi and Szilárd in America – that it
may become possible to set up a nuclear chain reaction in a large mass of
uranium, by which vast amounts of power and large quantities of new
radium-like elements would be generated. Now it appears almost certain
that this could be achieved in the immediate future.
This new phenomenon would also lead to the construction of bombs, and it
is conceivable – though much less certain – that extremely powerful bombs of
a new type may thus be constructed. . . .
Verbal expressions of uncertainty are common in many areas. In the field of law, for
example, forensic standards of proof are always employed using words instead of
numbers. We all know about standards such as “beyond reasonable doubt” (in criminal
cases); “preponderance of evidence” (in civil cases); “clear and convincing evidence” (in
many Senate and congressional hearings); and “probable cause” (employed by magis-
trates to determine whether a person should be held in custody pending further hearings).
All the verbal examples just cited have a current name: They can be called Fuzzy
probabilities. Words are less precise than numbers. There is now extensive study of fuzzy
inference involving what has been termed approximate reasoning, which involves verbal
statements about things that are imprecisely stated. Here is an example of approximate
reasoning: “Since John believes he is overworked and underpaid, then he is probably not
very satisfied with his job." We are indebted to Professor Lotfi Zadeh (University of
California, Berkeley), and his many colleagues, for developing logics for dealing with fuzzy
statements, including Fuzzy probabilities (Zadeh, 1983; Negoita and Ralescu, 1975). In his
methods for relating verbal assessments of uncertainty with numerical equivalents, Zadeh
employed what he termed a possibility function, μ, to indicate ranges of numerical
probabilities a person might associate with a verbal expression of uncertainty. Zadeh
reasoned that a person might not be able to identify a single precise number he or she
would always associate with a verbal statement or Fuzzy probability. Here is an example of
a possibility function for the Fuzzy probability “very probable.”
Asked to grade what numerical probabilities might be associated with an analyst’s
Fuzzy probability of “very probable,” the analyst might respond as follows:
For me, “very probable” means a numerical probability of at least 0.75 and at
most 0.95. If it were any value above 0.95, I might use a stronger term, such
as “very, very probable.” I would further say that I would not use the term
“very probable” if I thought the probability was less than 0.75. In such cases,
I would weaken my verbal assessment. Finally, I think it is most possible (μ =
1.0) that my use of the verbal assessment "very probable" means something
that has a probability of about 0.85 of occurring.

If the analyst decides that the possibility of "very probable" declines linearly on either side
of this peak at μ = 1.0, we would have the possibility function shown in Figure 1.8.
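A possibility function with this shape is easy to write down explicitly. The following Python sketch encodes the analyst's assessment quoted above (the cutoffs 0.75, 0.85, and 0.95 come from that assessment; the linear decline on either side of the peak is the assumption just stated):

    def possibility_very_probable(p):
        """Possibility (mu) that the phrase 'very probable' corresponds to numerical
        probability p: 0 outside [0.75, 0.95], 1.0 at p = 0.85, linear in between."""
        if p < 0.75 or p > 0.95:
            return 0.0
        if p <= 0.85:
            return (p - 0.75) / 0.10     # rising side: 0 at 0.75, 1.0 at 0.85
        return (0.95 - p) / 0.10         # falling side: 1.0 at 0.85, 0 at 0.95

    for p in (0.70, 0.75, 0.80, 0.85, 0.90, 0.95):
        print(p, round(possibility_very_probable(p), 2))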

1.3.5.2 Fuzzy Probability of Boolean Expressions


As an example of using fuzzy probabilities, suppose we have three events, or propositions
A, B, and C. We consider the following Fuzzy probabilities (F) for these events, and we say
the following:


Figure 1.8. Possibilities and Fuzzy probabilities: the possibility μ (vertical axis, 0 to 1.0) of the verbal assessment "very probable," plotted against numerical probability (horizontal axis, 0 to 1.0); the function peaks at μ = 1.0 near a probability of 0.85.

• Event A is very likely.
• Event B is likely.
• Event C is very unlikely.

We express this by saying that F(A) > F(B) > F(C).


Fuzzy Conjunction: The Fuzzy conjunction of several events is the minimum Fuzzy
probability of these events. For example, F(A and B and C) = F(C), which is the minimum
Fuzzy probability of these three events.
Fuzzy Disjunction: The Fuzzy disjunction of several events is the maximum Fuzzy
probability of these events. For example, F(A or B or C) = F(A), which is the maximum
Fuzzy probability of these three events.
Notice that both in the Baconian system and in the Fuzzy system we have MIN/MAX
rules for combining probabilities for complex events.
Fuzzy Negation: Fuzzy negation is complementary: F(A) = 1 – F(¬A).
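These three rules can be illustrated in a few lines of Python (a sketch; the numerical values standing in for "very likely," "likely," and "very unlikely" are our own assumption, chosen only to preserve the ordering F(A) > F(B) > F(C)):

    # Illustrative numbers standing in for the verbal Fuzzy probabilities of the example.
    F = {"A": 0.9,   # "very likely"
         "B": 0.7,   # "likely"
         "C": 0.1}   # "very unlikely"

    conjunction = min(F.values())     # F(A and B and C) = F(C), the minimum
    disjunction = max(F.values())     # F(A or B or C) = F(A), the maximum
    negation_A = 1 - F["A"]           # F(¬A) = 1 - F(A): Fuzzy negation is complementary

    print(conjunction, disjunction, round(negation_A, 2))   # 0.1 0.9 0.1

The same MIN/MAX pattern applies to Baconian conjunctions and disjunctions, with the differences that Baconian values are only ordinal and Baconian negation is not complementary.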
More details about Fuzzy probabilities are provided in Schum (1994 [2001a], pp. 261–269).

1.3.5.3 On Verbal Assessments of Probabilities


Let us also consider the critics who sneer at verbal assessments of probabilities, saying that
only numerical assessments, conforming to the Kolmogorov axioms, are acceptable. As a
top-ranking analyst, you are asked by an equally high-ranking customer for the probability
of a crucial hypothesis HK. All the evidence in this case is for a one-of-a-kind event, so your
assessment is necessarily subjective. You tell the customer, “Sir, the probability of HK, on
our analysis is 78 percent.” The customer asks, “This is a very precise number. How did
you arrive at it, given the subjective nature of your assessment?” You reply, “Yes, sir, what
I really should have said was that my probability is between 73 percent and 83 percent,
and 78 percent seemed like a good figure to quote.” The customer then says, “But the
limits to the probability interval you quoted are also precise. How did you arrive at them?”
You might say, “Well, my lower limit is really between 70 percent and 76 percent and my
upper limit is between 80 percent and 86 percent.” Your customer says, “But these are also
precise numbers.” There is, as you see, an infinite regress of similar questions regarding
the basis for subjective numerical assessments.
There are many places to begin a defense of verbal or Fuzzy probability statements. The
most obvious one is law. All of the forensic standards of proof are given verbally: “beyond


reasonable doubt," "clear and convincing evidence," "balance of probabilities," "sufficient
evidence," and "probable cause." Over the centuries, attempts have been made to supply
numerical probability values and ranges for each of these standards, but none of them has
been successful. The reason, of course, is that every case is unique and rests upon many
subjective and imprecise judgments. Wigmore (1913, 1937) understood completely that
the catenated inferences in his Wigmorean networks were probabilistic in nature. Each of
the arrows in a chain of reasoning describes the force of one hypothesis on the next one,
such as E ➔ F. Wigmore graded the force of such linkages verbally, using such terms as
“strong force,” “weak force,” “provisional force,” and so on. Toulmin (1963) also used
fuzzy qualifiers in the probability statements of his system, which grounds the Rationale
analytical tool (van Gelder, 2007). There are many other examples of situations in which it
is difficult or impossible for people to find numerical equivalents for verbal probabilities
they assess. Intelligence analysis so often supplies very good examples in spite of what
Sherman Kent (1994) said some years ago. Indeed, using words is quite often necessary in
analyses based on masses of evidence that are so complex that they resist even the most
devoted attention to the construction of inference networks. Couple this with the fact that
different analysts might disagree substantially about what specific probability should be
assigned to a conclusion. In addition, an analyst might assign a different probability to the
same conclusion, based on the same evidence, on different occasions. What this says is
that there will be inter-analyst and intra-analyst variation in the assessment of probabil-
ities. Words are less precise than numbers, so there will often be less disagreement about a
verbal or a Fuzzy probability.
We conclude this discussion by recalling what the well-known probabilist Professor
Glenn Shafer said years ago (Shafer, 1988):

Probability is more about structuring arguments than it is about numbers. All probabilities rest upon arguments. If the arguments are faulty, the probabilities, however determined, will make no sense.

1.3.6 A Summary of Uncertainty Methods and What They Best Capture
Evidence-based reasoning is necessarily probabilistic in nature because our evidence
is always incomplete (we can look for more, if we have time), usually inconclusive (it
is consistent with the truth of more than one hypothesis or possible explanation),
frequently ambiguous (we cannot always determine exactly what the evidence is telling
us), commonly dissonant (some of it favors one hypothesis or possible explanation, but
other evidence favors other hypotheses), and with various degrees of believability shy
of perfection.
As illustrated in Table 1.1 and discussed in this section, each of the alternative views of
probability previously discussed captures best some of these characteristics of evidence,
but no single view captures best all of them. We include in this table just the four views
concerning nonenumerative situations. One can easily find many works on statistics,
frequentistic or Bayesian, in enumerative situations in which they can estimate probabil-
ities from observed relative frequencies.

Table 1.1. A Summary of Nonenumerative Uncertainty Methods and What They Best Capture

Major Strength                                          Subjective Bayes   Belief Functions   Baconian   Fuzzy
Accounting for incompleteness of coverage of evidence                                          ☑
Coping with inconclusiveness in evidence                       ☑                 ☑            ☑         ☑
Coping with ambiguities or imprecision in evidence
  and judgmental indecision                                                       ☑                      ☑
Coping with dissonance in evidence                             ☑                 ☑            ☑         ☑
Coping with source believability issues                        ☑                              ☑

The first entry in Table 1.1 lists a major strength that is exclusive to the Baconian system: its concern about how much favorable evidence was taken into account in an analysis, and how completely this evidence covered matters judged relevant to conclusions that could be reached. A major question this form of analysis allows us to address is the extent to which questions that have not been answered by existing evidence could have altered the conclusion being reached. It would be quite inappropriate to assume that answers to the remaining unanswered questions would, if they were obtained, all favor the conclusion that was being considered. This, of course, requires us to consider carefully matters relevant to any conclusion that are not addressed by available evidence.
The second entry in Table 1.1 notes that all four of the probabilistic methods have
very good ways for dealing with the inconclusive nature of most evidence, but they do
so in different ways. The Subjective Bayesian does so by assessing nonzero likelihoods
for the evidence under every hypothesis being considered. Their relative sizes indicate
the force that the evidence is judged to have on each hypothesis. But the Belief
Functions advocate assigns numbers indicating the support evidence provides for
hypotheses or subsets of them. We should be quick to notice that Bayesian likelihoods
do not grade evidential support, since in Belief Functions one can say that an item of
evidence provides no support at all to some hypothesis. But a Bayesian likelihood of
zero under a particular hypothesis would mean that this hypothesis is impossible and
should be eliminated. Offering no support in Belief Functions does not entail that this
hypothesis is impossible, since some support for this hypothesis may be provided by
further evidence. The Baconian acknowledges the inconclusive nature of evidence by
assessing how completely, as well as how strongly, the evidence favors one hypothesis
over others. In Fuzzy probabilities, it would be quite appropriate to use words in
judging how an item or body of evidence bears on several hypotheses. For example,
one might say, “This evidence is indeed consistent with H1 and H2, but I believe it
strongly favors H1 over H2.”
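
The contrast between Bayesian likelihoods and Belief Function support can be made concrete with a minimal numerical sketch. The priors and likelihoods below are invented for illustration, and only the Bayesian side is shown; the point is that a likelihood of zero eliminates a hypothesis permanently, which is precisely the commitment that withholding support in Belief Functions avoids.

```python
# Minimal sketch: Bayesian conditioning with a zero likelihood.
# Priors and likelihoods are invented for illustration only.
def bayes_update(priors, likelihoods):
    """Return posterior probabilities P(H_i | E) over the hypotheses."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: joint[h] / total for h in joint}

priors = {"H1": 1/3, "H2": 1/3, "H3": 1/3}

# First item of evidence: likelihood of zero under H3.
post1 = bayes_update(priors, {"H1": 0.6, "H2": 0.3, "H3": 0.0})
print(post1)           # H3 receives posterior 0: it is treated as impossible

# Later evidence strongly favoring H3 cannot revive it:
post2 = bayes_update(post1, {"H1": 0.1, "H2": 0.1, "H3": 0.9})
print(post2["H3"])     # still 0.0
```
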
The third entry in the table first acknowledges the Belief Functions and Fuzzy
concerns about ambiguities and imprecision in evidence. In the Belief Functions
approach, one is entitled to withhold belief for some hypotheses in the face of ambiguous
evidence. In such cases, one may not be able to decide upon the extent to which the
evidence may support any hypothesis being considered, or even whether the evidence supports any of them. Judgmental indecision is not allowed in the Bayesian system since
it assumes one can say precisely how strongly evidence judged relevant favors every
hypothesis being considered. Ambiguities in evidence may be commonly encountered.
The Fuzzy advocate will argue that ambiguities or imprecision in evidence hardly
justifies precise numerical judgments. In the face of fuzzy evidence, we can make only
fuzzy judgments of uncertainty.
The fourth entry in Table 1.1 shows that all four probability systems have very good
mechanisms for coping with dissonant evidence in which there are patterns of contradict-
ory and divergent evidence. Dissonant evidence is directionally inconsistent; some of it
will favor certain hypotheses and some of it will favor others. In resolving such inconsist-
encies, both the Bayesian and Belief Functions approaches will side with the evidence
having the strongest believability. The Bayesian approach to resolving contradictions is
especially interesting since it shows how “counting heads” is not the appropriate method
for resolving contradictions. In times past, “majority rule” was the governing principle.
Bayes’ rule shows that what matters is the aggregate believability on either side of a
contradiction. The Baconian approach also rests on the strength and aggregate believabil-
ity in matters of dissonance, but it also rests on how much evidence is available on either
side and upon the questions that remain unanswered. In Fuzzy terms, evidential disson-
ance, and how it might be resolved, can be indicated in verbal assessments of uncer-
tainty. In such instances, one might say, “We have dissonant evidence favoring both H1
and H2, but I believe the evidence favoring H1 predominates because of its very strong
believability.”
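
The observation that “counting heads” is not the appropriate way to resolve a contradiction can be illustrated with a small numerical sketch. The hit and false-alarm probabilities below are invented, and the sources are assumed to report conditionally independently given the hypothesis; this is only one simple way of modeling aggregate believability, not Schum's full treatment of dissonant evidence.

```python
# Minimal sketch: resolving contradictory testimony by aggregate believability
# rather than by a majority vote. Hit/false-alarm probabilities are invented,
# and the sources are assumed conditionally independent given H.
def likelihood_ratio(reports):
    """reports: list of (says_H, hit_prob, false_alarm_prob), one per source."""
    ratio = 1.0
    for says_H, hit, false_alarm in reports:
        if says_H:                           # source asserts H
            ratio *= hit / false_alarm
        else:                                # source denies H
            ratio *= (1 - hit) / (1 - false_alarm)
    return ratio

# Two weakly believable sources assert H; one highly believable source denies it.
reports = [(True, 0.6, 0.4), (True, 0.6, 0.4), (False, 0.99, 0.01)]
print(likelihood_ratio(reports))   # about 0.02: the aggregate evidence favors
                                   # not-H despite the 2-to-1 head count for H
```
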
Row five in Table 1.1 concerns the vital matter of assessing the believability of
evidence. From considerable experience, we find that the Bayesian and Baconian
systems are especially important when they are combined. In many cases, these two
radically different schemes for assessing uncertainty are not at all antagonistic but are
entirely complementary. Let us consider a body of evidence about a human intelli-
gence (HUMINT) asset or informant. Ideas from the Baconian system allow us to ask,
“How much evidence do we have about this asset, and how many questions about
this asset remain unanswered?” Ideas from the Bayesian system allow us to ask, “How
strong is the evidence we do have about this asset?” (Schum, 1991; Schum and
Morris, 2007)

1.4 EVIDENCE-BASED REASONING

1.4.1 Deduction, Induction, and Abduction


These three types of inference involved in evidence-based reasoning may be intuitively
summarized as shown in the following.
Deduction shows that something is necessarily true:

A ➔ necessarily B
Socrates is a man ➔ necessarily Socrates is mortal

Induction shows that something is probably true:

A ➔ probably B
Julia was born in Switzerland ➔ probably Julia speaks German

Abduction shows that something is possibly or plausibly true:

A ➔ possibly B
There is smoke in the East building ➔ possibly there is fire
in the East building

These types of inference are described more formally in the following.


Deductive Inference:

∀x, U(x) ➔ V(x) Whenever U(x) is true, V(x) is also true

U(a1) U(a1) is true

Necessarily V(a1) Therefore V(a1) is necessarily true

Inductive Inference:

U(a1) and V(a1) When U(a1) was true, it was observed that V(a1) was also true
U(a2) and V(a2) When U(a2) was true, it was observed that V(a2) was also true

... ...

U(an) and V(an) When U(an) was true, it was observed that V(an) was also true

∀x, U(x) ➔ Probably V(x) Therefore, whenever U(x) is true, V(x) is also probably true

Abductive Inference:

U(a1) ➔ V(a1) If U(a1) were true then V(a1) would follow as a matter of course
V(a1) V(a1) is true

Possibly U(a1) Therefore U(a1) is possibly true
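
These schemas can also be written as a toy program. The following sketch merely restates the Socrates, Julia, and smoke examples in executable form; the predicate names are invented for illustration and are unrelated to the Disciple-EBR representation.

```python
# Toy sketch of the three inference patterns, restating the examples above.

# Deduction: from a general rule and a fact, conclude necessarily.
def deduce(rule, fact):
    """rule: (U, V) meaning 'for all x, U(x) -> V(x)'; fact: (U, a)."""
    U, V = rule
    pred, a = fact
    if pred == U:
        return ("necessarily", V, a)          # V(a) is necessarily true

print(deduce(("man", "mortal"), ("man", "Socrates")))
# ('necessarily', 'mortal', 'Socrates')

# Induction: from repeated co-occurrences, conjecture a probable rule.
def induce(observations):
    """observations: list of (U, V) pairs observed together for instances."""
    if observations and all(pair == observations[0] for pair in observations):
        U, V = observations[0]
        return ("probably", U, "->", V)       # whenever U(x), probably V(x)

print(induce([("born_in_Switzerland", "speaks_German")] * 3))
# ('probably', 'born_in_Switzerland', '->', 'speaks_German')

# Abduction: from an observed effect and a rule, conjecture a possible cause.
def abduce(rule, observation):
    """rule: (U, V) meaning 'if U(x) then V(x)'; observation: (V, a)."""
    U, V = rule
    pred, a = observation
    if pred == V:
        return ("possibly", U, a)             # U(a) is possibly true

print(abduce(("fire", "smoke"), ("smoke", "East building")))
# ('possibly', 'fire', 'East building')
```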

1.4.2 The Search for Knowledge


We can extend Oldroyd’s Arch of Knowledge from Figure 1.2, as indicated in Figure 1.9, to
show how abduction, deduction, and induction are used in the search for knowledge.
They are at the basis of collaborative processes of evidence in search of hypotheses,
hypotheses in search of evidence, and evidentiary testing of hypotheses in a complex
dynamic world.
Through abductive reasoning (which shows that something is possibly true), we search
for hypotheses that explain our observations; through deductive reasoning (which shows
that something is necessarily true), we use our hypotheses to generate new lines of inquiry
and discover new evidence; and through inductive reasoning (which shows that something
is probably true), we test our hypotheses by evaluating our evidence. Now the discovery of
new evidence may lead to new hypotheses or the refinement of the existing ones. Also,
when there is more than one most likely hypothesis, we need to search for additional
evidence to determine which of them is actually the most likely. Therefore, the processes
of evidence in search of hypotheses, hypotheses in search of evidence, and evidentiary testing of hypotheses also take place in response to one another, as indicated by the feedback loops from the bottom of Figure 1.9.

Figure 1.9. The search for knowledge. (Evidence in search of hypotheses: what hypotheses would explain these observations? Abduction: O ➔ possibly H. Hypotheses in search of evidence: what evidence is entailed by each hypothesis? Deduction: H ➔ necessarily E. Evidentiary testing of hypotheses: what is the evidence-based probability of each hypothesis? Induction: E ➔ probably H.)
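
The interplay of the three processes can be summarized as a simple process skeleton. In the sketch below, generate_hypotheses, collect_evidence, and assess are placeholders for whatever domain-specific procedures perform the abductive, deductive, and inductive steps; they are not the API of Disciple-EBR or of any other particular system, and the stopping criterion is an arbitrary illustrative choice.

```python
# Process skeleton of the search for knowledge sketched in Figure 1.9.
# generate_hypotheses, collect_evidence, and assess are assumed placeholders
# supplied by the caller; they are expected to return at least one hypothesis.

def search_for_knowledge(observations, generate_hypotheses, collect_evidence,
                         assess, max_cycles=5, threshold=0.9):
    evidence = list(observations)
    best, probabilities = None, {}
    for _ in range(max_cycles):
        # Evidence in search of hypotheses (abduction).
        hypotheses = generate_hypotheses(evidence)
        # Hypotheses in search of evidence (deduction suggests what to look for).
        new_items = [item for h in hypotheses for item in collect_evidence(h)]
        evidence.extend(new_items)
        # Evidentiary testing of hypotheses (induction).
        probabilities = assess(hypotheses, evidence)
        best = max(probabilities, key=probabilities.get)
        # Stop when one hypothesis clearly dominates or nothing new was found;
        # otherwise the new evidence may suggest new or refined hypotheses.
        if probabilities[best] >= threshold or not new_items:
            break
    return best, probabilities
```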

1.4.3 Evidence-based Reasoning Everywhere


As illustrated in Figure 1.10, evidence-based reasoning is at the core of many problem-
solving and decision-making tasks in a wide variety of domains, including physics,
chemistry, history, archaeology, medicine, law, forensics, intelligence analysis, cyberse-
curity, and many others. This is not surprising, because, as Jeremy Bentham stated over
two centuries ago, “The field of evidence is no other than the field of knowledge”
(Bentham, 1810, p. 5).
Scientists from various domains, such as physics, chemistry, or biology, may recognize
this as a formulation of the scientific method.
In medicine, a doctor makes observations with respect to a patient’s complaints and
attempts to generate possible diagnoses (hypotheses) that would explain them. He or
she then performs various medical tests that provide further evidence for or against
the various hypothesized illnesses. After that, the doctor uses the obtained evidence to
determine the most likely illness.
In law, an attorney makes observations in a criminal case and seeks to generate
hypotheses in the form of charges that seem possible in explaining these observations.
Then, assuming that a charge is justified, attempts are made to deduce further evidence
bearing on it. Finally, the obtained evidence is used to prove the charge.
In forensics, observations made at the site of an explosion in a power plant lead to the
formulation of several possible causes. Analysis of each possible cause leads to the discovery
of new evidence that eliminates or refines some of the causes, and may even suggest new
ones. This cycle continues until enough evidence is found to determine the most likely cause.
In intelligence analysis, an analyst formulates alternative hypotheses that would
explain the evidence about an event. Then the analyst puts each of the hypotheses to work to guide him or her in the collection of additional evidence, which is used to assess
the probability of each hypothesis.
In cybersecurity, a suspicious connection to our computer from an external one triggers
the automatic generation of alternative threat and nonthreat hypotheses. Each generated hypothesis guides the collection of additional evidence, which is used to assess the probability of each hypothesis (Meckl et al., 2015).

Figure 1.10. Evidence-based reasoning everywhere. (The figure shows the same cycle of evidence in search of hypotheses, hypotheses in search of evidence, and evidentiary testing of hypotheses in natural science, medicine, law, forensics, intelligence analysis, and cybersecurity, each domain starting from its own observations and ending with the probabilities of its own hypotheses.)

The following, for instance, are different hypotheses one may be interested in assessing
based on evidence:

• University U would be a good university for student S.
• Professor P would be a good PhD advisor for student S.
• House H would be a good house for person P to buy.
• Country C will be a world leader in nonconventional energy sources within the next decade.
• Country C has nuclear weapons.
• Person P was murdered by suspect S.
• Patient P has illness I.
• Building B has collapsed because of the use of low-quality materials.
• Connection C is part of an APT1 intrusion using malware M.

Evidence-based reasoning, however, is often highly complex, and the conclusions are
necessarily probabilistic in nature because our evidence is always incomplete, usually
inconclusive, frequently ambiguous, commonly dissonant, and has imperfect believability
(Schum, 1994 [2001a]; Tecuci et al., 2010b). Arguments requiring both imaginative and
critical reasoning, and involving all known types of inference (deduction, induction, and
abduction), are necessary in order to estimate the probability of the considered hypoth-
eses. Therefore, evidence-based reasoning can be best approached through the mixed-
initiative integration of human imagination and computer knowledge-based reasoning
(Tecuci et al., 2007a, 2007b), that is, by using knowledge-based intelligent agents for
evidence-based reasoning. This is why, in the next section, we briefly review the field of artificial intelligence.

1.5 ARTIFICIAL INTELLIGENCE

Artificial intelligence (AI) is the science and engineering domain concerned with the theory
and practice of developing systems that exhibit the characteristics we associate with intelli-
gence in human behavior, such as perception, natural language processing, problem solving
and planning, learning and adaptation, and acting on the environment. Its main scientific
goal is understanding the principles that enable intelligent behavior in humans, animals,
and artificial agents. This scientific goal directly supports several engineering goals, such
as developing intelligent agents, formalizing knowledge and mechanizing reasoning in all
areas of human endeavor, making working with computers as easy as working with
people, and developing human-machine systems that exploit the complementariness of
human and automated reasoning.
Artificial intelligence is a very broad interdisciplinary field that has roots in and
intersects with many domains, not only all the computing disciplines, but also mathemat-
ics, linguistics, psychology, neuroscience, mechanical engineering, statistics, economics,
control theory and cybernetics, philosophy, and many others. The field has adopted many
concepts and methods from these domains, but it has also contributed back.
While some of the developed systems, such as an expert system or a planning system,
can be characterized as pure applications of AI, most of the AI systems are developed as components of complex applications to which they add intelligence in various ways, for
instance, by enabling them to reason with knowledge, to process natural language, or to
learn and adapt.
Artificial intelligence researchers investigate powerful techniques in their quest for
realizing intelligent behavior. But these techniques are pervasive and are no longer
considered AI when they reach mainstream use. Examples include time-sharing, symbolic
programming languages (e.g., Lisp, Prolog, and Scheme), symbolic mathematics systems
(e.g., Mathematica), graphical user interfaces, computer games, object-oriented program-
ming, the personal computer, email, hypertext, and even software agents. While this
tends to diminish the merits of AI, the field is continuously producing new results and, due
to its current level of maturity and the increased availability of cheap computational
power, it is a key technology in many of today's novel applications.

1.5.1 Intelligent Agents


It has become common to describe an AI system using the agent metaphor (Russell and
Norvig, 2010, pp. 34–63). An agent is a system that perceives its environment (which may be
the physical world, a user via a graphical user interface, a collection of other agents, the
Internet, or other complex environment); reasons to interpret perceptions, draw inferences,
solve problems, and determine actions; and acts upon that environment to realize a set of
goals or tasks for which it has been designed. An intelligent knowledge-based agent will
continuously improve its knowledge and performance through learning from input data,
from a user, from other agents, and/or from its own problem-solving experience. While
interacting with a human or some other agents, it may not blindly obey commands, but
may have the ability to modify requests, ask clarification questions, or even refuse to
satisfy certain requests. It can accept high-level requests indicating what the user wants
and can decide how to satisfy each request with some degree of independence or
autonomy, exhibiting goal-directed behavior and dynamically choosing which actions to
take and in what sequence. It can collaborate with users to improve the accomplishment
of their tasks or can carry out such tasks on their behalf, based on knowledge of their goals
or desires. It can monitor events or procedures for the users, can advise them on
performing various tasks, can train or teach them, or can help them collaborate (Tecuci,
1998, pp. 1–12).
Figure 1.11 shows the main components of a knowledge-based agent:

• The knowledge base is a type of long-term memory that contains data structures representing the objects from the application domain, general laws governing them, and actions that can be performed with them.
• The perceptual processing module implements methods to process natural language, speech, and visual inputs.
• The problem-solving engine implements general problem-solving methods that use the knowledge from the knowledge base to interpret the input and provide an appropriate output.
• The learning engine implements learning methods for acquiring, extending, and refining the knowledge in the knowledge base.
• The action processing module implements the agent’s actions upon that environment aimed at realizing the goals or tasks for which it was designed (e.g., generation of answers to questions, solutions of input problems, manipulation of objects, or navigation to a new position).
• The reasoning area is a type of short-term memory where the actual reasoning takes place.

An intelligent agent has an internal representation of its external environment that allows
it to reason about the environment by manipulating the elements of the representation.
For each relevant aspect of the environment, such as an object, a relation between objects,
a class of objects, a law, or an action, there is an expression in the agent’s knowledge base
that represents that aspect. For example, the left side of Figure 1.12 shows one way to
represent the simple world from the right side of Figure 1.12. The upper part is a
hierarchical representation of the objects and their relationships (an ontology). Under it
is a rule to be used for reasoning about these objects. This mapping between real entities
and their representations allows the agent to reason about the environment by manipu-
lating its internal representations and creating new ones. For example, by employing
natural deduction and its modus ponens rule, the agent may infer that cup1 is on table1.
The actual algorithm that implements natural deduction is part of the problem-solving
engine, while the actual reasoning is performed in the reasoning area (see Figure 1.11).
Since such an agent integrates many of the intelligent behaviors that we observe in humans,
it is also called a cognitive agent or system (Langley, 2012).
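
As a concrete illustration of reasoning by manipulating internal representations, the following toy Python sketch encodes the “on” facts and the transitivity rule of the simple world in Figure 1.12 and applies the rule repeatedly (a simple forward-chaining use of modus ponens) until no new facts can be derived. It is only an illustration, not the actual problem-solving engine of a Disciple agent.

```python
# Toy forward chaining over the simple world of Figure 1.12.
# Facts: (on cup1 book1), (on book1 table1).
# Rule:  for all x, y, z: (on x y) and (on y z) -> (on x z).

facts = {("on", "cup1", "book1"), ("on", "book1", "table1")}

def apply_transitivity(facts):
    """Return the new 'on' facts derivable by one pass of the rule."""
    derived = set()
    for (_, x, y1) in facts:
        for (_, y2, z) in facts:
            if y1 == y2:
                derived.add(("on", x, z))
    return derived - facts

new = apply_transitivity(facts)
while new:                         # repeat until a fixed point is reached
    facts |= new
    new = apply_transitivity(facts)

print(("on", "cup1", "table1") in facts)   # True: cup1 is on table1
```
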
Most of the current AI agents, however, will not have all the components from
Figure 1.11, or some of the components will have very limited functionality. For example,
a user may speak with an automated agent (representing her Internet service provider)
that will guide her in troubleshooting the Internet connection. This agent may have advanced speech, natural language, and reasoning capabilities, but no visual or learning capabilities. A natural language interface to a database may have only natural language processing capabilities, while a face recognition system may have only learning and visual perception capabilities.

Figure 1.11. Notional architecture of an intelligent agent. (The agent consists of a problem-solving engine, a knowledge base, a learning engine, a reasoning area, and perceptual and action processing modules that connect it to its environment through sensory input and action output.)

Figure 1.12. An ontology fragment and a reasoning rule representing a simple agent world. (The ontology relates the concepts object, vessel, publication, furniture, cup, book, and table, and the instances cup1, book1, and table1, with cup1 on book1 and book1 on table1. The rule ∀x,y,z ∈ object, (on x y) & (on y z) ➔ (on x z) represents the statement that if an object is on top of another object that is itself on top of a third object, then the first object is on top of the third object.)

1.5.2 Mixed-Initiative Reasoning


Table 1.2 lists some of the complementary abilities of humans and computer agents.
Humans are slow, sloppy, forgetful, implicit, and subjective, but have common sense
and intuition, and may find creative solutions in new situations. In contrast, computer
agents are fast, rigorous, precise, explicit, and objective, but they lack common sense and
the ability to deal with new situations (Turoff, 2007). Moreover, in contrast to an agent,
a human has a very limited attention span and can analyze only a small number of
alternatives at a time (Pomerol and Adam, 2006). That is why we have claimed that the
evidence-based reasoning tasks are best performed in a problem-solving and decision-
making environment that synergistically integrates the complementary capabilities of
humans and computer agents, taking advantage of their relative strengths to compensate
for each other’s weaknesses. Such an integration becomes even more important in the face of globalization and the rapid evolution toward knowledge economies (Filip, 2001; Rooney et al., 2005). Indeed, these developments present additional challenges to decision makers, who need to cope with dynamic and increasingly complex situations and make good decisions in the face of an overwhelming amount of incomplete, uncertain, and mostly
irrelevant information.
Mixed-initiative reasoning is a type of collaboration between humans and computer
agents that mirrors the flexible collaboration between people. It is based on an efficient,
natural interleaving of contributions by people and agents that is determined by their
relative knowledge and skills and the problem-solving context, rather than by fixed roles,
enabling each participant to contribute what it does best, at the appropriate moment
(Horvitz, 1999; Tecuci et al., 2007a, 2007b).

Table 1.2. Complementary Computational Abilities of Humans and Computer Agents

Humans are                                       Computer agents are
slow                                             fast
sloppy                                           rigorous
forgetful                                        precise
implicit                                         explicit
subjective                                       objective
but                                              but
have common sense                                lack common sense
have intuition                                   lack intuition
may find creative solutions in new situations    have poor ability to deal with new situations

Mixed-initiative systems can either accomplish goals unachievable by the component agents working independently, or they can achieve the same goals more effectively.
An example of a good problem-solving and decision-making environment is one where
the human acts as the orchestrator of the reasoning process, guiding the high-level
exploration, while the agent implements this guidance by taking into account the human’s
preferred problem-solving strategies, assumptions, and biases (Tecuci et al., 2007c). In
such an environment, the agent is an extension of the reasoning capabilities of the human,
much like a calculator is an extension of the computational capabilities of an accountant.
The emphasis is on enhancing the human’s creativity (Filip, 1989), relying on the human to make the most critical decisions and only to critique and correct the more routine ones that
are proposed by the agent (Tecuci et al., 2007c).
Developing such an environment requires an automatic approach to problem solving
that is very natural and easy to understand. Moreover, the human and the agent should
collaborate in a natural way, similarly to how humans collaborate, as opposed to the usual
human–computer interaction that is inflexible and mostly unidirectional. Also, because
most of the complex decisions are based on incomplete and imperfect information, the
decision-making environment should allow the investigation of what-if scenarios, where
the decision maker can make various assumptions about a situation.
In the following section, we discuss the development of knowledge-based agents that
could be used for mixed-initiative reasoning.

1.6 KNOWLEDGE ENGINEERING

Knowledge engineering is the area of artificial intelligence that is concerned with the design,
development, and maintenance of agents that use knowledge and reasoning to perform
problem-solving and decision-making tasks.
Knowledge engineering is a central discipline in the Knowledge Society, the society
where knowledge is the primary production resource instead of capital and labor (Drucker,
1993). Currently, human societies are rapidly evolving toward knowledge societies and an
Integrated Global Knowledge Society because of the development of the information
technologies, the Internet, the World Wide Web, and the Semantic Web that no longer
restrict knowledge societies to geographic proximity and that facilitate the sharing, archiv-
ing, retrieving, and processing of knowledge (Schreiber et al., 2000; David and Foray, 2003;
UNESCO, 2005). Moreover, the Semantic Web, an extension of the World Wide Web in
which Web content is expressed both in a natural form for humans, and in a format that can
be understood by software agents, is becoming the main infrastructure for the Knowledge
Society, allowing knowledge-based agents to automatically find, integrate, process, and
share information (Allemang and Hendler, 2011; W3C, 2015).

1.6.1 From Expert Systems to Knowledge-based Agents and Cognitive Assistants
Expert systems are among the most successful applications of artificial intelligence.
According to Edward Feigenbaum (1982, p. 1), their founding father, “An ‘expert system’
is an intelligent computer program that uses knowledge and inference procedures to solve
problems that are difficult enough to require significant human expertise for their solution. The knowledge necessary to perform at such a level, plus the inference procedures used, can
be thought of as a model of the expertise of the best practitioners in that field.”
Two early and very influential expert systems were DENDRAL (Buchanan and Feigen-
baum, 1978) and MYCIN (Buchanan and Shortliffe, 1984). DENDRAL, an expert system for
organic chemistry, analyzed mass spectral data and inferred a complete structural hypoth-
esis of a molecule. MYCIN, a medical expert system, produced diagnoses of infectious
diseases and advised the physician on antibiotic therapies for treating them.
Expert systems and knowledge-based systems are often used as synonyms since all expert
systems are knowledge-based systems. However, not all knowledge-based systems are expert systems; examples of knowledge-based systems that are not expert systems include the Watson natural language question-answering system (Ferrucci et al., 2010) and the Siri personal assistant (2011).
Continuous advances in artificial intelligence, particularly with respect to knowledge
representation and reasoning, learning, and natural language processing, are reflected in
more and more powerful and useful knowledge-based systems that, as discussed in Section
1.5.1, are now more commonly called knowledge-based agents, or simply intelligent agents.
In this book, we are primarily concerned with a very important and newer class of
intelligent agents, namely cognitive assistants, which have the following capabilities:

• Learn complex problem-solving expertise directly from human experts
• Assist nonexpert users in solving problems requiring subject matter expertise
• Assist human experts in complex problem solving and decision making
• Teach problem solving and decision making to students

Expert systems, knowledge-based agents, and cognitive assistants are used in business,
science, engineering, manufacturing, military, intelligence, and many other areas (Durkin,
1994; Giarratano and Riley, 1994; Tecuci, 1998; Tecuci et al., 2001; 2008b). They are everywhere.
The following are examples of such successful systems.
Digital Equipment Corporation’s R1 (McDermott, 1982), which helped configure orders
for new computers, is considered the first successful commercial expert system. By 1986, it was
saving the company an estimated $40 million a year.
Intuit’s TurboTax, an American tax preparation software package initially developed by
Michael A. Chipman of Chipsoft in the mid-1980s (Forbes, 2013), helps you prepare your tax return while claiming the maximum deductions allowed by law.
The Defense Advanced Research Projects Agency’s (DARPA) DART logistics planning system was used during the Persian Gulf crisis of 1991 (Cross and Walker, 1994). Its plans involved up to fifty thousand vehicles, cargo, and people, and it reportedly more than paid back DARPA’s thirty-year investment in artificial intelligence.
IBM’s Deep Blue chess-playing system defeated Garry Kasparov, the chess world
champion, in 1997 (Goodman and Keene, 1997).
The Disciple-COG agent for center of gravity analysis helped senior military officers from
the U.S. Army War College to learn how to identify the centers of gravity of the opposing
forces in complex war scenarios (Tecuci et al., 2002a; 2002b; 2008b).
NASA’s planning and scheduling systems helped plan and control the operations of
NASA’s spacecraft. For example, MAPGEN, a mixed-initiative planner, was deployed as a
mission-critical component of the ground operations system for the Mars Exploration
Rover mission (Bresina and Morris, 2007).
IBM’s Watson natural language question-answering system defeated the best human
players at the quiz show Jeopardy in 2011 (Ferrucci et al., 2010).

Apple’s Siri, running on a cell phone, answers questions, makes recommendations on nearby places of interest, and provides directions (2011).
TIACRITIS (Tecuci et al., 2011a, 2011b) and Disciple-CD (Tecuci et al., 2014) are
intelligent agents for evidence-based hypothesis analysis that help analysts analyze complex hypotheses and also teach new analysts.
In the following section, we present an overview of the types of tasks that can be
performed by expert systems, knowledge-based agents, and cognitive assistants.

1.6.2 An Ontology of Problem-Solving Tasks


Immanuel Kant, in his Critique of Pure Reason (Kant, 1781), considered that the two main
reasoning operations are analysis and synthesis. Analysis comes from the Greek word
analyein, which means “to break up.” It is a reasoning operation by which we break down
a system or problem into parts or components to better understand or solve it. Synthesis
also comes from a Greek word, syntithenai, which means “to put together.” It is the
complementary reasoning operation by which we combine system or solution compon-
ents to form a coherent system or solution.
Analysis and synthesis can be used as organizing principles for an ontology of problem-
solving tasks for knowledge-based agents, as was done by Clancey (1985), Breuker and
Wielinga (1989), and Schreiber et al. (2000, pp. 123–166). A fragment of such an ontology is
presented in Figure 1.13.
An analytic task is one that takes as input some data about a system or object and
produces a characterization of it as output. For example, in diagnosis, one analyzes the
symptoms of a malfunctioning system in order to determine their cause.

Figure 1.13. Ontology of knowledge-intensive problem-solving tasks. (A knowledge-intensive task is either an analytic task, such as classification, critiquing, interpretation, monitoring, prediction, diagnosis, or intelligence analysis, or a synthetic task, such as design, planning, scheduling, debugging, assignment, or configuration design; diagnosis is further specialized into medical diagnosis and mechanical diagnosis.)

A synthetic task is one that takes as input the requirements of an object or system and
produces the corresponding object or system, as, for instance, in designing a car based on
given specifications.
Under each type of analytic or synthetic task, one may consider more specialized
versions of that task. For example, special cases of diagnosis are medical diagnosis and
mechanical diagnosis. Special cases of mechanical diagnosis are car diagnosis, airplane
diagnosis, and so on.
The importance of identifying such types of tasks is that one may create general models
for solving them (e.g., a general model of diagnosis) that could guide the development of
specialized systems (e.g., a system to diagnose Toyota cars).
Schreiber et al. (2000, pp. 123–166) present abstract problem-solving models for many
of the task types in Figure 1.13. These abstract models can provide initial guidance when
developing knowledge-based agents to perform such tasks.

1.6.2.1 Analytic Tasks


The following is a brief characterization with examples of the analytic tasks from Figure 1.13.
Classification means determining the class to which an object belongs, as, for instance, determining that a plant is a rose based on its characteristics. Credit card companies, for example, use classification software learned from examples to classify applicants into various categories corresponding to the type of card to be approved for them (if any).
Critiquing means expressing judgments about something according to certain standards.
For example, one may identify strengths and weaknesses in a military course of action by
considering the principles of war and the tenets of Army operations and determining how
well each principle or tenet is illustrated by that course of action, as done by the Disciple-
COA agent (Tecuci et al., 2001), which will be presented in Section 12.3.
Interpretation means inferring a situation description from sensory data, as, for example,
interpreting gauge readings in a chemical process plant to infer the status of the process.
Monitoring means comparing observations of a dynamic system or process with the expected
outcomes, to identify changes in its state and take appropriate actions. Examples include
monitoring instrument readings in a nuclear reactor to detect accident conditions, or monitoring
a patient in an intensive care unit based on the data from the monitoring equipment.
Prediction means inferring likely consequences of given situations, such as predicting
the damage to crops from some type of insect, estimating global oil demand from the
current geopolitical world situation, or forecasting the weather.
Diagnosis means inferring system malfunctions from observables, such as determining
the disease of a patient from the observed symptoms, locating faults in electrical circuits,
finding defective components in the cooling system of nuclear reactors, or diagnosing the
faults in an electrical network.
Intelligence analysis means analyzing information to estimate the probability of various
hypotheses, such as estimating the probability that Al Qaeda has nuclear weapons.
Examples of systems for intelligence analysis are Disciple-LTA (Tecuci et al., 2005a;
2008a), TIACRITIS (Tecuci et al., 2011a, 2011b), and Disciple-CD (Tecuci et al., 2014).

1.6.2.2 Synthetic Tasks


The following is a brief characterization with examples of the synthetic tasks from
Figure 1.13.

Design means configuring objects under constraints, such as designing an elevator with
a certain capacity and speed, as done by the SALT system (Marcus, 1988), or designing a
computer system with certain memory, speed, and graphical processing characteristics,
based on a set of predefined components.
Planning means finding a set of actions that achieve a certain goal, such as determining
the actions that need to be performed in order to repair a bridge, as done by the Disciple-
WA agent (Tecuci et al., 2000), which will be discussed in Section 12.2. A more complex
example is collaborative emergency response planning, illustrated by Disciple-VPT
(Tecuci et al., 2008c), which is presented in Section 12.5.
Scheduling means allocating sequences of activities or jobs to resources or machines on
which they can be executed, such as scheduling the lectures in the classrooms of a
university, or scheduling the sequence of operations needed to produce an object on
the available machines in a factory.
Debugging means prescribing remedies for malfunctions, such as determining how to
tune a computer system to reduce a particular type of performance problem.
Assignment means creating a partial mapping between two sets of objects, such as
allocating offices to employees in a company or allocating airplanes to gates in an airport.

1.6.3 Building Knowledge-based Agents


1.6.3.1 How Knowledge-based Agents Are Built and Why It Is Hard
As shown in the right-hand side of Figure 1.14, the basic components of a knowledge-
based agent are the problem-solving engine and the knowledge base.
The problem-solving engine implements a general method of solving the input prob-
lems based on the knowledge from the knowledge base. An example of such a general
method is problem reduction and solution synthesis, which will be discussed in detail in
Chapter 4. As illustrated in the right side of Figure 1.15, this method consists in solving a
problem, such as P1, by successively reducing it, from the top down, to simpler and
simpler problems; finding the solutions of the simplest problems; and successively com-
bining these solutions, from the bottom up, into the solution of the initial problem
(i.e., S1).
Knowledge of the actual problems to solve and how they can be solved depends on the
expertise domain. This knowledge is represented into the knowledge base of the system
by using different representation formalisms. The left and middle parts of Figure 1.15
illustrate the representation of this knowledge by using an ontology of concepts and a set
of rules expressed with these concepts. The ontology describes the types of objects in
the application domain, as well as the relationships between them. Some of the rules are if-then structures that indicate the conditions under which a general complex problem (such as P1g) can be reduced to simpler problems. Other rules indicate how the solutions of simpler problems can be combined into the solution of the more complex problem. These rules are applied to generate the reasoning tree from the right part of Figure 1.15.

Figure 1.14. Conventional approach to building a knowledge-based agent. (The subject matter expert conducts a dialogue with the knowledge engineer, who programs the agent’s problem-solving engine and knowledge base; the agent’s results are then reviewed by the expert.)

Figure 1.15. Knowledge and reasoning based on problem reduction and solution synthesis. (Left: an ontology of persons, faculty members, and students; middle: IF-THEN reduction rules, with conditions and except-when conditions, expressed with the ontology concepts; right: the reasoning tree in which problem P1 is reduced to simpler subproblems and their solutions are combined into the solution S1.)
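
The reduction and synthesis method can be sketched as a simple recursive procedure. In the sketch below, the reduction table, the elementary-problem solver, and the synthesis function are illustrative placeholders standing for the kinds of rules shown in Figure 1.15; the symbolic problem and solution names mirror the figure, but the code is not the actual Disciple-EBR representation. Applied top-down, the recursion mirrors the successive reductions, and the synthesize calls then combine the subproblem solutions bottom-up into the solution of the initial problem.

```python
# Sketch of problem reduction and solution synthesis.
# 'reductions' maps a problem to its subproblems (applied top-down);
# 'solve_elementary' and 'synthesize' stand for the rules that solve the
# simplest problems and combine subproblem solutions (applied bottom-up).

def solve(problem, reductions, solve_elementary, synthesize):
    if problem not in reductions:              # simplest problem: solve directly
        return solve_elementary(problem)
    subproblems = reductions[problem]          # reduce to simpler problems
    subsolutions = [solve(p, reductions, solve_elementary, synthesize)
                    for p in subproblems]
    return synthesize(problem, subsolutions)   # combine into the solution

# Illustrative use with symbolic problems and solutions:
reductions = {"P1": ["P11", "P12"], "P12": ["P121", "P122"]}
solution = solve("P1", reductions,
                 solve_elementary=lambda p: "S" + p[1:],
                 synthesize=lambda p, subs: "S" + p[1:] + "(" + ", ".join(subs) + ")")
print(solution)   # S1(S11, S12(S121, S122))
```
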
Figure 1.14 illustrates the conventional approach to building a knowledge-based agent.
A knowledge engineer interviews the subject matter expert to understand how the expert
reasons and solves problems, identifies the knowledge used by the expert, and then
represents it into the agent's knowledge base. For instance, the knowledge engineer may
represent the knowledge acquired from the expert as an ontology of concepts and a set of
reasoning rules expressed with these concepts, like those from Figure 1.15. Then the agent
is used to solve typical problems, and the subject matter expert analyzes the generated
solutions (e.g., the reasoning tree from the right side of Figure 1.15), and often the
knowledge base itself, to identify errors. Referring to the identified errors, the knowledge
engineer corrects the knowledge base.
After more than two decades of work on expert systems, Edward Feigenbaum (1993)
characterized the knowledge-based technology as a tiger in a cage:

The systems offer remarkable cost savings; some dramatically “hot selling”
products; great return-on-investment; speedup of professional work by
factors of ten to several hundred; improved quality of human decision making
(often reducing errors to zero); and the preservation and “publishing” of
knowledge assets of a firm. . . . These stories of successful applications,
repeated a thousand fold around the world, show that knowledge-based
technology is a tiger. Rarely does a technology arise that offers such a wide
range of important benefits of this magnitude. Yet as the technology moved
through the phase of early adoption to general industry adoption, the
response has been cautious, slow, and “linear” (rather than exponential).

The main reason for this less than exponential growth of expert systems lies in the
difficulty of capturing and representing the knowledge of the subject matter expert in
the system’s knowledge base. This long, difficult, and error-prone process is known as
the “knowledge acquisition bottleneck” of the system development process. But why is knowledge acquisition so difficult? First, much of an expert’s problem-solving knowledge is in the form of tacit knowledge, which is difficult to make explicit (Nonaka and Krogh,
2009). Second, when describing their knowledge, experts use natural language, visual
representations, and common sense, often omitting essential details that are implicit in
human communication. This is very different from the way in which knowledge has to be
represented into the knowledge base, which is formal, precise, and sufficiently complete.
Finally, the knowledge engineer needs domain training in order to understand properly
the expert’s problem-solving knowledge, and this takes time and effort.

1.6.3.2 Teaching as an Alternative to Programming: Disciple Agents


In an interesting way, the difficulty of building knowledge-based systems confirms Tur-
ing’s intuition that building an intelligent machine by programming is too difficult and
that a simpler way might be to teach a learning agent: “Instead of trying to produce a
program to simulate the adult mind, why not rather try to produce one which simulates the
child's? If this were then subjected to an appropriate course of education one would obtain
the adult brain” (Turing, 1950).
The Disciple learning and reasoning agents, which will be presented in detail in this
book, represent one realization of Turing’s intuition. A subject matter expert can teach a
Disciple agent how to solve problems in ways that are similar to how the expert would
teach a student or a collaborator (Tecuci, 1988; Tecuci, 1998; Boicu et al., 2000). For
example, the expert will show the agent examples of how to solve specific problems,
helping it to understand the solutions, and will supervise and correct the problem-
solving behavior of the agent. The agent will learn from the expert by generalizing the
examples and building its knowledge base. In essence, this creates a synergism between
the expert who has the knowledge to be formalized and the agent that knows how to
formalize it. This approach is based on methods for mixed-initiative problem solving,
integrated teaching and learning, and multistrategy learning. In mixed-initiative prob-
lem solving (Tecuci et al., 2007a, 2007b), the expert solves the more creative parts of the
problem and the agent solves the more routine ones. In integrated teaching and learning
(Tecuci and Kodratoff, 1995; Tecuci, 1998), for example, the agent helps the expert to
teach it by asking relevant questions, and the expert helps the agent to learn by providing
examples, hints, and explanations. In multistrategy learning (Tecuci, 1993; Michalski and
Tecuci, 1994), the agent integrates multiple learning strategies, such as learning from
examples, learning from explanations, and learning by analogy, to learn from the expert
how to solve problems.
Disciple agents have been developed in many domains, including intelligence analysis
(Tecuci et al., 2011a, 2011b; 2014; 2015), modeling of the behavior of violent extremists,
military center of gravity determination (Tecuci et al., 2002b; 2008b), course of action
critiquing (Tecuci et al, 2001), planning (Tecuci et al, 2000; 2008c), financial services, and
education (Tecuci and Keeling, 1999).
As illustrated in Figure 1.16, learning agents such as the Disciple agents contribute to a
new age in the software systems development process. In the age of mainframe computers,
the software systems were both built and used by computer science experts. In the current
age of personal computers, such systems are still being built by computer science experts,
but many of them (such as text processors, email programs, and Internet browsers) are
regularly used by persons who have no formal computer science education. We think
that this trend will continue, with the next age being that of cognitive assistants on the Semantic Web, where typical computer users will be able to both develop and use special types of software agents.

Figure 1.16. Evolution of software development and use. (In the age of mainframe computers, software systems were developed and used by computer experts; in the age of personal computers, they are developed by computer experts but used by persons who are not computer experts; in the coming age of cognitive assistants on the Semantic Web, they will be developed and used by persons who are not computer experts.)

The learning agent technology illustrated by the Disciple approach attempts to
change the way the knowledge-based agents are built, from “being programmed” by a
knowledge engineer to “being taught” by a user who does not have prior knowledge
engineering or computer science experience. This approach will allow typical computer
users, who are not trained knowledge engineers, to build cognitive assistants by themselves. Thus, non–computer scientists will no longer be only users of generic pro-
grams developed by others (such as word processors or Internet browsers), as they are
today, but also agent developers themselves. They will be able to train their cognitive
assistants to help them with their increasingly complex tasks in the Knowledge Society,
which should have a significant beneficial impact on their work and life. This goal is
consistent with the Semantic Web vision of enabling typical users to author Web content
that can be understood by automated agents (Allemang and Hendler, 2011; W3C, 2015).
Bill Gates has also stressed the great potential and importance of software assistants
(Simonite, 2013).
Because the subject matter expert teaches a Disciple agent similarly to how the expert
would teach a student, through explicit examples and explanations, a trained Disciple
agent can be used as an assistant by a student, learning from the agent’s explicit reasoning.
Alternatively, Disciple may behave as a tutoring system, guiding the student through a
series of lessons and exercises. Educational Disciple agents have been developed for
intelligence analysis (Tecuci et al., 2011a, 2011b) and for center of gravity determination
(Tecuci et al., 2008b). Thus the Disciple agents also contribute to advancing “Personalized
Learning,” which is one of the fourteen Grand Challenges for the Twenty-first Century
identified by the U.S. National Academy of Engineering (NAE, 2008).

1.6.3.3 Disciple-EBR, Disciple-CD, and TIACRITIS


The Disciple learning agents theory and technology has been continuously developed over
many years, with significant stages presented in a series of PhD theses (Tecuci, 1988;
Dybala, 1996; Hieb, 1996; Keeling, 1998; Boicu M., 2002; Bowman, 2002; Boicu C., 2006; Le, 2008; Marcu, 2009), and several books (e.g., Tecuci, 1998; Tecuci et al., 2008b). Some of
the most representative implementations of this evolving theory and technology are
discussed in Chapter 12. The rest of this book, however, focuses on the most recent
advances of this theory and technology that enables the development of Disciple agents
for evidence-based reasoning (EBR) tasks such as those introduced in Section 1.4.3.
The corresponding agent development environment is called Disciple-EBR, which can
be used by a subject matter expert, with support from a knowledge engineer, to develop a
knowledge-based agent incorporating his or her expertise.
Disciple-EBR (the Disciple learning agent shell for evidence-based reasoning) will be
used throughout this book to explain knowledge engineering concepts, principles, and
methods using a hands-on approach. It will also be the software environment used in the
agent design and development project.
There is also a reduced version of Disciple-EBR, called Disciple-CD (the Disciple
cognitive assistant for “Connecting the Dots”). This version was created for the end-user
who has no knowledge engineering experience and receives no support from a
knowledge engineer (Tecuci et al., 2014). Therefore, when using Disciple-CD, the user
does not have access to any Disciple-EBR module that may require any kind of
knowledge engineering support, such as Ontology Development, Rule Learning, or
Rule Refinement.
We have written a book for intelligence analysis courses, titled Intelligence Analysis as
Discovery of Evidence, Hypotheses, and Arguments: Connecting the Dots (Tecuci et al.,
2016), which uses Disciple-CD. This is because Disciple-CD incorporates a significant
amount of knowledge about evidence and its properties, uses, and discovery to help the
students acquire the knowledge, skills, and abilities involved in discovering and processing
of evidence and in drawing defensible and persuasive conclusions from it, by employing
an effective learning-by-doing approach. The students can practice and learn how to link
evidence to hypotheses through abductive, deductive, and inductive reasoning that estab-
lish the basic credentials of evidence: its relevance, believability or credibility, and infer-
ential force or weight. They can experiment with “what-if” scenarios and can study the
influence of various assumptions on the final result of analysis. So, their learning experi-
ence will be a joint venture involving the intelligence analysis book together with their
interaction with Disciple-CD.
Disciple-CD is a significant improvement over an earlier system that we have
developed for intelligence analysis, called TIACRITIS (Teaching Intelligence Analysts
Critical Thinking Skills), and it subsumes all the reasoning and learning capabilities
of TIACRITIS that have been described in several papers (Tecuci et al., 2010b;
2011a; 2011b).

1.7 OBTAINING DISCIPLE-EBR

Disciple-EBR (Disciple, for short) is a learning agent shell for evidence-based reasoning. It
is a research prototype implemented in Java and tested on PC. Disciple-EBR is a stand-
alone system that needs to be installed on the user’s computer.
For installation requirements and to download the system, visit http://lac.gmu.edu/
KEBook/Disciple-EBR/. At this address, you will also find instructions on how to install
and uninstall Disciple-EBR, a section with frequently asked questions (FAQs), and a
section that allows users to submit error reports to the developers of the system.

1.8 REVIEW QUESTIONS

1.1. Consider the following illustrations of the concepts data, information, and
knowledge:
Data: the color red.
Information: red tomato.
Knowledge: If the tomato is red, then it is ripe.
Data: the sequence of dots and lines “. . .–. . .”
Information: the “S O S” emergency alert.
Knowledge: If there is an emergency alert, then start rescue operations.
Provide two other illustrations of these concepts.

1.2. What is evidence?

1.3. Provide an example of an item of evidence.

1.4. Why does evidence differ from data or items of information?

1.5. Give an example of a fact F and of evidence about F. In general, what is the
difference between a fact and evidence about that fact?

1.6. Formulate a hypothesis. Indicate an item of evidence that favors this hypothesis, an
item of evidence that disfavors this hypothesis, and an item of information that is
not evidence for this hypothesis.

1.7. What is deduction? Provide an example of deductive reasoning.

1.8. What is abduction? Give an example of abductive reasoning. Provide other explan-
ations or hypotheses that are less plausible. Specify a context where one of these
alternative explanatory hypotheses would actually be more plausible.

1.9. A doctor knows that the disease hepatitis causes the patient to have yellow eyes
90 percent of the time. The doctor also knows that the probability that a patient has
hepatitis is one in one hundred thousand, and the probability that any patient has
yellow eyes is one in ten thousand. What is the probability that a patient with
yellow eyes has hepatitis?

1.10. Suppose that in answering a multiple-choice test question with five choices, a
student either knows the answer, with probability p, or she guesses it with prob-
ability 1 – p. Assume that the probability of answering a question correctly is 1 for a
student who knows the answer. If the student guesses the answer, she chooses one
of the options with equal probability. What is the probability that a student knew
the answer, given that she answered it correctly? What is this probability in the case
of a true-false test question?

1.11. Consider a hypothesis H and its negation ¬H. Suppose you believe, before you
have any evidence, that the prior probability of H is P(H) = 0.30. Now you receive
an item of evidence E* and ask yourself how likely this evidence E* is if H were true,
and how likely this evidence E* is if H were not true. Suppose you say that
P(E*|H) = 0.70 and P(E*|¬H) = 0.10. What are your prior odds Odds(H : ¬H),
and how have these odds changed as a result of the evidence E*? What is the
posterior probability P(H|E*)?

1.12. Suppose in Question 1.11 you said that the prior probability of H is P(H) = 0.20
and the posterior probability P(H|E*) = 0.95. What would be the force of evidence
E* (i.e., the likelihood ratio LE*) that is implied by these assessments you
have made?

1.13. Think back to the very first time you were ever tutored about probability, what it
means, and how it is determined. What were you told about these matters? Then
describe your present views about these probability matters.

1.14. As we noted, the subjective Bayesian view of probability lets us assess prob-
abilities for singular, unique, or one-of-a-kind events, provided that our
assessed probabilities obey the three Kolmogorov axioms we discussed
regarding enumerative probabilities. First, is there any way of showing that
these axioms for enumerative probabilities also form the basis for ideal or
optimal probability assessments in the nonenumerative case? Second, can
this really be the rational basis for all probability assessments based on
evidence?

1.15. Show that, unlike the Belief Function system, Bayes’ rule supplies no method for
incorporating “pure evidence.”

1.16. Provide an example showing how an analyst’s numerical assessment of a probability
applied to a conclusion can invite criticism.

1.17. What is induction? Provide an example of inductive reasoning.

1.18. Consider the following statements:


(a) All the beans from this bag are white.
(b) These beans are from this bag.
(c) These beans are white.
We will arrange these statements in three different ways, and you will have to
identify what type of inference each arrangement represents (Walton, 2004).

(1) Premise One: (a) All the beans from this bag are white.
Premise Two: (b) These beans are from this bag.
Conclusion: (c) These beans are white.

(2) Premise One: (b) These beans are from this bag.
Premise Two: (c) These beans are white.
Conclusion: (a) All the beans from this bag are white.

(3) Premise One: (a) All the beans from this bag are white.
Premise Two: (c) These beans are white.
Conclusion: (b) These beans are from this bag.


1.19. Consider the following statements:

(a) These persons are French nationals.


(b) All the French nationals speak French.
(c) These persons speak French.
Arrange these statements in three different ways so that the reasoning is the
indicated one (using a word in the conclusion that is characteristic of that type of
reasoning):

Deductive Inference

Premise One:
Premise Two:
Conclusion:

Inductive Inference

Premise One:
Premise Two:
Conclusion:

Abductive Inference

Premise One:
Premise Two:
Conclusion:

1.20. Provide examples of evidence-based reasoning in law, medicine, geography,
cybersecurity, and intelligence analysis.

1.21. Give an example of an observation and of several hypotheses that would explain it.

1.22. Define artificial intelligence.

1.23. What is an intelligent agent?

1.24. Describe the generic architecture of an intelligent agent and the role of each main
component.

1.25. What are the two main types of knowledge often found in the knowledge base of
an agent?

1.26. What are some of the complementary abilities of humans and computer agents?

1.27. What would be a good mixed-initiative environment for problem solving and
decision making? What are some key requirements for such an environment?

1.28. How do assumptions enable mixed-initiative problem solving? How do they enable
problem solving in the context of incomplete information?

1.29. What is a knowledge-based system?


1.30. What are some other examples of the analytic tasks introduced in Section
1.6.2.1?

1.31. What are some other examples of the synthetic tasks introduced in Section
1.6.2.2?

1.32. Why is building a knowledge-based system difficult?

2 Evidence-based Reasoning: Connecting the Dots

In Section 1.4.3, we briefly introduced evidence-based reasoning in various domains
(see Figure 1.10, p. 28). In this chapter, we start by discussing the complexity of evidence-
based reasoning using the “connecting the dots” metaphor. Then we discuss in more
detail evidence-based reasoning in a representative EBR domain, intelligence analysis. We
conclude with other examples of evidence-based reasoning. The
following chapters will then address the development of such systems and of knowledge-
based agents in general.

2.1 HOW EASY IS IT TO CONNECT THE DOTS?

The “connecting the dots” metaphor seems appropriate for characterizing evidence-based
reasoning. This metaphor may have gained its current popularity following the terrorist
attacks in New York City and Washington, D.C., on September 11, 2001. It was frequently
said that the intelligence services had failed to connect the dots in ways that might have
prevented the catastrophes that occurred. Since then, we have seen and heard
this metaphor applied in the news media to inferences in a very wide array of contexts
beyond intelligence, including legal, military, and business contexts. For example,
we have seen it applied to allegedly faulty medical diagnoses; to allegedly faulty conclu-
sions in historical studies; to allegedly faulty or unpopular governmental decisions; and in
discussions involving the conclusions reached by competing politicians. What is also true
is that the commentators on television and radio, or the sources of written accounts of
inferential failures, never tell us what they mean by the phrase “connecting the dots.”
A natural explanation is that they have never even considered what this phrase means
and what it might involve.
But we have made a detailed study of what “connecting the dots” entails. We have
found this metaphor very useful, and quite intuitive, in illustrating the extraordinary
complexity of the evidential and inferential reasoning required in the contexts we have
mentioned. Listening to or watching some media accounts of this process may lead one to
believe that it resembles the simple tasks we performed as children when, if we connected
some collection of numbered dots correctly, a figure of Santa Claus, or some other familiar
figure, would emerge. Our belief is that critics employing this metaphor in criticizing
intelligence analysts and others have very little awareness of how astonishingly difficult the
process of connecting unnumbered dots can be in so many contexts (Schum, 1987).
A natural place to begin our examination is by trying to define what is meant by the
metaphor "connecting the dots" when it is applied to evidence-based reasoning tasks:


“Connecting the Dots” refers to the task of marshaling thoughts and evidence in the
generation or discovery of productive hypotheses and new evidence, and in the construction
of defensible and persuasive arguments on hypotheses we believe to be most favored by the
evidence we have gathered and evaluated.
The following represents an account of seven complexities in the process of “connect-
ing the dots.”

2.1.1 How Many Kinds of Dots Are There?


It is so easy to assume that the only kind of dot to be connected concerns details in the
observable information or data we collect that may eventually be considered as evi-
dence in some analysis. We might refer to these dots as being evidential dots. Sherlock
Holmes had another term for the details in observations he made, calling them trifles.
As he told Dr. Watson: “You know my method, it is based on the observance of
trifles.” A related problem here is that most items of evidence may contain many
details, dots, or trifles, some of which are interesting and others not. What this means
is that the information must be carefully parsed in order to observe its significant evidential
dots. Not all data or items of information we have will ever become evidence in an
analysis task.
Consider the bombing during the Boston Marathon that took place on April 15,
2013. Many images were taken during this event. One is a widely televised videotape
of two young men, one walking closely behind the other, both carrying black back-
packs. This is the evidential dot shown in the bottom-left of Figure 2.1. Why should we
be interested in this evidence dot? Because it suggests ideas or hypotheses of what
might have actually happened. Consider our ideas or thoughts concerning the rele-
vance of the backpack dot just described. We have other evidence that the two bombs
that were set off were small enough to be carried in backpacks. This allows the
inference that the backpacks carried by the two young men might have contained
explosive devices and that they should be considered as suspects in the bombing.
A further inference is that these two men were the ones who actually detonated the
two bombs.

Figure 2.1. Types of dots to be connected: evidence, ideas, and hypotheses.


Thus, the second type of dot concerns ideas we have about how some evidential dots
are connected to matters we are trying to prove or disprove. We commonly refer to the
matters to be proved or disproved as hypotheses. Hypotheses commonly refer to possible
alternative conclusions we could entertain about matters of interest in an analysis. The
other dots, which we call idea dots, come in the form of links in chains of reasoning or
arguments we construct to link evidential dots to hypotheses. Of course, hypotheses are
also ideas. Each of these idea dots refers to sources of uncertainty or doubt we believe to
be interposed between our evidence and our hypotheses. This is precisely where imagina-
tive reasoning is involved. The essential task for the analyst is to imagine what evidential
dots mean as far as hypotheses or possible conclusions are concerned. Careful critical
reasoning is then required to check on the logical coherence of sequences of idea dots in
our arguments or chains of reasoning. In other words, does the meaning we have attached
to sequences of idea dots make logical sense?

2.1.2 Which Evidential Dots Can Be Believed?


The next problem we discuss is one of the most important, challenging, and interesting
problems raised in any area of analysis. From some source, a sensor of some sort, or from
a person, we obtain an evidential dot saying that a certain event has occurred. Just because
this source says that this event occurred does not entail that it did occur. So, as discussed
in Section 1.1.3, what is vitally necessary is to distinguish between evidence of an event and
the event itself. We adopt the following notational device to make this distinction:

• E represents the actual occurrence of event E
• E*i represents the reported occurrence of event E from source i

So, a basic inference we encounter is whether or not E did occur based on our evidence
E*i. Clearly, this inference rests upon what we know about the believability of source i.
There are some real challenges here in discussing the believability of source i. Section 4.7
of this book is devoted to the task of assessing the believability of the sources of our
evidence. As we will see, Disciple-EBR (as well as Disciple-CD and TIACRITIS, introduced
in Section 1.6.3.3) already knows much about this crucial task.
But there are even distinctions to be made in what we have called evidential dots. Some
of these dots arise from objects we obtain or from sensors that supply us with records or
images of various sorts. So one major kind of evidential dot involves what we can call
tangible evidence that we can observe for ourselves to see what events it may reveal. In
many other cases, we have no such tangible evidence but must rely upon the reports of
human sources who allegedly have made observations of events of interest to us. Their
reports to us come in the form of testimonial evidence or assertions about what they have
observed. Therefore, an evidential dot E*i can be one of the following types:

• Tangible evidence, such as objects of various kinds, or sensor records such as those
obtained by signals intelligence (SIGINT), imagery intelligence (IMINT), measurement
and signature intelligence (MASINT), and other possible sources
• Testimonial evidence obtained from human sources (HUMINT)
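As a rough, purely illustrative sketch of these distinctions (ours, not the representation
used inside Disciple-EBR), an evidential dot can be modeled in Python as a record that keeps
the reported event E separate from any conclusion about whether E actually occurred, and
that notes the kind of evidence involved:

# A minimal, hypothetical representation of an evidential dot E*i.
# It deliberately separates the reported event E from any judgment about
# whether E actually occurred, which depends on the believability of source i.
from dataclasses import dataclass

@dataclass
class EvidentialDot:
    event: str    # the event E that is reported to have occurred
    source: str   # the source i providing the report E*i
    kind: str     # "tangible" (objects, SIGINT, IMINT, MASINT) or "testimonial" (HUMINT)

# The Boston Marathon videotape discussed above, as a tangible evidential dot.
backpack_dot = EvidentialDot(
    event="two young men, one walking closely behind the other, "
          "both carrying black backpacks",
    source="videotape recorded on Boylston Street",
    kind="tangible",
)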

The origin of one of the greatest challenges in assessing the believability of evidence
is that we must ask different questions about the sources of tangible evidence than


those we ask about the sources of testimonial evidence. Stated another way, the
believability attributes of tangible evidence are different from the believability attri-
butes of testimonial evidence. Consider again the evidential dot concerning the two
men carrying backpacks. This is an example of tangible evidence. We can all examine
this videotape to our heart’s content to see what events it might reveal. The most
important attribute of tangible evidence is its authenticity: Is this evidential dot what
it is claimed to be? The FBI claims that this videotape was recorded on April 15, 2013,
on Boylston Street in Boston, Massachusetts, where the bombings occurred, and
recorded before the bombings occurred. Our imaginations are excited by this claim
and lead to questions such as those that would certainly arise in the minds of defense
attorneys during trial. Was this videotape actually recorded on April 15, 2013? Maybe it
was recorded on a different date. If it was recorded on April 15, 2013, was it recorded
before the bombings occurred? Perhaps it was recorded after the bombings occurred.
And, was this videotape actually recorded on Boylston Street in Boston, Massachu-
setts? It may have been recorded on a different street in Boston, or perhaps on a street
in a different city.
But there is another difficulty that is not always recognized that can cause endless
trouble. While, in the case of tangible evidence, believability and credibility may be
considered as equivalent terms, human sources of evidence have another characteristic
apart from credibility; this characteristic involves their competence. As we discuss in
Section 4.7.2, the credibility and competence characteristics of human sources must not
be confused; to do so invites inferential catastrophes, as we will illustrate. The questions
required to assess human source competence are different from those required to assess
human source credibility. Competence requires answers to questions concerning the
source’s actual access to, and understanding of, the evidence he or she reports. Credibil-
ity assessment for a testimonial source requires answers to questions concerning the
veracity, objectivity, and observational sensitivity or accuracy of the source. Disciple-EBR
knows the credibility-related questions to ask of tangible evidence and the competence-
and credibility-related questions to ask of HUMINT sources.
There is no better way of illustrating the importance of evidence believability assess-
ments than to show how such assessments form the very foundation for all arguments we
make from evidence to possible conclusions. In many situations, people will mistakenly
base inferences on the assumption that an event E has occurred just because we have
evidence E*i from source i. This amounts to the suppression of any uncertainty we have
about the believability of source i (whatever this source might be). Figure 2.2 shows a

Figure 2.2. The believability foundation for an argument.


simple example illustrating this believability foundation; it will also allow us to introduce
the next problem in connecting the dots.
What this figure shows is an argument from evidence E*i as to whether or not hypoth-
esis H is true. As shown, the very first stage in this argument concerns an inference about
whether or not event E actually occurred. This is precisely where we consider whatever
evidence we may have about the believability of source i. We may have considerable
uncertainty about whether or not event E occurred. All subsequent links in this argument
concern the relevance of event E on hypothesis H. As we noted in Figure 2.1, these
relevance links connect the idea dots we discussed. As Figure 2.2 shows, each idea dot is
a source of uncertainty associated with the logical connection between whether or not
event E did occur and whether or not H is true. Consideration of these relevance links is
our next problem in connecting the dots.

2.1.3 Which Evidential Dots Should Be Considered?


In all of the contexts we have considered, there is usually no shortage of potential
evidential dots. In fact, in many of these contexts, persons drawing conclusions about
matters of importance are swamped with information or data. This situation is currently
being called the “big data problem.” Here we begin to consider vital matters concerning
the discovery-related or investigative tasks and the imaginative or creative reasoning these
tasks involve. Unfortunately, in many situations people or organizations try to collect
everything in the hope of finding something useful in an inference task. This wasteful
practice is one reason why the big data problem exists, since only a minute fraction of the
information collected will be relevant in any inference of concern. In our work, we have
paid great attention to the process of discovery that necessarily takes place in a world that
keeps changing all the while we are trying to understand parts of it of interest to us in our
inference tasks. As will be discussed in Section 2.2, this is an ongoing, seamless activity in
which we have evidence in search of hypotheses, hypotheses in search of evidence, and
the testing of hypotheses all going on at the same time. Hypotheses you entertain,
questions you ask, particular evidence items, and your accumulated experience all help
you decide which evidential dots to consider. Part of our objective here is to make the
process of discovery more efficient. As was discussed in Section 1.4.2, these discovery tasks
involve mixtures of three different forms of reasoning: deduction, induction (probabilistic),
and abduction (imaginative, creative, or insightful) reasoning. These forms of reasoning
provide the bases for our idea dots.

2.1.4 Which Evidential Dots Should We Try to Connect?


Here comes a matter of great complexity. It usually happens that hypotheses we entertain
are generated from observations we have made involving potential evidential dots. On
limited occasions, we can generate a hypothesis from a single evidential dot. For example,
in a criminal investigation finding a fingerprint will suggest a possible suspect in the case.
But in most cases, it takes consideration of combinations of evidential dots in order to
generate plausible and useful hypotheses, as illustrated in the following example based on
accounts given in Time Magazine and the Washington Post. From European sources came
word that terrorists of Middle Eastern origin would make new attempts to destroy the
World Trade Center, this time using airliners. Many threats are received every day, most of


which come to nothing. However, from several civilian flying schools in the United States
came word (to the FBI) that persons from the Middle East were taking flying lessons,
paying for them in cash, and wanting only to learn how to steer and navigate heavy aircraft
but not how to make takeoffs and landings in these aircraft. By itself, this information,
though admittedly strange, may not have seemed very important. But, taken together,
these two items of information might have caused even Inspector Lestrade (the rather
incompetent police investigator in Sherlock Holmes stories) to generate the hypothesis
that there would be attacks on the World Trade Center using hijacked airliners. The
hijackers would not need to learn how to make takeoffs; the aircraft’s regular pilots would
do this. There would be no need for the hijackers to know how to land aircraft, since no
landings were intended, only crashes into the World Trade Center and the Pentagon. Why
were these two crucial items of information not considered together? The answer seems to
be that they were not shared among relevant agencies. Information not shared cannot
be considered jointly, with the result that their joint inferential impact could never have
been assessed. For all time, this may become the best (or worst) example of failure to
consider evidence items together. Even Sherlock Holmes would perhaps not have inferred
what happened on September 11, 2001, if he had not been given these two items of
information together.
The problem, however, is that here we encounter a combinatorial explosion, since the
number of possible combinations of two or more evidential dots is exponentially related to
the number of evidential dots we are considering. Suppose we consider having some
number N of evidential dots. We ask the question: How many combinations C of two or
more evidential dots are there when we have N evidential dots? The answer is given by the
following expression: C = 2^N – (N + 1). This expression by itself does not reveal how quickly
this combinatorial explosion takes place. Here are a few examples showing how quickly
C mounts up with increases in N:

• For N = 10, C = 1,013
• For N = 25, C = 33,554,406
• For N = 50, C = 1.13 × 10^15
• For N = 100, C = 1.27 × 10^30
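As a quick check of how rapidly C grows, the following short Python sketch (illustrative
only) reproduces the counts above directly from the expression C = 2^N – (N + 1):

# Number of combinations of two or more evidential dots among N dots:
# all 2**N subsets minus the empty set and the N single-dot subsets.
def dot_combinations(n):
    return 2**n - (n + 1)

for n in (10, 25, 50, 100):
    print(n, dot_combinations(n))
# Prints 1013 and 33554406, and values of about 1.13 x 10^15 and 1.27 x 10^30.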

There are several important messages in this combinatorial analysis for evidence-based
reasoning. The first concerns the size of N, the number of potential evidential dots that
might be connected. Given the array of sensing devices and human observers available,
the number N of potential evidential dots is as large as you wish to make it. In most
analyses, N would certainly be greater than one hundred and would increase as time
passes. Remember that we live in a nonstationary world in which things change and we
find out about new things all the time. So, in most cases, even if we had access to the
world’s fastest computer, we could not possibly examine all possible evidential dot
combinations, even when N is quite small.
Second, trying to examine all possible evidential dot combinations would be the act of
looking through everything with the hope of finding something. This would be a silly thing to
do, even if it were possible. The reason, of course, is that most of the dot combinations
would tell us nothing at all. What we are looking for are combinations of evidential dots that
interact or are dependent in ways that suggest new hypotheses or possible conclusions. If we
would examine these dots separately or independently, we would not perceive these new
possibilities. A tragic real-life example is what happened on September 11, 2001.


Figure 2.3 is an abstract example involving four numbered evidential dots. The
numbers might indicate the order in which we obtained them. In part (a) of the figure,
we show an instance where these four dots have been examined separately or independ-
ently, in which case they tell us nothing interesting. Then someone notices that, taken
together, these four dots combine to suggest a new hypothesis HK that no one has thought
about before, as shown in part (b) of the figure. What we have here is a case of evidential
synergism in which two or more evidence items mean something quite different when they
are examined jointly than they would mean if examined separately or independently. Here
we come to one of the most interesting and crucial evidence subtleties or complexities that
have, quite frankly, led to intelligence failures in the past: failure to identify and exploit
evidential synergisms. We will address this matter in other problems we mention concern-
ing connecting the dots.
It might be said that the act of looking through everything in the hope of finding
something is the equivalent of giving yourself a prefrontal lobotomy, meaning that you
are ignoring any imaginative capability you naturally have concerning which evidential
dot combinations to look for in your analytic problem area. What is absolutely crucial
in selecting dot combinations to examine is an analyst’s experience and imaginative
reasoning capabilities. What we should like to have is a conceptual “magnet” that we
could direct at a base of evidential dots that would “attract” interesting and important
dot combinations.

2.1.5 How to Connect Evidential Dots to Hypotheses?


As will be discussed in Section 4.4, all evidence has three major credentials or properties:
relevance, believability or credibility, and inferential force or weight. No evidence ever
comes to us with these three credentials already attached; they must be established by
defensible and persuasive arguments linking the evidence to the hypotheses we are
considering. As we will see, relevance answers the question: So what? How is this datum
or information item linked to something we are trying to prove or disprove? If such
relevance linkage cannot be established, this datum is irrelevant or useless. As discussed
previously, believability answers the question: Can we believe what this evidence is telling
us? The force or weight credential asks: How strong is this evidence in favoring or
disfavoring the hypothesis? This is where probability enters our picture, since, for very
good reasons, the force or weight of evidence is always graded in probabilistic terms.
A relevance argument is precisely where the idea dots become so important. Consider-
ing an item of information, an analyst must imagine how this item could be linked to some
hypothesis being considered before it could become an item of evidence. These idea dots

Figure 2.3. Evidential synergism.


forming this linkage come in the form of propositions or statements indicating possible
sources of doubt or uncertainty in the imagined linkage between the item of information
and hypotheses being considered. For a simple example, look again at Figure 2.2 (p. 49),
where we show a connection between evidence E*i and hypothesis H. An analyst has an
item of information from source i concerning the occurrence of event E that sounds very
interesting. This analyst attempts to show how event E, if it did occur, would be relevant in
an inference about whether hypothesis H is true or not. So the analyst forms the following
chain of reasoning involving idea dots. The analyst says, “If event E were true, this would
allow us to infer that event F might be true, and if F were true, this would allow us to infer
that event G might be true. Finally, if event G were true, this would make hypothesis
H more probable.” If this chain of reasoning is defensible, the analyst has established the
relevance of evidence E*i on hypothesis H.
In forming this argument, the analyst wisely begins with the believability foundation for
this whole argument: Did event E really occur just because source i says it did? Also notice
in Figure 2.2 that we have indicated the uncertainty associated with each idea dot in this
argument. For example, the analyst only infers from E that F might have occurred and so
we note that we must consider F and ‘not F’ as possibilities. The same is true for the other
idea dot G and for the hypothesis dot H.
There are several important things to note about relevance arguments; the first concerns
their defense. Suppose the argument in Figure 2.2 was constructed by one analyst, who shows
it to a second analyst, who can have an assortment of quibbles about this argument. Suppose
the second analyst says, “You cannot infer F directly from E; you need another step here
involving event K. From E you can infer that K occurred, and then if K occurred, you can infer
F.” Now comes a third analyst, who has also listened to the argument and says, “I think your
whole argument is wrong. I see a different reasoning route from E to hypothesis H. From E we
can infer event R, from R we can infer event S, and from S we can infer T, which will show that
hypothesis H is less probable.” Whether or not there is any final agreement about the relevance
of evidence E*i, the first analyst has performed a real service by making the argument open and
available for discourse and criticism by colleagues. There are several important messages here.
First, there is no such thing as a uniquely correct argument from evidence to hypoth-
eses. What we all try to avoid are disconnects or non sequiturs in the arguments we
construct. But even when we have an argument that has no disconnects, someone may be
able to come up with a better argument. Second, we have considered only the simplest
possible situation in which we used just a single item of potential evidence. But intelli-
gence analysis and other evidence-based reasoning tasks are based on masses of evidence
of many different kinds and from an array of different sources. In this case, we are obliged
to consider multiple lines of argument that can be connected in different ways. It is
customary to call these complex arguments inference networks.
From years of experience teaching law students to construct defensible and persuasive
arguments from evidence, we have found that most of them often experience difficulty in
constructing arguments from single items of evidence; they quickly become overwhelmed
when they are confronted with argument construction involving masses of evidence.
But they gain much assistance in such tasks by learning about argument construction
methods devised nearly a hundred years ago by a world-class evidence scholar named
John H. Wigmore (1863–1943). Wigmore (1913, 1937) was the very first person to study
carefully what today we call inference networks. We will encounter Wigmore’s work in
several places in our discussions, and you will see that Disciple-EBR employs elements of
Wigmore’s methods of argument construction.


There is also a message here for critics, such as newswriters and the talking heads on
television. These critics always have an advantage never available to practicing intelligence
analysts. Namely, they know how things turned out or what actually happened in some
previously investigated matter. In the absence of clairvoyance, analysts studying a problem
will never know for sure, or be able to predict with absolute certainty, what will happen in
the future. A natural question to ask these critics is, “What arguments would you have
constructed if all you knew was what the analysts had when they made their assessments?”
This would be a very difficult question for them to answer fairly, even if they were given
access to the classified evidence the analysts may have known at the time.

2.1.6 What Do Our Dot Connections Mean?


The previous topic concerns efforts designed to establish the defensibility of complex
arguments. But what do these arguments mean to persons for whom these arguments
are being constructed? This question raises matters concerning how persuasive our
arguments are when they are taken all together. Our view is that the persuasiveness of an
argument structure depends, in large part, upon the nature of the probabilities we assess
and combine in making our arguments and in stating our major conclusions.
Here we consider the direction and force of our arguments based on the combined
evidence we have considered. Direction refers to the hypothesis we believe our evidence
favors most. Force means how strongly we believe the evidence favors this hypothesis over
alternative hypotheses we have considered. There are two uncontroversial statements we
can make about the force or weight of evidence. The first is that the force or weight of
evidence has vector-like properties. What this means is that evidence points us in the
direction of certain hypotheses or possible conclusions with varying degrees of strength.
The second is that the force or weight of evidence is always graded in probabilistic terms,
indicating our uncertainties or doubts about what the evidence means in terms of its
inferential direction and force. But beyond these two statements, controversies begin
to arise.
Before we consider assorted controversies, it is advisable to consider where our uncer-
tainties or doubts come from in the conclusions we reach from evidence. Have a look once
again at Figure 2.2, involving a simple example based on a single item of evidence. Our
evidence here was E*i, from source i, saying that event E occurred. We ask the question:
How strongly does this evidence E*i favor hypothesis H over not H? As we discussed, this
argument was indicated by what we termed idea dots, each one indicating what the analyst
constructing this argument believed to be sources of doubt or uncertainty associated with
the argument from the evidence to the hypothesis. As you see, there are two major origins
of uncertainty: those associated with the believability of source i, and those associated
with links in the analyst’s relevance argument. So, the force of evidence E*i on hypotheses
H and not H depends on how much uncertainty exists in this entire argument involving
each one of its believability and relevance links. The interesting message here is that
the evidence force or weight credential depends on its other two credentials: believability
and relevance.
In the simple example just discussed, there are four major origins of uncertainty, one
associated with believability and three associated with relevance. But this is the easiest
possible situation since it involves only one item of evidence. Think of how many sources
of uncertainty there might be when we have a mass of evidence together with multiple


complex and possibly interrelated arguments. The mind boggles at the enormity of the
task of assessing the force or weight of a mass of evidence commonly encountered in
intelligence analysis when we have some untold numbers of sources of believability and
relevance uncertainties to assess and combine (Schum, 1987). We are certain that critics of
intelligence analysts have never considered how many evidential and idea dots there
would be to connect.
So, the question remains: How do we assess and combine the assorted uncertainties in
complex arguments in intelligence analysis and in any other context in which we have the
task of trying to make sense out of masses of evidence? Here is where controversies arise.
The problem is that there are several quite different views among probabilists about what
the force or weight of evidence means and how it should be assessed and combined across
evidence in either simple or complex arguments: Bayesian, Belief Functions, Baconian,
and Fuzzy (Schum, 1994[2001a]). Each of these views has something interesting to say,
but no one view says it all, as discussed in Section 1.3.6.
Later in this book, we will discuss how Disciple-EBR allows you to assess and combine
probabilistic judgments in situations in which many such judgments are required. There is
further difficulty as far as judgments of the weight or force of evidence are concerned.
Analysts, or teams of analysts, may agree about the construction of an argument but
disagree, often vigorously, about the extent and direction of the force or weight this
argument reveals. There may be strong disagreements about the believability of sources
of evidence or about the strength of relevance linkages. These disagreements can be
resolved only when arguments are made carefully and are openly revealed so that they
can be tested by colleagues. A major mission of Disciple-EBR is to allow you to construct
arguments carefully and critically and encourage you to share them with colleagues so that
they can be critically examined.
There is one final matter of interest in making sense out of masses of evidence and
complex arguments. Careful and detailed argument construction might seem a very
laborious task, no matter how necessary it is. Now consider the task of revealing the
conclusions resulting from an analysis to some policy-making “customer” who has deci-
sions to make that rest in no small part on the results of an intelligence analysis. What this
customer will probably not wish to see is a detailed inference network analysis that
displays all of the dots that have been connected and the uncertainties that have been
assessed and combined in the process. A fair guess is that this customer will wish to have a
narrative account or a story about what the analysis predicts or explains. In some cases,
customers will require only short and not extensive narratives. This person may say, “Just
tell me the conclusions you have reached and briefly why you have reached them.” So the
question may be asked: Why go to all the trouble to construct defensible and persuasive
arguments when our customers may not wish to see their details?
There is a very good answer to the question just raised. Your narrative account of an
analysis must be appropriately anchored on the evidence you have. What you wish to be able
to tell is a story that you believe contains some truth; that is, it is not just a good story. The
virtue of careful and critical argument construction is that it will allow you to anchor your
narrative not only on your imagination, but also on the care you have taken to subject your
analysis to critical examination. There is no telling what questions you might be asked about
your analysis. Rigor in constructing your arguments from your evidence is the best protec-
tion you have in dealing with customers and other critics who might have entirely different
views regarding the conclusions you have reached. Disciple-EBR is designed to allow you
and others to evaluate critically the arguments you have constructed.


2.2 SAMPLE EVIDENCE-BASED REASONING TASK: INTELLIGENCE ANALYSIS

The purpose of intelligence analysis is to answer questions arising in the decision-making


process by analyzing evidence about the world, such as, “Does Al Qaeda have nuclear
weapons?” or, “Will the United States be the world leader in nonconventional energy
sources within the next decade?” This is done by determining the probabilities of alterna-
tive (hypothesized) answers, based on evidence, and by selecting the most likely answer.
As discussed in Section 1.4.2, intelligence analysis, like other evidence-based reasoning
tasks, can be viewed as ceaseless discovery of evidence, hypotheses, and arguments in
a nonstationary world, involving collaborative processes of evidence in search of hypoth-
eses, hypotheses in search of evidence, and evidentiary testing of hypotheses (see
Figure 1.9, p. 27). Since these processes are generally very complex and involve both
imaginative and critical reasoning, they can be best approached through the synergistic
integration of the analyst’s imaginative reasoning and the computer’s knowledge-based
critical reasoning, as was discussed in Section 1.5.2, and will be illustrated with the use of
the Disciple-EBR cognitive assistant.
The following sections illustrate this systematic approach to intelligence analysis by
using a specific example of anticipatory analysis. At the same time, this example intro-
duces the main concepts related to evidence and inference.

2.2.1 Evidence in Search of Hypotheses


Consider that you are an intelligence analyst and you read in today’s Washington Post an
article that concerns how safely radioactive materials are stored in this general area.
Willard, the investigative reporter and author of this piece, begins by noting how the
storage of nuclear and radioactive materials is so frequently haphazard in other countries
and wonders how carefully these materials are guarded here in the United States, particu-
larly in this general area. In the process of his investigations, the reporter notes his
discovery that a canister containing cesium-137 has gone missing from the XYZ Company
in Maryland, just three days ago. The XYZ Company manufactures devices for sterilizing
medical equipment and uses cesium-137 in these devices along with other radioactive
materials. This piece arouses your curiosity because of your concern about terrorists
planting dirty bombs in our cities. The question is: What hypotheses would explain this
observation? You experience a flash of insight that a dirty bomb may be set off in the
Washington, D.C., area (see Figure 2.4).
However, no matter how imaginative or important this hypothesis is, no one will take it
seriously unless you are able to justify it. So you develop the chain of abductive inferences
shown in Table 2.1 and in Figure 2.5.
The chain of inferences from Table 2.1 and Figure 2.5 shows clearly the possibility that
a dirty bomb will be set off in the Washington, D.C., area. Can you then conclude that this
will actually happen? No, because there are many other hypotheses that may explain this
evidence, as shown in Figure 2.6 and discussed in the following paragraphs.
Just because there is evidence that the cesium-137 canister is missing does not mean
that it is indeed missing. At issue here is the believability of Willard, the source of this
information. What if this Willard is mistaken or deceptive? Thus an alternative hypothesis
is that the cesium-137 canister is not missing.


Table 2.1 Abductive Reasoning Steps Justifying a Hypothesis

There is evidence that the cesium-137 canister is missing (E*).


Therefore, it is possible that the cesium-137 canister is indeed missing (H1).
Therefore, it is possible that the cesium-137 canister was stolen (H2).
Therefore, it is possible that the cesium-137 canister was stolen
by someone associated with a terrorist organization (H3).
Therefore, it is possible that the terrorist organization will use
the cesium-137 canister to construct a dirty bomb (H4).
Therefore, it is possible that the dirty bomb will be set off
in the Washington, D.C., area (H5).

Figure 2.4. Hypothesis generation through imaginative reasoning.

But let us assume that the cesium-137 canister is indeed missing. Then it is possible
that it was stolen. But it is also possible that it was misplaced, or maybe it was used in a
project at the XYZ Company without being checked out from the warehouse?
However, let us assume that the cesium-137 canister was indeed stolen. It is then
possible that it might have been stolen by a terrorist organization, but it is also possible
that it might have been stolen by a competitor or by an employee, and so on.
This is the process of evidence in search of hypotheses, shown in the left side of Figure 1.9
(p. 27). You cannot conclude that a dirty bomb will be set off in the Washington, D.C., area
(i.e., hypothesis H5) until you consider all the alternative hypotheses and show that those
on the chain from E* to H5 are actually more likely than their alternatives. But to analyze all
these alternative hypotheses and make such an assessment, you need additional evidence.
How can you get it? As represented in the middle of Figure 1.9, you put each hypothesis to
work to guide you in the collection of additional evidence. This process is discussed in the
next section.


Figure 2.5. Justification of the generated hypothesis.

2.2.2 Hypotheses in Search of Evidence


Let us first consider the hypothesis “H1: missing” from the bottom of Figure 2.6, shown
as “H1: cesium-137 canister is missing from warehouse,” in the top-left of Figure 2.7. The
question is: Assuming that this hypothesis is true, what other things should be observable?
What are the necessary and sufficient conditions for an object to be missing from a
warehouse? It was in the warehouse, it is no longer there, and no one has checked it out.
This suggests the decomposition of the hypothesis H1 into three simpler hypotheses, as
shown in the left part of Figure 2.7. This clearly indicates that you should look for evidence
that indeed the cesium-137 canister was in the warehouse, that it is no longer there, and
that no one has checked it out. That is, by putting hypothesis H1 to work, you were guided
to perform the collection tasks from Table 2.2, represented in Figure 2.7 by the gray circles.
Guided by the evidence collection tasks in Table 2.2, you contact Ralph, the super-
visor of the XYZ warehouse, who provides the information shown in Table 2.3 and in
Figure 2.7.
When you are given testimonial information, or descriptions of tangible items, the
information might contain very many details, dots, or trifles. Some of the details might be
interesting and relevant evidence, and others not. What you always have to do is to parse
the information to extract the information that you believe is relevant in the inference task


Figure 2.6. Competing hypotheses explaining an item of evidence.

Figure 2.7. Hypothesis-driven evidence collection and hypothesis testing.


Table 2.2 Evidence Collection Tasks Obtained from the Analysis in Figure 2.7

Collection Task1: Look for evidence that the cesium-137 canister was in the XYZ warehouse before
being reported as missing.
Collection Task2: Look for evidence that the cesium-137 canister is no longer in the XYZ
warehouse.
Collection Task3: Look for evidence that the cesium-137 canister was not checked out from the
XYZ warehouse.

Table 2.3 Information Obtained through the Collection Tasks in Table 2.2

INFO-002-Ralph: Ralph, the supervisor of the warehouse, reports that the cesium-137 canister is
registered as being in the warehouse and that no one at the XYZ Company had checked it out, but it is
not located anywhere in the hazardous materials locker. He also indicates that the lock on the
hazardous materials locker appears to have been forced.

Table 2.4 Dots or Items of Evidence Obtained from Willard and Ralph

E001-Willard: Willard’s report in the Washington Post that a canister containing cesium-137 was
missing from the XYZ warehouse in Baltimore, MD.
E002-Ralph: Ralph’s testimony that the cesium-137 canister is registered as being in the XYZ
warehouse.
E003-Ralph: Ralph’s testimony that no one at the XYZ Company had checked out the cesium-137
canister.
E004-Ralph: Ralph’s testimony that the canister is not located anywhere in the hazardous materials
locker.
E005-Ralph: Ralph’s testimony that the lock on the hazardous materials locker appears to have
been forced.

at hand. Consider, for example, the information provided by Willard in his Washington
Post article. You parse it to extract the relevant information represented as E001-Willard in
Table 2.4. Similarly, Ralph’s testimony from Table 2.3 provides you with several dots or
items of evidence that are relevant to assessing the hypotheses from Figure 2.7. These
items of evidence are represented in Table 2.4.
This is the process of hypotheses in search of evidence that guides you in collecting new
evidence. The next step now is to assess the probability of hypothesis H1 based on the
collected evidence, as represented in the right-hand side of Figure 1.9 (p. 27), and
discussed in the next section.

2.2.3 Evidentiary Testing of Hypotheses


Having identified evidence relevant to the leaf hypotheses in Figure 2.7, the next step is to
use it in order to assess these hypotheses. The assessments of the hypotheses will be done


by using probabilities that are expressed in words rather than in numbers. In particular,
we will use the ordered symbolic probability scale from Table 2.5. This is based on a
combination of ideas from the Baconian and Fuzzy probability systems (Schum, 1994
[2001a], pp. 243–269). As in the Baconian system, “no support” for a hypothesis means that
we have no basis to consider that the hypothesis might be true. However, we may later
find evidence that may make us believe that the hypothesis is “very likely,” for instance.
To assess the hypotheses, you first need to attach each item of evidence to the
hypothesis to which it is relevant, as shown in the right side of Figure 2.7. Then you need
to establish the relevance and the believability of each item of evidence, which will result in
the inferential force of that item of evidence on the corresponding hypothesis, as illustrated
in the right side of Figure 2.7 and explained in the following.
So let us consider the hypothesis “H13: cesium-137 canister was not checked out from
the warehouse” and the item of evidence “E003-Ralph: Ralph’s testimony that no one at
the XYZ Company had checked out the cesium-137 canister.”
Relevance answers the question: So what? How does E003-Ralph bear on the hypothesis
H13 that you are trying to prove or disprove? If you believe what E003-Ralph is telling you,
then H13 is “certain.”
Believability answers the question: To what extent can you believe what E003-Ralph is
telling you? Let us assume this to be “very likely.”
Inferential force or weight answers the question: How strong is E003-Ralph in favoring
H13? Obviously, an item of evidence that is not relevant to the considered hypothesis will
have no inferential force on it and will not convince you that the hypothesis is true. An
item of evidence that is not believable will have no inferential force either. Only an item
of evidence that is both very relevant and very believable will make you believe that
the hypothesis is true. In general, the inferential force of an item of evidence (such as
E003-Ralph) on a hypothesis (such as H13) is the minimum of its relevance and its
believability. You can therefore conclude that, based on E003-Ralph, the probability of
the hypothesis H13 is “very likely” (i.e., the minimum of “certain” and “very likely”), as
shown in Figure 2.7.
Notice in Figure 2.7 that there are two items of evidence that are relevant to the
hypothesis H12. In this case, the probability of H12 is the result of the combined (max-
imum) inferential force of these two items of evidence.
Once you have the assessments of the hypotheses H11, H12, and H13, the assessment of
the hypothesis H1 is obtained as their minimum, because these three subhypotheses are
necessary and sufficient conditions for H1. Therefore, all need to be true in order for H1 to
be true, and H1 is as weak as its weakest component.
Thus, as shown at the top-right side of Figure 2.7, you conclude that it is “very likely”
that the cesium-137 canister is missing from the warehouse.
Notice that this is a process of multi-intelligence fusion since, in general, the assessment
of a hypothesis involves fusing different types of evidence.
Figure 2.8 summarizes the preceding analysis, which is an illustration of the general
framework from Figure 1.9 (p. 27).

Table 2.5 Ordered Symbolic Probability Scale

no support < likely < very likely < almost certain < certain
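To make the min/max calculus just described concrete, here is a small Python sketch (ours,
not the actual Disciple-EBR implementation) that combines symbolic probabilities on the
ordered scale of Table 2.5: the inferential force of an item of evidence is the minimum of its
relevance and believability, several items of evidence bearing on the same hypothesis are
fused by taking the maximum, and a hypothesis whose subhypotheses are jointly necessary
and sufficient receives the minimum of their assessments. The values below are illustrative,
consistent with the discussion of Figure 2.7.

# Ordered symbolic probability scale of Table 2.5, from lowest to highest.
SCALE = ["no support", "likely", "very likely", "almost certain", "certain"]

def sym_min(*values):
    # Minimum on the symbolic scale: used for min(relevance, believability)
    # and for conjunctions of necessary and sufficient subhypotheses.
    return min(values, key=SCALE.index)

def sym_max(*values):
    # Maximum on the symbolic scale: used to fuse several items of evidence
    # that bear on the same hypothesis.
    return max(values, key=SCALE.index)

# Illustrative assessments for the H1 example discussed above.
force_E003 = sym_min("certain", "very likely")  # relevance and believability of E003-Ralph
H13 = force_E003                                # "very likely"
H11 = "almost certain"
H12 = sym_max("very likely", "very likely")     # two items of evidence bearing on H12
H1 = sym_min(H11, H12, H13)
print(H1)                                       # prints "very likely"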


Figure 2.8. An illustration of the general framework from Figure 1.9 (p. 27).

Now that you have concluded “H1: missing,” you repeat this process for the upper
hypotheses (i.e., H2: stolen, H’2: misplaced, and H”2: used in project), as will be discussed
in the next section.

2.2.4 Completing the Analysis


Let us first consider the hypothesis “H2: stolen.” You need to put this hypothesis to work to
guide you in collecting relevant evidence for its analysis. During your investigation of the
security camera of the warehouse, you discover a video segment showing a person loading
a container into a U-Haul panel truck (E007-SecurityCamera). This new item of evidence,
together with Ralph’s testimony that the lock on the hazardous materials locker appears
to have been forced (E005-Ralph in Table 2.4), suggests the following scenario of how
the cesium-137 canister might have been stolen (see Figure 2.9): The truck entered the
company, the canister was stolen from the locker, the canister was loaded into the truck, and
the truck left with the canister.
Such scenarios have enormous heuristic value in advancing the investigation because
they consist of mixtures of what is taken to be factual and what is conjectural. Conjecture is
necessary in order to fill in natural gaps left by the absence of existing evidence. Each such
conjecture opens up new avenues of investigation and, if the scenario turns out to be true,
leads to the discovery of additional evidence. For instance, the first hypothesized action from
the scenario (“Truck entered company”) leads you to check the record of the security
guard, which shows that a panel truck bearing Maryland license plate number MDC-578
was in the XYZ parking area the day before the discovery that the cesium-137 canister was
missing (E008-GuardReport).
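
The way such a conjectured scenario puts the investigation to work can be sketched as follows. This is only a hypothetical illustration (the action and query strings simply restate the narrative above, and the last action shows a conjecture whose collection requests are still to be identified), not actual Disciple-EBR code:

scenario = "The cesium-137 canister was stolen with a truck"

hypothesized_actions = {
    "Truck entered company": [
        "Check the record of the security guard for trucks in the XYZ parking area",
    ],
    "Cesium-137 canister stolen from locker": [
        "Check whether the cesium-137 canister is missing from the warehouse",
        "Check whether the hazardous materials locker was forced",
    ],
    "Cesium-137 canister loaded into truck": [
        "Review the warehouse security camera footage",
    ],
    "Truck left with the cesium-137 canister": [],  # conjecture; collection still to be identified
}

def collection_requests(actions):
    """Flatten the hypothesized actions into (action, request) collection tasks."""
    return [(action, request)
            for action, requests in actions.items()
            for request in requests]

print(scenario)
for action, request in collection_requests(hypothesized_actions):
    print(" ", action, "->", request)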


Figure 2.9. Another example of hypothesis-driven evidence collection and hypothesis testing: the hypothesis “H2: cesium-137 canister was stolen” is assessed as very likely, based on the scenario in which the truck entered the company, the canister was stolen from the locker, the canister was loaded into the truck, and the truck left with the canister.

The second hypothesized action in the scenario (i.e., “cesium-137 canister stolen from
locker”) is further decomposed into two hypotheses. The first one was already analyzed:
“It is very likely that the cesium-137 canister is missing from the warehouse.” The second
subhypothesis (“Warehouse locker was forced”) is supported both by Ralph’s testimony
(i.e., E005-Ralph in Table 2.4) and by the testimony of the professional locksmith Clyde,
who was asked to examine it (E006-Clyde: Clyde’s testimony that the lock has been
forced, but that it was a clumsy job).
After continuing the process for the remaining hypothesized actions in the scenario and
fusing all the discovered evidence, you and Disciple-EBR conclude that it is “very likely”
that the cesium-137 canister was stolen.
You repeat the same process for the other two competing hypotheses, “H’2: misplaced,”
and “H”2: used in project.” However, you find no evidence that the cesium-137 canister
might have been misplaced. Moreover, you find disfavoring evidence for the second com-
peting hypothesis: Grace, the Vice President for Operations at XYZ, tells us that no one at the
XYZ Company had checked out the canister for work on any project (E014-Grace).
Thus you conclude that the cesium-137 canister was stolen and you continue the analysis by
investigating the next level up of competing hypotheses: “H3: stolen by terrorist organiza-
tion”; “H’3: stolen by competitor”; and “H”3: stolen by employee.” Of course, at any point,
the discovery of new evidence may lead you to refine your hypotheses, define new
hypotheses, or eliminate existing hypotheses.
This example is not as simple as this presentation may suggest. It is the
methodology that guides you and makes it look simple. Many things can and will indeed
go wrong. But the computational theory of evidence-based reasoning and Disciple-EBR


provide you with the means to deal with them. Based on evidence, you may come up with some
hypotheses and then be unable to find evidence to support any of them. You then need to
come up with other hypotheses, and you should always consider alternative hypotheses.
The deduction-based decomposition approach guides you on how to look for evidence,
but your knowledge and imagination also play a crucial role. As illustrated here, you
imagined a scenario where the cesium-137 canister was stolen with a truck. But let us now
assume that you did not find supporting evidence for this scenario. Should you conclude
that the cesium-137 canister was not stolen? No, because this was just one scenario. If you
can prove it, you have an assessment of your hypothesis. However, if you cannot prove it,
there still may be another scenario of how the cesium-137 canister might have been
stolen. Maybe the cesium-137 canister was stolen by someone working at the XYZ
Company. Maybe it was stolen by Ralph, the supervisor of the warehouse. The import-
ant thing is that each such scenario opens a new line of investigation and a new way to
prove the hypothesis.
Having established that the cesium-137 canister was stolen, you would further like to
determine by whom and for what purpose. If it is for building and setting off a dirty bomb,
you would like to know who will do this; where exactly in the Washington, D.C., area the
bomb will be set off; precisely when this action will happen; what form of dirty bomb will
be used; and how powerful it will be. These are very hard questions that the computational
theory of evidence-based reasoning (as well as its current implementation in Disciple-
EBR) will help you answer.
One major challenge in performing such an analysis is the development of argumenta-
tion structures. An advantage of using an advanced tool, such as Disciple-EBR, is that it
can learn reasoning rules from the user to greatly facilitate and improve the analysis of
similar hypotheses, as will be shown in the next chapters of this book.
In conclusion, the computational theory of evidence-based reasoning presented in
this volume, as well as its current implementation in Disciple-EBR, provides a framework
for integrating the art and science of evidence-based reasoning, to cope with its aston-
ishing complexity.
More details about intelligence analysis are presented in our book Intelligence Analy-
sis as Discovery of Evidence, Hypotheses, and Arguments: Connecting the Dots (Tecuci
et al., 2016). Other examples of applications of evidence-based reasoning are presented
in the next section.

2.3 OTHER EVIDENCE-BASED REASONING TASKS

2.3.1 Cyber Insider Threat Discovery and Analysis


Cyber insider threats are persons who operate inside an organization and use legitimate
access and normal tactics to accomplish abnormal and malicious cyber-missions, such as
to perform data reconnaissance, collection, and exfiltration, or to create vulnerabilities for
cyber-attacks by outsiders. These persons represent a major national security concern, a
critical concern for businesses that need to protect their intellectual property, as well as a
huge privacy concern.
In the following, we will present a hypothetical agent for cyber-insider threat discovery
and analysis. This example was developed with the assistance of Professor Angelos Stavrou


from George Mason University. We will assume that we have a set of monitoring agents
that perform persistent surveillance of a computer network and host systems. They
include login monitors, file system monitors, internal network monitors, port monitors,
outside network monitors, and others. These monitoring agents are constantly looking for
indicators and warnings of insider missions.
Let us further assume that the evidence collection agents have detected a record inside
network logs involving an instance of denied access from the device with the Internet
Protocol address IP1 to the device with the address IP2, at time T. This is evidence E* at the
bottom of Figure 2.10. While this denied service access might be a normal event of an
accidental access to a shared resource generated by legitimate browsing on a local host, it
can also be an indication of an improper attempt to access a shared network resource.
Therefore, the question is: What insider missions might explain this observation?
By means of abductive reasoning, which shows that something is possibly true, the
analysis agent may formulate the chain of explanatory hypotheses from the left side of
Figure 2.10:
It is possible that the observed denied access from IP1 to IP2 is part of a sequence of
attempted network service accesses from IP1 to other IPs (hypothesis H11). It is further
possible that this is part of a network scan for files (hypothesis H21). It is possible that this
network scan is in fact a malicious attempt to discover network shared files (hypothesis
H31), part of malicious covert reconnaissance (hypothesis H41), which may itself be part
of a covert reconnaissance, collection, and exfiltration mission (hypothesis H51).
As one can notice, these hypotheses are very vague at this point. Moreover, for each of
these hypotheses there are alternative hypotheses, as shown in the right-hand side of
Figure 2.10. For example, the denied access may be part of a single isolated attempt
(hypothesis H12). However, even in the case where we have established that the denied
service access is part of a sequence of accesses (hypothesis H11), it is still possible that this

Figure 2.10. Evidence in search of insider missions (abductive reasoning): from E* (log record of denied service access), the chain of explanatory hypotheses H11, H21, H31, H41, and H51 is abduced, each paired with an alternative hypothesis (H12, H22, H32, H42, and H52, respectively).


sequence of accesses is due to recent policy changes that affected the user’s access to
specific services or objects (hypothesis H22).
What the agent needs to do is to test each of these alternative hypotheses, starting
from the bottom up, to make sure that, if an insider mission is actually being performed, it is
promptly detected. Each of the bottom-level alternative hypotheses (i.e., H11 and H12)
is put to work to guide the collection of relevant evidence (see Figure 2.11). The
discovered evidence may lead to the refinement of the hypotheses, including the
possible formulation of new hypotheses, and these refined hypotheses may lead to
new evidence. Next, the discovered evidence is used to assess the probabilities of the
bottom-level alternative hypotheses.
Assuming that the most likely hypothesis was determined to be H11, this process
continues with the next level up of alternative hypotheses (i.e., H21 and H22), using them
to collect evidence, and assessing which of them is most likely, as illustrated in Figure 2.12.
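
This bottom-up testing strategy can be summarized by the following hypothetical Python sketch, in which collect_and_assess stands for the entire evidence collection and assessment activity and simply returns the illustrative values from Figures 2.11 and 2.12:

SCALE = ["no support", "likely", "very likely", "almost certain", "certain"]

# Competing hypotheses at the first levels of the chain in Figure 2.10, bottom up.
levels = [
    ["H11: Sequence of accesses", "H12: Single isolated attempt"],
    ["H21: Network scan for files", "H22: Policy/user access changed"],
    ["H31: Non-account owner scan for files", "H32: Account owner scan for files"],
]

def collect_and_assess(hypothesis):
    """Placeholder for putting a hypothesis to work: generate collection requests,
    gather evidence, and return a symbolic assessment (illustrative values only)."""
    illustrative = {
        "H11: Sequence of accesses": "very likely",
        "H12: Single isolated attempt": "no support",
        "H21: Network scan for files": "very likely",
        "H22: Policy/user access changed": "no support",
        "H31: Non-account owner scan for files": "very likely",
        "H32: Account owner scan for files": "no support",
    }
    return illustrative[hypothesis]

for competing in levels:
    assessments = {h: collect_and_assess(h) for h in competing}
    best = max(assessments, key=lambda h: SCALE.index(assessments[h]))
    print(f"Most likely: {best} ({assessments[best]})")
    if assessments[best] == "no support":
        break  # no hypothesis at this level is supported; stop the analysis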
Now, since H21 was assessed as being “very likely,” it is possible that H31 is true. But it is
also possible that H32 is true. The right-hand side of Figure 2.13 illustrates the process of
using the hypothesis H31 in order to guide the collection of evidence to test it:

Figure 2.11. Evidence collection and assessment of the bottom-level hypotheses in Figure 2.10: H11 (sequence of accesses) is assessed as very likely, while H12 (single isolated attempt) has no support.

Figure 2.12. Evidence collection and assessment for the next level up of alternative hypotheses: H21 (network scan for shared files from IP1, between T1 and T2) is assessed as very likely, while H22 (policy/user access changed) has no support.


Figure 2.13. Evidence collection and assessment of higher-level alternative hypotheses: H31 (non-account owner on IP1 scanned the network for shared files between T1 and T2) is assessed as very likely, while H32 (account owner scan) has no support.

If “H31: Non-account owner on IP1 scanned the network for shared files, between
T1 and T2” were true
Then the following subhypotheses would also be true:
“Non-account owner accessed an account on computer C1 between T1 and T2”
“Network scan for shared files from IP1, between T1 and T2”

To collect evidence for the first subhypothesis, we need to consider possible scenarios for
a non-account owner to access computer C1. The scenario illustrated in Figure 2.13 is a
physical access to C1 in the conference room CR1 where C1 is located. Another possible
scenario is a virtual access to C1.
As discussed in Section 2.2.4, such scenarios have enormous heuristic value in advan-
cing the investigation. In this case, for example, we are guided toward searching for
persons who were present in CR1 between T1 and T2. As indicated at the bottom of
Figure 2.13, there are several possible strategies to look for such persons:

Search for persons who entered CR1 before T2, based on door logs.
Search for persons who entered CR1 before T2, based on scheduled meetings participants.
Search for persons who entered CR1 before T2, based on outside surveillance video
camera VC1.

Notice that these are precise queries that can be answered very fast. Notice also that, in this
particular case, they involve all-source (non-computer) evidence. It is very important to be
able to use both computer and non-computer evidence to discover cyber insider threats.
This process will continue until the top-level hypothesis is assessed, as illustrated in
Figure 2.14.


Figure 2.14. Evidence collection and assessment of top-level alternative hypotheses: H51 (P1, a non-account owner on IP1, performed covert reconnaissance, collection, and exfiltration) is assessed as likely, while H52 (covert reconnaissance for remote vulnerabilities) has no support.

2.3.2 Analysis of Wide-Area Motion Imagery


Capabilities exist today to persistently monitor fixed geographic locations (such as conflict
areas) as wide as 100 km², for long periods of time, using electro-optic sensors (see
Figure 2.15). This leads to the collection of huge amounts of data to be used either in
real-time analysis or in forensic analysis. During real-time analysis, analysts attempt to
discover impending threat events (e.g., ambush, kidnapping, rocket launch, false check-
point, suicide bomber, and improvised explosive devices [IEDs]) in time to react. During
forensic analysis, the analysts backtrack from such an event (e.g., an ambush) in order to
discover the participants, possible related locations and events, and the specific move-
ment patterns (Desai, 2009). The problem, however, is that the manual analysis of these
huge amounts of data would require thousands of analysts. Thus the use of a cognitive
assistant, such as Disciple-EBR, is very helpful.
Let us consider an analyst who performs real-time analysis of the wide area motion
imagery of a region characterized by insurgency operations. Road work at a highway
intersection suggests the hypothesis that there is an ambush threat to the U.S. forces.
Table 2.6 shows a chain of abductive inferences leading to this hypothesis. The corres-
ponding intermediary hypotheses and their alternatives are also shown in the left-hand
side of Figure 2.16. All these hypotheses would need to be analyzed.
The analyst uses Disciple-EBR to analyze these hypotheses. Because Disciple-EBR was
previously taught how to analyze such hypotheses, it automatically decomposes them into
increasingly simpler hypotheses that guide the collection of relevant evidence. This is
illustrated in the middle of Figure 2.16 with the top hypothesis “Hk: Ambush threat to U.S.
forces at Al Batha highway junction.” Thus, as indicated in the middle of Figure 2.16,
an ambush threat at a certain location requires that location to be good for ambush


Table 2.6 Evidence in Search of Hypotheses through Abductive Reasoning

E*i: There is evidence of road work at 1:17 am at the Al Batha highway junction.
➔ Ei: It is possible that there is indeed road work at the Al Batha highway junction.
➔ Ha: It is possible that the road work is for blocking the road.
➔ Hc: It is possible that there is ambush preparation at the Al Batha highway junction.
➔ Hk: It is possible that there is an ambush threat at the Al Batha highway junction.

Figure 2.15. Wide-area motion imagery (background image reprinted from www.turbophoto.com/
Free-Stock-Images/Images/Aerial%20City%20View.jpg).

(Hb: Ambush location), and there should also be some observable ambush preparation
activities (Hc: Ambush preparation). Further on, to be a good location for ambush requires
the corresponding route to be used by the U.S. forces (Hd: Blue route), and there should also
be cover at that location (He: Cover). This directly guides the analyst to check whether the U.S.
forces are using that route. It also guides the analyst to analyze images of that location for the
existence of cover. Having obtained the corresponding evidence, the analyst assesses its
support of the corresponding subhypotheses, and Disciple-EBR automatically aggregates
these assessments, concluding that it is almost certain that the location is good for ambush.
The rest of the analysis is developed in a similar way. The “Hc: Ambush preparation”
activity is automatically decomposed into three simpler activities: “Hf: Deployment,”
“Ha: Road blocking,” and “Hp: Move to cover.” Further on, “Hf: Deployment” is decom-
posed into “Hg: Vehicle deployment” and “Hm: Insurgent vehicle,” the last one being
further decomposed into two subhypotheses. All these simpler subhypotheses guide the
collection of corresponding relevant evidence which, in this example, is found and
evaluated, leading Disciple-EBR to infer that the top-level hypothesis is very likely.
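
The reduction and synthesis process applied to such an analysis tree can be approximated by the following sketch. The tree follows a fragment of Figure 2.16, but the leaf assessments and the use of the minimum function at every node are only illustrative assumptions, not the book's exact values:

SCALE = ["no support", "likely", "very likely", "almost certain", "certain"]

def synthesize(values, function=min):
    return function(values, key=SCALE.index)

def assess(node):
    """node is (name, assessment) for an elementary hypothesis,
    or (name, synthesis_function, children) for a reduced hypothesis."""
    if len(node) == 2:
        return node[1]
    _name, function, children = node
    return synthesize([assess(child) for child in children], function)

tree = ("Hk: Ambush threat", min, [
    ("Hb: Ambush location", min, [
        ("Hd: Blue route", "certain"),
        ("He: Cover", "almost certain"),
    ]),
    ("Hc: Ambush preparation", min, [
        ("Hf: Deployment", "very likely"),
        ("Ha: Road blocking", "almost certain"),
        ("Hp: Move to cover", "very likely"),
    ]),
])

print(assess(tree))   # -> "very likely"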
Let us now consider that the analyst does not perform real-time analysis, but forensic
analysis. The ambush has already taken place (hence a third subhypothesis, “Hq: Ambush
execution,” of the top-level hypothesis), and the goal is to trace back the wide-area motion


Figure 2.16. Another illustration of the general reasoning framework from Figure 1.9 (p. 27): abductive reasoning (evidence in search of hypotheses), deductive reasoning (hypotheses in search of evidence), and inductive reasoning (evidentiary testing of hypotheses), applied to the hypothesis “Hk: Ambush threat to U.S. forces at Al Batha highway junction,” which is assessed as very likely.

imagery in order to identify the participants together with the related locations and events.
The analyst and Disciple-EBR develop a similar analysis tree, as discussed previously,
which leads to the following hypotheses from the bottom of Figure 2.16: “Hn: Vehicle
departed from facility” and “Ho: Insurgent facility.” Thus, forensic analysis leads to the
discovery of the facility from which the insurgents have departed, identifying it as an
insurgent facility.
The same approach can also be used as a basis for the development of collaborative
autonomous agents engaged in persistent surveillance and interpretation of unconstrained
dynamic environments, continuously generating and testing hypotheses about the state of
the world. Consider, for example, the use of such agents in counterinsurgency operations
with the mission to automatically discover threat activities, such as IEDs, suicide bombers,
rocket launches, kidnappings, or ambushes. Discovery by sensor agents of road work at a
location that is often used by the U.S. forces leads to the hypothesis that there is an
ambush threat at that location. This hypothesis is then automatically decomposed into
simpler and simpler hypotheses, as discussed previously, guiding the agents to discover
additional evidence. Then the ambush threat hypothesis is automatically assessed and an
alert is issued if its probability is above a certain threshold.

2.3.3 Inquiry-based Teaching and Learning in a Science Classroom


2.3.3.1 Need for Inquiry-based Teaching and Learning
Significant progress has been made in K–12 science education with the development of
the National Science Education Standards (NRC, 1996). These standards call for inquiry-
based teaching and learning that, according to the standards, “refers to the diverse ways in
which scientists study the natural world and propose explanations based on the evidence
derived from their work. Inquiry also refers to the activities of students in which they


develop knowledge and understanding of scientific ideas, as well as an understanding of
how scientists study the natural world” (NRC, 1996, p. 23). “The research base on learning
and on effective learning environments makes a strong case for inquiry-based
approaches” (NRC, 2000, p. 128). The standards indicate that “teaching through inquiry
is effective” (NRC, 2000, p. 126) and that the appropriate use of inquiry by the teachers
“can have a powerful influence on their students’ science learning” (NRC, 2000, p. 128).
However, “for students to understand inquiry and use it to learn science, their teachers
need to be well-versed in inquiry and inquiry-based methods. Yet most teachers have not
had opportunities to learn science through inquiry or to conduct scientific inquiries
themselves. Nor do many teachers have the understanding and skills they need to use
inquiry thoughtfully and appropriately in their classrooms” (NRC 2000, p. 87). “Currently,
K–12 science education in the United States . . . does not provide students with engaging
opportunities to experience how science is actually done” (NRC, 2011, p. 1). “Instruction
throughout K–12 education is likely to develop science proficiency if it provides students
with opportunities for a range of scientific activities and scientific thinking, including, but
not limited to: inquiry and investigation, collection and analysis of evidence, logical
reasoning, and communication and application of information” (NRC, 2010, p. 137).
In this section, we will illustrate the use of a hypothetical Disciple-EBR agent in the
science classroom. We will call this hypothetical agent Inquirer, to emphasize its purpose.

2.3.3.2 Illustration of Inquiry-based Teaching and Learning


The illustration presented in this section follows closely the introductory example of
inquiry in a science classroom from Inquiry and the National Science Education Standards
(NRC, 2000, pp. 5–11), modified only to show a hypothetical use of Inquirer.
Several of the students in Mrs. Graham’s science class were excited when they returned
to their room after recess one fall day. They pulled their teacher over to a window, pointed
outside, and said, ‘We noticed something about the trees on the playground. The left tree
had lost all its leaves while the right tree had lush, green leaves. Why are those trees
different? They used to look the same, didn’t they?’ Mrs. Graham didn’t know the answer.
Mrs. Graham knew that her class was scheduled to study plants later in the year, and
this was an opportunity for them to investigate questions about plant growth that they had
originated and thus were especially motivated to answer. Although she was uncertain
about where her students’ questions would lead, Mrs. Graham chose to take the risk of
letting her students pursue investigations with Inquirer’s assistance and her guidance.
“Let’s make a list of hypotheses that might explain what’s happening to the left tree.” They
came up with a list of competing explanatory hypotheses, including the following ones
(shown also in the top-left of Figure 2.17):

H1: The left tree has lost its leaves because there is too much water at its root.
H2: The left tree has lost its leaves because it is older than the right tree.
H3: The left tree has lost its leaves because it is ill.

She then invited each student to pick one explanatory hypothesis, which led to several
groups: a “water” group, an “age” group, an “illness” group, and so on. She asked each
group to use the Inquirer assistant in order to plan and conduct a simple investigation to
test their preferred hypothesis.
For the next three weeks, science periods were set aside for each group to carry out
its investigation. Each group used the Inquirer assistant to conduct its investigation,


Figure 2.17. Systematic inquiry with a cognitive assistant (based on NRC, 2000, pp. 5–11): the competing hypotheses H1, H2, and H3; the decomposition of H1 into H1a and H1b; and their evidence-based assessments (H1a very likely, H1b almost certain, and therefore H1 very likely).

discovering a variety of sources with information about characteristics of trees, their
life cycles, and their environments. For example, the water group entered the following
hypothesis into Inquirer:

H1: The left tree has lost its leaves because there is too much water at its root.

The group’s members reasoned that, if this hypothesis were true, then two simpler
subhypotheses would also need to be true:

H1a: There is too much water at the root of the left tree.
H1b: Too much water at a tree’s root causes it to lose its leaves and die.

Therefore, they decomposed H1 into H1a and H1b by entering them into Inquirer (see the
middle part of Figure 2.17) and decided to assess them based on evidence. As a result,
Inquirer guided the students to look for both favoring and disfavoring evidence for each of
these two subhypotheses.
To collect relevant evidence for H1a, the students decided to look at the ground around
the two trees every hour that they could. They took turns on making individual observa-
tions, and since some of them lived near the school, their observations continued after
school hours and on weekends. Even though they missed some hourly observations, they
had sufficient data, which they introduced into Inquirer as evidence E*1 favoring H1a, because
their observations confirmed the presence of excessive water at the root of the tree. As a
result, Inquirer extended the analysis of H1a from the middle of Figure 2.17 with the blue
tree shown in Figure 2.18, asking the students to assess the relevance of E1 (the event
indicated by the evidence E*1) with respect to H1a, as well as the believability of the
evidence E*1.
Inquirer reminded the students that relevance answers the question: So what? May E1
change my belief in the truthfulness of H1a? The students’ answer was: Assuming that E*1
is believable, it is very likely that there is too much water at the root of the left tree. They
did this by selecting one value from the following list of probabilistic assessments
displayed by Inquirer: {no support, likely, very likely, almost certain, certain}. They also
justified their choice with the fact that during their observations, the tree was standing in
water, which means that it is very likely that there is too much water there.


Figure 2.18. Evidence-based assessment of an elementary hypothesis: the relevance of E1 to H1a (very likely) and the believability of E*1 (almost certain) are combined into the inferential force of E*1, yielding the assessment very likely for H1a.

Figure 2.19. Inferential force of evidence on an elementary hypothesis: the inferential forces of E*1 (likely), E*2 (likely), and E*3 (almost certain) on H1b are combined by taking their maximum and, since there is no disfavoring evidence, H1b is assessed as almost certain.

Inquirer also reminded them that believability answers the question: Can we believe
what E*1 is telling us? Here the students’ answer was: Believability of E*1 is almost certain,
since a few data points were missing and, on rare occasions, the left tree was not standing
in water.
Based on the students’ assessments, Inquirer determined the inferential force of E*1 on
H1a, as shown by the green reasoning tree in Figure 2.18: Based on E*1, it is very likely that
there is too much water at the root of the left tree. Inquirer explained to the students that
inferential force answers the question: How strong is E*1 in favoring H1a? An item of
evidence, such as E*1, will make us believe that the hypothesis H1a is true if and only if
E*1 is both highly relevant and highly believable. Therefore, the inferential force of E*1 on
H1a was computed as the minimum of the relevance of E1 (very likely) and the believability
of E*1 (almost certain), which is very likely.
The students agreed that E*1 also supports the hypothesis “H1b: Too much water at a
tree’s root causes it to lose its leaves and die.” They assessed the relevance of E1 as likely
(because E1 is only one instance of this phenomenon), and the believability of E*1 as
almost certain, leading Inquirer to assess the inferential force of E*1 on H1b as likely, which
is the minimum of the two (see the left part of Figure 2.19).
One of the students recalled that several months ago the leaves on one of his mother’s
geraniums had begun to turn yellow. She told him that the geranium was getting too much
water. This item of information was represented in Inquirer as item of evidence E*2
favoring the hypothesis H1b. The students agreed to assess E2’s relevance as likely (because
a geranium is a different type of plant) and E*2’s believability as very likely (because although
the mother has experience with plants, she is not a professional), leading Inquirer to


compute E*2’s inferential force on H1b as likely (see the bottom-middle part of Figure 2.19).
Additionally, Mrs. Graham gave the group a pamphlet from a local nursery entitled Growing
Healthy Plants. The water group read the pamphlet and found that when plant roots are
surrounded by water, they cannot take in air from the space around the roots and they
essentially “drown.” This item of information was represented in Inquirer as item of
evidence E*3 favoring the hypothesis H1b. The students agreed to assess E3’s relevance as
certain (since it, in fact, asserted the hypothesis) and the believability of E*3 as almost certain
(because this information is from a highly credible expert), leading Inquirer to compute E*3’s
inferential force on H1b as almost certain (see the bottom-right part of Figure 2.19). Addition-
ally, Inquirer computed the inferential force of all favoring evidence (i.e., E*1, E*2, and E*3)
on H1b as almost certain, by taking the maximum of them. This is also the probability of H1b,
because no disfavoring evidence was found. However, if any disfavoring evidence had
been found, then Inquirer would have needed to determine whether, on balance,
the totality of evidence favors or disfavors H1b, and to what degree.
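
The assessment of an elementary hypothesis such as H1b from its items of evidence can be summarized by the following sketch, which uses the students' assessments reported above and is only a simplified illustration of Inquirer's behavior:

SCALE = ["no support", "likely", "very likely", "almost certain", "certain"]
rank = SCALE.index

def inferential_force(relevance, believability):
    """An item of evidence is only as strong as its weaker credential."""
    return min(relevance, believability, key=rank)

# (item, relevance, believability) as assessed by the water group.
favoring = [
    ("E*1", "likely", "almost certain"),
    ("E*2", "likely", "very likely"),
    ("E*3", "certain", "almost certain"),
]
disfavoring = []   # no disfavoring evidence was found for H1b

forces = [inferential_force(relevance, believability)
          for _, relevance, believability in favoring]
h1b_support = max(forces, key=rank)    # combined force of the favoring evidence
print(h1b_support)                     # -> "almost certain"

# Had disfavoring evidence been found, its combined force would have to be
# weighed against h1b_support to decide, on balance, the probability of H1b.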
Having assessed the probability of H1a as very likely and that of H1b as almost certain,
the students and Inquirer inferred that the probability of their top-level hypothesis H1 is
the minimum of the two because both are required to infer H1 (see the top-right part of
Figure 2.17). Finally, Inquirer automatically generated a report describing the analysis
logic, citing sources of data used and the manner in which the analysis was performed.
The report was further edited by the water group before being presented to the class,
together with the reports of the other teams.
As different groups presented and compared their analyses, the class learned that some
evidence – such as that from the group investigating whether the trees were different – did
not explain the observations. The results of other investigations, such as the idea that the
trees could have a disease, partly supported the observations. But the explanation that
seemed most reasonable to the students, that fit all the observations and conformed with
what they had learned from other sources, was “too much water.” After their three weeks
of work, the class was satisfied that together the students had found a reasonable answer
to their question.

2.3.3.3 Other Examples of Inquiry-based Teaching and Learning


The preceding scenario illustrates one form of classroom inquiry where the investigations are
free-ranging explorations of unexplained phenomena. However, depending on the educa-
tional goals for students, other forms of inquiry can be much more structured, allowing
Inquirer to provide even more guidance and assistance, as discussed in the following.
According to the National Science Education Standards, “Inquiry is a multifaceted
activity that involves making observations; posing questions; examining books and other
sources of information to see what is already known; planning investigations; reviewing
what is already known in light of experimental evidence; using tools to gather, analyze,
and interpret data; proposing answers, explanations, and predictions; and communicating
the results. Inquiry requires identification of assumptions, use of critical and logical
thinking, and consideration of alternative explanations” (NRC, 1996, p. 23). “Developing
the ability to understand and engage in this kind of activity requires direct experience and
continued practice with the processes of inquiry. Students do not come to understand
inquiry simply by learning words such as ‘hypothesis’ and ‘inference’ or by memorizing
procedures such as ‘the steps of the scientific method.’ They must experience inquiry
directly to gain a deep understanding of its characteristics” (NRC 2000, p. 14).


The Inquirer cognitive assistant illustrates a computational view of the inquiry process
as ceaseless discovery of evidence, hypotheses, and arguments, through evidence in
search of hypotheses, hypotheses in search of evidence, and evidentiary testing of hypoth-
eses. This gives students hands-on experience with this process, as illustrated in the
previous section and described in a more general way in Section 1.4.2. The hypothetical
Inquirer cognitive assistant incorporates this general model of inquiry, together with a
significant amount of knowledge about the properties, uses, discovery, and marshaling of
evidence. This allows it to be used as a general tool supporting the learning of those
inquiry-based practices that all sciences share, as advocated by the National Research
Council (NRC) framework for K–12 science education (NRC, 2011). This can be done
through a sequence of hands-on exercises in biology, chemistry, or physics, allowing the
students to experience the same inquiry-based scientific practices in multiple domains.
For example, in one exercise, Inquirer will teach the students how a complex hypothesis
is decomposed into simpler hypotheses, and how the assessments of the simpler hypoth-
eses are combined into the assessment of the top-level hypothesis, as was illustrated
in Figure 2.17.
In another exercise, Inquirer will teach the students how to assess the relevance and
the believability of evidence. This case study will provide both a decomposition tree (like
the one in the middle of Figure 2.17, with the elementary hypotheses H1a and H1b) and a
set of items of information, some relevant to the considered hypotheses and some
irrelevant. The students will be asked to determine which item of information is
relevant to which elementary hypothesis, and whether it is favoring or disfavoring
evidence. They will also be asked to assess and justify the relevance and the believability
of each item of evidence. After completing their analysis, the teacher and Inquirer
will provide additional information, asking the students to update their analysis in the
light of the new evidence. Finally, the students will present, compare, and debate their
analyses in class.
In yet another exercise, Inquirer will provide an analysis tree like the one from the
middle of Figure 2.17 but no items of information, asking the students to look for relevant
evidence (e.g., by searching the Internet or by performing various experiments) and to
complete the analysis.
In a more complex exercise, Inquirer will present a scenario with some unusual
characteristics, like the two trees previously discussed. The students will be asked to
formulate competing hypotheses that may explain the surprising observations, use the
formulated hypotheses to collect evidence, and use the evidence to assess each hypoth-
esis. Then they will compare their analyses of the competing hypotheses in terms of the
evidence used and assessments made, and will select the most likely hypothesis. Inquirer
will assist the students in this process by guiding them in decomposing hypotheses,
searching for evidence, assessing the elementary hypotheses, combining the assessments
of the simpler hypotheses, comparing the analyses of the competing hypotheses, and
producing an analysis report.
Inquirer can also provide many opportunities for collaborative work, in addition to
those that we have illustrated. For example, a complex hypothesis will be decomposed
into simpler hypotheses, each assessed by a different student. Then the results obtained
by different students will be combined to produce the assessment of the complex
hypothesis. Or different students will analyze the same hypothesis. Then they will
compare and debate their analyses and evidence and work together toward producing
a consensus analysis.


Consistent with the writers of the National Science Education Standards, who “treated
inquiry as both a learning goal and as a teaching method” (NRC 2000, p. 18), Inquirer is
envisioned as both a teaching tool for teachers and as a learning assistant for students.
For example, the teacher will demonstrate some of these exercises in class. Other exercises
will be performed by the students, under the guidance of the teacher and with the
assistance of Inquirer.

2.4 HANDS ON: BROWSING AN ARGUMENTATION

The use of the various modules of Disciple-EBR will be introduced with the help of case
studies with associated instructions that will provide detailed guidance. We will illustrate
this process by running the case study stored in the knowledge base called “01-Browse-
Argumentation.” This case study concerns the hypothesis “The cesium-137 canister is
missing from the XYZ warehouse,” which is part of the analysis example discussed in
Section 2.2.1. This case study has three objectives:

• Learning how to run a case study
• Learning how to browse a reasoning tree or argumentation
• Understanding the process of hypothesis analysis through problem reduction and
solution synthesis

Figure 2.20 shows an example of an argumentation or reasoning tree in the interface of the
Reasoner. The left panel shows an abstract view of the entire reasoning tree. It consists of
brief names for the main hypotheses and their assessments, if determined. You can expand
(or show the decomposition of) a hypothesis by clicking on it or on the plus sign (+) on its
left. To collapse a decomposition, click on the minus sign (–). You can expand or
collapse the entire tree under a hypothesis by right-clicking on it and selecting Expand or
Collapse.
When you click on a hypothesis in the left panel, the right panel shows the detailed
description of the reasoning step abstracted in the left panel (see Figure 2.20). If you click
on [HIDE SOLUTIONS] at the top of the window, the agent will no longer display the
solutions/assessments of the hypotheses in the right panel. You may show the solutions
again by clicking on [SHOW SOLUTIONS].
The detailed argumentation from the right panel may be a single decomposition or a
deeper tree. In both cases, the leaves of the tree in the right panel are the detailed
descriptions of the subhypotheses of the hypothesis selected in the left panel (see the
arrows in Figure 2.20 that indicate these correspondences).
A detailed description shows the entire name of a hypothesis. It also shows the
question/answer pair that justifies the decomposition of the hypothesis into subhypoth-
eses. The detailed descriptions with solutions also show the synthesis functions that were
used to obtain those solutions from children solutions in the tree. These functions will be
discussed in Section 4.3.
Notice that, in the actual interface of Disciple-EBR, some of the words appear in bright
blue while others appear in dark blue (this distinction is not clearly shown in this book).
The bright blue words are names of specific entities or instances, such as “cesium-137
canister.” The dark blue words correspond to more general notions or concepts, such as
“evidence.” If you click on such a (blue) word in the right panel, the agent automatically


Figure 2.20. Abstract and detailed views of an argumentation.



switches to the Description module and displays its description. To view the reasoning tree
again, just click on Reasoner on the top of the window.
Figure 2.21 shows another part of the reasoning tree from Figure 2.20 to be browsed
in this case study. Notice that some of the solutions in the left panel have a yellow
background. This indicates that they are assessments or assumptions made by the user,
as will be discussed in Sections 4.4 and 4.9.
In this case study, you will practice the aforementioned operations. You will first select
the hypothesis “The cesium-137 canister is missing from the XYZ warehouse.” Then you will
browse its analysis tree to see how it is decomposed into simpler hypotheses and how the
assessments of these simpler hypotheses are composed. You will visualize both detailed
descriptions of these decomposition and synthesis operations, as well as abstract ones,
including an abstract view of the entire tree. Then you will visualize the descriptions of the
concepts and instances to which the analysis tree refers. Start by following the instructions
described in Operation 2.1 and illustrated in Figure 2.22.

Operation 2.1. Run a case study


• Start Disciple-EBR.
• In the System Workspace Manager, click on the knowledge base icon containing the
plus sign (+).
• The Knowledge Manager window opens, showing all the knowledge bases from the
repository.
• Click on the plus sign (+) of the case study domain knowledge base to be run. This will
display one or several scenario knowledge bases.
• Click on the scenario knowledge base corresponding to the case study to be run.
• Click on the Select button. This will both load the case study knowledge bases in
memory (i.e., the domain KB and the scenario KB) and select them as the current ones
to work with. Because they are loaded in memory, their names will be shown in bold in
the Knowledge Manager window.
• Follow the instructions at the bottom of the displayed window and run the case study.
After running the case study, you have to close the corresponding knowledge base, as
instructed in the following steps.
• Close all the workspaces open on the current knowledge base (case study) by selecting
one of them and clicking on the minus sign (–) to the right of the knowledge base icon
containing the plus sign (+).
• Click on the knowledge base icon containing the plus sign (+) situated at the right of
the workspace selector (see Figure 3.23).
• The Knowledge Manager window will be opened, showing all the knowledge bases.
• Click on the Scen node corresponding to the case study knowledge base that
was run.
• Click on the Close button in the right side of the window.
• Click on the X button to close the Knowledge Manager window.

This case study has also illustrated the following basic operations:

Operation 2.2. Browse the analysis of a hypothesis


• In the Hypothesis module, select a hypothesis.
• The Reasoner module will be automatically selected, showing the corresponding
reasoning tree.


Figure 2.21. Views of another part of the reasoning tree from Figure 2.20.

Figure 2.22. Running a case study: (1) click on the knowledge base icon to load a KB; (2) the system displays all KBs; (3) click on + to expand the KB to be loaded; (4) click on Scen; (5) click on Select; (6) perform the case study by following the displayed instructions.

• To browse the entire reasoning tree, step by step, click on the hypotheses in the left
panel, or one of the + and – signs preceding them.
• To expand or collapse the entire subtree of a hypothesis, right-click on it and select the
corresponding action.
• To view the detailed description of an abstract decomposition of a hypothesis in the left
panel, click on the hypothesis, and the detailed decomposition will be displayed in the
right panel.
• To browse a detailed reasoning tree in the right panel, click on the + and – signs.

Operation 2.3. End a case study


• Select the Scenario workspace (repository\KB-name\Scen)
• From the System menu, select Save All to save the knowledge bases.


• Close all the workspaces open on the current knowledge bases (case studies) by
clicking on the minus sign (–) to the right of the knowledge base icon containing the
plus sign (+).
• Close the opened knowledge bases corresponding to the case study by following the
instructions from Operation 3.3.

2.5 PROJECT ASSIGNMENT 1

This is the first of a sequence of assignments in which you will develop a knowledge-based
agent for analyzing hypotheses in a domain of interest to you. Since you will develop the
knowledge base of the agent, you will have to consider a familiar domain. Select a domain
and illustrate the process of evidence-based reasoning with a diagram such as those in
Figures 2.8, 2.14, 2.26, or 2.27. That is, specify an item of evidence, an abductive reasoning
chain from that evidence to a hypothesis of interest, alternative hypotheses to those from
the reasoning chain, and the analysis of one of these hypotheses.

2.6 REVIEW QUESTIONS

2.1. Consider the intelligence analysis problem from Section 2.2.1. What might constitute
an item of evidence for the hypothesis that the cesium-137 canister was misplaced?

2.2. What might be an alternative hypothesis for “H5: A dirty bomb will be set off in the
Washington, D.C., area”?

2.3. A terrorist incident occurred two weeks ago in an American city involving consider-
able destruction and some loss of life. After an investigation, two foreign terrorist
groups have been identified as possible initiators of this terrorist action: Group
A and Group B. What are some hypotheses we could entertain about this event?

2.4. Consider the hypothesis that Group A from Country W was involved in a recent
terrorist incident in an American city. What evidence might we find concerning this
hypothesis?

2.5. Sometimes we have evidence in search of hypotheses or possible explanations. For
example, consider the dog-tag containing the name of one of our soldiers who has
been missing since the end of our conflict with Country X. This tag was allegedly
given to a recent visitor in Country X, who then gave it to us. One possibility is that
this soldier is still being held as a prisoner in Country X. What are some other
possibilities?

2.6. Sometimes we have hypotheses in search of evidence. Suppose our hypothesis is
that Person X was involved in a recent terrorist incident in an American city. So far,
all we have is evidence that he was at the scene of the incident an hour before it
happened. If this hypothesis were true, what other kinds of evidence about X might
we want to look for?

2.7. Consider the hypothesis that the leadership of Country A is planning an armed
conflict against Country B. You have just obtained a report that says that there has


just been an attempt on the life of the president of Country B by an unknown
assailant. Why is this report, if believable, relevant evidence on the hypothesis that
the leadership of Country A is planning an armed conflict against Country B?

2.8. Defendant Dave is accused of shooting a victim Vic. When Dave was arrested
sometime after the shooting, he was carrying a .32 caliber Colt automatic pistol.
Let H be the hypothesis that it was Dave who shot Vic. A witness named Frank
appears and says he saw Dave fire a pistol at the scene of the crime when it
occurred; that’s all Frank can tell us. Construct a simple chain of reasoning that
connects Frank’s report to the hypothesis that it was Dave who shot Vic.

2.9. Consider the situation from Question 2.8. The chain of reasoning that connects
Frank’s report to the hypothesis that it was Dave who shot Vic shows only the
possibility of this hypothesis being true. What are some alternative hypotheses?

2.10. Consider again the situation from Questions 2.8 and 2.9. In order to prove the
hypothesis that it was Dave who shot Vic, we need additional evidence. As dis-
cussed in Section 2.2.2, we need to put this hypothesis to work to guide us in
collecting new evidence. Decompose this hypothesis into simpler hypotheses,
as was illustrated by the blue trees in Figures 2.8 and 2.9, in order to discover
new evidence.

2.11. Our investigation described in Questions 2.8, 2.9, and 2.10 has led to the discovery
of additional evidence. By itself, each evidence item is hardly conclusive that Dave
was the one who shot Vic. Someone else might have been using Dave’s Colt
automatic. But Frank’s testimony, along with the fact that Dave was carrying his
weapon and with the ballistics evidence, puts additional heat on Dave. Analyze the
hypothesis that it was Dave who shot Vic, based on all these items of evidence, as
was illustrated by the green trees in Figures 2.8 and 2.9. In Chapter 4, we will
discuss more rigorous methods for making such probabilistic assessments. In this
exercise, just use your common sense.

2.12. A car bomb was set off in front of a power substation in Washington, D.C., on
November 25. The building was damaged but, fortunately, no one was injured.
From the car’s identification plate, which survived, it was learned that the car
belonged to Budget Car Rental Agency. From information provided by Budget, it
was learned that the car was last rented on November 24 by a man named
M. Construct an argument from this evidence to the hypothesis that Person
M was involved in this car-bombing incident.

2.13. Consider again the situation from Question 2.12, and suppose that we have deter-
mined that the evidence that M rented a car on November 24 is believable. We
want now to assess whether M drove the car on November 25. For this, we need
additional evidence. As discussed in Section 2.2.2, we need to put this hypothesis to
work to guide us in collecting new evidence. Decompose this hypothesis into
simpler hypotheses, as was illustrated by the blue trees in Figures 2.8 and 2.9, in
order to discover new evidence.

3 Methodologies and Tools for Agent Design and Development

3.1 A CONVENTIONAL DESIGN AND DEVELOPMENT SCENARIO

3.1.1 Conventional Design and Development Phases


Table 3.1 shows the main development phases for a knowledge-based agent. As in the
case of developing a typical software system, there are feedback loops between all these
phases. We will illustrate these phases with a conventional scenario of developing a
knowledge-based agent. This scenario is an adaptation of the one described in Buchanan
et al. (1983).

3.1.2 Requirements Specification and Domain Understanding


The development process starts with identifying a problem that may be solved by develop-
ing a knowledge-based agent. Table 3.2 shows an example of such a problem.
A knowledge engineer, who is a person specialized in developing knowledge-based
agents, is assigned the job of building an agent that incorporates the expertise of subject
matter experts, who are people familiar with the detection, monitoring, and containment
of the spills of hazardous materials. The goal of the system is to assist its users in
performing such tasks. The knowledge engineer has to identify a subject matter expert
from whom to capture the domain expertise and represent it into the agent’s knowledge
base, as was briefly discussed in Section 1.6.3.1.

Table 3.1 Main Phases in the Development of a Knowledge-based Agent

1. Requirements specification (specifying the types of problems to be solved or hypotheses to be analyzed, and the agent to be built).
2. Domain understanding.
3. Ontology design and development.
4. Development of problem-solving rules or methods.
5. Refinement of the knowledge base (ontology and rules/methods).
6. Verification, validation, and certification of the agent.


Table 3.2 Problem to Be Solved with a Knowledge-based Agent (from Buchanan et al.,
p. 132).

The director of the Oak Ridge National Lab (ORNL) faces a problem. Environmental Protection
Agency (EPA) regulations forbid the discharge of quantities of oil or hazardous chemicals into or
upon waters of the United States when this discharge violates specified quality standards. ORNL
has approximately two thousand buildings on a two-hundred-square-mile government
reservation, with ninety-three discharge sites entering White Oak Creek. Oil and hazardous
chemicals are stored and used extensively at ORNL. The problem is to detect, monitor,
and contain spills of these materials, and this problem may be solved with a
knowledge-based agent.

Table 3.3 Specification of the Actual Problem to Be Solved (from Buchanan et al., p. 133).

When an accidental inland spill of an oil or chemical occurs, an emergency situation may exist,
depending on the properties and the quantity of the substance released, the location of the
substance, and whether or not the substance enters a body of water.
The observer of a spill should:
1. Characterize the spill and the probable hazards.
2. Contain the spill material.
3. Locate the source of the spill and stop any further release.
4. Notify the Department of Environmental Management.

What issues may concern the subject matter expert? First of all, the expert may be
concerned that once his or her expertise is represented in the agent, the organization
may no longer need him or her because the job can be performed by the agent. Promoting
expert systems as replacements for human experts was a bad and generally inaccurate strategy.
Usually, knowledge-based agents, and even expert systems, are used by experts to solve
problems from their areas of expertise better and more efficiently. They are also used by
people who need the expertise but either do not have access to a human expert or would
find one too expensive.
What are some examples of knowledge-based agents? Think, for instance, of any tax-
preparation software. Is it a knowledge-based agent? What about the software systems that
help us with various legal problems, such as creating a will? They all are based on large
amounts of subject matter expertise that are represented in their knowledge bases.
Once the subject matter expert is identified and agrees to work on this project, the
knowledge engineer and the expert have a series of meetings to better define the actual
problem to be solved, which is shown in Table 3.3.
The knowledge engineer has many meetings with the subject matter expert to elicit his
or her knowledge on how to solve the specified problem. There are several knowledge
elicitation methods that can be employed, as discussed later in Section 6.3. Table 3.4
illustrates the unstructured interview, where the questions of the knowledge engineer and
the responses of the expert are open-ended.


Table 3.4 Unstructured Interview to Elicit Subject Matter Expertise (from Buchanan et al., p. 134)

KE: Suppose you were told that a spill had been detected in White Oak Creek one mile
before it enters White Oak Lake. What would you do to contain the spill?
SME: That depends on a number of factors. I would need to find the source in order to
prevent the possibility of further contamination, probably by checking drains and manholes for
signs of the spill material. And it helps to know what the spilled material is.
KE: How can you tell what it is?
SME: Sometimes you can tell what the substance is by its smell. Sometimes you can
tell by its color, but that's not always reliable since dyes are used a lot nowadays. Oil,
however, floats on the surface and forms a silvery film, while acids dissolve completely in
the water. Once you discover the type of material spilled, you can eliminate any
building that either doesn’t store the material at all or doesn’t store enough of it to account
for the spill.

Table 3.5 Identification of the Basic Concepts and Features Employed by the
Subject Matter Expert

KE: Suppose you were told that a spill had been detected in White Oak Creek one mile
before it enters White Oak Lake. What would you do to contain the spill?
SME: That depends on a number of factors. I would need to find the source in order
to prevent the possibility of further contamination, probably by checking drains
and manholes for signs of the spill material. And it helps to know what the spilled
material is.
KE: How can you tell what it is?
SME: Sometimes you can tell what the substance is by its smell. Sometimes you can
tell by its color, but that's not always reliable since dyes are used a lot nowadays. Oil, however,
floats on the surface and forms a silvery film, while acids dissolve completely in the water. Once
you discover the type of material spilled, you can eliminate any building that either doesn’t store
the material at all or doesn’t store enough of it to account for the spill.

3.1.3 Ontology Design and Development


The main goal of the initial interview sessions is to identify the basic concepts and features
from the application domain. Some of these domain concepts and features are underlined
in Table 3.5.
The identified concepts, instances, and features are used to design and develop the
ontology of the system to be built. A fragment of the developed concept and instance
hierarchy is shown in Figure 3.1.
The nodes in light blue (such as “building 3023”) represent specific objects or
instances. The nodes in dark blue (such as “building”) represent sets of instances and
are called concepts. The instances and concepts may have features that are represented
in green, as for instance, “spill-1 has as type acid” and “spill-1 has as odor vinegar odor.”


Figure 3.1. Hierarchy of domain concepts and instances.

Figure 3.2. Features used to describe the instances and concepts (has as type, has as odor, has as location, has as color, and has as amount, all subfeatures of feature).

The features are also represented hierarchically, as shown in Figure 3.2. Ontologies are
discussed in detail in Chapter 5.
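To make the structure of such an ontology concrete, the following is a minimal sketch, in Python, of how a concept and instance hierarchy like that of Figure 3.1 and the features of Figure 3.2 might be stored and queried. The dictionaries and the function are illustrative assumptions, not the representation used by any particular tool.

```python
# A minimal sketch of the ontology fragment described in the text:
# concepts related by "subconcept of", instances related to concepts by
# "instance of", and feature values attached to instances.

subconcept_of = {
    "building": "object", "source": "object", "spill": "object",
    "substance": "object", "acid": "substance", "oil": "substance",
    "sulfuric acid": "acid", "acetic acid": "acid",
    "diesel oil": "oil", "gasoline": "oil",
}
instance_of = {"building 3023": "building", "building 3024": "building",
               "spill-1": "spill"}
features = {  # feature name -> {instance: value}
    "has as type": {"spill-1": "acid"},
    "has as odor": {"spill-1": "vinegar odor"},
}

def is_subconcept(concept, ancestor):
    """True if concept is ancestor or a (transitive) subconcept of it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = subconcept_of.get(concept)
    return False

print(is_subconcept("acetic acid", "substance"))   # True
print(features["has as odor"]["spill-1"])          # vinegar odor
```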

3.1.4 Development of the Problem-Solving Rules or Methods


Once an initial ontology is developed, the knowledge engineer and the subject matter
expert use the component concepts and features to represent the expert’s methods of
determining the spill material as a set of rules. Consider, for example, the reasoning of
the subject matter expert (SME) from the top of Table 3.6. What IF-THEN rules may
represent this reasoning?
In an iterative process, the knowledge engineer represents the expert’s reasoning as
rules, asks the expert to critique them, and correspondingly updates the rules, as illus-
trated in Table 3.7. As would be expected, this is quite a long and difficult process, for the
reasons discussed in Section 1.6.3.1.


Table 3.6 Representing the Expert’s Reasoning as If-Then Rules

SME: Sometimes you can tell what the substance is by its smell. Sometimes you can tell by its color,
but that’s not always reliable since dyes are used a lot nowadays. Oil, however, floats on the
surface and forms a silvery film, while acid dissolves completely in the water.
IF the spill . . .
THEN the substance of the spill is oil
IF the spill . . .
THEN the substance of the spill is acid

Table 3.7 Iterative Process of Rules Development and Refinement (Based on Buchanan et al., p. 138)

KE: Here are some rules I think capture your explanation about determining the substance of the
spill. What do you think?
IF the spill does not dissolve in water
and the spill forms a silvery film
THEN the substance of the spill is oil
IF the spill dissolves in water
and the spill does not form a film
THEN the substance of the spill is acid

SME: Uh-huh (long pause). Yes, that begins to capture it. Of course, if the substance is silver nitrate,
it will dissolve only partially in water.
KE: I see. Rather than talking about a substance dissolving or not dissolving in water, we should talk
about its solubility, which we may consider as being high, moderate, or low. Let’s add that
information to the knowledge base and see what it looks like.
IF the solubility of the spill is low
and the spill forms a silvery film
THEN the substance of the spill is oil
IF the solubility of the spill is moderate
THEN the substance of the spill is silver-nitrate

SME: If the solubility of the spill is moderate, I would be about 60 percent sure that the substance of the spill is silver-nitrate.
KE: Okay, we will represent this information in the rule.
IF the solubility of the spill is moderate
THEN the substance of the spill is silver-nitrate with certainty 0.6
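As an illustration of how such rules might be operationalized, here is a minimal Python sketch (not the notation of any actual expert system shell) that encodes the refined rules at the end of Table 3.7 as data and matches them against the facts describing a spill. The attribute names are assumptions made for the example.

```python
# A minimal sketch of encoding the refined spill rules as data and applying
# them to a description of a spill.

RULES = [
    {"if": {"solubility": "low", "forms silvery film": True},
     "then": ("substance", "oil"), "certainty": 1.0},
    {"if": {"solubility": "moderate"},
     "then": ("substance", "silver-nitrate"), "certainty": 0.6},
]

def apply_rules(facts):
    """Return the conclusion of every rule whose IF conditions all hold."""
    conclusions = []
    for rule in RULES:
        if all(facts.get(attr) == value for attr, value in rule["if"].items()):
            attribute, value = rule["then"]
            conclusions.append((attribute, value, rule["certainty"]))
    return conclusions

# A spill that dissolves only partially in water
print(apply_rules({"solubility": "moderate"}))
# [('substance', 'silver-nitrate', 0.6)]
```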

3.1.5 Verification, Validation, and Certification


Once the ontology and the rules are defined and represented into the knowledge base of
the agent, the knowledge engineer and the subject matter expert can run the prototype
agent and analyze its solutions, which are generated by the chaining of the rules.


Testing of the agent involves three types of activity: verification, validation, and certifi-
cation (O’Keefe et al., 1987; Awad, 1996).
In essence, verification attempts to answer the question: Are we building the agent right?
Its goal is to test the consistency and the completeness of the agent with respect to its initial
specification. For example, in the case of a rule-based agent, one would check the rules to
identify various types of errors, such as the existence of rules that are redundant, conflicting,
subsumed, circular, dead-end, missing, unreachable, or with unnecessary IF conditions.
Validation, on the other hand, attempts to answer the question: Are we building the
right agent? In essence, this activity checks whether the agent meets the user’s needs and
requirements.
Finally, certification is a written guarantee that the agent complies with its specified
requirements and is acceptable for operational use.
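To make the verification activity more concrete, the following minimal Python sketch implements two of the checks mentioned above, detecting redundant rules and rules with identical conditions but different conclusions. It assumes the simple rule encoding from the earlier spill example and only illustrates the idea.

```python
# A minimal sketch of two rule-base verification checks: redundant rules and
# potentially conflicting rules (same conditions, different conclusions).

def find_redundant(rules):
    """Pairs of rules with the same IF part and the same THEN part."""
    issues = []
    for i, r1 in enumerate(rules):
        for r2 in rules[i + 1:]:
            if r1["if"] == r2["if"] and r1["then"] == r2["then"]:
                issues.append((r1, r2))
    return issues

def find_conflicting(rules):
    """Pairs of rules with the same IF part but different THEN parts."""
    issues = []
    for i, r1 in enumerate(rules):
        for r2 in rules[i + 1:]:
            if r1["if"] == r2["if"] and r1["then"] != r2["then"]:
                issues.append((r1, r2))
    return issues
```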

3.2 DEVELOPMENT TOOLS AND REUSABLE ONTOLOGIES

Various types of tools can be used to develop a knowledge-based agent. We will briefly
discuss three different types: expert system shells, learning agent shells, and learning agent
shells for evidence-based reasoning. We will also discuss the reuse of knowledge in the
development of a knowledge-based agent.

3.2.1 Expert System Shells


These tools exploit the architectural separation in a knowledge-based system between the
knowledge base and the inference engine (see the right-hand side of Figure 1.14, p. 37).
Given this separation, two knowledge-based agents, one for diagnosing ships and the
other for diagnosing airplanes, may potentially use the same inference engine for diagno-
sis. The difference between the two agents will be in the content of their knowledge bases
(e.g., ship parts versus airplane parts).
An expert system shell is a tool that consists of an inference engine for a certain class of
tasks (e.g., planning, design, diagnosis, monitoring, prediction, interpretation, etc.); a repre-
sentation formalism in which the knowledge base can be encoded; and mechanisms for
acquiring, verifying, and revising knowledge expressed in that formalism.
If the inference engine of an expert system shell is adequate for a certain expertise task
(e.g., planning), then the process of building an expert system or agent for that type of
task is, in principle, reduced to the building of the knowledge base.
Different expert system shells trade generality (i.e., their domain of applicability)
against power (i.e., the assistance given to the expert and the knowledge engineer in the
development process), covering a large spectrum.
At the generality end of the spectrum are very general shells that can be used to build
almost any type of expert system, but they do not provide much assistance in building
a specific system. Examples of such shells are:

 OPS (Cooper and Wogrin, 1988), which has a general rule engine
 CLIPS (Giarratano and Riley, 1994), which also has a general rule engine
 CYC (Lenat, 1995; CYC, 2008; 2016), a very large knowledge base with ontologies
covering many domains, and with several rule engines
 EXPECT (Gil and Paris, 1995; EXPECT 2015), a shell that enables the acquisition of
problem-solving knowledge both from knowledge engineers and from end-users


 JESS (Friedman-Hill, 2003; JESS, 2016), which is a version of CLIPS with a Java-based
rule engine
 CommonKADS (Schreiber et al., 2000), which is a general methodology with support-
ing tools for the development of knowledge-based systems
 Jena, which is a toolkit for developing applications for the Semantic Web (Jena, 2012)
 Pellet, an ontology (OWL2) reasoner that can be used to develop knowledge-based
applications for the Semantic Web (Pellet, 2012)
 Protégé (Musen, 1989; Protégé, 2015), an ontology editor and knowledge base frame-
work, also used to develop knowledge-based applications for the Semantic Web
 TopBraid Composer (Allemang and Hendler, 2011; TopBraid Composer, 2012) Ontology
Development Tool for Semantic Web applications

At the power end of the spectrum are shells that employ much more specific problem-
solving methods, such as the propose-and-revise design method used in SALT (Marcus, 1988)
to design elevators. The knowledge for such a system can be elicited by simply filling in
forms, which are then automatically converted into rules. Thus the shell provides signifi-
cant assistance in building the system, but the types of systems for which it can be used
are much more limited.
In between these two extremes are the shells applicable to a certain type of
problem (such as planning, diagnosis, or design). A representative example is EMYCIN
(van Melle et al., 1981), a general rule-based shell derived from the MYCIN medical diagnosis system.
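The following minimal Python sketch illustrates the architectural separation that shells exploit: a tiny forward-chaining inference engine that is independent of any domain, paired here with an entirely hypothetical airplane-diagnosis rule set. A ship-diagnosis agent would reuse the same engine with a different knowledge base.

```python
# A minimal sketch of the knowledge base / inference engine separation:
# the same forward-chaining engine works with any set of rules and facts.

def forward_chain(facts, rules):
    """Repeatedly fire rules whose conditions are satisfied until no new facts."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if set(conditions) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# Illustrative airplane-diagnosis knowledge base (the rules are assumptions)
airplane_rules = [
    ({"engine vibration", "oil pressure low"}, "bearing wear suspected"),
    ({"bearing wear suspected"}, "schedule engine inspection"),
]
print(forward_chain({"engine vibration", "oil pressure low"}, airplane_rules))
```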

3.2.2 Foundational and Utility Ontologies and Their Reuse


The use of an expert system shell facilitates the development of an expert system because
it reduces the process of developing the system to that of developing the knowledge base
for the provided inference engine. However, it is this development of the knowledge base
that is the most challenging part of developing the system.
Despite the fact that building the knowledge base is such a difficult problem, historic-
ally knowledge bases were very rarely reused, primarily for two reasons. First, the know-
ledge in the knowledge base is usually very specific to a particular domain and type of
problems and cannot be applied directly to a different application area. Second, even if the
knowledge base of a system is directly relevant to the new area, its reuse by a system with a
different knowledge representation is likely to be very difficult because of the differences
between the knowledge models of the two systems.
Progress in knowledge engineering has changed this situation. A new architectural
separation has emerged at the level of the knowledge base, which is structured into two
main components: an ontology that defines the concepts of the application domain, and a
set of problem-solving rules or methods expressed with these concepts, as was illustrated
in Section 1.6.3.1. The ontology is the more general part, being characteristic to an entire
application domain, such as military or medicine. In the military domain, for example, the
ontology includes descriptions of military units and of military equipment. These descrip-
tions are most likely needed in almost any specific military application and can therefore
be reused. The rules are the more specific part of the knowledge base, corresponding to a
certain type of application. For example, there may be rules for an agent, such as Disciple-
COA, that assists a commander in critiquing courses of action, or rules for an agent, such
as Disciple-WA, that assists in planning the repair of damaged bridges or roads. These
rules offer much fewer opportunities for reuse, if any.


As a result of terminological standardization to facilitate automatic processing of information, particularly in the context of the Semantic Web, many domain and general-purpose
ontologies have been developed. The general-purpose (domain-independent) ontologies
are also called upper, foundational, or universal ontologies because they provide high-level,
domain-independent concepts and relationships that can be included in the top part of a
domain ontology (Obrst et al., 2012). Examples of such ontologies are Cyc/OpenCyc (2016;
Lenat, 1995), Suggested Upper Merged Ontology (SUMO) (Pease, 2011), WordNet (2012;
Fellbaum 1988), Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE,
2012), Basic Formal Ontology (BFO, 2012), Object-centered High-level Reference (OCHRE)
(Schneider, 2003), General Formal Ontology (GFO, 2012), Unified Medical Language System
(UMLS) (Humphreys and Lindberg, 1993), and Unified Foundational Ontology (UFO)
(Guizzardi and Wagner, 2005a, 2005b).
There are also utility ontologies (Obrst et al., 2012) that include representations of
commonly used concepts, such as persons, social roles, and organizations (Masolo et al.,
2004; FOAF, 2012), temporal concepts (Hobbs and Pan, 2004; Pan and Hobbs, 2004; 2012),
and geospatial concepts (Ressler et al., 2010; Geonames, 2012).
The Open Knowledge Base Connectivity (OKBC) protocol (OKBC, 2008) has been defined
to facilitate knowledge sharing and reuse (Chaudhri et al., 1998). OKBC is a standard for
accessing knowledge bases stored in different frame-representation systems. It provides a set
of operations for a generic interface to such systems. As a result, OKBC servers for various
frame-based systems, such as Ontolingua (Ontolingua 1997; 2008; Farquhar et al., 1997) and
Loom (Loom, 1999; MacGregor, 1991), have been developed. These servers are repositories of
reusable ontologies and domain theories and can be accessed using the OKBC protocol.
Additionally, there are many tools that can query existing ontologies on the Semantic
Web or simply import them into the knowledge base to be built, including Jena (2012),
OWLIM (2012), Pellet (2012), Protégé (2015), and TopBraid Composer (2012).
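As an illustration of such reuse, the following sketch uses the Python rdflib library (playing a role similar to the Java-based tools named above) to load a previously developed OWL/RDF ontology and enumerate its classes. The file name is a placeholder, and this is only one of many ways to import an existing ontology.

```python
# A minimal sketch of importing an existing ontology instead of building it
# from scratch. The ontology file name below is a placeholder.

from rdflib import Graph

g = Graph()
g.parse("reused_ontology.owl", format="xml")   # load an OWL/RDF ontology

# Enumerate the classes (concepts) defined in the imported ontology
query = """
SELECT ?cls WHERE { ?cls a <http://www.w3.org/2002/07/owl#Class> . }
"""
for row in g.query(query):
    print(row.cls)
```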

3.2.3 Learning Agent Shells


While part of the ontology of the agent to be developed can be imported from previously
developed ontologies, the reasoning rules of the agent are application-specific and cannot
be reused. An alternative approach to rules development is to employ a learning agent that
is able to learn the rules. In addition to containing the knowledge base and the inference
engine, the architecture of a learning agent includes a learning engine consisting of the
programs that create and update the data structures in the knowledge base. The learning
agent may learn from a variety of information sources in the environment. It may learn from
its user or from other agents, either by being directly instructed by them or just by observing
and imitating their behavior. It may learn from a repository of information (such as a
database), or it may learn from its own experience. Building an autonomous learning agent that can acquire and update its knowledge by itself is not yet feasible, except for very simple problems, such as classification. Therefore, a more practical approach is to
develop an interactive learning agent that can interact with an expert. Such an agent can
perform many of the functions of the knowledge engineer. It allows the human expert to
communicate expertise in a way familiar to him or her and is responsible for building,
updating, and reorganizing the knowledge base. We call such a tool a learning agent shell.
A learning agent shell is an advanced tool for building knowledge-based agents. It
contains a general problem-solving engine, a learning engine, and a general knowledge

base structured into an ontology and a set of rules (see Figure 3.3). Building a knowledge-
based agent for a specific application consists of customizing the shell for that application
and developing the knowledge base. The learning engine facilitates the building of the
knowledge base by subject matter experts and knowledge engineers.
Examples of learning agent shells are Disciple-LAS (Tecuci et al., 1999), Disciple-COA
(Tecuci et al., 2001), and Disciple-COG/RKF (Tecuci et al., 2005b), the last two being
presented in Section 12.4.
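The following is a minimal structural sketch, in Python, of the architecture in Figure 3.3: a knowledge base (ontology plus rules) shared by a problem-solving engine and a learning engine. The class and method names are purely illustrative and do not correspond to the actual Disciple implementation.

```python
# A minimal structural sketch of a learning agent shell: the learning engine
# and the problem-solving engine both operate on the same knowledge base.

class KnowledgeBase:
    def __init__(self):
        self.ontology = {}   # concepts, instances, and features
        self.rules = []      # learned reduction/synthesis rules

class LearningAgentShell:
    def __init__(self, problem_solver, learner):
        self.kb = KnowledgeBase()
        self.problem_solver = problem_solver   # uses self.kb to solve problems
        self.learner = learner                 # updates self.kb from examples

    def solve(self, problem):
        return self.problem_solver(problem, self.kb)

    def learn_from_example(self, example, explanation):
        self.learner(example, explanation, self.kb)
```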

3.2.4 Learning Agent Shell for Evidence-based Reasoning


As discussed in Section 1.4.3, tasks in many domains, such as law, intelligence analysis,
cybersecurity, forensics, medicine, physics, chemistry, history, or archaeology, involve
evidence-based reasoning. All these tasks use general evidence-based reasoning concepts
and rules. Examples of general concepts are different types of evidence, such as tangible
evidence and testimonial evidence. An example of a general rule is to assess the credibility of
a human source of information by assessing his or her veracity, objectivity, and observational
sensitivity. Since all this knowledge is domain-independent, it makes sense to develop a
general knowledge base for evidence-based reasoning that can be reused each time we are
developing an agent that also needs to perform this kind of reasoning. To facilitate this
process, the knowledge base of the agent is structured into a hierarchy of knowledge bases,
with the knowledge base for evidence-based reasoning at the top of the hierarchy.
A learning agent shell for evidence-based reasoning is a learning agent shell that contains
a hierarchy of knowledge bases the top of which is a knowledge base for evidence-based
reasoning (see Figure 3.4). As will be illustrated in the following sections, building an agent
for a specific evidence-based reasoning application consists in extending the knowledge
base with domain-specific knowledge through learning from a subject matter expert.
An example of a learning agent shell for evidence-based reasoning is Disciple-EBR, the
architecture of which is shown in the center of Figure 3.5. It includes multiple modules for
problem solving, learning, tutoring, evidence-based reasoning, mixed-initiative inter-
action, as well as a hierarchically organized knowledge base with domain-independent
knowledge for evidence-based reasoning at the top of the knowledge hierarchy. The
Disciple-EBR shell can learn complex problem-solving expertise directly from human
experts, and in doing so it evolves into a cognitive assistant that can support experts and
nonexperts in problem solving and can teach expert problem-solving to students.
The outside hexagon in Figure 3.5 summarizes a possible life cycle of a Disciple
cognitive assistant for evidence-based reasoning. The first stage is shell customization,

Figure 3.3. The overall architecture of a learning agent shell.


Figure 3.4. The overall architecture of a learning agent shell for evidence-based reasoning.

Figure 3.5. Life cycle of a Disciple agent for evidence-based reasoning.

where, based on the specification of the type of problems to be solved and the agent to be
built, the developer and the knowledge engineer may decide that some extensions of the
Disciple shell may be necessary or useful. It is through such successive extensions during
the development of Disciple agents for various applications that the current version of the
Disciple shell for evidence-based reasoning problems (which includes the EBR knowledge
base) has emerged.
The next stage is agent teaching by the subject matter expert and the knowledge
engineer, supported by the agent itself, which simplifies and speeds up the knowledge
base development process (Tecuci et al., 2001; 2002b; 2005b). Once an operational agent is
developed, it is used for the education and training of the end-users, possibly in a
classroom environment.
The fourth stage is field use, where copies of the developed agent support users in their
operational environments. During this stage, an agent assists its user both in solving

problems and in collaborating with other users and their cognitive assistants. At the same
time, it continuously learns from this problem-solving experience by employing a form of
nondisruptive learning. In essence, it learns new rules from examples. However, because
there is no learning assistance from the user, the learned rules will not include a formal
applicability condition. It is during the next stage of after action review, when the user and
the agent analyze past problem-solving episodes, that the formal applicability conditions
are learned based on the accumulated examples.
In time, each cognitive assistant extends its knowledge with expertise acquired from its
user. This results in different agents and creates the opportunity to develop a more
competent agent by integrating the knowledge of all these agents. This can be accom-
plished by a knowledge engineer, with assistance from a subject matter expert, during the
next stage of knowledge integration. The result is an improved agent that may be used in a
new iteration of a spiral process of development and use.

3.3 AGENT DESIGN AND DEVELOPMENT USING LEARNING TECHNOLOGY

In Section 3.1, we briefly illustrated a conventional scenario for building knowledge-based agents. Now that we have reviewed several learning-based agent building tools, we can
illustrate the development of a knowledge-based agent using learning technology. While the
main design and development phases are, in essence, still those from Table 3.1, the way they
are performed is different. The tasks of the knowledge engineer are reduced because part of
them will be performed by the subject matter expert (e.g., explaining to the agent shell the
sequence of reasoning steps to solve a problem), and part by the agent shell (e.g., learning rules
from example reasoning steps rather than having them defined by the knowledge engineer).

3.3.1 Requirements Specification and Domain Understanding


Let us consider the development of an agent that will assist a PhD student in assessing a
potential PhD advisor, a case study that will be used throughout this book.
The agent should receive a hypothesis, such as “John Doe would be a good PhD advisor
for Bob Sharp,” and should return the probability that this hypothesis is true, such as “very
likely,” together with an easy-to-understand reasoning that has led to this assessment.
To build such an agent, one needs first to understand the expertise domain. In this case, a
simple search on the Internet will lead to the discovery of many papers written on this subject
(see Figure 3.6), which may supplement the expertise of a specific subject matter expert.
Reading some of these papers, you will discover that assessing a PhD advisor is a very
complex task. Many questions need to be addressed whose answers need to be aggregated
to evaluate a potential PhD advisor. A few of these questions, identified by the knowledge
engineer and the subject matter expert (in such a case, an experienced professor), are
shown in Table 3.8.

3.3.2 Rapid Prototyping


The next agent development phase is to rapidly develop a prototype that can be validated by the end-users.


Figure 3.6. Understanding the expertise domain.

Table 3.8 Some Relevant Questions to Consider When Assessing a Potential PhD Advisor

(1) What is the reputation of the PhD advisor within the professional community at large?
(2) Does the advisor have many publications?
(3) Is his or her work cited?
(4) What is the opinion of the peers of this PhD advisor?
(5) What do the students think about this PhD advisor?
(6) Is the PhD advisor likely to remain on the faculty for the duration of your degree program?
(7) What is the placement record of the students of this PhD advisor? Where do they get jobs?
(8) Is the PhD advisor expert in your areas of interest?
(9) Does the PhD advisor publish with students?
(10) Does the PhD advisor have a research group or merely a string of individual students?
(11) Is the PhD advisor’s research work funded?

An analysis of the questions in Table 3.8 shows that some of them point to necessary
conditions that need to be satisfied by the PhD advisor, while others refer to various
desirable qualities. Which questions from Table 3.8 point to necessary conditions? The
answers to questions (6) and (8) need to be “yes” in order to further consider a potential
PhD advisor.
Now let us consider the desirable qualities of a PhD advisor revealed by the other
questions in Table 3.8. Some of these qualities seem to be more closely related than
others. It would be useful to organize them in classes of quality criteria. Could you identify
a class of related criteria? Questions (2), (3), (4), and (11) all characterize aspects of the
professional reputation of the advisor.
What might be other classes of related criteria suggested by the questions in Table 3.8?
Questions (7) and (9) characterize the results of the students of the PhD advisor, while
questions (5) and (10) characterize their learning experience.


Based on these observations, we can develop the ontology of criteria from Figure 3.7.
Each of the criteria from the right side corresponds to one of the questions in Table 3.8.
They are components of the higher-order criteria shown in the middle of the figure, which
are all components of a top-level criterion that characterizes the quality of the PhD
advisor. This is what is called a part-of hierarchy. All these individual criteria are instances of the “criterion” concept.
The preceding analysis suggests that, in order to assess a PhD advisor, one needs to check that the advisor satisfies the necessary conditions and to assess his or her qualities as an advisor.
The knowledge engineer and the subject matter expert need to develop a formal,
yet intuitive way of representing the assessment logic. This has to be natural enough,
such that subject matter experts who do not have knowledge engineering experience
are able to express how to solve different problems by themselves, with no or
limited support from knowledge engineers. But the assessment logic also needs to
be formal enough so that an agent can learn general rules from such problem-solving
examples.
A general problem-solving paradigm, called problem reduction and solution synthesis,
which has been illustrated in Section 2.2 and will be discussed in detail in Chapter 4,
satisfies both these requirements. It will be again illustrated in this case.
To clarify the reduction logic, the knowledge engineer and the subject matter expert
consider a particular hypothesis:

John Doe would be a good PhD advisor for Bob Sharp.

Figure 3.7. Sample criteria ontology for assessing a PhD advisor (the PhD advisor quality criterion has as components the professional reputation, quality of student results, and student learning experience criteria, which in turn have as components the elementary criteria corresponding to the questions in Table 3.8).


They express the hypothesis in natural language and select the phrases that may be
different for other similar hypotheses, such as the names of the advisor and student.
The selected phrases will appear in blue, guiding the agent to learn a general hypothesis
pattern:

?O1 would be a good PhD advisor for ?O2.

This top-level hypothesis will be successively reduced to simpler and simpler hypotheses,
guided by questions and answers, as shown in Figure 3.8 and discussed in this section.

John Doe would be a good PhD advisor for Bob Sharp.


Which are the necessary conditions?
Bob Sharp should be interested in an area of expertise of John Doe, who should stay on the faculty of George Mason University for the duration of the PhD dissertation of Bob Sharp, and should have the qualities of a good PhD advisor.

Therefore, the initial hypothesis can be reduced to three simpler hypotheses:

Bob Sharp is interested in an area of expertise of John Doe.


John Doe will stay on the faculty of George Mason University for the duration of the PhD
dissertation of Bob Sharp.
John Doe would be a good PhD advisor with respect to the PhD advisor quality criterion.

Figure 3.8. Reduction logic for assessing a specific hypothesis.


The reductions of the subhypotheses continue in the same way, until solutions are obtained for them:

Bob Sharp is interested in an area of expertise of John Doe.


Is Bob Sharp interested in an area of expertise of John Doe?
Yes, Artificial Intelligence.

Therefore, one may conclude:

It is certain that Bob Sharp is interested in an area of expertise of John Doe.

Consider now the third subhypothesis of the initial hypothesis:

John Doe would be a good PhD advisor with respect to the PhD advisor quality criterion.

Its reduction is also guided by a question/answer pair:

Which are the necessary quality criteria for a good PhD advisor?
professional reputation criterion, quality of student results criterion, and student learning
experience criterion.

Therefore, the preceding hypothesis can be reduced to three simpler hypotheses:

John Doe would be a good PhD advisor with respect to the professional reputation
criterion.
John Doe would be a good PhD advisor with respect to the quality of student results
criterion.
John Doe would be a good PhD advisor with respect to the student learning experience
criterion.

Each of these subhypotheses can now be reduced to simpler hypotheses, each corres-
ponding to one of the elementary criteria from the right side of Figure 3.7 (e.g.,
research funding criterion). Since each of these reductions reduces a criterion to a
subcriterion, the agent could be asked to learn a general reduction pattern, as shown
in Figure 3.9.
Why is pattern learning useful? One reason is that the pattern can be applied
to reduce a criterion to its subcriteria, as shown in Figure 3.10. Additionally, as will
be illustrated later, the pattern will evolve into a rule that will automatically generate all
the reductions of criteria to their sub-criteria. If, instead of learning a pattern and

Figure 3.9. Pattern learning: the specific reduction from Figure 3.8 is generalized to the pattern “?O1 would be a good PhD advisor with respect to the ?O2. Which is a ?O2? ?O3. ?O1 would be a good PhD advisor with respect to the ?O3.”


Figure 3.10. Uniform modeling through the instantiation of a learned pattern.

applying it, the user were to define these reductions manually, then any syntactic differences between these reductions would lead to the learning of different rules. These rules would be only superficially different, leading to an inefficient and difficult-to-maintain agent.
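The following minimal Python sketch illustrates the idea of pattern instantiation discussed above: the learned pattern from Figure 3.9, represented with its variables ?O1, ?O2, and ?O3, is instantiated with specific bindings to produce one of the uniform reductions of Figure 3.10. The data structure is an illustration, not Disciple's internal representation.

```python
# A minimal sketch of instantiating a learned reduction pattern.

PATTERN = {
    "if":       "?O1 would be a good PhD advisor with respect to the ?O2.",
    "question": "Which is a ?O2?",
    "answer":   "?O3",
    "then":     "?O1 would be a good PhD advisor with respect to the ?O3.",
}

def instantiate(pattern, bindings):
    """Replace each variable (?O1, ?O2, ...) with its value in every slot."""
    result = {}
    for slot, text in pattern.items():
        for var, value in bindings.items():
            text = text.replace(var, value)
        result[slot] = text
    return result

step = instantiate(PATTERN, {
    "?O1": "John Doe",
    "?O2": "quality of student results criterion",
    "?O3": "publications with advisor criterion",
})
print(step["then"])
# John Doe would be a good PhD advisor with respect to the
# publications with advisor criterion.
```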
After the top-level criterion (i.e., PhD advisor quality criterion) is reduced to a set of
elementary criteria, specific knowledge and evidence about the advisor need to be used
to evaluate John Doe with respect to each such elementary criterion. For example, the
following hypothesis will be evaluated based on favoring and disfavoring evidence from
John Doe’s peers:

John Doe would be a good PhD advisor with respect to the peer opinion criterion.

A learning agent shell for evidence-based reasoning already knows how to assess such
hypotheses based on evidence.
Through this process, the initial hypothesis is reduced to elementary hypotheses for
which assessments are made. Then these assessments are successively combined, from
bottom-up, until the assessment of the initial hypothesis is obtained, as illustrated in
Figure 3.11.
Notice at the bottom-right side of Figure 3.11 the assessments corresponding to the
subcriteria of the quality of student results criterion:

It is likely that John Doe would be a good PhD advisor with respect to the publications
with advisor criterion.
It is very likely that John Doe would be a good PhD advisor with respect to the employers
of graduates criterion.

These assessments are combined by taking their maximum, leading to an evaluation of “very likely” for the quality of student results criterion:

It is very likely that John Doe would be a good PhD advisor with respect to the quality of
student results criterion.


Figure 3.11. Reduction and synthesis tree for assessing a specific hypothesis.

Then this assessment is combined with the assessments corresponding to the other major
criteria (very likely for the professional reputation criterion, and almost certain for the student
learning experience criterion), through a minimum function (because they are necessary
conditions), to obtain the assessment very likely for the PhD advisor quality criterion.
Finally, consider the assessments of the three subhypotheses of the top-level hypothesis:

It is certain that Bob Sharp is interested in an area of expertise of John Doe.


It is almost certain that John Doe will stay on the faculty of George Mason University for
the duration of the PhD dissertation of Bob Sharp.
It is very likely that John Doe would be a good PhD advisor with respect to the PhD
advisor quality criterion.

These assessments are combined by taking their minimum, leading to the following
assessment of the initial hypothesis:

It is very likely that John Doe would be a good PhD advisor for Bob Sharp.

Could you justify the preceding solution synthesis function? We used minimum because
each of the three subhypotheses of the initial hypothesis corresponds to a necessary
condition. If any of them has a low probability, we would like this to be reflected in the
overall evaluation.
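The following Python sketch illustrates this kind of ordinal solution synthesis. The symbolic probability scale shown is an assumption limited to the values that actually appear in the example, and min and max are the synthesis functions described above.

```python
# A minimal sketch of min/max solution synthesis over an ordinal scale of
# symbolic probabilities (only a fragment of the full scale is shown).

SCALE = ["likely", "very likely", "almost certain", "certain"]

def combine(assessments, function):
    """Combine symbolic assessments with min or max over the ordinal scale."""
    ranks = [SCALE.index(a) for a in assessments]
    return SCALE[function(ranks)]

# Subcriteria of the quality of student results criterion (Figure 3.11)
student_results = combine(["likely", "very likely"], max)            # very likely

# Necessary quality criteria of the PhD advisor quality criterion
advisor_quality = combine(["very likely", student_results,
                           "almost certain"], min)                   # very likely

# Necessary conditions of the top-level hypothesis
top = combine(["certain", "almost certain", advisor_quality], min)   # very likely
print(student_results, advisor_quality, top)
```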


Notice that, at this point, the knowledge engineer and the subject matter expert have
completely modeled the assessment of the specific hypothesis considered. This is the most
creative and the most challenging part of developing the agent. Once such a model for
assessing hypotheses (or solving problems, in general) is clarified, the agent can be rapidly
prototyped by modeling a set of typical hypotheses. The rest of the agent development
process consists of developing its knowledge base so that the agent can automatically
assess other hypotheses. The knowledge base will consist of an ontology of domain
concepts and relationships and of problem/hypothesis reduction and solution synthesis
rules, as was discussed in Section 1.6.3.1 and illustrated in Figure 1.15 (p. 38). As will be
discussed in the following, the way the preceding assessments were modeled will greatly
facilitate this process.

3.3.3 Ontology Design and Development


We want the developed agent to generate by itself reasoning trees like that shown in
Figure 3.11. What knowledge does it need in its ontology to perform this kind of reasoning?
Consider, for example, the reasoning step from the bottom-left of Figure 3.8, shown
again in the left-hand side of Figure 3.12. What knowledge does the agent need in order to
answer the question from this reasoning step? It needs the knowledge from the right-hand
side of Figure 3.12.
But this is just an example. We want the agent to be able to answer similar questions,
corresponding to similar hypotheses. Therefore, the ontological knowledge from the right-
hand side of Figure 3.12 is just a specification for the ontological knowledge needed by
the agent.
What other concepts and instances should we add to the ontology, based on the
specification in Figure 3.12? We would obviously need to consider other areas of expertise,
since we want to develop a general advisor assistant, capable of assessing advisors from
different disciplines. Thus the ontology can be expanded as shown in Figure 3.13.
The knowledge engineer and the subject matter expert will consider all the reasoning
steps from the developed reasoning trees and will correspondingly develop the ontology of
the agent.
What resources could be used to develop the ontology? Obvious resources are the
many foundational and utility ontologies for the Semantic Web that were discussed in
Section 3.2.2.

Figure 3.12. Modeling-based ontology specification (Bob Sharp is interested in Artificial Intelligence, an instance of area of expertise, and John Doe is expert in Artificial Intelligence).


The goal of this phase is to develop an ontology that is as complete as possible. This will
enable the agent to learn reasoning rules based on the concepts and the features from the
ontology, as will be briefly illustrated in the following.
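As an illustration, the following Python sketch shows how the agent could answer the question from Figure 3.12 by querying the kind of ontology fragment shown in Figure 3.13: it looks for something that the student is interested in, that the advisor is expert in, and that is an area of expertise. The dictionaries are illustrative, not Disciple's representation.

```python
# A minimal sketch of answering the question from Figure 3.12 against the
# expanded ontology of Figure 3.13.

instance_of = {"Artificial Intelligence": "Computer Science",
               "Software Engineering": "Computer Science"}
subconcept_of = {"Computer Science": "area of expertise",
                 "Mathematics": "area of expertise",
                 "Biology": "area of expertise"}
is_interested_in = {"Bob Sharp": {"Artificial Intelligence"}}
is_expert_in = {"John Doe": {"Artificial Intelligence"}}

def is_area_of_expertise(item):
    """True if item is an instance of some (sub)concept of area of expertise."""
    concept = instance_of.get(item)
    while concept is not None:
        if concept == "area of expertise":
            return True
        concept = subconcept_of.get(concept)
    return False

def common_areas(student, advisor):
    shared = is_interested_in.get(student, set()) & is_expert_in.get(advisor, set())
    return {item for item in shared if is_area_of_expertise(item)}

print(common_areas("Bob Sharp", "John Doe"))   # {'Artificial Intelligence'}
```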

3.3.4 Rule Learning and Ontology Refinement


During this phase, the subject matter expert will interact with the learning agent shell to
explain to it a specific reasoning tree. From each reasoning step, the agent will learn a
general reasoning rule, as shown in Figure 3.14.
Figure 3.15 illustrates the learning of a general hypothesis reduction rule from a specific
hypothesis reduction step. Notice that the specific instances from the example (e.g., Bob
Sharp or John Doe) are generalized to variables (i.e., ?O1, ?O2). In essence, the rule

Figure 3.13. Expanded ontology based on the specification from Figure 3.12 (area of expertise has the subconcepts Mathematics, Computer Science, and Biology; Artificial Intelligence and Software Engineering are instances of Computer Science).

Figure 3.14. Rules learning.


Figure 3.15. Learning a reduction rule from a reduction step.

indicates the conditions that need to be satisfied by these variables so that the IF
hypothesis can be assessed as indicated in the example. For example, ?O1 should be a
PhD student or possibly a person (the agent does not yet know precisely what concept to
apply because the rule is only partially learned), ?O1 should be interested in ?O3, and ?O3
should be Artificial Intelligence or possibly any area of expertise.
The way the question and its answer from the reduction step are formulated is very
important for learning. What could the agent learn if the answer were simply “yes”? The
agent would only be able to learn the fact that “Bob Sharp is interested in an area of
expertise of John Doe.” By providing an explanation of why this fact is true (“Yes, Artificial
Intelligence” meaning: “Yes, because Bob Sharp is interested in Artificial Intelligence which is
an area of expertise of John Doe”), we help the agent to learn a general rule where it will
check that the student ?O1 is interested in some area ?O3, which is an area of expertise of
the advisor ?O2. This is precisely the condition of the rule that can be easily verified
because this type of knowledge was represented in the ontology, as discussed previously
and shown in Figure 3.12.
What is the difference between the pattern learning illustrated in Figure 3.9 and the rule
learning illustrated in Figure 3.15? The difference is in the formal applicability condition of
the rule, which restricts the possible values of the rule variables and allows the automatic
application of the rule in situations where the condition is satisfied. A learned pattern,
such as that from Figure 3.9, cannot be automatically applied because the agent does not
know how to instantiate its variables correctly. Therefore, its application, during the
modeling phase, is controlled by the user, who selects the instances of the variables.
A remarkable capability of the agent is that it learns a general rule, like the one in
Figure 3.15, from a single example rather than requiring the rule to be manually developed
by the knowledge engineer and the subject matter expert. Rule learning will be discussed
in detail in Chapter 9.


As indicated in the preceding, the rule in Figure 3.15 is only partially learned, because
instead of an exact applicability condition, it contains an upper and a lower bound for this
condition. The upper bound condition (represented as the larger ellipse from Figure 3.15)
corresponds to the most general generalization of the example (represented as the point
from the center of the two ellipses) in the context of the agent’s ontology, which is used as
a generalization hierarchy for learning. The lower bound condition (represented as the
smaller ellipse) corresponds to the least general generalization of the example.
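The following minimal Python sketch conveys the intuition behind the two bounds for one rule variable: in a small, assumed generalization hierarchy, the lower bound for the variable that replaces Bob Sharp is his most specific concept, and the upper bound is its most general generalization. The actual plausible version space computed by the agent involves all the rule's variables and the explanation of the example, so this is only an intuition.

```python
# A minimal sketch of the plausible lower and upper bounds for one variable,
# computed from the generalization hierarchy of the ontology (the small
# hierarchy below is an assumption for illustration).

instance_of = {"Bob Sharp": "PhD student",
               "Artificial Intelligence": "Computer Science"}
subconcept_of = {"PhD student": "graduate student",
                 "graduate student": "student",
                 "student": "person",
                 "Computer Science": "area of expertise"}

def generalizations(instance):
    """The chain of concepts that generalize the instance, most specific first."""
    chain, concept = [], instance_of[instance]
    while concept is not None:
        chain.append(concept)
        concept = subconcept_of.get(concept)
    return chain

chain = generalizations("Bob Sharp")
lower_bound, upper_bound = chain[0], chain[-1]
print(lower_bound, "...", upper_bound)   # PhD student ... person
```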
The next phase is to refine the learned rules and, at the same time, test the agent with
new hypotheses. Therefore, the subject matter expert will formulate new hypotheses,
for example:

Dan Smith would be a good PhD advisor for Bob Sharp.

Using the learned rules, the agent will automatically generate the reasoning tree from
Figure 3.16. Notice that, in this case, the area of common interest/expertise of Dan Smith
and Bob Sharp is Information Security. The expert will have to check each reasoning step.
Those that are correct represent new positive examples that are used to generalize the
lower bound conditions of the corresponding rules. Those that are incorrect are used as
negative examples. The expert will interact with the agent to explain to it why a reasoning
step is incorrect, and the agent will correspondingly specialize the upper bound condition
of the rule, or both conditions. During the rule refinement process, the rule’s conditions
will converge toward one another and toward the exact applicability condition. During this
process, the ontology may also be extended, for instance to include the new concepts used
to explain the agent’s error. Rule refinement will be discussed in detail in Chapter 10.

Figure 3.16. Reasoning tree automatically generated by the agent.


3.3.5 Hierarchical Organization of the Knowledge Repository


Figure 3.17 illustrates the organization of a knowledge repository developed with the
Disciple-EBR learning agent shell, which was trained to assess different types of hypoth-
eses in different domains.
The top of the knowledge repository is the Shared KB, which contains general know-
ledge for evidence-based reasoning applicable in all the domains.
Under the Shared KB are Domain KBs, each corresponding to a different application
domain. One of these domain KBs will correspond to PhD advisor assessment. This PhD
advisor domain KB will contain general knowledge for assessing a PhD advisor, such as the
criteria for assessing a PhD advisor shown in Figure 3.7, and the learned reasoning rules. This
knowledge is, in principle, applicable for assessing any PhD advisor from any university.
Let us assume that we are using this agent to assess PhD advisors at three different
universities: George Mason University, Old Dominion University, and the University of
Virginia. How many scenario knowledge bases do we need and what knowledge will be
stored in each of them?
For each university there is at least one Scenario KB containing knowledge specific to
that university, such as individual professors and their characteristics.
When we need to assess PhD advisors at a specific university, the agent will use only the
corresponding Scenario KB, the Domain KB for PhD advisor assessment, and the Shared
KB for evidence-based reasoning.
The left-hand side of Figure 3.17 shows a reasoning step in the assessment of John Doe.
In which knowledge base will the following elements be represented: John Doe, Bob Sharp,
Artificial Intelligence, and certain?
John Doe and Bob Sharp are individuals at a specific university and are therefore
represented as specific instances in the Scenario KB corresponding to that university.
Artificial Intelligence is an instance to be used in reasoning in the context of any
university and is therefore represented as a generic instance in the Domain KB corresponding to the assessment of PhD advisors.

Figure 3.17. Organization of the knowledge repository of a Disciple-EBR agent. [The figure shows the Shared KB (domain-independent concepts, generic instances, and rules for evidence-based reasoning) at the top, the Domain KBs (domain-dependent concepts, generic instances, and rules, e.g., PhD advisor, Insider threat) below it, and the Scenario KBs (domain-dependent specific instances, e.g., George Mason Univ., Univ. of Virginia) at the bottom, together with a reasoning step involving Bob Sharp, John Doe, Artificial Intelligence, and certain.]


Finally, certain is a constant to be used in evidence-based reasoning in any application domain and is therefore represented in the Shared KB.
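
The following minimal Python sketch illustrates this hierarchical organization and the lookup order it implies (Scenario KB first, then Domain KB, then Shared KB). The dictionaries and names are illustrative assumptions only and do not reflect the internal representation used by Disciple-EBR.

```python
# Minimal sketch of the hierarchical knowledge repository of Figure 3.17
# (illustrative only, not the actual Disciple-EBR implementation).

shared_kb = {"certain": "symbolic probability constant"}

domain_kbs = {
    "PhD advisor": {"Artificial Intelligence": "generic instance (area of expertise)"},
    "Insider threat": {},
}

scenario_kbs = {
    # (domain, scenario) -> scenario-specific knowledge
    ("PhD advisor", "George Mason Univ."): {
        "John Doe": "specific instance (professor)",
        "Bob Sharp": "specific instance (PhD student)",
    },
    ("PhD advisor", "Univ. of Virginia"): {},
}

def lookup(name, domain, scenario):
    """Resolve a knowledge element: Scenario KB first, then Domain KB, then Shared KB."""
    for kb in (scenario_kbs[(domain, scenario)], domain_kbs[domain], shared_kb):
        if name in kb:
            return kb[name]
    return None

print(lookup("Bob Sharp", "PhD advisor", "George Mason Univ."))              # specific instance
print(lookup("Artificial Intelligence", "PhD advisor", "George Mason Univ."))  # generic instance
print(lookup("certain", "PhD advisor", "George Mason Univ."))                # shared constant
```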

3.3.6 Learning-based Design and Development Phases
Figure 3.18 summarizes the main phases of agent development when using learning
technology. The first phase is agent specification, during which a knowledge engineer and
a subject matter expert define the types of hypotheses to assess (or problems to be solved)
by the agent. For example, the hypotheses might concern the assessment of potential PhD advisors.
The second phase is rapid prototyping, where the knowledge engineer supports the
subject matter expert to develop reasoning trees for specific but representative hypotheses
(or problems). An example of a reasoning tree is the one from Figure 3.8. In this reasoning
tree, a complex hypothesis is assessed as illustrated in Figure 3.11 by:

• Successively reducing it, from the top down, to simpler and simpler hypotheses
• Assessing the simplest hypotheses
• Successively combining, from the bottom up, the assessments of the subhypotheses, until the assessment of the top-level hypothesis is obtained

Figure 3.18. Main phases of agent development when using learning technology. [The figure depicts the successive phases of agent specification, rapid prototyping, ontology development, rule learning and ontology refinement, agent use and personalization, and agent optimization, each illustrated with reasoning trees, learned rules, and ontology fragments.]

During this phase, the agent also learns reduction patterns.
The next phase is that of ontology development. The guiding question is: What are
the domain concepts, relationships, and instances that would enable the agent to
automatically generate the reasoning trees developed during rapid prototyping, as well
as similar ones? From each reasoning step, the knowledge engineer and the subject
matter expert identify the ontology elements mentioned in it, as illustrated in Figure 3.12.
Such semantic network fragments represent a specification of the needed ontology.
In particular, this fragment suggests the need for the ontology fragment shown in
Figure 3.13. Based on such specifications, and using the ontology development tools
of the Disciple-EBR shell, the knowledge engineer develops an ontology that is sufficiently complete. As part of ontology development, a knowledge engineer may reuse
concepts and relationships from previously developed ontologies, including those on
the Semantic Web.
The next phase in agent development is that of rule learning and ontology refinement.
From each hypothesis reduction step of a reasoning tree developed during rapid prototyping, the agent will learn a general hypothesis reduction rule by using the ontology as a
generalization hierarchy. This was illustrated in Figures 3.14 and 3.15. During this phase,
the expert teaches the agent to assess other similar hypotheses. The expert instantiates a
learned hypothesis pattern, such as “Dan Smith would be a good PhD advisor for Bob
Sharp,” and the agent automatically generates a reasoning tree, by applying the learned
rules, as was illustrated in Figure 3.16. The expert then critiques the agent’s reasoning,
guiding it in refining the rules.
Now the assistant is ready for the typical end-user, as part of the next phase, agent use
and personalization. Typically, the user will specify the hypothesis to assess by simply
instantiating a corresponding pattern. Then the agent will automatically generate a reduction tree like the one in Figure 3.16 by automatically applying the learned reduction rules.
This tree reduces the top-level hypothesis to elementary hypotheses to be directly assessed
based on knowledge and evidence. The assessments of these elementary hypotheses are
then automatically combined, from the bottom up, until the probability of the initial
hypothesis is obtained.
The end-user may also assess an entirely new hypothesis, but in this case he or she also
needs to indicate its decomposition, either to elementary hypotheses or to known hypotheses.
So when does the additional learning take place? It may be the case that the agent does
not know how to decompose a specific hypothesis. In such a case, the user can indicate
the desired decomposition, which will be used in the current analysis. As a result, the
agent will automatically learn a decomposition pattern, such as the one from the right-hand side of Figure 3.9.
As mentioned previously, an important difference between a learned rule and a learned
pattern concerns its use in problem solving. A learned rule is always automatically applied,
while a pattern will be proposed to the user who may decide to select it and instantiate it
appropriately. If this is done, the corresponding instances will be added to the pattern.
Thus this type of learning is much simpler for the end-user, who only needs to specify, in
natural language, the decomposition of a hypothesis into simpler hypotheses.


Periodically, the agent can undergo an optimization phase, which is the last phase in
Figure 3.18. During this phase, the knowledge engineer and the subject matter expert will
review the patterns learned from the end-user, will learn corresponding rules from them,
and will correspondingly refine the ontology. The current version of the Disciple-EBR shell
reapplies its rule and ontology learning methods to do this. However, improved methods
can be used when the pattern has more than one set of instances, because each represents
a different example of the rule to be learned. This is part of future research.
Figure 3.19 compares the conventional knowledge engineering process of developing a
knowledge-based agent (which was discussed in Section 3.1) with the process discussed in
this section, which is based on the learning agent technology.
The top part of Figure 3.19 shows the complex knowledge engineering activities that are
required to build the knowledge base. The knowledge engineer and the subject matter
expert have to develop a model of the application domain that will make explicit the way
the subject matter expert assesses hypotheses. Then the knowledge engineer has to
develop the ontology. He or she also needs to define general hypotheses decomposition
rules and to debug them, with the help of the subject matter expert.
As shown at the bottom of Figure 3.19, each such activity is replaced with an equivalent
activity that is performed by the subject matter expert and the agent, with limited assistance
from the knowledge engineer. The knowledge engineer still needs to help the subject matter
expert to define a formal model of how to assess hypotheses and to develop the ontology.
After that, the subject matter expert will teach the agent how to assess hypotheses, through
examples and explanations, and the agent will learn and refine the rules by itself.
The next chapters discuss each phase of this process in much more detail.

Figure 3.19. Conventional knowledge engineering versus learning-based knowledge engineering. [The figure contrasts the conventional approach, in which the knowledge engineer understands the domain, develops the problem-solving model, develops the ontology, defines the reasoning rules, and verifies and updates the rules and ontology in dialogue with the subject matter expert, with the learning-based approach, in which the subject matter expert develops the reasoning model, defines concepts with hierarchical organization, provides and explains examples, analyzes the agent's solutions, and explains errors in a mixed-initiative dialogue with the cognitive assistant, which learns ontology elements and reasoning rules and refines them, with the knowledge engineer in a support role.]

3.4 HANDS ON: LOADING, SAVING, AND CLOSING KNOWLEDGE BASES

The knowledge bases developed with Disciple-EBR are located in the repository folder,
which is inside the installation folder. As shown in Figure 3.17, the knowledge bases are organized hierarchically, with the knowledge base for evidence-based reasoning at the top
of the hierarchy. The user cannot change this knowledge base, whose knowledge elements
are inherited in the domain and scenario knowledge bases. From the user’s point of view,
each knowledge base consists of a top-level domain part (which contains knowledge
common to several applications or scenarios in a domain) and one scenario part (containing knowledge specific to a particular application or scenario). As illustrated in
Figure 3.17, there can be more than one scenario under a domain. In such a case, the
domain and each of the scenarios correspond to a different knowledge base. Loading,
saving, or closing a scenario will automatically load, save, or close both the scenario part
and the corresponding domain part of the knowledge base.
Loading and selecting a knowledge base are described in Operation 3.1 and illustrated
in Figure 3.20.

Operation 3.1. Load and select a knowledge base

• In the System Workspace Manager, click on the knowledge base icon containing the plus sign (+).
• The Knowledge Manager window opens, showing all the knowledge bases from the repository.
• Click on the plus sign (+) of the domain knowledge base to be loaded, to display all its scenario knowledge bases.
• Click on the scenario knowledge base to be loaded.
• Click on the Select button. This both loads the scenario and domain KBs and selects them as the current ones to work with. Their names will be shown in bold in the Knowledge Manager window.

Figure 3.20. Loading and selecting a knowledge base. [Screenshot with callouts: (1) click on the icon to load a KB; (2) the system displays all KBs; (3) click on + to expand the KB to be loaded; (4) click on Scen; (5) click on Select.]


Once a knowledge base is selected, you can invoke different modules of Disciple-EBR to
use it. Each module is accessible in a specific workspace. As illustrated in Figure 3.21,
there are three workspaces:

• Evidence workspace: “Evidence: repository\KB-name\Scen”
• Scenario workspace: “repository\KB-name\Scen”
• Domain workspace: “repository\KB-name”

The user can switch between the workspaces to use the corresponding modules. For
example, you must switch to the Evidence workspace to work with evidence items. Then,
to save the knowledge base, you must switch to the Scenario workspace.
The user should work with only one set of the three workspaces at a time (corresponding to the same KB). Therefore, you should close the workspaces corresponding to a
knowledge base (by clicking on the icon with the minus sign [–]) before opening the
workspaces corresponding to another knowledge base.
The steps to save all the knowledge bases loaded in memory are described in Operation
3.2 and illustrated in Figure 3.22.

Operation 3.2. Save all the knowledge bases

• Select the Scenario workspace (“repository\KB-name\Scen”).
• Select the System menu.
• Select Save All.

Figure 3.21. Workspaces in Disciple-EBR. [Screenshot showing the Evidence, Scenario, and Domain workspaces and the icon used to close a workspace.]

Figure 3.22. Saving all the knowledge bases. [Screenshot with callouts: (1) select the Scenario workspace; (2) the Scenario workspace is displayed; (3) select System, then Save All.]


It is highly recommended to have only one knowledge base loaded in memory.
Therefore, before loading a new knowledge base, you should close all the opened
ones by following the instructions described in Operation 3.3 and illustrated in
Figure 3.23.

Operation 3.3. Close a knowledge base

• Click on the knowledge base icon containing the plus sign (+) situated at the right of the workspace selector.
• The Knowledge Manager window opens, showing all the knowledge bases.
• Click on the scenario knowledge base to be closed.
• Click on the Close button on the right side of the window.
• Click on the X button to close the Knowledge Manager window.

The predefined knowledge base “00-Reference-KB” contains general knowledge for evidence-based reasoning, an empty domain, and an empty scenario. Users can create
their knowledge bases as renamed copies of “00-Reference-KB,” as indicated in Operation
3.4. They should never work with “00-Reference-KB,” which should be kept as a reference
knowledge base.

Operation 3.4. Create a user knowledge base

• Open the “repository” folder from the “installation” directory in Windows Explorer.
• Make a copy of the entire “00-Reference-KB” folder and give it a new name.
• Use the Disciple-EBR modules to develop the newly created knowledge base.
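
For readers who prefer to script this step, the sketch below performs the same copy operation with Python. The "installation" path is a placeholder for your actual Disciple-EBR installation directory, and the new name "WA" is just an example; nothing in Disciple-EBR requires this script.

```python
# Sketch of Operation 3.4 as a script (paths are placeholders; adjust
# "installation" to your actual Disciple-EBR installation directory).
import shutil
from pathlib import Path

repository = Path("installation") / "repository"
reference_kb = repository / "00-Reference-KB"
new_kb = repository / "WA"   # hypothetical name for the user knowledge base

if not new_kb.exists():
    # Copy the entire reference knowledge base folder under the new name;
    # "00-Reference-KB" itself is left untouched, as recommended.
    shutil.copytree(reference_kb, new_kb)
```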

Figure 3.23. Closing a knowledge base. [Screenshot with callouts: (1) click on the knowledge base icon; (2) the system displays all KBs; (3) click on the Scen KB to be closed; (4) with Scen selected, click on Close; (5) click on the x button to close the window.]


3.5 KNOWLEDGE BASE GUIDELINES

The following are knowledge engineering guidelines for knowledge base development.

Guideline 3.1. Work with only one knowledge base loaded in memory
To maintain the performance of the Disciple-EBR modules, work with only one knowledge
base loaded in memory. Therefore, close all the knowledge bases before loading a new one.
If several knowledge bases are loaded, work only with the set of three workspaces
corresponding to the knowledge base you are currently using. Close all the other
workspaces.

Guideline 3.2. Create a knowledge base and save successive versions

A user can introduce errors into the knowledge base while developing it. When Disciple-EBR attempts to load a knowledge base, it makes a rigorous check of its correctness and
will not load the knowledge base if it contains errors. It is therefore important to save
successive versions of the knowledge base being developed in order to fall back to a
previous version in case the latest one contains errors.
Create your knowledge base by making a copy of “00-Reference-KB” and giving it a new
name, for instance “WA.” From now on, you will work only with “WA,” but as you develop it,
you will have to save successive copies of it with different names, as explained in the following.
Suppose that you have developed “WA” to contain part of the modeling for a hypothesis.
Save “WA” in Disciple-EBR, then using Windows Explorer make a copy of it, and rename the
copy as “WA-1m.” Continue working with “WA” and expand the modeling of the hypothesis.
Save and make a copy of “WA.” Then rename the copy as “WA-2m,” and so on.
Through such a process, you will save a sequence of knowledge bases: “WA-1m,” “WA-2m,” “WA-3o,” “WA-4o,” and so on, each corresponding to a given stage in your development of “WA.” In this way, if your “WA” knowledge base is damaged for any reason, you can
always resume from the most recently saved version, as illustrated in the following scenario:

• “WA” has errors, and the most recently saved version is “WA-4o.”
• Delete “WA” in Windows Explorer, copy “WA-4o,” and rename this copy as “WA.”
• Continue with the development of “WA.”
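
The same discipline can be captured in a small helper script, sketched below under the assumption of the repository layout described earlier; the knowledge base name "WA" and the version suffixes are only examples, and the manual Windows Explorer procedure works just as well.

```python
# Sketch of the versioning discipline of Guideline 3.2 (illustrative only).
import shutil
from pathlib import Path

repository = Path("installation") / "repository"   # placeholder path

def snapshot(kb_name, version_suffix):
    """Copy the working KB (e.g., "WA") to a versioned backup (e.g., "WA-1m")."""
    shutil.copytree(repository / kb_name, repository / f"{kb_name}-{version_suffix}")

def rollback(kb_name, version_suffix):
    """Replace a damaged working KB with its most recently saved version."""
    damaged = repository / kb_name
    if damaged.exists():
        shutil.rmtree(damaged)                                   # delete the damaged "WA"
    shutil.copytree(repository / f"{kb_name}-{version_suffix}",  # copy "WA-4o" ...
                    damaged)                                     # ... back as "WA"

# snapshot("WA", "1m")   # after modeling part of a hypothesis
# rollback("WA", "4o")   # if "WA" becomes damaged
```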

3.6 PROJECT ASSIGNMENT 2

Finalize the project team, specify the type of hypotheses to be analyzed by the agent to
be developed, and study the application domain. Prepare a short presentation of the
following:

• The application domain of your agent and why your agent is important.
• A bibliography for the expertise domain and a description of your current familiarity with the domain, keeping in mind that you should choose a domain where you already are or could become an expert without investing a significant amount of time.
• Three examples of hypotheses and a probability for each.


3.7 REVIEW QUESTIONS

3.1. Which are the main phases in the development of a knowledge-based agent?

3.2. Which are the required qualifications of a knowledge engineer?

3.3. Which are the required qualifications of a subject matter expert?

3.4. Briefly define “verification,” “validation,” and “certification.”

3.5. Consider the fact that a knowledge-based agent may need to have hundreds or
thousands of rules. What can be said about the difficulty of defining and refining
these rules through the conventional process discussed in Section 3.1.4?

3.6. Use the scenario from Section 3.1 to illustrate the different difficulties of building a
knowledge-based agent discussed in Section 1.6.3.1.

3.7. What is an expert system shell?

3.8. Which are different types of expert system shells?

3.9. What is a learning agent shell?

3.10. What is a learning agent shell for evidence-based reasoning?

3.11. What is the organization of the knowledge repository of a learning agent shell for
evidence-based reasoning?

3.12. What is the difference between a specific instance and a generic instance? Provide
an example of each.

3.13. Are there any mistakes in the reasoning step from Figure 3.24 with respect to the
goal of teaching the agent? If the answer is yes, explain and indicate corrections.

3.14. Which are the main stages of developing a knowledge-based agent using learning
agent technology?

3.15. Compare the manual knowledge engineering process of developing a knowledge-based agent, as described in Section 3.1, with the process using learning agent technology, described in Section 3.3.

Figure 3.24. Reduction step. [Hypothesis: "John Doe will stay on the faculty of George Mason University for the duration of the PhD dissertation of Bob Sharp." Question: "Is John Doe likely to stay on the faculty of George Mason University for the duration of the PhD dissertation of Bob Sharp?" Answer: "Yes." Assessment: "It is almost certain that John Doe is likely to stay on the faculty of George Mason University for the duration of the PhD dissertation of Bob Sharp."]

4 Modeling the Problem-Solving Process

4.1 PROBLEM SOLVING THROUGH ANALYSIS AND SYNTHESIS

Analysis and synthesis, introduced in Section 1.6.2, form the basis of a general divide-and-
conquer problem-solving strategy that can be applied to a wide variety of problems. The
general idea, illustrated in Figure 4.1, is to decompose or reduce a complex problem P1 to
n simpler problems P11, P12, . . . , P1n, which represent its components. If we can then find
the solutions S11, S12, . . . , S1n of these subproblems, then these solutions can be combined
into the solution S1 of the problem P1.
If any of the subproblems is still too complex, it can be approached in a similar way, by
successively decomposing or reducing it to simpler problems, until one obtains problems
whose solutions are known, as illustrated in Figure 4.2.
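
A minimal recursive sketch of this strategy is shown below. The solve_directly, reduce_problem, and compose functions are assumed to be supplied by the application (for instance, by symbolic integration operators); they are not part of any particular system, and the toy usage at the end is purely illustrative.

```python
# Minimal sketch of problem solving through analysis and synthesis
# (Figures 4.1 and 4.2).

def solve(problem, solve_directly, reduce_problem, compose):
    """Solve `problem` by reduction (analysis) and solution synthesis."""
    direct = solve_directly(problem)
    if direct is not None:                 # an elementary problem with a known solution
        return direct
    subproblems = reduce_problem(problem)  # P1 -> P11, ..., P1n
    subsolutions = [solve(p, solve_directly, reduce_problem, compose)
                    for p in subproblems]
    return compose(problem, subsolutions)  # S11, ..., S1n -> S1

# Toy illustration: "solving" a nested addition expression by decomposition.
print(solve(("+", 1, ("+", 2, 3)),
            solve_directly=lambda p: p if isinstance(p, int) else None,
            reduce_problem=lambda p: list(p[1:]),
            compose=lambda p, ss: sum(ss)))   # 6
```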
Figures 4.3 and 4.4 illustrate the application of this divide-and-conquer approach to
solve a symbolic integration problem. Notice that each reduction and synthesis operation
is justified by a specific symbolic integration operator.

4.2 INQUIRY-DRIVEN ANALYSIS AND SYNTHESIS

Cognitive assistants require the use of problem-solving paradigms that are both natural
enough for their human users and formal enough to be automatically executed by the
agents. Inquiry-driven analysis and synthesis comprise such a problem-solving paradigm
where the reduction and synthesis operations are guided by corresponding questions and
answers. The typical questions are those from Rudyard Kipling’s well-known poem “I Keep
Six Honest . . .”

I keep six honest serving-men
(They taught me all I knew);
Their names are What and Why and When
And How and Where and Who.

Figure 4.1. Problem reduction (decomposition) and solutions synthesis (composition). [Problem P1 with solution S1 is reduced to subproblems P11, P12, ..., P1n with solutions S11, S12, ..., S1n.]

Figure 4.2. Problem solving through analysis and synthesis. [A complex problem P1 is solved by: successively reducing it to simpler and simpler problems; finding the solutions of the simplest problems; and successively combining these solutions, from the bottom up, until the solution of the initial problem is obtained (synthesized).]

Figure 4.3. Reduction of a symbolic integration problem.


Figure 4.4. Symbolic integration through analysis and synthesis.

Figure 4.5. Inquiry-driven problem reduction. [Problem 1 is reduced, under Question Q with Answer A, to Problem 2 and Problem 3, and, under the same Question Q with Answer B, to Problem 4 and Problem 5.]

We have already illustrated this paradigm in Sections 2.2 and 3.3.2. To better understand
it, let us consider the simple abstract example from Figure 4.5. To solve Problem 1, one
asks Question Q related to some aspect of Problem 1. Let us assume that there are two
answers to Q: Answer A and Answer B. For example, the question, “Which is a sub-
criterion of the quality of student results criterion?” has two answers, “publications with
advisor criterion” and “employers of graduates criterion.”
Let us further assume that Answer A leads to the reduction of Problem 1 to two simpler
problems, Problem 2 and Problem 3. Similarly, Answer B leads to the reduction of
Problem 1 to the other simpler problems, Problem 4 and Problem 5.
Let us now assume that we have obtained the solutions of these four subproblems. How
do we combine them to obtain the solution of Problem 1? As shown in Figure 4.6, first the


solutions to Problem 2 and Problem 3 are combined to obtain a solution to Problem 1 corresponding to Answer A (called Solution A). Also the solutions to Problem 4 and
Problem 5 are combined to obtain a solution to Problem 1 corresponding to Answer
B (called Solution B). Each of these two synthesis (or composition) operations is called a
reduction-level synthesis because it corresponds to a specific reduction of the top-level
problem. Second, we need to combine Solution A with Solution B into the final solution of
Problem 1. This synthesis operation is called problem-level synthesis because it corresponds to all reductions of Problem 1.
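
The following sketch captures this two-level synthesis as a small data structure, under the assumption that the reduction-level and problem-level synthesis functions are supplied by the application; it is an illustration only, not Disciple-EBR's internal representation.

```python
# Sketch of inquiry-driven reduction with the two synthesis levels of Figure 4.6.

class Reduction:
    """One question/answer reduction of a problem into subproblems."""
    def __init__(self, question, answer, subproblems):
        self.question, self.answer, self.subproblems = question, answer, subproblems

class Problem:
    def __init__(self, name, reductions=(), direct_solution=None):
        self.name = name
        self.reductions = list(reductions)
        self.direct_solution = direct_solution

def assess(problem, reduction_level_synthesis, problem_level_synthesis):
    if problem.direct_solution is not None:
        return problem.direct_solution
    # Reduction-level synthesis: combine the subproblem solutions of each
    # question/answer reduction into one candidate solution of the problem.
    candidates = [
        reduction_level_synthesis(
            r, [assess(p, reduction_level_synthesis, problem_level_synthesis)
                for p in r.subproblems])
        for r in problem.reductions
    ]
    # Problem-level synthesis: combine the candidate solutions obtained from
    # all the alternative reductions of the problem.
    return problem_level_synthesis(problem, candidates)
```

In evidence-based reasoning (Section 4.3), these two synthesis levels typically become minimum and maximum functions over symbolic probabilities.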
Figures 4.7 and 4.8 illustrate inquiry-driven analysis and synthesis in the context of the
military center of gravity (COG) determination problem (Tecuci et al., 2005b; 2008b),
which will be discussed in Section 12.4. Figure 4.7 shows the reduction of a problem to
three simpler problems, guided by a question/answer pair.
Figure 4.8 illustrates the composition of the solutions of the three subproblems
into the solution of the problem. One aspect to notice is that the reduction-level
synthesis operation is guided by a question/answer pair, while the problem-level synthesis operation is not. Thus the use of questions and answers is actually optional.

Figure 4.6. A more detailed view of the analysis and synthesis process. [The solutions of Problem 2 and Problem 3 are combined through a reduction-level synthesis into Solution A of Problem 1, the solutions of Problem 4 and Problem 5 are combined into Solution B of Problem 1, and Solution A and Solution B are combined through a problem-level synthesis into the solution of Problem 1.]

Figure 4.7. Illustration of a reduction operation in the COG domain. [The problem "Test whether President Roosevelt has the critical capability to maintain support" is reduced, guided by the question "Which are the critical requirements for President Roosevelt to maintain support?" and the answer "President Roosevelt needs means to secure support from the government, means to secure support from the military, and means to secure support from the people," to three subproblems testing whether President Roosevelt has means to secure support from the government, from the military, and from the people.]


Figure 4.8. Illustration of reduction and synthesis operations in the COG domain. [The three subproblem solutions, stating that President Roosevelt has means to secure support from the government, from the military, and from the people, are combined, guided by the question "Does President Roosevelt have the critical capability to maintain support?" and the answer "Yes, because President Roosevelt has all the needed critical requirements," into the solution "President Roosevelt has the critical capability to maintain support because President Roosevelt has means to secure support from the government, has means to secure support from the military, and has means to secure support from the people," the critical capability being an emerging property.]

Moreover, it is assumed that the questions guiding the synthesis operations may have
only one answer, which typically indicates how to combine the solutions. Allowing more
questions and more answers in the synthesis tree would lead to a combinatorial explosion of solutions.
Another interesting aspect is that the three leaf solutions in Figure 4.8 are about the
means of President Roosevelt, while their composition is about a capability. Thus this
illustrates how synthesis operations may lead to emerging properties.
A third aspect to notice is how the reduction-level composition is actually performed. In
the example from Figure 4.8, the solutions to combine are:

President Roosevelt has means to secure support from the government.
President Roosevelt has means to secure support from the military.
President Roosevelt has means to secure support from the people.

The synthesized solution is obtained by concatenating substrings from these solutions because, as indicated in the corresponding question/answer pair, President Roosevelt has all the needed critical requirements:

President Roosevelt has the critical capability to maintain support because President
Roosevelt has means to secure support from the government, has means to secure
support from the military, and has means to secure support from the people.

The preceding synthesis operation in the interface of Disciple-COG is shown in Figure 12.28 (p. 373) from Section 12.4.2.


Figure 4.9 shows another example of reduction and synthesis in the COG domain. In
this case, the solutions to combine are:

PM Mussolini has means to secure support from the government.
PM Mussolini has means to secure support from the military.
PM Mussolini does not have means to secure support from the people.

In this case, the synthesized solution is no longer obtained by concatenating substrings from these solutions because, as indicated in the corresponding question/answer pair, PM Mussolini does not have all the needed critical requirements:

PM Mussolini does not have the critical capability to maintain support because PM
Mussolini does not have means to secure support from the people.

Additional examples of solution synthesis from the COG domain are presented in
Figures 12.27 (p. 372), 12.28 (p. 373), 12.29 (p. 373), and 12.30 (p. 374) from Section 12.4.2.
As suggested by the preceding examples, there are many ways in which solutions may
be combined.
One last important aspect related to problem solving through analysis and synthesis is
that the solutions of the elementary problems may be obtained by applying any other type of
reasoning strategy. This enables the solving of problems through a multistrategy approach.
Chapter 12 presents Disciple cognitive assistants for different types of tasks, illustrating
the use of the inquiry-driven analysis and synthesis in different domains. Section 12.2
discusses this problem-solving paradigm in the context of military engineering planning.
Section 12.3 discusses it in the context of course of action critiquing. Section 12.4 discusses
it in the context of center of gravity analysis, and Section 12.5 discusses it in the context of
collaborative emergency response planning.

Figure 4.9. Another illustration of reduction and synthesis operations in the COG domain. [Because PM Mussolini does not have all the needed critical requirements, the subproblem solutions are combined into "PM Mussolini does not have the critical capability to maintain support because PM Mussolini does not have means to secure support from the people."]


4.3 INQUIRY-DRIVEN ANALYSIS AND SYNTHESIS FOR EVIDENCE-BASED REASONING

4.3.1 Hypothesis Reduction and Assessment Synthesis
In this section, we discuss the specialization of the inquiry-driven analysis and synthesis
paradigm for evidence-based reasoning where one assesses the probability of hypotheses
based on evidence, as was illustrated in Section 2.2. In this case, a complex hypothesis is
assessed by:

• Successively reducing it, from the top down, to simpler and simpler hypotheses (guided by introspective questions and answers).
• Assessing the simplest hypotheses based on evidence.
• Successively combining, from the bottom up, the assessments of the simpler hypotheses, until the assessment of the top-level hypothesis is obtained.

Figure 4.10 shows a possible analysis of the hypothesis that Country X has nuclear
weapons.

Figure 4.10. An example of different types of reductions and corresponding synthesis functions. [The hypothesis "Country X has nuclear weapons" (likely) is reduced, under the question about its necessary and sufficient conditions (reasons, desires, and capabilities), to "Country X has reasons to have nuclear weapons" (almost certain), "Country X desires nuclear weapons" (very likely), and "Country X has the capability to obtain nuclear weapons" (likely), combined with min and max. The capability hypothesis is reduced, under two alternative strategies, to "Country X can build nuclear weapons" (likely) and "Country X can buy nuclear weapons" (no support), combined with max. The build hypothesis is supported by the very likely indicator "Country X can produce enriched uranium" (likely).]


As pointed out by Alvin Toffler (1984, p. xi):

One of the most highly developed skills in contemporary Western civilization is dissection; the split-up of problems into their smallest possible components. We are good at it. So good, we often forget to put the pieces back together again.

Evidence-based reasoning offers an opportunity to simplify the more complex synthesis process. Indeed, in this case, the solution of a hypothesis may no longer need to be a complex phrase or expression, but just its probability of being true, given the available evidence. The actual solution synthesis function depends on the type of reduction strategy used.
In the following section, we will review different types of reduction strategies and the
corresponding synthesis functions.

4.3.2 Necessary and Sufficient Conditions
Ideally, a hypothesis would be reduced to several subhypotheses that would represent
necessary and sufficient conditions. That is, the hypothesis is true if and only if all the
subhypotheses are true. An example is the top-level reduction from Figure 4.10, where the
top-level hypothesis is reduced to three subhypotheses. Let us assume that we have
obtained the following assessments of these subhypotheses: almost certain, very likely,
and likely, respectively. Then the assessment of the top hypothesis, corresponding to this
necessary and sufficient condition, is the minimum of the three assessments (i.e., likely),
because each of the three subhypotheses would need to be true to ensure that the top-
level hypothesis is true. This value and the minimum (min) function that produced it are
associated with the question/answer pair.
In general, as will be illustrated later in this chapter, there may be more than one
strategy to reduce a hypothesis to simpler hypotheses, each resulting in a possibly different
assessment. In such a case, the assessment of the hypothesis should be taken as the
maximum of all these possible assessments. In this particular example, since we have only
one strategy, the assessment of the top-level hypothesis is max(likely) = likely.
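
These min and max combinations over the symbolic probability scale can be sketched in a few lines of Python. The scale below is the one used in the book (Table 2.5), and the example values are those of Figure 4.10; the helper names are, of course, just illustrative.

```python
# Sketch of min/max synthesis over the symbolic probability scale
# (no support < likely < very likely < almost certain < certain).

SCALE = ["no support", "likely", "very likely", "almost certain", "certain"]
RANK = {value: i for i, value in enumerate(SCALE)}

def sym_min(values):
    """Necessary and sufficient condition (or scenario): all parts must hold."""
    return min(values, key=RANK.__getitem__)

def sym_max(values):
    """Alternative strategies, scenarios, or indicators: any one suffices."""
    return max(values, key=RANK.__getitem__)

# Top-level reduction of Figure 4.10 (reasons, desires, and capabilities):
print(sym_min(["almost certain", "very likely", "likely"]))   # likely
# Two alternative strategies to obtain nuclear weapons (build or buy):
print(sym_max(["likely", "no support"]))                      # likely
```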

4.3.3 Sufficient Conditions and Scenarios
Many times it is not easy or even possible to identify necessary and sufficient conditions to
reduce a hypothesis. In such a case, a second-best reduction would be a sufficient
condition. This means that if the subhypotheses are true, then the hypothesis is true.
But, as we have discussed previously, there may be more than one sufficient condition for
a hypothesis. For example, the middle part of Figure 4.10 shows two possible strategies for
Country X to obtain nuclear weapons: It can build them, or it can buy them. Each strategy
has to be assessed and the maximum assessment represents the assessment of the
hypothesis that Country X has the capability to obtain nuclear weapons.
A special type of a sufficient condition for a hypothesis is a scenario in which the
hypothesis would be true, such as the one illustrated in Figure 2.9 (p. 63) from Section
2.2.4. But, as we have discussed in Section 2.2.4, there may be multiple alternative
scenarios. Figure 4.11 shows an abstract example where there are two alternative scenarios
for Hypothesis 1 to be true. Scenario 1 consists of action 2 and action 3. For this scenario to
have happened, both these actions should have happened. Therefore, we combine their


corresponding assessments with a minimum function, as shown at the bottom-left of Figure 4.11. Hypothesis 1, however, would be true if either of the two scenarios would happen. Therefore, we combine the assessments corresponding to the two scenarios through a maximum function.

Figure 4.11. Reductions and syntheses corresponding to two sufficient conditions (scenarios). [Hypothesis 1 can be true under Scenario 1 (action 2 and action 3) or under Scenario 2 (action 4 and action 5); the assessments of the actions of each scenario are combined with a minimum function, and the assessments corresponding to the two scenarios are combined with a maximum function.]

4.3.4 Indicators
Many times when we are assessing a hypothesis, we have only indicators. For example, as
shown at the bottom part of Figure 4.10, having the capability to produce enriched uranium
is an indicator that a country can build nuclear weapons. An indicator is, however, weaker
than a sufficient condition. If we determine that a sufficient condition is satisfied (e.g., a
scenario has actually happened), we may conclude that the hypothesis is true. But we
cannot draw such a conclusion just because we have discovered an indicator. However, we
may be more or less inclined to conclude that the hypothesis is true, based on the relevance
(strength) of the indicator. Therefore, given the symbolic probabilities from Table 2.5, we
distinguish between three types of indicators of different relevance (strength): “likely indicator,” “very likely indicator,” and “almost certain indicator.”
A “likely indicator” is one that, if discovered to be true, would lead to the conclusion
that the considered hypothesis is likely. Similarly, a “very likely indicator” would lead to the
conclusion that the hypothesis is very likely, and an “almost certain indicator” would lead to
the conclusion that the hypothesis is almost certain.
In the example from the bottom part of Figure 4.10 it is likely that Country X can
produce enriched uranium, and this is a very likely indicator that Country X can build
nuclear weapons. Therefore, we can conclude that the probability of the hypothesis
that Country X can build nuclear weapons is likely, the minimum between likely (the
probability of the indicator) and very likely (the strength of the indicator).
In general, the probability of a hypothesis H based on an indicator I is the minimum
between the probability of the indicator and the relevance (strength) of the indicator (which
could be likely, very likely, or almost certain).


It makes no sense to consider the type “certain indicator,” because this would be a
sufficient condition. Similarly, it makes no sense to consider the type “no support indica-
tor,” because this would not be an indicator.
As an abstract example, Figure 4.12 shows a hypothesis that has two likely indicators,
A and B, if only one of them is observed. However, if both of them are observed, they
synergize to become an almost certain indicator.
As a concrete example, consider a person who has been under surveillance in connection with terrorist activities. We suspect that this person will attempt to leave the country in a short while. Three days ago, we received information that he sold his car. Today, we received information that he closed his account at his bank. Each of these is only a likely indicator of the hypothesis that he plans to leave the country. He could be planning to buy a new car, or he could be dissatisfied with his bank. But, taken together, these two indicators suggest that it is almost certain that he is planning to leave the country.
Coming back to the abstract example in Figure 4.12, let us assume that indicator A is
almost certain and indicator B is very likely. In such a case, the assessment of Hypothesis
1, based only on indicator A, is minimum(almost certain, likely) = likely. Similarly, the
assessment of Hypothesis 1, based only on indicator B, is minimum(very likely, likely) =
likely. But the assessment of Hypothesis 1, based on both indicators A and B, is
minimum(minimum(almost certain, very likely), almost certain) = very likely. Also, the
assessment of Hypothesis 1 based on all the indicators is the maximum of all the
individual assessments (i.e., very likely), because these are three alternative solutions
for Hypothesis 1.
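
The following sketch illustrates how such indicator-based assessments could be combined: each indicator contributes the minimum of its own probability and its relevance (strength), and the alternative contributions are combined with maximum, reproducing the values discussed for Figure 4.12. The scale helpers are repeated so the sketch is self-contained; it is an illustration, not the agent's implementation.

```python
# Sketch of hypothesis assessment from indicators (Figure 4.12).

SCALE = ["no support", "likely", "very likely", "almost certain", "certain"]
RANK = {value: i for i, value in enumerate(SCALE)}

def sym_min(values): return min(values, key=RANK.__getitem__)
def sym_max(values): return max(values, key=RANK.__getitem__)

def from_indicator(indicator_probability, indicator_strength):
    """Contribution of one indicator to the hypothesis it points to."""
    return sym_min([indicator_probability, indicator_strength])

contributions = [
    from_indicator("almost certain", "likely"),                 # indicator A alone (likely indicator)
    from_indicator("very likely", "likely"),                    # indicator B alone (likely indicator)
    from_indicator(sym_min(["almost certain", "very likely"]),  # indicators A and B together
                   "almost certain"),                           # (almost certain indicator)
]
print(sym_max(contributions))   # very likely
```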

4.4 EVIDENCE-BASED ASSESSMENT

Now we discuss the assessment of the leaf hypotheses of the argumentation structure,
based on the identified relevant evidence. Let us consider an abstract example where the
leaf hypothesis to be directly assessed based on evidence is Q (see Figure 4.13).
We begin by discussing how to assess the probability of hypothesis Q based only on one
item of favoring evidence Ek* (see the bottom of Figure 4.13). First notice that we call
this likeliness of Q, and not likelihood, because in classic probability theory, likelihood is

P(Ek*|Q), while here we are interested in P(Q|Ek*), the posterior probability of Q given Ek*.

Figure 4.12. Hypothesis assessment based on indicators. [Hypothesis 1 (very likely) has two likely indicators, A and B, which together form an almost certain indicator; with indicator A assessed as almost certain and indicator B as very likely, the three alternative assessments (likely, likely, and very likely) are combined with max.]

Figure 4.13. The relevance, believability, and inferential force of evidence. [Q (very likely) is assessed through an on-balance judgment from Q based only on favoring evidence (almost certain) and Qc based only on disfavoring evidence (likely); the inferential force of an item of evidence Ek* on Q is the minimum of its relevance ("How likely is Q, based only on Ek* and assuming that Ek* is true?") and its believability ("How likely is it that Ek* is true?"), and the favoring items are combined with max.]
To assess Q based only on Ek*, there are three judgments to be made by answering
three questions:

The relevance question is: How likely is Q, based only on Ek* and assuming that Ek* is
true? If Ek* tends to favor Q, then our answer should be one of the values from likely
to certain. If Ek* is not relevant to Q, then our answer should be no support, because
Ek* provides no support for the truthfulness of Q. Finally, if Ek* tends to disfavor Q,
then it tends to favor the complement of Q, that is, Qc. Therefore, it should be used
as favoring evidence for Qc, as discussed later in this section.
The believability question is: How likely is it that Ek* is true? Here the answer should be
one of the values from no support to certain. The maximal value, certain, means
that we are sure that the event Ek reported in Ek* did indeed happen. The minimal
value, no support, means that Ek* provides us no reason to believe that the event
Ek reported in Ek* did happen. For example, we believe that the source of Ek* has
lied to us.
The inferential force or weight question is: How likely is Q based only on Ek*? The agent
automatically computes this answer as the minimum of the relevance and believability answers. What is the justification for this? Because to believe that Q is true
based only on Ek*, Ek* should be both relevant to Q and believable.

When we assess a hypothesis Q, we may have several items of evidence, some favoring Q
and some disfavoring Q. The agent uses the favoring evidence to assess the probability of
Q and the disfavoring evidence to assess the probability of Qc. As mentioned previously,
because the disfavoring evidence for Q is favoring evidence for Qc, the assessment process
for Qc is similar to the assessment for Q.
When we have several items of favoring evidence, we evaluate Q based on each of them
(as was explained previously), and then we compose the obtained results. This is illustrated in Figure 4.13, where the assessment of Q based only on Ei* (almost certain) is


composed with the assessment of Q based only on Ek* (likely), through the maximum
function, to obtain the assessment of Q based only on favoring evidence (almost certain). In
this case, the use of the maximum function is justified because it is enough to have one
item of evidence that is both very relevant and very believable to persuade us that the
hypothesis Q is true.
Let us assume that Qc based only on disfavoring evidence is likely. How should we
combine this with the assessment of Q based only on favoring evidence? As illustrated at
the top of Figure 4.13, the agent uses an on-balance judgment: Because Q is almost certain
and Qc is likely, it concludes that, based on all available evidence, Q is very likely.
In general, as indicated in the right and upper side of Table 4.1, if the assessment of Qc
(based on disfavoring evidence for Q) is higher than or equal to the assessment of Q
(based on favoring evidence), then we conclude that, based on all the available evidence,
there is no support for Q. If, on the other hand, the assessment of Q is strictly greater than
the assessment of Qc, then the assessment of Q is decreased, depending on the actual
assessment of Qc (see the left and lower side of Table 4.1).
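
The sketch below puts these pieces together for a leaf hypothesis: the inferential force of an item of evidence is the minimum of its relevance and believability, the favoring (and disfavoring) items are combined with maximum, and the favoring and disfavoring assessments are weighed with an on-balance function that reproduces the entries of Table 4.1. The scale helpers are repeated for self-containment; this is an illustrative approximation, not the agent's actual code.

```python
# Sketch of the evidence-based assessment of a leaf hypothesis Q
# (Figure 4.13 and Table 4.1).

SCALE = ["no support", "likely", "very likely", "almost certain", "certain"]
RANK = {value: i for i, value in enumerate(SCALE)}

def sym_min(values): return min(values, key=RANK.__getitem__)
def sym_max(values): return max(values, key=RANK.__getitem__)

def inferential_force(relevance, believability):
    """How likely is Q based only on one item of evidence."""
    return sym_min([relevance, believability])

def on_balance(q_favoring, q_disfavoring):
    """Weigh Q (favoring evidence) against Qc (disfavoring evidence), as in Table 4.1."""
    if RANK[q_disfavoring] >= RANK[q_favoring]:
        return "no support"
    # Otherwise Q is decreased by as many steps as Qc is above "no support";
    # this reproduces the entries of Table 4.1.
    return SCALE[RANK[q_favoring] - RANK[q_disfavoring]]

def assess_leaf(favoring, disfavoring):
    """favoring, disfavoring: lists of (relevance, believability) pairs."""
    q = sym_max([inferential_force(r, b) for r, b in favoring]) if favoring else "no support"
    qc = sym_max([inferential_force(r, b) for r, b in disfavoring]) if disfavoring else "no support"
    return on_balance(q, qc)

# The example of Figure 4.13: Q is almost certain on favoring evidence, Qc is likely.
print(on_balance("almost certain", "likely"))                        # very likely
print(assess_leaf([("certain", "almost certain"), ("likely", "very likely")],
                  [("likely", "likely")]))                           # very likely
```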
One important aspect to notice is that the direct assessment of hypotheses based on
favoring and disfavoring evidence is done automatically by the agent, once the user
assesses the relevance and the believability of evidence.
Another important aspect to notice is that the evaluation of upper-level hypotheses (such
as those from Figure 4.10) requires the user to indicate what function to use when
composing the assessments of their direct subhypotheses. This was discussed in Section 4.3.

4.5 HANDS ON: WAS THE CESIUM STOLEN?


continue with the cesium example introduced in Section 2.2, where we have already

Table 4.1 An “On-Balance” Synthesis Function

Q based on all evidence, as a function of Q based only on favoring evidence (rows) and Qc based only on disfavoring evidence (columns):

Q \ Qc           | no support     | likely         | very likely | almost certain | certain
no support       | no support     | no support     | no support  | no support     | no support
likely           | likely         | no support     | no support  | no support     | no support
very likely      | very likely    | likely         | no support  | no support     | no support
almost certain   | almost certain | very likely    | likely      | no support     | no support
certain          | certain        | almost certain | very likely | likely         | no support


established that the cesium-137 canister is missing (see Figure 2.8 on p. 62). The next step
is to consider the competing hypotheses:

H2: The cesium-137 canister was stolen.
H2’: The cesium-137 canister was misplaced.
H2”: The cesium-137 canister is used in a project without being checked out from the XYZ warehouse.

We have to put each of these hypotheses to work, to guide the collection of relevant
evidence. In Section 2.2.4, we have already discussed, at a conceptual level, the collection of evidence for hypothesis H2. Table 4.2 shows the result of our information
collection efforts.
The collected information from Table 4.2 suggests that the cesium-137 canister was
stolen with the panel truck having Maryland license MDC-578. This has led to the
development of the analysis tree in Figure 2.9 (p. 63). In this case study, you are going
to actually perform this analysis. You have to identify the “dots” in the information from

Table 4.2 Additional Information on the Missing Cesium-137 Canister

INFO-003-Clyde: We talked to a professional locksmith named Clyde, who said that the lock had been forced, but it was a clumsy job.
INFO-004-SecurityCamera: The security camera of the XYZ warehouse contains a
video segment showing a person loading a container into a U-Haul panel truck.
INFO-005-Guard: There is a security perimeter around the XYZ warehouse and
employee parking area having just one gate that is controlled by a guard. On the day before the
missing canister was observed, the security guard, Sam, recorded that a panel truck having
Maryland license plate MDC-578 was granted entry at 4:45 pm just before the XYZ closing hour
at 5:00 pm. The driver of this vehicle showed the guard a manifest containing items being
delivered to the XYZ warehouse. This manifest contained a list of packing materials allegedly
ordered by the XYZ Company. The vehicle was allowed to enter the parking area. At 8:30 pm,
this same vehicle was allowed to exit the parking area. A different guard was on duty in the
evenings and noticed that his records showed that this vehicle had been permitted entry, and so
he allowed the vehicle to exit the parking area.
INFO-006-TRUXINC: Maryland DOT’s record indicates that the panel truck carrying
the license plate number MDC-578 is registered in the name of a truck-rental company called
TRUXINC, located in Silver Spring, MD. The manager of this agency showed records indicating
that this truck was rented to a person who gave his name as Omer Riley, having as his
listed address 6176 Williams Ave. in Silver Spring. The truck was rented on the day before
Willard’s discovery about the missing cesium-137, and it was returned the day after he made
the discovery.
INFO-007-SilverSpring: Silver Spring city record according to which there is no
residence at 6176 Williams Ave. in Silver Spring, MD.
INFO-008-InvestigativeRecord: An examination of the panel truck rented by Omer
Riley, using a Geiger counter, revealed minute traces of cesium-137.
INFO-009-Grace: Grace, the Vice President for Operations at XYZ, tells us that no one
at the XYZ Company had checked out the canister for work on any project the XYZ Company was
working on at the time. She says that the XYZ Company had other projects involving hazardous
materials, but none that involved the use of cesium-137.


Table 4.2, which are fragments representing relevant items of evidence for the leaf
hypotheses in Figure 2.9. These dots are presented in Table 4.3.
This case study has several objectives:

• Learning to associate with a hypothesis in an argument the evidence that is relevant to it
• Learning to evaluate the relevance and the believability of evidence
• Learning to select synthesis functions
• Better understanding the process of evaluating the probability or likeliness of a hypothesis based on the available evidence

When you associate an item of evidence with a hypothesis, the agent automatically
generates a decomposition tree like the one in Figure 4.14. The bottom part of Figure 4.14
shows the abstraction of the tree that is automatically generated by the agent when you
indicate that the item of evidence E005-Ralph favors the leaf hypothesis “The XYZ hazardous
material locker was forced.”
The agent also automatically generates the reduction from the top of Figure 4.14, where
the leaf hypothesis, “The XYZ hazardous material locker was forced,” is reduced to the
elementary hypothesis with the name, “The XYZ hazardous material locker was forced,” to be
directly assessed based on evidence. Although these two hypotheses are composed of the
same words, internally they are different, the latter being an instance introduced in the
agent’s ontology. This elementary hypothesis corresponds to the hypothesis Q in
Figure 4.13. The agent decomposes this hypothesis as shown in the bottom part of
Figure 4.14, which corresponds to the tree in Figure 4.13 except that there is only one

Table 4.3 Dots from Table 4.2

E006-Clyde: Locksmith Clyde’s report that the lock was forced.


E007-SecurityCamera: Video segment of the security camera of the XYZ warehouse,
showing a person loading a container into a U-Haul panel truck.
E008-GuardReport: The record, made by Sam, security guard at the XYZ Company,
that a panel truck bearing Maryland license plate number MDC-578 was in the XYZ parking area on
the day before Willard’s discovery about the missing cesium-137 canister.
E009-MDDOTRecord: Maryland DOT’s record that the truck bearing license plate
number MDC-578 is registered in the name of the TRUXINC Company in Silver Spring, MD.
E010-TRUXINCRecord1: TRUXINC’s record that the truck bearing MD license plate
number MDC-578 was rented to a man who gave his name as Omer Riley on the day before
Willard’s discovery of the missing cesium-137 canister.
E011-TRUXINCRecord2: TRUXINC’s record that Omer Riley gave his address as
6176 Williams Ave.
E012-SilverSpringRecord: Silver Spring city record according to which there is no
residence at 6176 Williams Ave. in Silver Spring, MD.
E013-InvestigativeRecord: Investigative record that traces of cesium-137 were found
in the truck bearing license plate number MDC-578.
E014-Grace: Grace, the Vice President for Operations at XYZ, tells us that no one at
the XYZ Company had checked out the canister for work on any project.


Figure 4.14. Evidence-based assessment of an elementary hypothesis.

item of favoring evidence, namely E005-Ralph. After that, you have to assess the relevance
of this item of evidence to the considered hypothesis (e.g., likely), as well as its believability
(e.g., very likely), and the agent automatically composes them, from the bottom up, to
obtain the assessment of the leaf hypothesis. When you add additional items of evidence
as either favoring or disfavoring evidence, the agent extends the reasoning tree from
Figure 4.14 as indicated in Figure 4.13.
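
The following sketch (in Python) is only an illustration of this bottom-up composition, not the
agent’s implementation; the ordered scale of probabilities and the choice of minimum and
maximum as synthesis functions are assumptions made here for concreteness (you select the
actual synthesis functions yourself, as described in Operation 4.4 later in this section).

```python
# Illustrative bottom-up composition of relevance and believability.
# The scale and the min/max synthesis functions are assumptions, not the
# agent's fixed behavior.
SCALE = ["no support", "likely", "very likely", "almost certain", "certain"]

def weakest(*values):    # a "min"-style synthesis function
    return min(values, key=SCALE.index)

def strongest(*values):  # a "max"-style synthesis function
    return max(values, key=SCALE.index)

# One favoring item, E005-Ralph: relevance "likely", believability "very likely".
inferential_force_E005 = weakest("likely", "very likely")   # -> "likely"

# With a single favoring item, the leaf hypothesis gets that inferential force;
# additional favoring items would be combined here with a "max"-style synthesis.
leaf_assessment = strongest(inferential_force_E005)
print(leaf_assessment)                                       # -> "likely"
```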
Figure 4.15 illustrates the selection of a synthesis function indicating how to evaluate
the probability of a node based on the probability of its children. You have to right-click on
the node (but not on any word in blue), select New Solution with. . ., and then select the
function from the displayed list.
Now you can perform the case study. Start Disciple-EBR, select the case study know-
ledge base “02-Evidence-based-Analysis/Scen,” and proceed as indicated in the instruc-
tions from the bottom of the opened window.
This case study illustrates several basic hypothesis analysis operations described in the
following.

Operation 4.1. Associate evidence to hypotheses


 In the Evidence workspace, click on the Evidence menu at the top of the window.
 Notice the four modes of operation from the top part of the left panel. Because the
selected one is [AVAILABLE EVIDENCE], the left panel shows the current evidence from
the knowledge base.
 In the left panel, click on the item of evidence you would like to associate with a leaf
hypothesis from the current argumentation. As a result, the upper part of the right
panel shows the main characteristics of this item of evidence, followed by all the leaf
hypotheses in the analysis tree (see Figure 4.16). You will have to decide whether the
selected item of evidence favors or disfavors any of the hypotheses under the Irrelevant
to label, and indicate this by clicking on [FAVORS] or [DISFAVORS] following that
hypothesis.
 Clicking on [FAVORS] or [DISFAVORS] automatically creates an elementary hypoth-
esis to be assessed based on evidence, and moves it under the Favors (or Disfavors)


Figure 4.15. Selecting the synthesis function for a node.

Figure 4.16. Associating an evidence item to a hypothesis.

label. Clicking on [REMOVE] will restore the leaf hypothesis under the Irrelevant
to label.
 To associate another evidence item to a hypothesis, click on it in the left panel and
repeat the preceding operations.
 To return to the Reasoner module, click on [REASONING] following the hypothesis.


Operation 4.2. Update the name of an elementary hypothesis


 In the right panel of the Reasoner module, right-click on a hypothesis that was
reduced to an elementary hypothesis to be assessed based on evidence, and select
Improve Phrasing.
 In the opened editor, update the phrasing of the hypothesis and then click outside
the box.
 Notice that both this hypothesis and the corresponding elementary hypothesis have
been updated accordingly.

Operation 4.3. Assess evidence


 In the Reasoner module, in the left panel, click on the name of the item of evidence to
assess. You may need to right-click on the top hypothesis and select Expand, to make
the evidence item visible. As a result, the right panel shows the decomposition of
evidence assessment into a relevance assessment (the left leaf) and a believability
assessment (the right leaf), as illustrated in Figure 4.17.
 If the right panel does not show the solutions of the hypotheses, then click on [SHOW
SOLUTIONS] at the top of the panel.
 In the right panel, right-click on the left (relevance) leaf and select New Assumption.
As a result, the agent proposes the default solution (e.g., certain).
 If necessary, click on the default solution (the underlined text) and, from the displayed
list, select (double-click) the appropriate value.
 In the right panel, right-click on the right (believability) leaf and select New Assump-
tion (as illustrated in Figure 4.17). As a result, the agent proposes the default solution.
 If necessary, click on the default solution and, from the displayed list, select the
appropriate value.
 The agent automatically determines the inferential force of the item of evidence.

Operation 4.4. Select the synthesis function


 In the Reasoner module, in the right panel, right-click on the node for which you have
to select the synthesis function, select New Solution with. . ., and then select the
function from the displayed list (see Figure 4.18).
 To select a function for a node, all its children must have solutions.

Figure 4.17. Assessing an item of evidence.


Figure 4.18. Selecting a synthesis function.

Figure 4.19. Evidence collection guidance for a selected hypothesis.

4.6 HANDS ON: HYPOTHESIS ANALYSIS AND EVIDENCE SEARCH AND REPRESENTATION

The objective of this case study is to learn how to use Disciple-EBR to analyze hypotheses based
on evidence retrieved from the Internet, by associating search criteria with elementary hypoth-
eses, invoking various search engines (such as Google, Yahoo!, or Bing), identifying relevant
information, extracting evidence from it, and using the evidence to evaluate the hypotheses.
This case study concerns the hypothesis that the United States will be a global leader in
wind power within the next decade.
To search for evidence that is relevant to a leaf hypothesis, the agent guides you to
associate search criteria with it and to invoke various search engines on the Internet.
Figure 4.19 shows the corresponding interface of the Evidence module. Because the
[COLLECTION GUIDANCE] mode is selected in the left panel, it shows all the leaf hypotheses
and their current evidential support. If you click on one of these hypotheses, such as
“United States imports huge quantities of oil,” it displays this hypothesis in the right panel,
enabling you to define search criteria for it. You just need to click on the [NEW] button
following the Search criterion label, and the agent will open an editor in which you can
enter the search criterion.
Figure 4.20 shows two defined search criteria: “oil import by United States” and “top oil
importing countries.” You can now invoke Bing, Google, or Yahoo! with any one of these
criteria to search for relevant evidence on the Internet. This will open a new window with
the results of the search, as shown in Figure 4.21.


Figure 4.20. Defined search criteria for a selected hypothesis.

Figure 4.21. Searching relevant evidence on the Internet.

You have to browse the retrieved documents shown in Figure 4.21 and determine
whether any of them contains information that is relevant to the hypothesis that the
United States imports huge quantities of oil. Such a document is the second one, whose
content is shown in Figure 4.22.
You can now define one or several items of evidence with information copied from the
retrieved document, as illustrated in Figure 4.23. In the left panel of the Evidence module,
you switch the selection mode to [AVAILABLE EVIDENCE] and then click on [NEW]. As a
result, the right panel displays a partial name for the evidence E001- to be completed by
you. You then have to click on the [EDIT] button, which opens an editor where you can
copy the description of this item of evidence from the retrieved document. The result is
shown in the right panel of Figure 4.23.
You can define additional characteristics of this item of evidence, such as its type (as
will be discussed in Section 4.7), and you should indicate whether this item of evidence
favors or disfavors the hypothesis that the United States imports huge quantities of oil, as
explained previously.


Figure 4.22. Selected document providing relevant information.

Figure 4.23. Defining an item of evidence.

In this case study, you will first select the hypothesis, “United States will be a global
leader in wind power within the next decade,” and then you will browse its analysis tree to
see how it is reduced to simpler hypotheses that you have to assess by searching evidence
on the Internet. You will associate specific search criteria with the leaf hypotheses, invoke
specific search engines with those criteria, identify relevant Web information, define
evidence from this information, associate evidence with the corresponding hypotheses,
and evaluate its relevance and believability, with the goal of assessing the probability of the
top-level hypothesis.
Start Disciple-EBR, select the case study knowledge base “03-Evidence-Search/Scen,”
and proceed as indicated in the instructions from the bottom of the opened window.
This case study illustrates the following hypothesis analysis operation:

Operation 4.5. Associate search criteria with hypotheses


 In the Evidence workspace, click on the Evidence menu and then click on [COLLECTION
GUIDANCE]. The left panel shows the leaf hypotheses and their evidential support.


 In the left panel, select a hypothesis.


 In the right panel, after Search criterion, click on [NEW] to define a new criterion.
 Type the search criterion and click on [SAVE].
 You may define additional criteria by repeating the preceding two steps.
 Select one of the search criteria by clicking on it.
 After Search with, click on one of the available search engines (i.e., [BING], [GOOGLE],
[YAHOO]) to search the Internet with the selected criterion.
 Browse the documents returned by the search engine, select the relevant ones,
and define items of evidence based on them, as indicated in Operation 4.6, later in
this chapter.

4.7 BELIEVABILITY ASSESSMENT

In the previous sections, we have discussed and illustrated how you may directly assess
the believability of an item of evidence. However, the Disciple-EBR agent has a significant
amount of knowledge about the various types of evidence and its believability credentials,
enabling you to perform a much deeper believability analysis, as will be discussed in this
section. You may wish to perform such a detailed believability analysis for those items of
evidence that are critical to the final result of the analysis. We will start with presenting a
classification or ontology of evidence.
Attempts to categorize evidence in terms of its substance or content would be a fruitless
task, the essential reason being that the substance or content of evidence is virtually
unlimited. What we have termed a substance-blind classification of evidence refers to a
classification of recurrent forms and combinations of evidence, based not on substance or
content, but on the inferential properties of evidence (Schum, 1994 [2001a], pp. 114–130;
Schum, 2011). In what follows, we identify specific attributes of the believability of various
recurrent types of evidence without regard to their substance or content.
Here is an important question you are asked to answer regarding the individual
kinds of evidence you have: How do you stand in relation to this item of evidence? Can
you examine it for yourself to see what events it might reveal? If you can, we say that the
evidence is tangible in nature. But suppose instead you must rely upon other persons
to tell you about events of interest. Their reports to you about these events are
examples of testimonial evidence. Figure 4.24 shows a substance-blind classification
of evidence based on its believability credentials. This classification is discussed in
the following sections.

4.7.1 Tangible Evidence


There is an assortment of tangible items you might encounter. Both imagery intelligence
(IMINT) and signals intelligence (SIGINT) provide various kinds of sensor records
and images that can be examined. Measurement and signature intelligence (MASINT)
and technical intelligence (TECHINT) provide various objects, such as soil samples and
weapons, that can be examined. Communications intelligence (COMINT) can provide
audio recordings of communications that can be overheard and translated if the commu-
nication has occurred in a foreign language. Documents, tabled measurements, charts,
maps, and diagrams or plans of various kinds are also tangible evidence.


Figure 4.24. Substance-blind classification of evidence. [The figure shows a taxonomy:
evidence is divided into tangible evidence, testimonial evidence, missing evidence, and
authoritative record. Tangible evidence is divided into real tangible evidence and
demonstrative tangible evidence. Testimonial evidence is divided into unequivocal and
equivocal testimonial evidence, with the subtypes unequivocal testimonial evidence based upon
direct observation, unequivocal testimonial evidence obtained at second hand, testimonial
evidence based on opinion, completely equivocal testimonial evidence, and probabilistically
equivocal testimonial evidence.]
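
For readers who find it helpful, the classification in Figure 4.24 can also be sketched as a
small class hierarchy; this is an illustration only, mirroring the node labels of the figure,
and not the agent’s actual ontology:

```python
# Sketch of the substance-blind classification of evidence (Figure 4.24).
# Illustrative only; the real taxonomy lives in the agent's ontology.
class Evidence: ...

class TangibleEvidence(Evidence): ...
class RealTangibleEvidence(TangibleEvidence): ...
class DemonstrativeTangibleEvidence(TangibleEvidence): ...

class TestimonialEvidence(Evidence): ...
class UnequivocalTestimonialEvidence(TestimonialEvidence): ...
class UnequivocalTestimonialEvidenceBasedUponDirectObservation(UnequivocalTestimonialEvidence): ...
class UnequivocalTestimonialEvidenceObtainedAtSecondHand(UnequivocalTestimonialEvidence): ...
class EquivocalTestimonialEvidence(TestimonialEvidence): ...
# Section 4.7.2 argues that opinion-based testimony is best treated as equivocal.
class TestimonialEvidenceBasedOnOpinion(EquivocalTestimonialEvidence): ...
class CompletelyEquivocalTestimonialEvidence(EquivocalTestimonialEvidence): ...
class ProbabilisticallyEquivocalTestimonialEvidence(EquivocalTestimonialEvidence): ...

class MissingEvidence(Evidence): ...
class AuthoritativeRecord(Evidence): ...
```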

There are two different kinds of tangible evidence: real tangible evidence and demon-
strative tangible evidence (Lempert et al., 2000, pp. 1146–1148). Real tangible evidence is
an actual thing and has only one major believability attribute: authenticity. Is this object
what it is represented as being or is claimed to be? There are as many ways of generating
deceptive and inauthentic evidence as there are persons wishing to generate it. Docu-
ments or written communications may be faked, captured weapons may have been
tampered with, and photographs may have been altered in various ways. One problem
is that it usually requires considerable expertise to detect inauthentic evidence.
Demonstrative tangible evidence does not concern things themselves but only repre-
sentations or illustrations of these things. Examples include diagrams, maps, scale models,
statistical or other tabled measurements, and sensor images or records of various sorts
such as IMINT, SIGINT, and COMINT. Demonstrative tangible evidence has three believ-
ability attributes. The first concerns its authenticity. For example, suppose we obtain a
hand-drawn map from a captured insurgent showing the locations of various groups in his
insurgency organization. Has this map been deliberately contrived to mislead our military
forces, or is it a genuine representation of the location of these insurgency groups?
The second believability attribute is accuracy of the representation provided by the
demonstrative tangible item. The accuracy question concerns the extent to which the
device that produced the representation of the real tangible item had a degree of sensitiv-
ity (resolving power or accuracy) that allows us to tell what events were observed. We
would be as concerned about the accuracy of the hand-drawn map allegedly showing
insurgent groups’ locations as we would about the accuracy of a sensor in detecting traces
of some physical occurrence. Different sensors have different resolving power that also
depends on various settings of their physical parameters (e.g., the settings of a camera).
The third major attribute, reliability, is especially relevant to various forms of sensors
that provide us with many forms of demonstrative tangible evidence. A system, sensor,
or test of any kind is reliable to the extent that the results it provides are repeatable or


consistent. We say that a sensing device is reliable if it provides the same image or report
on successive occasions on which this device is used.
The left side of Figure 4.25 shows how the agent assesses the believability of an item of
demonstrative tangible evidence Ei* as the minimum of its authenticity, accuracy, and
reliability.
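
A minimal sketch of this computation, assuming for illustration a simple ordered scale of
qualitative probabilities (the scale values below are not the agent’s actual ones):

```python
# Believability of a demonstrative tangible item as the minimum (weakest) of
# its authenticity, accuracy, and reliability. Scale values are illustrative.
SCALE = ["no support", "likely", "very likely", "almost certain", "certain"]

def believability_demonstrative_tangible(authenticity, accuracy, reliability):
    return min((authenticity, accuracy, reliability), key=SCALE.index)

# A genuine-looking map produced by a low-accuracy process is only as
# believable as its weakest credential.
print(believability_demonstrative_tangible("certain", "likely", "very likely"))
# -> "likely"
```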
Here are additional examples involving evidence that is tangible and that you can
examine personally to see what events it reveals.
Have a look at evidence item E009-MDDOTRecord in Table 4.3 (p. 126). The Maryland
DOT record, in the form of a tangible document, could be given to the analyst to verify
that the vehicle carrying MD license plate number MDC-578 is registered in the name of
the TRUXINC Company in Silver Spring, Maryland.
Now consider evidence item E008-GuardReport in Table 4.3. Here we have a document
in the form of a log showing that the truck bearing license plate number MDC-578 exited
the XYZ parking lot at 8:30 pm on the day in question. This tangible item could also be
made available to analysts investigating this matter.

4.7.2 Testimonial Evidence


For testimonial evidence, we have two basic sources of uncertainty: competence and
credibility. This is one reason why it is more appropriate to talk about the believability of
testimonial evidence, which is a broader concept that includes both competence and
credibility considerations. The first question to ask related to competence is whether this
source actually made the observation the source claims to have made or had access to the
information the source reports. The second competence question concerns whether this
source understood what was being observed well enough to provide us with an intelligible
account of what was observed. Thus competence involves access and understandability.
Assessments of human source credibility require consideration of entirely different
attributes: veracity (or truthfulness), objectivity, and observational sensitivity under the
conditions of observation (Schum, 1989). Here is an account of why these are the major
attributes of testimonial credibility. First, is this source telling us about an event this source
believes to have occurred? This source would be untruthful if he or she did not believe the
reported event actually occurred. So, this question involves the source's veracity. The
second question involves the source's objectivity. The question is: Did this source base a
belief on sensory evidence received during an observation, or did this source believe
the reported event occurred because this source either expected or wished it to occur?

Figure 4.25. Assessing the believability of evidence with Disciple-EBR. [Left: the
believability of a demonstrative tangible item of evidence Ei* is the minimum of the
authenticity of Ei*, the accuracy of Ei*, and the reliability of Ei*. Right: the believability
of a testimonial item of evidence Ek* is the minimum of the source’s competence and the
source’s credibility, where competence is the minimum of the source’s access and
understandability, and credibility is the minimum of the source’s veracity, objectivity, and
observational sensitivity.]


An objective observer is one who bases a belief on the sensory evidence instead of desires
or expectations. Finally, if the source did base a belief on sensory evidence, how good was
this evidence? This involves information about the source's relevant sensory capabilities
and the conditions under which a relevant observation was made.
As indicated in Figure 4.24, there are several types of testimonial evidence. If the source
does not hedge or equivocate about what the source observed (i.e., the source reports that
he or she is certain that the event did occur), then we have unequivocal testimonial
evidence. If, however, the source hedges or equivocates in any way (e.g., "I'm fairly sure
that E occurred"), then we have equivocal testimonial evidence. The first question we
would ask a source of unequivocal testimonial evidence is: How did you obtain information
about what you have just reported? It seems that this source has three possible answers to
this question. The first answer is, "I made a direct observation myself.” In this case, we have
unequivocal testimonial evidence based upon direct observation. The second possible
answer is, "I did not observe this event myself but heard about its occurrence (or
nonoccurrence) from another person." Here we have a case of second hand or hearsay
evidence, called unequivocal testimonial evidence obtained at second hand. A third answer
is possible: "I did not observe event E myself nor did I hear about it from another source.
But I did observe events C and D and inferred from them that event E definitely occurred."
This is called testimonial evidence based on opinion, and it raises some very difficult
questions. The first concerns the source's credibility as far as his or her observation of
events C and D; the second involves our examination of whether we ourselves would infer
E based on events C and D. This matter involves our assessment of the source's reasoning
ability. It might well be the case that we do not question this source's credibility in
observing events C and D, but we question the conclusion that the source has drawn
from his or her observations that event E occurred. We would also question the certainty
with which the source has reported the opinion that E occurred. Despite the source’s
conclusion that “event E definitely occurred," and because of many sources of uncertainty,
we should consider that testimonial evidence based on opinion is a type of equivocal
testimonial evidence.
There are two other types of equivocal testimonial evidence. The first we call completely
equivocal testimonial evidence. Asked whether event E occurred or not, our source
says, "I don't know," or, "I can't remember."
But there is another way a source of HUMINT can equivocate: The source can provide
probabilistically equivocal testimonial evidence in various ways: "I'm 60 percent sure that
event E happened”; or "I'm fairly sure that E occurred”; or, "It is very likely that
E occurred." We could look upon this particular probabilistic equivocation as an assess-
ment by the source of the source’s own observational sensitivity.
The right side of Figure 4.25 shows how a Disciple-EBR agent assesses the believability
of an item of testimonial evidence based upon direct observation Ek* by a source, as the
minimum of the source’s competence and credibility. The source’s competence is
assessed as the minimum of the source’s access and understandability, while the source’s
credibility is assessed as the minimum of the source’s veracity, objectivity, and observa-
tional sensitivity.
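
The same idea, sketched for testimonial evidence based upon direct observation (again, the
ordered scale is only an illustrative assumption):

```python
# Believability of testimonial evidence based upon direct observation, as the
# minimum of the source's competence and credibility (Figure 4.25, right side).
# The scale is illustrative, not the agent's actual list of values.
SCALE = ["no support", "likely", "very likely", "almost certain", "certain"]

def weakest(*values):
    return min(values, key=SCALE.index)

def believability_testimonial(access, understandability,
                              veracity, objectivity, observational_sensitivity):
    competence = weakest(access, understandability)
    credibility = weakest(veracity, objectivity, observational_sensitivity)
    return weakest(competence, credibility)

# A fully competent source whose objectivity is only "likely":
print(believability_testimonial("certain", "certain",
                                "almost certain", "likely", "very likely"))
# -> "likely"
```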
Here are some examples involving testimonial evidence from human sources that is not
hedged or qualified in any way.
Evidence item E014-Grace in Table 4.3 (p. 126) is Grace’s testimony that no one at the
XYZ Company had checked out the canister for work on any project. Grace states this
unequivocally. You should also note that she has given negative evidence saying the


cesium-137 was not being used by the XYZ Company. This negative evidence is very
important, because it strengthens our inference that the cesium-137 canister was stolen.
E006-Clyde in Table 4.3 is unequivocal testimonial evidence. It represents positive evidence.
Here are some examples involving testimonial evidence given by human sources who
equivocate or hedge in what they tell us.
Consider the evidence item E005-Ralph in Table 2.4 (p. 60). Here Ralph hedges a bit by
saying that the lock on the hazardous materials storage area appears to have been forced.
He cannot say for sure that the lock had been forced, so he hedges in what he tells us.
In new evidence regarding the dirty bomb example, suppose we have a source code-
named “Yasmin.” She tells us that she knew a man in Saudi Arabia named Omar al-
Massari. Yasmin says she is “quite sure” that Omar spent two years “somewhere” in
Afghanistan “sometime” in the years 1998 to 2000.

4.7.3 Missing Evidence


To say that evidence is missing entails that we must have had some basis for expecting we
could obtain it. There are some important sources of uncertainty as far as missing
evidence is concerned. In certain situations, missing evidence can itself be evidence.
Consider some form of tangible evidence, such as a document, that we have been unable
to obtain. There are several reasons for our inability to find it, some of which are more
important than others. First, it is possible that this tangible item never existed in the first
place; our expectation that it existed was wrong. Second, the tangible item exists, but we
have simply been looking in the wrong places for it. Third, the tangible item existed at one
time but has been destroyed or misplaced. Fourth, the tangible item exists, but someone is
keeping it from us. This fourth consideration has some very important inferential implica-
tions, including denial and possibly deception. An adverse inference can be drawn from
someone's failure to produce evidence.
We should not confuse negative evidence with missing evidence. To adopt a common
phrase, “evidence of absence (negative evidence) is not the same as absence of evidence
(missing evidence).” Entirely different conclusions can be drawn from evidence that an
event did not occur than can be drawn from our failure to find evidence. We are obliged to
ask different questions in these two situations.
Consider our discussion on the cesium-137 canister. Upon further investigation, we
identify the person who rented the truck as Omar al-Massari, alias Omer Riley. We tell him
that we wish to see his laptop computer. We are, of course, interested in what it might
reveal about the terrorists with whom he may be associating. He refuses to tell us where it
is. This we referred to as the nonproduction of evidence.

4.7.4 Authoritative Record


This final category of evidence would never oblige an analyst to assess its believability.
Tabled information of various sorts such as tide tables, celestial tables, tables of physical or
mathematical results such as probabilities associated with statistical calculations, and
many other tables of information we would accept as being believable provided that we
used these tables correctly. For example, we would not be obliged to prove that tempera-
tures in Iraq can be around 120 degrees Fahrenheit during summer months or that the
population of Baghdad is greater than that of Basra.


4.7.5 Mixed Evidence and Chains of Custody


We have just described a categorization of individual items of evidence. But there are
situations in which individual items can reveal various mixtures of the types of evidence
shown in Figure 4.24. One example is testimonial evidence about tangible evidence where
a source describes a weapon observed at a scene of a crime. Another example is a tangible
document containing a testimonial assertion based on other alleged tangible evidence.
Figure 4.26, for example, shows how one would need to assess the believability of tangible
evidence about testimonial evidence.
Here is an example of a mixture of two or more items of tangible evidence; it is called a
passport. A passport is a tangible document alleging the existence of other tangible
documents recording the place of birth and country of origin of the holder of the passport.
In other words, a passport sets up a paper trail certifying the identity of the holder of the
passport. In addition to needing to check the authenticity of the passport itself, we are
also interested in the authenticity of all the other tangible documents on which this
passport is based.
Here is another mixture of forms of evidence, this time recording a mixture of tangible
and testimonial evidence. We return to our asset “Yasmin,” who has given us further
evidence about Omar al-Massari in our cesium-137 example. Suppose we have a tangible
document recording Yasmin’s account of her past experience with Omar al-Massari. This
document records Yasmin’s testimony about having seen a document, detailing plans for
constructing weapons of various sorts, that was in Omar al-Massari’s possession. As far as
believability issues are concerned, we first have the authenticity of the transcription of her
testimony to consider. Yasmin speaks only in Arabic, so we wonder how adequate the
translation of her testimony has been. Also, we have concerns about Yasmin’s competence
and credibility to consider in her recorded testimony. Finally, we have further interest in
the authenticity of the document she allegedly saw in Omar al-Massari’s possession.
But the believability analysis of an item of evidence can be even more complicated. For
example, very rarely, if ever, has an analyst access to the original evidence. Most often,
what is being analyzed is an item of evidence that has undergone a series of transform-
ations through a chain of custody (Schum et al., 2009). Here we have borrowed an

Figure 4.26. Believability analysis of tangible evidence about testimonial evidence. [The
believability of Ei* is the minimum of the authenticity of Ei*, the accuracy of Ei*, and the
source’s believability; the source’s believability is in turn the minimum of the source’s
competence (the minimum of access and understandability) and the source’s credibility (the
minimum of veracity, objectivity, and observational sensitivity).]


important concept from the field of law, where a chain of custody refers to the persons or
devices having access to the original evidence, the time at which they had such access, and
what they did to the original evidence when they had access to it. These chains of custody
add three major sources of uncertainty for intelligence analysts to consider, all of which
are associated with the persons in the chains of custody, whose competence and credibil-
ity need to be considered. The first and most important question involves authenticity:
Is the evidence received by the analyst exactly what the initial evidence said, and is it
complete? The other questions involve assessing the reliability and accuracy of the
processes used to produce the evidence if it is tangible in nature or also used to take
various actions on the evidence in a chain of custody, whether the evidence is tangible or
testimonial. As an illustration, consider the situation from Figure 4.27. We have an item of
testimonial HUMINT coming from a foreign national whose code name is “Wallflower,”
who does not speak English. Wallflower gives his report to the case officer Bob. This
report is recorded by Bob and then translated by Husam. Then Wallflower’s translated
report is transmitted to the report’s officer Marsha, who edits it and transmits it to the
analyst Clyde, who evaluates it.
Figure 4.28 shows how a Disciple-EBR agent may determine the believability of the
evidence received by the analyst. A more detailed discussion is provided in Schum
et al. (2009).
The case officer might have intentionally overlooked details in his recording of Wall-
flower’s report. Thus, as shown at the bottom of Figure 4.28, the believability of the
recorded testimony of Wallflower is the minimum between the believability of Wallflower
and the believability of the recording. Then Husam, the translator, may have intentionally
altered or deleted parts of this report. Thus, the believability of the translated recording is
the minimum between the believability of the recorded testimony and the believability of
the translation by Husam. Then Marsha, the report’s officer, might have altered or deleted
parts of the translated report of Wallflower’s testimony in her editing of it, and so on.

Figure 4.27. The chain of custody of Wallflower’s testimony. [Wallflower’s testimony about
Emir Z. in Farsi (E001-Wallflower-testimony) is recorded by the case officer Bob
(E002-Bob-recording), translated by Husam (E003-Husam-translation), edited by the report’s
officer Marsha (E004-Marsha-report), and transmitted to the analyst Clyde, who receives
Wallflower’s testimony about Emir Z. in English (E005-Emir-Iran). Each person and device in
the chain (Wallflower, Bob, the recording device, Husam, Marsha) has its own believability
credentials, such as competence, credibility, veracity, fidelity, reliability, and security.]


Figure 4.28. Assessing the believability of Wallflower’s testimony. [The believability of the
transmitted, edited, translated, recorded testimony of Wallflower is computed bottom-up as a
chain of minimums: the believability of the recorded testimony is the minimum of the
believability of Wallflower and of the recording by Bob; the believability of the translated
recording is the minimum of the believability of the recorded testimony and of the translation
by Husam; the believability of the edited translation is the minimum of the believability of
the translated recording and of the editing by Marsha; and the top-level believability is the
minimum of the believability of the edited translation and of the transmission by Marsha.]

The result of these actions is that the analyst receiving this evidence almost certainly
did not receive an authentic and complete account of it, nor did he receive a good account
of its reliability and accuracy. What Clyde received was the transmitted, edited, translated,
and recorded testimony of Wallflower. Although the information to make such an analysis
may not be available, the analyst should adjust the confidence in his conclusion in
recognition of these uncertainties.
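
A minimal sketch of this chained computation follows; the individual believability values are
invented for illustration, but the structure mirrors Figure 4.28:

```python
# Chain-of-custody believability (Figure 4.28): each link in the chain can only
# weaken the believability of what the analyst finally receives.
SCALE = ["no support", "likely", "very likely", "almost certain", "certain"]

def weakest(*values):
    return min(values, key=SCALE.index)

believability = "very likely"          # believability of Wallflower himself
chain = [("recording by Bob", "almost certain"),
         ("translation by Husam", "likely"),
         ("editing by Marsha", "almost certain"),
         ("transmission by Marsha", "certain")]

for link, link_believability in chain:
    believability = weakest(believability, link_believability)

print(believability)  # -> "likely": bounded by the weakest link (the translation)
```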

4.8 HANDS ON: BELIEVABILITY ANALYSIS

This case study, which continues the analysis from Section 4.5 with the analysis of the
hypothesis, “The cesium-137 canister is used in a project without being checked out from
the XYZ warehouse,” has two main objectives:

 Learning to define a more detailed representation of an item of evidence


 Better understanding the process of believability analysis

In Section 4.6, we have presented how you can define an item of evidence, and Figure 4.23
(p. 132) shows the definition of E001-US-top-oil-importer with type evidence. You can
specify the type by clicking on the [CHANGE] button. Figure 4.29, for instance, shows the
definition of E014-Grace. After you click on the [CHANGE] button, the agent displays the
various evidence types from the right panel. You just need to click on the [SELECT] button
following the correct type, which in this case is unequivocal testimonial evidence based upon
direct observation.
Once you have selected the type of E014-Grace, the agent displays it after the label Type
and asks for its source, which is Grace (see Figure 4.30).
As shown in Figure 4.30, we have also indicated that this item of evidence disfavors the
hypothesis “The missing cesium-137 canister is used in a project at the XYZ company.” As a
result, the agent introduced it into the analysis tree and generated a more detailed analysis
of its believability, which is shown in Figure 4.31.
You can now perform a more detailed believability analysis, as illustrated in Figure 4.32,
where we have assessed the competence, veracity, objectivity, and observational sensitiv-
ity of Grace, and the agent has automatically determined her believability.
In this case study, you will practice the preceding operations. You will first select the
hypothesis, “The cesium-137 canister is used in a project without being checked out from


Figure 4.29. Selecting the type of evidence.

Figure 4.30. Definition of an item of evidence.


Figure 4.31. Decomposition of the believability assessment for an item of testimonial evidence.

Figure 4.32. More detailed believability analysis.

the XYZ warehouse.” Then you will browse its analysis to see how it is reduced to simpler
hypotheses that need to be assessed based on the evidence. After that, you will represent a
new item of evidence, associate it with the hypothesis to which it is relevant, assess its
relevance, evaluate its believability by assessing its credentials, and browse the resulting
analysis tree.


Start Disciple-EBR, select the case study knowledge base “04-Believability-Analysis/Scen,”
and proceed as indicated in the instructions from the bottom of the opened window.
This case study illustrates the general operation of defining an item of evidence,
summarized as follows.

Operation 4.6. Define an item of evidence


 In the Evidence workspace, click on the Evidence menu at the top of the window.
 Notice the four modes of operation from the top part of the left panel. Because the
selected one is [AVAILABLE EVIDENCE], the left panel shows the current evidence (if any)
from the knowledge base.
 In the left panel, click on [NEW]. The right panel now shows a partially defined item of
evidence, such as E002-. You will complete the definition of this item of evidence.
 Complete the name E. . .- at the top of the right panel and click on [SAVE].
 Click on [EDIT] for Description, click inside the pane and type the description of the item
of evidence.
 Click on [SAVE].
 You may now provide additional information about the item of evidence (as indicated
in the following steps) or define additional items of evidence (by repeating the
preceding steps).
 After “Type: evidence,” click on [CHANGE] to specify the type of this item of evidence.
 Inspect the different evidence types and click on [SELECT] following the type corres-
ponding to the current item of evidence.
 Provide the additional, type-related information, requested by the system (e.g., the
source in the case of a testimonial item of evidence).

4.9 DRILL-DOWN ANALYSIS, ASSUMPTION-BASED REASONING, AND WHAT-IF SCENARIOS

An important feature of the Disciple-EBR agent is that it allows you to perform analyses at
different levels of detail. What this means is that a hypothesis may be reduced to many levels
of subhypotheses or just a few levels that are then assessed based on relevant evidence. The
same applies to assessing the believability of evidence. You may directly assess it, as was
illustrated in Figure 4.14 (p. 127), where the believability of E005-Ralph was assessed as very
likely. But if an item of evidence has an important influence on the analysis, then you may
wish to perform a deeper believability analysis, as was illustrated in Figure 4.32, where the
user assessed lower-level believability credentials. The user could have drilled even deeper
to assess the source’s access and understandability instead of his or her competence.
It may also happen that you do not have the time or the evidence to assess a
subhypothesis, in which case you may make various assumptions with respect to its
probability. Consider, for example, the analysis from the case study in Section 4.5, partially
shown in Figure 4.15 (p. 128) and the four subhypotheses of the top-level hypothesis. The
first three of these subhypotheses have been analyzed as discussed in the previous
sections. However, for the last subhypothesis, you have made the following assumption:

It is certain that the MDC-578 truck left with the cesium-137 canister.

Assumptions are distinguished from system-computed assessments by the fact that the
assumed probabilities have a yellow background.


You may provide justifications for the assumptions made. You may also experiment
with various what-if scenarios, where you make different assumptions to determine their
influence on the final result of the analysis.
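
As an illustration of a what-if scenario, the following toy sketch varies the assumption about
the MDC-578 truck and recomputes a top-level assessment; the scale, the other assessments, and
the minimum-style composition are assumptions made only for this example:

```python
# Toy what-if scenario: vary the assumed probability that the MDC-578 truck
# left with the cesium-137 canister and observe its effect on the top-level
# hypothesis. Scale, assessments, and the "min" composition are illustrative.
SCALE = ["no support", "likely", "very likely", "almost certain", "certain"]

def weakest(*values):
    return min(values, key=SCALE.index)

evidence_based_assessments = ["very likely", "almost certain", "very likely"]

for assumption in ["certain", "likely"]:
    top = weakest(*evidence_based_assessments, assumption)
    print(f"assumption = {assumption:8s} -> top-level hypothesis = {top}")
# assumption = certain  -> top-level hypothesis = very likely
# assumption = likely   -> top-level hypothesis = likely
```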
Thus the agent gives you the flexibility of performing the analysis that makes the best
use of your time constraints and available evidence.
The Disciple-EBR shell includes a customized modeling assistant to model the hypoth-
esis analysis process. The following two case studies demonstrate its use.

4.10 HANDS ON: MODELING, FORMALIZATION, AND PATTERN LEARNING

The objective of this case study is to learn how to use Disciple-EBR to model the analysis of
a hypothesis. More specifically, you will learn how to:

 Specify a new hypothesis


 Specify a question/answer pair that suggests how the hypothesis can be reduced to
simpler hypotheses
 Specify the subhypotheses suggested by the question/answer pair
 Select ontology names to be used in hypotheses, questions, and answers
 Convert a hypothesis to an elementary solution (assessment)
 Formalize a reasoning tree or a part of it to learn reduction patterns
 Convert formalized nodes back to modeling to further update them

This case study will guide you through the process of defining and analyzing a hypothesis
by using, as an example, the following hypothesis: “CS580 is a potential course for Mike
Rice.” You will first define the reduction tree shown in Figure 4.33. Then you will formalize
it and specify the synthesis functions.
Start Disciple-EBR, select the case study knowledge base “05-Modeling-Learning/Scen,”
and proceed as indicated in the instructions from the bottom of the opened window.

Figure 4.33. Hypotheses reduction tree.


This case study illustrates several important operations, which are described in the
following.

Operation 4.7. Specify a new hypothesis


 In the Evidence workspace, click on the Hypothesis menu at the top of the window.
 Click on [NEW].
 At the top of the window, click on [NEW EXAMPLE]. The Reasoner module is automatic-
ally selected.
 Double-click on the red-border box and define the hypothesis: "CS580 is a potential
course for Mike Rice.”
 Define instances and concepts as indicated in Operation 4.8.
 Click outside the editing box when finished.

Operation 4.8. Define instances and constants


 In the Evidence workspace, in the Reasoner module, while editing a node in the
reasoning tree, select the text representing the instance or constant, right-click on it,
and select the corresponding type, as illustrated in Figure 4.34.

Operation 4.9. Specify question/answer nodes, subhypotheses, and solutions

 In the Evidence workspace, in the right panel of the Reasoner module, right-click on
the node under which the new node will be defined and select Generate Suggestions.
 If only a generic suggestion is made, double-click on that node and write the
desired text.
 If several suggestions are made, right-click on one of them and select Accept Suggestion.

Operation 4.10. Insert an intermediary hypothesis


 In the Evidence workspace, in the right panel of the Reasoner module, right-click on
the hypothesis node above which an intermediary hypothesis is to be inserted and
select New Hypothesis Above.

Operation 4.11. Move a hypothesis to the left or right


 In the Evidence workspace, in the right panel of the Reasoner module, right-click on a
subhypothesis that has siblings, and select Move Left or Move Right to move it to the
left of its left sibling or to the right of its right sibling.

Operation 4.12. Delete question/answer nodes, subhypotheses, and solutions

 When a node is deleted, the entire subtree under it is also deleted.
 In the Evidence workspace, in the right panel of the Reasoner module, right-click on
the node to be deleted and select Remove Node.

Figure 4.34. Defining Mike Rice as a specific instance.


Operation 4.13. Convert between modeled and formalized hypotheses and solutions/assessments

 In the Evidence workspace, in the right panel of the Reasoner module, right-click on
the node to be converted and select the desired and applicable conversion option, for
example, Convert to Modeling (to convert a formalized node), Learn Hypothesis
Pattern (to formalize a modeled hypothesis and learn a hypothesis pattern), Learn
Tree Patterns (to formalize the entire tree under a hypothesis and learn reduction
patterns), Change to Assessment (to convert a hypothesis into an assessment), or
Change to Hypothesis (to convert an assessment into a hypothesis to be assessed).

Operation 4.14. Define an assumption


 In the Evidence workspace, in the right panel of the Reasoner module, right-click on a
formalized or learned hypothesis node and select New Assumption.
 If several assumption patterns are proposed, select the desired one from the
displayed list.
 If necessary, change the automatically selected probability value by clicking on it and
selecting another one from the displayed list.

Operation 4.15. Define an assumption with justification


 Define an assumption following the steps from Operation 4.14.
 Click on the Assumption menu at the top of the window.
 Click on [MODIFY].
 Click on the underlined space following Justification and write your justification in the
opened editor. Then click outside the box.
 Click on [SAVE] and then on the Reasoner menu.

Operation 4.16. Delete an assumption


 In the Evidence workspace, in the right panel of the Reasoner module, right-click on
the hypothesis node with the assumption to be deleted and select Delete Assumption.

4.11 HANDS ON: ANALYSIS BASED ON LEARNED PATTERNS

The objective of this case study is to learn how to use Disciple-EBR to model the analysis of
a hypothesis by reusing learned patterns. More specifically, you will learn how to:

 Specify a new hypothesis by instantiating a learned pattern


 Specify the reduction of a hypothesis by reusing a learned reduction pattern
 Instantiate variables in a reduction
 Understand how the solution composition functions from the employed patterns are
automatically applied

You will first define the hypothesis by selecting an existing pattern and instantiating it to:
“CS681 is a potential course for Dan Bolt.” Then you will successively reduce it to simpler
hypotheses by reusing learned patterns. This will include the instantiation of variables
from the learned patterns.
Start Disciple-EBR, select the case study knowledge base “06-Analysis-Reuse/Scen” and
proceed as indicated in the instructions from the bottom of the opened window.
This case study illustrates several important operations described in the following.


Operation 4.17. Specify a new hypothesis by instantiating a pattern


 In the Evidence workspace, click on the Hypothesis menu at the top of the window.
 Click on [NEW].
 Click on the pattern to instantiate and notice that each pattern variable is replaced
with “. . .”
 Click on each “. . .” and, in the text field that appears under it, write the desired value
and press the Enter key.
 Select an answer from those proposed by the system.
 After all the values have been defined, click on [CREATE].

Operation 4.18. Specify the reduction of a hypothesis by reusing a learned pattern

 In the Evidence workspace, in the right panel of the Reasoner module, right-click on
a formalized hypothesis node and select Generate Suggestions.
 If the knowledge base has applicable learned patterns, it will propose them together
with a generic suggestion.
 If a generated pattern to be selected contains variables, click on each of them and
select the desired values.
 When the pattern to be selected is completely instantiated, right-click on its Q/A node
and select Accept Suggestion.

4.12 MODELING GUIDELINES

The following are several knowledge engineering guidelines for modeling the reasoning
process. In general, we will refer to hypotheses in these guidelines, although the guidelines
are applicable to problems as well. To make this clearer, Guideline 4.1 uses the form “problem/
hypothesis,” and we will illustrate it with planning problems. However, the rest of the
guidelines refer only to “hypothesis,” although “hypothesis” may be replaced with “problem.”

Guideline 4.1. Structure the modeling process based on the agent’s specification

A main result of the agent specification phase (see Table 3.1, p. 83) is the identification of
the types of problems to be solved by the envisioned agent or the types of hypotheses to be
analyzed. The entire modeling process can be structured based on the types or classes of
these problems or hypotheses, as indicated in Table 4.4.

Table 4.4 General Structure of the Modeling Process

 Partition the domain into classes of problems/hypotheses.


 Select representative problems/hypotheses for each class.
 Model one class at a time.
 Model one example solution at a time.
 Organize the top-level part of the reasoning tree to identify the class of the problem/hypothesis.


As an illustration, consider developing a workaround military planning agent that
needs to determine the actions to be performed in order to work around damage to
transportation infrastructures, such as tunnels, bridges, or roads (Tecuci et al., 2000). This
agent, called Disciple-WA, will be presented in Section 12.2. Figure 4.35 shows a possible
organization of the top-level part of the reasoning tree of this agent, which identifies the
class of the current problem to be solved.

Guideline 4.2. Define reduction trees in natural language using simple questions

Table 4.5 shows a recommended sequence of steps to be followed when developing the
reduction tree for a specific hypothesis.

Guideline 4.3. Identify the specific instances, the generic instances, and the constants

After defining each hypothesis and question/answer pair, identify the specific instances,
the generic instances, and the constants, such as “certain” or “5.” The agent will
automatically add all the instances under a temporary concept called “user instance,”
as shown in Figure 4.36. The concepts will be identified later as part of ontology
development.

Figure 4.35. Sample top-level structuring of the reasoning tree. [The Disciple-WA workaround
planning system estimates the best plan for a military unit to work around damage to a
transportation infrastructure, such as a damaged bridge (e.g., Damage 200: destroyed bridge,
with a bridge/river gap of 25 meters) or a damaged road. The top-level reductions identify the
class of the problem: “Workaround damage” is reduced to “Workaround damaged tunnels,”
“Workaround damaged bridges,” or “Workaround damaged roads,” and “Workaround damaged bridges”
is further reduced to workarounds with fording, with fixed bridges, with floating bridges, or
with rafts.]


Table 4.5 The Reduction Tree Modeling Process

1. Identify the hypothesis to be assessed and express it with a clear natural language sentence.
2. Select the instances and constants in the hypothesis.
3. Follow each hypothesis or subhypothesis with a single, concise question relevant to
decomposing it. Ask small, incremental questions that are likely to have a single category of
answer (but not necessarily a single answer). This usually means asking who, what, where, what
kind of, whether it is this or that, and so on, not complex questions such as “Who and what?” or,
“What and where?”
4. Follow each question with one or more answers to that question. Express answers as complete
sentences, restating key elements of the question in the answer. Even well-formed, simple
questions are likely to generate multiple answers. Select the answer that corresponds to the
example solution being modeled and continue down that branch.
5. Select instances and constants in the question/answer pair.
6. Evaluate the complexity of each question and its answers. When a question leads to apparently
overly complex answers, especially answers that contain an “and” condition, rephrase the
question in a simpler, more incremental manner leading to simpler answers.
7. For each answer, form a new subhypothesis, several subhypotheses, or an assessment
corresponding to that answer by writing a clear, natural language sentence describing the new
subhypotheses or assessment. To the extent that it is practical, incorporate key relevant phrases
and elements of preceding hypothesis names in subhypotheses’ names to portray the expert’s
chain-of-reasoning thought and the accumulation of relevant knowledge. If the answer has led
to several subhypotheses, then model their solutions in a depth-first order.
8. Select instances and constants in each subhypothesis.
9. Utilize the formalization and reuse capabilities of Disciple to minimize the amount of new
modeling required, both for the current hypothesis and for other hypotheses.

Guideline 4.4. Guide the reduction by the possible need of future changes

Use the reduction pattern in Figure 4.37 when you know all the n factors (i.e., A, B, C)
that lead to the reduction of the top-level hypothesis to n simpler hypotheses. Adding,
changing, or deleting factors after a rule was learned from that reduction is a more difficult
operation that also requires deleting the old rule and learning a new one, so you want to
avoid making such changes.
If, however, you anticipate adding new factors in the future, then you can use the
reduction pattern illustrated in Figure 4.38. You can easily add, change, or delete factors
after the rule is learned by adding, changing, or deleting them in the ontology.

Guideline 4.5. Learn and reuse reduction patterns


Disciple-EBR will learn different rules from reduction steps that have different patterns,
even though their meaning is the same and the only difference is in their wording, such as
“with respect to” as opposed to “wrt” or “from the point of view of.” To avoid the learning


Figure 4.36. Identification of instances and constants in the reasoning tree. [The instances
Bob Sharp, John Doe, George Mason University, and Artificial Intelligence are shown as
instances of the temporary concept “user instance.”]

Figure 4.37. Reduction used when all the relevant factors are known. [<hypothesis> is reduced,
through the question “Which factors should we consider?” and the answer “<expression
containing factors A, B, and C>,” to <hypothesis with A>, <hypothesis with B>, and
<hypothesis with C>.]

Figure 4.38. Reduction used when new relevant factors may be added in the future.
[<hypothesis> is reduced through several instances of the question “Which is a factor to
consider?” with answers such as “<expression containing factor A>” and “<expression containing
factor D>,” each leading to a single subhypothesis (<hypothesis with A>, <hypothesis with D>,
and so on); the factors A, B, C, and D are represented in the ontology as instances of the
concept <factor>.]


of semantically redundant rules, you should learn and reuse reduction patterns, as
illustrated in Figure 4.39.
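
As a rough illustration of what pattern reuse amounts to (simplified here; Disciple-EBR’s
internal pattern representation is richer), the following sketch instantiates the learned
pattern from Figure 4.39 by binding its variables:

```python
import re

# A learned reduction pattern (from Figure 4.39), with variables ?O1, ?O2, ?O3.
PATTERN = {
    "hypothesis":    "?O1 would be a good PhD advisor with respect to the ?O2.",
    "question":      "Which is a ?O2?",
    "answer":        "?O3",
    "subhypothesis": "?O1 would be a good PhD advisor with respect to the ?O3.",
}

def instantiate(template, bindings):
    """Replace each ?On variable with its bound value."""
    return re.sub(r"\?O\d+", lambda m: bindings[m.group(0)], template)

bindings = {"?O1": "John Doe",
            "?O2": "student learning experience criterion",
            "?O3": "student opinion criterion"}

for role, template in PATTERN.items():
    print(f"{role}: {instantiate(template, bindings)}")
```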

4.13 PROJECT ASSIGNMENT 3

Prototype a preliminary version of the agent that you will develop as part of your project by
working as a team to:

 Define a domain knowledge base as a copy of 00-Reference-KB.


 Think of three hypotheses to analyze.
 Model the reduction of one hypothesis down to the level of hypotheses that will be
addressed by individual team members.

Figure 4.39. Pattern learning and instantiation. [From the reduction of “John Doe would be a
good PhD advisor with respect to the professional reputation criterion,” through the question
“Which is a professional reputation criterion?” and the answer “research funding criterion,”
to “John Doe would be a good PhD advisor with respect to the research funding criterion,” the
agent learns the pattern: “?O1 would be a good PhD advisor with respect to the ?O2.” /
“Which is a ?O2?” / “?O3” / “?O1 would be a good PhD advisor with respect to the ?O3.” This
pattern is then instantiated to reduce “John Doe would be a good PhD advisor with respect to
the student learning experience criterion” to the subhypotheses corresponding to the answers
“research group status criterion” and “student opinion criterion.”]


• Formalize the reduction tree and define the composition functions.
• Analyze the other two hypotheses by reusing the learned patterns, if applicable.
• Present the prototype agent in class.

4.14 REVIEW QUESTIONS

4.1. Review again Figures 4.3 and 4.4. Then illustrate the application of problem
reduction and solution synthesis with another symbolic integration problem.

4.2. Consider the reductions of Problem1 from Figure 4.40. Indicate the corresponding
solution syntheses.

4.3. Illustrate the reasoning in Figure 4.5 with the problem “Travel from Boston to New
York.” Hint: Consider the question, “Which is a transportation means I can use?”

4.4. How could you use the problem-level synthesis from Question 4.3 to obtain an
optimal solution? What might be some possible optimization criteria?

4.5. Illustrate the reasoning in Figure 4.6 with an example of your own.

4.6. You are considering whether a statement S is true. You search the Internet and find
two items of favoring evidence, E1* and E2*. You estimate that the relevance and the
believability of E1* are “almost certain” and “very likely,” respectively. You also
estimate that the relevance and the believability of E2* are “certain” and “likely,”
respectively. Based on this evidence, what is the probability that S is true? Draw a
reasoning tree that justifies your answer.

4.7. Define the concepts of relevance, believability, and inferential force of evidence.
Then indicate the appropriate synthesis functions and the corresponding solutions
in the reasoning tree from Figure 4.41.

4.8. What are the different types of tangible evidence? Provide an example of each type.

4.9. What items of tangible evidence do you see in Table 4.3?

Figure 4.40. Problem reductions: Problem1 is reduced, through Question/Answer1 and Question/Answer2, to the subproblems Problem11, Problem12, Problem13, and Problem14.


Figure 4.41. Sample reasoning tree for assessing the inferential force of evidence: Hypothesis H1 is assessed based on two favoring items of evidence, E1 and E2, each characterized by its relevance to H1 and its believability (with the assessments very likely, likely, certain, and almost certain).

4.10. Provide some examples of tangible evidence for the hypothesis that John Doe
would be a good PhD advisor for Bob Sharp.

4.11. Define the believability credentials of demonstrative tangible evidence.

4.12. What is testimonial evidence? Give an example.

4.13. What are the different types of testimonial evidence? Provide an example of
each type.

4.14. Give some examples from your own experience when you have heard people
providing information about which they hedge or equivocate.

4.15. Provide some examples of testimonial evidence for the hypothesis that John Doe
would be a good PhD advisor for Bob Sharp.

4.16. Define missing evidence. Provide an example of missing evidence.

4.17. Consider our discussion on the cesium-137 canister. Upon further investigation, we
identify the person who rented the truck as Omar al-Massari, alias Omer Riley. We
tell him that we wish to see his laptop computer. We are, of course, interested in
what it might reveal about the terrorists with whom he may be associating. He
refuses to tell us where his laptop is. What inferences might we draw from Omar al-
Massari’s refusal to provide us with his laptop computer?

4.18. What other items of evidence are missing so far in our discussion of the cesium-
137 case?

4.19. Provide some examples of missing evidence for the hypothesis that John Doe
would be a good PhD advisor for Bob Sharp.

4.20. Define the term authoritative record. Provide an example of an authoritative record.

4.21. Define the believability credentials of a source of testimonial evidence.

4.22. What are some types of mixed evidence? Provide an example. Do you see any
example of mixed evidence in Table 4.3?


4.23. Provide some examples of mixed evidence for the hypothesis that John Doe would
be a good PhD advisor for Bob Sharp.

4.24. Can you provide other examples of mixed evidence from your own experience?

4.25. Which is the general reduction and synthesis logic for assessing a PhD advisor?
Indicate another type of problem that can be modeled in a similar way.

4.26. Use the knowledge engineering guidelines to develop a problem reduction tree for
assessing the following hypothesis based on knowledge from the ontology (not
evidence): “John Doe would be a good PhD advisor with respect to the employers
of graduates criterion.” You do not need to develop the ontology, but the questions
and answers from your reasoning tree should make clear what knowledge would
need to be represented in the ontology. The logic should be clear, all the statements
should be carefully defined, and the question/answer pairs should facilitate learn-
ing. Mark all the instances in the reasoning tree.

4.27. Rapidly prototype an agent that can assess the following hypothesis and others with
a similar pattern: “John Doe would be a good PhD advisor with respect to the
research publication criterion.” Hint: You may consider that a certain number of
publications corresponds to a certain probability for the research publications
criterion. For example, if someone has between 41 and 60 publications, you may
consider that it is very likely that he or she would be a good PhD advisor with
respect to that criterion.

4.28. Rapidly prototype an agent that can assess the following hypothesis and others with
a similar pattern: “John Doe would be a good PhD advisor with respect to the
research funding criterion.” Hint: You may consider that a certain average amount
of annual funding corresponds to a certain probability for the research funding
criterion. For example, if someone has between $100,000 dollars and $200,000, you
may consider that it is very likely that he or she would be a good PhD advisor with
respect to that criterion.

4.29. Rapidly prototype an agent that can assess the following hypothesis and others with
a similar pattern: “John Doe would be a good PhD advisor with respect to the
publications with advisor criterion.” Hint: You may consider that a certain number
of publications of PhD students with the advisor corresponds to a certain probabil-
ity for the publications with advisor criterion.

4.30. Rapidly prototype an agent that can assess the following hypothesis and others with
a similar pattern: “John Doe would be a good PhD advisor with respect to the
research group status criterion.”

5 Ontologies

5.1 WHAT IS AN ONTOLOGY?

An ontology is an explicit formal specification of the terms that are used to represent
an agent’s world (Gruber, 1993).
In an ontology, definitions associate names of entities in the agent’s world (e.g.,
classes of objects, individual objects, relations, hypotheses, problems) with human-
readable text and formal axioms. The text describes what a name means. The axioms
constrain the interpretation and use of a term. Examples of terms from the ontology of
the PhD advisor assessment agent include student, PhD student, professor, course, and
publication. The PhD advisor assessment agent is a Disciple agent that helps a PhD
student in selecting a PhD advisor based on a detailed analysis of several factors,
including professional reputation, learning experience of an advisor’s students, respon-
siveness to students, support offered to students, and quality of the results of previous
students (see Section 3.3). This agent will be used to illustrate the various ontology issues
discussed in this chapter.
The ontology is a hierarchical representation of the objects from the application
domain. It includes both descriptions of the different types of objects (called concepts or
classes, such as professor or course) and descriptions of individual objects (called instances
or individuals, such as CS580), together with the properties of each object and the
relationships between objects.
The underlying idea of the ontological representation is to represent knowledge in the
form of a graph (similar to a concept map) in which the nodes represent objects, situations,
or events, and the arcs represent the relationships between them, as illustrated in Figure 5.1.
The ontology plays a crucial role in cognitive assistants, being at the basis of knowledge
representation, user–agent communication, problem solving, knowledge acquisition, and
learning.
First, the ontology provides the basic representational constituents for all the elements
of the knowledge base, such as the hypotheses, the hypothesis reduction rules, and the
solution synthesis rules. It also allows the representation of partially learned knowledge,
based on the plausible version space concept (Tecuci, 1998), as discussed in Section 7.6.
Second, the agent’s ontology enables the agent to communicate with the user and with
other agents by declaring the terms that the agent understands. Consequently, the ontol-
ogy enables knowledge sharing and reuse among agents that share a common vocabulary
that they understand. An agreement among several agents to use a shared vocabulary in a
coherent and consistent manner is called ontological commitment.


Third, the problem-solving methods or rules of the agent are applied by matching them
against the current state of the agent’s world, which is represented in the ontology. The
use of partially learned knowledge (with plausible version spaces) in reasoning allows
assessing hypotheses (or solving problems) with different degrees of confidence.
Fourth, a main focus of knowledge acquisition is the elicitation of the domain concepts
and of their hierarchical organization, as will be discussed in Section 6.3.
And fifth, the ontology represents the generalization hierarchy for learning, in which
specific problem-solving episodes are generalized into rules by replacing instances with
concepts from the ontology.

5.2 CONCEPTS AND INSTANCES

A concept (or class) is a general representation of what is common to a set of instances
(or individuals). Therefore, a concept may be regarded as a representation of that set of
instances. For example, professor in Figure 5.2 represents the set of all professors, which
includes Amanda Rice and Dan Smith.

Figure 5.1. Fragment of an ontology.

Figure 5.2. A concept and two of its instances (professor, with the instances Amanda Rice and Dan Smith).


An instance (individual) is a representation of a particular entity in the application
domain, such as Amanda Rice. We indicate that an instance belongs to a concept by using
the relation instance of:

Amanda Rice instance of professor

5.3 GENERALIZATION HIERARCHIES

Generalization is a fundamental relation between concepts. A concept P is said to be more
general than (or a generalization of) another concept Q if and only if the set of instances
represented by P includes the set of instances represented by Q.
Figure 5.3 shows several concepts with different degrees of generality. For example,
person is more general than student because any student must be a person or, in other
words, the set of all persons includes the set of all students.
Let us notice that the preceding definition of generalization is extensional, based upon the
instance sets of concepts. In order to show that P is more general than Q, this definition would
require the computation of the (possibly infinite) sets of the instances of P and Q. Therefore, it
is useful in practice only for showing that P is not more general than Q. Indeed, according to
this definition, it is enough to find an instance of Q that is not an instance of P because this
shows that the set represented by Q is not a subset of the set represented by P. Section 8.3
discusses generalization rules that allow the agent to compare the generality of the concepts
by working with their descriptions rather than their sets of instances.
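The extensional test can be phrased directly as a set-inclusion check. The following minimal sketch (plain Python, not Disciple-EBR code, with hypothetical and deliberately finite instance sets) illustrates both uses discussed here: confirming inclusion when the sets are small, and disproving generality with a single counterexample.

def more_general_than(instances_p, instances_q):
    """Extensional test: P is more general than Q iff every instance of Q is an instance of P."""
    return set(instances_q).issubset(set(instances_p))

# Hypothetical, finite instance sets used only for illustration.
person_instances = {"Amanda Rice", "Dan Smith", "Joan Dean"}
student_instances = {"Joan Dean"}

print(more_general_than(person_instances, student_instances))  # True: person is more general than student
# A single instance of person that is not a student disproves the converse:
print(more_general_than(student_instances, person_instances))  # False (e.g., Amanda Rice)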
One may express the generality relation between two concepts by using the relation
subconcept of:

student subconcept of person

Other names used for expressing this type of relation are subclass of, type, and isa.
A concept Q is a direct subconcept of a concept P if and only if Q is a subconcept of P and
there is no other concept R such that Q is a subconcept of R and R is a subconcept of P.
One may represent the generality relations between the concepts in the form of a
partially ordered graph that is usually called a generalization hierarchy (see Figure 5.4).
The leaves of the hierarchy in Figure 5.4 are instances of the concepts that are represented
by the upper-level nodes. Notice that (the instance) John Doe is both an associate professor
and a PhD advisor. Similarly, a concept may be a direct subconcept of several concepts.

Figure 5.3. Concepts of different generality (person, university employee, faculty member, staff member, and student).


5.4 OBJECT FEATURES

The objects in an application domain may be described in terms of their properties and
their relationships with each other. For example, Figure 5.5 represents Mark White as an
associate professor employed by George Mason University. In general, the value of a feature
may be a number, a string, an instance, a symbolic probability, an interval, or a concept
(see Section 5.9).

5.5 DEFINING FEATURES

A feature is itself characterized by several features that have to be specified when defining
a new feature. They include its domain, range, superfeatures, subfeatures, and
documentation.
The domain of a feature is the concept that represents the set of objects that could have
that feature. The range is the set of possible values of the feature.

Figure 5.4. A generalization hierarchy rooted in person, with subconcepts such as university employee, student, faculty member, professor, and graduate student, and with instances such as John Smith, John Doe, Jane Austin, Joan Dean, and Bob Sharp.

Figure 5.5. Sample object description (Mark White, an instance of associate professor, has as employer George Mason University, an instance of university).


For example, Figure 5.6 shows the representation of the has as employer feature. Its
domain is person, which means that only entities who are persons may have an employer.
Its range is organization, meaning that any value of such a feature should be an
organization.
There are several types of ranges that could be defined with Disciple-EBR: Concept,
Number, Symbolic interval, Text, and Any element.
We have already illustrated a range of type “Concept” (see Figure 5.6). A range of type
“Number” could be either a set or an interval of numbers, and the numbers could be
either integer or real. A range of type “Symbolic interval” is an ordered set of symbolic
intervals. A range of type “Text” could be any string, a set of strings, or a natural language
text. Finally, a range of type “Any element” could be any of the aforementioned entities.
As will be discussed in more detail in Chapter 7, the knowledge elements from the
agent’s knowledge base, including features, may be partially learned. Figure 5.7 shows an
example of the partially learned feature has as employer. The exact domain is not yet
known, but its upper and lower bounds have been learned as person and professor,
respectively. This means that the domain is a concept that is less general than or as
general as person. Similarly, the domain is more general than or as general as professor.

Figure 5.6. The representation of a feature (has as employer, a subconcept of feature, with the documentation “indicates the employer of a person,” the domain person, and the range organization).

Figure 5.7. Partially learned feature (has as employer, with a plausible upper bound person and a plausible lower bound professor for its domain, and a plausible upper bound employer and a plausible lower bound university for its range).


Through further learning, the agent will learn that the actual domain is person, as
indicated in Figure 5.6.
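As a minimal illustration (a plain Python sketch; the Feature record and its field names are hypothetical, not the Disciple-EBR representation), a feature definition of this kind, including its plausible version-space bounds, could be recorded as follows.

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Feature:
    name: str
    documentation: str
    domain: Optional[str] = None                              # exact domain concept, once learned
    range: Optional[str] = None                               # exact range concept, once learned
    domain_bounds: Tuple[Optional[str], Optional[str]] = (None, None)  # (plausible upper bound, plausible lower bound)
    range_bounds: Tuple[Optional[str], Optional[str]] = (None, None)

# Partially learned, as in Figure 5.7: only the plausible bounds are known so far.
has_as_employer = Feature(
    name="has as employer",
    documentation="indicates the employer of a person",
    domain_bounds=("person", "professor"),
    range_bounds=("employer", "university"),
)

# Through further learning, the exact domain and range are established, as in Figure 5.6.
has_as_employer.domain = "person"
has_as_employer.range = "organization"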
Features are also organized in a generalization hierarchy, as illustrated in Figure 5.8.

5.6 REPRESENTATION OF N-ARY FEATURES

Let us suppose that we want to represent the following information in the ontology: “John
Doe has written Windows of Opportunities.” This can be easily represented by using a
binary feature:

John Doe has as writing Windows of Opportunities.

But let us now suppose that we want to represent “John Doe has written Windows of
Opportunities from 2005 until 2007.” This information can no longer be represented with a
binary feature, such as has as writing, that can link only two entities in the ontology. We
need to represent this information as an instance (e.g., Writing 1) of a concept (e.g.,
writing), because an instance may have any number of features, as illustrated in Figure 5.9.
Disciple-EBR can, in fact, represent n-ary features, and it generates them during
learning, but the ontology tools can display only binary features, and the user can define
only binary features.
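The reification just described can be mimicked with ordinary binary triples. The short sketch below (plain Python, not the Disciple-EBR representation) encodes the Writing1 example of Figure 5.9 as (subject, feature, value) statements.

# Each fact is a binary (subject, feature, value) statement; the n-ary relation is
# carried by the reified instance Writing1.
triples = [
    ("Writing1", "instance of",       "writing"),
    ("Writing1", "has as author",     "John Doe"),
    ("Writing1", "has as title",      "Windows of Opportunities"),
    ("Writing1", "has as start time", 2005),
    ("Writing1", "has as end time",   2007),
]

# Any aspect of the writing event is now reachable through binary features only:
start = next(v for s, f, v in triples if s == "Writing1" and f == "has as start time")
print(start)  # 2005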

Figure 5.8. A generalization hierarchy of features from the COG domain (see Section 12.4).


Figure 5.9. Representation of an n-ary relation as binary features (Writing1, an instance of writing, has as author John Doe, has as title Windows of Opportunities, has as start time 2005, and has as end time 2007).

Figure 5.10. Use of the properties of instance of and subconcept of (Joan Dean is an instance of MS student, which is a subconcept of graduate student, student, and person).

5.7 TRANSITIVITY

The instance of and subconcept of relations have the following properties:

∀x, ∀y, ∀z, (x subconcept of y) ∧ (y subconcept of z) ➔ (x subconcept of z)
∀x, ∀y, ∀z, (x instance of y) ∧ (y subconcept of z) ➔ (x instance of z)

As one can see, subconcept of is a transitive relation and, in combination with instance of,
allows inferring new instance of relations. Let us consider, for example, the hierarchy
fragment from the middle of Figure 5.10. By applying the aforementioned properties of
instance of and subconcept of, one may infer that:

Joan Dean instance of person


MS student subconcept of person
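A minimal sketch of these two rules in plain Python (not the Disciple-EBR inference engine, and under the simplifying assumption that each concept has a single direct superconcept) is shown below.

subconcept_of = {                    # direct subconcept-of links, as in Figure 5.10
    "MS student": "graduate student",
    "graduate student": "student",
    "student": "person",
}
instance_of = {"Joan Dean": "MS student"}   # direct instance-of links

def superconcepts(concept):
    """All concepts reachable by repeatedly applying the transitive subconcept-of rule."""
    result = []
    while concept in subconcept_of:
        concept = subconcept_of[concept]
        result.append(concept)
    return result

def inferred_instance_of(instance, concept):
    direct = instance_of.get(instance)
    return direct == concept or concept in superconcepts(direct)

print(superconcepts("MS student"))                  # ['graduate student', 'student', 'person']
print(inferred_instance_of("Joan Dean", "person"))  # True, inferred by transitivity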


5.8 INHERITANCE

A theorem that is implicitly represented in an ontology is the inheritance of features from a
more general concept to a less general concept or an instance.
An instance inherits the features of the concepts to which it belongs:

∀x, ∀y, ∀z, (x instance of y) ∧ (y feature z) ➔ (x feature z)

Similarly, a concept inherits the properties of its superconcepts:

∀x, ∀y, ∀z, (x subconcept of y) ∧ (y feature z) ➔ (x feature z)

For example, in the case of the ontology in Figure 5.11, one can infer:

professor retirement age 66


assistant professor retirement age 66
John Smith retirement age 66

by inheriting the retirement age property from

faculty member retirement age 66

The inheritance of properties is one of the most important strengths of an ontology, allowing a
compact and economical representation of knowledge. Indeed, if all the instances of a concept
C have the property P with the same value V, then it is enough to associate the property P with
the concept C because it will be inherited by each of the concept’s instances. There are,
however, two special cases of inheritance to which one should pay special attention: default
inheritance and multiple inheritance. They are discussed in the following subsections.

5.8.1 Default Inheritance


In many domains, exceptions to general rules exist. For example, it is generally useful to
assume that all birds can fly. Certain birds, however, such as the ostrich and the kiwi,
cannot fly. In such a case, it is reasonable to use a representation scheme in which
properties associated with concepts in a hierarchy are assumed to be true for all sub-
concepts and instances, unless specifically overridden by a denial or modification associ-
ated with the subconcepts or the instances.
Let us consider again the hypothetical example in Figure 5.11. The fact that the retire-
ment age of assistant professor is 66 is inherited from faculty member. On the other hand, full
professor does not inherit this property from faculty member because it is explicitly repre-
sented that the retirement age of full professor is 70. This overrides the default inherited from
faculty member. Therefore, to find a feature of some object, the agent will first check whether
the feature is explicitly associated with the object and take the corresponding value. Only if
the feature is not explicitly associated with the object will the agent try to inherit it from the
superconcepts of the object, by climbing the generalization hierarchy.

5.8.2 Multiple Inheritance


It is possible for a concept or an instance to have more than one direct superconcept.
For example, in the ontology from Figure 5.11, John Doe is both an associate professor and


Figure 5.11. An illustration of inheritance (faculty member, with retirement age 66, has the subconcepts instructor, professor, and PhD advisor; professor has the subconcepts assistant professor, associate professor, and full professor, the last with retirement age 70; John Smith, John Doe, and Jane Austin are instances).

Figure 5.12. Relationship between two concepts with instances (a feature-1 link from concept-1 to concept-2 entails feature-1 links from each instance of concept-1 to each instance of concept-2).

a PhD advisor. Therefore, John Doe will inherit features from both of them and there is a
potential for inheriting conflicting values. In such a case, the agent should use some
strategy in selecting one of the values. A better solution, however, is to detect such
conflicts when the ontology is built or updated, and to associate the correct feature value
directly with each element that would otherwise inherit conflicting values.

5.9 CONCEPTS AS FEATURE VALUES

The previous section has discussed how the features of the concepts are inherited.
However, in all the examples given, the value of the feature was a number. The
same procedure will work if the value is an instance or a string. But what happens if
the value is a concept, which has itself a set of instances, as shown in the top part of
Figure 5.12?


In this case, each instance of concept-1 inherits feature-1, the value of which is concept-
2, which is the set of all the instances of concept-2, as shown in the bottom part of
Figure 5.12.
One has to exercise care when defining features between concepts. For example, the
correct way to express the fact that a parent has a child is to define the following feature:
has as child
domain parent
range child

By contrast, asserting “parent has as child child” directly between the two concepts would
mean that each parent is the parent of each child.
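The difference between the two formulations can be seen in a small sketch (plain Python, with the hypothetical instances from Figure 5.22): a feature asserted directly between the two concepts is inherited as the full cross-product of their instances.

parent_instances = ["Mary", "Jane"]
child_instances = ["Bob", "Julie"]

# Asserting "parent has as child child" at the concept level entails every instance pair:
inherited = [(p, "has as child", c) for p in parent_instances for c in child_instances]
print(inherited)
# [('Mary', 'has as child', 'Bob'), ('Mary', 'has as child', 'Julie'),
#  ('Jane', 'has as child', 'Bob'), ('Jane', 'has as child', 'Julie')]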

5.10 ONTOLOGY MATCHING

Ontology matching allows one to ask questions about the objects in the ontology, such as:
“Is there a course that has as reading a publication by John Doe?”
We first need to express the question as a network fragment with variables, as illus-
trated in the top part of Figure 5.13. The variables represent the entities we are looking for.
We then need to match the network fragment with the ontology to find the values of the
variables, which represent the answer to our question.
For example, John Doe in the pattern is matched with John Doe in the ontology, as
shown in the right-hand side of Figure 5.13. Then, following the has as author feature (in
reverse), ?O2 is successfully matched with Doe 2000 because each of them is a publication.
Finally, following the has as reading feature (also in reverse), ?O1 is successfully matched
with Mason-CS480 and with U Montreal-CS780, because each of them is an instance of a
course. Therefore, one obtains two answers to the question:
Yes, Mason-CS480 has as reading Doe 2000, and Doe 2000 has as author John Doe.
Yes, U Montreal-CS780 has as reading Doe 2000, and Doe 2000 has as author John Doe.

Figure 5.13. Ontology matching.


One important aspect to notice is that the structure of the ontology is also a guide in
searching it. This significantly speeds up the matching process as compared, for example,
to a representation of the same information as a set of predicates.
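A minimal sketch of this matching process is given below (plain Python over binary triples, not the Disciple-EBR matcher; for simplicity, the concept membership of Doe 2000 and of the two courses is asserted directly here, whereas in the ontology it would be inferred through the generalization hierarchy).

triples = [
    ("Mason-CS480",      "instance of",    "course"),
    ("U Montreal-CS780", "instance of",    "course"),
    ("Doe 2000",         "instance of",    "publication"),
    ("Mason-CS480",      "has as reading", "Doe 2000"),
    ("U Montreal-CS780", "has as reading", "Doe 2000"),
    ("Doe 2000",         "has as author",  "John Doe"),
]

def is_a(entity, concept):
    return (entity, "instance of", concept) in triples

# Pattern: ?O1 is a course, ?O2 is a publication, ?O1 has as reading ?O2,
#          and ?O2 has as author John Doe.
answers = [
    (o1, o2)
    for (o1, f, o2) in triples
    if f == "has as reading"
    and is_a(o1, "course")
    and is_a(o2, "publication")
    and (o2, "has as author", "John Doe") in triples
]
print(answers)
# [('Mason-CS480', 'Doe 2000'), ('U Montreal-CS780', 'Doe 2000')]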

5.11 HANDS ON: BROWSING AN ONTOLOGY

The objective of this case study is to learn how to use the various ontology browsers of
Disciple-EBR: the Hierarchical Browser, the Feature Browser and the Feature Viewer, the
Association Browser, the Object Browser, and the Object Viewer. These tools are very
similar to the ontology browsers from many other knowledge engineering tools, such as
Protégé (2015) and TopBraid Composer (2012).
Figure 5.14 shows the interface and the main functions of the Hierarchical Browser of
Disciple-EBR, an ontology tool that may be used to browse a hierarchy of concepts and
instances. The hierarchy in Figure 5.14 is rotated, with the most general concept (object)
on the left-hand side and its subconcepts on its right-hand side. The hierarchy can be
rotated by clicking on the Rotate View button. Clicking on the Expand View button leads
to showing additional levels of the hierarchy, while clicking on the Reduce View button
leads to showing fewer levels.
Figure 5.15 shows the interface and the main functions of the Association Browser of
Disciple-EBR, which may be used to browse the objects and their features. This browser is
centered on a given object (e.g., John Doe), showing its features (e.g., John Doe has as
employer George Mason University), the features for which it is a value (e.g., Adam Pearce
has as PhD advisor John Doe), its direct concepts (e.g., PhD advisor and associate professor)
and, in the case of a concept, its direct subconcepts or its direct instances. Double-clicking
on any entity in the interface will center the Association Browser on that entity.
One may also browse the objects and their features by using the Object Browser and the
Object Viewer, as illustrated in Figure 5.16 and described in Operation 5.1.

Figure 5.14. The interface and the main functions of the Hierarchical Browser (expand or reduce the tree to show more or fewer hierarchy levels, rotate the tree for better viewing, invoke the Finder to locate an object, and invoke the Association Browser to view the selected object).


Figure 5.15. The Association Browser and its functions (invoke the Finder to locate an object, invoke the Hierarchical Browser to view the selected object, and click on an object or feature to view it with the Association Browser).

Figure 5.16. The interface of the Object Browser (left) and the Object Viewer (right).


Operation 5.1. View an object

• Select the Scenario workspace.
• Click on the Ontology menu and select Object Browser.
• Click on the object in the Object Browser (e.g., John Doe on the left-hand side of Figure 5.16).
• Click on the View button to view the object’s features with the Object Viewer (see the right-hand side of Figure 5.16).
• Click on the Association button to view the object’s features with the Association Browser.

You can browse the feature generalization hierarchy by using the Feature Browser, which
is illustrated in the left-hand side of Figure 5.17. To view the definition of a specific feature,
you may follow the steps in Operation 5.2.

Operation 5.2. View a feature definition

• Select the Scenario workspace.
• Click on the Ontology menu and select Feature Browser.
• Click on the feature in the Feature Browser (e.g., is expert in on the left-hand side of Figure 5.17).
• Click on the View button to view the feature with the Feature Viewer (see the right-hand side of Figure 5.17).
• Click on the Association button to view the feature with the Association Browser.

Figure 5.17. The Feature Browser (left) and the Feature Viewer (right).


Start Disciple-EBR, select the case study knowledge base “07-Ontology-Browsing/Scen,” and proceed as indicated in the instructions from the bottom of the opened window.

5.12 PROJECT ASSIGNMENT 4

Extend the preliminary version of the agent that you will develop as part of your project by
analyzing one leaf hypothesis based on several items of evidence, as discussed in Section
4.4 and practiced in the case study from Section 4.5.

5.13 REVIEW QUESTIONS

5.1. What is an ontology?

5.2. Why are ontologies important?

5.3. What is a concept?

5.4. What is an instance?

5.5. What does it mean for a concept P to be more general than a concept Q?

5.6. What are the possible relationships between two concepts A and B, from a gener-
alization point of view? Provide examples of concepts A and B in each of the
possible relationships.

5.7. How could one prove that a concept A is more general than a concept B? Is the
proposed procedure likely to be practical?

5.8. How can one prove that a concept A is not more general than a concept B? Is the
proposed procedure likely to be practical?

5.9. Consider the feature hierarchy from Figure 5.18. Indicate the necessary relationship
between: (a) Domain B and Domain 1; (b) Range B and Range 1; (c) Domain A2
and Domain 1; (d) Domain A and Domain B; (e) Domain 1 and Range 1.

5.10. Consider the knowledge represented in Figure 5.11 (p. 163). What is the retirement
age of John Smith? What is the retirement age of Jane Austin?

5.11. Insert the additional knowledge that platypus lays eggs into the object ontology
from Figure 5.19. Explain the result.

5.12. Explain in English what information is represented in Figure 5.20.


5.13. Explain in English what information is represented in Figure 5.21. How could
one encode the additional information that Clyde owned nest1 from spring
2013 to fall 2013? Hint: You need to restructure the ontology and represent an
n-ary relationship.

5.14. Represent the following information as an ontology fragment: “Bob Sharp enrolled
at George Mason University in fall 2014.”

5.15. Consider the generalization hierarchy from Figure 5.4 (p. 158). Consider the
following information: In general, the retirement age of a faculty member is 66,


Figure 5.18. Sample feature hierarchy (feature 1 with Domain 1 and Range 1; feature A with Domain A and Range A; feature B with Domain B and Range B; feature A1 with Domain A1 and Range A1; feature A2 with Domain A2 and Range A2).

Figure 5.19. Ontology fragment (mammal, with has as birth mode live, and its subconcepts cow and platypus).

Figure 5.20. Knowledge representation (an ontology fragment relating the Sun and the Earth through their masses and temperatures, the Sun's color, and the attracts and revolves around relations).


but a full professor may retire at 70, although Jane Austin opted to retire at 66. How
could you represent this information?

5.16. How can we deal with the inheritance of contradictory properties? Provide an
example.

5.17. Define the main features of a feature and illustrate each of them with an example.

5.18. Provide an example of a partially learned feature.

5.19. Why is it very important to carefully define the features of a feature?

5.20. What is the meaning of the ontology fragment from Figure 5.22?

5.21. How could one represent the fact that a bird has a nest?

5.22. Explain how the following questions are answered based on the ontology fragment
from Figure 5.23, specifying the types of inference used in each case, and providing
the corresponding answers:
What is the color of membrane?
What does contact adhesive1 glue?
Which are the loudspeaker components made of metal?

5.23. Consider the background knowledge consisting of the object hierarchy from
Figure 5.23.
(a) Which are all the answers to the following question: “Is there a part of a
loudspeaker that is made of metal?”
(b) Which are the reasoning operations that need to be performed in order to
answer this question?
(c) Consider one of the answers that require all these operations and show how
the answer is found.

Figure 5.21. Representation with binary predicates (Clyde, an instance of robin, owns nest1, an instance of nest).

Figure 5.22. Example of a wrong concept feature (has as child asserted directly between the concepts mother and child, which have instances such as Mary, Jane, Bob, and Julie).

Figure 5.23. Ontology fragment from the loudspeaker domain (Tecuci, 1998), relating loudspeaker components (e.g., membrane, chassis, bolt, chassis membrane assembly), adhesives (e.g., contact adhesive, mowicoll), and materials (e.g., paper, metal, caoutchouc) through features such as part of, made of, glues, and state. Dotted links indicate instance of relationships, while continuous unnamed links indicate subconcept of relationships.
Figure 5.24. Ontology fragment from the loudspeaker domain (Tecuci, 1998), relating cleaners (e.g., hard cleaner, soft cleaner, air mover, solvent, emery paper, air press, air sucker, acetone, alcohol), waste materials (e.g., dust, surplus adhesive, surplus paint), and objects such as membrane and entrefer through features such as removes, damages, and may have. Dotted links indicate instance of relationships, while unnamed continuous links indicate subconcept of relationships.

5.24. Consider the ontology fragment from Figure 5.24. Notice that each of the most
specific concepts, such as dust or air press, has an instance, such as dust1 and air
press1, respectively.
(a) Represent the question “Is there a cleaner X that removes dust?” as a network
fragment.
(b) Find all the possible answers to this question based on the information in the
ontology fragment.
(c) In order to answer this question, the agent would need to use several
reasoning operations. Which are these operations?

5.25. Consider the following description in the context of the ontology fragment from
Figure 5.24:

?z is cleaner
removes surplus-paint1

Determine all the possible values of ?z.

5.26. Consider the following action description in the context of the ontology fragment
from Figure 5.24:

clean object ?x
of ?y
with ?z
condition
?x is entrefer
may have ?y
?y is object
?z is cleaner
removes ?y

Find all the possible values for the variables ?x, ?y, and ?z. Indicate some of the
corresponding actions.

6 Ontology Design and Development

6.1 DESIGN AND DEVELOPMENT METHODOLOGY

Ontology design is a creative process whose first step is determining the scope of the
ontology by specifying its main concepts, features, and instances. One approach is to elicit
them from a subject matter expert or some other sources, as will be discussed in Section 6.3.
Another approach is to extract a specification of the ontology from the reasoning trees
developed as part of the rapid prototyping of the agent. During this phase, the subject
matter expert and the knowledge engineer define a set of typical hypotheses (or problems)
that the envisioned agent should be able to assess (or solve). Then they actually assess
these hypotheses the way they would like Disciple-EBR to assess them. This process
identifies very clearly what concepts and features should be present in the ontology to
enable the agent to assess those types of hypotheses. This modeling-based ontology
specification strategy will be discussed in Section 6.4. Once a specification of the ontology
has been developed, one has to complete its design.
Because ontology design and development is a complex process, it makes sense to
import relevant concepts and features from previously developed ontologies (including
those from the Semantic Web) rather than defining them from scratch. In particular, one
may wish to look for general-purpose ontologies, such as an ontology of time, space, or
units of measures, if they are necessary to the agent under development. Significant
foundational and utility ontologies have been developed and can be reused (Obrst et al.,
2012), as discussed in Section 3.2.2.
The actual development of the ontology is performed by using ontology tools such as
Protégé (Noy and McGuinness, 2001) or those that will be presented in this section. As will
be discussed next, ontology development is an iterative process during which additional
concepts, features, and instances are added while teaching the agent to assess hypotheses
(or solve problems).
An important aspect to emphasize is that the ontology will always be incomplete.
Moreover, one should not attempt to represent all of the agent’s knowledge in the
ontology. On the contrary, the ontology is intended to represent only the terms of the
representation language that are used in the definitions of hypotheses and rules. The more
complex knowledge will be represented as rules.

6.2 STEPS IN ONTOLOGY DEVELOPMENT

Table 6.1 presents the ontology development steps, which are also illustrated in Figure 6.1.


Table 6.1 Ontology Development Steps

1. Define basic concepts (types of objects) and their organization into a hierarchical structure (the
generalization hierarchy).
2. Define object features by using the previously defined concepts to specify their domains and
ranges.
3. Define instances (specific objects) by using the previously defined concepts and features.
4. Extend the ontology with new concepts, features, and instances.
5. Repeat the preceding steps until the ontology is judged to be complete enough.

Figure 6.1. Steps in ontology development.

First one needs to define basic concepts and to organize them into a hierarchical
structure. This may be performed by using the Object Browser of Disciple-EBR, as will
be discussed in Section 6.5.
Once a set of basic concepts have been defined, one can define features that use these
concepts as their domains and ranges. For example, one may define the feature has as
employer with the domain person and range organization, which are previously defined
concepts. The features are defined and organized in a hierarchy by using the Feature
Browser and the Feature Editor, as discussed in Section 6.7.


With some of the concepts and features defined, one can define instances of these
concepts and associate features with them, as discussed in Section 6.8. For example, one
can define Mark White as an instance of associate professor and specify its feature has as
employer with the value George Mason University.
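The first three steps just illustrated can also be sketched programmatically. The following minimal sketch (plain Python, not a Disciple-EBR interface; the data structures are hypothetical) records the concepts, the has as employer feature, and the Mark White instance from Figure 6.1.

# Step 1: concepts and their generalization hierarchy (concept -> direct superconcept).
subconcept_of = {
    "person": "actor",
    "organization": "actor",
    "university employee": "person",
    "faculty member": "university employee",
    "associate professor": "faculty member",
    "educational organization": "organization",
    "university": "educational organization",
}

# Step 2: features defined over the previously introduced concepts.
features = {
    "has as employer": {"domain": "person", "range": "organization",
                        "documentation": "the employer of a person"},
}

# Step 3: instances described with the previously defined concepts and features.
instance_of = {"Mark White": "associate professor",
               "George Mason University": "university"}
facts = [("Mark White", "has as employer", "George Mason University")]

print(instance_of["Mark White"], facts[0])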
Ontology development is an iterative process, as indicated by the last step in Table 6.1.
In the case of Disciple-EBR, one does not develop an ontology from scratch. Rather,
one extends the shared ontology for evidence-based reasoning. Moreover, as part of the
rapid prototyping phase, the user has defined the specific instances and the generic
instances used in the sample reasoning trees. All these entities are represented as
instances of the “user instance” concept. Thus, as part of ontology development, one
needs to move all these instances under their proper concepts, as illustrated in
Figure 6.2.

6.3 DOMAIN UNDERSTANDING AND CONCEPT ELICITATION

Concept elicitation consists of determining which concepts apply in the domain, what they
mean, what their relative place in the domain is, which differentiating criteria distinguish
similar concepts, and what organizational structure gives these concepts coherence for the
expert (Gammack, 1987).
What are some natural ways of eliciting the basic concepts of a domain? Table 6.2 lists
the most common concept elicitation methods. The methods are briefly described in the
following subsections.

Figure 6.2. Moving the instances of “user instance” under their corresponding concepts (e.g., Bob Sharp under PhD student, George Mason University under university, and John Doe under associate professor and PhD advisor).


Table 6.2 Basic Concept Elicitation Methods

• Preliminary methods
  • Tutorial session delivered by expert
  • Ad-hoc list created by expert
  • Book index
• Interviews with expert
  • Unstructured interview
  • Structured interview
    • Multiple-choice questions
    • Dichotomous questions
    • Ranking scale questions
• Protocol analysis
• Concept hierarchy elicitation

6.3.1 Tutorial Session Delivered by the Expert


The knowledge engineer asks the subject matter expert to prepare an introductory talk
outlining the whole domain and to deliver it as a tutorial session. Then the knowledge
engineer extracts concepts from the transcript of the talk.

6.3.2 Ad-hoc List Created by the Expert


The knowledge engineer asks the subject matter expert to generate a list of typical
concepts and then systematically probe for more relevant information (e.g., using free
association).

6.3.3 Book Index


The knowledge engineer identifies a representative book on the expertise domain and
extracts concepts from the index of the book.

6.3.4 Unstructured Interviews with the Expert


These are goal-oriented methods used when the knowledge engineer wants to explore an
issue, where the questions and the responses are open-ended. Examples of unstructured
interview fragments have been presented in Tables 3.4 (p. 85), 3.6 (p. 87), and 3.7 (p. 87) in
Section 3.1.

6.3.5 Structured Interviews with the Expert


These are interviews where the questions are fixed in advance. The types of structured
questions include multiple-choice questions (see Table 6.3); dichotomous (yes/no)
questions (see Table 6.4); and ranking scale questions, where the expert is asked to
arrange some items in a list in the order of their importance or preference (see Table 6.5).


Table 6.3 Multiple-Choice Question for the Diabetic Foot Advisor (Awad, 1996)

If a diabetic patient complains of foot problems, who should he or she see first (check one):
□ Podiatrist
□ General practitioner
□ Orthopedic surgeon
□ Physical therapist

Table 6.4 Dichotomous Question for the Diabetic Foot Advisor (Awad, 1996)

Do patients with neuropathy come for regular checkups?


□ Yes
□ No

Table 6.5 Example of a Ranking Scale Question

Please rank the following professional reputation criteria for a PhD advisor in the order of their
importance from your point of view. Give a rank of 1 to the most important criterion, a rank of 2 to
the second most important one, and so on:
___ research funding criterion
___ publications criterion
___ citations criterion
___ peer opinion criterion

What are the main characteristics of the structured interview method? It offers specific
choices, enables faster tabulation, and has less bias due to the way the questions are
formulated. However, this method is restricted by the requirement to specify choices.

6.3.6 Protocol Analysis (Think-Aloud Technique)


This is the systematic collection and analysis of the thought processes or problem-solving
methods of an expert where the expert is asked to solve problems and to verbalize what
goes through his or her mind, stating directly what the expert thinks. The solving process is
carried out in an automatic fashion while the expert talks. The knowledge engineer does
not interrupt or ask questions. The structuring of the information elicited occurs later
when the knowledge engineer analyzes the protocol. Table 6.6 shows an example of a
protocol where a doctor verbalizes the diagnosis of a diabetic foot patient.
Which are the main strengths and weaknesses of all of the preceding concept elicitation
methods? These methods give the knowledge engineer an orientation to the domain,


Table 6.6. Sample Protocol (Adapted from Awad, 1996)

1. This woman is in her mid- to late forties.


2. The patient being quite overweight and a diabetic, blisters are common occurrences.
3. Pain is symptomatic of the blister.
4. Patient is experiencing this blister for the first time. She’s probably more worried than being in
pain.
5. Being diabetic, blisters take a long time to heal. It is not likely to get worse.
...
40. I don’t see broken skin or pus accumulating, which is a good sign.
41. I’m going to recommend NSD and soaking the foot in warm water before going to bed and
after getting up.
42. Her husband will have to help.
43. I’m going to recommend that patient wear wide-toed shoes.
...
64. So, for the moment, I am going to tell the patient to see me in two weeks.
65. Right now, I wouldn’t recommend any medical treatment. Surgery is the last thing on my
mind.
66. I’ll relay this diagnosis and decision to the patient.

generate much knowledge cheaply and naturally, and do not require a significant effort on
the part of the expert. However, they have an incomplete and arbitrary coverage, and the
knowledge engineer needs appropriate training and/or social skills.

6.3.7 The Card-Sort Method


This is a simple method to develop a hierarchy of concepts. Its main steps are shown in
Table 6.7.
An example of a concept hierarchy elicited through the card-sort method for the
development of a domestic gas-fired hot water and central heating system is shown in
Figure 6.3.
Which are the main strengths and weaknesses of the card-sort method? This method
produces clusters of concepts and hierarchical organization, splits large domains into
manageable subareas, is easy to use, and is widely applicable. However, it is incomplete
and unguided, produces strict hierarchies that are usually too restrictive, and does not
comply with the knowledge engineering guidelines for ontology structuring.
How could we modify the card-sort method to build a tangled hierarchy? We simply need
to write the same concept on more than one card, so that it can be included in different groups.

6.4 MODELING-BASED ONTOLOGY SPECIFICATION

Modeling-based ontology specification has already been introduced in Section 3.3. The
knowledge engineer and the subject matter expert analyze each step of the reasoning trees


Table 6.7 The Card-Sort Method (Gammack, 1987)

1. Type the concepts on small individual index cards.


2. Ask the expert to group together the related concepts into as many small groups as possible.
3. Ask the expert to label each of the groups.
4. Ask the expert to combine the groups into slightly larger groups and to label them.
5. Ask the expert to repeat step 4 until a single group is obtained.
The result will be a hierarchical organization of the concepts.

Figure 6.3. Concept hierarchy elicited through the card-sort method (Gammack, 1987).

developed as part of the rapid prototyping phase of agent development, in order to identify
the concepts and the features that should be in the ontology to enable the agent to perform
that reasoning.
Let us consider the reasoning step from the left-hand side of Figure 6.4. To enable the
agent to answer the question from this step, we may define the ontology fragment from the
right-hand side of Figure 6.4.
However, this reasoning step is just an example. We want the agent to be able to answer
similar questions, corresponding to similar hypotheses. Therefore, the ontology fragment
from the right-hand side of Figure 6.4 should be interpreted only as a specification of the
ontological knowledge needed by the agent. This specification suggests that we should
define various university positions, as well as various employers in the ontology, as shown
in Figure 6.5. It also suggests defining two features, has as position and has as employer, as
illustrated in Figure 6.6.

6.5 HANDS ON: DEVELOPING A HIERARCHY OF CONCEPTS AND INSTANCES

The hierarchy of concepts and instances can be developed by using the Object Browser,
which was introduced in Section 5.11. Its interface and main functions are shown in
Figure 6.7. This tool shows the hierarchy in a tree structure.


Figure 6.4. Modeling-based ontology specification: the reasoning step asks, “Is John Doe likely to stay on the faculty of George Mason University for the duration of the dissertation of Bob Sharp? Yes, because John Doe has a tenured position which is a long-term position,” and the corresponding ontology fragment represents John Doe with has as position tenured position (an instance of long-term faculty position) and has as employer George Mason University (an instance of university).

Figure 6.5. Concept hierarchy design.

Figure 6.6. Feature hierarchy design.


Figure 6.7. The Object Browser and its main functions (expand or collapse the subtree of the selected element; display the selected element in the Object Viewer, the Association Browser, or the Hierarchical Browser; invoke the Finder to locate an object; rename, edit, or delete the selected element; and define a subconcept or an instance of the selected concept).

The user can expand or collapse the subtree of a selected node (e.g., educational organization)
by clicking on the Expand All and Collapse All buttons, respectively. Similar effects may be
obtained by clicking on the – and + nodes. Selecting a node and then clicking on the
Hierarchical button will open the Hierarchical Browser with that node as the top of the
displayed hierarchy.
The Object Browser can be used to develop a generalization hierarchy by defining
concepts and instances, as described in Operation 6.1 and illustrated in Figure 6.8.

Operation 6.1. Define a subconcept or an instance of a concept

• Open the Object Browser.
• Click on the concept name to select the concept.
• Click on the Concept button (to define a subconcept) or on the Instance button (to define an instance).
• Write the name of the subconcept/instance.
• Press the Enter key.

If an instance is to be used only in the current scenario, then it should be defined in the
Scenario part of the knowledge base, as described in Operation 6.2. The system will create
it as a specific instance.

Operation 6.2. Define a specific instance of a concept

• Select the Scenario workspace.
• Open the Object Browser and notice that it displays both generic instances and specific instances.


Figure 6.8. Defining a subconcept of a concept (1: click on the concept name to select it; 2: click on “Concept”; 3: write the name of the subconcept and press Enter).

• Define the instance (see Operation 6.1).
• Notice that the instance is displayed in italics, indicating that it is a specific instance.

If an instance needs to be used in more than one scenario, then you have to define it as a
generic instance in the Domain part of the knowledge base, as described in Operation 6.3
and illustrated in Figure 6.9. Such an instance is visible both in the Domain KB and in all
its Scenario KBs and is displayed in regular font, as shown in Figure 6.10.

Operation 6.3. Define a generic instance of a concept

• Select the Domain workspace.
• Open the Object Browser and notice that it displays only generic instances.
• Define the instance (see Operation 6.1).
• Notice that the instance is displayed in regular font, indicating that it is a generic instance.

In the case of the PhD advisor assessment agent, the following instances have been
defined in the Domain part of the knowledge base, and are therefore generic instances:

• The criteria to assess a PhD advisor (e.g., professional reputation criterion)
• Faculty positions (e.g., tenured position)
• PhD research areas (e.g., Artificial Intelligence)

The Scenario part still contains specific instances, such as John Doe, Bob Sharp, and George
Mason University. A specific instance is visible only in the corresponding Scenario KB and is
displayed in italics, as shown in Figure 6.10.

Figure 6.9. Defining a generic instance (select a Domain KB, notice the Domain KB workspace, define the instances, and notice that the browser shows only the generic instances).



Figure 6.10. Specific instances and generic instances (in the Scenario KB workspace the browser shows concepts in straight light blue font, specific instances in dark blue italics, generic instances in straight dark blue font, and elements from the Shared KB shaded).

Notice that the specific instances are displayed with italic font only in the ontology
interfaces. In all the other interfaces, they are displayed in the regular font.
In addition to defining concepts and instances, one should also be able to rename or
delete them. These operations are performed as explained in Operations 6.4 and 6.5,
respectively. Deletion is a particularly complex operation. Disciple-EBR prevents the
deletion of an entity if this would lead to an inconsistent knowledge base where some of
the knowledge base elements refer to the element to be deleted.


Operation 6.4. Rename a concept or an instance


 Open the Object Browser.
 Click on the concept or instance to select it.
 Click on the Rename button.
 Write the new name.
 Press the Enter key.

Operation 6.5. Delete a concept or an instance


 Open the Object Browser.
 Click on the concept or instance to select it.
 Click on the Delete button.

Start Disciple-EBR, select the knowledge base “08-Ontology-Development-Objects/Scen,”


and use the Object Browser to extend the ontology with the following information:

 course subconcept of object


 operating systems course subconcept of course
 artificial intelligence course subconcept of course
 CS571 instance of operating systems course
 CS580 instance of artificial intelligence course

6.6 GUIDELINES FOR DEVELOPING GENERALIZATION HIERARCHIES

6.6.1 Well-structured Hierarchies


Siblings in a generalization hierarchy are the concepts (or instances) that are direct
subconcepts (instances) of the same concept. For example, assistant professor, associate
professor, and full professor in Figure 5.4 (p. 158) are siblings because they are all direct
subconcepts of professor.

Guideline 6.1. Define similar siblings


In a well-structured generalization hierarchy, all the siblings should have a comparable
level of generality. In particular, they should be either all concepts or all instances. If some
of them are instances, insert one or several concepts that include them, as illustrated in
Figure 6.11.

Figure 6.11. Define similar siblings: instead of leaving the instance D1 as a direct child of M next to the concepts A, B, and C, introduce the concept D as a subconcept of M and define D1 as an instance of D.


Guideline 6.2. Group similar siblings under natural concepts


The siblings should reflect concepts from the real world. However, if there are too many
siblings, consider whether some of them may be grouped under another (natural) con-
cept, as illustrated in Figure 6.12.

Guideline 6.3. Recognize that a single subconcept may indicate ontology incompleteness or error
A case of a single subconcept, such as ABC in Figure 6.13, may be an indication of either an incomplete ontology (where the siblings of ABC are missing) or a modeling error (where ABC should not be defined and A, B, and C should be linked directly to M).

6.6.2 Instance or Concept?


In general, a set of individuals is represented as a concept, while a single individual is
represented as an instance. But sometimes a set can be regarded as an individual and
represented as an instance. Consider the hierarchy from Figure 6.14. In that hierarchy,

Figure 6.12. Grouping of similar siblings under natural concepts: instead of defining A, B, C, D, E, F, …, X, Y, Z as direct subconcepts of the same concept, group them under the intermediate concepts ABC, DEF, …, XYZ.

Figure 6.13. Single subconcept of a concept: ABC is the only subconcept of its superconcept and has the subconcepts A, B, and C.


Figure 6.14. Representation of entities as concepts or as instances.

Artificial Intelligence Magazine is represented as an instance of a journal. But one could have
also represented it as a concept, the instances of which would have been specific issues of
the Artificial Intelligence Magazine. Whether something is represented as an instance or as a
concept influences how Disciple-EBR learns. In particular, when learning a reduction rule
from a reduction example, instances are generalized while the concepts are preserved as
such. Therefore, in many cases, learning considerations determine whether an entity is
represented as an instance or as a concept.
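This distinction can be made concrete with a short Python sketch (illustrative only, not the Disciple-EBR learning procedure; the fact that the course CS580 has as reading the Artificial Intelligence Magazine is a made-up example):

# A minimal sketch (not the Disciple-EBR learner) of the effect described
# above: when a rule is learned from an example, instances are generalized
# to variables while concepts are preserved.
example = [("CS580", "has as reading", "Artificial Intelligence Magazine")]

def generalize(example, concepts):
    """Replace every term that is not a concept with a variable ?O1, ?O2, ..."""
    variables = {}
    def term_of(t):
        if t in concepts:
            return t                                      # concepts are preserved
        return variables.setdefault(t, f"?O{len(variables) + 1}")
    return [(term_of(s), f, term_of(v)) for s, f, v in example]

# Artificial Intelligence Magazine modeled as an instance: it is generalized.
print(generalize(example, concepts=set()))
# [('?O1', 'has as reading', '?O2')]

# Artificial Intelligence Magazine modeled as a concept: it is preserved.
print(generalize(example, concepts={"Artificial Intelligence Magazine"}))
# [('?O1', 'has as reading', 'Artificial Intelligence Magazine')]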

6.6.3 Specific Instance or Generic Instance?


If an instance needs to be used in more than one scenario (such as Artificial Intelligence,
which is needed to evaluate advisors at various universities), then you have to create it in
the Domain knowledge base, and it will be automatically defined as a generic instance.
Similarly, if it would make sense to have general rules that explicitly mention that
instance, then again it should be defined as a generic instance. For example, we may
envision a general rule related to Artificial Intelligence, but not a general rule related to John
Doe. Thus Artificial Intelligence may be defined as a generic instance, but John Doe should be
defined as a specific instance.
The instances created in the Scenario KB will automatically be defined as specific instances.
Notice that, in the ontology interfaces, the names of the generic instances are displayed
in dark blue straight characters (such as Artificial Intelligence), while the names of the
specific instances are displayed in dark blue italics characters (such as John Doe).

6.6.4 Naming Conventions


The following guidelines suggest conventions for naming concepts.

Guideline 6.4. Adopt and follow a naming convention


It is recommended to adopt a naming convention and to strictly adhere to it. This will
both facilitate the understanding of the ontology and will help avoid modeling mistakes.


In Disciple-EBR, it is recommended to use lowercase letters for concept names. It is also recommended to use either singular or plural consistently, but not both, in naming the concepts.

Guideline 6.5. Name subconcepts based on superconcepts


Often the names of the subconcepts of a concept include the name of the concept, as
shown in several places in Figure 5.4 (p. 158) and illustrated in Figure 6.15.

6.6.5 Automatic Support


Disciple-EBR will not accept ontology operations that will make the ontology inconsistent
or introduce a circularity, where a concept is both a superconcept and a subconcept of
another concept. Disciple-EBR will not accept definitions of concepts or instances that
have the same names as previously defined ones.
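The following Python sketch (illustrative only, not the tool's actual implementation) shows the kind of checks involved in rejecting duplicate names and circular subconcept of links; the concept names are only examples:

# A minimal sketch of rejecting duplicate names and circular subconcept links.
superconcepts = {"student": set(), "graduate student": {"student"}}

def reachable_upward(target, start):
    """True if `target` can be reached from `start` by following superconcept links."""
    stack = [start]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        stack.extend(superconcepts.get(node, ()))
    return False

def define_concept(name, parent):
    if name in superconcepts:
        raise ValueError(f"'{name}' has already been defined")
    superconcepts[name] = {parent}

def add_superconcept(concept, new_parent):
    # A circularity appears exactly when `concept` is already above `new_parent`.
    if reachable_upward(concept, new_parent):
        raise ValueError(f"'{concept}' subconcept of '{new_parent}' would be circular")
    superconcepts.setdefault(concept, set()).add(new_parent)

define_concept("PhD student", "graduate student")      # accepted
# define_concept("student", "person")                  # rejected: duplicate name
# add_superconcept("student", "PhD student")           # rejected: circularity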

6.7 HANDS ON: DEVELOPING A HIERARCHY OF FEATURES

The hierarchy of features can be developed by using the Feature Browser. Its interface and
main functions are shown in Figure 6.16. Like the Object Browser, this tool shows the
feature hierarchy in a tree structure that the user can expand or collapse by selecting a
node (e.g., has as part) and then by clicking on Expand All and Collapse All, respectively.
Similar effects may be obtained by clicking on the – and + nodes. Selecting a node and
then clicking on the Hierarchical button opens the Hierarchical Browser with that node at
the top of the displayed hierarchy.
The features are defined and organized in a hierarchy by using the Feature Browser,
similarly to how the Object Browser is used to develop a concept hierarchy. The steps
needed to define a new feature (e.g., works for) are those described in Operation 6.6.

Operation 6.6. Define a feature


 Open the Feature Browser.
 Select the superfeature (e.g., feature) of the feature to be defined (e.g., works for).
 Click on the Feature button.
 Specify the name of the feature to be defined (i.e., works for).
 Press the Enter key.

When a user defines a subfeature of a given feature, the domain and the range of the
subfeature are set to be the ones of the superfeature. The user can change them by clicking
on the Modify button of the Feature Browser. This invokes the Feature Editor, which is

student
subconcept of

graduate student undergraduate student

Figure 6.15. Naming concepts.


Figure 6.16. The Feature Browser and its main functions: expand/collapse the subtree of the selected feature; display the selected feature in the Feature Viewer, Association Browser, or Hierarchical Browser; invoke the Finder to locate a feature; rename the selected feature; invoke the Feature Editor to modify the selected feature; delete the selected feature; define a fact or subfeature of the selected feature.

illustrated in Figure 6.17. Using the Feature Editor, the user can add or delete super-
features or subfeatures of the selected feature, in addition to modifying its domain
and range.
Figure 6.18 illustrates the process of changing the domain of the works for feature from
object to actor. This process consists of the steps described in Operation 6.7.

Operation 6.7. Change the domain of a feature


 Open the Feature Browser.
 Select the feature with the domain to be changed (e.g., works for).
 Click on the Modify button to invoke the Feature Editor.
 Select the domain tab in the Feature Editor.
 Click on the Add button to open an Object Browser pane.
 Browse and select a concept for the new domain (e.g., actor).
 Click on the Add to domain button in the Object Browser pane.
 Click on the Apply Domain button in the domain tab to commit the addition in the ontology.

A range of type “Concept” is modified in the same way as a domain (see Operation 6.8).

Operation 6.8. Change the range of a feature


 Open the Feature Browser.
 Select the feature with the range to be changed (e.g., works for).
 Click on the Modify button to invoke the Feature Editor.
 Select the range tab in the Feature Editor.
 Click on the Add button to open an Object Browser pane.


Figure 6.17. The Feature Editor and its main functions: the feature to be modified; invoke the Feature Browser to add superfeatures of the feature; delete the selected superfeature; add/delete subfeatures; invoke the Object Browser to define the domain/range of the feature; delete concepts from the domain/range; commit the modifications in the ontology.

 Browse and select a concept for the new range (e.g., actor).
 Click on the Add to range button in the Object Browser pane.
 Click on the Apply button in the range tab to commit the addition in the ontology.

For range types other than “Concept,” a type-specific editor is invoked after clicking on the
Add button. For example, Figure 6.19 illustrates the definition of a range, which is the
integer interval [0, 10].
Start Disciple-EBR, select the knowledge base “09-Ontology-Development-Features/
Scen,” and use the Feature Browser and the Feature Editor to represent the following
related features in a hierarchy:

 works for, a subfeature of feature, with domain actor and range actor
 is employed by, a subfeature of works for, with domain employee and range organization


Figure 6.18. Changing the domain of a feature.

Optionally, you may also add the following feature:

 contracts to, a subfeature of works for, with domain actor (inherited from works for) and
range organization

6.8 HANDS ON: DEFINING INSTANCES AND THEIR FEATURES

With some concepts and features defined, one may use the Object Editor to define
instances of these concepts (as discussed in Section 6.5) and associate features with them.
Figure 6.20 illustrates the process of defining the is interested in feature of John Doe. The
steps of this process are those described in Operation 6.9.


Figure 6.19. Defining a feature range as an integer interval.

Operation 6.9. Define a feature of an object


 Open the Object Browser and select the object (e.g., John Doe).
 Click on the Modify button to open the Object Editor.
 Click on the Add feature button in the Object Editor to open a Feature Browser pane.
 Browse and select the feature to be added to the object (e.g., is interested in).
 Click on the Define for object button in the Feature Browser pane.
 The Finder is automatically opened to locate its value in the ontology (see Figure 6.21).
 Write a part of the name of the value in the Finder text field.
 Click on the Find button.
 Click on the correct value from the list returned by the Finder.
 Click on the Select button.
 If the range of the feature is a symbolic interval or a set of values, a selection pane
is opened in which you can choose the right value of the feature. For a number, simply
type it.

The Object Editor can also be used to update the list of the direct superconcepts of
an object (instance or concept) and the list of the direct subconcepts or instances of a
concept. The actual steps to perform are presented in Operations 6.10 and 6.11. As with all
the ontology operations, Disciple-EBR will not perform them if they would lead to an
inconsistent ontology or a cycle along the subconcept of relation.


Figure 6.20. Defining a feature of an instance.

Operation 6.10. Add a direct superconcept to an object


 Locate the object (e.g., John Doe) with the Object Browser and the Finder and click on
the Modify button to open the Object Editor (see Figure 6.20).
 Click on the Add button in the “Super-concepts” pane to open an Object
Browser pane.
 Browse and select the superconcept.
 Click on the Add as parent button in the Object Browser pane.
 Click on the Apply links button in the Object Editor to commit the addition in the
ontology.

Operation 6.11. Add a direct subconcept or an instance to a concept


 Locate the concept with the Object Browser and the Finder and click on the Modify
button to open the Object Editor.
 Click on the Add button in the “Sub-concepts and instances” pane to open an Object
Browser pane.
 Browse and select the subconcept or the instance to be added.
 Click on the Add as child button in the Object Browser pane.
 Click on the Apply links button in the Object Editor to commit the addition in the
ontology.

Start Disciple-EBR, select the knowledge base “10-Ontology-Development-Facts/Scen,” and


use the Ontology tools to extend the representation of John Doe with the following fact:

John Doe is interested in Software Engineering


Figure 6.21. Using the Finder to define the value of a feature.

6.9 GUIDELINES FOR DEFINING FEATURES AND VALUES

6.9.1 Concept or Feature?

Guideline 6.6. Represent well-established categories from the real world as concepts
Almost any distinction from the real world may be represented either as a concept or as a feature. While there are no absolute rules for this difficult modeling decision, several guidelines are useful to follow. For example, two alternative ways of representing the fact that Bob Evens is a PhD student are:


Bob Evens instance of PhD student


Bob Evens has as student level PhD

The first one is preferable because PhD student is a well-established category.

Guideline 6.7. Define concepts and instances to represent knowledge corresponding to n-ary relations
A feature in an ontology can naturally represent a binary relation. However, if the relation
is not binary, you need to define a concept or an instance with which you can associate
any number of features. Let us consider the fact that John Doe has written “Windows of
Opportunities.” This can be represented as:

John Doe has as writing Windows of Opportunities

But what if we want to represent the additional knowledge that he has written it from
2005 until 2007? We can no longer associate this information with the has as writing
feature. A solution is to define the concept writing and its instance Writing1, instead of
the feature has as writing, as was illustrated in Figure 5.9 (p. 161).
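The effect of this reification can be sketched in Python as follows (the particular features attached to Writing1 are illustrative assumptions, not the exact ones from Figure 5.9): once the n-ary fact is represented through the instance Writing1, every piece of information about it becomes an ordinary binary feature that can be queried directly.

# A minimal sketch of reifying the n-ary fact "John Doe has written Windows of
# Opportunities from 2005 until 2007" as the instance Writing1 with binary features.
facts = [
    ("Writing1", "instance of", "writing"),
    ("Writing1", "has as author", "John Doe"),
    ("Writing1", "has as title", "Windows of Opportunities"),
    ("Writing1", "has as start year", 2005),
    ("Writing1", "has as end year", 2007),
]

def feature_values(subject, feature):
    """Look up the values of a binary feature of an instance."""
    return [v for s, f, v in facts if s == subject and f == feature]

print(feature_values("Writing1", "has as author"))      # ['John Doe']
print(feature_values("Writing1", "has as start year"))  # [2005]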

6.9.2 Concept, Instance, or Constant?


Another modeling issue is to decide whether to represent the value of a feature as a
concept, as an instance, or as a constant (i.e., a string).
If one needs to associate additional properties with a value, then it cannot be represented
as a constant.
Let us consider again the representation from Figure 5.9 (p. 161). If we want to
represent any additional features of Windows of Opportunities (e.g., that it is 258 pages
long), then we have to represent it as an instance (which will have number of pages as a
property with value 258).
Consider, for example, the representation of colors. If one anticipates learning concepts that will be characterized by various subsets of values, then one may elect to define the colors as strings, such as white, yellow, and orange. A learned concept might then be {white, orange}. However, if concepts such as warm color or cold color are important in the application domain, then one should define a hierarchy of colors in which white, yellow, and orange are defined as generic instances or concepts.
The same considerations apply to numbers. While generally they are represented as
values, one may wish to define an ordered set of intervals (see Section 8.3.6). In this case,
each interval will have a specific name (such as toddler, youth, or mature), which can be
used as value.
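A small Python sketch of this idea follows (the interval boundaries are illustrative assumptions):

# Mapping a number to a named interval from an ordered set of intervals.
AGE_INTERVALS = [(0, 4, "toddler"), (5, 30, "youth"), (31, 150, "mature")]

def symbolic_age(age):
    for low, high, name in AGE_INTERVALS:
        if low <= age <= high:
            return name
    raise ValueError(f"age {age} is outside the defined intervals")

print(symbolic_age(2))    # toddler
print(symbolic_age(42))   # mature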

6.9.3 Naming of Features

Guideline 6.8. Define feature names that distinguish them from concept names
It is useful to define names that allow for easy distinction between a concept and a feature. For example, "author" would not be a good name for a feature because it would not be clear that "author" denotes a feature. Indeed, you may have a concept named "author."


Two common practices are to add the “has as” prefix or the “of” suffix to the feature name,
resulting in has as author or author of, respectively.

6.9.4 Automatic Support


The ontology tools of Disciple-EBR have several features that facilitate their use. For
example, when modifying the domain or the range of a feature (see the right-hand
side of Figure 6.18 on p. 192), Disciple-EBR will display only the concepts that are
acceptable values.
When defining a new feature for a given concept or instance (see the right-hand side of Figure 6.20, p. 194), Disciple-EBR will display only the features whose domain includes the given concept or instance.
When defining the value of a given feature (see Figure 6.21 on p. 195), the Finder will
display only those values that are in the range of the feature.

6.10 ONTOLOGY MAINTENANCE

Maintaining the consistency of the ontology is a very complex knowledge engineering activity because the definitions of the objects and features interact in complex ways. For example, deleting a concept requires the updating of all the knowledge base elements that refer to it, such as any feature that contains it in its range or domain, or any concept that inherits its features.
Consider the ontology fragment from the left part of Figure 6.22. Let us assume that in
the initial state of this ontology, the domain of the feature f is the concept A. Let us further
assume that the instance C has the feature f with value 7. If we now delete the relation “B
subconcept of A” (i.e., B is no longer a subconcept of A), the modified ontology is
inconsistent. Indeed, in the initial state C can have the feature f because the domain of f
is A, and C is included in A (C is an instance of B, which is a subconcept of A). In the
modified ontology however, C can no longer have the feature f because it is no longer
included in A. The important thing to remember from this example is that a modification
done to B generated an error in another part of the knowledge base (at C).

Figure 6.22. Complexity of ontology maintenance. In the initial state, the domain of the feature f is the concept A, B is a subconcept of A, C is an instance of B, and C has the feature f with value 7. After the relation "B subconcept of A" is deleted, C is no longer included in the domain of f, so the fact that C has the feature f with value 7 makes the ontology inconsistent.
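The kind of referential check suggested by this example can be sketched in Python as follows (the data structures are illustrative, not Disciple-EBR's internal representation):

# The fact (C f 7) is legal only while C remains included in A, the domain of f.
superconcepts = {"B": {"A"}}            # subconcept-of links: child -> parents
instance_of = {"C": {"B"}}              # direct concepts of each instance
feature_domain = {"f": "A"}
facts = [("C", "f", 7)]

def ancestors(concept):
    seen, stack = set(), [concept]
    while stack:
        for parent in superconcepts.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def concepts_of(instance):
    direct = instance_of.get(instance, set())
    return direct | {a for c in direct for a in ancestors(c)}

def inconsistent_facts():
    return [(s, f, v) for s, f, v in facts
            if feature_domain[f] not in concepts_of(s)]

print(inconsistent_facts())         # [] -- the initial state is consistent
superconcepts["B"].discard("A")     # delete the relation "B subconcept of A"
print(inconsistent_facts())         # [('C', 'f', 7)] -- the ontology is now inconsistent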


6.11 PROJECT ASSIGNMENT 5

Use the reasoning trees developed as part of the rapid prototyping of your agent and
employ the modeling-based ontology specification method to extend the ontology of
your agent.

6.12 REVIEW QUESTIONS

6.1. What are the basic concept elicitation methods? What are their main strengths?
What are their main weaknesses?

6.2. Briefly describe the card-sort method. What are its main strengths? What are its
main weaknesses? How could one modify this method to build a tangled hierarchy?

6.3. Describe and illustrate the “Well-structured Hierarchies” knowledge engineering


guidelines presented in Section 6.6.1.

6.4. Describe and illustrate the decision of representing an entity as instance or concept.

6.5. Describe and illustrate the “Concept or Feature?” knowledge engineering guide-
lines presented in Section 6.9.1.

6.6. Describe and illustrate the decision of representing an entity as concept, instance,
or constant.

6.7. Describe and illustrate the “Naming Conventions” guidelines, presented in Section
6.6.4 for concepts, and in Section 6.9.3 for features.

6.8. Consider the question/answer pair from Figure 6.23. Specify the ontology frag-
ments that are suggested by this question/answer pair, including instances, con-
cepts, and features definitions (with appropriate domains and ranges that will
facilitate learning). Do not limit yourself to the concepts that are explicitly referred
to, but define additional ones as well, to enable the agent to assess similar hypoth-
eses in a similar way.

Hint 1: Notice that the answer represents an n-ary relation while in an ontology you
may only represent binary relations.
Hint 2: You need to define a hierarchy of concepts that will include those used in
the domains and ranges of the defined features.
Hint 3: Your solution should reflect the use of knowledge engineering guidelines.

Q: Has Mary graduated?
A: Yes, Mary graduated from George Mason University in 2012 with an MSCS degree.
… … …

Figure 6.23. Question/answer pair corresponding to a reduction step.


6.9. Consider the reduction step from Figure 6.24. Specify the ontology fragments that
are suggested by this reasoning step. Do not limit yourself to the concepts and
features that are explicitly mentioned, but define additional ones as well, to enable
the agent to assess similar hypotheses in a similar way.

6.10. What instances, concepts, and relationships should be defined in the agent’s
ontology, based on the analysis of the reduction step from Figure 6.25?

6.11. What instances, concepts, and relationships should be defined in the agent’s
ontology, based on the analysis of the reduction step from Figure 6.26?

6.12. What instances, concepts, and relationships should be defined in the agent’s
ontology, based on the analysis of the reduction step from Figure 6.27?

6.13. Consider the design of an agent for assessing whether some actor (e.g.,
Aum Shinrikyo) is developing weapons of mass destruction (e.g., chemical
weapons). We would like our agent to be able to perform the sample reasoning
step from Figure 6.28, where the entities in blue are represented as specific
instances.

Hypothesis to assess: John Doe would be a good PhD advisor with respect to the publication with advisor criterion, considering the publications with John Doe of Adam Pearce.

Q: Is there any joint publication of John Doe and Adam Pearce?
A: Yes, Pearce and Doe 2006.

Assessment: It is certain that John Doe would be a good PhD advisor with respect to the publication with advisor criterion.

Figure 6.24. Reduction step for modeling-based ontology specification.

Hypothesis to assess: Jill Knox will stay on the faculty of George Mason University for the duration of the dissertation of Bob Sharp.

Q: Is Jill Knox likely to stay on the faculty of George Mason University for the duration of the dissertation of Bob Sharp?
A: Yes, because Jill Knox has a tenure-track position and is likely to get tenure.

Assessment: It is likely that Jill Knox will stay on the faculty of George Mason University for the duration of the dissertation of Bob Sharp.

Figure 6.25. Reduction step for modeling-based ontology specification.


Hypothesis to assess: Dan Smith will stay on the faculty of George Mason University for the duration of the dissertation of Bob Sharp.

Q: Is Dan Smith likely to stay on the faculty of George Mason University for the duration of the dissertation of Bob Sharp?
A: No, because Dan Smith plans to retire from George Mason University.

Assessment: There is no support for the hypothesis that Dan Smith will stay on the faculty of George Mason University for the duration of the dissertation of Bob Sharp.

Figure 6.26. Reduction step for modeling-based ontology specification.

Hypothesis to assess: Amanda Rice will stay on the faculty of George Mason University for the duration of the dissertation of Bob Sharp.

Q: Is Amanda Rice likely to stay on the faculty of George Mason University for the duration of the dissertation of Bob Sharp?
A: No, because Amanda Rice has a visiting position, which is a short-term position.

Assessment: There is no support for the hypothesis that Amanda Rice will stay on the faculty of George Mason University for the duration of the dissertation of Bob Sharp.

Figure 6.27. Reduction step for modeling-based ontology specification.

Hypothesis to assess: Aum Shinrikyo has members trained in chemistry.

Q: Is there any member of Aum Shinrikyo who is trained in chemistry?
A: Yes, Masami Tsuchiya, who has a master's degree in chemistry.

Assessment: It is certain that Aum Shinrikyo has members trained in chemistry.

Figure 6.28. Reduction step for modeling-based ontology specification.

(a) What relationships should you define in the agent’s ontology in order to
represent the meaning of the question/answer pair?
(b) Represent this meaning as a network fragment showing the relationships and
the related instances.


(c) Network fragments such as the one at (b) represent a specification of the
needed ontology, guiding you in defining a hierarchy of concepts to which
the identified instances belong, as well as the siblings of these concepts.
Indicate eight such concepts. Also indicate which might be the domain and the range of each of the identified features.

6.14. Develop an ontology that represents the following information:

Apple1 is an apple.
The color of Apple1 is red.
Apple2 is an apple.
The color of Apple2 is green.
Apples are fruits.
Hint: You should define concepts, features, and instances.

6.15. Develop an ontology that represents the following information:

Puss is a calico.
Herb is a tuna.
Charlie is a tuna.
All tunas are fishes.
All calicos are cats.
All cats like to eat all kinds of fish.
Cats and fishes are animals.
Hint: You should define concepts, features, and instances.

6.16. Develop an ontology that represents the following information:

Basketball players are tall.


Muresan is a basketball player.
Muresan is tall.
Hint: Define concepts, features, and instances.

6.17. Develop an ontology that represents the following information:

Birds are animals.


Birds have feathers, fly, and lay eggs.
Albatross is a bird.
Donald is a bird.
Tracy is an albatross.
Hint: Define concepts, features, and instances.

6.18. Explain why maintaining the consistency of the ontology is a complex knowledge
engineering activity.

6.19. One of the principles in the development of a knowledge base with a tool such as
Disciple-EBR is to maintain its consistency because correcting an inconsistent
knowledge base is a very complex problem. Therefore, the tool will not allow the
deletion of a knowledge base element (e.g., an instance, a fact, a concept, or a
feature definition) if that operation will make the knowledge base inconsistent. List
and explain five possible ways in which the deletion of a concept may render the
knowledge base inconsistent.

7 Reasoning with Ontologies and Rules

7.1 PRODUCTION SYSTEM ARCHITECTURE

In Chapter 4, we presented the problem reduction and solution synthesis paradigm. In this
chapter, we will present how a knowledge-based agent can employ this paradigm to solve
problems and assess hypotheses.
Figure 7.1 shows the architecture of the agent, which is similar to that of a production
system (Waterman and Hayes-Roth, 1978). The knowledge base is the long-term memory,
which contains an ontology of concepts and a set of rules expressed with these concepts.
When the user formulates an input problem, the problem reduction and solution synthesis inference engine applies the learned rules from the knowledge base to develop a problem reduction and solution synthesis tree, as was discussed in Chapter 4. This tree is developed in the reasoning area, which plays the role of the short-term memory.
The ontology from the knowledge base describes the types of objects (or concepts) in
the application domain, as well as the relationships between them. Also included are the
instances of these concepts, together with their properties and relationships.
The rules are IF-THEN structures that indicate the conditions under which a general
problem (or hypothesis) can be reduced to simpler problems (hypotheses), or the solu-
tions of the simpler problems (hypotheses) can be combined into the solution of the more
complex problem (hypothesis).
The applicability conditions of these rules are complex concepts that are expressed by
using the basic concepts and relationships from the ontology, as will be discussed in
Section 7.2. The reduction and synthesis rules will be presented in Section 7.3. This section

Figure 7.1. Production system architecture of the knowledge-based agent. The knowledge base, containing the ontology and the rules expressed with its concepts, is the long-term memory; the reasoning area, in which the problem reduction and solution synthesis tree is developed, is the short-term memory. The main modules are problem formulation, problem reduction and solution synthesis, solution browsing, and ontology development and rule learning.


will also present the overall reduction and synthesis algorithm. Section 7.4 will present the
simplified reduction and synthesis rules used for evidence-based hypotheses analysis, and
Section 7.5 will present the rule and ontology matching process. Finally, Section 7.6 will
present the representation of partially learned knowledge, and Section 7.7 will present the
reasoning with this type of knowledge.

7.2 COMPLEX ONTOLOGY-BASED CONCEPTS

Using the concepts and the features from the ontology, one can define more complex
concepts as logical expressions involving these basic concepts and features. For example,
the concept “PhD student interested in an area of expertise” may be expressed as
shown in [7.1].

?O1 instance of PhD student [7.1]


is interested in ?O2
?O2 instance of area of expertise

Because a concept represents a set of instances, the user can interpret the preceding concept as representing the set of instances of the tuple (?O1, ?O2) that satisfy the expression [7.1], that is, the set of tuples in which the first element is a PhD student and the second one is the area of expertise in which that student is interested. For example, Bob Sharp, a PhD student interested in artificial intelligence, is an instance of the concept [7.1].
Indeed, the following expression is true:

Bob Sharp instance of PhD student


is interested in Artificial Intelligence
Artificial Intelligence instance of area of expertise

In general, the basic representation unit (BRU) for a more complex concept has the
form of a tuple (?O1, ?O2, . . . , ?On), where each ?Oi has the structure indicated by [7.2],
called a clause.

?Oi instance of concepti [7.2]
featurei1 ?Oi1
. . .
featureim ?Oim

Concepti is either an object concept from the object ontology (such as PhD student), a
numeric interval (such as [50 , 60]), a set of numbers (such as {1, 3, 5}), a set of strings
(such as {white, red, blue}), or an ordered set of intervals (such as (youth, mature)). ?Oi1 . . .
?Oim are distinct variables from the sequence (?O1, ?O2, . . . , ?On).
A concept may be a conjunctive expression of form [7.3], meaning that any instance of
the concept satisfies BRU and does not satisfy BRU1 and . . . and does not satisfy BRUp.

BRU ∧ not BRU1 ∧ . . . ∧ not BRUp [7.3]

However, instead of “not,” we write “Except-When,” as shown in [7.4].

BRU ∧ Except-When BRU1 ∧ . . . ∧ Except-When BRUp [7.4]


For example, expression [7.5] represents the concept “PhD student interested in an area of
expertise that does not require programming.”

?O1 instance of PhD student [7.5]


is interested in ?O2
?O2 instance of area of expertise
Except-When
?O2 instance of area of expertise
requires programming
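The following Python sketch (based on an assumed triple representation of the ontology, not the actual Disciple-EBR data structures) illustrates how such a concept with an Except-When part can be evaluated for a particular binding of its variables:

# Evaluating concept [7.5] -- a basic representation unit plus an Except-When
# part -- for a given binding of ?O1 and ?O2.
ontology = {
    ("Bob Sharp", "instance of", "PhD student"),
    ("Bob Sharp", "is interested in", "Artificial Intelligence"),
    ("Artificial Intelligence", "instance of", "area of expertise"),
    # Adding ("Artificial Intelligence", "requires", "programming") would
    # trigger the Except-When part and exclude this binding.
}

bru = [("?O1", "instance of", "PhD student"),
       ("?O1", "is interested in", "?O2"),
       ("?O2", "instance of", "area of expertise")]
except_when = [("?O2", "instance of", "area of expertise"),
               ("?O2", "requires", "programming")]

def holds(clauses, binding):
    """True if every (subject, feature, value) clause is a fact in the ontology
    once the variables are replaced by their bindings."""
    return all((binding.get(s, s), f, binding.get(v, v)) in ontology
               for s, f, v in clauses)

binding = {"?O1": "Bob Sharp", "?O2": "Artificial Intelligence"}
print(holds(bru, binding) and not holds(except_when, binding))   # True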

7.3 REDUCTION AND SYNTHESIS RULES AND THE INFERENCE ENGINE

An agent can solve problems through reduction and synthesis by using problem reduction
rules and solution synthesis rules. The rules are IF-THEN structures that indicate the
conditions under which a general problem can be reduced to simpler problems, or the
solutions of the simpler problems can be combined into the solution of the more complex
problem.
The general structure of a problem reduction rule Ri is shown in Figure 7.2. This rule
indicates that solving the problem P can be reduced to solving the simpler problems
Pi1, . . ., Pini, if certain conditions are satisfied. These conditions are expressed in two
equivalent forms: one as a question/answer pair in natural language that is easily under-
stood by the user of the agent, and the other as a formal applicability condition expressed
as a complex concept having the form [7.4], as discussed in the previous section.
Consequently, there are two interpretations of the rule in Figure 7.2:

(1) If the problem to solve is P, and the answer to the question QRi is ARi, then one
can solve P by solving the subproblems Pi1, . . . , Pini.
(2) If the problem to solve is P, and the condition CRi is satisfied, and the conditions
ERi1, . . ., ERiki are not satisfied, then one can solve P by solving the subproblems
Pi1, . . . , Pini.

IF the problem to solve is P
Question QRi
Answer ARi
Main Condition CRi
Except-When Condition ERi1
…
Except-When Condition ERiki
THEN solve
Subproblem Pi1
…
Subproblem Pini

Figure 7.2. Structure of a problem reduction rule Ri.


An example of a problem reduction rule is shown in Figure 7.3. It reduces the IF problem
to two simpler problems. This rule has a main condition and no Except-When conditions.
Notice also that the main condition is expressed as the concept (?O1, ?O2, ?O3, ?O4).
As discussed in Section 4.2 and illustrated in Figure 4.6 (p. 116), there are two synthesis
operations associated with a reduction operation: a reduction-level synthesis and a
problem-level synthesis. These operations are performed by employing two solution
synthesis rules that are tightly coupled with the problem reduction rule, as illustrated in
Figure 7.4 and explained in the following paragraphs.
For each problem reduction rule Ri that reduces a problem P to the subproblems Pi1, . . . , Pini (see the left-hand side of Figure 7.4), there is a reduction-level solution synthesis rule (see the upper-right-hand side of Figure 7.4). This reduction-level solution synthesis rule is an IF-THEN structure that expresses the condition under which the solutions Si1, . . . , Sini of the subproblems Pi1, . . . , Pini of the problem P can be combined into the solution Si of P corresponding to the rule Ri.
Let us now consider all the reduction rules Ri, . . . , Rm that reduce problem P to simpler
problems, and the corresponding synthesis rules SRi, . . . , SRm. In a given situation, some
of these rules will produce solutions of P, such as the following ones: Based on Ri, the

IF the problem to solve is
  Analyze a strategic COG candidate corresponding to the ?O1 which is an industrial economy.
Q: Who or what is a strategically critical element with respect to the ?O1?
A: ?O2 because it is an essential generator of war material for ?O3 from the strategic perspective.
Main Condition
  ?O1 is industrial economy
  ?O2 is industrial capacity
    generates essential war material from the strategic perspective of ?O3
  ?O3 is multistate force
    has as member ?O4
  ?O4 is force
    has as economy ?O1
    has as industrial factor ?O2
THEN solve the subproblems
  Identify ?O2 as a strategic COG candidate with respect to the ?O1.
  Test ?O2 which is a strategic COG candidate with respect to the ?O1.

Figure 7.3. An example of a problem reduction rule.


Reduction rule Ri:
IF the problem to solve is P
Question QRi
Answer ARi
Condition CRi
THEN solve
Subproblem Pi1
…
Subproblem Pini

Synthesis rule SRi corresponding to Ri:
IF
The solution of Pi1 is Si1
…
The solution of Pini is Sini
Question QSRi
Answer ASRi
Condition CSRi
THEN
Based on Ri the solution of P is Si

Synthesis rule SP corresponding to P:
IF
Based on Ri the solution of P is Si
…
Based on Rm the solution of P is Sm
Question QP
Answer AP
Condition CP
THEN
The solution of P is S

Figure 7.4. A reduction rule and the corresponding synthesis rules.

solution of P is Si, . . . , and based on Rm, the solution of P is Sm. The synthesis rule SP
corresponding to the problem P combines all these rule-specific solutions of the problem
P (named Si, . . . , Sm) into the solution S of P (see the bottom-right side of Figure 7.4).
The problem-solving algorithm that builds the reduction and synthesis tree is pre-
sented in Table 7.1.

7.4 REDUCTION AND SYNTHESIS RULES FOR EVIDENCE-BASED HYPOTHESES ANALYSIS

In Section 4.3, we discussed the specialization of inquiry-driven analysis and synthesis for
evidence-based reasoning, where one assesses the probability of hypotheses based on
evidence, such as the following one:

John Doe would be a good PhD advisor for Bob Sharp.

Moreover, the assessment of any such hypothesis has the form, “It is <probability> that
H,” which can be abstracted to the actual probability, as in the following example, which
can be abstracted to “very likely:”

It is very likely that John Doe would be a good PhD advisor for Bob Sharp.


Table 7.1 The Basic Operation of the Inference Engine

Procedure SOLVE( PN(n0, P, x0) )
Input
  P – a problem class
  x0 – a parameters instantiation
  P(x0) – a specific problem
  n0 – the index of the problem node in the reasoning tree
  PN(n0, P, x0) – a problem node for problem P(x0) with index n0

SOL ← { }
FOR ALL Ri ∈ GET-APPLICABLE-RULES-OF( P )
  FOR ALL yj ∈ GENERATE-RULE-INSTANTIATIONS( Ri, x0 )
    RN(nij, Ri, yj) ← GENERATE-REDUCTION-NODE( PN(n0, P, x0), Ri(yj) )
    FOR ALL TNnk(nk, Tk, zk) ∈ GET-THEN-STATEMENT-NODE( RN(nij, Ri, yj) )
      IF TNnk IS PROBLEM-NODE THEN
        SNnk ← SOLVE( TNnk )
    IF ALL TNnk HAVE SOLUTIONS THEN
      SN(nij, S, uk) ← COMPUTE-FIRST-REDUCTION-LEVEL-SOLUTION-NODE( RN(nij, Ri, yj), (SNnk)k )
      IF SN(nij, S, uk) THEN
        SOL ← SOL ∪ { S(uk) }
IF SOL ≠ ∅ THEN
  SN(n0, S, u) ← COMPUTE-FIRST-PROBLEM-LEVEL-SOLUTION-NODE( PN(n0, P, x0), SOL )
RETURN SNn0

As a result, the solution synthesis rules have a simplified form, as shown in Figure 7.5. In
particular, notice that the condition of a synthesis rule is reduced to computing the
probability of the THEN solution based on the probabilities for the IF solutions.
In the current implementation of Disciple-EBR, the function for a reduction-level
synthesis rule SR could be one of the following (as discussed in Section 4.3): min, max,
likely indicator, very likely indicator, almost certain indicator, or on balance. The
function for a problem-level synthesis can be only min or max.
Figure 7.6 shows an example of a reduction rule and the corresponding synthesis rules.
The next section will discuss how such rules are actually applied in problem solving.
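The following Python sketch illustrates the min and max synthesis functions on symbolic probabilities; the ordered probability scale used here is an assumption for illustration purposes, and the indicator-based synthesis functions are not shown:

# Min and max synthesis over an assumed ordered scale of symbolic probabilities.
SCALE = ["no support", "likely", "very likely", "almost certain", "certain"]

def p_min(probabilities):
    return min(probabilities, key=SCALE.index)

def p_max(probabilities):
    return max(probabilities, key=SCALE.index)

# Reduction-level synthesis with min over the subhypotheses' assessments:
print(p_min(["certain", "likely", "very likely"]))     # likely

# Problem-level synthesis with max over the rule-specific assessments:
print(p_max(["likely", "almost certain"]))             # almost certain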

7.5 RULE AND ONTOLOGY MATCHING

Let us consider the following hypothesis to assess:

Bob Sharp is interested in an area of expertise of John Doe.

The agent (i.e., its inference engine) will look for all the reduction rules from the know-
ledge base with an IF hypothesis that matches the preceding hypothesis. Such a rule is the
one from the right-hand side of Figure 7.7. As one can see, the IF hypothesis becomes
identical with the hypothesis to be solved if ?O1 is replaced with Bob Sharp and ?O2 is
replaced with John Doe. The rule is applicable if the condition of the rule is satisfied for
these values of ?O1 and ?O2.


Reduction rule Ri:
IF the hypothesis to assess is H
Question QRi
Answer ARi
Condition CRi
THEN assess
Subhypothesis Hi1
…
Subhypothesis Hini

Synthesis rule SRi corresponding to Ri:
IF
It is <probability pi1> that Hi1
…
It is <probability pini> that Hini
pi = function(pi1, …, pini)
THEN
Based on Ri it is <probability pi> that H

Synthesis rule SH corresponding to H:
IF
Based on Ri it is <probability pi> that H
…
Based on Rm it is <probability pm> that H
p = function(pi, …, pm)
THEN
It is <probability p> that H

Figure 7.5. Reduction and synthesis rules for evidence-based hypotheses analysis.

The partially instantiated rule is shown in the right-hand side of Figure 7.8. The agent
has to check that the partially instantiated condition of the rule can be satisfied. This
condition is satisfied if there is any instance of ?O3 in the object ontology that satisfies all
the relationships specified in the rule’s condition, which is shown also in the left-hand side
of Figure 7.8. ?Sl1 is an output variable that is given the value certain, without being
constrained by the other variables.
The partially instantiated condition of the rule, shown in the left-hand side of Fig-
ures 7.8 and 7.9, is matched successfully with the ontology fragment shown in the right-
hand side of Figure 7.9. The questions are: How is this matching performed, and is it
efficient?
John Doe from the rule’s condition (see the left-hand side of Figure 7.9) is matched with
John Doe from the ontology (see the right-hand side of Figure 7.9).
Following the feature is expert in, ?O3 has to match Artificial Intelligence:

John Doe is expert in ?O3


John Doe is expert in Artificial Intelligence

This matching is successful because both ?O3 and Artificial Intelligence are areas of
expertise, and both are the values of the feature is interested in of Bob Sharp. Indeed,
Artificial Intelligence is an instance of Computer Science, which is a subconcept of area
of expertise.


Reduction Rule Ri:
IF the hypothesis to assess is
  ?O1 would be a good PhD advisor for ?O2.
Q: Which are the necessary conditions?
A: ?O2 should be interested in the area of expertise of ?O1, who should stay on the faculty of ?O3 for the duration of the dissertation of ?O2, and should have the qualities of a good PhD advisor.
Main Condition
  ?O1 is PhD advisor
    has as employer ?O3
  ?O2 is PhD student
  ?O3 is university
  ?O4 is PhD advisor quality criterion
THEN assess the subhypotheses
  ?O2 is interested in an area of expertise of ?O1.
  ?O1 will stay on the faculty of ?O3 for the duration of dissertation of ?O2.
  ?O1 would be a good PhD advisor with respect to the ?O4.

Synthesis Rule SRi corresponding to Ri:
IF
  It is <probability pi1> that ?O2 is interested in an area of expertise of ?O1.
  It is <probability pi2> that ?O1 will stay on the faculty of ?O3 for the duration of dissertation of ?O2.
  It is <probability pi3> that ?O1 would be a good PhD advisor with respect to the ?O4.
  pi = min(pi1, pi2, pi3)
THEN
  Based on Ri it is <probability pi> that ?O1 would be a good PhD advisor for ?O2.

Synthesis Rule SH:
IF
  Based on Ri it is <probability pi> that ?O1 would be a good PhD advisor for ?O2.
  …
  Based on Rm it is <probability pm> that ?O1 would be a good PhD advisor for ?O2.
  l = max(pi, …, pm)
THEN
  It is <probability l> that ?O1 would be a good PhD advisor for ?O2.

Figure 7.6. Examples of reduction and synthesis rules for evidence-based hypotheses analysis.

Hypothesis: Bob Sharp is interested in an area of expertise of John Doe.
Matching the IF hypothesis: ?O1 ⟵ Bob Sharp, ?O2 ⟵ John Doe

Rule:
IF the hypothesis to assess is
  ?O1 is interested in an area of expertise of ?O2.
Q: Is ?O1 interested in an area of expertise of ?O2?
A: Yes, ?O3.
Condition
  ?O1 is PhD student
    is interested in ?O3
  ?O2 is PhD advisor
    is expert in ?O3
  ?O3 is area of expertise
  ?SI1 is-in [certain - certain]
THEN conclude
  It is ?SI1 that ?O1 is interested in an area of expertise of ?O2.

Figure 7.7. Rule selection based on hypotheses matching.


Hypothesis: Bob Sharp is interested in an area of expertise of John Doe.
Bindings from hypothesis matching: ?O1 ⟵ Bob Sharp, ?O2 ⟵ John Doe

Partially instantiated rule:
IF the hypothesis to assess is
  Bob Sharp is interested in an area of expertise of John Doe.
Q: Is Bob Sharp interested in an area of expertise of John Doe?
A: Yes, ?O3.
Condition
  Bob Sharp is PhD student
    is interested in ?O3
  John Doe is PhD advisor
    is expert in ?O3
  ?O3 is area of expertise
  ?SI1 is-in [certain - certain]
THEN conclude
  It is ?SI1 that Bob Sharp is interested in an area of expertise of John Doe.

Figure 7.8. Partial instantiation of the rule based on hypotheses matching.

Rule's condition:
  Bob Sharp instance of PhD student, is interested in ?O3
  John Doe instance of PhD advisor, is expert in ?O3
  ?O3 instance of area of expertise
  ?Sl1 = certain

Agent's ontology:
  Bob Sharp instance of PhD student, is interested in Artificial Intelligence
  John Doe instance of PhD advisor, is expert in Artificial Intelligence
  Artificial Intelligence instance of Computer Science, which is a subconcept of area of expertise
  Probability = { …, almost certain, certain }

Resulting matches: ?O1 ⟵ Bob Sharp, ?O2 ⟵ John Doe, ?O3 ⟵ Artificial Intelligence, ?Sl1 ⟵ certain

Figure 7.9. Ontology matching.

As the result of this matching, the rule’s ?O3 variable is instantiated to Artificial
Intelligence:
?O3 ⟵ Artificial Intelligence

Also, ?Sl1 will take the value certain, which is one of the values of probability, as
constrained by the rule’s condition.


The matching is very efficient because the structure used to represent knowledge (i.e.,
the ontology) is also a guide for the matching process, as was illustrated previously and
discussed in Section 5.10.
Thus the rule’s condition is satisfied for the following instantiations of the variables:

?O1 ⟵ Bob Sharp


?O2 ⟵ John Doe
?O3 ⟵ Artificial Intelligence
?SI1 ⟵ certain

Therefore, the rule can be applied to reduce the IF hypothesis to an assessment. This
entire process is summarized in Figure 7.10, as follows:

(1) The hypothesis to assess is matched with the IF hypothesis of the rule, leading to
the instantiations of ?O1 and ?O2.
(2) The corresponding instantiation of the rule’s condition is matched with the
ontology, leading to instances for all the variables of the rule.
(3) The question/answer pair and the THEN part of the rule are instantiated, gener-
ating the following reasoning step shown also in the left-hand side of Figure 7.10:

Hypothesis to assess:

Bob Sharp is interested in an area of expertise of John Doe.

Q: Is Bob Sharp interested in an area of expertise of John Doe?


A: Yes, Artificial Intelligence.

Assessment:

It is certain that Bob Sharp is interested in an area of expertise of John Doe.

Figure 7.10. Application of a rule in problem solving: (1) the hypothesis to assess is matched with the IF part of the rule (?O1 ⟵ Bob Sharp, ?O2 ⟵ John Doe); (2) the rule's condition is matched with the ontology (?O3 ⟵ Artificial Intelligence, ?SI1 ⟵ certain); (3) the question/answer pair and the assessment are instantiated.


In the preceding example, the agent has found one instance for each rule variable, which
has led to a solution. What happens if it cannot find instances for all the rule’s variables?
In such a case, the rule is not applicable.
But what happens if it finds more than one set of instances? In that case, the agent will
generate an assessment for each distinct set of instances.
What happens if there is more than one applicable reduction rule? In such a case, the
agent will apply each of them to find all the possible reductions.
All the obtained assessments will be combined into the final assessment of the hypoth-
esis, as was discussed in the previous section.
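The overall matching process can be approximated by the following Python sketch (illustrative only; the ontology is flattened to direct facts, so the traversal of the subconcept of hierarchy, such as inferring that Artificial Intelligence is an area of expertise through Computer Science, is omitted):

# Finding all variable bindings that satisfy a rule condition against the facts.
ontology = {
    ("Bob Sharp", "instance of", "PhD student"),
    ("Bob Sharp", "is interested in", "Artificial Intelligence"),
    ("John Doe", "instance of", "PhD advisor"),
    ("John Doe", "is expert in", "Artificial Intelligence"),
    ("Artificial Intelligence", "instance of", "area of expertise"),
}

condition = [("?O1", "instance of", "PhD student"),
             ("?O1", "is interested in", "?O3"),
             ("?O2", "instance of", "PhD advisor"),
             ("?O2", "is expert in", "?O3"),
             ("?O3", "instance of", "area of expertise")]

def is_var(term):
    return term.startswith("?")

def match(clauses, binding):
    """Yield every binding that satisfies all the clauses (backtracking search)."""
    if not clauses:
        yield dict(binding)
        return
    s, f, v = clauses[0]
    for fs, ff, fv in ontology:
        if ff != f:
            continue
        new = dict(binding)
        for pattern, fact_term in ((s, fs), (v, fv)):
            if is_var(pattern):
                if new.setdefault(pattern, fact_term) != fact_term:
                    break                      # conflicting binding
            elif pattern != fact_term:
                break                          # constant does not match the fact
        else:
            yield from match(clauses[1:], new)

# With ?O1 and ?O2 fixed by the hypothesis matching, the search instantiates ?O3:
for b in match(condition, {"?O1": "Bob Sharp", "?O2": "John Doe"}):
    print(b)   # {'?O1': 'Bob Sharp', '?O2': 'John Doe', '?O3': 'Artificial Intelligence'}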
Figure 7.11 shows the successive applications of two reduction rules to assess an initial
hypothesis. Rule 1 reduces it to three subhypotheses. Then Rule 2 finds the assessment of
the first subhypothesis.

7.6 PARTIALLY LEARNED KNOWLEDGE

7.6.1 Partially Learned Concepts


An incremental learning system will maintain representations of the partially learned
concepts. Disciple-EBR is such an incremental learner that learns general concepts from
examples and explanations (Tecuci, 1998; Tecuci et al., 2005b), as will be discussed in
the following chapters. During the learning process, the agent maintains a set of
possible versions of the concept to be learned, called a version space (Mitchell, 1978,
1997; Tecuci, 1988). The concepts in this space are partially ordered, based on the
generalization relationship. For that reason, the version space can be represented by an
upper bound and a lower bound. The upper bound of the version space contains the most

John Doe would be a good PhD advisor for Bob Sharp.
Rule 1
Q: Which are the necessary conditions?
A: Bob Sharp should be interested in an area of expertise of John Doe, who should stay on the faculty of George Mason University for the duration of the dissertation of Bob Sharp and should have the qualities of a good PhD advisor.
Subhypotheses: (1) Bob Sharp is interested in an area of expertise of John Doe; (2) John Doe will stay on the faculty of George Mason University for the duration of the dissertation of Bob Sharp; (3) John Doe would be a good PhD advisor with respect to the PhD advisor quality criterion.
Rule 2 (applied to the first subhypothesis)
Q: Is Bob Sharp interested in an area of expertise of John Doe?
A: Yes, Artificial Intelligence.
It is certain that Bob Sharp is interested in an area of expertise of John Doe.

Figure 7.11. Successive rules applications.


general concepts from the version space, and the lower bound contains the least general
concepts from the version space. Any concept that is more general than (or as general as) a
concept from the lower bound and less general than (or as general as) a concept from the
upper bound is part of the version space and may be the actual concept to be learned.
Therefore, a version space may be regarded as a partially learned concept.
The version spaces built by Disciple-EBR during the learning process are called plaus-
ible version spaces because their upper and lower bounds are generalizations based on an
incomplete ontology. Therefore, a plausible version space is only a plausible approxima-
tion of the concept to be learned, as illustrated in Figure 7.12.
The plausible upper bound of the version space from the right-hand side of Figure 7.12
contains two concepts: “a faculty member interested in an area of expertise” (see expres-
sion [7.6]) and “a student interested in an area of expertise” (see expression [7.7]).

Concept 1: ?O1 instance of faculty member [7.6]


is interested in ?O2
?O2 instance of area of expertise
Concept 2: ?O1 instance of student [7.7]
is interested in ?O2
?O2 instance of area of expertise

The plausible lower bound of this version space also contains two concepts, “an
associate professor interested in Computer Science,” and “a graduate student interested
in Computer Science.”
The concept to be learned (see the left side of Figure 7.12) is, as an approximation, less general than one of the concepts from the plausible upper bound, and more general than one of the concepts from the plausible lower bound.
The notion of plausible version space is fundamental to the knowledge representation,
problem-solving, and learning methods of Disciple-EBR because all the partially learned
concepts are represented using this construct, as discussed in the following.
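A simplified Python sketch of this construct follows (only the ?O1 clause of the concepts in Figure 7.12 is modeled, and concept membership is reduced to set membership):

# A plausible version space represented by its two bounds, and a judgment of
# how an instance relates to the partially learned concept.
plausible_upper = {"faculty member", "student"}
plausible_lower = {"associate professor", "graduate student"}

def covers(bound, instance_concepts):
    """A bound covers an instance if the instance belongs to at least one of
    the bound's alternative concepts."""
    return bool(bound & instance_concepts)

def classify(instance_concepts):
    if covers(plausible_lower, instance_concepts):
        return "covered by the plausible lower bound: very likely an instance"
    if covers(plausible_upper, instance_concepts):
        return "covered only by the plausible upper bound: plausibly an instance"
    return "not covered: not an instance of the partially learned concept"

# Bob Sharp is a graduate student (and therefore also a student and a person):
print(classify({"graduate student", "student", "person"}))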

7.6.2 Partially Learned Features


As discussed in Section 5.5, a feature is characterized by a domain and a range. Most often, the
domains and the ranges of the features are basic concepts from the object ontology. However,

Plausible Upper Bound
  ?O1 instance of {faculty member, student}
    is interested in ?O2
  ?O2 instance of area of expertise
Plausible Lower Bound
  ?O1 instance of {associate professor, graduate student}
    is interested in ?O2
  ?O2 instance of Computer Science

Figure 7.12. The plausible version space of a concept to be learned.


they could also be complex concepts of the form shown in Section 7.2. Moreover, in the case of
partially learned features, they are plausible version spaces, as illustrated in Figure 5.7 (p. 159).

7.6.3 Partially Learned Hypotheses


A specific hypothesis consists of a natural language sentence that includes at least one
instance and, optionally, additional instances, concepts, numbers, and strings. An example
of a specific hypothesis is the following one:

John Doe is a potential PhD advisor for Bob Sharp. [7.8]

As will be discussed in Section 9.10, the agent learns general hypotheses with applicability conditions from specific hypotheses. An example of a general hypothesis learned from [7.8] is shown in [7.9].

Name [7.9]
?O1 is a potential PhD advisor for ?O2.
Condition
?O1 instance of faculty member
?O2 instance of person

The condition is a concept that, in general, may have the form [7.4] (p. 203). The purpose
of the condition is to ensure that the hypothesis makes sense for each hypothesis instanti-
ation that satisfies it. For example, the hypothesis from [7.8] satisfies the condition in [7.9]
because John Doe is a faculty member and Bob Sharp is a person. However, the hypothesis
instance in [7.10] does not satisfy the condition in [7.9] because 45 is not a faculty member.
The condition in [7.9] will prevent the agent from generating the instance in [7.10].

45 is a potential PhD advisor for Bob Sharp. [7.10]

A partially learned hypothesis will have a plausible version space condition, as illus-
trated in [7.11].

Name [7.11]
?O1 is a potential PhD advisor for ?O2.
Plausible Upper Bound Condition
?O1 instance of person
?O2 instance of person
Plausible Lower Bound Condition
?O1 instance of {associate professor, PhD advisor}
?O2 instance of PhD student

7.6.4 Partially Learned Rules


As discussed in Section 7.3, the applicability condition of a rule is the concept representing
the set of instances for which the rule is correct. In the case of a partially learned rule, the
applicability condition is a plausible version space. An example of such a rule is presented
in Figure 7.13. Its lower bound condition requires that ?O1 be a PhD student interested in
artificial intelligence, while the upper bound allows ?O1 to be any person interested in any
area of expertise. Similarly, the lower bound condition requires ?O2 to be either a PhD


Figure 7.13. Partially learned hypothesis reduction rule.

advisor or an associate professor who is an expert in artificial intelligence, while the upper
bound allows ?O2 to be any person who is an expert in any area of expertise ?O3 in which
?O1 is interested.

7.7 REASONING WITH PARTIALLY LEARNED KNOWLEDGE

A learning agent should be able to reason with partially learned knowledge. Figure 7.14
shows an abstract rule with a partially learned applicability condition. It includes a Main
plausible version space condition (in light and dark green) and an Except-When plausible
version space condition (in light and dark red). The reductions generated by a partially
learned rule will have different degrees of plausibility, as indicated in Figure 7.14.
For example, a reduction r1 corresponding to a situation where the plausible lower bound
condition is satisfied and none of the Except-When conditions is satisfied is most likely to
be correct. Similarly, r2 (which is covered by the plausible upper bound, and is not
covered by any bound of the Except-When condition) is plausible but less likely than r1.
The way a partially learned rule is used depends on the current goal of the agent. If the
current goal is to support its user in problem solving, then the agent will generate the
solutions that are more likely to be correct. For example, a reduction covered by the
plausible lower bound of the Main condition and not covered by any of the Except-When
conditions (such as the reduction r1 in Figure 7.14) will be preferable to a reduction
covered by the plausible upper bound of the Main condition and not covered by any of the
Except-When conditions (such as the reduction r2), because it is more likely to be correct.
However, if the current goal of the agent is to improve its reasoning rules, then it is
more useful to generate the reduction r2 than the reduction r1. Indeed, no matter how the
user characterizes r2 (either as correct or as incorrect), the agent will be able to use it to


(Figure 7.14 depicts a reduction rule of the form IF <Problem> THEN <Subproblem 1>, <Subproblem 2>, ..., <Subproblem m>, whose applicability condition consists of a Main plausible version space condition and an Except-When plausible version space condition. Depending on which bounds cover a generated reduction, such as r1 or r2, the reduction is most likely correct, plausible, not plausible, or most likely incorrect.)

Figure 7.14. Plausible reasoning based on a partially learned rule.

refine the rule, either by generalizing the plausible lower bound of the Main condition to
cover r2 (if r2 is a correct reduction), or by specializing the plausible upper bound of the
Main condition to uncover r2 (if r2 is an incorrect reduction), or by learning an additional
Except-When condition based on r2 (again, if r2 is an incorrect reduction).
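The following Python sketch summarizes this policy. It is illustrative only: the four plausibility labels and their assignment to bound coverage are our reading of Figure 7.14, and the Boolean arguments stand for coverage tests that a real agent would compute from the rule's bounds:

def reduction_plausibility(in_main_lower, in_main_upper,
                           in_except_when_lower, in_except_when_upper):
    # Rank a reduction generated by a partially learned rule (cf. Figure 7.14).
    if in_except_when_lower:
        return "most likely incorrect"
    if in_except_when_upper:
        return "not plausible"
    if in_main_lower:
        return "most likely correct"   # e.g., the reduction r1
    if in_main_upper:
        return "plausible"             # e.g., the reduction r2
    return "not generated by this rule"

def preferred_reduction(ranked_reductions, goal):
    # ranked_reductions: list of (reduction, plausibility) pairs.
    if goal == "problem solving":
        # Prefer the reductions that are most likely to be correct (such as r1).
        preference = ["most likely correct", "plausible"]
    else:  # goal == "rule refinement"
        # The merely plausible reductions (such as r2) are the most informative,
        # whichever way the user characterizes them.
        preference = ["plausible", "most likely correct"]
    for label in preference:
        for reduction, plausibility in ranked_reductions:
            if plausibility == label:
                return reduction
    return None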

7.8 REVIEW QUESTIONS

7.1. What does the following concept represent? What would be an instance of it?

?O1 instance of course


has as reading ?O2
?O2 instance of publication
has as author ?O3
?O3 instance of professor

7.2. Illustrate the problem-solving process with the hypothesis, the rule, and the ontol-
ogy from Figure 7.15.
7.3. Consider the reduction rule and the ontology fragment from Figure 7.16. Indicate
whether this agent can assess the hypothesis, “Bill Bones is expert in an area of
interest of Dan Moore,” and if the answer is yes, indicate the result.
7.4. Consider the following problem:

     Analyze a strategic COG candidate corresponding to the economy of US 1943, which is an industrial economy.

     Explain how this problem is reduced by applying the reduction rule from Figure 7.17, in the context of the ontology from Figure 7.18. Show the generated reduction step.
7.5. Illustrate the problem-solving process with the hypothesis, rule, and ontology
fragment from Figure 7.19.


Jane Austin will stay on the faculty of George Mason University for the duration of the dissertation of Bob Sharp.

IF the hypothesis to assess is
   ?O1 will stay on the faculty of ?O2 for the duration of the dissertation of ?O3.
Q: Will ?O1 stay on the faculty of ?O2 for the duration of the dissertation of ?O3?
A: Yes, because ?O1 has a ?O4, which is a long-term position, and ?O1 does not plan to retire from ?O2.
Condition
   ?O1   is PhD advisor
         has as position ?O4
         has as employer ?O2
   ?O2   is university
   ?O3   is PhD student
   ?O4   is long-term position
   ?SI1  is-in [almost certain – almost certain]
Except-When Condition
   ?O1   is PhD advisor
         plans to retire from ?O2
THEN conclude
   It is ?SI1 that ?O1 will stay on the faculty of ?O2 for the duration of the dissertation of ?O3.

(The figure also shows an ontology fragment with concepts such as object, actor, person, organization, employee, student, educational organization, university, university employee, faculty member, professor, full professor, PhD advisor, graduate student, PhD student, position, and long-term position, and with instances including Jane Austin, George Mason University, Bob Sharp, and tenure position.)

Figure 7.15. Ontology fragment, hypothesis, and reduction rule.

IF the hypothesis to assess is
   ?O1 is expert in an area of interest of ?O2.
Q: Is ?O1 expert in an area of interest of ?O2?
A: Yes, because ?O2 is interested in ?O3, which is an area of expertise of ?O1.
Condition
   ?O1   is PhD advisor
         is expert in ?O3
   ?O2   is PhD student
         is interested in ?O3
   ?O3   is area of expertise
   ?SI1  is-in [certain – certain]
THEN conclude
   It is ?SI1 that ?O1 is expert in ?O3, which is an area of interest of ?O2.

(The accompanying ontology fragment shows Computer Science as a subconcept of area of expertise, with the instances Artificial Intelligence, Software Engineering, and Computer Networks; Dan Moore as an instance of PhD student with an "is interested in" relationship; and Bill Bones as an instance of PhD advisor with an "is expert in" relationship.)

Figure 7.16. Reduction rule and ontology fragment.

7.6. Consider the partially learned concept and the nine instances from Figure 7.20.
Order the instances by the plausibility of being instances of this concept and justify
the ordering.


IF the problem to solve is
   Analyze a strategic COG candidate corresponding to the ?O1, which is an industrial economy.
Question
   Who or what is a strategically critical element with respect to the ?O1?
Answer
   ?O2 because it is an essential generator of war material for ?O3 from the strategic perspective.
Condition
   ?O1   is industrial economy
   ?O2   is industrial capacity
         generates essential war material from the strategic perspective of ?O3
   ?O3   is multistate force
         has as member ?O4
   ?O4   is force
         has as economy ?O1
         has as industrial factor ?O2
THEN solve the subproblems
   Identify ?O2 as a strategic COG candidate with respect to the ?O1.
   Test ?O2, which is a strategic COG candidate with respect to the ?O1.

Figure 7.17. Reduction rule from the center of gravity analysis domain.

7.7. Consider the ontology fragment from the loudspeaker manufacturing domain
shown in Figure 5.24 (p. 172). Notice that each most specific concept, such as dust
or air press, has an instance, such as dust1 or air press1.
Consider also the following rule:

IF the task to perform is


Clean ?x of ?y
Condition
?x is object
may have ?y
?y is object
?z is cleaner
removes ?y
THEN perform the task
Clean ?x of ?y with ?z


(Figure 7.18 shows an ontology fragment from the center of gravity analysis domain with concepts such as object, force, economy, capacity, group, opposing force, multimember force, single-member force, multigroup force, multistate force, single-state force, multistate alliance, multistate coalition, dominant partner multistate alliance, equal partners multistate alliance, industrial economy, and industrial capacity, and with instances including US 1943, UK 1943, Finland 1943, Germany 1943, European Axis 1943, Allied Forces 1943, economy of US 1943, economy of UK 1943, industrial capacity of US 1943, and industrial capacity of UK 1943. It also includes the following feature definitions:)

has as member:  domain multimember force;  range force
has as industrial factor:  domain force;  range industrial capacity
has as economy:  domain force;  range economy
generates essential war material from the strategic perspective of:  domain capacity;  range force

Figure 7.18. Feature definitions and ontology fragment from the center of gravity analysis domain.
Dotted links indicate "instance of" relationships while unnamed continuous links indicate "subconcept of" relationships.

Describe how this rule is applied to solve the problem:


Clean entrefer1 of dust1

What will be the result?


Describe how this rule is applied to solve the problem:

Clean membrane1 of surplus-adhesive1

What will be the result?

A PhD advisor will stay on the faculty of George Mason University for the duration of the dissertation of Joe Dill.

IF the hypothesis to assess is
   A PhD advisor will stay on the faculty of ?O2 for the duration of the dissertation of ?O3.
Q: Will a PhD advisor stay on the faculty of ?O2 for the duration of the dissertation of ?O3?
A: Yes, ?O1 who has a ?O4, a long-term position, and does not plan to retire from ?O2.
Condition
   ?O1   is PhD advisor
         has as position ?O4
         has as employer ?O2
   ?O2   is university
   ?O3   is PhD student
   ?O4   is long-term position
   ?SI1  is-in [almost certain – almost certain]
Except-When Condition
   ?O1   is PhD advisor
         plans to retire from ?O2
THEN conclude
   It is ?SI1 that ?O1 will stay on the faculty of ?O2 for the duration of the dissertation of ?O3.

(The figure also shows an ontology fragment with concepts such as object, actor, person, organization, employee, student, educational organization, university, university employee, faculty member, professor, full professor, PhD advisor, graduate student, PhD student, position, and long-term position, and with instances including Joe Dill, George Mason University, tenure position, Jim Barber, and Ann Smith, connected by relationships such as "has as position," "has as employer," and "plans to retire from.")

Figure 7.19. Ontology, hypothesis, and potential rule for assessing it.

(Figure 7.20 depicts a Main plausible version space condition and an Except-When plausible version space condition, together with nine instances, I1 through I9, located in different regions of the two version spaces.)

Figure 7.20. A partially learned concept and several instances.

8 Learning for Knowledge-based Agents

The previous chapters introduced the main knowledge elements from the knowledge base
of an agent, which are all based on the notion of concept. This chapter presents the basic
operations involved in learning, including comparing the generality of concepts, general-
izing concepts, and specializing concepts. We start with a brief overview of several
machine learning strategies that are particularly useful for knowledge-based agents.

8.1 INTRODUCTION TO MACHINE LEARNING

8.1.1 What Is Learning?


The following are definitions of learning given by some of the most prominent researchers
in this field:

 “Learning denotes changes in the system that are adaptive in the sense that they
enable the system to do the same task or tasks drawn from the same population more
efficiently and more effectively the next time” (Simon, 1983, p. 28).
 “‘Learning’ is making useful changes in the workings of our minds” (Minsky, 1986,
p. 120).
 “Learning is constructing or modifying representations of what is being experienced”
(Michalski, 1986, p. 10).
 “A computer program is said to learn from experience E with respect to some class
of tasks T and performance measure P, if its performance at tasks in T, as measured by
P, improves with experience E.” (Mitchell, 1997, p. 2).

Given the preceding definitions, we may characterize learning as denoting the way in
which people and computers:

 Acquire, discover, and organize knowledge by building, modifying, and organizing


internal representations of some external reality.
 Acquire skills by gradually improving their motor or cognitive abilities through repeated
practice, sometimes involving little or no conscious thought.

There are two complementary dimensions of learning: competence and efficiency. A system
is improving its competence if it learns to solve a broader class of problems and to make
fewer mistakes in problem solving. The system is improving its efficiency if it learns to solve
the problems from its area of competence faster or by using fewer resources.



Machine learning is the domain of artificial intelligence that is concerned with building
adaptive computer systems that are able to improve their performance (competence and/or
efficiency) through learning from input data, from a user, or from their own problem-solving
experience.
Research in machine learning has led to the development of many basic learning
strategies, each characterized by the employment of a certain type of:

 Inference (e.g., deduction, induction, abduction, analogy)


 Computational or representational mechanism (e.g., rules, trees, neural networks)
 Learning goal (e.g., to learn a concept, discover a formula, acquire new facts, acquire
new knowledge about an entity, refine an entity)

The following are some of the most representative learning strategies:

 Rote learning
 Version space learning
 Decision trees induction
 Clustering
 Rule induction (e.g., Learning rule sets, Inductive logic programming)
 Instance-based strategies (e.g., K-nearest neighbors, Locally weighted regression, Col-
laborative filtering, Case-based reasoning and learning, Learning by analogy)
 Bayesian learning (e.g., Naïve Bayes learning, Bayesian network learning)
 Neural networks and Deep learning
 Model ensembles (e.g., Bagging, Boosting, ECOC, Stacking)
 Support vector machines
 Explanation-based learning
 Abductive learning
 Reinforcement learning
 Genetic algorithms and evolutionary computation
 Apprenticeship learning
 Multistrategy learning

In the next sections, we will briefly introduce four learning strategies that are particularly
useful for agent teaching and learning.

8.1.2 Inductive Learning from Examples


The goal of inductive learning from examples is to learn a general description of a concept,
for instance, the concept of “cup,” by analyzing positive examples of cups (i.e., objects that
are cups) and negative examples of cups (i.e., objects that are not cups).
The learning agent will attempt to find out what is common to the cups and what
distinguishes them from non-cups. For instance, in the illustration from Figure 8.1, the
agent may learn that a cup should have a handle because all the positive examples of cups
have handles, and the negative examples of cups do not have handles. However, the color
does not seem to be important for a cup because a particular color is encountered for both
cups and non-cups.
Learning a good concept description through this learning strategy requires a very large
number of positive and negative examples. On the other hand, this is the only information


Positive examples of cups: P1 P2 ...

Negative examples of cups: N1 N2 …

Description of the cup concept: has-handle(x), ...

Figure 8.1. Illustration of inductive concept learning from examples.

An example of a cup:
cup(o1): color(o1, white), made-of(o1, plastic), light-mat(plastic), has-handle(o1), has-flat-bottom(o1), up-concave(o1), ...

(The proof that o1 is a cup has cup(o1) at the top, demonstrated from liftable(o1), stable(o1), and open-vessel(o1); liftable(o1) is demonstrated from light(o1) and graspable(o1); light(o1) from made-of(o1, plastic) and light-mat(plastic); and graspable(o1) from has-handle(o1). The generalized proof has the same structure, with o1 replaced by the variable x and plastic by the variable y.)

Learned rule:
∀x ∀y, made-of(x, y), light-mat(y), has-handle(x), ... ➔ cup(x)

Figure 8.2. Illustration of explanation-based learning.

the agent needs. For instance, the agent does not require any prior knowledge to perform
this type of learning.
The result of this learning strategy is the increase of the problem-solving competence of
the agent. Indeed, the agent will learn to perform tasks it was not able to perform before,
such as recognizing the cups from a set of objects.
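The following minimal Python sketch illustrates this idea on invented feature sets (it is not the actual cup data of Figure 8.1): the agent keeps the features shared by all positive examples and discards those that cannot distinguish cups from non-cups:

positives = [
    {"has-handle", "up-concave", "color-white"},
    {"has-handle", "up-concave", "color-red"},
]
negatives = [
    {"up-concave", "color-white"},   # no handle, so not a cup
    {"has-handle", "color-red"},     # not up-concave, so not a cup
]

# Keep the features shared by all positive examples ...
common = set.intersection(*positives)
# ... and drop any feature that also appears in every negative example,
# because such a feature cannot discriminate cups from non-cups.
description = {f for f in common if not all(f in n for n in negatives)}

print(description)  # the set {'has-handle', 'up-concave'} (order may vary); no color feature survives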

8.1.3 Explanation-based Learning


The goal of explanation-based learning is to improve the agent’s efficiency in problem
solving. The agent is able to perform some task, for example, to recognize cups, but in
an inefficient manner. Through explanation-based learning, the agent learns to perform its
tasks faster, as illustrated in the following.
Let us assume that the agent receives the description of the cup o1 from the top part of
Figure 8.2. Using its prior knowledge, it will recognize that this object is a cup by
performing a complex reasoning process demonstrating that o1 is indeed a cup. This
process is illustrated by the proof tree from the left-hand side of Figure 8.2.
The agent’s reasoning process proceeds as follows:

 o1 is made of plastic, which is a light material. Therefore, o1 is light.


 o1 has a handle, and it is therefore graspable.


 Being light and graspable, o1 is liftable.


 ...
 Being liftable, stable, and an open vessel, o1 is a cup.

Notice that the agent used the fact that o1 has a handle in order to prove that o1 is a cup.
This means that having a handle is an important feature. On the other hand, the agent did
not use the color of o1 to prove that o1 is a cup. This means that color is not important.
Notice how the agent reaches the same conclusions as in inductive learning from
examples, but through a different line of reasoning, and based on a different type of
information (i.e., prior knowledge instead of multiple examples).
The next step in the learning process is to generalize the proof tree from the left-hand
side of Figure 8.2 into the general tree from the right-hand side. This is done by using the
agent’s prior knowledge of how to generalize the individual inferences from the specific tree.
While the tree from the left-hand side proves that the specific object o1 is a cup, the tree
from the right-hand side proves that any object x that satisfies the leaves of the general tree
is a cup. Thus the agent has learned the general cup recognition rule from the bottom of
Figure 8.2.
To recognize that another object, o2, is a cup, the agent needs only to check that it
satisfies the rule, that is, to check for the presence of these features discovered as
important (i.e., light-mat, has-handle, etc.). The agent no longer needs to build a complex
proof tree. Therefore, cup recognition is done much faster.
Finally, notice that the agent needs only one example from which to learn. However, it
needs a lot of prior knowledge to prove that this example is a cup. Providing such prior
knowledge to the agent is a very complex task.
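The following much-simplified Python sketch conveys the mechanics of this process. It is propositional and uses made-up predicate names, whereas the actual method generalizes a proof tree with variables, as in Figure 8.2:

theory = {
    # Each conclusion holds if all of its conditions hold (the prior knowledge).
    "cup":       ["liftable", "stable", "open-vessel"],
    "liftable":  ["light", "graspable"],
    "light":     ["made-of-light-material"],
    "graspable": ["has-handle"],
}

# Observed facts about the single training example o1 (some are irrelevant).
facts_o1 = {"made-of-light-material", "has-handle", "stable",
            "open-vessel", "color-white", "has-flat-bottom"}

def explain(goal, facts, leaves):
    # Backward-chain from the goal and record the facts actually used.
    if goal in facts and goal not in theory:
        leaves.add(goal)
        return True
    return goal in theory and all(explain(g, facts, leaves) for g in theory[goal])

used = set()
if explain("cup", facts_o1, used):
    # Only the features used in the explanation are kept; color-white and
    # has-flat-bottom are discarded as irrelevant.
    print("IF", sorted(used), "THEN cup")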

8.1.4 Learning by Analogy


Learning by analogy is the process of learning new knowledge about some entity by
transferring it from a known similar entity. The central intuition supporting the learning
by analogy paradigm is that if two entities are similar in some respects, then they could be
similar in other respects as well.
An important result of the learning by analogy research (Winston, 1980; Carbonell, 1983,
1986; Gentner, 1983; Davies and Russell, 1990; Forbus et al., 1994; Veloso, 1994) is that the
analogy involves mapping some underlying causal network of relations between analogous
situations. A causal network of relations generally means a set of relations related by higher-
order causal-like relations, such as “physical-cause(ri, rj),” “logically-implies(ri, rj),” “enables
(ri, rj),” “explains(ri, rj),” “justifies(ri, rj),” “determines(ri, rj),” and so on. The idea is that such
similar “causes” are expected to have similar effects. This idea is illustrated in Figure 8.3.
Because A’ is similar to A, which causes B, A’ is expected to cause something (B’) that is
similar to B. Thus analogy involves mapping some underlying “causal network of rela-
tions” between a (better-known) source entity A and a (less-known) target entity B, with
the goal of transferring knowledge from A to B.
For instance, we can teach students about the structure of the hydrogen atom by using the
analogy with the solar system (see Figure 8.4): The hydrogen atom is like our Solar System.
We are telling the students that the hydrogen atom has a structure similar to that of
the Solar System, where the electrons revolve around the nucleus as the planets revolve
around the sun.


A causes B and A is similar to A’. Therefore, it is possible that A’ causes something (B’), which is similar to B.

Figure 8.3. Analogical reasoning.

(Figure 8.4 maps the Solar System onto the hydrogen atom: the greater mass of the Sun and its attraction of the Earth cause the Earth to revolve around the Sun; by analogy, the nucleus, whose mass is greater than the mass of the electron and which attracts the electron, may cause the electron to revolve around the nucleus. The Sun also has the color yellow, a feature with no counterpart in the atom.)

Figure 8.4. Illustration of analogical learning.

The students may then infer that other features of the Solar System are also features of
the hydrogen atom. For instance, in the Solar System, the greater mass of the sun and its
attraction of the planets cause the planets to revolve around it. Therefore, the students may
hypothesize that this causal relationship is also true in the case of the hydrogen atom: The
greater mass of the nucleus and its attraction of the electrons cause the electrons to revolve
around the nucleus. This is indeed true and represents a very interesting discovery.
The main problem with analogical reasoning is that not all the features of the Solar
System are true for the hydrogen atom. For instance, the sun is yellow, but the nucleus is
not. Therefore, the information derived by analogy has to be verified.
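The following Python sketch illustrates the transfer step with a hand-built relational description of the source (the relation and entity names are ours): every source relation whose arguments are covered by the analogy mapping is turned into a hypothesis about the target, which must then be verified:

source_relations = [  # the Solar System
    ("mass of sun", "greater than", "mass of earth"),
    ("sun", "attracts", "earth"),
    ("earth", "revolves around", "sun"),
    ("sun", "has as color", "yellow"),
]

# The analogy itself: a mapping from source entities to target entities.
mapping = {"sun": "nucleus", "earth": "electron",
           "mass of sun": "mass of nucleus", "mass of earth": "mass of electron"}

# Transfer every relation whose two arguments are both mapped.
hypotheses = [(mapping[a], relation, mapping[b])
              for (a, relation, b) in source_relations
              if a in mapping and b in mapping]

for h in hypotheses:
    print("hypothesis to verify:", h)
# The color relation is not transferred because "yellow" is not mapped, and even
# the transferred hypotheses may turn out to be false for the hydrogen atom.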

8.1.5 Multistrategy Learning


As illustrated in Table 8.1, the individual learning strategies have complementary strengths
and weaknesses. For instance, inductive learning from examples requires a lot of examples
while explanation-based learning requires only one example. On the other hand, inductive
learning from examples does not require any prior knowledge while explanation-based
learning requires complete prior knowledge.
Multistrategy learning attempts to integrate synergistically such complementary learn-
ing strategies in order to take advantage of their relative strengths to compensate for


Table 8.1 Illustration of the Complementariness of the Learning Strategies

                            Inductive learning      Explanation-based      Multistrategy
                            from examples           learning               learning
Examples needed             many                    one                    several
Prior knowledge             very little             complete               incomplete
Type of inference           induction               deduction              induction, deduction, ...
Effect on agent's behavior  improves competence     improves efficiency    improves competence
                                                                           and/or efficiency

(Figure 8.5 marks positive examples (+), negative examples (–), a positive exception, and a negative exception with respect to the following concept description:)

?O1    instance of PhD student
       is interested in ?O2
?O2    instance of area of expertise

Figure 8.5. Examples and exceptions of a concept.

their relative weaknesses. Multistrategy learning is concerned with developing learning


agents that synergistically integrate two or more learning strategies in order to solve
learning problems that are beyond the capabilities of the individual learning
strategies that are integrated (Tecuci and Michalski, 1991; Tecuci 1993; Michalski and
Tecuci, 1994).
An example of a multistrategy learning system is Disciple, which integrates learning from
examples, learning from explanations, and learning by analogy and experimentation, as
will be discussed in Chapters 9 and 10.

8.2 CONCEPTS

8.2.1 Concepts, Examples, and Exceptions


As discussed in Section 5.2, a concept represents a set of instances.
With respect to the description of a concept, an instance can be a positive example of
the concept, a negative example, a positive exception, or a negative exception, as illus-
trated in Figure 8.5.
A positive example is an instance of the concept that is covered by the description of the
concept.
A negative example is an instance that does not belong to the concept and is not covered
by the description of the concept.
A positive exception is a positive example that is not covered by the description of the
concept, and any generalization of the description that would cover it would also cover some
negative examples.


A negative exception is a negative example that is covered by the description of the


concept, and any specialization of the description that would uncover it would also uncover
some positive examples.
The generalization and the specialization of concepts will be discussed in the following
sections.

8.2.2 Examples and Exceptions of a Partially Learned Concept


Figure 8.6 shows the representation of a partially learned concept consisting of the
Plausible Version Space of the Main Condition (with the plausible upper bound in light
green and the plausible lower bound in dark green) and the Plausible Version Space of an
Except-When Condition (with the plausible upper bound in light red and the plausible
lower bound in dark red). In general, a partially learned concept has a plausible version
space of a Main condition, and none, one, or several plausible version spaces of Except-
When conditions.
A partially learned concept may have known positive and negative examples. For the
other instances of the representation space, one may estimate their nature based on their
actual position with respect to the plausible bounds of the concept, as illustrated in
Figure 8.6 and defined in the following.
An instance covered by the plausible lower bound of the Main condition of the concept
and not covered by the plausible version space of any Except-When condition is most
likely a positive example of the concept.
An instance covered by the plausible upper bound of the Main condition of the
concept, but not covered by the plausible lower bound of the Main condition and not
covered by the plausible version space of any Except-When condition, is likely to be
a positive example of the concept.
An instance covered by the plausible lower bound of one of the Except-When
plausible version space conditions of the concept is most likely a negative example of
the concept.
Finally, an instance covered by the plausible upper bound of an Except-When plaus-
ible version space condition of the concept is likely to be a negative example of the
concept.
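The following Python sketch expresses these estimates as a decision procedure. It is illustrative only; covers(bound, instance), which tests whether a plausible bound covers an instance, is assumed rather than implemented:

def estimate_example_type(instance, main, except_whens, covers):
    # main and each element of except_whens have a "lower" and an "upper" bound.
    if any(covers(ew["lower"], instance) for ew in except_whens):
        return "most likely a negative example"
    if any(covers(ew["upper"], instance) for ew in except_whens):
        return "likely a negative example"
    if covers(main["lower"], instance):
        return "most likely a positive example"
    if covers(main["upper"], instance):
        return "likely a positive example"
    return "not covered by the concept"   # outside the plausible upper bound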

(Figure 8.6 shows the plausible version space of the Main condition, the plausible version space of the Except-When condition, a positive example (+), a negative example (–), a positive exception, and a negative exception, together with the four estimated regions: most likely a positive example, more likely a positive example, more likely a negative example, and most likely a negative example.)

Figure 8.6. Examples and exceptions of a partially learned concept.


In the next sections, we will describe in detail the basic learning operations dealing with
concepts: the generalization of concepts, the specialization of concepts, and the comparison of
the generality of concepts.

8.3 GENERALIZATION AND SPECIALIZATION RULES

A concept was defined as representing a set of instances. In order to show that a concept
P is more general than a concept Q, this definition would require the computation and
comparison of the (possibly infinite) sets of the instances of P and Q. In this section, we
will introduce generalization and specialization rules that will allow one to prove that a
concept P is more general than another concept Q by manipulating the descriptions of
P and Q, without computing the sets of instances that they represent.
A generalization rule is a rule that transforms (the description of) a concept into (the
description of) a more general concept. The generalization rules are usually inductive
transformations. The inductive transformations are not truth preserving but falsity pre-
serving. That is, if P is true and is inductively generalized to Q, then the truth of Q is not
guaranteed. However, if P is false, then Q is also false.
A specialization rule is a rule that transforms a concept into a less general concept. The
reverse of any generalization rule is a specialization rule. Specialization rules are deduct-
ive, truth-preserving transformations.
A reformulation rule transforms a concept into another, logically equivalent concept.
Reformulation rules are also deductive, truth-preserving transformations.
If one can transform concept P into concept Q by applying a sequence of generalization
rules, then Q is more general than P.
Consider the phrase, “Students who have majored in computer science at George
Mason University between 2007 and 2008.” The following are some of the phrases that
are obvious generalizations of this phrase:

 “Students who have majored in computer science between 2007 and 2008”
 “Students who have majored in computer science between 2000 and 2012”
 “Students who have majored in computer science at George Mason University”
 “Students who have majored in computer science”

Some of the phrases that are specializations of the preceding phrase follow:

 “Graduate students who have majored in computer science at George Mason Univer-
sity between 2007 and 2008”
 “Students who have majored in computer science at George Mason University in 2007”
 “Undergraduate students who have majored in both computer science and mathemat-
ics at George Mason University in 2008”

It will be easy to demonstrate the generalization relationships between the preceding


phrases by using some of the following generalization rules, which will be described in the
following sections:

 Turning constants into variables


 Turning occurrences of a variable into different variables
 Climbing the generalization hierarchies


 Dropping conditions
 Extending intervals
 Extending ordered sets of intervals
 Extending discrete sets
 Using feature definitions
 Using inference rules

8.3.1 Turning Constants into Variables


The turning constants into variables generalization rule consists in generalizing an expres-
sion by replacing a constant with a variable. For example, expression [8.1] represents the
following concept: “The set of professors with 55 publications.”

E1 = ?O1 instance of professor [8.1]


number of publications 55

By replacing 55 with the variable ?N1, which can take any value, we generalize this
concept to the one shown in [8.2]: “The set of professors with any number of publications.”
In particular, ?N1 could be 55. Therefore the second concept includes the first one.

E2 = ?O1 instance of professor [8.2]


number of publications ?N1

Conversely, by replacing ?N1 with 55, we specialize the concept [8.2] to the concept
[8.1]. The important thing to notice here is that by a simple syntactic operation (turning a
number into a variable), we can generalize a concept. This is one way in which an agent
generalizes concepts.

8.3.2 Turning Occurrences of a Variable into Different Variables


According to this rule, the expression [8.3] may be generalized to the expression [8.4]
by turning the two occurrences of the variable ?O3 in E1 into two variables, ?O31 and ?O32:

E1 = ?O1 instance of paper [8.3]


is authored by ?O3
?O2 instance of paper
is authored by ?O3
?O3 instance of professor
E2 = ?O1 instance of paper [8.4]
is authored by ?O31
?O2 instance of paper
is authored by ?O32
?O31 instance of professor
?O32 instance of professor

E1 may be interpreted as representing the concept: “the papers ?O1 and ?O2 authored
by the professor ?O3.” E2 may be interpreted as representing the concept: “the papers ?O1


and ?O2 authored by the professors ?O31 and ?O32, respectively.” In particular, ?O31 and
?O32 may represent the same professor. Therefore, the second set includes the first one,
and the second expression is more general than the first one.

8.3.3 Climbing the Generalization Hierarchies


One can generalize an expression by replacing a concept from its description with a more general
concept, according to some generalization hierarchy. For instance, the expression [8.5] may be
generalized to [8.6] by replacing the concept assistant professor with the more general concept
professor (see the generalization hierarchy in Figure 5.4, p. 158). The reverse operation of
replacing a concept with a less general one leads to the specialization of an expression.

E1 = ?O1 instance of assistant professor [8.5]


has as employer ?O2
?O2 instance of state university
E2 = ?O1 instance of professor [8.6]
has as employer ?O2
?O2 instance of state university
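A minimal Python sketch of this rule follows; the parent map is a stand-in for the generalization hierarchy of Figure 5.4, and the concept names are only illustrative:

parent = {
    "assistant professor": "professor",
    "associate professor": "professor",
    "professor": "faculty member",
    "state university": "university",
}

def climb(concept):
    # Replace a concept with its direct, more general parent, if it has one.
    return parent.get(concept, concept)

e1 = {"?O1": "assistant professor", "?O2": "state university"}   # expression [8.5]
e2 = {"?O1": climb(e1["?O1"]), "?O2": e1["?O2"]}                  # expression [8.6]
print(e2)   # {'?O1': 'professor', '?O2': 'state university'}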

8.3.4 Dropping Conditions


The agent can generalize a concept by dropping a condition, that is, by dropping a constraint
that its instances must satisfy. For example, the expression [8.7] may be generalized to [8.8]
by removing the constraint on the professor to be employed by a state university.

E1 = ?O1 instance of assistant professor [8.7]


has as employer ?O2
?O2 instance of state university

E2 = ?O1 instance of assistant professor [8.8]

8.3.5 Extending Intervals


A number may be generalized to an interval containing it. For example, the expression [8.9]
(the set of professors with 55 publications) may be generalized to [8.10] (the set of
professors with 50 to 60 publications), which in turn may be generalized to [8.11] (the
set of professors with 25 to 75 publications).

E1 = ?O1 instance of professor [8.9]


number of publications 55
E2 = ?O1 instance of professor [8.10]

number of publications ?N1


?N1 is in [50, 60]
E3 = ?O1 instance of professor [8.11]
number of publications ?N1
?N1 is in [25, 75]


8.3.6 Extending Ordered Sets of Intervals


An ordered set of intervals is an ordered set of symbols where each symbol corresponds to
an interval. For example, “infant” corresponds to the interval “(0.0, 1.0),” and “toddler”
corresponds to the interval “[1.0, 4.5).” Obviously infant < toddler. Such an ordered set of
intervals may be regarded as an ordered generalization hierarchy, where the nodes are
ordered from left to right, as illustrated in Figure 8.7.
Using such an ordered set of intervals, one may generalize the expression [8.12]
(persons from youth to teen) to the expression [8.13] (persons from youth to mature)
by replacing the symbolic interval “[youth – teen]” with the larger interval “[youth –
mature].”

E1 = ?O1 instance of person [8.12]


has as age [youth – teen]
E2 = ?O1 instance of person [8.13]
has as age [youth – mature]

8.3.7 Extending Symbolic Probabilities


A symbolic probability may be generalized by replacing it with a symbolic probability
interval containing it. For example, “very likely” may be generalized to the interval “[likely –
very likely]” (which includes the values “likely” and “very likely”). This interval can be
further generalized by replacing it with a larger interval, such as “[likely – certain].”

8.3.8 Extending Discrete Sets


An expression may be generalized by replacing a discrete set with a larger discrete set that
includes the first set. For example, the expression [8.14] (the set of flags with white color, or
red color, or both) may be generalized to the expression [8.15] (the set of flags with white
color, or red color, or blue color, or any combination of these colors).

E1 = ?O1 instance of flag [8.14]


includes colors from {white, red}
E2 = ?O1 instance of flag [8.15]
includes colors from {white, red, blue}
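The interval-extension rule of Section 8.3.5 and the set-extension rule just illustrated both amount to replacing a value with a larger cover that contains it, as in the following small Python sketch (illustrative only):

def generalize_number_to_interval(value, low, high):
    # Extending intervals: the interval must contain the original value.
    assert low <= value <= high
    return (low, high)

def generalize_discrete_set(current_set, larger_set):
    # Extending discrete sets: the larger set must include the current one.
    assert current_set <= larger_set
    return larger_set

print(generalize_number_to_interval(55, 50, 60))           # [8.9]  -> [8.10]
print(generalize_discrete_set({"white", "red"},
                              {"white", "red", "blue"}))   # [8.14] -> [8.15]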

human age:  infant (0.0, 1.0) < toddler [1.0, 4.5) < youth [4.5, 12.5) < teen [12.5, 19.5) < mature [19.5, 65.5) < elder [65.5, 150.0]

Figure 8.7. Ordered set of intervals as an ordered generalization hierarchy.


8.3.9 Using Feature Definitions


This rule generalizes an expression containing a feature, such as “A feature B,” by replacing
A and B with the domain and the range of the feature, respectively.
This rule is illustrated by the generalization of the expression [8.16] (professors who are
experts in computer science) to the expression [8.17] (persons who are experts in some
area of expertise).

E1 = ?O1 instance of professor [8.16]


is expert in ?O2
?O2 instance of Computer Science

E2 = ?O1 instance of person [8.17]


is expert in ?O2
?O2 instance of area of expertise
In the preceding example, professor was generalized to person (the domain of is expert
in), and Computer Science was generalized to area of expertise (the range of is expert in).
In general, a generalization rule indicates how one can generalize a concept through a
simple syntactic transformation without suggesting the actual generalization to perform,
but only a set of possible generalizations. The using feature definition rule actually suggests
a generalization to perform, which will be useful during learning. In fact, it indicates the
most general generalization that can be performed. Indeed, ?O1 in [8.16] has the feature is
expert in. Therefore, it has to be in the domain of this feature, which is person. Similarly,
?O2 is the value of the feature is expert in. Therefore, it has to be in the range of this
feature, which is area of expertise.

8.3.10 Using Inference Rules


Given an inference rule of the form “A ➔ B,” one may generalize an expression by replacing A with
B. For example, using the theorem [8.18] one may generalize the expression [8.19] (students
and their PhD advisors) to the expression [8.20] (students and the professors they know).

∀X, ∀Y ((X has as PhD advisor Y) ➔ (X knows Y)) [8.18]


E1 = ?O1 instance of student [8.19]
has as PhD advisor ?O2
?O2 instance of professor
E2 = ?O1 instance of student [8.20]
knows ?O2
?O2 instance of professor
Indeed, by applying the preceding inference rule, one may transform E1 into the
equivalent expression E’1 shown in [8.21]. Then, by dropping the relation has as PhD
advisor, one generalizes E’1 into E2.

E’1 = ?O1 instance of student [8.21]


has as PhD advisor ?O2
knows ?O2
?O2 instance of professor


8.4 TYPES OF GENERALIZATIONS AND SPECIALIZATIONS

Up to this point we have only defined when a concept is more general than another
concept. Learning agents, however, would need to generalize sets of examples and
concepts. In the following we define some of these generalizations.

8.4.1 Definition of Generalization


As defined in Section 5.3, a concept P is said to be more general than another concept
Q if and only if the set of instances represented by P includes the set of instances
represented by Q.
If a concept Q can be transformed into another concept P by applying a sequence of
generalization rules, then P is more general than Q. Indeed, the application of each
successive generalization rule generalizes the concept Q, as shown in the previous
sections. As an illustration, consider the concept C1 in [8.22] and the concept C2 in [8.23].

C1 = ?O1 instance of assistant professor [8.22]


number of publications 10
is employed by George Mason University
C2 = ?O1 instance of professor [8.23]
number of publications ?N1
?N1 is in [10, 35]

To show that [8.23] is more general than [8.22] it is enough to show that [8.22] can be
transformed into [8.23] by applying a sequence of generalization rules. The sequence is the
following one:

 Generalize assistant professor to professor (climbing the generalization hierarchy).


 Generalize 10 to [10, 35] (extending a number to an interval).
 Drop “?O1 is employed by George Mason University” (dropping condition).
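The following Python sketch shows how such a proof can be mechanized for simple clause descriptions. It is an illustration under our own simplifying assumptions (a feature-value dictionary per concept, a hand-built ancestor table, and intervals represented as pairs), not the Disciple-EBR procedure:

ancestors = {
    "assistant professor": {"professor", "faculty member", "person"},
    "associate professor": {"professor", "faculty member", "person"},
    "professor": {"faculty member", "person"},
}

def satisfies(value, constraint):
    if isinstance(constraint, tuple):          # interval constraint, e.g., (10, 35)
        low, high = constraint
        return isinstance(value, (int, float)) and low <= value <= high
    # concept constraint: equal, or reachable by climbing the hierarchy
    return value == constraint or constraint in ancestors.get(value, set())

def more_general_than(general, specific):
    # Every feature of the general concept must be present in the specific one
    # with a covered value; extra features of the specific concept correspond
    # to dropped conditions and are simply ignored.
    return all(feature in specific and satisfies(specific[feature], value)
               for feature, value in general.items())

c1 = {"instance of": "assistant professor",        # concept [8.22]
      "number of publications": 10,
      "is employed by": "George Mason University"}
c2 = {"instance of": "professor",                  # concept [8.23]
      "number of publications": (10, 35)}
print(more_general_than(c2, c1))   # True: [8.23] is more general than [8.22]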

8.4.2 Minimal Generalization


The concept P is a minimal generalization of a different concept Q if and only if P is a
generalization of Q, and P is not more general than any other generalization of Q.
Consider the generalization hierarchy from Figure 8.8 and the concepts from Figure 8.9.
G3 is a minimal generalization of G2 because there is no concept in Figure 8.8 which is
more general than student and less general than person.
However, G3 is not a minimal generalization of G1. Indeed, the concept G4 in [8.24] is
more general than G1 and less general than G3.

G4 = ?O1 instance of employee [8.24]


is interested in ?O2
?O2 instance of area of expertise

If the minimal generalization of a concept is unique, it is called the least general


generalization.


(Figure 8.8 shows a hierarchy in which person has the subconcepts employee and student, employee has the subconcept university employee, student has the subconcept graduate student, and graduate research assistant and graduate teaching assistant are subconcepts of both university employee and graduate student.)

Figure 8.8. Fragment of a generalization hierarchy.

G3:  ?O1   instance of person
           is interested in ?O2
     ?O2   instance of area of expertise

G1:  ?O1   instance of university employee
           is interested in ?O2
     ?O2   instance of area of expertise

G2:  ?O1   instance of student
           is interested in ?O2
     ?O2   instance of area of expertise

C1:  ?O1   instance of graduate research assistant
           is interested in ?O2
     ?O2   instance of area of expertise

Figure 8.9. Illustration of a minimal generalization.

G1:  ?O1   instance of university employee
           is interested in ?O2
     ?O2   instance of area of expertise

S1:  ?O1   instance of graduate research assistant
           is interested in ?O2
     ?O2   instance of area of expertise

S2:  ?O1   instance of graduate teaching assistant
           is interested in ?O2
     ?O2   instance of area of expertise

C1:  ?O1   instance of graduate research assistant
           is interested in ?O2
     ?O2   instance of area of expertise
           requires programming

C2:  ?O1   instance of graduate teaching assistant
           is interested in ?O2
     ?O2   instance of area of expertise
           requires fieldwork

Figure 8.10. Illustration of a minimal specialization.

8.4.3 Minimal Specialization


The concept Q is a minimal specialization of a different concept P if and only if Q is a
specialization of P, and no other specialization of P is more general than Q. For example,
consider the generalization hierarchy from Figure 8.8 and the concepts from Figure 8.10.
Clearly C1 and C2 are not minimal specializations of G1. However, both S1 and S2 are
minimal specializations of G1.


8.4.4 Generalization of Two Concepts


The concept Cg is a generalization of the concepts C1 and C2 if and only if Cg is more general
than C1 and Cg is more general than C2. One may show that if both the concept C1 and the
concept C2 can be transformed into the concept Cg by applying generalization rules, then
Cg is a generalization of C1 and C2.
As an illustration, consider the concept C1 described by the expression [8.25], and the
concept C2 described by the expression [8.26]. One may easily show that the concept
C described by the expression [8.27] is a generalization of C1 and C2. Indeed, by general-
izing assistant professor to professor (through climbing the generalization hierarchy), by
generalizing 10 to [10, 35] (through replacing a number with an interval containing it), and
by dropping the condition “?O1 is employed by George Mason University,” one generalizes
C1 to C. Similarly, by generalizing associate professor to professor, and 35 to [10, 35], one
generalizes C2 into C. Therefore, C is a generalization of C1 and C2.

C1 = ?O1 instance of assistant professor [8.25]


number of publications 10
is employed by George Mason University
C2 = ?O1 instance of associate professor [8.26]
number of publications 35
C = ?O1 instance of professor [8.27]
number of publications ?N1
?N1 is in [10, 35]

In general, to determine a generalization of two concepts, each represented as a clause,


such as [8.25] and [8.26], one first matches the features of the two concepts. Then one
applies the dropping condition rule to remove the unmatched features (and possibly even
matched features). Finally, one applies other generalization rules to determine the gener-
alizations of the matched feature values. Notice that usually there are several generaliza-
tions of two concepts.
In a similar way, one can determine a generalization G of two more complex expres-
sions E1 and E2, each consisting of a conjunction of clauses. G will consist of the
conjunction of the generalizations of some of the corresponding clauses in the two
expressions E1 and E2.

8.4.5 Minimal Generalization of Two Concepts


The concept G is a minimal generalization of the concepts C1 and C2 if and only if G is a
generalization of C1 and C2, and G is not more general than any other generalization of
C1 and C2.
To determine a minimal generalization of two clauses, one has to keep all the common
features of the clauses and determine a minimal generalization of each of the matched
feature values. In a similar way, one determines the minimal generalization of two
conjunctions of clauses by matching the clauses and determining the minimal generaliza-
tions of the matched clauses. These procedures are correct if we assume that there are no
other common features due to theorems. Otherwise, all the common features will have to
first be made explicit by applying the theorems.


Notice, however, that there may be more than one minimal generalization of two
expressions. For instance, according to the generalization hierarchy from the middle of
Figure 8.8, there are two minimal generalizations of graduate research assistant and
graduate teaching assistant. They are university employee and graduate student. Conse-
quently, there are two minimal generalizations of S1 and S2 in Figure 8.11: mG1 and
mG2. The generalization mG1 was obtained by generalizing graduate research assistant and
graduate teaching assistant to university employee. mG2 was obtained in a similar fashion,
except that graduate research assistant and graduate teaching assistant were generalized to
graduate student. Neither mG1 nor mG2 is more general than the other. However, G3 is
more general than each of them.
Disciple agents employ minimal generalizations, also called maximally specific gener-
alizations (Plotkin, 1970; Kodratoff and Ganascia, 1986). They also employ maximal
generalizations, also called maximally general generalizations (Tecuci and Kodratoff,
1990; Tecuci, 1992; Tecuci 1998).
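For concepts that are nodes of a generalization hierarchy, the minimal generalizations of a pair can be computed as its minimal common ancestors, as in the following Python sketch of the Figure 8.8 hierarchy (an illustration under our own assumptions, not the Disciple algorithm):

parents = {
    "graduate research assistant": {"university employee", "graduate student"},
    "graduate teaching assistant": {"university employee", "graduate student"},
    "university employee": {"employee"},
    "graduate student": {"student"},
    "employee": {"person"},
    "student": {"person"},
    "person": set(),
}

def ancestors(concept):
    # The concept itself together with all of its (transitive) generalizations.
    result = {concept}
    for p in parents.get(concept, set()):
        result |= ancestors(p)
    return result

def minimal_generalizations(c1, c2):
    common = ancestors(c1) & ancestors(c2)
    # Keep only the common generalizations with no strictly less general
    # common generalization below them.
    return {g for g in common
            if not any(other != g and g in ancestors(other) for other in common)}

print(minimal_generalizations("graduate research assistant",
                              "graduate teaching assistant"))
# the set {'university employee', 'graduate student'} (order may vary),
# that is, the concepts mG1 and mG2 of Figure 8.11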

8.4.6 Specialization of Two Concepts


The concept Cs is a specialization of the concepts C1 and C2 if and only if Cs is less general
than C1 and Cs is less general than C2. For example, research professor is a specialization of
researcher and professor.
One can easily show that if both the concept C1 and the concept C2 can be transformed
into the concept Cs by applying specialization rules (or Cs can be transformed into C1, and
it can also be transformed into C2 by applying generalization rules), then Cs is a special-
ization of C1 and C2.

8.4.7 Minimal Specialization of Two Concepts


The concept C is a minimal specialization of two concepts C1 and C2 if and only if C is
a specialization of C1 and C2 and no other specialization of C1 and C2 is more general
than C.

G3:  ?O1   instance of person
           is interested in ?O2
     ?O2   instance of area of expertise

mG1: ?O1   instance of university employee
           is interested in ?O2
     ?O2   instance of area of expertise

mG2: ?O1   instance of graduate student
           is interested in ?O2
     ?O2   instance of area of expertise

S1:  ?O1   instance of graduate research assistant
           is interested in ?O2
     ?O2   instance of area of expertise

S2:  ?O1   instance of graduate teaching assistant
           is interested in ?O2
     ?O2   instance of area of expertise

Figure 8.11. Several generalizations of the concepts S1 and S2.


The minimal specialization of two clauses consists of the minimal specialization of the
matched feature-value pairs, and of all the unmatched feature-value pairs. This procedure
assumes that no new clause feature can be made explicit by applying theorems. Otherwise,
one has first to make all the features explicit.
The minimal specialization of two conjunctions of clauses C1 and C2 consists of the
conjunction of the minimal specializations of each of the matched clauses of C1 and C2,
and of all the unmatched clauses from C1 and C2.
Figure 8.12 shows several specializations of the concepts G1 and G2. mS1 and mS2 are
two minimal specializations of G1 and G2 because graduate research assistant and graduate
teaching assistant are two minimal specializations of university employee and graduate
student.
Notice that in all the preceding definitions and illustrations, we have assumed that the
clauses to be generalized correspond to the same variables. If this assumption is not
satisfied, then one would need first to match the variables and then compute the general-
izations. In general, this process is computationally expensive because one will need to try
different matchings.

8.5 INDUCTIVE CONCEPT LEARNING FROM EXAMPLES

Inductive concept learning from examples has already been introduced in Section 8.1.2. In
this section, we will discuss various aspects of this learning strategy that are relevant to
agent teaching and learning. The problem of inductive concept learning from examples
can be more precisely defined as indicated in Table 8.2.
The bias of the learning agent is any basis for choosing one generalization over another,
other than strict consistency with the observed training examples (Mitchell, 1997). In the
following, we will consider two agents that employ two different preference biases: a
cautious learner that always prefers minimal generalizations, and an aggressive learner
that always prefers maximal generalizations.
Let us consider the positive examples [8.28] and [8.29], and the negative example [8.30]
of a concept to be learned by these two agents in the context of the generalization
hierarchies from Figure 8.13.

G1:  ?O1   instance of university employee
           is interested in ?O2
     ?O2   instance of area of expertise

G2:  ?O1   instance of graduate student
           is interested in ?O2
     ?O2   instance of area of expertise

mS1: ?O1   instance of graduate research assistant
           is interested in ?O2
     ?O2   instance of area of expertise

mS2: ?O1   instance of graduate teaching assistant
           is interested in ?O2
     ?O2   instance of area of expertise

C1:  ?O1   instance of graduate research assistant
           is interested in ?O2
     ?O2   instance of area of expertise
           requires programming

C2:  ?O1   instance of graduate teaching assistant
           is interested in ?O2
     ?O2   instance of area of expertise
           requires fieldwork

Figure 8.12. Several specializations of the concepts G1 and G2.


Table 8.2 The Problem of Inductive Concept Learning from Examples

Given
 A language of instances.
 A language of generalizations.
 A set of positive examples (E1, . . ., En) of a concept.
 A set of negative (or counter) examples (C1, . . ., Cm) of the same concept.
 A learning bias.
 Other background knowledge.
Determine
A concept description that is a generalization of the positive
examples and that does not cover any of the negative examples.
Purpose of concept learning
Predict if an instance is a positive example of the learned concept.

(Figure 8.13 shows two generalization hierarchies. In the first, person has the subconcepts staff member and professor; staff member has the subconcept computer technician, with the instance George Dean; professor has the subconcepts full professor, associate professor, and assistant professor, the last one with the instances Janet Rice and Mark White. In the second, university has the subconcepts state university, with the instances George Mason University and University of Virginia, and private university, with the instance Stanford University.)

Figure 8.13. Two generalization hierarchies.

Positive examples:
Mark White instance of assistant professor [8.28]
is employed by George Mason University

Janet Rice instance of assistant professor [8.29]


is employed by University of Virginia

Negative example:
George Dean instance of computer technician [8.30]
is employed by Stanford University

What concept might be learned by the cautious learner from the positive examples
[8.28] and [8.29], and the negative example [8.30]? The cautious learner would learn a
minimal generalization of the positive examples, which does not cover the negative
example. Such a minimal generalization might be the expression [8.31], “an assistant
professor employed by a state university,” obtained by minimally generalizing George
Mason University and University of Virginia to state university.


?O1 instance of assistant professor [8.31]


is employed by ?O2
?O2 instance of state university

The concept learned by the cautious learner is represented in Figure 8.14 as the
minimal ellipse that covers the positive examples without covering the negative example.
Assuming a complete ontology, the learned concept is included into the actual concept.
How will the cautious learner classify each of the instances represented in Figure 8.14
as black dots? It will classify the dot covered by the learned concept as positive example,
and the two dots that are not covered by the learned concept as negative examples.
How confident are you in the classification, when the learner predicts that an instance
is a positive example? When a cautious learner classifies an instance as a positive example
of a concept, this classification is correct because an instance covered by the learned
concept is also covered by the actual concept.
But how confident are you in the classification, when the learner predicts that an
instance is a negative example? The learner may make mistakes when classifying an
instance as a negative example, such as the black dot that is covered by the actual concept
but not by the learned concept. This type of error is called “error of omission” because
some positive examples are omitted – that is, they are classified as negative examples.
Let us now consider the concept that might be learned by the aggressive learner from
the positive examples [8.28] and [8.29], and the negative example [8.30]. The aggressive
learner will learn a maximal generalization of the positive examples that does not cover the
negative example. Such a maximal generalization might be the expression [8.32], “a
professor employed by a university.” This is obtained by generalizing assistant professor
to professor (the most general generalization that does not cover computer technician from
the negative example) and by maximally generalizing George Mason University and Univer-
sity of Virginia to university. Although university covers Stanford University, this is fine
because the obtained concept [8.32] still does not cover the negative example [8.30].

?O1 instance of professor [8.32]


is employed by ?O2
?O2 instance of university

An alternative maximally general generalization of the positive examples [8.28] and


[8.29], which does not cover the negative example [8.30], is [8.33].

?O1 instance of person [8.33]


is employed by ?O2
?O2 instance of state university

(Figure 8.14 shows the concept learned by the cautious learner as the minimal ellipse covering the two positive examples and excluding the negative example, inside the larger ellipse of the actual concept; several unclassified instances are shown as black dots.)

Figure 8.14. Learning and classifications by a cautious learner.


The concept learned by the aggressive learner is represented in Figure 8.15 as the
maximal ellipse that covers the positive examples without covering the negative example.
Assuming a complete ontology, the learned concept includes the actual concept.
How will the aggressive learner classify each of the instances represented in Figure 8.15
as black dots? It will classify the dot that is outside the learned concept as a negative
example, and the other dots as positive examples.
How confident are you in the classification when the learner predicts that an instance is
negative example? When the learner predicts that an instance is a negative example, this
classification is correct because that instance is not covered by the actual concept, which is
itself covered by the learned concept.
But, how confident are you in the classification when the learner predicts that an
instance is a positive example? The learner may make mistakes when predicting that an
instance is a positive example, as is the case with the dot covered by the learned concept,
but not by the actual concept. This type of error is called “error of commission” because
some negative examples are committed – that is, they are classified as positive examples.
Notice the interesting fact that the aggressive learner is correct when it classifies
instances as negative examples (they are indeed outside the actual concept because they
are outside the concept learned by the aggressive learner) while the cautious learner is
correct when it classifies instances as positive examples (they are inside the actual concept
because they are inside the concept learned by the cautious learner). How could one
synergistically integrate these two learning strategies to take advantage of their complementarity? An obvious solution is to use both strategies, learning both a minimal and a
maximal generalization from the examples, as illustrated in Figure 8.16.
What class will be predicted by a dual-strategy learner for the instances represented as
black dots in Figure 8.16? The dot covered by the concept learned by the cautious learner

Figure 8.15. Learning and classifications by an aggressive learner.

Figure 8.16. Learning and classifications by a dual-strategy learner.


will be classified, with high confidence, as a positive example. The dot that is not covered
by the concept learned by the aggressive learner will be classified, again with high
confidence, as a negative example. The dual-strategy learner will indicate that it cannot
classify the other two dots.
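The dual-strategy classification just described can be summarized in a short sketch. The following Python fragment is only an illustration (it is not Disciple-EBR code); it models the two learned concepts simply as sets of covered instances, and the instance names are hypothetical.

def classify(instance, cautious_concept, aggressive_concept):
    """Classify an instance using both learned concepts.

    cautious_concept: instances covered by the minimal generalization
    aggressive_concept: instances covered by the maximal generalization
    (the cautious concept is assumed to be a subset of the aggressive one)
    """
    if instance in cautious_concept:
        return "positive (high confidence)"
    if instance not in aggressive_concept:
        return "negative (high confidence)"
    return "unknown"   # covered by the maximal but not by the minimal generalization

# Hypothetical instances, for illustration only
cautious = {"john_doe", "jane_austin"}
aggressive = cautious | {"dan_smith", "joan_dean"}

print(classify("john_doe", cautious, aggressive))    # positive (high confidence)
print(classify("bob_sharp", cautious, aggressive))   # negative (high confidence)
print(classify("dan_smith", cautious, aggressive))   # unknown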

8.6 LEARNING WITH AN INCOMPLETE REPRESENTATION LANGUAGE

Let us consider the ontology from Figure 8.17. What is the maximal generalization of the
positive examples John Doe and Jane Austin that does not cover the given negative example
Bob Sharp, in the case where graduate research assistant is included in the ontology? The
maximal generalization is faculty member.
But what is the maximal generalization in the case where graduate research assistant is
missing from the ontology? In this case, the maximal generalization is employee, which is,
in fact, an overgeneralization.
What is the minimal specialization of person that does not cover Bob Sharp in the case
where graduate research assistant is included in the ontology? It is faculty member.
But what is the minimal specialization in the case where graduate research assistant is
missing from the ontology? In this case, the minimal specialization is employee, which
is an underspecialization.

Figure 8.17. Plausible generalizations and specializations due to ontology incompleteness. (The figure shows a hierarchy of person, employee, student, university employee, staff member, faculty member, graduate and undergraduate student, graduate research assistant, graduate teaching assistant, instructor, professor, and PhD advisor, with the instances John Smith, John Doe, Jane Austin, Bob Sharp, and Joan Dean; John Doe and Jane Austin are the positive examples and Bob Sharp is the negative example.)


Notice that the incompleteness of the ontology causes the learner both to overgeneral-
ize and underspecialize. In view of the preceding observations, what can be said about the
relationships between the concepts learned using minimal and maximal generalizations
and the actual concept when the ontology and the representation language are incom-
plete? The minimal and maximal generalizations are only approximations of the actual
concept, as shown in Figure 8.18.
Why is the concept learned with an aggressive strategy more general than the one
learned with a cautious strategy? Because they are based on the same ontology and
generalization rules.
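To make the effect of the missing concept concrete, here is a small Python sketch (not Disciple-EBR code) that computes maximal generalizations over a toy ontology loosely based on Figure 8.17; the instance-of and subconcept-of links used below are assumptions made for illustration only.

def ancestors(node, parents):
    """All concepts more general than node (transitive closure of the parent links)."""
    result, frontier = set(), [node]
    while frontier:
        for p in parents.get(frontier.pop(), []):
            if p not in result:
                result.add(p)
                frontier.append(p)
    return result

def maximal_generalizations(positives, negative, parents):
    """Most general common generalizations of the positives that do not cover the negative."""
    common = set.intersection(*(ancestors(p, parents) for p in positives))
    candidates = {c for c in common if c not in ancestors(negative, parents)}
    # keep only the candidates that are not covered by a more general candidate
    return {c for c in candidates if not (ancestors(c, parents) & candidates)}

# Assumed links, loosely following Figure 8.17
complete = {
    "employee": ["person"], "student": ["person"],
    "university employee": ["employee"],
    "faculty member": ["university employee"],
    "graduate research assistant": ["university employee", "graduate student"],
    "graduate student": ["student"],
    "professor": ["faculty member"],
    "associate professor": ["professor"], "full professor": ["professor"],
    "PhD student": ["graduate student"],
    "John Doe": ["associate professor"], "Jane Austin": ["full professor"],
    "Bob Sharp": ["PhD student", "graduate research assistant"],
}
print(maximal_generalizations(["John Doe", "Jane Austin"], "Bob Sharp", complete))
# {'faculty member'}

incomplete = dict(complete)                     # the same ontology, but without the
del incomplete["graduate research assistant"]   # concept graduate research assistant
incomplete["Bob Sharp"] = ["PhD student"]
print(maximal_generalizations(["John Doe", "Jane Austin"], "Bob Sharp", incomplete))
# {'employee'} -- an overgeneralization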

8.7 FORMAL DEFINITION OF GENERALIZATION

8.7.1 Formal Representation Language for Concepts


A knowledge representation language defines the syntax and the semantics for expressing
knowledge in a form that an agent can use. We define a formal representation language for
concepts as follows:

• Let there be a set of variables. For convenience in identifying variables, their names start with “?,” as in, for instance, ?O1. Variables are used to denote unspecified instances of concepts.
• Let there be a set of constants. Examples of constants are the numbers (such as “5”), strings (such as “programming”), symbolic probability values (such as “very likely”), and instances (such as “John Doe”). We define a term to be either a variable or a constant.
• Let there be a set of features. This set includes the domain-independent features “instance of,” “subconcept of,” and “direct subconcept of,” as well as other domain-specific features, such as “is interested in.”
• Let there be an object ontology consisting of a set of concepts and instances defined using the clause representation [7.2] presented in Section 7.2, where the feature values (vi1, . . . , vim) are constants, concepts, instances, or intervals (numeric or symbolic). That is, there are no variables in the definition of a concept or an instance from the ontology, such as the following one:

instance-k    instance of    concept-i
              feature-i1     vi1
              ...
              feature-im     vim

Figure 8.18. The influence of the ontology on concept learning.


• The concepts and the instances from the ontology are related by the generalization relations “instance of” and “subconcept of.” The ontology includes the concept object, which represents all the instances from the application domain and is therefore more general than any other object concept.
• Let there also be a set of theorems and properties of the features, variables, and constants.
• Two properties of any feature are its domain and its range. Other features may have special properties. For instance, the relation subconcept of is transitive (see Section 5.7). Also, a concept or an instance inherits the features of the concepts that are more general than it (see Section 5.8).
• Let there be a set of connectors. It includes the logical connectors AND (∧), OR (∨), and NOT (Except-When); the connectors “{” and “}” for defining alternative values of a feature; the connectors “[” and “]” as well as “(” and “)” for defining a numeric or a symbolic interval; the delimiter “,” (a comma); and the symbols “Plausible Upper Bound” and “Plausible Lower Bound.”

We call the tuple formed by these elements a representation language for concepts, denoted ℒ.


The basic representation unit (BRU) for a concept in ℒ has the form of a tuple (?O1, ?O2, . . . , ?On), where each ?Oi is a clause:

?O1    instance of    concept-i
       feature-11     ?Oj
       ...
       feature-1i     ?Ol
...
?On    instance of    concept-n
       feature-n1     ?Op
       ...
       feature-nk     ?Ot

In the preceding expression, each of concept-i . . . concept-n is either an object concept from
the object ontology (such as PhD student), a numeric interval (such as [50, 60]), a set of
numbers (such as {1, 3, 5}), a set of strings (such as {white, red, blue}), a symbolic probability
interval (such as [likely - very likely]), or an ordered set of intervals (such as [youth - mature]).
?Oj, . . . , ?Ol,. . ., ?Op, . . . , ?Ot are distinct variables from the sequence (?O1, ?O2, . . . , ?On).
When concept-n is a set or interval such as “[50, 60],” we use “is in” instead of “instance of.”
A more complex concept is defined as a conjunctive expression “BRU ∧ not BRU1 ∧ . . . ∧
not BRUp,” where “BRU” and each “BRUk (k = 1, . . . , p)” is a conjunction of clauses. This is
illustrated by the following example, which represents the set of instances of the tuple
(?O1, ?O2, ?O3), where ?O1 is a professor employed by a university ?O2 in a long-term
position ?O3, such that it is not true that ?O1 plans to retire from ?O2 or to move to some
other organization:

?O1    instance of        professor
       has as employer    ?O2
       has as position    ?O3
?O2    instance of        university
?O3    instance of        long-term faculty position


Not
?O1    instance of             professor
       plans to retire from    ?O2
?O2    instance of             university

Not
?O1    instance of             professor
       plans to move to        ?O4
?O4    instance of             organization

We generally use “Except-When” instead of “Not.”
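As an illustration only, the preceding concept could be held in a program as a main conjunction of clauses plus a list of Except-When conjunctions. The Python structures below are an assumed encoding for this sketch, not the internal representation of Disciple-EBR.

# Each clause is modeled as a (subject, feature, value) triple.
main_bru = [
    ("?O1", "instance of", "professor"),
    ("?O1", "has as employer", "?O2"),
    ("?O1", "has as position", "?O3"),
    ("?O2", "instance of", "university"),
    ("?O3", "instance of", "long-term faculty position"),
]

except_when = [
    [   # ?O1 plans to retire from ?O2
        ("?O1", "instance of", "professor"),
        ("?O1", "plans to retire from", "?O2"),
        ("?O2", "instance of", "university"),
    ],
    [   # ?O1 plans to move to some organization ?O4
        ("?O1", "instance of", "professor"),
        ("?O1", "plans to move to", "?O4"),
        ("?O4", "instance of", "organization"),
    ],
]

concept = {"bru": main_bru, "except_when": except_when}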


In the following sections, we will provide a formal definition of generalization in the representation language ℒ, based on substitutions.
A substitution is a function σ = (x1 ← t1, . . . , xn ← tn), where each xi (i = 1, . . . , n) is a variable and each ti (i = 1, . . . , n) is a term.
If li is an expression in the representation language ℒ, then σli is the expression obtained by substituting each variable xi from li with the term ti.
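For illustration, a substitution can be applied mechanically to an expression represented as nested tuples of variables and constants; the following minimal Python sketch assumes that representation (it is not Disciple-EBR code).

def apply_substitution(sigma, expression):
    """Replace every variable of the expression that appears in sigma by its term."""
    if isinstance(expression, str):
        return sigma.get(expression, expression)
    return tuple(apply_substitution(sigma, e) for e in expression)

sigma = {"?X": "?Y"}                       # sigma = (?X <- ?Y)
clause = ("?X", "enrolled at", "George Mason University")
print(apply_substitution(sigma, clause))   # ('?Y', 'enrolled at', 'George Mason University')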

8.7.2 Term Generalization


In the representation language ℒ, a term is a constant (e.g., number, string, symbolic
interval, or instance) or a variable. An unrestricted variable ?X is more general than any
constant and is as general as any other unrestricted variable (such as ?Y).

8.7.3 Clause Generalization


Let us consider two concepts described by the following two clauses, C1 and C2, respect-
ively, where v1, v2, v11, . . ., v2n are variables, b1 and b2 are concepts, and f11, . . ., f2n are
features from the ontology.

C1 = v1    instance of    b1
           f11            v11
           ...
           f1m            v1m

C2 = v2    instance of    b2
           f21            v21
           ...
           f2n            v2n

We say that the clause C1 is more general than the clause C2 if there exists a substitution
σ such that:

σv1 = v2
b1 = b2
∀i ∈ {1, . . . , m}, ∃j ∈ {1, . . . , n} such that f1i = f2j and σv1i = v2j.

For example, the concept

C1 = ?X instance of student
enrolled at George Mason University

is more general than the concept

C2 = ?Y instance of student
enrolled at George Mason University
has as sex female

Indeed, let σ = (?X ← ?Y). As one can see, σC1 is a part of C2, that is, each feature of
σC1 is also a feature of C2. The first concept represents the set of all students enrolled at
George Mason University, while the second one represents the set of all female students
enrolled at George Mason University. Obviously the first set includes the second one, and
therefore the first concept is more general than the second one.
Let us notice, however, that this definition of generalization does not take into account the theorems and properties of the representation language ℒ. In general, one needs to use these theorems and properties to transform the clauses C1 and C2 into equivalent clauses C'1 and C'2, respectively, by making explicit all the properties of these clauses. Then one shows that C'1 is more general than C'2. Therefore, the definition of the more general than relation in ℒ is the following one:
A clause C1 is more general than another clause C2 if and only if there exist C'1, C'2, and a substitution σ, such that:

C'1 =ℒ C1
C'2 =ℒ C2
σv1 =ℒ v2
b1 is more general than b2 in the ontology
∀i ∈ {1, . . . , m}, ∃j ∈ {1, . . . , n} such that f'1i =ℒ f'2j and σv'1i =ℒ v'2j.

In the following sections, we will always assume that the equality is in ℒ, and we will no longer indicate this.
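The first, purely syntactic part of this definition (ignoring the transformation of C1 and C2 by the theorems and properties of ℒ) can be sketched as follows; the clause encoding is an assumption made for this illustration, not Disciple-EBR code.

# A clause is modeled here as (variable, concept, [(feature, value), ...]).
def clause_more_general(c1, c2):
    """Return a substitution showing that c1 is more general than c2, or None."""
    v1, b1, feats1 = c1
    v2, b2, feats2 = c2
    if b1 != b2:
        return None
    sigma = {v1: v2}
    for f1, val1 in feats1:                      # every feature of c1 ...
        mapped = sigma.get(val1, val1)           # (handles constants and the clause
        if (f1, mapped) not in feats2:           #  variable itself as feature values)
            return None                          # ... must also appear in c2
    return sigma

C1 = ("?X", "student", [("enrolled at", "George Mason University")])
C2 = ("?Y", "student", [("enrolled at", "George Mason University"),
                        ("has as sex", "female")])
print(clause_more_general(C1, C2))   # {'?X': '?Y'}, so C1 is more general than C2
print(clause_more_general(C2, C1))   # None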

8.7.4 BRU Generalization


As discussed in Section 8.7.1, a basic representation unit (BRU) is a conjunction of clauses.
An example of BRU is the following one where, for notational convenience, we have
dropped the AND connector between the clauses:

?O1    instance of       course
       has as reading    ?O2
?O2    instance of       publication
       has as author     ?O3
?O3    instance of       professor

Therefore, anytime there is a sequence of clauses, they are to be considered as being connected by AND (∧).
Let us consider two concepts, A and B, defined by the following expressions

A = A1 ∧ A2 ∧ . . . ∧ An
B = B1 ∧ B2 ∧ . . . ∧ Bm

where each Ai (i = 1, . . . , n) and each Bj (j = 1, . . . , m) is a clause.



A is more general than B if and only if there exist A', B', and σ such that:

A' = A, A' = A'1 ∧ A'2 ∧ . . . ∧ A'p
B' = B, B' = B'1 ∧ B'2 ∧ . . . ∧ B'q
∀i ∈ {1, . . . , p}, ∃j ∈ {1, . . . , q} such that σA'i = B'j.

Otherwise stated, one transforms the concepts A and B, using the theorems and the
properties of the representation language, so as to make each clause from A’ more general
than a corresponding clause from B’. Notice that some clauses from B’ may be “left over,”
that is, they are not matched by any clause of A’.

8.7.5 Generalization of Concepts with Negations


By concept with negations, we mean an expression of the following form:

BRU ∧ not BRU1 ∧ . . . ∧ not BRUp

where each BRU is a conjunction of clauses.

Let us consider two concepts with negations, A and B, defined by the following expressions:

A = BRUa ∧ not BRUa1 ∧ . . . ∧ not BRUap
B = BRUb ∧ not BRUb1 ∧ . . . ∧ not BRUbq

A is more general than B if and only if there exist A', B', and σ, such that:

A' = A, A' = BRU'a ∧ not BRU'a1 ∧ . . . ∧ not BRU'ap
B' = B, B' = BRU'b ∧ not BRU'b1 ∧ . . . ∧ not BRU'bq
σBRU'a is more general than BRU'b
∀i ∈ {1, . . . , p}, ∃j ∈ {1, . . . , q} such that BRU'bj is more general than σBRU'ai.

8.7.6 Substitutions and the Generalization Rules


One can use the definition of generalization based on substitution to prove that the
generalization rules transform concepts into more general concepts.
As an illustration, let us consider the turning constants into variables generalization
rule (see Section 8.3.1) that transformed the expression E1 from [8.1] on p. 230, into the
expression E2 from [8.2] on p. 230. E2 is indeed a generalization of E1 because E1 = σE2, where σ = (?N1 ← 55).

8.8 REVIEW QUESTIONS

8.1. What is a positive example of a concept? What is a negative example of a concept?

8.2. What is a generalization rule? What is a specialization rule? What is a reformulation rule?

8.3. Name all the generalization rules you know.


8.4. Define and illustrate the following generalization rule: climbing generalization
hierarchies.
8.5. Briefly describe and illustrate the “turning constants into variables” generalization rule.
8.6. Define and illustrate the “dropping conditions” generalization rule.
8.7. Define the minimal generalization G of two concepts A and B. Then define their
least general generalization. Do these generalizations always exist?
8.8. What is a negative exception? What is a positive exception?
8.9. Indicate three different generalizations of the phrase “History books in the Fenwick
Library,” and demonstrate that each of them is more general than the given phrase.
8.10. Give an example of a natural language phrase C that has a concept interpretation.
(a) Formulate a phrase G that is a generalization of C, and use the generalization
rules to demonstrate that G is a generalization of C.
(b) Formulate a phrase S that is a specialization of C, and use the generalization
rules to demonstrate that S is a specialization of C.
(c) Formulate a phrase D that is neither a generalization of C nor a specialization of C.
8.11. P3 in Figure 8.19 is a natural language sentence that represents a concept. Fill in each
of the other boxes with sentences such that the arrows from the figure represent all the
“subconcept of” relationships between these concepts. This means, for instance, that
P3 is less general than P5, P3 is more general than P1, and P3 is more general than P2.
However, P3 is not more general than P4, and P4 is not more general than P3. Justify
each shown “subconcept of” relationship by labeling it with the name of the general-
ization rule(s) that transform the less general concept into the more general concept.
You should use at least three different generalization rules while solving this problem.
8.12. Draw a picture representing a plausible version space, as well as a positive
example, a negative example, a positive exception, and a negative exception. Then
briefly define each of these notions.
8.13. Consider the cells consisting of two bodies, each body having two attributes,
number of nuclei (1 or 2), and color (yellow or green). The relative position of

Figure 8.19. Determining concepts that satisfy given “subconcept of” relationships. (P3 is the sentence “The U.S. athletes who have competed in the 2012 Olympic games in individual competitions.”; the boxes for P1, P2, P4, and P5 are left to be filled in.)


the bodies is not relevant because they can move inside the cell. For example,
((1 green) (2 yellow)) is the same as ((2 yellow) (1 green)) and represents a cell
where one body has one nucleus and is green, while the other body has two nuclei
and is yellow. You should also assume that any generalization of a cell is also
described as a pair of pairs ((s t) (u v)).
(a) Indicate all the possible generalizations of the cell from Figure 8.20 and the
generalization relations between them.
(b) Determine the number of the distinct sets of instances and the number of the
concept descriptions from this problem.
(c) Consider the cell descriptions from Figure 8.21 and determine the following
minimal generalizations: g(E1, E2), g(E2, E3), g(E3, E1), g(E1, E2, E3).

8.14. Consider the ontology fragment from the loudspeaker manufacturing domain,
shown in Figure 5.23 (p. 171), and the following expressions:

E1:  ?X  instance of  membrane
         made of      ?M
     ?M  instance of  paper
     ?Z  instance of  contact adhesive
         glues        ?M

E2:  ?X  instance of  mechanical chassis
         made of      ?M
     ?M  instance of  metal
     ?Z  instance of  mowicoll
         glues        ?M
         state        fluid

(a) Find the minimal generalizations of E1 and E2.


(b) Find two generalizations of E1 and E2 that are not minimal generalizations.
(c) Consider one of the generalizations found at (b) and demonstrate that it is a
generalization of E1 and E2 but it is not a minimal generalization.
(d) What would be the least general generalization of E1 and E2? Does it exist?
(e) Indicate a specialization of E1.

8.15. Consider the ontology fragment from the loudspeaker manufacturing domain,
shown in Figure 5.24 (p. 172). Notice that each most specific concept, such as dust
or air press, has an instance, such as dust1 or air press1.
Consider also the following two expressions:

+ ((1 green) (2 yellow))

Figure 8.20. An example of a cell.

E1: ((1 green) (1 green)) E2: ((1 yellow) (2 green)) E3: ((1 green) (2 green))
Figure 8.21. The descriptions of three cells.


E1:  ?X  instance of  soft cleaner
         removes      ?Z
     ?Y  instance of  membrane
     ?Z  instance of  waste material

E2:  ?X  instance of  air sucker
         removes      ?Z
     ?Y  instance of  membrane
     ?Z  instance of  dust

Use the generalization rules to show that E1 is more general than E2.
8.16. Determine a generalization of the following two expressions in the context of the
ontology fragment from Figure 5.24 (p. 172):

E1:  ?X  instance of  entrefer
         may have     ?Y
     ?Y  instance of  dust
     ?Z  instance of  air sucker
         removes      ?Y

E2:  ?X  instance of  membrane
         may have     ?Y
     ?Y  instance of  surplus adhesive
     ?Z  instance of  alcohol
         removes      ?Y
8.17. Consider the background knowledge represented by the generalization hier-
archies shown in Figure 8.22.
Consider also the following concept:

E: ?O instance of object
color yellow
shape circle
radius 5
Indicate five different generalization rules. For each such rule, determine an
expression Eg that is more general than E according to that rule.

Figure 8.22. Generalization hierarchies for color and shape. (Color: any color is more general than warm color and cold color; warm color is more general than red, orange, and yellow; cold color is more general than blue, green, and black. Shape: any shape is more general than polygon and round; polygon is more general than triangle and rectangle; rectangle is more general than square; round is more general than circle and ellipse.)


Figure 8.23. Ontology fragments. (One fragment relates adhesive, toxic substance, and inflammable object to scotch tape, super glue, mowicoll, and contact adhesive; another relates loudspeaker component to membrane, chassis, assembly, and bolt, and material to caoutchouc, paper, and metal.)

8.18. Consider the following two concepts:

C1:  ?X  instance of  screw
         head         ?M
         cost         5

C2:  ?X  instance of  nut
         cost         6

Indicate two different generalizations of them.

8.19. Consider the following two concepts G1 and G2, and the ontology fragments in
Figure 8.23. Indicate four specializations of G1 and G2 (including a minimal
specialization).

G1:  ?X  instance of  loudspeaker component
         made of      ?M
     ?M  instance of  material
     ?Z  instance of  adhesive
         glues        ?M

G2:  ?X  instance of  loudspeaker component
         made of      ?M
     ?M  instance of  material
     ?Z  instance of  inflammable object
         glues        ?M

8.20. Illustrate the clause generalization defined in Section 8.7.3 with an example from
the PhD Advisor Assessment domain.

8.21. Illustrate the BRU generalization defined in Section 8.7.4 with an example from the
PhD Advisor Assessment domain.

8.22. Illustrate the generalization of concepts with negations defined in Section 8.7.5 by
using an example from the PhD Advisor Assessment domain.

8.23. Use the definition of generalization based on substitution to prove that each of the
generalization rules discussed in Section 8.3 transforms a concept into a more
general concept.

9 Rule Learning

In this and the next chapter on rule refinement, we will refer to both problems and
hypotheses, interchangeably, to emphasize the fact that the learning methods presented
are equally applicable in the context of hypotheses analysis and problem solving.

9.1 MODELING, LEARNING, AND PROBLEM SOLVING

Figure 9.1 summarizes the interactions between the subject matter expert and the learning
agent that involve modeling, learning, and problem solving.
The expert formulates the problem to be solved (or the hypothesis to be analyzed), and
the agent uses its knowledge to generate a (problem-solving or argumentation) tree to be
verified by the expert.
Several cases are possible. If the problem is not completely solved, the expert will extend
the tree with additional reductions and provide solutions for the leaf problems/hypotheses.

Figure 9.1. Mixed-initiative modeling, learning, and problem solving. (The expert and the agent interact through mixed-initiative problem solving, modeling, rule learning, and rule refinement with positive and negative examples, supported by explanation-based ontology extension; the reasoning tree is generated from the ontology and rules, and the process produces learned rules, refined rules, and a refined ontology.)


From each new reduction provided by the expert, the agent will learn a new rule, as will be
presented in the following sections.
If the expert rejects any of the reasoning steps generated by the agent, then an explan-
ation of why that reduction is wrong needs to be determined, and the rule that generated it
will be refined to no longer generate the wrong reasoning step.
If the expert accepts a reasoning step as correct, then the rule that generated it may be
generalized. The following section illustrates these interactions.

9.2 AN ILLUSTRATION OF RULE LEARNING AND REFINEMENT

As will be discussed in the following, the subject matter expert helps the agent to learn
by providing examples and explanations, and the agent helps the expert to teach it by
presenting attempted solutions.
First, as illustrated in Figure 9.2, the expert formulates the problem to solve or the
hypothesis to analyze which, in this illustration, is the following hypothesis:

John Doe would be a good PhD advisor for Bob Sharp.

In this case, we will assume that the agent does not know how to assess this hypothesis.
Therefore, the expert has to teach the agent how to assess it. The expert will start by
developing a reduction tree, as discussed in Chapter 4 and illustrated in the middle of
Figure 9.2. The initial hypothesis is first reduced to three simpler hypotheses, guided
by a question/answer pair. Then each of the subhypotheses is further reduced, either
to a solution/assessment or to an elementary hypothesis to be assessed based on
evidence. For example, the bottom part of Figure 9.2 shows the reduction of the first
subhypothesis to an assessment.
After the reasoning tree has been developed, the subject matter expert interacts with
the agent, helping it “understand” why each reduction step is correct, as will be discussed
in Section 9.5. As a result, from each reduction step the agent learns a plausible version

Figure 9.2. Modeling and rule learning. (1. Modeling: the expert explains how to solve a specific problem; 2. Learning: the agent learns general reduction rules.)


space rule, as a justified generalization of it. This is illustrated in the right-hand side of
Figure 9.2 and discussed in Section 9.7. These rules are not shown to the expert, but they
may be viewed with the Rule Browser.
The agent can now use the learned rules to assess by itself similar hypotheses formu-
lated by the expert, as illustrated in Figure 9.3, where the expert formulated the following
hypothesis:

Dan Smith would be a good PhD advisor for Bob Sharp.

The reduction tree shown in Figure 9.3 was generated by the agent. Notice how the agent
concluded that Bob Sharp is interested in an area of expertise of Dan Smith, which is
Information Security, by applying the rule learned from John Doe and Bob Sharp, who share
a common interest in Artificial Intelligence.
The expert has to inspect each reduction generated by the agent and indicate whether it
is correct or not. Because the reductions from Figure 9.3 are correct, the agent generalizes
the lower bound conditions of the applied rules, if the reductions were generated based on
the upper bound conditions of these rules.
The bottom part of Figure 9.4 shows a reduction generated by the agent that is rejected
by the expert. While Dan Smith does indeed have a tenured position, which is a long-term faculty
position, he plans to retire. It is therefore wrong to conclude that it is almost certain that he
will stay on the faculty of George Mason University for the duration of the dissertation of
Bob Sharp.
Such failure explanations are either proposed by the agent and accepted by the expert,
or are provided by the expert, as discussed in Section 9.5.2.
Based on this failure explanation, the agent specializes the rule that generated this
reduction by adding an Except-When plausible version space condition, as illustrated in
the right-hand side of Figure 9.4. From now on, the agent will check not only that the
faculty member has a long-term position (the main condition of the rule), but also that he
or she does not plan to retire (the Except-When condition). The refined rule is not shown
to the expert, but it may be viewed with the Rule Browser.

Figure 9.3. Problem solving, critiquing, and rule refinement. (3. Solving: the agent applies learned rules to solve a new problem; 4. Critiquing: the expert accepts the reasoning; 5. Refinement: the agent refines the rules with new positive examples.)


Figure 9.4. Problem solving, critiquing, and refinement. (1. Solving; 2. Critiquing: the reduction is incorrect because Dan Smith plans to retire; 3. Refinement: the agent refines the rule with the negative example.)

Figure 9.5. Problem solving, critiquing, and refinement. (1. Solving; 2. Critiquing: the reduction is incorrect because Jane Austin plans to move; 3. Refinement: the agent refines the rule with the negative example.)

Figure 9.5 shows another reasoning tree generated by the agent for an expert-
formulated hypothesis. Again the expert rejects one of the reasoning steps: Although
Jane Austin has a tenured position and does not plan to retire, she plans to move from
George Mason University and will not stay on the faculty for the duration of the dissertation
of Bob Sharp.
Based on this failure explanation, the agent specializes the rule that generated the
reduction by adding an additional Except-When plausible version space condition, as
shown in the right-hand side of Figure 9.5. From now on, the agent will check not only that
the faculty member has a long-term position, but also that he or she does not plan to retire
or move from the university.


The refined rule is shown in Figure 9.6. Notice that this is a quite complex rule that was
learned based only on one positive example, two negative examples, and their explan-
ations. The rule may be further refined based on additional examples.
The following sections describe in more detail the rule-learning and refinement pro-
cesses. Before that, however, let us notice a significant difference between the develop-
ment of a knowledge-based learning agent and the development of a (nonlearning)
knowledge-based agent. As discussed in Sections 1.6.3.1 and 3.1, after the knowledge base
of the (nonlearning) agent is developed by the knowledge engineer, the agent is tested
with various problems. The expert has to analyze the solutions generated by the agent, and
the knowledge engineer has to modify the rules manually to eliminate any identified
problems, testing the modified rules again.

Figure 9.6. Partially learned rule.


In the case of a learning agent, both rule learning and rule refinement take place as part
of agent teaching. Testing of the agent is included into this process. This process will also
continue as part of knowledge base maintenance. If we would like to extend the agent
to solve new problems, we simply need to teach it more. Thus, in the case of a learning
agent, such as Disciple-EBR, there is no longer a distinction between knowledge base
development and knowledge base maintenance. This is very important because it is well
known that knowledge base maintenance (and system maintenance, in general) is much
more challenging and time consuming than knowledge base (system) development.
Thus knowledge base development and maintenance are less complex and much faster
in the case of a learning agent.

9.3 THE RULE-LEARNING PROBLEM

The rule-learning problem is defined in Table 9.1 and is illustrated in Figures 9.7 and 9.8.
The agent receives an example of a problem or hypothesis reduction and learns a plausible
version space rule that is an analogy-based generalization of the example. There is no
restriction with respect to what the example actually represents. However, it has to be
described as a problem or hypothesis that is reduced to one or several subproblems,
elementary hypotheses, or solutions. Therefore, this example may also be referred to as a
problem-solving episode. For instance, the example shown in the top part of Figure 9.8
reduces a specific hypothesis to its assessment or solution, guided by a question and
its answer.
The expert who is training the agent will interact with it to help it understand why the
example is a correct reduction. The understanding is done in the context of the agent’s
ontology, a fragment of which is shown in Figure 9.7.
The result of the rule-learning process is a general plausible version space rule that will
allow the agent to solve problems by analogy with the example from which the rule was
learned. The plausible version space rule learned from the example at the top of Figure 9.8
is shown at the bottom part of the figure. It is an IF-THEN structure that specifies the
conditions under which the problem from the IF part has the solution from the THEN part.
The rule is only partially learned because, instead of a single applicability condition, it has
two conditions:

Table 9.1 The Rule-Learning Problem

GIVEN
• A knowledge base that includes an ontology and a set of (previously learned) rules
• An example of a problem reduction expressed with the concepts and instances from the agent’s knowledge base
• An expert who will interact with the agent to help it understand why the example is correct

DETERMINE
• A plausible version space rule, where the upper bound is a maximal generalization of the example, and the lower bound is a minimal generalization that does not contain any specific instance
• An extended ontology, if any extension is needed for the understanding of the example


object

actor position area of expertise

person organization university position subconcept of

employee educational Computer


student faculty position
organization Science

university employee subconcept of long-term instance of


graduate faculty position
student university
faculty member
Artificial Information
subconcept of instance of tenured position
Intelligence Security
professor
George Mason Indiana
PhD advisor
University University has as position
subconcept of domain person
PhD student range position
associate plans to retire from
is expert in
professor instance of domain person
domain person
range area of expertise range organization
instance of
is interested in plans to move to
is expert in is interested in domain person domain person
John Doe Bob Sharp range area of expertise range organization

Figure 9.7. Fragment of the agent’s ontology.

• A plausible upper bound condition that is a maximal generalization of the instances and constants from the example (e.g., Bob Sharp, certain), in the context of the agent’s ontology
• A plausible lower bound condition that is a minimal generalization that does not contain any specific instance

The relationships among the variables ?O1, ?O2, and ?O3 are the same for both conditions
and are therefore shown only once in Figure 9.8, under the conditions.
Completely learning the rule means learning an exact condition, where the plausible
upper bound is identical with the plausible lower bound.
During rule learning, the agent might also extend the ontology with new features or
concepts, if they are needed for understanding the example.
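For illustration only, such a partially learned rule could be recorded as a structure like the one below; the field names are assumptions made for this sketch, and the bound values correspond to the rule of Figure 9.8, as derived in Sections 9.7.2 and 9.7.3.

# An assumed encoding of a plausible version space rule (not Disciple-EBR's format).
rule = {
    "if": "?O1 is interested in an area of expertise of ?O2.",
    "then": "It is ?SI1 that ?O1 is interested in an area of expertise of ?O2.",
    "upper_bound": {          # maximal generalization of the example's instances/constants
        "?O1": "person",
        "?O2": "person",
        "?O3": "area of expertise",
        "?SI1": "[certain - certain]",
    },
    "lower_bound": {          # minimal generalization without specific instances
        "?O1": "PhD student",
        "?O2": ["PhD advisor", "associate professor"],
        "?O3": "Artificial Intelligence",
        "?SI1": "[certain - certain]",
    },
    "relations": [            # shared by both bounds
        ("?O1", "is interested in", "?O3"),
        ("?O2", "is expert in", "?O3"),
    ],
}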

9.4 OVERVIEW OF THE RULE-LEARNING METHOD

An overview of the rule-learning method is presented in Figure 9.9 and in Table 9.2. As in
explanation-based learning (DeJong and Mooney, 1986; Mitchell et al., 1986), it consists of
two phases: an explanation phase and a generalization phase. However, in the explanation
phase the agent does not automatically build a deductive proof tree but an explanation
structure through mixed-initiative understanding. Also, the generalization is not a deduct-
ive one, but an analogy-based one.
In the following, we will describe this learning method in more detail and illustrate it.
First we will present the mixed-initiative process of explanation generation and example


Figure 9.8. Rule learned from an example.

Figure 9.9. The rule-learning method. (An example of a reduction step is turned, through mixed-initiative understanding, into an explanation, which is then generalized by analogy into a plausible version space rule; both phases draw on the knowledge base.)


Table 9.2 Basic Steps of the Rule-Learning Method

(1) Mixed-Initiative Understanding (Explanation Generation)


Through a mixed-initiative interaction with the subject matter expert, determine the set of
relationships EX from the agent’s ontology that collectively explain why the example is correct.
EX represents the explanation of the example E. In general, these relationships express the
meaning of the question/answer pair from the example E, as well as other conditions that need
to be satisfied by the instances and constants from the example E. During this process, new
objects and features may be elicited from the expert and added to the ontology, if the ontology
is incomplete.
(2) Example Reformulation
Generate a variable for each instance and each constant (i.e., number, string, or symbolic
probability) that appears in the example E and its explanation EX. Then use these variables to
create an instance I of the concept C representing the applicability condition of the rule R to be
learned. C is the concept to be learned as part of rule learning and refinement. Finally, reformulate
the example as a very specific IF-THEN rule with I as its applicability condition. The IF problem, the
question, the answer, and the THEN subproblems (or solutions) are obtained by replacing each
instance or constant from the example E with the corresponding variable.
(3) Analogy-based Generalizations
Generate the plausible upper bound condition of the rule R as the maximal generalization of I in
the context of the agent’s ontology. Generate the plausible lower bound condition of the rule R as
the minimal generalization of I, which does not contain any specific instance.
(4) Rule Analysis
If there is any variable from the THEN part of a rule that is not linked to some variable from the
other parts of the rule, or if the rule has too many instances in the knowledge base, then interact
with the expert to extend the explanation of the example and update the rule if new explanation
pieces are found.
(5) Determining the Generalized Example
Generate a minimal generalization of the example E and its explanation EX that does not contain
any specific instance, and associate it with the learned rule.

understanding, which is part of the first phase. Then we will present and justify the
generalization method, which is based on analogical reasoning.

9.5 MIXED-INITIATIVE EXAMPLE UNDERSTANDING

9.5.1 What Is an Explanation of an Example?


The mixed-initiative example understanding process is illustrated in Figure 9.10. The
expert has defined the example during the modeling process. Now the expert and the
agent have to collaborate to learn a general rule from this specific example.
The first step consists in finding the set of relationships from the agent’s ontology
that collectively explain why the example is correct. In general, these relation-
ships express the meaning of the question/answer pair from the example, as well
as other conditions that need to be satisfied by the instances and constants from
the example.


Consider the following question/answer pair:

Is Bob Sharp interested in an area of expertise of John Doe?
Yes, Artificial Intelligence.

The following is an approximate representation of the meaning of the question/answer pair using the terms from the agent’s ontology:

Bob Sharp is interested in Artificial Intelligence
John Doe is expert in Artificial Intelligence

Adding “Probability of solution is always certain” to the preceding relationships results in
the explanation of the example, which is shown on the left-hand side of Figure 9.10. This is
a formal explanation of the example and consists of a set of relationships (or explanation
pieces) involving the instances and the constants from the example.
One can distinguish between different types of explanation pieces, as shown in
Table 9.3. Formally, the explanation of the example consists of several explanation pieces.
Each explanation piece corresponds to a path in the ontology between an object in the
example and another object or constant, as illustrated in Figure 9.10. In principle, the path
could have any length. In practice, however, one has to limit the maximum length of the
path for a certain type of explanation, in order to reduce the combinatorial explosion in
the generation of plausible explanations by the agent.

Table 9.3 Types of Explanation Pieces

• Association – a relation between two objects from the example
• Correlation – a common feature of two objects from the example
• Property – a property of an object from the example
• Relation – a relation between an object from the example and one from the knowledge base
• Generalization – a generalization of an object from the example
• Specific value – a specific value for an entity from the example (constant or generic instance)
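A rough sketch of how candidate explanation pieces can be proposed is to enumerate bounded-length feature paths that start at the entities of the example. The Python fragment below is an assumed, highly simplified illustration (the fact that John Doe has as position a tenured position is hypothetical), not the actual Disciple-EBR procedure.

def paths_from(entity, facts, max_length=2):
    """Enumerate feature paths of length at most max_length starting at entity."""
    frontier = [(entity, [])]
    for _ in range(max_length):
        next_frontier = []
        for node, path in frontier:
            for s, f, v in facts:
                if s == node:
                    new_path = path + [(s, f, v)]
                    yield new_path
                    next_frontier.append((v, new_path))
        frontier = next_frontier

# Hypothetical ontology facts about the entities of the example
facts = [
    ("Bob Sharp", "is interested in", "Artificial Intelligence"),
    ("John Doe", "is expert in", "Artificial Intelligence"),
    ("John Doe", "has as position", "tenured position"),
]

# Candidate pieces involving the selected entities, to be ranked by plausibility
# and shown to the expert for selection
for entity in ("Bob Sharp", "John Doe"):
    for piece in paths_from(entity, facts, max_length=1):
        print(piece)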

Figure 9.10. Example understanding. (The explanation: Bob Sharp is interested in Artificial Intelligence; John Doe is expert in Artificial Intelligence; the probability of the solution is always certain.)


Because the agent’s ontology is incomplete, sometimes the explanation includes only
an approximate representation of the meaning of the question/answer (natural language)
sentences.
The next section presents the explanation generation method.

9.5.2 Explanation Generation


Fully automatic explanation generation by the agent, while highly desirable, is problematic
because it would require human-like natural language understanding capabilities from an
agent that has only an incomplete ontology.
Manual explanation definition by an expert who is not a knowledge engineer is also
problematic. For one thing, he or she would need to use the formal language of the agent.
But this would not be enough. The expert would also need to know the names of the
potentially many thousands of concepts and features from the agent’s ontology (such as “is
interested in”).
While defining the formal explanation of a reduction step is beyond the individual
capabilities of both the expert and the agent, it is not beyond their joint complementary
capabilities. Therefore, finding such explanation pieces is modeled as a mixed-initiative
process (Tecuci et al., 2007a), based on the following observations:

• It is easier for an expert to understand sentences in the formal language of the agent than it is to produce such formal sentences
• It is easier for the agent to generate formal sentences than it is to understand sentences in the natural language of the expert

In essence, the agent will use basic natural language processing, various heuristics,
analogical reasoning, and help from the expert in order to identify and propose a set of
plausible explanation pieces, ordered by their plausibility of being correct explanations.
Then the expert will select the correct ones from the generated list.
The left-hand side of Figure 9.11 shows an example to be understood, and the upper-
right part of Figure 9.11 shows all the instances and constants from the example. The agent
will look for plausible explanation pieces of the types from Table 9.3, involving those
instances and constants. The most plausible explanation pieces identified, in plausibility

Figure 9.11. Explanation generation.


order, are shown in the bottom-right of Figure 9.11. Notice that the two most plausible
explanation pieces from Figure 9.11 are the correct explanation pieces shown in Figure 9.10.
The expert will have to select each of them and click on the Accept button. As a result, the
agent will move them in the Explanations pane from the left side of Figure 9.11.
Notice in the upper-right of Figure 9.11 that all the objects and constants from the example
are selected. Consequently, the agent generates the most plausible explanations pieces
related to all these objects and displays those with the highest plausibility. The expert may
click on the See More button, asking the agent to display the next set of plausible explanations.
The expert may also deselect some of the objects and constants, asking the agent to
generate only plausible explanations involving the selected elements. For example,
Figure 9.12 illustrates a situation where only the constant “certain” is selected. As a result,
the agent generated only the explanation “The value is specifically certain,” which means
that this value should be kept as such (i.e., not generalized) in the learned rule.
The expert may also provide a new explanation, even using new instances, concepts, or
features. In such a case, the expert should first define the new elements in the ontology.
After that, the expert may guide the agent to generate the desired explanations.
If the example contains any generic instance, such as “Artificial Intelligence,” the agent
will automatically select the explanation piece “Artificial Intelligence is Artificial Intelligence”
(see Explanation pane on the left side of Figure 9.11), meaning that this instance will
appear as such in the learned rule. If the expert wants Artificial Intelligence to be general-
ized, he or she should simply remove that explanation by clicking on it and on the Remove
button at its right.
The expert may also define explanations involving functions and comparisons, as will
be discussed in Sections 9.12.3 and 9.12.4.
Notice, however, that the explanation of the example may still be incomplete for at least
three reasons:

• The ontology of the agent may be incomplete, and therefore the agent may not be able to propose all the explanation pieces of the example simply because they are not present in the ontology

Figure 9.12. Guiding explanation generation. (1. Select an object; 2. Click on “Search”; 3. The agent generates plausible explanations.)


• The agent shows the plausible explanation pieces incrementally, as guided by the expert, and if one of the actual explanation pieces is not among the first ones shown, it may not be seen and selected by the expert
• It is often the case that the human expert forgets to provide explanations that correspond to common-sense knowledge that also is not represented in the question/answer pair

The incompleteness of the explanation is not, however, a significant problem because the
explanation may be further extended during the rule refinement process, as discussed
in Chapter 10.
To conclude, Table 9.4 summarizes the mixed-initiative explanation generation method.
Once the expert is satisfied with the identified explanation pieces, the agent will generate the
rule, as discussed in the following sections.

9.6 EXAMPLE REFORMULATION

As indicated in Table 9.2 (p. 260), once the explanation of the example is found, the agent
generates a very specific IF-THEN rule with an applicability condition that covers only that
example. The top part of Figure 9.13 shows an example, and the bottom part shows the
generated specific rule that covers only that example. Notice that each instance (e.g., Bob
Sharp) and each constant (e.g., certain) is replaced with a variable (i.e., ?O1, ?SI1).
However, the applicability condition restricts the possible values of these variables to
those from the example (e.g., “?O1 is Bob Sharp”). The applicability condition also includes
the properties and the relationships from the explanation. Therefore, the rule from the
bottom of Figure 9.13 will cover only the example from the top of Figure 9.13. This rule will
be further generalized to the rule from Figure 9.8, which has a plausible upper bound
condition and a plausible lower bound condition, as discussed in the next section. In
particular, the plausible upper bound condition will be obtained as the maximal general-
ization of the specific condition in the context of the agent’s ontology. Similarly, the
plausible lower bound condition will be obtained as the minimal generalization of the
specific condition that does not contain any specific instance.
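The reformulation step can be sketched as a simple variabilization of the example and its explanation. The following Python fragment is an illustration under assumed data structures, not Disciple-EBR code; it reproduces the relationships of the specific condition in Figure 9.13 (a symbolic probability constant such as “certain” would get its own variable, ?SI1, in the same way).

def reformulate(example_text, explanation, entities):
    """entities: the instances/constants of the example, in order of appearance."""
    var_of = {entity: "?O" + str(i) for i, entity in enumerate(entities, start=1)}

    pattern = example_text
    for entity, var in var_of.items():             # variabilize the example text
        pattern = pattern.replace(entity, var)

    condition = [(var, "is", entity) for entity, var in var_of.items()]
    for subject, feature, value in explanation:    # add the explanation relationships
        condition.append((var_of.get(subject, subject), feature,
                          var_of.get(value, value)))
    return pattern, condition

example = "Bob Sharp is interested in an area of expertise of John Doe."
explanation = [("Bob Sharp", "is interested in", "Artificial Intelligence"),
               ("John Doe", "is expert in", "Artificial Intelligence")]
pattern, condition = reformulate(example, explanation,
                                 ["Bob Sharp", "John Doe", "Artificial Intelligence"])
print(pattern)    # ?O1 is interested in an area of expertise of ?O2.
for triple in condition:
    print(triple)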

Table 9.4 Mixed-Initiative Explanation Generation

Let E be an example.
Repeat
• The expert focuses the agent’s attention by selecting some of the instances and constants from the example.
• The agent proposes what it determines to be the most plausible explanation pieces related to the selected entities, ordered by their plausibility.
• The expert chooses the relevant explanation pieces.
• The expert may ask for the generation of additional explanation pieces related to the selected instances and constants, may select different ones, or may directly specify explanation pieces.
until the expert is satisfied with the explanation of the example.


Specific rule

IF    ?O1 is interested in an area of expertise of ?O2.

      Q: Is ?O1 interested in an area of expertise of ?O2?
      A: Yes, ?O3.

      Condition
      ?O1   is   Bob Sharp
            is interested in   ?O3
      ?O2   is   John Doe
            is expert in       ?O3
      ?O3   is   Artificial Intelligence
      ?SI1  is-exactly   certain

THEN  It is ?SI1 that ?O1 is interested in an area of expertise of ?O2.

Figure 9.13. Specific rule covering only the example.

9.7 ANALOGY-BASED GENERALIZATION

9.7.1 Analogical Problem Solving Based on Explanation Similarity


The goal of the learning agent is to learn a general rule that will enable it to solve new
problems by analogy with the example provided by the expert. As discussed in Section
8.1.4, analogy involves mapping some underlying “causal network of relations” between a
source and a target, with the goal of transferring knowledge from the source to the target.
This applies to analogical problem solving as illustrated in Figure 9.14.
The explanation in the upper-left part of Figure 9.14 explains why the input example
in the lower-left part is correct. The expression from the upper-right part of Figure 9.14 is
similar to this explanation because both have the same structure. Therefore, one may
infer, by analogy, that this expression explains an example that is similar to the initial
example. This example is shown in the bottom-right part of Figure 9.14.
The more general question is: When should we consider that two expressions or explan-
ations are similar? In general, one considers that two expressions, explanations, or situ-
ations are similar if they match within a certain predefined threshold (Winston, 1980). As
shown in Kodratoff and Ganascia (1986), generalization may be reduced to structural
matching. Therefore, we may consider that two expressions (explanations) are similar if


Figure 9.14. Analogical problem solving based on explanation similarity. (The explanation of the initial example, namely that Bob Sharp is interested in Artificial Intelligence, John Doe is expert in Artificial Intelligence, and the probability of the solution is always certain, is similar to the expression stating that Peter Jones is interested in Information Security and Dan Smith is expert in Information Security; by analogy, the latter explains the similar example concluding that it is certain that Peter Jones is interested in an area of expertise of Dan Smith.)

they are both less general than a given expression that represents the analogy criterion.
Consequently, the preceding question may be rephrased as:
Given the explanation EX of an example E, which generalization of EX should be
considered an analogy criterion, enabling the agent to generate reductions that are analo-
gous to E?
There are two interesting answers of this question, one given by a cautious learner, and
the other given by an aggressive learner, as discussed in the next sections.

9.7.2 Upper Bound Condition as a Maximally General


Analogy Criterion
An aggressive learner always prefers maximal generalizations, as discussed in Section 8.5.
Consequently, it will determine the analogy criterion as a maximal generalization of the
explanation, as illustrated in Figure 9.15 and explained in the following.
The explanation of the initial example corresponds to the applicability condition of the
specific rule representing the example (see Figure 9.13). Thus the aggressive learner will
learn a rule by maximally generalizing the specific applicability condition, as illustrated in
the upper right side of Figure 9.16.
The specific instance Bob Sharp is generalized to the most general concept from the
ontology that covers it and satisfies all the constraints imposed by the domains and the
ranges of the features related to it in the specific condition (explanation). As shown in the
left-hand side of Figure 9.16, the most general generalization of Bob Sharp in the ontology
is object. On the other hand, in the specific condition (explanation), Bob Sharp has the
feature “is interested in,” the domain of which is person. Thus the maximal generalization
of Bob Sharp is: object ∩ domain(is interested in) = object ∩ person = person.
In general, if an instance in the explanation has the features f11, . . . , f1m, and appears as
value of the features f21, . . . , f2n, then its maximal generalization is:

object ∩ domain(f11) ∩ . . . ∩ domain(f1m) ∩ range(f21) ∩ . . . ∩ range(f2n)


Figure 9.15. Upper bound condition as a maximally general analogy criterion.

Figure 9.16. Maximal generalization of the specific applicability condition. (The upper bound: ?O1 is person, is interested in ?O3; ?O2 is person, is expert in ?O3; ?O3 is area of expertise; ?SI1 is in [certain – certain].)

Now consider John Doe. Its most general generalization is object ∩ domain(is expert in) = object ∩ person = person.
Consider now Artificial Intelligence. It appears as a value of the features “is interested in” and “is expert in.” Therefore, its maximal generalization is: object ∩ range(is interested in) ∩ range(is expert in) = object ∩ area of expertise ∩ area of expertise = area of expertise.


On the other hand, the maximal generalization of “certain” is the interval with a single
value “[certain – certain]” because ?SI1 is restricted to this value by the feature “is exactly.”
Let us consider again the example from the top part of Figure 9.13, but now let us
assume that “The value is specifically certain” was not identified as an explanation piece.
That is, the explanation of the example consists only of the following pieces:

Bob Sharp is interested in Artificial Intelligence


John Doe is expert in Artificial Intelligence

In this case, the generated specific condition is the one from the bottom of Figure 9.17, and
its maximal generalization is the one from the top of Figure 9.17. Notice that the maximal
generalization of “certain” is the entire interval [no support – certain] because there is no
restriction on the possible values of ?SI1.
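The computation of the plausible upper bound just illustrated amounts to intersecting concept extents. The sketch below models each concept by a hypothetical, much reduced set of the concepts it covers and applies the formula object ∩ domain(f11) ∩ . . . ∩ range(f2n); it is an illustration only, not Disciple-EBR code.

# Hypothetical extents: each concept mapped to the set of concepts it covers
extent = {
    "object": {"person", "PhD student", "professor", "area of expertise"},
    "person": {"person", "PhD student", "professor"},
    "area of expertise": {"area of expertise"},
}
domain = {"is interested in": "person", "is expert in": "person"}
range_of = {"is interested in": "area of expertise", "is expert in": "area of expertise"}

def maximal_generalization(features_of_entity, features_valued_by_entity):
    """Intersect the domains of the entity's features with the ranges of the features
    for which the entity appears as a value, then name the resulting concept."""
    result = set(extent["object"])
    for f in features_of_entity:
        result &= extent[domain[f]]
    for f in features_valued_by_entity:
        result &= extent[range_of[f]]
    return next((c for c, e in extent.items() if e == result), result)

print(maximal_generalization(["is interested in"], []))
# person              (the generalization of Bob Sharp)
print(maximal_generalization([], ["is interested in", "is expert in"]))
# area of expertise   (the generalization of Artificial Intelligence)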

9.7.3 Lower Bound Condition as a Minimally General


Analogy Criterion
A cautious learner prefers minimal generalizations, as discussed in Section 8.5. Conse-
quently, it will determine the analogy criterion as a minimal generalization of the explan-
ation, as illustrated in Figure 9.18.
The explanation of the initial example corresponds to the applicability condition of the
specific rule representing the example (see Figure 9.13). Thus the cautious learner will
learn a rule by minimally generalizing the specific applicability condition, as illustrated in
Figure 9.19.
The specific instance Bob Sharp is minimally generalized to the most specific concept
from the ontology that covers it, that is, to PhD student. This generalization is allowed
because the domain of the feature “is interested in” is person, which includes PhD student.

Most general generalization
?O1   is  person
          is interested in  ?O3
?O2   is  person
          is expert in      ?O3
?O3   is  area of expertise
?SI1  is in  [no support – certain]

     Maximal generalization

Specific condition
?O1   is  Bob Sharp
          is interested in  ?O3
?O2   is  John Doe
          is expert in      ?O3
?O3   is  Artificial Intelligence
?SI1  is  certain

Figure 9.17. Maximal generalization of a symbolic probability value when no explanation is identified.


Figure 9.18. Lower bound condition as a minimally general analogy criterion.
Figure 9.18. Lower bound condition as a minimally general analogy criterion.

Most specific generalization
?O1   is  PhD student
          is interested in  ?O3
?O2   is  (PhD advisor, associate professor)
          is expert in      ?O3
?O3   is  Artificial Intelligence
?SI1  is in  [certain – certain]

     Minimal generalization

Specific condition
?O1   is  Bob Sharp
          is interested in  ?O3
?O2   is  John Doe
          is expert in      ?O3
?O3   is  Artificial Intelligence
?SI1  is exactly  certain

Figure 9.19. Minimal generalization of the specific applicability condition.

Similarly, the specific instance John Doe is minimally generalized to PhD advisor or
associate professor, because these are the minimal generalizations of John Doe and neither
is more specific than the other. Additionally, both these concepts are subconcepts of
person, the domain of is expert in.
Because Artificial Intelligence is a generic instance, it can appear in the learned rule (as
opposed to the specific instances Bob Sharp and John Doe). Therefore, its minimal general-
ization is Artificial Intelligence itself. Similarly, the constants (such as certain) can also appear
in the learned rule, and they are kept as such in the minimal generalization.


Notice that if you want an instance to appear in the condition of a learned rule, it needs
to be defined as a generic instance. Specific instances are always generalized to concepts
and will never appear in the condition.
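The climbing step behind this minimal generalization can be sketched as follows; the instance-of and subconcept-of tables are an illustrative fragment of the chapter's ontology, and the helper names are not part of Disciple-EBR.

```python
# Sketch: minimal generalizations of a specific instance = its direct parent
# concepts that remain inside the domain of the feature applied to it.

INSTANCE_OF = {
    "John Doe": ["PhD advisor", "associate professor"],
    "Bob Sharp": ["PhD student"],
}
SUBCONCEPT_OF = {  # child -> parents (illustrative fragment)
    "PhD advisor": ["faculty member"],
    "associate professor": ["professor"],
    "professor": ["faculty member"],
    "faculty member": ["person"],
    "PhD student": ["graduate student"],
    "graduate student": ["student"],
    "student": ["person"],
    "person": ["object"],
}

def is_subconcept(concept, candidate):
    """True if concept equals candidate or lies below it in the hierarchy."""
    if concept == candidate:
        return True
    return any(is_subconcept(p, candidate) for p in SUBCONCEPT_OF.get(concept, []))

def minimal_generalizations(instance, feature_domain):
    """All direct concepts of the instance that stay inside the feature's domain."""
    return [c for c in INSTANCE_OF[instance] if is_subconcept(c, feature_domain)]

print(minimal_generalizations("John Doe", "person"))   # ['PhD advisor', 'associate professor']
print(minimal_generalizations("Bob Sharp", "person"))  # ['PhD student']
```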

9.8 RULE GENERATION AND ANALYSIS

The partially learned rule is shown in the bottom part of Figure 9.8 (p. 259). Notice that the
features are listed only once under the bounds because they are the same for both bounds.
The generated rule is analyzed to determine whether there are any variables in the
THEN part that are not linked to some variable from the IF part. If such an unlinked
variable exists, then it can be instantiated to any value, leading to solutions that make no
sense. Therefore, the agent will interact with the expert to find an additional explanation
that will create the missing link and update the rule accordingly.
The generated rule is also analyzed to determine whether it has too many instances in
the knowledge base, which is also an indication that its explanation is incomplete and
needs to be extended.
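The first analysis step can be sketched as a simple scan over the rule's variables; the string-based rule representation below is purely illustrative.

```python
# Sketch: detect THEN-part variables that are not linked to the IF part.
import re

def unlinked_variables(if_part, then_part):
    """Return the ?-variables of the THEN part that never occur in the IF part."""
    if_vars = set(re.findall(r"\?\w+", if_part))
    then_vars = set(re.findall(r"\?\w+", then_part))
    return then_vars - if_vars

rule_if = "?O1 is interested in ?O3 and ?O2 is expert in ?O3"
rule_then = "?O2 is a good PhD advisor for ?O1; the probability of the solution is ?SI1"
print(unlinked_variables(rule_if, rule_then))  # {'?SI1'} -> an additional explanation is needed
```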

9.9 GENERALIZED EXAMPLES

The rule learned from an example and its explanation depends on the ontology of the
agent at the time the rule was generated. If the ontology changes, the rule may need to be
updated, as will be discussed in Chapter 10. For example, the minimal generalization of a
specific instance will change if a new concept is inserted between that instance and the
concept above it. To enable the agent to update its rules automatically when relevant
changes occur in the ontology, minimal generalizations of the examples and their explan-
ations are associated with the learned rules.
Why is the agent maintaining minimal generalizations of examples instead of the
examples themselves? Because the examples exist only in Scenario KBs, where the specific
instances are defined, while the rules are maintained in the Domain KB. If a scenario is no
longer available, the corresponding examples are no longer defined. However, generalized
examples (which do not contain specific instances) will always be defined in the Domain
KB. Thus the generalized examples represent a way to maintain a history of how a rule was
learned, independent of the scenarios. They are also a compact way of preserving this
history because one generalized example may correspond to many actual examples.
Figure 9.20 shows the minimal generalization of the example and its explanation from
which the rule in Figure 9.8 (p. 259) was learned.
One should notice that the minimal generalization of the example shown at the top part
of Figure 9.20 is not the same as the plausible lower bound condition of the learned rule
from Figure 9.8. Consider the specific instance John Doe from the example.
In the ontology, John Doe is both a direct instance of PhD advisor and of associate
professor (see Figure 9.16). In the lower bound condition of the rule, John Doe is general-
ized to PhD advisor or associate professor, indicated as (PhD advisor, associate professor),
because each of these two concepts is a minimal generalization of John Doe in the
ontology. Thus the agent maintains the two concepts as part of the lower bound of the
rule’s version space: one corresponding to PhD advisor, and the other corresponding to


Generalized Example:
    ?O1   is PhD student
          is interested in ?O3
    ?O2   is PhD advisor
          is associate professor
          is expert in ?O3
    ?O3   is Artificial Intelligence
    ?SI1  is in [certain – certain]
    Covered positive examples: 1
    Covered negative examples: 0

Example and its explanation:
    ?O1   is Bob Sharp
          is interested in ?O3
    ?O2   is John Doe
          is expert in ?O3
    ?O3   is Artificial Intelligence
    ?SI1  is exactly certain

Figure 9.20. Determining the generalized example.

associate professor. During further learning, the agent will choose one of these generaliza-
tions or a more general generalization that covers both of them.
In the minimal generalization of the example, John Doe is generalized to PhD advisor
and associate professor because this is the best representation of the minimal generaliza-
tion of the example that can be used to regenerate the rule, when changes are made to the
ontology. This minimal generalization is expressed as follows:
?O2 is PhD advisor
is associate professor

Initially, the generalized example shown at the top of Figure 9.20 covers only one
specific example. However, when a new (positive or negative) example is used to refine
the rule, the agent checks whether it is already covered by an existing generalized example
and records this information. Because a generalized example may cover any number of
specific positive and negative examples, its description also includes the number of
specific examples covered, as shown in the top part of Figure 9.20.
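The bookkeeping described above can be sketched as follows; the dictionary-based representation of a generalized example and the helper names are illustrative, not Disciple-EBR's internal format.

```python
# Sketch: a generalized example with coverage counters, updated when a new
# specific example is checked against it (hypothetical ontology fragment).

SUBCONCEPT_OF = {  # child -> parent
    "PhD advisor": "faculty member", "associate professor": "faculty member",
    "faculty member": "person", "PhD student": "student", "student": "person",
}
INSTANCE_OF = {"John Doe": ["PhD advisor", "associate professor"],
               "Bob Sharp": ["PhD student"]}

def covered_by(instance, concept):
    """True if the instance falls under the concept in the hierarchy."""
    frontier = list(INSTANCE_OF.get(instance, []))
    while frontier:
        c = frontier.pop()
        if c == concept:
            return True
        if c in SUBCONCEPT_OF:
            frontier.append(SUBCONCEPT_OF[c])
    return False

generalized_example = {
    "condition": {"?O1": ["PhD student"],
                  "?O2": ["PhD advisor", "associate professor"]},
    "covered_positive": 1,   # the example it was learned from
    "covered_negative": 0,
}

def record_example(gen_ex, specific, positive):
    """If the generalized example covers the specific one, update its counters."""
    covers = all(covered_by(specific[var], c)
                 for var, concepts in gen_ex["condition"].items()
                 for c in concepts)
    if covers:
        gen_ex["covered_positive" if positive else "covered_negative"] += 1
    return covers

record_example(generalized_example, {"?O1": "Bob Sharp", "?O2": "John Doe"}, True)
print(generalized_example["covered_positive"])  # 2
```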
Cases of using generalized examples to regenerate previously learned rules are pre-
sented in Section 10.2.

9.10 HYPOTHESIS LEARNING

In addition to learning a general reduction rule from a specific reduction example, Disciple-
EBR also learns general hypotheses (or problems). The left-hand side of Figure 9.21 shows

Figure 9.21. A reduction rule and four hypotheses learned from a specific hypothesis reduction.

an example of a reduction where a hypothesis is reduced to three subhypotheses.


From this example, Disciple-EBR learns five general knowledge pieces that share the
same variables:

 A general hypothesis reduction rule that is a generalization of the reduction example


where each instance was replaced with a variable
 A generalization of the top hypothesis with a plausible version space condition
 Generalizations of the three subhypotheses, each with its own plausible version space
condition

The hypothesis learning method, shown in Table 9.5, is very similar to the rule-learning
method.
Figure 9.22 illustrates the automatic learning of a general hypothesis from the specific
hypothesis, “John Doe would be a good PhD advisor for Bob Sharp,” when no explanation is
provided.
The specific instances, John Doe and Bob Sharp, are replaced with the variables ?O1 and
?O2, respectively, as in the reduction rule.
The lower bounds of these variables are obtained as the minimal generalizations of
John Doe and Bob Sharp, according to the agent’s ontology from the left-hand side of
Figure 9.22, because both of them are specific instances. Notice that there are two minimal
generalizations of John Doe: PhD advisor and associate professor. The minimal generaliza-
tion of Bob Sharp is PhD student.
The upper bounds are obtained as the maximal generalizations of John Doe and
Bob Sharp, according to the agent’s ontology from the left-hand side of Figure 9.22. They
are both object.
During the explanation generation process, the user may wish to restrict the general-
ization of the hypothesis by providing the following explanations:

Table 9.5 Basic Steps of the Hypothesis Learning Method

(1) Mixed-Initiative Understanding (Explanation Generation)


Through a mixed-initiative interaction with the subject matter expert, determine the set of
constraints EX that need to be satisfied by the instances and constants from the example
hypothesis EH in order for the hypothesis statement to make sense. These include the types of the
instances from the hypothesis.
(2) Hypothesis Reformulation
Use the constraints EX and the variables generated during the learning of the corresponding
reduction rule to create an instance I of the concept C representing the applicability condition of
the hypothesis H to be learned. Then reformulate the specific hypothesis as a hypothesis pattern
with I as its applicability condition.
(3) Analogy-based Generalizations
Generate the plausible upper bound condition of the general hypothesis H as the maximal
generalization of I in the context of the agent’s ontology. Generate the plausible lower bound
condition of the hypothesis H as the minimal generalization of I that does not contain any specific
instance.
(4) Determining the Generalized Example
Generate a minimal generalization of the example hypothesis EH and its explanation EX that does
not contain any specific instance and associate it with the learned hypothesis.


Most specific generalization:
    ?O1   is (PhD advisor, associate professor)
    ?O2   is PhD student

Most general generalization:
    ?O1   is object
    ?O2   is object

Specific condition:
    ?O1   is John Doe
    ?O2   is Bob Sharp

Figure 9.22. Generation of the applicability conditions of a hypothesis.

John Doe is faculty member


Bob Sharp is student

In this case, the lower bound condition of the learned hypothesis remains the same, but
the upper bound condition becomes:

?O1 is faculty member


?O2 is student

In general, hypothesis learning is done automatically, as a byproduct of rule learning, but


the explanations found for learning the reduction rule will not be used for learning the
hypothesis. Indeed, these explanations indicate only why this particular reduction is
correct. There may be other reductions of the same hypothesis, which will have different
explanations. If desired, the user may provide specific explanations for the hypothesis that
limit the possible instantiations of the learned hypothesis to those that make sense, as
illustrated previously.
Notice that for each specific hypothesis, the agent may learn a single general hypothesis
but several reduction rules. Then, when trying to reduce a hypothesis to simpler hypoth-
eses, the agent checks that both the applicability condition of the corresponding general
hypothesis and that of the considered rule are satisfied. It is therefore important that the
applicability condition of the general hypothesis is general enough to enable the use of all
the learned reduction rules associated with that general hypothesis.
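The double check described above can be sketched as follows, with both conditions simplified to lists of fact triples over shared variables (a deliberate simplification of the plausible version space conditions; all names are illustrative).

```python
# Sketch: a reduction is proposed only when both the general hypothesis's
# condition and the reduction rule's condition hold for the same bindings.

def condition_satisfied(condition, bindings, kb):
    """condition: list of (subject, feature, value) triples with ?-variables."""
    resolve = lambda term: bindings.get(term, term)
    return all((resolve(s), f, resolve(v)) in kb for (s, f, v) in condition)

def reduction_applicable(hypothesis_condition, rule_condition, bindings, kb):
    return (condition_satisfied(hypothesis_condition, bindings, kb)
            and condition_satisfied(rule_condition, bindings, kb))

kb = {
    ("Bob Sharp", "is interested in", "Artificial Intelligence"),
    ("John Doe", "is expert in", "Artificial Intelligence"),
}
hypothesis_condition = [("?O1", "is expert in", "?O3")]
rule_condition = [("?O2", "is interested in", "?O3"),
                  ("?O1", "is expert in", "?O3")]
bindings = {"?O1": "John Doe", "?O2": "Bob Sharp", "?O3": "Artificial Intelligence"}
print(reduction_applicable(hypothesis_condition, rule_condition, bindings, kb))  # True
```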
A general hypothesis may be learned in several circumstances. As illustrated in
Figure 9.21, a hypothesis may be learned as a byproduct of rule learning. It may also be
learned during the modeling and formalization of a reasoning tree by using the Learn


Hypothesis Pattern or the Learn Tree Patterns commands that were introduced in
Section 4.10. Finally, it may be learned by specifically invoking hypothesis (problem)
learning when working with the Mixed-Initiative Reasoner. But it is only this last situation
that also allows the definitions of explanations, as will be discussed later in this section. In
all the other situations, a hypothesis is automatically learned, with no explanations, as was
illustrated in Figure 9.22.
The overall user–agent interactions during the hypothesis explanation process are
illustrated in Figure 9.23 and described in Operation 9.1. Here it is assumed that the
reasoning tree was already formalized and thus a hypothesis pattern was already learned.
If it was not learned, the pattern will be automatically learned before the explanations are
identified.

Operation 9.1. Define explanations for a hypothesis


 Select the Scenario workspace.
 Click on the Reasoning menu and select Mixed-Initiative Reasoner.
 Select the hypothesis to analyze, and click on the Select button on the right side of
the window.
 In the Reasoning Hierarchy viewer of the Mixed-Initiative Reasoner, select a hypoth-
esis by clicking on it.
 In the right panel, click on the Modify Explanations button.
 Explanation generation is automatically invoked, the left pane showing the hypothesis
in the Reasoning Step viewer, and the right pane showing the selected entities and a
corresponding ordered list of plausible explanations (see the upper-right part of
Figure 9.23).
 Inspect the explanation pieces proposed by the agent in the right panel, click on a
correct explanation piece (if any), and then click on the Accept button.
 Repeat explanation selection and acceptance to select all the needed explanation pieces.
 If necessary, click on the See More button, asking the agent to generate additional
possible explanation pieces related to the selected entities.
 If necessary, select a subset of instances and constants and click on the Search button,
asking the agent to generate explanations related to the selected entities.
 All the accepted explanations are added to the hypothesis (see the bottom-left part of
Figure 9.23). You may delete any of them by clicking on it and then clicking on the
Remove button.
 Click on the Finish button to end the learning of the current hypothesis.

9.11 HANDS ON: RULE AND HYPOTHESES LEARNING

This case study will guide you to use Disciple-EBR to learn rules and hypotheses from
examples. More specifically, you will learn how to:

 Invoke the rule learning module


 Interact with the agent to find the explanation of the example
 Learn the rule (and the associated hypotheses)

The overall user–agent interactions during the rule- (and hypotheses-) learning process
are illustrated in Figure 9.24 and described in Operation 9.2. It is assumed that the

Figure 9.23. Overview of the user–agent interactions during the hypothesis explanation process.
Figure 9.24. Overview of the user–agent interactions during rule (and hypotheses) learning.

reasoning tree is formalized. If not, one can easily formalize it in the Evidence workspace
before invoking rule learning by simply right-clicking on the top node and selecting
Learn Tree.

Operation 9.2. Learn rule


 Select the Scenario workspace.
 Click on the Reasoning menu and select Mixed-Initiative Reasoner.
 Select the hypothesis to analyze, and click on the Select button on the right side of
the window.
 In the Reasoning Hierarchy viewer, select a reasoning step by clicking on the corres-
ponding Q/A node (see the upper-left part of Figure 9.24).
 In the right panel, click on the Learn Condition button.
 Explanation generation is automatically invoked, the left pane showing the reduction
in the Reasoning Step viewer, and the right pane showing the selected entities and a
corresponding ordered list of plausible explanations (see the upper-right part of
Figure 9.24).
 If the example contains generic instances, check whether you want to remove any of
the corresponding explanations that were automatically accepted by the agent. To
remove such an explanation piece, click on it in the left panel and then click on the
Remove button.
 Inspect the explanation pieces proposed by the agent in the right panel, click on a
correct explanation piece (if any), and then click on the Accept button.
 Repeat explanation selection and acceptance to select all the needed explanation
pieces.
 If necessary, click on the See More button, asking the agent to generate additional
possible explanation pieces related to the selected entities.
 If necessary, select a subset of instances and constants and click on the Search button,
asking the agent to generate explanations related to the selected entities.
 All the accepted explanations are added to the reduction (see the bottom-left part of
Figure 9.24). You may delete any of them by clicking on them and then clicking on the
Remove button.
 Click on the Finish button to end the learning of the current rule.
 Continue with selecting another reasoning step and learning the corresponding
rule.

You are now ready to perform a rule-learning case study. There are two of them, a shorter
one and a longer one. In the shorter case study, you will guide the agent to learn the rule
from Figure 9.8, as discussed in the previous sections. In the longer case study, you will
guide the agent to learn several rules, including the rule from Figure 9.8.
Start Disciple-EBR, select one of the case study knowledge bases (either “11-Rule-Learning-
short/Scen” or “11-Rule-Learning/Scen”), and proceed as indicated in the instructions at the
bottom of the opened window.
A learned rule can be displayed as indicated in the following operation.

Operation 9.3. Display a learned rule with the Rule Viewer


 In the Scenario workspace, in the Reasoning Hierarchy viewer select a reasoning step
for which a rule has already been learned by clicking on the corresponding Q/A node.
 At the bottom of the right panel, click on the Reduction Rule button to see the rule that
generated the reduction step.


 At the top of the Rule Viewer, notice the name of the rule (e.g., DDR.00018). You may
also display or delete this rule with the Rule Browser, as described in Operations 10.4
and 10.5.
 Click on the X button of the Rule Viewer to close it.

Operation 9.4. Display a learned hypothesis with the Problem Viewer


 In the Scenario workspace, in the Reasoning Hierarchy viewer click on a hypothesis to
select one for which a general hypothesis has already been learned.
 In the right panel, click on the View Problem button to see the learned hypothesis.
 Click on the X button of the Problem Viewer to close it.

9.12 EXPLANATION GENERATION OPERATIONS

An important part of rule learning is finding an explanation of the example that is as


complete as possible. The following sections discuss various operations that support
this process.

9.12.1 Guiding Explanation Generation


You can guide the agent in explanation generation by selecting the instances and con-
stants from the example for which explanation pieces will be proposed. As a result, the
agent generates a list of potential explanations, ordered by their plausibility, from which
you can select the correct ones. The corresponding user–agent interaction is described in
Operation 9.5 and illustrated in Figure 9.25.

Operation 9.5. Guide the generation of explanations


 In the Scenario workspace, during rule learning, in the Reasoning Hierarchy viewer,
select a reasoning step by clicking on the corresponding Q/A node.
 In the right panel, click on the Learn Condition button.

Figure 9.25. Guiding explanation generation.


 Select one or several entities from the “Elements to search for” pane, such as certain in
Figure 9.25. You may also need to deselect some entities by clicking on them.
 Click on the Search button, asking the agent to generate explanation pieces related to
the selected entities.
 Select an explanation piece and click on the Accept button.
 Click on the See More button to see more of the generated explanation pieces.
 Repeat the preceding steps until all the desired explanations are generated and selected.

9.12.2 Fixing Values


You may block the generalizations of numbers (e.g., “5”), symbolic probability values (e.g.,
“certain”), or generic instances (e.g., “Artificial Intelligence”) by selecting explanations of the
following forms, as described in Operation 9.6:

The value is specifically 5


The value is specifically certain
Artificial Intelligence is Artificial Intelligence

Operation 9.6. Generate explanations with fixed values


 In the Scenario workspace, during rule learning, in the Reasoning Hierarchy viewer,
select a reasoning step by clicking on the corresponding Q/A node.
 In the right panel, click on the Learn Condition button.
 Select one or several generic instances or constants (e.g., certain in Figure 9.25) in the
“Elements to search for” pane. No specific instances should be selected.
 If a selected element is a constant, then the agent will generate a possible explanation
piece of the form “The value is specifically <selected constant>.”
 If a selected element is a generic instance, then the agent will generate a possible
explanation piece of the form “<selected instance> is <selected instance>,” as well as
other explanations containing the selected instance.
 Accept the explanation “The value is specifically <selected constant>” (or “<selected
instance> is <selected instance>”) to fix that value in the learned rule.

Notice that these types of explanation pieces are generated when only constants or generic
instances are selected in the “Elements to search for” pane. For example, only “certain”
was selected in the “Elements to search for” pane in the upper-right part of Figure 9.25,
and therefore the potential explanation piece “The value is specifically certain” was
generated.
Notice also that explanations such as “Artificial Intelligence is Artificial Intelligence” can be
generated only for generic instances. They cannot be generated for specific instances
because these instances are always generalized in the learned rules.

9.12.3 Explanations with Functions


We will illustrate these explanations based on the reasoning tree in Figure 9.26. Notice that
all the numbers are correctly recognized as numbers and appear in green.
The Q/A pair from the first reduction explains how the price of the Apple iPad 16GB is
computed. The corresponding explanation pieces express the meaning of the Q/A pair by

13:59:50,
.010
9.12. Explanation Generation Operations 281

Figure 9.26. Reasoning tree involving numbers.

Figure 9.27. Explanations with functions (the function is defined with the Expression Editor, using numbers from the example).

using relationships from the agent’s ontology. They are the following ones, and are shown
also in the left-hand side of Figure 9.27:

Apple iPad 16GB has as price 495.0


MicroCenter Fairfax is located in Virginia
Virginia has as state sale tax 0.05


Additionally, you have to teach the agent how the price is actually computed. You invoke
the Expression Editor by clicking on the Edit Expression button, which displays a pane to
define the expression (see the bottom right of Figure 9.27). Then you fill in the left side of
the equality with the price, and the right side with the expression that leads to this price, by
using the numbers from the example:

519.75 = 495.0 * 0.05 + 495.0

Each of these numbers is generalized to a variable, and a general relationship between


these variables is learned (i.e., “?N1 = ?N2 * ?N3 + ?N2”). The learned rule, containing this
expression in its applicability condition, is shown in Figure 9.28.
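To illustrate how such a learned expression constrains future instantiations of the rule, the following sketch (with an illustrative helper name) tests whether candidate bindings of ?N1, ?N2, and ?N3 satisfy ?N1 = ?N2 * ?N3 + ?N2.

```python
# Sketch: the learned expression ?N1 = ?N2 * ?N3 + ?N2 as a test on bindings.

def satisfies_learned_expression(n1, n2, n3, tolerance=1e-9):
    """True if the total price n1 equals the base price n2 plus sales tax n2 * n3."""
    return abs(n1 - (n2 * n3 + n2)) < tolerance

print(satisfies_learned_expression(519.75, 495.0, 0.05))  # True: the training example
print(satisfies_learned_expression(600.00, 495.0, 0.05))  # False: the rule does not apply
```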
In general, to define an explanation containing a function, you have to follow the steps
from Operation 9.7.

Figure 9.28. Learned rule with a learned function in the applicability condition.


Operation 9.7. Define explanations containing functions


 In the Scenario workspace, during rule learning, in the Reasoning Hierarchy viewer,
select a reasoning step by clicking on the corresponding Q/A node.
 In the right panel, click on the Learn Condition button.
 In the “Possible explanations” pane, click on the Edit Expression button.
 The agent displays the Expression Editor.
 In the left part of the Expression Editor, start typing the result of the expression (e.g., the
actual price of “519.75” shown in Figure 9.27) and select it from the completion pop-up.
 In the right part of the Expression Editor, type the expression used to compute the
result (i.e., “495.0 * 0.05 + 495.0”), making sure you select the values from the completion
pop-up.
 Click on the OK button under the Expression Editor to define the expression.

9.12.4 Explanations with Comparisons


We will illustrate these explanations based on the reasoning tree from Figure 9.26.
Notice that the Q/A pair corresponding to the bottom reduction explains why Mike has
enough money to buy the Apple iPad 16GB. The following is the corresponding explanation
piece:

Mike has as available funds 650.35

Additionally, you have to indicate that 650.35 is greater than 519.75. You click on the
Create New… button, which opens a window allowing you to define a new explanation as
an object-feature-value triplet (see the bottom of Figure 9.29). In the left editor, you
start typing the amount of money Mike has (i.e., 650.35) and select it from the completion

Figure 9.29. Defining explanations with comparisons (the comparison is defined with the Create Explanation editor, using numbers from the example).


pop-up. In the center editor, you type >=. Then, in the right editor, you start typing the
actual cost (519.75) and select it from the completion pop-up. Finally, you click on the OK
button in the Create explanation window to select this explanation:

650.35 >= 519.75

Each of these numbers is generalized to a variable, and a general relationship between


these variables is learned (i.e., “?N2 >= ?N1”). The learned rule, containing this relation-
ship in its applicability condition, is shown in Figure 9.30.
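The learned comparison is simply an additional test on the numeric bindings. The following sketch evaluates such comparison explanations for the operators offered by the editor; the dispatch table is illustrative.

```python
# Sketch: evaluating a comparison explanation such as 650.35 >= 519.75.
import operator

COMPARISONS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
               "!=": operator.ne, ">=": operator.ge, ">": operator.gt}

def check_comparison(left, op, right):
    return COMPARISONS[op](left, right)

print(check_comparison(650.35, ">=", 519.75))  # True: Mike's funds cover the price
print(check_comparison(500.00, ">=", 519.75))  # False: the rule would not apply
```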
In general, to define an explanation containing a comparison you have to follow the
steps from Operation 9.8.

Operation 9.8. Define explanations containing comparisons


 In the Scenario workspace, during rule learning, in the Reasoning Hierarchy viewer,
select a reasoning step by clicking on the corresponding Q/A node.
 In the right panel, click on the Learn Condition button.
 In the “Possible explanations” pane, click on the Create New… button.
 The agent opens a window for defining an explanation as an object-feature-value
triplet (see the bottom of Figure 9.29).
 In the left editor, type the number corresponding to the left side of the comparison
(e.g., “650.35” in Figure 9.29) and select it from the completion pop-up.

Figure 9.30. Learned rule with a condition containing a comparison.


 In the center editor, type the comparison operator (<, <=, =, !=, >=, or >).
 In the right editor, type the number corresponding to the right side of the comparison.
 Click on the OK button in the Create explanation window to accept the explanation.

9.12.5 Hands On: Explanations with Functions and Comparisons


The objective of this case study is to learn how to define explanations involving functions
(as discussed in Section 9.12.3) and comparisons (as discussed in Section 9.12.4).
Start Disciple-EBR, select the case study knowledge base “12-Explanations/Scen,” and
proceed as indicated in the instructions at the bottom of the opened window.

9.13 GUIDELINES FOR RULE AND HYPOTHESIS LEARNING

Guideline 9.1. Properly identify all the entities in the example before
starting rule learning
Before starting rule learning, make sure that all the elements are properly recognized as
instances, numbers, symbolic intervals, or strings. This is important because only the
entities with one of these types will be replaced with variables as part of rule learning, as
shown in the top part of Figure 9.31. Recognizing concepts is also recommended, but it is
optional, since concepts are not generalized. However, recognizing them helps the agent
in explanation generation.
Notice the case from the middle part of Figure 9.31. Because “the United States”
appears as text (in black) and not as an instance (in blue), it will not be replaced with a
variable in the learned rule. A similar case is shown at the bottom of Figure 9.31. Because
600.0 is not recognized as a number (in green), it will appear as such in the learned rule
instead of being generalized to a variable.

Instances and numbers recognized (Apple iPad, 600.0, the United States):
    The price of Apple iPad is under 600.0 dollars in the United States.
    -> The price of ?O1 is under ?N1 dollars in ?O2.

"the United States" left as plain text (not recognized as an instance):
    The price of Apple iPad is under 600.0 dollars in the United States.
    -> The price of ?O1 is under ?N1 dollars in the United States.

600.0 left as plain text (not recognized as a number):
    The price of Apple iPad is under 600.0 dollars in the United States.
    -> The price of ?O1 is under 600.0 dollars in the United States.

Figure 9.31. Variables generation during rule learning.


Guideline 9.2. Avoid learning from examples that are too specific
It is important to teach the agent with good examples from which it can learn general
rules. A poor example is illustrated in the upper-left part of Figure 9.32. In this case, the
amount of money that Mike has is the same as the price of the Apple iPad 16GB. As a result,
both occurrences of 519.75 are generalized to the same variable ?N1, and the agent will
learn a rule that will apply only to cases where the amount of money of the buyer is exactly
the same as the price of the product (see the upper-right part of Figure 9.32).
You need instead to teach the agent with an example where the numbers are different,
such as the one from the bottom-left part of Figure 9.32. In this case, the agent will
generalize the two numbers to two different variables. Notice that the learned rule will
also apply to cases where ?N1 = ?N2.
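The effect described in this guideline follows directly from how variables are generated: identical entities are always mapped to the same variable. The sketch below (an illustrative mapping function, not the Disciple-EBR implementation) shows why the first example yields a single number variable while the second yields two.

```python
# Sketch: each distinct entity gets one variable; repeated entities reuse it.

def variabilize(entities):
    mapping, counters = {}, {"O": 0, "N": 0}
    for e in entities:
        if e not in mapping:
            kind = "N" if isinstance(e, (int, float)) else "O"
            counters[kind] += 1
            mapping[e] = f"?{kind}{counters[kind]}"
    return mapping

print(variabilize(["Mike", "Apple iPad 16GB", 519.75, 519.75]))
# {'Mike': '?O1', 'Apple iPad 16GB': '?O2', 519.75: '?N1'}   -> one shared ?N1
print(variabilize(["Bob", "Apple iPad 16GB", 620.25, 519.75]))
# {'Bob': '?O1', 'Apple iPad 16GB': '?O2', 620.25: '?N1', 519.75: '?N2'}
```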

Guideline 9.3. Use modeling-based ontology extension before starting rule learning
Let us consider the reduction step from the left-hand side of Figure 9.33.
If you have used modeling-based ontology specification, then the ontology contains the
basic explanation piece of this reduction step:
John Doe has as number of publications 53

Therefore, before starting rule learning, review the modeling and check that you have
defined the features suggested by the Q/A pair in the ontology.
What are the other explanation pieces for this reduction step? You will need to define
two additional explanation pieces involving comparisons, as well as explanation pieces
fixing the values 41, 53, and very likely, as shown in the left-hand side of Figure 9.34. The
learned rule is shown in the right-hand side of Figure 9.34.

Guideline 9.4. Carefully define the domains and the ranges of the features
Thoughtfully defined features significantly simplify and speed up the rule-learning process
because their domains and ranges are used in determining the minimal and maximal
generalizations of the examples, as discussed in Sections 9.7.2 and 9.7.3.

Example with identical numbers (both generalized to the same variable):
    Does Mike have enough money to buy Apple iPad 16GB?
    Yes, because Mike has 519.75 dollars and Apple iPad 16GB costs 519.75 dollars.
    -> Does ?O1 have enough money to buy ?O2?
       Yes, because ?O1 has ?N1 dollars and ?O2 costs ?N1 dollars.

Example with different numbers (generalized to two variables):
    Does Bob have enough money to buy Apple iPad 16GB?
    Yes, because Bob has 620.25 dollars and Apple iPad 16GB costs 519.75 dollars.
    -> Does ?O1 have enough money to buy ?O2?
       Yes, because ?O1 has ?N1 dollars and ?O2 costs ?N2 dollars.

Figure 9.32. Variable generation for identical entities.


Figure 9.33. Specification of the ontology knowledge for learning.

Figure 9.34. Another example of rule learning.

When you define a new feature, make sure you define both the domain and the range.
Do not leave the ones generated automatically. The automatically generated range (Any
Element) is too general, and features with that range are not even used in explanations.
Therefore, make sure that you select a more specific domain and range, such as a concept,
a number interval, or a symbolic interval.
If you have already defined facts involving that feature, you need to remove them first
before you can change the domain and the range of the feature.


Guideline 9.5. Provide hints to guide explanation generation


You may guide explanation generation by selecting one or several entities from the
example, asking the agent to propose only plausible explanation pieces involving those
entities, as was discussed in Section 9.12.1. This is a very effective way of reducing the
combinatorial explosion of explanation generation.

Guideline 9.6. Avoid learning rules without explanations


When you invoke Learn Condition and then Finish without selecting any explanation, the
agent automatically generates a plausible version space rule with a very general upper
bound condition. The rule is then automatically applied to reduce problems. There is a
danger that the rule will generate too many reductions, which may even block the
system. To alleviate this situation, the current implementation of Disciple-EBR uses the
following heuristic:
Generalization heuristic: When no explanation is provided for a reduction step, the
upper bound of the learned rule is obtained by generalizing each instance not to “object,” but
to a direct child of it.
Even with this heuristic, the plausible upper bound condition is still very general. To
alleviate this situation further, consider selecting explanations of the form “<specific
instance> is <concept>,” such as “John Doe is PhD advisor.” You can always provide
such explanations that limit the possible values in similar reductions to those that
make sense.
An additional explanation type to consider is “<generic instance> is <generic
instance>,” such as “Artificial Intelligence is Artificial Intelligence.” However, make sure that
such an explanation makes sense. Do you want “Artificial Intelligence” in all the similar
reductions? If yes, then this explanation is necessary. Otherwise, use “Artificial Intelligence
is <some concept>.”
Yet another explanation type to consider is, “The value is specifically <some value>,”
such as, “The value is specifically 5.0,” or, “The value is specifically certain.” However,
make sure that these explanations make sense. Do you want 5.0 in all the similar reduc-
tions? Do you want “certain” in all the similar reductions?

Guideline 9.7. Recognize concepts in the reasoning tree


When editing a node in the reasoning tree, Disciple-EBR proposes completions of the
current word based on the names of the instances, concepts, and symbolic probabilities
from the ontology. Selecting one of the proposed completions introduces it into the edited
statement with a color characteristic to its type (e.g., dark blue for concepts).
Even if the concepts were not recognized at the time of the development of the
reasoning tree, they can later be recognized automatically by right-clicking on the top
hypothesis and selecting Recognize Concepts in Tree. However, only the concepts that
appear exactly as in the ontology will be recognized, and only in the nodes that have not
yet been formalized.
Having the concepts recognized in the reasoning tree helps the agent in identifying the
relevant explanation pieces during the explanation generation process.


Operation 9.9. Recognize concepts in the reasoning tree


 In the Evidence workspace, click on the Reasoner menu at the top of the window.
 In the right panel, right-click on the top hypothesis and select Recognize Concepts
in Tree.

9.14 PROJECT ASSIGNMENT 6

Learn rules from the reasoning trees developed in the previous project assignments.

9.15 REVIEW QUESTIONS

9.1. Define the rule-learning problem.

9.2. What are the basic steps of rule learning?

9.3. Consider the following expression, where both Jane Austin and Bob Sharp are
specific instances:

?O1 is Jane Austin


?O2 is Bob Sharp
has as advisor ?O1

Find its minimal generalization that does not contain any instance, in the context
of the ontological knowledge from Figure 9.35. Find also its maximal generalization.
9.4. Consider the following explanation of a reduction:

Dan Smith plans to retire from George Mason University.

(a) Reformulate this explanation as a concept with variables.


(b) Determine the minimal generalization of the concept, in the context of the
ontology from Figure 9.36, where all the instances are specific instances.
(c) Determine the maximal generalization of the concept.

9.5. Consider the following explanation of a reduction:

Jane Austin plans to move to Indiana University.

(a) Reformulate this explanation as a concept with variables.


(b) Determine the minimal generalization of the concept, in the context of the
ontology from Figure 9.36, where all the instances are specific instances.
(c) Determine the maximal generalization of the concept.

9.6. Consider the ontological knowledge from Figure 9.37, where Dana Jones, Rutgers
University, and Indiana University are specific instances.
(a) What are the minimal generalization and the maximal generalization of the
following expression?

?O1 is Dana Jones


plans to move to ?O2
?O2 is Indiana University


Figure 9.35. Ontology fragment (including the feature has as advisor, with domain student and range faculty member).

Figure 9.36. Ontology fragment (including the features plans to retire from and plans to move to, each with domain person and range organization).


Figure 9.37. Ontology fragment (including the features is faculty member at, with domain faculty member and range university, and plans to move to, with domain person and range organization).

Figure 9.38. A reduction example and its explanation.

(b) What are the minimal generalization and the maximal generalization of the
following expression?

?O1 is Dana Jones


?O2 is Indiana University

9.7. Consider the example problem reduction and its explanation from Figure 9.38.
Which is the specific rule condition covering only this example? What rule will be
learned from this example and its explanation, assuming the ontology fragment


from Figure 9.39? What general problem will be learned from the specific IF
problem of this reduction?
Notice that some of the instances are specific (e.g., Aum Shinrikyo and Masami
Tsuchiya), while others are generic (e.g., chemistry).

9.8. Consider the problem reduction example and its explanation from Figure 9.40.
Which is the specific rule covering only this example? What rule will be learned
from this example and its explanation, assuming the ontology fragment from
Figure 9.41? What general problems will be learned from this example? Assume
that all the instances are specific instances.

has as member:           domain organization,   range person
has master degree in:    domain person,         range expertise area

Figure 9.39. Ontology fragment.

Figure 9.40. A problem reduction example and its explanation.


Figure 9.41. Ontology fragment (including the features is testimony by, with domain evidence and range source, and is testimony about, with domain evidence and range evidence).

9.9. Compare the rule-learning process with the traditional knowledge acquisition
approach, where a knowledge engineer defines such a rule by interacting with
a subject matter expert. Identify as many similarities and differences as possible,
and justify the relative strengths and weaknesses of the two approaches, but be as
concise as possible.

10 Rule Refinement

10.1 INCREMENTAL RULE REFINEMENT

10.1.1 The Rule Refinement Problem


The triggering event for rule refinement is the discovery of a new positive or negative
example of the rule. There are several possible origins for such an example:

 It is generated by the agent during its regular problem solving


 It is generated by the agent through active experimentation for rule testing
 It is provided by the expert
 It is obtained from an external source (e.g., a repository of examples)

Regardless of the origin of the example, the goal of the agent is to refine the rule to be
consistent with the example. A possible effect of rule refinement is the extension of the
ontology.
The rule refinement problem is defined in Table 10.1 and an overview of the rule
refinement method is presented in the next section.

Table 10.1 The Rule Refinement Problem

GIVEN
 A plausible version space reduction rule
 A positive or a negative example of the rule (i.e., a correct or an incorrect reduction)
 A knowledge base that includes an ontology and a set of (previously learned) reduction rules
 An expert who will interact with the agent, helping it understand why the example is positive
(correct) or negative (incorrect)

DETERMINE
 An improved rule that covers the example if it is positive, and does not cover the example if it is
negative
 An extended ontology, if this is needed for rule refinement


10.1.2 Overview of the Rule Refinement Method


Figure 10.1 shows an overview of rule refinement during problem solving. The agent
applies existing partially learned rules to solve the current problem or assess the current
hypothesis. As discussed in Section 9.7, the reductions generated by a rule R are analogous
to the reduction from which the rule was initially learned. Thus this process can be
understood as learning by analogy and experimentation, as indicated in the top part
of Figure 10.1.
The subject matter expert has to verify the generated reduction and accept it if it is
correct, or reject it otherwise. In both cases, the rule will be improved to become
consistent with this new example. We therefore refer to this learning phase as learning
from examples, as indicated in the bottom part of Figure 10.1.
If the generated reduction is correct, then it is a new positive example of the rule, which
may need to be generalized to cover it, by minimally generalizing its applicability condi-
tion. There are various ways to generalize the applicability condition, depending on the
position of the positive example with respect to it. For instance, if the example is covered
by the plausible upper bound of the main condition (the light green ellipse), but it is not
covered by the plausible lower bound (the dark green ellipse), then the plausible lower
bound is generalized as little as possible to cover this new positive example, while still
remaining less general than (or included into) the plausible upper bound.
If the generated example is incorrect, then the expert has to interact with the agent to
help it understand why the reduction is wrong. This interaction is similar to that taking
place during the explanation generation phase of rule learning (see Section 9.5). The
identified failure explanation will be used to refine the rule. We therefore refer to this
phase as learning from explanations, as indicated in the left part of Figure 10.1.
There are various strategies to refine the rule based on a negative example and its
failure explanation (if identified), as illustrated in the top part of Figure 10.1. For instance,
if the negative example is covered only by the upper bound of the main condition, then
this bound is minimally specialized to no longer cover the negative example, while still
remaining more general than the plausible lower bound. Or, a new Except-When plausible

Figure 10.1. Multistrategy rule refinement.


version space condition may be learned, starting from that negative example and its failure
explanation. This plausible version space Except-When condition is represented by the red
ellipses at the top of Figure 10.1.
The refined rule is shown in the right-hand side of Figure 10.1. The applicability
condition of a partially learned rule consists of a main applicability condition and zero,
one, or more Except-When conditions. The way the rule is refined based on a new
example depends on the type of the example (i.e., positive or negative), on its position
with respect to the current conditions of the rule, and on the type of the explanation of
the example (if identified). The refinement strategies will be discussed in more detail in
the next sections by considering a rule with a main condition and an Except-When
condition, as shown in Figure 10.2. We will consider all nine possible positions of the
example with respect to the bounds of these conditions. Notice that the presented
methods apply similarly when there is no Except-When condition or when there is more
than one Except-When condition.
We will first illustrate rule refinement with a positive example and then we will present
the general method.

10.1.3 Rule Refinement with Positive Examples


10.1.3.1 Illustration of Rule Refinement with a Positive Example
The upper-left part of Figure 10.3 shows an example generated by the agent based on the
partially learned rule shown in the right-hand side of the figure. This example is generated
because it satisfies the plausible upper bound condition of the rule. Indeed, the condition
corresponding to this example, shown in the bottom-left part of Figure 10.3, is covered by the
plausible upper bound condition: Bob Sharp is a person, Dan Smith is a person, Information
Security is an area of expertise, and certain is in the interval [certain – certain]. Additionally,
the entities from the example are in the relationships from the rule’s condition, that is,
“Bob Sharp is interested in Information Security,” and “Dan Smith is expert in Information
Security.”
Notice also that the example is not covered by the plausible lower bound condition of
the rule because Dan Smith (corresponding to ?O2) is a full professor, and Information

Figure 10.2. Partially learned condition and various positions of a new example (MU and ML denote the plausible upper and lower bounds of the main condition; XU and XL denote the plausible upper and lower bounds of the Except-When condition).

Condition corresponding to the example (a positive example that satisfies the upper bound but not the lower bound of the rule to be refined):
    ?O1   is Bob Sharp
          is interested in ?O3
    ?O2   is Dan Smith
          is expert in ?O3
    ?O3   is Information Security
    ?SI1  is-exactly certain

Figure 10.3. Positive example generated by the agent.



Security (corresponding to ?O3) is different from Artificial Intelligence. Therefore, this


example corresponds to Case 2 in Figure 10.2.
Because the example is accepted as correct by the expert, the agent will minimally
generalize the plausible lower bound condition of the rule to cover it, while still keeping
this condition less general than the plausible upper bound condition, as shown in
Figure 10.4.
Notice that, with respect to the variable ?O3, Artificial Intelligence and Information
Security are generalized to Computer Science, which is the least general concept covering
both of them, and is less general than area of expertise, the corresponding concept from
the plausible upper bound. With respect to the variable ?O2, PhD advisor does not need to
be generalized because it already covers Dan Smith. However, associate professor is
minimally generalized to professor to cover Dan Smith, which is a full professor. This is
possible because professor is less general than person, the concept from the plausible
upper bound. The refined rule is shown in Figure 10.5.

10.1.3.2 The Method of Rule Refinement with a Positive Example


The general rule refinement method illustrated by the preceding example (i.e., Case 2 in
Figure 10.2) is presented in Table 10.2. It is similar to the part of the Candidate Elimination
Algorithm (Mitchell, 1997) corresponding to the treatment of a positive example.
The method described in Table 10.2 corresponds to the more complex case where the
entity EX from the example is an instance, such as Bob Sharp or Information Security, and
the “climbing the generalization hierarchies” rule is used (see Section 8.3.3). Similar
methods apply when the entity EX corresponds to a number, string, or symbolic probabil-
ity, except that other generalization rules are used, as discussed in Section 8.3.
One of the most difficult problems in computing a good generalization of some
expressions is to establish the objects to be matched (Kodratoff and Ganascia, 1986). This

Generalized plausible lower bound condition:
    ?O1   is PhD student
          is interested in ?O3
    ?O2   is (PhD advisor, professor)
          is expert in ?O3
    ?O3   is Computer Science
    ?SI1  is-exactly certain

Condition corresponding to the example:
    ?O1   is Bob Sharp
          is interested in ?O3
    ?O2   is Dan Smith
          is expert in ?O3
    ?O3   is Information Security
    ?SI1  is-exactly certain

Rule's plausible lower bound condition:
    ?O1   is PhD student
          is interested in ?O3
    ?O2   is (PhD advisor, associate professor)
          is expert in ?O3
    ?O3   is Artificial Intelligence
    ?SI1  is-exactly certain

Figure 10.4. Minimal generalization of the rule's plausible lower bound condition.


Table 10.2 The Example-based Incremental Inductive Generalization Method

Let R be a plausible version space rule, U its main plausible upper bound condition, L its main plausible
lower bound condition, and P a positive example of R covered by U and not covered by L.

Repeat for each variable ?X from U


 Let the concept corresponding to ?X from U be UX = {u1 . . . um}.
 Let the concept corresponding to ?X from L be LX = {l1 . . . ln}.
Each concept ui from UX is a maximal generalization of all the known positive examples of ?X
that does not cover any of the known negative examples of ?X and is more general than (or
as general as) at least one concept lk from LX. Also, ui is not covered by any other element uk
of UX.
Each concept li is a minimal generalization of the known positive examples of ?X that does
not cover any of the known negative examples of ?X and is less general than (or as general
as) at least one concept uk from UX. Also li does not cover any other element lk of LX.
 Let EX be the new positive example of ?X from the rule’s example P. EX is covered by at least
one element ui of UX.
 Remove from UX any element that does not cover EX.
Repeat for each li from LX that does not cover EX
 Remove li from LX.
 Add to LX all minimal generalizations of li and EX that are less general or at most as general
as an element of UX.
 Remove from LX all the elements that are more general than or as general as other
elements from LX.
end

end

Determine Pg, the minimal generalization of the example P (see Section 9.9).

Return the generalized rule R with the updated conditions U and L, and Pg in the list of
generalized examples of R.
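The lower bound update for one variable can be sketched as follows, using the illustrative hierarchy from this chapter; the single-parent simplification and the helper names are assumptions of the sketch, not the full method of Table 10.2.

```python
# Sketch: minimally generalize the elements of a lower bound so that they
# cover a new positive example, while staying under the upper bound.

SUBCONCEPT_OF = {  # child -> parent (illustrative, single parent per concept)
    "associate professor": "professor", "full professor": "professor",
    "professor": "faculty member", "PhD advisor": "faculty member",
    "faculty member": "person", "person": "object",
}
INSTANCE_OF = {"Dan Smith": ["full professor", "PhD advisor"]}

def covers(concept, entity):
    """True if the entity (instance or concept) lies under the given concept."""
    frontier = list(INSTANCE_OF.get(entity, [entity]))
    while frontier:
        c = frontier.pop()
        if c == concept:
            return True
        if c in SUBCONCEPT_OF:
            frontier.append(SUBCONCEPT_OF[c])
    return False

def generalize_lower_bound(lower_bound, example, upper_bound):
    """Climb the hierarchy one parent at a time until each concept covers the
    example; keep only the results that remain under the upper bound."""
    result = []
    for concept in lower_bound:
        while not covers(concept, example):
            concept = SUBCONCEPT_OF[concept]
        if covers(upper_bound, concept):
            result.append(concept)
    return result

# ?O2: lower bound (PhD advisor, associate professor), new positive example Dan Smith
print(generalize_lower_bound(["PhD advisor", "associate professor"], "Dan Smith", "person"))
# ['PhD advisor', 'professor']  -- the generalization shown in Figure 10.4
```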

Figure 10.5. Rule generalization based on a positive example.


problem, however, is trivial in Disciple-EBR because both the plausible lower bound
condition and the condition corresponding to the example have exactly the same struc-
ture, and the corresponding variables have the same names, as shown in Figure 10.5. This
is a direct consequence of the fact that the example is generated from the plausible upper
bound condition of the rule.

10.1.3.3 Summary of Rule Refinement with a Positive Example


The general method for rule refinement with a positive example, covering all the nine
cases from Figure 10.2, is presented in Table 10.3. Notice that a positive example covered
by an Except-When condition leads to the specialization of that condition to no longer
cover it. This is similar to the treatment of a negative example covered by the main
applicability condition, which is discussed in the next section.
Next we will illustrate rule refinement with negative examples, and then we will present
the general refinement method.

10.1.4 Rule Refinement with Negative Examples


10.1.4.1 Illustration of Rule Refinement with Except-When Conditions
The left side of Figure 10.6 shows an example generated by the agent based on the partially
learned rule shown in the right-hand side of Figure 10.6. This example is generated
because it satisfies the plausible upper bound condition of the rule. However, the example
is rejected as incorrect by the expert, with the following explanation:

Dan Smith plans to retire from George Mason University

Based on this failure explanation, the agent generates an Except-When plausible version
space condition by applying the method described in Sections 9.6 and 9.7. First it reformu-
lates the explanation as a specific condition by using the corresponding variables from the
rule, or by generating new variables (see also the bottom-left part of Figure 10.7):

?O1 is Dan Smith


plans to retire from ?O2
?O2 is George Mason University

Then the agent generates a plausible version space by determining maximal and minimal
generalizations of the preceding condition. Finally, the agent adds it to the rule as an
Except-When plausible version space condition, as shown in the bottom-right part of
Figure 10.7. The Except-When condition should not be satisfied to apply the rule. Thus, in
order to conclude that a professor will stay on the faculty for the duration of the disserta-
tion of a student, the professor should have a long-term position (the main condition) and
it should not be the case that the professor plans to retire from the university (the Except-
When condition).
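In procedural terms, each Except-When condition acts as a guard that blocks the rule. The following minimal Python sketch, with purely illustrative names and data, shows how the main condition and the Except-When conditions combine when the agent decides whether a rule applies.

# A rule applies when its main condition holds and none of its
# Except-When conditions hold for the same instantiation.

def rule_applies(instantiation, main_condition, except_when_conditions):
    if not main_condition(instantiation):
        return False
    return not any(condition(instantiation) for condition in except_when_conditions)

# Illustrative facts about a candidate advisor (hypothetical data).
candidate = {"position": "tenured position", "plans_to_retire": False}

main = lambda c: c["position"] in ("tenured position", "long-term faculty position")
except_when = [lambda c: c["plans_to_retire"]]

# True: the professor has a long-term position and does not plan to retire.
print(rule_applies(candidate, main, except_when))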
Figure 10.8 shows the further refinement of the rule with an additional negative
example. This example satisfies the rule in Figure 10.7. Indeed, Jane Austin has a long-term
position and she does not plan to retire from George Mason University. Nevertheless, the
expert rejects the reasoning represented by this example because Jane Austin plans to
move to Indiana University. Therefore, she will not stay on the faculty of George Mason
University for the duration of the dissertation of Bob Sharp.


Table 10.3 Rule Refinement with a Positive Example


(Each case is also depicted by a diagram of the bounds MU, ML, XU, and XL.)

1. If the positive example E is covered by ML and is not covered by XU (case 1), then the rule does not need to be refined because the example is correctly classified as positive by the current rule.

2. If E is covered by MU, is not covered by ML, and is not covered by XU (case 2), then minimally generalize ML to cover E and remain less general than MU. Remove from MU the elements that do not cover E.

3. If E is not covered by MU (cases 3 and 5), or if E is covered by XL (cases 5, 6, and 7), then keep E as a positive exception of the rule. Alternatively, learn a new rule starting from this example.

4. If E is not covered by MU but is covered by XU (case 4), then E is kept as a positive exception (or a new rule is learned). Additionally, interact with the expert to find an explanation of the form, "The problem reduction step is correct because Ii is Ci," where Ii is an instance from the example E and Ci is a concept from the ontology. If such an explanation is found, then XU is minimally specialized to no longer cover E.

5. If E is covered by ML and XU, but it is not covered by XL (case 8), then interact with the expert to find an explanation of the form, "The problem reduction step is correct because Ii is Ci," where Ii is an instance from the example E and Ci is a concept from the ontology. If such an explanation is found, then XU is minimally specialized to no longer cover E. Otherwise, E is kept as a positive exception.

6. If E is covered by MU and XU, it is not covered by ML, and it is not covered by XL (case 9), then minimally generalize ML to cover E and remain less general than MU. Also remove from MU the elements that do not cover E. Then interact with the expert to find an explanation of the form, "The problem reduction step is correct because Ii is Ci," where Ii is an instance from the example E and Ci is a concept from the ontology. If such an explanation is found, then XU is minimally specialized to no longer cover E. Otherwise, E is kept as a positive exception.
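The case analysis of Table 10.3 amounts to a dispatch on which bounds cover the positive example. The following Python sketch expresses that dispatch with each bound represented as a coverage predicate; this is an illustrative simplification, not the Disciple-EBR data structures.

def refine_with_positive(example, MU, ML, XU, XL):
    """Return the action suggested by Table 10.3 for a positive example."""
    in_MU, in_ML, in_XU, in_XL = (bound(example) for bound in (MU, ML, XU, XL))
    if in_ML and not in_XU:
        return "no refinement needed"                                          # case 1
    if in_MU and not in_ML and not in_XU:
        return "generalize ML to cover the example; prune MU"                  # case 2
    if (not in_MU and not in_XU) or in_XL:
        return "keep as positive exception (or learn a new rule)"              # cases 3, 5, 6, 7
    if not in_MU and in_XU:
        return "positive exception; try to specialize XU with an explanation"  # case 4
    if in_ML and in_XU and not in_XL:
        return "specialize XU with an explanation, else positive exception"    # case 8
    return "generalize ML, prune MU, then specialize XU with an explanation"   # case 9

# Each bound is modeled as a predicate over instances (illustrative data).
covered_by = lambda instances: (lambda e: e in instances)
print(refine_with_positive("John Doe",
                           MU=covered_by({"John Doe", "Jane Austin"}),
                           ML=covered_by({"John Doe"}),
                           XU=covered_by({"Dan Smith"}),
                           XL=covered_by(set())))     # -> no refinement needed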

As in the case of the previous negative example, the failure explanation

Jane Austin plans to move to Indiana University

is rewritten as a specific condition

?O1 is Jane Austin


plans to move to ?O5
?O5 is Indiana University

Figure 10.6. Negative example generated by the agent. (The figure shows, side by side, the negative example generated by the rule and the rule that generated it, together with the failure explanation: Dan Smith plans to retire from George Mason University.)


Figure 10.7. Rule specialization with a negative example. (The failure explanation, "Dan Smith plans to retire from George Mason University," is rewritten as the specific Except-When condition ?O1 is Dan Smith, plans to retire from ?O2; ?O2 is George Mason University, which is then generalized and added to the rule, producing the specialized rule.)


Figure 10.8. Rule refinement with an additional negative example. (The failure explanation, "Jane Austin plans to move to Indiana University," is rewritten as the specific Except-When condition ?O1 is Jane Austin, plans to move to ?O5; ?O5 is Indiana University, from which maximal and minimal generalizations are computed to form a second Except-When plausible version space condition of the specialized rule.)


10.1. Incremental Rule Refinement 305

Notice that the agent has introduced a new variable ?O5 because Indiana University does
not correspond to any entity from the previous form of the rule (as opposed to Jane Austin
who corresponds to ?O1).
Then the agent generates a plausible version space by determining maximal and
minimal generalizations of the preceding condition. Finally, the agent adds it to the rule
as an additional Except-When plausible version space condition, as shown at the bottom-
right part of Figure 10.8.

10.1.4.2 The Method of Rule Refinement with Except-When Conditions


The basic steps for learning an Except-When condition based on the explanation of a
negative example are summarized in Table 10.4.

10.1.4.3 Illustration of Rule Refinement through Condition Specialization
The preceding two sections presented the refinement of a rule through the addition of
Except-When plausible version space conditions. This section will illustrate rule refinement
through the specialization of the upper bound of the main applicability condition of the rule.
Figure 10.9 illustrates the mixed-initiative development of a reasoning tree. The expert
formulated the hypothesis:

Jill Knox would be a good PhD advisor for Peter Jones.

Then the agent developed a partial reasoning tree, but it was unable to assess one of the
subhypotheses:

Jill Knox will stay on the faculty of George Mason University for the duration of the
dissertation of Peter Jones.

Table 10.4 Basic Steps for Learning an Except-When Condition

Let R be a plausible version space rule, N an instance of R rejected by the expert as an incorrect
reasoning step (a negative example of R), and EX an explanation of why N is incorrect (a failure
explanation).
(1) Reformulation of the Failure Explanation
Generate a new variable for each instance and each constant (i.e., number, string, or symbolic
probability) that appears in the failure explanation EX but does not appear in the negative
example N. Use the new variables and the rule's variables to reformulate the failure explanation EX
as an instance I of the concept EC representing an Except-When condition of the rule R.
(2) Analogy-based Generalizations of the Failure Explanation
Generate the plausible upper bound XU of the concept EC as the maximal generalization of I in the
context of the agent's ontology.
Generate the plausible lower bound XL of the concept EC as the minimal generalization of I that
does not contain any specific instance.
(3) Rule Refinement with an Except-When Plausible Version Space Condition
Add an Except-When plausible version space condition (XU, XL) to the existing conditions of the
rule R. This condition should not be satisfied for the rule to be applicable in a given situation.
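The following Python sketch illustrates step (1) of this method: the failure explanation is rewritten by reusing the rule variables bound in the rejected example and by generating a new variable for each entity that does not appear in it. The data structures and names are illustrative assumptions, not the internal Disciple-EBR representation.

def reformulate_explanation(explanation_triples, example_bindings):
    """Rewrite a failure explanation as a specific Except-When condition.

    explanation_triples: list of (entity, feature, value) statements.
    example_bindings: mapping from the rule's variables (?O1, ...) to the
    instances they denote in the rejected (negative) example.
    Assumes the rule's variables are numbered consecutively ?O1 ... ?On.
    """
    instance_to_var = {inst: var for var, inst in example_bindings.items()}
    next_index = len(example_bindings) + 1

    def variable_for(entity):
        nonlocal next_index
        if entity not in instance_to_var:
            instance_to_var[entity] = f"?O{next_index}"   # new variable
            next_index += 1
        return instance_to_var[entity]

    return [(variable_for(subject), feature, variable_for(value))
            for subject, feature, value in explanation_triples]

explanation = [("Jane Austin", "plans to move to", "Indiana University")]
bindings = {"?O1": "Jane Austin", "?O2": "George Mason University",
            "?O3": "Bob Sharp", "?O4": "tenured position"}
# Indiana University does not appear in the example, so it gets the new variable ?O5.
print(reformulate_explanation(explanation, bindings))
# -> [('?O1', 'plans to move to', '?O5')]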


Therefore, the expert defined the reduction of this hypothesis, which includes its assess-
ment, as shown at the bottom of Figure 10.9.
Based on this example, the agent learned a general rule, as illustrated in the right-hand
side of Figure 10.9 and as discussed in Chapter 9. The rule is shown in Figure 10.10.

Figure 10.9. Problem solving, modeling, and learning. (The figure depicts the cycle: 1. Solving: the agent applies learned rules to solve new problems; 2. Modeling: the expert extends the reasoning tree; 3. Learning: the agent learns a new rule.)

Figure 10.10. Partially learned rule.


This and the other learned rules enabled the agent to develop the reasoning tree from
Figure 10.11 for assessing the following hypothesis:

Bill Bones would be a good PhD advisor for June Allison.

However, the expert rejected the bottom reasoning step as incorrect. Indeed, the correct answer to the question, "Is Bill Bones likely to stay on the faculty of George Mason University for the duration of the PhD dissertation of June Allison?" is "No," not "Yes," because there is no support for Bill Bones getting tenure.
The user–agent interaction during example understanding is illustrated in Figure 10.12.
The agent identified an entity in the example (the symbolic probability “no support”) that
would enable it to specialize the upper bound of the main condition of the rule to no
longer cover the negative example. Therefore, it proposed the failure explanation shown at
the right-hand side of Figure 10.12:

The example is incorrect because of the value no support.

The expert accepted this explanation by clicking on OK, and the rule was automatically
specialized as indicated in Figure 10.13. More precisely, the upper bound of the main
condition for the variable ?SI1 was minimally specialized from the interval [no support –
certain] to the interval [likely – certain], in order to no longer cover the value no support,
while continuing to cover the interval representing the lower bound, which is [almost
certain – almost certain].
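The following Python sketch illustrates this kind of minimal specialization for an interval over an ordered scale of symbolic probabilities. The five-value scale and the function names are assumptions made only for illustration.

# Minimally specialize a symbolic-probability interval so that it no longer
# covers a blamed value, while still covering the lower bound interval.

SCALE = ["no support", "likely", "very likely", "almost certain", "certain"]

def specialize_interval(upper, lower, blamed):
    """upper, lower: (min, max) intervals over SCALE; blamed: value to exclude."""
    lo, hi = (SCALE.index(v) for v in upper)
    keep_lo, keep_hi = (SCALE.index(v) for v in lower)
    blamed_i = SCALE.index(blamed)
    if not (lo <= blamed_i <= hi):
        return upper                          # the value is already excluded
    if blamed_i < keep_lo:
        lo = blamed_i + 1                     # raise the minimum just above the blamed value
    elif blamed_i > keep_hi:
        hi = blamed_i - 1                     # lower the maximum just below the blamed value
    else:
        raise ValueError("the blamed value is covered by the lower bound; keep the example as an exception")
    return (SCALE[lo], SCALE[hi])

print(specialize_interval(("no support", "certain"),
                          ("almost certain", "almost certain"),
                          "no support"))
# -> ('likely', 'certain'), as in Figure 10.13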

10.1.4.4 The Method of Rule Refinement through Condition Specialization
The previous example illustrated the specialization of an upper bound of a condition with
a negative example. The corresponding general method is presented in Table 10.5.

Figure 10.11. Solving, critiquing, and rule refinement. (The figure depicts the continuation of the cycle: 3. Solving: the agent applies learned rules to solve new problems; 4. Critiquing: the expert rejects a reduction as incorrect because of the "no support" probability; 5. Refinement: the agent refines the rule with the negative example.)


Figure 10.12. Explaining why the problem reduction is incorrect.

Figure 10.13. Specialization of the upper bound of a plausible version space condition. (The figure shows the specialized rule and the failure explanation; the upper bound interval for ?SI1 is minimally specialized from [no support – certain] to [likely – certain].)

10.1.4.5 Summary of Rule Refinement with a Negative Example


We have illustrated only some of the nine cases of negative examples from Figure 10.2
(p. 296). The general method for rule refinement with a negative example, covering all the
cases, is presented in Table 10.6.


Table 10.5 The Example-based Incremental Inductive Specialization Method

Let R be a plausible version space rule, U the plausible upper bound of the main condition, L the
plausible lower bound of the main condition, N a negative example covered by U and not covered
by L, and C an entity from N that is blamed for the failure.
1. Let ?X be the variable from the rule’s conditions that corresponds to the blamed
entity C.
Let UX and LX be the classes of ?X in the two bounds.
If each concept from LX covers C
then Continue with step 2.
else Continue with step 3.
2. The rule cannot be specialized to uncover the current negative example.
The negative example N is associated with the rule as a negative exception.
Return the rule R.
3. There are concepts in LX that do not cover C. The rule can be specialized to uncover
N by specializing UX, which is known to be more general than C.
3.1. Remove from LX any element that covers C.
3.2. Repeat for each element ui of UX that covers C
 Remove ui from UX.
 Add to UX all minimal specializations of ui that do not cover C and are more general than
or at least as general as a concept from LX.
 Remove from UX all the concepts that are less general than or as general as other
concepts from UX.

end
4. Return the specialized rule R.
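A self-contained Python sketch of step 3 of this method is shown below; the toy hierarchy, helper functions, and data are illustrative assumptions rather than the actual Disciple-EBR implementation.

PARENTS = {                                   # toy ontology: concept -> direct generalizations
    "assistant professor": ["professor"], "associate professor": ["professor"],
    "full professor": ["professor"], "professor": ["person"], "student": ["person"],
    "person": [],
}
CHILDREN = {}
for child, parents in PARENTS.items():
    for parent in parents:
        CHILDREN.setdefault(parent, []).append(child)

def ancestors(concept):
    out, todo = {concept}, [concept]
    while todo:
        for p in PARENTS.get(todo.pop(), []):
            if p not in out:
                out.add(p)
                todo.append(p)
    return out

def covers(general, specific):
    return general in ancestors(specific)

def specialize_upper_bound(UX, LX, C):
    """Specialize the bounds of ?X so that they no longer cover the blamed entity C."""
    LX = {l for l in LX if not covers(l, C)}                  # step 3.1
    new_UX = set()
    for u in UX:                                              # step 3.2
        if not covers(u, C):
            new_UX.add(u)
            continue
        # Minimal specializations of u: direct subconcepts that do not cover C
        # and are still more general than (or as general as) an element of LX.
        for s in CHILDREN.get(u, []):
            if not covers(s, C) and any(covers(s, l) for l in LX):
                new_UX.add(s)
    # Remove the elements that are less general than other elements of UX.
    UX = {u for u in new_UX if not any(covers(v, u) and v != u for v in new_UX)}
    return UX, LX

# The blamed entity is a student, so the upper bound 'person' is specialized to 'professor'.
print(specialize_upper_bound({"person"}, {"assistant professor"}, "student"))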

10.2 LEARNING WITH AN EVOLVING ONTOLOGY

10.2.1 The Rule Regeneration Problem


The applicability conditions of the learned rules depend on the agent's ontology, which
is used as a generalization hierarchy. When the ontology changes in significant ways,
the rules need to be updated accordingly. Let us consider that the ontology fragment
shown in Figure 9.7 (p. 258) has evolved into the one from Figure 10.14. The changed
elements are underlined. In particular, PhD advisor was moved from under professor to
above professor. A new concept, expert, was introduced between PhD advisor and
person. Additionally, the domains of the features is expert in, has as position, and plans
to retire from have been changed from person to expert, employee, and employee,
respectively. These are all changes that affect the minimal and the maximal generaliza-
tions of the examples from which the rules have been learned. Therefore, the rules
need to be updated.
In Disciple-EBR, the adaptation of the rules to a changed ontology is done by simply
automatically regenerating the rules based on the minimal generalizations of the examples and their explanations in the context of the updated ontology.


Table 10.6 Rule Refinement with a Negative Example


(Each case is also depicted by a diagram of the bounds MU, ML, XU, and XL.)

1. If the negative example N is covered by ML and it is not covered by XU (case 1), then interact with the subject matter expert to find an explanation of why N is an incorrect problem reduction step. If an explanation EX is found, then generate a new Except-When plausible version space condition and add it to the rule. Otherwise, keep N as a negative exception.

2. If N is covered by MU but it is not covered by ML and by XU (case 2), then interact with the expert to find an explanation of why N is an incorrect problem reduction step. If an explanation EX is found and it has the form "because of v," then minimally specialize MU to no longer cover N while still remaining more general than ML. Otherwise, if another type of explanation EX is found, then learn a new Except-When plausible version space condition based on it, and add it to the rule.

3. If N is not covered by MU (cases 3, 5), or it is covered by XL (cases 5, 6, 7), then the rule does not need to be refined because the example is correctly classified as negative by the current rule. If N is not covered by MU, is not covered by XL, and is covered by XU (case 4), then minimally generalize XL to cover N and remain less general than XU.

4. If N is covered by ML and by XU, but it is not covered by XL (case 8), or N is covered by MU and by XU, but it is not covered by ML and XL (case 9), then minimally generalize XL to cover N while still remaining less general than XU.

This is, in fact, the reason
why the generalized examples are maintained with each rule, as discussed in Section 9.9.
The rule regeneration problem is presented in Table 10.7. Notice that not all the
changes of an ontology lead to changes in the previously learned rules. For example,
adding a new concept that has no instance, or adding a new instance, will not affect the
previously learned rules. Also, renaming a concept or a feature in the ontology automatic-
ally renames it in the learned rules, and no additional adaptation is necessary.

10.2.2 On-Demand Rule Regeneration


Adapting the rules in response to changes in the ontology is an expensive operation if it
means regenerating all the rules after each significant change of the ontology. Therefore,
the question is: When should the rules be regenerated? The solution adopted in Disciple-
EBR is on-demand regeneration. That is, a rule is regenerated only when the agent needs
to apply it in problem solving, and only if the current version of the ontology is different
from the one when the rule was last refined or regenerated. Thus, the agent associates with


the ontology a version, and each time a rule is learned or refined, it associates the version
of the ontology with the rule. The version of the ontology is incremented each time a
significant change – that is, a change that may affect the conditions of the previously
learned rules – is made. Then, before using a rule in problem solving, the agent checks
the rule’s ontology version with the current version of the ontology. If the versions are
the same, the rule is up to date and can be used. Otherwise, the agent regenerates the
rule based on the current ontology and also updates the rule’s ontology version to the
current version of the ontology. The on-demand rule regeneration method is presented
in Table 10.8.

Table 10.7 The Rule Regeneration Problem

GIVEN
 A plausible version space reduction rule R corresponding to a version v of the ontology
 Minimal generalizations of the examples and explanations from which the rule R was learned, in
the context of the version v of the ontology
 An updated ontology with a new version v’

DETERMINE
 An updated rule that corresponds to the same generalized examples, but in the context of the
new version v’ of the ontology
 Updated minimal generalizations of the specific examples from the current scenario, if any

Figure 10.14. Evolution of the ontology fragment from Figure 9.7. (The figure shows the updated generalization hierarchy: PhD advisor is now above professor; the new concept expert lies between PhD advisor and person; and the features is expert in, has as position, and plans to retire from now have the domains expert, employee, and employee, respectively, while features such as is interested in and plans to move to keep the domain person.)


We will first illustrate the regeneration of the rules presented in the previous sections
and then provide the general regeneration method.

10.2.3 Illustration of the Rule Regeneration Method


Let us first consider the rule from the right-hand side of Figure 10.5 (p. 299). This rule was
learned from two positive examples and their explanations, based on the ontology from
Figure 9.7 (p. 258), as discussed in Section 10.1.3.1. This rule can be automatically relearned
from its generalized examples, in the context of the updated ontology, as illustrated in
Figure 10.15. The bottom part of Figure 10.15 shows the generalized examples of the rule,
which were determined in the context of the ontology from Figure 9.7 (p. 258), as discussed in Section 9.9.

Table 10.8 On-Demand Rule Regeneration

Let R be a plausible version space rule, and O the current ontology with version v.
If R’s ontology version is v, the same as the version of the current ontology O
then Return R (no regeneration is needed).
else Regenerate rule R (see Table 10.9).
Set R’s ontology version to v.
Return R
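The following Python sketch illustrates the version-stamping logic of this method; the class and function names are illustrative, and the regeneration step itself (Table 10.9) appears only as a placeholder.

class Ontology:
    def __init__(self):
        self.version = 0
    def significant_change(self):
        # e.g., moving a concept or changing a feature's domain
        self.version += 1

class Rule:
    def __init__(self, ontology_version, generalized_examples):
        self.ontology_version = ontology_version
        self.generalized_examples = generalized_examples

def regenerate_from_generalized_examples(rule, ontology):
    # Placeholder for Table 10.9: recompute the rule's bounds from its
    # generalized examples in the context of the current ontology.
    return rule

def get_rule_for_problem_solving(rule, ontology):
    if rule.ontology_version == ontology.version:
        return rule                                   # the rule is up to date
    rule = regenerate_from_generalized_examples(rule, ontology)
    rule.ontology_version = ontology.version          # stamp with the current version
    return rule

ontology = Ontology()
rule = Rule(ontology.version, generalized_examples=[])
ontology.significant_change()                         # the ontology evolves
rule = get_rule_for_problem_solving(rule, ontology)
print(rule.ontology_version == ontology.version)      # True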

Plausible Lower Bound Condition (LB) [upper-left part]:
  ?O1 is PhD student
       is interested in ?O3
  ?O2 is professor
       is expert in ?O3
  ?O3 is Computer Science
  ?SI1 is in [certain - certain]

Plausible Upper Bound Condition (UB) [upper-right part]:
  ?O1 is person
       is interested in ?O3
  ?O2 is expert
       is expert in ?O3
  ?O3 is area of expertise
  ?SI1 is in [certain - certain]

First generalized example [lower-left part] (covered positive examples: 1, covered negative examples: 0):
  ?O1 is PhD student
       is interested in ?O3
  ?O2 is PhD advisor
       is associate professor
       is expert in ?O3
  ?O3 is Artificial Intelligence
  ?SI1 is exactly certain

Second generalized example [lower-right part] (covered positive examples: 1, covered negative examples: 0):
  ?O1 is PhD student
       is interested in ?O3
  ?O2 is PhD advisor
       is full professor
       is expert in ?O3
  ?O3 is Information Security
  ?SI1 is exactly certain

The LB is the minimal generalization and the UB the maximal generalization of the two generalized examples.

Figure 10.15. The updated conditions of the rule from Figure 10.5 in the context of the ontology from Figure 10.14.


The top part of Figure 10.15 shows the updated bounds of the rule, in the
context of the updated ontology from Figure 10.14. The new plausible lower bound condi-
tion is the minimal generalization of the generalized examples, in the context of the updated
ontology. Similarly, the new plausible upper bound condition is the maximal generalization
of the generalized examples in the context of the updated ontology.
Notice that in the updated plausible lower bound condition (shown in the upper-left
part of Figure 10.15), ?O2 is now a professor, instead of a professor or PhD advisor. Indeed,
note the following expression from the first generalized example shown in the lower-left
part of Figure 10.15:

?O2 is PhD advisor [10.1]


is associate professor
is expert in ?O3

Based on the updated ontology, where associate professor is a subconcept of PhD advisor,
this expression is now equivalent to the following:

?O2 is associate professor [10.2]


is expert in ?O3

Similarly, note the following expression from the second generalized example shown in
the lower-right part of Figure 10.15:

?O2 is PhD advisor [10.3]


is full professor
is expert in ?O3

Because full professor is now a subconcept of PhD advisor, this expression is now equivalent
to the following:

?O2 is full professor [10.4]


is expert in ?O3

Then the minimal generalization of the expressions [10.2] and [10.4] is the following
expression because professor is the minimal generalization of associate professor and full
professor:

?O2 is professor [10.5]


is expert in ?O3

Also, in the updated plausible upper bound condition (shown in the upper-right part of Figure 10.15), ?O2 is an expert instead of a person, because expert is the maximal generalization of associate professor and full professor that is included in the domain of the is expert in feature of ?O2.
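The minimal generalization used here can be viewed as the lowest common ancestor of the two concepts in the generalization hierarchy, as the following Python sketch illustrates on a toy fragment inspired by Figure 10.14 (the exact links are assumptions made for illustration). The maximal generalization is computed similarly, but it is additionally capped by the domains of the features involved, which is why ?O2 is bounded above by expert rather than person.

PARENTS = {
    "associate professor": ["professor"], "full professor": ["professor"],
    "professor": ["PhD advisor", "faculty member"], "PhD advisor": ["expert"],
    "faculty member": ["employee"], "expert": ["person"], "employee": ["person"],
    "person": [],
}

def ancestors(concept):
    out, todo = {concept}, [concept]
    while todo:
        for p in PARENTS.get(todo.pop(), []):
            if p not in out:
                out.add(p)
                todo.append(p)
    return out

def minimal_generalizations(a, b):
    common = ancestors(a) & ancestors(b)
    # Keep the common generalizations that are not above another common generalization.
    return {c for c in common
            if not any(c in ancestors(d) and c != d for d in common)}

print(minimal_generalizations("associate professor", "full professor"))
# -> {'professor'}: the lower bound for ?O2 in Figure 10.15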
Let us now consider the rule from the right-hand part of Figure 10.8 (p. 304). The
minimal generalizations of the examples from which this rule was learned are shown
under the updated conditions in Figure 10.16. They were determined based on the
ontology from Figure 9.7. You remember that this rule was learned from one positive
example and two negative examples. However, each of the negative examples was used as
a positive example of an Except-When plausible version space condition. That is why each
of the generalized examples in Figure 10.16 has a positive example.

Main Condition
  Plausible Lower Bound Condition (LB):
    ?O1 is associate professor
         has as position ?O4
    ?O2 is university
    ?O3 is PhD student
    ?O4 is tenured position
    ?SI1 is in [almost certain - almost certain]
  Plausible Upper Bound Condition (UB):
    ?O1 is employee
         has as position ?O4
    ?O2 is actor
    ?O3 is actor
    ?O4 is long-term faculty position
    ?SI1 is in [almost certain - almost certain]
  Example generalization (covered positive examples: 1, covered negative examples: 0):
    ?O1 is associate professor
         is PhD advisor
         has as position ?O4
    ?O2 is university
    ?O3 is PhD student
    ?O4 is tenured position
    ?SI1 is exactly almost certain

Except-When Condition 1
  Plausible Lower Bound Condition (LB):
    ?O1 is full professor
         plans to retire from ?O2
    ?O2 is university
  Plausible Upper Bound Condition (UB):
    ?O1 is employee
         plans to retire from ?O2
    ?O2 is organization
  Example generalization (covered positive examples: 1, covered negative examples: 0):
    ?O1 is full professor
         is PhD advisor
         plans to retire from ?O2
    ?O2 is university

Except-When Condition 2
  Plausible Lower Bound Condition (LB):
    ?O1 is full professor
         plans to move to ?O5
    ?O5 is university
  Plausible Upper Bound Condition (UB):
    ?O1 is person
         plans to move to ?O5
    ?O5 is organization
  Example generalization (covered positive examples: 1, covered negative examples: 0):
    ?O1 is full professor
         is PhD advisor
         plans to move to ?O5
    ?O5 is university

In each condition, the LB and UB are, respectively, the minimal and maximal generalizations of the corresponding example generalization.

Figure 10.16. The updated conditions of the rule from Figure 10.8, in the context of the ontology from Figure 10.14.

The lower and upper bounds of the rule in Figure 10.8 (p. 304) were updated by
computing the minimal and maximal generalizations of these generalized examples in
the context of the updated ontology from Figure 10.14. Let us first consider the updated
version space of the main condition. Notice that in the lower bound, ?O1 is now associate
professor, instead of PhD advisor or associate professor (see Figure 10.8). Also, in the upper
bound, ?O1 is employee instead of person. The version spaces of the Except-When condi-
tions have also been updated. In the first Except-When condition, ?O1 is now full professor
in the lower bound and employee in the upper bound, instead of PhD advisor or full
professor and person, respectively. Similarly, in the second Except-When condition, ?O1
is now full professor in the lower bound, instead of PhD advisor or full professor.
Finally, let us consider the rule from Figure 10.13 (p. 308), which was learned based on
the ontology from Figure 9.7, as discussed in Section 10.1.4. The minimal generalizations
of the positive and negative examples from which this rule was learned are shown at the
bottom of Figure 10.17. They were determined based on the ontology from Figure 9.7.
Notice that these generalized examples include all the explanations from which the rule
was learned. In particular, the explanation that fixed the value of “tenure-track position” is
represented as “?O4 is exactly tenure-track position” and that which excluded the value “no
support” is represented as “?SI1 is-not no support in main condition.”

Plausible Lower Bound Condition (LB):
  ?O1 is assistant professor
       has as position ?O4
       probability of tenure ?SI1
  ?O2 is university
  ?O3 is PhD student
  ?O4 is tenure-track position
  ?O5 is tenured position
  ?SI1 is in [almost certain - almost certain]

Plausible Upper Bound Condition (UB):
  ?O1 is professor
       has as position ?O4
       probability of tenure ?SI1
  ?O2 is actor
  ?O3 is actor
  ?O4 is tenure-track position
  ?O5 is tenured position
  ?SI1 is in [likely - almost certain]

Generalization of the positive example (covered positive examples: 1, covered negative examples: 0):
  ?O1 is assistant professor
       has as position ?O4
       probability of tenure ?SI1
  ?O2 is university
  ?O3 is PhD student
  ?O4 is exactly tenure-track position
  ?O5 is exactly tenured position
  ?SI1 is almost certain
        is-not no support in main condition

Generalization of the negative example (covered positive examples: 0, covered negative examples: 1):
  ?O1 is assistant professor
       has as position ?O4
       probability of tenure ?SI1
  ?O2 is university
  ?O3 is PhD student
  ?O4 is exactly tenure-track position
  ?O5 is exactly tenured position
  ?SI1 is no support

Figure 10.17. The updated conditions of the rule from Figure 10.13 in the context of the ontology from Figure 10.14.


The new lower and upper bounds of the rule in the context of the updated ontology
from Figure 10.14 are shown at the top of Figure 10.17. Notice that in this case, the
regenerated rule is actually the same as the previous rule. The changes made to the
ontology did not affect this rule. However, the agent did recompute it because it cannot
know, a priori, whether the rule will be changed or not. The only change made to the rule
is to register that it was determined based on the new version of the ontology.

10.2.4 The Rule Regeneration Method


The previous section has illustrated the automatic rule regeneration method. Table 10.9
and the ones following it describe the actual method in a more formal way.

10.3 HYPOTHESIS REFINEMENT

Hypothesis refinement is performed using methods that are very similar to the preceding
methods for rule refinement, as briefly summarized in this section.
Remember that general hypotheses are automatically learned as a byproduct of reduc-
tion rule learning, if they have not been previously learned. When a reduction rule is
refined with a positive example, each included hypothesis is also automatically refined
with its corresponding positive example. Indeed, when you say that a reduction is correct,
you are also implicitly saying that each of the included hypotheses is correct.
However, when a reduction rule is refined with a negative example, the hypotheses
are not affected. Indeed, a negative reduction example means that the corresponding
reduction is not correct, not that any of the involved hypotheses is incorrect. For this
reason, an explanation of why a specific reduction is incorrect does not automatically
apply to the hypotheses from that reduction. If you want to say that a specific hypothesis
is incorrect, you have to select it and click on the Incorrect problem button. Then the selected hypothesis will be refined basically using the same methods as those for rule refinement.

Table 10.9 Rule Regeneration Method

Let O be the current ontology with version v, and R a plausible version space rule with a different
version.
1. Recompute the formal parameters P for rule R (see Table 10.10).
2. Refresh examples for rule R (see Table 10.11).
3. If R is no longer valid (i.e., the rule no longer has any generalized positive example)
then Return null
4. Recompute plausible version space for rule R (see Table 10.12).
5. Repeat for each specific example EX of R
If the upper bound of the main plausible version space condition (PVS) of R does not cover
EX and EX is a positive example
then make EX a positive exception.
If the upper bound of PVS of R does cover EX and EX is a negative example
then make EX a negative exception.
end
6. Return R


Table 10.10 Method for Recomputing the Formal Parameters of a Rule

Let R be a plausible version space rule having the list of parameters P.


Recompute the parameters P of R as follows:
1. P = IF task parameters ∪ question parameters ∪ answer parameters
2. Repeat for each THEN statement of R
   P = P ∪ THEN statement parameters
   end
3. Repeat for each explanation of R
   P = P ∪ explanation parameters
   end
4. Return P

Table 10.11 Method for Refreshing the Examples of a Rule

Let R be a plausible version space rule.


1. Repeat for each specific example EX of the rule R
Keep only the values corresponding to the current list of parameters in the rule R.
end
2. Repeat for each generalized example GE of the rule R
Keep only the values corresponding to the current list of parameters in the rule R.
end
3. Repeat for each specific example EX of the rule R
If the stored generalized example is no longer a minimal generalization then
Unregister the example from the generalized example.
If the generalized example has no examples any more
then delete the generalized example.
Register the example with its new minimal generalization.
end
4. If the rule R has no generalized or specific positive examples any more
then Remove the rule R.

Just as a refined rule, a refined hypothesis may include, in addition to the plausible
version space of the main condition, one or several plausible version spaces of Except-When
conditions, and generalized positive and negative examples. When the ontology is changed,
the hypotheses can be automatically regenerated based on the associated generalized
examples. They are actually regenerated when the corresponding rules are regenerated.

10.4 CHARACTERIZATION OF RULE LEARNING AND REFINEMENT

The presented rule learning and refinement methods have the following characteristics:

• They use multistrategy learning that synergistically integrates mixed-initiative learning from examples, from explanations, and by analogy, to take advantage of the complementary strengths of these individual learning strategies and compensate for their relative weaknesses.


Table 10.12 Method for Recomputing the Plausible Version Space of a Rule

Let R be a plausible version space rule having the main condition M and the list of the Except-
When conditions LX.
1. Let MP be the list of the parameters from the main condition M.
Let IP be the list of parameters from the natural language part of the rule R, referred to as
informal parameters.
MP = IP
2. Repeat for each explanation of a positive example EP in the rule R
MP = MP ∪ the new parameters from EP
end
3. Let LGS be the list of the generalized examples that have at least one specific positive
example.
Let LEP be the list of the explanations EP of the positive examples in the rule R.
Create the multivariable condition MC based on MP, LGS, LEP, IP (see Table 10.13).
4. Let LX be the list of Except-When conditions.
LX = [ ]
5. Repeat for each group EEp of Except-When explanations in R.
Let EP be the list of parameters used in EEp.
Let EX be the list of the generalized negative examples associated with EEp that have
at least one specific negative example.
Create the multivariable condition XC based on MP = EP, LGS = EX, LEP = EEp, and
IP = ∅ (see Table 10.13).
LX = LX ∪ XC
end
6. Return R

• The methods efficiently capture the expert's tacit knowledge, significantly reducing the complexity of developing cognitive assistants.
• They use the explanation of the first positive example and analogical reasoning to generate a much smaller initial version space than Mitchell's classical version space method (Mitchell, 1997).
• The methods efficiently search the version space, guided by explanations obtained through mixed-initiative interaction with the user (both the upper bounds and the lower bounds being generalized and specialized to converge toward one another).
• They learn from only a few examples, in the context of an incomplete and evolving ontology.
• These methods enable agents to learn even in the presence of exceptions.
• They keep minimally generalized examples and explanations to regenerate the rules automatically when the ontology changes.
• The methods have been successfully applied to develop cognitive assistants for many complex real-world domains (e.g., intelligence analysis, military strategy and planning, education, collaborative emergency response planning), because they enable agents to learn within a complex representation language.


Table 10.13 Method for Creating the Multivariable Condition Structure

Let R be a plausible version space rule, MP be the list of the parameters from the main condition,
LGS be the list of generalized examples that have at least one specific positive example, LEP be the
list of the explanations EP of the positive examples in the rule R, and IP be the list of informal
parameters of R.
1. Let A be the list of the generalized explanation fragments (such as “?Oi is interested in
?Oj”) from the generalized examples of R.
Compute A based on MP and LEP.
2. Let D be the domains of the variables from MP, each domain consisting of a lower bound
and an upper bound.
D=[]
3. Repeat for each parameter ?Oi in MP
Determine the list GE = {ge1, . . ., geg} of the concepts from LGS corresponding to ?Oi
(e.g., “assistant professor” from “?Oi is assistant professor”).
Determine the list PC = {PC1, . . ., PCp} of the concepts to which ?Oi must belong,
corresponding to positively constraining explanations
(e.g., “?Oi is exactly tenured position”).
Determine the list NC = {NC1, . . ., NCn} of the concepts to which ?Oi must not belong,
corresponding to negatively constraining explanations (e.g., “?SI1 is-not no support”).
Create the domains Do from the examples GE and the constraints PC and NC
(see Table 10.14).
D = D ∪ Do
end
4. Create the multivariable condition structure MVC from MP, D, and A.
Return MVC
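The following simplified Python sketch shows how a domain (UB, LB) could be derived for one parameter from the concepts of its generalized examples and a negative constraint, in the spirit of Table 10.14; it omits positive constraints and some covering checks, and the toy hierarchy and names are assumptions made for illustration.

PARENTS = {"associate professor": ["professor"], "full professor": ["professor"],
           "professor": ["person"], "student": ["person"], "person": []}
CHILDREN = {}
for child, parents in PARENTS.items():
    for parent in parents:
        CHILDREN.setdefault(parent, []).append(child)

def ancestors(concept):
    out, todo = {concept}, [concept]
    while todo:
        for p in PARENTS.get(todo.pop(), []):
            if p not in out:
                out.add(p)
                todo.append(p)
    return out

def covers(general, specific):
    return general in ancestors(specific)

def domain_for(GE, NC):
    """GE: generalized example concepts; NC: concepts the parameter must not belong to."""
    common = set.intersection(*(ancestors(g) for g in GE))
    UB = {c for c in common if not PARENTS.get(c)}     # maximal generalization (topmost common ancestors)
    LB = {c for c in common                            # minimal generalization
          if not any(covers(c, d) and c != d for d in common)}
    for nc in NC:                                      # specialize the bounds for each negative constraint
        UB = ({s for u in UB if covers(u, nc)
               for s in CHILDREN.get(u, []) if not covers(s, nc)}
              | {u for u in UB if not covers(u, nc)})
        LB = {l for l in LB if not covers(l, nc) and any(covers(u, l) for u in UB)}
    return UB, LB

# Two example concepts generalize to 'person'; the constraint "is-not student"
# then specializes the upper bound to 'professor'.
print(domain_for({"associate professor", "full professor"}, NC=["student"]))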

10.5 HANDS ON: RULE REFINEMENT

There are two rule refinement case studies, a shorter one and a longer one. In the shorter
case study, you will guide the agent to refine the rule from Figure 9.8 (p. 259), as discussed
in the previous sections. In the longer case study, you will guide the agent to refine several
rules, including the rule from Figure 9.8. You may perform the short case study, or the long
one, or both of them.
Start Disciple-EBR, select the case study knowledge base (either “13-Rule-Refinement-
short/Scen” or “13-Rule-Refinement/Scen”), and proceed as indicated in the instructions
at the bottom of the opened window.
The following are the basic operations for rule refinement, as well as additional operations
that are useful for knowledge base refinement, such as changing a generated reasoning step
into a modeling step, visualizing a rule with the Rule Editor, and deleting a rule.

Operation 10.1. Refine rule with positive example


 In the Scenario workspace, during rule refinement, in the Reasoning Hierarchy viewer,
select a reasoning step for which a rule has already been learned by clicking on the
corresponding Q/A node.
 In the right panel, click on the Correct Reduction button.


Table 10.14 Method for Creating Domains from Examples and Constraints

Let ?Oi be a parameter for which we compute a plausible version space.


GE = {ge1, . . ., geg} be the set of generalized examples corresponding to variable ?Oi.
PC = {PC1, . . ., PCp} be the set of concepts to which ?Oi must belong, representing positive
constraints.
NC = {NC1, . . ., NCn} be the set of concepts to which ?Oi must not belong, representing
negative constraints.
1. Compute UB, the maximal generalization of the generalized examples in GE.
2. Compute LB, the minimal generalization of the generalized examples in GE.
3. Compute UB, the minimal specialization of UB based on all the positive constraints PC.
4. Compute LB by selecting from LB the elements covered by UB.
5. Repeat for each NCi in NC
Compute LB, the minimal specialization of LB that is disjoint with
the negative constraint NCi.
Compute UB, the minimal specialization of UB covering LB that is disjoint with the
negative constraint NCi.
end
6. Create the list (UB, LB) as the domain Do for ?Oi.
7. Return Do

Operation 10.2. Refine rule with negative example


 In the Scenario workspace, during rule refinement, in the Reasoning Hierarchy viewer,
select a reasoning step by clicking on the corresponding Q/A node.
 In the right panel, click on the Incorrect Reduction button.
 Follow the agent’s instructions to select or provide a failure explanation.

Operation 10.3. Replace a generated reasoning step with a modeling step
 In the Scenario workspace, in the Reasoning Hierarchy viewer, right-click on the Q/A
node of the reasoning step that was generated through the application of a learned rule
and select Replace with modeling.
 As a result, the reasoning step is replaced with a modeled one that can be changed.
Additionally, the initial reasoning step becomes a negative exception of the rule.

Operation 10.4. View a rule with the Rule Browser


 Select the Scenario workspace.
 Click on the Rules menu and select Rule Browser.
 The left pane displays the names of the rules.
 Click on one of these rules (e.g., DDR.00018) and the right pane displays its descrip-
tion in a given format (e.g., “Formal Description”) when you select it in the
corresponding tab.
 Generally, you would first find the name of a rule of interest from a reasoning tree (e.g.,
DDR.00018), as described in Operation 9.3, and then you can view or delete it with the
Rule Browser.
 Click on the X button of the Rule Browser, to close it.


Operation 10.5. Delete a rule with the Rule Browser


 Generally, you would first find the name of the rule to be deleted (e.g., DDR.00018) by
inspecting the reasoning tree, as described in Operation 9.3.
 In the Scenario workspace, click on the Rules menu and select Rule Browser.
 The left pane displays the names of the rules.
 Click on the name of the rule to be deleted (e.g., DDR.00018), and the right pane
displays its description in a given format (e.g., “Formal Description”) when you select it
in the corresponding tab.
 At the bottom of the right pane, click on the Edit Rule button. As a result, new options
are displayed.
 At the bottom of the right pane, click on the Delete Rule button.
 Click on the X button of the Rule Browser to close it.

10.6 GUIDELINES FOR RULE REFINEMENT

Guideline 10.1. Assess similar hypotheses to refine the rules


Formulate hypotheses (or problems) that are similar to those from which the rules were
learned. Refine the rules with positive examples by selecting (clicking on) the corresponding
Q/A pair, and then clicking on Correct Reduction (see the right-hand side of Figure 10.18).
You can also click on Modify Explanations to interact with the agent and update the
explanations (e.g., by adding new ones and/or deleting some of the existing ones).
Refine the rules with negative examples by selecting (clicking on) the corresponding
Q/A pair and then clicking on Incorrect Reduction (see the right-hand side of Figure 10.18).
Then interact with the agent to identify an explanation of why the reduction is incorrect.

Guideline 10.2. Extend the ontology to define failure explanations


If necessary, extend the ontology to provide explanations for negative examples. However,
do not limit yourself to a minimal extension of the ontology (that consists of only the
failure explanation) if you anticipate that related concepts and features may be needed in
the future for related hypotheses or problems.

Figure 10.18. Rule refinement with a selected example.


10.7 PROJECT ASSIGNMENT 7

Refine the learned reduction rules by assessing hypotheses that are similar to the ones
considered in the previous assignments.

10.8 REVIEW QUESTIONS

10.1. Define the rule refinement problem.

10.2. What are the basic steps of rule refinement?

10.3. Consider the version space from Figure 10.19. In light of the refinement strategies
studied in this chapter, how will the plausible version space be changed as a result
of a new negative example labeled 1? Draw the new version space(s).

10.4. Consider the version space from Figure 10.20. In light of the refinement strategies
studied in this chapter, what are three alternative ways in which this version space
may be changed as a result of the negative example 2?

Figure 10.19. Version space and a negative example covered by the lower bound of the main condition. (The figure shows, within the universe of instances, the upper and lower bounds of the main condition and of the Except-When condition; the negative example 1 lies inside the lower bound of the main condition.)

Figure 10.20. Version space and a negative example covered by the upper bound of the main condition. (The figure shows the same version space; the negative example 2 lies inside the upper bound of the main condition.)


Positive Example 1
  We need to
    Determine a strategic center of gravity for a member of Allied Forces 1943.
  Which is a member of Allied Forces 1943?
  US 1943
  Therefore we need to
    Determine a strategic center of gravity for US 1943.
Explanation
  Allied Forces 1943 has as member US 1943

Figure 10.21. A reduction example and its explanation.

Figure 10.22. Ontology fragment from the center of gravity analysis domain. Dotted links indicate instance of relationships while continuous unnamed links indicate subconcept of relationships. (The fragment shows force with subconcepts such as opposing force, multimember force, and single-member force, multistate alliance and multistate coalition under multistate force, and the instances Allied Forces 1943 and European Axis 1943 with has as member links to US 1943, UK 1943, Germany 1943, and Finland 1943; the feature has as member has domain multimember force and range force.)

10.5. (a) Consider the example and its explanation from Figure 10.21. What rule will
be learned from them, assuming the ontology from Figure 10.22, where all
the instances are considered specific instances?
(b) Consider the additional positive example from Figure 10.23. Indicate the
refined rule.
(c) Consider the negative example, its failure explanation, and the additional
ontological knowledge from Figure 10.24. Indicate the refined rule.


Positive Example 2
We need to
Determine a strategic center of gravity for a member of
European Axis 1943.

Which is a member of European Axis 1943?

Germany 1943
Therefore we need to
Determine a strategic center of gravity for Germany 1943.

Figure 10.23. An additional positive example.

Negative Example 2
  We need to
    Determine a center of gravity for a member of European Axis 1943.
  Which is a member of European Axis 1943?
  Finland 1943
  Therefore we need to
    Determine a center of gravity for Finland 1943.
Failure Explanation
  Finland 1943 has as military contribution military contribution of Finland 1943
  military contribution of Finland 1943 is minor military contribution

Ontology Fragment
  military contribution is a subconcept of contribution, with subconcepts minor military contribution,
  intermediary military contribution, and major military contribution; military contribution of Finland 1943
  is an instance of minor military contribution; the feature has as military contribution has domain force
  and range military contribution.

Figure 10.24. Negative example, failure explanation, and additional ontology fragment.

IF the problem to solve is
  Identify each strategic COG candidate with respect to the industrial civilization of US 1943.
THEN
  industrial capacity of US 1943 is a strategic COG candidate for US 1943.

Explanation (explains the reduction above):
  US 1943 has as industrial factor industrial capacity of US 1943
  industrial capacity of US 1943 is a major generator of war material and transports of US 1943

Figure 10.25. An example and its explanation.

10.6. Consider the example and its explanation shown in Figure 10.25. Find the plaus-
ible version space rule that will be learned based on the ontology fragments from
Figures 10.26, 10.27, and 10.28, where all the instances are defined as generic
instances.


Figure 10.26. Ontology of economic factors. Dotted links indicate instance of relationships while continuous unnamed links indicate subconcept of relationships. (The fragment includes subconcepts of economic factor such as industrial factor, commerce authority, raw material, and transportation factor, with industrial capacity and farm implement industry under industrial factor; the feature is a major generator of has domain economic factor and range product, and the feature has as industrial factor has domain force and range industrial factor. Among the instances, US 1943 has as industrial factor industrial capacity of US 1943, which is a major generator of war material and transports of US 1943, and Germany 1943 has as industrial factor industrial capacity of Germany 1943.)

Figure 10.27. An ontology of forces. Dotted links indicate instance of relationships while continuous unnamed links indicate subconcept of relationships. (The fragment shows force with subconcepts such as multigroup force, opposing force, multistate force, single-state force, and single-group force; US 1943 and Britain 1943 are component states of Allied Forces 1943, and Germany 1943 and Italy 1943 are component states of European Axis 1943.)

10.7. Minimally generalize the rule from the left side of Figure 10.29 in order to cover
the positive example from the right side of Figure 10.29, considering the back-
ground knowledge from Figures 10.26, 10.27, and 10.28.


Figure 10.28. An ontology of resources. Dotted links indicate instance of relationships while continuous unnamed links indicate subconcept of relationships. (The fragment includes concepts such as resource, raw material, strategic raw material, product, strategically essential infrastructure element, strategically essential goods or material, and non-strategically essential goods or services, with instances such as war material and fuel of Germany 1943, war material and transports of US 1943, and farm implements of Italy 1943.)

Rule
IF
  Identify each strategic COG candidate with respect to the industrial civilization of ?O1.
Plausible Upper Bound Condition
  ?O1 is force
       has as industrial factor ?O2
  ?O2 is industrial factor
       is a major generator of ?O3
  ?O3 is product
Plausible Lower Bound Condition
  ?O1 is US 1943
       has as industrial factor ?O2
  ?O2 is industrial capacity of US 1943
       is a major generator of ?O3
  ?O3 is war material and transports of US 1943
THEN
  ?O2 is a strategic COG candidate for ?O1.

Positive example that satisfies the upper bound
IF the task to accomplish is
  Identify each strategic COG candidate with respect to the industrial civilization of Germany 1943.
THEN accomplish the task
  industrial capacity of Germany 1943 is a strategic COG candidate for Germany 1943.
Explanation
  Germany 1943 has as industrial factor industrial capacity of Germany 1943
  industrial capacity of Germany 1943 is a major generator of war material and fuel of Germany 1943

Figure 10.29. Rule and positive example.

10.8. Minimally specialize the rule from the left side of Figure 10.30, so that it no longer
covers the negative example from the right side of Figure 10.30, considering the back-
ground knowledge from Figures 10.26, 10.27, and 10.28.


Table 10.15 Rule Refinement with Learning Agent versus Rule Refinement by Knowledge Engineer
(Rows to fill in for each of "Rule Refinement with Learning Agent" and "Rule Refinement by Knowledge Engineer": Description, highlighting differences and similarities; Strengths; Weaknesses.)

Rule
IF
  Identify each strategic COG candidate with respect to the industrial civilization of ?O1.
Plausible Upper Bound Condition
  ?O1 is force
       has as industrial factor ?O2
  ?O2 is industrial factor
       is a major generator of ?O3
  ?O3 is product
Plausible Lower Bound Condition
  ?O1 is single-state force
       has as industrial factor ?O2
  ?O2 is industrial capacity
       is a major generator of ?O3
  ?O3 is strategically essential goods or material
THEN
  ?O2 is a strategic COG candidate for ?O1.

Negative example that satisfies the upper bound
IF the task to accomplish is
  Identify each strategic COG candidate with respect to the industrial civilization of Italy 1943.
THEN accomplish the task
  farm implement industry of Italy 1943 is a strategic COG candidate for Italy 1943.
Explanation
  Italy 1943 has as industrial factor farm implement industry of Italy 1943
  farm implement industry of Italy 1943 is a major generator of farm implements of Italy 1943

Figure 10.30. Rule and negative example.

10.9. Consider the problem reduction step and its explanation from Figure 10.25, as
well as the ontology of economic factors from Figure 10.26. Show the correspond-
ing analogy criterion generated by a cautious learner, and an analogous reduction
made by that cautious learner.

10.10. Indicate five qualitatively different possible uses of an ontology in an instructable


agent such as Disciple-EBR.

10.11. Compare the learning-based rule refinement process discussed in this chapter
with the traditional knowledge acquisition approach discussed in Section 3.1.4 by
filling in Table 10.15. Identify similarities and differences and justify the relative
strengths and weaknesses of the two approaches.


Figure 10.31. A partially learned concept and several instances. (The figure shows the main plausible version space condition and an Except-When plausible version space condition, with the instances I1, I2, I3, and I4 lying in different regions relative to them.)

10.12. Consider the partially learned concept and the four instances from Figure 10.31.
Order the instances by the plausibility of being positive examples of this concept
and justify the ordering.

11 Abstraction of Reasoning

Up until this point, the methodology for developing intelligent agents has encouraged
the expert to be very explicit and detailed, to provide clear descriptions of the
hypotheses (or problems), and to formulate detailed questions and answers that guide
the reduction of hypotheses (or problems) to subhypotheses (or subproblems). This is
important because it facilitates a clear and correct logic and the learning of the
reasoning rules.
The developed agents can solve complex problems through the generation of
reasoning trees that can be very large, with hundreds or even thousands of nodes.
In such cases, browsing and understanding these reasoning trees become a
challenge.
In this chapter, we will discuss an approach to abstracting a large reasoning tree that
involves abstracting both hypotheses/problems and subtrees. The goal is to obtain a
simpler representation where the abstract tree has fewer nodes and each node has a
simpler description. At the same time, however, we want to maintain the correspondence
between the abstract tree and the original tree, in order to have access to the full descrip-
tions of the nodes.

11.1 STATEMENT ABSTRACTION

By abstraction of a statement (hypothesis or problem), we simply mean a shorter state-


ment summarizing its meaning. Consider, for example, the following hypothesis:
John Doe will stay on the faculty of George Mason University for the duration of the PhD
dissertation of Bob Sharp.
Any of the following shorter statements is an abstraction of the preceding
hypothesis:
John Doe will stay on the faculty of George Mason University.
John Doe will stay on the faculty.
John Doe will not leave.
The expert needs to define abstractions that are short enough to simplify the display of
the reasoning tree while still conveying the meaning of the original hypotheses. One
abstraction technique is to eliminate some of the words, as illustrated by the first two of
the preceding examples. Additionally, one may abstract phrases by using new words, as
illustrated by the last example.
The specific hypothesis, “John Doe will stay on the faculty of George Mason University for
the duration of the PhD dissertation of Bob Sharp,” is shown also in the upper-left part of


Figure 11.1. As discussed in Section 9.10, from each specific hypothesis Disciple-EBR
automatically learns a general hypothesis with applicability conditions, which can be
further refined. The bottom-left part of Figure 11.1 shows the learned hypothesis whose
upper bound condition was further refined. Let us assume that the expert abstracts the
specific hypothesis to “John Doe will stay on the faculty” (see the upper-right part of
Figure 11.1). As a result, the agent automatically learns the abstraction pattern “?O1 will
stay on the faculty,” which corresponds to the abstraction of the learned hypothesis (see
the lower-right part of Figure 11.1).
The data structures and their relationships illustrated in Figure 11.1 have several
consequences. First, all the instances of the learned hypothesis will have the same
abstraction pattern. For example, the instance, “Dan Barker will stay on the faculty of
University of Virginia for the duration of the PhD dissertation of Sandra Lee,” will automatic-
ally be abstracted to “Dan Barker will stay on the faculty.”
Conversely, if you change the abstraction of a specific hypothesis, the pattern of the
learned abstraction will change accordingly. For example, if you now change the abstrac-
tion of “Dan Barker will stay on the faculty of University of Virginia for the duration of the
PhD dissertation of Sandra Lee,” to “Dan Barker will stay with University of Virginia,” then the
abstraction of the learned hypothesis in Figure 11.1 is automatically changed to “?O1 will
stay with ?O2.” Therefore, the abstraction of the specific hypothesis from Figure 11.1 is
automatically changed to “John Doe will stay with George Mason University.”
Thus, although at the beginning of this section we provided several alternative abstractions
of the same hypothesis, each hypothesis may have only one abstraction at a time.
Notice, however, that the same abstraction pattern may be the abstraction of different
learned hypotheses. Consequently, different specific hypotheses may also share the
same abstraction.
Finally, notice that the agent automatically abstracts the probabilistic solution of a
hypothesis to the actual probability. Thus, the solution, “It is almost certain that John Doe
will stay on the faculty of George Mason University for the duration of the PhD dissertation of
Bob Sharp,” is automatically abstracted to “almost certain.”
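To make the correspondence between patterns and abstractions concrete, the following Python sketch (illustrative only, and not part of Disciple-EBR) shows how a single learned abstraction pattern yields the abstraction of every instance of the learned hypothesis:

    # Illustrative sketch: one abstraction pattern serves all instances of a learned hypothesis.
    def instantiate(pattern, bindings):
        """Replace variables such as ?O1 with the instances bound to them."""
        for var, value in bindings.items():
            pattern = pattern.replace(var, value)
        return pattern

    learned_hypothesis = ("?O1 will stay on the faculty of ?O2 for the duration "
                          "of the PhD dissertation of ?O3.")
    learned_abstraction = "?O1 will stay on the faculty"

    for bindings in (
        {"?O1": "John Doe", "?O2": "George Mason University", "?O3": "Bob Sharp"},
        {"?O1": "Dan Barker", "?O2": "University of Virginia", "?O3": "Sandra Lee"},
    ):
        print(instantiate(learned_hypothesis, bindings))
        print("  abstracted to:", instantiate(learned_abstraction, bindings))

    # Changing learned_abstraction to "?O1 will stay with ?O2" would change the
    # abstraction of every specific hypothesis at once.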

Specific hypothesis: “John Doe will stay on the faculty of George Mason University for the
duration of the PhD dissertation of Bob Sharp.”
    has as abstraction
Abstract hypothesis: “John Doe will stay on the faculty”

Learned hypothesis: “?O1 will stay on the faculty of ?O2 for the duration of the PhD
dissertation of ?O3.”
  MAIN CONDITION
    Var    Lower bound                           Upper bound
    ?O1    (PhD advisor, associate professor)    (person)
    ?O2    (university)                          (organization)
    ?O3    (PhD student)                         (person)
    has as abstraction
Learned abstraction: “?O1 will stay on the faculty”

Figure 11.1. Hypotheses and their abstractions.


11.2 REASONING TREE ABSTRACTION

One simple way of abstracting a reasoning step is to abstract all the hypotheses and to
eliminate the question/answer pair, as illustrated in Figure 11.2. The right-hand side of
Figure 11.2 shows the complete description of a reasoning step. The left-hand side shows
only the abstraction of the top hypothesis and the abstractions of its subhypotheses.
Notice that the meaning of an abstract subhypothesis is to be understood in the context of its
parent abstract hypothesis, and therefore it can be shorter. For example, “reasons” is understood
as “United States has reasons to be a global leader in wind power.” Notice also that the
abstraction of a hypothesis includes the abstraction of its assessment, or “unknown” if an
assessment has not yet been made. The assessments (or solutions) may also be made visible in
the detailed reasoning tree by clicking on [SHOW SOLUTIONS] at the top of the window.
One may also abstract an entire subtree, not just a reduction step. The right-hand side
of Figure 11.3 shows the reasoning tree that is abstracted in the left-hand side of the figure.
In particular, the abstract tree consists of the abstraction of the top hypothesis and the
abstractions of the leaf hypotheses from the detailed tree.
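A minimal Python sketch of this idea (illustrative only; the actual Disciple-EBR data structures differ) abstracts a subtree to the abstraction of its root plus the abstractions of its leaf hypotheses, dropping the question/answer pairs and the intermediate reductions. The hypotheses and assessments used below are shortened, hypothetical examples inspired by the wind power hypothesis discussed earlier:

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Node:
        hypothesis: str                    # full statement
        abstraction: str                   # short, context-dependent name
        assessment: Optional[str] = None   # e.g., "almost certain", or None
        children: List["Node"] = field(default_factory=list)

    def label(node):
        return f"{node.abstraction}: {node.assessment or 'unknown'}"

    def abstract_subtree(root):
        """Keep the abstraction of the root and of the leaf hypotheses of its subtree."""
        leaves = []
        def collect(n):
            if not n.children:
                leaves.append(label(n))
            for child in n.children:
                collect(child)
        for child in root.children:
            collect(child)
        return {"root": label(root), "leaves": leaves}

    tree = Node(
        "The United States will be a global leader in wind power.",   # hypothetical wording
        "US leader in wind power",
        children=[
            Node("The United States has reasons to be a global leader in wind power.",
                 "reasons", "likely"),
            Node("The United States has the desire to be a global leader in wind power.",
                 "desire"),
        ],
    )
    print(abstract_subtree(tree))
    # {'root': 'US leader in wind power: unknown', 'leaves': ['reasons: likely', 'desire: unknown']}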

11.3 REASONING TREE BROWSING

Once a reasoning tree has been abstracted, it can be browsed as illustrated in Figure 11.4.
The left-hand side of Figure 11.4 shows the entire abstract tree. Each node in this tree is an
abstraction of a hypothesis and of its assessment (or “unknown” if an assessment has not
yet been made). The user can browse this tree by expanding or collapsing its nodes
through clicking on the + or – nodes. Once you click on a node in the abstract tree, such
as “desire: unknown,” the detailed description of that node is shown in the right-hand side.
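The following Python sketch (ours, purely illustrative of the browsing mechanism, and not the Disciple-EBR implementation) keeps, for each abstract node, its expansion state and a link to the detailed node it abstracts, so that clicking the node can display its full description:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class AbstractNode:
        label: str                      # e.g., "desire: unknown"
        detailed: object                # the corresponding node in the detailed tree
        children: List["AbstractNode"] = field(default_factory=list)
        expanded: bool = False

    def toggle(node):
        """Clicking + or - expands or collapses the node."""
        node.expanded = not node.expanded

    def render(node, depth=0):
        marker = "-" if node.expanded else "+"
        print("  " * depth + f"{marker} {node.label}")
        if node.expanded:
            for child in node.children:
                render(child, depth + 1)

    root = AbstractNode("US leader in wind power: unknown", detailed=None)
    toggle(root)
    render(root)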

11.4 HANDS ON: ABSTRACTION OF REASONING

The objective of this case study is to learn how to use Disciple-EBR to abstract a reasoning
tree. More specifically, you will learn how to:

 Introduce a hypothesis/problem into the abstract tree (see Operation 11.1)


 Define the abstraction of a hypothesis/problem (see Operation 11.2)
 Remove a hypothesis/problem from the abstract tree (see Operation 11.3)
 Browse the abstract tree

Figure 11.2. Abstraction of a reasoning step.


Figure 11.3. Abstraction of a reasoning tree.



Figure 11.4. Browsing a reasoning tree.



This case study will guide you through the process of abstracting the analysis of the
hypothesis, “John Doe would be a good PhD advisor for Bob Sharp,” with which you have
practiced in the previous case studies.
Start Disciple-EBR, select the case study knowledge base “14-Abstractions/Scen” and
proceed as indicated in the instructions at the bottom of the opened window.
This case study illustrates the following operations for the abstraction of reasoning:

Operation 11.1. Introduce a hypothesis into the abstract reasoning tree
 In the Evidence workspace, in the right panel of the Reasoner module, right-click on
the hypothesis to be abstracted and select Add to TOC.
 Notice that the hypothesis is now displayed in the left panel, while the right panel no
longer displays its reasoning subtree.
 If you click on the hypothesis in the left panel, you will see its subtree in the right panel.

Operation 11.2. Modify the abstraction of a hypothesis


 In the Evidence workspace, in the left panel of the Reasoner module, right-click on the
hypothesis to be abstracted and select Modify.
 In the opened editor, edit the hypothesis to abstract it, making sure that all the
remaining instances and constants are recognized (i.e., appear in blue or green). Then
click outside the editor.

Operation 11.3. Remove a hypothesis from the abstract reasoning tree


 In the Evidence workspace, in the left panel of the Reasoner module, right-click on the
hypothesis to be deleted and select Remove from TOC.
 Notice that the hypothesis is no longer displayed in the left panel, while the right panel
now displays its reasoning subtree.

11.5 ABSTRACTION GUIDELINE

Guideline 11.1. Define short context-dependent hypothesis names for the abstract tree
Because you always have access to the complete description of an abstracted hypothesis,
define the shortest possible abstraction, taking into account that its meaning is to be
understood in the context of its upper-level hypotheses. Consider, for example, the
abstract hypothesis “US Democratic Party,” selected in the left-hand side of Figure 11.5.
Its upper-level hypotheses indicate that this refers to one of the “major political parties,”

Figure 11.5. Understanding the meaning of an abstracted hypothesis in the context of its upper
hypotheses.


which contributes to the “desire” component for the hypothesis “United States global
leader in wind power.”

11.6 PROJECT ASSIGNMENT 8

Abstract the reasoning trees developed in the previous project assignments.

11.7 REVIEW QUESTIONS

11.1. How is the abstraction of a hypothesis defined? Provide an example.

11.2. How is the abstraction of a reasoning tree defined? Provide an example.

11.3. Consider that the top hypothesis is “John Doe would be a good PhD advisor for Bob
Sharp,” and that “John Doe would be a good PhD advisor with respect to the
professional reputation criterion” is one of its subhypotheses. Indicate several pos-
sible abstractions of this subhypothesis, where each such abstraction is to be
understood in the context of the top hypothesis.

11.4. Consider the hypothesis, “John Doe would be a good PhD advisor with respect to
the professional reputation criterion,” and its subhypothesis, “John Doe would be a
good PhD advisor with respect to the peer opinion criterion.” Indicate abstractions of
these two hypotheses, where these abstractions are to be understood in the context
of the top hypothesis “John Doe would be a good PhD advisor for Bob Sharp.”

11.5. Consider the detailed reasoning tree from Figure 11.6. Provide a corresponding
abstract tree, thinking carefully about how to best abstract each of the hypotheses.

11.6. Consider the detailed reasoning tree from Figure 11.7. Provide a corresponding
abstract tree, thinking carefully about how to best abstract each of the hypotheses.


Figure 11.6. Detailed reasoning tree to be abstracted.



Figure 11.7. Another detailed reasoning tree to be abstracted.


12 Disciple Agents

12.1 INTRODUCTION

The agent building theory, methodology, and tool presented in this book evolved over
many years, with developments presented in numerous papers and a series of PhD
theses (Tecuci, 1988; Dybala, 1996; Hieb, 1996; Keeling, 1998; Boicu, 2002; Bowman,
2002; Boicu, 2006; Le, 2008; Marcu, 2009). Although this book has emphasized the
development of Disciple agents for evidence-based reasoning applications, the learning
agent theory and technology are applicable and have been applied to a wide range of
knowledge-intensive tasks, such as those discussed in Section 1.6.2.
A previous book (Tecuci, 1998) presented the status of this work at that time and
included descriptions of Disciple agents for designing plans for loudspeaker manufactur-
ing, for assessing students’ higher-order thinking skills in history or in statistics, for
configuring computer systems, and for representing a virtual armored company com-
mander in distributed interactive simulations.
More recent Disciple agents and their applications include Disciple-WA, an agent for
the development of military engineering plans; Disciple-COA, for the critiquing of military
courses of action; Disciple-COG, for military center of gravity determination; Disciple
agents representing virtual experts for collaborative emergency response planning;
Disciple-LTA, for intelligence analysis; Disciple-FS, for regulatory compliance in financial
services industries; Disciple-WB, for assessing the believability of websites; and Disciple
agents for modeling the behavior of violent extremists.
The following sections present four of these agents and their applications. While all
illustrate the general agent development approach discussed in this book, they differ in
some of their capabilities and appearance, each reflecting a different stage or trajectory in
the development of the Disciple approach.

12.2 DISCIPLE-WA: MILITARY ENGINEERING PLANNING

12.2.1 The Workaround Planning Problem


The workaround planning problem consists of assessing how rapidly and by what method a
military unit can reconstitute or bypass damage to a transportation infrastructure, such as
a damaged and/or mined bridge, a blocked tunnel, or a cratered road (Cohen et al. 1998;
Jones, 1998).


The input to the agent includes three elements:

 A description of the military unit that needs to work around some damage (e.g., an
armored tank brigade or a supply company)
 A description of the damage (e.g., a span of the bridge is dropped and the area is
mined) and of the terrain (e.g., the soil type; the slopes of the riverbanks; the river’s
speed, depth, and width)
 A detailed description of the resources in the area that could be used to repair the
damage. This includes a description of the engineering assets of the military unit that
has to work around the damage, as well as the descriptions of other military units in
the area that could provide additional resources

The output of the agent consists of the most likely repair strategies, each described in
terms of three elements:

 A reconstitution schedule, giving the transportation capacity of the repaired link


(bridge, road, or tunnel), as a function of time, including both a minimum time and
an expected time
 A time line of engineering actions to perform the repair, the minimum as well as the
expected time that these actions require, and the temporal constraints among them
 A set of required assets for the entire strategy and for each action

Workaround plan generation requires detailed knowledge about the capabilities of
the engineering equipment and its use. For example, repairing damage to a bridge
typically involves different types of mobile bridging equipment and earth-moving equip-
ment. Each kind of mobile bridge takes a characteristic amount of time to deploy,
requires different kinds of bank preparation, and is owned by different echelons in the
military hierarchy. This information was available from military experts and U.S. Army
field manuals.
Figure 12.1 illustrates a sample input problem for Disciple-WA: a destroyed bridge over
a 25-meter-wide river that needs to be crossed by UNIT91010.
Disciple-WA automatically generates all the possible plans and selects the one that has
the minimum duration. The best plan for the problem illustrated in Figure 12.1 is shown in

[Figure content: a cross-section sketch of the damage site. DAMAGE200, a destroyed bridge,
spans a 25-meter gap at SITE103; UNIT91010 is on the near (right) approach at SITE100, and
SITE104 through SITE108 mark the far approach, the banks, and the river bed.]

Figure 12.1. Sample input problem for Disciple-WA.


Figure 12.2. Notice that, in Disciple-WA, we used a structured representation of the
actions, each consisting of a name and a sequence of feature-value pairs.
The top part of Figure 12.2 shows a summary of the plan. It consists of installing an
armored vehicle launched bridge (AVLB) over the river gap. It is estimated that this will
take a minimum of 11h:4m:58s, but it is expected to take 14h:25m:56s. UNIT91010 will
need the help of UNIT202, which has the AVLB equipment, and of UNIT201, which has a
bulldozer. After the bridge is installed, it will allow a traffic rate of 135.13 vehicles per hour.

Initial task:
  WORKAROUND-DAMAGE
    FOR-DAMAGE DAMAGE200
    BY-INTERDICTED-UNIT UNIT91010

Engineering action: INSTALL AVLB
  MIN-DURATION 11H:4M:58S
  EXPECTED-DURATION 14H:25M:56S
  RESOURCES-REQUIRED AVLB-UNIT202, BULLDOZER-UNIT201
  LINK-CAPACITY-AFTER-RECONSTRUCTION 135.13 VEHICLES/HR

Detailed plan:
  S1 OBTAIN-OPERATIONAL-CONTROL-FROM-CORPS
     OF-UNIT UNIT202
     BY-UNIT UNIT91010
     MIN-DURATION 4H:0M:0S
     EXPECTED-DURATION 6H:0M:0S
     TIME-CONSTRAINTS NONE
  S2 MOVE-UNIT
     FOR-UNIT UNIT202
     FROM-LOCATION SITE0
     TO-LOCATION SITE100
     MIN-DURATION 1H:8M:14S
     EXPECTED-DURATION 1H:8M:14S
     TIME-CONSTRAINTS AFTER S1
  S3 REPORT-OBTAINED-EQUIPMENT
     FOR-EQ-SET AVLB-UNIT202
     MIN-DURATION 0S
     EXPECTED-DURATION 0S
     TIME-CONSTRAINTS AFTER S2
  S4 OBTAIN-OPERATIONAL-CONTROL-FROM-CORPS
     OF-UNIT UNIT201
     BY-UNIT UNIT91010
     MIN-DURATION 4H:0M:0S
     EXPECTED-DURATION 6H:0M:0S
     TIME-CONSTRAINTS NONE
  S5 MOVE-UNIT
     FOR-UNIT UNIT201
     FROM-LOCATION SITE0
     TO-LOCATION SITE100
     MIN-DURATION 1H:8M:14S
     EXPECTED-DURATION 1H:8M:14S
     TIME-CONSTRAINTS AFTER S4
  S6 REPORT-OBTAINED-EQUIPMENT
     FOR-EQ-SET BULLDOZER-UNIT201
     MIN-DURATION 0S
     EXPECTED-DURATION 0S
     TIME-CONSTRAINTS AFTER S5
  S7 NARROW-GAP-BY-FILLING-WITH-BANK
     FOR-GAP SITE103
     FOR-BR-DESIGN AVLB70
     MIN-DURATION 5H:19M:44S
     EXPECTED-DURATION 6H:7M:42S
     RESOURCES-REQUIRED BULLDOZER-UNIT201
     TIME-CONSTRAINTS AFTER S6
  S8 EMPLACE-AVLB
     FOR-BR-DESIGN AVLB70
     MIN-DURATION 5M:0S
     EXPECTED-DURATION 10M:0S
     RESOURCES-REQUIRED AVLB-UNIT202
     TIME-CONSTRAINTS AFTER S3, S7
  S9 REPORT-EMPLACED-FIXED-BRIDGE
     FOR-MIL-BRIDGE AVLB-UNIT202
     MIN-DURATION 0S
     EXPECTED-DURATION 0S
     TIME-CONSTRAINTS AFTER S8
  S10 MOVE-EQUIPMENT-OVER-UNSTABILIZED-MIL-BRIDGE
     FOR-EQ-SET BULLDOZER-UNIT201
     FOR-BR-DESIGN AVLB70
     MIN-DURATION 2M:0S
     EXPECTED-DURATION 10M:0S
     RESOURCES-REQUIRED AVLB-UNIT202
     TIME-CONSTRAINTS AFTER S9
  S11 MINOR-BANK-PREPARATION
     OF-BANK SITE105
     MIN-DURATION 30M:0S
     EXPECTED-DURATION 50M:0S
     RESOURCES-REQUIRED BULLDOZER-UNIT201
     TIME-CONSTRAINTS AFTER S10
  S12 RESTORE-TRAFFIC-LINK
     FOR-UNIT UNIT91010
     FOR-LINK AVLB70
     LINK-CAPACITY 135.13 VEHICLES/H
     MIN-DURATION 0S
     EXPECTED-DURATION 0S

Ordering: S1 → S2 → S3 and S4 → S5 → S6 → S7 proceed in parallel; then S8 → S9 → S10 → S11 → S12.

Figure 12.2. The best workaround plan.


The detailed plan is shown under its summary and consists of twelve elementary
actions. UNIT91010 has to obtain operational control of UNIT202, which has the AVLB.
Then UNIT202 has to come to the site of the destroyed bridge. Also, UNIT91010 has to
obtain operational control of UNIT201, which has a bulldozer. Then UNIT201 will have
to move to the site of the destroyed bridge and to narrow the river gap from 25 meters to
17 meters. These actions can take place in parallel with the actions of bringing UNIT202 to
the bridge site, as shown at the bottom of Figure 12.2. Then the AVLB bridge is emplaced,
and the bulldozer moves over the bridge to clear the other side of the river in order to
restore the flow of traffic. This plan was generated by successively reducing the
WORKAROUND-DAMAGE task to simpler subtasks, until this task was reduced to the
twelve tasks shown in Figure 12.2.
The process of developing this agent has followed the general Disciple methodology, as
briefly discussed in the next sections.

12.2.2 Modeling the Workaround Planning Process


As discussed in Section 4.2, the subject matter expert has to model the planning process of
the agent as problem/task reduction and solution synthesis. The top-level organization
of the reasoning tree was already introduced in Figure 4.35 (p. 148), as part of Guideline
4.1. Figure 12.3 shows the reduction tree corresponding to the workaround problem from
Figure 12.1.
The top-level task is to work around whatever obstacle is encountered by UNIT91010.
This general task is successively reduced to simpler and simpler tasks, guided by ques-
tions whose answers either are obtained from the description of the problem or indicate
ways to perform an action. In this case, for instance, the obstacle is a damaged bridge but
without having any mines, at SITE100. Possible solutions to work around this obstacle are
to ford the river, to use a fixed bridge over the river, or to use a floating bridge. All these
alternatives are considered, but the tree in Figure 12.3 shows only the fixed bridge
solution. There are also alternative ways to use a fixed bridge. In the case of the
considered problem, the river is too wide to install a fixed bridge directly, but it is narrow
enough to be further narrowed in order to install the bridge. Therefore, the solution is
first to reduce the gap by moving dirt into it with a bulldozer, and then to install an AVLB.
However, because UNIT91010 has neither the bulldozer nor the fixed bridge, it has to
obtain them from other units. As a result, the task, USE-FIXED-BRIDGE-WITH-GAP-
REDUCTION-OVER-GAP, is reduced to three subtasks:

OBTAIN-AVLB
OBTAIN-BULLDOZER-AND-NARROW-GAP
INSTALL-AVLB-OVER-NARROWED-GAP

Each of these subtasks is further reduced to elementary tasks that will constitute the
generated plan shown in Figure 12.2. For example, OBTAIN-AVLB is reduced to the
following elementary tasks:

S1 OBTAIN-OPERATIONAL-CONTROL-FROM-CORPS
S2 MOVE-UNIT
S3 REPORT-OBTAINED-EQUIPMENT


[Figure content: the reduction tree for the problem in Figure 12.1. The top-level task
WORKAROUND-OBSTACLE (BY-UNIT UNIT91010) is reduced, through the questions “What is the type of
obstacle?” (a bridge at SITE100), “What is the type of bridge obstacle?” (a damaged bridge but
no mines), “What is a possible solution?” (a fixed bridge over SITE103), and “What strategy can
be used?” (fixed bridge with gap reduction, because the gap is large but can be reduced), to the
task USE-FIXED-BRIDGE-WITH-GAP-REDUCTION-OVER-GAP. Because UNIT91010 has neither a bulldozer nor
an AVLB, this task is further reduced to OBTAIN-AVLB, OBTAIN-BULLDOZER-AND-NARROW-GAP, and
INSTALL-AVLB-OVER-NARROWED-GAP, and OBTAIN-AVLB is in turn reduced (after identifying UNIT202 as
a nearby unit with the needed equipment) to the elementary tasks S1, S2, and S3 of Figure 12.2.
The rule learned from the gap-reduction step (Figure 12.8) is shown overlaid on the tree.]

Figure 12.3. Modeling the planning process.

Since these are elementary tasks that can be directly performed by the actual units, they
have known minimum and expected durations that are specified in military manuals.
Notice that these durations are specified in the descriptions of these tasks, as shown at the
bottom of Figure 12.3.
Similarly, OBTAIN-BULLDOZER-AND-NARROW-GAP from Figure 12.3 is reduced to
the tasks S4, S5, S6, and S7, as shown in Figure 12.2. Also, INSTALL-AVLB-OVER-
NARROWED-GAP is reduced to the tasks S8, S9, S10, S11, and S12.


Notice that the generated plan is a partially ordered one, where some of the actions are
actually performed in parallel. Disciple-WA uses a very simple strategy to generate such
plans that does not require maintaining complex state descriptions and operators with
preconditions and effects, as other planning systems do. Instead, it generates REPORT
actions with duration 0 that mark the achievement of conditions used in ordering elemen-
tary actions generated in different parts of the planning tree. For example, the action S8
EMPLACE-AVLB from Figure 12.2 can be performed only after S3 (a REPORT action) and S7:

S3 REPORT-OBTAINED-EQUIPMENT
S7 NARROW-GAP-BY-FILLING-WITH-BANK
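To make this ordering concrete, the following Python sketch (ours, not Disciple-WA code) computes the earliest expected finish time of the plan in Figure 12.2 from its AFTER constraints and expected durations; S12 is assumed to follow S11, as the ordering diagram at the bottom of the figure suggests. The result reproduces the expected duration of 14h:25m:56s from the plan summary.

    from functools import lru_cache

    EXPECTED = {  # expected durations from Figure 12.2, in seconds
        "S1": 21600, "S2": 4094, "S3": 0, "S4": 21600, "S5": 4094, "S6": 0,
        "S7": 22062, "S8": 600, "S9": 0, "S10": 600, "S11": 3000, "S12": 0,
    }
    AFTER = {  # TIME-CONSTRAINTS: each action follows the listed actions
        "S2": ["S1"], "S3": ["S2"], "S5": ["S4"], "S6": ["S5"], "S7": ["S6"],
        "S8": ["S3", "S7"], "S9": ["S8"], "S10": ["S9"], "S11": ["S10"],
        "S12": ["S11"],   # assumption: S12 follows S11
    }

    @lru_cache(maxsize=None)
    def finish(action):
        """Earliest finish = own duration + latest finish among the actions it follows."""
        start = max((finish(p) for p in AFTER.get(action, [])), default=0)
        return start + EXPECTED[action]

    total = finish("S12")
    print(f"{total // 3600}h:{total % 3600 // 60}m:{total % 60}s")  # 14h:25m:56s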

12.2.3 Ontology Design and Development


As discussed in Section 6.4, modeling trees like the one in Figure 12.3 provide a specification
of the concepts and features to be represented in the ontology of the agent. We have defined
these entities by using information from several military engineering manuals. We have also
imported elements of the military unit ontology, as well as various characteristics of military
equipment (such as their tracked and wheeled vehicle military load classes) from the Loom
server (MacGregor, 1991). Figure 12.4, for instance, presents a fragment of the hierarchy of
concepts from the ontology of the workaround agent. Included are several types of military
bridges that can be used to cross a river. Notice that, in Disciple-WA, we used “CLASS” to
refer to a concept, and “SUBCLASS-OF” to denote a “subconcept of” relation.

[Figure content: a concept hierarchy in which MILITARY-BRIDGE, a subconcept of BRIDGE-CONCEPT
and TEMPORARY-TRAFFIC-LINK, is specialized into FLOATING-MILITARY-BRIDGE (with subconcepts such
as M4T6-BRIDGE and RIBBON-BRIDGE) and FIXED-MILITARY-BRIDGE (with subconcepts such as AVLB, MGB,
and BAILEY-BRIDGE), down to leaf concepts such as M4T6-30, RB-HEAVY-45, AVLB70, MGB-16,
MGB-SS16-11, MGB-SS16-12, BB-24, BB-30, and BB-TT24.]

Figure 12.4. A fragment of the concept hierarchy.


Figure 12.5 contains the descriptions of two concepts from the hierarchy in
Figure 12.4, AVLB and AVLB70. An AVLB is a subclass of fixed military bridge that
has additional features. AVLB70 is a subclass of AVLB bridge. Each such concept
inherits all of the features of its superconcepts. Therefore, all the features of AVLB
are also features of AVLB70.
The features are defined in the same way as the concepts, in terms of more general
features. Figure 12.6, for instance, presents a fragment of the feature hierarchy. Two
important characteristics of any feature are its domain (the set of objects that could have
this feature) and its range (the set of possible values of the feature). The features may also
specify functions for computing their values.
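The following Python sketch (assumed; not the Disciple-WA representation) illustrates how SUBCLASS-OF links support feature inheritance, using the feature values of Figure 12.5:

    # Illustrative sketch of concepts, SUBCLASS-OF links, and feature inheritance.
    SUBCLASS_OF = {"AVLB70": "AVLB", "AVLB": "FIXED-MILITARY-BRIDGE"}

    FEATURES = {
        "AVLB": {"MIN-EMPLACEMENT-TIME": "5 MIN", "EXPECTED-EMPLACEMENT-TIME": "10 MIN"},
        "AVLB70": {"MAX-GAP": "17 METERS", "MAX-REDUCIBLE-GAP": "26 METERS",
                   "MLC-RATING": "70 TONS"},
    }

    def get_feature(concept, feature):
        """Look up a feature, inheriting it from superconcepts if necessary."""
        while concept is not None:
            if feature in FEATURES.get(concept, {}):
                return FEATURES[concept][feature]
            concept = SUBCLASS_OF.get(concept)
        return None

    print(get_feature("AVLB70", "MAX-GAP"))                    # own feature: 17 METERS
    print(get_feature("AVLB70", "EXPECTED-EMPLACEMENT-TIME"))  # inherited from AVLB: 10 MIN

A feature definition in the actual system would additionally record the feature's domain and range, and possibly a function for computing its value.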

AVLB SUBCLASS-OF FIXED-MILITARY-BRIDGE


MIN-CROSSING-TIME-FOR-UNSTABILIZED-END 2 MIN
EXPECTED-CROSSING-TIME-FOR-UNSTABILIZED-END 10 MIN
MIN-EMPLACEMENT-TIME 5 MIN
EXPECTED-EMPLACEMENT-TIME 10 MIN
MAX-DOWNHILL-SLOPE-FOR-EQ 19 %
MAX-TRANSVERSE-SLOPE 11 %
MAX-UPHILL-SLOPE-FOR-EQ 28 %

AVLB70 SUBCLASS-OF AVLB


HAS-WIDTH 19.2 METERS
MAX-GAP 17 METERS
MAX-REDUCIBLE-GAP 26 METERS
MLC-RATING 70 TONS
WEIGHT 15 TONS

Figure 12.5. Descriptions of two concepts from the hierarchy in Figure 12.4.

DIMENSION
  LENGTH
  WIDTH
    GAP-WIDTH
      MAX-GAP
      MAX-REDUCIBLE-GAP
    HAS-WIDTH
  HEIGHT
(Indentation denotes the SUBCLASS-OF relation between features.)

Figure 12.6. A fragment of the feature hierarchy.


12.2.4 Rule Learning


As presented in Chapters 9 and 10, a Disciple agent learns rules from individual reasoning
steps and their explanations, as illustrated in the middle of Figure 12.3. That task reduction
step is shown again in the top part of Figure 12.7. It states that in order to work around the
damaged bridge at SITE100, one can use a bridge equipment of type AVLB-EQ and to
reduce the size of the gap.
The bottom part of Figure 12.7 shows the explanation of this task reduction step,
obtained through an interaction with the expert. The first two explanation pieces justify
why one can use gap reduction.
The width of the SITE103 gap is 25 meters, and AVLB-EQ allows building a bridge of
type AVLB70 that can span gaps of at most 17 meters. Therefore, the gap is too wide to
install an AVLB70 directly. However, any gap of at most 26 meters can be reduced
to a 17-meter gap, on which an AVLB70 bridge can be installed.
The next two explanation pieces show that an AVLB70 bridge is strong enough to
sustain the vehicles of UNIT91010.
The maximum load class of the wheeled vehicles of UNIT91010 is 20 tons, and AVLB70
can sustain vehicles with a load of up to 70 tons. Similarly, the AVLB70 bridge can sustain
the tracked vehicles of UNIT91010.
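Restated as a check over the object features (an illustrative sketch, not Disciple-WA code), these four explanation pieces amount to the following test:

    # Values from Figure 12.7; the reduction is justified when all four conditions hold.
    gap_width       = 25.0   # SITE103 HAS-WIDTH, meters
    max_gap         = 17.0   # AVLB70 MAX-GAP, meters
    max_reducible   = 26.0   # AVLB70 MAX-REDUCIBLE-GAP, meters
    mlc_rating      = 70.0   # AVLB70 MLC-RATING, tons
    max_wheeled_mlc = 20.0   # UNIT91010 MAX-WHEELED-MLC, tons
    max_tracked_mlc = 40.0   # UNIT91010 MAX-TRACKED-MLC, tons

    applicable = (
        gap_width > max_gap                # too wide to emplace the bridge directly
        and gap_width <= max_reducible     # but narrow enough to be reduced
        and max_wheeled_mlc <= mlc_rating  # bridge sustains the wheeled vehicles
        and max_tracked_mlc <= mlc_rating  # and the tracked vehicles
    )
    print(applicable)  # True: use a fixed bridge with gap reduction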

TASK
WORKAROUND-UNMINED-DESTROYED-BRIDGE-WITH-FIXED-BRIDGE
AT-LOCATION SITE100
FOR-GAP SITE103
BY-UNIT UNIT91010

Question: What strategy can be used?


Answer: Fixed bridge with gap reduction because the gap is
large but can be reduced

SUBTASK
USE-FIXED-BRIDGE-WITH-GAP-REDUCTION-OVER-GAP
AT-LOCATION SITE100
FOR-GAP SITE103
BY-UNIT UNIT91010
WITH-BR-EQ AVLB-EQ

EXPLANATIONS

SITE103 HAS-WIDTH 25M,


AVLB-EQ CAN-BUILD AVLB70 MAX-GAP 17M < 25M

SITE103 HAS-WIDTH 25M,


AVLB-EQ CAN-BUILD AVLB70 MAX-REDUCIBLE-GAP 26M ≥ 25M

UNIT91010 MAX-WHEELED-MLC 20T,


AVLB-EQ CAN-BUILD AVLB70 MLC-RATING 70T ≥ 20T

UNIT91010 MAX-TRACKED-MLC 40T,


AVLB-EQ CAN-BUILD AVLB70 MLC-RATING 70T ≥ 40T

Figure 12.7. An example of task reduction and its explanation.


Figure 12.8 shows the plausible version space rule learned from the task reduction
example and its explanation in Figure 12.7.
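As a rough illustration of how the two bounds could be used when the rule is applied to a new problem, the following sketch (ours, not the Disciple matching procedure) tests candidate numeric bindings against the bounds of the rule in Figure 12.8; relational constraints such as ?N2 ≤ ?N1 are omitted for brevity, and the candidate values are hypothetical:

    UPPER = {"?N1": (0.0, 150.0), "?N2": (0.0, 150.0), "?N3": (0.0, 150.0),
             "?N4": (0.0, 100.0), "?N5": (0.0, 100.0), "?N6": (0.0, 100.0)}
    LOWER = {"?N1": (70.0, 70.0), "?N2": (20.0, 20.0), "?N3": (40.0, 40.0),
             "?N4": (25.0, 25.0), "?N5": (26.0, 26.0), "?N6": (17.0, 17.0)}

    def in_bounds(values, bounds):
        return all(lo <= values[v] <= hi for v, (lo, hi) in bounds.items())

    # Candidate bindings from a new problem (hypothetical values):
    values = {"?N1": 70.0, "?N2": 16.0, "?N3": 40.0, "?N4": 24.0, "?N5": 26.0, "?N6": 17.0}

    if in_bounds(values, LOWER):
        print("covered by the plausible lower bound: apply the reduction")
    elif in_bounds(values, UPPER):
        print("within the plausible upper bound only: propose the reduction for expert review")
    else:
        print("outside the plausible upper bound: the rule does not apply")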

12.2.5 Experimental Results


Disciple-WA (Tecuci et al., 1999; 2000) was tested together with three other systems in a
two-week intensive study, in June 1998, as part of the Defense Advance Research Projects
Agency’s (DARPA) annual High Performance Knowledge Bases (HPKB) program evalu-
ation (Cohen et al. 1998). The evaluation, performed by Alphatech, consisted of two
phases, each comprising a test and a retest. In the first phase, the systems were tested
on twenty problems that were similar to those used for systems development. Then the
solutions were provided, and the developers had one week to improve their systems,
which were tested again on the same problems. In the second phase, the systems were
tested on five new problems, partially or completely out of the scope of the systems. For
instance, they specified a new type of damage (cratered roads) or required the use of new
types of engineering equipment (TMM bridges, ribbon rafts, and M4T6 rafts). Then again
the correct solutions were provided, and the developers had one week to improve and
develop their systems, which were tested again on the same five problems and five new
ones. Solutions were scored along five equally weighted dimensions: (1) generation of
the best workaround solutions for all the viable options; (2) correctness of the overall

IF the task to accomplish is
  WORKAROUND-UNMINED-DESTROYED-BRIDGE-WITH-FIXED-BRIDGE
    AT-LOCATION ?O1
    FOR-GAP ?O2
    BY-UNIT ?O3

Plausible upper bound condition
  ?O1 IS BRIDGE
  ?O2 IS CROSS-SECTION
    HAS-WIDTH ?N4
  ?O3 IS MILITARY-UNIT
    MAX-TRACKED-MLC ?N3
    MAX-WHEELED-MLC ?N2
  ?O4 IS AVLB-EQ
    CAN-BUILD ?O5
    MAX-REDUCIBLE-GAP ?N5
    MAX-GAP ?N6
  ?O5 IS AVLB70
    MLC-RATING ?N1
  ?N1 IS-IN [0.0 150.0]
  ?N2 IS-IN [0.0 150.0], ≤ ?N1
  ?N3 IS-IN [0.0 150.0], ≤ ?N1
  ?N4 IS-IN [0.0 100.0]
  ?N5 IS-IN [0.0 100.0], ≥ ?N4
  ?N6 IS-IN [0.0 100.0), < ?N4

Plausible lower bound condition
  ?O1 IS SITE100
  ?O2 IS SITE103
    HAS-WIDTH ?N4
  ?O3 IS UNIT91010
    MAX-TRACKED-MLC ?N3
    MAX-WHEELED-MLC ?N2
  ?O4 IS AVLB-EQ
    CAN-BUILD ?O5
    MAX-REDUCIBLE-GAP ?N5
    MAX-GAP ?N6
  ?O5 IS AVLB70
    MLC-RATING ?N1
  ?N1 IS-IN [70.0 70.0]
  ?N2 IS-IN [20.0 20.0], ≤ ?N1
  ?N3 IS-IN [40.0 40.0], ≤ ?N1
  ?N4 IS-IN [25.0 25.0]
  ?N5 IS-IN [26.0 26.0], ≥ ?N4
  ?N6 IS-IN [17.0 17.0], < ?N4

THEN accomplish the task
  USE-FIXED-BRIDGE-WITH-GAP-REDUCTION-OVER-GAP
    AT-LOCATION ?O1
    FOR-GAP ?O2
    BY-UNIT ?O3
    WITH-BR-EQ ?O4

Figure 12.8. Rule learned by Disciple-WA from the example and the explanation in Figure 12.7.


time estimate for each workaround solution; (3) correctness of each solution step;
(4) correctness of temporal constraints among these steps; and (5) appropriateness of
engineering resources used. Scores were assigned by comparing the systems’ answers with
those of Alphatech’s human expert. Bonus points were awarded when systems gave better
answers than the expert, and these answers were used as standard for the next phase of
the evaluation.
The participating teams were not uniform in terms of prior system development and
human resources. Consequently, only one of them succeeded in entering the evaluation with
a system that had a fully developed knowledge base. The other three teams (including the
Disciple team) entered the evaluation with systems that had incompletely developed
knowledge bases. Figure 12.9 shows a plot of the overall coverage of each system against
the overall correctness of that system for each of the two phases of the evaluation.
The Disciple team entered the evaluation with a workaround agent whose knowledge base
covered only about 40 percent of the workaround domain (equivalent to
11,841 binary predicates). The coverage of our agent was declared prior to each release of
the testing problems, and all the problems falling within its scope were attempted and
scored. During the evaluation period, we continued to extend the knowledge base to cover
more of the initially specified domain, in addition to the developments required by the
modification phase. At the end of the two weeks of evaluation, the knowledge base of our
agent grew to cover about 80 percent of the domain (equivalent to 20,324 binary predi-
cates). This corresponds to a rate of knowledge acquisition of approximately 787 binary
predicates per day, as indicated in Figure 12.10. This result supports the claim that the
Disciple approach enables rapid acquisition of relevant problem-solving knowledge from
subject matter experts.
With respect to the quality of the generated solutions, within its scope, the Disciple-WA
agent performed at the level of the human expert. There were several cases during the
evaluation period where the Disciple-WA agent generated more correct or more complete

[Figure content: a scatter plot of overall solution correctness (vertical axis, 0 to 1.0)
against problem-space coverage (horizontal axis, 0 to 1.0), with points for Disciple-initial,
Disciple-final, AIAI-initial, ISI-initial, ISI-final, TFS-initial, and TFS-final.]

Figure 12.9. Evaluation results for the coverage of the problem space and the correctness of the
solutions (reprinted with permission from Eric Jones).


[Figure content: a plot of knowledge base size in binary predicates (vertical axis, from 5,000 to
25,000) against the days of the evaluation (June 14 to June 30), with the linear trend line
y = 787.63x + 5521.6.]

Figure 12.10. Knowledge base development time.

solutions than those of the human expert. There were also cases where the agent gener-
ated new solutions that the human expert did not initially consider. For instance, it
generated solutions to work around a cratered road by emplacing a fixed bridge over
the crater in a way similar to emplacing a fixed bridge over a river gap. Or, in the case of
several craters, it generated solutions where some of the craters were filled while for others
fixed bridges were emplaced. These solutions were adopted by the expert and used as
standard for improving all the systems. For this reason, although the agent also made
some mistakes, the overall correctness of its solutions was practically as high as that of the
expert’s solutions. This result supports the second claim that the acquired problem-
solving knowledge is of good enough quality to ensure a high degree of correctness of
the solutions generated by the agent.
Finally, the workaround generator also performed very well, generating a solution in
about 0.3 seconds on a medium-power PC. This supports the third
claim, that the acquired problem-solving knowledge ensures high performance of the
problem solver.
Based on the evaluation results, the Disciple-WA agent was selected by DARPA and
Alphatech to be further extended and was integrated by Alphatech into a larger system that
supports air campaign planning by the Joint Force Air Component Commander (JFACC)
and his or her staff. The integrated system was one of the systems selected to be
demonstrated at EFX’98, the Air Force’s annual showcase of promising new technologies.

12.3 DISCIPLE-COA: COURSE OF ACTION CRITIQUING

12.3.1 The Course of Action Critiquing Problem


A military course of action (COA) is a preliminary outline of a plan for how a military unit
might attempt to accomplish a mission. A COA is not a complete plan in that it leaves out
many details of the operation, such as exact initial locations of friendly and enemy forces.
After receiving orders to plan for a mission, a Commander and staff complete a detailed
and practiced process of analyzing the mission. They conceive and evaluate potential
COAs, select a COA, and prepare detailed plans to accomplish the mission based on the
selected COA. The general practice is for the staff to generate several COAs for a mission


and then to compare those COAs based on many factors, including the situation, the
commander’s guidance, the principles of war, and the tenets of Army operations. Then the
Commander makes the final decision on which COA will be used to generate his or her
plan based on the recommendations of the staff and his or her own experience with the
same factors considered by the staff (Jones, 1998).
The COA critiquing problem consisted of developing a knowledge-based agent that can
automatically critique COAs for ground force operations, can systematically assess
selected aspects of a COA, and can suggest repairs to it. The role of this agent is to act
as an assistant to the military Commander, helping the Commander in choosing between
several COAs under consideration for a certain mission. The agent could also help military
students learn how to develop courses of action.
The input to the COA critiquing agent consists of the description of a COA that includes
the following aspects:

1. The COA sketch, such as the one in the top part of Figure 12.11, which is a
graphical depiction of the preliminary plan being considered. It includes enough
of the high-level structure and maneuver aspects of the plan to show how the
actions of each unit fit together to accomplish the overall purpose, while omitting
much of the execution detail that will be included in the eventual operational plan.
The three primary elements included in a COA sketch are (a) control measures that
limit and control interactions between units; (b) unit graphics that depict known,
initial locations and makeup of friendly and enemy units; and (c) mission graphics
that depict actions and tasks assigned to friendly units. The COA sketch is drawn
using a palette-based sketching utility.
2. The COA statement, such as the partial one shown in the bottom part of
Figure 12.11, which clearly explains what the units in a course of action will do to
accomplish the assigned mission. This text includes a description of the mission
and the desired end state, as well as standard elements that describe purposes,
operations, tasks, forms of maneuver, units, and resources to be used in the COA.
The COA statement is expressed in a restricted but expressive subset of English.
3. Selected products of mission analysis, such as the areas of operations of the units,
avenues of approach, key terrain, unit combat power, and enemy COAs.

Based on this input, the critiquing agent has to assess various aspects of the COA, such as
its viability (i.e., its suitability, feasibility, acceptability, and completeness), its correctness
(which considers the array of forces, the scheme of maneuver, and the command and
control), and its strengths and weaknesses with respect to the principles of war and the
tenets of Army operations. The critiquing agent should also be able to clearly justify the
assessments made and to propose improvements to the COA.
Disciple-COA was developed in the DARPA’s HPKB program to solve part of the COA
critiquing problem (Cohen et al., 1998). In particular, Disciple-COA identifies the strengths
and the weaknesses of a course of action with respect to the principles of war and the tenets
of Army operations (FM 100–5, 1993). There are nine principles of war: objective, offensive,
mass, economy of force, maneuver, unity of command, security, surprise, and simplicity.
They provide general guidance for the conduct of war at the strategic, operational, and
tactical levels. For example, Table 12.1 provides the definition of the principle of mass.
The tenets of Army operations describe the characteristics of successful operations.
They are initiative, agility, depth, synchronization, and versatility.


Mission: BLUE-BRIGADE2 attacks to penetrate RED-MECH-REGIMENT2 at 130600 Aug in order to enable the completion
of seize OBJ-SLAM by BLUE-ARMOR-BRIGADE1.
Close: BLUE-TASK-FORCE1, a balanced task force (MAIN-EFFORT) attacks to penetrate RED-MECH-COMPANY4, then
clears RED-TANK-COMPANY2 in order to enable the completion of seize OBJ-SLAM by BLUE-ARMOR-
BRIGADE1.
BLUE-TASK-FORCE2, a balanced task force (SUPPORTING-EFFORT1) attacks to fix RED-MECH-COMPANY1
and RED-MECH-COMPANY2 and RED-MECH-COMPANY3 in order to prevent RED-MECH-COMPANY1 and
RED-MECH-COMPANY2 and RED-MECH-COMPANY3 from interfering with conducts of the MAIN-EFFORT1,
then clears RED-MECH-COMPANY1 and RED-MECH-COMPANY2 and RED-MECH-COMPANY3 and RED-TANK-
COMPANY1.
BLUE-MECH-BATTALION1, a mechanized infantry battalion (SUPPORTING-EFFORT2) attacks to fix RED-MECH-
COMPANY5 and RED-MECH-COMPANY6 in order to prevent RED-MECH-COMPANY5 and RED-MECH-
COMPANY6 from interfering with conducts of the MAIN-EFFORT1, then clears RED-MECH-COMPANY5 and RED-
MECH-COMPANY6 and RED-TANK-COMPANY3
Reserve: The reserve, BLUE-MECH-COMPANY8, a mechanized infantry company, follows MAIN-EFFORT, and is prepared
to reinforce MAIN-EFFORT.
Security: SUPPORTING-EFFORT1 destroys RED-CSOP1 prior to begin moving across PL-AMBER by MAIN-EFFORT in
order to prevent RED-MECH-REGIMENT2 from observing MAIN-EFFORT.
SUPPORTING-EFFORT2 destroys RED-CSOP2 prior to begin moving across PL-AMBER by MAIN-EFFORT in
order to prevent RED-MECH-REGIMENT2 from observing MAIN-EFFORT.
Deep: Deep operations will destroy RED-TANK-COMPANY1 and RED-TANK-COMPANY2 and RED-TANK-COMPANY3.
Rear: BLUE-MECH-PLT1, a mechanized infantry platoon secures the brigade support area.
Fire: Fires will suppress RED-MECH-COMPANY1 and RED-MECH-COMPANY2 and RED-MECH-COMPANY3 and
RED-MECH-COMPANY4 and RED-MECH-COMPANY5 and RED-MECH-COMPANY6.
End State: At the conclusion of this operation, BLUE-BRIGADE2 will enable accomplishing conducts forward passage of lines
through BLUE-BRIGADE2 by BLUE-ARMOR-BRIGADE1.
MAIN-EFFORT will complete to clear RED-MECH-COMPANY4 and RED-TANK-COMPANY2.
SUPPORTING-EFFORT1 will complete to clear RED-MECH-COMPANY1 and RED-MECH-COMPANY2 and RED-
MECH-COMPANY3 and RED-TANK-COMPANY1.
SUPPORTING-EFFORT2 will complete to clear RED-MECH-COMPANY5 and RED-MECH-COMPANY6 and RED-
TANK-COMPANY3.

Figure 12.11. COA sketch and a fragment of a COA statement (reprinted with permission from
Eric Jones).

Table 12.2, for instance, shows some of the strengths of the COA from Figure 12.11 with
respect to the principle of mass, identified by Disciple-COA.
In addition to generating answers in natural language, Disciple-COA also provides the
reference material based on which the answers are generated, as shown in the bottom part
of Table 12.2. Also, the Disciple-COA agent can provide justifications for the generated


Table 12.1 The Principle of Mass (from FM 100–5)

Mass the effects of overwhelming combat power at the decisive place and time.
Synchronizing all the elements of combat power where they will have decisive effect on an enemy
force in a short period of time is to achieve mass. To mass is to hit the enemy with a closed fist, not
poke at him with fingers of an open hand. Mass must also be sustained so the effects have staying
power. Thus, mass seeks to smash the enemy, not sting him. This results from the proper
combination of combat power with the proper application of other principles of war. Massing
effects, rather than concentrating forces, can enable numerically inferior forces to achieve decisive
results, while limiting exposure to enemy fire.

Table 12.2 Strengths of the COA from Figure 12.11 with Respect to the Principle of Mass,
Identified by Disciple-COA

Major Strength: There is a major strength in COA411 with respect to mass because
BLUE-TASK-FORCE1 is the MAIN-EFFORT1 and it acts on the decisive point of the COA (RED-
MECH-COMPANY4) with a force ratio of 10.6, which exceeds a recommended force ratio of 3.0.
Additionally, the main effort is assisted by supporting action SUPPRESS-MILITARY-TASK1, which
also acts on the decisive point. This is good evidence of the allocation of significantly more than
minimum combat power required at the decisive point and is indicative of the proper
application of the principle of mass.
Strength: There is a strength in COA411 with respect to mass because BLUE-TASK-FORCE1 is the
main effort of the COA and it has been allocated 33% of available combat power, but this is
considered just a medium-level weighting of the main effort.
Strength: There is a strength in COA411 with respect to mass because BLUE-MECH-COMPANY8 is a
COMPANY-UNIT-DESIGNATION level maneuver unit assigned to be the reserve. This is
considered a strong reserve for a BRIGADE-UNIT-DESIGNATION–level COA and would be
available to continue the operation or exploit success.
Reference: FM 100–5 pg 2–4, KF 113.1, KF 113.2, KF 113.3, KF 113.4, KF 113.5 – To mass is to
synchronize the effects of all elements of combat power at the proper point and time to achieve
decisive results. Observance of the principle of mass may be evidenced by allocation to the main
effort of significantly greater combat power than the minimum required throughout its mission,
accounting for expected losses. Mass is evidenced by the allocation of significantly more than
the minimum combat power required at the decisive point.

answers at three levels of detail, from a very abstract one that shows the general line of
reasoning followed, to a very detailed one that indicates each of the knowledge pieces
used in generating the answer.

12.3.2 Modeling the COA Critiquing Process


To critique a course of action with respect to a specific principle or tenet, one needs a
certain amount of information about that course of action, information related to that
principle or tenet. This information is obtained by asking a series of questions. The answer
to each question allows one to reduce the current critiquing task to a more specific and


simpler one. This process continues until one has enough information to recognize a
weakness or a strength. Consider, for example, the principle of surprise, whose definition
is provided in Table 12.3. As you can see, this is a very general description. How to apply
this general principle in actual situations is knowledge that is learned by military officers
during their lifetime. Therefore, developing an agent able to identify to what extent a
specific COA conforms to the principle of surprise involves capturing and representing the
knowledge of a military expert into the agent’s knowledge base.
Guided by this general definition and the COA from Figure 12.11, our subject matter
expert (Colonel Michael Bowman) has developed the reduction tree from Figure 12.12.
Notice how each successive question identifies a surprise-related feature of the COA until
a strength is recognized.
Through this kind of modeling of the COA critiquing process, each leaf may lead to
the identification of a strength or weakness. Then, the bottom-up solution synthesis
process consists only in accumulating all the identified strengths and weaknesses, just
as in the case of Disciple-WA, where the solution synthesis process accumulates the
elementary actions.
Notice also that, as in the case of Disciple-WA, the tasks are structured, consisting of a
name and a sequence of feature-value pairs.
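Such a structured task can be captured with a very simple data structure, as in the following illustrative Python sketch (not the Disciple-COA representation):

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass(frozen=True)
    class Task:
        name: str
        features: Tuple[Tuple[str, str], ...]   # ordered feature-value pairs

        def __str__(self):
            return "\n".join([self.name] + [f"  {f} {v}" for f, v in self.features])

    report_strength = Task(
        "REPORT-STRENGTH-IN-SURPRISE-BECAUSE-OF-COUNTERING-ENEMY-RECON",
        (("FOR-COA", "COA411"), ("FOR-UNIT", "RED-CSOP1"),
         ("FOR-RECON-ACTION", "SCREEN1"), ("FOR-ACTION", "DESTROY1"),
         ("WITH-IMPORTANCE", '"high"')),
    )
    print(report_strength)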

12.3.3 Ontology Design and Development


For Disciple-COA, an initial ontology was defined by importing the input ontology built by
Teknowledge and Cycorp for the COA challenge problem (Boicu et al., 1999). The
imported ontology was further developed by using the ontology building tools of
Disciple-COA. Figure 12.13, for instance, presents a fragment of the COA ontology.
The top part represents the upper level of the ontology that identifies the types of
concepts represented in the ontology (called classes in Disciple-COA). They include
GEOGRAPHICAL-REGION, ORGANIZATION, EQUIPMENT, and ACTION. Each of these
concepts is the top of a specialized hierarchy, such as the hierarchy of organizations, a
fragment of which is shown in the left part of Figure 12.13. The leaves of this hierarchy are
specific military units (e.g., BLUE-TASK-FORCE1), corresponding to a specific COA to be
critiqued by Disciple-COA. Each concept and instance of the object hierarchy is described

Table 12.3 The Principle of Surprise (from FM 100–5)

Strike the enemy at a time or place or in a manner for which he is unprepared.


Surprise can decisively shift the balance of combat power. By seeking surprise, forces can achieve
success well out of proportion to the effort expended. Rapid advances in surveillance technology
and mass communication make it increasingly difficult to mask or cloak large-scale marshaling or
movement of personnel and equipment. The enemy need not be taken completely by surprise, but
only become aware too late to react effectively. Factors contributing to surprise include speed,
effective intelligence, deception, application of unexpected combat power, operations security
(OPSEC), and variations in tactics and methods of operation. Surprise can be in tempo, size of
force, direction or location of main effort, and timing. Deception can aid the probability of
achieving surprise.


ASSESS-COA-WRT-PRINCIPLE-OF-SURPRISE
  FOR-COA COA411

Question: Which is an aspect that characterizes surprise?
Answers: The presence of surprise factors | The presence of deception actions

  ASSESS-SURPRISE-WRT-THE-PRESENCE-OF-SURPRISE-FACTORS (FOR-COA COA411)
  ASSESS-SURPRISE-WRT-THE-PRESENCE-OF-DECEPTION-ACTIONS (FOR-COA COA411)

Under the presence of surprise factors:
Answers: Enemy reconnaissance | The application of surprising levels of combat power

  ASSESS-SURPRISE-WRT-COUNTERING-ENEMY-RECONNAISSANCE (FOR-COA COA411)
  ASSESS-SURPRISE-WRT-THE-APPLICATION-OF-SURPRISING-LEVELS-OF-COMBAT-POWER (FOR-COA COA411)

Question: Is an enemy reconnaissance unit present?
Answers: No | Yes, RED-CSOP1, which is performing the reconnaissance action SCREEN1

  ASSESS-SURPRISE-WHEN-ENEMY-RECON-IS-PRESENT
    FOR-COA COA411
    FOR-UNIT RED-CSOP1
    FOR-RECON-ACTION SCREEN1

Question: Is the enemy reconnaissance unit destroyed?
Answers: Yes, RED-CSOP1 is destroyed by DESTROY1 | No

  REPORT-STRENGTH-IN-SURPRISE-BECAUSE-OF-COUNTERING-ENEMY-RECON
    FOR-COA COA411
    FOR-UNIT RED-CSOP1
    FOR-RECON-ACTION SCREEN1
    FOR-ACTION DESTROY1
    WITH-IMPORTANCE "high"

Strength: There is a strength with respect to surprise in COA411 because it contains aggressive
security/counter-reconnaissance plans, destroying enemy intelligence collection units and
activities. Intelligence collection by RED-CSOP1 through SCREEN1 will be disrupted by its
destruction by DESTROY1. This and similar actions prevent the enemy from ascertaining the nature
and intent of friendly operations, thereby increasing the likelihood that the enemy will be
surprised. This is a strength of high importance.

Reference: FM 100-5 pg 2-5, KF 118.1, KF 118.2, KF 118.3 - Surprise is achieved by
striking/engaging the enemy in a time, place, or manner for which he is unprepared. The enemy
can be surprised by the tempo of the operation, the size of the force, the direction or location
of the main effort, and the timing. Factors contributing to surprise include speed, effective
intelligence, deception, application of unexpected combat power, operations security, and
variations in tactics and methods of operation.

Figure 12.12. Sample modeling of the COA critiquing process.

by specific features and values. For instance, the bottom part of Figure 12.13 shows
the description of the specific military unit called BLUE-TASK-FORCE1. BLUE-TASK-
FORCE1 is described as being both an ARMORED-UNIT-MILITARY-SPECIALTY and a
MECHANIZED-INFANTRY-UNIT-MILITARY-SPECIALTY. The other features describe
BLUE-TASK-FORCE1 as being at the battalion level; belonging to the blue side; being
designated as the main effort of the blue side; performing two tasks, PENETRATE1 and
CLEAR1; having a regular strength; and having four other units under its operational control.
The values of the features of BLUE-TASK-FORCE1 are themselves described in the same way.
For instance, one of the tasks performed by BLUE-TASK-FORCE1 is PENETRATE1.

[Figure content: the upper level of the COA ontology, in which OBJECT is specialized into
concepts such as GEOGRAPHICAL-REGION, ORGANIZATION, EQUIPMENT, ACTION, PLAN, PURPOSE, and
COA-SPECIFICATION-MICROTHEORY, with further specializations such as MODERN-MILITARY-ORGANIZATION,
MODERN-MILITARY-UNIT-DEPLOYABLE, MILITARY-EQUIPMENT, MILITARY-EVENT, MILITARY-OPERATION,
MILITARY-TASK, MILITARY-MANEUVER, COMPLEX-MILITARY-TASK, and the maneuver unit specialties
(infantry, armored, mechanized infantry, aviation). The bottom part shows the description of
BLUE-TASK-FORCE1 (echelon BATALLION-UNIT-DESIGNATION, allegiance BLUE-SIDE, assignment
MAIN-EFFORT1, tasks PENETRATE1 and CLEAR1, troop strength REGULAR-STATUS, and operational control
over BLUE-MECH-COMPANY1, BLUE-MECH-COMPANY2, BLUE-ARMOR-COMPANY1, and BLUE-ARMOR-COMPANY2) and of
the task PENETRATE1 (an instance of PENETRATE-MILITARY-TASK, with features such as OBJECT-ACTED-ON
RED-MECH-COMPANY4, FORCE-RATIO 10.6, and IS-TASK-OF-OPERATION ATTACK2, and inherited features such
as RECOMMENDED-FORCE-RATIO 3 and HAS-SURPRISE-FORCE-RATIO 6).]

Figure 12.13. Fragment of the COA ontology.



PENETRATE1 is defined as being a penetration task, and therefore inherits all the features of
the penetration tasks, in addition to the features that are directly associated with it.
The hierarchy of objects is used as a generalization hierarchy for learning by the
Disciple-COA agent. For instance, one way to generalize an expression is to replace an
object with a more general one from such a hierarchy. In particular, PENETRATE1 from
the bottom-right side of Figure 12.13 can be generalized to PENETRATE-MILITARY-TASK,
COMPLEX-MILITARY-TASK, MILITARY-MANEUVER, etc. The goal of the learning pro-
cess is to select the right generalization.
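The following Python sketch (illustrative only) enumerates the candidate generalizations of PENETRATE1 by climbing the hierarchy, using the chain mentioned in the preceding paragraph:

    # Illustrative sketch of climbing the generalization hierarchy.
    INSTANCE_OF = {"PENETRATE1": "PENETRATE-MILITARY-TASK"}
    SUBCLASS_OF = {
        "PENETRATE-MILITARY-TASK": "COMPLEX-MILITARY-TASK",
        "COMPLEX-MILITARY-TASK": "MILITARY-MANEUVER",
        # ... further superconcepts would continue the chain
    }

    def generalizations(instance):
        """Yield candidate generalizations of an instance, most specific first."""
        concept = INSTANCE_OF.get(instance)
        while concept is not None:
            yield concept
            concept = SUBCLASS_OF.get(concept)

    print(list(generalizations("PENETRATE1")))
    # ['PENETRATE-MILITARY-TASK', 'COMPLEX-MILITARY-TASK', 'MILITARY-MANEUVER']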
The features used to describe the objects are themselves represented in the feature
hierarchy.

12.3.4 Training the Disciple-COA Agent


The next step in the development of the Disciple-COA critiquer was to teach it to critique
COAs with respect to the principles of war and the tenets of Army operations, based on the
developed models. The expert loads the description of a specific COA, such as COA411
represented in Figure 12.11, and then invokes the Problem Solver with a task of critiquing
the COA with respect to a certain principle or tenet. Disciple-COA uses its task reduction
rules to reduce the current task to simpler tasks, showing the expert the reductions found.
The expert may accept a reduction proposed by the agent, reject it, or decide to define a
new reduction. From each such interaction, Disciple-COA either refines a previously
learned rule or learns a new task reduction rule. After a new rule is learned or an existing
rule is refined, the Problem Solver resumes the task reduction process until a solution to
the initial problem is found.
Initially Disciple-COA does not contain any rules. Therefore, all the problem-solving
steps (i.e., task reductions) must be provided by the expert, as illustrated in Figure 12.14
and explained in the following.
To assess COA411 with respect to the principle of surprise, the expert and Disciple-COA
need a certain amount of information, which is obtained by asking a series of questions, as
already illustrated in Section 12.3.2. The answer to each question allows the expert and the
agent to reduce the current assessment task to a more detailed one. This process con-
tinues until the expert and Disciple-COA have enough information about COA411 to make
the assessment. As shown in Figure 12.14, the initial task is reduced to that of assessing the
surprise of COA411 with respect to the countering of enemy reconnaissance. Then one
asks whether there is any enemy reconnaissance unit present in COA411. The answer
identifies RED-CSOP1 as being such a unit because it is performing the task SCREEN1.
Therefore, the task of assessing surprise for COA411 with respect to countering enemy
reconnaissance is now reduced to the better defined task of assessing surprise when
enemy reconnaissance is present. The next question to ask is whether the enemy recon-
naissance unit is destroyed or not. In the case of COA411, RED-CSOP1 is destroyed by the
task DESTROY1. Therefore, one can conclude that there is a strength in COA411 with
respect to the principle of surprise because the enemy reconnaissance unit is countered.
Figure 12.15 illustrates the process of teaching Disciple-COA. The left-hand side repre-
sents the reasoning process of the expert, the question and the answer being in free
natural language format. While this line of reasoning is very natural to a human expert,
a learning agent cannot understand it. The explanation that would be understood by the
agent is represented in the upper-right part of Figure 12.15 and consists of various


ASSESS-COA-WRT-PRINCIPLE-OF-SURPRISE
  FOR-COA COA411

Question: Which is an aspect that characterizes surprise?
Answer: Enemy reconnaissance                                [rule learning: R$ACWPOS-001]

ASSESS-SURPRISE-WRT-COUNTERING-ENEMY-RECONNAISSANCE
  FOR-COA COA411

Question: Is an enemy reconnaissance unit present?
Answer: Yes, RED-CSOP1, which is performing                 [rule learning: R$ASWCER-002]
        the reconnaissance action SCREEN1

ASSESS-SURPRISE-WHEN-ENEMY-RECON-IS-PRESENT
  FOR-COA COA411
  FOR-UNIT RED-CSOP1
  FOR-RECON-ACTION SCREEN1

Question: Is the enemy reconnaissance unit destroyed?
Answer: Yes, RED-CSOP1 is destroyed by DESTROY1             [rule learning: R$ASWERIP-003]

REPORT-STRENGTH-IN-SURPRISE-BECAUSE-OF-COUNTERING-ENEMY-RECON
  FOR-COA COA411
  FOR-UNIT RED-CSOP1
  FOR-RECON-ACTION SCREEN1
  FOR-ACTION DESTROY1
  WITH-IMPORTANCE "high"

Figure 12.14. Task reductions indicated by the expert.

[Figure 12.15 shows, for the task ASSESS-SURPRISE-WRT-COUNTERING-ENEMY-RECONNAISSANCE FOR-COA COA411: on the left, the expert’s reasoning in natural language (Question: “Is an enemy reconnaissance unit present?” Answer: “Yes, RED-CSOP1, which is performing the reconnaissance action SCREEN1.”); in the upper right, the corresponding formal explanation pieces (RED-CSOP1 SOVEREIGN-ALLEGIANCE-OF-ORG RED-SIDE; RED-CSOP1 TASK SCREEN1; SCREEN1 IS INTELLIGENCE-COLLECTION-MILITARY-TASK); and, in the lower right, the plausible version space rule learned from the example and its explanation, which is shown in detail in Figure 12.16.]

Figure 12.15. Teaching Disciple-COA to reduce a task.


relations between certain elements from its ontology. The first explanation piece states, in
the formal language of Disciple-COA, that RED-CSOP1 is an enemy unit. The second
explanation piece expresses the fact that RED-CSOP1 is performing the action SCREEN1.
Finally, the last explanation piece expresses the fact that SCREEN1 is a reconnaissance
action. While an expert can understand the meaning of these formal expressions, he or she
cannot easily define them because he or she is not a knowledge engineer. For one thing,
the expert would need to use the formal language of the agent. But this would not be
enough. The expert would also need to know the names of the potentially many thousands
of concepts and features from the agent’s ontology.
While defining the formal explanations of this task reduction step is beyond the
individual capabilities of the expert or the agent, it is not beyond their joint capabilities.
Finding these explanation pieces is a mixed-initiative process of searching the agent’s
ontology, an explanation piece being a path of objects and relations in this ontology, as
discussed in Section 9.5. In essence, the agent uses analogical reasoning and help from
the expert to identify and propose a set of plausible explanation pieces from which the
expert has to select the correct ones. One explanation generation strategy is based on an
ordered set of heuristics for analogical reasoning. These heuristics exploit the hierarchies
of objects, features, and tasks to identify the rules that are similar to the current
reduction and to use their explanations as a guide to search for similar explanations of
the current example.
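The following sketch suggests how candidate explanation pieces could be proposed as short paths in a triple-based ontology, starting from the objects of the COA411 example. It is only an illustration: the triples merely echo the explanation pieces mentioned above, and this simple path search stands in for the analogy-based heuristics of Disciple-COA.

# Illustrative ontology fragment, as (object, feature, value) triples.
ONTOLOGY = [
    ("RED-CSOP1", "SOVEREIGN-ALLEGIANCE-OF-ORG", "RED-SIDE"),
    ("RED-CSOP1", "TASK", "SCREEN1"),
    ("SCREEN1", "IS", "INTELLIGENCE-COLLECTION-MILITARY-TASK"),
]

def candidate_explanations(example_objects, ontology=ONTOLOGY):
    """Propose paths of one or two relations starting from an object of the example."""
    candidates = []
    for obj in example_objects:
        for (o, f, v) in ontology:
            if o == obj:
                candidates.append([(o, f, v)])
                # Extend the path by one more relation starting from the value v.
                candidates += [[(o, f, v), (o2, f2, v2)]
                               for (o2, f2, v2) in ontology if o2 == v]
    return candidates

# The expert then selects the correct pieces from the proposed candidates.
for path in candidate_explanations(["RED-CSOP1"]):
    print(path)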
From the example reduction and its explanation in Figure 12.15, Disciple-COA auto-
matically generated the plausible version space rule in Figure 12.16. This is an IF-THEN
rule, the components of which are generalizations of the elements of the example in
Figure 12.15.
The rule in Figure 12.16 also contains two conditions for its applicability: a plausible
lower bound condition and a plausible upper bound condition. These conditions approxi-
mate an exact applicability condition that Disciple-COA attempts to learn. Initially, the
plausible lower bound condition covers only the example in Figure 12.15, restricting the
variables from the rule to take only the values from this example. It also includes the
relations between these variables that have been identified as relevant in the explanation
of the example. The plausible upper bound condition is the most general generalization of
the plausible lower bound condition. It is obtained by taking into account the domains and
the ranges of the features from the plausible lower bound condition and the tasks, in order
to determine the possible values of the variables. The domain of a feature is the set of objects
that may have that feature. The range is the set of possible values of that feature. For
instance, ?O2 is the value of the task feature FOR-UNIT, and has as features SOVEREIGN-
ALLEGIANCE-OF-ORG and TASK. Therefore, any value of ?O2 has to be in the intersection of
the range of FOR-UNIT, the domain of SOVEREIGN-ALLEGIANCE-OF-ORG, and the domain
of TASK. This intersection is MODERN-MILITARY-UNIT-DEPLOYABLE.
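The following sketch illustrates this intersection for ?O2. The concept sets are illustrative stand-ins (only MODERN-MILITARY-UNIT-DEPLOYABLE comes from the text; the other concepts are hypothetical), and subsumption in the real ontology is reduced here to plain set intersection.

# Hypothetical simplification: each domain or range is given directly as a set of concepts.
RANGES = {"FOR-UNIT": {"MODERN-MILITARY-UNIT-DEPLOYABLE", "MILITARY-EQUIPMENT"}}
DOMAINS = {
    "SOVEREIGN-ALLEGIANCE-OF-ORG": {"MODERN-MILITARY-UNIT-DEPLOYABLE", "POLITICAL-ORGANIZATION"},
    "TASK": {"MODERN-MILITARY-UNIT-DEPLOYABLE", "PERSON"},
}

def plausible_upper_bound(task_feature, object_features):
    """Intersect the range of the task feature with the domains of the object's features."""
    bound = set(RANGES[task_feature])
    for feature in object_features:
        bound &= DOMAINS[feature]
    return bound

# Any value of ?O2 must belong to all three sets:
print(plausible_upper_bound("FOR-UNIT", ["SOVEREIGN-ALLEGIANCE-OF-ORG", "TASK"]))
# -> {'MODERN-MILITARY-UNIT-DEPLOYABLE'}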
The learned rules, such as the one in Figure 12.16, are used in problem solving to
generate task reductions with different degrees of plausibility, depending on which of their
conditions are satisfied. If the plausible lower bound condition is satisfied, then the
reduction is very likely to be correct. If the plausible lower bound condition is not satisfied,
but the plausible upper bound condition is satisfied, then the solution is considered only
plausible. Any application of such a partially learned rule, however, either successful or
not, provides an additional (positive or negative) example, and possibly an additional
explanation, that are used by the agent to improve the rule further through the general-
ization and/or specialization of its conditions.
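A simplified check of the two conditions might look as follows. This is only a sketch of the idea: the relational constraints between variables that also appear in the rule conditions are ignored, and is_instance_of stands for a subsumption test over the agent’s ontology.

def classify_reduction(bindings, lower_bound, upper_bound, is_instance_of):
    """bindings: {variable: instance}; each bound: {variable: set of admissible concepts}."""
    def satisfies(bound):
        return all(any(is_instance_of(instance, concept) for concept in bound[variable])
                   for variable, instance in bindings.items())
    if satisfies(lower_bound):
        return "routine"      # very likely to be correct
    if satisfies(upper_bound):
        return "plausible"    # innovative; may or may not be correct
    return "not applicable"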


Rule: R$ASWCER-002

IF the task to accomplish is
    ASSESS-SURPRISE-WRT-COUNTERING-ENEMY-RECONNAISSANCE
        FOR-COA ?O1

Question: Is an enemy reconnaissance unit present?
Answer: Yes, ?O2, which is performing the reconnaissance action ?O3.

Explanation
    ?O2 SOVEREIGN-ALLEGIANCE-OF-ORG ?O4 IS RED-SIDE
    ?O2 TASK ?O3 IS INTELLIGENCE-COLLECTION-MILITARY-TASK

Main Condition
    Plausible Upper Bound Condition
        ?O1 IS COA-SPECIFICATION-MICROTHEORY
        ?O2 IS MODERN-MILITARY-UNIT-DEPLOYABLE
            SOVEREIGN-ALLEGIANCE-OF-ORG ?O4
            TASK ?O3
        ?O3 IS INTELLIGENCE-COLLECTION-MILITARY-TASK
        ?O4 IS RED-SIDE
    Plausible Lower Bound Condition
        ?O1 IS COA411
        ?O2 IS RED-CSOP1
            SOVEREIGN-ALLEGIANCE-OF-ORG ?O4
            TASK ?O3
        ?O3 IS SCREEN1
        ?O4 IS RED-SIDE

THEN accomplish the task
    ASSESS-SURPRISE-WHEN-ENEMY-RECON-IS-PRESENT
        FOR-COA ?O1
        FOR-UNIT ?O2
        FOR-RECON-ACTION ?O3

Figure 12.16. Plausible version space rule learned from the example and the explanation in Figure 12.15.

Let us consider again the specific task reductions from Figure 12.14. At least for the
elementary tasks, such as the one at the bottom of the figure, the expert also needs to
express them in natural language: “There is a strength with respect to surprise in COA411
because it contains aggressive security/counter-reconnaissance plans, destroying enemy
intelligence collection units and activities. Intelligence collection by RED-CSOP1 will be
disrupted by its destruction by DESTROY1.”
Similarly, the expert would need to indicate the reference (source) material for the
concluded assessment. The learned rules contain generalizations of these phrases that
are used to generate answers in natural language, as illustrated in the bottom part
of Figure 12.12. Similarly, the generalizations of the questions and the answers from
the rules applied to generate a solution are used to produce an abstract justification of
the reasoning process.
As Disciple-COA learns plausible version space rules, it can use them to propose
routine or innovative solutions to the current problems. The routine solutions are those
that satisfy the plausible lower bound conditions of the rules and are very likely to be
correct. Those that are not correct are kept as exceptions to the rule. The innovative


solutions are those that do not satisfy the plausible lower bound conditions but satisfy the
plausible upper bound conditions. These solutions may or may not be correct, but in each
case they lead to the refinement of the rules that generated them. Let us consider the
situation illustrated in Figure 12.17. After it has been shown how to critique COA411 with
respect to the principle of surprise, Disciple-COA is asked to critique COA421. COA421
is similar to COA411, except that in this case the enemy recon unit is not destroyed.
Because of this similarity, Disciple-COA is able to propose the two top reductions in
Figure 12.17. Both of them are innovative reductions that are accepted by the expert.
Therefore, Disciple-COA generalizes the plausible lower bound conditions of the
corresponding rules as little as possible, so that they cover these reductions while
remaining no more general than the corresponding plausible upper bound conditions.
The last reduction step in Figure 12.17 has to be provided by the expert because no rule
of Disciple-COA is applicable. We call the expert-provided reduction a creative problem-
solving step. From each such reduction, Disciple-COA learns a new task reduction rule, as
was illustrated in the preceding example.
Through refinement, the task reduction rules may become significantly more complex
than the rule in Figure 12.16. For instance, when a reduction proposed by Disciple-COA is
rejected by the expert, the agent attempts to find an explanation of why the reduction is
wrong. Then the rule may be refined with an Except-When plausible version space
condition. The bounds of this version space are generalizations of the explanations that
should not hold in order for the reduction rule to be applicable.
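The refinement operations discussed above can be summarized by the following simplified sketch. It is illustrative only: the “generalization” here just accumulates concepts in the lower bound, whereas Disciple-COA climbs the generalization hierarchy and keeps the result no more general than the plausible upper bound.

def refine_rule(rule, example_bindings, accepted, failure_explanation=None):
    """rule: dict with 'lower_bound', and optionally 'except_when' and 'exceptions'."""
    if accepted:
        # Positive example: minimally generalize the plausible lower bound to cover it.
        for variable, concept in example_bindings.items():
            rule["lower_bound"].setdefault(variable, set()).add(concept)
    elif failure_explanation is not None:
        # Negative example with an explanation of why the reduction is wrong:
        # add an Except-When plausible version space condition.
        rule.setdefault("except_when", []).append(failure_explanation)
    else:
        # Negative example that cannot be explained yet: keep it as an exception.
        rule.setdefault("exceptions", []).append(example_bindings)
    return rule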

[Figure 12.17 shows the chain of task reductions for COA421 and the corresponding learning operations:

ASSESS-COA-WRT-PRINCIPLE-OF-SURPRISE
    FOR-COA COA421
Question: Which is an aspect that characterizes surprise?
Answer: Enemy reconnaissance        (rule R$ACWPOS-001 refined)

ASSESS-SURPRISE-WRT-COUNTERING-ENEMY-RECONNAISSANCE
    FOR-COA COA421
Question: Is an enemy reconnaissance unit present?
Answer: Yes, RED-CSOP2, which is performing the reconnaissance action SCREEN2        (rule R$ASWCER-002 refined)

ASSESS-SURPRISE-WHEN-ENEMY-RECON-IS-PRESENT
    FOR-COA COA421
    FOR-UNIT RED-CSOP2
    FOR-RECON-ACTION SCREEN2
Question: Is the enemy reconnaissance unit destroyed?
Answer: No        (rule R$ASWERIP-004 learned)

REPORT-WEAKNESS-IN-SURPRISE-BECAUSE-ENEMY-RECON-IS-NOT-COUNTERED
    FOR-COA COA421
    FOR-UNIT RED-CSOP2
    FOR-RECON-ACTION SCREEN2
    WITH-IMPORTANCE “high”]

Figure 12.17. Mixed-initiative problem solving and learning.


In any case, comparing the left-hand side of Figure 12.15 (which is defined by the domain
expert) with the rule from Figure 12.16 (which is learned by Disciple-COA) suggests the
usefulness of a Disciple agent for knowledge acquisition. In the conventional knowledge
engineering approach, a knowledge engineer would need to manually define and debug a
rule such as the one in Figure 12.16. With Disciple, the domain expert needs only to define
an example reduction, because Disciple learns and refines the corresponding rule.

12.3.5 Experimental Results


Like Disciple-WA, Disciple-COA has also been developed and evaluated as part of
DARPA’s High Performance Knowledge Bases (HPKB) Program, as discussed in the following
(Tecuci et al., 2001).
In addition to George Mason University (GMU), three other research groups have
developed COA critiquers as part of the HPKB program. Teknowledge and Cycorp have
developed a critiquer based on the Cyc system (Lenat, 1995). The other two critiquers
have been developed at the Information Sciences Institute (ISI) at the University of
Southern California (USC), one based on the Expect system (Kim and Gil, 1999), and the
other based on the Loom system (Loom, 1999). All the critiquers were evaluated as part of
the HPKB’s annual evaluation that took place during the period from July 6 to July 16,
1999, and included five evaluation items of increasing difficulty. Each item consisted of
descriptions of various COAs and a set of questions to be answered about each of them.
Item 1 consisted of COAs and questions that were previously provided by DARPA to guide
the development of the COA critiquing agents. Item 2 included new test questions about
the same COAs. Items 3, 4, and 5 consisted of new COAs that were increasingly more
complex and required further development of the COA agents in order to answer the asked
questions properly. Each of the Items 3, 4, and 5 consisted of two phases. In the first phase,
each team had to provide initial system responses. Then the evaluator issued the model
answers, and each team had a limited amount of time to repair its system, to perform
further knowledge acquisition, and to generate revised system responses.
The responses of each system were scored by a team of domain experts along the
following dimensions and associated weights: (1) Correctness, 50 percent (matches model
answer or is otherwise judged to be correct); (2) Justification, 30 percent (scored on
presence, soundness, and level of detail); (3) Lay Intelligibility, 10 percent (degree to
which a lay observer can understand the answer and the justification); (4) Sources,
10 percent (degree to which appropriate references or sources are noted); and (5) Pro-
activity, 10 percent extra credit (appropriate corrective actions or other information
suggested to address the critique). Based on these scores, several classes of metrics have
been computed, including Recall and Precision. Recall is obtained by dividing the score for
all answers provided by a critiquer by the total number of model answers for the asked
questions. This was over 100 percent in the case of the Disciple-COA critiquer, primarily
because of the extra credit received for generating additional critiques that were not
among the model answers provided by the evaluator. The Precision score is obtained by
dividing the same score by the total number of answers provided by that system (both the
model answers provided by the evaluator and the new answers provided by the critiquer).
The results obtained by the four evaluated critiquers are presented in Figure 12.18. These
graphs also show the averages of these results, which represent the recall and the precision
of the integrated system consisting of the four critiquers.
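The weighted scoring and the two metrics can be illustrated with the following small sketch. The weights are those listed above; the ratings and counts in the example call are made up for illustration.

WEIGHTS = {"correctness": 0.50, "justification": 0.30, "intelligibility": 0.10,
           "sources": 0.10, "proactivity": 0.10}   # proactivity is extra credit

def answer_score(ratings):
    """ratings: criterion -> fraction of the available credit awarded, in [0, 1]."""
    return sum(WEIGHTS[criterion] * ratings.get(criterion, 0.0) for criterion in WEIGHTS)

def recall_and_precision(scores, num_model_answers, num_system_answers):
    total = sum(scores)
    recall = total / num_model_answers      # may exceed 100% because of the extra credit
    precision = total / num_system_answers
    return recall, precision

# Hypothetical example: twelve answers, each with full credit on the first four criteria,
# scored against ten model answers.
scores = [answer_score({"correctness": 1.0, "justification": 1.0,
                        "intelligibility": 1.0, "sources": 1.0})] * 12
print(recall_and_precision(scores, num_model_answers=10, num_system_answers=12))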

[Figure 12.18 presents bar charts of the Recall metric (total score and breakdown by the criteria Correctness, Justification, Intelligibility, Sources, and Proactivity) and of the Precision metric (total score and breakdown by the same criteria) for the four critiquers (Tek/Cyc, ISI-Expect, GMU, and ISI-Loom) and for the integrated system (ALL).]

Figure 12.18. The performance of the COA critiquers and of the integrated system (reprinted with permission from Eric Jones).

Figure 12.19 compares the recall and the coverage of the developed critiquers for the
last three most complex items of the evaluation. For each item, the beginning of each
arrow shows the coverage and recall for the initial testing phase, and the end of the arrow
shows the same data for the modification phase. In this graph, the results that are above
and to the right are superior to the other results. This graph also shows that all the systems
increased their coverage during the evaluation. In particular, the knowledge base of
Disciple-COA increased by 46 percent (from the equivalent of 6,229 simple axioms to
9,092 simple axioms), which represents a very high rate of knowledge acquisition of
286 simple axioms per day.
During August 1999, we conducted a one-week knowledge acquisition experiment with
Disciple-COA, at the U.S. Army Battle Command Battle Lab, in Fort Leavenworth, Kansas,
to test the claim that domain experts who do not have prior knowledge engineering
experience can teach Disciple-COA (Tecuci et al., 2001). The experiment involved four
such military experts and had three phases: (1) a joint training phase during the first three
days, (2) an individual teaching experiment on day four, and (3) a joint discussion of the
experiment on day five. The entire experiment was videotaped. The training for the
experiment included a detailed presentation of Disciple’s knowledge representation,
problem-solving, and learning methods and tools. For the teaching experiment, each
expert received a copy of Disciple-COA with a partial knowledge base. This knowledge
base was obtained by removing the tasks and the rules from the complete knowledge base
of Disciple-COA. That is, the knowledge base contained the complete ontology of objects,

[Figure 12.19 plots recall against knowledge base coverage for evaluation Items 3, 4, and 5. Each system (GMU Disciple, ISI Expect, TEK/CYC) is represented, for each item, by an arrow from its result in the initial testing phase to its result in the modification phase.]

Figure 12.19. Coverage versus recall, pre-repair and post-repair (reprinted with permission from Eric Jones).


object features, and task features. We also provided the experts with the descriptions of
three COAs (COA411, COA421, and COA51), to be used for training Disciple-COA. These
were the COAs used in the final phases of DARPA’s evaluation of all the critiquers.
Finally, we provided and discussed with the experts the modeling of critiquing these COAs
with respect to the principles of offensive and security. That is, we provided the experts
with specific task reductions, similar to the one from Figure 12.17, to guide them in
teaching Disciple-COA. After that, each expert taught Disciple-COA independently while
being supervised by a knowledge engineer, whose role was to help the expert if he or she
reached an impasse while using Disciple-COA.
Figure 12.20 shows the evolution of the knowledge base during the teaching process for
one of the experts, which is representative of all four experts. In the morning, the expert
taught Disciple-COA to critique COAs with respect to the principle of offensive, and in the
afternoon he taught it to critique COAs with respect to the principle of security. In both
cases, the expert first used COA411, then COA421, and then COA51. As one can see from
Figure 12.20, Disciple-COA initially learned more rules, and then the emphasis shifted to
rule refinement. Therefore, the increase in the size of the knowledge base is greater toward
the beginning of the training process for each principle. The teaching for the principle of
offensive took 101 minutes. During this time, Disciple-COA learned fourteen tasks and
fourteen rules (147 simple axioms’ equivalent). The teaching for security took place in the
afternoon and consisted of 72 minutes of interactions between the expert and Disciple-
COA. During this time, Disciple-COA learned fourteen tasks and twelve rules (136 simple
axioms’ equivalent). There was little or no assistance from the knowledge engineer
with respect to teaching. The knowledge acquisition rate obtained during the experiment
was very high (approximately nine tasks and eight rules per hour, or ninety-eight simple
axioms’ equivalent per hour). At the end of this training process, Disciple-COA was able to
correctly identify seventeen strengths and weaknesses of the three COAs with respect to
the principles of offensive and security.

[Figure 12.20 plots the number of axioms in the knowledge base (task axioms, rule axioms, and their sum) at the start of the experiment and after the training on each COA (411, 421, 51), first for the training on the principle of offensive and then for the training on the principle of security.]

Figure 12.20. The evolution of the knowledge base during the teaching process.


After the experiment, each expert was asked to fill in a detailed questionnaire designed
to collect subjective data for usability evaluation. All the answers took into account that
Disciple-COA was a research prototype and not a commercial product, and were rated
on a scale of agreement with the question from 1 to 5, with 1 denoting “not at all”
and 5 denoting “very”. For illustration, Table 12.4 shows three questions and the answers
provided by the four experts.
In conclusion, Disciple-COA demonstrated the generality of its learning methods that
used an object ontology created by another group, namely Teknowledge and Cycorp
(Boicu et al., 1999). It also demonstrated high rule learning rates, as compared with
manual definition of rules, and better performance than the evaluating experts, with many
unanticipated solutions.

12.4 DISCIPLE-COG: CENTER OF GRAVITY ANALYSIS

12.4.1 The Center of Gravity Analysis Problem


Military literature distinguishes among three levels of war – strategic, operational, and
tactical – that help clarify the links between national strategic objectives and tactical
actions. There are no finite limits or boundaries between the levels of war (Joint Chiefs
of Staff, 2008, II–1).
One of the most difficult problems that senior military leaders face at the strategic level
is the determination and analysis of the centers of gravity for friendly and opposing forces.
The concept of the center of gravity of an entity (state, alliance, coalition, or group) was

Table 12.4 Sample Questions Answered by the Four Experts

Question: Do you think that Disciple is a useful tool for knowledge acquisition?
Answers:
– Rating 5. Absolutely! The potential use of this tool by domain experts is only limited by their imagination – not their AI programming skills.
– 5
– 4
– Yes, it allowed me to be consistent with logical thought.

Question: Do you think that Disciple is a useful tool for problem solving?
Answers:
– Rating 5. Yes.
– 5 (absolutely)
– 4
– Yes. As it develops and becomes tailored to the user, it will simplify the tedious tasks.

Question: Were the procedures/processes used in Disciple compatible with Army doctrine and/or decision-making processes?
Answers:
– Rating 5. As a minimum yes, as a maximum – better!
– This again was done very well.
– 4
– 4


introduced by Carl von Clausewitz (1832) as “the foundation of capability, the hub of all
power and movement, upon which everything depends, the point against which all the
energies should be directed.” It is currently defined as comprising the source of power
that provides freedom of action, physical strength, and will to fight (Joint Chiefs of Staff,
2008, IV–X).
It is recognized that, “Should a combatant eliminate or influence the enemy’s strategic
center of gravity, the enemy would lose control of its power and resources and eventually
fall to defeat. Should a combatant fail to adequately protect his own strategic center of
gravity, he invites disaster” (Giles and Galvin, 1996, p. 1). Therefore, the main goal of any
force should be to eliminate or influence the enemy’s strategic center of gravity while
adequately protecting its own.
Correctly identifying the centers of gravity of the opposing forces is of highest import-
ance in any conflict. Therefore, all the U.S. senior military service colleges emphasize
center of gravity analysis in the education of strategic leaders (Warden, 1993; Echevarria,
2003; Strange and Iron, 2004a, 2004b; Eikmeier, 2006).
In spite of the apparently simple definition of the center of gravity, its determination
requires a wide range of background knowledge, not only from the military domain but
also from the economic, geographic, political, demographic, historic, international, and
other domains (Giles and Galvin, 1996). In addition, the adversaries involved, their goals,
and their capabilities can vary in important ways from one situation to another. When
performing this analysis, some experts may rely on their own professional experience and
intuitions without following a rigorous approach.
Recognizing these difficulties, the Center for Strategic Leadership of the U.S. Army War
College started an effort in 1993 to elicit and formalize the knowledge of a number of
experts in center of gravity. This research resulted in a COG monograph (Giles and Galvin,
1996). This monograph made two significant contributions to the theory of center of
gravity analysis. The first was a systematic analysis of the various factors (e.g., political,
military, economic) that have to be taken into account for center of gravity
determination. The second significant contribution was the identification of a wide range of center
of gravity candidates.
A significant advancement of the theory of center of gravity analysis was the CG-CC-
CR-CV model introduced by Strange (1996) and summarized by the following definitions:

Centers of gravity (CG): Primary sources of moral or physical strength, power, or resistance.

Critical capabilities (CC): Primary abilities that merit a center of gravity to be identified as such, in the context of a given scenario, situation, or mission.

Critical requirements (CR): Essential conditions, resources, and means for a critical capability to be fully operative.

Critical vulnerabilities (CV): Critical requirements or components thereof that are deficient, or vulnerable to neutralization, interdiction, or attack (moral/physical harm), in a manner achieving decisive results – the smaller the resources and effort applied and the smaller the risk and cost, the better.

Building primarily on the work of Strange (1996) and Giles and Galvin (1996), we have
developed a computational approach to center of gravity analysis, which is summarized in
Figure 12.21.


Given: A strategic situation (e.g., the invasion of Iraq by the U.S.-led coalition in 2003).
Determine: The strategic centers of gravity of the opposing forces and their critical
vulnerabilities.

[Figure 12.21 depicts the three phases of the approach: (1) Assessment of Situation – assemble data and assess the relevant aspects of the strategic situation (the opposing forces and their strategic goals; political, military, psychosocial, economic, and other factors); (2) Identification of COG candidates – identify potential primary sources of moral or physical strength, power, and resistance from the government, military, people, economy, alliances, and so on; (3) Testing of COG candidates – test each identified candidate to determine whether it has all the critical capabilities (Which are the required critical capabilities? Are the critical requirements of these capabilities satisfied? Are the critical requirements vulnerable?) and select the COG based on this analysis.]

Figure 12.21. Computational approach to center of gravity analysis.

This approach consists of three main phases: assessment of the strategic situation,
identification of center of gravity candidates, and testing of the identified candidates.
During the assessment of the situation (such as the invasion of Iraq by the U.S.-led
coalition in 2003), one assembles and assesses data and other relevant aspects of the
strategic environment, including the opposing forces (Iraq, on one side, and the U.S.-led
coalition, on the other side), their strategic goals, political factors (e.g., type of government,
governing bodies), military factors (e.g., leaders, will, and capability), psychosocial factors
(e.g., motivation, political activities), economic factors (e.g., type of economy, resources),
and so on. This assessment will be used in the next phases of center of gravity analysis.
During the identification phase, strategic center of gravity candidates are identified
from a belligerent’s elements of power, such as its leadership, government, military,
people, or economy. For example, a strong leader, such as Saddam Hussein or George
W. Bush, could be a center of gravity candidate with respect to the situation at the
beginning of the Iraq War in 2003. The result of this phase is the identification of a wide
range of candidates.
During the testing phase, each candidate is analyzed to determine whether it has all the
critical capabilities that are necessary to be the center of gravity. For example, a leader
needs to be secure; informed; able to maintain support from the government, the military,
and the people; and irreplaceable. For each capability, one needs to determine the
existence of the essential conditions, resources, and means that are required by that
capability to be fully operative. For example, some of the protection means of Saddam
Hussein were the Republican Guard Protection Unit, the Iraqi Military, the Complex of
Iraqi Bunkers, and the System of Saddam doubles. Once these means of protection are
identified, one needs to determine whether any of them, or any of their components, are
vulnerable. For example, the Complex of Iraqi Bunkers is vulnerable because their location
and design are known to the U.S.-led coalition and could be destroyed.
Based on the results of the analysis, one can eliminate any center of gravity candidate
that does not have all the required critical capabilities, and select the centers of gravity


from the remaining candidates. Moreover, the process also identifies the critical vulner-
abilities of the selected centers of gravity.
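The testing logic described above can be sketched as follows. This is an illustration, not the Disciple-COG reasoning: the capability name, the data structures, and the example data are simplified renderings of the Iraq 2003 discussion in the text.

def test_candidate(candidate, required_capabilities, requirements, vulnerabilities):
    """requirements: (candidate, capability) -> list of means;
    vulnerabilities: means -> list of vulnerability descriptions."""
    found = []
    for capability in required_capabilities:
        means = requirements.get((candidate, capability), [])
        if not means:
            # A candidate lacking any required critical capability is eliminated.
            return ("eliminate", f"{candidate} lacks the critical capability '{capability}'")
        for m in means:
            found.extend(vulnerabilities.get(m, []))
    return ("retain", found)   # retained candidate, with its critical vulnerabilities

requirements = {("Saddam Hussein", "be secure"):
                ["Complex of Iraqi Bunkers", "System of Saddam doubles"]}
vulnerabilities = {"Complex of Iraqi Bunkers":
                   ["location and design known to the U.S.-led coalition"]}
print(test_candidate("Saddam Hussein", ["be secure"], requirements, vulnerabilities))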
An important characteristic of this approach is that it is both natural for a human and
appropriate for automatic processing. By using this approach, we have developed the
Disciple-COG agent, briefly described in the following section.

12.4.2 Overview of the Use of Disciple-COG


Disciple-COG is a computer program that guides a military planner in describing a
strategic situation and performing a center of gravity analysis following the approach
described in the preceding section and summarized in Figure 12.21.
First, Disciple-COG guides the user in identifying, describing, and assessing the aspects
of the strategic situation that are relevant to center of gravity analysis. An example of such
a situation could be World War II in Europe, at the time of the invasion of the island of
Sicily by the Allied Forces, in 1943.
The user–agent interaction is easy and natural for the user, taking place as illustrated in
Figures 12.22 and 12.23. The left part of each window is a table of contents whose elements
indicate various important aspects of the situation. Initially it contains only “situation.”
When the user clicks on one such aspect, Disciple-COG asks specific questions intended to
acquire a description and/or assessment of that aspect or to update a previously specified
description.

Figure 12.22. Initial situation description interface of Disciple-COG.


Figure 12.23. Situation description and assessment interface of Disciple-COG.

Figure 12.22 shows the initial interaction screen after the user has clicked on “situ-
ation.” The right-hand side shows the prompts of Disciple-COG and the information
provided by the user, such as:

Provide a name for the situation to be assessed:
WW II Europe 1943
...
Name the opposing forces in WW II Europe 1943:
Allied Forces 1943
European Axis 1943

Once the user names the opposing forces (i.e., “Allied Forces 1943” and “European Axis
1943”), Disciple-COG includes them into the table of contents, as shown in the left-hand
side of Figure 12.22. Then, when the user clicks on one of the opposing forces (e.g., “Allied
Forces 1943”), Disciple-COG asks for its characteristics, as indicated in the right-hand side
of Figure 12.23 (e.g., “What kind of force is Allied Forces 1943?”). Because the user
characterized “Allied Forces 1943” as a multistate force (by clicking on one of the options
offered by the agent), Disciple-COG further asks for its members and extends the table of
contents with the provided names (i.e., “US 1943,” “Britain 1943,” “USSR 1943,” etc.) and
their relevant aspects (i.e., “Strategic goal,” “Political factors,” “Military factors,” etc.), as
shown in the left-hand side of Figure 12.23. The user can now click on any such aspect and
will be asked specific questions by Disciple-COG.
Thus, the user’s answers lead to the generation of new items in the left-hand side of the
window, and trigger new questions from the agent, which depend on the answers pro-
vided by the user. Through such context-dependent questions, Disciple-COG guides the
user to research, describe, and assess the situation.
As will be discussed in Section 12.4.4, once the user describes various aspects of
the situation, Disciple-COG automatically extends its ontology with the corresponding


representations. The user is not required to answer all the questions, and Disciple-COG
can be asked, at any time, to identify and test the strategic center of gravity candidates for
the current description of the situation. The COG analysis process uses the problem
reduction and solution synthesis paradigm presented in Section 4.2, following the CG-
CC-CR-CV model (Strange, 1996).
Figure 12.24 shows the interface of the Mixed-Initiative Reasoner of Disciple-COG that
displays the automatically generated analysis. The left-hand side shows an abstract view of
the analysis tree for the problem “Analyze the strategic COG candidates for the WW II
Europe 1943 situation.”
First, the problem of analyzing the strategic COG candidates for this situation is
reduced to analyzing the COG candidates for each of the two opposing forces. Then,
because each of the opposing forces is a multimember force, the problem of analyzing
the COG candidates for an opposing force is reduced to two other problems: (1) the
problem of analyzing the COG candidates for each member of the multimember force
(e.g., US 1943 candidates, Britain 1943 candidates, and USSR 1943 candidates, in the case
of Allied Forces 1943) and (2) the problem of analyzing the multimember COG
candidates.
Continuing, the problem of analyzing the US 1943 candidates is reduced to analyzing
the COG candidates with respect to the main elements of power of US 1943, namely
people of US 1943, government of US 1943, armed forces of US 1943, and economy of
US 1943.
Because the abstract problem “US 1943 candidates” is selected in the left-hand side of
the interface of the Mixed-Initiative Reasoner, the right-hand side shows the detailed
description of the corresponding reduction tree. Notice that the detailed tree shows both
complete problem descriptions and the question/answer pairs that guide their reductions.
The leaves of the detailed tree correspond to the abstract subproblems of “US 1943
candidates,” such as “Candidates wrt people of US 1943.”
The user can browse the entire analysis tree generated by Disciple-COG by clicking on
the nodes and the plus (+) and minus (–) signs. For example, Figure 12.25 shows how the

Figure 12.24. Abstract (left) and detailed (right) COG reduction tree.


Figure 12.25. Reduction tree for testing a national leader as a COG candidate.

problems of analyzing the COG candidates with respect to the main elements of power of
US 1943 (government of US 1943, and armed forces of US 1943) are reduced to identifying
and testing specific COG candidates (i.e., President Roosevelt and military of US 1943).
Testing each of the identified COG candidates is reduced to the problems of testing
whether it has all the necessary critical capabilities. Thus, testing President Roosevelt as a
potential COG candidate is reduced to seven problems, each testing whether President
Roosevelt has a certain critical capability, as shown in the left side of Figure 12.25.
The left-hand side of Figure 12.26 shows how testing whether a COG candidate has a
certain critical capability is reduced to testing whether the corresponding critical
requirements are satisfied. In particular, testing whether President Roosevelt has the
critical capability to stay informed is reduced to the problem of testing whether he has
means to receive essential intelligence. These means are identified as US Office of Strategic
Services 1943, US Navy Intelligence 1943, and US Army Intelligence 1943. Consequently, the
user is asked to assess whether each of them has any significant vulnerability. The user
clicks on one of the means (e.g., US Office of Strategic Services 1943 in the left-hand side of
Figure 12.26) and the agent displays two alternative solution patterns for its assessment, in
the right-hand side of Figure 12.26:

The US Office of Strategic Services 1943 that provides essential intelligence to President Roosevelt has the following significant vulnerability: . . .
Justification: . . .


Figure 12.26. Assessing whether a critical requirement has any significant vulnerability.

The US Office of Strategic Services 1943 that provides essential intelligence to President Roosevelt has no significant vulnerability.
Justification: . . .

The user has to complete the instantiation of one of the two patterns and then click on the
corresponding Save button. In this case, the provided solution is the following one:

The US Office of Strategic Services 1943 that provides essential intelligence to President Roosevelt has the following significant vulnerability: There is a huge amount of information that needs to be collected and analyzed by the US Office of Strategic Services 1943.

Up to this point we have presented the automatic generation of the top-down COG
reduction tree and the evaluation of the elementary problems (i.e., potential vulnerabil-
ities) by the user.
The next stage of the COG analysis process is the bottom-up automatic synthesis of the
elementary solutions. This will be illustrated in the following, starting with Figure 12.27,
which shows how the assessments of the President Roosevelt’s individual means to receive
essential intelligence (i.e., US Office of Strategic Services 1943, US Navy Intelligence 1943, and
US Army Intelligence 1943) are combined to provide an overall assessment of his means to
receive essential intelligence:

President Roosevelt has means to receive essential intelligence (US Office of Strategic
Services 1943, US Navy Intelligence 1943, and US Army Intelligence 1943). The US Office
of Strategic Services 1943 has the following significant vulnerability: There is a huge
amount of information that needs to be collected and analyzed by the US Office of
Strategic Services 1943.

The leaf solutions in Figure 12.27 have a yellow background to indicate that they are
assessments made by the user. The top-level solution obtained through their combination
has a green background to indicate that it was automatically computed by Disciple-COG,
based on a previously learned synthesis rule. This rule indicates the pattern of the solution
and how it is obtained by combining elements of the patterns of the subsolutions. In
particular, notice that the means from individual solutions are gathered into a single list.
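The following sketch suggests how such a synthesis step might combine the subsolutions; it is illustrative only and merely reproduces the pattern visible in the solutions quoted above, not the actual learned synthesis rule.

def synthesize_requirement_assessment(leader, requirement, sub_assessments):
    """sub_assessments: list of (means, vulnerability or None) pairs provided by the user."""
    means = [m for m, _ in sub_assessments]
    # Gather the means from the individual solutions into a single list.
    solution = f"{leader} has {requirement} ({', '.join(means)})."
    # Propagate any significant vulnerabilities reported by the user.
    for m, vulnerability in sub_assessments:
        if vulnerability:
            solution += f" {m} has the following significant vulnerability: {vulnerability}."
    return solution

print(synthesize_requirement_assessment(
    "President Roosevelt", "means to receive essential intelligence",
    [("US Office of Strategic Services 1943",
      "There is a huge amount of information that needs to be collected and analyzed "
      "by the US Office of Strategic Services 1943"),
     ("US Navy Intelligence 1943", None),
     ("US Army Intelligence 1943", None)]))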


Figure 12.27. Obtaining the overall assessment of a critical requirement by combining individual
assessments.

The next bottom-up solution synthesis step is to obtain the assessment of a critical
capability by combining the assessments of its critical requirements. Figure 12.28, for
instance, shows the assessment of President Roosevelt’s critical capability to maintain
support, obtained by combining the assessments of the corresponding means (i.e., means
to secure support from the government, means to secure support from the military, and
means to secure support from the people). This synthesis operation was previously
explained in Section 4.2 and illustrated in Figure 4.8 (p. 117).
Next Disciple-COG obtains the assessment of a COG candidate based on the assess-
ments of its critical capabilities, as illustrated in Figure 12.29:

President Roosevelt is a strategic COG candidate that can be eliminated because President Roosevelt does not have all the necessary critical capabilities (e.g., be irreplaceable).

All the identified COG candidates from the analyzed situation are evaluated in a similar
way, and the final solution is a summary of the results of these evaluations, as illustrated
in Figure 12.30. In particular, for the WW II Europe 1943 situation, the solution is
the following one:

For European Axis 1943, choose the strategic center of gravity from the following
candidates: military of Germany 1943 and industrial capacity of Germany 1943. For
Allied Forces 1943, choose the strategic center of gravity from the following
candidates: military of USSR 1943, financial capacity of USSR 1943, industrial capacity
of USSR 1943, will of the people of Britain 1943, military of Britain 1943, financial
capacity of Britain 1943, will of the people of US 1943, military of US 1943, and industrial
capacity of US 1943.

The subsolutions of this top-level solution indicate all the COG candidates considered and
why several of them have been eliminated.


Figure 12.28. Obtaining the overall assessment of a critical capability based on its critical
requirements.

Figure 12.29. Obtaining the assessment of a COG candidate based on its critical capabilities.


Figure 12.30. Result of the evaluation of the COG candidates corresponding to a situation.

At the end of the analysis, Disciple-COG generates a draft analysis report, a fragment of
which is shown in Figure 12.31. The first part of this report contains a description of the
strategic situation that is generated from the information provided and assessed by the
user, as illustrated in Figures 12.22 and 12.23. The second part of the report includes all the
center of gravity candidates identified by Disciple-COG, together with their analyses, as
previously discussed. The user may now finalize this report by examining the analysis of
each center of gravity candidate and by completing, correcting, or even rejecting it and
providing a different analysis.
Successive versions of Disciple-COG have been used for ten years in courses at the
U.S. Army War College (Tecuci et al., 2008b). It has also been used at the Air War College
and the Joint Forces Staff College. The use of Disciple-COG in such an educational
environment is productive for several reasons. First, the user is guided in performing a
detailed and systematic assessment of the most important aspects of a strategic situation,
which is necessary in order to answer Disciple-COG’s questions. Second, the agent
generates its solutions by employing a systematic analysis, which was learned from a
military expert. Therefore, the user can learn how to perform a similar analysis from
Disciple-COG. Third, the details of the analysis and the actual results reflect the personal
judgment of the user, who has unique military experiences and biases and has a
personal interpretation of certain facts. Thus, the analysis is unique to the user, who
can see how his or her understanding of the situation determines the results yielded by
Disciple-COG.
It is important to note, however, that the solutions generated by Disciple-COG must be
critically analyzed at the end.


Figure 12.31. Fragment of an automatically generated COG analysis report.


12.4.3 Ontology Design and Development


The previous section provided a general overview of the reduction and synthesis process
performed by Disciple-COG, which resulted from the modeling of this process by
subject matter experts from the U.S. Army War College, primarily Dr. Jerome Comello
(Colonel, retired). This modeling process also informed the ontology design and devel-
opment process, as discussed in Section 6.4. The top part of the resulting ontology is
shown in Figure 12.32.
Under the leaf concepts from Figure 12.32 are ontologies for those concepts. For
example, Figure 12.33 shows a fragment of the ontology of political factors. Figure 10.22
(p. 323) shows part of the ontology of forces, Figure 10.26 (p. 325) shows part of the
ontology of economic factors, and Figure 10.28 (p. 326) shows part of the ontology of
resources and infrastructure elements.
The features are also organized hierarchically. Several of the feature ontologies have an
almost one-to-one correspondence to a concept ontology. Consider, for example, the
ontology of controlling leaders from the middle of Figure 12.33. For each type of control-
ling leader (e.g., political leader), there is a corresponding feature (i.e., has as political
leader), as shown in Figure 12.34. Similarly, the feature ontology from Figure 5.8 (p. 160)
corresponds to the ontology of economic factors from Figure 10.26 (p. 325).

12.4.4 Script Development for Scenario Elicitation


Disciple-COG has general knowledge about the center of gravity domain, such as problem
reduction and solution syntheses rules (whose applications have been illustrated in
Section 12.4.2), as well as ontologies of concepts and features (as discussed in the previous
section). However, the agent has no specific knowledge about any particular situation
(referred to as scenario in this section).

[Figure 12.32 shows the top part of the concept ontology: object subsumes scenario, force, force goal, resource or infrastructure element, and strategic cog relevant factor; force goal subsumes operational goal and strategic goal; strategic cog relevant factor subsumes demographic, civilization, economic, psychosocial, geographic, political, historical, military, and international factors.]

Figure 12.32. The top part of the concept ontology of Disciple-COG.


[Figure 12.33 shows a fragment of the ontology of political factors. Political factor subsumes governing body and controlling element. Governing body subsumes state government and group governing body, with subconcepts for the various forms of government (e.g., totalitarian, theocratic, democratic, and autocratic government, monarchy, military and religious dictatorship, representative and parliamentary democracy). Controlling element subsumes controlling leader, controlling organization, and controlling group, with subconcepts such as god king, monarch, dictator, religious leader, military leader, political leader, head of state, head of government, commander in chief, secret police, political party, cabinet or staff, council or board, and law enforcement organization.]

Figure 12.33. Fragment of the ontology of political factors.

When the user starts using the agent, Disciple-COG elicits the description of the situation
or scenario to be analyzed, as was illustrated at the beginning of Section 12.4.2 and in
Figures 12.22 (p. 367) and 12.23 (p. 368). Scenario elicitation is guided by elicitation scripts
that are associated with the concepts and features from the ontology of Disciple-COG. For
example, the elicitation script for the feature has as opposing force is shown in Figure 12.35.
The script indicates the question to be asked (“Name the opposing forces in <scenario
name>:”), the variable that will hold the answer received from the user (“<opposing
force>”), the graphical appearance of the interface (“multiple line, height 4”), the way the
ontology will be extended with the elicited opposing force (“<opposing force> instance of
opposing force,” and “<scenario name> has as opposing force <opposing force>”), and the
next script to call (“Elicit properties of the instance <opposing force> in new window”).
An illustration of the execution of this script was provided at the bottom of Figure 12.22
(p. 367). Figure 12.36 shows the effect of this execution on the ontology of Disciple-COG.
Before script execution, the relevant part of the ontology is the one from the top of
Figure 12.36. The execution of the script causes Disciple-COG to prompt the user as
follows: “Name the opposing forces in WW II Europe 1943.” Once the user provides these
names (“Allied Forces 1943” and “European Axis 1943”), Disciple-COG introduces them as
instances of opposing force, and connects the “WW II Europe 1943” scenario to them, as
indicated in the script and illustrated at the bottom part of Figure 12.36.
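A simplified, self-contained sketch of how such a script execution might look is given below. The function and variable names are illustrative and are not the Disciple-COG scripting API; the ontology is reduced to a list of statements.

ontology = []   # list of (subject, relation, object) statements

def run_opposing_force_script(scenario_name, ask):
    """ask(question) -> list of answers provided by the user, one per line."""
    answers = ask(f"Name the opposing forces in {scenario_name}:")
    for opposing_force in answers:
        # Ontology actions specified by the script:
        ontology.append((opposing_force, "instance of", "opposing force"))
        ontology.append((scenario_name, "has as opposing force", opposing_force))
    # Each answer would then trigger the called script "Elicit properties of the instance."
    return answers

run_opposing_force_script("WW II Europe 1943",
                          ask=lambda q: ["Allied Forces 1943", "European Axis 1943"])
print(ontology)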
Disciple-COG contains a Script Editor that allows easy definition of the elicitation
scripts by a knowledge engineer (KE). Figure 12.37 shows how the KE has defined the
script from Figure 12.35. There are a few differences in the naming of some entities in


[Figure 12.34 shows the fragment of the feature hierarchy corresponding to the ontology of controlling leaders: has as controlling leader (domain: agent, range: person) subsumes has as god king, has as religious leader, has as monarch, has as military leader, has as political leader, has as head of state, and has as head of government (each with domain governing body and range person), as well as has as commander in chief (domain: force, range: person).]

Figure 12.34. Fragment of the hierarchy of features corresponding to the ontology of controlling leaders.

[Figure 12.35 shows the feature has as opposing force, with domain scenario and range force, together with its associated elicitation script:]

Script type: Elicit the feature has as opposing force for an instance <scenario name>
Controls:
    Question: Name the opposing forces in <scenario name>:
    Answer variable: <opposing force>
    Control type: multiple line, height 4
Ontology actions:
    <opposing force> instance of opposing force
    <scenario name> has as opposing force <opposing force>
Script calls:
    Elicit properties of the instance <opposing force> in new window

Figure 12.35. Elicitation script for a feature.

Figure 12.37, corresponding to an older version of the Disciple system. In the Feature
Hierarchy Browser, the KE has selected the feature has_as_opposing_force, and then has
clicked on the Script button. As a result, an editor for the elicitation script was opened, as
shown in the right-hand side of Figure 12.37. Then the KE has selected the type of script to


[Figure 12.36 shows the relevant fragment of the ontology before and after the execution of the script. Before execution: object subsumes scenario and force, force subsumes opposing force, and WW II Europe 1943 is an instance of scenario. After execution: Allied Forces 1943 and European Axis 1943 have been added as instances of opposing force, and WW II Europe 1943 is connected to each of them by the feature has as opposing force.]

Figure 12.36. The effect of the execution of an elicitation script on the ontology.

be specified: “eliciting the values of has_as_opposing_force for an instance.” As a result,
the agent has displayed a panel requesting several items of information from the KE:

Control type: multiple-line
Question: Name the opposing forces in <scenario-name>:
Height: 4
Answer variable: <opposing-force-name>
...

Next, in the “Ontology actions” panel shown in the middle-right of Figure 12.37, the KE has
indicated how the ontology will be extended with the elicited values of the specified variables:

<opposing-force-name> instance-of Opposing_force
<scenario-name> has_as_opposing_force <opposing-force-name>

Finally, as shown at the bottom-right of Figure 12.37, the KE has indicated the script
to be called after the execution of the current script (“elicit properties of an instance”),
how it will be displayed (in a new window), and its parameters (<opposing-force-name>
and Opposing_force).


Figure 12.37. Interface of the Script Editor of Disciple-COG.

12.4.5 Agent Teaching and Learning


The teaching and learning of Disciple-COG follow the approach described in the previous
sections of this book. Therefore, in this section, we are only going to illustrate them with an
example.
The expert formulates an initial problem, such as “Analyze the strategic COG candi-
dates for WWII Europe 1943” (see Figure 12.38), and shows the agent how to solve this
problem by using the problem reduction paradigm described in Section 4.2. The expert
uses natural language, as if he or she would think aloud. The expert asks a question related
to some piece of information that is relevant to solving the current problem. The answer
identifies that piece of information and leads the expert to reduce the current problem to a
simpler problem (or, in other cases, to several simpler problems): “Analyze the strategic
COG candidates for Allied Forces 1943.” Figure 12.38 shows a sequence of problem
reduction steps.
Each step consists of a problem, a question, its answer, and a subproblem. From each
of these steps, Disciple-COG learns a general problem reduction rule by using its ontology
as a generalization hierarchy and the methods presented in Chapters 9 and 10. They will
be briefly illustrated in the following.
Let us consider the fourth step from the problem reduction tree in Figure 12.38, shown
also in the upper-left part of Figure 12.39. As discussed in Chapter 9, rule learning is a
mixed-initiative process between the expert (who knows why the reduction is correct and
can help the agent to understand this) and the Disciple-COG agent (which is able to
generalize the problem reduction example and its explanation into a general rule by using
the object ontology as a generalization language).


[Figure 12.38 illustrates the mixed-initiative process. In the modeling and learning phases, the expert provides examples of correct task reduction steps, and Disciple-COG learns a task reduction rule from each of them:

We need to: Analyze the strategic COG candidates for WWII Europe 1943.
Question: Which is an opposing force in the WWII Europe 1943 scenario? Answer: Allied Forces 1943.
Therefore we need to: Analyze the strategic COG candidates for Allied Forces 1943.        (Rule1 learned)

Question: What type of force is Allied Forces 1943? Answer: Allied Forces 1943 is a multimember force.
Therefore we need to: Analyze the strategic COG candidates for Allied Forces 1943 which is a multimember force.        (Rule2 learned)

Question: What type of center of gravity should we consider for this multimember force? Answer: We consider one corresponding to a member of it.
Therefore we need to: Analyze the strategic COG candidates for a member of Allied Forces 1943.        (Rule3 learned)

Question: Which is a member of Allied Forces 1943? Answer: US 1943.
Therefore we need to: Analyze the strategic COG candidates for US 1943.        (Rule4 learned)

In the subsequent solving, critiquing, and refining phases, Disciple-COG applies Rule4 to the task “Analyze the strategic COG candidates for a member of European Axis 1943.” The reduction to “Analyze the strategic COG candidates for Germany 1943” (answer: Germany 1943) is accepted by the expert, and Rule4 is refined. The reduction to “Analyze the strategic COG candidates for Finland 1943” (answer: Finland 1943) is rejected by the expert, and Rule4 is refined again.]

Figure 12.38. An illustration of mixed-initiative modeling, problem solving, and learning.

The question and its answer from the problem reduction step represent the expert’s
reason (or explanation) for performing that reduction. Because they are in natural lan-
guage, the expert has to help Disciple-COG “understand” them in terms of the concepts
and the features from the object ontology. For instance, the meaning of the question/
answer pair from the example in Figure 12.39 (i.e., “Which is a member of Allied Forces
1943? US 1943”) is “Allied Forces 1943 has as member US 1943.”


Positive Example 1

We need to
    Analyze the strategic COG candidates for a member of Allied Forces 1943.
Which is a member of Allied Forces 1943?
    US 1943
Therefore we need to
    Analyze the strategic COG candidates for US 1943.

Explanation: Allied Forces 1943 has as member US 1943, rewritten as
    ?O1 is Allied Forces 1943
        has as member ?O2
    ?O2 is US 1943

Rule4 learned from Example 1

IF
    Analyze the strategic COG candidates for a member of ?O1.
Question: Which is a member of ?O1?
Answer: ?O2
Main Condition
    Plausible Upper Bound Condition
        ?O1 is multimember force
            has as member ?O2
        ?O2 is force
    Plausible Lower Bound Condition
        ?O1 is equal partners multistate alliance
            has as member ?O2
        ?O2 is single-state force
THEN
    Analyze the strategic COG candidates for ?O2.

Figure 12.39. Rule learning from an example and its explanation.

Based on the example and its explanation from the left-hand side of Figure 12.39,
Disciple-COG learns the rule from the right-hand side of Figure 12.39.
The structure of the rule is generated from the structure of the example where each
instance (e.g., Allied Forces 1943) and each constant (if present) is replaced with a variable
(e.g., ?O1). The variables are then used to express the explanation of the example as a very
specific applicability condition of the rule, as shown in the bottom-left part of Figure 12.39.
Finally, the plausible version space condition of the rule is generated by generalizing
the specific applicability condition in two ways.
The plausible lower bound condition is the minimal generalization of the specific
condition, which does not contain any specific instance. This generalization is performed
in the context of the agent’s ontology, in particular the ontology fragment shown in
Figure 10.22 (p. 323). The least general concepts from the ontology in Figure 10.22 that
cover Allied Forces 1943 are opposing force and equal partners multistate alliance. However,
Allied Forces 1943 has the feature has as member, and therefore any of its generalizations
should be in the domain of this feature, which happens to be multimember force. As a
consequence, the minimal generalization of Allied Forces 1943 is given by the following
expression:

{opposing force, equal partners multistate alliance} ∩ {multimember force} = {equal partners multistate alliance}

Similarly (but using the range of the has as member feature, which is force), Disciple-COG
determines the minimal generalizations of US 1943 as follows:

{single-state force} ∩ {force} = {single-state force}

The reason the lower bound cannot contain any instance is that the learned rule will be
used by Disciple-COG in other scenarios (such as Afghanistan 2001–2002), where the
instances from WWII Europe 1943 do not exist, and Disciple-COG would not know how
to generalize them.


The plausible upper bound condition is the maximal generalization of the specific
condition and is generated in a similar way. In particular, the maximal generalization of
Allied Forces 1943 is given by the following expression:

{object} ∩ {multimember force} = {multimember force}

Also, the maximal generalization of US 1943 is:

{object} ∩ {force} = {force}
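
The two bounds can be illustrated with a small Python sketch over a toy fragment of the ontology. The PARENTS and INSTANCE_OF tables and the helper names below are assumptions made for this example, not the actual ontology of Figure 10.22 or the Disciple-COG code.

```python
# Toy ontology fragment (an assumption for this example).
PARENTS = {
    "equal partners multistate alliance": ["multistate alliance"],
    "dominant partner multistate alliance": ["multistate alliance"],
    "multistate alliance": ["multimember force", "opposing force"],
    "multimember force": ["force"],
    "opposing force": ["force"],
    "single-state force": ["force"],
    "force": ["object"],
}
INSTANCE_OF = {
    "Allied Forces 1943": ["opposing force", "equal partners multistate alliance"],
    "US 1943": ["single-state force"],
}

def ancestors(concept):
    """The concept itself plus every more general concept."""
    result = {concept}
    for parent in PARENTS.get(concept, []):
        result |= ancestors(parent)
    return result

def generalize(instance, restriction, minimal=True):
    """Minimal generalization: the direct concepts of the instance that are
    covered by the restriction (the domain or range of the feature used in
    the explanation). Maximal generalization: object intersected with the
    restriction, i.e., the restriction itself."""
    if not minimal:
        return {restriction}
    return {c for c in INSTANCE_OF[instance] if restriction in ancestors(c)}

print(generalize("Allied Forces 1943", "multimember force", minimal=True))
# -> {'equal partners multistate alliance'}   (plausible lower bound for ?O1)
print(generalize("Allied Forces 1943", "multimember force", minimal=False))
# -> {'multimember force'}                    (plausible upper bound for ?O1)
print(generalize("US 1943", "force", minimal=True))
# -> {'single-state force'}                   (plausible lower bound for ?O2)
```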

As Disciple-COG learns new rules from the expert, the interaction between the expert and
Disciple-COG evolves from a teacher–student interaction toward an interaction where
both collaborate in solving a problem. During this mixed-initiative problem-solving phase,
Disciple-COG learns not only from the contributions of the expert, but also from its own
successful or unsuccessful problem-solving attempts, which lead to the refinement of the
learned rules.
As indicated in Figure 12.38, Disciple-COG applied Rule4 to reduce the task “Analyze
the strategic COG candidates for a member of European Axis 1943,” generating an example
that is covered by the plausible upper bound condition of the rule. The expert accepted this
reduction as correct. However, although European Axis 1943 is a multimember force, it is not
an equal partners multistate alliance; it is a dominant partner multistate alliance dominated
by Germany 1943, as can be seen in Figure 10.22 (p. 323). Consequently, Disciple-COG
automatically generalized the plausible lower bound condition of the rule (from equal partners
multistate alliance to multistate alliance) to cover this example. The refined rule is shown on
the left-hand side of Figure 12.40. This refined rule then generated the task reduction shown at
the bottom of Figure 12.38. Although that example is covered by the plausible lower bound
condition of the rule, the expert rejected the reduction as incorrect. This shows that the
plausible lower bound condition is not less general than the concept to be learned and needs
to be specialized.
This rejection of the reduction proposed by Disciple-COG initiates an explanation
generation interaction during which the expert will have to help the agent understand
why the reduction step is incorrect. The explanation of this failure is that Finland 1943 has
only a minor military contribution to European Axis 1943 and cannot, therefore, provide the
center of gravity of this alliance. The actual failure explanation (expressed with the terms
from the object ontology) has the form:

Finland 1943 has as military contribution military contribution of Finland 1943
military contribution of Finland 1943 is minor military contribution

Based on this failure explanation, Disciple-COG generates a plausible version space for an
Except When condition and adds it to the rule, as indicated on the right-hand side of
Figure 12.40. In the future, this rule will apply only to situations where the main condition
is satisfied and the Except When condition is not satisfied.
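
The effect of this refinement on rule application can be sketched as follows. This is a simplified illustration of the semantics with hypothetical data structures, not the Disciple-COG implementation.

```python
# Illustrative sketch of the refined Rule4 semantics: the rule applies only if
# its main condition is satisfied and its Except-When condition is not.
FACTS = {
    ("European Axis 1943", "has as member", "Finland 1943"),
    ("European Axis 1943", "has as member", "Germany 1943"),
    ("Finland 1943", "has as military contribution",
     "military contribution of Finland 1943"),
    ("military contribution of Finland 1943", "instance of",
     "minor military contribution"),
}

def main_condition(o1, o2):
    # Simplified: ?O1 has as member ?O2 (the type tests on ?O1 and ?O2 are omitted).
    return (o1, "has as member", o2) in FACTS

def except_when(o2):
    # ?O2 has as military contribution ?O3, and ?O3 is a minor military contribution.
    return any(
        (o2, "has as military contribution", o3) in FACTS
        and (o3, "instance of", "minor military contribution") in FACTS
        for (_, _, o3) in FACTS
    )

def rule4_applies(o1, o2):
    return main_condition(o1, o2) and not except_when(o2)

print(rule4_applies("European Axis 1943", "Germany 1943"))   # True
print(rule4_applies("European Axis 1943", "Finland 1943"))   # False
```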

12.4.6 Experimental Results


The most remarkable feature of Disciple-COG is that it was developed for actual use in
teaching military personnel, and it has indeed been used as a learning and decision-support
assistant in courses or individual lectures at the U.S. Army War College, Air War College,
Joint Forces Staff College, U.S. Army Intelligence Center, and other civilian, military, and
intelligence institutions (Tecuci et al., 2002a; 2002b).


Rule4 after Positive Example 2:

    IF
        Analyze the strategic COG candidates for a member of ?O1.
    Question: Which is a member of ?O1?
    Answer: ?O2
    Main Condition
        Plausible Upper Bound Condition
            ?O1 is multimember force
                has as member ?O2
            ?O2 is force
        Plausible Lower Bound Condition
            ?O1 is multistate alliance
                has as member ?O2
            ?O2 is single-state force
    THEN
        Analyze the strategic COG candidates for ?O2.

Rule4 after Negative Example 3:

    IF
        Analyze the strategic COG candidates for a member of ?O1.
    Question: Which is a member of ?O1?
    Answer: ?O2
    Main Condition
        Plausible Upper Bound Condition
            ?O1 is multimember force
                has as member ?O2
            ?O2 is force
        Plausible Lower Bound Condition
            ?O1 is multistate alliance
                has as member ?O2
            ?O2 is single-state force
    Except-When Condition
        Plausible Upper Bound Condition
            ?O2 is force
                has as military contribution ?O3
            ?O3 is minor military contribution
        Plausible Lower Bound Condition
            ?O2 is single-state force
                has as military contribution ?O3
            ?O3 is minor military contribution
    THEN
        Analyze the strategic COG candidates for ?O2.

Figure 12.40. Rule refined with additional examples.

In particular, successive versions of Disciple-COG have been used in elective courses and have
been part of the U.S. Army War College curriculum, without interruption, for a decade
beginning in 2001.
The textbook Agent-Assisted Center of Gravity Analysis (Tecuci et al., 2008b) provides a
detailed presentation of this agent, the embodied theory for COG determination that is
consistent with the joint military doctrine, and the use of this agent for the education
of strategic leaders. It includes a CD with lecture notes and the last version of the agent
(see lac.gmu.edu/cog-book/).
Each year, after being used in one or two courses, Disciple-COG was evaluated by the
students. The following, for instance, describes the evaluation results obtained in one of
these courses taught at the U.S. Army War College.
Each military student used a copy of the trained Disciple-COG agent as an intelligent
assistant that helped him or her to develop a center of gravity analysis of a war scenario.
As illustrated in Section 12.4.2, each student interacted with the scenario elicitation
module that guided him or her to describe the relevant aspects of the analyzed scenario.
Then the student invoked the autonomous problem solver (which used the rules learned


by Disciple-COG) and the report generator, obtaining a center of gravity analysis
report. This report contained the center of gravity candidates found by Disciple-
COG, together with the justifications for their identification as candidates, and
the justifications for the results of their testing (i.e., their elimination or their
preservation as likely centers of gravity). These justifications are generated based
on the rules learned by Disciple-COG and are intended to help the students learn
how to identify and test the center of gravity candidates for war scenarios. The
students were asked to study and evaluate the justifications generated by Disciple-
COG and to finalize the report. Figure 12.41 summarizes the results of their evalu-
ations. For instance, out of the 110 justifications generated for all the analyzed
scenarios, 76 were considered correct, 30 acceptable, and only 4 incorrect. Moreover,
most of the time the students found these justifications to be complete and
easy to understand.
The use of Disciple-COG extended over four three-hour sessions. At the end, the
students were asked to evaluate a wide range of aspects related to the usability and
utility of the three Disciple-COG modules used and of Disciple-COG as a whole. The
students were presented with statements on various aspects of Disciple-COG and were
asked to express their level of agreement with these statements by using a five-point
scale (strongly disagree, disagree, neutral, agree, and strongly agree). Figure 12.42
includes some of the global evaluation results, showing that the students considered
Disciple-COG easy to learn and use, found its use as an assignment well suited to the
course’s learning objectives, agreed that it helped them learn to perform a strategic COG
analysis of a scenario, agreed that it should be used in future versions of this course, and
agreed that a system such as Disciple-COG could be used in other U.S. Army War College
courses.
The evaluation results in Figure 12.42 are consistent with the results obtained in other
similar courses. For example, Figure 12.43 presents some of the evaluation results
obtained in a course at the Air War College.
These results demonstrate that the Disciple approach enabled the development of an agent
that has been found useful in a complex military domain.

The justification is…
    correct: 76              acceptable: 30                   incorrect: 4
    complete: 62             relatively complete: 43          significantly incomplete: 5
    easy to understand: 71   relatively understandable: 36    difficult to understand: 3

Figure 12.41. Subjective evaluation of the justifications generated by Disciple-COG.


[Figure 12.42 shows six bar charts of student responses on the five-point scale (strongly disagree, disagree, neutral, agree, strongly agree) for the statements: “It was easy to learn how to use Disciple”; “It was easy to use Disciple”; “The use of Disciple is an assignment that is well suited to the course’s learning objectives”; “Disciple helped me to learn to perform a strategic COG analysis of a scenario”; “Disciple should be used in future versions of this course”; and “A system like Disciple could be used in other USAWC courses.”]

Figure 12.42. Global evaluation results from a COG class experiment at the U.S. Army War
College.

[Figure 12.43 shows four bar charts of student responses on the same five-point scale for the statements: “The use of Disciple is an assignment that is well suited to the course’s learning objectives”; “Disciple helped me to learn to perform a strategic COG analysis of a scenario”; “Disciple should be used in future versions of this course”; and “A system like Disciple could be used in other Air War College courses.”]

Figure 12.43. Global evaluation results from a COG class experiment at the Air War College.


12.5 DISCIPLE-VPT: MULTI-AGENT COLLABORATIVE PLANNING

12.5.1 Introduction
While most of this book has focused on the development of agents for evidence-
based reasoning, the previous sections of this chapter have shown the generality of
the knowledge representation, reasoning, and learning methods of the Disciple
approach by describing other types of Disciple agents. This section takes a further
step in this direction by presenting a different type of Disciple architecture, called
Disciple-VPT (virtual planning team). Disciple-VPT (Tecuci et al., 2008c) consists of
virtual planning experts that can collaborate to develop plans of actions requiring
expertise from multiple domains. It also includes an extensible library of virtual
planning experts from different domains. Teams of such virtual experts can be rapidly
assembled from the library to generate complex plans of actions that require their
joint expertise. The basic component of the Disciple-VPT tool is the Disciple-VE
(virtual experts) learning agent shell that can be taught directly by a subject matter
expert how to plan, through planning examples and explanations, in a way that is
similar to how the expert would teach an apprentice. Copies of the Disciple-VE shell
can be used by experts in different domains to rapidly populate the library of virtual
experts of Disciple-VPT.
A virtual planning expert is defined as a knowledge-based agent that can rapidly
acquire planning expertise from a subject matter expert and can collaborate with other
virtual experts to develop plans that are beyond the capabilities of individual virtual
experts.
In this section, by planning, we mean finding a partially ordered set of elementary
actions that perform a complex task (Ghallab et al., 2004).
A representative application of Disciple-VPT is planning the response to emergency
situations, such as the following ones: a tanker truck leaking toxic substance near a
residential area; a propane truck explosion; a biohazard; an aircraft crash; a natural
disaster; or a terrorist attack (Tecuci et al., 2007d). The U.S. National Response Plan
(DHS, 2004) identifies fifteen primary emergency support functions performed by federal
agencies in emergency situations. Similarly, local and state agencies undertake these
functions when responding to such emergencies before, or without, any federal assistance.
Each such function defines an expertise domain, such as emergency manage-
ment; police operations; fire department operations; hazardous materials handling; health
and emergency medical services; sheltering, public works, and facilities; and federal law
enforcement. In this case, the library of Disciple-VPT will include virtual experts corres-
ponding to these domains.
The next section presents the general architecture of Disciple-VPT and discusses the
different possible uses of this general and flexible tool. Section 12.5.3 describes a sample
scenario from the emergency response planning area, which is used to present the
features of Disciple-VPT. Section 12.5.4 presents the architecture of the Disciple-VE
learning agent shell, which is the basis of the capabilities of Disciple-VPT, including its
learning-oriented knowledge representation. Section 12.5.5 presents the hierarchical task
network (HTN) planning performed by the Disciple virtual experts. After that, Section
12.5.6 presents a modeling language and methodology developed to help a subject matter


expert explain to a Disciple-VE agent how to plan, by using the task reduction paradigm.
Section 12.5.7 discusses how a Disciple-VE agent can perform complex inferences as part
of a planning process. The next two sections, 12.5.8 and 12.5.9, present the teaching and
learning methods of Disciple-VE, first for inference tasks and then for planning tasks.
Section 12.5.10 presents the organization of the library of virtual experts of Disciple-VPT.
After that, Section 12.5.11 presents Disciple-VPT’s approach to multi-agent collaboration.
Section 12.5.12 discusses the development of two virtual experts, one for fire operations
and the other for emergency management. Section 12.5.13 presents some evaluation
results, and Section 12.5.14 summarizes our research contributions.

12.5.2 The Architecture of Disciple-VPT


Figure 12.44 presents the end-user’s view of the three major components of Disciple-VPT:

• VE Assistant, an agent that supports the user in using Disciple-VPT
• VE Library, an extensible library of virtual planning experts
• VE Team, a dynamically assembled team of virtual experts selected from the VE Library

The user interacts with the VE Assistant to specify a situation and the profiles of several
human experts who may collaborate to plan the achievement of various goals in that
situation. Next, a team of virtual planning experts with similar profiles is automatically
assembled from the VE Library. This VE Team then simulates the planning performed by
the human experts, generating plans for achieving various goals in the given situation.
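
Purely as an illustration of this assembly step (Disciple-VPT’s actual profile representation and matching mechanism are not described here; all names below are hypothetical), team assembly can be pictured as matching the requested expertise domains against the library:

```python
# Illustrative only: assemble a VE Team by matching requested expert profiles
# against the expertise domains of the virtual experts in the VE Library.
LIBRARY = {
    "VE-Fire": {"fire department operations"},
    "VE-EM": {"emergency management"},
    "VE-Police": {"police operations"},
    "VE-Hazmat": {"hazardous materials handling"},
}

def assemble_team(requested_profiles):
    """Pick, for each requested profile, a library agent covering its domains."""
    team = {}
    for profile_name, needed_domains in requested_profiles.items():
        for agent, domains in LIBRARY.items():
            if needed_domains <= domains:
                team[profile_name] = agent
                break
    return team

print(assemble_team({
    "incident commander": {"fire department operations"},
    "emergency manager": {"emergency management"},
}))
# {'incident commander': 'VE-Fire', 'emergency manager': 'VE-EM'}
```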
The goal of a system such as Disciple-VPT is to allow the development of collaborative
planners for a variety of applications by populating its library with corresponding virtual
experts.
[Figure 12.44 depicts the user interacting with the VE Assistant; the VE Library contains many Disciple-VE agents, and the VE Team is a set of Disciple-VE agents assembled from the library, each with its own KB, which together form a distributed knowledge base.]

Figure 12.44. Overall architecture of Disciple-VPT.


For instance, planning the response to emergency situations requires virtual experts for
emergency management, hazardous materials handling, federal law enforcement, and so on.
a different set of virtual experts in the VE Library. Moreover, for a given type of task and
application area, different multidomain planning systems can be created by assembling
different teams of virtual experts.
There are many ways in which a fully functional Disciple-VPT system can be used for
training or actual planning assistance. For instance, in the context of emergency response
planning, it can be used to develop a wide range of training scenarios by guiding the user
to select between different scenario characteristics. Disciple-VPT could also be used to
assemble teams of virtual planning experts who can demonstrate and teach how people
should plan the response to various emergency situations. Another approach is to assem-
ble combined teams that include both people and virtual experts. The team members will
then collaborate in planning the response to the generated emergency scenario. In a
combined team, human responders can play certain emergency support functions by
themselves or can play these functions with the assistance of corresponding virtual
experts. During the training exercise, a responder who has a certain emergency support
function will learn how to perform that function from a corresponding virtual expert with
higher competence. The responder will also learn how to collaborate with the other
responders or virtual experts who perform complementary support functions.
The Disciple-VPT approach to expert problem solving significantly extends the applicability
of classical expert systems (Buchanan and Wilkins, 1993; Durkin, 1994; Awad, 1996; Jackson,
1999; Awad and Ghaziri, 2004). Such an expert system is limited to a narrow expertise
domain, and its performance degrades dramatically when it attempts to solve problems with
elements outside that domain. In contrast, a Disciple-VPT–type system can solve such
problems efficiently by incorporating additional virtual experts. Because many expert tasks
actually require collaboration with other experts, a Disciple-VPT–type system is better suited
to solving real-world problems.
The next section introduces in more detail a scenario from the emergency response
planning area that informed the development of Disciple-VPT.

12.5.3 The Emergency Response Planning Problem


Emergency response planning was introduced in the previous sections. The following is a
sample emergency situation that will be used to present Disciple-VPT:

Workers at the Propane bulk storage facility in Gainsville, Virginia, have been
transferring propane from a train car to fill one of two 30,000 gallon bulk
storage tanks. A fire is discovered in the fill pipe at the bulk tank, and a large
fire is developing. The time is 15:12 on a Wednesday in the month of May. The
temperature is 72 degrees and there is a light breeze out of the west. The
roads are dry and traffic volume is moderate. The fire department is summoned
to the scene five minutes after the fire started. The facility is located in a
rapidly growing area 2,000 feet from an interstate highway and 200 feet from
two heavily traveled U.S. highways. New shopping centers have popped up in
the area, including grocery stores, large box building supply facilities, and large
box retail facilities. As always, these facilities are accompanied by fast
food restaurants and smaller retail stores. Residential concentrations include


approximately 2,400 residents. The Local Emergency Operations Plan has all
the required components, including public information, communications, and
sheltering. Shelters utilize schools managed by the Red Cross. The Virginia
Department of Transportation provides highway services.

Planning the appropriate response to this emergency situation requires the collaboration
of experts in fire department operations, emergency management, and police operations.
The generated plan will consist of hundreds of partially ordered actions.
One group of actions deals with the arrival of resources, such as fire units, emergency
management services units, police units, as well as individuals with different areas of
expertise (e.g., emergency manager, safety officer, highway supervisor, planning officer,
training logistics officer, public information officer).
Another group of actions deals with the establishment of the structure of the Incident
Command System (ICS) and the allocation of resources based on the evaluation of the
situation. The structure of the ICS follows the standard U.S. National Incident Manage-
ment System (FEMA, 2007). The National Incident Management System establishes
standard incident management processes, protocols, and procedures so that all local,
state, federal, and private-sector emergency responders can coordinate their responses,
share a common focus, and more effectively resolve events. Its main components are the
unified command, the command staff, and the general staff. The structure and organiza-
tion of these components depend on the current situation. For example, in the case of the
preceding scenario, the unified command includes representatives from the fire depart-
ment, police department, highway department, and propane company. The command
staff includes a safety officer, a public information officer, and a liaison officer. The general
staff includes an operation section, a planning section, a logistics section, and a finance
and administration section. Each of these sections is further structured and staffed.
Yet other groups of actions deal with the various activities performed by the compon-
ents of the Incident Command System. For instance, in the case of the preceding scenario,
the fire management group may perform the cooling of the propane tank with water. The
evacuation branch may evacuate the Gainsville hot zone. The emergency manager may
arrange for transportation, sheltering, and emergency announcements to support the
evacuation. The Gainsville perimeter control branch implements the perimeter control
for the Gainsville hot zone. The Gainsville traffic control branch implements the traffic
control to facilitate the evacuation of the Gainsville hot zone. The Gainsville command
establishes rapid intervention task forces to respond if the propane tank explodes.
One difficulty in generating such a plan, apart from the fact that it involves many
actions, is that the actions from the preceding groups are actually performed in parallel.
The goal of Disciple-VPT is to provide a capability for rapid and low-cost development of
virtual planning experts to be used in this type of multidomain collaborative planning.
Moreover, the plans generated by the system should be more comprehensive than those
produced by a collaborative team of humans and should be generated much faster and
cheaper than currently possible. The next section introduces the Disciple-VE learning
agent shell, which is the basis of Disciple-VPT.

12.5.4 The Disciple-VE Learning Agent Shell


The concept of a learning agent shell was introduced in Section 3.2.3. The Disciple-VE
learning agent shell is an extension of a Disciple shell that incorporates capabilities of


learning for planning to enable rapid development of knowledge-based planners, as will
be discussed in the following sections.
As discussed in Chapter 4, the general problem-solving paradigm of a Disciple agent is
problem reduction and solution synthesis (Nilsson, 1971; Powell and Schmidt, 1988;
Tecuci, 1988; Durham, 2000). In the context of planning, this approach reduces to
hierarchical task network (HTN) planning, where an initial complex task is reduced to a
partially ordered set of elementary actions (Tate, 1977; Allen et al., 1990; Nau et al., 2003;
Ghallab et al., 2004). In the case of Disciple-VE, planning tasks are integrated with
inference tasks, which significantly increases the power of HTN planning.
As with other Disciple agents, the knowledge base of Disciple-VE contains two main
types of knowledge: an object ontology and a set of reasoning rules. A fragment of the
object ontology for emergency response planning is shown in Figure 12.45. For example,
the major fire emergency concept represents all the incidents that are major fire emergen-
cies. One such instance is Gainsville incident. As shown in Figure 12.45, this information is
represented as “Gainsville incident instance of major fire emergency.”
As discussed in Section 5.3, the instances and concepts are organized into generaliza-
tion hierarchies, as illustrated in Figure 12.45. These structures are not strict hierarchies,
meaning that a concept may be a subconcept of several concepts (e.g., propane is both a
chemical substance and a hazardous substance).
The instances and concepts may have features representing their properties and
relationships, such as “Gainsville incident is caused by fire1,” and “fire1 is fueled by gas
propane f1.” The bottom part of Figure 12.46 shows all the features of fill pipe1 in the
interface of the Association Browser of Disciple-VE.
Each feature, such as is fueled by, is characterized by a domain (in this case, fire) and a
range (hazardous substance). The features are also organized into a generalization hier-
archy. For example, the top part of Figure 12.46 shows (a rotated view of) a fragment of the
feature hierarchy, in the interface of the Hierarchical Browser of Disciple-VE. In this
hierarchy, the feature has as part (shown in the left-hand side of Figure 12.46) is more
general than has as member, which, in turn, is more general than has as supervisor.
Together, the object hierarchy and the feature hierarchy represent the object ontology
of a Disciple-VE agent. Thus, the object ontology is a hierarchical representation of the
objects from the application domain, representing the different kinds of objects, the proper-
ties of each object, and the relationships existing between objects.
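
A minimal sketch of these two hierarchies as Python data (hypothetical structures mirroring the fragment in Figure 12.45, not the internal Disciple-VE representation) is:

```python
# Hypothetical sketch of an object ontology fragment (after Figure 12.45):
# generalization hierarchies for concepts/instances and for features.
SUBCONCEPT_OF = {      # a concept may have several more general concepts
    "gas propane": ["propane"],
    "liquid propane": ["propane"],
    "propane": ["chemical substance", "hazardous substance"],
    "major fire emergency": ["major emergency"],
}
INSTANCE_OF = {
    "Gainsville incident": ["major fire emergency"],
    "fire1": ["fire"],
    "gas propane f1": ["gas propane"],
}
FEATURES = {           # feature -> (domain, range, more general feature)
    "is fueled by": ("fire", "hazardous substance", None),
    "has as member": ("multimember force", "force", "has as part"),
    "has as part": ("object", "object", None),
}
FACTS = [              # (instance, feature, value) relationships
    ("Gainsville incident", "is caused by", "fire1"),
    ("fire1", "is fueled by", "gas propane f1"),
    ("fire1", "is situated in", "fill pipe1"),
]
```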
In general, the object ontology does not contain all the relevant concepts and instances
from the application domain and is therefore incomplete. Also, the representation of a
given concept or instance may not include all its relevant features, being itself incomplete.
Such an object ontology will have to be extended by the agent during the planning and
learning process.
As discussed previously, and also illustrated in the following sections, the object
ontology plays a crucial role in Disciple, being the basis of knowledge representation,
user–agent communication, planning, knowledge acquisition, and learning.
The representation of concepts in Disciple-VE is discussed in Section 7.2. For example,
the concept “fire fueled by gas propane” is represented by the pair {?O1, ?O2}, where ?O1
is a fire fueled by ?O2, and ?O2 is gas propane, as indicated by the following expression.

?O1 instance of fire
    is fueled by ?O2
?O2 instance of gas propane

[Figure 12.45 shows a fragment of the object ontology rooted at object, with concepts such as event (e.g., emergency, major emergency, major fire emergency, BLEVE), equipment component (e.g., valve, shutoff valve, emergency relief valve, water pipe, fill pipe, discharge pipe, fire hydrant, container, tank, propane tank, fire engine water tank), and substance (e.g., chemical substance, hazardous substance, explosive, toxic substance, corrosive liquid, propane, gas propane, liquid propane), together with instances and relationships such as “Gainsville incident instance of major fire emergency,” “Gainsville incident is caused by fire1,” “fire1 is fueled by gas propane f1,” “fire1 is situated in fill pipe1,” “fire1 is impinging on propane tank1,” “fire1 has as size large,” “fill pipe1 has as part shut off valve1,” and “shut off valve1 has as operating status damaged.”]

Figure 12.45. Fragment of the object ontology from the emergency planning area.

Figure 12.46. Feature hierarchy (top) and instance description (bottom).

In general, a concept may be a conjunctive expression. For example, the following
expression represents the concept “fire fueled by gas propane where the fire is not small.”

?O1 instance of fire
    is fueled by ?O2
?O2 instance of gas propane
Except-When
?O1 instance of fire
    has as size small


The reasoning rules of Disciple-VE are expressed with the elements of the object
ontology. Reduction rules indicate how general planning or inference tasks can be reduced
to simpler tasks, actions, or solutions. Synthesis rules indicate how solutions of simpler tasks
can be combined into solutions of complex tasks, or how actions can be combined into
partially ordered plans for more complex tasks.
The next section introduces the type of hierarchical task network planning performed
by Disciple-VE and the associated elements that are represented into its knowledge base.

12.5.5 Hierarchical Task Network Planning


The planning paradigm used by Disciple-VE is HTN planning (Ghallab et al., 2004),
extended to facilitate agent teaching and learning and mixed-initiative planning.
The goal of an HTN planner is to find a partially ordered set of elementary actions that
perform a complex task by successively decomposing the task into simpler and simpler tasks,
down to the level of elementary actions. HTN planning is the planning approach that has
been used for practical applications more than any other approach because it is closer to
how human experts think when solving a planning problem.
We will illustrate the HTN planning process performed by a Disciple-VE agent with
the abstract example from Figure 12.47. In this example, Planning Task 1 is reduced
to Planning Task 2 and Planning Task 3. This means that by performing Planning
Task 2 and Planning Task 3, one accomplishes the performance of Planning Task 1.
Because Planning Task 2 is reduced to Action 1 and Action 2, and Planning Task 3 is
reduced to Action 3, a plan for performing Planning Task 1 consists of Action 1, Action 2,
and Action 3.
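
The core of this reduction process can be sketched in a few lines of Python. This is illustrative only; it ignores preconditions, partial ordering, and backtracking over alternative reductions, all of which the Disciple-VE planner handles, and the method table below is an assumption.

```python
# Minimal sketch (not the Disciple-VE planner) of the core HTN idea from
# Figure 12.47: a task is reduced via decomposition methods until only
# elementary actions remain; the collected actions form the plan.
METHODS = {
    "Planning Task 1": [["Planning Task 2", "Planning Task 3"]],
    "Planning Task 2": [["Action 1", "Action 2"]],
    "Planning Task 3": [["Action 3"]],
}

def is_action(name):
    return name.startswith("Action")

def plan(task):
    """Return a (totally ordered, for simplicity) list of elementary actions."""
    if is_action(task):
        return [task]
    subtasks = METHODS[task][0]      # a real planner would choose and backtrack
    result = []
    for sub in subtasks:
        result.extend(plan(sub))
    return result

print(plan("Planning Task 1"))       # ['Action 1', 'Action 2', 'Action 3']
```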
There are two types of reductions: task decomposition and task specialization. Task
decomposition means breaking a task into a partially ordered set of subtasks and/or actions.
Task specialization means reducing a task to a more detailed task or to an action.
The tasks or actions in a decomposition can be partially ordered. For instance, in
Figure 12.47, Action 2 has to be performed after Action 1 has been performed. Notice also

[Figure 12.47 shows an abstract HTN planning example. Planning Task 1, performed in state S1 with Goal 1 and final state S4, is reduced (guided by Question 1/Answer 1) to Abstract Task 2 and Abstract Task 3. Abstract Task 2, under Preconditions 2, is concreted to Planning Task 2 (performed in S1, achieving Goal 2 in S3); Abstract Task 3, under Preconditions 3, is concreted to Planning Task 3 (performed in S3, achieving Goal 3 in S4). Planning Task 2 is reduced (via Question 2/Answer 2, Abstract Tasks 4 and 5, and Preconditions 4 and 5) to Action 1, which changes S1 to S2, and Action 2, which changes S2 to S3; Planning Task 3 is reduced (via Question 3/Answer 3, Abstract Task 6, and Preconditions 6) to Action 3, which changes S3 to S4. Each action has delete effects, add effects, resources, and a duration.]

Figure 12.47. Hierarchical task network planning example.


that there is no order relation between Planning Task 2 and Planning Task 3. This means
that these tasks may be performed in parallel or in any order. Stating that Planning
Task 3 is performed after Planning Task 2 would mean that any subtask or action of
Planning Task 3 has to be performed after all subtasks and actions of Planning Task 2.
Formulating such order relations between the tasks significantly increases the efficiency of
the planning process because it reduces the number of partial orders that it has to
consider. On the other hand, it also reduces the number of generated plans if the tasks
should not be ordered.
Planning takes place in a given world state. A world state is represented by all the objects
present in the world together with their properties and relationships at a given moment of
time. For instance, the bottom part of Figure 12.45 shows a partial representation of a
world state where fire1, which is situated in fill pipe1, is impinging on propane tank1. As will
be discussed in more detail in Section 12.5.10, each world state is represented by Disciple-
VE as a temporary state knowledge base.
The states are changed by the performance of actions. Abstract representations of
actions are shown at the bottom of Figure 12.47. An action is characterized by name,
preconditions, delete effects, add effects, resources, and duration. An action can be per-
formed in a given world state Si if the action’s preconditions are satisfied in that state. The
action’s execution has a duration and requires the use of certain resources. The resources are
objects from the state Si that are uniquely used by this action during its execution. This
means that any other action that would need some of these resources cannot be executed in
parallel with it. As a result of the action’s execution, the state Si changes into the state Sj, as
specified by the action’s effects. The delete effects indicate what facts from the initial state Si
are no longer true in the final state Sj. The add effects indicate what new facts become true in
the final state Sj.
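
These action semantics can be sketched as follows. The Python structures are hypothetical, and the “is cooled by” fact and the specific resource are illustrative assumptions rather than elements of the actual knowledge base.

```python
# Minimal sketch of the action semantics just described: an action is
# executable in a state if its preconditions hold there, and executing it
# removes its delete effects and adds its add effects.
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str
    preconditions: set = field(default_factory=set)
    delete_effects: set = field(default_factory=set)
    add_effects: set = field(default_factory=set)
    resources: set = field(default_factory=set)
    duration: int = 0                 # simplified: a constant number of minutes

def executable(action, state):
    return action.preconditions <= state

def execute(action, state):
    """Return the successor world state Sj obtained from Si."""
    return (state - action.delete_effects) | action.add_effects

s1 = {("fire1", "is situated in", "fill pipe1")}
cool_tank = Action(
    name="fire management group cools propane tank1 with water",
    preconditions={("fire1", "is situated in", "fill pipe1")},
    add_effects={("propane tank1", "is cooled by", "water")},
    resources={"fire engine water tank"},
    duration=30,
)
s2 = execute(cool_tank, s1) if executable(cool_tank, s1) else s1
```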
An action from the emergency planning area is shown in Figure 12.48. The action’s
preconditions, name, delete, and add effects are represented as natural language phrases
that contain instances, concepts, and constants from the agent’s ontology. The action’s
duration can be a constant, as in this example, or a function of the other instances from the
action’s description. Resources are represented as a list of instances from the ontology.
The starting time is computed by the planner.

Figure 12.48. An example of an action.


A goal is a representation of a partial world state. It specifies what facts should be true
in a world state so that the goal is achieved. As such, a goal may be achieved in several
world states.
A task is characterized by name, preconditions, and goal. A task is considered for
execution in a given world state if its preconditions are satisfied in that state. Successful
execution of the task leads to a new world state in which the task’s goal is achieved. Unlike
actions, tasks are not executed directly, but are first reduced to actions that are executed.
Figure 12.49 shows a task reduction tree in the interface of the Reasoning Hierarchy
Browser of Disciple-VE. The initial task, “Respond to the Gainsville incident,” is reduced to
five subtasks. The second of these subtasks (which is outlined in the figure) is successively
reduced to simpler subtasks and actions.
Figure 12.50 shows the reduction of the initial task in the interface of the Reasoning
Step Editor, which displays more details about each task. As in the case of an action, the
task’s name, preconditions, and goal are represented as natural language phrases that
include instances and concepts from the agent’s ontology. Notice that none of the visible
“After” boxes is checked, which means that Sub-task (1), Sub-task (2), and Sub-task (3) are
not ordered.
The single most difficult agent training activity for the subject matter expert is to make
explicit how he or she solves problems by using the task reduction paradigm, an activity that
we call modeling an expert’s reasoning. To cope with this problem, we have developed an
intuitive modeling language, a set of modeling guidelines, and a set of modeling modules
that help the subject matter experts to express their reasoning (Bowman, 2002). However,
planning introduces additional complexities related to reasoning with different world
states and with new types of knowledge elements, such as preconditions, effects, and
goals. For these reasons, and to facilitate agent teaching by a subject matter expert, we
have extended both the modeling approach of Disciple and the classical HTN planning
paradigm (Ghallab et al., 2004), as discussed in the next section.

12.5.6 Guidelines for HTN Planning


To teach the agent how to plan, the expert first has to show the agent an example in the
form of a planning tree such as the ones in Figures 12.47 and 12.50. The expert formulates
the initial task (e.g., Planning Task 1 in Figure 12.47) and then follows a systematic
procedure to develop a detailed plan of actions that perform the initial task. This follows
a task reduction paradigm where the initial task is successively reduced to simpler and
simpler tasks, down to the level of elementary actions. The partially ordered set of these
elementary actions represents the plan for performing the initial task.
As presented in Figure 12.47 and illustrated in Figures 12.49 and 12.50, the task
reduction process is guided by questions and answers, as if the expert is asking himself
or herself how to reduce the current task. Consider, for instance, a task that may be
reduced (i.e., performed) in different ways. Then the question should be related to the
factors that determine the reduction strategy to choose. Therefore, the answer will help the
expert to choose the strategy and define the reduction. If there is only one way to reduce
the current task, then no question/answer pair is necessary.
Thus, to develop a planning tree, the expert follows the modeling guidelines discussed in Section
4.12. The other guidelines, discussed in the previous chapters, are also applicable, but
there are additional planning-specific guidelines, which will be discussed in the following.


Figure 12.49. An example of a task reduction tree in the Reasoning Hierarchy Browser.

Figure 12.50. An example of a task reduction step in the Reasoning Step Editor.

Guideline 12.1. Use a plausible task ordering when specifying a task decomposition
Notice that Planning Task 1 from Figure 12.47 has to be performed in the initial state S1.
Planning Task 2 is also performed in the state S1 and its preconditions have to be satisfied in
that state. Similarly, Action 1 has to be performed in the state S1. However, this action changes
the world state from S1 to S2. Therefore, Action 2 has to be performed in the state S2 and its
preconditions have to be satisfied in that state. Moreover, it also changes the world state to S3.
What is the state in which Planning Task 3 is performed? Because there is no order
relationship between Planning Task 2 and Planning Task 3, Planning Task 3 can, in
principle, be performed either in the state S1, or S2, or S3. In reality, some order relation-
ships may be determined by the resources used by the elementary actions. For instance, if
both Action 2 and Action 3 need the same resource, they cannot be executed in parallel.


When showing a planning example to the agent, the expert does not need to consider
all the previously discussed possible ordering relations, which would be very difficult in
the case of the complex problems addressed by Disciple-VE. Instead, the expert has to
consider one possible order, such as the one in which Planning Task 3 is performed after
Action 2, in state S3. This allows both the expert and the agent to have a precise
understanding of the state in which each task is performed, which is necessary in order
to check that its preconditions are satisfied. Thus, when specifying a decomposition of a
task into subtasks and/or actions, the expert has to describe the subtasks and actions in a
plausible order, even though they can also be performed in a different order.

Guideline 12.2. Specify the planning tree in a top-down and left-to-right order
Allowing the process specified by Guideline 12.1 required the extension of the HTN
planning paradigm, as discussed in the following. Let us consider the decomposition of
Planning Task 1 from Figure 12.47 into Planning Task 2 and Planning Task 3, by
following the top-down and left-to-right process. At the time the expert has to specify
this decomposition, he or she knows that Planning Task 2 has to be performed in the
state S1, but the expert does not know the state in which Planning Task 3 has to be
performed. This state can be determined only after the entire subplan for Task 2 has
been specified. In other words, Task 3 can be only specified after the entire subplan for
Task 2 has been specified. In order to resolve this contradiction, we have introduced the
notion of abstract task.
An abstract task is a simplified specification of a task that does not depend on the actual
state in which the task is going to be executed. As such, an abstract task does not have any
precondition, and does not refer to any specific objects or their properties.
An abstract task can be reduced to a concrete task if certain preconditions are satisfied.
In principle, the same abstract task may be reduced to different concrete tasks. Therefore,
the abstract task is not a characteristic of a given concrete task.

Guideline 12.3. Define preconditions when reducing an abstract task to a concrete task
When reducing an abstract task to a concrete task, formulate the preconditions to identify
those instances and constants from the current world state that are referred to in the name
of the concrete task, but are not referred to in the previous elements of the task reduction
step that includes this concretion.
To illustrate this guideline, let us consider again the reduction step from Figure 12.50
and the pane labeled “Sub-task (3).” Notice that the concrete task includes the following
instances: “Gainsville command” and “Gainsville incident.” Each of these instances appears in
the elements listed under Sub-task (2). For example, “Gainsville command” appears in the
“Goal” part. Therefore no preconditions are required to make the concretion from the
abstract task to the concrete task shown in the pane labeled “Sub-task (3).” However, the
expert may still wish (and is allowed) to specify preconditions that identify the instances
that appear in the concrete task, as was actually done in this example.
With the introduction of the abstract tasks, the expert can now reduce Planning Task 1 from
Figure 12.47 to Abstract Task 2 and Abstract Task 3. Then he or she can continue with the


reduction of Abstract Task 2 to Planning Task 2, a reduction performed in state S1. Precondition
2 represents the facts from the state S1 that are required in order to make this reduction.
After that, the expert continues with the reduction of Planning Task 2 to Action 1 and
Action 2. Thus Planning Task 2 is actually performed by executing Action 1 and Action 2,
which changes the world state from S1 to S3. At this point, the expert can specify the goal
achieved by Planning Task 2. This goal is an expression that depends on the effects of
Action 1 and Action 2, but is also unique for Task 2, which is now completely specified.
Next the expert can continue with planning for Abstract Task 3 in the state S3.

Guideline 12.4. Specify the goal of the current task to enable the
specification of the follow-on tasks
The goal of a task represents the result obtained if the task is successfully performed. The
main purpose of the goal is to identify those instances or facts that have been added by the
task’s component actions and are needed by its follow-on tasks or actions. Thus, specify
this goal to include these instances or facts.
To illustrate this guideline, let us consider the Sub-task (2) pane in Figure 12.50. Notice
that the two instances from the “Goal” part (“Gainsville command” and “Gainsville ICS”) are
used in the follow-on expressions of the reduction from Figure 12.50.
The Reasoning Hierarchy Browser, shown in Figure 12.49, and the Reasoning Step
Editor, shown in Figure 12.50, support the modeling process. The Reasoning Hierarchy
Browser provides operations to browse the planning tree under development, such as
expanding or collapsing it step by step or in its entirety. It also provides the expert with
macro editing operations, such as deleting an entire subtree or copying a subtree and
pasting it under a different task. Each reduction step of the planning tree is defined by
using the Reasoning Step Editor, which includes several editors for specifying the com-
ponents of a task reduction step. It has completion capabilities that allow easy identifica-
tion of the names from the object ontology. It also facilitates the viewing of the instances
and concepts from the expressions being edited by invoking various ontology viewers.
An important contribution of Disciple-VE is the ability to combine HTN planning with
inference, as described in the following section.

12.5.7 Integration of Planning and Inference


As illustrated in Figure 12.47, each planning operation takes place in a given world state, and
the actions, through their effects, change this state. The planning process is complex and
computationally expensive because one has to keep track of these various world states.
However, some operations do not involve the change of the world state, but reasoning about
a given state. Let us consider the top-level reasoning steps from Figure 12.49, where the task,
”Respond to the Gainsville incident,” is reduced to five subtasks. The third of these subtasks,
“Gainsville command evaluates the Gainsville incident and determines the needed incident
action plan,” is further reduced to two inference actions (not shown in Figure 12.49):

Inference: Gainsville command evaluates the situation created by the Gainsville incident.
Inference: Gainsville command determines the incident action plan for overpressure
situation with danger of BLEVE.


The first of these inference actions has as result “overpressure situation with danger of
BLEVE in propane tank1 caused by fire1.” BLEVE is the acronym for boiling liquid
expanding vapors explosion.
From the perspective of the planning process, an inference action simulates a complex
inference process by representing the result of that process as the add effect of the inference
action. An inference action is automatically reduced to an inference task. The inference
task is performed in a given world state to infer new facts about that state. These facts are
represented as the add effects of the corresponding inference action and added into the
world state in which the inference action is performed.
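
From the planner’s side, this integration can be sketched as follows. The sketch is illustrative only; run_inference merely stands in for the task reduction and solution synthesis process described next, and the inferred fact is a simplified rendering of the result quoted earlier.

```python
# Illustrative sketch: an inference action performs an inference task in the
# current state and adds the inferred facts (its add effects) to that state.
def perform_inference_action(inference_task, state):
    inferred = run_inference(inference_task, state)   # no state change occurs here
    return state | inferred

def run_inference(inference_task, state):
    # Placeholder for the task reduction / solution synthesis process.
    if inference_task.startswith("Gainsville command evaluates"):
        return {("propane tank1", "is in",
                 "overpressure situation with danger of BLEVE")}
    return set()
```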
The inference process associated with an inference task is also performed by using the
task reduction paradigm, but it is much simpler than the planning process because all the
reductions take place in the same world state. An abstract example of an inference tree is
shown in Figure 12.51.
An inference task is performed by successively reducing it to simpler inference tasks, until
the tasks are simple enough to find their solutions. Then the solutions of the simplest tasks are
successively combined, from the bottom up, until the solution of the initial task is obtained.
This task reduction and solution synthesis process is also guided by questions and
answers, similarly to the planning process (and as discussed in Section 4.2). Figure 12.52
shows the top part of the inference tree corresponding to the following inference task:

Inference: Gainsville command determines the incident action plan for overpressure
situation with danger of BLEVE.

This task is first reduced to two simpler inference tasks:

Determine what can be done to prevent the overpressure situation with danger of
BLEVE to evolve in a BLEVE1.
Determine how to reduce the effects in case the overpressure situation with danger
of BLEVE does evolve in a BLEVE1.

The first of these subtasks is successively reduced to simpler and simpler subtasks, guided
by questions and answers, as shown in Figure 12.52.
Notice that an inference tree no longer needs to use elements such as abstract tasks,
preconditions, actions, effects, resources, or duration. The teaching process is also much
simpler than in the case of the planning process. Therefore, we will first present how the
expert can teach Disciple-VE to perform inference tasks. Then we will present how the
teaching and learning methods for inference tasks have been extended to allow the expert
also to teach the agent how to perform planning tasks.

[Figure 12.51 shows an abstract inference tree developed in a single world state S: Inference Task 1 with Solution 1 is reduced, guided by question/answer pairs, to Inference Task 2 (Solution 2) and Inference Task 3 (Solution 3), which are in turn reduced to simpler inference tasks such as Inference Task 4 (Solution 4) and Inference Task 5 (Solution 5).]

Figure 12.51. Abstract inference tree.


Figure 12.52. Inference reduction tree.




12.5.8 Teaching Disciple-VE to Perform Inference Tasks


Disciple-VE learns inference tasks by employing the methods discussed in Chapters 9
and 10. Therefore, in the following discussion, we will illustrate this process by referring to
these chapters.
Let us consider again the fragment of the inference tree shown in Figure 12.52. During
the teaching process, the subject matter expert builds this inference tree. Each step in the
tree consists of a task, a question, its answer, and one or several subtasks. From each of
these steps, the agent learns a general task reduction rule. Table 9.1 defines the problem of
learning these rules.
Let us consider the third reduction step from the task reduction tree in Figure 12.52, a
step also shown in Figure 12.53. From this task reduction step, Disciple-VE learned the
rule shown in Figure 12.54. This is an IF-THEN rule that preserves the structure and
natural language patterns from the example. Indeed, the IF task, the question/answer pair,
and the THEN task are generalizations of the corresponding elements from the example
where the instances and constants have been replaced with variables. In addition, the rule
contains a main condition. An instance of the rule is considered a correct reduction if the
corresponding variable values satisfy the main condition.
The rule in Figure 12.54 is only partially learned because, instead of a single applicabil-
ity condition, it contains a plausible version space for it. The plausible lower bound of the
applicability condition is the set of the tuples of the rule variable values that are less
general than the corresponding elements of the Lower Bound column and satisfy the
relationships from the Relationship table. For example, any value of ?O4 should be an
instance of shut off valve, which has as operating status ?S1, which should have the value
damaged. Moreover, this value of ?O4 should be the value of the relationship has as part of
an instance of ?O5, which should be a fill pipe and should have the relationships indicated
in the relationship table, and so on. The plausible upper bound of the applicability
condition is interpreted in a similar way, using the concepts from the Upper
Bound column.
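
The way such a partially learned rule is used can be sketched as follows. The helper functions below are hypothetical, and the sketch is a simplified reading of the plausible version space semantics, not the Disciple-VE matcher.

```python
# Illustrative sketch: a tuple of variable values that satisfies the plausible
# lower bound is treated as a correct application; one that satisfies only the
# plausible upper bound is merely plausible and is submitted for critiquing.
def satisfies(bound, bindings, is_instance_of, facts):
    """bound: {var: (concept, [(feature, other_var), ...])}"""
    for var, (concept, relations) in bound.items():
        value = bindings[var]
        if not is_instance_of(value, concept):
            return False
        for feature, other_var in relations:
            if (value, feature, bindings[other_var]) not in facts:
                return False
    return True

def classify(bindings, lower, upper, is_instance_of, facts):
    if satisfies(lower, bindings, is_instance_of, facts):
        return "covered by the plausible lower bound: apply as correct"
    if satisfies(upper, bindings, is_instance_of, facts):
        return "covered only by the plausible upper bound: plausible, ask the expert"
    return "not covered: the rule does not apply"
```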
Rule learning is accomplished through a mixed-initiative process between the expert
(who knows why the reduction is correct and can help the agent to understand this) and

Figure 12.53. Example of a reduction step for an inference task.


Figure 12.54. The rule learned from the example in Figure 12.53.

the Disciple-VE agent (which is able to generalize the task reduction example and its
explanation into a general rule, by using the object ontology as a generalization lan-
guage). The learning method is the one in Table 9.2 and will be illustrated in the
following.
The first step of the learning method is Mixed-Initiative Understanding (explanation
generation). The question and its answer from the task reduction step represent the
expert’s reason (or explanation) for performing that reduction. Therefore, understanding
the example by Disciple-VE means understanding the meaning of the question/answer
pair in terms of the concepts and features from the agent’s ontology. This process is
difficult for a learning agent that does not have much knowledge because the experts
express themselves informally, using natural language and common sense, and often omit
essential details that they consider obvious. The question/answer pair from the example in
Figure 12.53 is:


Question: Can we turn gas propane f1 off from fire1?
Answer: No because shut off valve1 is damaged.

We can expect a person to assume, without being told, that the reason we are
considering turning the gas propane f1 off from fire1 is because it is fueling fire1. We
can also expect the person to assume that we are considering shutting off valve1 because it
is part of fill pipe1 with the propane. However, an automated agent is not able to make
these assumptions and has to be helped to get an as complete understanding of the
example as possible. For instance, a more complete explanation of the example from
Figure 12.53 consists of the facts shown in Figure 12.55.
The process of identifying such explanation pieces is the one described in Section 9.5.
The quality of the learned rule depends directly on the completeness of the found explan-
ation. However, there is no requirement that the found explanation be complete, and in
fact this rarely occurs. The agent will continue to improve the rule while using it in
reasoning (as described in Chapter 10), when it will be easier to discover the missing
explanation pieces.
The next step of the rule-learning method is Example Reformulation (see Table 9.2),
which consists in transforming the example from Figure 12.53 and its explanation from
Figure 12.55 into an equivalent IF-THEN rule, by replacing each instance or constant with
a variable and restricting the variables to those values, as illustrated in Figure 12.56. This
expression is an instance of the general rule to be learned. The goal of the rule-learning
process is to determine which values of the variables from the condition lead to correct
task reduction steps. That is, Disciple-VE has to learn the concept that represents the set of
instances of the rule’s variables for which the corresponding instantiation of the rule is
correct (i.e., the applicability condition of the rule).
First Disciple-VE generalizes the applicability condition from Figure 12.56 to an initial
plausible version space condition, as described in Section 9.7 and illustrated in the
following.
The plausible upper bound condition is obtained by replacing each variable value with
its maximal generalization, based on the object ontology.
Let us consider the value fire1 of the variable ?O2. The most general concept from the
object ontology that is more general than fire1 is object. However, the possible values for
?O2 are restricted by the features of fire1 identified as relevant as part of the explanation of
the example from Figure 12.55. As indicated in Figure 12.56, ?O2 should have the features
is fueled by and is situated in. This means that the values of ?O2 have to be part of the
domains of these features. Thus:

most general generalization(fire1) = object ∩ Domain(is fueled by) ∩ Domain(is situated in)
                                   = object ∩ fire ∩ object = fire

fire1 is fueled by gas propane f1
fire1 is situated in fill pipe1
fill pipe1 has as part shutoff valve1
shutoff valve1 has as operating status damaged
    the value is specifically damaged

Figure 12.55. Explanation of the example from Figure 12.53.


IF the task is
Determine whether we can prevent the ?O1
by extinguishing ?O2

Q: Can we turn ?O3 off from ?O2?
A: No because ?O4 is ?S1
Condition
?O1 is BLEVE1
?O2 is fire1
is fueled by ?O3
is situated in ?O5
?O3 is gas propane f1
?O4 is shutoff valve1
has as operating status ?S1
?O5 is fill pipe1
has as part ?O4
?S1 is damaged

THEN
Determine whether we can prevent the ?O1
by extinguishing ?O2 when we cannot turn
?O3 off from ?O2

Figure 12.56. Specific inference rule covering only the initial example.

Applying a similar procedure to each variable value from the condition in Figure 12.56 one
obtains the plausible upper bound condition shown in Figure 12.57.
The plausible lower bound condition is obtained by replacing each variable value with
its minimal generalization that is not an instance, based on the object ontology. The
procedure is similar to the one for obtaining the plausible upper bound condition.
Therefore:

least general generalization(fire1) = fire ∩ Domain(is fueled by) ∩ Domain(is situated in)
                                    = fire ∩ fire ∩ object = fire

The reason the lower bound cannot contain any instance is that the learned rule will be
used by Disciple-VE in other scenarios where the instances from the current scenario
(such as fire1) do not exist, and Disciple-VE would not know how to generalize them. On
the other hand, we also do not claim that the concept to be learned is more general than
the lower bound.
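The computation of the two bounds can be pictured with the following sketch, which assumes a toy chain ontology and a single domain concept per feature; the data structures and function names are our own simplification, not the actual Disciple-VE representation.

```python
# A minimal sketch of generating the plausible version space bounds for one
# variable value (fire1), using a toy ontology and the feature domains that
# appear in the explanation of the example.

PARENTS = {"fire1": "fire", "fire": "object"}   # child -> direct parent
DOMAIN = {"is fueled by": "fire",               # feature -> domain concept
          "is situated in": "object"}

def ancestors(name):
    """Concepts above a value, from most specific to most general."""
    chain = []
    while name in PARENTS:
        name = PARENTS[name]
        chain.append(name)
    return chain

def intersect(c1, c2):
    """Intersection of two concepts: the more specific one if comparable."""
    if c1 == c2 or c2 in ancestors(c1):
        return c1
    if c1 in ancestors(c2):
        return c2
    return None        # disjoint concepts (empty intersection)

def generalization_bound(value, features, most_general):
    chain = ancestors(value)
    # The maximal generalization starts at the top of the ontology; the
    # minimal one starts at the direct (non-instance) parent of the value.
    concept = chain[-1] if most_general else chain[0]
    for feature in features:
        concept = intersect(concept, DOMAIN[feature])
    return concept

features = ["is fueled by", "is situated in"]
print(generalization_bound("fire1", features, most_general=True))    # fire
print(generalization_bound("fire1", features, most_general=False))   # fire
```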
Notice that the features from the explanation of the example significantly limit the size
of the initial plausible version space condition and thus speed up the rule-learning
process. This is a type of explanation-based learning (DeJong and Mooney, 1986; Mitchell
et al., 1986) except that the knowledge base of Disciple-VE is incomplete and therefore
rule learning requires additional examples and interaction with the expert.

Figure 12.57. Generation of the initial plausible version space condition.

After the rule was generated, Disciple-VE analyzes it to determine whether it was
learned from an incomplete explanation (Boicu et al., 2005). To illustrate, let us consider
again the process of understanding the meaning of the question/answer pair from
Figure 12.53, in terms of the concepts and features from the agent’s ontology. In the
preceding, we have assumed that this process has led to the uncovering of implicit
explanation pieces. However, this does not always happen. Therefore, let us now assume
that, instead of the more complete explanation pieces considered in the preceding, the
identified explanation pieces of the example are only those from Figure 12.58. In this case,
the learned rule is the one from Figure 12.59.


shutoff valve1 has as operating status damaged
the value is specifically damaged

Figure 12.58. Incomplete explanation of the example from Figure 12.53.

Figure 12.59. Rule learned from the example in Figure 12.53 and the explanation in
Figure 12.58.

The variables from the IF task of a rule are called input variables because they are
instantiated when the rule is invoked in problem solving. The other variables of the rule are
called output variables.
During the problem-solving process, the output variables are instantiated by the agent
with specific values that satisfy the rule’s applicability condition. In a well-formed rule, the
output variables need to be linked through explanation pieces to some of the input
variables of the rule. Therefore, one rule analysis method consists of determining whether
there is any output variable that is not constrained by the input variables. For instance, in
the case of the rule from Figure 12.59, Disciple-VE determined that the variables ?O3, ?O4,
and ?S1 are not constrained and asks the expert to guide it to identify additional explan-
ation pieces related to their corresponding values (i.e., gas propane f1, shutoff valve1,
and damaged).
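The structural analysis just described can be approximated as a reachability check over the rule's condition, as in the following hypothetical sketch; the encoding of the rule from Figure 12.59 as triples is our own simplification.

```python
# A minimal sketch of the structural rule analysis: find output variables that
# are not linked, through the features of the condition, to any input variable.

from collections import deque

def unconstrained_outputs(input_vars, output_vars, condition_links):
    """condition_links is a list of (var_a, feature, var_b) triples taken from
    the rule's applicability condition; links are treated as undirected."""
    neighbors = {}
    for a, _feature, b in condition_links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Breadth-first search starting from all input variables.
    reached, queue = set(input_vars), deque(input_vars)
    while queue:
        v = queue.popleft()
        for w in neighbors.get(v, ()):
            if w not in reached:
                reached.add(w)
                queue.append(w)

    return [v for v in output_vars if v not in reached]

# With only the explanation of Figure 12.58, the condition links ?O4 to ?S1,
# but nothing connects them (or ?O3) to the input variables ?O1 and ?O2.
links = [("?O4", "has as operating status", "?S1")]
print(unconstrained_outputs(["?O1", "?O2"], ["?O3", "?O4", "?S1"], links))
# -> ['?O3', '?O4', '?S1']
```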
If the rule passes the structural analysis test, Disciple-VE determines the number of its
instances in the current knowledge base and considers that the rule is incompletely
learned if this number is greater than a predefined threshold. In such a case, the agent
will attempt to identify which variables are the least constrained and will attempt to
constrain them further by interacting with the expert to find additional explanation pieces.


Following such a process, Disciple-VE succeeds in learning a reasonably good rule from
only one example and its explanation, a rule that may be used by Disciple-VE in the planning
process. The plausible upper bound condition of the rule allows it to apply to situations that
are analogous to the one from which the rule was learned. If the expert judges this
application to be correct, then this represents a new positive example of the rule, and the
plausible lower bound condition is generalized to cover it, as discussed in Section 10.1.3.1.
Otherwise, the agent will interact with the expert to find an explanation of why the
application is incorrect and will specialize the rule’s conditions appropriately, as discussed
in Section 10.1.4. Rule refinement could lead to a complex task reduction rule, with
Except-When conditions that should not be satisfied in order for the rule to be applicable.
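The refinement behavior just described can be summarized by the following sketch, which uses a toy ontology and represents each bound as a single concept per variable; the class, the ontology, and the failure explanation string are illustrative, not the actual Disciple-VE representation.

```python
# A minimal sketch of refining a plausible version space rule: a new positive
# example minimally generalizes the lower bound, while a negative example is
# recorded as an Except-When condition derived from its failure explanation.

PARENTS = {"fire1": "propane fire", "fire2": "oil fire",
           "propane fire": "fire", "oil fire": "fire", "fire": "object"}

def lineage(concept):
    """The concept itself followed by all of its ancestors."""
    chain = [concept]
    while chain[-1] in PARENTS:
        chain.append(PARENTS[chain[-1]])
    return chain

def least_common_generalization(c1, c2):
    """The most specific concept that is above both c1 and c2."""
    for c in lineage(c1):
        if c in lineage(c2):
            return c
    return None

class PlausibleVersionSpaceRule:
    def __init__(self, lower, upper):
        self.lower = lower            # var -> concept (plausible lower bound)
        self.upper = upper            # var -> concept (plausible upper bound)
        self.except_when = []         # conditions that must not be satisfied

    def accept_positive(self, example):
        """Generalize the lower bound minimally to cover the new example."""
        for var, value in example.items():
            minimal = PARENTS.get(value, value)   # non-instance generalization
            self.lower[var] = least_common_generalization(self.lower[var], minimal)

    def reject_negative(self, failure_explanation):
        """Record why the incorrect application failed as an Except-When condition."""
        self.except_when.append(failure_explanation)

rule = PlausibleVersionSpaceRule(lower={"?O2": "propane fire"}, upper={"?O2": "fire"})
rule.accept_positive({"?O2": "fire2"})            # lower bound becomes 'fire'
rule.reject_negative("?O4 has as operating status operational")   # hypothetical
print(rule.lower, rule.except_when)
```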

12.5.9 Teaching Disciple-VE to Perform Planning Tasks


12.5.9.1 Why Learning Planning Rules Is Difficult
Figure 12.60 compares the learning of inference rules with the learning of planning rules. The
left-hand side of Figure 12.60 shows an inference step and a planning step, while the right-
hand side shows the rules that would be learned from these steps. In the case of an inference
step, Disciple-VE learns a rule by generalizing the expressions from the examples to patterns
and by generating a plausible version space for the applicability condition of the rule.
The learning of the planning rule is much more complex, not just because it involves
the learning of several applicability conditions, but mainly because these conditions have
to be learned in different states of the world. Indeed, Condition 11g and Condition 2g are
learned in the state S1, but Condition 3g has to be learned in the state S3. However, the
state S3 is only known after the entire reduction tree for Planning Task 2 has been
developed. What this means is that Disciple-VE would start learning the rule in the state
S1, would then continue with the planning and inference corresponding to the subtree of
Planning Task 2, and only after that could it resume and finalize the learning of the rule. But
this is impractical for two main reasons. First, it would require starting to learn many
complex planning rules, with the associated management of temporary representations for
these rule fragments. Second, these incompletely learned rules cannot be used in problem
solving. Thus, in the case of a planning tree that contains recursive applications of a task
reduction step, Disciple-VE would start learning a new rule for each application, although
these rules would end up being identical.

12.5.9.2 Learning a Set of Correlated Planning Rules


The main source of difficulty for learning a planning rule from the planning example in
Figure 12.60 is the need first to develop the entire planning tree for Planning Task 2. We
have discussed a similar difficulty in Section 12.5.6, in the context of modeling the expert’s
planning process. In that case, the expert could not specify the reduction of Planning Task
1 into Planning Task 2 and Planning Task 3 before completing the entire planning for
Planning Task 2. The solution found to that problem was to introduce the notion of an
abstract task. This notion will also help overcome the difficulty of learning planning rules,
as will be explained in the following.
Rather than learning a single complex planning rule from a task reduction example,
Disciple-VE will learn a set of simpler planning rules that share common variables, as
illustrated in the right part of Figure 12.61.


Figure 12.60. Learning inference rules versus learning planning rules.


Planning Task Reduction Rule
Planning Task 1g Goal
Planning Step Example Synthesis
Question 11g/Answer 11g Condition 11g Rule
S1 Planning Task 1 Goal 1 S4
Abstract Task 2 Abstract Task 3 Goal 1g
Question 11/Answer 11
Shared Variables
Abstract Task 2 Abstract Task 3
Concretion Rule Goal Concretion Rule Goal
Precondition 2 Precondition 3 Mapping Abstract Task 3 Mapping
Abstract Task 2
Planning Planning Rule Rule
S1 Goal 2 S3 Goal 3 S4
Task 2 Task 3 Precondition 2g Goal 2g Precondition 3g Goal 3g
Condition 2g Condition 3g

Planning Task 2g Planning Task 3g

Planning Task Reduction Rule


Planning Step Example Planning Task 2g Goal
.013

Synthesis
S1 Planning Task 2 Goal 2 S3 Question 21g/Answer 21g Condition 21g Rule
Goal 2g
13:54:21,

Question 21/Answer 21 Abstract Task 4 Abstract Task 5

Abstract Task 4 Abstract Task 5 Shared Variables

Precondition 4 Precondition 5 Action Concretion Rule Action Concretion Rule

S1 Delete Add S2 Delete Add Abstract Task 4 Abstract Task 5


Action 1 Effects 1 Effects 1 Action 2 Effects 2 Effects 2 S3
Resources 1 Duration 1 Resources 2 Duration 2 Precondition 4g Condition 4g Precondition 5g Condition 5g

Delete Add Delete Add


Action 1g Effects 1g Effects 1g Action 2g Effects 2g Effects 2g
Resources 1g Duration 1g Resources 2g Duration 2g

Figure 12.61. Learning correlated planning rules.



These rules will not be learned all at once, but in the sequence indicated in Figure 12.62.
This sequence corresponds to the sequence of modeling operations for the subtree of
Planning Task 1, as discussed in Section 12.5.6.
First the expert asks himself or herself a question related to how to reduce Planning
Task 1. The answer guides the expert to reduce this task to two abstract tasks. From this
reduction, the agent learns a planning task reduction rule (see Figure 12.62a), by using the
method described in Section 12.5.9.4. Next the expert reduces Abstract Task 2 to Planning
Task 2, and the agent learns a task concretion rule (see Figure 12.62b) by using the method
described in Section 12.5.9.5. After that, the expert continues specifying the reduction tree
corresponding to Planning Task 2, and the agent learns rules from the specified planning
step, as indicated previously. During the development of this planning tree, the agent
may apply the preceding rules, if their conditions are satisfied, and may refine them based
on the expert’s feedback. After the entire subtree corresponding to Planning Task 2 is

Figure 12.62 shows, from top to bottom: (a) the planning task reduction rule learned from the
reduction of Planning Task 1; (b) the concretion rule learned for Abstract Task 2; (c) the goal
mapping rule learned for Goal 2; (d) the concretion rule learned for Abstract Task 3; (e) the
goal mapping rule learned for Goal 3; and (f) the goal synthesis rule learned for Goal 1.

Figure 12.62. The sequence of learning correlated planning rules.


developed, the agent can learn the goal mapping rule corresponding to Goal 2, as
described in Section 12.5.9.4. The learning of the concretion rule for Abstract Task
3 and of the goal mapping rule for Goal 3 is done as described previously. After that,
Disciple-VE learns the goal synthesis rule corresponding to Goal 1, as described in
Section 12.5.9.4.
The preceding illustration corresponds to a reduction of a planning task into planning
subtasks. However, a planning task can also be reduced to elementary actions, as
illustrated at the bottom part of Figure 12.61. In this case, Disciple-VE will learn more
complex action concretion rules instead of task concretion rules, as discussed in
Section 12.5.9.6. In the following sections, we will present the aforementioned learning
methods.

12.5.9.3 The Learning Problem and Method for a Set of Correlated Planning Rules
The problem of learning a set of correlated planning rules is presented in Table 12.5, and
the corresponding learning method is presented in Table 12.6. We will illustrate them by
using the top task reduction from Figure 12.49 (shown also in Figure 12.50).

12.5.9.4 Learning Correlated Planning Task Reduction Rules


The method for learning a correlated planning reduction rule is presented in Table 12.7.
This method is similar to the method of learning an inference rule presented in Table 9.2
and Section 12.5.8, except for the insertion of Step 3 in Table 12.7, which adds to the set V
the variables from the learned rule and their values in the example.
To illustrate the method in Table 12.7, let us consider the top reduction in Figure 12.49
(shown also in Figure 12.50). In that reduction, the top-level task is reduced to five abstract
tasks. The reduction is justified by the following question/answer pair:
Question: What kind of incident is Gainsville incident?
Answer: Gainsville incident is a major fire emergency because it involves a large propane tank
on fire.

Table 12.5 The Learning Problem for Correlated Planning Rules

GIVEN
• A sequence of reduction and synthesis steps called SE that indicate how a specific planning task
is reduced to its immediate specific subtasks and/or actions and how its goal is synthesized from
their goals/effects.
• A knowledge base that includes an ontology and a set of rules.
• A subject matter expert who understands why the given planning steps are correct and may
answer the agent's questions.

DETERMINE
• A set of reduction, concretion, goal, and/or action rules called SR that share a common space of
variables, each rule being a generalization of an example step from SE.
• An extended ontology (if needed for example understanding).


Table 12.6 The Learning Method for Correlated Planning Rules

Let SE be a sequence of reduction and synthesis steps that indicate how a specific planning task T is
reduced to its immediate specific subtasks and/or actions, and how its goal is synthesized from
their goals/effects.
1. Initialize the set V of shared variables and their values in SE: V ← ∅.
2. Learn a planning task reduction rule from the reduction of T to the abstract tasks ATi and
update the set V (by using the method described in Table 12.7 and Section 12.5.9.4).
3. For each abstract task ATi do
If ATi is reduced to a concrete Task Ti
Then 3.1. Learn a planning task concretion rule and update set V (using the
method from Section 12.5.9.5).
3.2. Develop the entire subtree of Ti (this may lead to the learning of new rules by
using the methods from Tables 9.2 and 12.6).
3.3. Learn the goal mapping rule for Ti (using the method from Section 12.5.9.4).
Else if ATi is reduced to an elementary action Ai
Then 3.1. Learn an action concretion rule and update the set V (using the method from
Section 12.5.9.6).
4. Learn the goal synthesis rule for T (by using the method described in Section 12.5.9.4).
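The control structure of the method in Table 12.6 might be summarized as in the following sketch; the entries of the methods dictionary stand for the procedures of Table 12.7 and Sections 12.5.9.4 through 12.5.9.6 and are placeholders rather than actual Disciple-VE calls.

```python
# A schematic rendering of the learning method for correlated planning rules
# (Table 12.6). The reduction example and the learning procedures are passed
# in as plain data and callables; both are hypothetical simplifications.

def learn_correlated_planning_rules(reduction_example, methods):
    shared_vars = {}     # Step 1: initialize the set V of shared variables.
    rules = []

    # Step 2: learn the planning task reduction rule and update V.
    rules.append(methods["reduction_rule"](reduction_example, shared_vars))

    # Step 3: process each abstract task of the reduction.
    for abstract_task in reduction_example["abstract_tasks"]:
        if abstract_task["kind"] == "task":
            # 3.1: learn a planning task concretion rule and update V.
            rules.append(methods["task_concretion_rule"](abstract_task, shared_vars))
            # 3.2: develop the entire subtree of the concrete task; this may
            # trigger the learning of further rules.
            methods["develop_subtree"](abstract_task["concrete_task"])
            # 3.3: learn the goal mapping rule for the concrete task.
            rules.append(methods["goal_mapping_rule"](abstract_task["concrete_task"],
                                                      shared_vars))
        else:
            # The abstract task is reduced to an elementary action.
            rules.append(methods["action_concretion_rule"](abstract_task, shared_vars))

    # Step 4: learn the goal synthesis rule for the top task.
    rules.append(methods["goal_synthesis_rule"](reduction_example["task"], shared_vars))
    return rules
```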

As part of example understanding, Disciple-VE will interact with the expert to find the
following explanation pieces, which represent an approximation of the meaning of the
question/answer pair in the current world state:

Gainsville incident instance of major fire emergency
Gainsville incident is caused by fire1
fire1 is situated in propane tank1 instance of propane tank
propane tank1 has as size large

Continuing with the steps from Table 12.7, Disciple-VE will learn the rule from the
left-hand side of the pane in Figure 12.63.
The final list of shared variables is shown in the right-hand side of this pane. The right-
hand side of the pane also shows the goal produced by the goal synthesis rule. This rule
generalizes the expression representing the goal associated with the IF task by replacing its
instances and constants with the corresponding variables from the list of shared variables.
Similarly, the goal mapping rule generalizes the goals of the THEN tasks.
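The generalization performed by the goal mapping and goal synthesis rules can be sketched as a simple substitution of values by their shared variables; the goal text and the variable bindings below are hypothetical simplifications, not the actual content of Figure 12.63.

```python
# A minimal sketch of generalizing a goal expression: each instance or constant
# is replaced by the corresponding variable from the list of shared variables.

def generalize_goal(goal_text, shared_variables):
    """shared_variables maps each variable name to its value in the example."""
    # Replace longer values first so that multi-word names are handled whole.
    for var, value in sorted(shared_variables.items(),
                             key=lambda item: -len(item[1])):
        goal_text = goal_text.replace(value, var)
    return goal_text

shared = {"?O1": "Gainsville incident", "?O2": "fire1", "?O3": "propane tank1"}
goal = "the effects of fire1 on propane tank1 in the Gainsville incident are mitigated"
print(generalize_goal(goal, shared))
# -> 'the effects of ?O2 on ?O3 in the ?O1 are mitigated'
```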

12.5.9.5 Learning Correlated Planning Task Concretion Rules


The method of learning a correlated planning task concretion rule is similar to the method
of learning a correlated planning reduction rule presented in Table 12.7 and Section
12.5.9.4. To illustrate it, let us consider again the reduction from Figure 12.50. The Sub-
Task (3) pane includes a concretion step, which is shown again in Figure 12.64. The rule
learned from this concretion example is shown in Figure 12.65.
As part of Mixed-Initiative Understanding (see Table 12.7), what needs to be
understood are the preconditions of the concretion step. The approximation of their
meaning is:


Table 12.7 The Learning Method for a Correlated Planning Task Reduction Rule

Let E be a reduction of a specific planning task T to one or several abstract tasks ATi, reduction
taking place in state Sk, and let V be the set of shared variables and their values.
(1) Mixed-Initiative Understanding (Explanation Generation)
Determine the meaning of the question/answer pair from the example E, in the context of the
agent’s ontology from the state Sk, through mixed-initiative interaction with the subject matter
expert. This represents a formal explanation EX of why the example E is correct. During this
process, new objects and features may be elicited from the expert and added to the ontology. This
is done in order to better represent the meaning of the question/answer pair in terms of the
objects and features from the ontology.
(2) Example Reformulation
Generate a variable for each instance and each constant (i.e., number, string, or symbolic
probability) that appears in the example E and its explanation EX. Then use these variables to
create an instance I of the concept C representing the applicability condition of the rule R to be
learned. C is the concept to be learned as part of rule learning and refinement. Finally, reformulate
the example as a very specific IF-THEN rule with I as its applicability condition. The elements of the
rule are obtained by replacing each instance or constant from the example E with the
corresponding variable.
(3) Updating of Shared Variables and Values
Add to the set V the new variables and their values from the condition C.
(4) Analogy-based Generalizations
Generate the plausible upper bound condition of the rule R as the maximal generalization of I in
the context of the agent’s ontology.
Generate the plausible lower bound condition of the rule R as the minimal generalization of I that
does not contain any specific instance.
(5) Rule Analysis
If there is any variable from the THEN part of a rule that is not linked to some variable from the IF
part of the rule, or if the rule has too many instances in the knowledge base, then interact with the
expert to extend the explanation of the example and update the rule if new explanation pieces are
found. Otherwise, end the rule-learning process.

Gainsville incident has as ICS Gainsville ICS has as ICS unified command Gainsville command

The Rule Analysis step takes the value of the set V into account to determine the unlinked
output variables. In particular, an output variable from the concrete task does not need to
be linked to input variables if it is already part of the input set V.

12.5.9.6 Learning a Correlated Action Concretion Rule


If the reduction of a planning task includes actions, then Disciple-VE also learns correlated
action concretion rules, as illustrated at the bottom part of Figure 12.61. The learning
method is similar to that for learning a correlated task concretion rule, except that the
resulting rule has additional action components such as Delete Effects, Add Effects,
Resources, and Duration.
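As a rough illustration of what such a rule carries, the sketch below represents an action concretion rule as a record with delete effects, add effects, resources, and duration; the class and the field values are only loosely inspired by the actions of the generated plan and do not reproduce the learned rule of Figure 12.66.

```python
# A minimal sketch of the additional components of an action concretion rule.
# All field values below are illustrative.

from dataclasses import dataclass, field

@dataclass
class ActionConcretionRule:
    abstract_task: str
    action: str
    precondition: list = field(default_factory=list)
    condition: dict = field(default_factory=dict)      # plausible version space
    delete_effects: list = field(default_factory=list)
    add_effects: list = field(default_factory=list)
    resources: list = field(default_factory=list)
    duration: str = "0.0 s"

rule = ActionConcretionRule(
    abstract_task="apply cooling water to ?O1",
    action="?O2 sets up ?O3 to apply water to ?O1",
    delete_effects=["?O3 is available"],
    add_effects=["?O3 is assigned"],
    resources=["?O2", "?O3"],
    duration="3.0 min",
)
print(rule.add_effects)    # ['?O3 is assigned']
```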
Let us consider the abstract task from Figure 12.48 and its concretion. The action
concretion rule learned from this example is shown in Figure 12.66.


Figure 12.63. Planning reduction rule learned from the reduction in Figure 12.49.

Figure 12.64. A task concretion example.

12.5.10 The Virtual Experts Library


The previous sections have presented the capability of a Disciple-VE agent shell to acquire
planning expertise rapidly from a subject matter expert. This capability makes possible the
development of a wide range of planning agents that can collaborate in performing
complex tasks. These agents are maintained in the VE Library, where their knowledge
bases (KBs) are hierarchically organized, as illustrated in Figure 12.67. In this illustration,
there are four expertise domains, D1, D2, D3, and D4, and nine virtual experts, each
associated with a KB from the bottom of the hierarchy. Each virtual expert agent VE is a
customization of the Disciple-VE shell. The three leftmost virtual experts are experts in the


Figure 12.65. The planning concretion rule learned from the example in Figure 12.64.

Figure 12.66. The correlated action concretion rule learned from the example in Figure 12.48.


KB-0 is at the top of the hierarchy; KB-12 and KB-34 are below it; below them are the domain
knowledge bases KB-D1, KB-D2, KB-D3, and KB-D4; and at the bottom are the KBs of the nine
virtual experts (e.g., KB-B1, KB-I1, and KB-A1 for the basic, intermediate, and advanced
experts in domain D1).

Figure 12.67. The organization of the VE Library.

domain D1 with different levels of expertise: basic, intermediate, and advanced. In
addition to receiving their specific KBs (e.g., KB-B1), they all inherit general knowledge
about the domain D1, knowledge represented in KB-D1. They also inherit knowledge from
the higher-level KBs, KB-12 and KB-0. These higher-level KBs contain general knowledge,
useful to many agents, such as ontologies for units of measure, time, and space.
Traditional knowledge engineering practice builds each KB from scratch, with no
knowledge reuse, despite the fact that this is a very time-consuming, difficult, and error-
prone process (Buchanan and Wilkins, 1993; Durkin, 1994; Awad, 1996; Jackson, 1999;
Awad and Ghaziri, 2004). In contrast, the hierarchy of KBs from the VE Library offers
a practical solution to the problem of knowledge reuse, speeding up the process of
building a new virtual expert. Consider, for example, developing a new virtual expert for
the D3 domain. This expert will already start with a KB composed of KB-D3, KB-34, and
KB-0. Thus, the VE Library can also be regarded as a knowledge repository for the new
virtual experts to be developed.
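The reuse mechanism can be pictured as follows: a new virtual expert's starting knowledge is assembled from the chain of KBs it inherits along the hierarchy of Figure 12.67. The encoding of the hierarchy is our own simplification.

```python
# A minimal sketch of knowledge reuse in the VE Library: the KBs inherited by
# a new virtual expert are found by walking up the hierarchy of Figure 12.67.

KB_PARENT = {                    # KB -> the KB it inherits from
    "KB-D1": "KB-12", "KB-D2": "KB-12",
    "KB-D3": "KB-34", "KB-D4": "KB-34",
    "KB-12": "KB-0",  "KB-34": "KB-0",
}

def inherited_kbs(kb):
    """All the KBs whose knowledge a new virtual expert in `kb` starts with."""
    chain = [kb]
    while chain[-1] in KB_PARENT:
        chain.append(KB_PARENT[chain[-1]])
    return chain

print(inherited_kbs("KB-D3"))    # ['KB-D3', 'KB-34', 'KB-0']
```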
The updating of each KB from the hierarchical repository (e.g., KB-12) is the responsi-
bility of a team consisting of a knowledge engineer and one or several subject matter
experts. They use the specialized browsers and editors of the Disciple-VE shell. The left-
hand side of Figure 12.68 shows the interface of the Object Browser, which displays the
objects in a tree structure. The objects that are inherited from an upper-level KB (such as
information measure or length measure) are displayed with a gray background. The
right-hand side of Figure 12.68 shows the interface of the Object Viewer, which displays
additional information about the object selected in the Object Browser (e.g., fire engine
company E501), such as its direct superconcepts and its features. The top part of
Figure 12.46 shows the interface of the Hierarchical Browser, which displays the hier-
archical relationships between objects or features in a graph structure. The bottom
part of Figure 12.46 shows the interface of the Association Browser, which displays
an object and its relationships with other objects. Additional tools include the Object
Editor, the Feature Editor, and the Rule Editor. All these tools have been introduced in
Chapters 5 and 6.


Figure 12.68. The interface of the Object Browser and Object Viewer.

To allow the KBs from the hierarchy to be updated and extended separately, the
Disciple-VPT system maintains multiple versions for each KB. Let us assume that each
KB from Figure 12.67 has the version 1.0. Let us further assume that the management team
for KB-0 decides to make some changes to this KB that contains units of measure. For
instance, the team decides to include the metric units, to rename “gallon” as “US gallon,”
and to add “UK gallon.” As a result, the team creates version 2.0 of KB-0. However, the
other knowledge bases from the library (e.g., KB-12) still refer to version 1.0 of KB-0. The
management team for KB-12 is informed that a higher version of KB-0 is available. At this
point, the team can decide whether it wants to create a new version of KB-12 that inherits
knowledge from version 2.0 of KB-0. The KB update process uses the KB updating tool of
Disciple-VE. This tool creates version 2.0 of KB-12 by importing the knowledge from
version 1.0 of KB-12, in the context of version 2.0 of KB-0. Even though the version 2.0
of KB-12 has been created, Disciple-VPT still maintains KB-0 version 1.0 and KB-12
version 1.0, because these versions are used by KB-D1 version 1.0 and by other KBs from
the repository. The management team for KB-D1 may now decide whether it wants to
upgrade KB-D1 to the new versions of its upper-level KBs, and so on. Because of the
version system, each KB from the library maintains, in addition to its version, the versions
of the other KBs from which it inherits knowledge.
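The versioning scheme might be sketched as follows; the representation and the upgrade function are hypothetical, but they capture the idea that each KB version records the versions of the KBs it inherits from, and that the old versions remain available to the KBs that still use them.

```python
# A minimal sketch of the multi-version KB repository: each (KB, version) pair
# records the versions of the KBs it inherits from; upgrading a KB creates a
# new version without discarding the old one.

versions = {
    # (kb, version) -> versions of the KBs it inherits from
    ("KB-0", "1.0"):  {},
    ("KB-0", "2.0"):  {},                  # e.g., metric units added
    ("KB-12", "1.0"): {"KB-0": "1.0"},
}

def latest(kb):
    return max(v for k, v in versions if k == kb)

def upgrade(kb, ancestor):
    """Create a new version of `kb` that imports its old content in the
    context of the ancestor's latest version (the KB updating tool)."""
    old = latest(kb)
    new = f"{int(float(old)) + 1}.0"
    inherits = dict(versions[(kb, old)])
    inherits[ancestor] = latest(ancestor)
    versions[(kb, new)] = inherits
    return (kb, new), inherits

print(upgrade("KB-12", "KB-0"))    # (('KB-12', '2.0'), {'KB-0': '2.0'})
```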
Another important knowledge management functionality offered by Disciple-VPT is
that of splitting a KB into two parts, a more general one and a more specific one. This
allows a KB developer first to build a large KB and then to split it and create a hierarchy
of KBs.


When a virtual expert is extracted from the VE Library and introduced into a VE Team
(see Figure 12.44), all the KBs from which it inherits knowledge are merged into a shared
KB in order to increase the performance of the agent. Let us consider the Intermediate
agent from the domain D3 (see Figure 12.67). In this case, KB-D3, KB-12, KB-34, and KB-0
are all merged into the Shared KB of this agent. As a consequence, the structure of the KBs
of this agent during planning is the one from Figure 12.69. Notice that, in addition to the
Shared KB, there are three other types of KBs, Domain KB, Scenario KB, and State KB, all
hierarchically organized. Domain KB is the KB of this Intermediate agent from the domain
D3, knowledge independent of any particular scenario. Each scenario is represented into a
different KB called Scenario KB. For example, there would be a Scenario KB for the
propane tank fire scenario described in Section 12.5.3, and a different Scenario KB for a
scenario involving red-fuming nitric acid spilling from a truck parked near a residential
area (Tecuci et al., 2008c). Moreover, under each scenario KB there is a hierarchy of State
KBs. KB-S1 represents the state obtained from SKB-M after the execution of an action
which had delete and/or add effects. As additional actions are simulated during planning,
their delete and add effects change the state of the world. KB-S11, KB-S12, and KB-S13 are
the states corresponding to three alternative actions. The entire description of the state
corresponding to KB-S11 is obtained by considering the delete and add effects in the states
KB-S11 and KB-S1, and the facts in the scenario SKB-M.
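One way to picture this state composition is the following sketch, in which the description of a state is obtained by folding the delete and add effects of the simulated actions over the scenario facts; the facts and effects below are invented for illustration.

```python
# A minimal sketch of assembling a state description from the scenario facts
# and the chain of delete/add effects (e.g., SKB-M -> KB-S1 -> KB-S11).

def state_description(scenario_facts, effects_chain):
    """effects_chain is the list of (delete_effects, add_effects) pairs from
    the root state KB down to the state of interest."""
    facts = set(scenario_facts)
    for delete_effects, add_effects in effects_chain:
        facts -= set(delete_effects)
        facts |= set(add_effects)
    return facts

scenario = {"water hose E504 is available", "fire1 is burning"}
chain = [
    (["water hose E504 is available"], ["water hose E504 is assigned"]),   # KB-S1
    ([], ["cooling water is applied to propane tank1"]),                   # KB-S11
]
print(state_description(scenario, chain))
```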

12.5.11 Multidomain Collaborative Planning


Each virtual agent from the VE Library is expert in a certain expertise domain. However,
these expertise domains are not disjoint, but overlapping, as illustrated in Figure 12.70.


Figure 12.69. The organization of an agent’s knowledge base during planning.



Figure 12.70. Sample coverage of expertise domains.

In this illustration, the planning task Tm belongs only to D2 and can be performed only by a
virtual expert from that domain. Ti is a task common to D1 and D2 and can, in principle, be
performed either by a virtual expert in D1 or by a virtual expert in D2. In general, a virtual
expert will cover only a part of a given expertise domain, depending on its level of
expertise. For instance, the virtual expert library illustrated in Figure 12.67 includes three
virtual experts from the domain D1, a basic one, an intermediate one, and an advanced
one, each covering an increasingly larger portion of the domain. Therefore, whether a
specific virtual expert from the domain D2 can generate a plan for Tm and the quality of the
generated plan depend on the expert’s level of expertise.
A virtual expert has partial knowledge about its ability to generate plans for a given task,
knowledge that is improved through learning. For instance, the virtual expert knows that it
may be able to generate plans for a given task instantiation because that task belongs to its
expertise domain, or because it was able to solve other instantiations of that task in the
past. Similarly, it knows when a task does not belong to its area of expertise. The virtual
experts, however, do not have predefined knowledge about the problem-solving capabil-
ities of the other experts from a VE Team or the VE Library. This is a very important feature
of Disciple-VPT that facilitates the addition of new agents to the library, or the improve-
ment of the existing agents, because this will not require taking into account the know-
ledge of the other agents.
The task reduction paradigm facilitates the development of plans by cooperating virtual
experts, where plans corresponding to different subtasks of a complex task may be
generated by different agents. This multi-agent planning process is driven by an auction
mechanism that may apply several strategies. For instance, the agents can compete for
solving the current task based on their prior knowledge about their ability to solve that
task. Alternatively, the agents may actually attempt to solve the task before they bid on it.
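The auction mechanism can be sketched as follows, with each virtual expert submitting a bid that reflects its estimated ability to plan for the announced task; the bidding strategies below are invented for illustration.

```python
# A minimal sketch of auction-based task allocation among virtual experts:
# the task is assigned to the agent with the highest bid; a bid of 0 means
# the task is outside the agent's expertise.

def run_auction(task, agents):
    """agents is a list of (name, bid_function) pairs."""
    bids = [(bid(task), name) for name, bid in agents]
    best_bid, winner = max(bids)
    return (winner, best_bid) if best_bid > 0 else (None, 0.0)

fire_expert = ("fire department operations expert",
               lambda task: 0.9 if "fire" in task else 0.0)
em_expert = ("emergency management expert",
             lambda task: 0.7 if "evacuate" in task or "fire" in task else 0.0)

print(run_auction("apply cooling water to the propane tank on fire",
                  [fire_expert, em_expert]))
# -> ('fire department operations expert', 0.9)
```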

12.5.12 Basic Virtual Planning Experts


We have developed two basic virtual experts: a fire department operations expert and an
emergency management expert. The development of these virtual experts was guided by
the toxic substance leaking scenario described in Tecuci et al. (2008c) and by the propane
tank on fire scenario described in Section 12.5.3.
First we have worked with a subject matter expert to model the plan generation process
for these two scenarios by using the task reduction paradigm, as illustrated in Figure 12.50.
Besides the development of the reasoning trees, another result of this modeling process
was the development of the modeling methodology presented in Section 12.5.6. Yet
another result of this modeling process was the identification of the object concepts that
need to be present in the ontology of Disciple-VE so that it can perform this type of
reasoning. Based on this specification of the object ontology, and by using the ontology


development modules of Disciple-VE, we have developed an object ontology consisting of
410 concepts, 172 feature definitions, 319 generic instances, and 944 facts. Fragments of
this ontology were presented in Figures 12.45 and 12.46.
Although we have worked with both aforementioned scenarios to develop the modeling
trees and the ontology, the teaching of the fire expert and of the emergency management
expert was based only on the scenario of the propane tank on fire. As a result of the
teaching process, the virtual fire expert learned eighty-one planning rules, and the virtual
emergency management expert learned forty-seven planning rules. The two developed
virtual experts share the object ontology.

12.5.13 Evaluation of Disciple-VPT


To perform a preliminary evaluation of the Disciple-VPT system, we have developed a
scenario pattern based on the scenario from Section 12.5.3. Then we have asked a fire
department operations expert and an emergency management expert to define a specific
scenario based on this pattern, by providing the missing elements. For example, the two
experts decided on the day and time of the incident, the position of the fire with respect to
the propane tank, the estimated amount of propane in the tank, the available resources
(fire engine companies, truck engine companies, hazardous materials [hazmat] units,
police personnel, county personnel, etc.), and the time that they are arriving at the scene
of the incident. After that, both the team of human experts and Disciple-VPT independ-
ently generated plans to respond to this situation.
The plan generated by Disciple-VPT consisted of eighty-nine partially ordered elemen-
tary actions. A fragment of this plan is shown in Figure 12.71. This plan was evaluated by
the aforementioned experts and by the expert with whom we have developed the agents.
Then each expert filled out a questionnaire. The questionnaire included statements about
various characteristics of the generated plan and about the Disciple-VPT system. The
experts were asked to indicate whether they strongly agree (SA), agree (A), are neutral (N),
disagree (D), or strongly disagree (SD) with these statements.
The top part of Table 12.8 presents some of the results of the evaluation of the plan
generated by Disciple-VPT, which the human experts considered good and easy to under-
stand. One of the experts disagreed with the way Disciple-VPT ordered some of the actions.
However, this reflected a disagreement between the evaluating experts and not a planning
error, the generated order being that taught by the expert who instructed Disciple-VPT.
The evaluation of the hierarchical plan generation process (see the middle of
Table 12.8) shows that it significantly improves the understandability of this process.
Finally the experts agreed or strongly agreed that Disciple-VPT has many potential
applications in emergency response planning, from developing exercises for training, to
actual training, and to its use as planning assistant.
A limitation of this evaluation is that it is based on the opinion of only three experts.
More experts are needed in order for the results to have statistical significance.

12.5.14 Final Remarks


This section presented a major extension of the Disciple theory and methodology for the
development of knowledge-based agents by subject matter experts, with limited assistance
from knowledge engineers.

For each elementary action, the plan fragment specifies its Id, the Action, the Result (delete
and add effects), the Resources used, the Start Time, and the Duration.

Figure 12.71. Fragment of a plan generated by Disciple-VPT.



Table 12.8 Evaluation Results

Generated Plan                                                               SA A N D SD
The generated plan is easy to understand.                                       3
The generated plan consists of a good sequence of actions.                      2 1
The objectives of the actions are clear from the generated plan.                3
The generated plan has a good management of the resources.                      1 1 1
The level of detail in the plan is appropriate.                                 3

Hierarchical Plan Generation Process                                         SA A N D SD
The hierarchical task-reduction structure makes the logic of the plan clear.    2 1
The hierarchical task-reduction structure makes the goals of the plan clear.    1 2
The questions, the answers, and the preconditions help understand the logic
of the plan generation process.                                                 2 1
The preconditions, effects, duration, and resources make the specification
of the plan’s actions clear.                                                    1 2
The hierarchical task-reduction structure may be used to teach new persons
how to plan.                                                                    3

Usability of Disciple-VPT                                                    SA A N D SD
Disciple-VPT could be used in developing and carrying out exercises for
emergency response planning.                                                    2 1
Disciple-VPT could be used for training the personnel for emergency
response planning.                                                              2 1
Disciple-VPT could be used as an assistant to typical users, guiding them
how to respond to an emergency situation.                                       3

First, we have extended the Disciple approach to allow the development of complex
HTN planning agents that can be taught their planning knowledge, rather than having it
defined by a knowledge engineer. This is a new and very powerful capability that is not
present in other action planning systems (Tate, 1977; Allen et al., 1990; Nau et al., 2003;

Ghallab et al., 2004). This capability was made possible by several major developments
of the Disciple approach. For instance, we have significantly extended the knowledge
representation and management of a Disciple agent by introducing new types of know-
ledge that are characteristic of planning systems, such as planning tasks and actions
(with preconditions, effects, goal, duration, and resources) and new types of rules
(e.g., planning tasks reduction rules, concretion rules, action rules, goal synthesis rules).
We have introduced state knowledge bases and have developed the ability to manage
the evolution of the states in planning. We have developed a modeling language and
a set of guidelines that help subject matter experts express their planning process.
We have developed an integrated set of learning methods for planning, allowing the
agent to learn general planning knowledge starting from a single planning example
formulated by the expert.
A second result is the development of an integrated approach to planning and infer-
ence, both processes being based on the task reduction paradigm. This improves the
power of the planning systems that can now include complex inference trees. It also
improves the efficiency of the planning process because some of the planning operations
can be performed as part of a much more efficient inference process that does not require
a simulation of the change of the state of the world.
A third result is the development and implementation of the concept of library of virtual
experts. This required the development of methods for the management of a hierarchical
knowledge repository. The hierarchical organization of the knowledge bases of the virtual
experts also serves as a knowledge repository that speeds up the development of new
virtual experts that can reuse the knowledge bases from the upper levels of this hierarchy.
A fourth result is the development of the multidomain architecture of Disciple-VPT,
which extends the applicability of the expert systems to problems whose solutions require
knowledge of more than one domain.
A fifth result is the development of two basic virtual experts, a basic fire expert and a
basic emergency management expert, that can collaborate to develop plans of actions that
are beyond their individual capabilities.
Finally, a sixth result is the development of an approach and system that has high
potential for supporting a wide range of training and planning activities.

13 Design Principles for Cognitive Assistants

This book has presented an advanced approach to developing personal cognitive assist-
ants. Although the emphasis in this book has been on cognitive assistants for evidence-
based hypothesis analysis, the Disciple approach is also applicable to other types of tasks,
as was illustrated by the agents presented in Chapter 12. Moreover, the Disciple approach
illustrates the application of several design principles that are useful in the development of
cognitive assistants in general. In this chapter, we review these principles, which have
been illustrated throughout this book. Each of the following sections starts with the
formulation of a principle and continues with its illustration by referring back to previous
sections of the book.

13.1 LEARNING-BASED KNOWLEDGE ENGINEERING

Employ learning technology to simplify and automate the knowledge engineering process.

It is generally accepted that knowledge engineering is very difficult, involving many
creative tasks. One way to simplify this process significantly is to automate as much of
the knowledge engineering process as possible. As discussed in Section 3.3.6, the
approach taken with Disciple is to replace each knowledge base development activity of
the knowledge engineer (e.g., modeling the problem-solving process, ontology develop-
ment, rule learning, rule refinement) with an equivalent activity that can be performed
directly by a subject matter expert and the Disciple agent, with limited or no support from
the knowledge engineer (see Figure 3.19, p. 107).
Consider, for example, the modeling of the problem-solving process. A knowledge
engineer would need to instruct a subject matter expert how to express his or her
reasoning in the divide-and-conquer analysis and synthesis framework. Then the expert
and the agent can model the solutions of new problems by themselves. In this process, the
agent will support the expert in various ways. For example, the agent may employ
previously learned rules to suggest likely reductions of the current problem or hypothesis.
Or it may learn and reuse reasoning patterns to suggest reductions to the expert.
Now consider the development and testing of the reasoning rules, which the knowledge
engineer does through interviews with the subject matter expert, as discussed in Section
3.1.4. This time-consuming and error-prone task is reduced to several tasks that the
subject matter expert and the Disciple agent can easily perform, as discussed in Sections
9.4 and 10.1.2. In particular, the expert provides examples and helps the agent to
understand them, and the agent generates the reasoning rules. Then the agent employs

the rules in problem solving and the expert critiques the reasoning process which, in turn,
guides the agent in refining the rules.

13.2 PROBLEM-SOLVING PARADIGM FOR USER–AGENT COLLABORATION

Use a problem-solving paradigm that is both natural for the human user and appropriate for the
automated agent.

Cognitive assistants, by their very nature, need to be based on problem-solving paradigms
that facilitate user–agent collaboration. On one hand, the employed problem-solving
paradigm needs to be natural enough for the human users. On the other hand, it has to
be formal enough to be automatically applied by the agents.
As discussed in Chapter 4, Disciple agents employ a divide-and-conquer approach
where complex problems or hypotheses, expressed in natural language, are successively
reduced to simpler and simpler ones, guided by corresponding questions and answers.
The typical questions are those from Rudyard Kipling’s well-known poem, “I Keep Six
Honest . . .”: “What?” “Why?” “When?” “How?” “Where?” and, “Who?”
While the analysis part of the problem-solving strategy is very natural, its synthesis part
may be challenging, as pointed out by Toffler (1984), and also experienced by us with the
development of many Disciple agents, such as Disciple-COG (see Sections 4.1 and 12.4).
Therefore, in Disciple-EBR, we have developed a simplified version of solution synthesis for
evidence-based reasoning that is actually very easy to use, as discussed in Sections 4.3 and 4.4.

13.3 MULTI-AGENT AND MULTIDOMAIN PROBLEM SOLVING

Use a problem-solving paradigm for the agent that facilitates both collaboration between users
assisted by their agents and the solving of problems requiring multidomain expertise.

Many existing or potential applications of cognitive assistants are cross-domain, requiring the
collaboration not only between a user and his or her assistant, but also among several users,
each with his or her own area of expertise, as illustrated by the emergency response planning
domain addressed by Disciple-VPT (see Section 12.5). The problem reduction strategy
employed by the Disciple agents can reduce a multidisciplinary problem to subproblems
that may be solved by different experts and their agents. Then, the domain-specific solutions
found by individual users may be combined to produce the solution of the multidisciplinary
problem. With such an approach, each agent supports its user, not only in problem solving,
but also in collaboration and sharing of information with the other users.

13.4 KNOWLEDGE BASE STRUCTURING FOR KNOWLEDGE REUSE

Structure the knowledge base to facilitate knowledge reuse.

Knowledge base development is a very complex activity, and knowledge reuse can signifi-
cantly simplify it. Moreover, knowledge reuse facilitates the communication with other
agents that share the same knowledge.


Disciple agents facilitate the reuse of knowledge through two types of knowledge struc-
turing. First, the agent’s knowledge is structured into an ontology that defines the concepts
of the application domain and a set of problem-solving rules expressed with these concepts.
The ontology is the more general part, being applicable to an entire domain. Therefore,
when developing a new application, parts of the ontology can be reused from the previously
developed applications in that domain.
A second type of knowledge structuring is the organization of the knowledge repository
of the agent as a three-level hierarchy of knowledge bases, as discussed in Section 3.3.5
(see Figure 3.17, p. 104). The top of the knowledge repository is the Shared KB, which
contains general knowledge for evidence-based reasoning applicable in all the domains.
Under the Shared KB are Domain KBs, each corresponding to a different application
domain. Finally, under each Domain KB are Scenario KBs, each corresponding to a
different scenario. Therefore, when developing the KB for a new scenario, the agent reuses
the corresponding Domain KB and the Shared KB. Similarly, when developing a new
Domain KB, the agent reuses the Shared KB.

13.5 INTEGRATED TEACHING AND LEARNING

Use agent teaching and learning methods where the user helps the agent to learn and the agent
helps the user to teach it.

Learning the elements of the knowledge base is a very complex process. In the Disciple
approach, this process is simplified by a synergistic integration of teaching and learning, as
discussed in Chapters 9 and 10. In particular, the user helps the agent to learn by providing
representative examples of reasoning steps, as well as hints that guide the agent in
understanding these examples. But the agent also helps the user to teach it by presenting
attempted solutions to problems for the user to critique, as well as attempted explanations
of an example, from which the user can select the correct ones.

13.6 MULTISTRATEGY LEARNING

Use multistrategy learning methods that integrate complementary learning strategies in order to
take advantage of their strengths to compensate for each other’s weaknesses.

No single learning strategy is powerful enough to learn the complex reasoning rules
needed by the agent, but their synergistic combination is, as illustrated by the methods
employed by the Disciple agents. In particular, a Disciple agent employs learning from
explanations and analogy-based generalization to generate a plausible version space rule
from a single problem-solving example and its explanation (see Section 9.4, p. 258, and
Figure 9.9, p. 259). It then employs learning by analogy and experimentation to generate
additional examples of the learned rules that are to be critiqued by the user, and it
employs empirical induction from examples, as well as learning from failure explan-
ations, to refine the rule based on the new examples and explanations (see Section 10.1.2
and Figure 10.1, p. 295). This results in a powerful method that enables the agent to learn
very complex rules from only a few examples and their explanations, obtained in a
natural dialogue with the user.


13.7 KNOWLEDGE ADAPTATION

Use methods that allow continuous adaptation of agent’s knowledge.

Due to the complexity of the real world and to its dynamic nature, an agent’s knowledge
elements will always be approximate representations of real-world entities. Therefore,
improving and even maintaining the utility of a cognitive assistant depends on its capacity
to adapt its knowledge continuously to better represent the application domain.
The rule-learning methods of the Disciple agents continually improve the rules based
on their failures and successes. But these improvements are done in the context of an
existing ontology, which may itself evolve. Therefore, when the ontology undergoes
significant changes, the previously learned rules need to be relearned. This process can
be done automatically, if each rule maintains minimal generalizations of the examples and
the explanations from which it was learned, as was discussed in Section 10.2.

13.8 MIXED-INITIATIVE MODELING, LEARNING, AND PROBLEM SOLVING

Use mixed-initiative methods where modeling, learning, and problem solving mutually support
each other to capture the expert’s tacit knowledge.

Figure 13.1 illustrates the synergistic integration of modeling, learning, and problem
solving as part of the reasoning of a Disciple agent. These activities mutually support each
other. For example, problem solving generates examples for learning to refine the rules,
and the refined rules lead to better problem solving.
Modeling provides learning with the initial expert example for learning a new rule. But
the previously learned rules or patterns also support the modeling process by suggesting
possible reductions of a problem or hypothesis, as discussed, for instance, in Section 3.3.2.
Finally, modeling advances the problem-solving process with the creative solutions
provided by the user. But problem solving also supports the modeling process by provid-
ing the context for such creative solutions, thus facilitating their definition.


Figure 13.1. Integrated modeling, learning, and problem solving.


13.9 PLAUSIBLE REASONING WITH PARTIALLY LEARNED KNOWLEDGE

Use reasoning methods that enable the use of partially learned knowledge.

As already indicated, an agent's knowledge elements will always be approximate
representations of real-world entities. This is more so for a learning agent that incrementally
improves its knowledge. Since most of the knowledge of a learning agent is partially
learned, the agent should be able to use partially learned knowledge in problem solving.
Otherwise, it would not be able to function as an assistant until it is “fully” developed.
A Disciple agent not only uses partially learned knowledge in problem solving, but also
assesses the plausibility (or confidence level) of a problem or hypothesis reduction based
on its position with respect to the plausible version space conditions of the corresponding
reduction rule, as discussed in Section 7.7 (see Figure 7.14, p. 216). The agent can also
assess the plausibility of an entire reasoning tree by considering the confidence levels of its
component reductions.
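This plausibility assessment can be pictured with the following sketch; the confidence labels and the minimum-based combination are illustrative simplifications of the scheme discussed in Section 7.7, not the exact values used by Disciple-EBR.

```python
# A minimal sketch of assessing a reduction from its position with respect to
# the plausible version space condition of the corresponding rule.

def reduction_confidence(satisfies_lower_bound, satisfies_upper_bound):
    if satisfies_lower_bound:
        return "likely correct"    # covered by the plausible lower bound
    if satisfies_upper_bound:
        return "plausible"         # covered only by the plausible upper bound
    return "not applicable"        # outside the plausible upper bound

def tree_confidence(reduction_confidences):
    """One simple policy: a reasoning tree is no more credible than its
    weakest component reduction."""
    order = ["not applicable", "plausible", "likely correct"]
    return min(reduction_confidences, key=order.index)

print(reduction_confidence(True, True))                     # likely correct
print(tree_confidence(["likely correct", "plausible"]))     # plausible
```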

13.10 USER TUTORING IN PROBLEM SOLVING

Employ approaches to user tutoring that allow the agent to teach its problem-solving paradigm
easily and rapidly to its users, facilitating their collaboration.

A cognitive assistant collaborates with its user in problem solving. This requires the user
not only to understand the agent’s reasoning but also to contribute to it. A subject matter
expert teaches a Disciple agent similarly to how the expert would teach a student, through
problem-solving examples and explanations. Then, when a user employs a Disciple agent,
he or she can easily learn from its explicit reasoning, as illustrated by the Disciple-COG
agent discussed in Section 12.4. Alternatively, the agent can behave as a tutoring system,
guiding the student through a series of lessons and exercises, as illustrated by TIACRITIS
(Tecuci et al., 2010b).
Personalized learning, which was identified as one of the fourteen Grand Challenges for
Engineering in the Twenty-first Century (NAE, 2008), is a very important application of
cognitive assistants.

13.11 AGENT ARCHITECTURE FOR RAPID AGENT DEVELOPMENT

Employ learning agent shells that allow rapid agent prototyping, development, and customization.

As discussed in Section 3.2.3, a learning agent shell enables rapid development of an agent
because all the reasoning modules already exist. Thus one only needs to customize some
of the modules and to develop the knowledge base. For example, the customizations
performed to develop Disciple-COG consisted of developing a report generator and a
simplified interface for the problem solver.
As discussed in Section 3.2.4, a learning agent shell for evidence-based reasoning further
speeds up the development of an agent. First, the agent shell was already customized for
evidence-based reasoning. Second, part of the knowledge base is already defined in the
shell, namely the Shared KB for evidence-based reasoning (EBR KB).
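
The schematic sketch below suggests why the shell shortens development: the generic modules and
the Shared KB are reused unchanged, and only the domain knowledge base and a few customizations
are added. The module and knowledge base names are illustrative assumptions, not the actual shell
interface.

# Schematic sketch; module and knowledge base names are assumptions.

GENERIC_MODULES = ["problem solver", "rule learner", "ontology tools",
                   "mixed-initiative interface"]

def build_ebr_agent(domain_kb, customizations=()):
    # Reused as-is: the generic reasoning modules and the Shared KB
    # for evidence-based reasoning (the EBR KB).
    return {
        "modules": GENERIC_MODULES + list(customizations),
        "knowledge bases": ["Shared EBR KB", domain_kb],
    }

# A Disciple-COG-like agent would add only a report generator and a
# simplified interface for the problem solver.
cog_like_agent = build_ebr_agent(
    "COG domain KB",
    customizations=["report generator", "simplified solver interface"])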

13.12 DESIGN BASED ON A COMPLETE AGENT LIFE CYCLE

Design the agent by taking into account its complete life cycle, to ensure its usefulness for as
long as possible.

This involves the incorporation of methods that support the various stages in the life cycle
of the agent. For example, Figure 13.2 illustrates a possible life cycle of a Disciple agent
that was discussed in Section 3.2.4.
The first stage is shell customization, where, based on the specification of the type of
problems to be solved and the agent to be built, the developer and the knowledge engineer
decide whether any customizations or extensions of the Disciple shell are necessary
or useful. The next stage is agent teaching by the subject matter expert and the knowledge
engineer, supported by the agent itself, which simplifies and speeds up the knowledge
base development process. Once an operational agent is developed, it is used for training
of end-users, possibly in a classroom environment.

Figure 13.2. Possible life cycle of a Disciple agent: (1) shell customization (developer and
knowledge engineer); (2) agent teaching (subject matter expert and knowledge engineer);
(3) training of end-users; (4) field use (reasoning assistance, learning, collaboration assistance,
and information sharing, with access through search agents to libraries, knowledge repositories,
massive databases, and a global knowledge base); (5) after action review and learning (end-user);
(6) knowledge integration and optimization (knowledge engineer and subject matter expert).

The next stage is field use, where copies of the agent support users in their operational
environments. At this stage, each agent
assists its user both in solving problems and in collaborating with other users and their
cognitive assistants. At the same time, the agent continuously learns patterns from this
problem-solving experience by employing a form of nondisruptive learning. However,
because there is no learning assistance from the user, the learned patterns will not include
a formal applicability condition. It is during the next stage of after action review and
learning, when the user and the agent analyze past problem-solving episodes, that the
formal applicability conditions are learned based on the accumulated examples. In time,
each cognitive assistant extends its knowledge with additional expertise acquired from its
user. This creates the opportunity of developing a more competent agent by integrating
the knowledge of all these agents. This can be accomplished by a knowledge engineer,
with assistance from a subject matter expert, in the next stage of knowledge integration and
optimization. The result is an improved agent that may be used in a new iteration of a
spiral process of development and use.
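
The sketch below summarizes how the field-use, after action review, and knowledge integration
stages might fit together; all names are our own illustrative assumptions, and the actual
mechanisms are the pattern and rule learning methods discussed in the preceding chapters.

# Illustrative sketch of stages 4-6 of the life cycle; all names are
# assumptions, not the Disciple implementation.

def field_use(agent, episode):
    # Nondisruptive learning: patterns and examples are accumulated
    # from routine problem solving, without formal applicability
    # conditions and without interrupting the user.
    pattern = agent.learn_pattern(episode)
    agent.examples_of[pattern] = agent.examples_of.get(pattern, []) + [episode]

def after_action_review(agent, user):
    # Reviewing past problem-solving episodes with the user, the agent
    # learns formal applicability conditions from the accumulated examples.
    for pattern, examples in agent.examples_of.items():
        explanations = user.explain(examples)
        agent.learn_rule(pattern, examples, explanations)

def knowledge_integration(agents, knowledge_engineer):
    # A knowledge engineer, assisted by a subject matter expert,
    # integrates the knowledge of the individual agents into an
    # improved agent for the next iteration of the spiral process.
    return knowledge_engineer.integrate([a.knowledge_base for a in agents])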

References

Allemang, D., and Hendler, J. (2011). Semantic Web for the Working Ontologist: Effective Modeling in
RDFS and Owl, Morgan Kaufmann, San Mateo, CA.
Allen, J., Hendler, J., and Tate, A., (eds.) (1990). Readings in Planning, Morgan Kaufmann, San
Mateo, CA.
Anderson, T., Schum, D., and Twining, W. (2005). Analysis of Evidence, Cambridge University Press,
Cambridge, UK.
Awad, E. M. (1996). Building Expert Systems: Principles, Procedures, and Applications, West, New
York, NY.
Awad, E. M., and Ghaziri, H. M. (2004). Knowledge Management, Pearson Education International,
Prentice Hall, Upper Saddle River, NJ, pp. 60–65.
Basic Formal Ontology (BFO) (2012). Basic Formal Ontology. www.ifomis.org/bfo (accessed August
31, 2012).
Bentham, J. (1810). An Introductory View of the Rationale of the Law of Evidence for Use by Non-
lawyers as Well as Lawyers (VI Works 1–187, Bowring edition, 1837–43; originally edited by James
Mill circa 1810).
Boicu, C. (2006). An Integrated Approach to Rule Refinement for Instructable Knowledge-Based Agents.
PhD Thesis in Computer Science, Learning Agents Center, Volgenau School of Information
Technology and Engineering, George Mason University, Fairfax, VA.
Boicu, C., Tecuci, G., and Boicu, M. (2005). Improving Agent Learning through Rule Analysis, in
Proceedings of the International Conference on Artificial Intelligence, ICAI-05, Las Vegas, NV, June
27–30. lac.gmu.edu/publications/data/2005/ICAI3196Boicu.pdf (accessed April 12, 2016)
Boicu, M. (2002). Modeling and Learning with Incomplete Knowledge, PhD Dissertation in
Information Technology, Learning Agents Laboratory, School of Information Technology and
Engineering, George Mason University. lac.gmu.edu/publications/2002/BoicuM_PhD_Thesis.pdf
(accessed November 25, 2015)
Boicu, M., Tecuci, G., Bowman, M., Marcu, D., Lee, S. W., and Wright, K. (1999). A Problem-Oriented
Approach to Ontology Creation and Maintenance, in Proceedings of the Sixteenth National Confer-
ence on Artificial Intelligence Workshop on Ontology Management, July 18–19, Orlando, Florida,
AAAI Press, Menlo Park, CA. lac.gmu.edu/publications/data/1999/ontology-1999.pdf (accessed
November 25, 2015)
Boicu, M., Tecuci, G., Marcu, D., Bowman, M., Shyr, P., Ciucu, F., and Levcovici, C. (2000). Disciple-
COA: From Agent Programming to Agent Teaching, in Proceedings of the Seventeenth International
Conference on Machine Learning (ICML), Stanford, CA, Morgan Kaufman, San Mateo, CA, lac.gmu
.edu/publications/data/2000/2000_il-final.pdf (accessed November 25, 2015)
Bowman, M. (2002). A Methodology for Modeling Expert Knowledge That Supports Teaching Based
Development of Agents, PhD Dissertation in Information Technology, George Mason University,
Fairfax, VA. lac.gmu.edu/publications/data/2002/Michael%20Bowman-Thesis.pdf (accessed
November 25, 2015)
Bresina, J. L., and Morris, P. H. (2007). Mixed-Initiative Planning in Space Mission Operations, AI
Magazine, vol. 28, no. 1, pp. 75–88.


Breuker, J., and Wielinga, B. (1989). Models of Expertise in Knowledge Acquisition, in Guida, G., and
Tasso, C. (eds.), Topics in Expert Systems Design, Methodologies, and Tools, North Holland,
Amsterdam, Netherlands, pp. 265–295.
Buchanan, B. G., and Feigenbaum, E. A. (1978). DENDRAL and META-DENDRAL: Their Applications
Dimensions, Artificial Intelligence, vol. 11, pp. 5–24.
Buchanan, B. G., and Shortliffe, E. H. (eds.) (1984). Rule-Based Expert Systems: The MYCIN Experi-
ments of the Stanford Heuristic Programming Project, Addison-Wesley, Reading, MA.
Buchanan, B. G., and Wilkins, D. C. (eds.) (1993). Readings in Knowledge Acquisition and Learning:
Automating the Construction and Improvement of Expert Systems, Morgan Kaufmann, San
Mateo, CA.
Buchanan, B. G., Barstow, D., Bechtal, R., Bennett, J., Clancey, W., Kulikowski, C., Mitchell, T., and
Waterman, D. A. (1983). Constructing an Expert System, in Hayes-Roth, F., Waterman, D.,
and Lenat, D. (eds.), Building Expert Systems, Addison-Wesley, Reading, MA, pp. 127–168.
Carbonell, J. G. (1983). Learning by Analogy: Formulating and Generalizing Plans from Past Experi-
ence, in Michalski, R. S., Carbonell, J. M., and Mitchell, T. M., Machine Learning: An Artificial
Intelligence Approach, Tioga, Wellsboro, PA, pp. 137–162.
Carbonell, J. G. (1986). Derivational Analogy: A Theory of Reconstructive Problem-Solving and
Expertise Acquisition, in Michalski, R. S., Carbonell, J. G., and Mitchell, T. M. (eds.), Machine
Learning: An Artificial Intelligence Approach, vol. 2, Morgan Kaufmann, San Mateo, CA,
pp. 371–392.
Chaudhri, V. K., Farquhar, A., Fikes, R., Park, P. D., and Rice, J. P. (1998). OKBC: A Programmatic
Foundation for Knowledge Base Interoperability, in Proceedings of the Fifteenth National Confer-
ence on Artificial Intelligence (AAAI-98), AAAI Press, Menlo Park, CA, pp. 600–607.
Clancey, W. (1985). Heuristic Classification, AI Journal, vol. 27, pp. 289–350.
Clausewitz, C. von (1832 [1976]). On War, translated and edited by Howard, M., and Paret, P.
Princeton University Press, Princeton, NJ.
Cohen, L. J. (1977). The Probable and the Provable, Clarendon Press, Oxford, UK.
Cohen, L. J. (1989). An Introduction to the Philosophy of Induction and Probability, Clarendon Press,
Oxford, UK.
Cohen, M. R., and Nagel, E. (1934). An Introduction to Logic and Scientific Method, Harcourt, Brace,
New York, NY, pp. 274–275.
Cohen, P., Schrag, R., Jones, E., Pease, A., Lin, A., Starr, B., Gunning, D., and Burke, M. (1998). The
DARPA High-Performance Knowledge Bases Project, AI Magazine, vol. 19, no. 4, pp. 25–49.
Cooper, T., and Wogrin, N. (1988). Rule-based Programming with OPS5, Morgan Kaufmann, San
Mateo, CA.
Cross, S. E., and Walker, E. (1994). DART: Applying Knowledge-based Planning and Scheduling to
Crisis Action Planning, in Zweben, M., and Fox, M. S. (eds.), Intelligent Scheduling, Morgan
Kaufmann, San Mateo, CA, pp. 711–729.
Cyc (2008). OpenCyc Just Got Better – Much Better! www.opencyc.org (accessed August 22, 2008).
Cyc (2016). The Cyc homepage, www.cyc.com (accessed February 3, 2016).
Dale, A. I. (2003). Most Honourable Remembrance: The Life and Work of Thomas Bayes, Springer-
Verlag, New York, NY.
David, F. N. (1962). Games, Gods, and Gambling, Griffin, London, UK.
David, P. A., and Foray, D. (2003). Economic Fundamentals of the Knowledge Society, Policy Futures
in Education. An e-Journal, vol. 1, no. 1, Special Issue: Education and the Knowledge Economy,
January, pp. 20–49.
Davies, T. R., and Russell, S. J. (1990). A Logical Approach to Reasoning by Analogy, in Shavlik, J.,
and Dietterich, T. (eds.), Readings in Machine Learning, Morgan Kaufmann, San Mateo, CA,
pp. 657–663.
DeJong, G., and Mooney, R. (1986). Explanation-based Learning: An Alternative View, Machine
Learning, vol. 1, pp. 145–176.
Department of Homeland Security (DHS) (2004). National Response Plan.
Desai, M. (2009). Persistent Stare Exploitation and Analysis System (PerSEAS), DARPA-BAA-09-55,
https://www.fbo.gov/index?s=opportunity&mode=form&id=eb5dd436ac371ce79d91c84ec4e91341
&tab=core&_cview=1 (accessed April 13, 2016)


DOLCE (2012). Laboratory for Applied Ontology, www.loa-cnr.it/DOLCE.html (accessed August 31,
2012)
Drucker, P. (1993). Post-Capitalist Society, HarperCollins, New York.
Durham, S. (2000). Product-Centered Approach to Information Fusion, AFOSR Forum on Information
Fusion, Arlington, VA, October 18–20.
Durkin, J. (1994). Expert Systems: Design and Development, Prentice Hall, Englewood Cliffs, NJ.
Dybala, T. (1996). Shared Expertise Model for Building Interactive Learning Agents, PhD Dissertation,
School of Information Technology and Engineering, George Mason University, Fairfax, VA.
lac.gmu.edu/publications/data/1996/Dybala-PhD-abs.pdf (accessed April 12, 2016)
Echevarria, A. J. (2003). Reining in the Center of Gravity Concept. Air & Space Power Journal, vol.
XVII, no. 2, pp. 87–96.
Eco, U., and Sebeok, T. (eds.) (1983). The Sign of Three: Dupin, Holmes, Peirce, Indiana University
Press: Bloomington.
Eikmeier, D. C. (2006). Linking Ends, Ways and Means with Center of Gravity Analysis. Carlisle
Barracks, U.S. Army War College, Carlisle, PA.
Einstein, A. (1939). Letter from Albert Einstein to President Franklin D. Roosevelt: 08/02/1939. The
letter itself is in the Franklin D. Roosevelt Library in Hyde Park, NY. See the National Archives copy
in pdf form at media.nara.gov/Public_Vaults/00762_.pdf (accessed November 16, 2014).
EXPECT (2015). The EXPECT homepage, www.isi.edu/ikcap/expect/ (accessed May 25, 2015).
Farquhar, A., Fikes, R., and Rice, J. (1997). The Ontolingua Server: A Tool for Collaborative Ontology
Construction, International Journal of Human–Computer Studies, vol. 46, no. 6, pp. 707–727.
Federal Rules of Evidence (2009). 2009–2010 ed. West Publishing, St. Paul, MN.
Feigenbaum, E. A. (1982). Knowledge Engineering for the 1980s, Research Report, Stanford Univer-
sity, Stanford, CA.
Feigenbaum, E. A. (1993). Tiger in a Cage: The Applications of Knowledge-based Systems, Invited
Talk, AAAI-93 Proceedings, www.aaai.org/Papers/AAAI/1993/AAAI93-127.pdf. (accessed April 13,
2016)
Fellbaum, C. (ed.) (1988). WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA.
FEMA (Federal Emergency Management Agency) (2007). National Incident Management System.
www.fema.gov/national-incident-management-system (accessed April 13, 2016).
Ferrucci, D., Brown, E., Chu-Carroll, J., Fan, J., Gondek, D., Kalyanpur, A. A., Murdock, J. W., Nyberg,
E., Prager, J., Schlaefer, N., and Welty, C. (2010). Building Watson: An Overview of the DeepQA
Project, AI Magazine, vol. 31, no. 3, pp. 59–79.
Filip, F. G. (1989). Creativity and Decision Support System, Studies and Researches in Computer and
Informatics, vol. 1, no. 1, pp. 41–49.
Filip, F. G. (ed.) (2001). Informational Society–Knowledge Society, Expert, Bucharest.
FM 100–5. (1993). U.S. Army Field Manual 100–5, Operations, Headquarters, Department of the
Army, Washington, DC.
FOAF (2012). The Friend of a Friend (FOAF) project. www.foaf-project.org/ (accessed August 31,
2012)
Forbes (2013). www.forbes.com/profile/michael-chipman/ (accessed September 2, 2013)
Forbus, K. D., Gentner, D., and Law, K. (1994). MAC/FAC: A Model of Similarity-Based Retrieval,
Cognitive Science, vol. 19, pp. 141–205.
Friedman-Hill, E. (2003). Jess in Action, Manning, Shelter Island, NY.
Gammack, J. G. (1987). Different Techniques and Different Aspects on Declarative Knowledge, in
Kidd, A. L. (ed.), Knowledge Acquisition for Expert Systems: A Practical Handbook, Plenum Press,
New York, NY, and London, UK.
Gentner, D. (1983). Structure Mapping: A Theoretical Framework for Analogy, Cognitive Science,
vol. 7, pp. 155–170.
Geonames (2012). GeoNames Ontology–Geo Semantic Web. [Online] www.geonames.org/ontology/
documentation.html (accessed August 31, 2012)
GFO (2012). General Formal Ontology (GFO). [Online] www.onto-med.de/ontologies/gfo/ (accessed
August 31, 2012)
Ghallab, M., Nau, D., and Traverso, P. (2004). Automatic Planning: Theory and Practice, Morgan
Kaufmann, San Mateo, CA.


Giarratano, J., and Riley, G. (1994). Expert Systems: Principles and Programming, PWS, Boston, MA.
Gil, Y., and Paris, C. (1995). Towards model-independent knowledge acquisition, in Tecuci, G., and
Kodratoff, Y. (eds.), Machine Learning and Knowledge Acquisition: Integrated Approaches,
Academic Press, Boston, MA.
Giles, P. K., and Galvin, T. P. (1996). Center of Gravity: Determination, Analysis and Application.
Carlisle Barracks, U.S. Army War College, Carlisle, PA.
Goodman, D., and Keene, R. (1997). Man versus Machine: Kasparov versus Deep Blue, H3 Publica-
tions, Cambridge, MA.
Gruber, T. R. (1993). A Translation Approach to Portable Ontology Specification. Knowledge Acquisi-
tion, vol. 5, pp. 199–220.
Guizzardi, G., and Wagner, G. (2005a). Some Applications of a Unified Foundational Ontology in
Business, in Rosemann, M., and Green, P. (eds.), Ontologies and Business Systems Analysis, IDEA
Group, Hershey, PA.
Guizzardi, G., and Wagner, G. (2005b). Towards Ontological Foundations for Agent Modeling
Concepts Using UFO, in Agent-Oriented Information Systems (AOIS), selected revised papers of
the Sixth International Bi-Conference Workshop on Agent-Oriented Information Systems.
Springer-Verlag, Berlin and Heidelberg, Germany.
Hieb, M. R. (1996). Training Instructable Agents through Plausible Version Space Learning, PhD
Dissertation, School of Information Technology and Engineering, George Mason University, Fair-
fax, VA.
Hobbs, J. R., and Pan, F. (2004). An Ontology of Time for the Semantic Web, CM Transactions on
Asian Language Processing (TALIP), vol. 3, no. 1 (special issue on temporal information process-
ing), pp. 66–85.
Horvitz, E. (1999). Principles of Mixed-Initiative User Interfaces, in Proceedings of CHI '99, ACM
SIGCHI Conference on Human Factors in Computing Systems, Pittsburgh, PA, May. ACM Press,
New York, NY. research.microsoft.com/~horvitz/uiact.htm (accessed April 13, 2016)
Humphreys, B. L., and Lindberg, D.A.B. (1993). The UMLS Project: Making the Conceptual Connec-
tion between Users and the Information They Need, Bulletin of the Medical Library Association,
vol. 81, no. 2, p. 170.
Jackson, P. (1999). Introduction to Expert Systems, Addison-Wesley, Essex, UK.
Jena (2012). Jena tutorial. jena.sourceforge.net/tutorial/index.html (accessed August 4, 2012).
JESS (2016). The rule engine for the JAVA platform, JESS webpage: www.jessrules.com/jess/down
load.shtml (accessed February 3, 2016)
Joint Chiefs of Staff (2008). Joint Operations, Joint Pub 3-0, U.S. Joint Chiefs of Staff, Washington, DC.
Jones, E. (1998). HPKB Year 1 End-to-End Battlespace Challenge Problem Specification, Alphatech,
Burlington, MA.
Kant, I. (1781). The Critique of Pure Reason, Project Gutenberg, www.gutenberg.org/ebooks/4280
(accessed August 19, 2013)
Keeling, H. (1998). A Methodology for Building Verified and Validated Intelligent Educational
Agents – through an Integration of Machine Learning and Knowledge Acquisition, PhD Dissertation,
School of Information Technology and Engineering, George Mason University, Fairfax, VA.
Kent, S. (1994). Words of Estimated Probability, in Steury, D. P. (ed.), Sherman Kent and the Board of
National Estimates: Collected Essays, Center for the Study of Intelligence, CIA, Washington, DC.
Kim, J., and Gil, Y. (1999). Deriving Expectations to Guide Knowledge Base Creation, in Proceedings
of AAAI-99/IAAI-99, AAAI Press, Menlo Park, CA, pp. 235–241.
Kneale, W. (1949). Probability and Induction, Clarendon Press, Oxford, UK. pp. 30–37.
Kodratoff, Y., and Ganascia, J-G. (1986). Improving the Generalization Step in Learning, in Michalski,
R., Carbonell, J., and Mitchell, T. (eds.), Machine Learning: An Artificial Intelligence Approach,
vol. 2. Morgan Kaufmann, San Mateo, CA, pp. 215–244.
Kolmogorov, A. N. (1933 [1956]). Foundations of a Theory of Probability, 2nd English ed., Chelsea,
New York, NY, pp. 3–4.
Kolmogorov, A. N. (1969). The Theory of Probability, in Aleksandrov, A. D., Kolmogorov, A. N., and
Lavrentiev, M. A. (eds.), Mathematics: Its Content, Methods, and Meaning, vol. 2, MIT Press,
Cambridge, MA, pp. 231–264.
Langley, P. W. (2012). The Cognitive Systems Paradigm, Advances in Cognitive Systems, vol. 1, pp. 3–13.


Laplace, P. S. (1814). Théorie Analytique des Probabilités, 2nd édition, Paris, Ve. Courcier, archive.
org/details/thorieanalytiqu01laplgoog (accessed January 28, 2016)
Le, V. (2008). Abstraction of Reasoning for Problem Solving and Tutoring Assistants. PhD Dissertation
in Information Technology. Learning Agents Center, Volgenau School of IT&E, George Mason
University, Fairfax, VA.
Lempert, R. O., Gross, S. R., and Liebman, J. S. (2000). A Modern Approach to Evidence, 3rd ed., West
Publishing, St. Paul, MN, pp. 1146–1148.
Lenat, D. B. (1995). Cyc: A Large-scale Investment in Knowledge Infrastructure, Communications of
the ACM, vol. 38, no. 11, pp. 33–38.
Loom (1999). Retrospective on LOOM. www.isi.edu/isd/LOOM/papers/macgregor/Loom_Retrospec
tive.html (accessed August 4, 2012)
MacGregor, R. (1991). The Evolving Technology of Classification-Based Knowledge Representation
Systems, in Sowa, J. (ed.), Principles of Semantic Networks: Explorations in the Representations of
Knowledge, Morgan Kaufmann, San Francisco, CA, pp. 385–400.
Marcu, D. (2009). Learning of Mixed-Initiative Human-Computer Interaction Models, PhD Disserta-
tion in Computer Science. Learning Agents Center, Volgenau School of IT&E, George Mason
University, Fairfax, VA.
Marcus, S. (1988). SALT: A Knowledge-Acquisition Tool for Propose-and-Revise Systems, in Marcus,
S. (ed.), Automating Knowledge Acquisition for Expert Systems, Kluwer Academic., Norwell, MA,
pp. 81–123.
Masolo, C., Vieu, L., Bottazzi, E., Catenacci, C., Ferrario, R., Gangemi, A., and Guarino, N. (2004).
Social Roles and Their Descriptions, in Dubois, D., Welty, C., and Williams, M-A. (eds.), Principles
of Knowledge Representation and Reasoning: Proceedings of the Ninth International Conference
(KR2004), AAAI Press, Menlo Park, CA, pp. 267–277.
McDermott, J. (1982). R1: A Rule-Based Configurer of Computer Systems, Artificial Intelligence
Journal, vol. 19, no. 1, pp. 39–88.
Meckl, S., Tecuci, G., Boicu, M., and Marcu, D. (2015). Towards an Operational Semantic Theory of
Cyber Defense against Advanced Persistent Threats, in Laskey, K. B., Emmons, I., Costa, P. C. G.,
and Oltramari, A. (eds.), Proceedings of the Tenth International Conference on Semantic Technolo-
gies for Intelligence, Defense, and Security – STIDS 2015, pp. 58–65, Fairfax, VA, November 18–20.
lac.gmu.edu/publications/2015/APT-LAC.pdf (accessed January 12, 2016)
Michalski, R. S. (1986). Understanding the Nature of Learning: Issues and Research Directions, in
Michalski, R. S., Carbonell, J. G., and Mitchell T. (eds.), Machine Learning, vol. 2, Morgan
Kaufmann, Los Altos, CA, pp. 3–25.
Michalski, R. S., and Tecuci, G. (eds.) (1994). Machine Learning: A Multistrategy Approach, vol. IV,
Morgan Kaufmann, San Mateo, CA. store.elsevier.com/Machine-Learning/isbn-9781558602519/
(accessed May 29, 2015)
Minsky, M. (1986). The Society of Mind, Simon and Schuster, New York, NY.
Mitchell, T. M. (1978). Version Spaces: An Approach to Concept Learning. PhD Dissertation, Stanford
University, Stanford, CA.
Mitchell, T. M. (1997). Machine Learning. McGraw-Hill, New York, NY.
Mitchell, T. M., Keller, R. M., and Kedar-Cabelli, S. T. (1986). Explanation-Based Generalization:
A Unifying View, Machine Learning, vol. 1, pp. 47–80.
Murphy, P. (2003). Evidence, Proof, and Facts: A Book of Sources. Oxford University Press, Oxford, UK.
Musen, M. A. (1989). Automated Generation of Model-based Knowledge Acquisition Tools, Morgan
Kaufmann., San Francisco, CA.
NAE (National Academy of Engineering) (2008). Grand Challenges for Engineering. www.engineer
ingchallenges.org/cms/challenges.aspx (accessed April 13, 2016)
Nau, D., Au, T., Ilghami, O., Kuter, U., Murdock, J., Wu, D., and Yaman, F. (2003). SHOP2: An HTN
Planning System, Journal of Artificial Intelligence Research, vol. 20, pp. 379–404.
Negoita, C. V., and Ralescu, D. A. (1975). Applications of Fuzzy Sets to Systems Analysis, Wiley,
New York, NY.
Nilsson, N. J. (1971). Problem Solving Methods in Artificial Intelligence, McGraw-Hill, New York, NY.
Nonaka, I., and Krogh, G. (2009). Tacit Knowledge and Knowledge Conversion: Controversy and
Advancement in Organizational Knowledge Creation Theory, Organization Science, vol. 20, no. 3
(May–June), pp. 635–652. www.ai.wu.ac.at/~kaiser/birgit/Nonaka-Papers/tacit-knowledge-and-knowledge-conversion-2009.pdf
(accessed April 13, 2016)
Noy, N. F., and McGuinness, D. L. (2001). Ontology Development 101: A Guide to Creating Your First
Ontology. Stanford Knowledge Systems Laboratory Technical Report KSL-01–05 and Stanford
Medical Informatics Technical Report SMI-2001-0880, March, Stanford, CA.
NRC (National Research Council) (1996). National Research Council: National Science Education
Standards. National Academy Press, Washington, DC. www.nap.edu/openbook.php?recor
d_id=4962 (accessed April 13, 2016)
NRC (National Research Council) (2000). National Research Council: Inquiry and the National
Science Education Standards, National Academy Press, Washington, DC. www.nap.edu/cata
log.php?record_id=9596 (accessed April 13, 2016)
NRC (National Research Council) (2010). Preparing Teachers: Building Evidence for Sound Policy,
National Academies Press, Washington, DC. www.nap.edu/catalog.php?record_id=12882 (accessed
April 13, 2016)
NRC (National Research Council) (2011). A Framework for K-12 Science Education: Practices, Cross-
cutting Concepts, and Core Ideas. www.nap.edu/catalog.php?record_id=13165 (accessed April 13,
2016)
Obrst, L., Chase, P., and Markeloff, R. (2012). Developing an Ontology of the Cyber Security Domain,
in Proceedings of the Seventh International Conference on Semantic Technologies for Intelligence,
Defense, and Security – STIDS, October 23–26, Fairfax, VA.
OKBC (Open Knowledge Based Connectivity) (2008). OKBC homepage. www.ksl.stanford.edu/soft
ware/OKBC/ (accessed August 4, 2012)
O’Keefe, R. M., Balci, O., and Smith, E. P. (1987). Validating Expert Systems Performance, IEEE
Expert, no. 2, vol. 4, pp. 81–90.
Oldroyd, D. (1986). The Arch of Knowledge: An Introductory Study of the History of the Philosophy and
Methodology of Science, Routledge Kegan & Paul, London, UK.
Ontolingua (1997). Ontolingua System Reference Manual. www-ksl-svc.stanford.edu:5915/doc/
frame-editor/index.html (accessed August 4, 2012).
Ontolingua (2008). The Ontolingua homepage. www.ksl.stanford.edu/software/ontolingua/
(accessed August 4, 2012).
OWLIM (2012). OWLIM family of semantic repositories, or RDF database management systems.
www.ontotext.com/owlim (accessed August 4, 2012).
Pan, F., and Hobbs, J. R. (2004). Time in OWL-S. Proceedings of the AAAI Spring Symposium
on Semantic Web Services, Stanford University, Stanford, CA, AAAI Press, Palo Alto, CA,
pp. 29–36.
Pan, F., and Hobbs, J. R. (2012). A Time Zone Resource in OWL. www.isi.edu/~hobbs/timezoneho
mepage.html (accessed August 31, 2012)
Pease, A. (2011). Ontology: A Practical Guide, Articulate Software Press, Angwin, CA. www.ontology
portal.org/Book.html (accessed April 13, 2016)
Peirce, C. S. (1898 [1992]). Reasoning and the Logic of Things, edited by Ketner, K., Harvard University
Press, Cambridge, MA.
Peirce, C. S. (1901 [1955]). Abduction and Induction, in Philosophical Writings of Peirce, edited by
Buchler, J., Dover, New York, NY, pp. 150–156.
Pellet (2012). OWL 2 Reasoner for Java. clarkparsia.com/pellet/ (accessed August 4, 2012).
Plotkin, G. D. (1970). A Note on Inductive Generalization, in Meltzer, B., and Michie, D. (eds.),
Machine Intelligence 5, Edinburgh University Press, Edinburgh, UK, pp. 165–179.
Pomerol, J. C., and Adam, F. (2006). On the Legacy of Herbert Simon and His Contribution to
Decision-making Support Systems and Artificial Intelligence, in Gupta, J. N. D., Forgionne, G. A.,
and Mora, M. T. (eds.), Intelligent Decision-Making Support Systems: Foundations, Applications
and Challenges, Springer-Verlag, London, UK, pp. 25–43.
Powell, G. M., and Schmidt, C. F. (1988). A First-order Computational Model of Human Operational
Planning, CECOM-TR-01–8, U.S. Army CECOM, Fort Monmouth, NJ.
Protégé (2000). The Protégé Project. protege.stanford.edu (accessed April 13, 2016)
Protégé (2015). Protégé ontology editor and knowledge base framework, homepage. protege.stanford
.edu (accessed May 25, 2015).


Puppe, F. (1993). Problem Classes and Problem Solving Methods, in Systematic Introduction to
Expert Systems: Knowledge Representations and Problem Solving Methods, Springer Verlag, Berlin
and Heidelberg, Germany, pp. 87–112.
Ressler, J., Dean, M., and Kolas, D. (2010). Geospatial Ontology Trade Study, in Janssen, T., Ceuster,
W., and Obrst, L. (eds.), Ontologies and Semantic Technologies for Intelligence, IOS Press, Amster-
dam, Berlin, Tokyo, and Washington, DC, pp. 179–212.
Rooney, D., Hearn, G., and Ninan, A. (2005). Handbook on the Knowledge Economy, Edward Elgar,
Cheltenham, UK.
Russell, S. J., and Norvig, P. (2010). Artificial Intelligence: A Modern Approach, Prentice Hall, Upper
Saddle River, NJ, pp. 34–63.
Schneider, L. (2003). How to Build a Foundational Ontology – the Object-centered High-level
Reference Ontology OCHRE, in Proceedings of the 26th Annual German Conference on AI, KI
2003: Advances in Artificial Intelligence, Springer-Verlag, Heidelberg, Germany, pp. 120–134.
Schreiber, G., Akkermans, H., Anjewierden, A., de Hoog, R., Shadbolt, N., Van de Velde, W., and
Wielinga, B. (2000). Knowledge Engineering and Management: The Common KADS Methodology,
MIT Press, Cambridge, MA.
Schum D. A. (1987). Evidence and Inference for the Intelligence Analyst (2 vols), University Press of
America, Lanham, MD.
Schum, D. A. (1989). Knowledge, Probability, and Credibility, Journal of Behavioral Decision Making,
vol. 2, pp. 39–62.
Schum, D. A. (1991). Jonathan Cohen and Thomas Bayes on the Analysis of Chains of Reasoning, in
Eells, E., and Maruszewski, T. (eds.), Probability and Rationality: Studies on L. Jonathan Cohen’s
Philosophy of Science, Editions Rodopi, Amsterdam, Netherlands, pp. 99–145.
Schum, D. A. (1999). Marshaling Thoughts and Evidence during Fact Investigation, South Texas Law
Review, vol. 40, no. 2 (summer), pp. 401–454.
Schum, D. A. (1994 [2001a]). The Evidential Foundations of Probabilistic Reasoning, Northwestern
University Press, Evanston, IL.
Schum, D. A. (2001b). Species of Abductive Reasoning in Fact Investigation in Law, Cardozo Law
Review, vol. 22, nos. 5–6, pp. 1645–1681.
Schum, D. A. (2011). Classifying Forms and Combinations of Evidence: Necessary in a Science of
Evidence, in Dawid, P., Twining, W., and Vasilaki. M. (eds.), Evidence, Inference and Inquiry,
British Academy, Oxford University Press, Oxford, UK, pp. 11–36.
Schum, D. A., and Morris, J. (2007). Assessing the Competence and Credibility of Human Sources
of Evidence: Contributions from Law and Probability, Law, Probability and Risk, vol. 6,
pp. 247–274.
Schum, D. A., Tecuci, G., and Boicu, M. (2009). Analyzing Evidence and Its Chain of Custody:
A Mixed-Initiative Computational Approach, International Journal of Intelligence and Counter-
intelligence, vol. 22, pp. 298–319. lac.gmu.edu/publications/2009/Schum%20et%20al%20-%
20Chain%20of%20Custody.pdf (accessed April 13, 2016)
Shafer, G. (1976). A Mathematical Theory of Evidence, Princeton University Press, Princeton, NJ.
Shafer, G. (1988). Combining AI and OR, University of Kansas School of Business Working Paper
No. 195, April.
Simon, H. (1983). Why Should Machines Learn? in Michalski, R. S., Carbonell, J. G., and Mitchell,
T.M. (eds.), Machine Learning, vol. 1, Morgan Kaufmann, Los Altos, CA, pp. 25–38.
Simonite, T. (2013). Bill Gates: Software Assistants Could Help Solve Global Problems, MIT Technol-
ogy Review, July 16. www.technologyreview.com/news/517171/bill-gates-software-assistants-
could-help-solve-global-problems/ (accessed April 13, 2016)
Siri (2011). Apple’s Siri homepage. www.apple.com/ios/siri/ (accessed April 12, 2016)
Strange, J. (1996). Centers of Gravity & Critical Vulnerabilities: Building on the Clausewitzian Foundation
so That We Can All Speak the Same Language, Marine Corps University Foundation, Quantico, VA.
Strange, J., and Iron, R. (2004a). Understanding Centers of Gravity and Critical Vulnerabilities, Part 1:
What Clausewitz (Really) Meant by Center of Gravity. www.au.af.mil/au/awc/awcgate/usmc/cog1.pdf
(accessed May 25, 2015)
Strange, J., and Iron, R. (2004b). Understanding Centers of Gravity and Critical Vulnerabilities, Part 2:
The CG-CC-CR-CV Construct: A Useful Tool to Understand and Analyze the Relationship between
Centers of Gravity and Their Critical Vulnerabilities. www.au.af.mil/au/awc/awcgate/usmc/cog2.pdf
(accessed May 25, 2015)
Tate, A. (1977). Generating Project Networks, in Proceedings of IJCAI-77, Boston, MA, Morgan
Kaufmann, San Francisco, CA, pp. 888–893.
Tecuci, G. (1988). Disciple: A Theory, Methodology and System for Learning Expert Knowledge,
Thèse de Docteur en Science, University of Paris-South. lac.gmu.edu/publications/1988/Tecu
ciG_PhD_Thesis.pdf (accessed April 13, 2016)
Tecuci, G. (1992). Automating Knowledge Acquisition as Extending, Updating, and Improving a
Knowledge Base, IEEE Transactions on Systems, Man and Cybernetics, vol. 22, pp. 1444–1460.
lac.gmu.edu/publications/1992/TecuciG_Automating_Knowledge_Acquisition.pdf (accessed April
13, 2016)
Tecuci, G. (1993). Plausible Justification Trees: A Framework for the Deep and Dynamic Integration
of Learning Strategies, Machine Learning Journal, vol. 11, pp. 237–261. lac.gmu.edu/publications/
1993/TecuciG_Plausible_Justification_Trees.pdf (accessed April 13, 2016)
Tecuci, G. (1998). Building Intelligent Agents: An Apprenticeship Multistrategy Learning Theory,
Methodology, Tool and Case Studies, Academic Press, London, UK. lac.gmu.edu/publications/
1998/TecuciG_Building_Intelligent_Agents/default.htm (accessed April 13, 2016)
Tecuci, G., and Keeling, H. (1999). Developing an Intelligent Educational Agent with Disciple,
International Journal of Artificial Intelligence in Education, vol. 10, no. 3–4. lac.gmu.edu/publica
tions/1999/TecuciG_Intelliget_Educational_Agent.pdf (accessed April 13, 2016)
Tecuci, G., and Kodratoff, Y. (1990). Apprenticeship Learning in Imperfect Theory Domains, in
Kodratoff, Y., and Michalski, R. S. (eds.), Machine Learning: An Artificial Intelligence Approach,
vol. 3, Morgan Kaufmann, San Mateo, CA, pp. 514–551. lac.gmu.edu/publications/data/1990/
apprenticeship_1990.pdf (accessed April 13, 2016)
Tecuci, G., and Kodratoff, Y. (eds.) (1995). Machine Learning and Knowledge Acquisition: Integrated
Approaches, Academic Press, London, UK. lac.gmu.edu/publications/1995/TecuciG_MLKA_Inte
grated_Approaches.pdf (accessed April 13, 2016)
Tecuci, G., and Michalski, R. S. (1991). A Method for Multistrategy Task-Adaptive Learning Based
on Plausible Justifications, in Birnbaum, L., and Collins, G. (eds.), Machine Learning: Proceedings
of the Eighth International Conference, June, Chicago, IL, Morgan Kaufmann, San Mateo,
CA, pp. 549–553. lac.gmu.edu/publications/1991/TecuciG_Multistrategy_Learning_Method.pdf
(accessed April 13, 2016)
Tecuci, G., Kedar, S., and Kodratoff, Y. (guest eds.) (1994). Knowledge Acquisition Journal, vol. 6, no. 2
(special issue on the integration of machine learning and knowledge acquisition), pp. 89–214.
Tecuci, G., Boicu, M., Wright, K., Lee, S. W., Marcu, D., and Bowman, M. (1999). An Integrated Shell
and Methodology for Rapid Development of Knowledge-based Agents, in Proceedings of the
Sixteenth National Conference on Artificial Intelligence (AAAI-99), July 18–22, Orlando, FL, AAAI
Press, Menlo Park, CA, pp. 250–257. lac.gmu.edu/publications/data/1999/ismrdkba.pdf (accessed
April 13, 2016)
Tecuci, G., Boicu, M., Wright, K., Lee, S. W., Marcu, D. and Bowman, M. (2000). A Tutoring Based
Approach to the Development of Intelligent Agents, in Teodorescu, H. N., Mlynek, D., Kandel, A.,
and Zimmermann, H. J. (eds.), Intelligent Systems and Interfaces, Kluwer Academic Press, Boston,
MA. lac.gmu.edu/publications/data/2000/2000_Disciple-Planning.pdf (accessed April 13, 2016)
Tecuci, G., Boicu, M., Bowman, M., and Marcu, D., with commentary by Burke, M. (2001). An
Innovative Application from the DARPA Knowledge Bases Programs: Rapid Development of a
Course of Action Critiquer, AI Magazine, vol. 22, no. 2, pp. 43–61. lac.gmu.edu/publications/2001/
TecuciG_Disciple_COA_IAAI.pdf (accessed April 13, 2016)
Tecuci, G., Boicu, M., Marcu, D., Stanescu, B., Boicu, C., Comello, J., Lopez, A., Donlon, J., and
Cleckner, W. (2002a). Development and Deployment of a Disciple Agent for Center of Gravity
Analysis, in Proceedings of the Eighteenth National Conference of Artificial Intelligence and the
Fourteenth Conference on Innovative Applications of Artificial Intelligence, AAAI-02/IAAI-02,
Edmonton, Alberta, Canada, AAAI Press/MIT Press, New York, NY, and Cambridge, MA,
pp. 853–860. lac.gmu.edu/publications/data/2002/dddacga.pdf (accessed April 13, 2016)
Tecuci, G., Boicu, M., Marcu, D., Stanescu, B., Boicu, C., and Comello, J. (2002b). Training and Using
Disciple Agents: A Case Study in the Military Center of Gravity Analysis Domain, AI Magazine,
vol. 24, no. 4, pp. 51–68. lac.gmu.edu/publications/2002/TecuciG_Disciple_COG_IAAI.pdf
(accessed April 13, 2016)
Tecuci, G., Boicu, M., Ayers, C., and Cammons, D. (2005a). Personal Cognitive Assistants for Military
Intelligence Analysis: Mixed-Initiative Learning, Tutoring, and Problem Solving, in Proceedings of
the First International Conference on Intelligence Analysis, May 2–6, McLean, VA, MITRE Corpor-
ation, Bedford, MA. lac.gmu.edu/publications/data/2005/Tecuci-Disciple-LTA.pdf (accessed April
13, 2016)
Tecuci, G., Boicu, M., Boicu, C., Marcu, D., Stanescu, B., and Barbulescu, M. (2005b). The Disciple-
RKF Learning and Reasoning Agent, Computational Intelligence, vol. 21, no. 4, pp. 462–479. lac.gmu
.edu/publications/2005/TecuciG_Disciple_RKF_CI.pdf (accessed April 13, 2016)
Tecuci, G., Boicu, M., and Cox, M. T. (guest eds.) (2007a). AI Magazine, vol. 28, no. 2 (special issue
on mixed-initiative assistants). www.aaai.org/ojs/index.php/aimagazine/issue/view/174/showToc
(accessed May 29, 2015)
Tecuci, G., Boicu, M., and Cox, M. T. (2007b). Seven Aspects of Mixed-Initiative Reasoning: An
Introduction to the Special Issue on Mixed-Initiative Assistants, AI Magazine, vol. 28, no. 2 (special
issue on mixed-initiative assistants), pp. 11–18. lac.gmu.edu/publications/2007/BoicuM_AIMaga
zine_Intro.pdf (accessed April 13, 2016)
Tecuci, G., Boicu, M., Marcu, D., Boicu, C., Barbulescu, M., Ayers, C., and Cammons, D. (2007c).
Cognitive Assistants for Analysts, Journal of Intelligence Community Research and Development
(JICRD). Also published in Auger, J., and Wimbish, W. (eds.) (2007). Proteus Futures Digest:
A Compilation of Selected Works Derived from the 2006 Proteus Workshop, 1st ed., Proteus
Management Group, Carlisle Barracks, PA, pp. 303–329. lac.gmu.edu/publications/2007/TecuciG_
Cognitive_Assistants.pdf (accessed April 13, 2016)
Tecuci, G., Boicu, M., Hajduk, T., Marcu, D., Barbulescu, M., Boicu, C., and Le, V. (2007d). A Tool for
Training and Assistance in Emergency Response Planning, in Proceedings of the Hawaii Inter-
national Conference on System Sciences, HICSS40, January 3–6, Hawaii, IEEE Computer Society
Press. lac.gmu.edu/publications/2007/Disciple-VPT%20Hawaii.pdf (accessed April 13, 2016)
Tecuci, G., Boicu, M., Marcu, D., Boicu, C., and Barbulescu, M. (2008a). Disciple-LTA: Learning,
Tutoring and Analytic Assistance, Journal of Intelligence Community Research and Development
(JICRD), July. lac.gmu.edu/publications/2008/Disciple-LTA08.pdf (accessed April 13, 2016)
Tecuci, G., Boicu, M., and Comello, J. (2008b). Agent-Assisted Center of Gravity Analysis, CD with
Disciple-COG and Lecture Notes used in courses at the U.S. Army War College and Air War
College. George Mason University Press, Fairfax, VA. lac.gmu.edu/cog-book/ (accessed April 13,
2016)
Tecuci, G., Boicu, M., Marcu, D., Barbulescu, M., Boicu, C., Le, V., and Hajduk, T. (2008c). Teaching
Virtual Experts for Multi-Domain Collaborative Planning, Journal of Software, vol. 3, no. 3
(March), pp. 38–59. lac.gmu.edu/publications/2008/TecuciG_Disciple_VE_JS.pdf (accessed
April 13, 2016)
Tecuci, G., Schum, D. A., Boicu, M., Marcu, D., and Hamilton, B. (2010a). Intelligence Analysis as
Agent-Assisted Discovery of Evidence, Hypotheses and Arguments, in Phillips-Wren, G., Jain, L.C.,
Nakamatsu, K., and Howlett, R.J., (eds.), Advances in Intelligent Decision Technologies, SIST 4,
Springer-Verlag, Berlin and Heidelberg Germany, pp. 1–10. lac.gmu.edu/publications/2010/
Tecuci-Discovery-in-motion-imagery.pdf (accessed April 13, 2016)
Tecuci, G., Schum, D. A., Boicu, M., Marcu, D., Hamilton, B., and Wible, B. (2010b). Teaching
Intelligence Analysis with TIACRITIS, American Intelligence Journal, vol. 28, no. 2 (December),
pp. 50–65. lac.gmu.edu/publications/2010/Tiacritis-AIJ.pdf (accessed April 13, 2016)
Tecuci, G., Marcu, D., Boicu, M., Schum, D. A., and Russell, K. (2011a). Computational Theory and
Cognitive Assistant for Intelligence Analysis, in Proceedings of the Sixth International Conference on
Semantic Technologies for Intelligence, Defense, and Security – STIDS, November 16–18, Fairfax, VA,
pp. 68–75. ceur-ws.org/Vol-808/STIDS2011_CR_T9_TecuciEtAl.pdf (accessed May 29, 2015)
Tecuci, G., Schum, D. A., Boicu, M., and Marcu, D. (2011b). Introduction to Intelligence Analysis:
A Hands-on Approach with TIACRITIS, 2nd ed., Learning Agents Center, George Mason University,
Fairfax, VA (1st ed., 2010).
Tecuci, G., Schum, D. A., Marcu, D., and Boicu, M. (2014). Computational Approach and Cognitive
Assistant for Evidence-based Reasoning in Intelligence Analysis, International Journal of Intelligent
Defence Support Systems, vol. 5, no. 2, pp. 146–172. lac.gmu.edu/publications/2014/Disciple-CD-IJIDSS.pdf
(accessed April 13, 2016)
Tecuci, G., Marcu, D., Boicu, M., and Schum, D. A. (2015). COGENT: Cognitive Agent for Cogent
Analysis, Proceedings of the 2015 AAAI Fall Symposium “Cognitive Assistance in Government and
Public Sector Applications,” pp. 58–65, Arlington, VA, November 12–14. lac.gmu.edu/publications/
2015/Cogent-overview.pdf (accessed January 31, 2016)
Tecuci, G., Schum, D. A., Marcu, D., and Boicu, M. (2016). Intelligence Analysis as Discovery
of Evidence, Hypotheses, and Arguments: Connecting the Dots, Cambridge University Press,
New York, NY.
Toffler, A. (1984). Science and Change, foreword to Prigogine, Ilya, and Stengers, Isabelle, Order out
of Chaos: Man’s New Dialogue with Nature, Bantam, New York, NY, pp. xi–xxvi.
TopBraid Composer (2012). TopBraid Composer Ontology Development Tool. www.topquadrant.com/
products/TB_Composer.html (accessed August 4, 2012).
Toulmin, S. E. (1963). The Uses of Argument, Cambridge University Press, Cambridge, UK.
Turing, A. (1950). Computing Machinery and Intelligence, Mind, vol. 59, pp. 433–460.
Turoff, M. (2007). Design of Interactive Systems, in Emergency Management Information Systems
Tutorial, 40th Hawaii International Conference on System Sciences, HICSS-40, Hawaii, January 3.
UMLS (2012). Unified Medical Language System, U.S. National Library of Medicine. www.nlm.nih.gov/
research/umls/ (accessed August 4, 2012)
UNESCO (2005). Toward Knowledge Societies, unesdoc.unesco.org/images/0014/001418/141843e.pdf
(accessed October 1, 2011)
Van Gelder, T. J. (2007). The Rationale for Rationale, Law, Probability and Risk, vol. 6, pp. 23–42.
Van Melle, W., Scott, A. C., Bennett, J. S., and Peairs, M. (1981). The EMYCIN Manual, Report No.
HPP-81-16, Computer Science Department, Stanford University, Stanford, CA.
Veloso, M. (1994). Planning and Learning by Analogical Reasoning, Springer Verlag, Berlin, Germany.
W3C (2015). Semantic Web. www.w3.org/standards/semanticweb/ (accessed April 13, 2016)
Walton, D. (2004). Abductive Reasoning, University of Alabama Press, Tuscaloosa, AL.
Warden, J. A. III. (1993). Strategic Warfare: The Enemy as a System, in Mitchum, A. U. (ed.), Concepts
in Airpower for the Campaign Planner, Air Command and Staff College, Maxwell AFB, AL.
Waterman, D. A., and Hayes-Roth, F. (eds.) (1978). Pattern-Directed Inference Systems, Academic
Press, Orlando, FL.
Wigmore, J. H. (1913). The Problem of Proof, Illinois Law Review, vol. 8, no. 2, pp. 77–103.
Wigmore, J. H. (1937). The Science of Judicial Proof: As Given by Logic, Psychology, and General
Experience and Illustrated in Judicial Trials, 3rd ed., Little, Brown, Boston, MA.
Winston, P. H. (1980). Learning and Reasoning by Analogy, Communications of the ACM, vol. 23,
pp. 689–703.
WordNet (2012). WordNet: A Database for English, Princeton University, Princeton, NJ. wordnet.
princeton.edu/ (accessed August 4, 2012)
Zadeh, L. (1983). The Role of Fuzzy Logic in the Management of Uncertainty in Expert Systems, Fuzzy
Sets and Systems, vol. 11, pp. 199–227.

Index

abduction, 26 analogy-based generalization, 265–70


abductive inference 26 analysis
see abductive reasoning definition, 35
abductive learning, 223 analysis and synthesis, 113–24
abductive reasoning, 5–8, 26, 57 inquiry-driven, 113–22
Doyle’s interpretation, 8 for center of gravity determination, 116
Peirce’s interpretation, 7 for evidence-based reasoning, 122
abstract task, 399 problem solving, 113–15
abstraction symbolic integration, 113
pattern, 330 analytic task, 36
of probabilistic solution, 330 classification, 36
of reasoning, 329–31 critiquing, 36
of reasoning tree, 330–1 diagnosis, 36
of statement, 329–30 intelligence analysis, 36
access, 135 interpretation, 36
accuracy, 49 monitoring, 36
of demonstrative tangible evidence, 134 prediction, 36
action, 395 Anderson, Terence, 5
add effects, 395 Apple, 35
delete effects, 395 approximate reasoning, 21
example, 395 arch of knowledge, 5, 26
resources, 395 argument
Adam, Federic, 32 direction, 54
agent force, 54
cognitive, 31 Aristotle, 6
cognitive assistant, 33–4 artificial intelligence, 29–33
intelligent, 30, 34 definition, 29
knowledge-based, 30, 33–4 assumption, 18
agent development assumption-based reasoning, 143
ontology, 106 authenticity, 49
optimization, 107 of demonstrative tangible evidence, 134
rapid prototyping, 105 of real tangible evidence, 134
rule learning and ontology refinement, 106 authoritative record, 137
specification, 105 Awad, Elias M., 88, 178–9, 389, 418
use and personalization, 106
agent teaching and learning Bacon, Francis, 16
for center of gravity analysis, 380–3 Baconian inference
for course of action critiquing, 355–60 different tests of hypotheses, 16
for emergency response planning, 403–17 eliminating hypotheses, 16
for workaround planning, 345–6 Baconian probability system, 16–20, 55
Allemang, Dean, 33, 40, 89 compared with others, 23–5
Allen, James, 391, 425 intersection, 20
analogical problem solving negation, 20
based on explanation similarity, 265 union, 20
analogy criterion, 266–70 Bayes, Thomas, 10


Bayes’ rule, 10 representation, 203


and assumptions, 18 concept elicitation, 176–9
Bayesian network learning, 223 conditional probability, 11
Bayesian probability system, 11–13, 55 connecting the dots, 46–55
and assumptions, 18 definition, 47
compared with others, 23–5 example, 47
Belief Functions, 11–16, 55 Cooper, Thomas A., 88
and assumptions, 18 credibility, 135
compared with others, 23–5 versus believability, 135
believability, 61 critical reasoning, 48
versus credibility, 135 Cross, Stephen E., 34
of evidence, 48 customer, 55
example, 49 cyber insider threat, 64–7
of source, 48, 54 definition, 64
believability assessment, 133–9 Cyc, 88, 90
believing as if. See assumption
benefit of the doubt, 19 Dale, Andrew I., 10
Bentham, Jeremy, 27 DARPA, 34
BFO, 90 data, 1
big data problem, 50 David, F. N., 10
Boicu, Cristina, 41, 338 David, Paul A., 33
Boicu, Mihai, 39, 41, 338, 352, 364, 407 Davies, Todd R., 225
Bowman, Michael, 41, 338, 352, 396 decision trees induction, 223
Bresina, John L., 34 deduction, 25
Breuker, Joost, 35 deduction, induction, and abduction, 50
browsing of reasoning tree, 331 deductive inference 26
BRU generalization, 246 see deductive reasoning
Buchanan, Bruce G., 83–4, 87, 389, 418 deductive reasoning, 26
Deep Blue, 34
candidate elimination algorithm, 298 defensibile argument, 54
Carbonell, Jaime G., 225 DeJong, Gerald, 258, 406
card sort method, 179 DENDRAL, 34
case-based reasoning and learning, 223 Desai, Mita, 68
causal network of relations, 225, 265 design principle, 426–32
certification, 88 complete life cycle, 431
chain of custody, 138–9 employ learning technology, 426
chain of reasoning, 4 knowledge adaptation, 429
Chaudhri, Vinay K., 90 knowledge reuse, 427
Chipman, Michael A., 34 mixed-initiative, 429
Clancey, William, 35 multi-agent and multidomain problem solving,
class. See concept 427
clause generalization, 245 multistrategy learning, 428
Clausewitz, Carl von, 365 plausible reasoning, 430
CLIPS, 88 problem solving for collaboration, 427
clustering, 223 rapid development, 430
Cohen, Jonathan L., 16–18 teaching and learning, 428
Cohen, Morris R., 6 user tutoring, 430
Cohen, Paul, 346, 349 direct subconcept of
collaborative autonomous agents, 70 definition, 157
combinatorial explosion, 51 disbelief. See lack of belief
Comello, Jerome, 376 Disciple agent, 426
COMINT, 133–4 Disciple-CD, 35–6, 41
CommonKADS, 89 Disciple-COA, 36, 89, 348–64
competence, 3, 49, 135 Disciple-COG, 34, 91, 364–85, 427
competing hypotheses, 59 Script Editor, 377
computational theory of evidence-based Disciple-EBR, 41, 49, 53, 55, 91, 257, 309, 427
reasoning, 64 Association Browser, 165
concept Expression Editor, 282
definition, 156 Feature Browser, 167, 175, 189
as feature value, 163 Feature Editor, 175
formal representation language, 243–5 Hierarchical Browser, 165, 182


Object Browser, 165, 175, 180 in cybersecurity, 27


Object Editor, 192–3 in forensics, 27
Object Viewer, 165 in intelligence analysis, 27
Rule Browser, 254 in law, 27
Disciple-LTA, 36 in medicine, 27
Disciple-VE, 390–4 in natural sciences, 27
Reasoning Hierarchy Browser, 396 evidential synergism, 52
Reasoning Step Editor, 396 example, 50–1
Disciple-VPT, 37, 387–425, 427 Except-When plausible version space condition,
architecture, 388–9 215, 251, 254–5, 300
VE Assistant, 388 EXPECT, 88
VE Library, 388, 416–20 expert problem
VE Team, 388 center of gravity analysis, 364–7
Disciple-WA, 37, 89, 338–48 course of action critiquing, 348–51
divide-and-conquer. See analysis and synthesis emergency response planning, 389–90
DOLCE, 90 workaround planning, 338–41
domain understanding, 83–5, 94, 176–9 expert system, 33–5
dots definition, 33–4
evidential, 47–8 vs knowledge-based system, 34
combinations of, 50 expert system shell, 88–9
idea, 48, 50, 52, 54 definition, 88
Doyle, Arthur Conan, 6 explanation, 266
drill-down analysis, 143 with comparison, 283
Drucker, Peter, 33 with fixed value, 280
Durham, Susan, 391 with function, 280
Durkin, John, 34, 389, 418 generation, 262–4
Dybala, Tom, 40, 338 incompleteness, 263
explanation piece, 261
Echevarria, Antulio J., 365 association, 261
Eco, Umberto, 8 correlation, 261
Eikmeier, Dale C., 365 generalization, 261
Einstein, Albert, 21 property, 261
elicitation script, 376–9 relation, 261
for feature, 377 specific value, 261
illustration, 367–9 explanation-based learning, 223–5
EMYCIN, 89
enumerative probability, 9–11 fact, 2
aleatory, 9 failure explanation, 254
epistemology, 2 Farquhar, Adam, 90
equivocal testimonial evidence, 136 Feigenbaum, Edward A., 38
evaluation Fellbaum, Christiane D., 90
of Disciple-COA, 360–4 Ferrucci, David, 34
of Disciple-COG, 383–5 Filip, Florin G., 32–3
of Disciple-VPT, 422 flash of insight, 7
of Disciple-WA, 346–8 FOAF, 90
event, 2 Foray, Dominique, 33
evidence, 1, 5 Forbes, 34
ambiguity, 23 Forbus, Kenneth D., 225
dissonance, 23 foundational ontology. See upper ontology
imperfect believability, 23 frame, 15
incompleteness, 23 frame of discernment, 15
inconclusiveness, 23 Friedman-Hill, Ernest, 90
versus Fuzzy probability system, 20–3, 55
event, 48 compared with others, 23–5
fact, 2 conjunction, 22
information, 2 disjunction, 22
knowledge, 2 negation, 22
evidence collection task, 60
evidence in search of hypotheses, 56–7 Galilei, Galileo, 6
evidence-based assessment, 122–4 Galvin, T. P., 365
evidence-based reasoning, 25–9, 46–76 game of chance, 9


Gammack, J.G., 176, 180 define feature-specific names, 196


Ganascia, Jean-Gabriel, 237, 265, 298 define similar siblings, 186
Gates, Bill, 40 group similar siblings, 187
generalization, 18 naming conventions, 188
of concepts with negation, 247 n-ary relations, 196
definition, 234 single subconcepts, 187
formal definition, 243–7 planning
inductive, 18 goal specification, 400
maximal, 266–7, 300 plausible task ordering, 398
minimal, 268, 298, 300 preconditions specification, 399
definition, 234 top-down, left-to-right specification, 399
of two concepts, 236 rule and hypothesis learning
of two concepts avoid learning from overly specific
definition, 236 examples, 286
generalization hierarchy, 157–8 avoid learning without explanations, 288
generalization rule, 229–33 domains and ranges of features, 286
climbing generalization hierarchies, 231 extend ontology before learning, 286
definition, 229 guide explanation generation, 288
dropping conditions, 231 identify entities before learning, 285
extending discrete sets, 232 recognize concepts in tree, 288
extending intervals, 231 rule refinement
extending ordered sets of intervals, 231 assess similar hypotheses, 321
extending symbolic probabilities, 232 extend ontology for failure explanations, 321
turning constants into variables, 230 Guizzardi, G., 90
turning occurrences of a variable into different
variables, 230 hands-on exercise
using feature definitions, 233 abstraction of reasoning, 331–4
using inference rules, 233 argumentation browsing, 76–81
using substitutions, 247 believability analysis, 140–3
generic instance, 104, 183, 188 evidence search, 130–3
and explanation selection, 263 feature hierarchy development, 189–92
genetic algorithms and evolutionary hypothesis analysis, 124–30
computation, 223 instance definition, 192–5
Gentner, Dendre, 225 modeling and formalization, 144–6
Geonames, 90 object hierarchy development, 180–6
GFO, 90 ontology browsing, 165–8
Ghallab, Malik, 387, 394, 396, 425 pattern reuse, 146–7
Ghaziri, Hassan M., 389, 418 rule and hypothesis learning, 275–9
Giarratano, Joseph, 34, 88 rule refinement, 319–21
Gil, Yolanda, 90, 360 Hayes-Roth, Frederick, 202
Giles, P. K., 365 Hendler, Jim, 33, 40, 89
giving benefit of the doubt. See assumption heuristic, 62
goal, 396 Hieb, Michael R., 41, 338
Goodman, David, 34 hierarchical task network planning, 394–403
Gruber, Tom R., 155 integration with inference, 403
guideline Hobbs, Jerry R., 90
abstraction Holmes, Sherlock, 6, 8, 47, 51
define context-dependent names, 334 Horvitz, Eric, 32
knowledge base HUMINT, 48–9, 139
one KB in memory, 111 Humphreys, B.L., 90
save succesive KB versions, 111 hypothesis
modeling examples, 29
define reduction considering future learning, 271–6
changes, 149 basic steps, 273
identify instances and constants, 148 refinement, 316–17
learn and reuse patterns, 149 testing, 59
reduction tree development, 148 hypothesis-driven evidence collection, 59
use specification to structure modeling, 147
ontology development IBM, 34
categories as concepts, 195 imaginative reasoning, 48, 57
concept naming, 189 IMINT, 48, 133–4


Incident Command System, 390
incremental inductive generalization, 299
indicator
  almost certain indicator, 121
  likely indicator, 121
  probability computation, 121
  very likely indicator, 121
individual. See instance
induction, 25
inductive inference, 26. See inductive reasoning
inductive learning from examples, 223–4, 238–42
  problem definition, 239
inductive reasoning, 26
inference
  action, 401
  engine, 207
  network, 53
  task, 401
inferential catastrophes, 49
inferential force or weight, 61
information, 2
inheritance, 162–3
  default, 162
  multiple, 162
input variable, 408
inquiry-based learning in science, 70–6
instance
  definition, 157
instance of, 157
instance-based learning, 223
integrated teaching and learning, 39
intelligence analysis, 56–64
intelligent agent, 30–2
interview with expert
  structured, 177
  unstructured, 84, 177
  dichotomous questions, 177
  multiple-choice questions, 177
  ranking scale questions, 177
Intuit, 34
Iron, Richard, 365
isa. See subconcept of

Jackson, Peter, 389, 418
Java, 41
Jena, 89–90
JESS, 89
Jones, Eric, 338, 347, 349–50, 361–2

Kant, Immanuel, 35
Keeling, Harry, 39, 41, 338
Keene, Raymond, 34
Kent, Sherman, 23
Kim, Jihie, 360
Kipling, Rudyard, 113, 427
Kneale, W., 6
knowledge
  as justified true belief, 2–4
  repository, 104–5
  reuse, 418, 427
knowledge base
  development, 257
  Domain KB, 104, 270, 420, 428
  maintenance, 257
  Scenario KB, 104, 270, 420, 428
  Shared KB, 104, 420, 428
  State KB, 420
knowledge engineer, 256, 426
  definition, 83
knowledge engineering, 33–41, 426
  definition, 33
knowledge society, 40
Kodratoff, Yves, 39, 237, 265, 298
Kolmogorov, Andrey N., 10, 14
Kolmogorov’s axioms of probability, 10
Krogh, Georg von, 39

lack of belief, 15
Langley, Patrick W., 31
Laplace, Pierre, 9
law, 5
  Anglo-American, 5
laws of large numbers, 10
Le, Vu, 41, 338
learner
  aggressive, 241, 266
  cautious, 240, 268
  dual-strategy, 241
learning
  with evolving ontology, 309–20
  with incomplete representation language, 242
learning agent shell, 90–1, 390
  definition, 90
learning agent shell for evidence-based reasoning, 91–3
  definition, 91
learning apprentice, 223
learning by analogy, 39, 223, 225–6
learning from examples, 39
learning from explanations, 39
learning planning rules, 417
  action concretion rules, 415–17
  correlated rules, 409–13
  difficulty, 409
  method, 413
  problem definition, 413
  task concretion rules, 414–15
  task reduction rules, 413–14
Lempert, R. O., 134
Lenat, Douglas B., 90, 360
likelihood
  versus likeliness, 122
likelihood ratio, 12
likeliness
  versus likelihood, 122
Lindberg, D. A. B., 90
Locke, John, 6
Loom, 90, 343
lower bound condition
  as analogy criterion, 268–70
MacGregor, Robert, 90, 343, 360
machine learning
  definition, 223
main plausible version space condition, 215, 251
MAPGEN, 34
Marcu, Dorin, 41, 338
Marcus, Sandra, 37, 89
MASINT, 48, 133
Masolo, C., 90
McDermott, John, 34
McGuinness, Deborah L., 174
Meckl, Steven, 29
Michalski, Ryszard S., 39, 222, 227
Minsky, Marvin, 222
missing evidence, 137
  versus negative evidence, 137
Mitchell, Tom M., 212, 222, 238, 258, 298, 318, 407
mixed evidence, 14, 138
  example, 138
mixed-initiative
  interaction, 252
  problem solving, 39
  reasoning, 32–3, 262
    definition, 32
    in explanation generation, 264
modeling the problem-solving process, 113–22
  center of gravity analysis, 369–74
  course of action critiquing, 351–74
  emergency response planning, 394–6
  workaround planning, 341–3
modeling-based ontology specification, 100–1, 179–81, 286
Mooney, Ray, 258, 406
more general than
  definition, 157
Morris, Jon, 25
Morris, Paul H., 34
multi-INT fusion, 61
multistrategy learning, 39, 223, 226–7
Murphy, Peter, 5
Musen, Mark A., 89
MYCIN, 34

Nagle, Ernest, 6
naïve Bayes learning, 223
narrative, 55
n-ary feature, 160–1
NASA, 34
National Science Education Standards, 70
Nau, Dana, 366, 425
negative evidence, 136
  versus missing evidence, 137
negative example, 223
  definition, 227
negative exception
  definition, 228
Negoita, Constantin V., 21
neural networks, 223
Newton, Isaac, 6
Nilsson, Nils J., 391
Nonaka, Ikujiro, 39
Norvig, Peter, 30
Noy, Natalya F., 174

O’Keefe, R. M., 88
object feature, 158–60
  domain, 158
  generalization hierarchy, 160
  partially learned, 159
  range, 158
objectivity, 4, 49, 135
Obrst, Leo, 90, 174
observational sensitivity, 49, 135
OCHRE, 90
odds, 10
OKBC, 90
Oldroyd, David, 5, 26
Ontolingua, 90
ontological commitment, 155
ontology, 155–65
  definition, 155
  of evidence, 134
  maintenance, 197
  matching, 163–5
  of problem-solving tasks, 35–7
  version, 311
ontology design and development, 174–81
  for center of gravity analysis, 376–8
  for course of action critiquing, 352–5
  for emergency response planning, 357–90
  for workaround planning, 343–4
operation with assumption
  define, 146
  define with justification, 146
  delete, 146
operation with case study
  end, 80
  run, 78
operation with explanation
  define with comparison, 284
  define with function, 283
  generate with fixed value, 280
  guide generation, 279
operation with hypothesis
  associate search criteria, 132
  browse analysis of hypothesis, 78
  convert between modeled and formalized, 146
  define explanations, 275
  display with problem viewer, 279
  insert intermediary hypothesis, 145
  introduce into abstract reasoning tree, 334
  modify abstraction, 334
  move hypotheses, 145
  remove from abstract reasoning tree, 334
  specify, 145
  specify by instantiating pattern, 147
  specify reduction by reusing pattern, 147
  update name of elementary hypothesis, 129
operation with item of evidence
  assess, 127
  associate to hypothesis, 127
  define, 143
operation with knowledge base
  close, 110
  create, 110
  load and select, 108
  save, 109
operation with ontology
  add direct subconcept or instance, 194
  add direct superconcept, 194
  change feature domain, 190
  change feature range, 190
  define concept or instance, 182
  define feature, 189, 193
  define generic instance, 183
  define specific instance, 182
  delete concept or instance, 186
  rename concept or instance, 186
  view feature definition, 167
  view object, 167
operation with reasoning tree
  define instances and constants, 145
  delete node, 145
  recognize concepts, 289
  replace generated step with modeling step, 320
  select synthesis function, 129
  specify node, 145
operation with rule
  delete with Rule Browser, 321
  display with Rule Viewer, 278
  learning steps, 278
  refine with negative example, 320
  refine with positive example, 319
  view with Rule Browser, 320
OPS, 88
ordered symbolic probability scale, 61
output variable, 408
overgeneralization, 242
OWLIM, 90

Pan, Feng, 90
Paris, Cecile, 88
partially learned knowledge, 212–16
  reasoning, 215–16
  representation, 212–15
    concepts, 212–13
    features, 159, 213
    hypotheses, 214
    rules, 214
pattern learning, 95–8
  versus rule learning, 102
Peirce, Charles S., 6–8
Pellet, 89–90
personalized learning, 430
planning, 387
  collaborative, 420–1
  multidomain, 420–1
plausible explanation, 262
plausible lower bound condition, 298, 406
plausible upper bound condition, 405
plausible version space, 213, 300
Plotkin, G. D., 237
Pomerol, Jean-Charles, 32
positive example, 223
  definition, 227
positive exception
  definition, 227
possibility function, 21
posterior odds, 12
posterior probability, 11
Powell, Gerald M., 391
prior probability, 11
probabilistic reasoning, 25
probabilistically equivocal testimonial evidence, 136
probability scale
  Baconian, 17
  conventional, 15
  degree of support or belief, 15
problem reduction rule, 204
production system architecture, 202–3
propose-and-revise, 89
Protégé, 89–90
protocol analysis, 178
pure evidence, 13–14

R1, 34
Ralescu, Dan A., 21
rapid prototyping, 93–100
Rationale system, 23
reduction
  indicator, 121
  necessary and sufficient condition, 120
  scenario, 120
  sufficient condition, 120
reduction example
  generalized, 270–1
  reformulation, 264–5
  understanding, 260–2
reduction rule, 204–7, 394
reformulation rule
  definition, 229
relative frequency, 9
relevance, 53, 61
  argument, 4, 54
  of event, 50
reliability
  of demonstrative tangible evidence, 134
repeatable process, 9
report generation, 374, 430
requirements specification, 85, 93
Ressler, J., 90
Riley, Gary, 34, 88
Rooney, David, 32
Roosevelt, Franklin D., 21
rote learning, 223
rule analysis, 270
rule and ontology matching, 207–12
rule induction, 223
rule learning, 252–71
  basic steps, 260
  features, 317
  illustration, 253–7
  for inference tasks, 403–9
  method overview, 258–60
  versus pattern learning, 102
  problem definition, 257–8
rule refinement, 294–309
  through condition specialization
    illustration, 305–8
    method, 307
  with Except-When conditions
    basic steps, 305
    illustration, 300–5
  features, 317
  illustration, 253–7
  method overview, 295–6
  with negative examples, 300–9
    summary, 308
  with positive examples, 296–300
    illustration, 296–9
    method, 298–300
    summary, 300
  problem definition, 294
rule regeneration
  on demand, 310–12
  illustration, 312–16
  method, 316–20
  problem definition, 309–11
Russell, Stuart J., 30, 225

SALT, 89
Schmidt, C. F., 391
Schneider, Luc, 90
Schreiber, Guus, 33, 35–6, 89
Schum, David A., 6, 11, 16, 20, 22, 25, 29, 46, 55, 61, 133, 135, 138
Sebeok, Thomas, 8
Semantic Web, 40, 90, 174
Shafer, Glenn, 13–14, 23
Shortliffe, Edward H., 34
siblings, 186–7
  definition, 186
SIGINT, 48, 133–4
Simon, Herbert, 222
Simonite, T., 40
Siri, 35
software assistant, 40
solution synthesis rule
  problem-level, 206–7
  reduction-level, 205, 207
specialization
  minimal
    definition, 235
    of two concepts, 237
  of two concepts
    definition, 237
specialization rule, 229–34
  definition, 229
specific instance, 104, 183, 188
statistical process, 9
Stavrou, Angelos, 64
Strange, Joseph, 365, 369
subclass of. See subconcept of
subconcept of, 157
subject matter expert, 83, 256, 426
Subjective Bayesian. See Bayesian probability system
substance-blind, 133
SUMO, 90
support vector machines, 223
synthesis
  definition, 35
  problem-level, 116
  reduction-level, 116
synthesis function
  almost certain indicator, 207
  likely indicator, 207
  max, 120, 207
  min, 120, 207
  on balance, 124, 207
  very likely indicator, 207
synthesis rule, 204–7, 394
synthetic task, 36–7
  assignment, 37
  debugging, 37
  design, 37
  planning, 37
  scheduling, 37

tacit knowledge, 39
taking for granted. See assumption
tangible evidence, 48, 133–5
  types, 134
task, 396
  decomposition, 394
  goal, 400
  specialization, 394
Tate, Austin, 391, 424
TECHINT, 133
Tecuci, Gheorghe, 29–30, 32–4, 36–7, 39–40, 64, 91–2, 116, 148, 155, 171–2, 212, 227, 237, 262, 338, 346, 360, 362, 374, 384, 387, 391, 420–1, 430
term generalization, 245
testimonial evidence, 48, 135–7
  based on opinion, 136
  completely equivocal, 136
  obtained at second hand, 136
  unequivocal based upon direct observation, 136
testing, 256–7
TIACRITIS, 35–6, 41
Toffler, Alvin, 120, 427
TopBraid Composer, 89–90
Toulmin, Stephen E., 23
transitivity, 161
trifles, 47
truthfulness. See veracity
Turing, Alan, 39
Turoff, Murray, 32
type. See subconcept of

U.S. National Academy of Engineering, 40
U.S. National Incident Management System, 390
U.S. National Research Council, 71–2, 75–6
U.S. National Response Plan, 387
UFO, 90
UMLS, 90
underspecialization, 242
understandability, 135
unequivocal testimonial evidence, 136
UNESCO, 33
universal ontology. See upper ontology
upper bound condition
  as analogy criterion, 266–8
upper ontology, 90
user instance, 176
utility ontology, 90

validation, 88
Van Gelder, Tim, 23
Van Melle, W., 89
Veloso, Manuela, 225
veracity, 4, 49, 135
verbal assessments of probability, 22–3
  versus numerical, 22
verification, 88
virtual planning expert
  definition, 387
  illustration, 422

W3C, 33, 40
Wagner, G., 90
Walker, E., 34
Walton, Douglas, 43
Warden III, John A., 365
Waterman, Donald A., 202
Watson, 34
weight of evidence
  as degree of support, 13
what-if scenario, 143
Whewell, William, 6
wide-area motion imagery, 68–70
  forensic analysis, 68–9
  real-time analysis, 68
Wielinga, Bob, 35
Wigmore, John H., 23, 53
Wilkins, David C., 389, 418
Winston, Patrick H., 225, 265
Wogrin, Nancy, 88
WordNet, 90
world state
  definition, 395

Zadeh, Lotfi, 8, 20