0% found this document useful (0 votes)

113 views80 pages

Human Computer Dialogue System: 2019-GCUF-04303 Supervisor: Prof Kahsif Ali

This document is a thesis submitted by Asad Majeed to the Department of Computer Science at Government College University in fulfillment of the requirements for a Master of Science degree. It investigates developing a human-computer dialogue system. The thesis contains chapters that review literature on dialogue systems and research methods, as well as results and discussion.

Uploaded by

qamar shah

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

113 views80 pages

Human Computer Dialogue System: 2019-GCUF-04303 Supervisor: Prof Kahsif Ali

Uploaded by

qamar shah

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

HUMAN COMPUTER DIALOGUE SYSTEM

BY
ASAD MAJEED

2019-GCUF-04303

SUPERVISOR: PROF KAHSIF ALI

Thesis submitted in partial fulfilment of

the requirements for the degree of

MASTER OF SCIENCE

DEPARTMENT OF COMPUTER SCIENCE GOVERNMENT COLLEGE

UNIVERSITY
FAISALABAD.

YEAR 2021

1
DECLARATION
The effort re-counted in this thesis was carried out by me beneath the direction of” Professor
Kashif Ali “Assistant professor, Department of computer science, GC University Faisalabad
Pakistan”

I hereby declare that the title of thesis “HUMAN COMPUTER DIALOGUE SYSTEM”
More importantly, the essence of the sentence in the result of my own investigation and no
part is duplicated with any distributed source (except references, standard science or genetic
models/condition/equations/protocol, etc.). I have also announced that this work has not yet
been submitted for other degree /certificates. If the data turns out to be wrong at any time, the
university can take action.

Signature of the Student / Scholar

…………………………………..

Name: Asad Majeed

Registration No: 2019-GCUF-04303

ROLL # 1678

2
CERTIFICATE BY SUPERVISORY COMMITTEE
We certify that the contents and from of thesis submitted by Asad Majeed, Registration No:
2019-GCUF-04303 has been found satisfactory and in accordance with the prescribed format.
I recommend it to be processed for the evaluation by the external Examiner for the
Award of degree.

Signature of supervisor
………………………………………………
Name: ……………………...……………….
Designation with stamp……………………..

Co-supervisor (if any)

Signature………….…………………………
Name: ………………………………………
Designation with stamp……………………..
Member of supervisory committee
Signature………………….…………………
Name: …………………..…………………..
Designation with stamp……………………..
Member of supervisory committee
Signature……………………………….……
Name: ……………………………………....
Designation with stamp……………………..

Chairperson
Signature with stamp………………………..

Dean/ Academic coordinator

Signature with stamp……………..…………

3
Table of Contents
DECLARATION.......................................................................................................................2

Chapter 1....................................................................................................................................1

INTRODUCTION......................................................................................................................1

Dialogue System.....................................................................................................................3

The existing Dialogue System................................................................................................3

Rule-based method;................................................................................................................3

Sequence-To-Sequence;.........................................................................................................5

Reinforcement-Method of-Learning;.....................................................................................6

Chapter 2....................................................................................................................................7

Review of literature....................................................................................................................7

Studies of interactive dialogues..............................................................................................7

Transfer effects.....................................................................................................................11

Research hypotheses.............................................................................................................13

Chapter 3..................................................................................................................................15

Research Methodology.............................................................................................................15

Past-research (the circuit fix-it shop)...................................................................................16

Data Collection First,............................................................................................................16

Experimentation...................................................................................................................17

Underlying Model –.............................................................................................................17

Dialogue as Extension to Top-Down Problem-Solving.......................................................18

Task Initiative.......................................................................................................................19

Negotiation...........................................................................................................................19

Mathematical Analysis.........................................................................................................20

Computational Implementation............................................................................................20

Speaker Utterance.................................................................................................................21

Demo and Full-Scale Systems..............................................................................................22

4
Participants...........................................................................................................................23

Materials...............................................................................................................................24

The Artimis Plan Resto application......................................................................................24

Dialogue scenarios................................................................................................................26

Mental workload self-evaluation questionnaire...................................................................27

Satisfaction questionnaire.....................................................................................................28

Procedure..............................................................................................................................28

Design...................................................................................................................................28

Dependent variables.............................................................................................................29

Initial measures and representation of partner......................................................................29

Performance and subjective levels.......................................................................................29

Discourse structure...............................................................................................................29

Chapter 4..................................................................................................................................31

Results and discussion..............................................................................................................31

Missing data and preliminary analyses.................................................................................31

Missing data..........................................................................................................................31

Preliminary analyses.............................................................................................................31

Performance measures and subjective evaluations...............................................................35

Discourse structure...............................................................................................................38

Challenges............................................................................................................................39

General discussion................................................................................................................40

Jointly modeling multiple subtasks......................................................................................44

1: Jointly modeling subtasks in NLU...................................................................................45

2: Jointly modeling subtasks cross NLU and DM................................................................47

3. Jointly modeling subtasks in NLG...................................................................................48

Joint models for subtasks across NLU, DM and NLG.........................................................50

Usefulness evaluation...........................................................................................................51

5
Localizability........................................................................................................................52

Humanness Evaluation.........................................................................................................55

Language Benefit Evaluation...............................................................................................57

Chapter 5..................................................................................................................................62

Conclusions..............................................................................................................................62

References................................................................................................................................63

6
Abstract
Human-Computer dialogue systems provide a natural language based interface between
human and computers. They are widely demanded in network information services,
intelligent accompanying robots, and so on. A Human-Computer dialogue system typically
consists of three parts, namely Natural Language Understanding (NLU), Dialogue
Management (DM) and Natural Language Generation (NLG). Each part has several different
subtasks. Each subtask has been received lots of attentions, many improvements have been
achieved on each subtask, respectively. But systems built in traditional pipeline way, where
different subtasks are assembled sequentially, suffered from some problems such as error
accumulation and expanding, domain transferring. Therefore, researches on jointly modeling
several subtasks in one part or cross different parts have been prompted greatly in recent
years, especially the rapid developments on deep neural networks based joint models. There
is even a few work aiming to integrate all subtasks of a dialogue system in a single model,
namely end-to-end models.

This paper introduces two basic frames of current dialogue systems and gives a brief survey
on recent advances on variety subtasks at first, and then focuses on joint models for multiple
subtasks of dialogues. We review several different joint models including integration of
several subtasks inside NLU or NLG, jointly modeling cross NLG and DM, and jointly
modeling through NLU, DM and NLG. Both advantages and problems of those joint models
are discussed. We consider that the joint models, or end-to-end models, will be one important
trend for developing The most fundamental communication mechanism for interaction is
dialogues involving speech, gesture, semantic and pragmatic knowledge. Various researches
on dialogue management have been conducted focusing on standardized model for goal
oriented applications using machine learning and deep learning models. The paper presents
the overview on existing methods for dialogue manager training; their advantages and
limitations. Furthermore, a new image-based method is used in Facebook baby Task 1 dataset
in Out Of Vocabulary setting. The results show that using dialogue as an image performs well
and helps dialogue manager in expanding out of vocabulary dialogue tasks in comparison.

7
Chapter 1
INTRODUCTION
The Information and Communication Technology (ICT) with which we interact in daily life
is more distributed and embodied into the environment (the so called intelligent space).
Especially, when designing ICT solutions for elderly people, who are very often critical
towards new technology, distributed system can be even more challenging. To improve the
Human Computer Interaction (HCI) with ICT solutions, a directed natural interaction and an
emotional intelligence is very important.

User studies regarding elder behavior change over the ageing process identified that”a skill
that many elderly people retain, even with significant cognitive degradation, is the ability to
communicate in a multimodal face-to-face fashion. The skills for this type of interaction are
acquired in infancy and early childhood and comprise of tacit, crystallized knowledge in
older adulthood”. Face-to-face interaction incorporates a wide range of non-verbal and
paraverbal ways to carry semantic content complementary to the speech. It allows persons
with disabilities to compensate some perception channels (e.g., hearing) by utilizing other
channels of communication (e.g., body gestures). Face-to-face dialog is also characterized by
well-established repair mechanisms of understanding thus enabling the listener to request a
repetition or clarification by the speaker (e.g., communicated by a head nod). Moreover, face-
to-face dialog has built-in mechanisms for constraining the interactants focus of attention.
This focus is important as some elderly people have difficulty dividing their attention or
handling distractions.

This kind of face-to-face interaction can be provided utilizing avatars. Avatars have the
potential to impersonate the used technology and thus increase the acceptance of the
software. The interaction with avatars is able to provide multiple advantages. Avatars can, for
instance, provide gestures, which in turn are able to increase the understanding of the
presented information. Furthermore, the visual enrichment of verbal information i.e., adding a
lip synched animated character to audio speech output, can increase the intelligence and
enhance the robustness of the information transmission as known from natural speech.
Therefore, a consistency between the visual and vocal output is of uttermost important.

A growing amount of Embodied Conversational Agents (ECAs) make use of Natural

Language Processing (NLP) for implementing the intelligent dialogue component. NLP-

1
based systems are of major interest in human-machine interactions for multimodal interfaces
and are preferred widely for Natural and Spoken Language Understanding (NLU or SLU),
Natural Language Generation (NLG) and Dialogue Processing (DP) tasks. Complete dialogue
solution as shown in figure 1, consists of many parts, each of which specializes in certain
task. Automatic Speech Recognition (ASR) module is responsible for converting spoken
users utterance into text. Natural Language Interpreter converts textual information to
meaningful features so that Dialogue State Tracker (DST) can process this features and
update the current dialogue state. DST outputs current dialogue state so that Dialogue
Response Selection (DRS) module. (which is trained to output a response to user utterance)
can generate textual reply to user. Later this textual reply is converted to speech by text-to-
speech (TTS) synthesizer. Since ASR and TTS are not related to dialogue manager directly,
they can be considered as complementary modules to complete dialogue solution.

A dialogue manager is a component of a dialogue system, which is responsible for the

transmission of information among participants in human-machine interaction. Dialogue
manager can be divided into two main groups: Chat bots and frame-based dialogues. In chat
bots, an agent is often trained to work without any knowledge of the dialogue structure
mostly referred as open dialogue, whereas in frame-based dialogues; dialogue frames are
structured by experts with slots and values that each slot can take. In this work, we mainly
focus on chat bots and how they are trained with rule-based, sequence to sequence learning,
reinforcement learning and hierarchical reinforcement learning based manager models. All
these methods can be used separately or together to leverage each other’s strengths. The
paper gives an overview on the ECAs from the various projects; existing models applied to
dialogue manager training and its advantages and drawbacks. Furthermore, it includes results
on dialogue dataset with new image based method where dialogue is processed as images to
train dialogue manager. . Schematic diagram of Complete Dialogue System The paper is
structured into six sections. First section provides an introduction of the Dialogue system
followed by Embodied Interaction. The third section describes the existing rule based and
machine learning based models in dialogue systems. Later, a new proposed methodology for
the dialogue system is explained. The next section consists of the preliminary results and
their discussions; and the last section concludes the findings of the work. Embodied
Interaction Experience Several working groups and projects have been shown that embodied
interactions can help to increase the interaction with ICT systems. Bick more et al. developed
a virtual laboratory to explore the longitudinal usage of a virtual exercise coach Older.

2
3
Figure No1;

Dialogue Stake
Automatic Tracker
Natural Language
Speech
Interpreter
Recognizer

Text-to-speech
Natural Language
Generator natural Dialogue Response
Selection

Dialogue System
In this figure dialogue method is defined by the showing the steps that how the conversation
starts and become a dialogue. We are very happy because in this method successfully
described that it’s a basic dialogue process.

The existing Dialogue System

The section provides the overview of the existing work on training dialogue managers which
includes: Rule-based method, Sequence-to-Sequence method, Reinforcement method of
learning.

Rule-based method;
The first chat bot developed by using rule-based system was Aqsa, which uses pattern
matching based on user replies. In rule-based systems, human dialogues are modeled as set of
states and dialogue manager has to choose replies for the conversation from the given set of
rules. This model has been used in many different applications such as restaurant booking or
online psychological therapy chat bot In this method, a human, who is usually a domain
expert analyzes the dialogue flow between human agents and tries to come up with
predefined dialogue states and possible replies for each state based on patterns. The
advantages of rule based methods are that dialogue managers have control on selecting

4
replies for the conversation and these selected replies from the full set of replies ensure that
the user is not upset or offended, thus keeping the system consistent.

Yasmeen is a recent example of such system, where expert can define a dialogue structure
as seen on the left side in figure 3 (in .xml format). In NADIA dialogue systems, expert can
define structure with questions and answers and dialogue manager would use these hard
coded rules in order to engage in a conversation. On the right side in figure 3, a sample
conversation of NADIA with user can be seen, which consists of answering user’s questions
while making a trip reservation. If real human dialogue flow does not have many different
states and/or replies, rule-based systems usually outperform machine learning models [21].
However, most of the time in real life, human language can get very complex and it becomes
very easy to run out of dialogue states designed by the expert. In such scenarios, it is not
possible to use rule-based models other than giving generic answer to user such as “I don’t
know what you are asking”, which may frustrate the user after certain amount of time.

5
Sequence-To-Sequence;

I am fine <EOL>

How are you <EOL>

LSTM ENCODER LSTM DECODER

Figure;2

In this figure with the help of sequence-to-sequence method it can be easily seen that how
dialogue system can work. LSTM encoder takes either full history or last reply and convert
into encoded feature vector where LSTM decoder takes this vector and outputs a possible
reply condition on encoded feature vector.

6
Reinforcement-Method of-Learning;

Encode
How are
you old Decode Encode
16?

I’m
16 Decode

Decode Encode

I’m 16,Why I think

are you
asking?
you are
12

Figure; 3

In this method reinforcement is applied. Person A ask to Person B about his age. Person B
told his age. Person A not sure about his age and declared his decision after watching person
B. this is called the reinforcement method.

7
Chapter 2
Review of literature
Studies of interactive dialogues
Only a few studies have compared spoken dialogue interactions (speaking and listening) with
the corresponding written mode (typing at a keyboard, reading on a screen) (Oviatt, 1995;
Oviatt, Cohen, & Wang, 1994; Zoltan-Ford, 1991). Empirical studies have used a number of
experimental approaches: interactive human–human dialogues via a system in both the
written and spoken modes; simulated human– computer dialogues in the spoken mode, the
written mode, or both; real systems based human–computer dialogue in the spoken and
written modes. More precisely, in the HCI literature, interactive discourse has primarily been
studied in the form of task-oriented dialogue. Like any other task-oriented activity, task-
oriented dialogue is guided by a goal, i.e. the accomplishment of a task (Falzon, 1991;
Falzon, Amalberti, & Carbonnel, 1986). Studies have tended to focus on twotypes of
dialogue: (a) dialogues with another individual via a system in order to accomplish a task, i.e.
for the most part Computer Mediated Communication (CMC) (Brennan, 1998; Whittaker,
2003); or (b) direct dialogues with a system in natural language or in a restricted language
(e.g., keywords) in order to accomplish a task, i.e. human–computer dialogues. Interagent
dialogues will not be considered here.

Researchers have often grounded their work on shared, non-exclusive theoretical bases such
as (1) the theory of language acts and pragmatics, (2) (socio) cognitive theories, or (3)
psycholinguistic theories. 468 L. Le Bigot et al. / Computers in Human Behavior 22 (2006)
467–500 In brief, philosophers of language have studied conversation by focusing on the
partners communicative intent (and the way this intent is perceived). The production of a
statement in a dialogue context is conditioned by the realization of language acts and the
recognition of underlying intentions (Austin, 1970; Searle, 1969). At another level,
sociolinguists have suggested analyzing conversations via the study of the way turn-taking is
organized (Sacks, Schegloff, & Jefferson, 1974). Sacks et al. have shown that conversations
are governed by rules which determine, for example, turntaking within the conversation.
Using a precise methodology, they also identified the existence of cues (e.g., intonations,
overlapping) which indicate to the other partner the transitions at which he or she can speak.
Clark and co-workers (Clark, 1996; Clark & Schaefer, 1987; Clark & Wilkes-Gibbs, 1986)
have combined these two perspectives to propose a (socio)cognitive model of human

8
communication. Their idea is that the partners work together to construct reciprocal
understanding by constructing a shared communicative field (or grounding), i.e. a
representation of their prior knowledge, the situation, and their beliefs. This co-adaptation of
the participants in the conversation has been characterized, in particular, by the convergence
of the use of lexical items and the syntactic structure of speech by the partners (Brennan &
Clark, 1996; Clark & Schaefer, 1987, 1989; Fussell & Krauss, 1992; Isaacs & Clark, 1987;
Krauss & Fussell, 1991). Thus the partners share the responsibility for cooperating in the
conversation and make an effort to understand one another (Clark & Wilkes-Gibbs, 1986).
Grounding has been defined as an active process of cooperation that permits conversational
exchange and the construction of a shared referent (Clark & Brennan, 1991).

Psycholinguistic research has studied language specific processes such as verbal production
(e.g., Levelt, 1989) or comprehension (e.g., for text comprehension, Kintsch, 1998), but
rarely within the framework of spoken interactive discourse (Garrod & Pickering, 2004).
Finally, whatever the theoretical positions, researchers have tended to use similar measures
(e.g., the mean length of utterances, the number of words – tokens – the ratio of one type of
occurrence to the total number of words – type token ratio or TTR).

Within these different conceptions, human–computer dialogue has very frequently been
studied using the Wizard of Oz (Woz) technique. This technique consists of making
individuals believe that they are interacting with a system whereas, in fact, the messages they
receive are being sent by a human. Although the method is a very good compromise for the
study of human–computer dialogues, it is subject to some inherent limitations. For example,
in the case of Woz studies involving only the degradation of the human voice, the bias of the
experimenter has often been identified as a major limitation. In semi-automatic systems, the
partners response times are crucial for giving the impression that a real system is at work. A
brief overview of the results obtained using Woz studies will be presented below. However,
first of all, a presentation of the obtained CMC data may be useful for the study and
understanding of the conversational mechanisms involved in the written and spoken human–
computer dialogues.

In the majority of studies, two or more individuals have been interacting via a computer
system of the type ‘‘What You See Is What I See’’, i.e. a shared cooperative L. Le Bigot et
al. / Computers in Human Behavior 22 (2006) 467–500 469 space in which they can perform
the task. The underlying aim of this type of research has been to compare CMC situations

9
with face-to-face communication. The major results indicate that conversation is organized
differently in the written and spoken modes. For example, Ferrara, Brunner, and Whittemore
(1991) observed the emergence of a new written dialogue style which combined the
properties of both spoken and written language. McKinlay, Procter, Masting, Woodburn, and
Arnood (1994) showed that the pauses between turns are longer in written than in spoken
discourses. Moreover, and at a more general level, problems relating to the regulation and
coherence of dialogue, characterized by an impairment (but not by the absence) of the
grounding processes have also been identified (Brennan & Ohaeri, 1999; Hancok & Dunham,
2001). The authors observed a fall-off in the number of courtesy formulations and noted that
this trend was more extreme in the written mode. Furthermore, some authors have noted that
speakers change the style they use to address one another combined with a reduction of
lexical diversity characterized by a fall in the number of copulatives, subject pronouns, and
articles (Ferrara et al., 1991; Hancok & Dunham, 2001). Most authors have attributed the
characteristics of CMC to the permanence of the written medium compared to the transience
of the spoken word (for a review on CMC, see Whittaker, 2003). Finally, on the basis of
subjective measures and performance indicators Brennan and Ohaeri (1999) have established
that face-to-face communication situations are generally richer than written interactive
situations. Some of the observed CMC results have also been found in Woz situations. The
aim of these studies has generally been to distinguish human–human dialogue from human
computer dialogue or to provide an accurate characterization of human computer dialogue.

The main results have shown that communications with a system and via a system have
certain points in common such as the adaptation to the partner (Fais, 1998; Leiser, 1989) both
in the written and spoken modes, thus giving rise to the use of certain terms (e.g, third-person
pronouns, Brennan, 1991). Nevertheless, a large number of differences have also been
demonstrated. In general, users are less verbose when interacting with a system than with a
human partner (Richards & Underwood, 1985).

This difference has been shown to take the form of a reduction in lexical diversity
characterized by the low use of anaphora, connectors, ellipsis, involvement markers
(Amalberti, Carbonnel, & Falzon, 1993; Brennan, 1991; Bubb-Lewis & Scerbo, 2002;
Pierrel, 1988). At the same time, in the spoken mode, the volume of information relevant for
the research in question has been shown to be low on the initial turn in the conversation and
then to subsequently increase as the interactions continue (Amalberti et al., 1993). Generally
speaking, the number of information items has been seen to increase while verbosity

10
decreases. Zoltan- Ford (1991) has reported different results, with utterance length tending to
remain stable in the oral mode while increasing over dialogues in the written mode (see also
Hauptmann & Rudnicky, 1988; Oviatt et al., 1994). Furthermore, the number of disfluencies
(e.g., hesitations, overlappings) seems to diminish during interactions with a system
compared to dialogues with a human partner (Bortfeld, Leon, Bloom, Schober, & Brennan,
2001; Oviatt, 1995). Similarly, indicators of grounding have been seen to be less frequent
when subjects interact with a machine (Brennan, 1991; John- 470 L. Le Bigot et al. /
Computers in Human Behavior 22 (2006) 467–500 stone, Berry, NGuyen, & Asper, 1994).
For example, Brennan (1991) showed that a group interacting with a system produced fewer
first person subject pronouns. Johnstone et al. (1994) showed that confirmations, politeness,
and closures were not used as often in human–machine dialogue as in human–human
dialogue. Authors have found a number of different ways to explain the differences between
human–human dialogue and human–computer dialogue. Some have considered that the users
knowledge, i.e. the speakers model, of the systems supposed comprehension and problem-
solving abilities were erroneous (overestimated or underestimated). For example, Amalberti
et al. (1993) have suggested that individuals who only provide one item of information when
they take their first turn in the dialogue may be doubtful about the structures ability to process
more information than this. For other authors (Brennan, 1991; Bubb-Lewis & Scerbo, 2002;
Johnstone et al., 1994), the differences are primarily characterized by the absence of
grounding markers. Brennan (1998) has hypothesized that the shared context may be
impoverished in HCI situations. As far as the difference between the written and spoken
modes of interaction is concerned, a number of different interpretations have been proposed
(Oviatt, 1995; Zoltan-Ford, 1991). Zoltan-Ford (1991) discusses two explanatory hypotheses:
individuals have less confidence in the spoken mode and speaking requires less effort than
typing and is therefore less precise. Oviatt (1995) considers that the difference can be
explained by specific production levels, the underlying idea being that the planning load (in
Levelts, 1989 sense of the term) differs between the two modes of interaction.

In short, this set of results enables us to make an initial observation: There appear to be
invariants in the mechanisms involved in discourse regulation in both the written and spoken
modes. For example, the individuals real or supposed knowledge plays a decisive role on
discourse organization in human–computer dialogue interactions. Furthermore, the
representation of the partner changes as the interactions proceed, i.e. the individual adapts
lexically and syntactically (see also, the phenomenon of convergence). As interactions with a

11
machine or human partner proceed, the lexical and syntactical content of the discourse
becomes specialized. One of these characteristics is the disappearance of certain unnecessary
terms such as articles. This phenomenon of adaptation might initially simply be attributable
to the phenomenon of learning or familiarization, i.e. to the acquisition of knowledge
concerning the task to be performed and the procedures involved in performing it. However,
certain authors (Brennan, 1991; Johnstone et al., 1994) have illustrated the difference
between human–human and human–machine communication by pointing out the lack of
grounding in the latter type of dialogue. Thus, the interpretations of the mode-related
differences in interactions with or via a system are far from clear and different authors draw
different conclusions on this point. The interaction mode may have both direct and indirect
consequences on the production and the comprehensionof utterances.

If the direct and indirect consequences can be distinguished then it should be possible to gain
a better understanding of the mechanisms underpinned by each of the modes (and, indirectly,
the modalities) of interaction during the performance of a complex cognitive activity.

The indirect consequences should provide information concerning task performance, i.e.
concerning the knowledge of the L. Le Bigot et al. / Computers in Human Behavior 22
(2006) 467–500 471 system and the task acquired by individuals during the interaction. The
direct consequences should be specific to the mode of interaction. The study of the transfer of
knowledge previously acquired in one modality to the other should be a good way of
identifying what is directly dependent on the modality and what is dependent on other factors
such as the task or the use of the system.

Transfer effects
The transfer of knowledge acquired in human–human dialogue to human–computer dialogue
situations has often been taken as the starting point for studies of natural language use.
However, the concept of modal transfer to human–computer dialogue situations has rarely
been examined in itself. Nevertheless, with the emergence of multimodal systems and the
possibility to offer service continuity between different terminals (for example, from the
computer to the telephone or personal assistant), the full significance of this concept is
becoming clear. It is critical to know whether certain skills or items of knowledge acquired in
one mode of interaction are liable to impair the quality of interactions in another mode.
Within the field of computer environments, the transfer effect has been studied on the basis of
problem solving situations (OHara & Payne, 1998, 1999), the completion of complex
cognitive activities (Guthrie, 1988), or knowledge learning situations (de Croock, van
12
Merrie¨nboer, & Paas, 1998; Mayer, 1997; Moreno & Mayer, 2002; Sweller, 1998; van
Gerven, Paas, van Merrie¨nboer, & Schmidt, 2002; van Merrie¨nboer, Schuurman, de
Croock, & Paas, 2002). Nevertheless, the study of the transfer of earlier habits forms a well-
established tradition (James, 1929/1930). The idea is that initial training in one field may
have repercussions for subsequent training in other fields. If this action is positive then the
term ‘‘learning transfer’’ is used; if it is negative the effect is referred to as ‘‘interference’’.
The direction and scale of the effects depends on the level of learning in the first tasks as a
function of time or difficulty as well as on the similarity between the tasks. Thus in the case
of a goal-oriented activity, it is possible to speak of transfer due to the similarity of the
stimuli when the fact of having learned in one mode facilitates interaction in another mode.
Transfer studies have often revealed positive effects. That is the reason why much research
has been devoted to negative transfer (or interference). For example, if, in a problem-solving
situation, subjects learn an initial method then, when faced with a second problem which is
apparently similar to the preceding one, they may fail because they persist in vain with the
technique that previously worked instead of looking for another solution. Luchins (1942)
(reported in Fraisse, 1966) traditional water jug experiment illustrates this phenomenon very
well. The subjects in this experiment were faced with a series of problems requiring them to
transfer a certain quantity ofwater in order to reach the desired volume.

The first problem acted as a demonstration, the next few problems could be solved by
applying a formula, the following problems could be solved using either the same or a
simpler method and the final problem required the use of a different method. The results
showed that the majority of the subjects applied the first method they had learned to all the
problems. 472 L. Le Bigot et al. / Computers in Human Behavior 22 (2006) 467–500
Assuming that human–computer dialogue situations (goal-oriented dialog) call upon mode-
specific skills, then the acquisition of a habit or knowledge in one mode of interaction may
have a negative effect when the activity is repeated in a different interaction mode.
Individuals should continue to apply procedures that are not suited to the new interaction
mode. This hypothesis is not new. For example, Green (1955) used a manual coordination
task to test the hypothesis that transfer should be better from a difficult to an easy task than,
from an easy to a difficult task. However, no experimental study has addressed this question
in a human–computer dialogue context.

13
Research hypotheses
Based on cognitive and sociocognitive theories of learning, as well as the results of CMC and
Woz studies, we formulated a number of hypotheses concerning learning and transfer effects
during human–computer natural language dialogue. In brief, the real or supposed knowledge
possessed about the partner (machine) has a major impact on the structure and organization of
the discourse. The representation of the partner changes as the interactions progress.
Furthermore, individuals do not process information in the same way in the written and
spoken modes. The hypotheses concerning performance, discourse organization and
discourse were as follows:

Transformation of the representation of the partner. The number of relevant items of

information in the first conversational turn should increase as the interactions progress. In
contrast, the number of words used in the first conversational turn should decrease. (1b)
Representation of the partner and mode of interaction. The number of relevant items of
information in the first conversational turn should be greater in the written than in the spoken
mode. It was not, a priori, possible to formulate any hypothesis concerning the length of the
utterances. (2) Effects of learning and mode of interaction on performance and subjective
evaluations. The acquisition of knowledge concerning the system and its functioning should
reveal a learning effect on performance and on the subjective evaluation of the task. Because,
in particular, the processing times are longer in the written than in the spoken mode, there
should be an interaction between the mode of interaction and learning. More precisely, the
performance indicators should improve over the dialogues irrespectively of the mode of
interaction but should improve more rapidly in the written mode. (3) Effects of learning and
mode of interaction on discourse structure. The acquisition of knowledge concerning the
system and its functioning combined with the mode-specific characteristics should result in
an interaction mode learning effect and an interaction effect on discourse organization. More
precisely, we should observe a drop in lexical diversity over the dialogues and, again, this
should occur more rapidly in the written mode. Moreover, although the two modalities seem
to share a number of conversational characteristics, one important difference that has been
observed relates to the grounding indicators. These indicators have been found to be L. Le
Bigot et al. / Computers in Human Behavior 22 (2006) 467–500 473 weak in human–
computer dialogue as well as in written CMC situations.

On the basis of this observation, we can predict that there will be fewer grounding indicators
in written than in spoken mode in human–computer dialogue situations. (4) Effect of transfer

14
on the representation of the partner, performance and discourse structure. Changing the mode
of interaction after learning in an initial mode should result in a transfer effect both at the
level of performance and at that of the discourse indicators. Despite this, transfer should be
greater for the interaction mode in which the effort involved in learning was greater. To test
these hypotheses, an experiment was conducted using an actual natural language dialogue
system within the framework of a goal-oriented activity.

15
Chapter 3
Research Methodology
Our goal is to produce voice interactive human computer collaborative systems. Creating
such a complex system is not done in one step. We follow a development methodology
illustrated in Figure 1. There are six stages of development: creation of an underlying model
(which is based upon previous research and observations), analytical evaluation of the model,
computational implementation of the model, simulation of the model, an implemented
demonstration system and finally full-scale development of a working system. This
development paradigm directly follows the methodology of computer systems design. At
each successive stage, the development of the underlying model is made more specific. Often
it is the case that mathematical analysis of the underlying model can only be made by
simplifying the domain model (the use of exponentially distributed arrival rates is a common
assumption in systems building). However, at the simulation stage, complexities can be
introduced that could not be modeled at an analytical level (for instance, a non-

Past Research

Underlying Model

Analysis

Computer
Simulation

Determined System

16
Demo Pull-scale System

A model system development

Exponentially distributed arrival rate). In turn, the simulation may not truly model what
occurs in the actual domain. Thus each stage of the process brings the process closer to
realizing the full-scale system. Although the process generally moves forward through the
stages, there is feedback from later stages. For instance, during the mathematical analysis of
the model, flaws or deficiencies in the underlying model may be detected. This causes a
revision of the underlying model (which in turn may result in a revision of the analysis).

Past-research (the circuit fix-it shop)

The research described in this paper is a continuation of work done at Duke University, North
Carolina State University and East Carolina University on voice natural language systems
(Biermann et aL 1985; 1993; Fink & Biermann 1986; Guinn 1993a; 1993b; 1994; Hipp 1992;
Moody 1988; Smith 1994 1991; Smith, Hipp, & Biermann 1992). A significant milestone in
this line of research is represented by the implemented voice dialogue system, the Circuit
Fix-It Shop, of Smith and Hipp (Smith & Hipp 1994). The line of research that led to the
creation of the Circuit Fix-It Shop followed four phases: Data Collection First, Moody
conducted a series of "Wizard of Oz" experiments where a human played the role of the
computer in a circuit repair domain (Moody 1988).

Data Collection First,

Moody conducted a series of "Wizard of Oz" experiments where a human played the role of
the computer in a circuit repair domain (Moody 1988). Model Building From the data
collected from Moody, certain patterns of behavior were noted. The structure of the dialogue
closely modeled the structure of the underlying task tree. Utterances were used when there
was a task that a participant could not solve or if a participant was confused and needed
clarification. Clarification subdialogues seemed to occur when one participant overestimated
what the other participant was capable of doing (user model error). From this observation, the
origin of the Missing Axiom Theory arose (Smith 1992) (to be discussed further in Section ).
Another feature of the dialogues that stood out was the need for variable initiative dialogue to
support efficient collaboration. When users have knowledge they would like to apply to a
problem, it is frustrating and inefficient to force the users to behave as novices.
Implementation Using the model built from examining Moody’s corpus, a voice dialogue

17
system was implemented with several interesting features (Smith 1991; Hipp 1992): The
Missing Axiom Theory is a driving force behind the generation of system utterances. (Some
other dialogue systems have used similar approach to organizing dialogue (Cohen et al. 1989;
Gerlach & Horacek 1989; Quilici 1989; Young et al. 1989)). ¯ The system operates in four
discrete dialogue modes that specify how much control the computer will take in directing the
problemsolving. ¯ The system maintains a dynamic user model. ¯ The system uses
expectations to assist in speech recognition and clarification subdialogues. ¯ The system
employs an iterative problemsolving strategy that allows it to consider new nformation.

Experimentation
The system was tested on 8 subjects for a total of 141 human-computer dialogues. There are
several important contributions of the experiments: ¯ The performance of the system
validates that the Missing Axiom Theory can be used as dialogue control mechanism. Users
found the dialogues to be understandable and cohesive. ¯ The experiments also demonstrate
the effects of dialogue mode on natural language discourse.

Underlying Model –
The Collaborative Algorithm The agents in human-humanc ollaboration are individuals. Each
participant is a separate entity. The mental structures and mechanisms of one participant are
not directly accessible to the other. During collaboration the two participants satisfy goals and
share this information by some mean of communication. We say effective collaboration takes
place when each participant depends on the other in solving a commong oal or in solving a
goal more efficiently. It is the synergistic effect of the two problem-solvers working together
that makes the collaboration beneficial for both parties (Calistri- Yeh 1993). An overview of
our collaborative model is presented in

18
Participant A Participant B

Knowledge sharing Dialogue Knowledge sharing

Figure 2. Notice that each participant has a private plan, knowledge base (KB), and user
model. To collaborate there also must be some dialogue between the two participants. Figure
2: A Model of Collaboration There are several important assumptions made by the
Collaborative Algorithm.

1. All knowledge in an agent’s knowledge base is "true". It follows that any information
contained in an utterance is also true. This contrasts with a more realistic environment where
agents may have false beliefs or utter false statements.

2. The communication channel is perfect. Unlike true natural language communication there
is no ambiguity nor is there information loss due to speech misrecognitions.

3. The focus of the interaction is on solving the mutual problem, not on teaching. Unlike a
tutoring environment where one agent would like the other to gain the ability to solve the
problem on its own, the agents in the Collaborative Algorithm only want to solve the top
mutual goal as quickly as possible. However, we are currently working on modifications of
the Collaborative Algorithm to model the tutoring environment more closely.

Dialogue as Extension to Top-Down Problem-Solving

The basic operations of this top-down manner of problem-solving are 1) check to see whether
the goal is already satisfied and 2) decompose the goal into subgoals and solve those
subgoals. Biermann et al. point out that in a collaborative environment the problem-solvers
have another option: ask the other collaborators (Biermann et al. 1993). Smith proposes that
the role of language in a problem-solving environment is to supply "missing axioms" (Smith
1992). Smith’s view is that problem-solving involves the satisfaction of axioms. When a
problemsolver cannot satisfy an axiom trivially by looking it up in its knowledge base and the
problemsolver cannot decompose the goal into subgoals, it has the option of requesting that
the other participant satisfy that axiom. However, the problemsolver will only exercise that
option if it believes the other participant is capable of satisfying that axiom.

19
The problem-solver maintains a model of the other participant to determine what is
appropriate to request. Dialogue Mechanisms for Conflict Resolution The Collaborative
Algorithm utilizes the Missing Axiom Theory to generate goal requests. However, the issue
of how and when goals are answered by other participants is left unresolved by the Missing
Axiom Theory. The Collaborative Algorithm provides a testbed to determine effective
strategies for answering queries. An important issue in collaborative environments is the
concept of conflict resolution. Even though agents may be working together, there still may
exist conflicts between the two agents about which path should be taken in order to solve a
particular goal.

Task Initiative
An important conflict resolution mechanism is the specification of which participant has task
initiative. The participant with task initiative over a goal controls which decomposition of
that goal the two participants will use. We have developed several task initiative setting
algorithms (Guinn 1993a; 1994). The Continuous Mode algorithm sets the initiative level of a
participant based on a probabilistic analysis of the two participant’s knowledge. Using the
Continuous Mode algorithm, the initiative levels of each agent will adapt during the problem-
solving.

Negotiation
Another strategy for conflict resolution is the usage of negotiation to determine which
decomposition the two participants will use. There are innumerable task initiative setting
algorithms as well as many possible negotiation strategies. In our collaborative model, agents
can negotiate when there is a task conflict. Negotiation takes the form of presenting evidence
for a particular path choice. Plan Inference Assistance Our dialogue initiative and negotiation
algorithms require proper plan recognition. However, plan inferencing can be very difficult
when an agent has limited information about the domain and limited information about the
other agent. We have found that certain utterances which we call Non-Obligatory Summaries
can assist in plan recognition. These utterances are announcements of a goal results that have
not been explicitly asked for by the other participant. We utilize both mathematical analysis
of the Collaborative Algorithm and computer-computer simulation of the Collaborative
Algorithm to determine the advantages and disadvantages of different task initiative and
negotiation strategies and the usefulness of Non-Obligatory Summaries.

20
Mathematical Analysis
This paper concentrates on the empirical validation of the Collaborative Algorithm. However,
we briefly note some of the results from our analytical analysis: ¯ We have identified the
necessary conditions to insure the soundness and completeness of the Collaborative
Algorithm We have analytically determined the effect of certain dialogue mode setting
mechanisms using simplified user models.

Weh ave analytically determinedt he effect of certain negotiation strategies using simplified
user models. We have analytically determined the effect of a class of utterances, non-
obligatory summaries, on dialogue efficiency. A detailed account of the above analyses is
given by Guinn (Guinn 1994).

Computational Implementation
The process of implementing an algorithm is an important feedback step to the underlying
model. To create a working computational implementation of an algorithm there can be no
gaps or "handwaving" in the implementation. Every detail must be worked out. If the
underlying model is sufficiently robust, the "fleshing out" of procedures, functions and
modules should not affect the overall model. However, there are occasions when the
implementation of a seemingly simple step at the general model level turns out to have
consequences that affect the overall model. As an example, we found that the closed world
assumption of Prolog was unacceptable in an environment where participants are expected to
have incomplete knowledge.

Thus the problem-solver must be capable of handling a multi-valued logic. The Collaborative
Algorithm has been implemented on Sun workstations with a combination of Prolog and C
language routines. Most of the coding of the algorithm is in Quintus Prolog 3.1.4.

Following Prolog’s lead, the representation of domain knowledge by the Collaborative

Algorithm utilizes Horn-clause propositions. In the implementation, when a participant
wishes to place some fact, request, or other information in the message buffer, the
information is encoded in the Prolog-like form of the knowledge. For human reader
convenience, a natural language generator converts these utterances to a more readable form.

An example dialogue carried out between two computer participants is given in Table 1. In
this example, task initiative for the top goal changes three times: twice when an agent
explicitly grants the other participant control and once following a negotiation. Computer

21
Simulations The implementation of the Collaborative Algorithm has the ability to function in
using the Continuous Mode Algorithm or Random Mode Algo-

Speaker Utterance
~Who is the murderer of Lord Dunsmore?

Is it the case that Suspectl6 is the murderer of Lord

Dunsmore?

Is it the case that Suspectl6 had a motive to murder

Lord Dunsmore?

Suspectl6 had a motive to murder Lord Dunsmore.

Is it the case that Suspectl6 had access to the

poison?

I am unable to determine whether Suspectl6 had

access to the poison.

I am unable to determine whether Suspectl6 is the

murderer of Lord Dunsmore.

I am unable to determine whether Suspect16 is the

bmurderer of Lord Dunsmore.

¢Who is the murderer of Lord Dunsmore?

Is it the case that Suspectl0 is the murderer of Lord

Dunsmore?

Is it the case that Suspectl0 had a motive to murder

Lord Dunsmore?

Suspectl0 had a motive to murder Lord Dunsmore.

Is it the case that Suspectl0 had access to the

poison?

22
Suspectl0 had access to the poison.

Is it the case that Suspectl0 had an opportunity to

administer the poison?

I am unable to determine whether Suspectl0 had an

opportunity to administer the poison.

I am unable to determine whether Suspectl0 is the

murderer of Lord Dunsmore.

I am unable to determine whether Suspectl0 is the

dmurderer of Lord Dunsmore.

Is it the case that Suspect9 is the murderer of Lord

Dunsmore?

Is it the case that Suspect7 is the murderer of Lord

Dunsmore?"

I have proven that Suspect9 has a motive to murder

Lord Dunsmore and Suspect9 had access to the

#p’oison.

I have proven that Suspect7 had access to the poison,

Suspect7 had an opportunity to administer the

apoison, and Suspect7 has a criminal disposition,

hSuspect7 is the murderer of Lord Dunsmore.

Demo and Full-Scale Systems

Work is being done now to implement the Collaborative Algorithm in a human-computer
interactive environment. The previous stages of model building, mathematical analysis,
computational implementation and simulation have enabled us to elaborate on our dialogue
model before actually building our next generation human-computer interactive system.

23
Using the simulation results, we have some predictive power in determining what dialogue
mechanisms are useful for producing moreefficient joint problem-solving.

1.A knowledge distribution was chosen for a set of experiments, i.e., how much knowledge
each participant will receive is set.

2. Knowledgeis distributed between the two participants using the parameters chosen in Step
1.

3. Using the knowledge distribution created in Step 2, the collaborators solve the problem
eight times, once for each possible combination of mode setting, negotiation and summaries.

4. Steps 2 and 3 are repeated until a desired number of simulations are carried out for a
particular knowledge distribution.

5. Step 1 is repeated choosing a different knowledge distribution.

Participants
Fifty people took part in the experiment. The final number of participants was 48 (24 men, 24
women) with a mean age of 30 years (SD = 9.11). A major problem relating to system
operation was encountered by two of the participants. They were replaced by individuals
having the same profile (i.e. same sex, age group, academic level, familiarity with
computers). Each participant was given a 20€ voucher. The academic level was Mean = 13.8
years of study (SD = 2.6). They generally defined themselves as having an average level of
computer skills (M = 3.9, SD = 1.3, on a five-point scale ranging from very poor to very
good). The majority of the participants used interactive on-line or telephone services only
occasionally or rarely (33 out of 48): two participants used this type of service more than
twice a week, and the rest between one and two times a week. None of them had previously
used the application ArtimisPlanResto. The participants were distributed equally into the
experimental groups.

24
Materials
The Artimis Plan Resto application
The ArtimisPlanResto application was used in this experiment. ArtimisPlanResto is a general
public prototype service based on an intelligent natural language dialogue system, used to
locate restaurants in Paris. The service was developed using ARTIMIS technology (Sadek,
1999; Sadek, Bretier, & Panaget, 1997; Sadek et al., 1996). Users can search on the basis of
three criteria: location of the restaurant, price, and food type. The system responds by
proposing the solutions that best match the request (for an example dialogue, see Fig. 1). A
search using ArtimisPlan- Resto can be subdivided into two phases (1) A phase during which
the criteria are formulated, i.e. the user formulates a more or less precise request in natural
language with the help of the system which asks him or her to specify the criteria in greater

25
System: Welcome to PlanResto […]. What would you like?

User: I want a restaurant in the eleventh district in Paris

S: You are looking for a restaurant in the eleventh district. I have found more than 10.
You

can give more information to narrow your search […]

U: Well, an Indian restaurant

S: You are looking for an Indian restaurant in the eleventh district. I have found 7. You
can

give more information to narrow your search, for example location, consult solutions […]

U: A restaurant for 25 euros

S: You are looking for an Indian restaurant for about 25 euros in the eleventh district. I

have found one called "La Ville de Jagannath". […]. Do you want more information or

start a new research?

U: Could you repeat the name of restaurant?

S: The
Fig. Example of a spoken dialogue using the ‘‘PlanResto’’ application (note: all the search
criteria can be specified together; translated from French). detail (if they want, users can
specify multiple criteria at once). (2) A fine-tuning phase, i.e. once users have stated their
criteria, they can consult and work through the solutions returned by the system. They can
also ask for specific information or further details concerning these solutions. Users can
switch between phases 2 and 1 at any time.

26
The ArtimisPlanResto service is a telephone-based voice service (voice synthetics, voice
recognition and dialogue modules). It also operates in written mode with a Web type
interface without mouse implementation (text entered in a window and confirmation with the
Enter key – history displayed below the input window, see Fig. 2). The available interface
was only a test interface and did not correspond to what might be expected of a written
natural dialogue interface. Similarly, the system did not contain a spelling checker. The terms
were processed exactly as typed. For example, if Eiffel Tower was typed without capitals
then the system did not understand and the user had to reformulate the request. One very
important characteristic for the experimental validity of the study was that the content of the
system output was identical for the two interactive modes. Similarly, a user query in spoken
or written form produced precisely the same system response. Finally, the only way to
communicate with the system was to use language.

Dialogue scenarios
Information retrieval tasks were given to the users in the form of scenarios in order to place
them in a test situation. The users had to find a restaurant by means of the ArtimisPlanResto
service. Ideally, the restaurants corresponding to the search were predetermined.
Nevertheless, depending on the formulations chosen by the user or problems arising during
the interaction (e.g., voice recognition problems), the system could also propose other
solutions. Two or three criteria were given for each search. These took the form of the
location, the food type and the price of the restaurant. The location corresponded either to a
major site (e.g., close to the Eiffel Tower) or to a city district (e.g., in the first district). The
food type corresponded to the type of catering provided (e.g., gastronomic cuisine). The price
corresponded either to an exact amount (e.g., for 15 €) or to approximations (e.g., for
approximately 15€, for more than 15€). The number of possible responses varied between 1
and 6. The scenarios were presented in the form of lists of criteria. Twelve simple scenarios
were created by manipulating the number (2 or 3), the order (6 combina- F Web interface.
476 L. Le Bigot et al. / Computers in Human Behavior 22 (2006) 467–500 tions) of the
criteria and the number of solutions. There was one possible scenario per order and number
of criteria. Moreover, the order of the scenarios was counterbalanced using multiple Latin
Squares per scenario. The scenarios were grouped together in sets of six across two
experimental sessions (Sessions 1 and 2).

27
Mental workload self-evaluation questionnaire
The NASA-TLX (Task Load-Index, Hart & Staveland, 1988) is a weighted, bipolar,
subjective, multidimensional scale designed to evaluate the mental workload imposed by any
given task. It provides a global score based on the mean weights of six dimensions (or
sources): three of these relate to the task and three to the operators involvement in the task.
The dimensions are (1) mental activity (mental effort), (2) time pressure (sequence of sub-
tasks), (3) effort (mental and physical work), (4) performance (effectiveness in accomplishing
the task), (5) frustration, (6) physical activity (physical effort required to accomplish the
task). An earlier version of the NASA-TLX made use of nine dimensions. However, studies
showed that certain of these were redundant or failed to provide relevant information. For
example, the ‘‘stress’’ dimension was found to be equivalent to the ‘‘frustration’’ dimension.
In general, each of the six dimensions is operationalized by a question (e.g. Prinzel, Pope,
Freeman, Scerbo, & Mikulka, 2001; Appendix A, p.55). The subjects respond using Likert-
type, 20-point scales (with 1 generally representing the lowest level of mental workload and
20 representing the highest workload: these extremes are labeled ‘‘Low’’ and ‘‘High’’
respectively, with the exception of the ‘‘performance’’ dimension which is labeled ‘‘Good’’
and ‘‘Poor’’). The mental workload index is calculated on the basis of 21 measurements.
More precisely, the NASA-TLX uses the weighted mean model calculated on the basis of six
dimensions and 15 pairwise comparisons between the dimensions. These pairwise
comparisons provide a matrix consisting of six weights corresponding to the relative
importance of the six sources in the global load.

The NASA-TLX questionnaire was modified for the purposes of our experiment, with a
number of changes being made. (1) For reasons relating to the formulation and translation of
the items, the ‘‘frustration’’ dimension was replaced by a ‘‘stress’’ dimension. (2) The
questionnaire contained 7 questions. The ‘‘performance’’ dimension was subdivided into two
questions, one of these relating to effectiveness and the other to achievement of the goal. The
means of the estimates were calculated. (3) With reference to the work performed on
cognitive load (de Croock et al., 1998; van Merrie¨nboer et al., 2002), the scales used in this
experiment contained only 9 points. (4) The mental workload index was calculated using a
simple mean model (Sato et al., 1999). After these modifications, the homogeneity of the test
for the various experimental conditions was: Cronbachs a = 0.85–0.90. The homogeneity of
theoriginal NASA-TLX measured using the test–retest method was 0.83 (Scerbo, 2001).

28
Satisfaction questionnaire
A satisfaction questionnaire was drawn up on the basis of an earlier study. The original
questionnaire consisted of 25 questions. Only 11 of these were retained for the final
questionnaire (Cronbachs a = 0.81). The criteria used to choose the L. Le Bigot et al. /
Computers in Human Behavior 22 (2006) 467–500 477 questions related to satisfaction with
the ease of use of the system. A 12th question assessing general satisfaction was added to the
questionnaire (Cronbachs a = 0.82).

Procedure
The participants were greeted individually in a quiet room and asked to complete an
information questionnaire. The ArtimisPlanResto service was then presented to them. They
were told that the service functioned using a natural language dialogue but were given no
further information. Next, the subjects were given a problem-solving instruction, i.e. to use
the service to find restaurants ‘‘as quickly and accurately as possible’’. After each scenario,
the subjects completed a subjective questionnaire relating to the mental workload. The
scenarios were subdivided into two sessions of six scenarios each. After each session, the
participants completed a satisfaction questionnaire. Finally, the participants were debriefed
concerning their participation. The entire experiment did not last longer than 45 min on
average. The questionnaires were presented in paper-and-pencil form. Half of the participants
performed the first six scenarios (session 1) in spoken mode (telephone) and the other half
performed them in written mode (at a PC). Half of the participants then completed the final
six scenarios (session 2) in the same modality (identical condition) and the other half in a
different modality (different).

Design
A number of independent variables were manipulated in order to study the effects of the
interaction mode, learning, and transfer. The interaction modes in session1 (spoken vs.
written) and in session 2 (identical vs. different) were treated as between- subjects factors.
The ‘‘serial position’’ of the scenario in the session (positions 1 to 6) and ‘‘session’’ (session
1 vs. session 2) were treated as the within-subjects factors.

29
Dependent variables
Some dependent variables were based on the users utterances. The utterances were
transcribed word-for-word. In the case of the voice interactions, the transcriptions were based
on recordings of the dialogues. The written and spoken dialogues were transcribed in the
same way. Other variables were gathered on the basis of the success levels and the responses
to the questionnaires. Finally, measures relating to the dialogues were gathered on the basis
of notes relating to the exchanges between the user and the system.

Initial measures and representation of partner

Relevant information during the first turn taken. The number of items of information relevant
for the search during the users first communicative turn were measured (Amalberti et al.,
1993). An item of information was considered to be relevant for the search if it corresponded
to one of the search criteria (Food type, Location, Price) 478 L. Le Bigot et al. / Computers in
Human Behavior 22 (2006) 467–500 Words per first turn taken. The number of words output
during the participants first communicative turn was measured for each participant.

Performance and subjective levels

Success level. The success level for each session was calculated on the basis of the
correspondence between the identified restaurants and the scenario. Solution time. The total
length of the dialogue interaction with the system in order to complete a search was measured
in seconds. Number of appropriate turns taken. The number of turns taken by the user for
each dialogue was reduced by the number of additional turns taken due to a voice recognition
or spelling error. This index was calculated to cancel out errors inherent to voice recognition.
Efficiency. The number of additional turns taken was divided by the total number of turns
taken per dialogue. This index was calculated for use as a covariate for mental workload and
satisfaction. Mental workload. The global mental workload per dialogue was calculated for
each participant from the mean value of the six dimensions of the subjective mental workload
evaluation questionnaire. This index reflected the mental work performed by the user in order
to complete the task involving the system. Satisfaction level. The satisfaction level per
session for each participant was calculated on the basis of the mean of the 12 questions in the
satisfaction questionnaire.

Discourse structure
Number of words. The total number of user words per dialogue was collected for each
participant. This provided a baseline for the calculation of the other indices. Length of
30
utterances. The mean number of words for each dialogue was calculated by dividing the total
number of words by the number of utterances. Hesitations and comments were excluded from
the calculation. TTR for articles. The TTR for definite (the) and indefinite (a, some) articles
was calculated for each dialogue by dividing the number of definite and indefinite articles by
the total number of words per dialogue. TTR for personal pronouns. The TTR for first person
pronouns (such as I, me, my) and third person pronouns (such as he, she, they, them) was
calculated by dividing the number of occurrences of these pronouns by the total number of
words per dialogue. Some authors (Brennan, 1991) have indicated that the function of the
first person pronouns is meta-conversational (often appearing in indirect questions which are
typically more polite) or translating the assumption of responsibility and involvement in the
dialogue (Chafe, 1982). Literal command utterances. The level to which users and the system
shared lexicon and syntax was revealed by means of a command statement indicator. For
each dialogue, the three types of command statements ‘‘the next restaurant’’ ‘‘more
information please’’ and ‘‘consult the solutions’’ were categorized into four levels of
formulation: (1) Literal statements (presence of the three words in French forming the
commands), (2) Partial command statements (only two words), (3) Single words L. Le Bigot
et al. / Computers in Human Behavior 22 (2006) 467–500 479 (one word of the command)
and (4) Reformulations. This indicator was divided by the number of turns taken dedicated to
commands but excluding utterances due to voice recognition or spelling errors for the task for
each session.

31
Chapter 4
Results and discussion
Missing data and preliminary analyses
Missing data
The online data for one participants first dialogue was lost. It was replaced by the mean data
values for the first dialogue in the corresponding mode of interaction.

Preliminary analyses
The global success level for restaurant searches was nearly 93%. The global error level,
defined as the proportion of dialogue turns due to recognition or interpretation errors, was
14% (18% on spoken utterances, 10% on written utterances). This error level corresponds to
the results obtained by Raymond, Be´chet, De Mori, Damnati, and Este`ve (2004). These
authors calculated an error level on the basis of search criteria based on voice recognition
independently of the conduct of the dialogue. Two analyses were conducted. The first of
these related to the first six dialogues and tested hypotheses 1–3. The second related to the
entire set of dialogues distributed over the two six-dialogue sessions and investigated
hypothesis 4. The data analyses for the search criteria related only to the efficient search
utterances. The dialogue structure analyses, with the exception of the command statements
analysis, related to all turns taken during the dialogues.

Analysis 1: Learning and modality effects

(A first six dialogues)

AGeneral LinearModel (GLM) procedure was employed for the data analysis. The ‘‘serial
position of the dialogue’’ factor was processed on a within-subjects basis (1–6), while the
interaction mode (spoken vs. written) was processed on a between-subjects basis. The
interactions were studied using planned comparisons. Furthermore, trend Dialogue scenarios
were conducted in order to obtain a precise characterization of the learning curve.

32
Initial measures and representation of the partner Number of relevant items of information
and number of words on first turn. The mean scores are presented in Table 1. The analysis
revealed a main effect of serial position of the dialogue using Wilks Lambda criterion (K =
0.40, F(10,37) = 5.45, p < 0.0001) but no main effect of the interaction mode (K = 0.97,
F(2,45) = 0.69, p > 0.1). The multivariate analyses for each of the measures taken
independently revealed an effect of the serial position of the dialogue on the number of items
of information
provided in the first turn (K = 0.59, F(5,42) = 5.91, p < 0.001), but not on the

33
Table 1
Means (and standard deviations) of initial measures for the first six dialogues

Mode N Dial 1 Dial 2 Dial 3 Dial 4 Dial 5 Dial 6 Total

Relevant information on first turn Spoken mode 24 0.83 1.33 1.58 1.75 1.58 1.67
1.46
(0.816) (0.917) (0.830) (0.944) (0.881) (0.761)
Written mode 24 1.38 1.71 1.96 1.83 1.92 2.00 1.80
(0.924) (0.859) (0.955) (0.816) (0.929) (0.780)
Words on first turn Spoken mode 24 5.83 6.67 6.83 7.62 6.50 6.96 6.74
(4.08) (4.54) (4.70) (4.53) (4.63) (5.19)
Written mode 24 4.54 5.67 6.21 4.63 5.00 5.00 5.17
(3.74) (4.18) (4.20) (3.33) (3.78) (3.51)

34
Table 2:

Means (and standard-deviations) of performance measures and subjective ratings for the first
six dialogues

Mode N Dial 1 Dial 2 Dial 3 Dial 4 Dial 5 Dial 6 Total

Solution time (in second) Spoken mode 24 165.7 99.0 94.7 104.0 92.1 76.8 105.4
(131.7) (54.4) (98.0) (21.86) (13.72) (10.27)
Written mode 24 178.9 137.2 98.0 66.3 75.4 53.9 101.6
(121.8) (139.1) (11.42) (6.51) (12.00) (5.37)
Appropriate turn Spoken mode 24 7.50 4.67 3.88 4.25 4.25 3.67 4.70
(5.49) (2.33) (2.49) (3.29) (2.33) (2.44)
Written mode 24 4.26 4.13 3.33 2.71 3.79 2.38 3.43
(2.25) (3.11) (2.10) (1.43) (4.35) (1.13)
Mental workload Spoken mode 24 4.32 3.58 3.08 3.26 3.15 2.68 3.34
(1.34) (1.20) (1.34) (1.54) (1.25) (1.18)
Written mode 24 3.24 2.88 2.29 2.01 2.08 1.87 2.40
(1.11) (1.15) (1.12) (0.83) (1.36) (0.74)
Satisfaction Spoken mode 24 – – – – – – 3.83 (0.51)
Written mode 24 – – – – – – 4.40 (0.48)

35
number of words used during this first turn (K = 0.90, F(5,42) = 0.93, p > 0.1). More
precisely, the trend analysis revealed significant linear (F(1,46) = 17.53, MSe = 0.505 p <
0.001) and quadratic (F(1,46) = 9.90, MSe = 0.406, p < 0.01) components for the number of
relevant items of information on the first turn in the spoken mode but only a linear
component (F(1,46) = 8.91, MSe = 0.505, p < 0.01) for the written mode. No other
comparison reached statistical significance. In other words, only the number of relevant items
of information on the first turn in the dialogue exhibited a tendency to increase during the
initial dialogues and then stabilize for the subsequent ones, at least in the spoken mode. The
number of words varied only.

Performance measures and subjective evaluations

The mean performance and subjective evaluation scores are presented in Table 2. Solution
times and appropriate turns. Using Wilks Lambda criterion, the analysis revealed a main
effect of interaction mode (K = 0.81, F(2,45) = 5.31, p < 0.01) as well as of serial position of
the dialogue (K = 0.41, F(10,37) = 5.27, p < 0.0001). Furthermore, the interaction between
the interaction mode and the serial position of the dialogue was significant (K = 0.59,
F(10,37) = 2.58, p < 0.05). The univariate analyses for each of the measures taken
independently showed that the number of appropriate turns taken was greater in the spoken
than in the written mode, in particular in the case of dialogue 1 and dialogues 4 and 6
(respectively: F(1,46) = 7.16, MSe = 17.57, p = 0.01; F(1,46) = 4.44, MSe = 6.42, p < 0.05;
F(1,46) = 5.51, MSe = 3.63, p < 0.05), whereas there was no difference in solution times
between the two interaction modes. The multivariate analyses for each of the measures taken
independently confirmed an effect of serial position of the dialogue on solution times (K =
0.51, F(5,42) = 5.91, p < 0.0001) and the number of appropriate turns (K = 0.61, F(5,42) =
5.28, p < 0.001). More precisely, the trend analysis revealed a significant linear component of
both spoken and written mode (F(1,46) = 9.58, MSe = 7447, p < 0.01 and F(1,46) = 32.67,
MSe = 7447, p < 0.0001) on solution times. In contrast, the trend analysis revealed linear
(F(1,46) = 16.26, MSe = 8.47, p < 0.001) and quadratic (F(1,46) = 9.87, MSe = 6.01, p <
0.01) components of the spoken mode but only a linear component (F(1,46) = 4.95, MSe =
8.47, p < 0.05) of the written mode on the number of appropriate turns taken. In other words,
the solution time tended to fall over the dialogues in a similar way for both modes of
interaction. In contrast, the number of appropriate turns taken was initially greater in the
spoken than in the written mode and then fell over the dialogues in similar proportions for the
two interaction modes. Mental workload. The efficiency levels, solution times and mean
number of appropriate turns for the six dialogues were entered as covariates in the analysis.

36
The analysis revealed a main effect of interaction mode (F(1,43) = 6.59, MSe = 4.10, p <
0.05) and of the serial position of the dialogue (F(5,215) = 3.17, MSe = 0.877, p < 0.01).
More precisely, the mental workload was greater in the spoken than in the written mode.
Moreover, the trend analysis revealed a significant linear component both in the spoken
(F(1,43) = 25.11, MSe = 1.20, p < 0.0001) and in the written mode (F(1,43) = 12.49, MSe =
1.20, p < 0.001). No other comparison reached statistical

37
Table 3

Means (and standard deviations) of dialogue structure measures for the first six dialogues

Mode N Dial 1 Dial 2 Dial 3 Dial 4 Dial 5 Dial 6 Total

Total words Spoken mode 24 37.58 (29.01) 28.04 (22.03) 25.50 (21.59) 28.75 (30.68)
24.79 (18.73) 22.17 (18.78) 27.81
Written mode 24 14.57 (8.04) 15.92 (13.15) 12.42 (5.41) 9.5 (4.76) 10.13 (5.41) 8.96
(3.30) 11.91
Words per turn Spoken mode 24 4.15 (1.61) 4.57 (1.94) 5.64 (3.14) 5.26 (1.98) 5.49
(3.00) 4.89 (1.94) 5.00
Written mode 24 3.47 (2.30) 3.85 (2.82) 4.47 (3.40) 3.60 (1.81) 3.95 (2.72) 4.31 (2.92)
3.94
TTR I Spoken mode 24 0.037 (0.033) 0.037 (0.037) 0.024 (0.038) 0.038 (0.037) 0.040
(0.040) 0.031 (0.042) 0.035
Written mode 24 0.011 (0.023) 0.004 (0.016) 0.007 (0.023) 0.006 (0.031) 0.004 (0.020)
0.005 (0.024) 0.006
TTR art Spoken mode 24 0.181 (0.085) 0.212 (0.088) 0.178 (0.078) 0.198 (0.047) 0.195
(0.079) 0.212 (0.076) 0.196
Written mode 24 0.099 (0.093) 0.114 (0.102) 0.115 (0.089) 0.093 (0.084) 0.092 (0.093)
0.115 (0.092) 0.105

significance. In other words, the mental workload was greater in the spoken than in the
written mode and fell over the dialogues. Satisfaction. The efficiency levels, solution times,
mean number of appropriate turns, and mental workload for the six dialogues were entered as
covariates in the analysis The analysis revealed no effect of interaction mode (F(1,42) = 0.86,
MSe = 0.182, p > 0.1). After six dialogues, there was no significant difference in satisfaction
between the spoken and written modes. It should be noted that mental workload was most
highly correlated with satisfaction level (R = 0.62, p < 0.0001), i.e. the more the mental
workload increased, the more the level of satisfaction fell. In other words, even if the analysis
did not reveal any significant difference, the strong correlation between satisfaction and
38
mental workload suggests that the two measures are interdependent. In the case of the initial
dialogues, satisfaction seems to have been determined more by the effort involved in
performing the task than by the interaction mode.

Discourse structure
All the data was log transformed to stabilize the variance for the purposes of statistical
analysis. For reasons of clarity, only untransformed data is presented in Tables 3 and 4.
Number of words. The analysis revealed an effect of interaction mode (F(1,46) = 33.18, MSe
= 0.190, p < 0.0001), and of the serial position of the dialogue (F(5,230) = 5.77, MSe =
0.040, p < 0.0001). More precisely, the total number of words was greater in the spoken than
in the written mode. The trend analysis revealed a linear component for the spoken mode
(F(1,46) = 11.56, MSe = 0.042, p < 0.01) and the written mode (F(1,46) = 14.59, MSe =
0.042, p < 0.0001). No other comparison reached statistical significance. In other words, the
number of words used to perform the task fell in a comparable way as the interactions
proceeded irrespective of the modality. Number of words per turn. The analysis revealed an
effect of interaction mode (F(1,46) = 5.27, MSe = 0.141 p < 0.05) and of the serial position of
the dialogue (F(5,230) = 3.10, MSe = 0.112, p = 0.01). More precisely, the number of words
per turn was greater in the spoken than in the written mode. The trend analysis revealed
significant linear (F(1,46) = 4.25, MSe = 0.016, p < 0.05) and quadratic components (F(1,46)
= 6.01, MSe = 0.013, p < 0.0001) in the spoken mode. No other comparison reached
statistical significance. In other words, the length of

39
Table no;4

Means (and standard-deviations) of command statement measures for the first six dialogues

Literal re-use Single word

Spoken mode Written mode Spoken mode Written mode

Total

0.643 0.208 0.084 0.518

(0.313) (0.300) (0.194) (0.404)

Finally, the transfer effect has not yet been studied in human–human dialogue sit Uations to a
sufficient level to enable us to draw clear conclusions. Even in a natural language system
dialogue situation, a learning phase is required, at least for the pur poses of task completion.
Furthermore, there are mode-specific differences that have to be taken into account during
system design. Nevertheless, individuals tend to adapt to the system, at least when using
speech. They are involved in an active task completion process within which they consider
the system as a partner rather than as a tool. Last of all, there is a clear transfer effect when
individuals first use the system in one interaction mode and then continue their work with it
in another mode. In order to complete and extend these results, experiments should be
reproduced using other dialogue systems and applications. Also, interaction mode-dependent
learning and transfer effects should be studied within an alternating learning context (a
speech dialogue, then a written dialogue, then a speech dialogue, etc.).

Challenges
To build dialogue system developers faces many difficulties. These are due lack of
computer’s understanding of natural language. This problem arises many challenges for
developers e.g. problem of Anaphora Resolution, Inferences, Ellipsis, Pragmatics, Reference
resolutions and Clarifications, Inter sentential Ellipsis etc.[7] Besides these language problem
other challenges is to design system prompts, grounding, detection of conflicts and plan

40
recognition etc. In the spoken dialogue systems the problem related to utterance of the user
occur like ill formed utterances. These are the some of the challenges that developers have to
take care of at designing time.

General discussion
The aim of the experiment was to reveal learning, modality and transfer effects on
performance and dialogue structure during a complex, goal-oriented activity.Firstly, the
analyses showed that the representation of the partner changed over the dialogues (Amalberti
et al., 1993) and that it differed depending on the mode of interaction. The individuals
provided more relevant items of information during their first turn in the dialogue in the
written than in the spoken mode (Zoltan-Ford, 1991). The number of items of information
increased over dialogues, in particular in the spoken mode, to reach a maximum level.
Furthermore, the participants became increasingly concise, providing more and more
information in a number of words that remained relatively stable from one dialogue to the
next. This tendency was confirmed between sessions 1 and 2. Whatever the interaction mode
used for learning in session 1, the first utterance produced in the session 2 dialogues
contained multiple items of information and a minimum number of words. The differences
between the spoken and written modes became blurred after learning. More precisely, the
transfer of learning was identical whatever the modality. It is clear that the individuals
became aware of the systems potential and made full use of its capabilities. The implications
of results such as these are very encouraging for designers of natural language dialogue
systems. Furthermore, these results suggest that it should be possible to determine the
approximate level of familiarity with an information retrieval system that makes use of
natural language on the basis of the number of criteria stated and the number of words used in
the first dialogue turn. However, these results apply only to a service that involves a small
number of criteria and is used for information retrieval purposes. It would be interesting to
reproduce this type of experiment with a system using a larger number of search criteria and
in a different task context mobilizing different levels of user knowledge. Also, even though
the experiment was conducted with a real system, it was relatively controlled. The observed
effects might be due to familiarization with the task (information retrieval) rather than
familiarization with the use of the system. Furthermore, the data will only be relevant if the
tasks (scenarios) are sufficiently representative of real system
operation. Secondly, the analysis of the performance indicators revealed a fairly marked
learning effect and modality effect for the dialogues in session 1. The participants were more
efficient in the written than in the spoken mode. Nevertheless, the analysis did not indicate

41
that they reached an optimum level of efficiency any faster in the former than in the latter
mode. These results were partly corroborated by analysis 2 participent performances
continued to improve in session 2 when no change of interaction mode was introduced.
Furthermore, changing the interaction mode had a negative impact on solution times and we
even observed an increase in the number of dialogue turns required to complete the task when
participants switched from written to spoken mode. In both cases, the participants had to
adapt to the new interaction environment, with only a part of the knowledge constructed in
session 1 being transferred. In the case of the switch from the written to the spoken mode,
elements of transfer and interference were can simultaneously identified. The analysis of the
subjective rates reveals the same patterns of results. Thus transfer was greater for the
interaction mode requiring the greater learning effort, i.e. from the spoken mode to the
written mode. There are a number of considerations that might help explain these results.
First of all, an information retrieval task is a goal-oriented activity that requires the
construction of goals and sub-goals. By definition, its completion involves a cognitive cost.
Lovett, Reder, and Lebiere (1999) have pointed out that the performance of almost any
cognitive task calls on working memory for the maintenance and retrieval of information
during processing. Because of the characteristics of the spoken mode, it was more difficult
for the participants to manage both planning of their utterances and planning of the task under
a time constraint, at least during their initial interactions with the system. Secondly, the
management of the cognitive resources demanded the establishment of priorities. In the ACT-
R model (Anderson, 1993), processing activities are dependent on the current goal. The
accessibility of the declarative and procedural knowledge varies as a function of the
experiment. Thus the focusing of the subjects limited attentional resources on the goal
increases the accessibility of knowledge that is relevant to the goal when compared to other
knowledge (Lovett et al., 1999). The continuity of the activity took precedence over task
completion. Moreover, it is probable that only certain parts of the dialogue were responsible
for the effect, i.e. those requiring the greatest mental effort. Thirdly, expressing oneself in
speech and writing induces specific syntactical and lexical structures. Some of Zoltan-Fords
(1991) results were reproduced concerning the length of utterances. The spoken utterances
were longer than the written utterances but, contrary to Zoltan-Fords observations, the length
increased in the spoken mode but remained relatively stable in the written mode. The relative
stability of the number of words in the spoken mode, coupled with the reduction in the
number of turns taken between the two sessions, indicates that the users became increasingly
concise. This confirms the results obtained in Woz situations (Amalberti et al.,1993; Brennan,
42
1991; Bubb-Lewis & Scerbo, 2002; Pierrel, 1988; Richards & Underwood, 1985).
Furthermore, there were fewer grounding and involvement indicators in the written than in
the spoken mode (Chafe, 1982). This result sheds new light on Brennans analysis (1991).
Brennans experiment took place in the written mode, the conclusion being that involvement
and grounding indicators were less frequent in human–computer dialogue than human–
human dialogue situations. It is also possible to say that the use of involvement and
grounding indicators is dependent on the modality. Clark and Brennan (1991) have already
indicated that a number of factors, such as the interaction mode which imposes specific
constraints on the interaction, have an effect on grounding. Thus the establishment of the
common ground and involvement in the dialogue are more important in spoken mode. These
results are very encouraging for designers of dialogue systems. The active processes observed
in human–human dialogue situations are also at work in speech-based human–computer
dialogue situations (Allwood, Traum, & Jokinen, 2000). This interpretation is confirmed by
the analysis of the transfer effect in session 2. Some of the behaviorsfavoring grounding in
session 1 were also found in session 2. The individuals adapted to the systems lexicon and
syntactic structure when their first interaction was in spoken mode (Brennan, 1991; Fais,
1998; Leiser, 1989). Two explanatory hypotheses can be advanced to account for this. (1)
Literal restatements may characterize a more active construction of the common ground in
spoken than in written mode since, by its very nature, spoken mode should favor the
emergence of cooperative behavior. (2) Literal restatements may be characteristic of a mental
economy. More precisely, the phrases heard by the users would remain present in their
working memories (seealso, Garrod & Pickering, 2004, for syntactic priming; Levelt &
Kelter, 1982). The re-use of the material in this form would spare users the cost of
reformulating or of constructing new messages (Leiser, 1989). Fourthly, a cautionary
comment concerning all these results is warranted. We may have underestimated the impact
of voice recognition errors during the interaction is much greater than that of spelling
mistakes. For example, Baber and co-workers (Baber, Mellor, Graham, Noyes, & Tunley,
1996) demonstrated the effect of mental workload on the level of voice recognition errors.
Murray and co-workers (Murray, Jones, & Frankish, 1996b; Murray, Baber, & South, 1996a)
illustrated the effect of user stress on voice recognition. The voice recognition errors resulted
in an increase in stress and mental load which, in turn, led to an increase in the number of
voice recognition errors. In the presence of voice recognition errors, it is possible that users
employ specific error recovery procedures. If this is the case, then the effect of the interaction
mode would actually simply amount to a comparison of the written mode with a spoken
43
mode that contains failures. Nevertheless, the absence of a difference in satisfaction rating
between the written and spoken modes is encouraging. To summarize, the effect of the
interaction mode can be characterized in terms of direct consequences that are specific to
each of the interaction modes and indirect consequences that are specific to the task and the
management of the activity.
The permanence of the written trace and the transience of speech seem to be likely
explanations for some of the differences observed between the interaction modes. The time
required for reading and information processing resulted in a greater improvement in
performance in the written than in the spoken mode. The different nature of the management
of the activity is an indirect consequence of the effect of the interaction mode on task
completion. In contrast, the observation of typical behaviors such as the use of articles and
subject pronouns as a function of interaction mode is a direct consequence of the interaction
mode effect. As Oviatt (1995) has suggested, the syntactical arrangement involved in the
construction of grammatical sequences must differ between the written and spoken modes.
Levelt (1989) has pointed out that the order of the information is important in the macro-
planning of utterances. The time pressure present in the spoken mode does not generally
permit any largescale control of order. In contrast, resources are distributed in a different way
in the written mode. The arguments contained in the utterances can be reorganized. Thus the
interaction mode has an indirect effect on the activity. The task and its completion are
prioritized in the written mode.
Finally, the transfer effect has not yet been studied in human–human dialogue situations to a
sufficient level to enable us to draw clear conclusions. Even in a natural language system
dialogue situation, a learning phase is required, at least for the purposes of task completion.
Furthermore, there are mode-specific differences that have to be taken into account during
system design. Nevertheless, individuals tend to adapt to the system, at least when using
speech. They are involved in an active task completion process within which they consider
the system as a partner rather than as a tool. Last of all, there is a clear transfer effect when
individuals first use the system in one interaction mode and then continue their work with it
in another mode. In order to complete and extend these results, experiments should be
reproduced using other dialogue systems and applications. Also, interaction mode-dependent
learning and transfer effects should be studied within an alternating learning context (a
speech dialogue, then a written dialogue, then a speech dialogue, etc.).

44
Jointly modeling multiple subtasks
Traditionally, dialogue systems were built in a pipeline way. Models for each subtask were
built separately and then assembled into a whole system. Pipeline systems are conceptually
clear. Each part focuses on its own problems independently, each model is developed
independently. But there are also some limitations for pipeline systems. Firstly, it cannot
make use of the interaction information between different parts. There are significant
interactions between each subtask, the interactions are helpful to improve the system
performance. Taking the intent identification and slot filling in NLU as an example, slot
filling is helpful to intent identification, and vice versa. In a flight booking task, if only the
destination slot is labeled in a sentence, then the probability that intent of the sentence is to
tell the destination is big, on the contrary, if intent of a sentence is to tell the departure city,
then, a departure city will occur in the sentence with a big probability. If the interactions
between two subtasks can be modeled properly, it should be helpful to promote both tasks.
There are similar situations for other subtasks. Secondly, models for each subtask are trained
separately in a pipeline system. It brings difficulties from two sides. On the one hand,
developers of dialogue systems usually only get feedback from the end users, who inform
them about final performance of the systems. It is difficult to back propagate or assign final
error signals of the system to each subtask. It is also time-consuming and laborious to get
labeled data for each subtask. On the other hand, because it is difficult or impossible to
ensure fully correct in each subtask, errors in previous subtasks might hurt later subtasks. The
errors might be accumulated and enlarged through the pipeline, even becomeuncontrollable.
Thirdly, interdependences of subtasks in dialogue systems make online adaptation of systems
challenging.
For example, when one module (e.g. NLU) is retrained with new data, all the others (e.g DM)
that depend on it become sub-optimal due to the fact that they were trained on the output
distributions of the older version of the NLU module. Although the ideal solution is to retrain
the entire pipeline to ensure global optimality, this requires significant human effort .
Recent advances are exploring how to overcome above limitations of pipeline system. Jointly
modeling has been proven to be an efficient way. There are lots of work on joint models,
range from jointly modeling subtasks in NLU, DM or NLG respectively, to jointly modeling
subtasks cross NLU and DM, and even jointly modeling cross NLU, DM and NLG. Here,
“joint model” or “jointly modeling” means two or more subtasks are implemented in a single

45
model or in a strongly coupled frame, the model (or frame) is trained as a whole or
simultaneously instead of subtask by subtask.

1: Jointly modeling subtasks in NLU

In recent years, with the success of deep learning in variety applications, several different
types of deep neural networks have been used on jointly modeling of NLU subtasks. Xu, &
Sarikaya described a joint model for intent detection and slot filling based on convolutional
neural networks (CNN). The features were extracted through CNN layers and shared by the
two tasks. Experimental results on ATIS corpus and other 4 unpublished datasets showed that
the joint model outperforms triangular CRF on both intent identification and slot filling. Guo,
Tur, & Yih et al. [41] proposed a RecNN (Recursive Neural Networks) based approach to
jointly model domain identification, intent identification and sematic parsing. Compared with
some previous methods which modeled the three tasks separately, their model achieved
competitive performance on ATIS data and Microsoft Cortana dialogue data. Shi, Yao, &
Chen, et al. [42] proposed a RNN to jointly model three subtasks in NLU, and received better
performance than previous methods. LSTM (Long Short-Term Memory) was also use for
jointly modeling intent identification and sematic parsing [].
It achieved better performance on DSTC2 data than those in separate models. Besides deep
neural networks, traditional CRF were also used as joint models. Lee & Ko proposed a CRF
based new tag addition method to joint model different subtasks. The model added three
positions for speech act, operator and target, respectively before each input sentence. A single
CRF model was used to label Named Entities in the sentence as well as the three labels.
Besides, models combining CRF with CNN were also used for joint modeling.
All of them achieved better or competitive performance than current state-of-the-art
independent models. Jointly modeling subtasks has been shown a promising way for
improving NLU. Although some improvements have been achieved. There also some
problems to be made clear or solved. Among them, we think following three problems are
even important and interesting. The first problem is about how to joint model multiple-
task.There were two ways for joint modeling multiple-task in previous methods. One was to
transfer different tasks in a same type of problem, then model them in a single model. The
typical one is Lee & Ko . They transferred intent identification (a normal classification task)
into a labeling task by adding a tag position for intent before the sentence.
A single labeling model could then be used for intent and slot labeling simultaneously. It
might be though as a flat and parallel mode. Another way was to build hierarchical models
for classification and labeling separately in different layers. Both Shi, Yao, & Chen, et al.

46
[42] and Zhou, Wen, &Wang [43] employed a hierarchical neural network model for intent
classification and slot labeling. While the former put slot labeling in the bottom of
hierarchical network, intent identification on the top.
The later tried two different arrangements (one was exactly same as that in former, another is
inverse) and found the subtask on the top of the network always gained more benefits from
the hierarchical structure, no matter which subtasks were put on the top. It was not clear
which kind of joint way is better for given subtasks, there is still no full investigation on this
problem.
Almost all current joint models were supervised. They needed labeled data for all subtasks.
For deep neural network models, they demanded a large amount of data for better
performance. So, the second problem is how to get a large number of labeled data, or should
we pursue some unsupervised approach? As we have seen, unsupervised models significantly
performed worse than supervised models in single subtasks. There is still no unsupervised
approach for joint models. Could joint tasks find better unsupervised models than that in
single task by utilizing the interaction information between two or more subtasks? If it is
possible, the jointly models gain another important advantage compared with pipeline
models.
Another problem is domain adaption. It is expensive to build a large number of labeled data.
It is even expensive to build a large number of labeled data for each domain. How can we
reuse the labeled data in one domain on another domain? We have to deal with new words,
new intent, new slot values or even new slots for dialogues in new domains. There are some
beginning works on dealing with the problem. For example, Yazdani, & Henderson [45]
explored a zero-shot representation learning model for SLU in new dialogue domains.They
integrated intents (acts) and slots in a label representation learning model, different domains
used common parameters of word embeddings. The experimental results showed that the
word vector based model could adapt well to new domains. We will see in next section that
word based models could also be a possible way for cross-domain adaption in other joint
models.

47
2: Jointly modeling subtasks cross NLU and DM
Normally, DM receives semantic labels of a sentence from NLU as inputs. Some recent work
has crossed the gap, and uses the sentence as input of DM directly. Henderson, Thomson, &
Young proposed a word based RNN model for state tracking. The model mapped the ngrams
of user inputs to dialog states without using an explicit semantic decoder. Each slot was dealt
with a separate RNN model. The method was evaluated on the second Dialog State Tracking
Challenge (DSTC2) corpus and the results demonstrated consistently higher performances
compared with pipeline models. Mrksic & Kadlex, et al. [47] proposed a multi-domain state
tracking model basing on work proposed in Ref. []. The results showed the model could
achieve good performance when combined with some delexicalized features.
Reinforcement Learning (RL) was a major tool for policy modeling. Most of current joint
models including act generation employed Deep Reinforcement Learning (DRL) which was
first proposed in Ref. [48] for playing computer games. Mnih, Kavukcuoglu, & Silver, et al.
[48,49] implemented a screen based game playing agent. The agent selected game actions
according to screen images. They proposed a deep Qlearning algorithm on a Deep Q-
Network (DQN) with two layers of convolutional network and two layers of fullconnection
forward network for learning Q-function. A mapping from image inputs to game acts was
learned. By utilizing DRL, screen understanding was integrated with game operation
selection into an end-to-end model. The model achieved better or competitive scores in a
number of different games compared with human players. In fact, game playing is very
similar to dialogue. Images of screen are analogy to utterances of users, game operators are
analogy to dialogue actions. The goal of game agent is to achieve maximum long-term
rewards in multiple turns, which is also analogy to the goal in dialogues. The only difference
between games and dialogues is:
The inputs of games are continuous images, while the inputs of dialogues are discrete
language symbols. Narasimhan, Kulkarni, & Barzilay proposed a LSTM-DQN model for
text based network games, where LSTM was used to decode text inputs into a vector
representation which was then fed to a DNN to train a Q-function. It achieved better
performance than some previous models. Due to the great successes in computer games and
similarities between games and dialogues, DRL was then rapidly used for building end-to-
end joint models for dialogue systems. Cuayahuitl & Keizer used deep reinforcement
learning on a non-cooperated dialogue to generate dialogue policies, they implemented

48
experiments on a card game instead of dialogues. Cuayahuitl tried to construct a joint model
from the outputs of ASR to act generation. He used DRL in Cuayahuitl & Keizer for DM.
But they just showed some simple DRL results without performance evaluation of the
dialogue system. Zhao & Eskenazi jointly modeled state tracking and action generation in a
deep reinforcement learning frame. LSTM was used to track history of dialogue. They also
proposed a model with supervised information from dialogue state. Dialogue states were
manually designed in past dialogue systems. The design of dialogue states was subjective and
time-consuming. DRL provided an efficient way to avoid explicit design of the dialogue
states. But it was not so easy to train Q-function networks like DQN or LSTM-DQN. The
samples fed to the network was (st,at,rt,stþ1), t ¼ 1,2,…N or something like that. They were
not independent identically distributed (i.i.d.) because stþ1 (state at time t þ 1) was
determined by both st and at. The Q-function networks were therefore prone to oscillation
and difficult to converge. For training the DQN, Mnih, Kavukcuoglu, & Silver, et al. used an
experience replay mechanism proposed by Lin which randomly sampled previous
transitions, and thereby smoothed the training distribution over many past behaviors.
Recently, Hasselt, Guez, & Silver [55] leveraged the overestimation problem of standard Q-
Learning by introducing double DQN, Schaul, Quan, & Antonoglou, et al. improved the
convergence speed of DQN via prioritized experience replay. Although these measures were
workable to some extents, and have helped DRL to achieve great successes in computer
game. But there was no general guarantee for convergence of DRL. Ma & Wang [57] showed
that the Qfunction networks could converge well when a dialogue has a small act space, but
the situation became worse with the increase of act space of the dialogue. How to train Q-
function networks will be still a problem in near future.

3. Jointly modeling subtasks in NLG

Lots of work has been done on jointly modeling content selection, sentence planning and
surface realization in recent years. These methods mapped acts (semantic frames or meaning
representation) from DM to natural language sentences directly, and could mainly be divided
into two types: one was syntax based models, another was sequence based models.
Syntax based models took sentence generation as an inverse process of sentence
understanding. They adopted syntaxes similar to those in syntactical analysis, generated
natural language sentences by rewriting syntactical symbols or semantic symbols
continuously till the leave nodes (words) of syntactical tree. Most of pervious work has
focused on hand-crafted generation grammar. Belz [58] made use of template-based domain
specific rules to get CFG (Context Free Grammar) manually. Recent work automatically

49
learned a CFG from aligned corpus. Wong & Mooney [59] proposed an algorithm to learn a
synchronous context-free grammar (SCFG) automatically using sentence-semantic frame
aligned corpus. The model used left to right Earley chart to map semantic frame to natural
language sentences. It re-ranked mapping results using language model in decoding. Lu & Ng
[60] proposed a SCFG based forest-tostring generation algorithm. Konstas & Lapata used a
bottom-up chart decoder to learn a PCFG from a phrasesemantic slots pairs which were
harvested from a sentencesemantic frame alignment corpus, and re-ranked generative trees
combining n-grams and dependency relations, then outputted a sentence with top rank leave
nodes.
The outputted sentences were grammatical in syntax based methods. But it was difficult to
obtain good syntaxes. Manual rules were expensive and domain dependent, while grammar
learning relied on a large number of aligned corpus. Limited by syntaxes, all above methods
could not deal with semantic frames which did not occur in train data. Sentences generated by
these methods were lack of diversity. Sequence based models took a sentence as a sequence
of words or phrases.
They predicted the generation probability of next word basing on words already generated.
To cover the semantic frame in generated sentence, the sequence model took the dialogue
acts into consider. So, the generation probability of nth word could be estimated by
pqðwnjw1; :::; wn1; DAÞ, where DA is current dialogue act given by semantic frame, q is
parameters for the probability function. Several neural network based models, especially
RNNs, were used to approximate the probability. Zhang & Lapata described a work on using
RNNs to generate Chinese poetry. Wen, Gasic, & Kim, et al. jointly trained a forwardRNN
generator, a CNN and an inversed RNN ranker, to generate natural sentences for specific
DA. Wen, Gasic, & Mrksic used a DA control gate for sentence planning, and a LSTM for
surface realization. Two parts are jointly trained to generate grammatical sentences and
semantically insistent with DA. Mei, Bansal, & Walter proposed an end-to-end,domain
independent neural encoder-aligner-decoder model to jointly model content selection,
sentence planning and surfacerealization..
A LSTM was firstly used for encoding all semantic slots, and then the salient semantic slots
were extracted by analignment model, finally the natural sentence was generated by a
decoder. Dusek & Jurcicek proposed an attention based LSTM to encode inputted DA and
words already generated, and then a LSTM decoder together with a logistic classifier were
used to generate other words in sequence. They demonstrated their model can achieve
comparable performance with other RNN based models with less training data.
50
Compared to syntax based models, sequence based models did not need fine grain level
alignment data for training. The flexibility of sequence based models on modeling dialogue
history, context and word selection brought a diversity of sentences generated. On the other
hand, because the generation process of sequence based models was not controlled by any
specific syntax, it was unavoidable for them to generate ungrammatical sentences. It was also
possible for them to lose or repeat some slots in DA.

Joint models for subtasks across NLU, DM and NLG

It is attractive to jointly model all subtasks of a dialogue system, from NLU, DM to NLG. It
is a real end-to-end model which receives user input and output a natural language sentence
for response. But for a goal-driven task, a response is not the only thing that a model should
give at each turn. A successful agent should keep and update a task related record through the
whole dialogue in order to make proper selection on dialogue actions. For example, in flight
booking, an agent should record information such as time, departure city, and so on in order
to implement the reservation operation correctly.
Such information should be kept and can be updated through the whole dialogue. There is no
full end-to-end model for goal-driven task currently. Most of previous end-to-end models
jointly modeled parts of the subtasks as above. While for a non-goal-driven task, it is not
necessary for a chatterbot to keep so much information, a response is often the only thing that
a model should give. Some full end-to-end models for response generation have been
proposed recently. Data-driven end-to-end response generation has been received many
attentions in recent years by borrowing ideas from some other research areas. Among
different models for
response generation, the model borrowed from machine translation was first proposed by
Ritter, Cherry, & Dolan .
They made use of phrased based models in traditional statistical machine translation for
response generation in social network. The experimental results showed statistical machine
translation based models outperformed information retrieval based models. Alone this way,
by utilizing the RNN language model, Sordoni, Galley, & Auli, et al. proposed a dynamic-
context generative model to address the problem of data sparsity arising when contextual
information was integrated into classic statistical models. Serban, Sordoni, & Bengio, et al.
extended hierarchical recurrent encoderdecoder neural network proposed in Ref. , which was
original suggested for improving query suggestion, to end-toend dialogue model. With the

51
recent advances of sequence-to-sequence models in machine translation, some sequence-to-
sequence models for non-goal-driven dialogue were also proposed. Shang, Lu, & Li.

Presented a RNN based encoder-decoder Neural Responding Machine with attention signal,
Vinyals & Le proposed a LSTM based sequence-to-sequence Conversational Model. is the
typical structure of sequence-to-sequence model. Li, Galley, & Brockett, et al. used
Maximum
Mutual Information (MMI) as objective function to measures the mutual dependence between
inputs and outputs. Experimental results showed MMI helped sequence-to-sequence models
to produces more diverse responses.
These approaches jointly modeled the process from sentence inputs to generation of
responses for non-goal-driven dialogues. They did not include semantic parsing and explicit
DM, and therefore could not be applied to goal-driven dialogues directly. Dodge, Gane, &
Zhang, et al. also considered the difficulty on evaluation of these models. They therefore
proposed a set of tasks, including question answering, recommendation, question
answeringþrecommendation and chatting, to test the ability of end-to-end dialogue systems. It
might be an interesting way to bridge non-goal-driven dialogues and goal-driven dialogues.
On the other hand, there are lots of advances in sequence-to-sequence machine translation
models, which might be borrowed to build more powerful goaldrivendialogues in the future.
It is clear that a full end-to-end goal-driven dialogue system should not only output a final
sentence to respond an input sentence, but also keep and update fruitful internal
representations or memories for dialogues. The internal memories can be either explicitly
extracted and represented, or validated by some external tasks such as question answering.

Usefulness evaluation
If any computer system is to be taken up by users and customers, it must be demonstrably
useful, so ÒusefulnessÓ is the first of the more qualitative evaluation criteria we look at. The
YPA "is a natural language dialogue system that allows users to retrieve information from
British Telecom’s Yellow pages" (Kruschwitz et al., 1999, 2000).
The yellow pages contain advertisements, with the advertiser name, and contact formation.
The YPA system returns addresses and if no address found, a conversation is started and the
system asks the user for more details in order to give a user the required address. The YPA is
composed of the Dialog Manager, the Natural Language Frontend, the Query Construction
Component, and the Backend database. The Backend includes a relational database that
contains tables extracted form the Yellow pages. The conversation starts by accepting user

52
input through a graphical user interface, then the Dialogue Manager sends textual input
through the Natural Language Frontend for parsing. If no addresses are found then the Dialog
Manager sends the textual input to the Natural Language Frontend for parsing.
After that, the parse tree is sent to the Query Construction Component, which translates the
input into a database query, to query the Backend Database and return the retrieved address.
If no addresses are found, then the Dialogue Manager starts putting more questions to the
user to obtain further clarification. To evaluate the YPA, 75 queries were extracted from a
query corpus, and a response sheet was prepared to see if the returned addresses were
appropriate or not, how many dialog steps were necessary, the total number of addresses
recalled and he number of those relevant to the original query. Results show that 62 out of 75
queries managed to return addresses, and 74% of those addresses were relevant to the original
query. In a similar manner, we evaluated the ÒusefulnessÓ of the responses generated by our
Qur’an chatbot. The Qur’an chatbot was developed using our chatbot-trainingprogram, where
the English/Arabic corpus of the holy book of Islam the Qur’an is used. The QurÕan text is
available via the Internet; and in principle the QurÕan provides guidance and answers to
religious and other questions. The resulting system accepts user input in English, and answers
with appropriate ayyas from the QurÕan in the English and Arabic languages.

Localizability
The localizability aspect of evaluation tries to identify how easy it is to adapt a natural
language dialogue system to new domain or language without affecting the way it works.
With this goal in mind, some dialogue systems have been designed to be retrainable to a new
domain via a domain corpus Inui et al., (2003) introduced a natural language dialogue system
based entirely on the use of corpora. The aim of this system is to be so general that it can be
trained with any corpus in any domain and language. The system is mainly composed of three
modules, the NL Parser, the Matcher, and the NL generator as displayed in Figure. The
inputted sentence is sent to the natural language (NL) parser to analyze the input using the N-
gram-based shallow parser (Inui et al., 2002). The matcher uses keyword matching and
structural matching to find the dialogue most similar to the current flow in the Dialogue
Corpus. The matcher uses the Context Data Base, in which each dialogue act is assigned an
intention from a list containing greet, question, explain, etc. In the keyword-based matcher,
the nouns and verbs identified by the NL parser are matched with the most similar nouns and
verbs from the Dialogue Corpus. Before confirming this match, the matcher checks the
intentions associated with those nouns and verbs in the Context Database. In the structural
matcher (Koiso et al., 2002), the similarity dialogue is figured out by calculated the structural

53
distance between two sentences. In this fully corpus-base approach, a user has the choice to
select which matcher to use. The NL generator generates the system’s responses and applies
the necessary exchange on.

NL Parenthesized corpus
Parser

matcher
Dialogue Corpus

NL
Context DB
genertor

54
Figure ; Corpus-Based Approach to Building a Natural Language Dialogue System

We built a generic Java program that reads a dialogue from a corpus and maps it to the AIML
format used by the ALICE chatbot to produce different versions of the chatbot, which were
evaluated using different techniques. Table 1 displays the corpora used to train our program.
After creating AIML files for the corpora types displayed in Table 1, the Pandorabot web-
hosting service1 was used to publish different versions of corpustrained chatbots to make
them available for use over the World Wide Web. Users were asked to chat with these
versions and provide their feedback. Based on user feedback and the retraining corpus, eight
system prototypes were generated to satisfy usersÕ expectations. The key issue in building
these prototypes was how to expand the knowledge learned from the corpus to increase the
chances of finding a match. The idea of matching is based on finding the best match, which is
the longest one. Since the input will not necessary match exactly a whole sentence extracted
from the corpus, other learning techniques were adopted.

Table of training corpus

Corpus language Content

Dialogue diversity corpus English A collection of spoken
English dialogue corpus
Corpus of spoken Afrikaans Afrikaans Transcripts of general
Afrikaans conversations
British national corpus English Spoken English transcripts
in different domains
Quran in Arabic Arabic Arabic monologue text
FAQ of the school of English input and Arabic Aligned English and Arabic
computing at Leeds English output monologue text
university
FAQ of different websites English Multiple FAQs online
QA obtained from health Arabic Questions and ans
websites

In each prototype, machine-learning techniques were used and a new chatbot was tested. The
machine learning techniques ranged from a primitive simple technique to more complicated
ones. Building atomic categories and comparing the input with all atomic patterns to find a
match is an instance based learning technique.

55
However, the learning approach does not stop at this level, but it improved the matching
process by using the first word, and the most significant words. This increases the ability of
finding a nearest match by extending the knowledge base which is used during the matching
process. Four dialog transcripts generated by our Afrikaans prototype were used to measure
efficiency of adopting learning techniques. The frequency of each type of matching (atomic,
first word, significant word, and no match) in each generated dialogue was estimated and the
absolute frequencies were normalised to relative probabilities as shown in Figure 4. The
results proved that the first word and the most significant approach increase the ability to
provide answersto users and to let conversation continue.

Humanness Evaluation
The humanness aspect of a chatbot is traditionally measured by the ability of the dialogue
system to fool users into believing that they are interacting with a real human, not a virtual
one. Colby (1975) used this strategy to evaluate his chatbot PARRY that simulates a paranoid
patient. A blind test was applied by three psychiatrists questioning both PARRY and three
other human patients diagnosed as paranoid. Psychiatrists were not able to distinguish
PARRY chatbot from human patients.

The same policy was adopted in the Loebner prize competition, which allowsusers to chat
with a conversational agent for 10 minutes: if this chatting gives the Matching types in the
Afrikaans Prototype impression to users that they are dealing with a human and not a
machine, that conversational agent succeeds in the competition. However, this is a somewhat
superficial and subjective measure: 10 minutes is not really enough to judge the humanness
of a system, and the judgement depends on subjective opinions of a few users. We adopt a
novel way to measure the humanness of a natural language dialogue system by comparing
dialogues generated by the system, against ÒrealÓ human dialogues. To do this, the Wmatrix
tool (Rayson 2003) was used to compare a dialogue transcript generated via chatting with
ALICE, and real conversations extracted from different dialogue corpuses.

The comparison illustrates the strengths and weaknesses of ALICE as a human simulation,
according to lthe inguistic features: lexical, part-of-speech, and semantic differences. The
semantic comparison illustrates that explicit speech act expressions are highly used within
ALICE, an attempt to reinforce the impression that there is a real dialogue; pronouns (e.g. he,
she, it, they) are used more in ALICE, to pretend personal knowledge and contact; discourse
verbs (e.g. I think, you know, I agree) are overused in ALICE, to simulate human trust and

56
opinions during the chat; liking expressions (e.g. love, like, enjoy) are overused in ALICE, to
give an impression of human feelings.

In terms of Part-of-Speech analysis shows that singular first-person pronoun (e.g.I), second-
person pronoun (e.g. you) and proper names (e.g. Alice) are used more in ALICE, to mark
participant roles more explicitly and hence reinforce the illusion that the conversation really
has two participants.

At lexical level, analysis results shows that ALICE transcripts made more use of specific
proper names ÒAliceÓ (not surprisingly!) and ÒEmilyÓ; and of Òyou_knowÓ, where the
underscore artificially creates a new single word from two real words. Table 2 illustrates the
lexical comparison between ALICE transcripts filerepresented in column ÒO1Ó, and the real
conversation file represented in column ÒO2Ó

Item 01 02 %2 LL

Do 44 3. 90 35 0.65 + 58.69

I 54 4.79 1.25 + 48.04

We 1 0.09 2.41 - 41.15

So 1 0.09 2.19 - 36.75

And 8 0.71 3.65 - 35.19

Emily 9 0.80 0.00 + 31.46

You 72 6.38 2.82 + 28.91

This 0 0.00 1.31 - 26.80

You_know 8 0. 1 0.02 + 22.06

Sorted by likelihood value

Another way of measuring naturalness of a dialogue system was introduced by Dybkjaer et

al., in 2004. They focussed on SLDs and claimed that users should talk to the system in an
easy smoothly way. In order to do that the authors listed some aspects which must be
considered in building dialogue systems: systemÕs output language should control userÕs
language so input becomes manageable for the system; output voice should be clear,

57
intelligible and does not need extra effort to listen; contents of the systemÕs output should be
correct and relevant to the topic; adequate feedback is essential for users to feel in control
during interaction; and the structure of the dialogue should must be natural and reflects
usersÕ intuitive expectations (Dybkjaer et al., 2004).

Recently et al., (2014, p. 1) discovered that "a chat bot that provides responses based on the
participant’s input dramatically increased the perceived humanness and engagement of the
conversational agent." In their experiment researchers created a chat bot that asked
participants to describe a series of images. The interaction was either static, in which the
participants answer the base questions, or dynamic, where there is a mfollow-up
question based on the participant's response. A survey was completed by each participant
after answering questions about all images, In order to measure humanness, a question about
chat partner was provided to see if it was a human or a computer, six option scale were used:
definitely human; probably human; not sure but guess human; not sure but guess computer;
probably computer;mand definitely computer. Results reveal that 79.2% of static interview
participants thought their partner was definitely a computer, while only 41.9% of those using
the dynamic chatbot thought the same.

Language Benefit Evaluation

The Language-benefit evaluation is gained by adopting a natural language dialogue rather
than a simpler keyword-based interface; again this can be done by direct comparison to an
alternative.

Happy Assistant is "a natural language dialog-based navigation system that help users access
e-commerce sites to find relevant information about products and services" (Chai et al.,
2001a). The system is composed of three main modules: the presentation manager (PM), the
dialog Manager (DM), and the Action Manager (AC). The presentation manager applies a
shallow parsing technique to identify semantic and syntactic information of interest from the
user textual input. Then it translates the users input into a well formed XML message called
the logical form. The dialogue manager is responsible for matching concepts from a
user's query to business rules found in the knowledge domain. The business rules
consist of a list of concepts together with some meta data about the target product or service.
If a match is found, the webpage associate with that rule is presented to the user. Otherwise,
the most important missing concept is figured out by introducing questions to the user.
Control is now turned over to the action manager, which accesses the product that

58
matched the query, and if the user provides special preferences, a sorting algorithm is applied
to yield a ranked list of products.

To make users trust the system, it must offer some explanation before producing a result, so
the system summarizes the userÕs request by paraphrasing it using the context history.
presents a sample of conversation with the Happy Assistant System taken from (Chai et
al.,2001a).

U: I am looking for a notebook for my consulting business

S: Please describe your financial constraints.

In this stage a list of most popular ones is displayed

U: not important thing performance is essential

S: are you looking for something that is top of the line?

U: Yes, absolutely

S: I have found something for you.

The target notebook is displayed for the user. And beneath it a summary of the users request
displayed to explain why this product is displayed. Example of interaction with the Happy
Assistant System

Usability in this system is evaluated based on a study that designed to explore how much the
system meet usersÕ expectations in terms of ease of use, system flow, validity of the system
response, and user vocabulary. (Chai et al., 2001b). The study compared the navigation
process in the dialog system against a menu driven system to find target products. Results
show that users preferred the dialog based search over the menu driven search (79% to 21%
of the users) for the following reasons: ease of use, meeting the usersÕ needs, users like the
idea that they could express their needs in their language without being restricted to menu
choices, users feel that the computer did all the work for them, and more over users found
that the system reduce the interaction time. However, novice users preferred the menu driven
system because there is no need for typing. In a similar manner, we used the comparative
evaluation to compare the results generated by Google with the results generated by the
FAQchat system. FAQchat is another version of the chatbot-training-program described in
Section 2, where the FAQ corpus of the School of Computing (SoC) at University of Leeds is
used to train the program. The results returned from FAQchat are similar to ones generated

59
by search engines such as Google, where the outcomes are links to exact or nearest match
web pages. An evaluation sheet was prepared which contains 15 information-seeking tasks or
questions on a range of different topics related to the FAQ database. The evaluation sheet was
distributed among 21 members: nine of the staff and the rest postgraduate students.

An interface was built,which has a box to accept the user input, and a button to send this to
the system. The outcomes appear in two columns: one holds the FAQchat answers, and the
other is holds the Google answers after filtering it to the FAQ database. Users were asked to
try using the system, and state whether they were able to find answers using the FAQchat
responses, or using the Google responses; and which of the two they preferred and why.
Results in Table 3 show that 68% overall of our sample of users managed to find answers
with the FAQchat while 46% found them with Google. Since there are

Users/Tool Mean of users finding answers Proportion of users finding answers

FAQchat Google FAQchat Google

Staff 5.53 3.87 61% 43%

Student 8.8 5.87 73% 49%

Overall 14.3 9.73 68% 46%

In terms of preferences, for each question, users were asked to state which tool they prefer to
use to find the answer. The proportion of users who preferred each tool was calculated.
Results in Figure 6, show that 51% of the staff, 41% of the students, and 47% overall
preferred using FAQchat against 11% who preferred using Google. As a result, we can claim
that our users found the chatbot a novel and interesting way to access the FAQ using natural
language questions.

In 2006, Quarteroni and Manandhar applied the same approach in using the chatbot as
interface for an open-domain QA system. They developed a noninteractive QA system called
YourQA (Quarteroni and Mandandhar, 2006), which accepts a textual natural language query
from a user, takes the top 20 Google results for this query, retrieves the corresponding Web
pages, analyses them to extract answers, ranks them by relevance to the question, and obtains
a list of answers in the form of an HTML result page In order to make their system an
interactive one, Quarteroni and Mandandhar (2007) used ALICE as an interface for their

60
system. First of all they used the Wizard-of-Oz (WOz) experiment (Dahlbaeck et al., 1993)
in which a human operator hides behind a computer interface to simulate the conversation
with the user who thought that he is chatting with a computer system. A set of AIML
categories were prepared and wizard role was to choose the appropriate category and
utterance within the available set as a response to the user.

In case there was o response, the wizard would create an utterance to keep the conversation
alive. The aim of this experiment is to obtain initial data (conversations logs) to enrich
knowledge base of ALICE chatbot. After this the authors build an interactive interface for
YourQA system that consists of an applet window with a left panel where chat takes place
and a right panel where results are visualised when ready. A sample of chatting with this
version is shown in Figure

System0: Welcome!

User1: Hi

S1: Hi! How can I help you?

U2: what is a thermometer?

S2: Possible answers are in the panel on the right. Are you happy with these

answers?

U3: Yes!

S3: Good!

U4: How does it measure the temperature?

S4: Do you mean how does a thermometer measure the temperature?

U5: Yes

S5: I found the answer in the panel on the right. Can I help you further

A sample of dialogue with YourQA chat version (taken from Quarteroni and Manandhar
2007)

Results

61
Simulations were carried out using three different knowledge distributions. The data
presented in Table 2 combines the results from all of those simulations. This data gives us
empirical validation of the effects of applying certain dialogue mechanisms in our underlying
model. For instance, we can see that both the Continuous Mode Algorithm and the use of
negotiation can greatly increase the efficiency of the joint problem-solving. Furthermore, the
use of Non-Obligatory Summaries greatly increased the efficacy of negotiation (but did not
significantly affect the Continuous Mode algorithm).

Demo and Full-Scale Systems Work is being done now to implement the Collaborative
Algorithm in a human-computer interactive environment. The previous stages of model
building, mathematical analysis, computational implementation and simulation have enabled
us to elaborate on our dialogue model before actually building our next generation human-
computer interactive system. Using the simulation results, we have some predictive power in
determining what dialogue mechanisms are useful for producing more efficient joint
problem-solving.

As one stage in the development of a humancomputer dialogue system, computer-computer

dialogue simulations are useful for testing theses concerning the underlying model. Asking
whether computer-computer dialogue simulations are relevant to building human-computer
dialogue systems is tantamount to asking whether building an underlying model is relevant.
Simulations provide one tool for analyzing a complex model.

However, since simulations often involve simplifications of the actual domain being
modeled, we must not attribute results from analyzing the simulations to the realworld being
modeled in the simulation. These simulations only give information about the model. But by
observing the resulting dialogues, we can ascertain whether the underlying model generates
interactions that have the target behaviors observed in human-human dialogues. We also can
apply statisticalanalysis to this generated corpus to furtherour understanding of the
computational model.

62
Chapter 5
Conclusions
This paper gave a brief survey on goal-driven humancomputer dialogue systems, including
two often used frames and some recent research work on each subtask of dialogue systems.
However, the major concern of the paper was joint models, which model multiple subtasks of
dialogues simultaneously.

We considered jointly modeling is one important trend of dialogue systems. In fact, there was
a rapid increase of work on joint models in recent years. We tried to survey most of the
related work in the paper, classified them according to which subtasks were taken into the
joint models. As we have seen,there were several different types of join models, such as flat
or hierarchical type. There were also several different extents of integration, including
integration of several subtasks inside NLG, DM or NLG, jointly modeling subtasks crossing
NLG and DM, and jointly modeling the process through NLU, DM and NLG.

Although the joint models are at their beginning, they have shown some advantages
compared with previous pipeline models. One significant advantage of joint models is that
they could model interaction relations between different subtasks in a single model to
improve the performance of the whole system.

Another practical advantage is that joint models might remove some middle representations
which were built manually before. It might reduce the subjective of human design and assign
a dialogue model more flexible to adapt different tasks in different domains. It is not so
strange that most of recent joint models were constructed by deep neural networks. Deep
neural networks provided some uniform structures and training ways for different subtasks.
Reinforcement learning was still the main tool for DM.

Although neural networks have long been used in reinforcement learning, it is the recent
combination of reinforcement learning and different deep neural networks that brought the
deep reinforcement learning, which has pushed the researches on joint models forward
greatly. Finally, there are lots of problems waiting for solutions in joint models. How to get
enough data for building a dialogue system?, How to train a joint model efficiently?, How to
adapt a joint model in one domain to another? and so on. Some of the problems have
theoretical interests, some of them have practical appeals.

63
64
References
 Allwood, J., Traum, D., & Jokinen, K. (2000). Cooperation, dialogue and ethics.
International Journal of Human–Computer Studies, 53, 871–914.
 Amalberti, R., Carbonnel, N., & Falzon, P. (1993). User representations of computer
systems in human–
 computer speech interaction. International Journal of Man–Machine Studies, 38, 547–
566.
 Anderson, J. R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum
Associates.
 Austin, J. (1970). Quand dire cest faire. Paris: Seuil.
 Baber, C., Mellor, B., Graham, R., Noyes, J. M., & Tunley, C. (1996). Workload and
the use of automatic
 speech recognition: The effects of time and resource demands. Speech
Communication, 20, 37–53.
 Bortfeld, H., Leon, S. D., Bloom, J. E., Schober, M. F., & Brennan, S. E. (2001).
Disfluency rates in
 conversation: Effects of age, relationship, topic, role, and gender. Language and
Speech, 44(2),
 123–147.
 Brennan, S. E. (1991). Conversation with and through computers. User Modeling and
User-Adapted
 Interaction, 1, 67–86.
 Brennan, S. E. (1998). The grounding problem in conversation with and through
computers. In S. R.
 Fussel & R. J. Kreuz (Eds.), Social and cognitive psychological approaches to
interpersonal
 communication (pp. 201–225). Hillsdale, NJ: Lawrence Erlbaum.
 Brennan, S. E., & Clark, H. H. (1996). Conceptual pacts and lexical choice in
conversation. Journal of
 Experimental Psychology: Learning, Memory, and Cognition, 22(6), 1482–1493.
 Brennan, S. E. & Ohaeri, J. O. (1999). Why do electronic conversations seem less
polite? The costs and

65
 benefits of hedging. In Proceedings of the international joint conference on work
activities, coordination,
 and aollaboration (WACC 99), pp. 227–235.
 Bubb-Lewis, C., & Scerbo, M. W. (2002). The effects of communication modes on
performance and
 discourse organization with an adaptive interface. Applied Ergonomics, 33, 15–26.
 Chafe, W. L. (1982). Integration and involvement in speaking, writing, and oral
literature. In D. Tannen
 (Ed.). Spoken and written language (Vol. IX, pp. 35–53). Norwood, NJ: ABLEX
Publishing
 Corporation.
 Clark, H. H. (1996). Using language. Cambridge: Cambridge University Press.
 Clark, H. H., & Brennan, S. E. (1991). Grounding in communication. In L. B.
Resnick, J. M. Levine, & S.
 D. Teasley (Eds.), Perspectives on socially shared cognition (pp. 127–149).
Washington D.C.: APA.
 Clark, H. H., & Schaefer, E. F. (1987). Collaborating on contributions to
conversations. Language and
 Cognitive Processes, 2, 1–23.
 Clark, H. H., & Schaefer, E. F. (1989). Contributing to discourse. Cognitive Science,
13, 259–294.
 Clark, H. H., & Wilkes-Gibbs, D. (1986). Referring as a collaborative process.
Cognition, 22, 1–39.
 L. Le Bigot et al. / Computers in Human Behavior 22 (2006) 467–500 497
 de Croock, M. B. M., van Merrie¨nboer, J. J. G., & Paas, F. G. W. C. (1998). High
versus low contextual
 interference in simulation-based training of troubleshooting skills: Effects on transfer
performance and
 invested mental effort. Computers in Human Behavior, 14(2), 249–267.
 DeVito, J. A. (1966). Psychogrammatical factors in oral and written discourse by
skilled communicators.
 Speech Monographs, 33, 73–76.

66
 DeVito, J. A. (1967). Levels of abstraction in spoken and written language. Journal of
Communication, 17,
 354–361.
 Drieman, G. H. J. (1962). Differences between written and spoken language – an
exploratory study –
 quantitative approach. Acta Psychologica, 20, 36–57.
 Fais, L. (1998). Lexical accommodation in human- and machine-interpreted
dialogues. International
 Journal of Human–Computer Studies, 48, 217–246.
 Falzon, P. (1991). Diagnosis dialogues: Modeling the interlocutors competence.
Applied Psychology: An
 International Review, 40(3), 327–349.
 Falzon, P., Amalberti, R., & Carbonnel, N. (1986). Dialogue control strategies in oral
communication. In
 K. Hopper & I. A. Newman (Eds.), Foundation for human–computer communication
(pp. 73–98).
 North-Holland: Elsevier Science Publishers.
 Ferrara, K., Brunner, H., & Whittemore, G. (1991). Interactive written discourse as an
emergent register.
 Written Communication, 8(1), 8–34.
 Fraisse, P. (1966). La Psychologie Expe´rimentale. Paris: Presse Universitaire de
France.
 Fraisse, P., & Breyton, M. (1959). Comparaisons entre les langages oral et e´crit.
Lanne´e Psychologique, 1,
 61–71.
 Fussell, S. R., & Krauss, R. M. (1992). Coordination of knowledge about in
communication: Effects of
 speakers assumptions about what others know. Journal of Personality and Social
Psychology, 62(3),
 378–391.
 Garrod, S., & Pickering, M. J. (2004). Why is conversation so easy? Trends in
Cognitive Sciences, 8(1),

67
 8–11.
 Green, R. F. (1955). Transfer of skill on a following tracking task as a function of task
difficulty (target
 size). The Journal of Psychology, 39, 355–370.
 Guthrie, J. T. (1988). Locating information in documents: Examination of a cognitive
model. Reading
 Research Quarterly, 23.
 Hancok, J. T., & Dunham, P. J. (2001). Language use in computer-mediated
communication: The role of
 coordination device. Discourse Processes, 31(1), 91–110.
 Hart, S. G., & Staveland, L. E. (1988). Development of a multi-dimensional workload
rating scale: Results
 of empirical and theoretical research. In P. A. Hancock & N. Meshkati (Eds.), Human
mental workload
 (pp. 139–183). Amsterdam, The Netherlands: Elsevier.
 Hauptmann, A. G., & Rudnicky, A. I. (1988). Talking to computers: An empirical
investigation.
 International Journal of Man–Machine Communication, 28, 583–604.
 Hidi, S. E., & Hildyard, A. (1983). The comparison of oral and written productions in
two discourse types.
 Discourse Processes, 6, 91–105.
 Isaacs, E., & Clark, H. H. (1987). References in conversation between experts and
novices. Journal of
 Experimental Psychology: General, 2(6), 26–37.
 James, H. E. O. (1929/1930). The transfer of training. British Journal of Psychology,
20,
 322–332.
 Johnstone, A., Berry, U., NGuyen, T., & Asper, A. (1994). There was a long pause:
Influencing turntaking
 behaviour in human–human and human–computer spoken dialogues. International
Journal of
 Human–Computer Studies, 41, 363–411.

68
 Kintsch, W. (1998). Comprehension: A paradigm for cognition. Cambridge, MA:
Cambridge University
 Press.
 Krauss, R., & Fussell, S. (1991). Perspective-taking in communication:
Representations of others
 knowledge in reference. Social Cognition, 9(1), 2–24.
 Leiser, R. G. (1989). Exploiting convergence to improve natural language
understanding. Interacting with
 Computers, 1(3), 284–298.
 Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge, MA:
The MIT Press.
 498 L. Le Bigot et al. / Computers in Human Behavior 22 (2006) 467–500
 Levelt, W. J. M., & Kelter, S. (1982). Surface form and memory in question
answering. Cognitive
 Psychology, 14, 78–106.
 Lovett, M. C., Reder, L. M., & Lebiere, C. (1999). Modeling working memory in a
unified architecture:
 An ACT-R perspective. In A. Miyake & P. Shah (Eds.), Models of working memory
(pp. 135–182).
 Cambridge, MA: Cambridge.
 Luchins, A. S. (1942). Mechanization in problem solving. Psychological Monographs,
54(248), 1–95.
 Mayer, R. E. (1997). Multimedia learning: Are we asking the right questions?
Educational Psychologist,
 32(1), 1–19.
 McKinlay, A., Procter, R., Masting, O., Woodburn, R., & Arnood, J. (1994). Studies
of turn-taking in
 computer mediated communications. Interacting with Computer, 6(2), 151–171.
 Moreno, R., & Mayer, R. E. (2002). Learning science in virtual reality multimedia
environments: Role of
 methods and media. Journal of Educational Psychology, 94, 598–610.

69
 Murray, I. R., Baber, C., & South, A. (1996a). Towards a definition and working
model of stress and its
 effects on speech. Speech Communication, 20, 3–12.
 Murray, A. C., Jones, D. M.,&Frankish, C. R. (1996b). Dialogue design in speech-
mediated data-entry: The
 role of syntactic constraintsandfeedback. International Journal ofHuman–Computer
Studies,45, 263–286.
 OHara, K. P., & Payne, S. J. (1998). The effects of operator implementation cost on
planfulness of
 problem solving and learning. Cognitive Psychology, 35, 34–70.
 OHara, K. P., & Payne, S. J. (1999). Planning and the user interface: The effects of
lockout time and error
 recovery cost. International Journal of Human–Computer Studies, 50, 41–59.
 Oviatt, S. L. (1995). Predicting spoken disfluencies during human–computer
interaction. Computer Speech
 and Language, 9, 19–35.
 Oviatt, S. L., Cohen, P. R., & Wang, M. (1994). Toward interface design for human
language technology:
 Modality and structure as determinants of linguistic complexity. Speech
Communication, 15, 283–300.
 Pierrel, J.-M. (1988). Le dialogue homme-machine en langage naturel. In Paper
presented at the Premie`res
 Journe´es Nationales du GRECO-PRC.
 Poole, M. E., & Field, T. W. (1976). A comparison of oral and written code
elaboration. Language and
 Speech, 19, 305–312.
 Prinzel, L. J., Pope, A. T., Freeman, F. G., Scerbo, M. W. & Mikulka, P. J. (2001).
Analysis of EEG and
 ERPs for psychophysiological adaptive task allocation: NASA (TM-2002-211016).
 Raymond, C., Be´chet, F., De Mori, R., Damnati, G. & Este`ve, Y. (2004). Automatic
learning of
 interpretations strategies for spoken dialogue systems. In Paper presented at the
international
70
 conference of acoustics, speech, and signal processing, Montreal, Quebec, Canada.
 Richards, M. A. & Underwood, K. M. (1985). How should people and computers
speak to each other? In
 Paper presented at the human–computer interaction – INTERACT84, London, 4–7
September.
 Sacks, H., Schegloff, E., & Jefferson, G. (1974). A simplest systematics for the
organization of turn-taking
 for conversation. Language, 50, 696–735.
 Sadek, D. (1999). Design considerations on dialogue systems: From theory to
technology – the case of
 Artimis. In Paper presented at the ESCA TR workshop on interactive dialogue for
multimodal systems
 (IDS99), Germany.
 Sadek, D., Bretier, P. & Panaget, F. (1997). Artimis: Natural dialogue meets rational
agency. In Paper
 presented at the 15th international joint conference on artificial intelligence
(IJCAI97), Nagoya, Japan.
 Sadek, D., Ferrieux, A., Cozannet, A., Bretier, P., Panaget, F. & Simonin, J. (1996).
Effective human–
 computer cooperative spoken dialogue: The AGS Demonstrator. In Paper presented at
the 4th
 international conference on spoken language processing (ICSLP96).
 Sato, N., Kamada, S., Miyake, S., Akatsu, J., Kumashiro, M., & Kume, Y. (1999).
Subjective mental
 workload in Type A women. International Journal of Industrial Ergonomics, 24, 331–
336.
 Scerbo, M. W. (2001). Stress, workload, and boredom in vigilance: A problem and an
answer. In P. A.
 Hancock & P. E. Desmond (Eds.), Stress, workload, and fatigue (pp. 267–278).
Mahwah, NJ: Erlbaum.
 Searle, J. (1969). Speech acts: An essay in the philosophy of language. Cambridge:
Cambridge University
 Press.

71
 Sweller, J. (1998). Cognitive load during problem solving: Effects on learning.
Cognitive Science, 12,
 257–285.
 L. Le Bigot et al. / Computers in Human Behavior 22 (2006) 467–500 499
 van Gerven, P. W. M., Paas, F. G. W. C., van Merrie¨nboer, J. J. G., & Schmidt, H. G.
(2002). Cognitive
 load theory and aging: Effect of worked examples on training efficiency. Learning
and Instruction, 12,
 87–105.
 van Merrie¨nboer, J. J. G., Schuurman, J. G., de Croock, M. B. M., & Paas, F. G. W.
C. (2002).
 Redirecting learners attention during training: Effects on cognitive load, transfer test,
performance and
 training efficiency. Learning and Instruction, 12, 11–37.
 Whittaker, S. (2003). Theories and methods in mediated communication. In A. C.
Graesser, M. A.
 Gernsbacher, & S. R. Goldman (Eds.), Handbook of discourse processes (pp. 243–
286). Mahwah, NJ:
 LEA.
 Zoltan-Ford, E. (1991). How to get people to say and type w

72
73

Laboratory Assignment Two
No ratings yet
Laboratory Assignment Two
22 pages
Independent Study Kaul Final
No ratings yet
Independent Study Kaul Final
7 pages
Speak Sign
No ratings yet
Speak Sign
22 pages
Analysis of Spoken Dialog Systems
No ratings yet
Analysis of Spoken Dialog Systems
23 pages
Main Project Report
No ratings yet
Main Project Report
43 pages
Sign Language Tech for B.Tech Students
No ratings yet
Sign Language Tech for B.Tech Students
67 pages
Project Report PDF
No ratings yet
Project Report PDF
41 pages
Major Project SRS
No ratings yet
Major Project SRS
33 pages
Assistive System for Disabled Communication
No ratings yet
Assistive System for Disabled Communication
31 pages
Sign Language Detection
No ratings yet
Sign Language Detection
38 pages
Microsoft Dialogue Challenge: Building End-To-End Task-Completion Dialogue Systems
No ratings yet
Microsoft Dialogue Challenge: Building End-To-End Task-Completion Dialogue Systems
5 pages
Operating Systems Lab Manual
No ratings yet
Operating Systems Lab Manual
106 pages
A Mini Project Report On: Submitted in Partial Fulfillment of The Requirements For The Award of
No ratings yet
A Mini Project Report On: Submitted in Partial Fulfillment of The Requirements For The Award of
62 pages
Thesis (Demo)
No ratings yet
Thesis (Demo)
32 pages
Voice Based System Assistant Using NLP and Deep Learning-1
No ratings yet
Voice Based System Assistant Using NLP and Deep Learning-1
82 pages
CS3351 Digital Principles Lab Manual
No ratings yet
CS3351 Digital Principles Lab Manual
76 pages
Representing Game Dialogue As Expressions in First Order Logic
No ratings yet
Representing Game Dialogue As Expressions in First Order Logic
96 pages
CALD Example Questions
No ratings yet
CALD Example Questions
4 pages
FYProject Template (PROPOSAL)
No ratings yet
FYProject Template (PROPOSAL)
22 pages
FYP Libarary Cataloge-3
No ratings yet
FYP Libarary Cataloge-3
150 pages
Interactive Medical Assistant Project
No ratings yet
Interactive Medical Assistant Project
8 pages
THCB
No ratings yet
THCB
20 pages
Digital Principles Lab Record 2023-24
No ratings yet
Digital Principles Lab Record 2023-24
61 pages
DLD EXP 1 and 2
No ratings yet
DLD EXP 1 and 2
20 pages
Adeel Report
No ratings yet
Adeel Report
45 pages
Voice Based System Assistant Using NLP and Deep Learning
No ratings yet
Voice Based System Assistant Using NLP and Deep Learning
63 pages
Final Project Report: AI Based Smart Teaching Assistant For Personalized Exam Preparation
No ratings yet
Final Project Report: AI Based Smart Teaching Assistant For Personalized Exam Preparation
16 pages
Sem 4
No ratings yet
Sem 4
2 pages
LAW SUIT Documentation 13 June
No ratings yet
LAW SUIT Documentation 13 June
60 pages
REPORTVOICE
No ratings yet
REPORTVOICE
26 pages
Voice Operated Comapanion Report - New2
No ratings yet
Voice Operated Comapanion Report - New2
19 pages
Finaldtm Report
No ratings yet
Finaldtm Report
8 pages
18CSL66 - SS Lab Manual
No ratings yet
18CSL66 - SS Lab Manual
83 pages
Human Computer Interaction
No ratings yet
Human Computer Interaction
1 page
TJAS
No ratings yet
TJAS
43 pages
Communication Aid For The Deaf
No ratings yet
Communication Aid For The Deaf
54 pages
Srs - 1 Final - 2
No ratings yet
Srs - 1 Final - 2
33 pages
Voice Programming AI Project Report
No ratings yet
Voice Programming AI Project Report
20 pages
Automatic: Question Paper Generator
No ratings yet
Automatic: Question Paper Generator
32 pages
Smart Assistant for Hearing Impaired
No ratings yet
Smart Assistant for Hearing Impaired
56 pages
Mini Project - Documentation
No ratings yet
Mini Project - Documentation
40 pages
Computational Linguistics of Human-Robot Interaction
No ratings yet
Computational Linguistics of Human-Robot Interaction
1 page
Bca Project
No ratings yet
Bca Project
63 pages
ATHENA Project Report III
No ratings yet
ATHENA Project Report III
11 pages
Seminar
No ratings yet
Seminar
2 pages
Cs3351 Digital Principles and Computer Organization Lab Record PDF
100% (2)
Cs3351 Digital Principles and Computer Organization Lab Record PDF
45 pages
Formatting and Referencing by Using The IEEE Style
No ratings yet
Formatting and Referencing by Using The IEEE Style
33 pages
English I, Subjective Question Model
No ratings yet
English I, Subjective Question Model
2 pages
Speech Recognition System Project 2023
No ratings yet
Speech Recognition System Project 2023
13 pages
DSD Student Manual AR20
No ratings yet
DSD Student Manual AR20
52 pages
Operating Systems Lab Manual (2023-2024)
No ratings yet
Operating Systems Lab Manual (2023-2024)
104 pages
Software Testing Methodologies Lab
No ratings yet
Software Testing Methodologies Lab
155 pages
Hear My Door
No ratings yet
Hear My Door
58 pages
Sign Language Detection Project Report
No ratings yet
Sign Language Detection Project Report
35 pages
18csl66 - Ss Lab Manual
50% (2)
18csl66 - Ss Lab Manual
116 pages
ECEN 203 Lec0 Course Overview
No ratings yet
ECEN 203 Lec0 Course Overview
11 pages
Sohaib Project Report
No ratings yet
Sohaib Project Report
31 pages
Index
No ratings yet
Index
6 pages
Wa0020.
No ratings yet
Wa0020.
58 pages
IoT & M2M in Healthcare & Industry
No ratings yet
IoT & M2M in Healthcare & Industry
8 pages
Tepu
No ratings yet
Tepu
38 pages
Rameesha
No ratings yet
Rameesha
53 pages
AI in Robotics: Implications Analysis
No ratings yet
AI in Robotics: Implications Analysis
84 pages
VANET Security: Attack Detection & Classification
No ratings yet
VANET Security: Attack Detection & Classification
50 pages
Blockchain's Banking Revolution
No ratings yet
Blockchain's Banking Revolution
75 pages
IoT and M2M in Healthcare & Industry
No ratings yet
IoT and M2M in Healthcare & Industry
68 pages
Thesis Document - Plag Free
No ratings yet
Thesis Document - Plag Free
80 pages
Assignment No 1 Subject: Principle of Accounting Class: Bs-Semester 1 Instructor: - Yasir Ali Rana Submission Date:06-December - 2020
No ratings yet
Assignment No 1 Subject: Principle of Accounting Class: Bs-Semester 1 Instructor: - Yasir Ali Rana Submission Date:06-December - 2020
2 pages
Plagiarism Report Abid Hussain
No ratings yet
Plagiarism Report Abid Hussain
96 pages
Thesis Document
No ratings yet
Thesis Document
76 pages
Pleurotus Yield on Wheat Straw
No ratings yet
Pleurotus Yield on Wheat Straw
6 pages
Abid Hussain..... Start Pges of Thesis
No ratings yet
Abid Hussain..... Start Pges of Thesis
8 pages
Resources 1
No ratings yet
Resources 1
16 pages
Power Supply
100% (1)
Power Supply
91 pages
Ict Final Notes
No ratings yet
Ict Final Notes
9 pages
Doble-Tdr9000 Datasheet
No ratings yet
Doble-Tdr9000 Datasheet
4 pages
Assignment MCA 103
No ratings yet
Assignment MCA 103
4 pages
Introduction To SAP BASIS Administration
No ratings yet
Introduction To SAP BASIS Administration
6 pages
6.an AUV Vision System For Target Detection and Precise Positioning
No ratings yet
6.an AUV Vision System For Target Detection and Precise Positioning
8 pages
Python and Django The Fastest Growing Web Development Technology
No ratings yet
Python and Django The Fastest Growing Web Development Technology
9 pages
Micro and Macro Tasks in Node Js
No ratings yet
Micro and Macro Tasks in Node Js
11 pages
2024 AwardofExcellenceWinners List
No ratings yet
2024 AwardofExcellenceWinners List
30 pages
Syl Pgdca 2023
No ratings yet
Syl Pgdca 2023
32 pages
Contact Center Services (CCS)
No ratings yet
Contact Center Services (CCS)
33 pages
Hardware and Networking Service Level - Ii Based On March 2022, Curriculum Version 1
100% (1)
Hardware and Networking Service Level - Ii Based On March 2022, Curriculum Version 1
65 pages
Formal Letter Format Asking For Information
100% (2)
Formal Letter Format Asking For Information
5 pages
Fortinet Transceivers
No ratings yet
Fortinet Transceivers
15 pages
AppFactory Backend Worker Architecture
No ratings yet
AppFactory Backend Worker Architecture
8 pages
SAP SNC (Supply Network Collab With SNC 7.0)
0% (1)
SAP SNC (Supply Network Collab With SNC 7.0)
26 pages
ZBrush 2021 Whats New Part2
No ratings yet
ZBrush 2021 Whats New Part2
16 pages
Tesla's Unboxed Manufacturing Process
No ratings yet
Tesla's Unboxed Manufacturing Process
4 pages
Lab4 Normalization
No ratings yet
Lab4 Normalization
2 pages
Tafl Last Min Notes
No ratings yet
Tafl Last Min Notes
19 pages
Ict Marking Scheme Notes
100% (1)
Ict Marking Scheme Notes
62 pages
School Based INSET Completion Report
No ratings yet
School Based INSET Completion Report
19 pages
Types of Hacking
No ratings yet
Types of Hacking
2 pages
PowerEdge+XE7740 1
No ratings yet
PowerEdge+XE7740 1
13 pages
Info Interns 2022
No ratings yet
Info Interns 2022
2 pages
Decision Trees & Neural Networks
No ratings yet
Decision Trees & Neural Networks
10 pages
Ideapad 500 15 Spec 2
No ratings yet
Ideapad 500 15 Spec 2
2 pages
PCI Adapter Installation Guide
No ratings yet
PCI Adapter Installation Guide
2 pages
CS2 CH1-Introduction To 8085 Microprocessor
No ratings yet
CS2 CH1-Introduction To 8085 Microprocessor
61 pages