
Memory & Cognition

1996, 24 (2), 136-143

A template-matching pandemonium recognizes unconstrained handwritten characters with high accuracy

AXEL LARSEN and CLAUS BUNDESEN
University of Copenhagen, Copenhagen, Denmark

Psychological data suggest that internal representations such as mental images can be used as tem-
plates in visual pattern recognition. But computational studies suggest that traditional template
matching is insufficient for high-accuracy recognition of real-life patterns such as handwritten char-
acters. Here we explore a model for visual pattern recognition that combines a template-matching
and a feature-analysis approach: Character classification is based on weighted evidence from a num-
ber of analyzers (demons), each of which computes the degree of match between the input charac-
ter and a stored template (a copy of a previously presented character). The template-matching pan-
demonium was trained to recognize totally unconstrained handwritten digits. With a mean of 37
templates per type of digit, the system has attained a recognition rate of 95.3%, which falls short of
human performance by only 2%-3%.

This research was conducted at the University of Copenhagen and supported by grants from the International Human Frontier Science Program Organization and the Danish Ministry of Education and Research. Much of the work was presented at the Sixth Conference of the European Society for Cognitive Psychology in Elsinore, Denmark, September 11-15, 1993 (see Larsen & Bundesen, 1993). Thanks are due to G. Loftus, P. Dixon, and two anonymous reviewers for constructive comments on an earlier draft of this article. Correspondence should be addressed to A. Larsen, Center for Visual Cognition, Department of Psychology, University of Copenhagen, Njalsgade 90, DK-2300 Copenhagen S, Denmark (e-mail: axel@[Link]).

Copyright 1996 Psychonomic Society, Inc.

Analysis of an input pattern by template matching consists in superimposing a stored pattern (template) on the input and determining the degree of match (overlap or correlation) between the input and the template. In traditional recognition by template matching, the input is first compared against a number of templates and then classified as a member of the same category as the best matching template. To compensate for irrelevant variations in the spatial position, size, and orientation of the input, the input pattern and the template can be aligned by being shifted, size-scaled, and rotated before their degree of match is determined.

Psychological studies suggest that simple visual patterns such as letters or digits can be recognized by use of internal representations as holistic templates. Mental images form one type of representations that seem to be used as templates. They can be transformed, and transformation of mental images appears to be one way of achieving recognition regardless of stimulus position, size (see Bundesen & Larsen, 1975; Jolicoeur & Besner, 1987; Larsen & Bundesen, 1978), and orientation (see Shepard & Cooper, 1982; Shepard & Metzler, 1971; also see Bundesen, Larsen, & Farrell, 1981; Larsen, 1985).

Most cases of visual recognition are presumably based on comparing input patterns against long-term memory representations instead of short-term representations such as mental images. The nature of visual long-term memory representations is controversial, but simplicity favors the view that visual representations in long-term memory are similar in format to visual representations in short-term memory (template-like mental images).

Suggestive empirical evidence that visual long-term representations may be used as templates in recognition of simple visual patterns has come from two sources. First, a number of reaction time studies have shown systematic decrements in recognition speed and accuracy caused by irrelevant variations in visual size (see Cave & Kosslyn, 1989; Larsen & Bundesen, 1978) and orientation (see Cooper, 1975; Jolicoeur, 1985, 1990; Jolicoeur & Landau, 1984). The results support the notion that "visual pattern recognition is based on position-wise comparison of stimulus patterns with memory representations" (Larsen & Bundesen, 1978, p. 19), and template matching is the most elementary way of making position-wise comparisons (also see Ullman, 1989). Second, template-matching models have yielded good fits to observed visual-confusion matrices (see Gervais, Harvey, & Roberts, 1984; Holbrook, 1975) and excellent fits to observed variations in legibility across character sets (see Loomis, 1990).

The computational efficiency of template matching has been seriously questioned. For several decades, common wisdom has held that template matching is insufficient for recognition of unconstrained real-life patterns (see, e.g., Eysenck & Keane, 1990; Hummel & Biederman, 1992; Humphreys & Bruce, 1989; Lindsay & Norman, 1972; Neisser, 1967; Reed, 1973). Recognition of unconstrained handwritten characters is probably the most
frequent textbook example of a task on which template matching should fail.

We recently tested the efficiency of traditional template matching in machine recognition of totally unconstrained handwritten digits (Larsen & Bundesen, 1992). Our learning and recognition algorithm was simple; no previous knowledge concerning handwritten digits was presupposed, and preprocessing was limited to Gaussian smoothing and normalization with respect to position, size, and orientation. For patterns presented in a known orientation, recognition rates were 69%, 77%, and 89%, respectively, when about 5, 10, and 60 templates had been learned for each type of digit. For patterns presented in unknown orientations, recognition rates were slightly lower. High levels of reliability could be attained by omitting classifications based on weak evidence. However, at the end of training, the effect of further increase in the number of templates was extremely small, and recognition rates substantially higher than 90% seemed practically impossible to obtain. For comparison, human subjects tested with a random sample of the handwritten digits (presented in upright orientation) achieved a mean recognition rate of 97%.

In traditional recognition by template matching, the classification of an input depends solely on the type of the best matching template. This is wasteful of information. The method fails to utilize the diagnostic power of templates for any given type of digit in discriminating between digits of other types. For example, the degree of match with a template for a digit of Type 0 yields information about the likelihood that the input belongs to Type 6 rather than Type 7, but this information is not utilized. When the template-matching approach is combined with a feature-analysis (pandemonium; Selfridge, 1959; Selfridge & Neisser, 1960) approach to recognition, the degree of match with any given template may be treated as a particular feature of the input, and the feature may be used as positive or negative evidence for any classification. We explored such a system. It encodes new patterns as templates, uses the templates for feature analysis, strengthens the role of useful templates, and weakens the role of useless ones.

MODEL

Our pandemonium model of human character recognition contains a number of feature demons (analyzers), each of which stores a particular template. The template is a copy of a previously presented character. When a character is presented to the system for recognition, the character is first normalized in spatial position and size and smoothed by convolution with a circularly symmetric two-dimensional Gaussian filter. Next, each feature demon determines the degree of match between the character and the template stored with the demon. The degree of match is a measure of the maximum correlation that can be found between the two patterns by permitting some displacement between their centroids.

Above the level of feature demons is a number of cognitive demons, one for each type of character. Each cognitive demon is connected to every feature demon, and the net input to a cognitive demon is a weighted sum of the degrees of match determined by the feature demons. The activation of a cognitive demon increases with the net input to the unit.

On top of the processing hierarchy is a decision demon. It classifies the input character as belonging to the type that corresponds to the cognitive demon with the highest activation.

SIMULATIONS

General Method
Input patterns. The input patterns presented to the pandemonium consisted of 6,000 totally unconstrained handwritten digits (600 tokens of each of the 10 types of digit). The digits were taken from zip codes collected by the U.S. Postal Service from dead letter envelopes (see Larsen & Bundesen, 1992). The number of writers is not known, but can be assumed to approach the actual number of samples. The material was provided in digitized and binarized form. Typical samples are shown in Figure 1.

Figure 1. Examples of input patterns.

Recognition algorithm. When an input pattern was presented to the system, it was processed as follows.
1. The centroid ("center of gravity") of the character (defined as a figure consisting of pixels with a value of 1 on a ground of pixels with a value of 0) was found.
2. The size of the character was determined as the greatest distance from the centroid to any pixel that belonged to the character (i.e., any pixel with a value of 1).
3. The input pattern was normalized with respect to the position and the size of the character. Effectively, an object-centered Cartesian xy coordinate system was imposed on the input pattern so that the origin of the coordinate system coincided with the centroid of the character and the units of length along the x and y axes equaled the size of the character.1
4. The normalized input pattern was smoothed by convolution with a two-dimensional Gaussian filter,

$$G(x, y) = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right),$$

with standard deviation σ.
5. The smoothed input pattern was compared with every template stored in memory. When the input was compared with a particular template, the two patterns were first aligned so that their centroid pixels coincided. Then the template was shifted relative to the input by a certain number of pixels along the x axis and a certain number of pixels along the y axis. All shifts of up to ±0.2 units of length (20% of the size of a character) along the x axis and ±0.2 units along the y axis were tested. For each shift, the product moment correlation between the two patterns was determined. Letting r be the highest product moment correlation obtained between the two patterns, the degree of match between the input and the template was defined as r7 (see Larsen & Bundesen, 1992).2 The degree of match was stored as the level of activation of a unit (template node) associated with the template.
6. The input was classified as a digit of Type 0, 1, ..., or 9. Each digit type was represented by a unit (class node), the net input of which was a weighted sum of the degrees of match computed at Step 5 (the levels of activation of all template nodes). The activation of the class node equaled the hyperbolic tangent of its net input. The input pattern was classified as a token of the digit type represented by the class node with the highest activation.

Implementation. The algorithm was written in C. It was executed on a computer system consisting of a Digital Equipment Corporation Micro-VAX 2 and a DEC-station 3100.

The input characters varied widely in size, about 7 × 12 pixels up to 53 × 53 pixels. After normalization, each character was represented in a format such that the greatest distance from the centroid (i.e., the centroid pixel) to any pixel that belonged to the character equaled 15 pixels (1 unit of length). Distances between pixels were measured from center to center.

Whereas the normalized character was represented as a figure of pixels with a value of 1 on a ground of pixels with a value of 0, the Gaussian filter coefficients were quantized on a 7-bit scale so that the sum of the coefficients was within the range of the scale. The standard deviation σ of the Gaussian filter equaled 1.5 pixels, and the greatest distance from the center of the Gaussian filter to a pixel at which the quantized filter coefficient was different from 0 equaled 3 pixels.

The smoothed input pattern had a value of 0 at any pixel farther than 18 pixels from the centroid. Each product moment correlation between the smoothed input pattern and a template stored in memory was made by computing the Pearson product moment correlation coefficient between the two patterns across all pixels located at or within a distance of 18 pixels from the centroid of one or the other pattern.

Experiments
The recognition system was trained on a fixed subset of the input patterns. The subset (training set) consisted of 4,000 digits (400 randomly selected tokens of each

[Figure 2: recognition rate (%) plotted against training passes, with separate curves for the training and test sets.]
Figure 2. Mean recognition rates on training (open circles) and test sets (filled circles)
as functions of the number of passes through the training set. (Results are based on five
independent simulations.)
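
As a rough illustration, Steps 1-5 of the recognition algorithm described under General Method can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' C implementation: the function names, the sparse dictionary representation of a pattern, and the nearest-pixel rescaling in `normalize` are all hypothetical, and the 7-bit quantization of the filter coefficients is omitted. The sketch stops at the highest correlation r; the degree of match is then derived from r as defined in Step 5. The defaults follow values stated in the text: 15 pixels per unit of length, σ = 1.5 pixels, a filter reach of 3 pixels, a radius of 18 pixels, and shifts of up to 0.2 × 15 = 3 pixels.

```python
import math

def foreground(img):
    """Coordinates (x, y) of all pixels with value 1 in a binary image (list of rows)."""
    return [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]

def centroid(img):
    """Step 1: center of gravity of the character pixels."""
    pts = foreground(img)
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))

def char_size(img):
    """Step 2: greatest distance from the centroid to any character pixel."""
    cx, cy = centroid(img)
    return max(math.hypot(x - cx, y - cy) for x, y in foreground(img))

def normalize(img, unit=15):
    """Step 3: centroid-centered coordinates, scaled so the character size is `unit` pixels.
    Assumption: nearest-pixel forward mapping into a sparse {(x, y): value} pattern."""
    cx, cy = centroid(img)
    s = char_size(img) or 1.0  # guard against a single-pixel character
    return {(round((x - cx) / s * unit), round((y - cy) / s * unit)): 1.0
            for x, y in foreground(img)}

def smooth(pattern, sigma=1.5, reach=3, radius=18):
    """Step 4: convolution with G(x, y), truncated `reach` pixels from the filter
    center and set to 0 farther than `radius` pixels from the centroid."""
    out = {}
    for (px, py), v in pattern.items():
        for dx in range(-reach, reach + 1):
            for dy in range(-reach, reach + 1):
                d2 = dx * dx + dy * dy
                x, y = px + dx, py + dy
                if d2 <= reach * reach and x * x + y * y <= radius * radius:
                    g = math.exp(-d2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
                    out[(x, y)] = out.get((x, y), 0.0) + v * g
    return out

def pearson(a, b, b_center, radius=18):
    """Pearson r over all pixels within `radius` of either centroid: (0, 0) or b_center."""
    bx, by = b_center
    pts = [(x, y)
           for x in range(-radius + min(0, bx), radius + max(0, bx) + 1)
           for y in range(-radius + min(0, by), radius + max(0, by) + 1)
           if x * x + y * y <= radius ** 2 or (x - bx) ** 2 + (y - by) ** 2 <= radius ** 2]
    va = [a.get(p, 0.0) for p in pts]
    vb = [b.get(p, 0.0) for p in pts]
    ma, mb = sum(va) / len(va), sum(vb) / len(vb)
    cov = sum((p - ma) * (q - mb) for p, q in zip(va, vb))
    var_a = sum((p - ma) ** 2 for p in va)
    var_b = sum((q - mb) ** 2 for q in vb)
    return cov / math.sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0

def best_correlation(inp, tmpl, max_shift=3):
    """Step 5: highest r over all template shifts of up to +/- max_shift pixels in x and y."""
    best = -1.0
    for sx in range(-max_shift, max_shift + 1):
        for sy in range(-max_shift, max_shift + 1):
            shifted = {(x + sx, y + sy): v for (x, y), v in tmpl.items()}
            best = max(best, pearson(inp, shifted, (sx, sy)))
    return best

def preprocess(img):
    """Steps 1-4 applied to a binary image."""
    return smooth(normalize(img))
```

Because the pattern is re-centered on its centroid and rescaled to a fixed size before matching, two copies of the same character drawn at different positions in the image yield the same normalized pattern, so `best_correlation(preprocess(a), preprocess(b))` is close to 1 for them.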

type). In each pass through the training set, the 4,000 digits were presented one at a time in random order.

At the beginning of training, only one template was stored in memory. This template was a smoothed normalized version of a randomly selected member of the training set. The weight on the connection from the template node to the class node that represented the character type of the template (the correct classification) was 1. Weights on connections to the other nine class nodes were -0.01.

During the first pass through the training set, learning occurred whenever an input was incorrectly classified. In this case, the smoothed normalized version of the incorrectly classified input was added to the set of templates stored in memory. The corresponding template node was connected to the correct class node with a weight of 1 and to the other nine class nodes with weights of -0.01. After the first pass, about 370 templates had been stored (mean of five independent simulations).

During later passes through the training set, no new templates were acquired, but weights on connections from template to class nodes were adjusted. After each presentation of a training pattern, weight adjustments were made by the delta rule (see Stone, 1986; Sutton & Barto, 1981; see also Donegan, Gluck, & Thompson, 1989). Specifically, the change in weight on the connection from template node i to the class node for digit type j equaled

$$0.025\, d_i\, (t - a_j).$$

Here, $d_i$ is the degree of match signaled by the template node, and $a_j$ is the activation of the class node. If the training pattern was of type j, then constant t = 0.7, else t = -0.7.

After each pass through the training set, performance was measured on a separate test set. The test set consisted of 2,000 digits (200 tokens of each type). No learning occurred during passes through the test set.

Results based on five independent simulations are shown in Figure 2. Before weights were adjusted (i.e., after the first pass through the training set), the recognition rate averaged 97.7% on the training set and 93.1% on the test set. At the end of training, the recognition rates on the training and test sets were 99.4% and 95.3%, respectively. For both sets, convergence to the higher level of performance was fast.

Figure 3 shows distributions of weights on connections from template to class nodes at the end of training (means of the five simulations). Let an intrinsic connection be a connection from a node for a template of a certain type of digit to the class node for this type of digit, and let an extrinsic connection be a connection from a node for a template of a certain type of digit to a class node for a different type of digit. The upper panel of Figure 3 shows the distribution of weights on intrinsic connections, and the lower panel shows the distribution of weights on extrinsic connections. The distribution of weights on intrinsic connections had a mean of 1.5 and a standard deviation of 0.7, whereas the distribution of weights on extrinsic connections had a mean of -0.3 and a standard deviation of 0.5. Thus, as would be expected, almost all of the weights on intrinsic connections were positive. Most weights on extrinsic connections were negative, but a substantial proportion (21%) were positive. Many templates had heavy positive weights on their intrinsic connections, light positive weights on some of their extrinsic connections, and negative weights on the remaining ones.

Figure 3. Distributions of weights on connections from template to class nodes at the end of training. Upper panel: Weights on intrinsic connections. Lower panel: Weights on extrinsic connections. Results are based on five independent simulations.

The reliability of the recognition responses was defined as the relative frequency of correct responses among all responses. At the expense of getting omission errors (rejections), reliability of recognition responses could be increased by omitting responses that would have been based on weak evidence. During passes through the test set, we investigated the effect of rejecting a test pattern (omitting response to the pattern) if the difference in activation between the two most active class nodes was below a certain threshold. The rejection rate was varied by changing the threshold. The results are shown in Figure 4. As illustrated, reliabilities of 97.0% and 99.0% were
attained on the test set with rejection rates of 3.5% and 14.1%, respectively.

The number of templates stored in memory could be substantially reduced with only modest effects on performance. At the end of training, the strength of a template was calculated as the sum of the squares of the weights on the connections from the template's node to the class nodes. The weakest templates and their nodes were removed, the remaining weights were readjusted by further practice on the training set, and the recognition rate was measured on the test set. The curve shown in Figure 5 was generated by repeating this process many times. As can be seen, a recognition rate of 95% could be attained with a total of 278 templates or about 28 templates per type of digit. With a mean of 8 templates per type of digit, the recognition rate (89%) equaled the rate we have reported for traditional recognition by template matching with about 60 templates per type of digit (see Larsen & Bundesen, 1992).
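
The rejection rule and the pruning of weak templates can be illustrated with a short Python sketch. It is only a sketch under assumptions: the function names are hypothetical, `n_remove` (how many templates are dropped per pruning round) is not specified in the text, and the readjustment of the remaining weights by further practice on the training set is not shown.

```python
def classify_or_reject(activations, threshold):
    """Index of the most active class node, or None (rejection) when its lead over
    the second most active class node is below `threshold`."""
    ranked = sorted(range(len(activations)), key=lambda j: activations[j], reverse=True)
    best, second = ranked[0], ranked[1]
    if activations[best] - activations[second] < threshold:
        return None
    return best

def reliability_and_rejection_rate(all_activations, labels, threshold):
    """Reliability = correct responses / all responses given; rejected patterns
    count toward the rejection rate only."""
    responses = [(classify_or_reject(a, threshold), y)
                 for a, y in zip(all_activations, labels)]
    answered = [(r, y) for r, y in responses if r is not None]
    rejection_rate = 1 - len(answered) / len(responses)
    reliability = (sum(r == y for r, y in answered) / len(answered)
                   if answered else float("nan"))
    return reliability, rejection_rate

def template_strength(weights_row):
    """Sum of squared weights from one template node to all class nodes."""
    return sum(w * w for w in weights_row)

def prune_weakest(weights, n_remove):
    """Indices of the templates kept after dropping the `n_remove` weakest ones.
    weights[i][j] is the weight from template node i to class node j."""
    order = sorted(range(len(weights)), key=lambda i: template_strength(weights[i]))
    return sorted(order[n_remove:])
```

Raising `threshold` trades a higher rejection rate for higher reliability, which is the trade-off plotted in Figure 4; calling `prune_weakest` repeatedly, with retraining in between, corresponds to moving leftward along the curve in Figure 5.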

Figure 4. Reliability of recognition responses as a function of rejection rate. (The reliability is the relative frequency of correct responses among all responses. A test pattern was rejected [response was omitted] if the difference in activation between the two most active class nodes was below a certain threshold. The rejection rate was varied by changing the threshold. Results are based on five independent simulations.)

Figure 5. Recognition rate as a function of the mean number of templates per type of digit.

DISCUSSION

Computational Efficiency
The recognition rate attained by the template-matching pandemonium model of human character recognition is remarkable. It falls short of human performance by only 2%-3% (see Larsen & Bundesen, 1992). Despite the extreme simplicity of the model, it appears to perform as well as the most complex and successful machine algorithms designed specifically for recognition of handwritten digits (see, e.g., Lam & Suen, 1988; LeCun et al., 1989; Suen, 1990).

It is instructive to trace the steps by which the recognition rate was improved when our system for recognition by traditional template matching (Larsen & Bundesen, 1992) was developed into the template-matching pandemonium. In the system for recognition by traditional template matching, the input was always classified as a member of the same category as the best matching template. Various degrees of Gaussian smoothing were tested. Without any Gaussian smoothing, the recognition rate was 86% at the end of training (i.e., when about 75 templates had been stored for each type of digit). The highest recognition rate was obtained when the standard deviation σ of the Gaussian filter equaled 10% of the size of a normalized character. In this case the recognition rate reached a value of 89% (with about 60 templates per type of digit).

A small further increase in recognition rate was obtained by improving the normalization with respect to position. By tolerating minor displacements between centroids of templates and input patterns (see Step 5 of the current recognition algorithm), the recognition rate was increased by about 0.5%.

Dramatic improvements in performance were found when the system was rewired so that classification was based on evidence summed across many templates rather than evidence provided by the best matching template. When all weights on intrinsic connections from template to class nodes were equally great, and weights on extrinsic connections were close to zero, the recognition rate averaged 93% (with about 37 templates per type of digit). Adjusting the weights through learning yielded a further increment of several percentage points up to the final recognition rate of 95.3%.
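
The difference between classification by the best matching template and classification by summed evidence, together with the weight learning described under Experiments, can be sketched in Python. This is a minimal sketch, not the authors' code: all function names and the list-of-lists weight layout are hypothetical. The constants (initial weights of 1 and -0.01, learning rate 0.025, targets of ±0.7, hyperbolic tangent activation) are the values given in the text.

```python
import math

N_CLASSES = 10  # digit types 0-9

def best_match_class(matches, template_types):
    """Traditional template matching: the type of the single best matching template."""
    i = max(range(len(matches)), key=lambda k: matches[k])
    return template_types[i]

def class_activations(matches, weights):
    """Pandemonium: each class node takes tanh of the weighted sum of all degrees
    of match. weights[i][j] links template node i to class node j."""
    return [math.tanh(sum(d * w[j] for d, w in zip(matches, weights)))
            for j in range(N_CLASSES)]

def pandemonium_class(matches, weights):
    """Decision: the class node with the highest activation."""
    acts = class_activations(matches, weights)
    return max(range(N_CLASSES), key=lambda j: acts[j])

def initial_weights(template_type):
    """Weights of a newly stored template: 1 to its own class node, -0.01 to the rest."""
    return [1.0 if j == template_type else -0.01 for j in range(N_CLASSES)]

def delta_rule_update(matches, weights, true_type, rate=0.025, target=0.7):
    """One delta-rule step, in place: w[i][j] += rate * d_i * (t - a_j),
    with t = +target for the true digit type and -target otherwise."""
    acts = class_activations(matches, weights)
    for d, w in zip(matches, weights):
        for j in range(N_CLASSES):
            t = target if j == true_type else -target
            w[j] += rate * d * (t - acts[j])
```

With three stored templates of types 3, 8, and 8 and degrees of match 0.6, 0.5, and 0.5, `best_match_class` answers 3, whereas `pandemonium_class` with freshly initialized weights answers 8, because the two moderately matching templates of type 8 pool their evidence. Repeated calls to `delta_rule_update` with a known true type then move each class activation toward its target.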

Feature Analysis by Template Matching
Our template-matching pandemonium model for character recognition combines a template-matching with a feature-analysis approach to pattern recognition. In the combined approach, input patterns are analyzed on a number of form dimensions (dimensions of variation in form). Each form dimension is defined by a template. The value of the input pattern on the dimension equals the degree of match between the input and the template. As the value on the dimension (the degree of similarity to the template) is a visual feature of the input, the process of template matching is a process of feature analysis. Recognition of the input pattern is based on all of the visual features that are extracted by template matching. Each feature is used as positive or negative evidence for each of the possible classifications.

The templates in the current pandemonium are copies of previously presented instances of handwritten digits. Thus the long-term memory of the system contains pictorial representations of moderate complexity. The analysis of a new pattern is done by cross-correlating the input with each of these representations, so new patterns are analyzed by being filtered through a sample of previously experienced patterns. By the learning process governed by the delta rule, the role of useful feature analyzers (templates) is strengthened and the role of useless analyzers is weakened. If the weakest analyzers are purged (see Figure 5), feature analyzers are effectively selected on the basis of their diagnostic power with respect to the set of relevant pattern classifications. The feature analyzers that survive correspond to those copies of previously presented patterns that are highest in diagnostic power.

The template-matching pandemonium throws new light on the quest for the basic units or dimensions of visual pattern analysis. In the middle of the century, a strong plea was made for the development of a perceptual psychophysics (Gibson, 1950, 1959) that should include a psychophysics of form (Attneave & Arnoult, 1956). Quantitative studies of shape and pattern perception were initiated in attempts to create a psychological metric of visual form (see, e.g., Brown & Owen, 1967; Michels & Zusne, 1965). Ad hoc geometric measures such as number of sides in a figure, number of angles, moments of area, and moments of the perimeter were computed and correlated with behavioral measures (see, e.g., Zusne, 1970). However, the fundamental problem remained unsolved. The template-matching pandemonium model suggests that the quest for the basic units or dimensions of visual form was misguided. It suggests that there are very many dimensions of variation in visual form (one for each template), but no particularly basic ones. The important dimensions of variation in visual form are not mutually orthogonal, and few, if any, are universal.

Possible Role of Structural Descriptions
Three general approaches to visual pattern recognition are commonly considered: template matching, feature analysis, and structural description (see, e.g., Hummel & Biederman, 1992; Reed, 1973). In the structural-description approach, an object is represented by a structural description, that is, a symbolic representation of the geometric structure of the object. The structural description specifies the components of the object and ways in which the components are interrelated. The components may be elementary visual features (Sutherland, 1968) or three-dimensional primitives such as generalized cones (Marr, 1982; Marr & Nishihara, 1978) or geons (Biederman, 1987; Biederman & E. E. Cooper, 1992; see also related work by L. A. Cooper, Schacter, Ballesteros, & Moore, 1992).

The template-matching, feature-analysis, and structural-description approaches to pattern recognition are often contrasted, but they are not incompatible. As described in the previous section, our template-matching pandemonium model for character recognition combines a template-matching and a feature-analysis approach. The template-matching pandemonium might be included as a character recognition module in a model of word recognition based on structural descriptions (symbolic representations of spatial arrangements of letters). Such a model would represent a simple synthesis of template-matching, feature-analysis, and structural-description approaches to visual pattern recognition.

Recognition of Three-Dimensional Objects
The template-matching pandemonium model can be extended to recognition of three-dimensional objects. A natural extension can be made by representing a three-dimensional object by a collection of two-dimensional perspective views obtained by inspecting the object from different viewpoints. This mode of representation is attractive because learning of a multiple-view representation seems much easier than learning of a three-dimensional model of the object (a single viewpoint-invariant description of the three-dimensional geometric structure of the object; see Biederman, 1987; Lowe, 1987; Marr, 1982).

The efficiency of multiple-view representations of three-dimensional objects has been explored by Edelman and Weinshall (1991), who trained a simple two-layer network to recognize wire-frame objects from different viewpoints. The network learned to recognize 10 different objects, and the extent of generalization to novel views seemed comparable to that found in human subjects (see Rock & DiVita, 1987; Rock, Wheeler, & Tudor, 1989; Tarr & Pinker, 1989). At a more general level, Poggio and Edelman (1990) and Ullman and Basri (1991) have provided computational arguments that a three-dimensional object can be recognized from any viewpoint by use of a multiple-view representation based on a small number of views.

Neural Mechanisms of Visual Recognition
The template-matching pandemonium model seems generally consistent with electrophysiological findings on neural mechanisms of visual recognition. Shape-
based recognition is thought to be subserved by a visual pathway running from primary visual cortex (V1) via visual areas V2 and V4 to inferotemporal cortex (IT) (see Goodale & Milner, 1992; Mishkin, Ungerleider, & Macko, 1983; Ungerleider & Mishkin, 1982). In an extensive investigation of the way in which visual form is represented in the IT of the macaque monkey, Tanaka, Saito, Fukada, and Moriya (1991) determined the optimal stimulus of individual IT cells. In the anterior part of IT (i.e., TE), most cells required moderately complex features for their activation. Examples of features required for activation of individual cells in TE are the shape of an inverted T, a six-rayed star, and a horizontally striped disk on top of a vertically striped one. These critical features are two-dimensional, and responses of cells were almost always selective for the orientation of stimuli. The selectivity to the optimal stimulus was fairly sharp, but not absolute.

Fujita, Tanaka, Ito, and Cheng (1992) found that cells located at nearby positions in TE had similar (but not identical) stimulus selectivity. Their results suggest that TE consists of columnar modules in which cells with overlapping but slightly different selectivity cluster together. According to Tanaka (1993), the selectivity and the columnar organization are not determined by genes or early development in infancy, but are subject to changes by perceptual learning in the adult. Extended discrimination training with particular shapes produces a marked increase in the number of cells that give a maximal response to some of the trained stimuli.

There are strong similarities between the foregoing description of processing in TE and the template-matching pandemonium. Just like the template-matching pandemonium, TE contains a lot of units working in parallel. Each unit has a critical feature of moderate complexity (much like a single handwritten digit), and each unit belongs to a cluster of units with overlapping but slightly different selectivity (like handwritten samples of a given type of digit). Units that are maximally

BUNDESEN, C., LARSEN, A., & FARRELL, J. E. (1981). Mental transformations of size and orientation. In J. Long & A. D. Baddeley (Eds.), Attention and performance IX (pp. 279-294). Hillsdale, NJ: Erlbaum.
CAVE, K. R., & KOSSLYN, S. M. (1989). Varieties of size-specific visual selection. Journal of Experimental Psychology: General, 118, 148-164.
COOPER, L. A. (1975). Mental rotation of random two-dimensional shapes. Cognitive Psychology, 7, 20-43.
COOPER, L. A., SCHACTER, D. L., BALLESTEROS, S., & MOORE, C. (1992). Priming and recognition of transformed three-dimensional objects: Effects of size and reflection. Journal of Experimental Psychology: Learning, Memory, & Cognition, 18, 43-57.
DONEGAN, N. H., GLUCK, M. A., & THOMPSON, R. F. (1989). Integrating behavioral and biological models of classical conditioning. In R. D. Hawkins & G. H. Bower (Eds.), Computational models of learning in simple neural systems: The psychology of learning and motivation (Vol. 23, pp. 109-156). New York: Academic Press.
EDELMAN, S., & WEINSHALL, D. (1991). A self-organizing multiple-view representation of 3D objects. Biological Cybernetics, 64, 209-219.
EYSENCK, M. W., & KEANE, M. T. (1990). Cognitive psychology: A student's handbook. Hillsdale, NJ: Erlbaum.
FUJITA, I., TANAKA, K., ITO, M., & CHENG, K. (1992). Columns for visual features of objects in monkey inferotemporal cortex. Nature, 360, 343-346.
GERVAIS, M. J., HARVEY, L. O., JR., & ROBERTS, J. O. (1984). Identification confusions among letters of the alphabet. Journal of Experimental Psychology: Human Perception & Performance, 10, 655-666.
GIBSON, J. J. (1950). The perception of the visual world. Boston: Houghton Mifflin.
GIBSON, J. J. (1959). Perception as a function of stimulation. In S. Koch (Ed.), Psychology: A study of a science (Vol. 1, pp. 456-501). New York: McGraw-Hill.
GOODALE, M. A., & MILNER, A. D. (1992). Separate visual pathways for perception and action. Trends in Neurosciences, 15, 20-25.
HOLBROOK, M. B. (1975). A comparison of methods for measuring the interletter similarity between capital letters. Perception & Psychophysics, 17, 532-536.
HUMMEL, J. E., & BIEDERMAN, I. (1992). Dynamic binding in a neural network for shape recognition. Psychological Review, 99, 480-517.
HUMPHREYS, G. W., & BRUCE, V. (1989). Visual cognition: Computational, experimental, and neuropsychological perspectives. Hillsdale, NJ: Erlbaum.
JOLICOEUR, P. (1985). The time to name disoriented natural objects. Memory & Cognition, 13, 289-303.
JOLICOEUR, P. (1990). Orientation congruency effects on the identification of disoriented shapes. Journal of Experimental Psychology:
sensitive to particular shapes are created by training Human Perception & Performance, 16, 351-364.
with these shapes, but the critical features appear to be JOLICOEUR, P., & BESNER, D. (1987). Additivity and interaction be-
two-dimensional rather than three-dimensional. The tween size ratio and response category in the comparison of size-
two-dimensionality accords with the conjecture that discrepant shapes. Journal of Experimental Psychology: Human
Perception & Performance, 13, 478-487.
three-dimensional objects are recognized by use of JOLICOEUR, P., & LANDAU, M. 1. (1984). Effects of orientation on the
multiple-view representations in a template-matching identification of simple visual patterns. Canadian Journal of Psy-
pandemonium. chology, 38, 80-93.
LAM, L., & SUEN,C. Y. (1988). Structural classification and relaxation
REFERENCES matching of totally unconstrained handwritten zip-code numbers.
Pattern Recognition, 21, 19-31.
ATTNEAVE, E, & ARNOULT, M. D. (1956). The quantitative study of LARSEN, A. (1985). Pattern matching: Effects of size ratio, angular dif-
shape and pattern perception. Psychological Bulletin, 53, 452-471. ference in orientation, and familiarity. Perception & Psychophysics,
BIEDERMAN, I. (1987). Recognition-by-components: A theory of 38,63-68.
human image understanding. Psychological Review, 94, 115-147. LARSEN, A., & BUNDESEN, C. (1978). Size scaling in visual pattern
BIEDERMAN, I., & COOPER, E. E. (1992). Size invariance in visual ob- recognition. Journal of Experimental Psychology: Human Percep-
ject priming. Journal ofExperimental Psychology: Human Percep- tion & Performance, 4,1-20.
tion & Performance, 18,121-133. LARSEN, A., & BUNDESEN, C. (1992). The efficiency of holistic tem-
BROWN, D. R., & OWEN, D. H. (1967). The metrics of visual form: plate matching in the recognition of unconstrained handwritten dig-
Methodological dyspepsia. Psychological Bulletin, 68, 243-259. its. Psychological Research, 54,187-193.
BUNDESEN, C., & LARSEN, A. (1975). Visual transformation of size. LARSEN, A., & BUNDESEN, C. (1993). An adaptive pandemonium of
Journal ofExperimental Psychology: Human Perception & Perfor- templates for visual pattern recognition. In C. Bundesen & A. Lar-
mance, 1,214-220. sen (Eds.), Proceedings ofthe Sixth Conference ofthe European So-
A TEMPLATE-MATCHING PANDEMONIUM 143

ciety for Cognitive Psychology: Summaries (p. 5). Copenhagen: TANAKA, K., SAITO, H.-A., FUKADA, Y, & MORIYA, M. (1991). Coding
European Society for Cognitive Psychology. visual images of objects in the inferotemporal cortex ofthe macaque
LECUN, Y, BOSER, B., DENKER, J. S., HENDERSON, D., HOWARD, R. E., monkey. Journal ofNeurophysiology, 66, 170-189.
HUBBARD, w., & JACKEL, L. D. (1989). Backpropagation applied to TARR, M. J., & PINKER, S. (1989). Mental rotation and orientation-
handwritten zip code recognition. Neural Computation, 1, 541-551. dependence in shape recognition. Cognitive Psychology, 21, 233-
LINDSAY, P. H., & NORMAN, D. A. (1972). Human information pro- 282.
cessing. New York: Academic Press. ULLMAN, S. (1989). Aligning pictorial descriptions: An approach to
LOOMIS, J. M. (1990). A model of character recognition and legibility. object recognition. Cognition, 32, 193-254.
Journal of Experimental Psychology: Human Perception & Perfor- ULLMAN, S., & BASRI, R. (1991). Recognition by linear combinations
mance, 16, 106-120. of models. IEEE Transactions on Pattern Analysis & Machine Intel-
LOWE, D. G. (1987). Three-dimensional object recognition from single ligence, 13,992-1006.
two-dimensional images. Artificial Intelligence, 31,355-395. UNGERLEIDER, L. G., & MISHKIN, M. (1982). Two cortical visual sys-
MARR, D. (1982). Vision. San Francisco: W. H. Freeman. tems. In D. 1. Ingle, M. A. Goodale, & R. 1. W. Mansfield (Eds.),
MARR, D., & NISHIHARA, H. K. (1978). Representation and recognition Analysis of visual behavior (pp. 549-586). Cambridge, MA: MIT
ofthe spatial organization of three-dimensional shapes. Proceedings Press.
ofthe Royal Society ofLondon: Series B, 200, 269-294. ZUSNE, L. (1970). Visual perception ofform. New York: Academic
MICHELS, K. M., & ZUSNE,L. (1965). Metrics of visual form. Psycho- Press.
logical Bulletin, 63, 74-86.
MISHKIN, M., UNGERLEIDER, L. G., & MACKO, K. A. (1983). Object vi- NOTES
sion and spatial vision: Two cortical pathways. Trends in Neuro-
sciences, 6, 414-417. I. Normalization with respect to orientation (see Larsen & Bunde-
NEISSER, U. (1967). Cognitive psychology. New York: App1eton-Century- sen, 1992) was not invoked. Each digit was input in the (approximately
Crofts. upright) orientation in which it had originally been written on a letter
POGGIO, T., & EDELMAN, S. (1990). A network that learns to recognize envelope.
three-dimensional objects. Nature, 343, 263-266. 2. Use of r 7 (the 7th power of r) instead of r implies that variations
REED, S. K. (1973). Psychological processes in pattern recognition. in r have little effect unless Irl is high. For example, an increment in r
New York: Academic Press. from 0 up to 0.5 makes a difference of 0.01 in r 7, but an increment in
ROCK, 1., & DIVITA, J. (1987). A case of viewer-centered object per- r from 0.5 up to I makes a difference of 0.99 in r 7.
ception. Cognitive Psychology, 19,280-293. 3. The hyperbolic tangent of x is a sigmoid function defined as
ROCK, 1., WHEELER, D., & TuDOR, L. (1989). Can we imagine how ob-
jects look from other viewpoints? Cognitive Psychology, 21, 185-210. exp(x) - exp(-x)
SELFRIDGE, O. G. (1959). Pandemonium: A paradigm for learning. In exp(x) + exp(-x)
Mechanisation ofthought processes (pp. 511-526). London: H.M.S.O.
SELFRIDGE, O. G., & NEISSER, U. (1960, August). Pattern recognition It is a linear transform of the logistic function
by machine. SCientific American, 203, 60-68.
SHEPARD, R. N., & COOPER, L. A. (1982). Mental images and their
transformations. Cambridge, MA: MIT Press. I + exp(-2x)
SHEPARD, R. N., & METZLER, J. (1971, February 19). Mental rotation
of three-dimensional objects. Science, 171, 701-703. and it squashes the continuum of real numbers into the open interval
STONE, G. O. (1986). An analysis of the delta rule and the learning of (-1,1).
statistical associations. In D. E. Rumelhart & 1. L. McClelland 4. In principle, any given pattern can be correctly classified by our
(Eds.), Parallel distributed processing: Explorations in the micro- recognition algorithm if that particular pattern is stored as a template
structure of cognition (Vol. 1, pp. 444-459). Cambridge, MA: MIT with appropriate weights. If all patterns in the training set had been
Press. stored as templates, a recognition rate of 100% on the training set
SUEN, C. Y (ED.) (1990). Frontiers in handwriting recognition. Mon- could have been obtained with weights of I and 0 on connections rep-
treal, Canada: Concordia University, Centre for Pattern Recognition resenting correct and incorrect classifications, respectively.
& Machine Intelligence. The recognition rate of 95.3% on the test set could be increased. It
SUTHERLAND, N. S. (1968). Outlines of a theory of visual pattern is a mean across five independent simulations. By running the five
recognition in animals and man. Proceedings ofthe Royal Society of pandemonia simultaneously and letting them vote on a majority basis,
London: Series B, 171,297-317. the recognition rate rose to 95.9%.
SUTTON, R. S., & BARTO, A. G. (1981). Toward a modern theory of
adaptive networks: Expectation and prediction. Psychological Re-
view, 88, 135-170.
TANAKA, K. (1993, October 29). Neuronal mechanisms of object rec- (Manuscript received May 16, 1994;
ognition. Science, 262, 685-688. revision accepted for publication February 23, 1995.)
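The arithmetic in Note 2 and the identity behind Note 3 can be checked numerically. The following is a minimal sketch, not code from the article; the function names `logistic`, `tanh_via_logistic`, and `seventh_power_step` are hypothetical and chosen only for illustration. The linear transform used is tanh(x) = 2 * logistic(2x) - 1, which follows from the two expressions given in Note 3.

```python
import math


def logistic(x: float) -> float:
    """Standard logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_logistic(x: float) -> float:
    """Hyperbolic tangent written as a linear transform of the logistic.

    logistic(2x) equals 1 / (1 + exp(-2x)), the expression in Note 3;
    scaling by 2 and subtracting 1 maps its range (0, 1) onto (-1, 1).
    """
    return 2.0 * logistic(2.0 * x) - 1.0


def seventh_power_step(lo: float, hi: float) -> float:
    """Change in r**7 when r increases from lo to hi (Note 2)."""
    return hi ** 7 - lo ** 7


if __name__ == "__main__":
    # Note 2: the same step of 0.5 in r has very different effects on r**7.
    print(round(seventh_power_step(0.0, 0.5), 2))
    print(round(seventh_power_step(0.5, 1.0), 2))
    # Note 3: the transformed logistic agrees with math.tanh.
    for x in (-2.0, -0.3, 0.0, 0.3, 2.0):
        print(x, tanh_via_logistic(x), math.tanh(x))
```

The two rounded steps reproduce the 0.01 and 0.99 quoted in Note 2, since 0.5 to the 7th power is 0.0078125.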

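The voting scheme mentioned at the end of Note 4 (five independently trained pandemonia voting on each digit) might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `majority_vote` and `ensemble_predict` are hypothetical, the most frequent label wins, and ties go to the earliest classifier in the list, a rule the article does not specify.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(labels: Sequence[int]) -> int:
    """Return the label chosen by most classifiers.

    Assumption: ties are resolved in favour of the label proposed by the
    earliest classifier in the sequence; the article gives no tie rule.
    """
    if not labels:
        raise ValueError("at least one vote is required")
    counts = Counter(labels)
    top = max(counts.values())
    for label in labels:
        if counts[label] == top:
            return label
    raise AssertionError("unreachable: some label always has the top count")


def ensemble_predict(per_model: Sequence[Sequence[int]]) -> List[int]:
    """Combine predictions laid out as one sequence of labels per model."""
    return [majority_vote(votes) for votes in zip(*per_model)]


if __name__ == "__main__":
    # Five hypothetical classifiers, two input digits each (made-up labels).
    predictions = [[3, 1], [3, 7], [8, 7], [3, 1], [5, 4]]
    print(ensemble_predict(predictions))
```

With an odd number of voters and ten possible digits, ties can still occur (for example 2-2-1), so some explicit tie rule is needed in any such scheme.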