Recognition Geons
Irving Biederman
State University of New York at Buffalo
The perceptual recognition of objects is conceptualized to be a process in which the image of the
input is segmented at regions of deep concavity into an arrangement of simple geometric compo-
nents, such as blocks, cylinders, wedges, and cones. The fundamental assumption of the proposed
theory, recognition-by-components (RBC), is that a modest set of generalized-cone components,
called geons (N ≤ 36), can be derived from contrasts of five readily detectable properties of edges in
a two-dimensional image: curvature, collinearity, symmetry, parallelism, and cotermination. The
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
detection of these properties is generally invariant over viewing position and image quality and conse-
This document is copyrighted by the American Psychological Association or one of its allied publishers.
quently allows robust object perception when the image is projected from a novel viewpoint or is
degraded. RBC thus provides a principled account of the heretofore undecided relation between
the classic principles of perceptual organization and pattern recognition: The constraints toward
regularization (Prägnanz) characterize not the complete object but the object's components.
Representational power derives from an allowance of free combinations of the geons. A Principle of
Componential Recovery can account for the major phenomena of object recognition: If an arrangement
of two or three geons can be recovered from the input, objects can be quickly recognized even when
they are occluded, novel, rotated in depth, or extensively degraded. The results from experiments
on the perception of briefly presented pictures by human observers provide empirical support for
the theory.
Any single object can project an infinity of image configurations to the retina. The orientation of the object to the viewer can vary continuously, each giving rise to a different two-dimensional projection. The object can be occluded by other objects or texture fields, as when viewed behind foliage. The object need not be presented as a full-colored textured image but instead can be a simplified line drawing. Moreover, the object can even be missing some of its parts or be a novel exemplar of its particular category. But it is only with rare exceptions that an image fails to be rapidly and readily classified, either as an instance of a familiar object category or as an instance that cannot be so classified (itself a form of classification).

A Do-It-Yourself Example

Consider the object shown in Figure 1. We readily recognize it as one of those objects that cannot be classified into a familiar category. Despite its overall unfamiliarity, there is near unanimity in its descriptions. We parse (or segment) its parts at regions of deep concavity and describe those parts with common, simple volumetric terms, such as "a block," "a cylinder," "a funnel or truncated cone." We can look at the zig-zag horizontal brace as a texture region or zoom in and interpret it as a series of connected blocks. The same is true of the mass at the lower left: we can see it as a texture area or zoom in and parse it into its various bumps.

Although we know that it is not a familiar object, after a while we can say what it resembles: "A New York City hot dog cart, with the large block being the central food storage and cooking area, the rounded part underneath as a wheel, the large arc on the right as a handle, the funnel as an orange juice squeezer and the various vertical pipes as vents or umbrella supports." It is not a good cart, but we can see how it might be related to one. It is like a 10-letter word with 4 wrong letters.

We readily conduct the same process for any object, familiar or unfamiliar, in our foveal field of view. The manner of segmentation and analysis into components does not appear to depend on our familiarity with the particular object being identified.

The naive realism that emerges in descriptions of nonsense objects may be reflecting the workings of a representational system by which objects are identified.
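The parsing of an outline at regions of concavity described above can be illustrated with a minimal sketch. It assumes the silhouette is already available as a simple polygon, which sidesteps edge extraction entirely; the function names and the example outline are hypothetical and are not part of RBC itself. A vertex is flagged as concave when the outline turns against the overall winding direction, as judged by the sign of a cross product.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def signed_area(poly: List[Point]) -> float:
    """Shoelace formula; positive when the vertices run counterclockwise."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1]):
        total += x0 * y1 - x1 * y0
    return total / 2.0

def concave_vertices(poly: List[Point]) -> List[int]:
    """Return the indices of vertices where the outline turns inward."""
    orientation = 1.0 if signed_area(poly) > 0 else -1.0
    n = len(poly)
    result = []
    for i in range(n):
        ax, ay = poly[i - 1]
        bx, by = poly[i]
        cx, cy = poly[(i + 1) % n]
        # z-component of the cross product of the incoming and outgoing edges
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross * orientation < 0:
            result.append(i)
    return result

# Hypothetical silhouette: a narrow block resting on a wide block,
# traced counterclockwise.
silhouette = [(0, 0), (6, 0), (6, 2), (4, 2), (4, 5), (2, 5), (2, 2), (0, 2)]
print(concave_vertices(silhouette))  # [3, 6]
```

The two flagged vertices form a pair on either side of the join between the two blocks, which is where one would cut to separate the parts. Real silhouettes are curved and noisy, so this polygonal test is only a stand-in for detecting cusps of deep concavity.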
objects. (Additional analyses of the role of surface features are presented later in the discussion of the experimental comparison of the perceptibility of color photography and line drawings.) The goal of the present effort is to account for what can be called primal access: the first contact of a perceptual input from an isolated, unanticipated object to a representation in memory.

Basic Phenomena of Object Recognition

Independent of laboratory research, the phenomena of everyday object identification provide strong constraints on possible models of recognition. In addition to the fundamental phenomenon that objects can be recognized at all (not an altogether obvious conclusion), at least five facts are evident. Typically, an object can be recognized rapidly, when viewed from most novel orientations, under moderate levels of visual noise, when partially occluded, and when it is a new exemplar of a category.

The preceding five phenomena constrain theorizing about object interpretation in the following ways:

1. Access to the mental representation of an object should not be dependent on absolute judgments of quantitative detail, because such judgments are slow and error prone (Garner, 1962; Miller, 1956). For example, distinguishing among just several levels of the degree of curvature or length of an object typically requires more time than that required for the identification of the object itself. Consequently, such quantitative processing cannot be the controlling factor by which recognition is achieved.

2. The information that is the basis of recognition should be relatively invariant with respect to orientation and modest degradation.

3. Partial matches should be computable. A theory of object interpretation should have some principled means for computing a match for occluded, partial, or new exemplars of a given category. We should be able to account for the human's ability to identify, for example, a chair when it is partially occluded by other furniture, or when it is missing a leg, or when it is a new model.

Recognition-by-Components: An Overview

Our hypothesis, recognition-by-components (RBC), bears some relation to several prior conjectures for representing objects by parts or modules (e.g., Binford, 1971; Brooks, 1981; Guzman, 1971; Marr, 1977; Marr & Nishihara, 1978; Tversky & Hemenway, 1984). RBC's contribution lies in its proposal for a particular vocabulary of components derived from perceptual mechanisms and its account of how an arrangement of these components can access a representation of an object in memory.

Stages of Processing

Figure 2 presents a schematic of the presumed subprocesses by which an object is recognized. These stages are assumed to be arranged in cascade. An early edge extraction stage, responsive to differences in surface characteristics, namely luminance, texture, or color, provides a line drawing description of the object. From this description, nonaccidental properties of image edges (e.g., collinearity, symmetry) are detected. Parsing is performed, primarily at concave regions, simultaneously with a detection of nonaccidental properties. The nonaccidental properties of the parsed regions provide critical constraints on the identity of the components. Within the temporal and contextual constraints of primal access, the stages up to and including the identification of components are assumed to be bottom-up.1 A delay in the determination of an object's components should have a direct effect on the identification latency of the object.

The arrangement of the components is then matched against a representation in memory. It is assumed that the matching of the components occurs in parallel, with unlimited capacity. Partial matches are possible with the degree of match assumed [...] the image and the representation.2 This stage model is presented to provide an overall theoretical context. The focus of this article is on the nature of the units of the representation.

When an image of an object is painted on the retina, RBC assumes that a representation of the image is segmented (or parsed) into separate regions at points of deep concavity, particularly at cusps where there are discontinuities in curvature (Marr & Nishihara, 1978). In general, paired concavities will arise whenever convex volumes are joined, a principle that Hoffman and Richards (1985) term transversality. Such segmentation conforms well with human intuitions about the boundaries of object parts and does not depend on familiarity

1
The only top-down route shown in Figure 2 is an effect of the nonaccidental properties on edge extraction. Even this route (aside from collinearity and smooth curvature) would run counter to the desires of many in computational vision (e.g., Marr, 1982) to build a completely bottom-up system for edge extraction. This assumption was developed in the belief that edge extraction does not depend on prior familiarity with the object. However, as with the nonaccidental properties, a top-down route from the component determination stage to edge extraction could proceed independent of familiarity with the object itself. It is possible that an edge extraction system with a competence equivalent to that of a human (an as yet unrealized accomplishment) will require the inclusion of such top-down influences. It is also likely that other top-down routes, such as those from expectancy, object familiarity, or scene constraints (e.g., Biederman, 1981; Biederman, Mezzanotte, & Rabinowitz, 1982), will be observed at a number of the stages, for example, at segmentation, component definition, or matching, especially if edges are degraded. These have been omitted from Figure 2 in the interests of simplicity and because their actual paths of influence are as yet undetermined. By proposing a general account of object recognition, it is hoped that the proposed theory will provide a framework for a principled analysis of top-down effects in this domain.

2
Modeling the matching of an object image to a mental representation is a rich, relatively neglected problem area. Tversky's (1977) contrast model provides a useful framework with which to consider this similarity problem in that it readily allows distinctive features (components) of the image to be considered separately from the distinctive components of the representation. This allows principled assessments of similarity for partial objects (components in the representation but not in the image) and novel objects (containing components in the image that are not in the representation). It may be possible to construct a dynamic model based on a parallel distributed process as a modification of the kind proposed by McClelland and Rumelhart (1981) for word perception, with components playing the role of letters. One difficulty of such an effort is that the set of neighbors for a given word is well specified and readily available from a dictionary; the set of neighbors for a given object is not.
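The footnote's suggestion that Tversky's (1977) contrast model could score partial and novel matches can be made concrete with a small sketch. Everything specific here is an assumption for illustration: components are reduced to plain labels, the salience function is taken to be a simple count, and the weights `theta`, `alpha`, and `beta` are arbitrary defaults rather than values proposed in the text.

```python
def contrast_similarity(image, stored, theta=1.0, alpha=0.5, beta=0.5):
    """Contrast-model score: weighted common components minus the
    weighted distinctive components of each side (salience = count)."""
    image, stored = set(image), set(stored)
    common = len(image & stored)        # components shared by both
    image_only = len(image - stored)    # novel components in the image
    stored_only = len(stored - image)   # components missing from the image
    return theta * common - alpha * image_only - beta * stored_only

# Hypothetical component labels for a stored "chair" representation.
chair = {"seat", "back", "leg_a", "leg_b", "leg_c", "leg_d"}

occluded_chair = {"seat", "back", "leg_a", "leg_b"}   # two legs hidden
novel_chair = chair | {"armrest"}                     # one extra component

print(contrast_similarity(chair, chair))           # 6.0
print(contrast_similarity(occluded_chair, chair))  # 3.0
print(contrast_similarity(novel_chair, chair))     # 5.5
```

Keeping `alpha` and `beta` separate is the point of the exercise: it lets missing components (partial objects) and extra components (novel exemplars) be penalized differently, which is the property the footnote singles out.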
[Figure: Stages in Object Perception (diagram). Surviving stage label: Edge Extraction.]

good continuation, symmetry, and Prägnanz. RBC thus provides a principled account of the relation between the classic phenomena of perceptual organization and pattern recognition: Although objects can be highly complex and irregular, the units by which objects are identified are simple and regular. The constraints toward regularization (Prägnanz) are thus assumed to characterize not the complete object but the object's components.

The preceding account is clearly edge-based. Surface characteristics such as color, brightness, and texture will typically have
Figure 3. Different arrangements of the same components can produce different objects.
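The point of Figure 3 can be sketched as a data structure in which an object description is a set of components plus a set of relations among them. The field names follow the relations named in the text (relative size, orientation, locus of attachment); the component labels and relation values are hypothetical placeholders, not the contents of the figure.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable

@dataclass(frozen=True)
class Relation:
    part: str            # component being attached
    attached_to: str     # component it is attached to
    locus: str           # where on the other component it attaches
    orientation: str     # orientation of the part relative to the other
    relative_size: str   # size of the part relative to the other

@dataclass(frozen=True)
class ObjectDescription:
    components: FrozenSet[str]
    relations: FrozenSet[Relation]

def describe(components: Iterable[str],
             relations: Iterable[Relation]) -> ObjectDescription:
    return ObjectDescription(frozenset(components), frozenset(relations))

parts = ["cylinder", "curved_handle"]  # hypothetical component labels

handle_on_side = describe(parts, [
    Relation("curved_handle", "cylinder", "side", "vertical", "smaller"),
])
handle_on_top = describe(parts, [
    Relation("curved_handle", "cylinder", "top", "vertical", "smaller"),
])

print(handle_on_side.components == handle_on_top.components)  # True
print(handle_on_side == handle_on_top)                        # False
```

The two descriptions share every component yet compare as different, because equality takes the relations into account as well as the parts.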
relations include specification of the relative sizes of the components, their orientation and the locus of their attachment.

Nonaccidental Properties: A Perceptual Basis for a Componential Representation

Recent theoretical analyses of perceptual organization (Binford, 1981; Lowe, 1984; Rock, 1983; Witkin & Tenenbaum, 1983) provide a perceptual basis for generating a set of geons. The central organizational principle is that certain properties of edges in a two-dimensional image are taken by the visual system as strong evidence that the edges in the three-dimensional world contain those same properties. For example, if there is a straight line in the image (collinearity), the visual system infers that the edge producing that line in the three-dimensional world is also straight. The visual system ignores the possibility that the property in the image might be a result of a (highly unlikely) accidental alignment of eye and curved edge. Smoothly curved elements in the image (curvilinearity) are similarly inferred to arise from smoothly curved features in the three-dimensional world. These properties, and the others described later, have been termed nonaccidental (Witkin & Tenenbaum, 1983) in that they would only rarely be produced by accidental alignments of viewpoint and object features and consequently are generally unaffected by slight variations in viewpoint.

If the image is symmetrical (symmetry), we assume that the object projecting that image is also symmetrical. The order of symmetry is also preserved: Images that are symmetrical under both reflection and 90° increments of rotation, such as a square or circle, are interpreted as arising from objects (or surfaces) that are symmetrical under both rotation and reflection. Although skew symmetry is often readily perceived as arising from a tilted symmetrical object or surface (Palmer, 1983), there are cases where skew symmetry is not readily detected (Attneave, 1982). When edges in the image are parallel or coterminate we assume that the real-world edges also are parallel or coterminate, respectively.

These five nonaccidental properties and the associated three-dimensional inferences are described in Figure 4 (adapted from Lowe, 1984). Witkin and Tenenbaum (1983; see also Lowe, 1984) argue that the leverage provided by the nonaccidental relations for inferring a three-dimensional structure from two-dimensional image edges is so great as to pose a challenge to the effort in computational vision and perceptual psychology that assigned central importance to variation in local surface characteristics, such as luminance gradients, from which surface curvature could be determined (as in Besl & Jain, 1986). Although a surface property derived from such gradients will be invariant over some transformations, Witkin and Tenenbaum (1983) demonstrate that the suggestion of a volumetric component through the shape of the surface's silhouette can readily override the perceptual interpretation of the luminance gradient. The psychological literature, summarized in the next section, provides considerable evidence supporting the assumption that these nonaccidental properties can serve as primary organizational constraints in human image interpretation.

Psychological Evidence for the Rapid Use of Nonaccidental Relations

There can be little doubt that images are interpreted in a manner consistent with the nonaccidental principles. But are these relations used quickly enough to provide a perceptual basis for the components that allow primal access? Although all the principles have not received experimental verification, the available evidence strongly suggests an affirmative answer to the preceding question. There is strong evidence that the visual system quickly assumes and uses collinearity, curvature, symmetry, and cotermination. This evidence is of two sorts: (a) demonstrations, often compelling, showing that when a given two-dimensional relation is produced by an accidental alignment of object and image, the visual system accepts the relation as existing in the three-dimensional world; and (b) search tasks showing that when a target differs from distractors in a nonaccidental property, as when one is searching for a curved arc among straight segments, the detection of that target is facilitated compared to conditions where targets and background do not differ in such properties.

Collinearity versus curvature. The demonstration of the collinearity or curvature relations is too obvious to be performed as an experiment. When looking at a straight segment, no observer would assume that it is an accidental image of a curve. That the contrast between straight and curved edges is readily available for perception was shown by Neisser (1963). He found that a search for a letter composed only of straight segments, such as a Z, could be performed faster when in a field of curved distractors, such as C, G, O, and Q, than when among other letters composed of straight segments such as N, W, V, and M.

Symmetry and parallelism. Many of the Ames demonstrations (Ittelson, 1952), such as the trapezoidal window and Ames room, derive from an assumption of symmetry that includes parallelism. Palmer (1980) showed that the subjective directionality of arrangements of equilateral triangles was based on the
derivation of an axis of symmetry for the arrangement. King, Meyer, Tangney, and Biederman (1976) demonstrated that a perceptual bias toward symmetry contributed to apparent shape constancy effects. Garner (1974), Checkosky and Whitlock (1973), and Pomerantz (1978) provided ample evidence that not only can symmetrical shapes be quickly discriminated from asymmetrical stimuli, but that the degree of symmetry was also a readily available perceptual distinction. Thus, stimuli that were invariant under both reflection and 90° increments in rotation could be rapidly discriminated from those that were only invariant under reflection (Checkosky & Whitlock, 1973).

[Figure 4 content]
Principle of Non-Accidentalness: Critical information is unlikely to be a consequence of an accident of viewpoint.
Three-Space Inference from Image Features (2-D relation: 3-D inference)
1. Collinearity of points or lines: collinearity in 3-space
2. Curvilinearity of points of arcs: curvilinearity in 3-space
3. Symmetry (skew symmetry?): symmetry in 3-space
4. Parallel curves (over small visual angles): curves are parallel in 3-space
5. Vertices, two or more terminations at a common point ("Fork," "Arrow"): curves terminate at a common point in 3-space

Figure 4. Five nonaccidental relations. (From Figure 5.2, Perceptual organization and visual recognition [p. 77] by David Lowe. Unpublished doctoral dissertation, Stanford University. Adapted by permission.)

Cotermination. The "peephole perception" demonstrations, such as the Ames chair (Ittelson, 1952) or the physical realization of the "impossible" triangle (Penrose & Penrose, 1958), are produced by accidental alignment of the ends of noncoterminous segments to produce (from one viewpoint only) L, Y, and arrow vertices. More recently, Kanade (1981) has presented a detailed analysis of an "accidental" chair of his own construction. The success of these demonstrations documents the immediate and compelling impact of cotermination.

The registration of cotermination is important for determining vertices, which provide information that can serve to distinguish the components. In fact, one theorist (Binford, 1981) has suggested that the major function of eye movements is to determine coincidence of segments. "Coincidence" would include not only cotermination of edges but the termination of one edge on another, as with a T vertex. With polyhedra (volumes produced by planar surfaces), the Y, arrow, and L vertices allow inference as to the identity of the volume in the image. For example, the silhouette of a brick contains a series of six vertices, which alternate between Ls and arrows, and an internal Y vertex, as illustrated in Figure 5. The Y vertex is produced by the cotermination of three segments, with none of the angles greater than 180°. (An arrow vertex, also formed from the cotermination of three segments, contains an angle that exceeds 180°; an L vertex is formed by the cotermination of two segments.) As shown in Figure 5, this vertex is not present in components that have curved cross sections, such as cylinders, and thus can provide a distinctive cue for the cross-section edge. (The curved Y vertex present in a cylinder can be distinguished from the Y or arrow vertices in that the termination of one seg[...]varty, 1979].)

Perkins (1983) has described a perceptual bias toward parallelism in the interpretation of this vertex.4 Whether the presence of this particular internal vertex can facilitate the identification of a brick versus a cylinder is not yet known, but a recent study by Biederman and Blickle (1985), described below, demonstrated that deletion of vertices adversely affected object recognition more than deletion of the same amount of contour at midsegment.

The T vertex represents a special case in that it is not a locus of cotermination (of two or more segments) but only the termination of one segment on another. Such vertices are important for determining occlusion and thus segmentation (along with concavities), in that the edge forming the (normally) vertical segment of the T cannot be closer to the viewer than the segment forming the top of the T (Binford, 1981). By this account, the T vertex might have a somewhat different status than the Y, arrow, and L vertices, in that the T's primary role would be in segmentation, rather than in establishing the identity of the volume.5

Vertices composed of three segments, such as the Y and ar-

4
When such vertices formed the central angle in a polyhedron, Perkins (1983) reported that the surfaces would almost always be interpreted as meeting at right angles, as long as none of the three angles was less than 90°. Indeed, such vertices cannot be projections of acute angles (Kanade, 1981) but the human appears insensitive to the possibility that the vertices could have arisen from obtuse angles. If one of the angles in the central Y vertex was acute, then the polyhedra would be interpreted as irregular. Perkins found that subjects from rural areas of Botswana, where there was a lower incidence of exposure to carpentered (right-angled) environments, had an even stronger bias toward rectilinear interpretations than did Westerners (Perkins & Deregowski, 1982).

5
The arrangement of vertices, particularly for polyhedra, offers constraints on "possible" interpretations of lines as convex, concave, or occluding (e.g., Sugihara, 1984). In general, the constraints take the form that a segment cannot change its interpretation, for example, from concave to convex, unless it passes through a vertex. "Impossible" objects can be constructed from violations of this constraint (Waltz, 1975) as well as from more general considerations (Sugihara, 1982, 1984). It is tempting to consider that the visual system captures these constraints in the way in which edges are grouped into objects, but the evidence would seem to argue against such an interpretation. The impossibility of most impossible objects is not immediately registered, but requires scrutiny and thought before the inconsistency is detected. What this means in the present context is that the visual system has a capacity for classifying vertices locally, but no perceptual routines for determining the global consistency of a set of vertices.
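The local vertex classification mentioned in this footnote follows directly from the definitions given in the text: an L vertex has two coterminating segments, a Y vertex has three with no angle greater than 180°, an arrow has three with one angle exceeding 180°, and a T is one segment terminating on another. The sketch below is a hedged illustration of those definitions only. It assumes each segment is summarized by the direction (in degrees) in which it leaves the vertex, and the tolerance `tol` for treating two directions as forming the straight top of a T is an arbitrary choice, not a value from the literature.

```python
def classify_vertex(directions_deg, tol=1.0):
    """Classify a vertex from the directions of the segments leaving it."""
    dirs = sorted(d % 360.0 for d in directions_deg)
    if len(dirs) == 2:
        return "L"
    if len(dirs) != 3:
        raise ValueError("expected two or three segment directions")
    # Angular gaps between neighbouring segments; the three gaps sum to 360.
    gaps = [(dirs[(i + 1) % 3] - dirs[i]) % 360.0 for i in range(3)]
    largest = max(gaps)
    if abs(largest - 180.0) <= tol:
        return "T"       # two segments collinear: one edge ends on another
    if largest > 180.0:
        return "arrow"   # one angle exceeds 180 degrees
    return "Y"           # no angle greater than 180 degrees

print(classify_vertex([0, 90]))          # L
print(classify_vertex([90, 210, 330]))   # Y
print(classify_vertex([60, 90, 120]))    # arrow
print(classify_vertex([0, 180, 270]))    # T
```

The function inspects one vertex at a time and never checks whether a set of vertices is jointly consistent, which mirrors the distinction the footnote draws between local classification and global consistency.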
row, and their curved counterparts, are important determinants as to whether a given component is volumetric or planar. Planar components (to be discussed later) lack three-pronged vertices.

[Figure 5 labels. Brick: three parallel edges; three outer arrow vertices. Cylinder: two tangent Y vertices (occluding edge tangent at vertex to discontinuous edge); curved edges; two parallel edges.]

Figure 5. Some differences in nonaccidental properties between a cylinder and a brick.

The high speed and accuracy of determining a given nonaccidental relation (e.g., whether some pattern is symmetrical) should be contrasted with performance in making absolute quantitative judgments of variations in a single physical attribute, such as length of a segment or degree of tilt or curvature. For example, the judgment as to whether the length of a given segment is 10, 12, 14, 16, or 18 cm is notoriously slow and error prone (Beck, Prazdny, & Rosenfeld, 1983; Fildes & Triggs, 1985; Garner, 1962; Miller, 1956; Virsu, 1971a, 1971b). Even these modest performance levels are challenged when the judgments have to be executed over the brief 100-ms intervals (Egeth & Pachella, 1969) that are sufficient for accurate object identification. Perhaps even more telling against a view of object recognition that postulates the making of absolute judgments of fine quantitative detail is that the speed and accuracy of such judgments decline dramatically when they have to be made for multiple attributes (Egeth & Pachella, 1969; Garner, 1962; Miller, 1956). In contrast, object recognition latencies for complex objects are reduced by the presence of additional (redundant) components (Biederman, Ju, & Clapper, 1985, described below).

Geons Generated From Differences in Nonaccidental Properties Among Generalized Cones

I have emphasized the particular set of nonaccidental properties shown in Figure 4 because they may constitute a perceptual basis for the generation of the set of components. Any primitive that is hypothesized to be the basis of object recognition should be rapidly identifiable and invariant over viewpoint and noise. These characteristics would be attainable if differences among components were based on differences in nonaccidental properties. Although additional nonaccidental properties exist, there is empirical support for rapid perceptual access to the five described in Figure 4. In addition, these five relations reflect intuitions about significant perceptual and cognitive differences among objects.

From variation over only two or three levels in the nonaccidental relations of four attributes of generalized cylinders, a set of 36 geons can be generated. A subset is illustrated in Figure 6.

Six of the generated geons (and their attribute values) are shown in Figure 7. Three of the attributes describe characteristics of the cross section: its shape, symmetry, and constancy of size as it is swept along the axis. The fourth attribute describes the shape of the axis. Additional volumes are shown in Figures 8 and 9.

Nonaccidental Two-Dimensional Contrasts Among the Geons

As indicated in the above outline, the values of the four generalized cone attributes can be directly detected as contrastive differences in nonaccidental properties: straight versus curved, symmetrical versus asymmetrical, parallel versus nonparallel (and if nonparallel, whether there is a point of maximal convexity). Cross-section edges and curvature of the axis are distinguishable by collinearity or curvilinearity. The constant versus expanded size of the cross section would be detectable through parallelism; a constant cross section would produce a general-
were undecidabk.
This document is copyrighted by the American Psychological Association or one of its allied publishers.
Geons with Expanded and Contracted Cross Sections (—) be individually represented is that they often produce volumes
that resemble symmetrical, but truncated, wedges or cones.
This latter form of representing asymmetrical cross sections
would be analogous to the schema-plus-correction phenome-
Cross Section •
non noted by Bartlett (1932). The implication of a schema-
Edge- Curved (C)
Symmetry: Yes {•*•) plus-correction representation would be that a single primitive
Size: Expanded ft Contracted: (--) category for asymmetrical cross sections and wedges might be
Axis: Straight {+) sufficient For both kinds of volumes, their similarity may be a
(Lemon) function of the detection of a lack of parallelism in the volume.
One would have to exert scrutiny to determine whether a lack
of parallelism had originated in the cross section or in a size
change of a symmetrical cross section. In this case, as with the
components with curved axes described in the preceding sec-
Cross Section'•
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
Edge: Curved (C) rical straight-edged volumes could be postulated that would al-
Symmetry: Yes (+) low a reduction in the number of primitive components. There
Size: Expanded (+) is considerable evidence that asymmetrical patterns require
Axis: Curved H more time for their identification than symmetrical patterns
(Checkosky & Whitlock, 1973; Pomerantz, 1978). Whether
these effects have consequences for the time required for object
identification is not yet known.
One other departure from regular components might also be
Cross Section:
noted. A volume can have a cross section with edges that are
Edge: Curved (C)
both curved and straight, as would result when a cylinder is sec-
Symmetry: Yes (+)
Size= Expanded 8 Contracted {--) tioned in half along its length, producing a semicircular cross
section. The conjecture is that in such cases the default cross
Axis: Curved (-)
section is the curved one, with the straight edges interpreted as
(Gourd) slices off the curve, in schema-plus-correction representation
(Bartlett, 1932).
Figure 8. Three curved geons with curved axes or expanded and/or con-
tracted cross sections. (These tend to resemble biological forms.)
CROSS SECTION
A number of subordinate and related issues are raised by this Eta Svpnttrv SM Axis
attempt, some of which will be addressed in this section. This Straight S Constant ++ Straight +
Curved C Ref* Expanded - Curved -
section need not be covered by a reader concerned primarily Asymm- ExpSCont—
with the overall gist of RBC.
Asymmetrical cross sections. There are an infinity of possible
cross sections that could be asymmetrical. How does RBC rep-
resent this variation? RBC assumes that the differences in the
departures from symmetry are not readily available and thus
do not affect primal access. For example, the difference in the
shape of the cross section for the two straight-edged volumes in
Figure 10 might not be apparent quickly enough to affect object
recognition. This does not mean that an individual could not
store the details of the volume produced by an asymmetrical
cross section. But the presumption is that the access for this
detail would be too slow to mediate primal access. I do not
know of any case where primal access depends on discrimina-
tion among asymmetrical cross sections within a given compo-
nent type, for example, among curved-edged cross sections of
constant size, straight axes, and a specified aspect ratio. For in-
stance, the curved cross section for the component that can
model an airplane wing or car door is asymmetrical. Different
wing designs might have different shaped cross sections. It is
likely that most people, including wing designers, will know that
the object is an airplane, or even an airplane wing, before they
know the subclassification of the wing on the basis of the asymmetry of its cross section.

Figure 9. Geons with curved axis and straight or curved cross sections. (Determining the shape of the cross section, particularly if straight, might require attention.)

A second way in which asymmetrical cross sections need not
10 or the eye of the elephant in Figure 11. Such shapes can be conceptualized in two ways. The first (and less favored) is to assume that these are just quantitative variations of the volumetric components, but with an axis length of zero. They would then have default values of a straight axis (+) and a constant cross section (+). Only the edge of the cross section and its symmetry could vary.

Alternatively, it might be that a planar region is not related perceptually to the foreshortened projection of the geon that could have produced it. Using the same variation in cross-section edge and symmetry as with the volumetric components, seven planar geons could be defined. For ++ symmetry there would be the square and circle (with straight and curved edges, respectively) and for + symmetry the rectangle, triangle, and ellipse. Asymmetrical (-) planar geons would include trapezoids (straight edges) and drop shapes (curved edges). The addition of these seven planar geons to the 36 volumetric geons yields 43 components (a number close to the number of phonemes required to represent English words). The triangle is here assumed to define a separate geon, although a triangular cross section was not assumed to define a separate volume under the intuition that a prism (produced by a triangular cross section) is not quickly distinguishable from a wedge. My preference for assuming that planar geons are not perceptually related to their foreshortened volumes is based on the extraordinary difficulty of recognizing objects from views that are parallel to the axis of the major components so that foreshortening projects only the planar cross section, as shown in Figure 27. The presence of three-pronged vertices thus provides strong evidence that the image is generated from a volumetric rather than a planar component.

Selection of axis. Given that a volume is segmented from the object, how is an axis selected? Subjectively, it appears that an axis is selected that would maximize the axis's length, the symmetry of the cross section, and the constancy of the size of the cross section. By maximizing the length of the axis, bilateral symmetry can be more readily detected because the sides would be closer to the axis. Often a single axis satisfies all three criteria, but sometimes these criteria are in opposition and two (or more) axes (and component types) are plausible (Brady, 1983). Under such conditions, axes will often be aligned to an external frame, such as the vertical (Humphreys, 1983).

Negative values. The plus values in Figures 7, 8, and 9 are those favored by perceptual biases and memory errors. No bias is assumed for straight and curved edges of the cross section. For symmetry, clear biases have been documented. For example, if an image could have arisen from a symmetrical object, then it is interpreted as symmetrical (King et al., 1976). The same is apparently true of parallelism. If edges could be parallel, then they are typically interpreted as such, as with the trapezoidal room or window.

Curved axes. Figure 8 shows three of the most negatively marked primitives with curved cross sections. Such geons often resemble biological entities. An expansion and contraction of a rounded cross section with a straight axis produces an ellipsoid (lemon), an expanded cross section with a curved axis produces a horn, and an expanded and contracted cross section with a rounded cross section produces a banana slug or gourd.

In contrast to the natural forms generated when both cross section and axis are curved, the geons swept by a straight-edged cross section traveling along a curved axis (e.g., the components on the first, third, and fifth rows of Figure 9) appear somewhat less familiar and more difficult to apprehend than their curved counterparts. It is possible that this difficulty may merely be a consequence of unfamiliarity. Alternatively, the subjective difficulty might be produced by a conjunction-attention effect (CAE) of the kind discussed by Treisman (e.g., Treisman & Gelade, 1980). (CAEs are described later in the section on attentional effects.) In the present case, given the presence in the image of curves and straight edges (for the rectilinear cross sections with curved axis), attention (or scrutiny) may be required to determine which kind of segment to assign to the axis and which to assign to the cross section. Curiously, the problem does not present itself when a curved cross section is run along a straight axis to produce a cylinder or cone. The issue as to the role of attention in determining geons would appear to be empirically tractable using the paradigms created by Treisman and her colleagues (Treisman, 1982; Treisman & Gelade, 1980).

Conjunction-attentional effects. The time required to detect a single feature is often independent of the number of distracting items in the visual field. For example, the time it takes to detect a blue shape (a square or a circle) among a field of green distractor shapes is unaffected by the number of green shapes. However, if the target is defined by a conjunction of features, for example, a blue square among distractors consisting of green squares and blue circles, so that both the color and the shape of each item must be determined to know if it is or is not the target, then target detection time increases linearly with the number of distractors (Treisman & Gelade, 1980). These results have led to a theory of visual attention that assumes that humans can monitor all potential display positions simultaneously and with unlimited capacity for a single feature (e.g., something blue or something curved). But when a target is defined by a conjunction of features, then a limited capacity attentional system that can only examine one display position at a time must be deployed (Treisman & Gelade, 1980).

The extent to which Treisman and Gelade's (1980) demonstration of conjunction-attention effects may be applicable to the perception of volumes and objects has yet to be evaluated. In the extreme, in a given moment of attention, it may be the case that the values of the four attributes of the components are detected as independent features. In cases where the attributes, taken independently, can define different volumes, as with the shape of cross sections and axes, an act of attention might be required to determine the specific component generating those attributes: Am I looking at a component with a curved cross section and a straight axis or is it a straight cross section and a curved axis? At the other extreme, it may be that an object recognition system has evolved to allow automatic determination of the geons.

The more general issue is whether relational structures for the primitive components are defined automatically or whether a limited attentional capacity is required to build them from their individual-edge attributes. It could be the case that some of the most positively marked geons are detected automatically, but that the volumes with negatively marked attributes might require attention. That some limited capacity is involved in the perception of objects (but not necessarily their components) is documented by an effect of the number of distracting objects on perceptual search (Biederman, Blickle, Teitelbaum, Klatsky,
& Mezzanotte, in press). In their experiment, reaction times and errors for detecting an object such as a chair increased linearly as a function of the number of nontarget objects in a 100-ms presentation of nonscene arrangements of objects. Whether this effect arises from the necessity to use a limited capacity to construct a geon from its attributes or whether the effect arises from the matching of an arrangement of geons to a representation is not yet known.

Relations of RBC to Principles of Perceptual Organization

Textbook presentations of perception typically include a sec- [...] never linked to any other function of perception. RBC posits a specific role for these organizational phenomena in pattern recognition. As suggested by the section on generating geons through nonaccidental properties, the Gestalt principles, particularly those promoting Pragnanz (Good Figure), serve to determine the individual geons, rather than the complete object. A complete object, such as a chair, can be highly complex and asymmetrical, but the components will be simple volumes. A consequence of this interpretation is that it is the components that will be stable under noise or perturbation. If the components can be recovered and object perception is based on the components, then the object will be recognizable.

This may be the reason why it is difficult to camouflage objects by moderate doses of random occluding noise, as when a car is viewed behind foliage. According to RBC, the geons accessing the representation of an object can readily be recovered through routines of collinearity or curvature that restore contours (Lowe, 1984). These mechanisms for contour restoration will not bridge cusps (e.g., Kanizsa, 1979). For visual noise to be effective, by these considerations, it must obliterate the concavity and interrupt the contours from one geon at the precise point where they can be joined, through collinearity or constant curvature, with the contours of another geon. The likelihood of this occurring by moderate random noise is, of course, extraordinarily low, and it is a major reason why, according to RBC, objects are rarely rendered unidentifiable by noise. The consistency of RBC with this interpretation of perceptual organization should be noted. RBC holds that the (strong) loci of parsing is at cusps; the geons are organized from the contours between cusps. In classical Gestalt demonstrations, good figures are organized from the contours between cusps. Experiments subjecting these conjectures to test are described in a later section.

A Limited Number of Components?

According to the prior arguments, only 36 volumetric components can be readily discriminated on the basis of differences in nonaccidental properties among generalized cones. In addition, there are empirical and computational considerations that are compatible with such a limit.

Empirically, people are not sensitive to continuous metric variations as evidenced by severe limitations in humans' capacity for making rapid and accurate absolute judgments of quantitative shape variations.7 The errors made in the memory for shapes also document an insensitivity to metric variations.

Computationally, a limit is suggested by estimates of the number of objects we might know and the capacity for RBC to readily represent a far greater number with a limited number of primitives.

Empirical Support for a Limit

Although the visual system is capable of discriminating extremely fine detail, I have been arguing that the number of volumetric primitives sufficient to model rapid human object recognition may be limited. It should be noted, however, that the number of proposed primitives is greater than the three (cylinder, sphere, and cone) advocated by some "How-to-Draw" books. Although these three may be sufficient for determining relative proportions of the parts of a figure and can aid perspective, they are not sufficient for the rapid identification of objects.8 Similarly, Marr and Nishihara's (1978) pipe-cleaner (viz., cylinder) representations of animals (their Figure 17) would also appear to posit an insufficient number of primitives. On the page, in the context of other labeled pipe-cleaner animals, it is certainly possible to arrive at an identification of a particular (labeled) animal, for example, a giraffe. But the thesis proposed here would hold that the identifications of objects that were distinguished only by the aspect ratios of a single component type would require more time than if the representation of the object preserved its componential identity. In modeling only animals, it is likely that Marr and Nishihara capitalized on the possibility that appendages (such as legs and some necks) can often be modeled by the cylindrical forms of a pipe cleaner. By contrast, it is unlikely that a pipe-cleaner representation of a desk would have had any success. The lesson from Marr and Nishihara's demonstration, even when limited to animals, may be that an image that conveys only the axis structure and axes length is insufficient for primal access.

7 Absolute judgments are judgments made against a standard in memory, for example, that Shape A is 14 cm in length. Such judgments are to be distinguished from comparative judgments in which both stimuli are available for simultaneous comparison, for example, that Shape A, lying alongside Shape B, is longer than B. Comparative judgments appear limited only by the resolving power of the sensory system. Absolute judgments are limited, in addition, by memory for physical variation. That the memory limitations are severe is evidenced by the finding that comparative judgments can be made quickly and accurately for differences so fine that thousands of levels can be discriminated. But accurate absolute judgments rarely exceed 7 ± 2 categories (Miller, 1956).

8 Paul Cezanne is often incorrectly cited on this point. "Treat nature by the cylinder, the sphere, the cone, everything in proper perspective so that each side of an object or plane is directed towards a central point" (Cezanne, 1904/1941, p. 234, italics mine). Cezanne was referring to perspective, not the veridical representation of objects.

As noted earlier, one reason not to posit a representation system based on fine quantitative detail, for example, many variations in degree of curvature, is that such absolute judgments are notoriously slow and error prone unless limited to the 7 ± 2 values argued by Miller (1956). Even this modest limit is challenged when the judgments have to be executed over a brief 100-ms interval (Egeth & Pachella, 1969) that is sufficient for accurate object identification. A further reduction in the capacity for absolute judgments of quantitative variations of a simple
shape would derive from the necessity, for most objects, to make simultaneous absolute judgments for the several shapes that constitute the object's parts (Egeth & Pachella, 1969; Miller, 1956). This limitation on our capacities for making absolute judgments of physical variation, when combined with the dependence of such variation on orientation and noise, makes quantitative shape judgments a most implausible basis for object recognition. RBC's alternative is that the perceptual discriminations required to determine the primitive components can be made categorically, requiring the discrimination of only two or three viewpoint-independent levels of variation.9

Our memory for irregular shapes shows clear biases toward "regularization" (e.g., Woodworth, 1938). Amply documented [...]

9 This limitation on our capacities for absolute judgments also occurs in the auditory domain in speech perception, in which the modest num- [...]

[...] (in the case of cups) to perhaps 15 or more (in the case of lamps) readily discernible exemplars. Let us assume (liberally) that the mean number of types is 10. This would yield an estimate of 30,000 readily discriminable objects (3,000 categories × 10 types/category).

A second source for the estimate derives from considering plausible rates for learning new objects. Thirty thousand objects would require learning an average of 4.5 objects per day, every day for 18 years, the modal age of the subjects in the experiments described below.
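The arithmetic behind the estimate of 30,000 objects and the learning rate of about 4.5 objects per day can be checked with a short sketch. The Python fragment below is an illustration only and is not part of the original analysis. The function names are hypothetical, and the 365-day year (ignoring leap days) is an assumption introduced here. The category count, types per category, and 18-year span are the values given in the text.

```python
# Values taken from the text.
CATEGORIES = 3_000        # object categories
TYPES_PER_CATEGORY = 10   # liberal mean number of discriminable types
YEARS = 18                # modal age of the subjects


def vocabulary_size(categories: int, types_per_category: int) -> int:
    """Estimated number of readily discriminable objects."""
    return categories * types_per_category


def objects_per_day(n_objects: int, years: int, days_per_year: int = 365) -> float:
    """Average learning rate needed to acquire n_objects over `years`.

    days_per_year=365 is an assumption of this sketch, not a value from the text.
    """
    return n_objects / (years * days_per_year)


n_objects = vocabulary_size(CATEGORIES, TYPES_PER_CATEGORY)
rate = objects_per_day(n_objects, YEARS)

print(n_objects)        # 30000
print(round(rate, 2))   # a little over 4.5 objects per day
```

Under these assumptions the rate comes out slightly above 4.5, which is consistent with the rounded figure reported in the text.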
2 Join at long or short surface of G1
volumes (or shapes) will produce two concavities, unless an
Componential Relations: The Representational Capacity of 36 Geons

How many objects could be represented by 36 geons? This calculation is dependent upon two assumptions: (a) the number of geons needed, on average, to uniquely specify each object; and (b) the number of readily discriminable relations among the geons. We will start with (b) and see if it will lead to an empirically plausible value for (a). A possible set of relations is presented in Table 1. Like the components, the properties of the relations noted in Table 1 are nonaccidental in that they can be determined from virtually any viewpoint, are preserved in the two-dimensional image, and are categorical, requiring the discrimination of only two or three levels. The specification of these five relations is likely conservative because (a) it is certainly a nonexhaustive set in that other relations can be defined; and (b) the relations are only specified for a pair, rather than triples, of geons. Let us consider these relations in order of their appearance in Table 1.

1. Relative size. For any pair of geons, G1 and G2, G1 could be much greater than, smaller than, or approximately equal to G2.

2. Verticality. G1 can be above or below or to the side of G2, a relation, by the author's estimate, that is defined for at least 80% of the objects. Thus giraffes, chairs, and typewriters have a top-down specification of their components, but forks and [...]

Representational Calculations

The 1,296 different pairs of the 36 geons (i.e., 36²), when multiplied by the number of relational combinations, 57.6 (the product of the various values of the five relations), gives us 74,649 possible two-geon objects. If a third geon is added to the two, then this value has to be multiplied by 2,073 (36 geons × 57.6 ways in which the third geon can be related to one of the two geons), to yield 154 million possible three-component objects. This value, of course, readily accommodates the liberal estimate of 30,000 objects actually known.

The extraordinary disparity between the representational power of two or three geons and the number of objects in an individual's object vocabulary means that there is an extremely high degree of redundancy in the filling of the 154 million cell geon-relation space. Even with three times the number of objects estimated to be known by an individual (i.e., 90,000 objects), we would still have less than 1/10 of 1% of the possible combinations of three geons actually used (i.e., over 99.9% redundancy).

There is a remarkable consequence of this redundancy if we assume that objects are distributed randomly throughout the object space. (Any function that yielded a relatively homogeneous distribution would serve as well.) The sparse, homogeneous occupation of the space means that, on average, it will be rare for an object to have a neighbor that differs only by one geon or relation.12 Because the space was generated by considering only the number of possible two or three component objects, a constraint on the estimate of the average number of components per object that are sufficient for unambiguous identification is implicated. If objects were distributed relatively homogeneously among combinations of relations and geons, then only two or three geons would be sufficient to unambiguously represent most objects.

Experimental Support for a Componential Representation

According to the RBC hypothesis, the preferred input for accessing object recognition is that of the volumetric geons. In most cases, only a few appropriately arranged geons would be all that is required to uniquely specify an object. Rapid object recognition should then be possible. Neither the full complement of an object's geons, nor its texture, nor its color, nor the full bounding contour (or envelope or outline) of the object need be present for rapid identification. The problem of recognizing tens of thousands of possible objects becomes, in each case, just a simple task of identifying the arrangement of a few from a limited set of geons.

Several object-naming reaction time experiments have provided support for the general assumptions of the RBC hypothesis, although none have provided tests of the specific set of geons proposed by RBC or even that there might be a limit to the number of components.13

In all experiments, subjects named or quickly verified briefly presented pictures of common objects.14 That RBC may provide a sufficient account of object recognition was supported by experiments indicating that objects drawn with only two or three of their components could be accurately identified from a single 100-ms exposure. When shown with a complete complement of components, these simple line drawings were identified almost as rapidly as full colored, detailed, textured slides of the same objects. That RBC may provide a necessary account of object recognition was supported by a demonstration that degradation (contour deletion), if applied at the regions that prevented recovery of the geons, rendered an object unidentifiable. All the original experimental results reported here have received at least one, and often several, replications.

Perceiving Incomplete Objects

Biederman, Ju, and Clapper (1985) studied the perception of briefly presented partial objects lacking some of their components. A prediction of RBC was that only two or three geons would be sufficient for rapid identification of most objects. If there was enough time to determine the geons and their relations, then object identification should be possible. Complete objects would be maximally similar to their representation and should enjoy an identification speed advantage over their partial versions.

Stimuli

The experimental objects were line drawings of 36 common objects, 9 of which are illustrated in Figure 11. The depiction of the objects and their partition into components was done subjectively, according to generally easy agreement among at least three judges. The artists were unaware of the set of geons described in this article. For the most part, the components corresponded to the parts of the object. Seventeen geon types (out of the full set of 36) were sufficient to represent the 180 components comprising the complete versions of the 36 objects.

The objects were shown either with their full complement of components or partially, but never with fewer than two components. The first two or three components that were selected were almost always the largest components from the complete object, as illustrated in Figures 12 and 13. For example, the airplane (Figure 13), which required nine components to look complete, had the fuselage and two wings when shown with three of its nine components. Additional components were added in decreasing order of size, subject to the constraint that additional components be connected to the existing components. Occasionally the ordering of large-to-small was altered when a smaller component, such as the eye of an animal, was judged to be highly diagnostic. The ordering by size was done under the assumption that processing would be completed earlier for larger components and, consequently, primal access would be controlled by those parts. However, it might be the case that a smaller part, if it was highly diagnostic, would have a greater role in controlling access than would be expected from its small size. The objects were displayed in black line on a white background and averaged 4.5° in greatest extent.

12 Informal demonstrations suggest that this is the case. When a single component or relation of an object is altered, as with the cup and the pail, only with extreme rarity is a recognizable object from another category produced.

13 Biederman (1985) discusses how a limit might be assessed. Among other consequences, a limit on the number of components would imply categorical effects whereby quantitative variations in the contours of an object, for example, degree of curvature, that did not alter a component's identity would have less of an effect on the identification of the object than contour variations that did alter a component's identity.

14 Our decision to use a naming task with which to assess object recognition was motivated by several considerations. Naming is a sure sign of recognition. Under the conditions of these experiments, if an individual could name the object, he or she must have recognized it. With other paradigms, such as discrimination or verification, it is difficult (if not impossible) to prevent the subject from deriving stimulus selection strategies specific to the limited number of stimuli and distractors. Although naming RTs are relatively slow, they are remarkably well behaved, with surprisingly low variability (given their mean) for a given response and few of the response anticipation or selection errors that occur with binary responses (especially, keypresses). As in any task with a behavioral measure, one has to exert caution in making inferences about representations at an earlier stage. In every experiment reported here, whenever possible, the same objects (with the same name) served in all conditions. The data from these experiments (e.g., Figures 19 and 20) were so closely and reasonably associated with the contour manipulations as to preclude accounts based on a late name-selection stage. Moreover, providing the subjects with the set of possible names prior to an experiment, which might have been expected to affect response selection, had virtually no effect on performance. When objects could not be used as their own controls, as was necessary in studies of complexity, it was possible to experimentally or statistically control naming-stage variability because the determinants of this variability (specifically, name familiarity, which is highly correlated with frequency and age of acquisition, and length) are well understood.
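Returning to the representational calculations reported earlier (36 geons and 57.6 relational combinations per pair), the figures can be reproduced with a short sketch. The Python fragment below is an illustration under stated assumptions and is not the procedure used in the original work. The function names are hypothetical, and the value 57.6 is taken from the text as given rather than derived here. Exact rational arithmetic is used so that the products are not affected by floating-point rounding.

```python
from fractions import Fraction

N_GEONS = 36
# 57.6 relational combinations per pair of geons, as stated in the text.
RELATIONS_PER_PAIR = Fraction(576, 10)


def two_geon_objects(n_geons: int = N_GEONS,
                     relations: Fraction = RELATIONS_PER_PAIR) -> Fraction:
    """Ordered geon pairs times the relational combinations per pair."""
    return n_geons ** 2 * relations


def three_geon_objects(n_geons: int = N_GEONS,
                       relations: Fraction = RELATIONS_PER_PAIR) -> Fraction:
    """Each two-geon object extended by a third geon related to one of the two."""
    return two_geon_objects(n_geons, relations) * n_geons * relations


def fraction_used(n_known: int, n_possible: Fraction) -> Fraction:
    """Share of the geon-relation space occupied by n_known objects."""
    return Fraction(n_known) / n_possible


pairs = N_GEONS ** 2
two = two_geon_objects()
three = three_geon_objects()
used = fraction_used(90_000, three)

print(pairs)                   # 1296
print(float(two))              # 74649.6 (reported as 74,649 in the text)
print(int(three) // 10 ** 6)   # whole millions of three-geon objects
print(float(used) * 100)       # percent of the space used by 90,000 objects
```

On these numbers the three-geon space holds roughly 154 million combinations, and 90,000 objects occupy well under one tenth of 1% of it, in line with the redundancy claim.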
The purpose of this experiment was to determine whether the first few geons that would be available from an unoccluded view of a complete object would be sufficient for rapid identification of the object. We ordered the components by size and diagnosticity because our interest, as just noted, was on primal access in recognizing a complete object. Assuming that the largest and most diagnostic components would control this access, we studied the contribution of the nth largest and most diagnostic component, when added to the n - 1 already existing components, because this would more closely mimic the contribution of that component when looking at the complete object. (Another kind of experiment might explore the contribution of an "average" component by balancing the ordering of the components. Such an experiment would be relevant to the recognition of an object that was occluded in such a way that only the displayed components would be available for viewing.)

Complexity

The objects shown in Figure 11 illustrate the second major variable in the experiment. Objects differ in complexity; by RBC's definition, the differences are evident in the number of components they require to look complete. For example, the lamp, the flashlight, the watering can, the scissors, and the elephant require two, three, four, six, and nine components, respectively. As noted previously, it would seem plausible that partial objects would require more time for their identification than complete objects, so that a complete airplane of nine components, for example, might be more rapidly recognized than only a partial version of that airplane, with only three of its components. The prediction from RBC was that complex objects, by furnishing more diagnostic combinations of components that could be simultaneously matched, would be more rapidly identified than simple objects. This prediction is contrary to models that assume that objects are recognized through a serial contour tracing process such as that studied by Ullman (1983).

General Procedure

Trials were self-paced. The depression of a key on the subject's terminal initiated a sequence of exposures from three projectors. First, the corners of a 500-ms fixation rectangle (6° wide) that corresponded to the corners of the object slide were shown. This fixation slide was immediately followed by a 100-ms exposure of a slide of an object that had varying numbers of its components present. The presentation of the object was immediately followed by a 500-ms pattern mask consisting of a random-appearing arrangement of lines. The subject's task was to name the object as fast as possible into a microphone that triggered a voice key. The experimenter recorded errors. Prior to the experiment, the subjects read a list of the object names to be used in the experiment. (Subsequent experiments revealed that this procedure for name familiarization produced no effect. When subjects were not familiarized with the names of the experimental objects, results were virtually identical to
Figure 12. Illustration of the partial and complete versions of 2 three-component objects
(the wine glass and flashlight) and I nine-component object ((he penguin).
when such familiarization was provided. This finding indicates look complete) were present, subjects were almost 90% accu-
that the results of these experiments were not a function of in- rate. In general, the complete objects were named without erroi;
ference over a small set of objects.) Even with the name famil- so it is necessary to look at the RTs to see if differences emerge
iarization, all responses that indicated that the object was iden- for the complexity variable.
tified were considered correct. Thus "pistol," "revolve^" "gun," Mean correct RTs, shown in Figure 15, provide the same gen-
and "handgun" were all acceptable as correct responses for the eral outcome as the errors, except that there was a slight ten-
same object. Reaction times (RTs) were recorded by a micro- dency for the more complex objects, when complete, to have
computer that also controlled the projectors and provided shorter RTs than the simple objects. This advantage for the com-
speed and accuracy feedback on the subject's terminal after plex objects was actually underestimated in that the complex
each trial. objects had longer names (three and four syllables) and were less
Objects were selected that required two, three, six, or nine familiar than the simple objects. Oldfield (1966) and Oldfield
components to look complete. There were 9 objects for each of and Wingfield (1965) showed that object-naming RTs were
these complexity levels, yielding a total set of 36 objects. The longer for names that have more syllables or are infrequent. This
various combinations of the partial versions of these objects effect of slightly shorter RTs for naming complex objects has
brought the total number of experimental trials (slides) to 99. been replicated, and it seems safe to conclude, conservatively,
Each of 48 subjects viewed all the experimental slides, with bal- that complex objects do not require more time for their identi-
ancing accomplished by varying the order of the slides. fication than simple objects. This result is contrary to what
would be expected from a serial contour-tracing process (e.g.,
Results Ullman, 1984). Serial tracing would predict that complex ob-
jects would require more time to be seen as complete compared
Figure 14 shows the mean error rates as a function of the to simple objects, which have less contour to trace. The slight
number of components actually displayed on a given trial for RT advantage enjoyed by the complex objects is an effect that
the conditions in which no familiarization was provided. Each would be expected if their additional components were afford-
function is the mean for the nine objects at a given complexity ing a redundancy gain from more possible diagnostic matches
level. Although each subject saw all 99 slides, only the data for to their representations in memory.
the first time that a subject viewed a particular object will be
discussed here. For a given level of complexity, increasing num- Line Drawings Versus Colored Photography
bers of components resulted in better performance, but error
rates overall were modest. When only three or four components The components that are postulated to be the critical units
of the complex objects (those with six or nine components to for recognition are edge-based and can be depicted by a line
Figure 13. Illustration of partial and complete versions of a nine-component object (airplane).
drawing. Color, brightness, and texture would be secondary contributing to primal access, then the former kinds of objects,
routes for recognition. From this perspective, Biederman and for which color is diagnostic, should have enjoyed a larger ad-
Ju (1986) reasoned that naming RTs for objects shown as line vantage when appearing in a color photograph, but this did not
drawings should closely approximate naming RTs for those ob- happen. Objects with a diagnostic color did not enjoy any ad-
jects when shown as colored photographic slides with complete vantage when they were displayed as color slides compared with
detail, color, and texture. This prediction would be true of any their line-drawing versions. That is, showing color-diagnostic
model that posited an edge-based representation mediating rec- objects such as a banana or a fork as a color slide did not confer
ognition. any advantage over the line-drawing version compared with ob-
In the Biederman and Ju experiments, subjects identified jects such as a chair or mitten. Moreover, there was no color
brief presentations (50-100 ms) of slides of common objects.15
Each object was shown in two versions: professionally photographed
in full color or as a simplified line drawing showing only the
object's major components (such as those in Figure 11). In three
experiments, subjects named the object; in a fourth experiment a
yes-no verification task was performed against a target name.
Overall, performance levels with the two types of stimuli were
equivalent: mean latencies in identifying images presented by color
photography were 11 ms shorter than the drawing but with a 3.9%
higher error rate.
A previously unexplored color diagnosticity distinction among
objects allowed us to determine whether color and lightness were
providing a contribution to primal access independent of the main
effect of photos versus drawings. For some kinds of objects, such as
bananas, forks, fishes, or cameras, color is diagnostic to the
object's identity. For other kinds, such as chairs, pens, or mittens,
color is not diagnostic. The detection of a yellow region might
facilitate the perception of a banana, but the detection of the color
of a chair is unlikely to facilitate its identification, because
chairs can be any color. If color was

15 An oft-cited study, Ryan and Schwartz (1956), did compare
photography (black & white) against line and shaded drawings and
cartoons. But these investigators did not study basic-level
categorization of an object. Subjects had to determine which one of
four configurations of three objects (the positions of five
double-throw electrical knife switches, the cycles of a steam valve,
and the fingers of a hand) was being depicted. The subjects knew
which object was to be presented on a given trial. For two of the
three objects, the cartoons had lower thresholds than the other
modes. But stimulus sampling and drawings and procedural
specifications render interpretation of this experiment
problematical; for example, the determination of the switch
positions was facilitated in the cartoons by filling in the handles
so they contrasted with the background contacts. The variability was
enormous: Thresholds for a given form of depiction for a single
object ranged across the four configurations from 50 to 2,000 ms.
The cartoons did not have lower thresholds than the photographs for
the hands, the stimulus example most frequently shown in secondary
sources (e.g., Neisser, 1967; Hochberg, 1978; Rock, 1984). Even
without a mask, threshold presentation durations were an order of
magnitude longer than was required in the present study.
[Figure 14 plot. Horizontal axis: Number of Components Presented (2 to 9). Vertical axis: mean percent error. Curve parameter: Number of Components in Complete Object.]
Figure 14. Mean percent error as a function of the number of components in the displayed object (abscissa)
and the number of components required for the object to appear complete (parameter). (Each point is the
mean for nine objects on the first occasion when a subject saw that particular object.)
diagnosticity advantage for the color slides on the verification Yet when the cylinder is used to make a cup and a pail in Figure
task, where the color of the to-be-verified object could be antici- 3, or the cone used to make a wine glass in Figure 12, the vol-
pated. umes are interpreted as concave (hollow). It would thus seem to
This failure to find a color diagnosticity effect, when com- be the case that the interpretation of hollowness (an interpreta-
bined with the finding that simple line drawings could be identi- tion that overrides the default value of solidity) of a volume
fied so rapidly as to approach the naming speed of fully de- can be readily accomplished top-down once a representation is
tailed, textured, colored photographic slides, supports the elicited.
premise that the earliest access to a mental representation of an
object can be modeled as a matching of an edge-based represen- The Perception of Degraded Objects
tation of a few simple components. Such edge-based descrip-
tions are thus sufficient for primal access. RBC assumes that certain contours in the image are critical
The preceding account should not be interpreted as suggest- for object recognition. Several experiments on the perception
ing that the perception of surface characteristics per se are de- of objects that have been degraded by deletion of their contour
layed relative to the perception of the components but merely (Biederman & Blickle, 1985) provide evidence that these con-
that in most cases surface cues are generally less efficient routes tours are necessary for object recognition (under conditions
for primal access. That is, we may know that an image of a chair where contextual inference is not possible).
has a particular color and texture simultaneously with its volu- RBC holds that parsing of an object into components is per-
metric description, but it is only the volumetric description that formed at regions of concavity. The nonaccidental relations of
provides efficient access to the mental representation of "chair." collinearity and curvilinearity allow filling-in: They extend bro-
It should be noted that our failure to find a benefit from color ken contours that are collinear or smoothly curvilinear. In con-
photography is likely restricted to the domain whereby the cert, the two assumptions of (a) parsing at concavities and (b)
edges are of high contrast. Under conditions where edge extrac- filling-in through collinearity or smooth curvature lead to a
tion is difficult, differences in color, texture, and luminance prediction as to what should be a particularly disruptive form
might readily facilitate such extraction and result in an advan- of degradation: If contours were deleted at regions of concavity
tage for color photography. in such a manner that their endpoints, when extended through
There is one surface characteristic that deserves special note: collinearity or curvilinearity, bridge the concavity, then the
the luminance gradient. Such gradients can provide sufficient components would be lost and recognition should be impossi-
information as to a region's surface curvature (e.g., Besl & Jain, ble. The cup in the right column of the top row of Figure 16
1986) from which the surface's convexity or concavity can be provides an example. The curve of the handle of the cup is
determined. Our outline drawings lacked those gradients. Con- drawn so that it is continuous with the curve of the cylinder
sider the cylinder and cone shown in the second and fifth rows, forming the back rim of the cup. This form of degradation, in
respectively, of Figure 7. In the absence of luminance gradients, which the components cannot be recovered from the input
the cylinder and cone are interpreted as convex (not hollow). through the nonaccidental properties, is referred to as nonrecov-
[Figure 15 plot. Horizontal axis: Number of Components Presented (2 to 9). Vertical axis: mean correct reaction time. Curve parameter: Number of Components in Complete Object (2, 3, 6, or 9).]
Figure 15. Mean correct reaction time as a function of the number of components in the displayed object
(abscissa) and the number of components required for the object to appear complete (parameter). (Each
point is the mean for nine objects on the first occasion when a subject saw that particular object.)
erable degradation and is illustrated for the objects in the right The other way to alter vertices is to produce them through
column of Figure 16. misleading extension of contours. Just as approximate joins of
An equivalent amount of deleted contour in a midsection of interrupted contours might be accepted to produce continuous
a curve or line should prove to be less disruptive as the compo- edges, if three or more contours appear to meet at a common
nents could then be restored through collinearity or curvature. point when extended then a misleading vertex can be suggested.
In this case the components should be recoverable. Examples For example, in the watering can in the right column of Figure
of recoverable forms of degradation are shown in the middle 11, the extensions of the contour from the spout attachment
column of Figure 16. and sprinkler appear to meet the contours of the handle and
In addition to the procedure for deleting and bridging con- rim, suggesting a false vertex of five edges. (Such a multivertex
cavities, two other applications of nonaccidental properties is nondiagnostic to a volume's three-dimensional identity [e.g.,
were used to prevent determination of the components: vertex Guzman, 1968; Sugihara, 1984].)
alteration and misleading symmetry or parallelism.
Misleading Symmetry or Parallelism
Vertex Alteration
Nonrecoverability of components can also be produced by
When two or more edges terminate at the same point in the contour deletion that produces symmetry or parallelism not
image, the visual system assumes that they are terminating at characteristic of the original object. For example, the symmetry
the same point in depth and a vertex is present at that point. of the oval region in the opening of the watering can suggests a pla-
Vertices are important for determining the nature of a compo- nar component with that shape.
nent (see Figure 5). As noted previously, volumetric compo- Even with these techniques, it was difficult to remove con-
nents will display at least one three-pronged vertex. tours supporting all the components of an object, and some re-
There are two ways to alter vertices. One way is by deleting a mained in nominally nonrecoverable versions, as with the han-
segment of an existing vertex. For example, the T-vertex pro- dle of the scissors.
duced by the occlusion of one blade of the scissors by the other Subjects viewed 35 objects, in both recoverable and nonre-
has been converted into an L-vertex, suggesting that the bound- coverable versions. Prior to the experiment, all subjects were
aries of the region in the image are the boundaries of that region shown several examples of the various forms of degradation for
of the object. In the cup, the curved-T-vertex produced by the several objects that were not used in the experiment. In addi-
joining of a discontinuous edge of the front rim of the cup with tion, familiarization with the experimental objects was manipu-
the occlusional edge of the sides and back rim has been altered lated between subjects. Prior to the start of the experimental
to an L-vertex by deleting the discontinuous edge. With only L- trials, different groups of six subjects (a) viewed a 3-sec slide of
vertices, objects typically lose their volumetric character and the intact version of the objects, for example, the objects in the
appear planar. left column of Figure 16, which they named; (b) were provided
with the names of the objects on their terminal; or (c) were given
no familiarization. As in the prior experiment, the subject's
task was to name the objects.
A glance at the second and third columns in Figure 16 is
sufficient to reveal that one does not need an experiment to
show that the nonrecoverable objects would be more difficult to
identify than the recoverable versions. But we wanted to deter-
mine if the nonrecoverable versions would be identifiable at ex-
tremely long exposure durations (5 s) and whether the prior ex-
posure to the intact version of the object would overcome the
effects of the contour deletion. The effects of contour deletion
in the recoverable condition were also of considerable interest
when compared with the comparable conditions from the par-
Results
The error data are shown in Figure 17. Identification of the
nonrecoverable stimuli was virtually impossible: The median
error rate for those slides was 100%. Subjects rarely guessed
wrong objects in this condition; most often they merely said
that they "didn't know." When nonrecoverable objects could be
identified, it was primarily for those instances where some of
the components were not removed, as with the circular rings of
the handle of the scissors. When this happened, subjects could
name the object at 200-ms exposure duration. For the majority
of the objects, however, error rates were well over 50% with no
gain in performance even with 5 s of exposure duration. Objects
in the recoverable condition were named at high accuracy at the
longer exposure durations.
As in the previous experiments, familiarizing the subjects with the
names of the objects had no effect compared with the condition in
which the subjects were given no information about the objects.
There was some benefit, however, in providing intact versions of the
pictures of the objects. Even with this familiarity, performance in
the nonrecoverable condition was extraordinarily poor, with error
rates exceeding 60% when subjects had a full 5 s to decipher the
stimulus. As noted previously, even this value underestimated the
difficulty of identifying objects in the nonrecoverable condition,
in that identification was possible only when the contour deletion
allowed some of the components to remain recoverable.

Figure 16. Example of five stimulus objects in the experiment on the
perception of degraded objects. (The left column shows the original
intact versions. The middle column shows the recoverable versions.
The contours have been deleted in regions where they can be replaced
through collinearity or smooth curvature. The right column shows the
nonrecoverable versions. The contours have been deleted at regions of
concavity so that collinearity or smooth curvature of the segments
bridges the concavity. In addition, vertices have been altered, for
example, from Ys to Ls, and misleading symmetry and parallelism have
been introduced.)
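The filling-in assumption, that broken contours are extended when they are collinear, can be illustrated with a small geometric test. The following is only a minimal sketch under stated assumptions: the function names, the representation of a contour fragment as a pair of endpoints, and the 5-degree tolerance are all hypothetical and are not taken from the experiments reported here. Smooth curvilinear continuation is not handled.

```python
import math

def direction(p, q):
    """Unit vector pointing from point p to point q."""
    length = math.hypot(q[0] - p[0], q[1] - p[1])
    return ((q[0] - p[0]) / length, (q[1] - p[1]) / length)

def angle_deg(u, v):
    """Angle in degrees between two unit vectors."""
    dot = max(-1.0, min(1.0, u[0] * v[0] + u[1] * v[1]))
    return math.degrees(math.acos(dot))

def bridges_by_collinearity(seg_a, seg_b, max_angle_deg=5.0):
    """True if seg_b is a straight continuation of seg_a across a gap.

    Each segment is ((x0, y0), (x1, y1)). The gap runs from the end of
    seg_a to the start of seg_b. The tolerance is an illustrative
    assumption, not an empirically derived value.
    """
    d_a = direction(*seg_a)
    d_b = direction(*seg_b)
    if angle_deg(d_a, d_b) > max_angle_deg:
        return False  # the two fragments point in different directions
    if seg_a[1] == seg_b[0]:
        return True   # no gap to bridge
    d_gap = direction(seg_a[1], seg_b[0])
    # Parallel but laterally offset fragments fail this second check.
    return angle_deg(d_a, d_gap) <= max_angle_deg

print(bridges_by_collinearity(((0, 0), (1, 0)), ((2, 0), (3, 0))))
print(bridges_by_collinearity(((0, 0), (1, 0)), ((2, 0.5), (3, 0.5))))
```

In these terms, a recoverable deletion leaves fragments of the same edge that pass the test, whereas a nonrecoverable deletion is constructed so that fragments belonging to different components pass it and are bridged across a concavity.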
The emphasis on the poor performance in the nonrecovera-
ble condition should not obscure the extensive interference that
was evident at the brief exposure durations in the recoverable Biederman and Blickle (1985). In the previous experiment, it
condition. The previous experiments had established that intact was necessary to delete or modify the vertices in order to pro-
objects, without picture familiarization, could be identified at duce the nonrecoverable versions of the objects. The recovera-
near perfect accuracy at 100 ms. At this exposure duration in ble versions of the objects tended to have their contours deleted
the present experiment, error rates for the recoverable stimuli, in midsegment. It is possible that some of the interference in
whose contours could be restored through collinearity and cur- the nonrecoverable condition was a consequence of the removal
vature, averaged 65%. These high error rates at 100-ms expo- of vertices per se, rather than the production of inappropriate
sure duration suggest that the filling-in processes require an im- components. Contour deletion was performed either at the ver-
age (retinal or iconic)—not merely a memory representation— tices or at midsegments for 18 objects, but without the acciden-
and sufficient time (on the order of 200 ms) to be successfully tal bridging of components through collinearity or curvature
executed. that was characteristic of the nonrecoverable condition. The
amount of contour removed was 25%, 45%, or 65%,
A Parametric Investigation of Contour Deletion and the objects were shown for 100, 200, or 750 ms. Other as-
pects of the procedure were identical to the previous experi-
The dependence of componential recovery on the availability ments with only name familiarization provided. Figure 18
and locus of contour and time was explored parametrically by shows an example for a single object.
filling-in. The greater disruption from vertex deletion is expected
on the basis of their importance as diagnostic image features for the
components. Overall, both the error and RT data document a striking
dependence of object identification on

Figure 18. Illustration for a single object of 25, 45, and 65% contour
removal centered at either midsegment or vertex. (Unlike the
nonrecoverable objects illustrated in Figure 16, vertex deletion does
not prevent identification of the object.)
[Figure 19 plot. Horizontal axis: Percent Contour Deletion (25, 45, 65). Vertical axis: mean percent naming errors. Separate curves for contour deletion at vertex and at midsegment, at exposure durations of 100, 200, and 750 msec.]
Figure 19. Mean percent object naming errors as a function of locus of contour removal
(midsegment or vertex), percent removal, and exposure duration.
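The parametric experiment crosses three factors named in the text: locus of deletion, percent of contour removed, and exposure duration. A minimal sketch enumerating the resulting cells is given below; the variable names are hypothetical, and nothing is assumed about how the 18 objects were assigned to cells, which the text does not specify.

```python
from itertools import product

# Factor levels as stated in the text.
LOCI = ("vertex", "midsegment")
PERCENT_REMOVED = (25, 45, 65)
EXPOSURE_MS = (100, 200, 750)

# Full crossing of the three factors (variable names are illustrative).
conditions = [
    {"locus": locus, "percent_removed": pct, "exposure_ms": ms}
    for locus, pct, ms in product(LOCI, PERCENT_REMOVED, EXPOSURE_MS)
]

for cell in conditions:
    print(cell)
print(len(conditions))  # 2 x 3 x 3 = 18 cells
```

The count of 18 cells here is the product of the factor levels; that it equals the number of objects used is not something this sketch relies on.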
of a representation most likely proceeds in the absence of the subset of a few components (a partial object) can provide a
parts, with weaker activation the consequence of the missing sufficient input for recognition, the activation of that represen-
parts. The two methods for removing contour may thus be tation is not optimal compared with a complete object. Thus, in
affecting different stages. Deleting contour in midsegment the partial object experiment described previously, recognition
affects processes prior to and including those involved in the RTs were shortened with the addition of components to an al-
determination of the components (see Figure 2). The removal ready recognizable object. If all of an object's components were
of whole components (the partial object procedure) is assumed degraded (but recoverable), recognition would be delayed until
to affect the matching stage, reducing the number of common contour restoration was completed. Once the filling-in was
components between the image and the representation and in- completed and the complete complement of an object's geons
creasing the number of distinctive components in the represen- was activated, a better match to the object's representation
tation. Contour filling-in is typically regarded as a fast, low-level would be possible (or the elicitation of its name) than with a
process. We (Biederman, Beiring, Ju, & Blickle, 1985) studied partial object that had only a few of its components. The inter-
the naming speed and accuracy of six- and nine-component ob- action can be modeled as a cascade in which the component-
jects undergoing these two types of contour deletion. At brief deletion condition results in more rapid activation of the geons
exposure durations (e.g., 65 ms) performance with partial ob- but to a lower asymptote (because some geons never get acti-
jects was better than objects with the same amount of contour vated) than the midsegment-deletion condition.
removed in midsegment both for errors (Figure 23) and RTs More generally, the finding that partial complex objects—
(Figure 24). At longer exposure durations (200 ms), the RTs with only three of their six or nine components present—can
reversed, with the midsegment deletion now faster than the par- be recognized more readily than objects whose contours can
tial objects. be restored through filling-in documents the efficiency of a few
Our interpretation of this result is that although a diagnostic components for accessing a representation.
[Figure 20 plot. Horizontal axis: Percent Contour Deletion (25, 45, 65). Vertical axis: mean correct naming reaction time in milliseconds. Separate curves for contour deletion at vertex and at midsegment, at exposure durations of 100, 200, and 750 msec.]
Figure 20. Mean correct object-naming reaction time (in milliseconds) as a function of locus
of contour removal (midsegment or vertex), percent removal, and exposure duration.
Contour Deletion by Occlusion in Figure 25, no object becomes apparent. In the recoverable
version in Figure 26, an object does pop into a three-dimen-
The degraded recoverable objects in the right column of Fig- sional appearance, but most observers report a delay (our own
ure 16 have the appearance of flat drawings of objects with in- estimate is approximately 500 ms) from the moment the stimu-
terrupted contours. Biederman and Blickle (1985) designed a lus is first fixated to when it appears as an identifiable three-
demonstration of the dependence of object recognition on com- dimensional entity.
ponential identification by aligning an occluding surface so that This demonstration of the effects of an occluding surface to
it appeared to produce the deletions. If the components were produce contour interruption also provides a control for the
responsible for an identifiable volumetric representation of the possibility that the difficulty in the nonrecoverable condition
object, we would expect that with the recoverable stimuli the was a consequence of inappropriate figure-ground groupings,
object would complete itself under the occluding surface and as with the stool in Figure 16. With the stool, the ground that
assume a three-dimensional character. This effect should not was apparent through the rungs of the stool became figure in
occur in the nonrecoverable condition. This expectation was the nonrecoverable condition. (In general, however, only a few
met, as shown in Figures 25 and 26. These stimuli also provide of the objects had holes in them where this could have been a
a demonstration of the time (and effort?) requirements for con- factor.) Figure-ground ambiguity would not invalidate the
tour restoration through collinearity or curvature. We have not RBC hypothesis but would complicate the interpretation of the
yet obtained objective data on this effect, which may be compli- effects of the nonrecoverable noise, in that some of the effect
cated by masking effects from the presence of the occluding sur- would derive from inappropriate grouping of contours into
face, but we invite the reader to share our subjective impres- components and some of the effect would derive from inappro-
sions. When looking at a nonrecoverable version of an object priate figure-ground grouping. That the objects in the nonre-
Figure 21. A comparison of a nonrecoverable version of an object (on the left) with a recoverable version
(on the right) with half the contour of the nonrecoverable. Despite the reduction of contour the recoverable
version still enjoys an advantage over the nonrecoverable.
coverable condition remain unidentifiable when the contour in-
terruption is attributable to an occluding surface suggests that
figure-ground grouping cannot be the primary cause of the in-
terference from the nonrecoverable deletions.
[Figure panel column labels: Complete; Component Deletion; Midsegment Deletion.]
[Figure 23 plot. Horizontal axis: Exposure Duration (msec): 65, 100, 200. Vertical axis: mean percent errors. Separate curves for Component Deletion and Midsegment Deletion.]
Figure 23. Mean percent errors of object naming as a function of the nature of contour
removal (deletion of midsegments or components) and exposure duration.
Orientation Variability mental rotation. It may be that mental rotation—or a more gen-
eral imaginal transformation capacity stressing working mem-
Objects can be more readily identified from some orienta- ory—is required only under the (relatively rare) conditions where
tions compared with others (Palmer, Rosch, & Chase, 1981). the relations among the components have to be rearranged. Thus,
According to the RBC hypothesis, difficult views will be those we might expect to find the equivalent of mental paper folding if
in which the components extracted from the image are not the the parts of an object were rearranged and the subject's task was
components (and their relations) in the representation of the to determine if a given object could be made out of the displayed
object. Often such mismatches will arise from an "accident" of components. RBC would hold that the lengthening of naming RTs
viewpoint where an image property is not correlated with the in Jolicoeur's (1985) experiment is better interpreted as an effect
property in the three-dimensional world. For example, when that arises not from the use of orientation dependent features but
the viewpoint in the image is along the axis of the major compo- from the perturbation of the "top-of" relations among the compo-
nents of the object, the resultant foreshortening converts one or nents.
some of the components into surface components, such as disks Palmer et al. (1981) conducted an extensive study of the per-
and rectangles in Figure 27, which are not included in the com- ceptibility of various objects when presented at a number of
ponential description of the object. In addition, as illustrated in different orientations. Generally, a three-quarters front view
Figure 27, the surfaces may occlude otherwise diagnostic com- was most effective for recognition, and their subjects showed a
ponents. Consequently, the components extracted from the im- clear preference for such views. Palmer et al. (1981) termed this
age will not readily match the mental representation of the ob- effective and preferred orientation of the object its canonical
ject and identification will be much more difficult compared to orientation. The canonical orientation would be, from the per-
an orientation, such as that shown in Figure 28, which does spective of RBC, a special case of the orientation that would
convey the components. maximize the match of the components in the image to the rep-
A second condition under which viewpoint affects identifiability resentation of the object.
of a specific object arises when the orientation is simply unfamil-
iar, as when a sofa is viewed from below or when the top-bottom
Transfer Between Different Viewpoints
relations among the components are perturbed as when a nor-
mally upright object is inverted. Jolicoeur (1985) recently reported When an object is seen at one viewpoint or orientation it can
that naming RTs were lengthened as a function of an object's rota- often be recognized as the same object when subsequently seen
tion away from its normally upright position. He concluded that at some other orientation in depth, even though there can be
mental rotation was required for the identification of such objects, extensive differences in the retinal projections of the two views.
as the effect of X-Y rotation on RTs was similar for naming and The principle of componential recovery would hold that trans-
[Figure 24 plot. Horizontal axis: Exposure Duration (msec): 65, 100, 200. Vertical axis: mean correct reaction time in milliseconds. Separate curves for Component Deletion and Midsegment Deletion.]
Figure 24. Mean correct reaction time (in milliseconds) in object naming as a function of the nature
of contour removal (deletion at midsegments or components) and exposure duration.
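The cascade account of the crossover between component deletion and midsegment deletion can be made concrete with a toy activation function. The sketch below is an illustration only: the exponential form, the asymptotes, and the rate constants are hypothetical values chosen to reproduce the qualitative pattern (faster rise to a lower asymptote for component deletion), and equating exposure duration with available processing time is a further simplifying assumption. None of these numbers are fitted to the data in Figures 23 and 24.

```python
import math

def activation(t_ms, asymptote, rate_per_ms):
    """Exponential approach to an asymptote; 0 at t = 0."""
    return asymptote * (1.0 - math.exp(-rate_per_ms * t_ms))

def component_deletion(t_ms):
    # Intact geons activate quickly, but missing geons never activate,
    # so the asymptote is lower (hypothetical parameters).
    return activation(t_ms, asymptote=0.6, rate_per_ms=0.03)

def midsegment_deletion(t_ms):
    # All geons are eventually restored by filling-in, but slowly
    # (hypothetical parameters).
    return activation(t_ms, asymptote=1.0, rate_per_ms=0.008)

for t in (65, 100, 200):
    print(t, round(component_deletion(t), 3), round(midsegment_deletion(t), 3))
```

With these illustrative parameters the component-deletion curve is higher at 65 ms and the midsegment-deletion curve is higher at 200 ms, which is the ordering the cascade interpretation is meant to capture.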
fer between two viewpoints would be a function of the compo- of trials. (In another experiment [Bartram, 1976], essentially
nential similarity between the views, as long as the relations the same results were found with a same-different name-
among the components were not altered. This could be experi- matching task in which pairs of pictures were presented.) In the
mentally tested through priming studies with the degree of identical condition, the pictures were identical across the trial
priming predicted to be a function of the similarity (viz., com- blocks. In the different view condition, the same objects were
mon minus distinctive components) of the two views. If two depicted from one block to the next but in different orienta-
different views of an object contained the same components, tions. In the different exemplar condition, different exemplars,
RBC would predict that, aside from effects attributable to varia- for example, different instances of a chair, were presented, all
tions in aspect ratio, there should be as much priming as when of which required the same response. Bartram found that the
the object was presented at an identical view. An alternative naming RTs for the identical and different view conditions were
possibility to componential recovery is that a presented object equivalent and both were shorter than control conditions, de-
would be mentally rotated (Shepard & Metzler, 1971) to corre- scribed below, for concept and response priming effects. Bar-
spond to the original representation. But mental rotation rates tram theorized that observers automatically compute and ac-
appear to be too slow and effortful to account for the ease and cess all possible three-dimensional viewpoints when viewing a
speed with which transfer occurs between different orientations given object. Alternatively, it is possible that there was high
in depth of the same object. componential similarity across the different views and the ex-
There may be a restriction on whether a similarity function periment was insufficiently sensitive to detect slight differences
for priming effects will be observed. Although unfamiliar ob- from one viewpoint to another. However, in four experiments
jects (or nonsense objects) should reveal a componential simi- with colored slides, we (Biederman & Lloyd, 1985) failed to ob-
larity effect, the recognition of a familiar object, whatever its tain any effect of variation in viewing angle and have thus repli-
orientation, may be too rapid to allow an appreciable experi- cated Bartram's basic effect (or lack of effect). At this point,
mental priming effect. Such objects may have a representation our inclination is to agree with Bartram's interpretation, with
for each orientation that provides a different componential de- somewhat different language, but restrict its scope to familiar
scription. Bartram's (1974) results support this expectation objects. It should be noted that both Bartram's and our results
that priming effects might not be found across different views are inconsistent with a model that assigned heavy weight to the
of familiar objects. Bartram performed a series of studies in aspect ratio of the image of the object or postulated an underly-
which subjects named 20 pictures of objects over eight blocks ing mental rotation function.
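The similarity function proposed in this section (common minus distinctive components) can be made concrete with a small sketch. The Python below is an illustration only, not a procedure given in the text: it assumes that each view can be summarized as a set of component labels, and the function name `componential_similarity` and the labels for the mug and box views are hypothetical. It also ignores the relations among components, which the argument above requires to be unaltered.

```python
from typing import FrozenSet


def componential_similarity(view_a: FrozenSet[str], view_b: FrozenSet[str]) -> int:
    """Number of common components minus number of distinctive components."""
    common = view_a & view_b       # components present in both views
    distinctive = view_a ^ view_b  # components present in only one view
    return len(common) - len(distinctive)


# Hypothetical component labels for three views of a mug and one view of a box.
mug_side = frozenset({"cylinder_body", "curved_handle"})
mug_rotated = frozenset({"cylinder_body", "curved_handle"})
mug_handle_hidden = frozenset({"cylinder_body"})
box_side = frozenset({"brick_body"})

# Same components in both views: 2 common, 0 distinctive.
same_components = componential_similarity(mug_side, mug_rotated)
# Handle occluded in one view: 1 common, 1 distinctive.
fewer_components = componential_similarity(mug_side, mug_handle_hidden)
# Different object: 0 common, 3 distinctive.
other_object = componential_similarity(mug_side, box_side)
```

Under these assumptions, a rotated view that preserves every component scores as high as an identical view, which is the pattern RBC predicts for priming across views. How such a score would map onto naming RTs is left open here.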
Different Exemplars Within an Object Class

Just as we might be able to gauge the transfer between two different views of the same object based on a componential-based similarity metric, we might be able to predict transfer between different exemplars of a common object, such as two different instances of a lamp or chair.

As noted in the previous section, Bartram (1974) also included a different exemplar condition, in which different objects with the same name (different cars, for example) were depicted from block to block. Under the assumption that different exemplars would be less likely to have common components, RBC would predict that this condition would be slower than the identical and different view conditions but faster than a different object control condition with a new set of objects that required different names for every trial block. This was confirmed by Bartram.

For both different views of the same object as well as different exemplars (subordinates) within a basic-level category, RBC predicts that transfer would be based on the overlap in the components between the two views. The strong prediction would be that the same similarity function that predicted transfer between different orientations of the same object would also predict the transfer between different exemplars with the same name.

The Perceptual Basis of Basic Level Categories

Consideration of the similarity relations among different exemplars with the same name raises the issue as to whether objects are most readily identified at a basic, as opposed to a subordinate or superordinate, level of description. The componential representations described here are representations of specific, subordinate objects, although their identification was often measured with a basic-level name. Much of the research suggesting that objects are recognized at a basic level has used stimuli, often natural, in which the subordinate-level exemplars had componential descriptions that were highly similar to those for a basic-level prototype for that class of objects. Only small componential differences, or color or texture, distinguished the subordinate-level objects. Thus distinguishing Asian elephants from African elephants or Buicks from Oldsmobiles requires fine discrimination for their verification. The structural descriptions for the largest components would be identical. It is not at all surprising that in these cases basic-level identification would be most rapid. On the other hand, many human-made categories, such as lamps, or some natural categories, such as dogs (which have been bred by humans), have members that have componential descriptions that differ considerably from one exemplar to another, as with a pole lamp versus a ginger jar table lamp, for example. The same is true of objects that differ from their basic-level prototype, such as penguins or sports cars. With such instances, which unconfound the similarity between basic-level and subordinate-level objects, perceptual access should be at the subordinate (or instance) level, a result supported by a recent report by Jolicoeur, Gluck, and Kosslyn (1984). In general, then, recognition will be at the subordinate level but will appear to be at the basic level when the componential descriptions are the same at the two levels. However, the ease of perceptual recognition of nonprototypical exemplars, such as penguins, makes it clear that recognition will be at the level of the exemplar.

Figure 26. Recoverable version of an object where the contour deletion is produced by an occluding surface. (The object, a flashlight, is the same as that shown in Figure 25. The reader may note that the three-dimensional percept in this figure does not occur instantaneously.)

The kinds of descriptions postulated by RBC may play a central role in children's capacity to acquire names for objects. They may be predisposed to employ different labels for objects that have different geon descriptions. When the perceptual system presents a new description for an arrangement of large geons, the absence of activation might readily result in the question "What's that?"

For some categories, such as chairs, one can conceive of an extraordinarily large number of instances. Do we have a priori structural descriptions for all these cases? Obviously not. Although we can recognize many visual configurations as chairs, it is likely that recognition will be rapid only for those for which a close structural description exists in memory. The same caveat that was raised about the Marr and Nishihara (1978) demonstrations of pipe-cleaner animals in an earlier section must be voiced here. With casual viewing, particularly when supported by a scene context or when embedded in an array of other chairs, it is often possible to identify unusual instances as chairs without much subjective difficulty. But when presented as an isolated object without benefit of such contextual support, we have found that recognition of unfamiliar exemplars requires markedly longer exposure durations than those required for familiar instances of a category.

It takes but a modest extension of the principle of componential recovery to handle the similarity of objects. Simply put, similar objects will be those that have a high degree of overlap in their components and in the relations among these components. A similarity measure reflecting common and distinctive components (Tversky, 1977) may be adequate for describing the similarity between a pair of objects or between a given instance and its stored or expected representation, whatever their basic- or subordinate-level designation.

The Perception of Nonrigid Objects

Many objects and creatures, such as people and telephones, have articulated joints that allow extension, rotation, and even separation of their components. There are two ways in which such objects can be accommodated by RBC. One possibility, as described in the previous section on the representation for variation within a basic-level category, is that independent structural descriptions are necessary for each sizable alteration in the arrangement of an object's components. For example, it may be necessary to establish a different structural description for the left-most pose in Figure 29 than in the right-most pose. If this were the case, then a priming paradigm might not reveal any priming between the two stimuli. Another possibility is that the relations among the components can include a range of possible values (Marr & Nishihara, 1978). For a relation that allowed complete freedom for movement, the relation might simply be "joined." Even that might be relaxed in the case of objects with separable parts, as with the handset and base of a
telephone. In that case, it might be either that the relation is "nearby" or else different structural descriptions are necessary for attached and separable configurations. Empirical research needs to be done to determine if less restrictive relations, such as "join" or "nearby," have measurable perceptual consequences. It may be the case that the less restrictive the relation, the more difficult the identifiability of the object. Just as there appear to be canonical views of rigid objects (Palmer et al., 1981), there may be a canonical "configuration" for a nonrigid object. Thus, the poses on the right in Figure 29 might be identified as a woman more slowly than would the poses on the left.

Figure 27. A viewpoint parallel to the axes of the major components of a common object.
Figure 28. The same object as in Figure 27, but with a viewpoint not parallel to the major components.

Conclusion

To return to the analogy with speech perception, the characterization of object perception provided by RBC bears a close resemblance to some current views as to how speech is perceived. In both cases, the ease with which we are able to code tens of thousands of words or objects is solved by mapping that input onto a modest number of primitives (55 phonemes or 36 components) and then using a representational system that can code and access free combinations of these primitives. In both cases, the specific set of primitives is derived from dichotomous (or trichotomous) contrasts of a small number (fewer than ten) of independent characteristics of the input. The ease with which we are able to code so many words or objects may thus derive less from a capacity for coding continuous physical variation than it does from a perceptual system designed to represent the free combination of a modest number of categorized primitives based on simple perceptual contrasts.

In object perception, the primitive components may have their origins in the fundamental principles by which inferences about a three-dimensional world can be made from the edges in a two-dimensional image. These principles constitute a significant portion of the corpus of Gestalt organizational constraints. Given that the primitives are fitting simple parsed parts of an object, the constraints toward regularization characterize not the complete object but the object's components. RBC thus provides, for the first time, an account of the heretofore undecided relation between these principles of perceptual organization and human pattern recognition.

References

Attneave, F. (1982). Pragnanz and soap bubble systems. In J. Beck (Ed.), Organization and representation in visual perception (pp. 11-29). Hillsdale, NJ: Erlbaum.
Ballard, D., & Brown, C. M. (1982). Computer vision. Englewood Cliffs, NJ: Prentice-Hall.
Barrow, H. G., & Tenenbaum, J. M. (1981). Interpreting line-drawings as three-dimensional surfaces. Artificial Intelligence, 17, 75-116.
Bartlett, F. C. (1932). Remembering: A study in experimental and social psychology. New York: Cambridge Univ. Press.
Bartram, D. (1974). The role of visual and semantic codes in object naming. Cognitive Psychology, 6, 325-356.
Bartram, D. (1976). Levels of coding in picture-picture comparison tasks. Memory & Cognition, 4, 593-602.
Beck, J., Prazdny, K., & Rosenfeld, A. (1983). A theory of textural segmentation. In J. Beck, B. Hope, & A. Rosenfeld (Eds.), Human and machine vision (pp. 1-38). New York: Academic Press.
Besl, P. J., & Jain, R. C. (1986). Invariant surface characteristics for 3D object recognition in range images. Computer Vision, Graphics, and Image Processing, 33, 33-80.
Biederman, I. (1981). On the semantics of a glance at a scene. In M. Kubovy & J. R. Pomerantz (Eds.), Perceptual organization (pp. 213-253). Hillsdale, NJ: Erlbaum.
Biederman, I. (1985). Human image understanding: Recent experiments and a theory. Computer Vision, Graphics, and Image Processing, 32, 29-73.
Biederman, I., Beiring, E., Ju, G., & Blickle, T. (1985). A comparison of the perception of partial vs. degraded objects. Unpublished manuscript, State University of New York at Buffalo.
Biederman, I., & Blickle, T. (1985). The perception of objects with deleted contours. Unpublished manuscript, State University of New York at Buffalo.
Biederman, I., Blickle, T. W., Teitelbaum, R. C., Klatsky, G. J., & Mezzanotte, R. J. (in press). Object identification in multi-object, non-scene displays. Journal of Experimental Psychology: Learning, Memory, and Cognition.
Biederman, I., & Ju, G. (in press). Surface vs. edge-based determinants of visual recognition. Cognitive Psychology.
Biederman, I., Ju, G., & Clapper, J. (1985). The perception of partial
global information. Machine intelligence 6 (pp. 325-375). Edinburgh: Edinburgh University Press.
Hildebrandt, K. A. (1982). The role of physical appearance in infant and child development. In H. E. Fitzgerald, E. Lester, & M. Youngman (Eds.), Theory and research in behavioral pediatrics (Vol. 1, pp. 181-219). New York: Plenum.
Hildebrandt, K. A., & Fitzgerald, H. E. (1983). The infant's physical attractiveness: Its effect on bonding and attachment. Infant Mental Health Journal, 4, 3-12.
Hochberg, J. E. (1978). Perception (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Hoffman, D. D., & Richards, W. (1985). Parts of recognition. Cognition, 18, 65-96.
Humphreys, G. W. (1983). Reference frames and shape perception. Cognitive Psychology, 15, 151-196.
Ittleson, W. H. (1952). The Ames demonstrations in perception. New
J. Beck, B. Hope, & A. Rosenfeld (Eds.), Human and machine vision (pp. 341-364). New York: Academic Press.
Perkins, D. N., & Deregowski, J. (1982). A cross-cultural comparison of the use of a Gestalt perceptual strategy. Perception, 11, 279-286.
Pomerantz, J. R. (1978). Pattern and speed of encoding. Memory & Cognition, 5, 235-241.
Rock, I. (1983). The logic of perception. Cambridge, MA: MIT Press.
Rock, I. (1984). Perception. New York: W. H. Freeman.
Rosch, E., Mervis, C. B., Gray, W., Johnson, D., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382-439.
Ryan, T., & Schwartz, C. (1956). Speed of perception as a function of mode of representation. American Journal of Psychology, 69, 60-69.
Shepard, R. N., & Metzler, J. (1971). Mental rotation of three dimensional objects. Science, 171, 701-703.
Sugihara, K. (1982). Classification of impossible objects. Perception, 11, 65-74.
Sugihara, K. (1984). An algebraic approach to shape-from-image problems. Artificial Intelligence, 23, 59-95.
Treisman, A. (1982). Perceptual grouping and attention in visual search for objects. Journal of Experimental Psychology: Human Perception and Performance, 8, 194-214.
Treisman, A., & Gelade, G. (1980). A feature integration theory of attention. Cognitive Psychology, 12, 97-136.
Trivers, R. (1985). Social evolution. Menlo Park, CA: Benjamin/Cummings.
Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327-352.
Tversky, B., & Hemenway, K. (1984). Objects, parts, and categories. Journal of Experimental Psychology: General, 113, 169-193.
Ullman, S. (1984). Visual routines. Cognition, 18, 97-159.
Virsu, V. (1971a). Tendencies to eye movement, and misperception of curvature, direction, and length. Perception & Psychophysics, 9, 65-72.
Virsu, V. (1971b). Underestimation of curvature and task dependence in visual perception of form. Perception & Psychophysics, 9, 339-342.
Waltz, D. (1975). Generating semantic descriptions from drawings of scenes with shadows. In P. Winston (Ed.), The psychology of computer vision (pp. 19-91). New York: McGraw-Hill.
Winston, P. A. (1975). Learning structural descriptions from examples. In P. H. Winston (Ed.), The psychology of computer vision (pp. 157-209). New York: McGraw-Hill.
Witkin, A. P., & Tenenbaum, J. M. (1983). On the role of structure in vision. In J. Beck, B. Hope, & A. Rosenfeld (Eds.), Human and machine vision (pp. 481-543). New York: Academic Press.
Woodworth, R. S. (1938). Experimental psychology. New York: Holt.

Received June 28, 1985
Revision received June 3, 1986