Object recognition kunal

28/12/2014 Object Recognition
http://www.cs.rochester.edu/~nelson/research/recognition/recognition.html 1/8

Object Recognition Research
Randal C. Nelson
Department of Computer Science
University of Rochester
Appearance-based object recognition methods have recently
demonstrated good performance on a variety of problems. However,
many of these methods either require good whole-object segmentation,
which severely limits their performance in the presence of clutter,
occlusion, or background changes, or utilize simple conjunctions of
low-level features, which causes crosstalk problems as the number of
objects is increased. We are investigating an appearance-based object
recognition system using a keyed, multi-level context representation
that ameliorates many of these problems and can be used with
complex, curved shapes. Pictures on this page are from a training
database we have used in system tests.
Specifically, we utilize distinctive intermediate-level features, in
this case automatically extracted 2D boundary fragments, as keys,
which are then verified within a local context and assembled within a
loose global context to evoke an overall percept. The system
demonstrates extraordinarily good recognition of a variety of 3D
shapes, ranging from sports cars and fighter planes to snakes and
lizards, with full orthographic invariance. We have performed a number
of large-scale experiments, involving over 2000 separate test images,
that evaluate performance with an increasing number of items in the
database and in the presence of clutter, background change, and
occlusion. We also report the results of some generic classification
experiments in which the system is tested on objects never previously
seen or modeled. To our knowledge, the results we report are the best
in the literature for full-sphere tests of general shapes with
occlusion and clutter resistance.
The basic idea is to represent the visual appearance of an object as a
loosely structured combination of a number of local context regions
keyed by distinctive key features, or fragments. A local context
region can be thought of as an image patch surrounding the key feature
and containing a representation of other features that intersect the
patch. Under different conditions (e.g. lighting, background, changes
in orientation), the feature extraction process will find some of
these distinctive keys, but in general not all of them. Also, even
with local contextual verification, such keys may well be consistent
with a number of global hypotheses. However, the fraction that can be
found by existing feature extraction processes is frequently
sufficient to identify objects in the scene once the global evidence
is assembled. This addresses one of the principal problems of object
recognition, which is that, in any but rather artificial conditions,
it has so far proved impossible to reliably segment whole objects on a
bottom-up basis. In the current system, local features based on
automatically extracted boundary fragments are used to represent
multiple 2D views (aspects) of rigid 3D objects, but the basic idea
could be applied to other features and other representations.
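
As a rough illustration of a local context region, the sketch below
records which cells of a small square patch around a key feature are
crossed by other boundary points, and compares two such patches. The
grid resolution, the binary occupancy encoding, and the names
LocalContext, build_local_context and context_overlap are assumptions
made for this example; they are not taken from the system described
here.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class LocalContext:
    center: Point                       # image location of the key feature
    half_size: float                    # half the side length of the square patch
    grid: Tuple[Tuple[int, ...], ...]   # coarse occupancy map of nearby features


def build_local_context(center: Point, half_size: float,
                        boundary_points: Iterable[Point],
                        cells: int = 4) -> LocalContext:
    """Mark every grid cell of the patch that contains a boundary point."""
    cx, cy = center
    side = 2.0 * half_size
    grid = [[0] * cells for _ in range(cells)]
    for x, y in boundary_points:
        # Patch coordinates in [0, 1); points outside the patch are ignored.
        u = (x - cx + half_size) / side
        v = (y - cy + half_size) / side
        if 0.0 <= u < 1.0 and 0.0 <= v < 1.0:
            grid[int(v * cells)][int(u * cells)] = 1
    return LocalContext(center, half_size, tuple(tuple(row) for row in grid))


def context_overlap(a: LocalContext, b: LocalContext) -> float:
    """Fraction of occupied cells in `a` that are also occupied in `b`."""
    occupied = [(i, j) for i, row in enumerate(a.grid)
                for j, value in enumerate(row) if value]
    if not occupied:
        return 0.0
    hits = sum(1 for i, j in occupied if b.grid[i][j])
    return hits / len(occupied)
```

A stored context and a context extracted from a new image could then
be compared with context_overlap, accepting the key only when the
overlap exceeds some threshold.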

The basic recognition strategy is to utilize a database (here viewed
as an associative memory) of key features embedded in local contexts,
which is organized so that access via an unknown key feature evokes
associated hypotheses for the identity and configuration of all known
objects that could have produced such an embedded feature. These
hypotheses are fed into a second-stage associative memory, keyed by
configurations, which lumps the hypotheses into clusters that are
mutually consistent within a loose global context. This secondary
database maintains a probabilistic estimate of the likelihood of each
cluster based on statistics about the occurrence of the keys in the
primary database. The idea is similar to a multidimensional Hough
transform without the space problems. In our case, since 3D objects
are represented by a set of views, the configurations represent
two-dimensional transforms of specific views. Efficient access to the
associative memories is achieved using a hashing scheme on parameters
of the keying features, followed by verification of the local context.
As mentioned above, this local verification step gives the voting
features sufficient power to substantially ameliorate well-known
problems with false positives in Hough-like voting schemes.
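
The hashed access followed by local verification can be sketched
roughly as follows. This is a minimal illustration under assumptions,
not the system's implementation: the class KeyMemory, the fixed bin
width, the tuple-of-bits context, and the agreement measure are all
hypothetical, and a practical version would also probe neighbouring
buckets to tolerate values near bin boundaries.

```python
import math
from collections import defaultdict


def agreement(a, b):
    """Fraction of positions on which two equal-length contexts agree."""
    return sum(1 for p, q in zip(a, b) if p == q) / len(a)


class KeyMemory:
    """Primary associative memory: hash on key parameters, then verify."""

    def __init__(self, bin_width=0.25):
        self.bin_width = bin_width
        self.table = defaultdict(list)

    def _bucket(self, params):
        # Quantize each pose-insensitive parameter into a hash bucket index.
        return tuple(math.floor(p / self.bin_width) for p in params)

    def store(self, params, context, label):
        """Store a key's context and its (object, view) label."""
        self.table[self._bucket(params)].append((context, label))

    def lookup(self, params, context, min_agreement=0.75):
        """Return labels of stored keys in the same bucket whose local
        context agrees well enough with the query context."""
        candidates = self.table.get(self._bucket(params), [])
        return [label for stored, label in candidates
                if agreement(stored, context) >= min_agreement]


memory = KeyMemory()
memory.store((0.30, 1.10), (1, 0, 1, 1), ("cup", 3))
memory.store((0.32, 1.05), (0, 1, 0, 0), ("plane", 7))
matches = memory.lookup((0.31, 1.12), (1, 0, 1, 1))
```

Both stored keys fall in the same hash bucket as the query, but only
the one whose context agrees with the query survives verification,
which is the role the local context plays in suppressing false votes.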
A fundamental component of the approach is the use of distinctive
local features we call keys. A key is any robustly extractable part or
feature that has sufficient information content to specify a
configuration of an associated object plus enough additional,
pose-insensitive (sometimes called semi-invariant) parameters to
provide efficient indexing. The local context amplifies the power of
the feature by providing a means of verification. This local
verification step is critical, because the invariant parameters of the
key features are relatively weak evidence, leading to a proliferation
of high-scoring false hypotheses if used alone. This is a well-known
problem with voting schemes, but it can be alleviated if the voting
features are sufficiently powerful. In the current implementation we
have utilized keys based on extracted boundary fragments, both
straight and curved, but the method is by no means limited to such
keys, and we are looking at several complementary feature types.
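
To make the two roles of a key concrete, the sketch below derives both
a 2D configuration and a pose-insensitive descriptor from a boundary
fragment given as a list of points. The choice of the segment joining
the fragment's endpoints as a reference, and of normalized
perpendicular offsets as the descriptor, are assumptions made for
illustration only; the keys actually used by the system may be
parameterized differently.

```python
import math


def base_segment(points):
    """Return start point, direction vector and length of the segment
    joining the first and last points of a fragment."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0.0:
        raise ValueError("fragment endpoints coincide")
    return (x0, y0), (dx, dy), length


def fragment_configuration(points):
    """Position, orientation and scale implied by the fragment."""
    (x0, y0), (dx, dy), length = base_segment(points)
    midpoint = (x0 + dx / 2.0, y0 + dy / 2.0)
    return midpoint, math.atan2(dy, dx), length


def fragment_descriptor(points):
    """Signed perpendicular offsets of the interior points from the base
    segment, divided by its length. The values are unchanged by 2D
    translation, rotation and uniform scaling of the fragment."""
    (x0, y0), (dx, dy), length = base_segment(points)
    return [(dx * (py - y0) - dy * (px - x0)) / (length * length)
            for px, py in points[1:-1]]
```

The descriptor is the part suited to indexing, since it stays the same
when the fragment moves, rotates or changes size in the image, while
the configuration is the part that lets a matched key vote for where
and how large the object is.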
In order to use the system with an object,
its appearances must be stored in the
associative memory. Currently, this is done
by obtaining a number of uncluttered
images of the object from different
directions. About 100 views are needed to
cover the entire viewing sphere for the
curve-based keys we have used. For each view, key features are
extracted, and a number of the strongest are stored in the memory
with associated information about the object and view that produced
them, and their relationship to an arbitrarily specified 2D
configuration (position, orientation, scale) for that view.
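
One way to store the relationship between a key and the reference
configuration of its view, and to use it later, is sketched below.
Configurations are taken to be (position, orientation, scale) triples
as in the text; the function names and the particular encoding of the
relationship (an offset in the key's own frame, an angle difference,
and a scale ratio) are assumptions made for this example.

```python
import math


def relative_configuration(key, view):
    """Express the view's reference configuration in the key's frame.

    Both arguments are ((x, y), orientation, scale) triples measured in
    the training image."""
    (kx, ky), k_theta, k_scale = key
    (vx, vy), v_theta, v_scale = view
    dx, dy = vx - kx, vy - ky
    # Rotate the offset by -k_theta and divide by the key's scale.
    c, s = math.cos(-k_theta), math.sin(-k_theta)
    offset = ((c * dx - s * dy) / k_scale, (s * dx + c * dy) / k_scale)
    return offset, v_theta - k_theta, v_scale / k_scale


def predict_view_configuration(observed_key, relative):
    """Configuration hypothesis for the view, given a matched key seen
    in a new image and the relationship stored at training time."""
    (kx, ky), k_theta, k_scale = observed_key
    (ox, oy), d_theta, scale_ratio = relative
    c, s = math.cos(k_theta), math.sin(k_theta)
    position = (kx + k_scale * (c * ox - s * oy),
                ky + k_scale * (s * ox + c * oy))
    # Angles are not wrapped to a canonical range in this sketch.
    return position, k_theta + d_theta, k_scale * scale_ratio
```

If the whole view is translated, rotated and scaled in a new image,
every key from that view predicts the same transformed configuration,
which is what allows their hypotheses to be clustered.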
To recognize an object, that is, to answer the question "what object
is in this image?", key features together with their local contexts
are extracted from the image and fed into the associative memory. All
matches are retrieved, and for each match, the associated information
is used to compute a hypothesis about the identity, view, and
configuration of a possible object. This hypothesis is fed to a
second, "working" associative memory, where current hypotheses are
stored. If any matches are found, the evidence associated with them is
updated to reflect the new information. Otherwise a new hypothesis is
entered. The accumulation is not a flat voting process, but depends on
the frequency of occurrence of the feature over the entire database,
with uncommon features providing more evidence. The evidence
combination scheme is Bayesian if the features are independent (they
are not, but we don't have a better model, and the results are better
than flat voting). The hypothesis memory is then examined, and the
configuration with the most evidence is selected as the most probable
answer.
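
The difference between flat voting and frequency-weighted accumulation
can be sketched as follows. The log inverse-frequency weight used here
is one plausible choice and is an assumption, as are the class name
HypothesisMemory, the quantization steps, and the toy numbers; the
text only states that uncommon features contribute more evidence.

```python
import math
from collections import defaultdict


def quantize(config, pos_step=10.0, angle_step=math.pi / 8, scale_base=1.5):
    """Map a ((x, y), orientation, scale) configuration to a coarse bin so
    that nearby hypotheses land on the same entry."""
    (x, y), theta, scale = config
    return (round(x / pos_step), round(y / pos_step),
            round(theta / angle_step), round(math.log(scale, scale_base)))


class HypothesisMemory:
    """Working memory that accumulates evidence per hypothesis."""

    def __init__(self):
        self.evidence = defaultdict(float)

    def add(self, obj, view, config, match_count, database_size):
        # A key that matches few database entries is more informative, so
        # it contributes a larger (log inverse-frequency) weight.
        weight = math.log(database_size / match_count)
        self.evidence[(obj, view) + quantize(config)] += weight

    def best(self):
        """Return the (hypothesis, evidence) pair with the most evidence."""
        return max(self.evidence.items(), key=lambda item: item[1])


memory = HypothesisMemory()
# Two rare keys agree on a cup hypothesis.
memory.add("cup", 3, ((100.0, 50.0), 0.10, 1.00), match_count=2, database_size=1000)
memory.add("cup", 3, ((102.0, 51.0), 0.12, 1.05), match_count=5, database_size=1000)
# Three very common keys agree on a plane hypothesis.
for _ in range(3):
    memory.add("plane", 7, ((400.0, 80.0), 1.0, 2.0), match_count=500, database_size=1000)
winner, score = memory.best()
```

With flat voting the plane hypothesis would win three votes to two; with
the weighting, the two rare keys outweigh the three common ones and the
cup hypothesis is selected.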
To find an object of known characteristics in a scene, that is, to
answer a question of the form "where is the dog in this image?", the
same procedure is followed, except that key feature matches are
filtered on the basis of whether they came from a view of a dog. This
actually provides a rather powerful mechanism for partially indexed
retrieval, since the filtering can occur on any combination of
attributes that we care to associate with the features, either in the
database or from the image, e.g. "animal" or "pink cup".
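
Filtering matches on attributes before they are turned into hypotheses
might look like the sketch below. Representing a match as a dictionary
with an "attributes" set, and the function name filter_matches, are
assumptions made for illustration.

```python
def filter_matches(matches, required_attributes):
    """Keep only matches whose attribute set contains every required
    attribute; an empty requirement keeps everything."""
    required = set(required_attributes)
    return [m for m in matches if required <= set(m["attributes"])]


matches = [
    {"object": "dog", "view": 12, "attributes": {"animal", "dog"}},
    {"object": "snake", "view": 4, "attributes": {"animal", "snake"}},
    {"object": "cup", "view": 9, "attributes": {"cup", "pink"}},
]

dogs = filter_matches(matches, {"dog"})            # "where is the dog?"
animals = filter_matches(matches, {"animal"})      # any animal
pink_cups = filter_matches(matches, {"pink", "cup"})
```

Only the surviving matches would then be passed on to the evidence
accumulation stage, so the rest of the procedure is unchanged.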
The approach has several advantages. First, because it is based on a
merged percept of local features rather than global properties, the
method is robust to occlusion and background clutter, and does not
require prior segmentation. This is an advantage over systems based
on principal components template analysis, which are sensitive to
occlusion and clutter. Second, entry of objects into the memory is an
active, automatic procedure. Essentially, the system explores the
object visually from different viewpoints, accumulating 2D views,
until it has seen enough not to mix it up with any other object it
knows about. Third, the method lends itself naturally to multimodal
recognition. Because there is no single, global structure for the
model, evidence from different kinds of keys can be combined as easily
as evidence from multiple keys of the same type. The only requirement
is that the configuration descriptions evoked by the different keys
have enough common structure to allow evidence combination
procedures to be used. This is an advantage over conventional
alignment techniques, which typically require a prior 3D model of
the object. Finally, the probabilistic nature of the evidence
combination scheme, coupled with the formal definitions for
semi-invariance and robustness, allows quantitative predictions of the
reliability of the system to be made.
We have run several large-scale performance tests involving,
altogether, over 2000 separate test images. In these experiments we
investigated variation in performance with respect to increasing
database size, clutter, and occlusion. In forced-choice experiments
using clean test images from a 24-object database, we obtain 97%
classification accuracy. Performance with 75% clutter and 25%
occlusion is in the 90%+ range. We have developed a statistical
model for predicting the performance in a variety of situations from a
few basic measurements of score distributions for clean test images
and pure clutter. We also ran a generic recognition experiment, where
the system was trained on several objects in each of several classes
(e.g. planes, snakes, cars) and asked to classify example objects from
the same generic classes, but not in the training set.
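
The statistical model itself is not described on this page. As a much
simpler stand-in, the sketch below estimates from two samples of scores
how often a correct hypothesis would outscore a clutter hypothesis. The
function name and the pairwise comparison are assumptions, and the
numbers in the example are made-up toy values, not measurements from
the experiments.

```python
def pairwise_win_rate(target_scores, clutter_scores):
    """Empirical probability that a score drawn from the target sample
    exceeds a score drawn from the clutter sample."""
    if not target_scores or not clutter_scores:
        raise ValueError("both score samples must be non-empty")
    wins = sum(1 for t in target_scores for c in clutter_scores if t > c)
    return wins / (len(target_scores) * len(clutter_scores))


# Toy scores for illustration only.
clean_scores = [10.0, 8.0, 6.0]
clutter_scores = [7.0, 3.0]
rate = pairwise_win_rate(clean_scores, clutter_scores)
```

A model of the kind mentioned above would presumably work from fitted
score distributions rather than raw pairwise comparisons, so this only
conveys the general idea of predicting performance from score
statistics.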
References
Andrea Selinger and Randal C. Nelson, ``A Perceptual Grouping
Hierarchy for Appearance-Based 3D Object Recognition'', Computer
Vision and Image Understanding, vol. 76, no. 1, October 1999,
pp. 83-92.
Randal C. Nelson and Andrea Selinger, ``Large-Scale Tests of a
Keyed, Appearance-Based 3D Object Recognition System'', Vision
Research, Special issue on computational vision, vol. 38, no. 15-16,
August 1998.
Randal C. Nelson and Andrea Selinger, ``A Cubist Approach to Object
Recognition'', International Conference on Computer Vision
(ICCV98), Bombay, India, January 1998, pp. 614-621. Also available in
an extended version with a more complete description of the
algorithms and additional experiments.
Randal C. Nelson, ``Visual Learning and the Development of
Intelligence'', in Early Visual Learning, Shree K. Nayar and Tomaso
Poggio, Editors, Oxford University Press, 1996, pp. 215-236.
Randal C. Nelson, ``From Visual Homing to Object Recognition'', in
Visual Navigation, Yiannis Aloimonos, Editor, Lawrence Erlbaum
Inc, 1996, pp. 218-250.
Randal C. Nelson, ``Memory-Based Recognition for 3D Objects'',
Proc. ARPA Image Understanding Workshop, Palm Springs CA,
February 1996, pp. 1305-1310.
Randal C. Nelson, ``3D Recognition Via 2-stage Associative
Memory'', University of Rochester, Dept of Computer Science TR
565, January 1995.
