
2. General approaches to determining saliency
The term saliency was used by Tsotsos et al. [27] and Olshausen et al. [25] in their work on visual attention, and by Itti et al. [16] in their work on rapid scene analysis. Saliency has also been referred to as visual attention [27, 22], unpredictability, rarity, or surprise [17, 14]. Saliency estimation methods can broadly be classified as biologically based, purely computational, or a combination of the two. In general, all methods take a low-level approach, determining the contrast of image regions relative to their surroundings using one or more of the features intensity, color, and orientation.
Itti et al. [16] base their method on the biologically plausible architecture proposed by Koch and Ullman [19]. They determine center-surround contrast using a Difference of Gaussians (DoG) approach. Frintrop et al. [7] present a method inspired by Itti's method, but they compute center-surround differences with square filters and use integral images to speed up the calculations.
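The speed-up from integral images comes from the fact that, once a summed-area table is built, the sum over any axis-aligned box costs four lookups regardless of the box size. The following is a minimal sketch of square-filter center-surround contrast, assuming a grayscale image stored as nested Python lists, a square center of half-width `rc` and a square surround of half-width `rs` (both hypothetical parameters with `rc < rs`), and contrast taken as the absolute difference between the center mean and the mean of the surrounding ring. It illustrates the technique only and does not reproduce the filter sizes or channel handling of [7].

```python
def integral_image(img):
    """Summed-area table with a zero border: ii[y][x] = sum of img over rows < y, cols < x."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, y0, x0, y1, x1):
    """Sum of img over rows [y0, y1) and columns [x0, x1) in constant time."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


def center_surround_map(img, rc=1, rs=3):
    """|mean(center square) - mean(surround ring)| at every pixel (hypothetical rc < rs)."""
    h, w = len(img), len(img[0])
    ii = integral_image(img)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clip both windows to the image; the center stays inside the surround.
            cy0, cy1 = max(0, y - rc), min(h, y + rc + 1)
            cx0, cx1 = max(0, x - rc), min(w, x + rc + 1)
            sy0, sy1 = max(0, y - rs), min(h, y + rs + 1)
            sx0, sx1 = max(0, x - rs), min(w, x + rs + 1)
            c_sum = box_sum(ii, cy0, cx0, cy1, cx1)
            c_n = (cy1 - cy0) * (cx1 - cx0)
            # Ring = surround box minus center box.
            s_sum = box_sum(ii, sy0, sx0, sy1, sx1) - c_sum
            s_n = (sy1 - sy0) * (sx1 - sx0) - c_n
            c_mean = c_sum / c_n
            s_mean = s_sum / s_n if s_n > 0 else c_mean
            out[y][x] = abs(c_mean - s_mean)
    return out
```

For a 7 × 7 black image with a single white pixel at its center, `center_surround_map(img, rc=0, rs=2)` gives 1.0 at that pixel and 0.0 in the corners. Each pixel costs a constant number of lookups, whatever the window sizes.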
Other methods are purely computational [22, 13, 12, 1] and are not based on biological vision principles. Ma and Zhang [22] and Achanta et al. [1] estimate saliency using center-surround feature distances. Hu et al. [13] estimate saliency by applying heuristic measures to initial saliency measures obtained by histogram thresholding of feature maps. Gao and Vasconcelos [8] maximize the mutual information between the feature distributions of center and surround regions in an image, while Hou and Zhang [12] rely on frequency domain processing.
The third category comprises methods that incorporate ideas partly based on biological models and partly on computational ones. For instance, Harel et al. [10] create feature maps using Itti's method but perform their normalization using a graph-based approach. Other methods use a computational approach, such as maximization of information [3], that represents a biologically plausible model of saliency detection.
Some algorithms detect saliency over multiple scales [16, 1], while others operate on a single scale [22, 13]. Also, some methods create individual feature maps separately and then combine them to obtain the final saliency map [15, 22, 13, 7], while others obtain a feature-combined saliency map directly [22, 1].
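When feature maps are computed separately, they have to be brought to a common range before they can be combined. A minimal sketch, assuming min-max normalization to [0, 1] followed by a weighted average with equal default weights; the cited methods each use their own normalization (for example, the graph-based scheme of [10]), so this is only one simple choice. A map with no variation is mapped to zeros here, on the reasoning that it carries no contrast.

```python
def minmax_normalize(feature_map):
    """Rescale a 2-D map (nested lists) to [0, 1]; a constant map becomes all zeros."""
    values = [v for row in feature_map for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [[0.0 for _ in row] for row in feature_map]
    return [[(v - lo) / (hi - lo) for v in row] for row in feature_map]


def combine_feature_maps(maps, weights=None):
    """Weighted average of min-max-normalized maps of identical shape."""
    if weights is None:
        weights = [1.0] * len(maps)
    total_w = sum(weights)
    normalized = [minmax_normalize(m) for m in maps]
    h, w = len(maps[0]), len(maps[0][0])
    return [
        [sum(wt * nm[y][x] for wt, nm in zip(weights, normalized)) / total_w for x in range(w)]
        for y in range(h)
    ]
```

Normalizing first keeps a feature with a large numeric range (say, intensity in 0 to 255) from swamping one measured in 0 to 1.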
2.1. Limitations of saliency maps
The saliency maps generated by most methods have low resolution [16, 22, 10, 7, 12]. Itti's method produces saliency maps that are just 1/256th of the original image size in pixels, while Hou and Zhang [12] output maps of size 64 × 64 pixels for any input image size. An exception is the algorithm presented by Achanta et al. [1], which outputs saliency maps of the same size as the input image. This is accomplished by changing the filter size, rather than the original image size, to achieve a change in scale.
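The 1/256 figure follows from simple arithmetic: IT's saliency map is formed at pyramid level 4, and each level halves both dimensions, so the map keeps (1/16)² = 1/256 of the input pixels. A sketch, assuming integer division at every level (real pyramid implementations may round odd sizes differently) and using the fixed 64 × 64 output of SR and the full-size output of AC as stated above; the function name `saliency_map_sizes` is hypothetical.

```python
def saliency_map_sizes(width, height, it_level=4):
    """Approximate output sizes (width, height) of the IT, SR, and AC saliency maps."""
    it_w, it_h = width, height
    for _ in range(it_level):
        # Each pyramid level downsamples by 2 in each dimension.
        it_w, it_h = it_w // 2, it_h // 2
    return {
        "IT": (it_w, it_h),
        "SR": (64, 64),         # fixed size regardless of input
        "AC": (width, height),  # full resolution
    }


sizes = saliency_map_sizes(640, 480)
it_w, it_h = sizes["IT"]
print(sizes)                             # {'IT': (40, 30), 'SR': (64, 64), 'AC': (640, 480)}
print((640 * 480) / (it_w * it_h))       # 256.0
```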
Depending on the salient region detector, some maps additionally have ill-defined object boundaries [16, 10, 7], limiting their usefulness in certain applications. This arises from severe downsizing of the input image, which reduces the range of spatial frequencies of the original image that is considered in creating the saliency maps. Other methods highlight the salient object boundaries but fail to uniformly map the entire salient region [22, 12], or highlight smaller salient regions better than larger ones [1]. These shortcomings result from the limited range of spatial frequencies retained from the original image in computing the final saliency map, as well as from the specific algorithmic properties.
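The loss of detail caused by downsizing can be bounded. If an image is reduced by a factor d in each dimension with ideal anti-aliasing, the new Nyquist frequency π corresponds to π/d in the original's normalized frequency units, so at most the band [0, π/d] survives. A sketch of that bound, assuming ideal anti-aliasing (Gaussian smoothing only approximates it); the 640-pixel-wide input used for the SR case is an arbitrary example, not a value from this paper.

```python
import math


def retained_band(downsample_factor):
    """Upper bound on the original normalized frequencies kept after downsizing by d."""
    return (0.0, math.pi / downsample_factor)


def pyramid_band(level):
    """Band kept at level k of a dyadic pyramid (factor 2**k per dimension)."""
    return retained_band(2 ** level)


# IT at pyramid level 4 versus SR mapping a 640-pixel-wide image to 64 pixels.
print("IT level 4 keeps up to pi /", math.pi / pyramid_band(4)[1])
print("SR (640 -> 64) keeps up to pi /", math.pi / retained_band(640 / 64)[1])
```

Full-resolution output (d = 1) is the only case in which the whole band [0, π] remains available.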
3. Frequency Domain Analysis of Saliency Detectors
We examine the information content used in creating the saliency maps of five state-of-the-art methods from a frequency domain perspective. The five saliency detectors are Itti et al. [16], Ma and Zhang [22], Harel et al. [10], Hou and Zhang [12], and Achanta et al. [1], hereafter referred to as IT, MZ, GB, SR, and AC, respectively. We refer to our proposed method as IG. The choice of these algorithms is motivated by the following reasons: citation in the literature (the classic approach of IT is widely cited), recency (GB, SR, and AC are recent), and variety (IT is biologically motivated, MZ is purely computational, GB is a hybrid approach, SR estimates saliency in the frequency domain, and AC outputs full-resolution maps).
3.1. Spatial frequency content of saliency maps
To analyze the properties of the five saliency algorithms, we examine the spatial frequency content of the original image that is retained in computing the final saliency map. It will be shown in Sec. 4.3 that the range of spatial frequencies retained by our proposed algorithm is more appropriate than that retained by the algorithms used for comparison. For simplicity, the following analysis is given in one dimension, and extensions to two dimensions are clarified where necessary.
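In one dimension, the frequencies a linear filter keeps can be read off its frequency response. For a symmetric (zero-phase) kernel h centered at n = 0, H(ω) is the sum over n of h[n]·cos(ωn). A sketch, assuming sampled Gaussian kernels normalized to unit sum and illustrative σ values chosen here: a Gaussian passes the DC component with gain 1 and attenuates high frequencies, while a difference of two such Gaussians has zero gain at DC, so it discards the mean and acts as a band-pass filter.

```python
import math


def gaussian_kernel(sigma, radius):
    """Sampled 1-D Gaussian on [-radius, radius], normalized to unit sum."""
    w = [math.exp(-(n * n) / (2.0 * sigma * sigma)) for n in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]


def dog_kernel(sigma_center, sigma_surround, radius):
    """Difference of Gaussians: narrow center minus wide surround (weights sum to ~0)."""
    c = gaussian_kernel(sigma_center, radius)
    s = gaussian_kernel(sigma_surround, radius)
    return [a - b for a, b in zip(c, s)]


def frequency_response(kernel, omega):
    """H(omega) for a symmetric kernel whose middle tap is n = 0."""
    r = len(kernel) // 2
    return sum(h * math.cos(omega * (n - r)) for n, h in enumerate(kernel))


g = gaussian_kernel(2.0, 8)
dog = dog_kernel(1.0, 4.0, 16)
for omega in (0.0, math.pi / 4, math.pi / 2, math.pi):
    print(f"{omega:.3f}  gaussian={frequency_response(g, omega):.4f}  dog={frequency_response(dog, omega):.4f}")
```

The Gaussian column falls from 1 toward 0 as ω approaches π, while the DoG column starts at 0, so the DoG keeps a band that excludes both DC and the highest frequencies.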
In method IT, a Gaussian pyramid of 9 levels (level 0 is the original image) is built by successive Gaussian blurring and downsampling by 2 in each dimension. For the luminance image, this results in a successive reduction of the spatial frequencies retained from the input image. Each smoothing operation approximately halves the normalized frequency spectrum of the image. After 8 such smoothing operations, the frequencies of the original image retained at level 8 lie within [0, π/256]. The technique computes differences of Gaussian-smoothed images from this pyramid, resizing them to the size of level 4, which results in using frequency content from the original image in the range [π/256, π/16]. In this frequency range the DC (mean) component is removed