

Reduced-Reference Image Quality Assessment Using Divisive Normalization-Based Image Representation

Article in IEEE Journal of Selected Topics in Signal Processing · May 2009
DOI: 10.1109/JSTSP.2009.2014497 · Source: IEEE Xplore



202 IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. 3, NO. 2, APRIL 2009

Reduced-Reference Image Quality Assessment Using Divisive Normalization-Based Image Representation
Qiang Li, Student Member, IEEE, and Zhou Wang, Member, IEEE

Manuscript received May 15, 2008; revised December 09, 2008. Current version published March 11, 2009. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Lina Karam.
Q. Li is with the Department of Electrical Engineering, The University of Texas at Arlington, Arlington, TX 76019 USA.
Z. Wang is with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada (e-mail: zhouwang@ieee.org).
Digital Object Identifier 10.1109/JSTSP.2009.2014497
1932-4553/$25.00 © 2009 IEEE

Abstract—Reduced-reference image quality assessment (RRIQA) methods estimate image quality degradations with partial information about the "perfect-quality" reference image. In this paper, we propose an RRIQA algorithm based on a divisive normalization image representation. Divisive normalization has been recognized as a successful approach to model the perceptual sensitivity of biological vision. It also provides a useful image representation that significantly improves statistical independence for natural images. By using a Gaussian scale mixture statistical model of image wavelet coefficients, we compute a divisive normalization transformation (DNT) for images and evaluate the quality of a distorted image by comparing a set of reduced-reference statistical features extracted from the DNT-domain representations of the reference and distorted images, respectively. This leads to a generic or general-purpose RRIQA method, in which no assumption is made about the types of distortions occurring in the image being evaluated. The proposed algorithm is cross-validated using two publicly accessible subject-rated image databases (the UT-Austin LIVE database and the Cornell-VCL A57 database) and demonstrates good performance across a wide range of image distortions.

Index Terms—Divisive normalization, image quality assessment, reduced-reference image quality assessment (RRIQA), perceptual image representation, statistical image modeling.

I. INTRODUCTION

In recent years, there has been an increasing need for accurate and easy-to-use image quality assessment (IQA) algorithms in a variety of real-world applications, including image compression, communication, printing, display, restoration, segmentation, and fusion [1]. Most existing IQA methods require full access to an original reference image that is assumed to have perfect quality. Without the reference image, the IQA task becomes very difficult, and almost all existing no-reference IQA metrics were designed for one or a set of predefined specific distortion types (such as blocking [2]–[5] and blurring [5] in JPEG; and ringing [6], blurring [6], and wavelet quantization effects [7], [8] in JPEG2000). They are unlikely to generalize to images degraded with other types of distortions. In practice, these no-reference methods are useful only when the types of distortions between the reference and distorted images are fixed and known.

Reduced-reference IQA (RRIQA) methods provide an interesting tradeoff. They predict the quality degradation of an image with only partial information about the reference image, in the form of a set of RR features [1]. RRIQA measures supply a practically useful and convenient tool in applications such as real-time visual information communications over wired or wireless networks, where they can be employed to monitor image quality degradations or control the network streaming resources on the fly. Fig. 1 illustrates how an RRIQA system may be deployed. The system includes a feature extraction process at the sender side and a feature extraction/quality analysis process at the receiver side. The extracted RR features, or the side information, usually have a much lower data rate than the image data and are typically transmitted to the receiver through an ancillary channel [1]. It is often assumed that the ancillary channel is error-free. However, this is not an absolutely necessary requirement, since even partly decoded RR features may still be helpful in evaluating the quality of the distorted image, though the accuracy may be affected. The ancillary channel may also be merged with the distortion channel, in which case the RR features need stronger protection (e.g., by error control coding) than the image data during transmission. An example is the "quality-aware image" system proposed in [9]. At the receiver side, the difference between the features extracted from the reference and distorted images is used to evaluate image quality degradation. The feature extraction process at the receiver side may also be adapted according to the information obtained from the RR features received from the ancillary channel.

The general RRIQA framework described in Fig. 1 leaves flexibility in the selection of RR features. This is indeed the major challenge in the design of RRIQA algorithms, in which appropriate RR features should:
1) provide an efficient summary of the reference image;
2) be sensitive to a variety of image distortions;
3) be relevant to the visual perception of image quality.
Another important aspect to keep in mind when selecting RR features is to maintain a good balance between the data rate of RR features and the accuracy of image quality prediction. With a high data rate, one can include a large amount of information about the reference image, leading to more accurate estimation of image quality degradations, but it also becomes a heavy burden to transmit the RR features to the receiver. On the other hand, a lower data rate makes it easier to transmit the RR information, but more difficult to estimate quality accurately. In practical implementation and deployment, the maximal allowed RR data rate is often given and must be observed. Overall,

Authorized licensed use limited to: University of Waterloo. Downloaded on April 5, 2009 at 16:19 from IEEE Xplore. Restrictions apply.

Fig. 1. General framework for the deployment of RRIQA systems.

the merits of an RRIQA system should not be gauged only by the quality prediction accuracy, but by a tradeoff between the accuracy and the RR data rate.

Three different but related types of approaches have been employed in existing RRIQA methods [9]–[16]. The first type of approach is based on modeling image distortions and is mostly developed for specific application environments [10]–[14]. For example, when the distortion type is known to be standard image/video compression, a set of typical distortion artifacts such as blurring, blocking, and ringing may be identified, and image features may be defined that are particularly useful for quantifying these artifacts [11], [12]. As another example, in [10], [13], a set of spatial and temporal features has been found to be effective in measuring the distortions occurring in standard video compression and communication environments. The second type of approach is based on modeling the human visual system [15], [16], where perceptual features motivated by computational models of low-level vision are extracted to provide a reduced description of the image. One advantage of these approaches is that the perceptual features being employed are not directly related to any specific distortion system. As a result, RRIQA methods built upon them could potentially be extended for general-purpose use. They may also be trained on different types of distortions to produce a variety of distortion-specific RRIQA algorithms under the same general framework. However, no study has been reported so far that applies these methods to images with generic distortions beyond JPEG and JPEG2000 compression [15], [16]. The third type of approach is based on modeling natural image statistics [9]. The basic assumption behind these approaches is that most real-world image distortions disturb image statistics and make the distorted image "unnatural." The unnaturalness, measured with models of natural image statistics, can then be used to quantify image quality degradation. In [9], a generalized Gaussian density function is used to model the marginal statistics of the linear coefficients in wavelet subbands, and the parameters of the fitted model are employed as RR features. This general-purpose approach has achieved somewhat surprising success: it does not require any training and has a rather low RR data rate, yet still supplies reasonable performance when tested with a wide range of image distortion types [9].

Although the method introduced in [9] achieved notable success, our further investigation has revealed some important limitations. First, although the method performed quite well when tested with individual distortion types (e.g., JPEG or JPEG2000 compression, blurring, or noise contamination), its performance degrades significantly when images with different types of distortions are tested together, as will be shown later in this paper. Second, it uses a rather weak model of natural image statistics, as only marginal distributions of wavelet coefficients are considered. It is widely recognized that there exist strong dependencies between neighboring wavelet coefficients, which this method completely ignores. Third, it also uses a rather weak model for perceptual image representation, as the wavelet decomposition is linear and cannot reflect the nonlinear mechanisms used by biological visual systems.

In this paper, we propose a new RRIQA method that is inspired by the recent success of the divisive normalization transform (DNT) as a perceptually and statistically motivated image representation [17], [18]. In computational vision science, it has long been hypothesized that the purpose of early visual sensory processing is to increase the statistical independence between neuronal responses [19], [20]. However, linear decompositions, such as Fourier- and wavelet-type transformations, only reduce the first-order correlation and cannot reduce higher-order statistical dependencies [21]. In the neurophysiology literature, it has been shown that a local gain-control divisive normalization model is powerful in accounting for the neuronal responses in biological visual systems [22], [23]. This nonlinear gain-control mechanism is built upon linear transform models, where each neuronal response (or linear transform coefficient) is normalized (divided) by the energy of a cluster of neighboring neuronal responses (neighboring coefficients). This process has been shown to significantly reduce the statistical dependencies of the original linear representation [21] and to produce approximately Gaussian marginal distributions [24]. Similar models have also been employed in real-world image processing applications, including image compression [25] and image enhancement [18]. The strong perceptual and statistical relevance of the divisive normalization representation (as compared to linear decompositions) motivated us to switch from the linear wavelet transform domain (as in [9]) to the DNT domain in the design of our RRIQA method.


II. DIVISIVE NORMALIZATION-BASED IMAGE REPRESENTATION

A. Computation of Divisive Normalization Transformation

A divisive normalization transform (DNT) is built upon a linear image decomposition, followed by a divisive normalization stage. The linear transformation may be the discrete cosine transform (DCT) (as in [25]) or a wavelet-type transform (as in [17], [18], [21]). Here, we assume a wavelet image decomposition, which provides a convenient framework for localized representation of images simultaneously in space, frequency (scale), and orientation. Let $y$ represent a wavelet coefficient; then a normalized coefficient is computed as $\bar{y} = y/\rho$, where $\rho$ is a positive divisive normalization factor calculated as the energy of a cluster of coefficients that are close to the coefficient $y$ in space, scale, and orientation.

Several different approaches have been used to compute the normalization factor $\rho$ [17], [18], [21], [25]. Most of them use a weighted sum of the squared neighboring coefficients plus a positive constant [18], [21], [25]. This involves several parameters (the weights and the constant) that are sometimes difficult to determine. They may be hand-picked (as in [25]) or chosen to maximize the independence of the normalized responses to an ensemble of natural images [21]. In [18], a global Markov random field model over the wavelet coefficients is assumed, and its parameters are learned from natural images. A more convenient approach is to derive the factor through a local statistical image model. In particular, the Gaussian scale mixture (GSM) model has been found to be very useful in this context [17]. A length-$N$ random vector $\mathbf{Y}$ is a GSM if it can be expressed as the product of two independent components, $\mathbf{Y} \overset{d}{=} z\mathbf{U}$, where $\overset{d}{=}$ denotes equality in probability distribution, $\mathbf{U}$ is a zero-mean Gaussian random vector with covariance $\mathbf{C}_U$, and $z$ is a scalar random variable called a mixing multiplier. In other words, the GSM model expresses the density of a random vector as a mixture of Gaussians with the same covariance structure but scaled differently (by $z^2$). Suppose that the mixing density is $p_z(z)$; then the density of $\mathbf{Y}$ can be written as

$$p_{\mathbf{Y}}(\mathbf{y}) = \int \frac{1}{(2\pi)^{N/2}\,|z^2\mathbf{C}_U|^{1/2}} \exp\left(-\frac{\mathbf{y}^T\mathbf{C}_U^{-1}\mathbf{y}}{2z^2}\right) p_z(z)\,dz. \qquad (1)$$

This GSM model has been shown to be very useful in accounting for both the marginal and joint statistics of the wavelet coefficients of natural images [17], where the vector $\mathbf{Y}$ is formed by clustering a set of neighboring wavelet coefficients within a subband, or across neighboring subbands in scale and orientation. The GSM model has also found successful applications in image denoising [26], image restoration [27], and image quality assessment [28].

The general form of the GSM model allows the mixing multiplier $z$ to be a continuous random variable at each location of the wavelet subbands. To simplify the model, we assume that $z$ only takes a fixed value at each location (but varies over space and subbands). The benefit of this simplification is that when $z$ is fixed, $\mathbf{Y}$ is simply a zero-mean Gaussian vector with covariance $z^2\mathbf{C}_U$. As a result, it becomes natural to define the normalization factor in the DNT representation as an estimate of the multiplier $z$ from the neighboring coefficient vector $\mathbf{Y}$. The coefficient cluster moves step by step as a sliding window across a wavelet subband, resulting in a spatially varying normalization factor $\hat{z}$. In our implementation, the normalization factor computed at each step is only applied to the center coefficient $y_c$ of the vector $\mathbf{Y}$, and the normalized coefficient becomes $v_c = y_c/\hat{z}$, where $\hat{z}$ is the estimate of $z$. A convenient method to obtain $\hat{z}$ is maximum-likelihood estimation [17], given by

$$\hat{z} = \sqrt{\frac{\mathbf{y}^T\mathbf{C}_U^{-1}\mathbf{y}}{N}} \qquad (2)$$

where the covariance matrix $\mathbf{C}_U$ is estimated from the entire wavelet subband before the estimation of the local $z$, and $N$ is the length of the vector $\mathbf{y}$, i.e., the number of neighboring wavelet coefficients.

Fig. 2. (a) Original wavelet coefficients. (b) DNT coefficients. (c) Histogram of original coefficients (solid curve) and a Gaussian curve with the same standard deviation (dashed curve). (d) Histogram of DNT coefficients (solid) fitted with a Gaussian model (dashed).

B. Image Statistics in Divisive Normalization Transform Domain

As will be shown in the next section, our RRIQA approach is essentially based on the statistics of the transform coefficients in the DNT domain and how they vary with image distortions. Before developing the specific RRIQA algorithm, it is useful to observe how image statistics change when the DNT is applied. In Fig. 2, we compare the marginal distributions of an original wavelet subband computed from a steerable pyramid decomposition [29] [Fig. 2(a)] and the same subband after DNT [Fig. 2(b)]. In Fig. 2(c), the original wavelet coefficient histogram is compared with a Gaussian shape that has the
same standard deviation. The significant gap between the two curves indicates that the original wavelet coefficients are highly non-Gaussian. It has been shown that such histograms can be well fitted with a generalized Gaussian density (GGD) function given by [30]

$$p(x) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)} \exp\left(-\left(\frac{|x|}{\alpha}\right)^{\beta}\right) \qquad (3)$$

where $\Gamma(a) = \int_0^\infty t^{a-1}e^{-t}\,dt$ (for $a > 0$) is the Gamma function, and $\alpha$ and $\beta$ are called the scale and power factors, respectively. The Gaussian density is a special case of the GGD when $\beta$ is fixed to 2. However, for the histograms of the wavelet coefficients of natural images, the best-fitting value of $\beta$ typically lies between 0.5 and 1.0 [31]. By contrast, the histogram of the coefficients after DNT can be well fitted with a Gaussian, as demonstrated in Fig. 2(d). Similar behavior is observed for other natural images. To provide a quantitative measure, we compute the Kullback–Leibler distance (KLD) [32] between the histogram and the best-fitting Gaussian curve before and after DNT for a set of natural images. The results are shown in Table I, where we can see that the Gaussian fit is consistently better in the DNT domain for all test images.

TABLE I
KLD BETWEEN THE MARGINAL DISTRIBUTIONS OF WAVELET/DNT COEFFICIENTS AND GAUSSIAN FIT

Fig. 3 demonstrates the impact of DNT on the joint statistics of wavelet coefficients. In Fig. 3(a) and (b), we show the conditional histograms of the coefficients extracted from two neighboring subbands (a parent band and a child band) in the original wavelet decomposition and in the DNT representation, respectively. It can be observed that in the conditional histogram of Fig. 3(a), the variance of a child coefficient (vertical axis) is highly dependent on the magnitude of its parent coefficient (horizontal axis). Such strong second-order variance dependency is confirmed by the significant difference between the widths of two cross-sections of the conditional histogram. By contrast, in the DNT representation, the histogram of the child coefficients changes little when conditioned on the magnitudes of the parent coefficients, as can be seen in Fig. 3(b). This demonstration clearly shows that the DNT representation can significantly reduce the second-order dependencies between the transform coefficients.

Fig. 3. Conditional histograms between a parent and a child coefficient extracted from (a) the original wavelet representation and (b) the corresponding DNT representation.

C. Perceptual Relevance of Divisive Normalization Representation

The DNT image representation is not only an effective way to reduce the statistical redundancies between wavelet coefficients; it is also highly relevant to biological vision. First, based on the widely accepted hypothesis that early visual sensory processing is optimized, through evolution and development, to increase the statistical independence between neuronal responses (subject to certain physical limitations such as power consumption), the modeling of the biological visual system and the modeling of natural scene statistics are dual problems [19]–[21]. Second, in neurophysiology, it has been found that divisive normalization provides an effective model to account for many recorded data of cell responses in the visual cortex [22], [23]. It is also a useful framework for explaining the adaptation of neural responses to variations in the visual environment [33]. Third, in psychophysics, it has been shown that the divisive normalization procedure can well explain the visual masking effect [34], [35], where the visibility of an image component (e.g., a wavelet coefficient) is reduced in the presence of large neighboring components (e.g., the wavelet coefficients close in space, scale, and orientation). Furthermore, the perceptual relevance of the DNT image representation has also been demonstrated by testing its resilience to noise contamination as well as its effectiveness in image compression and image contrast enhancement [18].


III. REDUCED-REFERENCE IMAGE QUALITY ASSESSMENT

A. DNT-Domain Statistics of Distorted Images

The strong perceptual and statistical relevance of the DNT image representation provides good justification for the use of DNT in RRIQA. In addition, we must also show that the statistics of DNT coefficients are sensitive to various image distortions. To study this, we apply DNT to a set of images with different types of distortions and observe how these distortions alter the statistics of the coefficients in the DNT domain. This is demonstrated in Fig. 4, where the histogram of the DNT coefficients of a wavelet subband is well fitted with a Gaussian model [Fig. 4(a)]. However, when we draw the same Gaussian model together with the histogram of the DNT coefficients computed from a Gaussian noise contaminated image [Fig. 4(b)], a Gaussian blurred image [Fig. 4(c)], or a JPEG compressed image [Fig. 4(d)], significant changes are observed. It is also interesting to see that the way the distribution changes varies with the distortion type. For example, Gaussian noise contamination increases the width of the histogram but maintains the Gaussian shape. By contrast, Gaussian blur reduces the width of the histogram and creates a much peakier distribution than a Gaussian. These observations are important because our RRIQA algorithm is based on quantifying the variations of DNT-domain image statistics as a measure of image quality degradation.

Fig. 4. Histograms of DNT coefficients in a wavelet subband under different types of image distortions. (a) Original "Lena" image. (b) Gaussian noise contaminated image. (c) Gaussian blurred image. (d) JPEG compressed image. Solid curves: histograms of DNT coefficients. Dashed curves: the Gaussian model fitted to the histogram of DNT coefficients in the original image. Significant departures from the Gaussian model are observed in the distorted images (b), (c), and (d).

B. Reduced-Reference Image Quality Assessment Algorithm

We propose an RRIQA algorithm that works with the marginal distributions of DNT coefficients. Although this algorithm still works with marginal distributions only (no explicit joint statistical model is employed, as in [9]), it does take into account the dependencies between the original neighboring wavelet coefficients because of the involvement of the divisive normalization process. We consider this a major advantage of the proposed approach (as compared to [9]) in capturing the joint statistics of wavelet coefficients while maintaining the simplicity of the algorithm. Moreover, the algorithm has a low data rate, as only a small set of RR features is extracted from the reference image and employed in the quality evaluation of the distorted image.

A convenient approach to measure the variations of the marginal probability distributions of the DNT coefficients between the original and distorted images (as observed in Fig. 4) is to compute the KLD between them

$$d(p\|q) = \int p(x)\log\frac{p(x)}{q(x)}\,dx \qquad (4)$$

where $p(x)$ and $q(x)$ are the probability density functions of the DNT coefficients in the same subband of the original and distorted images, respectively. To accomplish this, the DNT coefficient histograms of both the reference and distorted images must be available. The latter can be easily computed from the distorted image, which is always available. The difficulty is in obtaining the DNT coefficient histogram of the original image. Using all the histogram bins as RR features would result in either a heavy RR data rate (when the bin size is fine) or poor approximation accuracy (when the bin size is coarse). To overcome this problem, we make use of the important property that the probability density function $p(x)$ of the original DNT coefficients can be well approximated with a zero-mean Gaussian model [as observed in Figs. 2(d) and 4(a)]

$$p_m(x) = \frac{1}{\sqrt{2\pi}\,\sigma_p}\exp\left(-\frac{x^2}{2\sigma_p^2}\right). \qquad (5)$$


This model provides a very efficient means to summarize the DNT coefficient histogram of the original image, such that only one parameter, $\sigma_p$, is needed to describe it (as opposed to all the histogram bins). Furthermore, to account for the discrepancy between the model and the true distribution, we compute the KLD between $p_m(x)$ and $p(x)$ as

$$d(p_m\|p) = \int p_m(x)\log\frac{p_m(x)}{p(x)}\,dx \qquad (6)$$

and use it as an additional RR feature. This is computed for each subband independently, resulting in two parameters ($\sigma_p$ and $d(p_m\|p)$) for each subband.

In order to evaluate the quality of a distorted image, we estimate the KLD between the probability density function $q(x)$ of the DNT coefficients computed from the distorted image and the model $p_m(x)$ estimated from the original image

$$d(p_m\|q) = \int p_m(x)\log\frac{p_m(x)}{q(x)}\,dx. \qquad (7)$$

Combining this with the available RR feature $d(p_m\|p)$, we obtain an estimate of the KLD between $p(x)$ and $q(x)$

$$\hat{d}(p\|q) = d(p_m\|q) - d(p_m\|p). \qquad (8)$$

Substituting (6) and (7) into (8), the $p_m(x)\log p_m(x)$ terms cancel, so it can be easily shown that

$$\hat{d}(p\|q) = \int p_m(x)\log\frac{p(x)}{q(x)}\,dx. \qquad (9)$$

The estimation error can then be calculated as

$$d(p\|q) - \hat{d}(p\|q) = \int \left[p(x) - p_m(x)\right]\log\frac{p(x)}{q(x)}\,dx. \qquad (10)$$

This error is small when $p(x)$ and $p_m(x)$ are close, which is true for typical natural images. At the additional cost of one more RR parameter, $d(p_m\|p)$, (9) not only delivers a more accurate estimate of $d(p\|q)$ than (7), but also has a useful property: when there is no distortion between the original and distorted images (which implies that $p(x) = q(x)$ for all $x$), both the targeted distortion measure $d(p\|q)$ and the estimated distortion measure $\hat{d}(p\|q)$ are exactly zero.

In addition to $\hat{d}(p\|q)$, we also found the following measures useful in improving the accuracy of image quality evaluation:

(11)
(12)
(13)

where $\sigma_p$ and $\sigma_q$, $\kappa_p$ and $\kappa_q$, and $s_p$ and $s_q$ are the standard deviation, the kurtosis (the fourth-order central moment divided by the fourth power of the standard deviation, minus 3), and the skewness (the third-order central moment divided by the third power of the standard deviation) of the DNT coefficients computed from the original and distorted images, respectively. These measures provide further information about the shape changes of the probability density functions. In particular, two images with the same KLD with respect to the original image may have different types of distortions, and visual quality assessment varies across distortion types. Adding these features not only provides new means to quantify the amount of distortion, but also supplies new information that helps the algorithm differentiate distortion types. We have also carried out experiments comparing our IQA algorithm with and without these features, and found that adding them leads to a significant improvement in image quality prediction performance. Since $\sigma_q$, $\kappa_q$, and $s_q$ can be computed from the available distorted image, and $\sigma_p$ is already acquired when fitting the Gaussian model of (5), only two new RR features, $\kappa_p$ and $s_p$, are added. In fact, both of them are close to zero, because the probability distribution of the DNT coefficients of the original image is approximately Gaussian, and a Gaussian has zero skewness and zero kurtosis (as defined above).

At each subband, we define the overall distortion measure as a linear combination, in the logarithmic domain, of $\hat{d}(p\|q)$ and the measures of (11)–(13)

(14)

where the coefficients of the combination are weighting parameters. Finally, the overall distortion of the distorted image is computed as the sum of the distortion measures of all subbands

$$D = \sum_{k=1}^{K} D_k \qquad (15)$$

where $D_k$ is the distortion measure of (14) for the $k$th subband and $K$ is the number of subbands.

C. Implementation Issues

To compute the DNT representation of an image, we first decompose the image using a steerable pyramid [29] with three scales and four orientations, as shown in Fig. 5. For each center coefficient in each subband, we define a DNT neighboring vector $\mathbf{Y}$ that contains 13 coefficients: nine from the same subband (including the center coefficient itself), one from the parent band, and three from the same spatial location in the other orientation bands at the same scale. An illustration is given in Fig. 5. These coefficients are selected from the direct neighbors of the center coefficient because the magnitudes of clusters of wavelet coefficients tend to scale together [20] and thus are more likely to share the same scale factor $z$ in the GSM model described earlier. Increasing the size of the neighborhood will increase the computational complexity of the DNT calculation [specifically, the estimation of $z$ in (2)], but will not add extra RR features (because it only affects the DNT computation, and all other processes after DNT remain unaltered). In our experiments, we did not observe significant variations in the overall performance of the algorithm under slight changes of the neighborhood, but a more careful study of this issue remains future work. After the DNT computation, four RR features, $\sigma_p$, $d(p_m\|p)$, $\kappa_p$, and $s_p$, are extracted from each subband of the original image. This results in a total of 48 scalar RR features (4 features × 12 subbands, i.e., three scales × four orientations) for each original image.


Fig. 5. Illustration of steerable pyramid decomposition and the selection of DNT neighbors. The neighboring coefficients include the 3 × 3 spatial neighbors within the same subband, one parent neighboring coefficient, and three orientation neighboring coefficients.
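The 13-coefficient neighborhood shown in Fig. 5 can be gathered as below. This hedged sketch assumes a hypothetical in-memory layout `pyramid[scale][orientation]` of 2-D lists with scale 0 the finest, dyadic subsampling between scales (so the parent of location $(i, j)$ is read at (`i // 2`, `j // 2`)), and index clamping at subband borders; none of these conventions is specified in the text. For the coarsest scale, which has no parent band in this layout, the sketch simply returns the remaining 12 coefficients. The center coefficient is placed first so that it can be picked out after normalization.

```python
from typing import List, Sequence

Band = List[List[float]]


def _clamp(v: int, lo: int, hi: int) -> int:
    """Clamp an index into [lo, hi] (assumed border handling)."""
    return max(lo, min(hi, v))


def dnt_neighbors(pyramid: Sequence[Sequence[Band]], scale: int, orient: int,
                  i: int, j: int) -> List[float]:
    """Center coefficient, its 8 spatial neighbors, the parent, and the
    co-located coefficients of the other orientations at the same scale."""
    band = pyramid[scale][orient]
    h, w = len(band), len(band[0])
    vec = [band[i][j]]  # center coefficient first
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di or dj:  # skip the center itself
                vec.append(band[_clamp(i + di, 0, h - 1)][_clamp(j + dj, 0, w - 1)])
    if scale + 1 < len(pyramid):  # parent: same orientation, next coarser scale
        parent = pyramid[scale + 1][orient]
        vec.append(parent[_clamp(i // 2, 0, len(parent) - 1)]
                         [_clamp(j // 2, 0, len(parent[0]) - 1)])
    for o, other in enumerate(pyramid[scale]):
        if o != orient:  # same location in the other orientation bands
            vec.append(other[i][j])
    return vec
```

Collecting such vectors over a whole subband gives the samples from which $\mathbf{C}_U$ is estimated before (2) is applied at each location.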

The evaluation of the KLD between probability density functions needs to be done numerically using histograms. For example, for (6), we compute

(16)

where the normalized heights of the ith bins of the two histograms take the place of the density values in (6), and the sum runs over all bins in the histograms.

One problem with the subband quality measure of (14) is that when any of three of its constituent quantities is close to zero, the measure becomes unstable. In our implementation, to avoid such instability, we compute

(17)

where the added stabilizing term is a positive constant. Another useful property of this formulation is that the resulting distortion measure is always non-negative, and is zero when the original and distorted images are exactly the same.

Before applying the proposed algorithm to image quality assessment, five model parameters need to be learned from the data. It is important to cross-validate these parameters with different selections of training and testing data; details are given in the next section. For a given set of training images and the associated subjective scores, we use the MATLAB nonlinear optimization routine fminsearch to find the optimal parameters.

IV. VALIDATION

To validate the proposed RRIQA algorithm, two publicly accessible subject-rated image databases are used: the LIVE database [36], developed at the Laboratory for Image and Video Engineering at The University of Texas at Austin, and the Cornell-VCL A57 database [37], developed at the Visual Communications Laboratory at Cornell University. The LIVE database contains seven data sets of 982 subject-rated images created from 29 original images with five types of distortions at different distortion levels. The distortion types include 1) JP2: JPEG2000 compression (2 sets), 2) JPG: JPEG compression (2 sets), 3) Noise: white noise contamination, 4) Blur: Gaussian blur, and 5) FF: fast-fading channel distortion of JPEG2000-compressed bitstreams. The subjective test was carried out on each of the seven data sets individually. A cross-comparison set that mixes images from all distortion types is then used to help align the subjective scores across the data sets, and the subjective scores of all images are adjusted according to this alignment. The alignment process is rather crude. However, the aligned subjective scores (all data) are still very useful references, particularly for testing general-purpose IQA algorithms, for which cross-distortion comparisons are highly desirable. The Cornell-VCL database contains a total of 60 distorted images generated from three original images. Six different types of distortions are included: 1) FLT: quantization of the LH subbands of a five-level DWT of the image using the 9/7 filters, where the bands were quantized via uniform scalar quantization with step sizes chosen such that the RMS contrast of the distortions was equal; 2) NOZ: additive Gaussian white noise; 3) JPG: baseline JPEG compression; 4) JP2: JPEG2000 compression using the 9/7 filters and no visual frequency weighting; 5) DCQ: JPEG2000 compression using the 9/7 filters with the dynamic contrast-based quantization algorithm, which applies greater quantization to the fine spatial scales relative to the coarse scales in an attempt to preserve global precedence; and 6) BLR: blurring with a Gaussian filter.

Three criteria are used to evaluate how well the objective scores predict the subjective scores: 1) the correlation coefficient (CC) between the subjective and objective scores after a nonlinear mapping, which evaluates prediction accuracy; 2) the Spearman rank-order correlation coefficient (ROCC), which evaluates prediction monotonicity; and 3) the outlier ratio, which evaluates prediction consistency and is defined as

LI AND WANG: REDUCED-REFERENCE IMAGE QUALITY ASSESSMENT 209

TABLE II
WAVELET AND DNT DOMAIN COMPARISON OF THE PROPOSED METHODS USING THE LIVE DATABASE

TABLE III
WAVELET AND DNT DOMAIN COMPARISON OF THE PROPOSED METHODS USING THE CORNELL-VCL DATABASE

TABLE IV
PERFORMANCE COMPARISON OF IQA ALGORITHMS USING THE LIVE DATABASE

the percentage of predictions outside the range of ±2 standard deviations of the subjective scores. These criteria have been used in previous tests conducted by the Video Quality Experts Group (VQEG) [38]. Since we do not have access to the raw subjective scores of the Cornell-VCL database, the standard deviations of the subjective scores for each test image cannot be computed. Therefore, only CC and ROCC comparisons are included for the Cornell-VCL database.

Our validation work has two major purposes. The first is to verify that using the DNT image representation is beneficial for improving IQA algorithms. The second is to compare the performance of the proposed method with existing IQA algorithms.

To show the impact of the DNT representation, we compare the performance of the proposed RRIQA algorithm implemented in the wavelet domain (linear steerable pyramid decomposition) and in the DNT domain (linear steerable pyramid decomposition followed by the nonlinear DNT process). Specifically, a GGD is used to model the marginal distribution of wavelet coefficients, and a Gaussian density is employed to model that of DNT coefficients. All other aspects of the algorithm, including the standard deviation, skewness and kurtosis features, the KLD measure, the subband and overall quality measurement approach, and the training data and process, are exactly the same. The test results on the LIVE database and the Cornell-VCL database are shown in Tables II and III, respectively, where the training data are the full LIVE database and the full Cornell-VCL database, respectively. These tables show that the overall performance is clearly improved by moving from the wavelet-domain to the DNT-domain implementation.

The performance comparison with other IQA algorithms is shown in Tables IV and V. To the best of our knowledge, the only other general-purpose RRIQA algorithm with a comparably low RR data rate is the method proposed in [9]. In addition to this method, we have also included the peak signal-to-noise ratio (PSNR), which is still the most widely used full-reference IQA measure. Although such a comparison is highly unfair to the proposed method and to the method in [9] (PSNR requires full access to the original image, as opposed to the 48 scalar features used by the proposed method), it provides a useful indication of the relative performance of the proposed algorithm. For any IQA algorithm whose parameters are obtained through training, it is important to verify that the model is not overtrained. In other words, the performance of the algorithm should not change dramatically with different training data sets. Therefore, in both Tables IV and V, we have


TABLE V
PERFORMANCE COMPARISON OF IQA ALGORITHMS USING THE CORNELL-VCL DATABASE

included two versions of the proposed DNT-domain algorithm, which differ only in whether their five model parameters are trained on the LIVE database or on the Cornell-VCL database (using all images in both cases). Such cross-validation is useful for testing the robustness of the model. Not surprisingly, the results are better when the parameters are trained on the same database than when they are cross-trained on the other database (note that some image distortion types included in one database may not be included in the other). However, in both cases and for both databases, the proposed algorithm performs better than the method in [9]. In particular, both Tables IV and V show that for the all-data cases, where images with different distortion types are mixed together, the method in [9] does not perform well, and the improvement of the proposed method is quite significant. Indeed, for the all-data cases, its CC and ROCC values are comparable to, or even higher than, those of the full-reference PSNR measure.

V. CONCLUSION AND DISCUSSION

We proposed an RRIQA algorithm using statistical features extracted from a divisive normalization-based image representation. We have demonstrated that such a DNT image representation is both perceptually and statistically relevant, and that its statistical properties change significantly under different types of image distortion. These properties make it well suited for the development of RRIQA algorithms. Experimental validation with publicly accessible subject-rated image databases suggests that this new image representation leads to improved performance in the evaluation of image quality. The proposed algorithm has a relatively low data rate for RR features. It makes no assumption about the image distortion types and thus has the potential to be used as a general-purpose method in a wide range of applications.

Several further questions arise from this work. First, while the statistical features used in the proposed algorithm seem to be perceptually relevant and useful for IQA, is there a better way to combine them into a single scalar quality measure of the distorted image? Second, beyond the variance dependencies that are well captured by DNT, there are many other types of dependencies between neighboring wavelet coefficients that are not yet exploited, for example, local phase coherence [39]. Is there an efficient way to incorporate these dependencies as well? Third, using the proposed RRIQA measure together with the statistical properties (RR features) of the “perfect-quality” original image, can we design an image quality enhancement method that corrects or improves the quality of the distorted image being evaluated? Finally, since the proposed RRIQA method relates to quantifying the naturalness of images and uses no knowledge about image distortion types, could it be further developed into a general-purpose no-reference image quality assessment method?

REFERENCES

[1] Z. Wang and A. C. Bovik, Modern Image Quality Assessment. San Rafael, CA: Morgan & Claypool, Mar. 2006.
[2] H. R. Wu and M. Yuen, “A generalized block-edge impairment metric for video coding,” IEEE Signal Process. Lett., vol. 4, no. 11, pp. 317–320, Nov. 1997.
[3] Z. Wang, A. C. Bovik, and B. L. Evans, “Blind measurement of blocking artifacts in images,” in Proc. IEEE Int. Conf. Image Process., Sep. 2000, vol. 3, pp. 981–984.
[4] Z. Yu, H. R. Wu, S. Winkler, and T. Chen, “Vision-model-based impairment metric to evaluate blocking artifact in digital video,” Proc. IEEE, vol. 90, pp. 154–169, Jan. 2002.
[5] Z. Wang, H. R. Sheikh, and A. C. Bovik, “No-reference perceptual quality assessment of JPEG compressed images,” in Proc. IEEE Int. Conf. Image Process., Rochester, Sep. 2002, pp. 477–480.
[6] P. Marziliano, F. Dufaux, S. Winkler, and T. Ebrahimi, “Perceptual blur and ringing metrics: Application to JPEG2000,” Signal Process.: Image Commun., vol. 19, pp. 163–172, Feb. 2004.
[7] H. R. Sheikh, Z. Wang, L. Cormack, and A. C. Bovik, “Blind quality assessment for JPEG2000 compressed images,” in Proc. IEEE Asilomar Conf. Signals, Syst., Comput., Nov. 2002, pp. 1403–1407.
[8] H. R. Sheikh, A. C. Bovik, and L. Cormack, “No-reference quality assessment using natural scene statistics: JPEG2000,” IEEE Trans. Image Process., vol. 14, no. 11, pp. 1918–1927, Nov. 2005.
[9] Z. Wang, G. Wu, H. R. Sheikh, E. P. Simoncelli, E.-H. Yang, and A. C. Bovik, “Quality-aware images,” IEEE Trans. Image Process., vol. 15, no. 6, pp. 1680–1689, Jun. 2006.
[10] S. Wolf and M. H. Pinson, “Spatio-temporal distortion metrics for in-service quality monitoring of any digital video system,” Proc. SPIE, vol. 3845, pp. 266–277, 1999.
[11] I. P. Gunawan and M. Ghanbari, “Reduced reference picture quality estimation by using local harmonic amplitude information,” in Proc. London Commun. Symp., Sep. 2003, pp. 137–140.
[12] T. M. Kusuma and H.-J. Zepernick, “A reduced-reference perceptual quality metric for in-service image quality assessment,” in Proc. Joint 1st Workshop Mobile Future and Symp. Trends Commun., Oct. 2003, pp. 71–74.
[13] S. Wolf and M. Pinson, “Low bandwidth reduced reference video quality monitoring system,” in Proc. Int. Workshop Video Process. Quality Metrics for Consumer Electron., Scottsdale, AZ, Jan. 2005, CD-ROM.
[14] P. Le Callet, C. Viard-Gaudin, and D. Barba, “Continuous quality assessment of MPEG2 video with reduced reference,” in Proc. Int. Workshop Video Process. Quality Metrics for Consumer Electron., Scottsdale, AZ, Jan. 2005.
[15] M. Carnec, P. Le Callet, and D. Barba, “An image quality assessment method based on perception of structural information,” in Proc. IEEE Int. Conf. Image Process., Sep. 2003, vol. 3, pp. 185–188.


[16] M. Carnec, P. Le Callet, and D. Barba, “Visual features for image quality assessment with reduced reference,” in Proc. IEEE Int. Conf. Image Process., Sep. 2005, vol. 1, pp. 421–424.
[17] M. J. Wainwright and E. P. Simoncelli, “Scale mixtures of Gaussians and the statistics of natural images,” Adv. Neural Inf. Process. Syst., vol. 12, pp. 855–861, 2000.
[18] S. Lyu and E. P. Simoncelli, “Statistically and perceptually motivated nonlinear image representation,” in Proc. SPIE Conf. Human Vision Electron. Imaging XII, Jan. 2007, vol. 6492, pp. 649207-1–649207-15.
[19] H. B. Barlow, “Possible principles underlying the transformation of sensory messages,” in Sensory Communication, W. A. Rosenblith, Ed. Cambridge, MA: MIT Press, 1961, pp. 217–234.
[20] E. P. Simoncelli and B. Olshausen, “Natural image statistics and neural representation,” Annu. Rev. Neurosci., vol. 24, pp. 1193–1216, May 2001.
[21] O. Schwartz and E. P. Simoncelli, “Natural signal statistics and sensory gain control,” Nature Neurosci., vol. 4, pp. 819–825, Aug. 2001.
[22] D. J. Heeger, “Normalization of cell responses in cat striate cortex,” Vis. Neural Sci., vol. 9, pp. 181–198, 1992.
[23] E. P. Simoncelli and D. J. Heeger, “A model of neuronal responses in visual area MT,” Vis. Res., vol. 38, pp. 743–761, Mar. 1998.
[24] D. L. Ruderman, “The statistics of natural images,” Network: Comput. Neural Syst., vol. 5, pp. 517–548, 1996.
[25] J. Malo, I. Epifanio, R. Navarro, and E. P. Simoncelli, “Non-linear image representation for efficient perceptual coding,” IEEE Trans. Image Process., vol. 15, no. 1, pp. 68–80, Jan. 2006.
[26] J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli, “Image denoising using scale mixtures of Gaussians in the wavelet domain,” IEEE Trans. Image Process., vol. 12, no. 11, pp. 1338–1351, Nov. 2003.
[27] J. Portilla and E. P. Simoncelli, “Image restoration using Gaussian scale mixtures in the wavelet domain,” in Proc. IEEE Int. Conf. Image Process., Barcelona, Spain, Sep. 2003, vol. 2, pp. 965–968.
[28] H. R. Sheikh and A. C. Bovik, “Image information and visual quality,” IEEE Trans. Image Process., vol. 15, no. 2, pp. 430–444, Feb. 2006.
[29] E. P. Simoncelli, W. T. Freeman, E. H. Adelson, and D. J. Heeger, “Shiftable multi-scale transforms,” IEEE Trans. Inf. Theory, vol. 38, no. 2, pp. 587–607, Mar. 1992.
[30] S. G. Mallat, “Multifrequency channel decomposition of images and wavelet models,” IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 12, pp. 2091–2110, Dec. 1989.
[31] E. P. Simoncelli and E. H. Adelson, “Noise removal via Bayesian wavelet coring,” in Proc. 3rd IEEE Int. Conf. Image Process., Lausanne, Switzerland, Sep. 1996, vol. I, pp. 379–382.
[32] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley-Interscience, 1991.
[33] M. J. Wainwright, “Visual adaptation as optimal information transmission,” Vis. Res., vol. 39, pp. 3960–3974, 1999.
[34] J. Foley, “Human luminance pattern mechanisms: Masking experiments require a new model,” J. Opt. Soc. Amer., vol. 11, no. 6, pp. 1710–1719, 1994.
[35] A. B. Watson and J. A. Solomon, “Model of visual contrast gain control and pattern masking,” J. Opt. Soc. Amer., vol. 14, no. 9, pp. 2379–2391, 1997.
[36] H. R. Sheikh, Z. Wang, A. C. Bovik, and L. K. Cormack, “Image and Video Quality Assessment Research at LIVE.” [Online]. Available: http://live.ece.utexas.edu/research/quality/
[37] D. M. Chandler and S. S. Hemami, “VSNR: A wavelet-based visual signal-to-noise ratio for natural images.” [Online]. Available: http://foulard.ece.cornell.edu/dmc27/vsnr/vsnr.html
[38] P. Corriveau et al., “Video quality experts group: Current results and future directions,” in Proc. SPIE Visual Commun. Image Process., Jun. 2000, vol. 4067, pp. 742–753.
[39] Z. Wang and E. P. Simoncelli, “Local phase coherence and the perception of blur,” in Adv. Neural Inf. Process. Syst. (NIPS03). Cambridge, MA: MIT Press, May 2004, vol. 16.

Qiang Li (S’06) received the B.S. and M.S. degrees from the Beijing Institute of Technology, Beijing, China, in 2000 and 2003, respectively. He is currently pursuing the Ph.D. degree in electrical engineering at The University of Texas, Arlington. His research interests include full-reference and reduced-reference quality assessment and statistical models of natural scene images and their application to image and video processing problems.
Mr. Li is a recipient of the IBM Student Paper Award at the 2008 IEEE International Conference on Image Processing.

Zhou Wang (S’99–A’01–M’02) received the Ph.D. degree from The University of Texas at Austin in 2001.
He is currently an Assistant Professor in the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada. Before that, he was an Assistant Professor in the Department of Electrical Engineering, The University of Texas at Arlington, a Research Associate at Howard Hughes Medical Institute and New York University, and a Research Engineer at AutoQuant Imaging, Inc. His research interests include image processing, coding, communication, and quality assessment; computational vision and pattern analysis; multimedia coding and communications; and biomedical signal processing. He has more than 60 publications and one U.S. patent in these fields and is an author of Modern Image Quality Assessment (Morgan & Claypool, 2006).
Prof. Wang is an Associate Editor of IEEE SIGNAL PROCESSING LETTERS and Pattern Recognition, and a Guest Editor of IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING: Special Issue on Visual Media Quality Assessment.
