A Content-Aware Metric For Stitched Panoramic I - Supplement - 3
also introduce a stitched image quality assessment (SIQA) dataset, which contains 408 groups of examples with perspective variations and is made publicly available as part of the submission.

The paper is organized as follows. Section 2 discusses previous works in stitched image quality assessment. Section 3 introduces our proposed metric. Experimental results are presented in Section 4, and Section 5 draws the conclusion.

2. Related Work

Compared with the rapid evolution of stitching algorithms in the last decade, the previous literature on SIQA is insufficient and lagging. The recent applications of the stitching technique have redirected its emphasis: as auto-adaptive cameras and freely-assembled rigs have become widespread, imaging conditions have largely improved and photometric errors introduced at the hardware level have become less of a concern. Meanwhile, the demand for VR experiences increases the demand for high-quality, full-perspective panoramas in super resolution.

Stitching algorithm evaluations. For stitching algorithms, ghosting and structure-inconsistency artifacts that cause large perceived errors and visual discomfort are major challenges [21, 3]. To evaluate how effective the algorithms are at resolving such errors, many works choose to directly compare the stitched images and judge perceptually [24, 14]. The illustration is straightforward but subjective, and in many cases the comparison is conducted on a limited number of examples, which makes the evaluation less convincing. Another way to evaluate stitching algorithms is to apply classical IQA metrics to stitched images [1, 12], such as MSE (Mean Squared Error) [23], PSNR (Peak Signal-to-Noise Ratio) [17], SSIM (Structural Similarity index) [6] and VSI (Visual Saliency-Induced index) [26]. These are powerful metrics in conventional image quality evaluation and can effectively grade images degraded by global noise addition or various encoding methods, but they are not designed for the problem of SIQA.

Previous SIQA metrics. Most previous SIQA metrics paid more attention to photometric error assessment [10, 13, 22] than to geometric errors. In [10] and [22], geometric error assessment is omitted and the metrics focus on color correction and intensity consistency. [13] tries to quantify the geometric error by computing the structural similarity index (SSIM) of the high-frequency information of the stitched and unstitched image difference in the overlapping region. However, since the unstitched images used for testing are directly cropped from the reference and have no perspective variations, the effectiveness of the method is unproven. In [5] an omni-directional camera system of full perspective is considered, but the work pays more attention to assessing video consistency among subsequent frames and only adopts a luminance-based metric around the seam. In [16], the gradient of the intensity difference between the stitched and reference image is adopted to assess the geometric error; however, the experiments are conducted on a mere 6 stitched examples, and more experiments are conducted on conventional IQA datasets, which avoids the important and dwells on the trivial.

IQA-related datasets. The absence of an SIQA dataset benchmark is further evidence that the problem is understudied. Compared with the popularity of conventional IQA datasets like the LIVE database [15] or JPEG 2000 [9], the
Figure 2. The proposed procedure for stitched image quality assessment.
situation for the SIQA problem is obviously a drawback for the development of stitching algorithms. Therefore, establishing a stitched image dataset of proper scale and formation is clearly a necessary move.

… of local patches as Eq.(1):

$M_{lp} = \sum_{p=1}^{P} \left( \frac{1}{N^2} \sum_{i=1}^{N^2} |g_i - \mu_p|^2 \right)^{\frac{1}{2}}$   (1)
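Eq. (1) sums a per-patch standard deviation over P local patches of N × N pixels each. A minimal NumPy sketch, assuming g is a 2-D map of per-pixel values (e.g. optical-flow energy) and that the patches are non-overlapping; the patch size and the toy input are illustrative assumptions, not values from the paper:

```python
import numpy as np

def local_patch_metric(energy, N=8):
    """Sketch of Eq. (1): sum of per-patch standard deviations.

    `energy` is assumed to be a 2-D map g; N (patch side) is an
    illustrative choice, not specified in this excerpt.
    """
    H, W = energy.shape
    total = 0.0
    for y in range(0, H - N + 1, N):          # non-overlapping N x N patches
        for x in range(0, W - N + 1, N):
            patch = energy[y:y + N, x:x + N]
            mu_p = patch.mean()               # patch mean mu_p
            # ((1/N^2) * sum_i |g_i - mu_p|^2)^(1/2) = patch std deviation
            total += np.sqrt(np.mean((patch - mu_p) ** 2))
    return total

# A constant map has zero local variance everywhere:
print(local_patch_metric(np.ones((32, 32))))  # → 0.0
```

A perfectly aligned pair yields a flat difference map and hence a zero score, which matches the intent of using local variance to expose misalignment.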
saliency (VSI) method [26] is applied to each bounding-box. VSI is an effective metric combining visual saliency, edge similarity and chrominance consistency, which is in accordance with the desired measurement. Finally, we sum the index along the bounding-boxes to form the metric.

We rectify the geometric differences by warping the stitched image to the reference image using the calculated LDOF field. The structured areas are located using the line segment detector (LSD) [19] method, and a bounding-box is imposed around each line of sufficient length. For all the bounding-boxes representing structured areas, we sum the visual saliency score $S_{bbox}$ to form the structure-guided metric $M_s$, as presented in Eq.(3):

$M_s = \sum_{b=1}^{B} S_{bbox}$   (3)

where B is the number of detected bounding-boxes in each stitched example.

Due to the diversity of content in stitched images, how structured the content is should be considered. A scene with unregulated textures like trees or clouds has quite different noticeable error types from a structured scene with walls and furniture. For instance, line breakage is a more noticeable error type on the edge of a desk than on a flower, while ghosting is more salient on a flower, rendered as a "duplication" of the flower. As a result, it is necessary to first decide how structured a scene is before error quantification.

As discussed earlier, the geometric error metric quantifies the misalignment, and hence is suitable for texture distortions like ghosting. On the other hand, the structure-guided metric characterizes shape and color inconsistency. To combine them in a content-aware pattern, we design a metric that quantifies the "structureness" of a scene. In our work, a more structured scene is assumed to contain more long straight lines. The number, length and distribution of straight lines are integrated to form the structureness index. If a scene contains numerous long straight lines, the mean length $\mu_l$ is supposed to be larger. On the other hand, a larger $\mu_l$ could also indicate a scene with few but extra-long lines, so it is also necessary to divide $\mu_l$ by the length variance $\sigma$. Lines are segmented using the LSD method and pooled into a 30-dimensional histogram according to their phase; the magnitude of each bin is computed by Eq.(4):

$B_{mag} = \frac{\mu_l}{\sigma} \sum_{q=1}^{Q} e^{L_q / \gamma}$   (4)

where Q is the number of lines and $L_q$ is the length of the q-th line within the bin. $\gamma$ is the rectification parameter, used to convert an unnormalized value to unit range; in this paper we use one-tenth of the diagonal length of each stitched image. Here bins with large magnitude are considered an effective representation of structure, thus the structureness index $\omega_{str}$ is described as follows:

$\omega_{str} = \sum_{i=1}^{B} B_{mag}^{i} + \sum_{i=1}^{B_{top}} B_{mag}^{i}$   (5)

where B is the number of bins (30 bins are used in our experiment) and $B_{top}$ is the number of bins with top magnitude; in this paper we adopt 5 as $B_{top}$. The structureness index is normalized to [0, 1] using the min-max method and then further rectified. Fig. 3 illustrates typical examples of computing structureness. Finally, the content-aware adaptive metric is composed as Eq.(6):

$M = \omega_{str} \cdot M_s + (1 - \omega_{str}) \cdot M_g$   (6)

4. Experimentation

In this paper, we introduce a stitched image quality assessment dataset benchmark called the SIQA dataset. Extensive experiments are conducted on the SIQA dataset, including the comparison between our proposed metric and classical IQA metrics, the validation of each metric component, and the contrast between fixed-weight and content-aware adaptive combination mechanisms. To analyze the combined metric and how each component takes effect, we also study specific examples using each component alone. Results show the effectiveness of the proposed content-aware metric, achieving 94.36% precision compared with the mean subjective opinion score (MOS).

4.1. SIQA Dataset Benchmark

The first version of our SIQA dataset is based on synthetic virtual scenes, since we aim to evaluate the proposed metric on various stitching algorithms under ideal photometric conditions. The images are obtained by establishing virtual scenes with the powerful 3D modeling tool Unreal Engine. A synthesized 12-head panoramic camera is placed at multiple locations in each scene, covering a 360-degree surrounding view, and each camera has an FOV (field of view) of 90 degrees. Exactly one image is taken by each of the 12 cameras at one location simultaneously. Each camera view is used as a full reference for the stitched view of its left and right adjacent cameras, as demonstrated in Fig. 4.

The SIQA dataset uses twelve different 3D scenes varying from wild landscapes to structured scenes. Two sets of stitched images are obtained with a popular off-the-shelf stitching tool, Nuke, using different parameter settings, giving altogether 816 stitched samples; the original images are in high definition, 3k-by-2k in size. Annotations from 28 different viewers are integrated to decide on which one of
Figure 4. The 12-head panoramic camera established in a virtual scene using the Unreal Engine, and the formation of stitched/reference image pairs for the SIQA dataset.
Metric                Precision with MOS    RMSE
Qureshi et al. [13]   0.5343                0.6824
Solh et al. [16]      0.8554                0.3803
Proposed              0.9436                0.2374
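This excerpt does not spell out how "precision with MOS" is computed. One common reading, which is purely our assumption for illustration, is pairwise preference agreement: over all image pairs with distinct MOS values, count how often the metric ranks the pair the same way the viewers did:

```python
import numpy as np

def pairwise_precision(metric_scores, mos_scores):
    """Fraction of image pairs whose metric ordering agrees with the MOS
    ordering. This protocol is an assumption for illustration; the paper's
    exact 'precision with MOS' computation may differ.
    """
    m = np.asarray(metric_scores, dtype=float)
    s = np.asarray(mos_scores, dtype=float)
    agree, total = 0, 0
    for i in range(len(m)):
        for j in range(i + 1, len(m)):
            if s[i] == s[j]:
                continue  # skip ties in the subjective scores
            total += 1
            # same sign of difference => same preference ordering
            agree += (m[i] - m[j]) * (s[i] - s[j]) > 0
    return agree / total if total else 0.0

print(pairwise_precision([1, 2, 3], [1, 2, 3]))  # → 1.0
```

Under this reading, the table's 0.9436 would mean the proposed metric agrees with viewer preferences on roughly 94% of comparable pairs.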
Figure 6. Examples of the two metric components complementing each other. (a) is an example where the geometric error metric scores stitched image 1 higher, yet the local structure-guided metric scores image 1 lower; (b) is an example where the structure-guided metric scores image 1 higher but the geometric error metric does the opposite.
We observe that in unstructured scenes like (a), when two stitched images have very similar structure and even similar distortions, the attention-based IQA metric fails while the geometric error metric successfully scores image 1 higher, since the geometric distance error between image 1 and the reference is relatively smaller. In structured scenes like (b), where diverse edge breakage and shape distortion exist, the geometric error metric fails to evaluate the differences while the structure-guided metric successfully captures the distorted areas, thus providing better decisions. Such examples confirm our earlier conception that the two components complement each other.

5. Conclusion

We propose a quality assessment metric specifically designed for stitched images. We first analyze different error types typically encountered in image stitching, including how the errors are generated and rendered, and then arrive at the most common visual distortions in SIQA: ghosting and structure inconsistency. To effectively characterize these distortion types, we propose to adaptively fuse a perceptive geometric error metric and a structure-guided metric.

To capture perceptual ghosting, which is mostly caused by geometric misalignment, we compute the local variance of optical flow field energy between the distorted and reference images, guided by detected saliency. For structure inconsistency, a powerful intensity and chrominance gradient index, VSI, is adopted and customized around the highly-structured areas of the stitched images. Based on an understanding of the different purposes of these two metrics, we propose a content-adaptive combination according to the specific scene structure. Experimental results show the effectiveness of our proposed metric and confirm the correctness of the combination mechanism. The metric can be used to optimize various stitching algorithms.

Extensive experiments are conducted using our SIQA dataset, which we introduce as a dataset benchmark for SIQA problems. The large-scale dataset is laboriously constructed and is made publicly available for researchers in the VR community for further research.

References

[1] E. Adel, M. Elmogy, and H. Elbakry. Image stitching based on feature extraction techniques: a survey. International Journal of Computer Applications, 2014.
[2] M. Brown and D. G. Lowe. Automatic panoramic image stitching using invariant features. International Journal of Computer Vision, 74(1):59–73, 2007.
[3] C.-H. Chang, Y. Sato, and Y.-Y. Chuang. Shape-preserving half-projective warps for image stitching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3254–3261, 2014.
[4] M. Harville, B. Culbertson, I. Sobel, D. Gelb, A. Fitzhugh, and D. Tanguay. Practical methods for geometric and photometric correction of tiled projector. In Computer Vision and Pattern Recognition Workshop (CVPRW'06), pages 5–5. IEEE, 2006.
[5] S. Leorin, L. Lucchese, and R. G. Cutler. Quality assessment of panorama video for videoconferencing applications. In Multimedia Signal Processing, 2005 IEEE 7th Workshop on, pages 1–4. IEEE, 2005.
[6] L. Liu, H. Dong, H. Huang, and A. C. Bovik. No-reference image quality assessment in curvelet domain. Signal Processing: Image Communication, 29(4):494–505, 2014.
[7] Y. Liu and B. Zhang. Photometric alignment for surround view camera system. In Image Processing (ICIP), 2014 IEEE International Conference on, pages 1827–1831. IEEE, 2014.
[8] S. Lu and C. L. Tan. Thresholding of badly illuminated document images through photometric correction. In Proceedings of the 2007 ACM Symposium on Document Engineering, pages 3–8. ACM, 2007.
[9] A. K. Moorthy and A. C. Bovik. Blind image quality assessment: From natural scene statistics to perceptual quality.
IEEE Transactions on Image Processing, 20(12):3350–3364, 2011.
[10] P. Paalanen, J.-K. Kämäräinen, and H. Kälviäinen. Image based quantitative mosaic evaluation with artificial video. In Scandinavian Conference on Image Analysis, pages 470–479. Springer, 2009.
[11] F. Perazzi, A. Sorkine-Hornung, H. Zimmer, P. Kaufmann, O. Wang, S. Watson, and M. Gross. Panoramic video from unstructured camera arrays. In Computer Graphics Forum, volume 34, pages 57–68. Wiley Online Library, 2015.
[12] Y. Qian, D. Liao, and J. Zhou. Manifold alignment based color transfer for multiview image stitching. In Image Processing (ICIP), 2013 20th IEEE International Conference on, pages 1341–1345. IEEE, 2013.
[13] H. Qureshi, M. Khan, R. Hafiz, Y. Cho, and J. Cha. Quantitative quality assessment of stitched panoramic images. IET Image Processing, 6(9):1348–1358, 2012.
[14] C. Richardt, Y. Pritch, H. Zimmer, and A. Sorkine-Hornung. Megastereo: Constructing high-resolution stereo panoramas. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1256–1263, 2013.
[15] H. R. Sheikh, Z. Wang, L. Cormack, and A. C. Bovik. LIVE image quality assessment database release 2. 2005.
[16] M. Solh and G. AlRegib. MIQM: A novel multi-view images quality measure. In Quality of Multimedia Experience (QoMEx 2009), International Workshop on, pages 186–191. IEEE, 2009.
[17] A. Tanchenko. Visual-PSNR measure of image quality. Journal of Visual Communication and Image Representation, 25(5):874–878, 2014.
[18] W.-C. Tu, S. He, Q. Yang, and S.-Y. Chien. Real-time salient object detection with a minimum spanning tree. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2334–2342, 2016.
[19] R. G. von Gioi, J. Jakubowicz, J.-M. Morel, and G. Randall. LSD: A fast line segment detector with a false detection control. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(4):722–732, 2010.
[20] P. Weinzaepfel, J. Revaud, Z. Harchaoui, and C. Schmid. DeepFlow: Large displacement optical flow with deep matching. In Proceedings of the IEEE International Conference on Computer Vision, pages 1385–1392, 2013.
[21] T. Xiang, G.-S. Xia, and L. Zhang. Image stitching with perspective-preserving warping. arXiv preprint arXiv:1605.05019, 2016.
[22] W. Xu and J. Mulligan. Performance evaluation of color correction approaches for automatic multi-view image and video stitching. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 263–270. IEEE, 2010.
[23] W. Xue, L. Zhang, X. Mou, and A. C. Bovik. Gradient magnitude similarity deviation: A highly efficient perceptual image quality index. IEEE Transactions on Image Processing, 23(2):684–695, 2014.
[24] J. Zaragoza, T.-J. Chin, M. S. Brown, and D. Suter. As-projective-as-possible image stitching with moving DLT. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2339–2346, 2013.
[25] F. Zhang and F. Liu. Parallax-tolerant image stitching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3262–3269, 2014.
[26] L. Zhang, Y. Shen, and H. Li. VSI: A visual saliency-induced index for perceptual image quality assessment. IEEE Transactions on Image Processing, 23(10):4270–4281, 2014.