J. Vis. Commun. Image R.
25 (2014) 1493–1506
journal homepage: www.elsevier.com/locate/jvci
Engineering wireless broadband access to IPTV
Laith Al-Jobouri, Martin Fleury, Mohammed Ghanbari
University of Essex, Wivenhoe Park, Colchester CO4 3SQ, United Kingdom
Article info
Article history:
Received 18 October 2013
Accepted 20 June 2014
Available online 30 June 2014
Keywords:
H.264/AVC
IPTV
Video codec
Video streaming
WiMAX
Application-layer FEC
Data-partitioning
Error resilience
Abstract
IPTV is now extending to wireless broadband access. If broadband video streaming is to achieve competitive quality, the video stream itself must be carefully engineered to cope with challenging wireless channel conditions. This paper presents a scheme for doing this for H.264/AVC codec streaming across a WiMAX link. Packetization is an effective tool to govern error rates and, in the paper, source-coded data-partitioning serves to allocate smaller packets to more important data. A packetization strategy is insufficient in itself, as temporal error propagation should also be addressed by insertion of intra-coded data. It may be necessary to include redundant packets when channel conditions worsen. The whole should be protected by application-layer rateless coding. Therefore, the contribution of the paper is a complete scheme comprising various protection measures aimed at robust IPTV streaming. Due to computational overheads, the scheme is aimed at the new generation of smartphones with GHz CPUs.
© 2014 Elsevier Inc. All rights reserved.
1. Introduction
Due to the flexibility of network delivery, Internet Protocol TeleVision (IPTV) is attractive as an alternative to digital TV over terrestrial channels, though it may suffer from delays due to congestion and from packet losses leading to fluctuations in video quality. It does represent an intrusion into personal privacy, as individual viewing habits can be tracked, although advertisers may benefit from this facility. Whereas terrestrial TV is limited to a fixed channel broadcast schedule, i.e. linear TV, IPTV can offer pause-TV (when a partially-viewed program is cached for later viewing), catch-up TV [1] and time-shifted TV (when a program is re-aired in a different time zone), as well as varieties of Video-on-Demand (VoD). It also opens up the possibility of interactive TV and hybrid TV (a combination of terrestrial broadcasting with network delivery).
A crucial aspect of engineering IPTV is configuring the video codec in order to achieve good performance over a wireless link for flexible access. In this paper, we demonstrate effective codec configuration for wireless IPTV video streaming. In this way, the paper also provides a reference for content providers to pre-encode their video using a suitable codec configuration for broadband wireless networks, especially WiMAX. The paper also shows that multiple protection measures, as detailed in the paper, are necessary as a complete protection scheme for such an IPTV service, and we show how our scheme achieves robust delivery of video. As bursty error conditions present a major problem to compressed video streams, the research reports performance over broadband wireless access links experiencing this type of error pattern. That problem arises [2] because isolated errors are less harmful than the same number of errors appearing as a contiguous burst, as video compression relies on prediction from prior data. The paper will be of interest to those charged with deploying a broadband wireless IPTV service, as it contains an assessment of the current prospects for wireless IPTV.
Corresponding author.
E-mail address: fleum@essex.ac.uk (M. Fleury).
http://dx.doi.org/10.1016/j.jvcir.2014.06.013
1047-3203/© 2014 Elsevier Inc. All rights reserved.
In this paper, H.264/Advanced Video Coding (AVC) [3] standard codecs are assumed. Though High Efficiency Video Coding (HEVC) was standardised at the beginning of 2013, experience shows that it will take many years to be deployed, if indeed it is extensively used for streaming over wireless, given the absence of error resilience and concealment features [4]. The absence of error protection features may arise either because HEVC is intended to be used with Dynamic Adaptive Streaming over HTTP (DASH), a multi-streaming system for reliable TCP [5], or because HEVC's considerable coding gain is achieved by abandoning Macroblocks (MBs) for tree-structured Coding Units. HEVC is suitable for 720p High Definition (HD) network delivery, where the up to 50% lower bitrates over H.264/AVC compensate for the high frame rates, at least 50 frame/s at that resolution. Indeed, HEVC delivery chains have made quicker progress [6] than expected, though evaluation has shown that, without persistent HTTP connections, streaming is subject to delays and interruptions when using DASH [7].
The underlying transport protocol for HTTP is TCP, which retransmits packets when acknowledgments fail. This implies that there is no need for the channel coding used in our paper. However, over a wireless channel, TCP has difficulty distinguishing between packet loss due to congestion and packet loss due to transmission errors, implying that packets will be retransmitted even if they may be lost again through a persistently poor wireless channel. DASH compensates for TCP's behavior by allowing the client to control the bit-rate of the stream being downloaded (by selection between a set of streams with differing bit-rates). Unfortunately, during periods of heavy network congestion, DASH defaults to predominantly TCP control rather than client control. Study of a Netflix application [8] showed service interruptions of around 300 s, which is larger than a typical Windows wired device buffer of 240 s. The risk of display interruption on an Android device is still greater, as the typical buffer size is only 30 s. From the evidence of the performance studies referred to in [8], the performance of DASH has until recently been relatively poorly understood. The main reason [9] that commercial companies have implemented such multi-stream systems is not delivered video quality, which is difficult to monitor in a DASH-based system, but the compatibility of DASH with off-the-shelf web servers and existing content-delivery networks. The simpler UDP transport system, which is the subject of this paper: reduces the risk of video service interruption; does not rely on network over-provisioning [9]; and as a result potentially represents a greener solution.
The authors' robust scheme for streaming video over wireless broadband access has been simulated for IEEE 802.16 (WiMAX) systems [10]. WiMAX may have recently lost the technological competition with Long Term Evolution (LTE) [11] in developed Western countries but it is still being deployed in rural areas [12] and in countries where 3G cellular phone coverage is poor, including Africa and the Middle East. WiMAX is also attractive [13] for backhauling from local IEEE 802.11 networks. Though the authors' scheme as a whole has been analysed in previous works such as [14,15], the individual codec settings have not been analysed in isolation. In fact, it has been suggested to the first author that a very helpful service to the system developer community would be to analyze codec settings independently of each other, which is what this paper sets out to do.
The authors' scheme combines data-partitioning as a form of error resilience [16] with the addition of Forward Error Correction (FEC) using rateless channel coding [17] at the application layer, together with retransmission of extra redundant data when required. Retransmission is limited to one round to avoid accumulating latencies. The current authors have introduced into their video streaming scheme various error resilience measures [18] that exist in the literature: different rates and types of Intra-Refresh (IR) Macroblocks (MBs) [19], Constrained Intra Prediction (CIP) [20] and, where necessary, redundant data-partitioned Network Abstraction Layer (NAL) units (codec level packets) [21]. The impact of each of these measures is analyzed in isolation from the others, for Constant Bit Rate (CBR) as well as Variable Bit Rate (VBR) streaming. CBR allows storage and bandwidth capacity to be planned in advance, at a cost in video quality fluctuations. VBR enables greater compression efficiency relative to CBR, which is why it is generally used for disc storage. VBR can benefit from two or even three-pass encoders, which are unsuitable for live video compression. The relative merits of CBR and VBR are further discussed in [22]. In general, we have noticed that many researchers on similar topics have not explored the effect of video settings on their schemes and the authors believe that it is important to investigate this aspect of any protection scheme in order to understand the overall outcome of the scheme. In particular, it is important to determine how critical the video codec settings are to the success of a protection scheme. Is the scheme robust to changes or should certain settings be set within a given range?
The remainder of this paper is organized as follows. The following Section adds further information on how IPTV can be delivered to the consumer, as well as supplying background technical context. Section 3 selects research from the literature that reveals the motivation behind the codec-based approach to video streaming used in this paper. Particular attention is paid to video streaming over WiMAX in that section. Section 4 then outlines the approach taken to protect IPTV streams. As the main focus of this paper is the configuration of the scheme rather than the scheme itself, the description is necessarily brief. Section 5 continues by detailing how the video codec settings were modelled in order to provide the evaluation that appears in Section 6. Section 7 summarizes the findings and rounds off with some observations about video streaming in this environment.
2. Context
Readers requiring further information on: IPTV, technical context of H.264/AVC data-partitioning, and/or Raptor codes can consult this Section.
2.1. IPTV delivery
There are two basic forms of IPTV: (1) that delivered over closed or proprietary managed networks, including cable TV, often to set-top boxes (STBs) that perform decoding and possibly decryption; and (2) that delivered over the unmanaged public Internet, often displayed directly on desktop or laptop computer screens. The latter may be called Web or Internet TV [23] or latterly Over-the-Top (OTT) TV, as a way of distinguishing it from the high-quality service delivered over closed networks, with which the telecommunications companies hoped to challenge terrestrial broadcasters. Increasingly the industry trend, apart from cable TV, is towards the latter model of IPTV, i.e. OTT TV. There is no place for STBs when mobile IPTV [24] is delivered, hence our interest in OTT TV. The two forms of IPTV may differ in: the picture quality of the stream offered (lower for unmanaged); the amount of content available (higher for unmanaged); the formats that the video is compressed to (the need to update the STB is an issue); the way content is secured (unmanaged delivery may not be encrypted or may employ selective encryption [25] instead); and the frame resolutions presented (lower resolutions for unmanaged delivery). A flexible way to deliver IPTV is to set up (and tear down) connections [26] using the Real-Time Streaming Protocol (RTSP) (over reliable TCP transport), with subsequent data delivery using the Real-time Transport Protocol (RTP) over UDP transport. The Real-time Control Protocol (RTCP) can also allow end-to-end feedback to the IPTV server to control the bit rate. RTSP can also support so-called trick mode functionality such as rewind and pause.
To allow unmanaged delivery, OTT TV, to compete with Standard-Definition TV (SDTV) Quality-of-Experience (QoE), TV and video material can be locally cached [27], reducing latency to the access link, which in our paper is represented by a WiMAX base to subscriber station link. Whether streaming over managed or unmanaged networks, packets will be lost at the access network. This is the case, whether over broadband wireless or xDSL (Digital Subscriber Line), as the latter also suffers from burst errors [28]. Prior to that, whether managed or unmanaged networks are in use, video streams are aggregated over high-capacity optical networks, such as in Swisscom's Bluewin IPTV service [29]. Only Passive Optical Networks (PONs) can reduce the error rates at the access link but currently PONs are not widely deployed. The approach adopted in this paper is potentially applicable across a range of access link types, as it provides protection against error bursts and causes only moderate delays. If a multicast service is required, it can also be readily adapted from its unicast form by dropping acknowledgments to the base station and increasing the level of FEC. This possibility was tested by us in [30] and, therefore, this paper confines itself to unicast delivery.
[Fig. 2 diagram labels: Outer coding; Redundant nodes; LT coding.]
2.2. H.264 codec
Features of the H.264 codec are used in this work. H.264/Advanced Video Coding (AVC) standardization was initiated by the Video Coding Experts Group (VCEG), which is a working group of the International Telecommunication Union (ITU-T). The Joint Video Team, a co-operative effort of both the Moving Picture Experts Group (MPEG) and VCEG, carried out the final task of developing H.264/AVC [3] in 2003. Fig. 1 shows the video frame structure. A video frame in H.264/AVC is divided into MBs [31], which are the basic unit for motion prediction. Each MB can be further sub-divided into blocks that are transform-coded in order to de-correlate the data. For transmission, a video frame can be split into slices, each of which occupies a network packet. When data-partitioning is enabled, each slice can be further sub-divided into up to three partitions, data-partitions-A, -B and -C, each of which can occupy a separate network packet. It will be necessary to split a network packet if its size exceeds the Maximum Transmission Unit (MTU) of the network.
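The packet-splitting rule in the last sentence can be illustrated with a minimal sketch. The function name `split_to_mtu`, the partition sizes and the 1024-byte MTU are assumptions chosen purely for illustration, and header overheads are ignored; this is not the packetization code used in the paper.

```python
def split_to_mtu(packet: bytes, mtu: int) -> list:
    """Split one packet into fragments of at most `mtu` bytes (headers ignored)."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    return [packet[i:i + mtu] for i in range(0, len(packet), mtu)]


# Hypothetical sizes (in bytes) of the three data partitions of one slice.
slice_partitions = {"A": bytes(120), "B": bytes(300), "C": bytes(2500)}

# Each partition is packetized separately; only oversized ones are split.
fragments = {name: split_to_mtu(data, mtu=1024)
             for name, data in slice_partitions.items()}
fragment_sizes = {name: [len(f) for f in frags]
                  for name, frags in fragments.items()}
```

With these assumed sizes, partitions A and B each fit in one packet, while the larger partition-C is split into three fragments.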
2.3. Rateless channel coding
We selected Raptor codes [32] as our rateless code. Fig. 2 is a simple diagram of how a Raptor encoder works, in which k represents source symbols and m represents received symbols. A systematic Raptor code is arrived at [32] by first applying the inverse of the inner code to the first k symbols before the outer pre-coding step. Systematic channel codes separate the redundant coding data from the information data, allowing the information data to be passed directly to the source code decoder without channel decoding if no errors are detected (usually by means of checksums).
For a rateless code, even if there were no channel errors, there is a very small probability that the decoding will fail (the probability converges to zero in polynomial time in the number of input symbols), and this failure can be modeled statistically by the scheme introduced in [28]. In compensation, these codes are completely flexible, have a theoretical linear decode computational complexity, and generally their overhead is considerably reduced compared to fixed erasure codes. Furthermore, if the packets are pre-encoded with an inner code, as in Raptor codes to achieve a systematic code, a weakened Luby Transform (LT) [33], with reduced computational complexity compared to the original LT [28], can be applied to the symbols and their redundant symbols.
In [32], an implementation of Raptor code is reported decoding at several gigabits per second on a 2.4 GHz Intel Xeon processor. By way of comparison, in 2013 smartphones generally had a four-core processor running at around 1.5 GHz, though high-end smartphones had 64-bit processors with clock speeds of around 2.2 GHz. As a measure of its acceptability for mobile multimedia, Raptor coding has been adopted for two wireless standards, Digital Video Broadcasting (DVB) for handheld devices (DVB-H) [34] and Third Generation Partnership Project (3GPP) Multimedia Broadcast/Multicast Services (MBMSs) [35], though neither uses the codes for unicast or accepts repair packet requests, as occurs in this paper. Importantly though, using Raptor code in these standards at the block-level rather than the byte-level, as herein, can significantly impede its throughput.
Fig. 1. Video frame structure (frame, frame slices, macroblock), DP-A = data-partition-A and similarly for DP-B and DP-C.
Fig. 2. Raptor encoding scheme.
Notice that in these standards [34,35], application-layer Raptor coding is applied at the packet-erasure correction level, while physical-layer channel coding occurs at the bit-level. If physical-layer correction fails according to checksum detection [36] then a packet is marked as an erasure to be corrected at the application layer. However, in our paper, upon physical-layer correction failure, packets are not marked as erased but their data are passed to the application layer for byte-level Raptor code correction. A related procedure is proposed for mobile phones in [37], in which, after physical-layer correction failure, partially decoded packets are passed to the application layer for block-level correction.
In a Raptor code, a belief-propagation (BP) algorithm can be employed to decode the inner LT code. A BP implementation in [38] treated decoding as a soft decoding task rather than correction of erasures. The alternative is a hardware implementation of Raptor's inner code using Gaussian elimination. Gaussian elimination has a total complexity of order $O(nk^2)$ to solve a linear equation system with n output symbols and k unknowns (the original data symbols), resulting in a linear per data symbol complexity of $O(nk)$. The outer code in [28] is selected as a Low-Density Parity-Check (LDPC) code, which is also an optional physical-layer code in IEEE 802.11n, decoded either using Gaussian elimination or, in [39], a BP algorithm in hardware. As Gaussian elimination is at the heart of Raptor decoding, optimizations of the algorithm intended for hardware implementation on mobile phones [40] are available.
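To make the role of Gaussian elimination in rateless decoding concrete, the following sketch uses a plain random linear fountain code over GF(2), with the bytes of one packet as source symbols. It is deliberately not a Raptor code (there is no outer pre-code and no LT degree distribution), and the function names, the 30% erasure probability and the seed are assumptions made for illustration only.

```python
import random


def encode_symbol(source, rng):
    """Return (mask, value): value is the XOR of the source bytes selected by mask."""
    k = len(source)
    mask = rng.randrange(1, 1 << k)  # random non-empty subset of the k symbols
    value = 0
    for i in range(k):
        if (mask >> i) & 1:
            value ^= source[i]
    return mask, value


def gf2_decode(k, received):
    """Solve for the k source bytes by Gaussian elimination over GF(2).

    Returns None while the received equations do not yet have full rank.
    """
    basis = [None] * k  # basis[i] holds an equation whose highest set bit is i
    for mask, value in received:
        for i in reversed(range(k)):
            if not (mask >> i) & 1:
                continue
            if basis[i] is None:
                basis[i] = (mask, value)
                break
            mask ^= basis[i][0]
            value ^= basis[i][1]
    if any(row is None for row in basis):
        return None
    solved = [0] * k
    for i in range(k):  # back-substitute from the lowest symbol upwards
        mask, value = basis[i]
        for j in range(i):
            if (mask >> j) & 1:
                value ^= solved[j]
        solved[i] = value
    return bytes(solved)


def rateless_transfer(source, loss_prob=0.3, seed=1, max_symbols=10_000):
    """Keep sending coded symbols over an erasure channel until decoding succeeds."""
    rng = random.Random(seed)
    received, sent = [], 0
    while sent < max_symbols:
        symbol = encode_symbol(source, rng)
        sent += 1
        if rng.random() < loss_prob:
            continue  # this coded symbol was erased by the channel
        received.append(symbol)
        decoded = gf2_decode(len(source), received)
        if decoded is not None:
            return decoded, sent
    return None, sent


decoded, symbols_sent = rateless_transfer(b"IPTVdata")
```

The sender never needs to know which symbols were lost: it simply keeps generating fresh coded symbols until the receiver's equation system reaches full rank, which is the property that makes additional redundant data available on request.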
3. Related research
A number of research papers have considered some aspects of the video streaming method used by us. In [41] a packetization method was presented for robust H.264 video transmission over an IEEE 802.11 wireless local area network (WLAN) configured as a home network. Video robustness was enhanced by using small NAL units and by retrieving possible error-free IP packets from the received MAC frame. An aggregation scheme with a recovery mechanism was deployed and evaluated via simulation. For fixed physical-layer resources, the system provided a 2.5-dB gain in video quality (PSNR) compared to making no NAL packetization adjustments for similar throughput efficiency. Equally, an 80% improvement in throughput was achieved for a similar video quality. However, data partitioning as a way of varying NAL unit sizes was not used. The work in [41] was tailored to IEEE 802.11 WLANs using the Distributed Coordination Function (DCF) for negotiation of access to the wireless channel. As a result, data throughput was strongly influenced by the data frame size. Broadband wireless systems tend to employ centralized packet scheduling, typically through Time Division Multiple Access (TDMA) with pre-configured frame sizes into which packets from each node (subscriber station) are packed. Consequently, data partitioning for such systems is a more appropriate way of managing packetization issues.
Research in [42] used Forward Error Correction (FEC) and Automatic Repeat reQuest (ARQ) to support streaming over WiMAX,
exploiting features of the WiMAX standard. In particular, channel
state information held at WiMAX stations served to dynamically
construct the MAC Protocol Data Unit (MPDU). The size of these
units was thus determined such that the packet dropping probability was minimized without compromising goodput. Simulation
results showed that the ARQ-enabled adaptive algorithm was
always better than the non-adaptive algorithm. The scheme in
[42] anticipated our work but no particular application was catered
for in [42] other than it should be a real-time streaming one and,
indeed, the authors of [42] were particularly concerned with voice
over IP (refer to [43]). Because our work is specialized for video
streaming it can apply aspects of video source coding to streaming.
We also utilize adaptive channel coding rather than the block
codes of [42]. While we utilize ARQ to request additional FEC data
according to channel conditions, ARQ in [42] was employed to
change the MPDU size, as the possibility of altering the channel
coding rate was not available to the authors of [42].
In [16] the researchers compared non-scalable video coding
with data partitioning using H.264/AVC under similar application
and channel constructions for conversational applications over
mobile channels. The experimental results showed that by using
the data-partitioning scheme the number of entirely lost frames
can be lowered and the probability of poor-quality decoded video
can be reduced. In the data-partitioning scheme of [16], differential
protection was achieved by selecting from a set of discrete channel
coding rates, through punctured convolutional codes. However, in
order to determine the protection level, an optimization procedure
became necessary to minimize potential distortion. This procedure
depended on the quantization parameter (QP) and the coding rate
for each partition. The wireless channel characteristics also had to
be known in advance by the encoder. However, leaving aside the
computational complexity of the optimization search in [16], there
is another key difference between the scheme of [16] and this
paper. In [16] no feedback occurred, so that it was not possible to
request additional redundant data. In fact, when using punctured
convolutional codes in [16] (rather than the rateless codes used
herein) it was not possible to generate additional redundant data.
Data-partitioning can be viewed as a simplified form of SNR or quality layering. Extended quality layering can also be applied to video streaming across WiMAX. In [44], adaptive multicast streaming was proposed for WiMAX using the Scalable Video Coding (SVC) extension for H.264. WiMAX channel conditions were monitored in order to vary the bitrate accordingly. Unfortunately, the subsequent decision of the JVT standardization body for H.264/AVC not to support fine-grained scalability (FGS) implied that it will be harder to respond to channel volatility in the way proposed in [44]. This is one reason why we do not employ SVC; a number of other issues surrounding SVC are mentioned in the rest of this paragraph. Other work concerned with video streaming over WiMAX links has also investigated: combining scalable video with multi-connections in [45]; and comparing [46] H.264/SVC with H.264/AVC for WiMAX. However, the data-dependencies between layers in H.264/SVC medium-grained scalability remain a concern. Unlike in FGS, enhancement layer packets may successfully arrive but not be able to be reconstructed if key pictures also fail to arrive. Besides, for commercial one-way streaming, simulcast is now likely to be preferred to H.264/SVC for the reasons outlined in [47]. In [47], it was found that the extra overhead from sending an SVC stream compared to an H.264/AVC stream meant that the cost of bandwidth consumption outweighed the reduced storage cost of SVC once more than 64 sessions had occurred (assuming 16 simulcast streams or 16 video layers per session). In another comparison [48], it was proposed that scalable video with unequal error protection cannot provide any advantage over H.264/AVC with equal error protection in a wireless environment, due to the overhead of scalable video coding compared to that of single-layer coding.
4. Outline of streaming system
This Section outlines an effective video streaming system for WiMAX that provides error resilience through source-coded data partitioning [16] and which works without the need to apply privileged protection to the high-priority partitions. As already mentioned in Section 1, the main point of this paper is to draw attention to the importance of correctly configuring the video codec parameters. Therefore, this Section provides a basic outline only and the interested reader is referred to the authors' other works, such as [14] and [15], for more details and variants of our streaming system.
The H.264/AVC QP is set by us in such a way that temporally-coded texture data occupies a larger part of a frame's data than that occupied by data in each of the other two partitions. If this texture data should be dropped, it can be replaced more easily through error concealment at the decoder than data in the other two partitions. Error concealment (backward error correction) [49] is a non-normative feature [50] of H.264/AVC that nevertheless is present in the H.264/AVC JM reference code (found at http://iphome.hhi.de/suehring/tml/). The texture data in partition-C is packetized in a WiMAX MAC Service Data Unit (MSDU), within a MAC Protocol Data Unit (MPDU) [10,51].
In contrast to the authors' counter-intuitive approach, the intuitive approach is to give special protection to partition-A, which for predictively-coded frames includes motion vectors as well as other important parameters. Partition-B, bearing intra-coded MBs, may also be given special protection, as it contains spatially-coded data whenever suitable references in other video frames are not available. As an example of the Unequal Error Protection (UEP) approach, in [52] hierarchical modulation was employed to favor those partitions with more important data for the reconstruction of the video frame. For example, partition-A motion vectors can be used for motion copy error concealment when partition-C data are lost and, therefore, the intuitive approach is to protect partition-A (possibly combined with partition-B). Readers should consider for themselves from the evaluation of this paper whether UEP measures are necessary.
Though no special protection is given, it is still necessary to protect the bit-stream (without privileging partitions A and B) against the risk of packet loss. This was achieved through Equal Error Protection (EEP) of FEC provision. Application-layer rateless coding [53] was selected for its flexibility and its linear computational complexity at both the decoder and the encoder. To avoid long latencies, which would occur if packet-level FEC were to be applied, redundant data were added to the packets themselves, treating the bytes within each packet as the data symbols. Again to reduce latency, just a single Automatic Repeat reQuest (ARQ) was made if the available data were insufficient to reconstruct a corrupted packet. Additional rateless redundant data were added to the next available packet to be transmitted, according to the amount calculated in the way specified by us elsewhere [14].
In an extension to the basic IPTV streaming scheme, we also added redundant NAL unit packets. A discussion of alternative redundant packet schemes is postponed to Section 6.4.
5. Simulation model
This Section details how we set about modeling the WiMAX
system for the evaluation of Section 6.
5.1. Simulation system
Fig. 3 shows the overall data flow of the simulation system. Raw video (.YUV) is encoded by H.264/AVC into compressed form. The compressed file is passed for later extraction of any dropped packets. At the same time, a video trace file is generated, which will become an input into the ns-2 simulator [54]. The trace file contains the size of each video packet and the transmission schedule. However, the trace file does not contain any video data. After simulation using the wireless channel model, the simulator outputs a list of sent and dropped packets. This is used to filter from the original compressed data file any data that did not arrive at the destination. The file is subsequently decoded by the H.264/AVC decoder. The decoder outputs a raw video file in YUV (YCbCr) format. This file would normally be displayed but, for the simulation, it is compared with the original raw input video. This allows the objective video quality to be calculated as Peak Signal-to-Noise Ratio (PSNR), through a pixel-by-pixel comparison.
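The pixel-by-pixel comparison can be sketched as follows for 8-bit samples, using the standard PSNR definition based on mean squared error. The function name `psnr` and the flat list-of-samples representation are assumptions for illustration; the paper's own measurement tooling is not shown here.

```python
import math


def psnr(reference, decoded, peak=255):
    """PSNR in dB between two equal-length sequences of pixel samples."""
    if len(reference) != len(decoded) or len(reference) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    # Mean squared error over all co-located pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: no distortion
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy example: every decoded luma sample differs from the original by one level.
original_luma = [10, 10, 10, 10]
decoded_luma = [11, 11, 11, 11]
quality_db = psnr(original_luma, decoded_luma)
```

In practice the measure would typically be computed per frame on the luma plane and then averaged over the sequence, though that choice is not stated in this section.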
Two video clips with different source-coding characteristics were employed in the tests in order to judge the dependency of the results upon video source-coding complexity. The first test sequence was Paris, which is a studio scene with two upper body images of presenters and moderate motion. The background is of moderate to high spatial complexity, leading to larger slices. The other test sequence was Football, which has rapid movements and consequently has high temporal coding complexity. Both sequences were encoded at Common Intermediate Format (CIF) (352 × 288 pixels/picture). CIF resolution was used for ready comparison with the prior work of others on video communication to mobile devices.
Fig. 3. Simulation system. [Diagram blocks: Video file (.YUV); JM 14.2 H.264/AVC encoder; Generate video trace; Sending node; Network channel model; Receiving node; Sent trace packets; Received trace packets; Checking dropped packets; Filtered video stream; JM 14.2 H.264/AVC decoder; Reconstructed YUV file; PSNR calculation.]
5.2. Wireless channel model
The Gilbert–Elliott channel model employed in this work has been used by researchers in the wireless field [55,56] because of its ability to model error burst patterns as experienced at the receiver. This channel model was introduced into the ns-2 event-driven simulator [54]. The Gilbert–Elliott channel model itself is a two-state Markov chain. It is based on: good and bad states; the probabilities of these states; and the probabilities of the transitions between them. In the bad state, losses happen with higher probability, while in the good state losses happen with lower probability. $P_{GG}$ is the probability of remaining in the good state, given that the previous state was good, and $P_{GB}$ is the probability of a transition from the good state to the bad state. Likewise, $P_{BB}$ is the probability of remaining in the bad state, given that the previous state was bad, and $P_{BG}$ is the probability of a transition from the bad to the good state. By the law of total probability, the probabilities of remaining in and of leaving a given state sum to one (certainty). Therefore, we have $P_{GG} + P_{GB} = 1$, resulting in (1). A similar argument for the bad state leads to (2).
$$P_{GG} = 1 - P_{GB} \tag{1}$$
$$P_{BB} = 1 - P_{BG} \tag{2}$$
For the stochastic process to remain stationary in time,
$$\pi_G P_{GB} = \pi_B P_{BG} \tag{3}$$
where $\pi_G$ and $\pi_B$ are the steady state probabilities of being in a good or bad state respectively. Again by the law of total probability, $\pi_B = 1 - \pi_G$. Substituting this expression for $\pi_B$ into (3) easily leads to:
$$\pi_G = \frac{P_{BG}}{P_{BG} + P_{GB}} \tag{4}$$
Similarly, $\pi_G = 1 - \pi_B$. Substituting this expression for $\pi_G$ into (3) easily leads to:
$$\pi_B = \frac{P_{GB}}{P_{BG} + P_{GB}} \tag{5}$$
The Gilbert–Elliott model good and bad states have their own error distributions that are independent of the process of arriving at or leaving those states, i.e. forming a Hidden Markov Model. Suppose the probability of packet loss is $P_G$ and $P_B$ in the good and bad states respectively. Then the average packet loss rate produced by a Gilbert–Elliott channel is given in (6) by the usual expression for the expectation of a probability distribution.
$$p = P_G \pi_G + P_B \pi_B \tag{6}$$
To model the effect of slow fading at the packet level, the
Gilbert-Elliott model's parameters were set to PGG = 0.96,
PBB = 0.95, PG = 0.01 and PB = 0.02. Additionally, it is still possible
for a packet not to be dropped in the channel but nonetheless to be
corrupted through the effect of fast fading (or other sources of
noise and interference). This byte-level corruption was modelled
by a second Gilbert-Elliott model, with the same parameters
(applied at the byte level) as that of the packet-level model except
that PB (now the probability of byte loss) was increased to 0.165.
Effectively, this second model emulates fast fading between good and
bad conditions.
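As a check on the arithmetic, the steady-state probabilities of (4) and (5) and the mean loss rate of (6) can be evaluated for the parameter values quoted above. The following is a minimal sketch and not the code used in the ns-2 simulations; the function and variable names are illustrative assumptions.

```python
def steady_state(p_gg, p_bb):
    """Return the steady-state probabilities (good, bad) of the two-state chain."""
    p_gb = 1.0 - p_gg  # Eq. (1): leave the good state
    p_bg = 1.0 - p_bb  # Eq. (2): leave the bad state
    pi_g = p_bg / (p_bg + p_gb)  # Eq. (4)
    pi_b = p_gb / (p_bg + p_gb)  # Eq. (5)
    return pi_g, pi_b


def mean_loss_rate(p_gg, p_bb, loss_g, loss_b):
    """Average loss rate of Eq. (6): per-state loss weighted by state occupancy."""
    pi_g, pi_b = steady_state(p_gg, p_bb)
    return loss_g * pi_g + loss_b * pi_b


# Packet-level model: PGG = 0.96, PBB = 0.95, PG = 0.01, PB = 0.02
packet_level = mean_loss_rate(0.96, 0.95, 0.01, 0.02)
# Byte-level model: same parameters except PB = 0.165
byte_level = mean_loss_rate(0.96, 0.95, 0.01, 0.165)

print(f"packet-level mean loss rate: {packet_level:.4f}")
print(f"byte-level mean loss rate:   {byte_level:.4f}")
```

With these settings the chain spends roughly 56% of the time in the good state, giving a mean loss rate from (6) of about 1.4% at the packet level and about 7.9% at the byte level. These are properties of the channel model in isolation; the loss rates observed in simulation also depend on packet sizes and congestion.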
The slow fading parameters in the Gilbert-Elliott model were
established empirically by a long series of trial simulations aimed
at establishing packet loss rates (PLRs) that were known from
experience to be critical to the acceptable reconstruction of a video
stream by a decoder. Specifically, PLRs below 5% can be tolerated
without protection measures, while above 10% it is difficult for
error resilience measures and/or channel coding to reconstruct a
video sequence. This observation can be generalized in the above
dual Gilbert-Elliott model to achieve data loss rates in the range
5-10%. It is only realistic to specify a data loss range because of
the statistical nature of the model's behaviour. In this paper,
we have presented results in the data loss range that was selected
as critical. It is perfectly possible for a link to be benign and no data
losses to occur but clearly one should engineer a streaming solution
to cope with critical conditions. As the channel coding protection
in this paper is adaptive, the problem of transmitting
redundant overhead in a benign channel is reduced. As extra
redundant data cannot be transmitted more than once, the possibility
of attempting video repair when no amount of protection
will work is also avoided. Fig. 4 is an example from preliminary
tests in which the Gilbert-Elliott PB parameter is systematically
varied while the other Gilbert-Elliott parameters were set as
above. This allows the relationship to the PLR for four reference
video clips to be found. By observation, the relationship is approximately linear.
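A linear dependence on PB is what (6) predicts for the channel model on its own, since the steady-state probabilities do not depend on PB. The sketch below illustrates this with a small stand-alone Monte Carlo simulation of a two-state chain, compared against (6). It is an illustration under stated assumptions rather than the ns-2 set-up: the function names, the seed, the trace length and the swept PB values are all hypothetical, and effects such as congestion and packet size are ignored.

```python
import random


def simulate_loss_rate(p_gg, p_bb, loss_g, loss_b, n_packets, seed=1):
    """Fraction of packets lost over n_packets steps of a Gilbert-Elliott chain."""
    rng = random.Random(seed)
    in_good = True  # assumed starting state; the transient is negligible for long runs
    lost = 0
    for _ in range(n_packets):
        # The loss decision uses the loss probability of the current hidden state.
        if rng.random() < (loss_g if in_good else loss_b):
            lost += 1
        # The hidden state then either persists or flips.
        stay = p_gg if in_good else p_bb
        if rng.random() >= stay:
            in_good = not in_good
    return lost / n_packets


def analytic_loss_rate(p_gg, p_bb, loss_g, loss_b):
    """Mean loss rate from Eq. (6), using the steady state of Eqs. (4) and (5)."""
    p_gb, p_bg = 1.0 - p_gg, 1.0 - p_bb
    pi_g = p_bg / (p_bg + p_gb)
    return loss_g * pi_g + loss_b * (1.0 - pi_g)


for loss_b in (0.03, 0.12, 0.21, 0.30):
    sim = simulate_loss_rate(0.96, 0.95, 0.01, loss_b, n_packets=200_000)
    ana = analytic_loss_rate(0.96, 0.95, 0.01, loss_b)
    print(f"PB={loss_b:.2f}  simulated={sim:.4f}  analytic={ana:.4f}")
```

Equal steps in PB produce equal steps in the analytic loss rate, with slope equal to the steady-state probability of the bad state.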
In [57] for an IEEE 802.15.4 wireless link, the authors take
extensive measurements of a bursty link to arrive at a metric for
burstiness, which is then found to be a generalisation of the
burstiness in a Gilbert-Elliott model. However, the Gilbert-Elliott
model differed from ours in representing the hidden bad state to
result in the loss of all packets and likewise the good state to result
in all packets surviving. An interesting observation of the authors
of [57] is that burstiness tends to occur when a mobile device is
at the limits of the range of a radio link, so that small changes in
the physical channel cause the signal to drop below the receiver's
sensitivity. In [56], the estimation of model parameters from channel measurements is also treated for wired Internet examples.
Fig. 4. Example plot showing the relationship between Gilbert-Elliott PB parameter
and packet loss rate for several CIF video sequences at 30 Hz, with data-partitioning
applied. (Clips plotted: Foreman, Car Phone, Highway and Paris; packet loss (%)
against PB from 0.03 to 0.3.)
5.3. WiMAX simulation configuration
The physical (PHY) layer settings selected for WiMAX simulation on ns-2 [58]
are given in Table 1. The antenna was modelled
for comparison purposes as a half-wavelength dipole. The Time
Division Duplex (TDD) frame length was set to 5 ms, as this is
the only value supported by the WiMAX Forum [59]. As mentioned
under modulation, the physical-layer FEC convolutional code rate
was 1/2. Video was transmitted over the downlink with UDP transport
(see Section 2.1). In order to introduce sources of traffic congestion,
an always available FTP (File Transfer Protocol) source
was modelled with TCP transport to a Subscriber Station (SS) from
the WiMAX Base Station (BS). Likewise a CBR source with packet
size of 1000 bytes and inter-packet gap of 0.03 s was downloaded
to another SS.
Table 1
IEEE 802.16 (WiMAX) parameter settings.
Parameter              Value
PHY                    OFDM
Frequency band         5 GHz
Bandwidth capacity     10 MHz
Duplexing mode         TDD
Frame length           5 ms
Max. packet length     1024 B
Raw data rate          10.67 Mbps
IFFT size              1024
Modulation             16-QAM 1/2
Guard band ratio       1/8
DL/UL ratio            3:1
Channel model          Gilbert-Elliott
SS transmit power      245 mW
BS transmit power      20 W
Approx. range to SS    1 km
Antenna type           Omni-directional
Antenna gains          0 dBD
SS antenna height      1.2 m
BS antenna height      30
PHY = Physical layer, IFFT = Inverse Fast Fourier Transform, DL/UL = Downlink/
Uplink, QAM = Quadrature Amplitude Modulation, SS = Subscriber Station,
BS = Base Station, TDD = Time Division Duplex, OFDM = Orthogonal Frequency
Division Multiplexing.
6. Evaluation of codec settings
This Section evaluates in turn the codec configurations introduced in Section 1.
6.1. Intra-Refresh Macroblocks
As previously mentioned, in H.264/AVC data partitioning
[16,60], motion vectors (MVs) are packed into partition-A bearing
NAL units, allowing motion copy error concealment at the decoder
to partially reconstruct a picture, despite missing partition-C NAL
units (containing quantized transform coefficient residuals). Partition-B NAL units contain intra-coded (spatially encoded) MBs,
which are substituted for inter-coded MBs according to encoder
implementation (only the decoder input format is standardized
in H.264/AVC). Therefore, when Intra-Refresh (IR) MBs are
included alongside naturally intra-encoded MBs, partition-B slices
grow in size. This means that data-partitioned video compression
provides a convenient way to examine the effect of various
amounts of IR MB provision. Once the H.264/AVC encoder has
formed a NAL unit, it can also provide a RTP header prior to encapsulation by IP/UDP network protocol headers.
A point to note is the different way that random IR MBs are
specified in the H.264/AVC JM 14.2 implementation compared to
that of cyclic IR line intra update. In random IR MB, a maximum
percentage of IR can be specified, which percentage includes
already encoded IR MB. If the given quota of IR MB is already largely occupied by naturally encoded MBs (those encoded to cover
newly revealed objects or to improve the quality of the video up
to a given bit budget or for some other encoder-dependent reason),
then only a small amount of extra randomly inserted MBs will be
added. In contrast, if a line of IR MBs is inserted then these MBs
are added in addition to those intra-coded MBs that have already
been included by the encoder, as shown in Fig. 5.
Football was VBR encoded with a Group-of-Pictures (GoP) structure of IPPP. . . at 30 frame/s. (Refer forward to Section 6.3 to see
the tests that led to choice of the more favourable IPPP. . . GoP
structure.) From Table 2, it is apparent that, as the percentage of
IR MBs is increased, the size in bytes of partition-B increases for
the same QP. Because more MBs are assigned to partition-B, the
size of partition-C reduces. And because of the large amount of naturally-encoded intra MBs, this effect is gradual until 25% of random
IR MBs are added. 25% of random IR MBs is shown in Table 2, as
that amount approximately corresponds to the total partition-B
size if cyclic line intra update is turned on instead (with approximately the same number of MBs).
Fig. 6 illustrates the range of metrics we extracted from our
codec configuration tests. However, it should be noted that, in
Fig. 6, increasing the provision of IR MBs from 2% to 5%, then to
6% and finally 25% (in the case of MB line intra update), increases
the throughput and, hence, the bandwidth requirements in respect
to co-existing traffic. The 25% IR commitment tested is large in
practice, due to the coding inefficiency of spatial reference coding.
It results in larger packets and the size of packets is the most
important factor affecting the percentage of dropped packets, as
is also evident from the decrease in dropped packet percentages,
Fig. 6a, when the QP increases (and video quality decreases). From
Fig. 6a and c, higher-quality video (QP = 20, 25) benefits from random
insertion (and this effect is reversed for low-quality video).
From the objective video quality (PSNR) resulting, Fig. 6c, it can
be seen that reducing the IR MB percentage to 2% actually
improves the PSNR at QPs 30 and 35. From this one can conclude
that there is little if anything to be gained by raising the IR percentage.
This is not surprising, as at lower QPs the encoder has a greater
bit budget and, hence, can include more naturally-encoded intra
MBs. In addition, the main effect of reducing the percentage of IR
MBs is that the size of partition-B-bearing packets is reduced. In
turn, this makes these packets less likely to be affected by channel
conditions, especially burst errors arising from fast fading. Consequently,
more partition-B packets survive intact, again causing a
relative gain in video quality.
Packet end-to-end delay, Fig. 6d, is the mean delay of those
packets unaffected by channel conditions. The results show that
this is generally small in duration, though with a tendency to
increase due to the transmission delay of larger packets at lower
QPs. Significantly, a lower percentage of IR MBs, in addition to better
quality video in many cases, actually results in lower delay.
Table 2
Mean size of different partitions in bytes for Football at various QPs.
(A, B and C denote partition-A, partition-B and partition-C.)
      2% Intra-refresh MB     5% Intra-refresh MB     6% Intra-refresh MB
QP    A      B      C         A      B      C         A      B      C
20    1842   2678   3889      1845   2767   3867      1846   2810   3850
25    1687   1697   2533      1690   1763   2511      1696   1793   2502
30    1459   1047   1496      1463   1082   1482      1467   1098   1479
35    1117   572    688       1120   595    682       1123   604    681

      25% Intra-refresh MB    MB line intra update
QP    A      B      C         A      B      C
20    1893   3450   3669      1885   3385   3683
25    1746   2216   2379      3683   2160   2400
30    1505   1346   1405      1498   1312   1414
35    1146   729    646       1143   716    652
Turning to corrupted packets, there are larger percentages of
corrupted packets at higher QPs, Fig. 6b. These are packets that
have not been repaired completely by the adaptive channel coding
scheme. Because of the additional transmission time, the mean
end-to-end delay of corrupted packets is higher than other packets,
Fig. 6e. In fact, it is the extent of the delay that is the main contribution of corrupted packets to the quality-of-service. However,
though the percentage of corrupted packets increases at higher
QPs, the packet sizes decrease due to the reduced coding efficiency.
Smaller packets take less time to re-transmit compensating for the
increase in the number of corrupted packets.
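The dependence of (re)transmission time on packet size can be illustrated with a back-of-envelope calculation. The sketch below is not part of the reported simulations: it simply divides the packet size by the 10.67 Mbps raw data rate of Table 1, and deliberately ignores TDD framing, the DL/UL split, queueing and protocol headers. The function name and the packet sizes tried are illustrative assumptions.

```python
RAW_DATA_RATE_BPS = 10.67e6  # raw data rate listed in Table 1


def serialization_delay_s(packet_bytes, rate_bps=RAW_DATA_RATE_BPS):
    """Time in seconds to clock one packet onto the link at the raw data rate."""
    if packet_bytes < 0 or rate_bps <= 0:
        raise ValueError("packet size must be non-negative and rate positive")
    return packet_bytes * 8 / rate_bps


for size in (256, 512, 1024):
    delay_ms = serialization_delay_s(size) * 1e3
    print(f"{size:5d} B -> {delay_ms:.3f} ms")
```

Under these simplifications a maximum-size 1024 B packet occupies the link for under one millisecond, and the delay scales in proportion to packet size, which is the sense in which smaller packets are cheaper to retransmit.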
6.2. Constrained intra prediction setting
In order to decode partition-B and -C, the decoder must know
the location from which each MB was predicted, which implies
that partitions B and C cannot be reconstructed if partition-A is
lost. Though partition-A is independent of partitions B and C, CIP
should be set in the codec configuration [20] to make partition-B
independent of partition-C. (Though reference [20] refers to a proposal to add constrained inter prediction to H.264/AVC, it also
describes constrained intra-prediction, which is already a part of
the codec standard.) By setting this option, partition-B MBs are
no longer predicted from neighboring inter-coded MBs. This is
because the prediction residuals from neighboring inter-coded
MBs reside in partition-C and cannot be accessed by the decoder if
a partition-C packet is lost. There is a by-product of increasing
packet sizes due to a reduction in compression efficiency but the
increase in size may be justified in error-prone environments.
Fig. 5. Differences between random intra-refresh MBs (upper frames with 6%) and MB cyclic line intra update (lower frames).
Fig. 6. Mean performance metrics for Football with 2%, 5%, 6% IR MBs and MB line intra update:
(a) dropped packets (%), (b) corrupted packets (%), (c) mean PSNR (dB), (d) packet mean
end-to-end delay (s), (e) corrupted packets mean delay (s), each plotted against QP.
The two video clips (Paris and Football) were VBR encoded at
CIF, with a GoP structure of IPPP. . . at 30 Hz (for the choice of GoP
structure, again refer forward to Section 6.3) and with 5% IR MBs
(choosing a lower percentage as a result of the tests in Section 6.1).
Table 3 presents NAL unit sizes for the Paris sequence with and
without CIP, showing the increase in partition-B and -C sizes,
which results from the loss in encoding efficiency. Notice that
any NAL units that are above the maximum packet size of 1024
B in Table 3 are later constrained by the encoder to the MTU when
forming an RTP packet prior to encapsulation by network headers.
The reason for this precaution is to avoid these larger NAL units
being segmented at the link layer, avoiding the possible separation
of header information from NAL data.
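To give a feel for how often the 1024 B limit bites, the number of packets needed for a NAL unit of a given size can be estimated by a simple ceiling division. This is only a sketch: the encoder's actual slice-size constraint and the RTP/UDP/IP header overhead are not modelled, the function name is hypothetical, and the sizes used are the largest mean NAL unit sizes reported for Paris in Table 3 (read here as partition-C without CIP).

```python
import math

MAX_PACKET_BYTES = 1024  # maximum packet length from Table 1


def packets_needed(nal_bytes, max_payload=MAX_PACKET_BYTES):
    """Minimum number of packets if a NAL unit is capped at max_payload bytes each."""
    if nal_bytes <= 0 or max_payload <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(nal_bytes / max_payload)


# Mean sizes in bytes keyed by QP, taken from Table 3 (assumed partition-C, no CIP).
paris_mean_sizes = {20: 4097, 25: 2058, 30: 838, 35: 281}

for qp, size in paris_mean_sizes.items():
    print(f"QP={qp}: {size} B -> {packets_needed(size)} packet(s)")
```

On these mean sizes only the two lower QP settings exceed the limit, which matches the observation that differential packet sizes matter most for higher-quality video.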
At lower QPs, i.e. higher-quality video, the relative size of partition-C
NAL units means that the more important partition-A and -B
packets are less likely to suffer channel error. However, the larger
packet sizes mean that congestion may have more of an impact,
because longer packets take longer to transmit and free the
channel. At higher QPs, the advantage of differential packet sizes
is lost but the generally smaller packet sizes compensate to some
extent. There is also a small growth in partition-B and -C packet
size when CIP is turned on.
Table 3
Paris video sequence: Mean NAL unit size in bytes according to partition type and
video quality. (A, B and C denote partition-A, partition-B and partition-C.)
      Without CIP (B)         With CIP (B)
QP    A      B      C         A      B      C
20    740    495    4097      738    504    4154
25    601    336    2058      601    356    2093
30    473    210    838       473    238    857
35    331    124    281       330    149    291
From Table 4 for Football one sees that though the relative ranking
of sizes between the partition types is similar, the actual sizes
are larger than those for Paris (see Table 3). The larger sizes are due
to the temporal coding complexity of Football. For high QP, the
relatively larger size of partition-A NAL units compared to the other
partitions' NAL units may create a problem, as it does not result
in a relatively reduced risk of loss of packets bearing partition-A
NAL units. Also of concern is the number of NAL units that are
above the maximum packet size, causing more than one packet
to be sent.
Table 4
Football video sequence: Mean NAL unit size in bytes according to partition type and
video quality. (A, B and C denote partition-A, partition-B and partition-C.)
      Without CIP (B)         With CIP (B)
QP    A      B      C         A      B      C
20    1845   2767   3867      1845   2870   4326
25    1690   1763   2511      1681   1845   2873
30    1463   1082   1482      1431   1080   1754
35    1120   595    682       1092   494    925
Fig. 8. Paris and Football video sequences: Protection scheme corrupted packets,
with and without CIP.
Fig. 7 shows the effect of the various schemes on packet drops
when streaming Paris and Football. The figure assesses the effect of
turning on CIP. As turning on CIP can increase packet sizes, due
to less efficient source coding, there is a risk of more channel
errors. From Fig. 7, the larger packet drop rates at QP = 20 for
Football will have a significant effect on the video quality. However, the
packet size changes with and without CIP have little effect on the
packet drop rate in the case of the less active Paris. In fact, the
pattern of channel errors can actually result in a decrease in packet
drops when Paris is streamed with CIP turned on.
Fig. 8 shows the numbers of corrupted packets arising from
simulated fast fading. Nevertheless, the effect of the corrupted
packets on video quality only occurs if a packet cannot be
reconstructed after application of the adaptive retransmission scheme.
That is, if the packet bearing the repair data is itself corrupted or
dropped, then the original corrupted packet cannot be
reconstructed. Fig. 8 also shows that turning CIP on generally results
in more corrupt packets and, hence, additional delay due to
retransmissions.
Examining Fig. 9 for the resulting PSNR, one can see that the
video quality is generally below 31 dB and, hence, would probably
be ranked as 'fair' (not 'good') according to the ITU P.800
recommendation, originally intended for subjective testing but used as an
objective guide to quality in papers such as [61]. This video quality
is similar in general terms to the video quality reported in [61],
though for multicast streaming without feedback. Nevertheless,
one observes from Fig. 9 that, in all cases, the inclusion of CIP in
our scheme results in improved objective video quality at all QPs
tested and both for Paris and the more active Football sequence.
Therefore, CIP is to be recommended.
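For readers unfamiliar with the metric, PSNR for 8-bit samples is conventionally computed as 10 log10(255^2 / MSE), where MSE is the mean squared error between reference and decoded samples. The sketch below implements that standard definition on flat lists of sample values; how the per-frame values were averaged in our tests is not restated here, and the function name and sample values are purely illustrative.

```python
import math


def psnr_db(reference, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical luma samples, for illustration only.
reference = [52, 55, 61, 59]
decoded = [54, 55, 60, 57]
print(f"PSNR = {psnr_db(reference, decoded):.2f} dB")
```

Because the scale is logarithmic, each halving of the mean squared error adds roughly 3 dB.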
As previously remarked, the impact of corrupted packets, given
the inclusion of retransmitted additional redundant data, is largely
seen in additional delay. There is an approximate doubling in per
packet delay between the total end-to-end delay for normal packet
transmission, Fig. 10, and that of corrupted packets, Fig. 11.
Nevertheless, the delays remain in the tens of milliseconds range, except
for when QP = 20, i.e. broadcast quality video, when large packet
sizes imply longer transmission times. This type of delay range is
acceptable even for interactive applications but may create
unacceptable latency if it forms part of a longer network path.
Fig. 9. Paris and Football video sequences: Protection scheme video quality (PSNR),
with and without CIP.
6.3. Group of Pictures structure
Attention should also be given to the Group of Pictures (GoP)
structure. For the purposes of comparison, we tested two different
GoP structures: IPPP. . . (i.e. one initial I-picture and all P) and
IBBPBBP. . . (i.e. insertion of bi-predictive B-pictures for greater
coding efficiency but still with one initial I-picture). Before starting
those tests, we examined the effect of the two different GoP
structure types on the sizes of video frames. One potential impact of the
GoP structure is that B-pictures increase coding efficiency at a cost
in coding complexity due to the need to reference at least two
other pictures. Unfortunately, with the inclusion of B-pictures,
the mean size of P-pictures increases, as a result of the increased
reference distance between P-pictures. For example, for QP = 20,
the IBBPBBP. . . mean P-picture size is as high as 15 kB, which
may well result in a series of large packets.
Fig. 7. Paris and Football sequences: Protection scheme packet drops, with and
without CIP.
Fig. 10. Paris and Football video sequences: Protection scheme mean end-to-end
packet delay, with and without CIP.
Fig. 11. Paris and Football video sequences: Protection scheme mean delay of
corrupted packets, with and without CIP.
Table 5
Mean P-picture size (bytes) according to QP for two different GoP structures.
      Football                  Paris
QP    IPPP. . .    IBBP. . .    IPPP. . .    IBBP. . .
20    8905         15,520       5102         7590
25    6301         10,311       2824         4249
30    4185         6381         1398         2238
35    2444         3756         647          1100
The tests were again performed on Paris and Football. Both
sequences were VBR encoded video at 30 Hz, CIF, with 2% IR MBs
randomly inserted. Recall from Section 6.1 that inserting 2% of IR
MBs gave the best video quality and the lowest delay, which is
why this percentage of IR MBs was selected for these tests. CIP
was configured and the two different GoP types (IPPP. . . and
IBBP. . .) generated the video traces.
The effect of adding B-pictures is evident in Fig. 12, in which
many more packets are dropped for Football at QP = 20. Dropped
packets include those lost to buffer overflow and outright channel drops.
Repeated simulations of the IBBP. . . configuration of Football for
QP = 20 confirmed the packet drop spike at this setting. The effect
derives from the larger P-pictures that are the result of inserting
B-pictures into the GoPs, when P-pictures are already large because
of the lower QP setting. The result is a series of large
P-picture-bearing packets. As previously mentioned, from Table 5, the
Football mean P-picture size at QP = 20 for IBBP. . . is almost twice the
size of that when an IPPP. . . GoP structure is configured and
similarly twice the size of P-pictures for either of the Paris GoP
configurations at QP = 20 in Table 5. The IBBP. . . GoP structure
particularly impacts the more temporally complex sequence,
resulting in large B-partition packets. A series of large packets
results in more transmission errors and more packet drops if the
send buffer becomes saturated. The general reduction in packet
drop numbers in Fig. 12 for decreasing QP results in an increase
in video quality in Fig. 13. Moreover, an IPPP. . . GoP structure is
preferable except at QP = 35. However, at QP = 35 all GoP structures result in a PSNR of over 25 dB.
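The size penalty that B-pictures impose on P-pictures can be read directly from Table 5 as a ratio. The short sketch below computes the IBBP. . . to IPPP. . . ratio of mean P-picture size for each clip and QP; the data are the Table 5 values, while the dictionary layout and function name are illustrative choices.

```python
# Mean P-picture sizes in bytes from Table 5, keyed by (clip, GoP structure) then QP.
mean_p_picture_bytes = {
    ("Football", "IPPP"): {20: 8905, 25: 6301, 30: 4185, 35: 2444},
    ("Football", "IBBP"): {20: 15520, 25: 10311, 30: 6381, 35: 3756},
    ("Paris", "IPPP"): {20: 5102, 25: 2824, 30: 1398, 35: 647},
    ("Paris", "IBBP"): {20: 7590, 25: 4249, 30: 2238, 35: 1100},
}


def ibbp_to_ippp_ratio(clip, qp):
    """Ratio of mean P-picture size with B-pictures to that without, for one clip and QP."""
    return mean_p_picture_bytes[(clip, "IBBP")][qp] / mean_p_picture_bytes[(clip, "IPPP")][qp]


for clip in ("Football", "Paris"):
    for qp in (20, 25, 30, 35):
        print(f"{clip:8s} QP={qp}: ratio = {ibbp_to_ippp_ratio(clip, qp):.2f}")
```

At QP = 20 the ratio is about 1.74 for Football and about 1.49 for Paris, which quantifies the "almost twice" remark for the more active sequence.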
Corrupted packet levels are generally high, Fig. 14, but in
respect to GoP structure it seems that an IPPP. . . structure may
be more favorable for temporally complex content (Football) and
vice versa for less active sequences. As was previously remarked,
the main consequence of higher levels of packet corruption, after
the application of the proposed adaptive scheme, is in greater
delay for a greater percentage of packets, Fig. 15. Apart from the
anomaly at QP = 20 for Football with an IBBP. . . GoP structure, as
a result of many more large packets each resulting in more
transmission delay (refer back to the discussion of Fig. 12 and Table 5),
corrupted packet delay is approximately twice that of normal
packet end-to-end delay, Fig. 16, reflecting the single
retransmission of extra redundant data that is permitted. Again for Football
at QP = 20, the mean packet delay is much higher in Fig. 16 as a result
of many more large P-picture packets each with more transmission
delay when using an IBBP. . . GoP structure. It was also confirmed
that for a moderate increase in mean packet size, making
partition-B completely independent of partition-C through CIP resulted
in a small (a few dB) improvement in video quality, whenever the
QP setting allowed sufficient packets to be delivered.
Fig. 12. Dropped packets for differing GoP structure and content.
Fig. 13. Video quality (PSNR) for two different GoP structures and content.
Fig. 14. Percentage of corrupted packets for two different GoP structures and
content.
Fig. 15. Corrupted packet mean delay for two different GoP structures and content.
Fig. 16. Mean end-to-end packet delay for differing GoP structure and content.
6.4. Redundant NAL units
To improve video quality it is possible to provide redundant
NAL unit packets. There is a variety of ways of providing redundant
packets.
It is possible to use redundant picture slices [62] or duplicate
picture slices [63]. Redundant picture slices employ a higher QP
and, hence, coarser quantization than the original slices. Those
interested in investigating the use of redundant slices further
should notice that methods to refine the selection of redundant
slices [64] have also been designed. However, in our scheme using
redundant picture slices will cause additional drift between the
encoder and the decoder, if (say) a partition-A packet was matched
with a partition-C packet's data with a different QP. Therefore, we
tested the inclusion of duplicate picture slices/data partitions using
the same QP as the original slice.
Even so, there is an implementation issue when employing
duplicate picture slices. Though both redundant/duplicate slices
and data-partitioned slices co-exist in the Extended profile, they
are not jointly implemented in the JM implementation of H.264/AVC
[65] and, in fact, appear not to be implemented at all in
most other software codec implementations such as QuickTime,
Nero, and LEAD, to name a few at random. However, when
employing data-partitioning, it is also possible through repeated runs of
the encoder to create an additional stream of all partition-A slice
packets or an additional stream consisting of partition-A and
partition-B packets or, indeed, a complete duplicate version of the
original stream.
To test the use of duplicate packets, the same two video clips
(Paris and Football) with different source coding characteristics
were employed in the tests to judge content dependency. Both
sequences were VBR encoded, with a GoP structure of IPPP. . . at
30 Hz. 5% IR MB data were added to each picture, increasing the
size of partition-B packets (refer to Table 2). Again a low
percentage of IR MBs was added as a result of the gains in video quality
experienced in Section 6.1's tests from avoiding higher percentages
and, as before, the favorable IPPP. . . GoP structure was selected as a
result of the tests in Section 6.3.
The duplicate NAL units do not amount to a change in bitrate
because the packets are simply replicated. However, the end-to-end
packet delay will obviously increase because of the
interleaving of the duplicate slice packets. In the redundant/duplicate NAL
unit scheme, retransmission of extra redundant data was
scheduled for all corrupted packets, even if two packets duplicated each
other. This is because it is not possible to know in advance whether
the extra redundant data will arrive for any one of the two packets.
This provision has a significant effect in improving the video
quality at higher QPs. The reason is that retransmitting extra redundant
data by two alternative means increases the chance that a packet
can be reconstructed.
Fig. 17 shows the effect of the various schemes on packet drops
when streaming Paris. 'Data-partition' in the Figure legend refers to
sending no redundant/duplicate packets. 'Redundant X' refers to
sending duplicate redundant packets containing data-partitions
of partition type(s) X in addition to the rateless coded
data-partition packets. From Fig. 17, the larger packet drop rates at
QP = 20, due to the larger packets, will have a significant effect
on the video quality.
Fig. 18 shows the pattern of corrupted packet losses arising
from simulated fast fading. There is actually an increase in the
percentage of packets corrupted if a completely redundant/duplicate
stream is sent (partitions A, B, and C), though this percentage is
taken from corrupted original and redundant/duplicate packets.
However, the effect of the corrupted packets on video quality only
occurs if a packet cannot be reconstructed after application of the
adaptive retransmission scheme.
Fig. 17. Paris sequence protection schemes packet drops.
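The intuition behind sending duplicates can be expressed with a deliberately simplified calculation: if each copy of a packet were lost independently with probability p, both copies would be lost with probability p^2. The sketch below computes this idealised residual loss. The independence assumption is ours for illustration only and does not hold on a bursty Gilbert-Elliott channel, where copies sent close together tend to share the same channel state, nor does the calculation account for the extra load that duplicates place on the link; the function name is hypothetical.

```python
def residual_loss_probability(p_loss, copies=2):
    """Probability that all `copies` transmissions are lost, assuming independent losses."""
    if not 0.0 <= p_loss <= 1.0:
        raise ValueError("p_loss must lie in [0, 1]")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return p_loss ** copies


for p in (0.05, 0.10, 0.20):
    print(f"single-copy loss {p:.2f} -> loss with duplicate {residual_loss_probability(p):.4f}")
```

The idealised gain is therefore an upper bound on what duplication can deliver; the simulated results include the correlated losses and additional traffic that this calculation omits.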
Fig. 18. Paris sequence protection schemes corrupted packets.
Fig. 19. Paris sequence protection schemes video quality.
Fig. 20. Paris sequence protection schemes mean delay for corrupted packets.
Examining Fig. 19 for the resulting video quality (PSNR), one
sees that data partitioning with FEC protection, when used without
redundant/duplicate packets, is insufficient to bring the video
quality to above 31 dB, that is to a 'good' quality. However, it is
important to notice that sending duplicate redundant partition-A
packets alone (without redundant packets from other partitions)
is also insufficient to raise the video quality to a 'good' rating
(above 31 dB). Therefore, to raise the video quality to a 'good' level
requires not only the application of the adaptive rateless channel
coding scheme but also the sending of duplicate data streams.
The impact of corrupted packets, given the inclusion of
retransmitted additional redundant data, is again largely seen in
additional delay. As before, there is an approximate doubling in per
packet delay between the total end-to-end delay for corrupted
packets, Fig. 20, and normal packet end-to-end delay, Fig. 21. It
must be recalled, though, that for the redundant schemes there is
up to twice the number of packets being sent. Therefore, delay is
approximately further doubled, still though with end-to-end
delays remaining in the tens of milliseconds range. As previously
noted, this type of delay range is acceptable even for interactive
applications, but may contribute to additional delay if the WiMAX
link forms part of a longer network path.
The experimental results for Football are summarized in Table 6.
Table 6 shows how packet drops and losses are reflected in video
quality. Very large numbers of packets are dropped at QP = 20
because of the larger packet sizes. However, there is a threshold
effect, as the numbers of dropped packets decline quickly with
increasing QP (as packet sizes reduce). The protection pattern for
redundant/duplicate packets is accentuated compared to Paris, in
the sense that providing duplicate redundant versions of more
than just partition-A packets is now clearly seen to be preferable.
Given that in Quality-of-Experience subjective tests for mobile
devices [66], news scenes rather than sport are preferred by viewers,
it may be advisable to favor content without rapid motion,
especially if small footballs or similar sports balls need to be
tracked by the viewer.
Fig. 21. Paris sequence protection schemes mean end-to-end delay for normal
packets.
Table 6
Football redundant/duplicate NAL unit results.
QP    Data-partition    Redundant A    Redundant A, B    Redundant A, B, C
Dropped packets (%)
20    6.92              2.69           13.55             23.12
25    4.23              2.50           1.38              4.88
30    3.97              1.44           0.46              0.12
35    1.66              1.44           0.38              0.00
Corrupted packets (%)
20    30.76             31.37          22.72             14.51
25    30.64             30.79          30.27             25.11
30    27.56             30.79          27.42             26.97
35    21.92             16.55          24.73             22.35
PSNR (dB)
20    19.98             23.65          23.11             19.65
25    20.96             26.47          27.58             33.92
30    21.16             29.45          29.16             34.50
35    23.27             27.65          27.59             30.73
6.5. Discussion
As is normal in a research environment and in papers of this
nature, we have conducted a set of simulations of IPTV video streaming
rather than used a testbed. The disadvantage of a testbed, whatever its
merits in terms of verisimilitude, is a lack of flexibility. Therefore,
in an ideal world both types of tests should be conducted. In the
absence of a testbed, detailed simulation can be indicative of
the streaming performance, if not a conclusive demonstration. Both
types of test environment may fail to anticipate all the physical conditions that may arise, such as distances from the base station,
types of channel fading, and multi-path effects. It should be
noted, though, that the Gilbert-Elliott channel model used in the
tests is not a physical model of channel conditions but a model
of the burst errors experienced at the receiver. In other words, it
models the error patterns rather than the physical conditions that
give rise to error conditions. This distinction may explain the popularity of this type of channel model for studies of video communication over wireless networks.
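To illustrate what is meant by modeling error patterns rather than physical conditions, the following minimal Python sketch generates a packet-level error trace from a two-state Gilbert-Elliott chain. The function names, the transition probabilities and the per-state error probabilities are illustrative assumptions and are not the settings used in the tests reported in this paper.

```python
import random


def gilbert_elliott_errors(n_packets, p_gb, p_bg, err_good, err_bad, seed=0):
    """Return a list of booleans, True where a packet is hit by an error.

    p_gb: probability of moving from the good state to the bad state.
    p_bg: probability of moving from the bad state to the good state.
    err_good, err_bad: per-packet error probability in each state.
    All parameter values are hypothetical inputs chosen by the caller.
    """
    rng = random.Random(seed)
    in_bad_state = False  # assumption: the chain starts in the good state
    errors = []
    for _ in range(n_packets):
        # Draw the error outcome for this packet from the current state.
        err_prob = err_bad if in_bad_state else err_good
        errors.append(rng.random() < err_prob)
        # Then advance the two-state Markov chain by one step.
        if in_bad_state:
            if rng.random() < p_bg:
                in_bad_state = False
        elif rng.random() < p_gb:
            in_bad_state = True
    return errors


def stationary_bad_probability(p_gb, p_bg):
    """Long-run fraction of time the chain spends in the bad state."""
    return p_gb / (p_gb + p_bg)


if __name__ == "__main__":
    trace = gilbert_elliott_errors(20, p_gb=0.05, p_bg=0.25,
                                   err_good=0.0, err_bad=1.0, seed=1)
    # Print the trace as a string; 'x' marks an errored packet.
    print("".join("x" if hit else "." for hit in trace))
```

With the assumed values p_gb = 0.05 and p_bg = 0.25, the chain spends about one sixth of the time in the bad state. If the bad state always causes an error and the good state never does, errors arrive in bursts with a mean length of about 1/p_bg = 4 packets, without any reference to distance, fading type or multi-path.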
Certain aspects of the proposed scheme, such as data-partitioning, CIP, IR MB percentage, and GoP structure, can be configured at
encoding time, allowing pre-storage of TV programs within server
banks. Unfortunately, the wireless environment is notoriously volatile, especially if mobile stations are present. It is for this reason
that we include the rateless channel coding element, which allows
the video server, herein collocated at the WiMAX base station, to
adapt the rateless channel coding rate to channel conditions. Compared to (say) dropping frames, the adaptive channel coding solution is more fine-grained, as it occurs at the packet level.
Consequently, the user experience is less likely to be disrupted.
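The packet-level granularity of such adaptation can be sketched as follows. The rule below is an assumption chosen purely for illustration: it scales the number of rateless symbols by an estimated erasure probability plus a small decoding overhead. It is not the adaptation rule of this paper, and the names symbols_to_send, est_erasure_prob and decode_overhead are hypothetical.

```python
import math


def symbols_to_send(k_source, est_erasure_prob, decode_overhead=0.05):
    """Number of rateless-coded symbols to emit for one packet.

    k_source: number of source symbols in the packet.
    est_erasure_prob: estimated fraction of symbols lost on the channel.
    decode_overhead: assumed fractional surplus the decoder needs beyond
        k_source symbols (illustrative default, not a measured value).
    """
    if k_source < 0:
        raise ValueError("k_source must be non-negative")
    if not 0.0 <= est_erasure_prob < 1.0:
        raise ValueError("est_erasure_prob must lie in [0, 1)")
    # Symbols that must survive the channel for decoding to succeed.
    needed_at_receiver = k_source * (1.0 + decode_overhead)
    # Inflate by the expected survival rate and round up to whole symbols.
    return math.ceil(needed_at_receiver / (1.0 - est_erasure_prob))


if __name__ == "__main__":
    for loss in (0.0, 0.1, 0.2, 0.4):
        print(loss, symbols_to_send(100, loss))
```

Because a rateless code can generate further symbols on demand, the redundancy can be changed from one packet to the next as the channel estimate changes, whereas dropping frames acts only at the much coarser frame level.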
Other adaptive solutions are possible that take advantage of
flexibilities within the hardware. For example, the authors have
themselves experimented with a form of WiMAX adaptive modulation in [67]. The disadvantage of such approaches is that they are
technology dependent in the sense that the firmware within a base
station may need to be modified, sometimes through cross-layer
intervention. A software adaptive approach such as the one using
adaptive rateless coding in this paper can more easily be deployed
and modified for differing broadband wireless technologies.
7. Conclusion
Both FEC and data-partitioning of IPTV video streams are ways
of providing graceful quality degradation in a form that will work
in both good and difficult wireless channel conditions. This paper
showed that video configuration also affects the video quality,
dropped packets and delay. In that respect, it was shown that it
is better to include a small percentage of IR MBs that can build
their effect over time than to employ the cyclic IR line update scheme.
Packet size, which is determined by content, video quality, and GoP
structure, is an important determinant of packet drops. The use of
equal error protection is a way of taking advantage of the natural
packet size differential, which is in inverse order of the priority
of the data partitions. Thus, smaller packet lengths already confer
a lower risk of channel error. However, the inverse size order of
data partitions (larger partition-A and -B) was seen to occur when
smaller QPs were chosen. It was also confirmed that for a moderate
increase in mean packet size, making partition-B completely independent of partition-C resulted in a small but significant improvement in video quality.
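The link between packet length and risk of channel error can be made concrete with a short calculation. The sketch below assumes independent bit errors at a fixed bit error rate, which is a deliberate simplification and not the bursty channel simulated in this paper. The function name and the example sizes and error rate are hypothetical.

```python
def packet_error_probability(size_bytes, bit_error_rate):
    """Probability that at least one bit of a packet is in error.

    Assumes independent bit errors, so a packet of L bytes survives with
    probability (1 - BER) ** (8 * L).
    """
    return 1.0 - (1.0 - bit_error_rate) ** (8 * size_bytes)


if __name__ == "__main__":
    # Illustrative packet sizes in bytes and an assumed BER of 1e-5.
    for size in (100, 500, 1000):
        print(size, packet_error_probability(size, 1e-5))
```

Under this simplification the error probability grows monotonically with packet length, which is why placing the most important data in the smallest packets gives some protection even when every packet receives the same channel coding.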
An interesting observation is that there is a need to reduce
packet size to reduce packet loss, despite the combined effect of
redundant packets and application adaptive channel coding. This
is because during bursty error conditions (as simulated by
the Gilbert-Elliott channel model) it is possible that both the original packet and its redundant counterpart may be dropped or corrupted. However, this effect is dependent on the choice of QP, as a
low QP can lead to high packet drop rates with poor video quality.
In general, in poor channel conditions with both slow and fast fading, it is not sufficient to employ just application-layer FEC unless
stream replication also takes place.
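Why a burst can remove both an original packet and its redundant copy can be seen from a simplified calculation. The sketch below assumes a two-state Markov chain in which a packet is lost exactly when the channel is in the bad state, which is a simplification of the Gilbert-Elliott model and not the configuration simulated here. The function names and parameter values are hypothetical.

```python
def prob_both_lost(p_gb, p_bg, gap):
    """Probability that a packet and its copy sent `gap` slots later are both lost.

    p_gb: good-to-bad transition probability per packet slot.
    p_bg: bad-to-good transition probability per packet slot.
    gap: non-negative integer number of slots between the two packets.
    Assumes a packet is lost if and only if the channel is in the bad state.
    """
    pi_bad = p_gb / (p_gb + p_bg)   # stationary probability of the bad state
    memory = 1.0 - p_gb - p_bg      # second eigenvalue of the chain
    # Probability of being in the bad state again after `gap` steps.
    p_bad_again = pi_bad + (1.0 - pi_bad) * memory ** gap
    return pi_bad * p_bad_again


def prob_both_lost_independent(p_gb, p_bg):
    """Same loss rate, but with losses assumed independent between packets."""
    pi_bad = p_gb / (p_gb + p_bg)
    return pi_bad ** 2


if __name__ == "__main__":
    for gap in (1, 2, 5, 20):
        print(gap, prob_both_lost(0.05, 0.25, gap))
    print("independent", prob_both_lost_independent(0.05, 0.25))
```

With the assumed values p_gb = 0.05 and p_bg = 0.25, a copy sent in the very next slot is lost together with the original with probability 0.125, against roughly 0.028 if losses were independent. In this simplified model the joint loss probability falls towards the independent value as the separation between the two packets grows.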
The evaluation tests performed in this paper employed two reference video sequences from a number of such sequences that
video coding experts provide as a test of codec implementations.
Another valid approach, particularly when developing a specific
product, is to select from video actually requested by members of
the public. For example, video requests can be selected from a real
trace file and used to select popular videos that can subsequently
be used in tests. In fact, this approach represents future work for
this research. Future work will also confirm the validity of the
wireless channel model by streaming video over a live WiMAX
link.
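As an indication of how such a selection might be made, the sketch below counts requests in a trace and returns the most requested items. The trace format (one requested video identifier per line) and the function name are assumptions made for illustration, as no particular trace is specified here.

```python
from collections import Counter


def most_requested(request_ids, top_n):
    """Return the top_n most frequently requested video identifiers.

    request_ids: iterable of strings, one requested video identifier per
        entry (hypothetical trace format); blank entries are ignored.
    """
    counts = Counter(rid.strip() for rid in request_ids if rid.strip())
    return [video_id for video_id, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    sample_trace = ["news_01", "sport_07", "news_01", "film_12", "news_01", "sport_07"]
    print(most_requested(sample_trace, 2))
```

Any iterable of lines can be passed in, including an open text file, so the same function could be applied directly to a stored trace.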
Acknowledgments
Laith Al-Jobouri, who has recently completed his doctorate at
the University of Essex, UK, would like to thank his financial sponsors at the Ministry of Science and Technology, Baghdad, Iraq.
References
[1] D. de Vleeschauwer, Z. Avramova, S. Wittevrongel, H. Bruneel, Transport
capacity for a catch-up television service, in: Proc. of 7th European Conf. on
Interactive Television, 2009, pp. 161-170.
[2] Y.J. Liang, J.G. Apostolopoulos, B. Girod, Analysis of packet loss for compressed
video: effect of burst losses and correlation between error frames, IEEE Trans.
Circuits Syst. Video Technol. 18 (7) (2008) 861-874.
[3] T. Wiegand, G.J. Sullivan, G. Bjontegaard, A. Luthra, Overview of the H.264/AVC
video coding standard, IEEE Trans. Circuits Syst. Video Technol. 13 (7) (2003)
560-576.
[4] T. Schierl, M.M. Hannuksela, Y.-K. Wang, S. Wenger, System layer integration of
high efficiency video coding, IEEE Trans. Circuits Syst. Video Technol. 22 (12)
(2012) 1871-1884.
[5] I. Sodagar, The MPEG-DASH standard for multimedia streaming over the
Internet, IEEE Multimedia 18 (4) (2011) 62-67.
[6] S. Pecheron, Is HEVC ready for prime time?, Comms, Eng., Design (CED) Mag.
39 (9) (2013) 22-24.
[7] S. Lederer, S. Müller, C. Timmerer, Dynamic adaptive streaming over HTTP
dataset, in: Proc. of ACM Multimedia Conf., 2012.
[8] J. Martin, F.Y. Fu, N. Wourms, T. Shaw, Characterizing Netflix bandwidth
consumption, in: Proc. IEEE Consumer and Commun. Conf., 2013, pp. 230-235.
[9] M. Ryu, H. Kim, U. Ramachandran, Why are state-of-the-art flash-based multi-tiered storage systems performing poorly for HTTP video streaming? in: Proc.
ACM Workshop on Network and Operating Systems Support for Digital Audio
and Video, 2012.
[10] J.G. Andrews, A. Ghosh, R. Muhammed, Fundamentals of WiMAX:
Understanding Broadband Wireless Networking, Prentice Hall, Upper Saddle
River, NJ, 2007.
[11] E. Dahlman, S. Parkvall, J. Sköld, P. Beming, 3G Evolution, second ed., Academic
Press, Amsterdam, 2008.
[12] O.I. Hillestad, A. Perkis, V. Genc, S. Murphy, J. Murphy, Delivery of on-demand
video services in rural areas via IEEE 802.16 broadband wireless access
networks, in: Proc. of 2nd ACM Workshop on Wireless Multimedia, 2006, pp.
43-52.
[13] D. Niyato, E. Hossain, Integration of WiMAX and WiFi: optimal pricing for
bandwidth sharing, IEEE Commun. Mag. 45 (5) (2007) 140-146.
[14] L. Al-Jobouri, M. Fleury, M. Ghanbari, Protecting H.264/AVC data-partitioned
video streams over broadband WiMAX, Adv. Multimedia [online journal]
(2012) 11 pages (article ID 129517).
[15] L. Al-Jobouri, M. Fleury, M. Ghanbari, Intra-refresh provision for data-partitioned H.264 video streaming over WiMAX, Consumer Electron. Times 2
(3) (2013) 137-145.
[16] T. Stockhammer, M. Bystrom, H.264/AVC data partitioning for mobile video
communication, in: Proc. of IEEE Int. Conf. on Image Processing, Singapore,
2004, pp. 545-548.
[17] D.J.C. MacKay, Fountain codes, IEE Proc.: Commun. 152 (6) (2005) 1062-1068.
[18] T. Stockhammer, W. Zia, Error-resilient coding and decoding strategies for
video communication, in: P.A. Chou, M. van der Schaar (Eds.), Multimedia in IP
and Wireless Networks, Academic Press, Burlington, MA, 2007, pp. 13-58.
[19] I.A. Ali, S. Moiron, M. Fleury, M. Ghanbari, Packet prioritization for H.264/AVC
video with cyclic intra-refresh line, J. Vis. Commun. Image Represent. 24 (4)
(2013) 486-498.
[20] Y. Dhondt, S. Mys, K. Vermeirsch, R. Van de Walle, Constrained inter
prediction: removing dependencies between different data partitions, in:
Proc. of Advanced Concepts for Intelligent Visual Systems, Delft, 2007, pp.
720-731.
[21] S. Wenger, H.264/AVC over IP, IEEE Trans. Circuits Syst. Video Technol. 13 (7)
(2003) 645-655.
[22] B. Waggoner, Compression for Great Video and Audio: Master Tips and
Common Sense, Focal Press, Burlington, MA, 2010.
[23] W. Simpson, Video Over IP, second ed., Focal Press, Burlington, MA, 2008.
[24] S. Park, S-H. Jeong, Mobile IPTV: approaches, challenges, standards and QoS
support, IEEE Internet Comput. 13 (3) (2009) 23-31.
[25] M.N. Asghar, M. Ghanbari, M. Fleury, M. Reed, Efficient selective encryption
with H.264/SVC CABAC bin-strings, in: Proc. of IEEE Int. Conf. on Image
Processing, 2012, pp. 2645-2648.
[26] B. Bing, 3D and HD Broadband Video Networking, Artech House, Norwood, MA,
2010.
[27] M. Verhoeyen, D. de Vleeschauwer, D. Robinson, Public IP network
infrastructure evolutions to support emerging digital video services, Bell
Labs Tech. J. 14 (2) (2009) 39-55.
[28] M. Luby, T. Stockhammer, M. Watson, Application layer FEC in IPTV services,
IEEE Commun. Mag. 45 (5) (2008) 95-101.
[29] J.-N. Hwang, Multimedia Networking, Cambridge University Press, Cambridge,
UK, 2009.
[30] L. Al-Jobouri, M. Fleury, M. Ghanbari, Multicast and unicast video streaming
with rateless channel-coding over wireless broadband, in: Proc. IEEE Int. Conf.
on Consumer Communications and Networking, 2012, pp. 737-741.
[31] I.E. Richardson, The H.264 Advanced Video Compression Standard, second ed.,
Wiley & Sons, Chichester, UK, 2010.
[32] A. Shokrollahi, Raptor codes, IEEE Trans. Information Theory 52 (6) (2006)
2551-2567.
[33] M. Luby, LT codes, in: Proc. of the 43rd Annual IEEE Symp. on Foundations of
Computer Science, 2002, pp. 271-280.
[34] Digital Video Broadcasting (DVB); IP Datacast over DVB-H: Content Delivery
Protocols, ETSI Technical Specification, Rev. V1.2.1, 2006.
[35] 3GPP TS 26.346, Technical Specification Group Services and System Aspects;
Multimedia Broadcast/Multicast Service (MBMS); Protocols and Codecs, 3GPP
Technical Specification, Rev. V7.4.1, June 2007.
[36] M. Luby, T. Gasiba, T. Stockhammer, M. Watson, Reliable multimedia
download delivery in cellular broadcast networks, IEEE Trans. Broadcast. 53
(1) (2007) 235-246.
[37] T. Gasiba, T. Stockhammer, J. Afzal, W. Xu, System design and advanced
receiver techniques for MBMS broadcast services, in: Proc. IEEE Int. Conf.
Commun., 2006, pp. 5444-5450.
[38] H. Jenkac, T. Mayer, T. Stockhammer, W. Xu, Soft decoding of LT-codes for
wireless broadcast, in: Proc. IST Mobile Summit, 2005.
[39] M. Peyic, H.A. Baba, I. Hamzaoglu, M. Keskinoz, Low power IEEE 802.11n LDPC
decoder hardware, Microprocess. Microsyst.: Embedd. Hardware Des. 36 (3)
(2012) 159-166.
[40] T. Mladenov, S. Nooshabadi, K. Kim, Implementation and evaluation of Raptor
codes on embedded systems, IEEE Trans. Comput. 60 (12) (2011) 1678-1691.
[41] P. Ferré, A. Doufexi, J. Chung-How, A.R. Nix, D.R. Bull, Robust video
transmission over wireless LANs, IEEE Trans. Vehicular Technol. 57 (4)
(2008) 2596-2602.
[42] M. Chatterjee, S. Sengupta, S. Ganguly, Feedback-based real-time streaming
over WiMAX, IEEE Wireless Commun. 14 (1) (2007) 64-71.
[43] M. Chatterjee, S. Sengupta, VoIP over WiMAX, in: S. Ahson, M. Ilyas (Eds.),
WiMAX, Applications, CRC Press, Boca Raton, FL, 2008, pp. 55-77.
[44] O. Hillestad, A. Perkis, V. Genc, S. Murphy, J. Murphy, Adaptive H.264/MPEG-4
SVC video over IEEE 802.16 broadband wireless access networks, in: Proc. of
the International Packet Video Workshop, 2007, pp. 26-35.
[45] H.-H. Juan, H.-C. Huang, C.Y. Huang, T. Chiang, Cross-layer mobile wireless
MAC designs for the H.264/AVC scalable video coding, Wireless Netw. 16 (1)
(2008) 113-123.
[46] J. Casampere, P. Sanshez, T. Villameriel, J. Del Ser, Performance evaluation of
H.264/MPEG-4 scalable video coding over IEEE 802.16e networks, in: Proc. of
IEEE Int. Symp. on Broadband Multimedia Systems and Broadcasting, 2009, pp.
1-6.
[47] H. Kalva, V. Adzic, B. Furht, Comparing MPEG AVC and SVC for adaptive HTTP
streaming, in: Proc. of IEEE Int. Conf. on Consumer Electronics, 2012, pp.
160-161.
[48] T. Stockhammer, Is fine granular scalable video coding beneficial for wireless
video applications? in: Proc. of IEEE Int. Conf. on Multimedia and Expo, vol. 1,
2003, pp. 193-196.
[49] M. Fleury, S. Moiron, M. Ghanbari, Innovations in video error resilience and
concealment, Recent Patents Signal Process. 1 (2) (2011) 1-11.
[50] V. Varsa, M. Hannuksela, Y.-K. Wang, Non-normative error concealment
algorithms, in: Proc. 14th Meeting of ITU-T VCEG, doc.VCEG-N62, 2001.
[51] M. Fleury, R. Rouzbeh, S. Saleh, L. Al-Jobouri, M. Ghanbari, Enabling WiMAX
video streaming, in: U.D. Dalal, Y.P. Kosta (Eds.), WiMAX, New Developments,
Vukovar, Croatia, 2009, pp. 213-238, ISBN: 978-953-7619-53-4.
[52] B. Barmada, M.M. Ghandi, E.V. Jones, M. Ghanbari, Combined Turbo coding and
hierarchical QAM for unequal error protection of H.264 coded video, Signal
Process.: Image Commun. 21 (5) (2006) 390-395.
[53] D.J.C. MacKay, Fountain codes, IEE Proc.: Commun. 152 (6) (2005)
1062-1068.
[54] T. Issariyakul, E. Hossain, An Introduction to the ns-2 Simulator, second ed.,
Springer Verlag, Berlin, Germany, 2012.
[55] J. Ebert, A. Willig, A Gilbert-Elliott Bit Error Model and its Efficient Use in
Packet Level Simulation, Technical University Berlin, Telecommunication
Networks Group, TKN Technical Report Series, TKN-99-002, 1999.
[56] G. Haßlinger, O. Hohlfeld, The Gilbert-Elliott model for packet loss in real time
services on the Internet, in: Proc. of 14th GI/ITG Conf. on Measurement,
Modelling, and Evaluation of Computer and Commun. Systs., 2008, pp.
269-283.
[57] K. Srinivasan, M.A. Kazandjieva, S. Agarwal, P. Levis, The β-factor: measuring
wireless link burstiness, in: Proc. 6th ACM Conf. on Embedded Network Sensor
Systems, 2008, pp. 29-42.
[58] F.C.D. Tsai et al., The design and implementation of WiMAX module for ns-2
simulator, in: Proc. Workshop on ns2, article no. 5, 2006.
[59] C. So-In, R. Jain, A.-K. Tamimi, Capacity evaluation of IEEE 802.16e WiMAX, J.
Comput. Syst., Networks, Commun. (2010) 12 pages [online].
[60] S. Mys, P. Lambert, W. De Neve, SNR scalability in H.264/AVC using data
partitioning, in: Proc. Pacific Rim Conf. in Multimedia, 2006, pp. 329-338.
[61] G. Muntean, P. Perry, L. Murphy, A new adaptive multimedia streaming system
for all-IP multi-service networks, IEEE Trans. Broadcast. 50 (1) (2004) 1-10.
[62] I. Radulovic, P. Frossard, Y.K. Wang, M.H. Hannuksela, A. Hallapuro, Multiple
description H.264 video coding with redundant pictures, IEEE Trans. Circuits
Syst. Video Technol. 20 (1) (2010) 144-148.
[63] S.H.G. Chan, X. Zheng, Q. Zhang, W.-W. Zhu, Y.-Q. Zhang, Video loss recovery
with FEC and stream replication, IEEE Trans. Multimedia 8 (2) (2006)
370-381.
[64] P. Ferré, D. Agrafiotis, D. Bull, A video error resilience redundant slices
algorithm and its performance relative to other fixed redundancy schemes,
Signal Process.: Image Commun. 25 (3) (2010) 163-178.
[65] A.M. Tourapis, K. Sühring, G. Sullivan, H.264/14496-10 AVC reference software
manual, in: Proc. 31st Meeting of the Joint Video Team, London, 2009.
[66] F. Agboma, A. Liotta, Addressing user expectations in mobile content delivery,
Mobile Information Syst. 3 (3-4) (2007) 153-164.
[67] R. Razavi, B. Tanoh, M. Fleury, M. Ghanbari, Unequal protection using adaptive
burst profile selection for WiMAX video streaming, Electron. Lett. 44 (4) (2008)
867-868.