Overview_of_H.264.pdf

1
Overview of the H.264/AVC
Video Coding Standard
Presenter: 吳志峰
Advisor: Prof. 楊竹星
Internet Systems Laboratory

2
Discrete Cosine Transform

3
Introduction
• In 1998, the Video Coding Experts Group
(VCEG) issued a call for proposals on a project
called H.26L, with the target of doubling the
coding efficiency.
• In 2001, VCEG and MPEG formed the Joint Video
Team (JVT).
• Multiple names for this “Advanced Video
Coding” standard:
– H.264 by ITU-T
– MPEG-4 Part 10 by ISO/IEC

4
Introduction
• It is aimed at very low bit rate, real-time,
low end-to-end delay, and mobile
applications such as conversational
services and Internet video
• Enhanced visual quality at very low bit
rates, particularly at rates below 24 kb/s

5
Applications for AVC/H.264
• Entertainment video
• Conversational services
• Video on demand or streaming services
• Multimedia messaging services (MMS)
• Over ISDN, DSL, cable, LAN, wireless and
mobile networks

6
The Scope of Picture and Video Coding Standardization
[Figure: processing chain Source, Pre-Processing, Encoding, Decoding, Post-Processing & Error Recovery, Destination, with the Scope of Standard marked on the chain]
• Only the syntax and decoder are standardized:
– Permits optimization beyond the obvious
– Permits complexity reduction for implementability

7
Structure of H.264/AVC video coder
• VCL (video coding layer): designed to efficiently represent the video content
• NAL (network abstraction layer): formats the VCL representation of the video and
provides header information for conveyance by a variety of
transport layers or storage media

8
H.264 Improved Prediction Method
• Variable block-sized motion compensation with
small block size
• Quarter-sample-accurate motion compensation
• Motion vectors over picture boundaries
• Multiple reference picture motion compensation
• Weighted prediction
• Directional spatial prediction for intra coding
• In-loop deblocking filter

9
H.264 Enhanced Design
• Small block-size transform
• Short word length transform
• Arithmetic entropy coding
• Context-adaptive entropy coding

10
Robustness to Data Errors/Losses
• Flexible slice size
• Flexible macroblock ordering (FMO)
• Arbitrary slice ordering (ASO)
• Redundant pictures
• Data partitioning
• SP/SI synchronization/switching pictures

11
VCL
• Block-based hybrid video coding approach
• Each coded picture is represented in
block-shaped units of associated luma and
chroma samples called macroblocks.
• The source coding algorithm is a hybrid of
inter-picture prediction, to exploit temporal statistical
dependencies, and transform coding of the
prediction residual, to exploit spatial
statistical dependencies

12
Color Space Conversion

13
Sub-sampling of Chrominance
4:2:0 format

14
Picture, Frame, Field
• A coded video consists of a sequence of coded
pictures
• A frame contains two fields, a top and a bottom
field
• If the two fields of a frame are captured at different
time instants, the frame is referred to as an
interlaced frame; otherwise it is referred to
as a progressive frame
• A coded picture can represent either an entire
frame or a single field

15
Interlace Video

16
Adaptive Frame/Field Coding
• In interlaced frames, two adjacent rows tend to show a
reduced degree of statistical dependency when
compared to progressive frames.
• To provide high coding efficiency, the encoder can choose among three options:
– Combine the two fields and code them as a single coded frame
(frame mode)
– Code them as separate coded fields (field mode)
– Combine the two fields into a single frame, but split the frame into
pairs of vertically adjacent macroblocks and code each pair as either
two field macroblocks or two frame macroblocks
• The choice among the three options can be made adaptively
for each frame in a sequence

17
• The choice between the first two options is referred to as picture-adaptive
frame/field (PAFF) coding
• The frame/field encoding decision can also be made
independently for each vertical pair of macroblocks
– This coding option is referred to as macroblock-adaptive
frame/field (MBAFF) coding
– For a macroblock pair that is coded in frame mode, each
macroblock contains frame lines.
– For a macroblock pair that is coded in field mode, the top
macroblock contains top field lines and the bottom macroblock
contains bottom field lines.

18
• Scanning order

19
Slice and Slice Groups
• A slice is a sequence of
macroblocks, which are in raster
scan order when FMO is not used
• Slices are self-contained
• Flexible macroblock ordering
(FMO) modifies the way
pictures are partitioned into slices
by utilizing slice groups

20
• Each slice group is defined by a macroblock-to-slice-group
map
• Each slice group can be partitioned into one or more
slices in raster scan order
• I slice: all macroblocks are coded using intra prediction
• P slice: the coding types of the I slice, plus inter prediction with at
most one motion-compensated prediction signal per prediction block
• B slice: the coding types of the P slice, plus inter prediction with
two motion-compensated prediction signals per prediction block

21
SP and SI slices
• SP: switching P slice
• SI: switching I slice

22
SP and SI slices

23
Intra Prediction
[Figure: H.264/AVC encoder block diagram. Blocks: Input Video Signal, Split into Macroblocks (16x16 pixels), Coder Control, Transform/Scal./Quant., Scaling & Inv. Transform, De-blocking Filter, Intra-frame Prediction, Motion Compensation, Motion Estimation, Intra/Inter switch, Entropy Coding, embedded Decoder, Output Video Signal. Signals: Control Data, Quant. Transf. coeffs, Motion Data]

24
Intra Prediction
• Two modes for luma blocks
– Intra 4x4
• 9 modes
• Used in textured areas
– Intra 16x16
• 4 modes
• Used in flat areas
• One mode for chroma blocks
– Similar to intra 16x16

25
Intra Prediction
• In all slice-coding types, Intra_4x4, Intra_16x16 and I_PCM
are supported
• Intra prediction across slice boundaries is not used
• I_PCM allows the encoder to send the values of the
encoded samples directly
[Figure: Intra 16x16]

26
Intra Prediction
Intra 4x4
• When E-H are not available, they are replaced by
sample D

27
4x4 Intra Prediction Mode
• Mode 2: DC prediction
If (A-D and E-H are in the slice)
    Prediction = (A+B+C+D+E+F+G+H+4)/8
else if (A-D exist and E-H do not exist)
    Prediction = (A+B+C+D+2)/4
else if (A-D do not exist and E-H exist)
    Prediction = (E+F+G+H+2)/4
else
    Prediction = 128
Sample layout (A-D above and E-H to the left of the 4x4 block a-p):
I A B C D
E a b c d
F e f g h
G i j k l
H m n o p
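The four cases of the DC rule can be sketched as a small function. This is only an illustration under stated assumptions: the function name and the convention of passing `None` for unavailable neighbours are hypothetical, and the divisions are written as right shifts on integer samples.

```python
def dc_predict_4x4(top=None, left=None):
    """DC prediction for one 4x4 luma block.

    top  -- [A, B, C, D], the samples above the block, or None if unavailable
    left -- [E, F, G, H], the samples to the left, or None if unavailable
    (Passing None for "not available" is an assumption of this sketch.)
    """
    if top is not None and left is not None:
        return (sum(top) + sum(left) + 4) >> 3   # (A+...+H+4)/8
    if top is not None:
        return (sum(top) + 2) >> 2               # (A+B+C+D+2)/4
    if left is not None:
        return (sum(left) + 2) >> 2              # (E+F+G+H+2)/4
    return 128                                   # no neighbours: mid-grey


print(dc_predict_4x4([10, 12, 14, 16], [20, 22, 24, 26]))
```

Every sample a-p of the block receives the same returned value.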

28
• Mode 0 (vertical) • Mode 1 (horizontal)
[Figure: both modes drawn on the same sample layout]
I A B C D
E a b c d
F e f g h
G i j k l
H m n o p

29
I A B C D
E a b c d
F e f g h
G i j k l
H m n o p
Q A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p
M
N

30
Q A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p
M
N

31
Q A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p
M
N

32
Chroma intra prediction
• Independent of the luma prediction mode
• Similar to the luma 16x16 macroblock type
– Mode 0 : vertical prediction
– Mode 1 : horizontal prediction
– Mode 2 : DC prediction
– Mode 3 : plane prediction

33
Chroma intra DC prediction
[Figure: 8-pixel-wide chroma block split into four 4x4 blocks, A B above C D, with the sums S0, S1 and S2, S3 marked along the MB boundaries]
S0-S3: each is the sum of 4 boundary pixels
S0-S3 all out of picture:
– A = B = C = D = 128
Only S2 and S3 in the picture:
– A = (S2 + 2)>>2
– B = (S2 + 2)>>2
– C = (S3 + 2)>>2
– D = (S3 + 2)>>2
Only S0 and S1 in the picture:
– A = (S0 + 2)>>2
– B = (S1 + 2)>>2
– C = (S0 + 2)>>2
– D = (S1 + 2)>>2
S0-S3 all in the picture:
– A = (S0 + S2 + 4)>>3
– B = (S1 + 2)>>2
– C = (S3 + 2)>>2
– D = (S1 + S3 + 4)>>3

34
Intra4x4 Prediction Mode Prediction
• Each macroblock has 16 4x4 blocks
• The default prediction is mode 2: DC prediction
• PA and PB are the prediction modes of the two neighbouring 4x4 blocks
most_probable_mode = min(PA, PB)
If (use_most_probable_mode)
    then intra_pred_mode = most_probable_mode
else if (remaining_mode_selector < most_probable_mode)
    then intra_pred_mode = remaining_mode_selector
else intra_pred_mode = remaining_mode_selector+1
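A minimal sketch of this mode derivation follows. It assumes that `use_most_probable_mode` is a one-bit flag that selects `most_probable_mode` directly, and that `PA` and `PB` are the modes of two already decoded neighbouring blocks; the function name is hypothetical.

```python
def decode_intra4x4_mode(pa, pb, use_most_probable_mode, remaining_mode_selector=0):
    """Recover intra_pred_mode (0..8) from the signalled syntax elements.

    pa, pb -- prediction modes of the two neighbouring 4x4 blocks (assumed known)
    """
    most_probable_mode = min(pa, pb)
    if use_most_probable_mode:
        # Assumption: the flag means "use the most probable mode as is".
        return most_probable_mode
    # The selector covers the 8 remaining modes, so it skips over the
    # most probable one.
    if remaining_mode_selector < most_probable_mode:
        return remaining_mode_selector
    return remaining_mode_selector + 1


print(decode_intra4x4_mode(2, 5, False, 2))
```

Because the most probable mode is never sent through the selector, the selector only needs 8 values to address the other 8 of the 9 modes.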

35
Intra 16x16 Prediction Mode Coding
• Prediction Mode, AC, coded block pattern (CBP)
– CBP: nncccc
• c: AC, n: nc (Table 7-14)
– Intra_16x16_x_y_z
• x: prediction mode
• y: nc
• z: AC
– x = (mb_type-1) & 3
– n = y
– If (z=1) then cccc = 1111 else cccc = 0000

36
Coded Block Pattern
CBP = CBPY + 16 x nc
CBPY: the least significant bits of CBP indicate which of the
four 8x8 luma blocks in a macroblock contain nonzero
coefficients
nc=0: no chroma coefficients at all.
nc=1: There are nonzero 2x2 transform coefficients. All chroma AC
coefficients = 0. Therefore we do not send any EOB for
chroma AC coefficients.
nc=2: There may be nonzero 2x2 coefficients and there is at least
one nonzero chroma AC coefficient present. In this case we
need to send 10 EOBs (2 for DC coefficients and 2x4=8 for
the 8 4x4 blocks) for chroma in a macroblock.
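As a small illustration of the relation CBP = CBPY + 16 x nc, the following sketch packs and unpacks the two fields. The function names are hypothetical, and CBPY is assumed to be a 4-bit mask with one bit per 8x8 luma block.

```python
def pack_cbp(cbpy, nc):
    """Combine the 4-bit luma mask CBPY and the chroma code nc (0, 1 or 2)."""
    assert 0 <= cbpy <= 15 and 0 <= nc <= 2
    return cbpy + 16 * nc


def unpack_cbp(cbp):
    """Split CBP back into (CBPY, nc)."""
    return cbp & 15, cbp >> 4


cbp = pack_cbp(0b1010, 2)   # luma blocks 1 and 3 coded, chroma DC and AC present
print(cbp, unpack_cbp(cbp))
```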

37
MC/ME
[Figure: H.264/AVC encoder block diagram]

38
Motion Compensation
• Various block sizes and shapes for motion
compensation
• 1/4 sample accuracy
– 6 tap filtering to 1/2 sample accuracy
– simplified filtering to 1/4 sample accuracy
• Allow motion vectors over picture boundary
• Multiple reference pictures
• Generalized B-frames
• B-frame prediction weighting
Inter Frame Prediction

39
Multiple reference frames
• The multiple reference frame mode (PTYPE) indicates
that prediction may use more than
one previously decoded picture; the exact
frame to be used must be signaled

40
• Each P macroblock type corresponds to a specific
partition
• A maximum of 16 motion vectors may be transmitted for
a single P macroblock

41
The partition of macroblock

42
Partition Example

43
Sub-Pixel Motion Vector

44
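Fractional Sample
The half-sample positions are obtained by a
one-dimensional 6-tap filter
• b1 = (E-5F+20G+20H-5I+J)
  h1 = (A-5C+20G+20M-5R+T)
• b = (b1+16)>>5
  h = (h1+16)>>5
• j1 = cc-5dd+20h1+20m1-5ee+ff
  j = (j1+512)>>10
The quarter-sample positions a, c,
d, n, f, i, k, q are derived by
averaging the two nearest integer and half-sample positions.
• a = (G+b+1)>>1
The quarter-sample positions e, g, p, r are derived by
averaging the two nearest half-sample positions in the diagonal direction.
• e = (b+h+1)>>1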
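The interpolation formulas can be tried out with a short sketch. The helper names are hypothetical, and the final clipping to the 8-bit range 0..255 is an assumption of this sketch that the slide does not show.

```python
def clip255(v):
    """Clip to the 8-bit sample range (an assumption of this sketch)."""
    return max(0, min(255, v))


def six_tap(samples):
    """Apply the (1, -5, 20, 20, -5, 1) filter to six neighbouring samples."""
    e, f, g, h, i, j = samples
    return e - 5 * f + 20 * g + 20 * h - 5 * i + j


def half_sample(samples):
    """b = (b1 + 16) >> 5, where b1 is the 6-tap filter output."""
    return clip255((six_tap(samples) + 16) >> 5)


def quarter_sample(x, y):
    """Average of two neighbouring samples with rounding, e.g. a = (G + b + 1) >> 1."""
    return (x + y + 1) >> 1


row = [100, 100, 100, 110, 110, 110]    # E, F, G, H, I, J
b = half_sample(row)                     # half-sample position between G and H
a = quarter_sample(row[2], b)            # quarter-sample position between G and b
print(b, a)
```

The filter taps sum to 32, which is why the result is normalised with a shift by 5.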

45
Motion Vector Prediction
• Median Prediction
– Block Based
    If C does not exist then C = D
    If B, C do not exist then prediction = VA
    If A, C do not exist then prediction = VB
    If A, B do not exist then prediction = VC
    Otherwise prediction = median(VA, VB, VC)
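The rules above can be sketched as follows. Motion vectors are modelled as `(x, y)` tuples and `None` marks a block that does not exist; treating a missing vector as `(0, 0)` inside the median is an assumption of this sketch, as are the function names.

```python
def median3(a, b, c):
    """Median of three numbers."""
    return sorted((a, b, c))[1]


def predict_mv(va, vb, vc, vd=None):
    """Median motion vector prediction from neighbours A, B, C (and D as a stand-in for C)."""
    if vc is None:
        vc = vd                                   # If C does not exist then C = D
    if va is not None and vb is None and vc is None:
        return va
    if vb is not None and va is None and vc is None:
        return vb
    if vc is not None and va is None and vb is None:
        return vc
    # Assumption: a missing neighbour contributes a zero vector to the median.
    vs = [v if v is not None else (0, 0) for v in (va, vb, vc)]
    return (median3(vs[0][0], vs[1][0], vs[2][0]),
            median3(vs[0][1], vs[1][1], vs[2][1]))


print(predict_mv((1, 2), (3, 8), (5, 4)))
```

The median is taken separately for the horizontal and vertical components.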

46
• 8x16
– Left: If left block is available then Predict from
left, otherwise median prediction

47
Motion Estimation
• Median prediction for the MB to find the search center
• Spiral search
• Calculate 4x4 block SAD
• Combine to larger blocks
• Half & quarter-sample refinement
[Figure: original MB and reference frame; spiral search positions numbered 0-24 around the search center, with position 0 at the center]
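The steps "calculate 4x4 block SAD" and "combine to larger blocks" can be sketched as below for one candidate position. The function names and the plain list-of-lists image layout are assumptions of this sketch.

```python
def sad_4x4(cur, ref, bx, by):
    """Sum of absolute differences over the 4x4 block whose top-left sample is (bx, by)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + y][bx + x])
               for y in range(4) for x in range(4))


def macroblock_sads(cur, ref):
    """Compute the sixteen 4x4 SADs of a 16x16 macroblock, then combine them."""
    s4 = [[sad_4x4(cur, ref, 4 * i, 4 * j) for i in range(4)] for j in range(4)]
    # Each 8x8 SAD is the sum of its four 4x4 SADs; no pixel is visited twice.
    s8 = [[s4[2 * j][2 * i] + s4[2 * j][2 * i + 1] +
           s4[2 * j + 1][2 * i] + s4[2 * j + 1][2 * i + 1]
           for i in range(2)] for j in range(2)]
    s16 = sum(sum(row) for row in s8)
    return s4, s8, s16


cur = [[10] * 16 for _ in range(16)]
ref = [[12] * 16 for _ in range(16)]
s4, s8, s16 = macroblock_sads(cur, ref)
print(s4[0][0], s8[0][0], s16)
```

Reusing the 4x4 results means the pixel differences are computed once per candidate position, whatever partition sizes are evaluated afterwards.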

48
SADBlock
BlockSAD[reference frame][blocktype][block4x4][maxpos]
[Figure: 4x4-block index maps (0-15) for Blocktype 6, Blocktype 5 and Blocktype 4]

49
SADBlock
[Figure: 4x4-block index maps (0-15) for Blocktype 3, Blocktype 2 and Blocktype 1]

50
P_SKIP type
• For this type, neither a quantized prediction error
signal nor a motion vector or reference index
parameter is transmitted
• The reference picture is located at index 0 in the
multi-picture buffer
• The motion vector is taken from the motion
vector predictor
• It is used for large areas with no change or constant
motion.

51
About Motion Vector Cost
J = Distortion + lambda_factor × MV_Cost
  = SAD + MV_Cost(lambda_factor, cand_x, cand_y, pred_x, pred_y)
  = SAD + {lambda_factor × (mvbits[cand_x − pred_x] + mvbits[cand_y − pred_y])} / 2^16
Lambda = QP2QUANT[max(0, img->QP-12)]
QP2QUANT[40] = { 1, 1, 1, 1, 2, 2, 2, 2,
                 3, 3, 3, 4, 4, 4, 5, 6,
                 6, 7, 8, 9,10,11,13,14,
                16,18,20,23,25,29,32,36,
                40,45,51,57,64,72,81,91 }
lambda_factor = 2^16 * lambda + 0.5
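A hedged sketch of this cost computation is shown below. The `QP2QUANT` table and the 16-bit fixed-point scaling come from the slide; the `mv_bits` function is an assumption that stands in for the `mvbits` lookup table by returning the length of a signed Exp-Golomb code, and all function names are hypothetical.

```python
QP2QUANT = [1, 1, 1, 1, 2, 2, 2, 2,
            3, 3, 3, 4, 4, 4, 5, 6,
            6, 7, 8, 9, 10, 11, 13, 14,
            16, 18, 20, 23, 25, 29, 32, 36,
            40, 45, 51, 57, 64, 72, 81, 91]


def lambda_factor(qp):
    """lambda scaled to 16 fractional bits: 2^16 * lambda + 0.5, truncated."""
    lam = QP2QUANT[max(0, qp - 12)]
    return int((1 << 16) * lam + 0.5)


def mv_bits(diff):
    """Assumed bit count of a signed Exp-Golomb code for one vector component."""
    if diff == 0:
        return 1
    return 2 * ((2 * abs(diff)).bit_length() - 1) + 1


def mv_cost(lf, cand, pred):
    """lambda_factor * (bits for x + bits for y), scaled back by 2^16."""
    bits = mv_bits(cand[0] - pred[0]) + mv_bits(cand[1] - pred[1])
    return (lf * bits) >> 16


def total_cost(sad, lf, cand, pred):
    """J = SAD + MV_Cost for one candidate vector."""
    return sad + mv_cost(lf, cand, pred)


lf = lambda_factor(28)
print(total_cost(100, lf, cand=(3, -1), pred=(1, 0)))
```

Candidates far from the predicted vector need more bits, so they must lower the SAD by a correspondingly larger amount to be selected.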

52
Transform and Quantization
[Figure: H.264/AVC encoder block diagram]

53
• H.264 utilizes transform coding of the residual.
• 4x4 Block Integer Transform
• Main Profile: Adaptive Block Size
Transform (8x4, 4x8, 8x8)
• Repeated transform of DC coeffs for 8x8 chroma and 16x16 Intra
luma blocks
    [ 1  1  1  1 ]
H = [ 2  1 -1 -2 ]
    [ 1 -1 -1  1 ]
    [ 1 -2  2 -1 ]

54
• There are 52 (0-51) quantization parameters
• An increase of 1 in the quantization parameter means
an increase of the quantization step size by
approximately 12%
• An increase of 6 means an increase of the quantization
step size by a factor of 2
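The relation between quantization parameter and step size can be sketched numerically. The six base step sizes below are the values commonly tabulated for H.264 and are an assumption of this sketch, as are the function name and the QP range 0-51.

```python
# Step sizes for QP 0..5 (assumed values); they double every 6 QP steps.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for a QP in 0..51."""
    assert 0 <= qp <= 51
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


for qp in (0, 1, 6, 12, 51):
    print(qp, qstep(qp))
```

Doubling over 6 steps gives an average growth of 2^(1/6) ≈ 1.12 per step, which is where the figure of approximately 12% comes from.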

55
• The quantized transform coefficients of
a block are scanned in a zig-zag
fashion and transmitted using entropy
coding methods
• The 2x2 DC coefficients of the chroma
are scanned in raster-scan order
[Figure: example 4x4 block of quantized coefficients, mostly zeros, with the nonzero values 3, -1, -1, 1, 1]
• The inverse transform can be implemented using only
additions and bit-shifting operations on 16-bit integer
values
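The zig-zag scan of a 4x4 block can be sketched as a table lookup. The scan table is the usual frame-mode order; the sample block is a hypothetical example chosen for this sketch, not a reconstruction of the slide's figure.

```python
# Raster indices (row * 4 + column) visited in zig-zag order.
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def zigzag_scan(block):
    """Reorder a 4x4 block (list of 4 rows) into a 16-element zig-zag list."""
    flat = [v for row in block for v in row]
    return [flat[i] for i in ZIGZAG_4X4]


block = [[0, 3, -1, 0],      # hypothetical quantized coefficients
         [0, -1, 1, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
print(zigzag_scan(block))
```

The scan groups the low-frequency coefficients at the front, so the trailing zeros form one long run that the entropy coder can represent cheaply.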

56
Entropy Coding
[Figure: H.264/AVC encoder block diagram]

57
Variable Length Coding
• Exp-Golomb code is used universally for all
symbols except for transform coefficients
• Context adaptive VLCs for coding of transform
coefficients
– No end-of-block, but number of coefficients is
decoded
– Coefficients are scanned backwards
– Contexts are built dependent on transform
coefficients
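A minimal sketch of Exp-Golomb code construction follows, returning the codewords as bit strings for readability. The function names mirror the usual `ue`/`se` descriptors but are otherwise hypothetical.

```python
def ue(code_num):
    """Unsigned Exp-Golomb: (n - 1) zeros followed by the n-bit binary of code_num + 1."""
    value = code_num + 1
    n = value.bit_length()
    return "0" * (n - 1) + format(value, "b")


def se(v):
    """Signed Exp-Golomb: map 0, 1, -1, 2, -2, ... to code numbers 0, 1, 2, 3, 4, ..."""
    return ue(2 * v - 1 if v > 0 else -2 * v)


for k in range(5):
    print(k, ue(k))
```

Small values get short codewords, and the decoder finds the codeword length just by counting leading zeros.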

58
Context-based Adaptive Binary Arithmetic Codes (CABAC)
• Usage of adaptive probability models for most
symbols
• Exploiting symbol correlations by using contexts
• Restriction to binary arithmetic coding
– Simple and fast adaptation mechanism
– Fast binary arithmetic codec based on table look-ups
and shifts only
• Average bit-rate saving over CAVLC: 10-15%

59
Deblocking Filter
[Figure: H.264/AVC encoder block diagram]

60
Deblocking Filter
[Figure: 1) without filter 2) with H.264/AVC deblocking]

61
Reconstruction Filter
• Block edges are smoothed, improving the
appearance of decoded images
• The filtered macroblock is used for motion-
compensated prediction of subsequent frames in the
encoder, resulting in a smaller residual after
prediction.
• Intra-coded macroblocks are filtered, but intra
prediction is carried out using unfiltered
reconstructed macroblocks to form the prediction.

62
Filter Applied Order

63
Deblocking Filter

64
Boundary Strength
Bs=0 (no filtering): neither p nor q is intra coded; neither p nor q contains coded
coefficients; p and q have the same reference frame and
identical motion vectors
Bs=1: neither p nor q is intra coded; neither p nor q contains coded
coefficients; p and q have different reference frames or a
different number of reference frames or different motion
vector values. Samples: P0, P1, Q0, Q1
Bs=2: neither p nor q is intra coded; p or q contains coded
coefficients. Samples: P0, P1, Q0, Q1
Bs=3: p or q is intra coded and the boundary is not a macroblock
boundary. Samples: P0, P1, Q0, Q1
Bs=4 (strongest filtering): p or q is intra coded and the boundary is a macroblock
boundary. Samples: P0, P1, P2, Q0, Q1, Q2
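The decision rules can be sketched as a cascade of tests, checked from the strongest case down. The function and parameter names are hypothetical; each flag is assumed to have been derived beforehand from the two blocks p and q on either side of the edge.

```python
def boundary_strength(p_intra, q_intra, on_mb_boundary,
                      p_has_coeffs, q_has_coeffs, same_refs, same_mvs):
    """Return Bs (0..4) for the edge between blocks p and q."""
    if p_intra or q_intra:
        return 4 if on_mb_boundary else 3
    if p_has_coeffs or q_has_coeffs:
        return 2
    if not (same_refs and same_mvs):
        return 1
    return 0   # no filtering


print(boundary_strength(False, False, True, False, True, True, True))
```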

65
Filter Decision
• Filter is applied only if
– Bs > 0
– |p0-q0| < Alpha, |p1-p0| < Beta and |q1-q0| < Beta
Alpha = ALPHA_TABLE[indexA]
Beta = BETA_TABLE[indexB]
indexA = QP+AlphaC0Offset
indexB = QP+BetaOffset
QP = (MBp->QP+MBq->QP)/2

66
Filter of edges with Bs=4
If |p2-p0| < Beta and |p0-q0| < round(Alpha/4)
    P0 = (p2+2p1+2p0+2q0+q1)/8 // 5 tap
    P1 = (p2+p1+p0+q0)/4 // 4 tap
    P2 = (2p3+3p2+p1+p0+q0)/8 // luma only
Else
    P0 = (2p1+p0+q1)/4
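A sketch of the strong filter for the p side of a luma edge is given below. It assumes integer arithmetic with rounding offsets (+4 before dividing by 8, +2 before dividing by 4) and approximates round(Alpha/4) by `(alpha + 2) // 4`; both choices, and the function name, are assumptions of this sketch rather than text from the slide.

```python
def strong_filter_p(p3, p2, p1, p0, q0, q1, alpha, beta):
    """Return the filtered (P0, P1, P2) for an edge with Bs = 4 (luma)."""
    if abs(p2 - p0) < beta and abs(p0 - q0) < (alpha + 2) // 4:
        new_p0 = (p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3    # 5 tap
        new_p1 = (p2 + p1 + p0 + q0 + 2) >> 2                     # 4 tap
        new_p2 = (2 * p3 + 3 * p2 + p1 + p0 + q0 + 4) >> 3        # luma only
        return new_p0, new_p1, new_p2
    # Weaker fallback: only the sample next to the edge changes.
    return (2 * p1 + p0 + q1 + 2) >> 2, p1, p2


print(strong_filter_p(60, 60, 60, 60, 68, 68, alpha=40, beta=4))
```

The q side would be filtered the same way with the roles of p and q exchanged.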

67
Filter of edges with Bs<4
dif = clip3(-C, C, (((q0-p0)<<2) + (p1-q1) + 4)>>3)
P0 = clip3(0, 255, p0+dif)
If |p2-p0| < Beta
    P1 = p1 + clip3(-C0, C0, (p2 + ((p0+q0)>>1) - (p1<<1))>>1)
• C = C0 + (|p2-p0|<Beta) + (|q2-q0|<Beta) // for luma
• C = C0 + 1 // for chroma
C0 Table
IndexA 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
Bs = 1  1  1  1  1  1  1  1  2  2  2  2  3  3  3  4  4  4  5  6  6  7  8  9 10 11 13
Bs = 2  1  1  1  1  1  2  2  2  2  3  3  3  4  4  5  5  6  7  8  8 10 11 12 13 15 17
Bs = 3  1  2  2  2  2  3  3  3  4  4  4  5  6  6  7  8  9 10 11 13 14 16 18 20 23 25
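The normal (Bs<4) luma filter can be sketched as follows, with the shift and subtraction order made explicit by parentheses. `c0` is assumed to have been looked up in the C0 table for the current IndexA and Bs; the update of Q0 by the opposite delta is an assumption added by symmetry, and the function names are hypothetical.

```python
def clip3(lo, hi, v):
    """Clamp v to the range [lo, hi]."""
    return max(lo, min(hi, v))


def normal_filter_luma(p2, p1, p0, q0, q1, q2, c0, beta):
    """Return the filtered (P1, P0, Q0) for a luma edge with Bs < 4."""
    ap_small = abs(p2 - p0) < beta
    aq_small = abs(q2 - q0) < beta
    c = c0 + int(ap_small) + int(aq_small)            # clipping range for luma
    dif = clip3(-c, c, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    new_p0 = clip3(0, 255, p0 + dif)
    new_q0 = clip3(0, 255, q0 - dif)                  # assumption: mirrored update
    new_p1 = p1
    if ap_small:
        new_p1 = p1 + clip3(-c0, c0, (p2 + ((p0 + q0) >> 1) - (p1 << 1)) >> 1)
    return new_p1, new_p0, new_q0


print(normal_filter_luma(60, 60, 60, 70, 70, 70, c0=1, beta=4))
```

The clipping bound C limits how far a sample can move, so genuine image edges are softened only slightly while small blocking steps are removed.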

68
H.264 Profiles
• H.264/AVC currently has three Profiles
– Baseline (good for most applications up
through D-Cinema)
– Main (adds interlace, B-Slices and CABAC
efficiency gains)
– Profile X (the so-called streaming profile)

69
H.264 Profiles
• Baseline (Progressive, Videoconferencing &
Wireless)
– I and P picture types (not B)
– In-loop De-blocking filter
– Progressive pictures and Interlaced pictures
– 1/4-sample motion compensation
– Tree-structured motion segmentation down to 4x4
block size
– VLC-based entropy coding (UVLC and CAVLC)

70
H.264 Profiles
• Main Profile
– All Baseline features except enhanced error resilience
features
– B pictures
– CABAC
– Adaptive Block-Size Transform (8x4, 4x8, 8x8)
– MB-level frame/field switching
– Adaptive weighting for B and P picture prediction
– Interlace

71
H.264 Profiles
• New Profile X
– All Baseline features
– B pictures
– More error resilience: Data partitioning
– SP/SI switching pictures
