Development and Implementation of an MPEG-1 Layer III Decoder on x86 and TMS320C6711 Platforms
Farina Simone
(Braidotti Enrico)
DECODING PROCESS
Input File
Retrieving File Information
Huffman Decoding
Requantization
Alias Reconstruction
Reordering
Stereo Processing
Hybrid Synthesis (IMDCT, Windowing, Overlap-Add)
Frequency Inversion
Synthesis Polyphase Filterbank
PCM output samples
ALIAS RECONSTRUCTION
It is performed only when long blocks are used, that is, with pure long blocks or mixed blocks.
Let's see what long and short blocks are:
[Figure: unencoded signal]
[Figure: same signal encoded using long blocks]
[Figure: same signal encoded using short blocks]
HYBRID SYNTHESIS
IMDCT (Inverse Modified Discrete Cosine Transform)
Subbands are inverse-transformed separately, depending on block length:
6-point IMDCT: when short blocks are used (pre-echo masking)
18-point IMDCT: when long blocks are used
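As an illustration, the direct IMDCT can be sketched in a few lines of Python. This is a minimal sketch and not the decoder's own code: the function name `imdct` is hypothetical, and the formula is the standard Layer III one, where 6 or 18 frequency lines yield 12 or 36 time samples.

```python
import math

def imdct(X):
    """Direct IMDCT: len(X) frequency lines in, 2 * len(X) time samples out."""
    half = len(X)   # 6 for short blocks, 18 for long blocks
    n = 2 * half    # 12 or 36 output samples
    return [
        sum(X[k] * math.cos(math.pi / (2 * n) * (2 * i + 1 + half) * (2 * k + 1))
            for k in range(half))
        for i in range(n)
    ]
```

Each output sample costs `half` multiplications, which is why the direct form becomes expensive when it runs for every subband of every granule.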
HYBRID SYNTHESIS
Fast IMDCT algorithm (Szu-Wei Lee)
Based on the symmetry properties of the cosine function
Needs a rearranging stage to restore values to their original positions
Drastically reduces the number of operations compared to the direct implementation
Operation count (short / long blocks):
Operation             Direct Implementation   Fast IMDCT (Szu-Wei Lee)   Improvement
Multiplications (×)   216 / 648               33 / 43                    84.7 % / 93.3 %
Additions (+)         180 / 612               69 / 115                   61.7 % / 81.2 %
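The direct-implementation figures can be reproduced with simple arithmetic. The sketch below assumes that a short-block subband runs three 6-point IMDCTs (one per window) and a long-block subband runs one 18-point IMDCT; the helper names are hypothetical.

```python
def direct_imdct_ops(points, windows):
    """Return (multiplications, additions) for a direct IMDCT.

    points:  number of input frequency lines (6 or 18)
    windows: number of transforms per subband (3 for short, 1 for long)
    """
    outputs = 2 * points
    mults = outputs * points * windows
    adds = outputs * (points - 1) * windows
    return mults, adds

def improvement_percent(direct, fast):
    """Relative saving of the fast algorithm over the direct one, in percent."""
    return 100.0 * (1.0 - fast / direct)

short_ops = direct_imdct_ops(6, 3)    # assumed: three short windows per subband
long_ops = direct_imdct_ops(18, 1)
```

Under these assumptions the counts match the direct-implementation column, and `improvement_percent` gives values that agree with the improvement column to within rounding.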
HYBRID SYNTHESIS
Windowing
Once transformed, subbands are windowed according to the value of block_type (subbands with short blocks are transformed separately for each window and then overlapped).
Overlap-adding
The first half of each transformed block is overlapped with the second half of the corresponding block in the previous granule.
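For the simplest case (one subband, long block with block_type 0), windowing and overlap-adding might look like the sketch below. The sine window is the normal long-block window of the standard; the names `LONG_WINDOW` and `window_and_overlap` are hypothetical, and the other block types are not covered.

```python
import math

# Normal long-block window (block_type 0), 36 coefficients.
LONG_WINDOW = [math.sin(math.pi / 36 * (i + 0.5)) for i in range(36)]

def window_and_overlap(imdct_out, prev_tail):
    """Window 36 IMDCT outputs and overlap-add with the previous granule.

    Returns (18 output samples, 18 samples to keep for the next granule).
    """
    z = [x * w for x, w in zip(imdct_out, LONG_WINDOW)]
    samples = [z[i] + prev_tail[i] for i in range(18)]  # first half + stored half
    new_tail = z[18:]                                   # second half, saved
    return samples, new_tail
```

A decoder would keep one `new_tail` buffer per subband and channel and pass it back in on the next granule.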
FREQUENCY INVERSION
Every second sample in every second subband has to be multiplied by -1.
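A minimal sketch of this step, assuming zero-based indexing (so "every second" means the odd-indexed samples of the odd-indexed subbands) and a granule stored as 32 lists of 18 samples; the function name is hypothetical.

```python
def frequency_inversion(subbands):
    """Negate odd-indexed samples of odd-indexed subbands, in place."""
    for sb in range(1, len(subbands), 2):
        for i in range(1, len(subbands[sb]), 2):
            subbands[sb][i] = -subbands[sb][i]
    return subbands
```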
SYNTHESIS POLYPHASE FILTERBANK
Composed of several steps, it turns out to be the most time-consuming stage of the overall decoding process.
This process produces 32 PCM audio samples per pass: 576 per granule and 1152 per frame (equal to 26 ms of audio at 44.1 kHz).
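The sample counts follow from the Layer III frame structure, as the short check below shows. The constant names are illustrative only; the values (18 filterbank passes per granule, 2 granules per frame) come from the standard.

```python
SAMPLES_PER_PASS = 32      # one output sample per subband
PASSES_PER_GRANULE = 18    # one pass per time slot of a granule
GRANULES_PER_FRAME = 2

samples_per_granule = SAMPLES_PER_PASS * PASSES_PER_GRANULE
samples_per_frame = samples_per_granule * GRANULES_PER_FRAME
frame_duration_ms = 1000.0 * samples_per_frame / 44100
```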
SYNTHESIS POLYPHASE FILTERBANK
Polyphase Matrixing
It is a cosine-like transform (non-standard)
The direct computation involves a 64×32 matrix and requires almost [...] of decoding time
Needs optimization to perform real-time decoding
K. Konstantinides algorithm
32-point Fast DCT
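For reference, the direct matrixing step can be written as below. This is a sketch of the standard's formula, with a hypothetical function name; it performs 64 × 32 = 2048 multiplications for every 32 input samples, which is the cost the fast algorithms aim to cut.

```python
import math

def matrixing_direct(S):
    """V[i] = sum over k of cos((16 + i)(2k + 1) pi / 64) * S[k], for i = 0..63."""
    return [
        sum(math.cos((16 + i) * (2 * k + 1) * math.pi / 64) * S[k]
            for k in range(32))
        for i in range(64)
    ]
```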
SYNTHESIS POLYPHASE FILTERBANK
Konstantinides Algorithm
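The idea can be illustrated as follows: the 64 outputs of the matrixing, V[i] = sum over k of cos((16 + i)(2k + 1) pi / 64) * S[k], are all obtainable from a single 32-point DCT by index remapping and sign changes. The sketch below derives the mapping from the symmetries of the cosine; it is an illustration under that derivation, not a transcription of Konstantinides' paper, and the function names are hypothetical.

```python
import math

def dct_ii(S):
    """Unnormalized DCT-II: D[j] = sum over k of S[k] cos((2k + 1) j pi / (2N))."""
    N = len(S)
    return [sum(S[k] * math.cos((2 * k + 1) * j * math.pi / (2 * N))
                for k in range(N))
            for j in range(N)]

def matrixing_via_dct(S, dct=dct_ii):
    """Compute the 64 matrixing outputs from one 32-point DCT."""
    D = dct(S) + [0.0]          # D[32] is identically zero
    V = [0.0] * 64
    for i in range(0, 17):      # V[0..16]  =  D[16..32]
        V[i] = D[16 + i]
    for i in range(17, 48):     # V[17..47] = -D[31..1]
        V[i] = -D[48 - i]
    for i in range(48, 64):     # V[48..63] = -D[0..15]
        V[i] = -D[i - 48]
    return V
```

The `dct` parameter lets a fast 32-point DCT replace the slow reference `dct_ii` without touching the remapping.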
SYNTHESIS POLYPHASE FILTERBANK
FCT Algorithm (Byeong-Gi Lee)
Using trigonometric properties, a 2^M-point DCT can be performed by 2^(M-1) 2-point DCTs
Operation count for an N-point DCT:
Direct computation: × = N^2, + = N(N-1)
FCT: × = (N/2) log2(N), + < (3N/2) log2(N)
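A recursive sketch in the spirit of Lee's decomposition is shown below: an N-point DCT is split into two N/2-point DCTs of the sums and the scaled differences of mirrored inputs. It is an illustrative version, not the decoder's code; the function name `fct` is hypothetical, and a real implementation would precompute the 1 / (2 cos) factors so that each stage costs N/2 multiplications.

```python
import math

def fct(x):
    """Recursive fast DCT-II (unnormalized); len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    half = N // 2
    # Sums feed the even-indexed outputs, scaled differences the odd ones.
    g = [x[n] + x[N - 1 - n] for n in range(half)]
    h = [(x[n] - x[N - 1 - n]) / (2.0 * math.cos((2 * n + 1) * math.pi / (2 * N)))
         for n in range(half)]
    G = fct(g)
    H = fct(h) + [0.0]          # H[N/2] is zero by definition
    X = [0.0] * N
    for k in range(half):
        X[2 * k] = G[k]
        X[2 * k + 1] = H[k] + H[k + 1]
    return X
```

For N = 32 this gives (32/2) * log2(32) = 80 multiplications, against 32^2 = 1024 for the direct computation.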
WAVE STANDARD
Identified by a 44-byte header, it holds information about:
sampling frequency
number of channels ...
Uncompressed PCM audio samples (normally with 16 bits/sample resolution) are stored in the following way, one group of channels per sampling instant:
Channel: 1 (Left), 2 (Right) | 1 (Left), 2 (Right) | 1 (Left), 2 (Right) | ...
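The interleaved layout can be tried out with Python's standard `wave` module, which writes the 44-byte PCM header itself. This is a sketch for illustration, not the decoder's output routine; `write_wav` is a hypothetical helper and the 44.1 kHz default is an assumption.

```python
import io
import struct
import wave

def write_wav(left, right, sample_rate=44100):
    """Return the bytes of a 16-bit stereo WAV file built from two channel lists."""
    interleaved = []
    for left_sample, right_sample in zip(left, right):
        interleaved += [left_sample, right_sample]   # Left first, then Right
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)            # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)
        # wave expects native byte order and stores little-endian samples.
        w.writeframes(struct.pack("=%dh" % len(interleaved), *interleaved))
    return buf.getvalue()
```

Everything after the first 44 bytes is sample data, alternating Left and Right for each sampling instant.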
PERFORMANCE ANALYSIS
PC Performance
The decoder, without optimization, works in real time on the following CPUs:
The decoder, with optimization, reaches 17.5 on a Pentium IV CPU
PERFORMANCE ANALYSIS
C6711 DSK Performance
The decoder, without optimization, does not work in real time on the board:
The parallel port is used for data transfer and it is very slow
Most algorithms need optimization (only Huffman decoding is optimized)
The code needs some ASM optimization to use the full potential of the board architecture
The whole decoding process (except data transfer to the external hard disk) takes about 10 times longer than real-time operation allows. With optimization, real-time decoding is an easy goal to reach.
PERFORMANCE ANALYSIS
Time-occupation of optimized processes on C6711 DSK:
PERFORMANCE ANALYSIS
Time-occupation of other processes on C6711 DSK: