Level Decoder Architecture for CAVLD of H.264 Video Compression Standard


154 XXIII SIM - South Symposium on Microelectronics

2. Context Adaptive Variable Length Decoder

The decoder receives a compressed bitstream from the NAL input [RIC 03]. The data elements are entropy decoded and reordered to produce a set of quantized coefficients, which are sent to the inverse quantization block. In the Baseline profile, residual block data is decoded using the CAVLD scheme, and the other variable-length coded units are decoded using Exp-Golomb codes [SAL 00][INT 03][RIC 03].

The CAVLD process can be divided into the following main operations:

 Decoding the number of coefficients and trailing ones:

The first step in CAVLD is to decode the total number of non-zero coefficients (TotalCoeffs) and the number of trailing +/-1 values (T1). TotalCoeffs ranges from 0 to 16, while T1 ranges from 0 to 3. One of five look-up tables is selected in this step, depending on nC: Num-VLC0 (nC = -1), Num-VLC1 (0 <= nC < 2), Num-VLC2 (2 <= nC < 4), Num-VLC3 (4 <= nC < 8), and FLC (8 <= nC).

 Decoding the sign of each T1:

The coeff_token step above gives the value of T1. A single bit is then read for each trailing one to decode its sign (0 = +1, 1 = -1), in reverse order.

 Decoding the levels of the remaining non-zero coefficients:

The level of each remaining non-zero coefficient is decoded in reverse order. The VLC table used for each level is chosen according to the magnitude of the previously decoded levels, which is the context-adaptive feature of CAVLD.

 Decoding the total number of zeros before the last coefficient:

The total_zeros tables are used when decoding AC 4x4 blocks or DC 2x2 blocks. The VLC table used to decode total_zeros is chosen according to the total number of non-zero coefficients in the current AC 4x4 or DC 2x2 block.

 Decoding each run of zeros:

The number of zeros preceding each non-zero coefficient is decoded in reverse order. The VLC table for each run of zeros is chosen according to the number of zeros that have not been decoded yet (zeroleft).
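The operations above yield, for one block, the trailing ones with their signs, the remaining levels, total_zeros, and the runs of zeros. A minimal Python sketch of how these decoded symbols could be placed back into a scan-ordered coefficient array is given below. The function names and argument layout are hypothetical illustrations and are not part of the designed architecture; the table labels in `select_coeff_token_table` simply follow the naming used in this section.

```python
def select_coeff_token_table(nC):
    """Return the coeff_token table label for nC, using this section's labels."""
    if nC == -1:
        return "Num-VLC0"
    if 0 <= nC < 2:
        return "Num-VLC1"
    if 2 <= nC < 4:
        return "Num-VLC2"
    if 4 <= nC < 8:
        return "Num-VLC3"
    if nC >= 8:
        return "FLC"
    raise ValueError("nC must be -1 or non-negative")


def rebuild_coefficients(levels, total_zeros, runs, block_size=16):
    """Place decoded values into a scan-ordered array (hypothetical helper).

    levels      -- non-zero values in reverse scan order, trailing ones first
    total_zeros -- number of zeros before the last non-zero coefficient
    runs        -- zeros immediately preceding each value, in the same order;
                   missing entries are treated as runs of zero
    """
    coeffs = [0] * block_size
    if not levels:
        return coeffs
    pos = len(levels) + total_zeros - 1  # index of the last non-zero coefficient
    if pos >= block_size:
        raise ValueError("symbols do not fit in the block")
    for i, level in enumerate(levels):
        coeffs[pos] = level
        run = runs[i] if i < len(runs) else 0
        pos -= run + 1  # skip the zeros and move to the next position
    return coeffs
```

For example, `rebuild_coefficients([1, -1, -1, 1, 3], 3, [1, 0, 0, 1])` places five values and three zeros, giving a block that starts with `0, 3, 0, 1, -1, -1, 0, 1` and is zero afterwards.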

3. Level Decoder Architecture

The designed architecture is formed by a simple datapath, a finite state machine to control it, and a buffer.

Level decoding has a regular structure and does not need look-up tables. Bits are read from the input, and the level value is generated through a fixed sequence of steps.

A level code consists of a prefix and a suffix. The prefix is a sequence of zeros terminated by the first one bit; its length is given by tam_prefix. The suffix has a variable size, from 0 to 6 bits, which is adapted as level decoding proceeds, depending on the magnitudes of the previous levels. After the prefix and suffix are read from the input, CodeNo is computed as shown in (1), where table is the suffix size in bits.

CodeNo = ((tam_prefix x (2^table)) + suffix) (1)

The level value is obtained from CodeNo. If CodeNo is even, the level is calculated as in (2); otherwise, it is calculated as in (3).

Level = ((CodeNo + 2) /2) (2)

Level = - ((CodeNo + 1) /2) (3)
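Equations (1) to (3) can be exercised with a short Python sketch. The function below is a hypothetical software model, not the designed datapath: it takes the bitstream as a string of '0' and '1' characters, and it deliberately ignores escape codes and the special-case magnitude adjustment handled by the state machine.

```python
def decode_level(bits, table):
    """Decode one level code from a string of '0'/'1' characters.

    bits  -- bitstream starting at the first bit of the level code
    table -- current suffix size in bits (0 to 6)
    Returns (level, number_of_bits_consumed).
    """
    tam_prefix = bits.index("1")       # zeros before the first one bit
    pos = tam_prefix + 1               # first suffix bit
    if len(bits) < pos + table:
        raise ValueError("bitstream too short for the suffix")
    suffix = int(bits[pos:pos + table], 2) if table > 0 else 0

    code_no = tam_prefix * (2 ** table) + suffix   # equation (1)
    if code_no % 2 == 0:
        level = (code_no + 2) // 2                 # equation (2)
    else:
        level = -((code_no + 1) // 2)              # equation (3)
    return level, pos + table
```

With table = 1, the code `0010` has tam_prefix = 2 and suffix = 0, so CodeNo = 4 and the level is +3; the code `011` gives CodeNo = 3 and a level of -2.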

The datapath of the designed architecture is shown in Fig. 2. It comprises 2 adders, 1 multiplier, 1 decoder, 4 registers, 2 multiplexers, a module that adapts the decoding table, a module that arranges the buffer output at the suffix register input, a controlled NOT, and a shifter.

Fig.2 - Datapath for level decoder

The finite state machine is shown in Fig. 3. It has 10 states, which work as follows.

State 0 resets the architecture. State 1 starts the decoding process for a set of levels. States 2 and 3 read the prefix and the suffix from the input, respectively. State 3 takes just 1 cycle because the suffix bits are read in parallel from the buffer contents. CodeNo can then be generated as described above; this is done in state 4.

In state 5, the level value is generated from CodeNo. After that, further states handle context adaptation and special cases. State 6 adapts the suffix size based on the last decoded level. States 7 and 8 increment or decrement the level magnitude, a special case specified in the H.264/AVC standard. Finally, state 9 indicates that a level is ready, and the FSM returns to state 1, where a new cycle begins for the next level.

Fig.3 - Finite state machine for level control
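The suffix size adaptation performed in state 6 can be modeled in a few lines of Python. The text does not list the thresholds used, so the sketch below assumes the rule given in the H.264 recommendation [INT 03]: a suffix size of 0 becomes 1 after the first level, and the size grows by one bit, up to 6, whenever the magnitude of the last level exceeds 3 x 2^(table - 1). The function names are hypothetical.

```python
def initial_suffix_size(total_coeffs, t1):
    """Starting suffix size for a block (assumed H.264 rule)."""
    return 1 if (total_coeffs > 10 and t1 < 3) else 0


def next_suffix_size(table, level):
    """Suffix size to use for the next level, given the last decoded level."""
    if table == 0:
        table = 1                       # leave the suffix-free table after one level
    if abs(level) > (3 << (table - 1)) and table < 6:
        table += 1                      # large magnitude: use a longer suffix
    return table


# Trace the adaptation over a run of decoded levels (illustrative values).
table = initial_suffix_size(total_coeffs=5, t1=3)
trace = []
for level in [1, 4, -7, 2]:
    table = next_suffix_size(table, level)
    trace.append(table)
```

In this trace the suffix size moves from 0 to 1 after the first level, to 2 after the level of magnitude 4, to 3 after the level of magnitude 7, and then stays at 3, since the size never shrinks within a block.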

4. Synthesis Results

The synthesis of the Level Decoder architecture targeted the Altera Stratix III EP3SE50F484C2 FPGA [ALT 08]. The results are presented in Table 1.

Tab.1 - Synthesis Results

Architecture     Logic Cells    Frequency (MHz)    Throughput (Msamples/s)
Level Decoder    251            171.44             61.83

Device: EP3SE50F484C2

The results show that the level decoder uses 251 logic cells of the target FPGA. In terms of performance, the critical path estimated by the synthesis tool is 5.83 ns, allowing a maximum operating frequency of 171.44 MHz.

From the software evaluation, it was possible to estimate an average of 14 cycles to decode one level. Thus, at an operating frequency of 171.44 MHz, this architecture is able to generate 12.25 million levels per second. Moreover, the software evaluation showed that, on average, one level is generated for every 5.1 video samples. The throughput of this architecture was therefore estimated at 61.83 million samples per second.

Although the designed architecture has a simple datapath, it achieves high performance, meeting the requirements to process SDTV (720x480 pixels) frames in real time.
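The level rate and the real-time claim can be checked with a few lines of arithmetic. The Python sketch below recomputes the level rate from the reported clock frequency and cycle count, and compares the reported sample throughput with the sample rate of an SDTV stream. The frame rate of 30 frames per second and the 4:2:0 chroma format (1.5 samples per pixel) are assumptions, since the text does not state them.

```python
FREQ_HZ = 171.44e6            # reported maximum operating frequency
CYCLES_PER_LEVEL = 14         # reported average cycles per decoded level
REPORTED_THROUGHPUT = 61.83e6 # reported throughput in samples per second

levels_per_second = FREQ_HZ / CYCLES_PER_LEVEL


def required_sample_rate(width, height, fps, samples_per_pixel=1.5):
    """Samples per second of a video stream (4:2:0 assumed by default)."""
    return width * height * samples_per_pixel * fps


sdtv_rate = required_sample_rate(720, 480, 30)   # assumed 30 frames per second
margin = REPORTED_THROUGHPUT / sdtv_rate

print(f"levels per second: {levels_per_second / 1e6:.2f} M")
print(f"SDTV requirement:  {sdtv_rate / 1e6:.2f} Msamples/s")
print(f"throughput margin: {margin:.2f}x")
```

Under these assumptions, SDTV needs about 15.55 million samples per second, so the reported throughput leaves a margin of roughly four times.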

5. Conclusions and Future Works

This work presented an architecture for level decoding, which is part of the Entropy Decoder of the H.264/AVC video compression standard.

The designed solution presented good results. Its average processing rate is sufficient to process SDTV (720x480 pixels) frames in real time, fulfilling the requirements of the H.264/AVC standard.

As future work, we intend to use a first-one detector, as seen in [ALL 06], to increase the prefix reading speed and thereby maximize the capabilities of the architecture. Moreover, integrating the level decoder with the other building modules of the CAVLD and with the Exp-Golomb Decoder will provide a complete solution for the Entropy Decoder.


6. References

[ALL 06] M. Alle, J. Biswas, S. K. Nandy, High Performance VLSI Architecture Design for H.264 CAVLC Decoder. IEEE Application-specific Systems, Architectures and Processors, 2006.

[ALT 08] ALTERA Corporation. Available at: http://www.altera.com

[INT 07] INTERNATIONAL TELECOMMUNICATION UNION. Joint Video Team (JVT). Available at: <www.itu.int/ITUT/studygroups/com16/jvt/>.

[INT 03] INTERNATIONAL TELECOMMUNICATION UNION. ITU-T Recommendation H.264 (05/03): advanced video coding for generic audiovisual services. 2003.

[INT 05] INTERNATIONAL TELECOMMUNICATION UNION. ITU-T Recommendation H.264 (03/05): advanced video coding for generic audiovisual services. 2005.

[JOI 03] Joint Video Team, Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification, ITU-T Rec. H.264 and ISO/IEC 14496-10 AVC, May 2003.

[RIC 03] I. Richardson, H.264 and MPEG-4 Video Compression – Video Coding for Next-Generation Multimedia. Chichester: John Wiley and Sons, 2003.

[SAL 00] D. Salomon, Data Compression: The Complete Reference. New York: Springer, 2000.
