28th Picture Coding Symposium, PCS2010, December 8-10, 2010, Nagoya, Japan

IMPROVED CONTEXT MODELING FOR CODING QUANTIZED TRANSFORM COEFFICIENTS IN VIDEO COMPRESSION

Tung Nguyen1, Heiko Schwarz1, Heiner Kirchhoffer1, Detlev Marpe1, and Thomas Wiegand1,2
[ tung | hschwarz | kirchhof | marpe | wiegand ]@hhi.de

1 Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Image Processing Department, Einsteinufer 37, D-10587 Berlin, Germany

2 Image Communication Chair, Department of Telecommunication Systems, Technical University of Berlin, Einsteinufer 17, D-10587 Berlin, Germany

ABSTRACT

Recent investigations have shown that the support of extended block sizes for motion-compensated prediction and transform coding can significantly increase the coding efficiency for high-resolution video relative to H.264/AVC. In this paper, we present a new context-modeling scheme for the coding of transform coefficient levels that is particularly suitable for transform blocks greater than 8x8. While the basic concept for transform coefficient coding is similar to CABAC, the probability model selection has been optimized for larger block transforms. The proposed context modeling is compared to a straightforward extension of the CABAC context modeling; both schemes have been implemented in a hybrid video codec design that supports block sizes of up to 128x128 samples. In our simulations, we obtained overall bit rate reductions of up to 4%, with an average of 1.7%, for the proposed context modeling scheme.

Index Terms— context modeling, transform coding

1. INTRODUCTION

The most successful class of video coding designs is based on a combination of block-based motion compensation and transform coding. The state-of-the-art video coding standard H.264/AVC [1][2] supports block sizes ranging from 4x4 to 16x16 samples for motion compensation and transform sizes of 4x4 and 8x8 samples. Recent investigations [3] have shown, however, that the usage of larger block sizes for both motion compensation and transform coding can provide significant coding efficiency improvements, in particular for high-resolution source material (720p, 1080p, and beyond). Consequently, most proposals submitted in response to the latest Joint Call of the ITU-T Q.6/16 (VCEG) and ISO/IEC JTC1/WG11 (MPEG) for new video coding technologies [4] include large block transforms of up to 128x128 samples.

A highly efficient entropy coding scheme for transform coefficient levels (i.e., quantized transform coefficients) is specified in H.264/AVC as part of the context-based adaptive binary arithmetic coding (CABAC) [5]. It was initially designed [6] for 4x4 transform blocks and was later extended to 8x8 transform blocks in a straightforward way. The transform coefficient coding consists of two steps. In the first step, a significance map specifying the locations of transform coefficients unequal to 0, which are also referred to as significant transform coefficient levels, is coded. In the second step, the absolute values and signs of the significant transform coefficients are coded in reverse scanning order. For coding the significance map, the probability models used are selected depending on the scan position inside a transform block, while the probability model selection for the absolute values is based on the values of already coded transform coefficients inside the same transform block.

Our investigations show that a straightforward extension of the CABAC context modeling to larger block sizes is suboptimal, as will be further explained in sec. 2. We propose an improved context modeling that is particularly suitable for transform blocks larger than 8x8. The significance map is coded using an adaptive scan, and the corresponding probability models are selected depending on already coded symbols for neighboring transform coefficients. For coding the absolute transform coefficient levels, a transform block is partitioned into 4x4 blocks, and a set of probability models is adaptively chosen for each of these 4x4 blocks based on the transform coefficient levels in already coded 4x4 blocks of the same transform block.

2. CONTEXT MODELING IN H.264/AVC

In this section, we briefly review the CABAC transform coefficient coding of H.264/AVC and point out disadvantages of a straightforward extension to larger block sizes. For further details on the CABAC transform coefficient coding, the reader is referred to [1][5][6].

As mentioned in the introduction, the CABAC transform coefficient coding proceeds in two steps. In the first step, a significance map specifying the locations of significant transform coefficient levels is coded. For each coefficient in a pre-defined scanning order (typically, a zig-zag scan), a binary symbol SIG is transmitted, which specifies whether the corresponding transform coefficient level is unequal to 0. If the SIG symbol indicates a significant transform coefficient level, a second binary symbol LAST is sent, which specifies whether the current transform coefficient level is the last significant transform coefficient level inside the block. In the original design for 4x4 blocks, a distinct probability model, also referred to as a context model in the following, is used for each scanning position for both the SIG and LAST flags. In the extension for 8x8 blocks, a context model is used for four successive scanning positions.

The signs and absolute values of the significant transform coefficient levels are coded in reverse scanning order. All signs are coded using a fixed non-adaptive probability model. For coding the absolute values, a binary symbol ONE is transmitted first, which signals whether the absolute value is equal to 1 or greater than 1. If the absolute value is greater than 1, a further syntax element ABS is transmitted, specifying the absolute value minus 2. The non-binary syntax element ABS is first binarized using a concatenated unary/Exp-Golomb binarization. All binary symbols (bins) of the unary part are coded with the same adaptive probability model, while the bins of the Exp-Golomb part are coded with a fixed non-adaptive probability model. Depending on the already coded absolute values for a block, one of five context models is selected for coding the ONE flag and the bins of the unary part of the ABS syntax element. Let NG1 and NE1 be the number of already coded absolute values greater than 1 and the number of already coded absolute values equal to 1, respectively. One of the five context models for the ONE flag is chosen if NG1 is greater than 0. If NG1 is equal to 0, a separate context model is selected for each value of Min( 3, NE1 ). For the unary bins of the ABS syntax element, a distinct context model is used for each value of Min( 4, NG1 ).

The CABAC transform coefficient coding, which was originally designed for 4x4 blocks, has been extended to 8x8 blocks by using the same context model for the SIG and LAST flags of four successive scanning positions. The same concept can be used for a further extension to blocks greater than 8x8. However, we observed that the significant transform coefficients of large transform blocks are usually concentrated in particular regions of the transform block, which depend on the predominant structure of the transformed signal. Consequently, using the same context model for a number of successive transform coefficients in scanning order is suboptimal. Using a separate context model for each scan position is also inappropriate, since the corresponding increase in the number of context models results in a slow adaptation and often inaccurate estimates. As a further aspect, the context modeling for the syntax elements ONE and ABS also becomes unsuitable for larger block sizes, since a disproportionate part of these syntax elements is coded with the same probability model, although the corresponding bins usually have different statistics. Based on these observations, we propose a modified context modeling for the syntax elements SIG, LAST, ONE, and ABS that reduces the mentioned inefficiencies for large transform blocks.
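
For reference, the H.264/AVC ONE/ABS model selection reviewed above can be summarized as a small index computation. The following Python sketch follows the description given in this section; the function names, argument names, and concrete index values are our own illustrative choices and not the normative H.264/AVC derivation.

def one_flag_context(num_gt1, num_eq1):
    # Context model index (one of five) for the ONE flag:
    # a dedicated model once a level greater than 1 has been coded (NG1 > 0),
    # otherwise one model per value of Min(3, NE1).
    if num_gt1 > 0:
        return 4
    return min(3, num_eq1)

def abs_unary_context(num_gt1):
    # Context model index (one of five) for the unary bins of ABS:
    # one distinct model per value of Min(4, NG1).
    return min(4, num_gt1)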

3. PROPOSED CONTEXT MODELING

In order to reduce the inefficiencies of a straightforward extension of the CABAC transform coefficient coding to block sizes larger than 8x8, we propose a modified context modeling that takes into account our observations described in sec. 2. The context models for the syntax element SIG are selected based on already coded values for neighboring transform coefficients. Furthermore, the significance map is coded using a backward-adaptive scan pattern. For coding the absolute transform coefficient values, transform blocks larger than 4x4 are partitioned into 4x4 sub-blocks, and for each of these sub-blocks a set of context models is selected based on the absolute values of already transmitted 4x4 sub-blocks of the same transform block. The context modeling inside such a 4x4 sub-block is the same as in CABAC.

3.1. Adaptive scan order

The scan order for coding the significance map is determined backward-adaptively, based on the values of the already coded SIG flags inside the same transform block. The adaptivity is achieved by switching between two pre-defined scan patterns at certain scan positions. As illustrated for the example of an 8x8 block in Fig. 1, the first candidate scan pattern consists of a number of diagonal sub-scans from bottom-left to top-right, and the second candidate scan pattern consists of diagonal sub-scans from top-right to bottom-left. The scanning of these diagonal sub-scans always proceeds from the top-left to the bottom-right corner of a transform block. After a diagonal sub-scan is completed, it is decided which of the two scan patterns is used for the next diagonal. For this decision, the transform block is split into two triangles along the diagonal from the top-left to the bottom-right corner. If the number of already coded SIG flags equal to 1 is larger for the top-right triangle than for the bottom-left triangle, the next diagonal is scanned from top-right to bottom-left; otherwise, the opposite scanning direction is used.

Fig. 1: Scanning patterns for the significance map coding.


Our experimental results showed that the adaptive switching between the scan patterns reduces the average number of transmitted SIG flags and, hence, decreases the bit rate for transmitting transform coefficient levels. As an intuitive example, residual blocks often contain mainly vertical or horizontal structures resulting in a concentration of significant transform coefficients at the left or top border of a transform block. Due to the adaptive scan, the diagonal sub-scans are processed starting at the border where the majority of significant coefficients resides and, consequently, a smaller number of SIG flags is transmitted before the last significant transform coefficient in scanning order (with a LAST flag equal to 1) is reached.
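
A minimal sketch of the direction decision described above, assuming the already coded SIG flags are tracked per position: all names are our own, and positions on the splitting diagonal are assigned to neither triangle (a detail the paper does not spell out).

def next_diagonal_direction(coded_sig_flags):
    # coded_sig_flags: dict mapping (row, col) of already coded scan positions
    # to their SIG value (1 = significant, 0 = not significant).
    # Count significant positions strictly above (top-right triangle) and
    # strictly below (bottom-left triangle) the top-left-to-bottom-right diagonal.
    top_right = sum(v for (r, c), v in coded_sig_flags.items() if c > r)
    bottom_left = sum(v for (r, c), v in coded_sig_flags.items() if r > c)
    # More significant coefficients in the top-right triangle: scan the next
    # diagonal from top-right to bottom-left; otherwise use the opposite direction.
    if top_right > bottom_left:
        return "top_right_to_bottom_left"
    return "bottom_left_to_top_right"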

3.2. Context modeling for the significance map

For each supported transform block size and for each color plane, a separate set of context models is used for coding the SIG and LAST flags. For blocks larger than 8x8, the scan positions are divided into four classes based on their locations inside the transform block, and each class uses a separate sub-set of context models. The 2x2 block of scan positions in the top-left corner of the block forms the first class and uses one separate context model for each position. All other scan positions that are not directly located at the left or top border form the second class. For this class, the already coded SIG flags of the same block that are equal to 1 and reside inside a local template, as depicted in Fig. 2a, are counted in order to determine the applied context model. The cross in Fig. 2a marks the current scan position and the circles represent the scan positions of the corresponding local template. The number of significant scan positions inside the local template is divided by 2, and for each of the resulting 6 possible values a distinct context model is used. The remaining scan positions are divided into two classes, depending on whether they lie at the left or the top border. A reduced local template, as depicted in Fig. 2b, is used for determining the applied context model for these two classes. Similarly to the template of Fig. 2a, the number of significant transform coefficient levels inside the local template is divided by 2, and a distinct context model is associated with each of the 3 possible values. Hence, 16 context models are used per transform block size and color plane for coding the SIG flags. The proposed context modeling addresses the 'local activity' around the scan position that is being coded. If many SIG flags around the current scan position are equal to 1 (high activity), a different context model is used than if only a few SIG flags with a value of 1 are in the local neighborhood (low activity).

For coding the LAST flags of blocks larger than 4x4, 16 different context models are used. These are assigned to the LAST flags such that one or more consecutive diagonals of the scan share one context model. This is motivated by the observation that the probability of a LAST flag equal to 1 increases with the distance of the corresponding scan position from the top-left corner of a transform block, whereas the location inside a diagonal sub-scan does not have a significant influence.

For blocks smaller than 16x16, we use a different context modeling. For 4x4 blocks, the context modeling is done as specified in H.264/AVC. For 8x8 blocks, the transform block is decomposed into 16 sub-blocks of 2x2 samples, and each of these sub-blocks is associated with a separate context model for coding the SIG and LAST flags.

Fig. 2: Local templates for context model selection of the SIG flags: (a) general template, (b) reduced template for scan positions that are located at the left or top transform block border.
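
As an illustration of the class structure for blocks larger than 8x8, the sketch below maps a scan position and the significant-coefficient count obtained from its local template to one of the 16 SIG context models. The template shapes of Fig. 2 are not reproduced here (the count is taken as an input), and the concrete index layout 0-3 / 4-9 / 10-12 / 13-15 is our own convention, not taken from the paper.

def sig_context_index(x, y, template_sig_count):
    # x, y: position inside a transform block larger than 8x8, with (0, 0)
    # being the top-left corner; x runs horizontally, y vertically.
    # template_sig_count: number of already coded SIG flags equal to 1 inside
    # the local template of Fig. 2a (general case) or Fig. 2b (border case).
    if x < 2 and y < 2:
        return 2 * y + x                             # class 1: one model per position (4 models)
    if x > 0 and y > 0:
        return 4 + min(5, template_sig_count // 2)   # class 2: general template (6 models)
    if y == 0:
        return 10 + min(2, template_sig_count // 2)  # class 3: top border, reduced template (3 models)
    return 13 + min(2, template_sig_count // 2)      # class 4: left border, reduced template (3 models)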

3.3. Context modeling for coding the absolute values

For coding the absolute transform coefficient levels of significant transform coefficients in blocks larger than 4x4, the transform blocks are partitioned into 4x4 sub-blocks as illustrated in Fig. 3. The absolute values for each of the 4x4 sub-blocks are coded separately by employing the same context modeling as in CABAC for the syntax elements ONE and ABS inside such a sub-block. The transform coefficients are scanned in a reverse zig-zag scan, where five different context models are used for the ONE flags and five different context models are used for the unary part of the ABS syntax elements. However, in our design, five different context model classes, each consisting of such a set of 10 context models, are employed, and one of these context model classes is selected for each 4x4 sub-block in a backward-adaptive way. The order in which the sub-blocks are processed is defined by a zig-zag scan, as depicted by the bold arrow in Fig. 3. The context model class for a particular 4x4 sub-block is selected based on the number of transform coefficient levels with an absolute value greater than 1 in the previously processed 4x4 sub-block. This number, which is in the interval [0,16], is divided by 4, resulting in a class index in the interval [0,4]. A distinct context model class is associated with each of the 5 possible class indices. For the first sub-block inside a transform block, the class index 4 is used.

The modified context modeling adapts to the local properties inside large transform blocks while retaining the benefits of the original CABAC design. By adaptively selecting a context model class for each 4x4 sub-block, a more reliable probability modeling is obtained for coding the syntax elements ONE and ABS, which results in a more efficient coding of transform coefficient levels.

Fig. 3: Example of the partitioning of an 8x8 block into 4x4 sub-blocks and the corresponding scanning order.
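
A minimal sketch of the backward-adaptive class selection described above, assuming the number of levels with absolute value greater than 1 is tracked for each processed sub-block; the convention of passing None for the first sub-block is ours.

def subblock_context_class(prev_num_gt1):
    # prev_num_gt1: number of transform coefficient levels with absolute value
    # greater than 1 in the previously processed 4x4 sub-block (0..16),
    # or None for the first sub-block of a transform block.
    if prev_num_gt1 is None:
        return 4                  # first sub-block: class index 4
    return prev_num_gt1 // 4      # maps 0..16 to a class index in 0..4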


4. EXPERIMENTAL RESULTS

The efficiency of the proposed context modeling is evaluated using a hybrid video codec design [7] that supports motion compensation and transform blocks of up to 128x128 samples and provides average bit rate savings of about 30% relative to H.264/AVC with a similar encoder control. The proposed context modeling is compared to the straightforward extension of the CABAC context modeling described in sec. 2.

The same configuration and encoder control are used for both coding schemes. The test sequences and coding conditions are selected according to [4], where a low-delay and a high-delay configuration are specified. For the low-delay configuration, the conventional IPPP coding structure is used, while the popular hierarchical-B coding structure with a GOP size of 8 pictures and an intra refresh of about 1 second is employed for the high-delay coding. For both settings, the transform coefficient levels are determined by rate-distortion optimized quantization [8].

The overall bit rate savings obtained with the proposed context modeling are summarized in Table 1. It can be seen that the proposed technique always provides a bit rate reduction. The average bit rate reduction is 1.63% for the high-delay case and 1.78% for the low-delay case. The highest coding efficiency gains are obtained for high-resolution sequences.

Table 1: Bit rate savings for high and low delay coding.

Sequence                        GOP 8     IPPP
BQSquare (416x240)              1.33 %    1.90 %
BasketballPass (416x240)        1.30 %    1.55 %
BlowingBubbles (416x240)        1.52 %    1.49 %
RaceHorses (416x240)            0.63 %    1.12 %
BQMall (832x480)                1.85 %    2.06 %
BasketballDrill (832x480)       2.44 %    1.97 %
PartyScene (832x480)            0.94 %    0.97 %
RaceHorses (832x480)            0.96 %    0.81 %
BQTerrace (1920x1080)           2.67 %    2.63 %
BasketballDrive (1920x1080)     3.48 %    3.99 %
Cactus (1920x1080)              1.66 %    1.75 %
ParkScene (1920x1080)           1.31 %    1.25 %
Kimono (1920x1080)              1.08 %    0.60 %
Vidyo1 (1280x720)               -         2.24 %
Vidyo3 (1280x720)               -         2.12 %
Vidyo4 (1280x720)               -         2.02 %
Average for 1920x1080           2.04 %    2.04 %
Average for 832x480             1.55 %    1.45 %
Average for 416x240             1.19 %    1.51 %
Average for 1280x720            -         2.12 %
Average                         1.63 %    1.78 %

5. CONCLUSIONS

We presented an improved context modeling scheme for the coding of quantized transform coefficients in hybrid video coding designs that is particularly suitable for transform blocks greater than 8x8. The key elements of the proposed approach are an adaptive scan pattern, a probability model selection based on coded syntax elements for neighboring coefficients, and a partitioning of the transform block into smaller sub-blocks with a backward-adaptive context model selection. The simulation results indicate that the proposed approach provides coding efficiency gains for video codec designs that support extended block size transforms, relative to a straightforward extension of the state-of-the-art CABAC context modeling.

REFERENCES

[1] ITU-T and ISO/IEC, "Advanced Video Coding for Generic Audiovisual Services," ITU-T Rec. H.264 and ISO/IEC 14496-10, version 13, 2010.

[2] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 560-576, July 2003.

[3] P. Chen, Y. Ye, and M. Karczewicz, "Video Coding Using Extended Block Sizes," ITU-T Q6/16, Doc. VCEG-AJ23, Oct. 2008.

[4] ITU-T Q6/16 and ISO/IEC JTC1/SC29/WG11, "Joint Call for Proposals on Video Compression Technology," Doc. VCEG-AM90 and WG11 N11113, Jan. 2010.

[5] D. Marpe, H. Schwarz, and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 620-636, July 2003.

[6] H. Schwarz, D. Marpe, G. Blättermann, and T. Wiegand, "Improved CABAC," Joint Video Team of ITU-T Q.6/16 and ISO/IEC JTC1/WG11, Doc. JVT-C060, May 2002.

[7] M. Winken, S. Boße, B. Bross, P. Helle, T. Hinz, H. Kirchhoffer, H. Lakshman, D. Marpe, S. Oudin, M. Preiß, H. Schwarz, M. Siekmann, K. Sühring, and T. Wiegand, "Description of the Video Coding Technology Proposal by Fraunhofer HHI," JCT-VC, Doc. JCTVC-A116, Apr. 2010.

[8] M. Karczewicz, Y. Ye, and I. Chong, "Rate-Distortion Optimized Quantization," ITU-T Q6/16, Doc. VCEG-AH21, Jan. 2008.

