H.264/AVC stereo video compression benchmarking

June 16, 2017 | Author: Davide Aliprandi | Category: Video Compression, Benchmarking, AVC, H, D


Subarna Tripathi#, Emiliano Mario Piccinelli*, Davide Aliprandi*

(Advanced System Technology labs, STMicroelectronics, India#, Italy*) {subarna.tripathi, emiliano.piccinelli, davide.aliprandi}@st.com

Abstract: This paper studies techniques for stereo video compression based on different view packing formats and benchmarks their performance. By “view packing” we mean encapsulating the two views (left and right) together in each single frame at lower resolution. In particular, we analyze the quality versus compression behavior of the following main packing schemes: temporal, spatial row/column, spatial side-by-side/up-down, half-flipping, and checkerboard (quincunx). Any legacy H.264/AVC-capable decoder can already decode the two packed views together, but it cannot determine on its own which packing format it is receiving. The H.264 standard addresses this problem with the stereo SEI message, added as part of the Fidelity Range Extensions; with it, the decoder can also potentially adapt any stereo content to the native input data format required by the display panel. More recently, the H.264/MVC standard evolved as an extension to H.264 for coding multiple views; it exploits multi-view prediction and uses a number of new techniques for better inter-view prediction.

Key-words: 3D Video, Video Coding, Stereo video, H.264 simulcast, H.264 SEI and MVC

1 Introduction

3DTV has become increasingly popular over the last few years, following developments in 3D services and in displays that present a different view to each eye. 3D video compression is gaining attention for both mobile and living-room applications. The most popular video compression standard currently available is H.264 [1]. The stereo SEI message, added to the standard as part of the Fidelity Range Extensions (FRExt, 2005), enables an existing H.264 decoder to decode a compressed stereo video and eventually display it. More recently, the H.264/MVC standard [2] evolved as an extension to H.264 for coding multiple views; it exploits multi-view prediction and uses a number of new techniques for better inter-view prediction.

This paper highlights the strategy for stereo video encoding using H.264/AVC with the stereo SEI message, and compares its performance with AVC simulcast and H.264/MVC. Different stereo formats describe various methods of L/R packing, e.g. temporal, spatial row/column, spatial side-by-side/up-down, half-flipping, and checkerboard (quincunx). In practice, the formats used for stereo video coding are quite diverse [3].

The salient features of stereo video are as follows. Synchronized stereo pairs are captured using special 3D cameras, with a distance (baseline) between the two views equal to the average eye distance (usually 6.5 cm). Intuitively, a stereo video stream carries twice the amount of data of a single-view stream to be stored or transmitted, and therefore needs twice the broadcast infrastructure, decoder and connectivity capability. For these reasons, and to enable new 3D TV services early, the left and right views are combined into one frame. In this way the same encoder, infrastructure, decoder and connectivity can be fully reused without changes. The price is a degradation in quality, although the redundancy between the left and right views can still be exploited.

2 Different Stereo Formats

2D-compatible simulcast is the obvious case (figure 1): the two views are encoded as two separate streams, which takes twice the bitrate and twice the storage. Twice the infrastructure is also needed, i.e. new consumer devices with an extra decoder. The D-Cinema solution (a pair of JPEG 2000 streams) is one application of this approach.

A frame-compatible system for H.264 with the stereo SEI message is shown in figure 2. Examples include side-by-side, side-half-flipped, checkerboard, up-down, up-down-half-flipped and line-interleaved packing (figure 5). In the first three cases the views are horizontally decimated by 50% and then packed together as a single frame, whereas in the latter three cases the two views are vertically decimated by 50% and then combined as a single frame. In the side-half-flipped and up-down-half-flipped cases, the right view is flipped before packing. Only line-interleaved frame packing (similar to a field picture) matches the XPol and μPol pixel matrix [6] [7].
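As an illustration of the packing arrangements described above, the following Python sketch packs two equally sized views into one frame of the original size. It is a minimal sketch, not the utility used in this work: the function names are hypothetical, views are plain lists of pixel rows, decimation simply drops every second column or row without the low-pass filter a real system would apply, and the flip directions (horizontal mirror for side-half-flipped, vertical mirror for up-down-half-flipped) are assumptions, since the text does not specify them.

```python
# Each view is a list of rows; each row is a list of pixel values (one plane).

def decimate_columns(view, phase=0):
    """Keep every second column, starting at column `phase` (no pre-filter)."""
    return [row[phase::2] for row in view]


def decimate_rows(view, phase=0):
    """Keep every second row, starting at row `phase` (no pre-filter)."""
    return [list(row) for row in view[phase::2]]


def pack_side_by_side(left, right, flip_right=False):
    """Half-width left view on the left, half-width right view on the right."""
    l, r = decimate_columns(left), decimate_columns(right)
    if flip_right:
        # Assumed meaning of side-half-flipped: mirror the right view horizontally.
        r = [row[::-1] for row in r]
    return [lrow + rrow for lrow, rrow in zip(l, r)]


def pack_up_down(left, right, flip_right=False):
    """Half-height left view on top, half-height right view below."""
    l, r = decimate_rows(left), decimate_rows(right)
    if flip_right:
        # Assumed meaning of up-down-half-flipped: mirror the right view vertically.
        r = r[::-1]
    return l + r


def pack_line_interleaved(left, right):
    """Even output rows carry the left view, odd output rows the right view."""
    l, r = decimate_rows(left), decimate_rows(right)
    packed = []
    for lrow, rrow in zip(l, r):
        packed.append(lrow)
        packed.append(rrow)
    return packed


# Two small 4x4 test views with recognisable sample values.
left = [[10 * y + x for x in range(4)] for y in range(4)]
right = [[100 + 10 * y + x for x in range(4)] for y in range(4)]
side_by_side = pack_side_by_side(left, right)
up_down = pack_up_down(left, right)
interleaved = pack_line_interleaved(left, right)
```

In every case the packed frame has the same dimensions as one input view, which is what allows an unmodified H.264 encoder and decoder to handle it.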

Figure 1: H.264 Simulcast compression of two views (block diagram: each input view, L and R, passes through its own H.264 encoder, channel and H.264 decoder to produce outputs L and R)

Figure 2: H.264 stereo compression of two views (block diagram: inputs L and R, L/R packing, H.264 encoder, channel, H.264 decoder, L/R depacking, outputs L and R)

Figure 3: Checkerboard: DLP® 3-D HDTV video format

Figure 4: Stereo compression block-diagram (left and right inputs, decimation, decimated views, decimated reconstructed views, up-sampling, reconstructed left and right views)

In the checkerboard pattern, to accommodate the two views in a single progressive sequence, each view of the original stereo pair is decimated horizontally by a factor of 2, with a one-pixel shift that toggles line by line (figure 3). The result is a pair of diagonally (filtered and) decimated images arranged together in a checkerboard-like output pattern. This format is natively supported by DLP and plasma displays, but it is considered comparatively difficult to implement.

Combining two full-resolution views would produce a 2D-compatible frame whose resolution cannot be accommodated in HDTV resolution, so each view must be decimated horizontally or vertically (figure 4). Finally, the up-sampled reconstructed left and right views are compared by PSNR against the input left and right views, which serve as the respective references. We have also added results for MVC [2] compression of the two views. Multiview Video Coding, an extension of H.264, is assumed to be optimal in terms of compression efficiency.
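The checkerboard arrangement and the PSNR comparison can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, with the codec stage of figure 4 omitted: the function names are hypothetical, the diagonal pre-filter is skipped, the choice of the left view for the top-left sample is arbitrary, missing samples are restored by copying a horizontal neighbour rather than by a proper up-sampling filter, and an 8-bit peak value of 255 is assumed for PSNR.

```python
import math


def pack_checkerboard(left, right):
    """Quincunx packing: samples of the two views alternate along each row,
    and the starting view toggles from one row to the next."""
    packed = []
    for y, (lrow, rrow) in enumerate(zip(left, right)):
        row = []
        for x in range(len(lrow)):
            take_left = (x + y) % 2 == 0
            row.append(lrow[x] if take_left else rrow[x])
        packed.append(row)
    return packed


def unpack_checkerboard(packed):
    """Split a checkerboard frame into two full-size views (width >= 2).

    Each missing sample is filled with the horizontally adjacent sample,
    which always belongs to the view being filled. This is a crude
    stand-in for a real up-sampling filter.
    """
    height, width = len(packed), len(packed[0])
    left = [[0] * width for _ in range(height)]
    right = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            own, other = (left, right) if (x + y) % 2 == 0 else (right, left)
            own[y][x] = packed[y][x]
            neighbour_x = x + 1 if x + 1 < width else x - 1
            other[y][x] = packed[y][neighbour_x]
    return left, right


def psnr(reference, test, peak=255):
    """PSNR in dB between two equally sized planes: 10*log10(peak^2 / MSE)."""
    squared_error = 0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            count += 1
    if squared_error == 0:
        return math.inf  # identical planes
    return 10 * math.log10(peak * peak / (squared_error / count))


# Flat test views: the round trip is lossless, so PSNR is infinite.
flat_left = [[50] * 4 for _ in range(4)]
flat_right = [[200] * 4 for _ in range(4)]
frame = pack_checkerboard(flat_left, flat_right)
recon_left, recon_right = unpack_checkerboard(frame)
```

With real content the reconstructed views differ from the inputs even before any coding loss, because half of the samples of each view are discarded by the decimation.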

3 Stereo SEI Message

The Stereo SEI message [5] and the Frame Packing Arrangement SEI message (the latter in 14496-10/5e Amd.1, to be finalized in future) are responsible for supporting AVC stereo coding. We have upgraded the reference encoder JM15.1 [4] with support for the Stereo_SEI message. The frame packing arrangement SEI message is not yet finalized by the standard, so it has not been implemented, although utilities for all frame-packing arrangements, i.e. combining two views and extracting two views, are implemented as stand-alone tools. According to the standard, the stereo video information SEI message syntax is:

    stereo_video_info( payloadSize ) {
        field_views_flag
        if( field_views_flag )
            top_field_is_left_view_flag
        else {
            current_frame_is_left_view_flag
            next_frame_is_second_view_flag
        }
        left_view_self_contained_flag
        right_view_self_contained_flag
    }
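For illustration, the syntax above can be read with a few lines of Python. The sketch assumes that each flag is coded as a single bit, most significant bit first, and that the payload bytes have already been extracted from the SEI NAL unit. The names read_bits and parse_stereo_video_info are hypothetical and are not part of the JM software.

```python
def read_bits(data):
    """Yield the bits of a bytes object, most significant bit first."""
    for byte in data:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def parse_stereo_video_info(payload):
    """Parse a stereo_video_info payload into a dict of flag values.

    Follows the branch structure of the syntax: the field branch carries
    one extra flag, the frame branch carries two. Any alignment bits that
    follow the last flag are ignored.
    """
    bits = read_bits(payload)
    info = {"field_views_flag": next(bits)}
    if info["field_views_flag"]:
        info["top_field_is_left_view_flag"] = next(bits)
    else:
        info["current_frame_is_left_view_flag"] = next(bits)
        info["next_frame_is_second_view_flag"] = next(bits)
    info["left_view_self_contained_flag"] = next(bits)
    info["right_view_self_contained_flag"] = next(bits)
    return info


# Frame-based example: bits 0 1 1 1 0, followed by assumed alignment bits 1 0 0.
frame_info = parse_stereo_video_info(bytes([0b01110100]))
# Field-based example: bits 1 1 0 0, followed by assumed alignment bits 1 0 0 0.
field_info = parse_stereo_video_info(bytes([0b11001000]))
```

Both branches fit in a single byte, so the payload size of this message is one byte under the one-bit-per-flag assumption.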

Payload type 21 identifies the stereo_video_info SEI message. This SEI message tells the decoder that the entire coded video sequence consists of pairs of pictures forming stereo-view content. The flags have the following meaning.

field_views_flag equal to 1 indicates that all pictures in the coded video sequence are fields, and that all fields of a particular parity are considered a left view and all fields of the opposite parity a right view.

top_field_is_left_view_flag equal to 1 indicates that the top fields in the video sequence represent a left view.

current_frame_is_left_view_flag equal to 1 indicates that the current picture is the left view of a stereo-view pair.

next_frame_is_second_view_flag equal to 1 indicates that the current picture and the next picture in output order form a stereo-view pair, and that the display time of the current picture should be delayed to coincide with the display time of the next picture in output order.

left_view_self_contained_flag equal to 1 indicates that no inter prediction operations for the left-view pictures of the coded video sequence refer to reference pictures that are right-view pictures.

right_view_self_contained_flag equal to 1 indicates that no inter prediction operations for the right-view pictures of the coded video sequence refer to reference pictures that are left-view pictures.
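A decoder-side use of these flags can be sketched in Python as follows. This is a hypothetical helper, not taken from the standard or from JM: it handles only the frame-based case, relies only on the "equal to 1" semantics stated above, and assumes the flags for each picture are already available as a dictionary.

```python
def pair_frames(frames):
    """Group frame-coded pictures into (left, right) pairs.

    `frames` is a list of (picture_id, flags) tuples in output order, where
    `flags` holds current_frame_is_left_view_flag and
    next_frame_is_second_view_flag for that picture.
    """
    pairs = []
    i = 0
    while i + 1 < len(frames):
        picture, flags = frames[i]
        if flags["next_frame_is_second_view_flag"] == 1:
            partner = frames[i + 1][0]
            if flags["current_frame_is_left_view_flag"] == 1:
                pairs.append((picture, partner))
            else:
                pairs.append((partner, picture))
            i += 2  # both pictures of the pair are consumed
        else:
            i += 1  # not the first picture of a pair; move on
    return pairs


def independently_decodable_views(flags):
    """List the views that never reference the other view in inter prediction."""
    views = []
    if flags["left_view_self_contained_flag"] == 1:
        views.append("left")
    if flags["right_view_self_contained_flag"] == 1:
        views.append("right")
    return views


# Example sequence: the first pair is sent left-first, the second right-first.
sequence = [
    ("p0", {"current_frame_is_left_view_flag": 1, "next_frame_is_second_view_flag": 1}),
    ("p1", {"current_frame_is_left_view_flag": 0, "next_frame_is_second_view_flag": 0}),
    ("p2", {"current_frame_is_left_view_flag": 0, "next_frame_is_second_view_flag": 1}),
    ("p3", {"current_frame_is_left_view_flag": 1, "next_frame_is_second_view_flag": 0}),
]
stereo_pairs = pair_frames(sequence)
```

A view reported as self-contained could in principle be decoded and shown on its own by a 2D-only device, which is one way such a stream stays backward compatible.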

4 Experimental Results

The simulations for the three stereo video coding approaches, H.264/AVC simulcast, H.264/AVC with the stereo SEI message, and MVC, were configured under comparable and realistic conditions. Each coding approach was evaluated with several representative stereo test data sets that cover different types and levels of scene content complexity and temporal variation. Some of the test data sets consist of a left and a right view, each with a resolution of 480×270 pixels, a length of 4 to 10 seconds, and a frame rate of 25 fps. The others have a spatial resolution of 1024×768. Constant quantization parameters of 24, 30, 36 and 42 are used in all cases. The intra period is 16 and the GOP structure is in general IPBPBPB…. Views are decimated before frame packing (figure 4). In figures 6 and 7, view decimation is applied except for MVC, simulcast and stereo frame pairs.

Figure 5: Horse: different frame-packing methods (a to f: side-by-side, up-down, line-interleaved, side-half-flipped, up-down-half-flipped, checkerboard)

Line-interleaved packing performs quite well here. The reason is the availability of a compatible macroblock mode, i.e. field mode. For the other frame-packing mechanisms the standard has no compatible macroblock coding tool; for example, there is no tool that exploits alternate columns the way field mode exploits alternate rows. The checkerboard case performs worst for the same reason: H.264 has no compatible macroblock mode, and the existing frame macroblock mode hampers macroblock prediction and eventually produces a very high residual.

Figure 6b shows the frame-pair and simulcast cases very close to each other, because the GOP structure used (IBPB...) cannot exploit inter-view prediction efficiently. Figures 6c and 7b show the RD performance for Ballet and Breakdancers, respectively, without B pictures. Here MVC also has no B pictures and, as expected, the stereo frame-pair case (which can fully exploit inter-view prediction with an IPPPP… structure) is the best performer against simulcast. Please refer to figure 8 for the stereo packing formats.
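The field-mode argument made above can be illustrated with a small Python sketch. It labels every sample of a packed frame with the view it came from ("L" or "R") and then splits the frame into top and bottom fields, which is the row separation that field coding works on. The helper names are hypothetical and the sketch says nothing about actual encoder behavior; it only shows which packings become single-view after a field split.

```python
def line_interleaved(height, width):
    """Even rows from the left view, odd rows from the right view."""
    return [["L" if y % 2 == 0 else "R"] * width for y in range(height)]


def column_interleaved(height, width):
    """Even columns from the left view, odd columns from the right view."""
    return [["L" if x % 2 == 0 else "R" for x in range(width)] for _ in range(height)]


def checkerboard(height, width):
    """Views alternate along each row, with the start toggling every row."""
    return [["L" if (x + y) % 2 == 0 else "R" for x in range(width)]
            for y in range(height)]


def split_fields(frame):
    """Top field = even rows, bottom field = odd rows."""
    return frame[0::2], frame[1::2]


def views_in(rows):
    """Set of view labels present in a group of rows."""
    return {label for row in rows for label in row}


# Which views does each field contain, for each packing?
summary = {}
for name, builder in [("line", line_interleaved),
                      ("column", column_interleaved),
                      ("checkerboard", checkerboard)]:
    top, bottom = split_fields(builder(4, 4))
    summary[name] = (views_in(top), views_in(bottom))
```

Only the line-interleaved frame yields fields that each hold a single view, so prediction inside a field stays within one view. For column-interleaved and checkerboard frames both fields still mix left and right samples, which is consistent with the higher residual reported for checkerboard.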

5 Conclusion

This paper studies stereo video representation formats and their comparative compression efficiency. In any type of video coding, the same amount of raw input data can lead to very different RD performance. The experimental results showed that the bitrate required to achieve acceptable quality depends mainly on the complexity of the sequence content. The coding gain from inter-view prediction (stereo SEI and MVC) varies widely: it leads to a significant reduction of bitrate (up to 35% in our experiments) for some sequences, but to negligible gains for others.

In general, line-interleaved frame packing proves better in terms of compression efficiency because of the appropriate field-macroblock coding support in the encoder. The checkerboard pattern's performance is comparatively inferior. Other packing methods such as side-by-side, up-down and half-flipped are more or less equivalent in terms of performance. MVC is always better than the state-of-the-art simulcast. Sometimes the performance of AVC with the stereo SEI message (with frame packing) comes quite close to that of MVC, which makes it a very promising first step towards the adoption of 3D TV at home while keeping the existing infrastructure unchanged.

Figure 6a: Ballet (1024x768)

Figure 6b: RD-performance of Ballet with B picture

Figure 6c: RD-performance of Ballet without B picture

Figure 7a: Breakdancers (1024x768)

Figure 7b: RD-performance of Breakdancers without B picture

Figure 8: Stereo packing formats (legend entries: Line interleaved, Simulcast, Frame-pair, Side by side, Up down, Checkerboard, MVC, Side half flipped, Down half flipped)

6 References

[1] ITU-T Recommendation H.264, “Advanced video coding for generic audiovisual services”, November 2007.

[2] Annex H, Multiview Video Coding, ITU-T Recommendation H.264, “Advanced video coding for generic audiovisual services”, March 2009.

[3] Walt Husak, “Issues in Broadcast Delivery of 3D”, DOLBY.

[4] H.264/AVC reference software, JVT15.1.

[5] Annex D, Supplemental Enhancement Information, ITU-T Recommendation H.264, “Advanced video coding for generic audiovisual services”, November 2007.

[6] http://www.ddd.com/files/Hyundai-P240W.pdf

[7] http://www.media.mit.edu/spi/SPIPapers/sab/upol-3D.pdf
