An Efficient Fast Intra Mode Decision Method Based on Orthogonal Modes Elimination
Hossein Pejman, Student Member, IEEE, and Farzad Zargari, Senior Member, IEEE
Abstract — One of the computationally intensive stages in the H.264/AVC encoder is intra prediction, because rate-distortion optimization (RDO) is performed for all possible modes to select the optimum intra prediction mode. Fast intra mode decision methods have been introduced to reduce the number of modes tested by RDO to a few prediction modes. In this paper, a new fast mode decision method is introduced for the H.264/AVC encoder. The proposed method is based on the idea that when one of the prediction modes achieves good RDO performance, its orthogonal prediction mode will not perform well. As a result, we use simple measures to select only one mode of each orthogonal pair and perform RDO only for the selected modes. Simulation results indicate that the proposed method achieves higher peak signal-to-noise ratio (PSNR) and lower bit-rate than other fast mode decision methods that have a hardware realization. Moreover, we propose a three-stage pipelined architecture for our fast mode decision method, which can operate at a maximum clock rate of 175 MHz. Synthesis results indicate that the three-stage pipelined architecture achieves a lower gate count and a higher maximum clock rate than the hardware realizations of other fast mode decision methods, and that it can be used to encode real-time H.264/AVC video up to level 5.1.
Index Terms — Intra prediction, fast mode decision, rate-distortion optimization, orthogonal modes, H.264/AVC.
Hossein Pejman is with the Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran (e-mail: [email protected]). Farzad Zargari is with the Department of Information Technology, Research Institute for ICT, formerly known as Iran Telecom Research Center (ITRC), Tehran, Iran (e-mail: [email protected]).
I. INTRODUCTION
The H.264/AVC standard [1], developed by the Joint Video Team (JVT), achieves better compression performance than all previous standards of the H.26x and MPEG families [2], [3]. It provides this performance by employing advanced coding techniques such as intra-spatial prediction, variable block size motion estimation, and multiple reference frames [3]. The H.264/AVC standard uses intra prediction to reduce the spatial redundancy among adjacent blocks, and several intra prediction modes are available to encode an I-block. Rate-distortion optimization (RDO) is employed in the standard to select the optimum prediction mode for a block [4]. However, selecting the optimum prediction mode by applying RDO to all possible modes imposes a high computational load on the encoder.
There are two approaches to reducing the computational load of intra prediction. The first approach comprises methods that speed up the full mode search. Tseng et al. [5] and Sarwer and Wu [6] use a simple method to estimate the bit-rate (BR) and distortion without the complicated RDO technique. Parallel processing units are employed to generate the intra prediction modes of each block simultaneously [7], [8]. The encoding order is changed to reduce the stalls in the pipelined architecture for intra prediction [9]. The similarity of adjacent pixels is exploited to reduce the computational load of intra prediction [10].
The second approach, namely fast mode decision, employs simple methods to reduce the number of possible intra prediction modes for a block and applies RDO to only a few modes. Lin et al. [11] employed the DCT to reduce the number of intra-frame prediction modes. Pan et al. [12] used the Sobel operator to produce an edge direction histogram, which is then used to reduce the number of possible prediction modes. Tsai et al. [13] and Bharanitharan et al. [14] predicted the highly probable modes by employing intensity gradient filter and discrete cross difference (DCD) algorithms, respectively. Zeng et al. [15] proposed a hierarchical method that uses a threshold derived from the quantization parameter (QP) to estimate the smoothness of a macroblock (MB) and to decide between the 16×16 and 4×4 intra prediction mode groups. Quan and Ho [16] employed the variance of 4×4 and 16×16 blocks to select candidate modes, but their algorithm requires complex computations and is not suitable for hardware implementation. Kwon et al. [17] proposed a fast mode decision algorithm based on video characteristics and region of interest (ROI); this algorithm applies only to 4×4 blocks. Wang et al. [18] proposed the dominant edge strength (DES) algorithm, which reduces the prediction modes using MPEG-7 feature descriptors. In this algorithm, three directional modes along with the DC mode are selected for RDO. One of the difficulties in the hardware realization of DES is the multiplication by √2 that is used in the MPEG-7 feature descriptors. Hung et al. [19] modified the DES algorithm by employing the most probable mode (MPM) introduced in H.264/AVC intra coding. Tsai et al.
[20] proposed the pixel-based direction detection (PDD) and subblock-based direction detection (SDD) algorithms to decide on the candidate modes in I-block coding. Miao and Fan [21] proposed an extension of the PDD algorithm. Both PDD and its improved version suffer from a high number of difference vectors, which increases the computational load of the algorithms. SDD subsamples 2×2, 4×4, and 8×8 blocks to produce 2×2 blocks and in this way reduces the computational load relative to PDD; however, subsampling 8×8 blocks of 64 pixel values increases the hardware resources required by the architecture proposed for SDD.
In this paper, we propose a fast mode decision method and its hardware realization. Our method is based on the idea that only one mode of each pair of orthogonal prediction modes is selected for RDO. Moreover, we reuse the results computed while choosing the candidate modes of the 4×4 blocks to select the candidate
16×16 luma mode. In this way, we achieve a further reduction in the computational load of both the software and hardware realizations of our method. Simulation results indicate that our method reduces the encoding time for I-frames by 60% compared with the H.264/AVC reference software. The proposed method achieves better peak signal-to-noise ratio (PSNR) and BR than SDD, DES, and modified DES. Moreover, we propose a three-stage pipelined architecture for our method, which achieves a maximum clock rate of 175 MHz. Synthesis results indicate that the hardware realization of our method achieves a lower gate count, lower power consumption, and a higher clock rate than the hardware realizations of DES and SDD.
The rest of the paper is organized as follows. In Section II, we discuss the proposed fast mode decision method. The hardware architecture for our method is presented in Section III. Simulation and synthesis results are given in Section IV, followed by concluding remarks in Section V.
II. PROPOSED FAST INTRA MODE DECISION METHOD
The H.264/AVC standard performs intra prediction in 4×4 and 16×16 blocks for the luma component and in 8×8 blocks for the chroma components. There are nine intra prediction modes for 4×4 blocks (Fig. 1) and four prediction modes for 16×16 blocks. Both chroma components use the same prediction mode, and the prediction modes for 8×8 chroma blocks are similar to those for 16×16 luma blocks. Hence, there are 13 possible prediction modes for luma blocks and four prediction modes for chroma blocks.
Mode  Name                  Angle
0     Vertical              90°
1     Horizontal            0°
2     DC                    -
3     Diagonal down left    45°
4     Diagonal down right   135°
5     Vertical right        112.5°
6     Horizontal down       157.5°
7     Vertical left         67.5°
8     Horizontal up         22.5°
Fig. 1. Prediction modes for intra 4×4 blocks
Fast mode decision algorithms reduce the number of possible prediction modes and apply RDO only to the reduced set. A number of fast mode decision algorithms compare all the possible 4×4 modes and select the best prediction mode along with its adjacent modes for RDO [11], [12], [14], [18]. Other fast mode decision algorithms compare the possible 4×4 modes and select a few modes independently for RDO [13], [20]. The first approach selects the modes most highly correlated with the best prediction mode, but it has the drawback that any error in the selection of the first mode affects the other selected modes as well. The second approach ignores the correlation between the selected modes completely, and as a result orthogonal modes, which have minimum correlation, may be among the selected modes. In this paper, we propose a fast mode decision method that removes the drawbacks associated with these approaches. In our method, we select the prediction modes independently while ensuring that only one mode of each orthogonal pair is selected. Another feature of our method is that the candidate mode for 16×16 luma is selected using the same computations as those performed to select the candidate modes for 4×4 blocks. In this way, we achieve a lower computational load in both the software and hardware realizations of our method. We generate 24 difference vectors to select candidate modes among 4×4 blocks (Fig. 2).
Fig. 2. Difference vectors that are used for each directional 4×4 luma mode
Each group of the difference vectors in Fig. 2 corresponds to one of the directional prediction modes in Fig. 1. The average sum of absolute difference vectors (ASADV) is generated for each mode as
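\[
\mathrm{ASADV}_{M_x} = \frac{1}{n}\sum_{i=1}^{n}\left|d_i^{M_x}\right|, \qquad
n = \begin{cases} 4, & x < 5,\ x \neq 2 \\ 2, & x \geq 5 \end{cases}
\tag{1}
\]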
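where $M_x$ is mode number x and $d_i^{M_x}$ is the ith difference vector associated with mode x. We categorize the eight directional modes into four groups (Fig. 3). Each group consists of a directional mode and its orthogonal mode:
\[
\begin{aligned}
G_1 &= \{M_0\ (\mathrm{V}),\ M_1\ (\mathrm{H})\} \\
G_2 &= \{M_3\ (\mathrm{DDL}),\ M_4\ (\mathrm{DDR})\} \\
G_3 &= \{M_5\ (\mathrm{VR}),\ M_8\ (\mathrm{HU})\} \\
G_4 &= \{M_6\ (\mathrm{HD}),\ M_7\ (\mathrm{VL})\}
\end{aligned}
\tag{2}
\]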
Fig. 3. Orthogonal directions in intra 4×4 prediction modes
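The ASADV measure in (1) and the orthogonal grouping in (2) can be illustrated with a minimal Python sketch. The pixel pairs that define the difference vectors are given only in Fig. 2 and are not reproduced here, so the sketch assumes each difference vector is passed in as a precomputed scalar pixel difference; the names `MODE_ANGLE`, `GROUPS`, `vector_count`, `asadv`, and `is_orthogonal` are hypothetical and chosen for illustration only.

```python
# Angles of the eight directional intra 4x4 modes as listed in Fig. 1.
# Mode 2 (DC) has no direction and is therefore omitted.
MODE_ANGLE = {0: 90.0, 1: 0.0, 3: 45.0, 4: 135.0,
              5: 112.5, 6: 157.5, 7: 67.5, 8: 22.5}

# Orthogonal groups G1..G4 from (2).
GROUPS = [(0, 1), (3, 4), (5, 8), (6, 7)]


def vector_count(mode):
    """Return n from (1): 4 for modes 0, 1, 3, 4 and 2 for modes 5 to 8."""
    if mode not in MODE_ANGLE:
        raise ValueError("ASADV is defined for directional modes only")
    return 4 if mode < 5 else 2


def asadv(mode, diffs):
    """Average sum of absolute difference vectors for one mode.

    Assumption: `diffs` holds the n scalar pixel differences that
    Fig. 2 associates with `mode`.
    """
    n = vector_count(mode)
    if len(diffs) != n:
        raise ValueError(f"mode {mode} expects {n} difference vectors")
    return sum(abs(d) for d in diffs) / n


def is_orthogonal(mode_a, mode_b):
    """True when the two prediction directions are 90 degrees apart."""
    return abs(MODE_ANGLE[mode_a] - MODE_ANGLE[mode_b]) == 90.0
```

With the angles of Fig. 1, `is_orthogonal` holds for every pair in `GROUPS`, and summing `vector_count` over the eight directional modes gives the 24 difference vectors mentioned in the text.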
Only one of the modes in each group is selected, based on the minimum ASADV test, which yields four candidate modes from the four groups. These four modes, together with the MPM, form the set of modes selected for RDO. Therefore, there are four candidate modes for the RDO test if the MPM is among the modes selected by ASADV, and five candidate modes otherwise. To increase the performance of the proposed algorithm, we employ a threshold value $T_{G_i}$ for each group $G_i$ as
\[
T_{G_i} = \frac{2\left|\mathrm{ASADV}_{m_x} - \mathrm{ASADV}_{m_y}\right|}{\mathrm{ASADV}_{m_x} + \mathrm{ASADV}_{m_y}}
\tag{3}
\]
diagonal down left (DDL) modes resulting from sixteen 4×4 blocks in the MB.
Algorithm I: Fast intra prediction mode decision for 16×16 luma block
    for each 4×4 block in macroblock    // 16 iterations
        // Calculate frequency of selection of H, V and DDL modes
        if ASADV_V ≤ ASADV_H and ASADV_V ≤ ASADV_DDL then
            V ← V + 1
        else if ASADV_H ≤ ASADV_V and ASADV_H ≤ ASADV_DDL then
            H ← H + 1
        else
            DDL ← DDL + 1
        end if
    end for
    // Select candidate intra 16×16 luma mode
    if V ≥ H and V ≥ DDL then
        Selected mode ← Vertical
    else if H ≥ V and H ≥ DDL then
        Selected mode ← Horizontal
    else
        Selected mode ← Plane
    end if
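The minimum ASADV test for 4×4 blocks and the voting procedure of Algorithm I can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names and data layout are hypothetical, ties between the two modes of a group are resolved in favor of the first listed mode (the text does not specify tie handling), and the threshold of (3) is left out because its use is not described at this point.

```python
# Orthogonal groups G1..G4 of directional intra 4x4 modes.
GROUPS = [(0, 1), (3, 4), (5, 8), (6, 7)]


def candidates_4x4(asadv_by_mode, mpm):
    """Return the intra 4x4 modes to be tested by RDO.

    asadv_by_mode: dict mapping each directional mode to its ASADV.
    mpm: most probable mode, appended only if not already selected.
    Assumption: on a tie, min() keeps the first mode of the group.
    """
    selected = [min(group, key=lambda m: asadv_by_mode[m]) for group in GROUPS]
    if mpm not in selected:
        selected.append(mpm)
    return selected


def select_16x16_mode(blocks):
    """Vote over the 4x4 blocks of a macroblock, following Algorithm I.

    blocks: iterable of (asadv_v, asadv_h, asadv_ddl) tuples, one per
    4x4 block (sixteen per macroblock).
    """
    v = h = ddl = 0
    for asadv_v, asadv_h, asadv_ddl in blocks:
        if asadv_v <= asadv_h and asadv_v <= asadv_ddl:
            v += 1
        elif asadv_h <= asadv_v and asadv_h <= asadv_ddl:
            h += 1
        else:
            ddl += 1
    if v >= h and v >= ddl:
        return "Vertical"
    if h >= v and h >= ddl:
        return "Horizontal"
    return "Plane"
```

In this sketch `candidates_4x4` returns four modes when the MPM coincides with one of the group winners and five otherwise, which matches the candidate counts stated in the text. The 16×16 decision reuses the ASADV values already computed for the 4×4 blocks, so no further pixel differences are needed.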
In (3), $m_x$ and $m_y$ are the orthogonal modes in group $G_i$. $T_{G_i}$