Compressing the Laplacian pyramid


Gagan Rath and Christine Guillemot
IRISA-INRIA, Campus de Beaulieu, 35042 Rennes, France

Abstract— The Laplacian pyramid (LP) is one of the earliest examples of multiscale representation of visual data. It is well known that an LP is overcomplete or redundant by construction, and has lower compression efficiency compared to critical representations such as wavelets and subband coding. In this paper, we propose to improve the rate-distortion (R-D) performance of the LP through critical representation. We consider an LP with biorthogonal decimation and interpolation filters, and show that the detail signals lie in lower-dimensional subspaces. This allows them to be represented using fewer coefficients than the original spatial representations. We derive orthogonal bases for these subspaces and represent the detail signals in terms of their projections onto these bases. Simulation results suggest that higher compression ratios can be achieved with the critical representation than with the standard LP with usual or dual frame based reconstructions.

I. INTRODUCTION

Multiscale or multiresolution representation of visual signals such as images and video is an essential feature in multimedia communications. The JVT/MPEG working group is in the process of developing a scalable video compression (SVC) standard which provides video delivery at various spatial, temporal, and granularity resolutions [1]. The basic building block for providing spatial scalability in the SVC standard is the well-known Laplacian pyramid (LP) [2]. The primary reason for using the LP instead of wavelets is the visual quality requirement at coarser spatial resolutions: alias-free reconstructions at coarse resolutions necessitate the use of special anti-aliasing decimation filters instead of wavelet filters.

An LP achieves the multiscale representation of a signal as a coarse signal at lower resolution together with several detail signals at successively higher resolutions. First, a coarse approximation of the original signal is derived by low-pass filtering and downsampling. Then a detail signal is derived as the error of predicting the original signal from the interpolated coarse signal. The filtering-downsampling and the interpolation-prediction operations are iterated on the coarse signal to derive the detail signals at successively lower resolutions and the coarse signal at the final, lowest resolution. Given an LP representation, the original signal can be reconstructed simply by iteratively interpolating the coarse signal and adding the detail signals successively up to the final resolution.

Since the number of coefficients of the LP is greater than the number of samples of the original signal, an LP representation is overcomplete or redundant. Being overcomplete, the LP supports multiple reconstruction methods. In [3], Do and Vetterli consider the LP as a frame expansion and propose a dual frame based structure for the reconstruction.
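The decomposition and reconstruction steps described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the periodic (circular) boundary handling, the downsampling factor of 2, and the 3-tap filters `h` and `g` are all assumptions chosen for brevity.

```python
import numpy as np

def decimate(x, h):
    """Low-pass filter x with h (circular convolution), then keep every 2nd sample."""
    y = np.zeros(len(x))
    for k, tap in enumerate(h):
        y += tap * np.roll(x, k)          # y[i] += h[k] * x[i - k]
    return y[::2]

def interpolate(c, g):
    """Insert a zero after each sample of c, then filter with g (circular)."""
    up = np.zeros(2 * len(c))
    up[::2] = c
    p = np.zeros_like(up)
    for k, tap in enumerate(g):
        p += tap * np.roll(up, k)
    return p

def lp_decompose(x, h, g, levels):
    """Return the final coarse signal and the detail signals, finest level first."""
    coarse = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        c = decimate(coarse, h)
        details.append(coarse - interpolate(c, g))   # prediction error
        coarse = c
    return coarse, details

def lp_reconstruct(coarse, details, g):
    """Interpolate the coarse signal and add the details, coarsest level first."""
    x = coarse
    for d in reversed(details):
        x = interpolate(x, g) + d
    return x

# Hypothetical example: 16 samples, two pyramid levels.
rng = np.random.default_rng(0)
x = rng.standard_normal(16)
h = np.array([0.25, 0.5, 0.25])   # illustrative low-pass decimation filter
g = np.array([0.5, 1.0, 0.5])     # illustrative interpolation filter
coarse, details = lp_decompose(x, h, g, levels=2)
n_coeffs = len(coarse) + sum(len(d) for d in details)
x_rec = lp_reconstruct(coarse, details, g)
print(n_coeffs, len(x))           # LP coefficients versus input samples
print(np.allclose(x_rec, x))      # reconstruction is exact without noise
```

The coefficient count (4 coarse samples plus 16 and 8 detail samples) exceeds the 16 input samples, which is the redundancy discussed above. Exact reconstruction holds for any filter pair here, because the detail signal stores the full prediction error.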
Do and Vetterli show that this dual frame based reconstruction has lower error than the usual reconstruction method when the LP coefficients are corrupted with noise. Since the structure proposed in [3] requires biorthogonal filters, the authors of [4] modify the LP by including an update step so that the reconstruction structure is valid for any pair of decimation and interpolation filters. Nevertheless, the redundancy of the LP is an undesirable feature from a compression point of view. In [3], Do and Vetterli improve the rate-distortion (R-D) performance by utilizing the dual frame based reconstruction structure, but the original LP, and consequently its redundancy, remain intact. The lifted pyramid proposed in [4] modifies the coarse signal (which is undesirable in the context of scalable video compression), but the new pyramid is still overcomplete.

In this paper, we address the question of how best to compress an LP. We propose to improve the R-D performance by compressing the LP to a critical representation. We still assume that the decimation and the interpolation filters are biorthogonal, even though this is not a requirement for the critical representation of the LP. Under this assumption, a closer look at the LP reveals that the detail signals lie in lower-dimensional subspaces. These subspaces are orthogonal complements of the row spaces of the corresponding decimation filter matrices. We represent the detail signals using orthogonal basis vectors in those subspaces, derived through singular value decomposition (SVD) [5] and QR factorization [6]. Subsequently, we also demonstrate that the detail signals can be reconstructed from their projections onto the orthogonal complements of the column spaces of the interpolation filter matrices. We derive two orthogonal bases for these complement subspaces through similar decompositions and use them for representing the projections of the detail signals. The development of these methods reveals interesting properties of the LP with biorthogonal filters and indicates the limits of compression of the LP.

Fig. 1. Laplacian pyramid decomposition.

II. LAPLACIAN PYRAMID

The LP structure proposed by Burt and Adelson [2] is shown in Fig. 1. For convenience of notation, we first consider 1-D signals. The input signal $\mathbf{x}$ is first lowpass filtered using the decimation filter $h$ and then downsampled, producing the coarse signal $\mathbf{c}_1$. This coarse signal is upsampled and then filtered using the interpolation filter $g$, producing the prediction signal $\mathbf{p}_1$. The prediction error $\mathbf{d}_1 = \mathbf{x} - \mathbf{p}_1$ is the first level of detail signal. The process is repeated on the coarse signal $\mathbf{c}_1$ until the final resolution is reached. Note that the subscript in Fig.
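1 denotes the index of the pyramid level. Here we have used vector notation in order to facilitate matrix operations. By convention, 1-D signals are assumed to be column vectors.

For convenience of explanation, let us consider an LP with only one level of decomposition. Considering an input signal $\mathbf{x}$ of $N$ samples, the coarse and the detail signals can be derived as

$$\mathbf{c} = \mathbf{H}\mathbf{x} \quad \text{and} \quad \mathbf{d} = \mathbf{x} - \mathbf{G}\mathbf{c} = (\mathbf{I}_N - \mathbf{G}\mathbf{H})\,\mathbf{x} \tag{1}$$

where $\mathbf{I}_N$ denotes the identity matrix of order $N$, and $\mathbf{H}$ and $\mathbf{G}$ denote the decimation and the interpolation matrices, which have the following structures:

$$\mathbf{H} = \begin{bmatrix} \ddots & & & & & & \\ \cdots & h(2) & h(1) & h(0) & \cdots & & \\ & & \cdots & h(2) & h(1) & h(0) & \cdots \\ & & & & & & \ddots \end{bmatrix} \tag{2}$$

$$\mathbf{G} = \begin{bmatrix} \ddots & & & & & & \\ \cdots & g(0) & g(1) & g(2) & \cdots & & \\ & & \cdots & g(0) & g(1) & g(2) & \cdots \\ & & & & & & \ddots \end{bmatrix}^{t} \tag{3}$$

Here the superscript $t$ denotes the matrix transpose, and successive rows are shifted with respect to each other by the downsampling factor. Combining the two expressions in Eqn. 1, the LP can be written as

$$\begin{bmatrix} \mathbf{c} \\ \mathbf{d} \end{bmatrix} = \begin{bmatrix} \mathbf{H} \\ \mathbf{I}_N - \mathbf{G}\mathbf{H} \end{bmatrix} \mathbf{x} \tag{4}$$

The matrix operator on the right-hand side is called the frame operator or the analysis operator associated with the LP [3], [7]. The above representation yields more LP coefficients than the $N$ samples of the input signal, since the detail signal alone already has $N$ coefficients; the representation is therefore redundant.

In the standard reconstruction, the original signal is recovered by interpolating the coarse signal and adding the detail signal:

$$\hat{\mathbf{x}} = \mathbf{G}\mathbf{c} + \mathbf{d} = \begin{bmatrix} \mathbf{G} & \mathbf{I}_N \end{bmatrix} \begin{bmatrix} \mathbf{c} \\ \mathbf{d} \end{bmatrix} \tag{5}$$

Fig. 2. Standard reconstruction structure for LP.

The frame-based reconstruction proposed by Do and Vetterli [3] aims to reconstruct the original signal using a dual frame operator. It can be shown that, if the decimation and the interpolation filters are orthogonal, i.e., $\mathbf{H}\mathbf{H}^t = \mathbf{I}$ and $\mathbf{G} = \mathbf{H}^t$ (where $\mathbf{I}$ without a subscript denotes the identity matrix whose order equals the length of the coarse signal), the dual frame operator corresponding to the frame operator in Eqn. 4 is $\begin{bmatrix} \mathbf{G} & \mathbf{I}_N - \mathbf{G}\mathbf{H} \end{bmatrix}$ [3]. If the filters are biorthogonal, i.e., $\mathbf{H}\mathbf{G} = \mathbf{I}$, the above reconstruction operator is still an inverse operator (i.e., it is a left-inverse of the analysis operator in Eqn. 4) even though it is not the dual frame operator [3]. Therefore, with either orthogonal or biorthogonal filters, the original signal can be reconstructed as

$$\hat{\mathbf{x}} = \begin{bmatrix} \mathbf{G} & \mathbf{I}_N - \mathbf{G}\mathbf{H} \end{bmatrix} \begin{bmatrix} \mathbf{c} \\ \mathbf{d} \end{bmatrix} \tag{6}$$

The corresponding reconstruction structure is shown in Fig. 3. The above two methods lead to identical results when there is no noise in the LP coefficients.

For a two-dimensional signal such as an image, the LP can be derived by filtering along both the columns and the rows. For one level of decomposition, the respective coarse and detail signals can be expressed using matrix operators as

$$\mathbf{C} = \mathbf{H}\mathbf{X}\mathbf{H}^t \quad \text{and} \quad \mathbf{D} = \mathbf{X} - \mathbf{G}\mathbf{C}\mathbf{G}^t$$

where $\mathbf{X}$ denotes the two-dimensional input signal.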
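The one-level matrix formulation of this section can be checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: it uses the LeGall 5/3 filter pair as an example of biorthogonal filters, a downsampling factor of 2, periodic boundaries, and a hypothetical helper `build_matrices`. It verifies that the standard and the frame-based reconstructions agree without noise. It also verifies that the detail signal satisfies $\mathbf{H}\mathbf{d} = (\mathbf{H} - \mathbf{H}\mathbf{G}\mathbf{H})\mathbf{x} = \mathbf{0}$ when $\mathbf{H}\mathbf{G} = \mathbf{I}$, so it can be represented in an orthonormal basis of the null space of $\mathbf{H}$. That basis is obtained here through an SVD, as one possible construction.

```python
import numpy as np

def build_matrices(n, h, g):
    """Decimation matrix H (n/2 x n) and interpolation matrix G (n x n/2).

    Assumptions: h and g have odd length and are centred on their middle tap,
    the signal is periodic, and the downsampling factor is 2.
    """
    k = n // 2
    H = np.zeros((k, n))
    G = np.zeros((n, k))
    ch, cg = len(h) // 2, len(g) // 2
    for row in range(k):
        for i, tap in enumerate(h):
            # c[row] = sum_m h(m) x[2*row - m], with m = i - ch
            H[row, (2 * row - (i - ch)) % n] += tap
        for i, tap in enumerate(g):
            # p[j] = sum_row g(j - 2*row) c[row], with j - 2*row = i - cg
            G[(2 * row + (i - cg)) % n, row] += tap
    return H, G

n = 16
k = n // 2
h = np.array([-1.0, 2.0, 6.0, 2.0, -1.0]) / 8.0   # illustrative decimation filter
g = np.array([1.0, 2.0, 1.0]) / 2.0               # illustrative interpolation filter
H, G = build_matrices(n, h, g)
I_n = np.eye(n)
biorthogonal = np.allclose(H @ G, np.eye(k))      # H G = I

rng = np.random.default_rng(0)
x = rng.standard_normal(n)
c = H @ x                                         # coarse signal, Eqn. 1
d = (I_n - G @ H) @ x                             # detail signal, Eqn. 1

x_std = G @ c + d                                 # standard reconstruction
x_dual = G @ c + (I_n - G @ H) @ d                # frame-based reconstruction

# The detail signal is orthogonal to the rows of H.
detail_in_null_space = np.allclose(H @ d, 0.0)

# Orthonormal basis of the null space of H from the SVD (H has full row rank).
_, _, Vt = np.linalg.svd(H)
B = Vt[k:].T                                      # n x (n - k)
a = B.T @ d                                       # n - k projection coefficients
d_rec = B @ a                                     # detail recovered from them

# Separable 2-D case: filter along the columns and the rows.
X = rng.standard_normal((n, n))
C = H @ X @ H.T
D = X - G @ C @ G.T
X_rec = G @ C @ G.T + D

print(biorthogonal, detail_in_null_space)
print(np.allclose(x_std, x), np.allclose(x_dual, x), np.allclose(d_rec, d))
```

In this sketch the coarse signal `c` and the projection coefficients `a` together hold exactly `n` numbers, which is what a critical representation requires.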