Combinatorial transforms: Applications in lossless image compression

May 25, 2017 | Autor: Julien Dubois | Categoría: Pure Mathematics
Share Embed


Descripción

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/225303602

Combinatorial Transforms : Application in Lossless Image Compression Article in Journal of Discrete Mathematical Sciences and Cryptography · April 2011 DOI: 10.1080/09720529.2011.10698328

CITATIONS

READS

2

115

3 authors, including: Elfitrin Syahrul

Julien Dubois

University of Burgundy

University of Burgundy

4 PUBLICATIONS 7 CITATIONS

82 PUBLICATIONS 256 CITATIONS

SEE PROFILE

SEE PROFILE

All content following this page was uploaded by Julien Dubois on 16 January 2017. The user has requested enhancement of the downloaded file. All in-text references underlined in blue are added to the original document and are linked to publications on ResearchGate, letting you access and read them immediately.

Combinatorial Transforms : Applications in Lossless Image Compression E. Syahrul1 , J. Dubois2 , V. Vajnovszki3 LE2I - Universit´e de Bourgogne B.P. 47 870, 21078 Dijon Cedex - France {elfitrin.syahrul1 , julien.dubois2 , vvajnov3 }@u-bourgogne.fr

Abstract Common image compression standards are usually based on frequency transform such as Discrete Cosine Transform. We present a different approach for lossless image compression, which is based on a combinatorial transform. The main transform is Burrows Wheeler Transform (BWT) which tends to reorder symbols according to their following context. It becomes one of promising compression approach based on context modeling. BWT was initially applied for text compression software such as BZIP2; nevertheless it has been recently applied to the image compression field. Compression schemes based on the Burrows Wheeler Transform have been usually lossless; therefore we implement this algorithm in medical imaging in order to reconstruct every bit. Many variants of the three stages which form the original compression scheme can be found in the literature. We propose an analysis of the latest methods and the impact of their association and present an alternative compression scheme with a significant improvement over the current standards such as JPEG and JPEG2000.

Keywords: BWT, Lossless (image) compression, combinatorial transform.

1

Introduction

BWT was created by Burrows and Wheeler in 1994. Since then its improvements have been considered by many authors, as well as its application not only for text compression but also for image compression [1–6]. This paper 1

presents the improvement and analysis of BWT methods particularly implemented for image compression. We analyze the effects of each stage of BWT pre- and post-processing. The results are compared with JPEG and JPEG2000.

2

Original method based on BWT

A typical Burrows Wheeler Transform method that has been proposed by Burrows and Wheeler for lossless text compression consists of 3 stages as seen in Figure 1 [7], where • BWT is the Burrows Wheeler Transform itself that tends to group similar characters together, • GST is the Global Structure Transform that transforms the output of BWT local structure redundancy to a global redundancy using a ranking list. It produces sequences of contiguous zeros, • DC is a Data Coding. Input data

- BW T

-

GST

-

DC

- Output ...data

Figure 1: Original scheme of BWT method. The stages are processed sequentially from left to right. The output of a stage becomes the input of the next stage. BWT as a main transform rearranges the input data using a sorting algorithm. The output content is exactly the same with the input, except the ordering. Figure 2 gives a simple example how BWT works in a small image. Pixels are encoded by a pair of hexadecimal values. The image is generated to assume that scanning the data left to right where there are not two identical consecutive symbols. BWT as a context based transform tends to group similar pixels together as seen in Figure 2(b). This transform does not reduce the data size; by contrast it adds a few bytes information as a primary index to decode the data. The detailed work of BWT and other related transforms will be explained below. BWT Figure 3(a) and (b) show how the forward BWT works. In this example, we consider the first 16 bytes of the image represented by the array in Figure 2(a). BWT makes all possible cyclic rotations of the input data as seen in Figure 3(a). Then, it sorts the rotations [8]. The obtained matrix is 2

ff

0f

dc

07

c8

0f

dc

ff

0f

ff

0f

07

05

0f

dc

07

dc

05

c8

0f

dc

ff

0f

dc

07

dc

07

ff

0f

07

dc

07

dc

05

c8

0f

dc

ff

0f

dc

07

dc

07

ff

0f

07

dc

07

dc

05

c8

0f

dc

ff

0f

dc

ff

0f

dc

07

c8

0f

dc

ff



07 dc ff ff 05 0f 0f 07

dc dc ff c8 05 0f 0f ff

dc 0f ff c8 07 07 0f dc

dc 0f ff c8 07 07 0f dc

dc 00 09 02 02 00 03

00 01 01 00 01 01 05

02 01 00 02 00 01 00

10 00 02 05 01 00 05

01 03 ca 00 01 06

dc dc 05 c8 07 0f 0f dc

dc dc ff ff 07 07 0f dc

dc dc ff 05 0f 07 07 dc

(b) The output of BWT



(a) Image 8x8 07 01 04 01 00 01 01

0f dc ff c8 07 0f 0f dc

00 ff 00 04 00 01

04 00 04 05 01 00

07 00 ff 00 00 00 01 00



.

(d) The output of RLE0

dc 00 00 ca 00 00 00 03

00 01 00 00 05 01 00 05

00 00 00 00 00 00 00 00

10 01 00 00 00 01 00 00

01 00 09 00 00 00 00 00

00 00 01 01 00 01 00 00

00 00 00 02 05 00 01 00

(c) The output of MTF

Figure 2: Example how BWCA works in a small image 8 × 8. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Rotated BWT input ff 0fdc07c80fdc ff 0f ff 0f07050fdc07 0f dc07 c8 0f dc ff 0f ff 0f 0705 0f dc07 ff dc07 c8 0f dc ff 0f ff 0f 0705 0f dc07 ff 0f 07c8 0f dc ff 0f ff 0f 0705 0f dc 07 ff 0f dc c8 0f dc ff 0f ff 0f 0705 0f dc07 ff 0f dc 07 0f dc ff 0f ff 0f 0705 0f dc07 ff 0f dc07 c8 dc ff 0f ff 0f 0705 0f dc07 ff 0f dc07 c8 0f ff 0f ff 0f 0705 0f dc07 ff 0f dc 07 c8 0f dc 0f ff 0f 07 05 0f dc07 ff 0f dc07 c8 0f dc ff ff 0f 07 05 0f dc07 ff 0f dc07 c8 0f dc ff 0f 0f 0705 0f dc07 ff 0f dc07c8 0f dc ff 0f ff 0705 0f dc 07 ff 0f dc07c8 0f dc ff 0f ff 0f 05 0f dc 07 ff 0f dc07c8 0f dc ff 0f ff 0f 07 0f dc07 ff 0f dc07 c8 0f dc ff 0f ff 0f 07 05 dc07 ff 0f dc07 c8 0f dc ff 0f ff 0f 0705 0f 07 ff 0f dc 07c8 0f dc ff 0f ff 0f 0705 0f dc

13 12 4 16 11 2 14 6 9 5 3 15 7 10 1 8

(a) Rotated BWT input.

F L 05 0f dc 07 ff 0f dc07c8 0f dc ff 0f ff 0f 07 07 05 0f dc 07 ff 0f dc07c8 0f dc ff 0f ff 0f 07 c8 0f dc ff 0f ff 0f 0705 0f dc 07 ff 0f dc 07 ff 0f dc 07c8 0f dc ff 0f ff 0f 0705 0f dc 0f 0705 0f dc07 ff 0f dc07c8 0f dc ff 0f ff 0f dc07 c8 0f dc ff 0f ff 0f 0705 0f dc07 ff 0f dc07 ff 0f dc07 c8 0f dc ff 0f ff 0f 07 05 0f dc ff 0f ff 0f 0705 0f dc07 ff 0f dc07 c8 0f ff 0f 07 05 0f dc07 ff 0f dc07 c8 0f dc ff c8 0f dc ff 0f ff 0f 0705 0f dc07 ff 0f dc 07 dc 07 c8 0f dc ff 0f ff 0f 0705 0f dc07 ff 0f dc 07 ff 0f dc07 c8 0f dc ff 0f ff 0f 0705 0f dc ff 0f ff 0f 0705 0f dc07 ff 0f dc07 c8 0f ff 0f 07 05 0f dc07 ff 0f dc07 c8 0f dc ff 0f ff 0fdc07c80fdc ff 0f ff 0f07050fdc 07 ff 0f ff 0f 0705 0f dc07 ff 0f dc 07 c8 0f dc

(b) Sorted BWT rotations.

Figure 3: The forward of BWT.

3

← 15

Position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 → 15 16

input 07 0f dc dc ff ff 05 c8 ff 07 0f 0f 0f 0f 07 dc

context 05 07 07 07 0f 0f 0f 0f 0f c8 dc dc dc ff ff ff

link 07 01 10 15 02 11 12 13 14 08 03 04 16 05 06 ← 09

Figure 4: The BWT inverse transform. represented in Figure 3(b). The position of original data is named primary index (equals to 15 in our example). This information is required for the reconstruction process. The last column (L) of this matrix and the primary index are the BWT output. As it can be seen in this example, BWT output tends to groups similar pixels together. The reverse BWT [9] is principally just another permutation of the original data. Figure 4 shows how this process works. The second column of this figure is the BWT output of the data in Figure 3. The third column (called here context) is obtained from the second one by sorting its elements in ascending order. The link in the fourth column refers to the position of the context in the input (the second column). For repeated symbols the ith occurrence of a context symbol corresponds to the ith occurrence of the same symbol in the data input. The process is started from position 15 (the primary index) where the first pixel value of the original data is placed. It refers to the context f f (as the first pixel of original data) and also refers to link 6, as a clue of next position to the second element of original data. So, the next position is 6 which refers to 0f and gives the next link to the next output. Therefore each step gives the permuted pixel value as output and will process the whole file, because of the cyclic rotations. GST Common GST that has been used by Burrows and Wheeler is Move-ToFront (MTF) from Bentley et al. [10]. As the second stage of original BWT chain, MTF does not shrink data but it can reduce redundancy. MTF uses an update list as an index of MTF input. The list consists of 256 symbols since the input of grey level image are 256 pixels for 8 bits per pixel. It 4

processes the input symbols sequentially. Every input of MTF is moved to the front of the list, so the input symbols that occur often are transformed into small indices. Runs of repeated symbols are transformed into zeros. Some authors proposed other GST transforms such as Move One From Front (M1FF), Move One From Front Two (M1FF2), Frequency Count (FC), etc. These transforms will be discussed in Section 6. Data Coding A data coding essentially based on Entropy Coding. Burrows and Wheeler propose to use Run Length Encoding Zeros (RLE0) after MTF since it produces a lot of zeros, and then Huffman Coding or Arithmetic Coding to increase data compression [7]. Some approaches use Arithmetic Coding, which offers the best compression rates [11]. Further, Arithmetic Coding translates the entire data into numbers represented in certain base rather than translating each data symbol into a series of digit in certain base. Therefore, AC approach is often more optimal than Huffman Coding. Since the MTF output tends to produce a lot of small pixels value, Fenwick proposes an adaptive Arithmetic Coding that follows the skew distribution of MTF output [9]. More details on this process will be discussed in Section 7.

3

Corpus

The BWT compression method has been applied for lossless text compression and do not take into account the data’s nature. Nevertheless, it can be applied successfully to image, as we will discuss in the next section. The lossless approach is obviously appropriate to medical image compression, which is expected to be lossless. Our experiments use 100 medical images selected from IRMA (Image Retrieval in Medical Applications) database [12] and Lukas Corpus [13]. The images are extracted randomly. IRMA database consists of primary and secondary digitized X-ray films in portable network graphics (PNG) and tagged image file format (TIFF), 8 bits per pixel (8 bpp), examples of images are shown in Figure 5. The image sizes are between 101 KB and 4684 KB. And for the Lukas Corpus, we are using two dimensional 8 bit radiographs in TIFF format. Lehmann et al. [4] and Syahrul et al. [6] present the implementation of the BWT method in medical images which give an interesting result. They use medical images from IRMA database. These papers have not considered BWT preprocessing nor other variants of GST. Below, we consider different 5

Figure 5: Example of tested images. From left to right: hand; head; pelvis; chest frontal and chest lateral. ways to scan the images as a BWT pre-processing step.

4

Linearization Scheme

BWT method is used to compress two-dimensional images, but the input of BWT is a one dimensional sequence. Thus, the image has to be converted from two dimensional image into one dimensional sequence. This conversion is referred to as linearization or scan path. Some codings, as Arithmetic Coding or BWT itself, depend on the relative order of gray scale values, they are therefore sensitive to the linearization method used.

(a) Left

(b) Left-Right

(c) Up-Down

(d) Spiral

(e) Zigzag

Figure 6: Linearization methods. Some of the popular linearization schemes are given in Figure 6. We have tested 8 such different methods; namely: scanning image from left to right (L), left to right then right to left (LR), up to down then down to up (UD), zigzag (ZZ), spiral, divide image into small blocks 8 × 8, small blocks 8 × 8 covering the image in zigzag, and small blocks 3 × 3 covering the image in zigzag. Our preliminary test uses BWT original method (Figure 1) and also adds different image scan methods to see this pre-processing impact in compression performance. We use MTF as GST stage, simple Run Length Encoding zero (RLE0) and Arithmetic Coding (AC) for EC stage. Table 1 shows the results of these tests for 8 different image scans. It shows that scan path 6

Table 1: Image compression ratios for different type of scan path. Image Hand1 Hand2 Hand3 Hand4 Head1 Head2 Head3 Head4 Pelvis1 Pelvis2 AV. 10 Av. 100

L 2.372 2.260 2.114 2.685 2.219 2.481 2.566 2.726 1.808 1.850 2.308 2.516

LR 2.366 2.251 2.123 2.679 2.216 2.480 2.565 2.721 1.808 1.848 2.306 2.515

UD 2.547 2.390 2.253 2.830 2.274 2.527 2.527 2.721 1.842 1.890 2.380 2.577

Spiral 2.536 2.382 2.221 2.802 2.273 2.538 2.563 2.764 1.835 1.876 2.379 2.575

ZZ 2.257 2.091 1.991 2.551 2.155 2.366 2.350 2.544 1.760 1.806 2.187 2.357

8x8 2.204 2.052 1.933 2.411 2.108 2.337 2.303 2.502 1.750 1.791 2.139 2.315

8x8ZZ 2.262 2.073 1.994 2.510 2.174 2.392 2.349 2.542 1.782 1.829 2.190 2.364

3x3zz 2.336 2.138 2.049 2.557 2.220 2.466 2.422 2.622 1.814 1.863 2.249 2.442

Jpeg 2.249 2.205 2.136 2.189 1.992 2.210 2.363 2.399 1.725 1.797 2.109 2.280

J2K 2.994 2.769 2.733 2.909 2.554 2.938 2.932 2.548 2.038 2.105 2.665 2.924

influences BWT methods performance. Image scan vertically (up-down) is slightly better than spiral method. Over the 100 images tested, 49 of them give the better result using up-down scan and 47 using spiral scan. The worst compression ratios are obtained using scan image in a block 8 × 8 (85 images), and zigzag mode (14 images). This preprocessing stage can increase the compression ratios around 4% than conventional scanning. In other words, the neighborhood of pixels influences the permutation yields by the BWT. This result also shows that BWT method is better than JPEG but inferior than JPEG2000. For all images tested, 91 give better CR than JPEG, and among them 10 are better than JPEG2000. This preliminary result shows that scanning the image up-down and in spiral improves compression ratios. Therefore our future tests will use spiral or up-down stage.

5

BWT and its improvement

BWT is based on a sorting algorithm. There are several methods to improve the performance of sorting process, but they do not influence BWT results. Burrows and Wheeler suggested suffix tree to improve sorting process [7]. Other authors suggest suffix array or their own sorting algorithm [14–16]. Figure 7 shows the relationship between BWT and suffix array for the same input given in Figure 3, and we consider the input data is Im, then n is the length of Im, and so in this example n = 16. The first column of Figure 7 consists of the given suffix. The third column is the sorted suffixes in ascending order and the fourth column is their suffix array (SA). We can see the correlations between sorted rotations (the usual BWT in Figure 3(b)) and sorted suffixes in the third column of Figure 7. These matrixes are similar. The second column in Figure 3(b) is the same with the third column of Figure 7 if the added symbols to create the rotations 7

Suffixes ff0fdc07c80fdcff0fff0f07050fdc07 0fdc07c80fdcff0fff0f07050fdc07 dc07c80fdcff0fff0f07050fdc07 07c80fdcff0fff0f07050fdc07 c80fdcff0fff0f07050fdc07 0fdcff0fff0f07050fdc07 dcff0fff0f07050fdc07 ff0fff0f07050fdc07 0fff0f07050fdc07 ff0f07050fdc07 0f07050fdc07 07050fdc07 050fdc07 0fdc07 dc07 07

ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Sorted suffixes 05 0f dc07 0705 0f dc07 07c8 0f dc ff 0f ff 0f 0705 0f dc07 07 0f 0705 0f dc07 0f dc07c8 0f dc ff 0f ff 0f 0705 0f dc07 0f dc07 0f dc ff 0f ff 0f 0705 0f dc07 0f ff 0f 0705 0f dc07 c8 0f dc ff 0f ff 0f 0705 0f dc07 dc07c8 0f dc ff 0f ff 0f 0705 0f dc07 dc07 dc ff 0f ff 0f 0705 0f dc07 ff 0f 0705 0f dc07 ff 0f dc07c8 0f dc ff 0f ff 0f 0705 0f dc07 ff 0f ff 0f 0705 0f dc07

SA 13 12 4 16 11 2 14 6 9 5 3 15 7 10 1 8

Figure 7: Relationship between BWT and suffix arrays (SA). matrix of Figure 3(a) are ignored. For example the first line of Figure 3(b) was placed in line 13 of the rotated original data, so the rotations matrix for this line begins with the 13th symbol till the 16th symbol then followed by the 1st till the 12th symbols to complete the sorted rotations matrix. If the added symbols are omitted, this first line is equal to the first line of sorted suffixes (first line in the third column of Figure 7). The BWT output can be computed using suffix array (SA) as: ½ Im[SA[i] − 1], if SA[i] 6= 1 (1) L(i) = Im[n], otherwise. Hence, it does not need to create the sorted rotations matrix to obtain BWT output (Figure 3(a) and (b)). Nevertheless, finding suffix sorting algorithms that run in linear worst case time is still an open problem. Figure 8 shows a few different methods for BWT computation [17]. We use the BWT of Yuta Mori that is based on SA-IS algorithm to construct the BWT output [18]. It reduces processing time and decrease memory requirements.

6

GST and its modifications

The most common GST algorithm is MTF. The interest of MTF is twofold: runs of similar symbols are changed into runs of zeros, and the grey level image distribution is modified in the sense that the probability of lower input values increase (and so the probability of the higher values decreases). Therefore the Entropy Coding is more efficient on the output of MTF. 8

complexity time Worst-case Avg-case O(n) O(n) O(n) O(n) O(n) O(n)

Method Ukkonen’s suffix tree construction McCreight’s suffix tree construction Kurtz-Balkenhol’s suffix tree construction Farach’s suffix tree construction Manber-Myers’s suffix array construction Sadakane’s suffix array construction Larson-Sadakane’s suffix array construction Itoh-Tanaka’s suffix array construction Nong’s SA-IS Burrows-Wheeler’s sorting Bentley-Sedgewick’s sorting Sedward’s sorting

space Avg-case 10n

O(nlogn) O(nlogn)

O(nlogn) O(nlogn)

8n

O(nlogn) O(nlogn)

O(nlogn) O(nlogn)

9n 8n

> O(nlogn) O(nlogn) O(n2 logn) O(n2 ) O(n2 logn)

O(nlogn) O(nlogn) O(nlogn) -

5n 6n 5n + stack -

Figure 8: Different sorting algorithms used for BWT. MTF maintains the list of all possible symbols (MTF list), which order is modified during the process. The list can be considered as a stack and is used to obtain the output. Each input value is coded with its rank in the list. This stack is then updated: the input value is pushed at the top of the list. Therefore, the rank of this input symbol becomes zero. Consequently, a run of N identical symbols is then coded with the symbol followed by N − 1 zeros. Most references show the efficiency of MTF, especially for text compression. Table 2 and Table 3 in the 2nd and the 3rd column give the results of BWT method for image compression without and with MTF. Here we can see that MTF part is significant. The compression ratios decrease for 6 images. Nevertheless, using scan image up-down, the average of compression ratios increase of approximately 3% for 10 images presented and 4% for the total database. And for scan image spiral, the compression ratio increases around 2% for 10 images and also 4% for total image database. Table 2: The comparative results for MTF and its variants with up-down scan image. Image Hand1 Hand2 Hand3 Hand4 Head1 Head2 Head3 Head4 Pelvis1 Pelvis2 Av.10 Av.100

no-MTF 2.616 2.196 1.669 2.589 2.251 2.561 2.554 2.751 1.978 2.026 2.319 2.475

MTF 2.547 2.390 2.253 2.830 2.274 2.527 2.527 2.721 1.842 1.890 2.380 2.577

M1FF 2.541 2.387 2.247 2.825 2.263 2.518 2.52 2.714 1.835 1.881 2.373 2.570

M1FF2 2.538 2.387 2.246 2.824 2.262 2.518 2.519 2.714 1.835 1.880 2.372 2.570

Ts(0) 2.624 2.449 2.319 2.920 2.329 2.590 2.606 2.801 1.888 1.936 2.446 2.654

Bx3 2.649 2.466 2.345 2.939 2.349 2.613 2.632 2.824 1.905 1.952 2.467 2.679

9

Bx5 2.666 2.476 2.363 2.946 2.364 2.632 2.650 2.841 1.919 1.966 2.482 2.696

Bx6 2.670 2.477 2.368 2.945 2.367 2.637 2.654 2.843 1.922 1.969 2.485 2.699

FC 2.663 2.470 2.375 2.938 2.357 2.631 2.646 2.832 1.911 1.961 2.478 2.694

WFC 2.659 2.484 2.352 2.979 2.348 2.611 2.645 2.857 1.898 1.951 2.478 2.694

AWFC 2.706 2.513 2.410 3.000 2.400 2.672 2.695 2.902 1.908 1.950 2.516 2.724

IFC 2.656 2.475 2.347 2.954 2.359 2.614 2.635 2.839 1.909 1.960 2.475 2.687

Table 3: The comparative results for MTF and its variants with spiral scan image. Image Hand1 Hand2 Hand3 Hand4 Head1 Head2 Head3 Head4 Pelvis1 Pelvis2 Av.10 Av.100

no-MTF 2.616 2.193 1.657 2.587 2.251 2.576 2.578 2.784 1.974 2.018 2.323 2.477

MTF 2.536 2.382 2.221 2.802 2.273 2.538 2.563 2.764 1.835 1.876 2.379 2.575

M1FF 2.530 2.380 2.214 2.800 2.263 2.528 2.555 2.757 1.828 1.867 2.372 2.568

M1FF2 2.527 2.380 2.214 2.798 2.263 2.528 2.553 2.756 1.828 1.867 2.371 2.567

Ts(0) 2.614 2.453 2.284 2.898 2.329 2.602 2.642 2.848 1.882 1.922 2.447 2.651

Bx3 2.639 2.472 2.307 2.923 2.350 2.626 2.669 2.874 1.899 1.939 2.470 2.677

Bx5 2.657 2.486 2.326 2.937 2.365 2.646 2.689 2.894 1.913 1.954 2.487 2.694

Bx6 2.662 2.488 2.330 2.939 2.370 2.651 2.693 2.898 1.917 1.957 2.491 2.698

FC 2.655 2.479 2.338 2.926 2.358 2.645 2.685 2.885 1.905 1.948 2.482 2.692

WFC 2.648 2.481 2.319 2.954 2.347 2.623 2.684 2.905 1.892 1.937 2.479 2.691

AWFC 2.701 2.517 2.375 2.985 2.399 2.689 2.736 2.957 1.902 1.932 2.519 2.721

IFC 2.646 2.476 2.312 2.927 2.359 2.627 2.674 2.887 1.903 1.946 2.476 2.685

Some authors change the MTF with other Global Structure Transform. We will test and analyze the impact of some of these transforms. Balkenhol proposed the modification of MTF called Move One From Front (M1FF) [19]. This algorithm changes the work of the M1FF list. The input symbol from the second position in the M1FF list is moved to the first position; meanwhile the input from higher positions is moved to the second position. Furthermore, Balkenhol gives a modification of M1FF called Move One From Front Two (M1FF2). The symbol from the second position is moved to the first position of the M1FF2 list only when the previous transformed symbol was at the first position. Unfortunately, the results of our tests for these transforms decrease compression performance as seen in the column 4th and 5th of Table 2 and Table 3. Albers also presented other list based algorithms, called Time Stamp (TS) [20]. The deterministic version of this algorithm is TimeStamp (0) or TS (0). While MTF use 256 symbols in its list, this transform uses a double length list. So the list contains 512 symbols and each symbol occurs twice. When the input data is processed, the position of an item is one plus the number of double symbols in front of that input symbol. Then the list is updated by moving the second symbol to the front. This method is also called “Best 2 of 3” algorithm based on the Chapin’s algorithm called a “Best x of 2x − 1” algorithm [21], because TS (0) uses 2 × 256 symbols in the list. Therefore, the list of a “Best x of 2x−1” algorithm contains x×256 symbols. Here, we have tested a “Best x of 2x−1” algorithm for x = 3, 5, and 6 (called Bx3, Bx5 and Bx6, respectively for short) to see its impact in the BWT update methods (the original method with up-down or spiral image scan). The results of these transforms are shown in column 6th to 9th in the Table 2 and Table 3. The compression ratios using these transforms are better than those using MTF algorithm. The average compression performance increases till 9% for a “Bx6” algorithm using image scan up-down for the whole tested images. Other variant of GST is Frequency Counting (FC). This transform is 10

quite different with the previous GST. There is a computation to count the symbols ranking. It gives the highest rank to the symbol with the highest frequency. This transform is not very effective since it takes time to favoring symbols [8]. The Weighted Frequency Count (WFC) improves FC transform by defining a function based on symbol frequencies [17]. It also counts the distance of symbols within a sliding window. The most frequent symbol has a higher weight. This approach does not give better compression ratios than FC in our tests. It can be seen in the column 10th and 11th of Table 2 and Table 3. The improvement compression performance is obtained by a WFC modified called Advanced Weighted Frequency Count (AWFC). It counts a weight of symbol distribution. It is more complex than WFC, but it gives better compression ratios. These results can be seen in the column 12th of Table 2 and Table 3. Abel presents other GST method called Incremental Frequency Count (IFC) [11]. It is quite similar to WFC, but it is less complex and the CR is less than that obtained with WFC. The results of this transform can be seen in column 13th in the Table 2 and Table 3. Table 2 and Table 3 show that AWFC is slightly better than a “Bx6”. But AWFC is more complex than a “Bx6” transform. It takes more time to count the distance of symbol. Therefore, for the next stage, we will use two of these transforms to analyze the Entropy Coding stage and to improve BWT method performances.

7

EC and its modifications

Burrows and Wheeler original paper uses RLE0 as the first step of EC, since there are a lot of zeros after MTF [7]. The function of RLE is to support the probability estimation of the next stage. A long run of zeros tends to overestimate the global symbol probability [19]. Our previous best results use AWFC or a “Bx6” algorithm for GST stage, Up-down or spiral image scan method for BWT pre-processing and a simple RLE0 followed by Arithmetic Coding for the EC stage. This compression scheme is referred as method M1 in 2nd and 3rd column of Table 4. Lehmann et al. propose other RLE called Run Length Encoding two symbols (RLE2S) [4]. This coding separates the data stream and the runs so it does not interfere with the main data coding. It codes only two or more consecutive symbols into two symbols and the length of runs are placed in other data stream. This coding also decreases compression performance in our tests. The results of this second method (M2) are shown in the 6th and 7th column of Table 4. 11

Table 4: CR results for different EC methods. Image

Hand1 Hand2 Hand3 Hand4 Head1 Head2 Head3 Head4 Pelvis1 Pelvis2 Av.10 Av.100

M1 : UD-BWT-GST RLE0-AC Bx6 AWFC 2.670 2.706 2.477 2.513 2.368 2.410 2.945 3.000 2.367 2.400 2.637 2.672 2.654 2.695 2.843 2.902 1.922 1.908 1.969 1.950 2.485 2.516 2.699 2.724

M2 : UD-BWTRL2S-GST-AC Bx6 AWFC 2.850 2.861 2.591 2.600 2.510 2.525 3.108 3.120 2.483 2.480 2.786 2.793 2.828 2.837 3.046 3.058 2.009 1.974 2.069 2.027 2.628 2.628 2.860 2.849

M3 : UD-BWT-GST modRLE-modAC Bx6 AWFC 2.989 2.977 2.733 2.710 2.537 2.552 3.289 3.280 2.512 2.505 2.833 2.834 2.921 2.922 3.206 3.187 2.019 1.981 2.075 2.030 2.711 2.698 2.955 2.930

M4 : UD-BWTGST-AC Bx6 AWFC 2.964 2.940 2.640 2.611 2.489 2.496 3.173 3.156 2.502 2.490 2.827 2.816 2.900 2.890 3.164 3.128 2.017 1.976 2.081 2.034 2.676 2.654 2.931 2.890

M5 : UD-BWTGST-modAC Bx6 AWFC 3.030 2.996 2.759 2.709 2.551 2.560 3.329 3.297 2.520 2.506 2.848 2.836 2.952 2.940 3.254 3.202 2.024 1.982 2.081 2.032 2.735 2.706 2.987 2.939

JPEG 2000 2.994 2.776 2.733 2.909 2.554 2.937 2.932 2.673 2.038 2.105 2.665 2.924

Another RLE0 that never expands the file has been presented by Wheeler. This coding encodes the data as a binary value. Reference [9] discusses more details about this coding. For the last coding, as stated above, Burrows and Wheeler suggest to use Arithmetic Coding. As mentioned by Abel in [11] that a specific Arithmetic Coder should be used, we cannot use a simple arithmetic coder with a common order-n context. GST stage increases the probability of lower input values (and so it decreases the probability of the higher values). Therefore, Fenwick proposes an Arithmetic Coder with hierarchical coding model for this skew distribution. We also tested this coding, referred as M3 in Table 5. The results of this method are presented in the 7th and 8th column of Table 4. In contrast to the common methods related in literature, we propose to omit RLE stage on this compression scheme. Two chains are then proposed. They are respectively similar to the M1 and M3 schemes but do not include any RLE stage. Therefore, the first one uses a traditional AC and the second one, a modified AC. Table 5: CR results for different BWT methods. Method M1

M2

M3

M4

M5

> JPEG2000 < JPEG2000 Total > JPEG2000 < JPEG2000 Total > JPEG2000 < JPEG2000 Total > JPEG2000 < JPEG2000 Total > JPEG2000 < JPEG2000 Total

(group 1) (group 2) (group 1) (group 2) (group 1) (group 2) (group 1) (group 2) (group 1) (group 2)

Bx6 Nb. Image 29 71 100 17 83 100 24 76 100 20 80 100 30 70 100

CR 3.132 2.522 2.699 3.324 2.765 2.860 3.435 2.804 2.955 3.449 2.801 2.931 3.401 2.809 2.987

JPEG2000 CR 2.486 3.103 2.924 2.741 2.961 2.924 2.868 2.941 2.924 2.815 2.951 2.924 2.898 2.935 2.924

AWFC Nb. Image 13 87 100 17 83 100 21 79 100 18 82 100 25 75 100

CR 3.196 2.653 2.724 3.329 2.750 2.849 3.433 2.796 2.930 3.416 2.775 2.890 3.375 2.794 2.939

JPEG2000 CR 2.633 2.967 2.924 2.741 2.961 2.924 2.821 2.951 2.924 2.768 2.961 2.924 2.845 2.950 2.924

The results presented in Table 4 show that the performance of M5 can be slightly improved in average and therefore to be considered. We propose to 12

analyze the results by splitting the image in two classes. The images which provide better compression ratios than JPEG2000 (>JPEG2000), referred as group 1, and others (
Lihat lebih banyak...

Comentarios

Copyright © 2017 DATOSPDF Inc.