A Comparative Study of Lossless Compression Algorithms on Multi-spectral Imager Data


Michael Grossberg, Srikanth Gottipati, Irina Gladkova, Malka Rabinowitz, Paul Alabi, Tence George, and Amnia Pacheco

CCNY, NOAA/CREST, 138th Street and Convent Avenue, New York, NY 10031, USA

ABSTRACT

This paper reports a comparative study of current lossless compression algorithms for data from a representative selection of satellite-based earth science multispectral imagers. The study includes the performance of compression algorithms on the Advanced Very High Resolution Radiometer (AVHRR), SEVIRI, and the Moderate Resolution Imaging Spectroradiometer (MODIS) imager, as well as a subset of MODIS bands as a proxy for the upcoming GOES-R series. SEVIRI, aboard the ESA/EUMETSAT operated Meteosat Second Generation (MSG) satellites, is a geostationary imager. The AVHRR aboard the NOAA Polar Orbiting Environmental Satellites and MODIS aboard the NASA Terra and Aqua satellites have polar orbits. Thus this study presents representatives from both polar and geostationary orbiting imagers. The imagers we include have sensors for both reflected and emissive radiance. We also note that the older satellites have coarser quantizations and present our conclusions on the impact on compression ratios. Given the large and growing volume of data from the current and emerging generation of imagers, which offer faster scanning, finer spatial resolution, and greater spectral resolution, this study provides a comparison of current compression algorithms as a baseline for future work. With the growing volume of satellite Earth science multispectral imager data, it becomes increasingly important to evaluate which compression algorithms are most appropriate for data management in transmission and archiving. This comparative compression study uses a wide range of standard implementations of the leading lossless compression algorithms. Examples include image compression algorithms such as PNG and JPEG2000, and widely used file compression formats such as BZIP2 and 7z. This study includes a comparison with the Consultative Committee for Space Data Systems (CCSDS) recommended Szip software, which uses the extended-Rice lossless compression algorithm, as well as the most recent recommended compression standard, which relies on a wavelet transform followed by an entropy coder. To establish statistical significance of our analysis, we have developed a system to acquire and manage a large number of imager granules: currently over 1000 MODIS granules, over 2400 AVHRR granules, and over 220 SEVIRI granules.

1. INTRODUCTION

High resolution multi-spectral imagers are becoming increasingly important tools for understanding and monitoring the earth. Future NOAA missions such as GOES-R will include improved imagers which will provide a rich stream of scientific data. The data stream coming from next generation imagers must be transmitted wirelessly back to earth over channels with severely limited bandwidth. Even after the data are received they must be archived and distributed worldwide. This makes lossless compression of the data essential.

As a proxy for next generation imagers we have collected a selection of data from both geostationary and polar orbiting imagers. A representative modern polar orbiter is the Moderate Resolution Imaging Spectroradiometer (MODIS). MODIS is a 36 band visible and IR multispectral imager which is currently deployed on both the Terra and Aqua satellites. We also examined data from the Advanced Very High Resolution Radiometer (AVHRR), carried aboard the NOAA Polar Orbiting Environmental Satellites, which was developed much earlier than MODIS. As a representative of a modern geostationary satellite we have collected data from the Spinning Enhanced Visible and Infrared Imager (SEVIRI), which is located on the Meteosat Second Generation (MSG) satellites. The granules from all the examples given often contain calibration data, telemetry data, and metadata along with the imaging data. The relative size of the imaging data in comparison to this auxiliary data was ignored for our comparison. We also focused on raw digital counts, referred to as '1a' data in MODIS and extracted from the '1b' data in AVHRR. While radiance data and other processed products are also important, they are derived ultimately from the digital counts, making this data the primary source.

There are a number of lossless compression algorithms available. We considered a number of widely used algorithms which are relatively fast and achieve high compression. We look at a representative collection of general lossless data compression algorithms including GZIP, ZIP, BZIP2, and 7zip. We also looked at image compression software such as TIFF, JPEG2000, and the current lossless compression recommended by the Consultative Committee for Space Data Systems (CCSDS). We considered it important to use algorithms for which open source implementations are available. This availability ensures that the compression algorithms we consider here may be used globally for transmission, distribution, and archiving of imager data.

One concern in any evaluation is the variability of the data. Evaluation by the examination of a small number of samples, or by focusing exclusively on the mean of the data, is typically insufficient for use in planning engineering requirements. In this work we have considered hundreds of granules for each imager. We have categorized these data sets by different criteria in order to show the robustness of the results to the sampling. The absolute estimates are consistent across the sampling and there is even less variation in the relative performance of the different algorithms. This gives high confidence in the reliability of the estimates for future data.

Further author information: (Send correspondence to Michael Grossberg) Michael Grossberg: E-mail: [email protected], Telephone: 1 212 650 6295

Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XV, edited by Sylvia S. Shen, Paul E. Lewis, Proc. of SPIE Vol. 7334, 733408 · © 2009 SPIE · CCC code: 0277-786X/09/$18 · doi: 10.1117/12.821007

2. IMAGER DATA

2.1 MODIS

The MODerate resolution Imaging Spectroradiometer (MODIS) is a key instrument aboard the Terra (EOS AM) and Aqua (EOS PM) polar satellites. Terra's orbit around the Earth is timed so that it passes from north to south across the equator in the morning, while Aqua passes south to north over the equator in the afternoon. The MODIS instrument provides high radiometric sensitivity (12 bit) in 36 spectral bands ranging in wavelength from 0.4 µm to 14.4 µm. These 36 distinct spectral bands are divided into four separate Focal Plane Assemblies (FPA): Visible (VIS), Near Infrared (NIR), Short- and Mid-Wave Infrared (SWIR/MWIR), and Long-Wave Infrared (LWIR).

Figure 1. (a) A crippling heat wave and strong winds in southeastern Australia contributed to an outbreak of forest and grassland fires in Victoria in late January 2009. (b) Dust plumes rippled and swirled over the Arabian Sea in early March 2009. The dust in this region likely resulted from multiple sources, including plumes from Pakistan in the north and the Arabian Peninsula in the west. Both images were captured by the Moderate Resolution Imaging Spectroradiometer (MODIS) on NASA's Aqua satellite.

Data acquisition was primarily from LAADS (Level 1 and Atmosphere Archive and Distribution System). All granules were randomly selected and downloaded using a sampling algorithm. LAADS allows granules to be selected by satellite, product level, time (months and days), and location (longitude and latitude). We downloaded just over 3300 MODIS level 1a granules (raw radiances), which included both day and night coverage. Of these, about 1000 granules, consisting primarily of daylight coverage and including distinct northern, southern, eastern, and western hemisphere granules, were used for this paper. All granules are in HDF4 format.

2.2 AVHRR

The Advanced Very High Resolution Radiometer (AVHRR) is a space-borne sensor carried on the National Oceanic and Atmospheric Administration (NOAA) family of polar orbiting platforms (POES). AVHRR instruments measure the reflectance of the Earth in 5 relatively wide (by today's standards) spectral bands. The first two are centred around the red (0.6 micrometer) and near-infrared (0.9 micrometer) regions, the third one is located around 3.5 micrometers, and the last two sample the thermal radiation emitted by the planet, around 11 and 12 micrometers, respectively. The primary purpose of these instruments is to monitor clouds and to measure the thermal emission (cooling) of the Earth.

AVHRR data was obtained from Al Powell, Senior Director, National Oceanic and Atmospheric Administration (NOAA). NOAA also provided us with a converter (sat2netcdf), which we used to convert each granule from its original format to a readable NETCDF format before use. 2400 AVHRR granules were used in the survey.

2.3 SEVIRI

The MSG satellite's main payload is the optical imaging radiometer, the so-called Spinning Enhanced Visible and Infrared Imager (SEVIRI). SEVIRI is a 50 cm-diameter aperture, line-by-line scanning radiometer, which provides image data in four Visible and Near-InfraRed (VNIR) channels and eight InfraRed (IR) channels. The 12 SEVIRI channels consist of 8 IR detector packages (3 detectors each), 1 High Resolution in the Visible (HRV) channel (9 detectors), and 2 Visible and 1 Near-IR channels (3 detectors each). The spectral range for the 4 visible/NIR channels is 0.4-1.6 µm, and for the 8 IR channels it is 3.9-13.4 µm. SEVIRI has unique capabilities for cloud imaging and tracking, fog detection, measurement of the Earth-surface and cloud-top temperatures, tracking of ozone patterns, as well as many other improved measurements.

SEVIRI data granules were obtained through EUMETSAT (European Organisation for the Exploitation of Meteorological Satellites). Data granules used for this paper were randomly selected over a year, covering the winter season through late summer. For the paper we focused only on daylight granules. In total, 220 SEVIRI granules were used in the survey. The SEVIRI data was obtained in HDF5 format.

3. COMPRESSION ALGORITHMS

We now give a brief overview of current state of the art compression software and our reasons for using each in the evaluation.

Tiff: TIFF stands for Tagged Image File Format. It is used for storing images and is well known for handling high color-depth images. TIFF was developed by the Aldus Corporation in 1986 and is a variable-resolution format. TIFF files can be grey scale, RGB full color, or several other classes. For the purpose of this experiment, Lempel-Ziv-Welch, usually referred to as LZW, was the compression type used to compress the image files. To compress, the LZW algorithm [1, 2] builds a translation table.

Gzip/Zip: Both Zip and Gzip use the same algorithm, a variation of LZ77 (Lempel-Ziv 1977). It replaces duplicate strings in the data with a pointer to an earlier occurrence of the string, using hash tables and hash chains to identify duplicates.

7zip: 7-Zip is an open source file archiver designed by Igor Pavlov [3]. To compress a file, 7-Zip uses a combination of filters that can be preprocessors, compression algorithms, or encryption filters. The main compression is done using LZMA, an algorithm also developed by Igor Pavlov. LZMA uses an enhanced LZ77 algorithm [4] along with a form of arithmetic coding known as a range coder.

Bzip: Bzip2 [5] is a lossless data compression algorithm that was developed by Julian Seward. Unlike rar and zip, Bzip2 is a data compressor, not an archiver. Bzip2 uses the Burrows-Wheeler transform to rearrange recurring character sequences into runs of identical letters, then applies a move-to-front transform, and finishes with Huffman coding.


Szip/Rice: Szip is an implementation of the extended-Rice lossless compression algorithm. The Consultative Committee for Space Data Systems (CCSDS) has adopted the extended-Rice algorithm for international standards for space applications [6]. Szip is reported to provide fast and effective compression, specifically for the data generated by the NASA Earth Observing System (EOS) [6]. It was originally developed at the University of New Mexico (UNM) and integrated with HDF4 by UNM researchers and developers.

CCSDS-IDC-V2.0.6: The Consultative Committee for Space Data Systems (CCSDS) Image Data Compression (IDC) was developed for use on space instruments and uses an extended-Rice algorithm. CCSDS developed the algorithm to help reduce data transmission time, storage requirements, and the size of the data. This compression algorithm allows for real time processing with space electronics, supports 4 to 16 bit image compression, and can be adjusted to produce anything from lossy to lossless output. CCSDS IDC [7] is a wavelet encoder that processes the coefficients from the residual subbands with a special algorithm, which distinguishes it from typical wavelet encoders.

Jpeg2000: Jpeg2000 was created by the Joint Photographic Experts Group committee in the year 2000. Jpeg2000 uses a multi-level discrete wavelet transform followed by scalar quantization and block-based arithmetic coding to compress images. Jpeg2000 can achieve lossy to lossless compression. Its algorithm differs from that of standard Jpeg, which uses a discrete cosine transform. We use the JasPer library [8] as the Jpeg2000 implementation.
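To make the Bzip2 description above more concrete, the following is a minimal Python sketch of its first two stages, the Burrows-Wheeler transform and the move-to-front transform. It is an illustration only and not the bzip2 implementation: the function names, the use of an explicit sentinel byte, and the naive rotation-sorting approach are all assumptions chosen for readability, and the final Huffman coding stage is omitted.

```python
def bwt(data: bytes, sentinel: bytes = b"\x00") -> bytes:
    """Naive Burrows-Wheeler transform built from sorted rotations.

    The sentinel (an assumption of this sketch) marks the end of the
    input so that the transform can be inverted.
    """
    if len(sentinel) != 1 or sentinel in data:
        raise ValueError("sentinel must be one byte not present in data")
    s = data + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The output is the last column of the sorted rotation table.
    return bytes(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: bytes, sentinel: bytes = b"\x00") -> bytes:
    """Invert the transform by repeatedly prepending and re-sorting."""
    table = [b""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(bytes([c]) + row for c, row in zip(last_column, table))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


def move_to_front(data: bytes) -> list:
    """Replace each byte by its position in a self-organizing list."""
    alphabet = list(range(256))
    indices = []
    for byte in data:
        index = alphabet.index(byte)
        indices.append(index)
        # Recently seen bytes move to the front, so runs become zeros.
        alphabet.insert(0, alphabet.pop(index))
    return indices


def inverse_move_to_front(indices) -> bytes:
    """Recover the bytes from move-to-front indices."""
    alphabet = list(range(256))
    out = bytearray()
    for index in indices:
        byte = alphabet.pop(index)
        out.append(byte)
        alphabet.insert(0, byte)
    return bytes(out)
```

For example, `bwt(b"banana", b"$")` returns `b"annb$aa"`, which groups the repeated letters together, and `move_to_front(b"aaabbb")` returns `[97, 0, 0, 98, 0, 0]`, where the runs have become zeros that an entropy coder can represent cheaply.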

4. SURVEY PROCESS

The compression runs were done using a python script to call linux command line programs for data extraction and compression. The two main computers running the compression runs were Dell servers, each with (8) quad-core Intel(R) Xeon(R) CPUs @ 2.66GHz with 4GB cache, 500 GB mirrored internal hard drives, and 16GB memory, running Red Hat Enterprise Linux (v.5 for 64-bit x86_64). The data was stored on six SATA drives holding 750 GB each. The sequence of scripted extraction and compression used for the compression runs was tested on randomly chosen samples by compressing and decompressing to guarantee that the process was completely lossless.

Both the HDF4 and HDF5 libraries were used extensively. For MODIS, an HDF4 function dumped a selected band group (EV 250m, EV 500m, EV 1km day, EV 1km night) to a binary file. SEVIRI files were dumped to binary files using the HDF5 library, while AVHRR was converted to NETCDF and then read with the NETCDF library. The binary data extracted from MODIS was separated into individual images. For some algorithms the raw binary images were compressed directly; for others they were first converted to pgm format by attaching a small pgm header to the file before compression.

The compression algorithms include Zip, Gzip, TIFF, 7zip, Bzip2, CCSDS IDC, Jasper (Jpeg2000), and Rice szip. The ImageMagick libraries were used for TIFF compression using the convert command with -lzw switch. The 7z program was used for 7zip compression, bzip2 for Bzip2, gzip for Gzip, and zip for Zip, and the CCSDS IDC compression implementation was provided by Pen-Shu Yeh of NASA-GSFC. For Bzip2, gzip, zip, and 7zip the -9 or -mx9 switch was used to maximize compression. Before the CCSDS IDC program was run, each binary file needed to be padded so that the length and width of the image were powers of two. Once the compression was complete, the sizes of the compressed files were stored in a database.
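The two preprocessing steps described above, attaching a pgm header and padding to power-of-two dimensions, can be sketched in Python as follows. This is a hedged illustration rather than the script used in the survey: the function names are hypothetical, zero filling on the right and bottom edges is an assumption (the text does not say how the padding was filled), and the samples are assumed to be stored row by row as 16-bit values in big-endian order, which is what binary PGM expects when the maximum value exceeds 255.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is greater than or equal to n."""
    if n < 1:
        raise ValueError("dimension must be positive")
    return 1 << (n - 1).bit_length()


def pad_to_power_of_two(raw: bytes, width: int, height: int,
                        bytes_per_sample: int = 2):
    """Zero-pad a row-major image so both dimensions are powers of two.

    Returns the padded bytes together with the new width and height.
    Zero fill is an assumption of this sketch.
    """
    if len(raw) != width * height * bytes_per_sample:
        raise ValueError("raw size does not match the stated dimensions")
    new_width = next_power_of_two(width)
    new_height = next_power_of_two(height)
    row_bytes = width * bytes_per_sample
    row_pad = bytes((new_width - width) * bytes_per_sample)
    rows = [raw[r * row_bytes:(r + 1) * row_bytes] + row_pad
            for r in range(height)]
    # Append all-zero rows at the bottom of the image.
    rows.extend(bytes(new_width * bytes_per_sample)
                for _ in range(new_height - height))
    return b"".join(rows), new_width, new_height


def add_pgm_header(raw: bytes, width: int, height: int,
                   maxval: int = 65535) -> bytes:
    """Prefix raw samples with a binary PGM (P5) header.

    For maxval above 255 each sample occupies two bytes, most
    significant byte first, so raw is assumed to be big-endian already.
    """
    header = f"P5\n{width} {height}\n{maxval}\n".encode("ascii")
    return header + raw
```

Because only the padded image is handed to the compressor, the original width and height would need to be recorded separately so that the padding can be stripped after decompression.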
Two tables were compiled to describe the dataset and to generate statistics about it. The table with the file sizes yielded the bit rates, and the table with the date, time, and location information was used to categorize the granules.
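The measurement itself reduces to compressing each extracted image, confirming that decompression reproduces the input exactly, and converting the compressed size into bits per sample. The sketch below illustrates this with Python's standard library codecs as stand-ins for the gzip, bzip2, and 7z command line programs; the dictionary of codecs, the function name, and the default of two bytes per sample are assumptions of this example, and the lzma call uses its default preset rather than the maximum setting used in the survey.

```python
import bz2
import lzma
import zlib

# Stand-ins for the command line tools named in the text. Each entry
# maps a label to a (compress, decompress) pair of callables.
CODECS = {
    "zlib level 9 (gzip/zip family)": (lambda d: zlib.compress(d, 9),
                                       zlib.decompress),
    "bz2 level 9": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "lzma default preset": (lzma.compress, lzma.decompress),
}


def bits_per_sample(raw: bytes, bytes_per_sample: int = 2) -> dict:
    """Compress raw with each codec and report the bit rate.

    The bit rate is 8 * compressed_bytes / number_of_samples. A round
    trip is performed first so that a lossy result is never recorded.
    """
    if not raw or len(raw) % bytes_per_sample:
        raise ValueError("raw must hold a whole, non-zero number of samples")
    n_samples = len(raw) // bytes_per_sample
    rates = {}
    for name, (compress, decompress) in CODECS.items():
        packed = compress(raw)
        if decompress(packed) != raw:
            raise RuntimeError(f"{name} did not round-trip losslessly")
        rates[name] = 8 * len(packed) / n_samples
    return rates
```

A result below the native sample depth (for example below 12 bits per sample for MODIS counts) indicates that the codec saved space; the returned dictionary corresponds to one row of the file-size table described above.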

5. SURVEY RESULTS

To establish statistical significance of our analysis, we have developed a system to acquire and manage a large number of imager granules. We currently have over 1000 MODIS granules, over 2400 AVHRR granules, and over 220 SEVIRI granules. In this paper, we display statistical summaries of the results we obtained on these large collections of granules, using two kinds of graphs.


[Figure 2 glyph labels. Box and whisker glyph, top to bottom: maximum inlier, third quartile, median, first quartile, minimum inlier. Mean glyph, top to bottom: (+) standard deviation, mean, (-) standard deviation.]

Figure 2. Glyphs key for a box and whisker plot showing rank statistics and for a graph showing mean and standard deviation.

In one graph we produce a summary of our results through a box and whisker graph produced by the matplotlib python library. Figure 2 shows two glyphs for visualizing the statistics. On the left, the glyph shows the distribution of compression values. For each compression algorithm, the top of the box is the value of the third quartile Q3 of the bit rate. The line bisecting the box is the second quartile Q2, in other words the median. The bottom of the box is the first quartile Q1. The length of the box from top to bottom, IQR = Q3 − Q1, is called the interquartile range and is used to classify inliers. Any data point within 1.5 ∗ IQR of the top or bottom of the box is considered an inlier. The whiskers extend to the maximum and minimum inliers. On the right, the glyph shows a filled-in circle for the mean and bars for the standard deviations.
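The statistics behind both glyphs can be computed directly. The following Python sketch uses the standard convention in which Q1 is the 25th percentile and Q3 the 75th percentile, so that the interquartile range Q3 − Q1 is positive. The function name and the returned dictionary keys are hypothetical, the quartiles use the "inclusive" interpolation of the standard library (other libraries may interpolate slightly differently), and the population standard deviation is an assumption since the text does not say which form was plotted.

```python
import statistics


def box_whisker_summary(values) -> dict:
    """Rank statistics for the box glyph plus mean and deviation."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Points within 1.5 * IQR of the box edges count as inliers.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inliers = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inliers),    # minimum inlier
        "whisker_high": max(inliers),   # maximum inlier
        "outliers": outliers,
        "mean": statistics.fmean(data),
        "std": statistics.pstdev(data),
    }
```

For the values 1 through 9 together with a single value of 100, the quartiles are 3.25, 5.5, and 7.75, the whiskers stop at 1 and 9, and 100 is reported as an outlier, which shows why the whisker range can be much narrower than the full minimum to maximum range.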

5.1 MODIS

Figure 3 shows little variation with respect to platform (Terra and Aqua). Also note that for this data the wavelet-based compressors, CCSDS and Jasper, are superior. The results for the MODIS 500m channels are similar. For MODIS data we only considered granules labeled "DAY". This is because when the satellite passes into night mode, the 250m, 500m, and day channel sensors are not recorded. The results for the emissive night bands are nearly the same for day or night mode.

Figure 4 (left) shows the average bit rate per granule of the compressed data for the 1km MODIS day bands, while the graph on the right shows the bit-rate ranges (minimum and maximum). Both graphs are again broken down by platform. Compared to the 250m MODIS bands the bit rates are on average lower but the ranges are much broader. The day bands are much more prone to saturation, which accounts for the range. As with the 250m data there is little difference in the compression performance between the platforms. Curiously, for the day bands the CCSDS wavelet algorithm does not perform as well as either bzip2 or 7zip.

Figure 5 (left) shows average bit rates of each compression algorithm per granule for the MODIS 1km night channels. The night channels have narrower ranges than the other types of MODIS data. The night channels are emissive infrared channels, dissimilar to the other band types. Note the consistent performance between the platforms, and the relatively good performance of bzip2 with respect to the CCSDS. The narrower ranges arise because there is less variation in the emissive data itself when compared to the reflective bands.

Figure 6 (left) shows the MODIS average bit rate per granule for each compression algorithm applied to the 250m bands, broken down by hemisphere. The graph on the right shows the same MODIS average bit rates for the 1km day bands. The pattern seen in previous figures for the per platform breakdown persists. The Jasper implementation is the best performer. For the 250m visible image data the CCSDS is ranked 2nd, while bzip2 takes 2nd place for the 1km day data. The difference in the 1km day data between the northern and southern hemispheres may be due to some variation in the sampling of the data across seasons. The graphs show that there is little variation by hemisphere, particularly in the ranking of the algorithms.

Figure 7 shows average bit rates of each compression algorithm for each of the Earth View (EV) blocks of MODIS granules, and Figure 8 shows similar results for the entire MODIS granule database.


[Figure 3 plots. Panel titles: "MODIS EV250m_Platform_Day Level 1a, Aves of Bits Per Sample" and "MODIS EV250m_Platform_Day Level 1a, Ranges of Bits Per Sample". Vertical axis: Bits Per Sample. Groups: Terra, Aqua.]

Figure 3. The graph on the left shows the average bit-rate (bits per sample) per granule of each compression algorithm on the 2 visible 250m bands of MODIS broken down by platform. The graph on the right shows the max and min on the data sets. For MODIS data we only considered granules labeled "DAY".

[Figure 4 plots. Panel titles: "MODIS EV1km_Day_Platform_Day Level 1a, Aves of Bits Per Sample" and "MODIS EV1km_Day_Platform_Day Level 1a, Ranges of Bits Per Sample". Vertical axis: Bits Per Sample. Groups: Terra, Aqua.]

Figure 4. The graph on the left shows the average bit-rate per granule for the 1km MODIS day bands of the compressed data, while the graph on the right shows the ranges. Both graphs are again broken down by platform, Terra and Aqua.

[Figure 5 plots. Panel titles: "MODIS EV1km_Night_Platform_Day Level 1a, Aves of Bits Per Sample" and "MODIS EV1km_Night_Platform_Day Level 1a, Ranges of Bits Per Sample". Vertical axis: Bits Per Sample. Groups: Terra, Aqua.]

Figure 5. The graph on the left shows average bit rates of each compression algorithm per granule for the MODIS 1km night channels.


[Figure 6 plots. Panel titles: "MODIS EV250m Level 1a, Aves of Bits Per Sample" and "MODIS EV1km_Day Level 1a, Aves of Bits Per Sample". Vertical axis: Bits Per Sample. Groups: Northern, Southern, Eastern, Western.]

Figure 6. The graph on the left shows the MODIS average bit-rate per granule for each compression algorithm applied to the 250m bands broken down by hemisphere. The graph on the right shows the same MODIS average bit rates for the 1km day bands. The graphs show that there is little variation by hemisphere, particularly in the ranking of the algorithms.

[Figure 7 plots. Panel titles: "MODIS By_Block Level 1a, Aves of Bits Per Sample" and "MODIS By_Block Level 1a, Ranges of Bits Per Sample". Vertical axis: Bits Per Sample. Groups: EV250m, EV500m, EV1km_Day, EV1km_Night.]

Figure 7. The graph on the left shows the averages, while the graph on the right shows the ranges for the MODIS data per block. Note the widest ranges are for the 1km day data. This is because some day channels are prone to saturation.

[Figure 8 plots. Panel titles: "MODIS Overall Level 1a, Aves of Bits Per Sample" and "MODIS Overall Level 1a, Ranges of Bits Per Sample". Vertical axis: Bits Per Sample. Group: Modis.]

Figure 8. The graph on the left shows the averages, while the graph on the right shows the ranges for the MODIS data bit-rate per granule. Due to saturation in some bands, the bit rate can effectively be zero.


5.2 AVHRR

The AVHRR granules are long thin strip images which frequently contain dark pixels (at night) with little or no data. The absolute averages are hence less useful; however, the relative performance of the compression algorithms is consistent, with Jasper followed by bzip2 (see Figure 9). We do not report the CCSDS and SZIP results on the thin AVHRR granules, as we encountered some problems running the software on these datasets.

[Figure 9 plots. Panel titles: "AVHRR Overall Level 1b, Aves of Bits Per Sample" and "AVHRR Overall Level 1b, Ranges of Bits Per Sample". Vertical axis: Bits Per Sample. Key: Tiff, Gzip, Zip, 7zip, BZip2, Jasper. Group: Avhrr.]

Figure 9. The graph on the left shows the mean with standard deviations for AVHRR data. The graph on the right shows the ranges. The results are similar to the other data sets; however, SZIP and the CCSDS algorithms were not included.

5.3 SEVIRI

The SEVIRI imager on the Meteosat Second Generation geosynchronous platform is a recent sensor. As such, it is an important reference in estimating compression results for the future GOES-R mission. In looking at the compression numbers it is important to note that the SEVIRI images are full disk images containing many pixels that view space. In the published data sets these pixels are precisely zero. GOES-R will use a swath pattern as shown in Figure 10.

Figure 10. ABI swath pattern [9, 10].

The full disk SEVIRI images, when compressed, have very low bit rates because the space portion of the image is filled with zeros. In order to correct for this superficial gain in compression ratios we have taken the uncompressed size to be the earth portion of the image only. Also, due to the uneven sample size for SEVIRI data categorized by season, we have computed the weighted sum of the per-season mean bit rates, with weights given by each season's frequency in the sample. As with MODIS and AVHRR, Jasper outperforms the generic data compression algorithms, but bzip2 does surprisingly well.
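Both corrections amount to simple arithmetic, sketched below in Python. The function names and the season labels in the usage note are hypothetical, and the reading of the weighting (each season's mean weighted by its share of the granules in the sample) is one interpretation of the description above rather than a confirmed detail of the survey.

```python
def earth_only_bits_per_sample(compressed_bytes: int,
                               earth_pixel_count: int) -> float:
    """Bit rate computed against the earth portion of the image only.

    Space pixels are excluded from the sample count so that the zeros
    surrounding the disk do not artificially lower the bit rate.
    """
    if earth_pixel_count <= 0:
        raise ValueError("earth_pixel_count must be positive")
    return 8 * compressed_bytes / earth_pixel_count


def weighted_mean_bit_rate(season_means: dict, season_counts: dict) -> float:
    """Weighted sum of per-season mean bit rates.

    Each weight is the season's granule count divided by the total
    count (an assumption about how frequency was applied).
    """
    if set(season_means) != set(season_counts):
        raise ValueError("means and counts must cover the same seasons")
    total = sum(season_counts.values())
    if total <= 0:
        raise ValueError("at least one granule is required")
    return sum(season_means[season] * season_counts[season]
               for season in season_means) / total
```

As a made-up illustration, a winter mean of 4.0 bits per sample over 1 granule and a summer mean of 6.0 over 3 granules combine to 5.5 bits per sample, and a 1000 byte compressed file covering 4000 earth pixels corresponds to 2.0 bits per sample.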


[Figure 11 plots. Panel titles: "Seviri Overall Level 1b, Aves of Bits Per Sample" and "Seviri Overall Level 1b, Ranges of Bits Per Sample". Vertical axis: Bits Per Sample. Group: Seviri.]

Figure 11. The graph on the left shows the mean and standard deviation of the bit-rates per granule of the SEVIRI data for each algorithm. The graph on the right shows the ranges for the same values.

6. CONCLUSION

As the data set is broken down and compression results are examined across categories, some generalizations can be made. The means for each algorithm stay within a narrow range; they do not differ greatly from each other within the same satellite and band. The ranking of algorithms remains consistent across categories and satellites, although there is some variation in the relative performance when different band types are considered, as can be seen in the MODIS results. The consistency of the means for a large data set across a range of criteria and breakdowns indicates that the results are a reliable statistical measure of compression behavior. The overall best performer was Jasper. As expected, CCSDS performs only slightly worse than Jasper, since it is based on a similar wavelet filter bank. The excellent performance of Bzip2 was a surprise. It bests Jasper in a few cases and does nearly as well overall, despite the fact that it is not specifically an image compression algorithm.

Acknowledgments

This compression research is managed by Roger Heymann, PE of OSD NOAA NESDIS Engineering, in collaboration with the NOAA NESDIS STAR Research Office through Mitch Goldberg, Tim Schmit, and Walter Wolf.

REFERENCES

[1] Ziv, J. and Lempel, A., "Compression of individual sequences via variable-rate coding," IEEE Trans. on Info. Theory 24, 531–536 (September 1978).
[2] Welch, T. A., "A technique for high performance data compression," IEEE Computer 17(6) (1984).
[3] Pavlov, I., "http://www.7-zip.org."
[4] Ziv, J. and Lempel, A., "A Universal algorithm for sequential data compression," IEEE Trans. on Info. Theory 23, 337–343 (May 1977).
[5] "http://www.bzip.org."
[6] Yeh, P.-S., Xia-Serafino, W., Miles, L., Kobler, B., and Menasce, D., "Implementation of CCSDS lossless data compression in HDF," in [Earth Science Technology Conference], (June 2002).
[7] Yeh, P.-S., Armbruster, P., Kiely, A., Masschelein, B., Moury, G., Schaefer, C., and Thiebaut, C., "The new CCSDS image compression recommendation," Aerospace Conference, IEEE 5-12, 4138–4145 (March 2005).
[8] Adams, M. D. and Kossentini, F., "JasPer: A software-Based JPEG-2000 Codec Implementation," Proc. of IEEE International Conference on Image Processing 2, 53–56 (Oct 2000).
[9] Schmit, T. J., Gurka, J. J., Gunshor, M. M., and Li, J., "The ABI on the GOES-R series," in [5th GOES Users' Conference], (January 2008).
[10] "http://cimss.ssec.wisc.edu/goes/abi."
