Data Fusion, Denoising, and Filtering to Produce Cloud-Free High Quality Temporal Composites Employing Parallel Temporal Map Algebra

Share Embed


Descripción

Data Fusion, De-noising, and Filtering to Produce Cloud-Free High Quality Temporal Composites Employing Parallel Temporal Map Algebra Bijay Shrestha, Charles G. O’Hara, and Preeti Mali GeoResources Institute, Mississippi State University, Starkville, MS 39759 {bijay, cgohara, preeti}@gri.msstate.edu Abstract Remotely sensed images from satellite sensors such as MODIS Aqua and Terra provide high temporal resolution and wide area coverage. Unfortunately, these images frequently include undesired cloud and water cover. Areas of cloud or water cover preclude analysis and interpretation of terrestrial land cover, vegetation vigor, and/or analysis of change. Cross platform multi-temporal image compositing techniques may be employed to create daily synthetic cloud free images using fused images from Aqua and Terra MODIS satellite images, and then creating a composite that includes representative values derived from a set of possibly cloudy satellite images collected during a given longer time period of interest. Spatio-temporal analytical processing methods that utilize moderate spatial resolution satellite imagery with high temporal resolution to create multi-temporal composites are data intensive and computationally intensive. Therefore, a study of the strategies using high performance parallel solutions is required. This research focuses on analyzing the fusion, de-noising, filtering, and compositing strategies for vegetation indices using parallel temporal map algebra. The report provides objective findings on methods and the relative benefits observed from various analysis methods and parallelization strategies. Index Terms – Map Algebra, Temporal Map Cube, Vegetation Index Fusion, Compositing, Parallel Processing

1. Introduction Remotely sensed images from satellite sensors such as Aqua and Terra Moderate Resolution Imaging Spectroradiometer (MODIS) provide high temporal resolution and wide area coverage, and therefore are highly useful in performing land use analysis [10]. Unfortunately, these images frequently include undesired cloud and water cover. Areas of cloud or water cover preclude analysis and interpretation of

terrestrial land cover, vegetation vigor, and/or analysis of change. Multi-temporal image compositing techniques may be employed to create synthetic daily cloud free images, from the MODIS sensors onboard of Aqua and Terra satellites, that includes representative values derived from a set of possibly cloudy satellite images collected during a given time period of interest. This study is basically concerned with vegetation indices. Therefore, all the references to composites would implicitly refer to composites of vegetation indices. Spatio-temporal satellite image analysis is a technique for monitoring spatial and temporal changes of land cover and oceanic locations on earth. Temporal Map Algebra (TMA) is a novel technique developed by Jeremy Mennis and Roland Viger for analyzing a time series of satellite imagery using simple algebraic operators that treats time series of imagery as a threedimensional data set, where two dimensions encode planimetric position on earth surface and the third dimension encodes time [5]. The high dimensionality of raster data leads to high computational cost, which is why parallel computation is attractive. This work was funded by NASA Rapid Prototyping Capabilities (RPC) Agriculture Efficiency Project. Matlab is used for the rapid prototyping of the solution. To make use of the high performance prototyping capabilities, we have used MatlabMPI [7].

1.1 Vegetation Indices Vegetation indices are defined as dimensionless, radiometric measures that function as indicators of relative abundance and activity of green vegetation, often including leaf-area index, percentage green cover, chlorophyll content, green biomass, and absorbed photo-synthetically active radiation [6]. Normalized difference vegetation index was used for this work. The principle underlying NDVI is the strong reflectance of healthy, chlorophyll-based vegetation at near-infrared wavelengths and its relatively weak reflectance in the visible red. NDVI is simply defined as the difference between these

reflectance normalized by their sum as shown by the following equation. This yields a dimensionless quantity ranging in theory from -1 to 1, but in practice the lowest value seldom falls below -0.25. The value increases from -1 to +1 with the increase in vegetation. The clouds are in the lower end of the NDVI value range.

NDVI =

NearIR − R NearIR + R

(1)

The NDVI is calculated by using the above equation, where NearIR represents near infra-red reflectance and R represents red reflectance.

1.2. Temporal Map Algebra The quantitative nature of digital images provides a foundation for the spatial mathematics. Map algebra is a useful spatial analysis and cartographic modeling framework using simple mathematics operators [3]. In this approach, raster data handling is performed by treating spatial data as variables and applying mathematical local, focal and zonal operations on them. Temporal map algebra (TMA) functions are temporal extensions to conventional map algebra. The temporal map algebra functions treat a time series of imagery as a three dimensional data set, where two dimensions encode planimetric positions on Earth’s surface, and the third dimension encodes time. This allows the time series of satellite imagery to be manipulated according to well-established and standardized raster processing techniques. Mennis and Viger used temporal map algebra for determining the El Nino /Southern Oscillation (ENSO) effect on southern African vegetation intensity over different land cover types. The temporal map algebra function assisted in effectively obtaining the ENSO effects over various land cover from a time series NDVI datasets using a zonal function for a given multi-temporal dataset. The study concluded that the use of cube function approach assisted in implementation of temporal map algebra functions. The logical data model for temporal map algebra is a regular tessellation of R3, a three dimensional matrix. Temporal map algebra can be used for spatial analysis that are performed by map algebra over a period of time using local, focal and zonal statistics functions like minimum, maximum, mean, median, mode, variance, etc. Local functions will give statistics for each point over time; focal functions will give temporal statistical values for a defined region over

each pixel; and zonal functions will give statistical data with respect to each defined region. Local and focal functions can be used for creating composites using TMA [2]. However, in our case we use cross-cube fusion filter function. The main objective of using temporal map algebra for vegetation index compositing is to obtain improved capabilities to perform compositing processes on a multi-temporal datasets such as a time series NDVI and extract the required information effectively. Mali et al. have utilized temporal map algebra to create vegetation index composites of AVHRR and MODIS datasets [11].

1.3. Parallel Processing Parallel processing is relevant here because of the large datasets involved and the need for rapid prototyping of the various different scenarios. Parallel processing is performed by decomposing one large process into small processes that can be solved simultaneously and hence providing faster execution time [1]. Thus, parallelization provides a way to solve a large computationally intensive process like temporal image compositing, which would otherwise take a long time on a single machine. Table 1. Data Cube Information Cube Description Aqua /Terra NDVI Cube Angle Cube Mask Cube

Bits per Pixel 32

Size (MB)

16 8

769.32 385.05

1537.88

Parallel applications need to divide a large problem into a set of smaller problems so that the problems may be allocated to a set of processes running on the processors of a parallel machine for simultaneous executions. There are two aspects of parallel decomposition: data decomposition and functional decomposition. Functional parallelism is based upon the assumption that a computational task may be broken into relatively independent functions or processes. Parallelism is then achieved by assigning these functions or processes to multiple processors. In this study, regular domain decomposition is relevant as the data sets concerned are threedimensional raster grid data represented as three dimensional matrices. Any parallel application requires methods for reading input data from disk, distribution of data among processors, gathering resultant data from processors, and writing the combined resulting

data to disk. The metrics that are used for the evaluation are serial execution time, parallel execution time, speedup and efficiency.

2. Methodology The following methodology was used to create the daily fused datasets. Collection: This step involves the collection of the satellite imagery of the spatial area of interest; the satellite imageries of interest are from MODIS Aqua and Terra sensors. The MODIS datasets that were used are MOD09GQK, MYD09GQK, MODMGGAD, MYDMGGAD, MOD09GST, MYD09GST. Here, MYD represents Aqua data products and MOD represents Terra data products. MYD/MOD09GQK is 250m resolution surface reflectance data, MYD/MODMGGAD is 1 Km resolution Geo-location angles data and MYD/MOD09GST is 1 Km resolution surface reflectance quality data. Preprocessing: The preprocessing steps included reprojecting and reformatting MODIS09 data to GeoTIFF, NDVI, quality mask values and satellite angles calculation. The HEG2.7 tool was used to reproject and reformat the MODIS 09 data. Creation of data cubes: This step is the actual creation of the data cubes. MODIS data comes with satellite sensor angle and quality information about each pixel; three data cubes namely NDVI, View Angle, and Surface Reflectance Quality are created. Table 1 lists the size of the 32 day data cubes for our Area of Interest (AOI). NDVI cubes were computed using MOD09GQK and MYD09GQK, View Angle cubes were created using MODMGGAD and MYDMGGAD, and Quality cubes were created using MOD09GST and MYD09GST. Fusion: This step utilizes rule-based approach for manipulating data streams and creating a subset of observations that meet view angle and quality criteria. This process is generally referred to hereafter as fusion. Table 1 lists the required input data cubes, number of bits representing each pixel, and the size in MB for each cube.

filtering is used to find the pixels that fulfill the required criteria. The criteria that that were enforced were that the zenith angle for the pixel must be less than 48o and the underlying pixel was classified as a good observation. More details about the criteria selection are described in [11]. The temporal window that was considered for filtering of good observation is of 5 days. The following algorithm depicts the rule based fusion. For day from 1 to 5 if angle is less than 48 AND mask is Land select pixel else get next pixel If no pixel is selected select pixel with highest NDVI

MODIS TERRA NDVI

MODIS AQUA NDVI

Cross-Platform Fusion

Associated Angle, Quality, and Metadata Cubes

Fused AQUA-TERRA NDVI

Figure 1. Cross Platform Fusion Methodology The rule based algorithm for the cross platform fusion, de-noising, filtering is based on the fact that pixels with lower scan angles contains less noise, and also utilizing the associated metadata that differentiates between land, water, cloud and snow to mask out the pixels that are not land. Then a stepwise temporal

The applications were developed using MATLAB, MatlabMPI and LibTIFF [8] libraries. The experiments were conducted on the EMPIRE cluster at ERC at Mississippi State University. The EMPIRE cluster is an IBM Linux cluster with 1038 1 GHz and 1.266 GHz

Pentium III processors on 519 nodes and 607.5 Gigabytes of RAM. The system has a total of 9.7 terabytes of disk space acting as secondary memory, with each processor having a two Gigabytes of personal disk space. To perform this experiment, parallel software was developed using MatlabMPI that could simultaneously execute MATLAB in more than one processor. The software works such that the different chunks of centralized input data, located in a Network File Server, are read simultaneously by different processors and then processed. This is a classic case of block decomposition, where the input data is divided and assigned to different processors. There are 6 threedimensional cubes of same dimensions as input sets: two NDVI cubes, two Quality-Mask cubes and the two Angle cubes representing Aqua and Terra images. Therefore, each processor calculates the dimensions of the multi-dimensional input data that it needs to read. The precise chunk of data is then read by each processor from the three input data cubes. The processors then compute the composite from the cubes using the focal maximum value compositing, with angular constraint and surface reflectance quality mask. The processed results are then written as a MATLAB mat file in a central file server, which is then read by central processor and then combines the results to give the final output composite image in a TIFF format.

3. Results and Analysis The fusion process consists of the following steps namely acquisition, preprocessing, and fusion computation using Temporal Map Algebra. The daily fused images can then be used for daily analysis, and also to create long term composites if the ratio of cloud cover in daily fused data is still high. All these different processes can be parallelized if automated processes are developed for acquisition and preprocessing. Since all of them are lengthy and timeconsuming process, and therefore an ideal candidate for parallel processing. This study focused on the rule based fusion process using the concepts of Temporal Map Algebra. Figure 3.1 shows one of the actual input NDVI images scenes that was used to compute fused image. The size of the image cube is 3148 X 4000 pixels, and it covers Mississippi, Arkansas, Alabama, and parts of Louisiana. This is a subset of a MODIS grid h10v05 was downloaded from the LP DAAC (Land Processes Distributed Active Archive Center) data center. For our computation, we subsetted the whole image tile to the

following coordinates: [37° 9' 59.9", -95°12'8.86"] [29° 49' 41.10", -84° 41' 27.35"].

Figure 3.1. NDVI Cube tile Figure 3.3 shows the input Aqua and Terra NDVI data, and the fused NDVI output created using the cross-platform fusion as described in the methodology section, and shows our area of interest. Since this is a work under progress and the detailed quantitative analysis is still being performed. The detailed results will be published in upcoming publications. The serial time taken to compute the AOI of MODIS Aqua and Terra fused image over 32 days took approximately 58.98 hours. However, it should be noted that the program was written in Matlab, and better performance can be achieved using C/C++ and other compiled languages. But, Matlab was preferred because for its rapid prototyping capabilities, and generate faster output and improve performance MatlabMPI was used to parallelize the software using the block decomposition of input data. Table 2. Run-times Processors 1 16 32 64

Time (Hours) 58.98 4.46 2.7 2.0

Speedup 1 13.21 21.53 28.1

Efficiency 1 0.82 0.67 0.44

Table 2 profiles the timing using various processors. As we can see, we can get a good speedup using higher number of processors. Figure 3.2 shows the graph of the computation times. From the table and the figure, we can see that almost linear speedup was

achieved for the parallel software for 16 processors. However, it also looks like some processes are taking longer times than others, and are acting as critical paths and thereby increasing the total computation time. We can overcome this problem by reducing input granularity for each process and performing dynamic scheduling instead of the current static scheduling. Run-Times for Different Number of Processors

5. References [1] A. Grama, A. Gupta, G. Karypis, and V. Kumar, Introduction to Parallel Computing, Second edition, Addison Wesley, 2003. [2] B. Shrestha, C. G. O’Hara, and N. Younan, “Strategies for Alternate Approaches For Vegetation Indices Compositing using Parallel Temporal Map Algebra”, Proceedings of the American Society for Photogramemetry and Remote Sensing Annual Conference, May 2006.

70 60 50

Hours

done to analyze and employ the fused datasets and associated metadata in NASA-RPC projects.

40

[3] C. D. Tomlin, Geographic Information Systems and Cartographic Modeling, Prentice Hall, Englewood Cliffs, New Jersey, 1990.

30 20 10 0 0

8

16

24

32

40

48

56

64

72

Number of Processors

Figure 3.2. Serial and Parallel Run-times

4. Conclusions This is a work in progress and therefore additional changes and employing other aspects of fusion, denoising, and filtering will be done in future. However, we can safely say that the use of parallel software will certainly give faster results because of the required processing of large data sets. Also, use of Matlab results in faster algorithm prototype development time and use of MatlabMPI gives relatively faster processing time. The accuracy of the results, however, is dependent by various factors such as network bandwidth, network traffic, processor speed, and network/cluster architecture. The use of MATLAB and MatlabMPI as the programming environment has added extra overhead to the total serial and parallel times for the computation. The application was also programmed using MATLAB’s interpreted system and that also added overhead to the execution time. We know that MATLAB has high overhead cost and initializing MATLAB using MatlabMPI in a number processor takes a considerable amount of time. These overheads can be reduced by programming in C or C++ that have lower overhead. Further work is being

[4] Healey, R., S. Dowers, B. Gittings, and M. Mineter (1997), Parallel Processing Algorithms for GIS, CRC Press. [5] J. Mennis and R. Viger, “Analyzing Time Series Satellite Imagery Using Temporal Map Algebra”, Proceedings of the American Society for Photogramemetry and Remote Sensing Annual Conference, May 2004. [6] J. R. Jensen, Remote Sensing of the Environment: An Earth Resource Perspective, Prentice Hall, 2000. [7] J. Kepner, “Parallel Programming with MatlabMPI,” Proceedings of the High Performance Embedded Computing (HPEC 2001) Workshop, September 2001. [8] Leffler, S., “Libtiff,” www.libtiff.org (current 2 September, 2006). [9] Matlab 7 User’s Guide (2004), The MathWorks, Inc. [10] National Aeronautics and Space Administration (NASA), “MODIS Home Page,” http://modis.gsfc.nasa.gov/ (current 2 September, 2006). [11] P. Mali, C. O’Hara, B. Shrestha, and V. Vijayraj, “Use and Analysis of Temporal Map Algebra for Vegetation Index Compositing”, International Workshop on the Analysis of Multi-Temporal Remote Sensing Images, May 2005.

September 13

September 14

September 15

September 16

September 17

September 16

September 17

a) Aqua NDVI

September 13

September 14

September 15 b)Terra NDVI

September 13

September 14

September 15

September 16

September 17

c) Fused Aqua-Terra NDVI Images Figure 3. 3 Daily Aqua, Terra and Fused MODIS Composites for Mississippi Study Region. White color represent the ‘No Data’ values, Grey Color shows cloud cover, the low NDVI values are in orange and yellow colors, the increase in NDVI is shown by increase in darkness of green color.

Lihat lebih banyak...

Comentarios

Copyright © 2017 DATOSPDF Inc.