A Loss function approach to identifying environmental exceedances

June 20, 2017 | Autor: Thomas Santner | Categoría: Engineering, Environmental Science, Economics, Bayesian Analysis, Mathematical Sciences, Markov Chain Monte Carlo, Nitrogen, Extremes, Chesapeake Bay, Bayesian hierarchical model, Tagucghi Loss Function, Cumulative distribution function, Markov Chain Monte Carlo, Nitrogen, Extremes, Chesapeake Bay, Bayesian hierarchical model, Tagucghi Loss Function, Cumulative distribution function

Share Embed

Laporkan tautan ini

Descripción

Extremes (2006) 8: 143–159 DOI 10.1007/s10687-006-7964-y

A loss function approach to identifying environmental exceedances Peter F. Craigmile & Noel Cressie & Thomas J. Santner & Youlan Rao

Received: 21 July 2004 / Revised: 1 December 2005 / Accepted: 12 December 2005 # Springer Science + Business Media, LLC 2006

Abstract A frequent problem in environmental science is the prediction of extrema and exceedances. It is well known that Bayesian and empirical-Bayesian predictors based on integrated squared error loss (ISEL) tend to overshrink predictions of extrema toward the mean. In this paper, we consider a geostatistical extension of the weighted rank squared error loss function (WRSEL) of Wright et al. (2003), which we call the integrated weighted quantile squared error loss (IWQSEL), as the basis for prediction of exceedances and their spatial location. The loss function is based on an ordering of the underlying spatial process using a spatially averaged cumulative distribution function. We illustrate this methodology with a Bayesian analysis of surface-nitrogen concentrations in the Chesapeake Bay and compare the new IWQSEL predictor with a standard ISEL predictor. We also give a comparison to predicted extrema obtained from a Bplug-in’’ goestatistical analysis. Keywords Bayesian hierarchical models . Geostatistics . Integrated weighted quantile squared error loss . Bayes predictor . Markov chain Monte Carlo . Extremes AMS 2000 Subject Classification Primary—62M30; Secondary —62H11

1. Introduction In many environmental applications, there is a need to quantify exceedances and the spatial Bextent’’ of those exceedances. This is a difficult problem because substantial P. F. Craigmile (*) : N. Cressie : T. J. Santner Department of Statistics, The Ohio State University, Columbus, OH 43210-1247, USA e-mail: [email protected] Y. Rao Department of Mathematics, The Ohio State University, Columbus, OH 43210-1174, USA

144

P. F. Craigmile, N. Cressie, et al.

amounts of data are usually required to estimate the tail of a probability distribution. In spatial settings, determining exceedances and their locations based on the spatial-prediction method known as kriging (Matheron, 1969; Cressie, 1993, Section 3), does not work particularly well because kriging Bsmooths out’’ exceedances, having as its basis the squared-error loss function. This paper introduces loss functions that are formulated to facilitate the estimation of exceedances. Our motivation is the study of environmental problems based on hidden spatial statistical models with continuous spatial index (i.e., geostatistical models). There have been several approaches to guide inference about extrema and exceedances of a hidden process based on noisy and incomplete data. Stein et al. (1999) consider methods for spatial extremes based on extreme-value distributions using the Bpeaks over threshold’’ method (see Davison and Smith, 1990, for details). Coles and Tawn (1996), Dixon et al. (1998), Coles (2001), and Gilleland et al. (2004) approach inference on extrema, for geostatistical data, using hierarchical spatial modeling of the parameters of the generalized extreme-value distribution. Relevant for this article, are the well-known conclusions from the Bayesian literature that posterior means, which are derived from the sum-ofsquared-error loss function, tend to Bovershrink’’ their predictions of extrema. Louis (1984), Cressie (1988), and Ghosh (1992) were among the first to quantify explicitly this phenomenon and to suggest alternative, constrained Bayes predictors that, while having larger expected sum-of-squared-error loss than Bayes predictors, are better suited to the estimation of nonlinear functionals, including extrema. Stern and Cressie (1999a, 1999b) use predictions derived from a hierarchical Bayesian model with loss functions that are not based on squared error, for inference about extreme disease rates in small areas. Cressie et al. (2000) and Wright et al. (2003) introduce Bweighted rank squared error loss’’ (WRSEL) for estimating extrema of finite-dimensional Markov random fields. This paper extends the WRSEL from finite-index lattice processes to uncountable-index geostatistical processes. We believe that there is great value in trying to quantify scientific questions regarding extrema in terms of such loss functions. One obvious advantage of this approach is that an optimal (Bayes) predictor (i.e., one that minimizes the posterior expected loss) is available in principle. We introduce a loss function that we call the integrated weighted quantile squared error loss (IWQSEL) function, based on an expected ordering of the values of the hidden process in terms of the averaged (over a spatial region) cumulative distribution function (ACDF). While motivated by the WRSEL function, the IWQSEL represents an important advance, both conceptually and technically, for solving extrema problems involving a spatial process with uncountable index sets (geostatistical processes). The remainder of the article is organized as follows. Section 2 introduces the hierarchical spatial model that we shall consider and also defines the ACDF and the IWQSEL function. We present several intuitively useful examples of IWQSELs for weight functions that can be used for estimating extrema, and we calculate the optimal Bayes predictor relative to an arbitrary IWQSEL. Section 3 gives a hierarchical spatial model for pollutant concentrations based on the lognormal distribution and discusses Bayesian posterior inference derived from the model. Section 4 illustrates how the methodology can be applied to the problem of estimating exceedances of surface-nitrogen concentrations in the Chesapeake Bay. It shows the impact of using IWQSEL versus ISEL and compares our new

A loss function approach to identifying environmental exceedances

145

methodology with the Bstandard’’ analysis based on universal kriging. Finally, in Section 5, we give a discussion of our approach.

2. Loss functions for prediction of spatial extrema A generic Bayesian hierarchical spatial model is made up of three components. There is a process of scientific interest, fYðsÞ : s 2 Dg, that is hidden from the observer because of measurement error and incomplete sampling of D; here D is an uncountable region of positive volume in Rd. The joint distribution of the hidden process depends on unknown parameters 71 ; we use the notation ½fYðsÞ : s 2 Dg j 71 to denote this distribution. The parameter 71 would typically include spatial trend and covariance parameters. There is also an observed process Z ¼ ðZðs1 Þ; . . . ; Zðsn ÞÞT, where the effect of measurement error is modeled through the conditional distribution of Z given fYðsÞ : s 2 Dg and possibly additional unknown parameters 72 . Let ½Z j fYðsÞ : s 2 Dg; 72 denote this distribution. For example, 72 would include the measurement-error variance. The final component of the Bayesian model is the joint (prior) distribution of 7 ð7T1 ; 7T2 ÞT. Bayes’ theorem allows inference to be made on unknown quantities by expressing their posterior distribution given the data, as proportional to the product of the three component distributions identified above. Given a relevant loss function for the problem at hand, an optimal Bayes predictor minimizes the expected loss over all predictors in a given class. For example, to predict the value Yðs0 Þ of the hidden process at s0 , with respect to squared error loss (SEL), the Bayes predictor is well known to be Ybðs0 Þ EðYðs0 ÞjZÞ. More generally, when predicting YðÞ by YbðÞ, over a fixed reference region B D, with respect to integrated SEL (ISEL), Z 2 b b ðsÞ ds; SB YðÞ; Y ðÞ YðsÞ Y B

the optimal predictor is again b ðsÞ ¼ EðYðsÞjZÞ; s 2 B: Y

ð1Þ

The ISEL predictor (1) has weaknesses. For example, for a fixed threshold k near the extrema of the process YðÞ, the predicted extent of exceedances fs 2 B : YbðsÞ > kg is biased for fs 2 B : YðsÞ > kg. Intuitively, this is because YbðÞ Brounds the peaks’’ and Bfills in the valleys’’ of YðÞ. In the subsection that follows, we consider an alternative loss function that is adapted to inferences for exceedances and extrema. 2.1. Integrated weighted quantile square error loss (IWQSEL) In this subsection, we develop a loss-function approach to prediction of extremal values of functionals of a geostatistical process, such as IðYðAÞ > kÞ where A B. This is a different problem than the commonly posed geostatistical problem of predicting fIðZðs0 Þ > kÞ : s0 2 Bg, where recall that ZðÞ contains a measurementerror component of variation. Now, it is well known that the optimal decisiontheoretic prediction of a parameter changes substantially if one, instead, wishes to

146

P. F. Craigmile, N. Cressie, et al.

estimate a function of (say, 2 ). The same is true for prediction of functionals of the random process YðÞ instead of YðÞ itself. Consequently, the usual solutions given in geostatistics to prediction of IðZðs0 Þ > kÞ, namely disjunctive kriging and indicator kriging (e.g., Diggle et al., 1998), do not solve the problem at hand. The optimal predictor of IðZðs0 Þ > kÞ at s0 2 B, based on squared error loss, is PrðZðs0 Þ > kjZÞ. As k is varied, this yields the posterior distribution of Zðs0 Þ, but this is not helpful for finding the posterior distribution of functionals of fYðsÞ : s 2 Ag, such as YðAÞ, for A B D. Fix the reference region B contained in D (often B ¼ D). The averaged CDF (ACDF) of the spatial process fYðsÞ : s 2 Bg, evaluated at y 2 R, is defined as follows. Let Fs ðy; 71 Þ PrðYðsÞ yj71 Þ be the cumulative distribution function (CDF) of YðsÞ. Then the ACDF over region B is defined by Z Fs ðy; 71 Þds ; ð2Þ FB ðy; 71 Þ B jBj R where jBj B 1 ds denotes the area of B. It is straightforward to see that FB ðy; 71 Þ ¼ R E ðjBj1 B IðYðsÞ yÞdsj1 Þ; where IðÞ denotes the indicator function. That is, the ACDF R is the expected value of the spatial CDF, defined by GB ðyÞ jBj1 B IðYðsÞ yÞds, and is discussed in Lahiri et al. (1999). For jBj > 0, FB ðy; 71 Þ is not itself a CDF of any functional of YðÞ, however it does have all the properties of a CDF when regarded as a function of y. Hence, for 0 < p < 1 we can define the inverse ACDF to be FB1 ð p; 71 Þ inffy 2 R : FB ðy; 71 Þ pg;

ð3Þ

namely that value above which, on average, 100p% of the process YðÞ in B lies. To put more emphasis on the quantiles of interest (here extreme quantiles), we define the IWQSEL to be a weighted ISEL, where the weight depends on the quantile achieved by the ACDF. The conditional (on 71 ) IWQSEL of using YeðÞ to predict YðÞ is defined by Z e ðÞ; 71 Þ e ðsÞÞ2 ds; LB ðYðÞ; Y wB ðYðsÞ; 71 Þ ðYðsÞ Y ð4Þ B

where 1 is momentarily assumed known, and the weight function is wB ðYðsÞ; 71 Þ

Z 0

1

wð pÞI YðsÞ 2 ðFB1 ðp; 71 Þ; FB1 ðp þ dp; 71 Þ ;

ð5Þ

and w : ½0; 1 ! ½0; 1Þ is a user-specified importance function that, in general, describes Binteresting’’ portions of the inverse ACDF. To predict extrema and exceedances well in B, we should put more weight on those YðsÞ in B that are either large or small, that is, on values of p Bnear’’ 0 or 1. We will talk more about the weight function in Section 2.2. The IWQSEL predictor is an extension of the WRSEL predictor introduced by Wright et al. (2003) for discrete-indexed processes, the fundamental differences being here our ordering and weighting of an uncountable number of YðsÞ values of the geostatistical process. The extension

A loss function approach to identifying environmental exceedances

147

represents an advance in quantifying spatial exceedances for continuous-indexed processes. Since 71 is unknown, we average the conditional IWQSEL with respect to the e Þ and priors and define the (unconditional) IWQSEL, a function of predictor Yð predictand YðÞ, as Z e ðÞÞ LB ðYðÞ; Y e ðÞ; 71 Þf ð71 Þd71 ; L*B ðYðÞ; Y ð6Þ where f ð71 Þ is the prior distribution of 71 . It is straightforward to see that Z 2 e e ðsÞ ds; * L B ðYðÞ; Y ðÞÞ w*B ðYðsÞÞ YðsÞ Y

ð7Þ

B

R where w*B ðYðsÞÞ wB ðYðsÞ; 71 Þf ð71 Þd71 : The function given by Eq. 6 declares the process YðÞ to be the predictand of interest and its parameters 71 to be nuisance parameters. This represents a class of loss functions that are new, as far as the authors know, in the spatial-prediction literature. Because the loss is an integral of componentwise loss functions, the optimal predictor of fYðsÞ : s 2 Bg minimizes the posterior expectation of the IWQSEL componentwise. Fixing s, the posterior expected loss given the data Z, is Z 2 e ðsÞÞ2 Z ¼ e ðsÞ f ðy jZÞ dy; w*B ðyÞ y Y ð8Þ E w*B ðYðsÞÞðYðsÞ Y R

where f ð j ZÞ is the posterior density of YðsÞ. Differentiating Eq. 8 with respect to YeðsÞ, setting the result equal to zero, and solving for YeðsÞ, we obtain the optimal predictor relative to IWQSEL as: R wB* ðyÞf ðyjZÞ y dy e ; s 2 B; ð9Þ Y ðsÞ ¼ RR * R wB ðyÞf ðyjZÞdy where w*B ðyÞ ¼

Z Z 0

1

wðpÞI y 2 ðFB1 ðp; 71 Þ; FB1 ðp þ dp; 71 Þ f ð71 Þd71 :

ð10Þ

The weight function (10) can be computed using Monte Carlo integration, and the predictor (9) can be computed using Markov chain Monte Carlo (MCMC) e sampling from the posterior distribution (Section 3). Clearly, the optimal YðsÞ is invariant to scale changes of the importance function wðÞ. The next section describes the IWQSEL predictor (9) for several choices of wðÞ. 2.2. The Bayes predictor for specific importance functions The simplest choice of importance function is wðpÞ ¼ 1, for p 2 ½0; 1. Then Eq. 4 reduces to ISEL and YeðsÞ becomes the posterior mean YbðsÞ given by Eq. 1. To give more weight to the larger values of fYðsÞ : s 2 Bg, we consider a number of alternative functions. One example is the step function, wð pÞ ¼ Iðp Þ;

ð11Þ

148

P. F. Craigmile, N. Cressie, et al.

where 1=2 < < 1 is a target proportion selected to be near one. For example, if ¼ 0:95, then we are interested in the top 5% of fYðsÞ : s 2 Bg. The corresponding IWQSEL predictor is obtained from Eq. 9 where Eq. 11 is substituted into Eq. 10. Another importance function we consider is the sigmoid-type function, 1 ; wð pÞ ¼ 1 þ eCðpÞ

p 2 ½0; 1;

ð12Þ

where C > 0 is a scale parameter and 1=2 < < 1. For a given , larger values of C concentrate wðÞ more on the top 100ð1 Þ% of YðÞ in B. The IWQSEL predictor corresponding to Eq. 12 is obtained from Eq. 9, where Eq. 12 is substituted into Eq. 10. It is obvious that C ¼ 0 corresponds to the ISEL estimator, and C ! 1 corresponds to the IWQSEL predictor with the step function (11). Thus, the parameter C controls the amount of shrinkage in the predictor by smoothly interpolating between these two extreme cases. One way to quantify the amount of shrinkage in the predictor is through the difference between the inverse ACDF (summarized at the componentwise posterior medians of 71 ) for the IWQSEL predictor and the inverse ACDF for the true YðÞ process for specified ACDF probabilities fqi g. A negative difference denotes overshrinkage of the qi -th Baverage quantile’’, and a positive difference denotes undershrinkage. We use this quantification based on q1 ¼ to select C in Eq. 12 for the example in Section 4.2. 2.3. Comparison of the Bayes predictors based on IWQSEL and ISEL Bayesian prediction of the process YðsÞ at site s based on ISEL is given by Eq. 1. Simulation-based comparisons of the IWQSEL and ISEL predictors of the extremes of the underlying YðÞ process are made in [Section 5] Wright et al. (2003) for a Markov-random-field model having a finite index set. They show consistent, often dramatic, improvement of the IWQSEL predictor over the ISEL predictor, particularly when the smooth sigmoid-type functions (12) are used. For example, in simulations to identify the location of the largest value for a lattice process, they find 40–80% improvement in the squared difference between the true rank of the location predicted by IWQSEL as having largest value, relative to the true rank of the location predicted by ISEL as having largest value. Other measures of the efficiency of the IWQSEL predictor for identifying extremes show similar results. A head-to-head comparison of the IWQSEL and ISEL predictor will be given for the pollutant-concentration example in Section 3.

3. A geostatistical model for concentrations We now apply the modeling and loss-function strategy given in Section 2 to a problem in environmental science. Suppose fYðsÞ : s 2 Dg represents the true concentrations of a pollutant in the domain D R2 . Let fZðsÞ : s 2 Dg denote the observable concentration process and Z ¼ ðZðs1 Þ; . . . ; Zðsn ÞÞT denote the vector of observed concentrations at locations s1 ; . . . ; sn . In what follows, N n ð; Þ denotes the n-variate normal distribution, and I n denotes the n n identity matrix.

A loss function approach to identifying environmental exceedances

149

In the rest of this article, we assume that the true and observed concentrations are nonnegative, and we model their logarithms. Given the true concentrations Y ¼ ðYðs1 Þ; . . . ; Yðsn ÞÞT, the observed Z is modeled by log ZjY; 2 N n ðlog Y; 2 I n Þ;

ð13Þ

where log Z ðlog Zðs1 Þ; . . . ; log Zðsn ÞÞT . In words, the measurement errors on the log scale are additive, normal with zero mean and common variance 2 > 0, and independent across sites. The unobserved true concentrations are modeled by log Yjb; q N n ðXb; C q Þ;

ð14Þ

where X is a known n p design matrix of covariates and b is an unknown pdimensional vector. The n n positive-definite covariance matrix C q describes the spatial covariances among the sites, where q is a q-dimensional parameter vector. In addition to comparison of the IWQSEL and ISEL predictors of extremes for this problem, we also compare the IWQSEL predictor of concentration extremes with a Bplug-in’’ approach commonly used in environmental science, based on the universal kriging predictor (known as Bobjective analysis’’ in some circles). To make the paper self-contained, we summarize the key features of universal kriging in the next subsection. 3.1. Universal kriging in the presence of measurement error When q and 2 are known, universal kriging predicts the values of the true logconcentrations log YðÞ at points, s*1 ; . . . ; s*m (Matheron, 1969; Huijbregts and Matheron, 1971; Cressie, 1993; Stein, 1999). Most commonly, we arrange the fs*j g to lie on some fine grid that covers D. For j ¼ 1; . . . ; m, the universal-kriging predictor of log Yðs*j Þ, based on the observed concentrations, is log Yðs*j Þ ¼ ðs*j Þ log Z; ð15Þ n oT where ðs*j Þ cj þ XðX T S1 XÞ1 ðx*j X T S1 cj Þ S1 and cj ðcovðlog Yðs*j Þ, log Zðs1 ÞÞ, . . . , covðlog Yðs*j Þ, log Zðsn ÞÞÞT . The kriging variance of log Yðs*j Þ, that is, the minimum mean squared prediction error of log Yðs*j Þ, is given by varðlog Y ðs*j ÞÞT cj þ ð ðs*j ÞÞT S ðs*j Þ. ðs*j ÞÞ 2ð In practice, geostatisticians seldom know the parameters q and 2 . They plug estimates of these parameters into the kriging predictor and kriging variance given above, yielding a variety of Bestimated best linear unbiased predictors’’ (EBLUPs) of log Yðs j Þ. This leads naturally to a comparison of parameter estimation [(Stein, 1999), Section 6.2] methods. The maximum likelihood estimators (MLEs) of ðb; q; 2 Þ are the values of the parameters that maximize the likelihood of the observed log-concentrations; that is, the MLEs minimize log det ðSÞ þ ½ log Z XbT S1 ½ log Z Xb;

ð16Þ

where recall that S ¼ C q þ 2 I n . By substituting b bðq; 2 Þ ðX T S1 XÞ1 X T S1 log Z;

ð17Þ

150

P. F. Craigmile, N. Cressie, et al.

into Eq. 16, we obtain a profile likelihood in ðq; 2 Þ to be maximized (McCullagh and Nelder, 1989). However, the MLEs of q and 2 are biased. To reduce this, Patterson and Thompson (1971) proposed estimating q and 2 based on contrasts of the data, a method they called restricted maximum likelihood (REML) estimators; REML chooses ðq; 2 ) to minimize: log detðSÞ þ log detðBÞ þ W T ðS1 S1 XB1 X T S1 ÞW ;

ð18Þ

with W ¼ ðI n XðX T XÞ1 X T Þ log Z and B ¼ X T S1 X. We estimate b by substituting the REML estimators of q and 2 into Eq. 17. Kitanidis (1983) first suggested the use of this method for spatial data. 3.2. Bayesian prediction with respect to ISEL and IWQSEL In contrast to the assumptions underlying universal kriging, we prefer to work with a (hierarchical) Bayesian model, which involves specification of a prior distribution f ðb; q; 2 Þ for the parameters of the likelihood model (13)–(14). Typically, this prior distribution will incorporate prior scientific belief about the parameters, although in some cases there may be only Bvague’’ prior belief (e.g., the prior used in Section 4). Consider predicting the true log-concentration process at points s*1 ; . . . ; sm*, that is, log Y* ðlog Yðs1*Þ; . . . ; log Yðsm*ÞÞT . The Bayesian predictive density of the true log concentrations at these points is calculated by, f ðlog Y*j log ZÞ ¼

Z

f ðlog Y*j log Z; b; q; 2 Þf ðb; q; 2 j log ZÞ db dq d2 :

ð19Þ

The first term in the integrand is obtained from the joint distribution of log Y* and log Z, given ðb; q; 2 Þ, which is

Xb log Z S 2 ; N b; q; nþm X*b log Y* Fq

F qT C *q

;

where F q is the m n matrix with ði; jÞ element covðlog Yðs*i Þ; log Zðsj ÞÞ, C*q is the m m matrix with ði; jÞ element covðlog Yðs*i Þ; log Yðs*j ÞÞ, and X* is the design matrix associated with the points fs*j : j ¼ 1; . . . ; mg. By a well known property of multivariate Gaussian distributions, log Y*jlog Z; b; q; 2 N m ð m*; S*Þ;

ð20Þ

where m* ¼ F q ðC *q Þ1 logZ þ ðX* F q ðC *q Þ1 XÞb and S* ¼ C *q F q S1 F Tq . From Eq. 19, we can estimate the Bayesian predictive density f ðlog Y*j log ZÞ by first drawing an MCMC realization of ðb; q; 2 Þ from its posterior distribution, and then using this realization, along with log Z, to draw a realization from the normal distribution given by Eq. 20. Alternatively, we could use importance sampling to simulate from the posterior distribution (see De Oliveira and Ecker 2002, for an illustration of this method in the geostatistical setting).

A loss function approach to identifying environmental exceedances

151

For any j ¼ 1; . . . ; m, the ISEL predictor of log Yðs*j Þ is the mean of Eq. 20, namely c Yðs*j Þ ¼ Eðlog Yðs*j Þj log ZÞ: log

ð21Þ

Now let F1 B ðp; 71 Þ denote the inverse ACDF of the process log YðÞ. Using Eq. 9, we conclude that the IWQSEL predictor of log Yðs*j Þ is given by, Z f *j Þ ¼ Z log logYðs

w*B ðvÞf ðv j log ZÞ v dv ;

ð22Þ

w*B ðvÞf ðv j log ZÞ dv

for j ¼ 1; . . . ; m, where w*B ðÞ is defined in Eq. 10. In practice, the integral in Eq. 22 is evaluated using MCMC; see the next section.

4. Surface-nitrogen concentrations in the Chesapeake Bay In recent years there has been a great deal of interest in monitoring nitrogen levels in the Chesapeake Bay, the largest estuary in the United States. Parts of six states make up the Chesapeake Bay watershed: Delaware, Maryland, New York, Pennsylvania, Virginia, and West Virginia, as well as the District of Columbia. Increased nitrogen levels increase algal growth. The blanketing effect of algal growth, along with the decomposition of these algae, in turn decreases the amount of dissolved oxygen in the water. As a consequence, increasing nitrogen levels pose a substantial threat to the wildlife and ecology of the Bay and the surrounding watershed. In a recent agreement between the Chesapeake Bay Program partners, every state is mandated to regulate the nitrogen levels (both from tributaries and from the air) in their part of the Bay (Chesapeake Bay Program Partnership, 2000). We consider a spatial dataset of 49 measurements of the surface-nitrogen concentration (in mg/l) taken in May 1995 throughout the Chesapeake Bay. This same dataset was used in De Oliveira and Ecker (2002) for hotspot detection. We shall consider different models and inferences than they do, instead characterizing the nitrogen-exceedance regions for the purpose of remediation. This determination is made using the IWQSEL predictor as defined in Section 2.1. The spatial locations of the measurements are denoted by the circles in the left panel of Fig. 1. The shading of each circle denotes the magnitude of the log nitrogen concentration grouped in intervals of 0.4. 4.1. The standard universal-kriging analysis We first consider modeling the observed nitrogen concentrations using the two-stage geostatistical (but not Bayesian) model given by Eqs. 13–14, for which the design matrix and the spatial covariances need to be specified. We select these quantities by carrying out an explanatory spatial data analysis. A scatterplot of log nitrogen concentration versus northing showed an increasing trend from south to north in the Bay. A histogram of the concentrations gave some evidence of a trimodal distribution, with a clear separation of clusters. This may be due to the fact that

152

P. F. Craigmile, N. Cressie, et al.

Fig. 1 The left panel displays a map of the Chesapeake Bay, along with the location of the monitoring sites. The shading of each circle denotes the log nitrogen concentration to the nearest 0.4 value. The circles on the right panel denote the empirical semivariogram of the residuals from the regression model of Section 4.1. The solid line denotes the fitted exponential semivariogram model obtained from REML fitting

the northern region is mostly fresh water, the south is predominately sea water, and the middle is a mix of these two water types. As a first-order approximation to the fit, we capture the substantial south–north trend with a simple linear function of the northing. There was one nitrogen concentration value (at latitude 39.00 and longitude j76.35) whose absolute residual was 2.7 times the residuals’ interquartile range, and we declared it to be an outlier. Nearby values were quite different so that, while recognizing that we wish to make inference on extrema, we deemed that the outlier was a recording error and deleted it from the spatial analysis to follow. In our model, we let X denote the 48 2 design matrix whose first column contains ones and whose second column contains the northing coordinate. Examining directional semivariograms of the residuals, we saw that the correlation between sites is direction-dependent. To correct for this anisotropy, we rescaled the easting direction by a factor of 2.5 but decided that a rotation was not needed (this is not surprising due to the relatively narrow shape of the Bay evident in the left panel of Fig. 1). Examining the resulting isotropic semivariogram (right panel of Fig. 1), there is initial strong positive correlation, with evidence of an oscillatory pattern in the spatial correlations, which is in agreement with De Oliveira and Ecker (2002). We captured this strong positive correlation with a simple exponential model for C q . Here, q ¼ ð1 ; 2 ÞT and the ði; jÞ element of the 48 48 matrix ﬃC q is, ðC q Þi;j ¼ 1 expðjjsi sj jj=2 Þ; where jjsi sj jj qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ ðsi1 sj1 Þ2 þ ð2:5Þ2 ðsi2 sj2 Þ2 , is a scaled Euclidean distance (accounting for anisotropy), and each of si and sj are an ordered pair of northing and easting coordinates given in km on the Universal Transverse Mercator coordinate scale. Observe that the micro-scale-variation term is assumed to be zero, because we have no repeated nitrogen measurements at any site to separately estimate the measurement-error variance and micro-scale variation. The REML estimates of q and 2 obtained by minimizing Eq. 18, and the estimate of b calculated using Eq. 17 are summarized in Table 1 along with estimated standard errors, which were calculated from the large-sample approximations of Cressie and Lahiri (1996). Plotting the empirical semivariogram with the

A loss function approach to identifying environmental exceedances

153

Table 1 REML estimates (standard errors in parentheses) and posterior medians of model parameters. Parameter (interpretation)

REML

Posterior median

1 (intercept) 2 (slope) 1 (sill) 2 (range) 2 (measurement error)

j20.276 (2.753) 0.0047 (0.0007) 0.026 (0.001) 54.666 (2.927) 0.002 (0.0002)

j20.207 0.0047 0.022 50.846 0.006

REML-based fitted semivariogram (right panel of Fig. 1), it was found that the fit was good up to spatial lags of 50 km and the exponential fitted semivariogram clearly captures the gross spatial dependence evident in the empirical semivariogram values beyond 50 km. It is well known that for spatial prediction, a good fit near the origin is paramount (e.g., Stein, 1999). Now we consider spatial prediction using universal kriging. The increasing trend from south to north in the Bay implies that, on average, large surface-nitrogen concentrations occur in the northern portion of the Bay. Since the goal of this analysis is to predict excessive levels, we predict on a 100 100 grid restricted to the north of the Bay, between longitudes ½76:6; 75:7 and latitudes ½38:2; 39:6 (3,681 points). This grid is a discrete approximation to the region of interest, B. We denote these prediction points by fs*j : j ¼ 1; . . . ; 3; 681g. As we do not take into account the error in estimating the parameters b, q, and 2 , the kriging standard errors will tend to be underestimated. In contrast, the Bayesian inference that we present in the next subsection automatically accounts for the uncertainty in the parameters. In what follows, we compare the Bplug-in’’ universal-kriging predictor with the new Bayesian IWQSEL-based predictor, and the Bayesian ISEL-based predictor. 4.2. Bayesian IWQSEL-based predictor We fitted the Bayesian hierarchical spatial model given by Eqs. 13 and 14 together with the following disperse, but proper, priors: 1 Nð20:25; ð0:5Þ2 Þ; 2 Nð0:0047; ð5 104 Þ2 Þ; 2 IGð2; 30Þ; log 1 Nð3:65; 0:05Þ; log 2 Nðlog 50; 0:05Þ; where IGð; Þ denotes an inverse Gamma distribution with density f ðxj; Þ ¼

xðþ1Þ expð1=ðxÞÞ ; GðÞ

x > 0:

The means of the priors for b, q, and 2 were chosen to have values that approximate the REML estimates. We used the Metropolis-Hastings algorithm to generate samples from the posterior distribution (Tierney, 1994; Gilks et al., 1998; Diggle et al., 1998). In particular, we used a Metropolis-Hastings algorithm with symmetric Gaussian proposals (for convenience and speed) to generate samples from the posterior (see, e.g., Roberts, 1998). Following a burn-in of 5,000 realizations, we generated 20,000 realizations from the posterior distribution of

154

P. F. Craigmile, N. Cressie, et al.

the parameters given the data. We assessed convergence by running multiple chains and examining the correlation properties of the chains using autocorrelation plots. As Table 1 illustrates, there is general agreement between the REML estimates of the slope and intercept, and the posterior medians. Compared to the REML estimates, the posterior medians of the range and sill are smaller, but the posterior median of the measurement-error variance is larger than the REML estimate. The posterior summaries were robust to different choices of the prior distribution. Let fðbðkÞ ; qðkÞ ; ð2 ÞðkÞ ; log YðsÞðkÞ ÞT : k ¼ 1; . . . ; 20; 000g denote the post burn-in MCMC draws from the posterior distribution of ðb; q; 2 ; log YðsÞÞT given the data. To approximate the posterior mean (the ISEL predictor) of the true log concentrations, we subsampled every L 40 draws (the subsampling decreases the correlations between successive parameter draws). For each of the parameter values in fðbðkLÞ , qðkLÞ , ð2 ÞðkLÞ ÞT : k ¼ 1; . . . ; 500g, we simulated a draw from the distribution (20) at the predicted grid points. Let flog Yðs j ÞðkLÞ : k ¼ 1; . . . ; 500; j ¼ 1; . . . ; 3; 681g be those simulated values. We now use the IWQSEL function to deal with overshrinkage and hence give a better prediction of log-concentration exceedances. Recognizing that 71 ¼ ðb; 1 ÞT, we estimate the ACDF for the true log concentrations by ! 3;681 y ðxðs*j ÞÞT b 1 X e pﬃﬃﬃﬃﬃ ; y 2 R; FB ðy;;1 Þ F 3; 681 j¼1 1 where FðÞ denotes the cdf of a N(0,1) random variable, and ðxðs*j ÞÞT is the row of the design matrix corresponding to the prediction point s*j in the north of the e Chesapeake Bay. For 0 < p < 1, let Fe1 B ðp; 71 Þ denote the inverse of FB ðy; 71 Þ, defined analogously to Eq. 3. We can then approximate a Blogged’’ version of Eq. 9, the IWQSEL predictor, using f Yðs*j Þ ¼ log

P500

k¼1

ðkLÞ e *B ðlog Yðs*j ÞðkLÞ Þ log Yðs*Þ w j ; P500 ðkLÞ e *B ðlog Yðs*j Þ Þ k¼1 w

for j ¼ 1; . . . ; 3; 681, where " # M G g g 1 X 1 X g1 1 1 e e e *B ðyÞ w I F ; w B G ;71;m < y FB G ;71;m M m¼1 G g¼1 G

ð23Þ

is an estimate of w*B ðyÞ, and f71;m : m ¼ 1; . . . ; Mg denotes a random sample from the prior density, f ð71 Þ, of the parameters. (Regarding the granularity of the integral approximation, we found that M ¼ 500 and G ¼ 500 gave sufficient accuracy based on some preliminary experiments to estimate the weight function.) We used the ðmedÞ sigmoid importance function (12) with a target proportion of ¼ 0:9. Let 71 denote the vector of posterior medians of 71 estimated using the MCMC draws. Then for a given value of C in wðÞ, we define the error made in using the IWQSEL ðmedÞ , by predictor to estimate the ACDF at ¼ 0:9, with 71 ¼ 71 3;681 1 X f e 1 ð0:9; 7ðmedÞ Þ 0:9: I log Yðs*j Þ < F B 1 3; 681 j¼1

A loss function approach to identifying environmental exceedances

155

By varying C in Eq. 12, we found that C ¼ 9:15 (to two decimal places), made this error close to zero. Figure 2 shows the shrinkage of the universal-kriging and ISEL predictors compared to the IWQSEL predictor with ¼ 0:9 and C ¼ 9:15 (the ISEL predictor is studied further in Section 4.3). For a range of proportions p from 0.86 to 0.99 we ðmedÞ ; then, for each predictor, we calculated the inverse ACDF evaluated at 71 calculated the proportion of prediction locations whose predicted values are smaller than this value of the inverse ACDF. Thus, a number larger/smaller than p indicates that the predictor underestimates/overestimates the inverse ACDF (i.e., denotes overshrinkage/undershrinkage). Both the universal-kriging and ISEL predictors overshrink predictions over the range of p considered here. For our choice of ¼ 0:9 and C ¼ 9:15, the IWQSEL predictor matches well with the inverse ACDF around p ¼ 0:9, as expected, and outperforms the other two predictors, in terms of predicting the upper extreme values of the process. 4.3. Bayesian ISEL-based analysis The left-hand panel of Fig. 3 displays the ISEL predictor of the true log concentration at site s*j obtained by approximating Eq. 21 with the average, 500 X c Yðs*j Þ 1 log log Yðs*j ÞðkLÞ : 500 k¼1

ð24Þ

The difference between the ISEL and IWQSEL predictors can be seen in the point-by-point plot of the two over the 2,285 points in our gridding of Chesapeake

Fig. 2 The proportion of grid points with predicted values smaller than the inverse ACDF evaluated at proportion p and the posterior medians for the universal kriging, ISEL, and IWQSEL predictors. The 45o line is shown for comparison. A predictor that predicts extremes well will be close to the 45o line, for large values of p. A predictor that overshrinks when predicting extremes, will be above the 45o line

156

P. F. Craigmile, N. Cressie, et al.

Fig. 3 The ISEL predictor is shown in the left-hand panel. A comparison of the IWQSEL and ISEL predictors is displayed in the right-hand panel

Bay. The range of these differences (IWQSEL minus ISEL) is positive, with a median difference of 0.029, and interquartile range of 0.018. These positive differences demonstrate substantial shrinkage of the ISEL predictor compared to the IWQSEL predictor, across the north of the Bay. 4.4. Prediction of exceedance sets The IWQSEL predictor is used to identify exceedance sets as follows. The exceedance set associated with an absolute threshold y 2 R is eB ðyÞ ¼ fs 2 B : f Yðs*j Þ yg (we could also consider a log YðsÞ yg; which we estimate by fs*j : log

Fig. 4 Two example exceedance sets constructed using the IWQSEL and universal-kriging predictors (the thresholds of 1.212 and 1.256 mg/l correspond to the 90th and 95th percentiles of the inverse ACDF evaluated at the posterior medians of 71 ). The light grey regions denote the universal-kriging-based exceedance sets. The dark grey regions denote the areas of the IWQSELpredictor-based exceedance sets that are not contained in the universal-kriging-predictor-based exceedance sets

A loss function approach to identifying environmental exceedances

157

relative threshold). From this estimate, the left panel of Fig. 4 displays those prediction points that exceed 1.212 mg/l (the 90th percentile of the inverse ACDF ðmedÞ ), and the right panel displays those prediction points that exceed evaluated at 71 ðmedÞ ). We 1.256 mg/l (the 95th percentile of the inverse ACDF evaluated at 71 display only the north-east of the prediction region. This exceedance set is denoted by both the light and dark grey areas, according to which predictor is used. As expected, the largest surface-nitrogen concentrations occur in the very north of the Bay. To highlight the overshrinkage associated with using the standard universalkriging predictor, the light grey area denotes the exceedance set obtained by plugging in the universal-kriging predictor, rather than the IWQSEL predictor. Note the difference in the estimated sets; the overshrinkage of the universal-kriging predictor leads to smaller exceedance sets than those obtained from the IWQSEL predictor (indeed, the universal-kriging-based exceedance set using the larger threshold is empty). We observe a similar pattern when we replace the universalkriging-based exceedance sets with the ISEL-based exceedance sets. Based on Fig. 2 and the discussion in Section 2.3, we conclude that the IWQSEL predictor in Fig. 4 gives the most accurate picture of exceedance sets in the Chesapeake Bay.

5. Discussion In this article, we have proposed a new loss function, IWQSEL, which facilitates the prediction of spatial extrema and their spatial extent, from geostatistical data. Defined in terms of the averaged cumulative distribution function and an importance function, this loss function yields good predictions of exceedances and extrema. The choice of the importance function in the IWQSEL can be calibrated to improve estimation of the ACDF of the underlying process; indeed, when compared to the universal-kriging or ISEL predictor, the IWQSEL predictor suffers less from overshrinkage. In the analysis described in Section 4, it was noticeable how close the universal-kriging predictor and the ISEL predictor were. Thus, the differences seen in Fig. 4 are due to the choice of loss function (IWQSEL versus ISEL), and not the modeling approach (Bayesian versus non-Bayesian). In our approach, we calculate an exceedance set by plugging the spatial predictor into the definition of the exceedance set. For the Chesapeake Bay example, we have demonstrated that the exceedance sets are more extensive using the IWQSEL predictor than if we had used universal kriging. This can have substantial environmental implications, since the extent of remediation will be underestimated if we use universal kriging to estimate the exceedance set. In terms of the ACDF, inference for the supremum is a special case where we consider limp!1 F1 B ðp; 71 Þ. Inference on small values of the hidden geostatistical process is also possible by considering different forms of the importance function wðÞ. For example, we can use the sigmoid function defined by wðpÞ ¼ 1 ð1 þ eCðpÞ Þ1 . A bowl-shaped function for wðÞ can be used for simultaneous prediction of both small and large values of the process, analogous to the suggestion by Wright et al. (2003). We also note that our method applies to processes other than the Gaussian or log Gaussian processes used in this article; it also applies to the type of observed and hidden spatial processes considered by Diggle et al. (1998).

158

P. F. Craigmile, N. Cressie, et al.

There are several potential enhancements of our method. One possibility is to replace the ACDF in Eq. 5 by the spatial CDF GB ðÞ, defined in Section 2.1. The analog of Eq. 4 becomes Z 2 e ðÞÞ e ðsÞ ds; LB ðYðÞ; Y wB ðYðsÞÞ YðsÞ Y ð25Þ B

R1 1 where the weight function is wB ðYðsÞÞ 0 wðpÞ I YðsÞ 2 ðG1 B ðpÞ; GB ðp þ dpÞ : Unfortunately, the Bayes predictor based on Eq. 25 can no longer be computed componentwise given the data, because GB ðÞ is a function of the entire process fYðsÞ : s 2 Bg. It is an area of active research to facilitate computation of the Bayes predictor in this case. Lastly, we note that this work has extensions to more complicated models. In particular, we are investigating exceedances for space–time models. Those models have been applied with considerable success to the analysis of air-quality data (e.g., Dominici et al., 2002; Smith et al., 2003). Acknowledgments This research was partially supported by the Office of Naval Research under grants N00014-02-1-0052 and N00014-05-1-0133. We thank Jian Zhang for assistance with some computations that appear in this article. We would also like to thank Victor De Oliveira for making the Chesapeake Bay dataset available to us, and the editors and referees for helpful comments.

References Chesapeake Bay Program Partnership: Chesapeake 2000. Annapolis, Maryland. www.chesapeakebay. net/pubs/chesapeake2000agreement.pdf (2000) Coles, S.: An Introduction to Statistical Modeling of Extreme Values. Springer, Berlin Heidelberg New York (2001) Coles, S.G., Tawn J.A.: Modelling extremes of the areal rainfall process. J. R. Stat. Soc., Ser. B 58, 329–347 (1996) Cressie, N.: When are census counts improved by adjustment? Sur. Methodol. 14, 191–208 (1988) Cressie, N.: Statistics for Spatial Data (Revised edition). Wiley, New York (1993) Cressie, N., Lahiri, S.N.: Asymptotics for REML estimation of spatial covariance parameters. J. Stat. Plan. Inference 50, 327–341 (1996) Cressie, N., Stern, H., Wright, D.: Mapping rates associated with polygons. J. Geogr. Syst. 2, 61–69 (2000) Davison, A.C., Smith, R.L.: Models for exceedances over high thresholds (with discussion). J. R. Stat. Soc., Ser. B 52, 393–442 (1990) De Oliveira, V., Ecker, M.D.: Bayesian hot spot detection in the presence of a spatial trend: Application to total nitrogen concentration in Chesapeake Bay. Environmetrics 13, 85–101 (2002) Diggle, P.J., Tawn, J.A., Moyeed, R.A.: Model-based geostatistics (with discussion). Appl. Stat. 47, 299–326 (1998) Dixon, M., Tawn, J., Vassie, J.: Spatial modelling of extreme sealevels. Environmetrics 9, 283–301 (1998) Dominici, F., Daniels, M., Zeger, S., Samet, J.: Air pollution and mortality: estimating regional and national dose-response relationships. J. Am. Stat. Assoc. 97, 100–111 (2002) Ghosh, M.: Constrained Bayes estimation with applications. J. Am. Stat. Assoc. 87, 533–540 (1992) Gilks, W.R., Richardson, S., Spiegelhalter, D.J.: Markov Chain Monte Carlo in Practice. Chapman & Hall, London (1998) Gilleland, E., Nychka, D., Schneider, U.: Spatial models for the distribution of extremes. Technical report, Summer Institute at the Center on Global Change, Duke University (2004) Huijbregts, C., Matheron, G.: Universal kriging (an optimal method for estimating and contouring in trend surface analysis). In: McGerrigle, J.I. (ed.) Proceedings of Ninth 28 International Symposium on Techniques for Decision-Making in the Mineral Industry, The Canadian Institute of Mining and Metallurgy, special volume 12, pp. 159–169 (1971)

A loss function approach to identifying environmental exceedances

159

Kitanidis, P.: Statistical estimation of polynomial generalized covariance functions and hydrologic applications. Water Resour. Res. 19, 909–921 (1983) Lahiri, S.N., Kaiser, M.S., Cressie, N., Hsu N.-J.: Prediction of spatial cumulative distribution functions using subsampling. J. Am. Stat. Assoc. 94, 86–97 (1999) Louis, T.A.: Estimating a population of parameter values using Bayes and empirical Bayes methods. J. Am. Stat. Assoc. 79, 393–398 (1984) Matheron, G.: Le krigeage universel. Technical Report 1, Cahiers du Centre de Morphologie Mathematique, Fontainebleau, France (1969) McCullagh, P., Nelder, J.A.: Generalized Linear Models (2nd edn.). London: Chapman & Hall (1989) Patterson, H.D., Thompson, R.: Recovery of inter-block information when block sizes are unequal. Biometrika 58, 545–554 (1971) Roberts, G.O.: Markov chain concepts related to sampling algorithms. In: Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (eds.) Markov Chain Monte Carlo in Practice, pp. 45–48. Chapman & Hall, London (1998) Smith, R.L., Kolenikov, S., Cox, L.: Spatio-temporal modeling of PM2.5 data with missing values. J. Geophys. Res.-Atmospheres 108(D24). doi:10.1029/2002JD002914 (2003) Stein, M.L.: Interpolation of Spatial Data: Some Theory for Kriging. Springer, Berlin Heidelberg New York (1999) Stein, A., Turkman, K., Bermundez, P., van Heerd, R., de Brujin, P.: In search of spatial extremes. In: Barnett, V., Stein, A., Turkman, K.F. (eds.) Statistics for the Environment 4: Statistical Aspects of Health and the Environment, pp. 8–26. Wiley, Chichester (1999) Stern, H., Cressie, N.: Inference for extremes in disease mapping. In: Lawson, A. et al. (ed.) Disease Mapping and Risk Assessment for Public Health, pp. 63–84. Wiley, Chichester (1999a) Stern, H.S., Cressie, N.: Posterior predictive model checks for disease mapping models. Stat. Med. 19, 2377–2397 (1999b) Tierney, L.: Markov chains for exploring posterior distributions. Ann. Stat. 22, 1701–1762 (1994) Wright, D., H. Stern, and N. Cressie.: Loss functions for estimation of extrema with an application to disease mapping. Can. J. Stat. 31, 251–266 (2003)

Lihat lebih banyak...

A Loss function approach to identifying environmental exceedances

Descripción

Comentarios