Model comparison for a complex ecological system

Share Embed


Descripción

Model Comparison for a Complex Ecological System C. A. Ferguson, A. W. Bowman, E. M. Scott† Dept. of Statistics, University of Glasgow, Glasgow, U.K.

and L. Carvalho Centre for Ecology & Hydrology, Edinburgh, U.K.

Summary. The ecological consequences of climate change and its interaction with other environmental pressures, such as nutrient pollution, are little understood. For freshwater ecosystems, knowledge of these combined effects is required for water resource management and in particular for successful implementation of the European Community Water Framework Directive 2000, which requires that all surface waters should be at, or above, ‘good status’ by 2016. Statistical analysis of detailed long-term environmental data sets can be used to unravel these combined effects and Loch Leven (Scotland) has one of the largest and most extensive of such data sets. The system has been routinely monitored since 1967 and over this period there has been evidence of climate change and a period of eutrophication and recovery at the loch. Transfer functions, additive models and varying-coefficient models are used to explore the effects of these influences on the complex ecological system at Loch Leven. Keywords: Additive model; Climate change; Eutrophication; Transfer function; Varying-coefficient model

1.

Introduction

Lakes are an important resource in terms of the environment, tourism, recreation and industry and it is, therefore, important to maintain their quality. The European Community (EC) Water Framework Directive (WFD) 2000 requires that ‘good quality’ should be achieved in all lakes by 2016 (European Parliament; 2000). It is therefore important to know if water quality is improving or deteriorating and if the quality is affected by climate change. Specifically for lakes, for example, it is required that they are not impacted greatly by nutrient pollution (eutrophication) and therefore have low levels of phytoplankton biomass (chlorophylla ) and high water clarity. However, little is †Address for correspondence: Claire Ferguson, Dept. of Statistics, 15 University Gardens, University of Glasgow, Glasgow, G12 8QW E-mail: [email protected]

2

C. A. Ferguson et al.

known about the effects of climate change on lake ecosystems and how this pressure interacts with other pressures, such as eutrophication. The purpose of this paper is to investigate and compare statistical techniques to analyse the quality of lake ecosystems through establishing trends and seasonality, incorporating serial correlation and investigating complex-relationships and the effects of covariates on water quality. Loch Leven (Kinross, Scotland) is the largest shallow lake in Great Britain and the Centre for Ecology & Hydrology (CEH), in Edinburgh, have monitored the loch since 1967. Over this period the loch has experienced eutrophication (a deterioration in water quality as a result of increased algae from increased nutrient loading) and subsequent recovery, following reductions in nutrient loading from the catchment. The time series also spans a period of changing climate, with increasing temperatures in winter and spring; details are provided in Carvalho & Kirika (2003). This makes Loch Leven an ideal site for investigating statistical techniques that can differentiate between these features. The 150 variables measured at the loch cover the chemistry, biology and climate with key variables of chlorophylla (algal biomass, an indicator of water quality), nutrients (soluble reactive phosphorus, SRP, and nitrate-nitrogen, NO3 -N), zooplankton (Daphnia; water fleas which graze on algae) and water temperature. For a description and discussion of each variable see Scheffer (1998). Samples were collected from a mid-basin area and the sampling dates reflect a mixture of weekly, biweekly and monthly intervals, with a large amount of variability evident for many measurements with the exception of water temperature. Therefore, for this paper the data have been aggregated to monthly means with a natural log transform applied to many of the variables in order to reduce variability and stabilize the variance. As a result of various events in the loch’s history, the system at Loch Leven has to be considered in three sections; 1968-1970, 1971-1983 and 1988-2002. In the early period, Daphnia were absent from the loch until August 1970. This is thought to be the result of large pollution (pesticide) loadings into the loch. During the period 1971-1983, the re-introduction of Daphnia resulted in substantial changes in the system, particularly reductions in algal abundance (log chlorophylla ; Figure 1). By the mid-1980s, Loch Leven was still showing symptoms of eutrophication. Therefore, in an attempt to improve the ecology and water quality of the loch, the phosphorus loadings from

Model Comparison

3

the catchment were reduced further (Bailey-Watts & Kirika; 1987, 1999). There are three years of almost complete missing data in the 1980’s (1984, 1986 and 1987), but continuous monitoring since 1988. Since 1997, internal loadings of nutrients have also reduced and the loch is believed to be moving towards a more sustained state of improved water quality. For Daphnia there are another seven months of zeros in the data series, starting in September 1998. These zeros are thought to be

5 4 3 2 1

Log(Chlorophyll, mug/l)

a result of a pesticide loading into the loch which again killed off the Daphnia during this period.

1970

1980

1990

2000

Year

Fig. 1. Monthly Means for Log Chlorophylla from January 1968 to December 2002

Figure 1 displays a time series plot of the monthly means for log chlorophylla from January 1968 to December 2002. The plot illustrates several features of the data, including missing data (particularly large for Daphnia), non-monotonic trends and cycles. As a result of features in the loch’s history, in this paper only the period January 1988 to December 2002 will be considered. However, in this latter period there is still a small proportion of missing data for each variable. Two modelling exercises are considered here with the individual responses of chlorophylla and Daphnia. Covariates for each response were selected after discussion concerning ecological relationships and hence chlorophylla is considered to be a response (as an indicator of water quality) with covariates of SRP, Daphnia, water temperature (water temp) and NO3 -N. Daphnia is also treated as a response (as an indicator of grazer pressure on algae) with covariates of chlorophylla (food resource) and

4

C. A. Ferguson et al.

water temperature. Lagged relationships are typical between ecological variables such as those collected at Loch Leven. Therefore, transfer functions are considered in Section 2 to help identify potential lagged relationships between variables. However, transfer functions are linear time series models and hence additive models are used in Section 3 to relax the assumption of linearity, while incorporating terms for trend, seasonality and lagged variables. Knowledge of the ecological system suggests that relationships between the variables in the system change throughout the year. After initial exploratory analysis using the standard techniques in Sections 1 and 2, it is of interest to consider models which allow the nature of the relationships between responses and covariates to change. These are explored here in Section 4 through the use of time varying-coefficient models for situations where the relationships between responses and covariates appear linear. Conclusions and discussion of further work is given in Section 5. 2.

Transfer Functions

Transfer functions, discussed in Box et al. (1994) and Young (1984), are a natural starting point in the identification of lagged relationships. Transfer functions (1) are linear time series models consisting of lagged covariates used to describe the response in a system.

yt = α +

k X ωi (B) i=1

δi (B)

xi,t−b +

θ(B) at φ(B)

(1)

Here, yt is the response and xi,t−b is the ith covariate with lag b, Bxt = xt−1 , at ∼ WN(0,σ 2 ) where WN is white noise and ωi (B), δi (B), θ(B), φ(B) are polynomials. Lagged relationships can be difficult to identify for multiple covariates. Therefore, in order to fit these models an automatic identification procedure SMISO (an abbreviation for Seasonal Multiple Input Single Output) has been used. Details of the model fitting and estimation are provided in Chiogna et al. (2003). SMISO fits and identifies models by implementing an Iterative Stepwise Gauss Newton algorithm which searches through all models subject to certain conditions which are pre-specified. The Bayesian Information Criterion (BIC) is used to compare different model structures and the model which minimizes the BIC is chosen.

Model Comparison

5

For the automatic identification of transfer functions the data are required to be complete and equally spaced. Therefore, two techniques were considered for missing data imputation. The first technique involved decomposing each variable into trend, seasonal and residual components. The de-trended and seasonally adjusted data were used to produce fitted values for each missing data point in the series and the standard deviation, σ, of the residuals for each variable was calculated. At each position, noise was then added to each fitted value, where the noise was constructed from a random N (0, σ 2 ) distribution. The second technique for imputation involved using a multivariate normal conditional distribution to estimate a missing point(s) given the observed values on either side and assuming a correlation structure based on an AR(1) process. 500 simulations were run for each technique and ‘new’ values for each missing data point were constructed through averaging the simulated results. These imputation techniques were felt to be adequate for the Loch Leven data and both methods gave similar results. However, further discussion of missing data imputation techniques is provided in Schafer (1997) and for environmental data in Latini (2004).

2.1.

Transfer function results

In order to fit the transfer functions using SMISO the missing values for the monthly mean data were imputed using the two techniques described above and the complete data with imputed values from technique two have been used here. Therefore, complete data were created from April 1989 to December 2001 for log chlorophylla as a response and from April 1989 to December 2002 for log Daphnia as a response. The reduction in time period is a result of the second technique requiring at least one initial and one final data point. When Daphnia is treated as a response variable, the seven month period of zeros from September 1998 have been treated as missing data and imputed using the above method. For each model fitted using SMISO, the conditions specified for model selection were selection of the level of differencing that minimized the BIC (i.e. first and twelfth differencing to account for trend and seasonality, if necessary) and a search up to four lags (back into the previous season). In SMISO only the lags 1 to 4 are searched, variables at the same time point are not considered. Results are given, in Models (2) and (3) respectively, for the individual responses of log chlorophylla , with covariates of log SRP, log Daphnia, water temperature and log NO3 -N, and

6

C. A. Ferguson et al.

for log Daphnia, with covariates of log chlorophylla and water temperature. The standard errors for each coefficient are provided in brackets.

log(chlorophylla )t

=

3.92(0.15) − 0.18(0.06)log(SRP)t−4 −0.10(0.03)log(Daphnia)t−1 +

log(Daphnia)

=

0.11(0.02)watertempt−1 +

1 at 1 − 0.52(0.07)B

(1 + 0.32(0.12)B 2 ) at (1 − 0.72(0.08)B + 0.50(0.09)B 2

(2)

(3)

Model (2) highlights a negative lag 4 log SRP term and a negative lag 1 log Daphnia term with errors based on an AR(1) process. In Model (3), only a lag 1 water temperature term is identified along with a more complicated ARMA error structure. In both models no differencing was selected, highlighting that differencing to account for trend and seasonality does not improve either model. Ecologically, the negative lag 1 log Daphnia term and positive lag 1 water temperature term are indicative of predator grazing on prey and higher temperatures being associated with greater numbers ofDaphnia. The negative lag 4 term for log SRP is difficult to interpret ecologically, indicating that high nutrient levels in one season are related to low chlorophylla in the next. This highlights a need for further work to examine changes in relationships throughout the year and this issue is revisited in Section 4. Some issues relating to the identification of these models should be noted. Changing the criteria for fitting each model and imputing the missing data various times can occasionally result in slightly different models being identified. However, in both of these situations the BIC values produced are always very similar to those for each model above. For Model (2), selected for log chlorophylla , the BIC was -154.9 and for Model (3), selected for log Daphnia, the BIC was 73.7. Transfer functions have been used here to aid in the identification of potential lagged variables. However, there is no reason to assume linearity in a natural system such as that at Loch Leven and hence it is of interest to relax the assumption of linearity, to allow nonparametric components. It is of particular interest to incorporate terms for trend and seasonality in such models. The loch’s history provides evidence of nonparametric trends since 1968, particularly for chlorophylla , and

Model Comparison

7

seasonality should be considered as a cyclic term. Additive models have therefore been used, as described below. 3.

Additive Models

Additive models, discussed in Stone (1985), Buja et al. (1989) and Hastie & Tibshirani (1990), can be used to model a response (yi ) in a flexible nonparametric manner by expressing the effects of covariates (x1 , ....xk ) in smooth arbitrary functions, such as in (4).

yi = α +

k X

mj (xji ) + εi ,

i = 1, ..., n.

(4)

j=1

Here, mj (xj ) is a smooth function of the jth explanatory variable and the errors εi are assumed to be independent with variance σ 2 . For time series data, methods for additive models that incorporate correlated errors and circular smoothers, for cyclic terms such as ‘month of the year’, have been developed in Giannitrapani et al. (2005) and these are applied here to the Loch Leven data. The methods in Giannitrapani et al. (2005) use the local linear method of smoothing (Cleveland; 1979) for the components mj (xj ). In this approach, an estimate for an observed data point is constructed by fitting a linear model locally using weights to concentrate on the specific point of interest. For a single covariate x, the estimator m(x) ˆ is taken as the least squares estimator α ˆ of: min α,β

n X

{yi − α − β(xi − x)2 }w(xi − x; h).

i=1

The weight function w(xi −x; h) is taken to be a normal density function with mean 0 and standard deviation h. For cyclic components, a local mean estimator is used and the weight function is defined as:

1

w(xi − x, h) = e h cos(2π

(xi −x) ) 12

(5)

Giannitrapani et al. (2005) modify the above local linear/mean methods to incorporate correlated errors by assuming the variance matrix to be of the form V σ 2 , where V is a correlation matrix.

8

C. A. Ferguson et al.

Models are fitted by employing the backfitting algorithm, details of which are provided in Hastie & Tibshirani (1990).

3.1.

Additive model testing

For additive models, it is of interest to assess whether covariates are significant in addition to the other model components and if nonparametric, rather than linear, terms are required. This allows significant relationships between variables to be identified along with the nature of the relationship. In order to test the components, approximate F-tests (6) have been used to compare the full model to a reduced or semi-parametric model, following Hastie & Tibshirani (1990).

F =

(RSS2 − RSS1 )/(df2 − df1 ) RSS1 /df1

(6)

Here, RSS1 and RSS2 are the residual sum of squares for the full and reduced models respectively, and df1 and df2 are the approximate degrees of freedom for error. In the context of independent data, the F statistic can be referred to an F distribution with (df2 − df1 ) and df1 degrees of freedom. While this provides a useful reference by analogy with linear models, an alternative approach involving expressing the F statistic in terms of quadratic forms (QF) can also be used, with p-values calculated from general results about the distribution of a quadratic form in normal variables. This approach is detailed in Bowman & Azzalini (1997) and Giannitrapani et al. (2005). For additive models, the residual sum of squares and approximate degrees of freedom for error have been adjusted in Giannitrapani et al. (2005) to incorporate correlated errors. Using the projection matrices, P , from the fitted models y ˆ = P y, the residual sum of squares is given by Pn yT (I − P )T V −1 (I − P )y. For the additive model, mˆj = Pj y and P = j=0 Pj . The mean value of this residual sum of squares (RSS) can be written as: E(RSS) = E(yT (I − P )T V −1 (I − P )y) = tr(V (I − P )T V −1 (I − P ))σ 2 + bt V −1 b where b is a bias vector. It then follows that the approximate degrees of freedom for error, required in testing models, can be taken to be df = tr(V (I − P )T V −1 (I − P )). From this, σ ˆ 2 can be estimated using RSS/df and for each of the components, mj , the standard errors are taken to ˆ2. be the square root of the diagonal entries of var{m ˆ j } = Pj V PjT σ

Model Comparison

3.2.

9

Additive model results

Additive models can be implemented with irregular time series. Therefore, the original monthly means before imputation were used for fitting these models. Models with log chlorophylla as a response were considered from April 1989 to December 2001 and for log Daphnia as a response from April 1989 to December 2002 (with the seven months of zeros from September 1998 treated as missing). Four models were considered here namely: log(chlorophylla )t

= α + m1 (yeart ) + m2 (montht ) + m3 (log(SRP)t )

(7)

+m4 (log(Daphnia)t ) + m5 (watertempt ) + m6 (log(NO3 −N)t ) + εt

log(chlorophylla )t

= α + m1 (yeart ) + m2 (montht ) + m3 (log(SRP)t−4 )

(8)

+m4 (log(Daphnia)t−1 ) + εt

log(Daphnia)t

= α + m1 (yeart ) + m2 (montht ) + m3 (log(chlorophylla )t )

(9)

+m4 (watertempt ) + εt

log(Daphnia)t

= α + m1 (yeart ) + m2 (montht ) + m3 (watertempt−1 ) + εt

(10)

Models (7) and (9) were fitted to include all covariates of interest for each response and Models (8) and (10) were fitted to compare results with the transfer function models. In each model the errors are assumed to have variance matrix V σ 2 where V is a correlation matrix based on an AR(1) process and since the month term in each of these models is defined on a cyclical scale the weight function (5) was used for this component. The original smoothing matrices, before backfitting, for each component, mj , can be represented by Sj . Hastie & Tibshirani (1990) suggest that the approximation tr(Sj ) − 1 can be used to calculate degrees of freedom (df ) for each component in the model and hence to select the respective smoothing parameters (h). For all examples in this paper smoothing parameters are determined based on df = 4. In order to estimate the correlation for the errors, the models were initially

10

C. A. Ferguson et al.

run under an assumption of independent errors to obtain residuals. The lag 1 correlation of the residuals, ρˆ, was then calculated and used to develop a correlation matrix, V , based on an AR(1) process i.e. the (i, j)th entry of V is ρˆ|i−j| . After fitting the models a plot can be produced for each smooth component, as illustrated in Figure 2 for Model (7). In each plot the individual components are plotted against the fitted values for that particular component with the shaded band indicating ± 2 standard errors, where the standard errors can be calculated as indicated in Section 3.1. Using the approximate F-test and a test based on expressing the F statistic in terms of quadratic forms (QF), with modified residual sum of squares and degrees of freedom for error to incorporate correlation, the model components were tested in terms of significance and linearity.

Log chlorophylla as a response

1994

1998

2002

1 -2

-1

0

m3(log(SRP))

2

3 1 -2

-1

0

m2(month)

2

3 2 1

2

4

6

8

10

12

1

2

3

4

d) estimate of m4(log Daphnia) for log chlorophylla

e) estimate of m5(water temp) for log chlorophylla

f) estimate of m6(log NO3-N) for log chlorophylla

-4

-2

0

2

log(Daphnia, individuals/l)

4

2 1 -1

0

m6(log(NO3-N))

2 1 -1 -2

-2

-1

0

1

m5(water temp)

2

3

log(SRP,mug/l)

3

month

3

year

0

m1(year)

0 -1 -2 1990

m4(log(Daphnia))

c) estimate of m3(log SRP) for log chlorophylla 3

b) estimate of m2(month) for log chlorophylla

a) estimate of m1(year) for log chlorophylla

-2

3.2.1.

5

10 water temp,oC

15

20

-4

-3

-2

-1

0

1

log(NO3-N,mg/l)

Fig. 2. Model (7) for log chlorophylla as a response. Separate component plots for each covariate with shaded regions to identify ± 2 x standard errors from the estimate, correlation = 0.45, h = 1.40, 0.34, 0.45, 0.99, 2.20, 0.63.

Model Comparison

11

Table 1. Testing for no effect and linearity in Models (7) and (8) Variable year month log(SRP) log(Daphnia) water temp log(NO3 -N) Variable year month log(SRP)lag 4 log(Daphnia)lag 1

Testing for no effect and linearity in Model (7) F-Test no effect QF no effect F-Test linearity 0.723 0.225 0.396 0.858 0.145 0.951

0.748 0.194 0.442 0.831 0.145 0.792

0.762 1 0.772 0.120 1

Testing for no effect and linearity in Model (8) F-Test no effect QF no effect F-Test linearity 0.790 0.103 0.669 0.002

0.822 0.085 0.681 0.001

0.703 1 0.073

QF linearity 0.810 0.902 0.754 0.107 0.999 QF linearity 0.747 1 0.127

For log chlorophylla as a response two models were fitted, Models (7) and (8). The results for Model (7) are given in Figure 2 and Table 1. Figure 2 illustrates very little evidence of trend or seasonality and the relationships between log chlorophylla and the other components appear to be mainly negative, particularly for the nutrients. The most likely explanation for this is that increasing chlorophylla (algae) is driving nutrient concentrations down (through uptake for growth), rather than increasing chlorophylla being a response to decreasing nutrients; see Section 5 for further discussion. It can be seen in Table 1 that for the full model none of the components are significant (all p-values > 0.14) and the assumption of linearity appears reasonable for all terms. However, Figure 2 indicates a reasonably strong negative relationship between log N03 -N and log chlorophylla and it is surprising, therefore, that this is not identified as a significant relationship in the p-value for no effect (0.951, 0.792). NO3 -N and the month of the year are strongly related (a concurvity effect) and hence when log NO3 -N is included in the model this component explains a large part of the seasonal signal for log chlorophylla . Therefore, when log NO3 -N is excluded from the model the smooth function for month increases to describe this seasonality. This transfer of effects has resulted in the non-significant p-values for log NO3 -N. For consistency with the transfer function results an additive model with lagged terms, Model (8), can also be fitted and the results can be seen in Figure 3 and Table 1. Again, linear terms appear adequate in all cases with the exception of a borderline p-value for the F-test for log Daphnia lag

12

C. A. Ferguson et al.

1 in Model (8). The transfer function, Model (2), selected a model with no differencing to account for trend and seasonality and while the trend term is non-significant (0.790, 0.822) in Table 1, the month term is almost borderline significant (0.103, 0.085) suggesting weak seasonality. The month term, in Figure 3, illustrates that over this period, peak chlorophylla values generally occur in summer (July/August) and minima occur generally in spring (April/May). The lag 4 term for log SRP is not significant for this model in addition to log Daphnia lag 1 and the relationship with log chlorophylla appears negative again (Figure 3). However, the lagged term for log Daphnia is significant (0.002, 0.001). These results appear reasonably consistent with the transfer function model with the exception of the log SRP lag 4 term.

2 1 -2

-1

0

m2(month)

1 0 -2

-1

m1(year)

2

3

b) estimate of m2(month) for log chlorophylla

3

a) estimate of m1(year) for log chlorophylla

1990

1992

1994

1996

1998

2000

2002

2

4

year

8

10

12

2 1 0 -1 -2

-2

-1

0

1

m4(log Daphnia lag 1)

2

3

d) estimate of m4(log Daphnia lag 1) for log chlorophylla

3

c) estimate of m3(log SRP lag 4) for log chlorophylla

m3(log SRP lag 4)

6 month

0

1

2

3

log(SRP, mug/l)lag 4

4

-4

-2

0

2

4

log(Daphnia, individuals/l)lag 1

Fig. 3. Model (8) for log chlorophylla as a response. Separate component plots for each covariate with shaded regions to identify ± 2 x standard errors from the estimate, correlation = 0.42, h = 1.42, 0.34, 0.47, 1.01.

3.2.2.

Log Daphnia as a response

For log Daphnia as a response Models (9) and (10) were considered. Figure 4, for Model (9),

Model Comparison

13

Table 2. Testing for no effect and linearity in Models (9) and (10) Testing for no effect and linearity in Models (9) F-Test no effect QF no effect F-Test linearity

Variable year month log(chlorophylla ) water temp

0.586 0.065 0.262 0.626

0.607 0.035 0.262 0.695

0.491 0.365 0.568

Testing for no effect and linearity in Model (10) F-Test no effect QF no effect F-Test linearity

Variable year month water temp lag 1

0.530 0.150 0.623

QF linearity

0.478 0.353 0.515

0.543 0.115 0.689

QF linearity

0.448 0.459

0.455 0.503

illustrates a moderate seasonal signal (through the month term), a mild negative relationship with log chlorophylla and a weaker positive relationship between water temperature and log Daphnia. Figure 5, for Model (10), illustrates a stronger positive relationship when water temperature is lagged by 1 month than seen in Figure 4.

2 1 0

m2(month)

-3

-2

-1

0 -1 -3

-2

m1(year)

1

2

3

b) estimate of m2(month) for log Daphnia

3

a) estimate of m1(year) for log Daphnia

1990

1992

1994

1996

1998

2000

2002

2

4

year

8

10

12

2 1 0 -1 -2 -3

-3

-2

-1

0

1

m4(water temp)

2

3

d) estimate of m4(water temp) for log Daphnia

3

c) estimate of m3(log chlorophylla) for log Daphnia

m3(log chlorophylla)

6 month

2

3

4

log(chlorophylla, mug/l)

5

5

10

15

20

water temp, oC

Fig. 4. Model (9) for log Daphnia as a response. Separate component plots for each covariate with shaded regions to identify ± 2 x standard errors from the estimate, correlation = 0.33, h = 1.51, 0.31, 0.36, 2.18.

14

C. A. Ferguson et al.

2 1 0

m2(month)

-3

-2

-1

0 -1 -3

-2

m1(year)

1

2

3

b) estimate of m2(month) for log Daphnia

3

a) estimate of m1(year) for log Daphnia

1990

1992

1994

1996

1998

2000

2002

2

year

4

6

8

10

12

month

1 0 -1 -2 -3

m3(water temp lag 1)

2

3

c) estimate of m3(water temp lag 1) for log Daphnia

5

10

15

20

water temp, oC, lag 1

Fig. 5. Model (10) for log Daphnia as a response. Separate component plots for each covariate with shaded regions to identify ± 2 x standard errors from the estimate, correlation = 0.37, h = 1.52, 0.32, 2.22.

Table 2 highlights that in both models the assumption of linearity appears reasonable for all terms. All terms also appear non-significant with the exception of a (borderline) significant month term in Model (9).

In summary, for all of the models fitted in this section it appears that linear relationships are reasonable. The lack of significant terms in the additive models encourages the use of other approaches to explore relationships between variables. Ecologically, it is likely that relationships between variables change throughout the year and particular interest lies in relationships between variables in the winter and spring where temperature increases have been greatest. Therefore, since many of the relationships in Section 3 appear linear and there is little evidence of trend and seasonality, the following time varying-coefficient models were fitted with the modifying variable of month (r) to investigate how relationships change throughout the year.

Model Comparison

4.

15

Varying-Coefficient Models

A varying-coefficient model (Hastie & Tibshirani; 1993), of form (11), was considered with a response (yi ), linear covariates (x1 , ...., xk ) and modifying variable r.

yi = α(r) +

k X

βj (r)xji + εi

i = 1, ...., n

(11)

j=1

The errors, εi , are assumed to be independent with variance σ 2 . Approaches for statistical estimation and model testing of varying-coefficient models are discussed in Fan & Zhang (1999, 2000) with nonparametric varying-coefficient models considered in Hoover et al. (1998) and semivaryingcoefficient models in Xia et al. (2004). For the Loch Leven data, the errors εi are assumed to have variance matrix V σ 2 where V is a correlation matrix and r is taken to be a seasonal (within year) modifier. The coefficients are allowed to change smoothly by month (r) using weights (5) to construct a circular smoother . For a particular value of r, Model (11) can be written in the vector matrix form Y = Xθ, where θ is the vector of parameters α, β1 , ..., βk for each data point i. This model can be fitted using generalised (weighted) least squares (12) to incorporate the correlation matrix V into the model, where again the correlations are based on an AR(1) process.

θˆ = (X T Wi V −1 X)−1 X T Wi V −1 y = Ay

(12)

This provides estimates of the parameters α ˆ and βˆj for each month of the year, where Wi is a diagonal matrix of weights, corresponding to data point i, constructed from (5). The standard errors can be estimated for the parameters for each month by calculating the square root of the diagonal entries of:

ˆ = cov(θ)

cov(AY )

= Acov(Y )AT ' AV AT σ ˆ2

(13)

16

C. A. Ferguson et al.

4.1.

Varying-coefficient model testing

As with additive models, it is of interest to test whether the covariates in a varying-coefficient model are significant and whether varying-coefficient, rather than linear with constant coefficient, terms are required. Section 3.1 described model testing for additive models and similar results can be used for the varying-coefficient models, where the projection matrix P is now derived from the fitted varying-coefficient models. This can be formed by calculating each row in the projection matrix, for each data point, i, in the time series, using: Pi = Xi (X T Wi V −1 X)−1 X T Wi V −1 Therefore, the residual sum of squares and approximate degrees of freedom for error can be formed as in Section 3.1 using the above projection matrix. The varying-coefficient terms, βj (r)xj in Model (11), can be compared to corresponding linear components with constant coefficients, βj xj , by fitting semivarying-coefficient models, such as (14), using the backfitting algorithm with projection matrices updated at each iteration, in a similar way to Giannitrapani et al. (2005).

yi

= α(r) +

k X

βl (r)xli + βj xji + εi

(14)

l=1,l6=j

4.2.

Varying-coefficient model results

Varying-coefficient models can be implemented with irregularly spaced time series and hence the same periods of data that were considered with the additive models are used here. Four models were considered, Models (15) to (18). These models contain the same covariates for each response as in Section 3, for additive models, and Models (16) and (18) contain the terms suggested by the transfer function model approach: log(chlorophylla )t

= α(r) + β1 (r)log(SRP)t + β2 (r)log(Daphnia)t +β3 (r)watertempt + β4 (r)log(NO3 −N)t + εt

(15)

Model Comparison

log(chlorophylla )t

= α(r) + β1 (r)log(SRP)t−4

17

(16)

+β2 (r)log(Daphnia)t−1 + εt

log(Daphnia)t

= α(r) + β1 (r)log(chlorophylla )t

(17)

+β2 (r)watertempt + εt

log(Daphnia)t

= α(r) + β1 (r)watertempt−1 + εt

(18)

In all cases the smoothing parameter for month was taken to be the same as the smoothing parameters used for the corresponding models in Section 3. For each Model, (15) to (18), a plot has been produced for each component in Figures 6 to 9, excluding the varying-intercept term α(r). For these, sequences of values were constructed to span the months of the year and the component of interest (horizontal axis) and fitted values for the combined grid (vertical axis) were calculated using the estimated coefficients, for the component of interest, from the fitted model for each month. These fitted values are illustrated by the plane surface. The shaded patterns on the plane indicate the distribution of points for each component in each month i.e. those points on the x-axis that fall within the range of values for a particular month. The components were tested for ‘no effect’ and ‘linearity’ for each model using the approximate F-test and test based on quadratic moments (QF) with the residual sum of squares and degrees of freedom modified to incorporate correlated errors, as described in Section 3.1 and 4.1.

4.2.1.

Log chlorophylla as response

Figure 6 displays the component plots corresponding to Model (15). They illustrate that the relationship between log SRP and log chlorophylla is negative in the winter months but flattens out in summer, while for log Daphnia there appears to be very little change in the slope of the relationship over time. The main relationship for water temperature with log chlorophylla appears to be in spring with higher temperatures related to lower values for log chlorophylla and negative slopes in spring are also apparent for log NO3 -N. For Model (15), Table 3 highlights that, with the exception of log Daphnia, all other terms

C. A. Ferguson et al. B1(r)(log SRP)

B2(r)(log Daphnia)

r) B2(

2 1 0 −1 −2 −3

h Dap

th

8

(S 2 RP ,

m on

1 log

6 m3 ug /l)

8

ph 0 nia ,

4 4

12 10

−4 log−2 (D a

th

nia)

12 10

2

m on

RP) gS r)(lo

(log

B1(

2 1 0 −1 −2 −3

6 4

ind 2 /l)

4

B3(r)(water temp)

2

B4(r)(log NO3−N)

B4(

r)(w B3(

2 1 0 −1 −2 −3

6

m

te10 m p,

4

oC 15

2

lo−4 g( NO

8

th

on

th

8

er

10 6

3−−2 N, m

on

p)

5 wa t

12

N)

tem

12 10

O3−

gN

ater

r)(lo

2 1 0 −1 −2 −3

m

18

4 g/ 0 l)

2

Fig. 6. Model (15) for log chlorophylla as a response. (Top Left) Fitted values for relationship with log SRP and (Top Right) fitted values for relationship with log Daphnia. (Bottom Left) Fitted values for relationship with water temperature and (Bottom Right) fitted values for relationship with log NO3 -N. Correlation = 0.34 and smoothing parameter (h) for month = 0.34.

appear to be significant or borderline varying-coefficient terms, depending on the method of calculating the p-value that was used, with p-values of (0.072, 0.045), (0.099, 0.068) and (0.088, 0.059) respectively. Therefore, varying-coefficient terms should be considered for log SRP, water tem-

Model Comparison

19

Table 3. Testing for no effect and linearity (linear components with constant coefficients) in Models (15) and (16) Testing for no effect and linearity in Model (15) F-Test no effect QF no effect F-Test linearity

Variable log(SRP) log(Daphnia) wtemp log(NO3 -N)

0.002 0.546 0.144 0.012

6.53×10 0.575 0.108 0.003

QF linearity

0.072 0.408 0.099 0.088

−5

0.045 0.418 0.068 0.059

Testing for no effect and linearity in Model (16) F-Test no effect QF no effect F-Test linearity

Variable log(SRP) lag 4 log(Daphnia) lag 1

0.493 0.067

0.511 0.054

QF linearity

0.715 0.987

0.757 0.989

perature and log NO3 -N. This table also highlights that these variables each have a significant or borderline effect in addition to the other model components.

B1(r)(log SRP lag 4)

B2(r)(log Daphnia lag 1)

B2(

2 1 0 −1 −2 −3

m3 ug /l)

4 4

2

th

8

ph 0 nia ,

6

on

m

6

on

th

8

(S 2 RP ,

ia)

1 log

12 10

−4 log−2 (D a

m

12 10

n aph

RP)

gS

r)(lo

gD r)(lo

B1(

2 1 0 −1 −2 −3

4

ind 2 /l)

4

2

Fig. 7. Model (16) for log chlorophylla as a response. (Left) Fitted values for relationship with log SRP lag 4 and (Right) fitted values for relationship with log Daphnia lag 1. Correlation = 0.48 and smoothing parameter for month = 0.34.

However, for the model with the lagged variables, Model (16), Figure 7 displays the component plots and illustrates only a very small change in the slope for log SRP lag 4 throughout the year and the relationship between log chlorophylla and log Daphnia lag 1 does not appear to change

20

C. A. Ferguson et al. Table 4. Testing for no effect and linearity (linear components with constant coefficients) in Model (17) Variable log(chlorophylla ) water temp

Testing for no effect and linearity in Model (17) F-Test no effect QF no effect F-Test linearity 0.089 1

0.069 1

0.074 1

QF linearity 0.052 0.995

indicating that a varying-coefficient term may not be required. Both components appear to have mainly negative relationships with the response with the slope for log Daphnia appearing to be slightly steeper. Table 3 also contains the results for testing the components in Model (16). It highlights that with all p-values > 0.7 varying-coefficient terms are not required in this model. However, log Daphnia lag 1 has a borderline significant relationship with log chlorophylla (pvalues = 0.067, 0.054). While the log Daphnia lag 1 term is not significant as a varying-coefficient component it does have a (borderline) significant negative relationship with the response. These results are consistent with those for the additive model.

4.2.2.

Log Daphnia as a response

Figure 8 displays the plots for log Daphnia as a response, Model (17). The relationships with log chlorophylla and water temperature both appear to change throughout the year. The main relationship with water temperature appears to be the positive slopes in spring with a (possibly corresponding) negative relationship between log chlorophylla and the response log Daphnia. Figure 9 (Left) contains a plot for Model (18) illustrating the positive relationship between water temperature (this time lagged by one month) and Daphnia throughout the winter and spring. Table 4 highlights that in Model (17) the varying-coefficient term for log chlorophylla is borderline significant (p-value = 0.07, 0.05) with water temperature not significant. Log chlorophylla also has a borderline significant effect in addition to water temperature. Similar testing for Model (18) highlights that water temperature lag 1 does not have a significant relationship with log Daphnia and is not significant as a varying-coefficient term. However, Figure 9 (Right) displays the coefficients and standard errors for water temperature lag 1, in Model (18), illustrating that while the relationship between water temperature lag 1 and log Daphnia is not significant globally, in winter and spring the relationship is significant with higher temperatures in the corresponding months

Model Comparison B1(r)(log chlorophylla)

21

B2(r)(water temp)

r) B2(

5 0

t ter

0

) emp

−5

th

hlo 3 ro ph

6 yll 4 a, m

4 ug

/l) 5

12 10 5 wa te r

8

(c

m on

lo2g

8

th

12 10

te10 m p,

m on

−5

a) hyll

rop

chlo

(wa

(log

r) B1(

5

6 4

oC 15

2

2

Fig. 8. Model (17) for log Daphnia as a response. (Left) Fitted values for relationship with log chlorophylla and (Right) fitted values for relationship with water temperature. Correlation = 0.34 and smoothing parameter for month = 0.31.

B1(r)(water temp lag 1)

0.4

0.6

0.8

Coefficients for water temperature lag 1

r)(w B1(

12

p)

5 wa t

6

m

te10 m p,

on

th

8

er

4

oC 15

−0.2

10

−0.4

tem

−5

0.2

0

0.0

ater

B1(r)

5

2

2

4

6

8

10

12

month

Fig. 9. Model (18) for log Daphnia as a response. (Left) Fitted values for relationship with water temperature lag 1, correlation = 0.42 and smoothing parameter for month = 0.32. (Right) Coefficients for each month with shaded region to identify ± 2 standard errors.

22

C. A. Ferguson et al.

related to higher values for log Daphnia. There is a strong relationship (concurvity) between month and water temperature which has resulted in non-significant p-values for water temperature and water temperature lag 1 in Models (17) and (18). As a result of this relationship the varyingintercept term based on month, α(r), is dominating the water temperature effect in Models (17) and (18). It was therefore of interest to force a constant intercept term. Comparing the models,

log(Daphnia)t

= α + β1 (r)watertempt−1 + εt

log(Daphnia)t

= α + β1 watertempt−1 + εt ,

produces highly significant p-values of 0.001 for the F-Test and 1.00×10−4 for QF. This indicates that for the model with a constant intercept term the varying-coefficient term for water temperature lag 1 is highly significant. Further testing for ‘no effect’ highlights a significant relationship between water temperature lag 1 and the response (p-values = 4.09 ×10−5 , F-Test, 2.00 ×10−7 , QF).

4.3.

Combining models

For log chlorophylla as a response, it was highlighted that log SRP, water temperature and log NO3 -N should be considered as varying-coefficient terms, Model (15), with log Daphnia lag 1 a linear term, as highlighted by the lagged model, Model (16). For log Daphnia, there is less evidence of significant varying-coefficient terms. However, both log chlorophylla and water temperature lag 1 are retained in the model to explore suspected ecological relationships. Figure 9 highlights that relationships between water temperature lag 1 and logDaphnia are significant in winter and spring and it is of interest to consider similar relationships between log chlorophylla and log Daphnia. Therefore, possible models could be:

log(chlorophylla )t

= α(r) + β1 (r)log(SRP)t + β2 (r)watertempt

(19)

+β3 (r)log(NO3 −N)t + β4 log(Daphnia)t−1 + εt log(Daphnia)t

= α(r) + β1 (r)log(chlorophylla )t +β2 (r)watertempt−1 + εt

(20)

Model Comparison

23

Table 5. Testing for no effect and linearity (linear components with constant coefficients) in Model (19) Variable

Testing for no effect and linearity in Model (19) F-Test no effect QF no effect F-Test linearity

log(SRP) water temp log(NO3 -N) log(Daphnia lag 1)

0.003 0.104 0.001 0.010

3.00×10 0.066 1.56×10−5 0.007 −4

0.050 0.093 0.018 0.599

QF linearity 0.026 0.057 0.004 0.645

Models (19) and (20) were fitted and tested in terms of ‘no effect’ and ‘linearity (constant coefficients)’. The lag 1 correlations of the residuals used for these models were 0.34 for Model (19), when Daphnia lag 1 is considered as a varying-coefficient term, 0.35 when Daphnia lag 1 is linear, and 0.37 for Model (20). Model (19) was initially considered with a varying-coefficient term for log Daphnia lag 1 in order to test whether varying-coefficient terms are required, Table 5. The results here are consistent with the separate models, (15) and (16), with varying-coefficient terms highlighted for log SRP, water temperature and log NO3 -N and a linear term for log Daphnia lag 1. This table also highlights that all of these terms have a significant relationship with the response with the exception of a borderline water temperature term. For Model (20), while the results for log chlorophylla are borderline significant the p-values for water temperature lag 1 are highly non-significant with α(r) dominating the effect of β2 (r)watertempt−1 , in a similar way to Models (17) and (18). However, it is of interest to explore the relationships in each month, using coefficients and standard errors. Figure 10, highlights that relationships in spring, in particular, are significant with high late winter/early spring water temperatures related to high spring Daphnia and possibly as a consequence lower chlorophylla (highlighted by the negative coefficients for β1 (r)).

5.

Conclusions and Discussion

It has been demonstrated that to explore a complex ecological system, such as that at Loch Leven, a variety of models are required. Transfer functions were used to identify lagged relationships and additive models were used to explore relationships between responses and covariates which were shown mainly linear for the period considered. In many cases, however, the relationships do appear

24

C. A. Ferguson et al.

1 0 −2

−1

B2(r)

0 −2

−1

B1(r)

1

2

Coefficients for water temp lag 1

2

Coefficients for log chlorophylla

2

4

6

8

10

month

12

2

4

6

8

10

12

month

Fig. 10. Model (20) for log Daphnia as a response. Coefficients for each month with shaded regions to identify ± 2 standard errors.

to change throughout the year. In comparison to the transfer function and additive model techniques the varying-coefficient models highlight a much clearer ecological picture of changes in relationships between drivers and responses in the system. They highlight that while global relationships between variables may be non-significant and issues of concurvity have to be considered, in particular months of the year relationships may be highly significant. The varying-coefficient models suggest that the main events of the year are in spring. Higher winter and spring temperatures were related to increased spring Daphnia and, presumably as a consequence, less chlorophylla . This could be a key feature when predicting future impacts of climate change, since the greatest increase in water temperature over the current monitoring period has been observed in Spring. This changing phenological shift between predators (Daphnia grazers) and prey (algae) associated with warmer springs is likely to have greatest impact on the water quality of shallow lakes, such as Loch Leven, where control of algae by grazers is most significant. In this paper, log chlorophylla and log Daphnia have been considered as individual responses.

Model Comparison

25

However, of the variables discussed, only water temperature is independent of the system in the sense that it will impact upon the other variables but changes in the other variables will not affect it. Chlorophylla and Daphnia in particular are not independent variables in the system since the Daphnia graze on the phytoplankton. Future work will therefore focus on multivariate models.

Acknowledgements The assistance of Monica Chiogna and Marco Giannitrapani in providing software to fit transfer functions (SMISO) and additive models with correlated errors is gratefully acknowledged. Laurence Carvalho was, in part, funded for this work by the Eurolimpacs Project. Eurolimpacs is funded by the European Union under Thematic Sub-Priority 1.1.6.3 “Global Change and Ecosystems” of the 6th Framework Programme. CEH gratefully acknowledge Loch Leven Fishery for providing access to the loch and assistance with fieldwork over the years.

References Bailey-Watts, A.E., and Kirika, A. (1987). A re-assessment of the phosphorus inputs to Loch Leven (Kinross, Scotland): rationale and an overview of results on instantaneous loadings with special reference to runoff. Earth Sciences, 78, 351–367. Bailey-Watts, A.E., and Kirika, A. (1999). Poor water quality in Loch Leven (Scotland) in 1995, in spite of reduced phosphorus loadings since 1985: the influences of catchment management and inter-annual weather variation. Hydrobiologia, 403, 135–151. Bowman, A.W. and Azzalini, A. (1997). Applied Smoothing Techniques for Data Analysis. Oxford University Press; Oxford. Box, G.E.P., Jenkins, G.M. and Reinsel, G.C. (1994). Time Series Analysis, Forecasting and Control, third edition. Prentice Hall, New Jersey. Buja, A., Hastie, T.J. and Tibshirani, R.J. (1989). Linear smoothers and additive models (with discussion). The Annals of Statistics, 17, 2, 453–510.

26

C. A. Ferguson et al.

Carvalho, L., and Kirika, A. (2003). Changes in shallow lake functioning: response to climate change and nutrient reduction. Hydrobiologia 506-509, 789–796. Chiogna, M., Gaetan, C. and Masarotto, G. (2003). Automatic Identification of Seasonal Transfer Function Models by means of Iterative Stepwise and Genetic Algorithms. preprint. Website: www.dst.unive.it/∼gaetan/Preprints/tf.pdf. Cleveland, W.S. (1979). Robust locally-weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74, 368, 829–836. European Parliament. (2000). Directive 2000/60/EC. of the European Parliament, establishing a framework for community action in the field of water policy. Official Journal of the European Communities, 327. Fan, J. and Zhang, W. (1999). Statistical estimation in varying-coefficient models. The Annals of Statistics, 27, 15, 1491–1518. Fan, J. and Zhang, W. (2000). Simultaneous confidence bands and hypothesis testing in varyingcoefficient models. Scandinavian Journal of Statistics, 27, 715-731. Giannitrapani, M., Bowman, A. and Scott, E.M. (2005). Additive Models with Correlated Errors. Technical Report No.05-05, Department of Statistics, The University of Glasgow. Website: www.stats.gla.ac.uk/Research/TechRep2005/2005. Hastie, T.J. and Tibshirani, R.J. (1990). Generalized Additive Models. Chapman & Hall; London. Hastie, T.J. and Tibshirani, R.J. (1993). Varying-Coefficient Models. J.Roy.Statist.Soc., Series B, 55, 4, 757–796. Hoover, D.R., Rice, J.A., Wu, C.O. and Yang, L (1998). Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika, 85, 4, 809–822. Latini, G. and Passerini, G. (2004). Handling Missing Data: Applications to Environmental Analysis. WIT Press. Schafer, J.L. (1997). Analysis of Incomplete Multivariate Data. Chapman & Hall; London. Scheffer, M. (1998). Ecology of Shallow Lakes. Chapman & Hall; London.

Model Comparison

27

Stone, C.J. (1985). Additive regression and other nonparametric models. The Annals of Statistics, 13, 2, 689–705. Young, P. (1984). Recursive Estimation and Time-Series Analysis - An Introduction. SpringerVerlag; Berlin. Xia, Y., Zhang, W. and Tong, H. (2004). Efficient estimation for semivarying-coefficient models. Biometrika, 91, 3, 661–681.

View publication stats

Lihat lebih banyak...

Comentarios

Copyright © 2017 DATOSPDF Inc.