A Robust Conflict Measure of Inconsistencies in Bayesian Hierarchical Models


Scandinavian Journal of Statistics, Vol. 34: 816–828, 2007. doi: 10.1111/j.1467-9469.2007.00560.x. © Board of the Foundation of the Scandinavian Journal of Statistics 2007. Published by Blackwell Publishing Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.

FREDRIK A. DAHL, Health Services Research Unit, Akershus University Hospital, and Department of Mathematics, University of Oslo

JØRUND GÅSEMYR and BENT NATVIG, Department of Mathematics, University of Oslo

ABSTRACT. O'Hagan (Highly Structured Stochastic Systems, Oxford University Press, Oxford, 2003) introduces some tools for criticism of Bayesian hierarchical models that can be applied at each node of the model, with a view to diagnosing problems of model fit at any point in the model structure. His method relies on computing the posterior median of a conflict index, typically through Markov chain Monte Carlo simulations. We investigate a Gaussian model of one-way analysis of variance, and show that O'Hagan's approach gives unreliable false warning probabilities. We extend and refine the method, especially avoiding double use of data by a data-splitting approach, accompanied by theoretical justifications from a non-trivial special case. Through extensive numerical experiments we show that our method detects model mis-specification about as well as the method of O'Hagan, while retaining the desired false warning probability for data generated from the assumed model. This also holds for Student's-t and uniform distribution versions of the model.

Key words: double use of data, Markov chain Monte Carlo simulations, model evaluation, one-way analysis of variance

1. Introduction

Modern computer technology combined with Markov chain Monte Carlo (MCMC) algorithms has made it possible to analyse complex Bayesian hierarchical models. The resulting popularity of complex models has also increased the need for ways of evaluating such models. In a frequentist setting, this is often done by way of p-values, which quantify how surprising the given data set is under the assumed model. By construction, a frequentist p-value is pre-experimentally uniformly distributed on the unit interval, where low values are interpreted as surprising. In the present paper, the term pre-experimental refers to the distribution under the assumed model prior to using the data. Several Bayesian p-values have been suggested over the last few decades. The so-called prior predictive p-value of Box (1980) measures the degree of surprise of the data, according to some metric of choice, under a probability measure defined by the product of the prior and the likelihood given the model. It therefore differs from a frequentist p-value through the introduction of the prior distribution. The prior predictive p-value is a natural choice in cases where the prior of a Bayesian model represents our true beliefs about the distribution of our parameters prior to seeing data. Usually, however, we apply quite vague priors that represent general uncertainty about parameters that could, in principle, be arbitrarily precisely estimated with enough data. In these cases, sampling under the prior makes little sense, and is not even defined for improper priors. Rubin (1984) therefore introduced posterior predictive p-values that rely on sampling hypothetical future replications from the posterior distribution.
This construction also allows metrics that evaluate discrepancies between data and parameter values (see Gelman et al., 1996). However, posterior predictive p-values use data twice: both directly through the discrepancy function, and indirectly by sampling from the posterior distribution. This has been criticized by Dey et al. (1998) and Bayarri & Berger (2000), who both propose alternative approaches. The former paper introduces a simulation-based approach where the posterior distribution given the observed data is compared with a medley of posterior distributions given replicated data sets generated from the prior distribution. Hence, the approach is essentially in accordance with the prior predictive approach. The latter paper suggests two variants, the conditional predictive p-value and the partial posterior predictive p-value, both designed to avoid the double use of data by eliminating the influence of a chosen test statistic on the posterior distribution. Robins et al. (2000) proves that the pre-experimental asymptotic distribution of the posterior predictive p-value is more concentrated around 1/2 than a uniform, as opposed to the two variants of Bayarri & Berger (2000). Hence, as also pointed out by Meng (1994) and Dahl (2006), posterior predictive p-values tend to be conservative in the sense that extreme values get too low probability. Hjort et al. (2006) analyses this in depth, and designs a double simulation scheme that alleviates the problem. This scheme can be thought of as essentially treating the posterior predictive p-value as a test statistic in itself, and using it in an extensive prior predictive p-value computation.
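As a toy illustration of this conservativeness (the code and all names here are ours, not part of the paper): for a normal mean with known variance and a flat prior, the posterior predictive p-value based on the discrepancy T(y) = ȳ is degenerate at 1/2, whatever the data, which is the extreme form of the concentration around 1/2 discussed above.

```python
# Toy sketch (ours): posterior predictive p-value for a normal mean with
# known variance 1 and a flat prior on theta.  With T(y) = ybar, the
# replicate mean satisfies ybar_rep | y ~ N(ybar, 2/n), so the p-value is
# P(ybar_rep >= ybar | y) = 1/2 for every data set.
import numpy as np

rng = np.random.default_rng(0)
n, n_datasets, n_rep = 20, 1000, 2000
pvals = []
for _ in range(n_datasets):
    y = rng.normal(0.0, 1.0, size=n)                        # data from the assumed model
    theta = rng.normal(y.mean(), 1 / np.sqrt(n), size=n_rep)  # posterior draws of theta
    ybar_rep = rng.normal(theta, 1 / np.sqrt(n))              # replicate sample means
    pvals.append(np.mean(ybar_rep >= y.mean()))
print(np.quantile(pvals, [0.05, 0.5, 0.95]))  # all tightly clustered near 0.5
```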
In model choice problems the task is to choose the best model from a given set of candidates. Bayes factors (see Kass & Raftery, 1995) provide a useful methodology for such problems. Information criteria give a different approach to model choice, based on weighing model fit against the number of free parameters. The Bayesian information criterion (BIC) was defined by Schwarz (1978) and more recently analysed by Clyde & George (2004). A different information criterion that is used for Bayesian models is the so-called deviance information criterion (DIC) (see Spiegelhalter et al., 2002). Although model evaluation and model choice are related, these tasks are different, and model choice methods cannot readily be applied for the purpose of model evaluation.

The variants of Bayarri & Berger (2000) work well in some simple cases, and the partial posterior predictive p-value also works for a simple hierarchical model, as demonstrated in Bayarri & Castellanos (2004). However, it seems difficult to use this method to criticize arbitrary aspects of Bayesian hierarchical models. An approximate cross-validation approach aimed at criticizing such models is given in Marshall & Spiegelhalter (2003). Dey et al. (1998) introduced tools for evaluating different parts of such models. Similarly, in this paper we extend and refine a tool suggested by O'Hagan (2003) for evaluating inconsistencies of a model, through analysis of what he calls information contributions. This is a flexible tool that can be used at any node in the model network. However, in the present paper, we restrict our attention to location parameters. Under suitable conditions, our conflict evaluation for a given node will pre-experimentally be a squared normal variable. Our main hypothesis is that this is close to being true for a larger class of models. This gives a surprise index which is similar to a frequentist p-value. In cases where we have domain knowledge that makes us suspect a given node, we test that one. Otherwise, one should make Bonferroni-like adjustments to the significance level to control the overall false alarm probability. This does not mean that we advocate basing the model-building process on a formal hypothesis testing scheme alone. Rather, we envisage an informal procedure, where the conflict analysis suggests points in the model structure that might be problematic. However, without reasonable control over the pre-experimental distribution of the conflicts in the model, it would be difficult to use this tool in practice without a computationally demanding empirical normalization.


The paper is laid out as follows: in section 2 we explain the original idea of O’Hagan (2003) in the setting of a Gaussian hierarchical model, followed by our modifications of the method. Our modifications include the splitting of data, so as to avoid double use of it, and this is discussed further in section 3. Section 4 gives some theoretical results in a special case of our model. In section 5 we give results from our massive simulation experiments, and section 6 concludes the article. In the Appendix, we give the proofs of the theoretical results in section 4.

2. Measuring conflict

O'Hagan (2003) introduces some tools for model criticism that can be applied at each node of a complex hierarchical or graphical model, with a view to diagnosing problems of model fit at any point in the model structure. In general, the model can be supposed to be expressed as a directed acyclic graph. To compare two unimodal densities/likelihoods he suggests the following procedure. First, normalize both densities to have unit maximum height. The height of both curves at their point of intersection is denoted by z. Then the suggested conflict measure is c1 = -2 ln z. In the present paper we consider, as O'Hagan (2003), the simple hierarchical model for several normal samples (one-way analysis of variance) to clarify what we see as problematic aspects of his approach. Observations y_ij for i = 1, ..., k and j = 1, ..., n_i are available. The model has the form:

  y_ij | θ, σ²  ~ind  N(θ_i, σ²),   i = 1, ..., k; j = 1, ..., n_i,
  θ_i | μ, τ²   ~ind  N(μ, τ²),     i = 1, ..., k,                          (1)

where θ = (θ_1, ..., θ_k), and is completed by a prior distribution for σ², τ² and μ. In the model (1), consider the node for parameter θ_i. In addition to its parents μ and τ², it is linked to its child nodes y_i1, ..., y_in_i. The full conditional distribution of θ_i is given by:

  p(θ_i | y, θ_{-i}, σ², τ², μ) ∝ p(θ_i | μ, τ²) ∏_{j=1}^{n_i} p(y_ij | θ_i, σ²),    (2)

where y = (y_11, ..., y_kn_k) is the complete set of data, and θ_{-i} = (θ_1, ..., θ_{i-1}, θ_{i+1}, ..., θ_k). This shows how each of the n_i + 1 distributions can be considered as a source of information about θ_i. When we are considering the possibility of conflict at the θ_i node, we must consider each of these contributing distributions as functions of θ_i. In the present model, contrasting the information about θ_i from the parent nodes with that from the child nodes, the conflict measure simplifies to:

  c_1^i = (μ - ȳ_i)² / (τ + σ/√n_i)²,    (3)

where ȳ_i = (1/n_i) ∑_{j=1}^{n_i} y_ij, noting that the last n_i factors of (2) can be written as p(ȳ_i | θ_i, σ²). When the parameters σ², τ² and μ are given by prior distributions, O'Hagan (2003) suggests using MCMC to estimate the quantity

  c_1^{i, y, med} = M_{σ², τ², μ | y}(c_1^i),    (4)

where M denotes the median under the posterior distribution of σ², τ² and μ.
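To make the construction concrete, the following sketch (our code, not the authors'; all names and parameter values are ours) computes c1 numerically from the two height-normalized normal information contributions at a θ_i node and checks the result against the closed form (3).

```python
# Sketch (ours) of O'Hagan's conflict c1: normalize two unimodal densities to
# unit maximum height, find the height z of their intersection between the
# modes, and set c1 = -2 ln z.  For two normal contributions this reproduces
# formula (3).
import numpy as np
from scipy.optimize import brentq

def c1_normal(mu, tau, ybar, sigma, n):
    """Numeric c1 for the parent N(mu, tau^2) and child N(ybar, sigma^2/n)."""
    s, t = tau, sigma / np.sqrt(n)                  # sds of the contributions
    g1 = lambda x: -0.5 * ((x - mu) / s) ** 2       # log heights, maximum 0
    g2 = lambda x: -0.5 * ((x - ybar) / t) ** 2
    if np.isclose(mu, ybar):
        return 0.0
    x_star = brentq(lambda x: g1(x) - g2(x), min(mu, ybar), max(mu, ybar))
    return -2.0 * g1(x_star)                        # c1 = -2 ln z

mu, tau, sigma, n, ybar = 0.0, 1.0, 2.0, 10, 1.7
print(c1_normal(mu, tau, ybar, sigma, n))                  # numeric value
print((mu - ybar) ** 2 / (tau + sigma / np.sqrt(n)) ** 2)  # formula (3)
```

Both print statements agree, since the intersection point of the two unit-height curves lies at distance proportional to the two standard deviations between the modes.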


He claims that a value

  c_2^{k, y^p, y^c, med} > (ȳ^p - ȳ_k^c)² / ((1/m_k)σ_0² + τ_0²) > (ȳ^p - ȳ_k^c)² / (((l + 1)/l)((1/m_k)σ_0² + τ_0²)) = c_2^{k, y^p, y^c, ipd},

where l = k in cases S1 and S2, and m_k = n/2 in case S2, whereas m_k = n otherwise. Moreover, α_{CM2, PA1, Sj} > α_{CM2, PA2, Sj} for j = 1, 2, 4, 5. Finally, for any of the splittings, any combination of CM and PA gives identical calibrated detection probabilities, i.e. β_{CM1, PA1, Sj} = β_{CM1, PA2, Sj} = β_{CM2, PA1, Sj} = β_{CM2, PA2, Sj} for j = 1, 2, 4, 5.

The first of these inequalities shows an exaggeration effect, because taking the median exaggerates the numerator, whereas the second inequality demonstrates an additional exaggeration effect arising from a too small variance term in the denominator. This justifies our assertion in section 2 that, for the median conflict, the variance terms in the denominator of (5) capture only part of the variability of the numerator.

5. Simulation experiments

In this section, we present the results of some simulation experiments, designed to evaluate false alarm probabilities (α values) and calibrated detection probabilities (β values) for the given model. We assume that the prior distributions for σ², τ² and μ are independent. The parameters σ² and τ² are inverse gamma distributed, both with shape parameter 4, and with scale parameters 12 and 3, respectively. The rather vague prior distribution for μ is normal with mean 0 and variance 9. This is an important part of our modelling. Furthermore, we choose k = 6, and n_i = n, i = 1, ..., k. We run identical experiments with n = 10 and n = 100. In the following subsection, we present results for the model with normally distributed θ's. In the next subsections, we analyse modified versions with Student's-t and uniformly distributed θ's, so as to illustrate how departure from normality affects the results.

5.1. Normally distributed θ's

Let f(x; m, s²) be the probability density of the N(m, s²) distribution. From (1) we arrive at the following likelihood for σ², τ² and μ:

  L(σ², τ², μ | y_ij, i = 1, ..., k; j = 1, ..., n)
    = ∫_{-∞}^{∞} ⋯ ∫_{-∞}^{∞} ∏_{i=1}^{k} ∏_{j=1}^{n} f(y_ij; θ_i, σ²) f(θ_i; μ, τ²) dθ_1 ⋯ dθ_k
    = ∏_{i=1}^{k} [ (1/(√(2π)σ))^{n-1} (1/√n) exp(-(1/(2σ²)) ∑_{j=1}^{n} (y_ij - ȳ_i)²) ∫_{-∞}^{∞} f(ȳ_i; θ_i, σ²/n) f(θ_i; μ, τ²) dθ_i ]
    = (1/(√(2π)σ))^{(n-1)k} n^{-k/2} exp(-(1/(2σ²)) ∑_{i,j} (y_ij - ȳ_i)²) ∏_{i=1}^{k} f(ȳ_i; μ, σ²/n + τ²).    (16)
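As a check on (16), the following sketch (ours; the parameter values are arbitrary) compares a single factor of the product, i.e. one group, against direct numerical integration over θ_i.

```python
# Sketch (ours): integrating theta_i out of prod_j f(y_ij; theta_i, sigma2)
# * f(theta_i; mu, tau2) leaves the within-group sum-of-squares factor times
# f(ybar_i; mu, sigma2/n + tau2), as in equation (16).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

rng = np.random.default_rng(1)
n, mu, sigma2, tau2 = 10, 0.5, 4.0, 1.0
y = rng.normal(1.0, np.sqrt(sigma2), size=n)
ybar, ss = y.mean(), ((y - y.mean()) ** 2).sum()

def integrand(theta):
    return np.prod(norm.pdf(y, theta, np.sqrt(sigma2))) * norm.pdf(theta, mu, np.sqrt(tau2))

lhs, _ = quad(integrand, -20, 20)
rhs = ((2 * np.pi * sigma2) ** (-(n - 1) / 2) / np.sqrt(n)
       * np.exp(-ss / (2 * sigma2))
       * norm.pdf(ybar, mu, np.sqrt(sigma2 / n + tau2)))
print(lhs, rhs)   # the two values agree to numerical precision
```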

From the prior distributions and (16) we generated posterior samples of (σ², τ², μ) by the Metropolis–Hastings algorithm. We used a random-walk version of the algorithm, with simultaneous steps in each direction. The steps are constructed as mixtures of centred uniform variables of varying size. After a burn-in of 10^6 steps, we generated a sample of 10,000 parameter vectors, sampled 1000 simulation steps apart. This gives close to no serial correlation in the sampled points.
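A minimal sketch of such a sampler follows (our code, not the authors'; the single-uniform proposal, its scale, the seed and the placeholder data are all simplifying assumptions, in particular we do not reproduce the paper's mixture proposal).

```python
# Sketch (ours) of a random-walk Metropolis-Hastings sampler for
# (mu, sigma2, tau2) under the integrated likelihood (16) and the priors of
# section 5: sigma2 ~ IG(4, 12), tau2 ~ IG(4, 3), mu ~ N(0, 9).
import numpy as np
from scipy.stats import norm, invgamma

rng = np.random.default_rng(2)
k, n = 6, 10
y = rng.normal(0.0, np.sqrt(5.0), size=(k, n))      # placeholder data
ybar, ss = y.mean(axis=1), ((y - y.mean(axis=1, keepdims=True)) ** 2).sum()

def log_post(mu, s2, t2):
    if s2 <= 0 or t2 <= 0:
        return -np.inf
    loglik = (-(n - 1) * k / 2 * np.log(s2) - ss / (2 * s2)       # SS factor of (16)
              + norm.logpdf(ybar, mu, np.sqrt(s2 / n + t2)).sum())  # f(ybar_i; mu, .)
    logpri = (invgamma.logpdf(s2, 4, scale=12) + invgamma.logpdf(t2, 4, scale=3)
              + norm.logpdf(mu, 0, 3))
    return loglik + logpri

state, lp = np.array([0.0, 4.0, 1.0]), log_post(0.0, 4.0, 1.0)
draws = []
for it in range(20000):
    prop = state + rng.uniform(-0.3, 0.3, size=3)   # simultaneous random-walk step
    lp_prop = log_post(*prop)
    if np.log(rng.uniform()) < lp_prop - lp:        # Metropolis acceptance
        state, lp = prop, lp_prop
    if it % 20 == 0:                                # thin to reduce serial correlation
        draws.append(state.copy())
print(np.mean(draws, axis=0))                       # posterior means of (mu, s2, t2)
```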


We do not claim that this sampling scheme is optimal in any sense, but it runs sufficiently fast for the present application. Also, for each of the two experiments, a total of 10,000 independent data sets were generated in separate files, containing kn = 60 and kn = 600 observations, respectively. In the test model, the prior expected values are σ_0² = 4, τ_0² = 1 and μ_0 = 0. The corresponding 10,000 alternative data sets have θ_k = 3, i.e. θ_k is located in the tail of its test model distribution N(0, 1). The S1 approach requires only one MCMC run, whereas S2, S4 and S5 require two MCMC runs each, for a total of seven MCMC runs. These runs must be carried through for 10,000 data sets both from the test model and the alternative model. The total number of MCMC runs is therefore 7 × 2 × 10,000 = 140,000. In order to test our different conflict measures, we have stored the 140,000 posterior distributions in separate files, each containing 10,000 parameter triplets (μ, σ², τ²). Owing to our elimination of the θ's in the likelihood, the MCMC simulations run very efficiently. Therefore, the main challenge of the process has been the management of the posterior distribution files, rather than computing power.

In Table 1 we give the α and β values estimated from the experiments with n = 10. In each cell, the estimated value is followed by its estimated standard error in parentheses. Table 2 gives the corresponding results for n = 100.

Table 1. Simulation results for α and β with n = 10. Each cell shows the estimate with its estimated standard error in parentheses.

              S1              S2              S3              S4              S5
CM1 PA1  α    0     (0.0001)  0.007 (0.0003)  0.003 (0.0002)  0.026 (0.0010)  0.024 (0.0013)
         β    0.70  (0.0065)  0.41  (0.0038)  0.65  (0.0058)  0.64  (0.0063)  0.69  (0.0152)
    PA2  α    0     (0.0001)  0.006 (0.0003)  0.003 (0.0002)  0.023 (0.0008)  0.021 (0.0014)
         β    0.71  (0.0065)  0.40  (0.0039)  0.65  (0.0058)  0.64  (0.0064)  0.69  (0.0149)
CM2 PA1  α    0.026 (0.0006)  0.053 (0.0006)  0.057 (0.0009)  0.104 (0.0018)  0.091 (0.0033)
         β    0.70  (0.0066)  0.40  (0.0041)  0.65  (0.0057)  0.64  (0.0064)  0.69  (0.0148)
    PA2  α    0.008 (0.0004)  0.031 (0.0004)  0.029 (0.0020)  0.051 (0.0013)  0.055 (0.0028)
         β    0.70  (0.0067)  0.45  (0.0040)  0.64  (0.0056)  0.65  (0.0063)  0.69  (0.0141)


Table 2. Simulation results for α and β with n = 100. Each cell shows the estimate with its estimated standard error in parentheses.

              S1              S2              S3              S4              S5
CM1 PA1  α    0.003 (0.0002)  0.004 (0.0003)  0.010 (0.0005)  0.065 (0.0012)  0.053 (0.0024)
         β    0.93  (0.0038)  0.86  (0.0045)  0.93  (0.0032)  0.86  (0.0062)  0.92  (0.0069)
    PA2  α    0.002 (0.0002)  0.003 (0.0002)  0.007 (0.0004)  0.054 (0.0011)  0.045 (0.0021)
         β    0.93  (0.0037)  0.82  (0.0058)  0.93  (0.0032)  0.86  (0.0062)  0.92  (0.0067)
CM2 PA1  α    0.021 (0.0008)  0.031 (0.0005)  0.054 (0.0009)  0.121 (0.0018)  0.102 (0.0028)
         β    0.91  (0.0043)  0.79  (0.0061)  0.91  (0.0043)  0.85  (0.0064)  0.91  (0.0077)
    PA2  α    0.004 (0.0002)  0.010 (0.0004)  0.020 (0.0007)  0.056 (0.0011)  0.060 (0.0024)
         β    0.91  (0.0041)  0.84  (0.0052)  0.92  (0.0045)  0.85  (0.0059)  0.91  (0.0076)

We observe from Tables 1 and 2 that, in accordance with theorem 1 and corollary 1, we obtain significance levels quite close to 0.05 when combining our suggested modifications of the approach of O'Hagan (2003), provided we use the vertical splittings (10). In fact, we have α_{CM2, PA2, Sj} ∈ [0.051, 0.060] for j = 4, 5. The significance levels exceed 0.05 slightly, as suggested by the discussion following theorem 1. We also note that replacing CM2 by CM1 results in a substantial drop in significance level for all combinations of the factors PA and S, confirming proposition 1. For any given combination of the factors CM and PA, such a drop is also observed when replacing S4, S5 with the no-splitting option S1, and to a somewhat smaller extent with the horizontal splittings (8), represented by S2 and S3. For the replaced combinations CM2, PA2, Sj, j = 1, 2, this is in accordance with corollary 1 and proposition 2. On the other hand, for any combination of CM and S we observe that replacing PA2 with PA1 results in an increase in the significance level. This is to be expected from proposition 3 for the combinations CM2, Sj, j = 1, 2, 4, 5. The net effect of combining the conflict reducing factors CM1, S1 with the conflict increasing factor PA1, which constitutes the original suggestion by O'Hagan (2003), is a significance level dramatically smaller than 0.05. However, for some combinations the upward and downward acting factors cancel, resulting in significance levels fairly close to 0.05. This is the case for the combinations CM2, PA1, Sj with j = 2, 3 when n = 10, and CM1, PA1, Sj with j = 4, 5, as well as CM2, PA1, S3, when n = 100. The combinations CM1, PA2, Sj, j = 4, 5, also give acceptable significance levels when n = 100, despite the conflict reducing effect of the factor CM1. This is probably because this effect is relatively small when n = 100, since then σ²/n is much smaller than τ².

Turning to calibrated detection probabilities, we observe that, in accordance with proposition 3, for any given splitting the detection probabilities are almost the same for all combinations of CM and PA. The splitting S2 appears to be somewhat exceptional in this respect. The most important feature influencing the calibrated detection probability seems to be the amount of data used in the estimation of the nuisance parameters for the different splittings. This especially affects S2. Comparing these gives the ordering S1, S5, S4, S2, with S5 almost at the level of S1, and with S3 at the level of S4 for n = 10, respectively at the level of S1 and S5 when n = 100.

The combinations CM2, PA2, Sj, j = 4, 5, give false warning probabilities close to the 0.05 significance level. No other combination, except for CM2, PA1, S3, where the upward acting factor PA1 and the downward acting factor S3 cancel out, does this both for n = 10 and n = 100. Among the vertical splittings, S5 obtains a calibrated detection probability practically at the level of the no-splitting alternative S1. The symmetric vertical splitting S4, which uses less data in the estimation of the nuisance parameters, has a somewhat lower detection probability. The relative difference is smaller in the case of abundant data (n = 100 compared with n = 10). However, the S4 splitting has the advantage of being able to handle all the six possible conflicts at the θ_i nodes with only two MCMC runs. The splitting S5 needs 2 × 6 = 12 MCMC runs to evaluate all these conflicts.
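The following sketch (our code, not the paper's) shows how such a conflict could be computed from a posterior sample; since equations (5) and (7) are not reproduced above, the exact form of the numerator and denominator here is an assumption based on the proof of theorem 1 in the appendix, and all names are ours.

```python
# Hedged sketch (ours) of an ipd-type conflict at node theta_k.  `post` holds
# posterior draws of (mu, sigma2, tau2) given the parent data y_p only (for
# example, the remaining groups under a vertical splitting); ybar_kc is the
# mean of the child data for group k with n observations.
import numpy as np

def ipd_conflict(post, ybar_kc, n):
    mu, s2, t2 = post[:, 0], post[:, 1], post[:, 2]
    num = (mu.mean() - ybar_kc) ** 2
    den = (s2 / n + t2).mean() + mu.var()   # E(s2/n + t2 | y_p) + var(mu | y_p)
    return num / den

# usage: c = ipd_conflict(np.array(draws), y[k - 1].mean(), n)
# a warning would be flagged when c exceeds the 0.95 quantile (3.84) of
# the chi-squared distribution with 1 degree of freedom.
```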


5.2. Student's-t distributed θ's

The practical usefulness of our method will be limited if it only applies to normal models. We have therefore made an experiment with a non-normal version of our model. There are of course infinitely many ways for a model to be non-normal, and there is no obvious choice. However, the normal distribution has very light tails, and 'real data' tend to have a higher probability of extreme values. We have therefore made our experiment with a heavier tailed distribution. A natural choice was the Student's-t distribution. We set the degrees of freedom to 3, to make the tails as heavy as possible while still having a finite variance. Our original model has normal distributions both for the θ's and for the data given θ_i. There is little point in changing the distribution of the data to t-distributions, because the average of these data will be close to normal anyway, due to the central limit theorem. We have therefore chosen to modify the distribution of the θ's only, setting θ_i | μ, τ² ~ T_3, scaled and located so that E[θ_i | μ, τ²] = μ and var[θ_i | μ, τ²] = τ² (since a standard t-variable with 3 degrees of freedom has variance 3, this amounts to θ_i = μ + (τ/√3)T with T ~ t_3). We somewhat arbitrarily chose n = 10 for the experiment. This is not likely to be important, as it made little difference in the normal case. In our MCMC simulations, we again use the total likelihood directly, and simulate the θ_i parameters together with μ, σ², τ², using the same Metropolis–Hastings algorithm as before. This is rather less efficient than our simulations for the normal model, where we were able to eliminate the θ values from the likelihood expression, but still sufficiently fast for our purpose. Our main hypothesis is that the α-level, the false warning probability, is close to 0.05 for our vertical splitting schemes. We have focused on the central splitting (S4), because this makes it possible to gather data for all k = 6 θ_i nodes from the same experiment, due to the symmetry of the model. Following the procedure of our original experiment, we generated 10,000 data sets from the model. The estimated α-level was 0.042, with a standard error of 0.0008.

5.3. Uniformly distributed θ's

After testing our approach with the heavy-tailed Student's-t distribution, we have also run tests with the opposite extreme of uniformly distributed θ's. Again, we have scaled and located the distribution such that E[θ_i | μ, τ²] = μ and var[θ_i | μ, τ²] = τ². The experiment was otherwise identical to the previous one, and the estimated α-level was 0.047. These results with non-normal distributions for θ support our hypothesis that our method is robust with respect to deviations from normality.

6. Conclusions

We have shown that although the original procedure of O'Hagan (2003) for evaluating conflict is unreliable even in a Gaussian setting, our improvements give a method that can detect problems with a proposed model. Our method is backed up by theoretical computations in a non-trivial special case. It is particularly encouraging that our experiments show a false warning probability close to the preset value of 0.05 for our Gaussian model, and that this appears robust with respect to the normality assumption. This work has been based on theoretical analysis and experiments with computer-generated data sets. A computational approach is in most cases the only way of testing a method's ability to detect deviations from an assumed model, and to evaluate its false warning probability when data are in fact generated from the assumed model. However, the obvious line for future work is to test our method on real data.

Acknowledgements

This work has benefitted from the 'Evaluation of Bayesian Hierarchical Models' programme, supported by the Research Council of Norway.
We are also grateful to Alan Gelfand for his very useful comments on earlier drafts of the manuscript.


References

Bayarri, M. J. & Berger, J. O. (2000). P values for composite null models. J. Amer. Statist. Assoc. 95, 1127–1142.
Bayarri, M. J. & Castellanos, M. E. (2004). Bayesian checking of hierarchical models. Technical Report 04-32, Institute of Statistics and Decision Sciences, Duke University, Durham, NC.
Box, G. E. P. (1980). Sampling and Bayes inference in scientific modelling and robustness (with discussion and rejoinder). J. Roy. Statist. Soc. Ser. A 143, 383–430.
Clyde, M. & George, E. I. (2004). Model uncertainty. Statist. Sci. 19, 81–94.
Dahl, F. A. (2006). On the conservativeness of posterior predictive p-values. Statist. Probab. Lett. 76, 1170–1174.
Dey, D. K., Gelfand, A. E., Swartz, T. B. & Vlachos, P. K. (1998). A simulation-intensive approach for checking hierarchical models. Test 7, 325–346.
Gelman, A., Meng, X. L. & Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies (with discussion and rejoinder). Statist. Sinica 6, 733–807.
Hjort, N. L., Dahl, F. A. & Steinbakk, G. H. (2006). Post-processing posterior predictive p-values. J. Amer. Statist. Assoc. 101, 1157–1174.
Kass, R. E. & Raftery, A. E. (1995). Bayes factors. J. Amer. Statist. Assoc. 90, 773–795.
Marshall, E. C. & Spiegelhalter, D. J. (2003). Approximate cross-validatory predictive checks in disease mapping models. Statist. Med. 22, 1649–1660.
Meng, X. L. (1994). Posterior predictive p-values. Ann. Statist. 22, 1142–1160.
O'Hagan, A. (2003). HSSS model criticism (with discussion). In Highly structured stochastic systems (eds P. J. Green, N. L. Hjort & S. Richardson), 423–453. Oxford University Press, Oxford.
Robins, J. M., van der Vaart, A. & Ventura, V. (2000). Asymptotic distributions of p values in composite null models (with discussion and rejoinder). J. Amer. Statist. Assoc. 95, 1143–1156.
Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann. Statist. 12, 1151–1172.
Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6, 461–464.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P. & van der Linde, A. (2002). Bayesian measures of model complexity and fit (with discussion and rejoinder). J. Roy. Statist. Soc. Ser. B 64, 583–639.

Received December 2005, in final form January 2007

Fredrik A. Dahl, Health Services Research Unit, Akershus University Hospital, PB 95, 1478 Lørenskog, Norway.
E-mail: [email protected]

Appendix

Proof of theorem 1. Due to the improper prior for μ we have (μ | y^p, σ², τ²) = N(ȳ^p, (1/l)((σ²/n) + τ²)), leading to the following simplifications in (7):

  E(μ | y^p) = ȳ^p,

  E((1/n)σ² + τ² | y^p) + var(μ | y^p)
    = E((1/n)σ² + τ² | y^p) + E(var(μ | σ², τ², y^p)) + var(E(μ | σ², τ², y^p))
    = E((1/n)σ² + τ² | y^p) + (1/l)((1/n)E(σ² | y^p) + E(τ² | y^p))
    = ((l + 1)/l)((1/n)E(σ² | y^p) + E(τ² | y^p)).

This proves (14). Moreover, using (16) it can be shown that the posterior distributions of (σ², τ²) given y^p, respectively y^c, depend on y^p, y^c only through sums of squared differences to the mean, proving the independence assertion. Finally, the variance expression (15) follows from the fact that, with the splitting (10), Y^p and Y^c are independent under the test model.

Proof of proposition 2. With no splitting we obtain


  var_0(Ȳ - Ȳ_k) = var_0((Ȳ_1 + ⋯ + Ȳ_{k-1})/k - ((k - 1)/k)Ȳ_k)
    = ((k - 1)/k²)(σ_0²/n + τ_0²) + ((k - 1)/k)²(σ_0²/n + τ_0²)
    = ((k - 1)/k)(σ_0²/n + τ_0²).

On the other hand, for l = k the right-hand side of (15) takes the form ((k + 1)/k)(σ_0²/n + τ_0²), and it follows that under no splitting, c_2^{k, Y, ipd} ~ ((k - 1)/(k + 1))χ²_1. To analyse the splitting (8) we express the data in the form

  Y_{k,j} = λ_k + ε_{k,j},  j = n/2 + 1, ..., n,
  Y_{i,j} = λ_i + ε_{i,j},  i = 1, ..., k,  j = 1, ..., n/2,

where the variables ε_{k,j} and ε_{i,j} are independent and N(0, σ_0²) distributed. With this decomposition, writing ε̄_k for the average of ε_{k,j}, j = n/2 + 1, ..., n, and ε̄ for the average of all ε_{i,j} with j ≤ n/2, remembering that n is replaced by n/2 and l is replaced by k, and using the right-hand side of (15), we can write the conflict as

  [(λ_k - λ̄) + (ε̄_k - ε̄)]² / (((k + 1)/k)(τ_0² + (1/(n/2))σ_0²))
    = [((k - 1)/k)(λ_k - (1/(k - 1))(λ_1 + ⋯ + λ_{k-1})) + (ε̄_k - ε̄)]² / (((k + 1)/k)(τ_0² + (1/(n/2))σ_0²))
    = [((k - 1)/k)(λ_k - (1/(k - 1))(λ_1 + ⋯ + λ_{k-1})) + (ε̄_k - ε̄)]² / (((k - 1)/k)τ_0² + (2(k + 1)/(nk))σ_0²)
        × (((k - 1)/k)τ_0² + (2(k + 1)/(nk))σ_0²) / (((k + 1)/k)(τ_0² + (1/(n/2))σ_0²))
    ~ (((k - 1)τ_0² + ((k + 1)/(n/2))σ_0²) / ((k + 1)(τ_0² + (1/(n/2))σ_0²))) χ²_1.

From this the proposition follows.

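A quick simulation (ours; parameter values arbitrary) illustrates the resulting deflation of the no-splitting conflict relative to its nominal chi-squared reference.

```python
# Sketch (ours) of the no-splitting case in proposition 2: (Ybar - Ybar_k) has
# variance ((k-1)/k)(sigma0^2/n + tau0^2), so dividing by the larger reference
# variance ((k+1)/k)(sigma0^2/n + tau0^2) yields a conflict distributed as
# ((k-1)/(k+1)) * chi2(1), stochastically smaller than chi2(1).
import numpy as np

rng = np.random.default_rng(4)
k, n, sigma02, tau02 = 6, 10, 4.0, 1.0
v = sigma02 / n + tau02
lam = rng.normal(0, np.sqrt(tau02), size=(100000, k))              # group means
ybar_i = lam + rng.normal(0, np.sqrt(sigma02 / n), size=(100000, k))
c = (ybar_i.mean(axis=1) - ybar_i[:, -1]) ** 2 / (((k + 1) / k) * v)
print(np.mean(c > 3.84))   # about 0.02 rather than 0.05: the conflict is deflated
```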
Proof of proposition 3. Remember that (μ | y^p, σ², τ²) is normal with expectation ȳ^p. When σ² = σ_0² and τ² = τ_0² are fixed, the median conflict is M_{μ | y^p}((μ - ȳ_k^c)²/((1/m_k)σ_0² + τ_0²)), where, as before, m_k is either n [no splitting or (10)] or n/2 [the splitting (8)]. As μ is the only random quantity in the conflict, the median of (μ - ȳ_k^c)²/((1/m_k)σ_0² + τ_0²) can be found by solving the equation

  P(μ > ȳ_k^c + v | y^p) + P(μ < ȳ_k^c - v | y^p) = 1/2,

or equivalently, assuming without loss of generality that ȳ^p ≥ ȳ_k^c and letting v = (ȳ^p - ȳ_k^c) + z,

  P(μ > ȳ^p + z | y^p) + P(μ < ȳ_k^c - (ȳ^p - ȳ_k^c) - z | y^p) = 1/2.

Clearly, as P(μ > ȳ^p | y^p) = 1/2, z must be positive, and the resulting median conflict is (ȳ^p + z - ȳ_k^c)²/((1/m_k)σ_0² + τ_0²). Hence,

  c_2^{k, y^p, y^c, med} > (ȳ^p - ȳ_k^c)² / ((1/m_k)σ_0² + τ_0²) > (ȳ^p - ȳ_k^c)² / (((l + 1)/l)((1/m_k)σ_0² + τ_0²)) = c_2^{k, y^p, y^c, ipd}.

This covers (10). In the case of (8) or no splitting, (l + 1)/l must be replaced by (k + 1)/k. This proves the first part, and the corresponding inequalities between the significance levels are immediate. It is seen that z is a deterministic, monotonically decreasing function of ȳ^p - ȳ_k^c. This implies a pairwise deterministic, monotonic relationship between any of c_i^{k, y^p, y^c, med}, c_i^{k, y^p, y^c, ipd}, i = 1, 2, because the denominators of these conflicts are fixed. It follows that the calibrated detection probabilities are identical for a given splitting, i.e. β_{CM1, PA1, Sj} = β_{CM1, PA2, Sj} = β_{CM2, PA1, Sj} = β_{CM2, PA2, Sj} for j = 1, 2, 4, 5, as asserted.
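The first inequality can also be seen numerically (our sketch; parameter values arbitrary): sampling μ from its fixed-parameter posterior and taking the median of the conflict exceeds the ipd conflict.

```python
# Numerical check (ours) of proposition 3: with (mu | y_p) ~ N(ybar_p,
# (1/l)(sigma0^2/n + tau0^2)) and fixed variance parameters, the posterior-
# median conflict exceeds the ipd conflict.
import numpy as np

rng = np.random.default_rng(3)
l, n, sigma02, tau02 = 5, 10, 4.0, 1.0
v = sigma02 / n + tau02
ybar_p, ybar_kc = 1.2, 0.1
mu = rng.normal(ybar_p, np.sqrt(v / l), size=200000)
c_med = np.median((mu - ybar_kc) ** 2) / v               # median conflict
c_mid = (ybar_p - ybar_kc) ** 2 / v                      # middle term
c_ipd = (ybar_p - ybar_kc) ** 2 / (((l + 1) / l) * v)    # ipd conflict
print(c_med, c_mid, c_ipd)   # strictly decreasing, as the proof asserts
```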

