Zero entries in contingency tables

Computational Statistics & Data Analysis 3 (1985) 33-45 North-Holland

R.J. BAKER, Rothamsted Experimental Station, Harpenden, Herts AL5 2JQ, England

M.R.B. CLARKE, Department of Statistics and Computing Science, Queen Mary College, Mile End Road, London E1 4NS, England

P.W. LANE, Rothamsted Experimental Station, Harpenden, Herts AL5 2JQ, England

Received December 1983
Revised December 1984

Abstract: The current literature on the analysis of multiplicative models for contingency tables suggests that in the presence of random zero entries (1) the degrees of freedom must be adjusted and (2) unique maximum likelihood estimates for the expected values do not exist. Both (1) and (2) are incorrect. Such an adjustment to the degrees of freedom is unnecessary and may lead to an incorrect analysis. Unique maximum likelihood estimates for the expected values do exist, even with random zeros, so no special adjustment is needed to interpret the analysis. The algorithms implemented in many commonly-used statistical packages on the basis of the current literature incorrectly analyze such tables.

Keywords: Log-linear models, Sparse tables, Random zeros, Structural zeros, Degrees of freedom, Generalized linear models, Maximum likelihood estimates, Prediction, GLIM, Genstat.

1. Introduction

This paper arose from consideration of the recent article by Brown and Fuchs [4]. That article gave an analysis of a contingency table containing zero entries. We believe that analysis to be incorrect and, more importantly, to exhibit serious misconceptions in the treatment of zero entries in contingency tables. These misconceptions have a considerable history in the literature and have led many statistical packages to implement algorithms that will analyze such tables incorrectly. We discuss the analysis, and the literature on which it is based, under three headings. In Section 2 we discuss the confusion over expected and estimated values for cells with 'random' zeros, a confusion that has given rise to the practice of setting the expected values for such cells to zero and then 'adjusting' the degrees of freedom. This adjustment is erroneous and furthermore it can invalidate consequent test procedures. In Section 3 we assert that, contrary to the widely stated view in the literature, maximum likelihood estimates (MLEs) for the expected values (μ) always exist and are unique even in the presence of random zeros. Such a result helps to unify the treatment of zero and nonzero entries in the table. Finally, in Section 4, we illustrate these conclusions by analyzing the data from Brown and Fuchs.

2. Estimated or expected?

We first emphasize the important and commonly-made distinction between 'structural' and 'random' zeros.

2.1. Structural and random zeros

A distinction is often made in the literature between those cells that, because of the structure of the experiment, must contain a zero entry, and those for which a positive count is possible but a zero count obtained. The former are called structural or fixed zeros, or necessarily empty cells; the latter are called random or sampling zeros, or accidentally empty cells. The distinction is important because structural zeros are not part of the data set, whereas random zeros are; an alternative description is that a cell with a structural zero has an expected value of zero, while the expected value for a cell with a random zero can take any nonnegative value.

If a zero occurs in a margin that was fixed prior to the experiment (e.g. a zero sample size with multinomial sampling) then by definition the cells in the table contributing to that margin are structural zeros. By extension, if we choose to analyze the table conditional on a margin that was not actually fixed in the experiment, then cells in the table that were not structural zeros in the experiment will become structural zeros in the analysis if they contribute to a zero cell in the conditioning margin. Since such conditioning introduces new structural zeros into the model, we must be explicit at the outset as to which margins we are conditioning on.

Because structural zeros, having observed and expected values of zero, can provide no information on the model, we need not include them in the analysis; just as, when analyzing data from a field trial, we would not include a yield value of zero for a plot on which the experiment was not performed. The only justification for including structural zeros when analyzing contingency tables is that it keeps the tables rectangular; however, this property only confuses the description of the data while permitting no simplification of the analysis. Programs like GLIM [2] and Genstat [1], for example, make no assumptions about rectangularity and can analyze the data set as presented. Hence we make no further reference to analyzing structural zeros, assuming that they have been omitted from the data. We discuss below only the case of random zeros.
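To make this concrete, the following minimal sketch (ours, not the paper's: it assumes Python with pandas and statsmodels in place of GLIM or Genstat, and the counts are purely illustrative) analyzes a non-rectangular table directly. The structural zero is simply never entered as an observation, while the random zero is retained as a count of 0.

    # A 2 x 3 classification in 'long' format.  The cell (A=2, B=3) is a
    # structural zero (impossible by design), so it is simply not a row of
    # the data.  The cell (A=2, B=2) has an observed count of 0 but is a
    # random zero, so it stays in the data.
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    cells = pd.DataFrame({
        "A":     ["1", "1", "1", "2", "2"],
        "B":     ["1", "2", "3", "1", "2"],
        "count": [ 12,   7,   5,   9,   0],
    })

    # Main-effects log-linear model fitted to the cells actually present.
    fit = smf.glm("count ~ C(A) + C(B)", data=cells,
                  family=sm.families.Poisson()).fit()

    # Residual d.f. = 5 included cells - 4 independent parameters = 1;
    # the random zero at (A=2, B=2) does not alter this.
    print(fit.df_resid)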

2.2. Adjusting degrees of freedom

There is an established literature on the treatment of random zeros. Bishop, Fienberg and Holland [3], in a standard text on the analysis of contingency tables, discuss the treatment of random zeros in Section 3.5 and the 'adjustment' for empty cells in Section 3.8. Fienberg [5] devotes Section 8.1 to their treatment, while his Appendix II discusses the existence of MLEs when they occur. The common thesis is best summarized by Fienberg [5], p. 109:

"In order to test the goodness-of-fit of a model that uses an observed set of marginal totals with at least one zero entry, we must reduce the degrees of freedom associated with the test statistic. If an observed marginal entry is zero, *both the expected and observed entries for all cells included in that total must be zero*, and so the fit of the model for those cells is known to be perfect once it is observed that the marginal entry is zero. As a result, we must delete those degrees of freedom associated with the fit of the zero cell values."

Now the phrase we have italicized is wrong: the occurrence of a zero value, even a row of zero values leading to a random marginal zero, does not imply that the expected value, which is unknown, is also zero. It is important to be quite clear about this; the expected value is a population parameter which is unaffected by whatever value, zero or otherwise, we happen to observe in the data. However, the same is not true of the MLE, which is totally dependent on the observed value; if the word 'expected' is replaced by 'estimated' in the quotation then the phrase becomes true, since the ML estimate of the expected value for such a cell will be zero under the model. This confusion of the words 'estimated' and 'expected' is found throughout the literature. For example, Bishop, Fienberg and Holland [3], p. 188:

"The expected values under the model of quasi independence are identical to the observed values whenever the number of independent parameters being fitted is equal to or greater than the total number of cells."

Similarly, in Brown and Fuchs, p. 5:

"When there is a zero in a marginal configuration defined by a log-linear model, the cells comprising that marginal zero have expected values that are zero."

Indeed, in almost every one of its occurrences in the paper the word 'expected' should be replaced by the word 'estimated'. (As a historical aside, we conjecture that the confusion has its roots in the use of such formulae as that for Pearson's X²,

  Σ (O − E)² / E,

where the E is often read as 'expected' though, when it is a function of the data, as is usually the case, it should of course be read as 'estimated'.)

It could be thought that use of the word 'expected' for 'estimated' is at worst an abuse of terminology that a discerning reader could overlook. But the problem is deeper than this, and is exemplified in the final sentence of the Fienberg quotation above. This sentence justifies the deletion of degrees of freedom for cells with zero entries, and indeed it is a valid conclusion provided such cells have zero expected values. For if a cell has an expected value of zero it is, by definition, a structural zero and should not be included in the analysis, and thus should not have a degree of freedom associated with it. However, when the word 'expected' in the quotation is replaced by the word 'estimated' the conclusion is false. There is now no reason to omit a zero observation or to delete a degree of freedom simply because it gives rise to an estimate of zero.

Consider the analogous case in the analysis of variance. If we collect data for a (say) two-level one-way classification and both our data values happen to be zero (or even just equal), we are still able to analyze the data. The sum of squares for treatments will be zero, though the degrees of freedom will not, and we would not consider any adjustment. The justification for adjusting the degrees of freedom in the Poisson case comes solely from the misidentification of expected and estimated values in the presence of random zeros. Such misidentification implies that a random zero (whose expected value is non-negative but whose estimated value is zero) will be treated like a structural zero, i.e. with both an expected and an estimated value of zero, and the degrees of freedom will be adjusted accordingly. As Fienberg ([5], p. 110) states:

"The procedure for handling zero marginal totals discussed here would also be used if all the sampling zeros, corresponding to the observed zero marginal totals, were actually fixed zeros."

A related, but separate, issue concerns the effect of reducing the residual degrees of freedom on the distribution of the deviance. It is well known that, in the presence of small counts, the distribution of the deviance with r degrees of freedom is not well approximated by χ²(r), and it may be that its distribution is better approximated by a χ² with fewer degrees of freedom. This, however, is not an argument that has been advanced in the quoted sources in support of such a reduction and, moreover, it would have to be demonstrated that the suggested reduction is the most appropriate one. Indeed, as Brown and Fuchs point out at the end of their article, Haberman [7] has indicated that the asymptotic properties of the deviance are still applicable even when some counts are small, so long as the total sample size and the number of cells in the table are large. Additionally, while it is conceivable that a reduction in degrees of freedom could improve the approximation in some cases, it is important to remember that this reduction will apply to the approximating χ² distribution and not to the degrees of freedom of the associated deviance.

A further argument for adjusting the degrees of freedom concerns conditionality considerations. It is obvious that if a set of margins in the table was fixed prior to the experiment then we must analyze the data conditional on those margins, and if a zero occurs in one of these margins then the corresponding cells in the table will be structural zeros. (It is worth noting that if multinomial data are analyzed as Poisson data, using the well-known equivalence of their likelihoods, then the degrees of freedom must be corrected for the structural zero cells treated as random zero cells under the Poisson model.) An extension of this conditionality argument proposes that we also condition on margins that were not fixed in the experiment but which can be considered ancillary to the parameters of interest in the model. Thus to test the goodness-of-fit of model M1, or to assess the usefulness of M2 (⊂ M1) over M1, we would analyze the data conditional on the margin associated with M1, treating that margin as an ancillary statistic. (In the 2-way table under the model of independence such arguments lead to the Fisher exact test and its generalizations.) If we accept the extended conditionality argument then marginal zeros in M1 give rise under the model to structural zeros in the table, leading to fewer degrees of freedom than would be obtained in an unconditional analysis; though it should also be noted that marginal zeros occurring in M2 but not in M1 do not give cause for reducing the degrees of freedom. Although recommended in theory, the conditional analysis is, except for certain special cases, rarely performed in practice, because of the computational difficulties involved in evaluating the conditional likelihood and its maximum. All commonly-used statistics packages perform an unconditional analysis of the general n-way table, in which case no structural zeros are introduced and there is no reason to omit degrees of freedom.

We summarize the position so far. Structural zeros need not be included in the analysis, but if they are they have an expected (and hence, a priori, an estimated) value of zero, and because the expected value is zero they do not have an associated degree of freedom. If random zeros occur in the data they have unknown nonnegative expected values and, under certain models, will have an estimated value of zero. Because the expected value is not identically zero, i.e. because the parameter space is not null, each random zero has a degree of freedom associated with it regardless of the model. If, for inferential purposes, we condition on a random margin containing zero entries then we introduce structural zeros into our model; the computations necessary for a conditional analysis of a general n-way table are rarely performed in practice.
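To make the degrees-of-freedom accounting in this summary explicit: the residual degrees of freedom are the number of cells included in the analysis (structural zeros omitted, random zeros retained) minus the number of independent parameters in the model, irrespective of how many of the retained counts happen to be zero. A minimal sketch of the calculation (our own illustration; the design matrix is hypothetical):

    import numpy as np

    def residual_df(X):
        """Residual d.f. for a log-linear model with design matrix X.

        X has one row per cell actually included in the analysis; structural
        zeros are simply not rows of X.  The observed counts, zero or not,
        play no part in the calculation.
        """
        n_cells = X.shape[0]
        n_params = np.linalg.matrix_rank(X)
        return n_cells - n_params

    # 2 x 2 independence model: intercept, row effect and column effect.
    X = np.array([[1, 0, 0],
                  [1, 0, 1],
                  [1, 1, 0],
                  [1, 1, 1]])
    print(residual_df(X))   # 4 - 3 = 1, however many of the four counts are zero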

2.3. The consequences of adjustment

An obvious consequence of deleting degrees of freedom is that the wrong test statistic will be used. Thus the residual degrees of freedom given in Table 3.8-3 of Bishop, Fienberg and Holland [3] should be 16 for the 'all three-factor effects' model and 38 for the 'all two-factor effects' model, not 12 and 31 as given, and the computed log-likelihood ratio statistic, or deviance, should be referred to the χ²(16) and χ²(38) tables, not to those for χ²(12) and χ²(31). There are numerous further examples in the literature. Similarly, the algorithms for computing degrees of freedom, as implemented in BMDP, Minitab, PSTAT and SPSS, are incorrect, since they adjust the degrees of freedom for random zeros. The difficulty can only be overcome by requiring the user to specify which margins are fixed and which are random, though should the resulting distribution be other than multinomial or Poisson the usual methods of analysis would no longer apply.

A further serious consequence of reducing the degrees of freedom, and one that has been ignored in the sources quoted, is that it invalidates the procedure for testing differences in the deviances under different models. A condition for the asymptotic χ² distribution of the difference between the deviances under models M1 and M2 (say) is that the parameter spaces associated with M1 and M2 be nested. A simple example will illustrate that this nesting is lost if we set to zero the expected values of random zeros and reduce the residual degrees of freedom. Consider the A × B table

            B
    A    y11  y12
         y21  y22

with y12 = y22 = 0, and with the B margin not fixed. Let μ = (μ11, μ12, μ21, μ22) denote the vector of expected values corresponding to the observations y = (y11, y12, y21, y22), and consider fitting the models:

    M1:  main effect of A only,
    M2:  main effects of A and B (independence model).

If we allow M1 and M2 to have their usual parameter spaces then the parameter space of M1 is, of course, a subset of that for M2 and the usual test procedure for B is possible. If, however, we set μ12 and μ22 to zero under model M2 on the grounds that the margin y+2 is zero, then the spaces are not nested, as the following illustrates. Let M2' stand for M2 under these constraints and let

    μa = (y11/2, y11/2, y21/2, y21/2),    μb = (y11, 0, y21, 0).

Then μa ∈ M1 but μa ∉ M2', while μb ∉ M1 but μb ∈ M2'; thus the spaces are not nested. The difference in deviances under M1 and M2' cannot then be used to test the significance of the effect of B. By treating μ12 and μ22 as structural zeros under M2' but not under M1 we are effectively estimating within two different sample spaces, and hence within two different parameter spaces. Similar considerations apply to the analysis of more complex tables, such as that in the Brown and Fuchs article, as we shall see later.

2.4. Summary

Structural zeros are cells that are necessarily empty, either because of the design of the experiment or because, in a conditional analysis, they contribute to a zero entry in the conditioning margin. Structural zeros need not be included in the analysis, but if they are they have an expected and an estimated value of zero, and because of the former have no associated degrees of freedom.

Random zeros are a part of the data set and must be included in the analysis. They have non-negative expected values and, depending on the model, will have zero or positive estimated values. Each random zero, whether or not it contributes to a marginal zero, has an associated degree of freedom that can be deleted only by omitting the observation from the analysis, a practice that cannot be justified purely on the grounds that the observed value is zero. Omission of different random zeros under different models may invalidate the usual comparative test procedures.

3. Existence of estimates

We note that an authoritative text on contingency-table analysis, Haberman [6], does not recommend deleting degrees of freedom for random zeros. His Chapter 7 discusses the computation of degrees of freedom, but only for incomplete tables, i.e. tables with structural zeros. His Theorem 2.2 and Appendix B, however, illustrate another, though perhaps less important, source of confusion over the treatment of random zeros.

We first define our notation. Haberman denotes the expected value of an observation by m and the logarithm of an expected value by μ. This seems an unfortunate notation, as μ usually denotes an expected value, and Roman letters such as m are often reserved for random variables. Given the existing confusion between an expected value and its estimate (which is a random variable) we prefer to denote an expected value by μ, its logarithm, termed a linear predictor by Nelder and Wedderburn [10], by η, and, as usual, their estimates by μ̂ and η̂.

3.1. Extended MLE

A common model for contingency table data postulates a Poisson distribution for the cell entries, y_i, i ∈ I, where I indexes the data:

  pr(y_i; μ_i) = exp(−μ_i) μ_i^y_i / y_i!     (y_i ≥ 0, μ_i > 0),

where the distribution degenerates to a single point at y_i = 0 if μ_i = 0. (The arguments given below are easily extended to cover multinomial distribution models.) A natural way of expressing the dependence of μ_i on the values of the classifying variables is to write

  μ_i = exp{ Σ_j x_ij β_j }                                        (1)

or, in an obvious notation,

  μ = exp(η),   η = Xβ,

where X denotes levels of the classifying factors (or covariates). But it is important to note that it is not possible to represent the whole of the μ-space under such a parameterization, since a μ for which some μ_i = 0 is not expressible under this parameterization: there is no η_i ∈ R such that exp(η_i) = 0. Thus this parameterization represents the μ-space only if we disallow those points μ for which some μ_i = 0. Haberman ([6], p. 3) does just this, by restricting the parameter space to (our) μ > 0. This is convenient since it makes the μ-space the image under the exponential transformation of the η-space, but it has the undesirable consequence that if μ_i = 0 is not part of the μ-space then μ̂_i = 0 is not an admissible estimate. Thus, given his parameter spaces, Haberman's Theorem 2.2 can correctly assert that maximum likelihood estimates for μ and η do not exist if certain patterns of zeros occur in the data. Equivalently, a maximum over μ cannot always be found in (0, ∞) when random zeros occur in the data.

Consider, however, the simplest contingency table: a single observation, which we postulate to have a Poisson distribution. Suppose the observed value is zero. Direct inspection of the likelihood

  exp(−μ),   μ ≥ 0,

shows that the maximum does exist (at μ̂ = 0) and is unique, whereas Haberman's theorem concludes that it does not (using his notation, n_1 = 0 and δ_1 = 0). The only difference is that Haberman has made the point μ = 0 inadmissible and hence cannot allow μ̂ = 0 as an estimate. To cover such cases Appendix B of Haberman [6] introduces the concept of an 'extended' MLE; that is, μ̂ is an extended MLE if μ̂_i = 0 for some i under parameterization (1). Such an extension is unnecessary. As in the above example, we need to extend the maximum likelihood method only if we have previously restricted the parameter space to μ_i > 0. We show now that the MLE for μ exists under a model that includes (1), and disappears only under a logarithmic reparameterization.

3.2. MLEs for μ always exist

The problem is purely an artefact of the parameterization. The logarithmic function is discontinuous at the point μ = 0 and thus not all models for μ can be expressed via this explicit parameterization. Instead we express the μ-space allowable under the model M implicitly, via a set K of constraints on μ:

  M = { μ : ∏_{i∈I} μ_i^(c_ik) = ∏_{i∈I} μ_i^(d_ik),  k ∈ K },        (2)

where 0⁰ = 1. If μ_i > 0 for all i then this could equivalently be written as

  M = { μ : μ = exp(η) and Lη = 0 }

or

  M = { μ : μ = exp(η) and η = Xβ },

where L = [c_ik − d_ik] and LX = 0. Thus a 'constraints' representation of a model is valid for the whole of the μ-space and is equivalent to a log-linear parameterization where this exists. It is easily seen that a unique maximum of the log-likelihood can be found in M as specified by (2). Where Haberman's Appendix B talks of an extended MLE for μ under a log parameterization we simply interpret it as the ordinary MLE for μ under the specification (2). So we interpret his theorems to read: any of the models we consider can be specified via constraints as in (2); a model for which μ_i > 0 for all i can also be specified by a logarithmic parameterization; if the log-likelihood is maximized at a point μ̂ with μ̂_i > 0 for all i, then this μ̂ too can be expressed by a logarithmic parameterization; if the log-likelihood is maximized at a point μ̂ with μ̂_i = 0 for some i, then this μ̂ cannot be expressed by a logarithmic parameterization; in this case such a μ̂ still maximizes the log-likelihood but no corresponding η̂ exists. Thus an MLE for μ always exists (and Haberman's theorem shows that it is unique) though a corresponding η̂ may not. Furthermore Haberman's theorem provides us with the justification for a very simple method of evaluating μ̂ in both cases, whilst still retaining the computational simplicity of the logarithmic parameterization. We explain this below with reference to the NR algorithm and also describe what happens in GLIM-3.
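As a concrete illustration of the constraints specification (2) (our own example, not taken from the paper): for a 2 × 2 table the independence model can be written as the single constraint μ11 μ22 = μ12 μ21, which remains meaningful when some μ_i = 0, whereas no finite η = Xβ can represent such a point. A brief numerical sketch, assuming only NumPy and using illustrative counts:

    import numpy as np

    # Observed 2 x 2 table with a zero column margin (y12 = y22 = 0).
    y = np.array([[5.0, 0.0],
                  [3.0, 0.0]])
    N = y.sum()

    # Usual closed-form independence fit: mu_ij = (row total)(column total)/N.
    mu_hat = np.outer(y.sum(axis=1), y.sum(axis=0)) / N
    print(mu_hat)            # [[5. 0.], [3. 0.]]: finite, with some entries zero

    # The constraint form of independence is satisfied at this boundary point,
    # so mu_hat lies in M as specified by (2)...
    print(np.isclose(mu_hat[0, 0] * mu_hat[1, 1], mu_hat[0, 1] * mu_hat[1, 0]))

    # ...but it has no finite log-linear representation: log(0) is -infinity.
    with np.errstate(divide="ignore"):
        print(np.log(mu_hat))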

3.3. Use of the logarithmic parameterization

Computationally, the logarithmic parameterization is undoubtedly more convenient to work with than the constraints specification, but the difficulty to be overcome is its inability to represent all valid models, due to its discontinuity at zero. Let I₀ be that subset of I with y_i = 0 and let J₀ (not known a priori) be the subset of I₀ for which zero fitted values will be obtained. Suppose J₀ ⊆ I₀ ⊆ I and let R(X) stand for the set of all η such that η = Xβ, −∞ < β_j < ∞. Haberman gives conditions under which the log-likelihood is maximized at a point with μ̂_i > 0 for all i, even if I₀ is non-null. Furthermore, even when these conditions are not satisfied and J₀ is non-null, as in the Brown and Fuchs example, he shows (Appendix B) that an η̂ can always be defined (with some components equal to −∞) with the property that, as η → η̂ within R(X), L(η) approaches its supremum; the η̂_i that are finite, i ∈ I − J₀, can be found by omitting those observations, i ∈ J₀, for which η̂_i → −∞ and applying standard methods to the reduced data set I − J₀.

In practice it is sufficient to allow the iteration to take its natural course, the η̂_i, i ∈ J₀, tending to large negative values, and the corresponding weights and fitted values tending to zero. When underflow occurs in the weights then some β̂'s become aliased but the remainder stabilize at finite values. The μ̂'s corresponding to nonzero weights can be computed as normal from the β̂ values, while those corresponding to zero weights must be set to zero. Note that the presence of a zero weight due to underflow must not be interpreted as the loss of an observation, nor must the subsequent aliasing be interpreted as a reduction in the parameter space, both of which would lead to an adjustment of the degrees of freedom. The degrees of freedom must be calculated before underflow occurs, and the consequences of the underflow must be interpreted as the result of our inability to represent −∞ on a computer, not as a change in the dimensions of the underlying spaces.

In GLIM-3 the convergence criterion is set at a compromise value that minimizes unnecessary iteration, because fitted values are hardly ever required to full machine accuracy. In cases of the kind considered here convergence is usually signalled well before the iterative weight underflows. In these cases full accuracy in the computed estimates etc. is not achieved, though the inaccuracy usually affects only the fifth or sixth significant figure and is certainly of no practical statistical significance. However, on the rare occasions that such underflow does occur in GLIM-3, a fault message about loss of degrees of freedom is triggered. This is inappropriate, and in the next version of GLIM the algorithm will proceed as normal and estimate the finite η̂_i and nonzero μ̂_i on the reduced data set I − J₀, as described previously. (It is also worth noting that the new version will give the user full control over aliasing tolerances, convergence criteria and underflow constants.) The resulting 'estimates' of β will then sometimes be zero (or aliased) instead of infinite, in the presence of random zeros. We claim below that this will not cause difficulties for the statistician interpreting the results from such an analysis since, we maintain, the β's are not in any case a necessary or even suitable subject for scrutiny.
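The behaviour just described is easy to reproduce with any IRLS-based GLM fitter. In the sketch below (ours; it uses Python's statsmodels rather than GLIM, and the counts are illustrative) the table has a zero column margin, so the estimate of the B effect heads towards minus infinity: the reported coefficient is simply a very large negative number, the corresponding fitted values are effectively zero, and the residual degrees of freedom are computed from the model specification, not from the pattern of zeros.

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # The 2 x 2 table of Section 2.3, with illustrative counts and y12 = y22 = 0.
    table = pd.DataFrame({
        "A":     ["1", "1", "2", "2"],
        "B":     ["1", "2", "1", "2"],
        "count": [  5,   0,   3,   0],
    })

    # Independence model; the MLE for mu has mu12 = mu22 = 0, so the iteration
    # pushes the B=2 coefficient to a very negative value before the deviance settles.
    fit = smf.glm("count ~ C(A) + C(B)", data=table,
                  family=sm.families.Poisson()).fit()

    print(fit.params)         # C(B)[T.2] is a large negative number, not an error
    print(fit.fittedvalues)   # fitted values for the B=2 cells are essentially zero
    print(fit.df_resid)       # 4 cells - 3 parameters = 1: no adjustment for the zeros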

3.4. In place of the β-parameters

The estimation problems caused by random zeros in contingency tables and affecting the β-parameters are theoretical and computational. They rarely have any bearing on the interpretation of the results, and should be invisible to the user if the analysis is done with a well-designed computer program. This is because the problems concern the parameter estimates β̂, which exist only for the computational convenience of expressing the model in log-linear form. The estimates are not convenient for interpreting the data or the analysis, partly because they are expressed on this transformed scale, and not on the natural scale of the observations. (Even in the relatively simple case of a table classified by factors with 2 levels, the parameters can only be interpreted as logarithms of odds ratios. These are not practically useful as expressions of the effects fitted in the model.) Additionally, there are many possible types of parameterization for β, especially when data are not balanced, for example when a set of counts contains structural zeros; parameters may then be constrained so that unweighted sums are zero, or so that some form of weighted sum is zero. Each type of parameterization has its advantages and disadvantages, but none provides values that are readily interpretable, and so the form chosen is not practically important.

The results that are relevant for the interpretation of an analysis include the deviances, the residuals and the fitted values of the chosen model. The differences between the residual deviances of alternative models are approximate chi-squared statistics, and show the importance of the terms in the model. An analysis of residuals can identify aberrant observations that lead to ill-fitting models. The fitted values show the effects of the chosen model; they can be combined to provide summaries of individual effects on the natural scale of the observations. (The paper by Lane and Nelder [8] shows how such 'predicted values' may be presented in simple tables; the facility to produce predicted values has been incorporated into Genstat.) The use of such statistics is also illustrated in such readable introductions as Nelder [9] or Plackett [11]. In Section 4 we present a partial analysis of the Brown and Fuchs data that does not involve inspection of the β values. We do not claim that it is not possible to interpret such values (as functions of the data they do, after all, contain information) but we maintain that the information they contain is not readily available to the analyst, and that simpler statistics such as the deviance, the residuals and the predicted values are more direct tools for extracting and presenting the information contained in the data.

3.5. Summary

A log-linear parameterization of a model can only be used if μ_i > 0 for all i, whereas a constraints specification is valid for all μ. MLEs for μ always exist and are unique under a constraints specification, but can only be expressed in log-linear form if μ̂_i > 0 for all i. Algorithms using the log-linear parameterization may still be used if iterative weights are allowed to underflow to zero and the corresponding μ̂_i is set to zero. Corresponding η̂'s and certain β̂'s will then not exist. This must be allowed for by the package writer but, since the η's and β's were only introduced to permit a log-linear expression of the model, and are not of themselves directly interpretable, this presents no difficulties for the analyst. Instead, we emphasise the importance of the deviance, the residuals and predicted values as simple direct methods for understanding the data and the model.

4. An example analysis

We illustrate the discussion by analyzing the data in Brown and Fuchs. Note that the analysis does not involve inspecting the β̂'s. The interrelationship of the factors E, N, M and B is not of primary interest; it is rather their interaction with the factor D that is important. Hence we fit conditional on the ENMB margin, obtaining a binomial distribution for the observations. By conditioning on this margin, 4 of whose entries are zero, we have transformed into structural zeros the 4 cells in the table that correspond to the marginal zeros, and are now left with 12 observations.

The effects DN, DM and DEB were chosen by Brown and Fuchs to be of primary interest. However, there is no information about the three-way interaction DEB in the data, because cells that are needed to estimate the effect are structural zeros. Therefore we fit only the effects of E, B, N and M on D; all these terms are important (Table 1). If, on the other hand, we were also to include the NB term, then the fitted value for the last cell is zero, since the marginal entry N(2)-B(2)-D(1) is zero. However, under the model, this remains a random zero and gives no cause for adjusting the degrees of freedom.

Table 1
Term    Deviance (approx. χ²(1))
E       5.7
N       6.9
M       2.7
B       3.2

Returning to the E + N + M + B model, we can summarize the effects of these factors by presenting tables of predicted values with approximate standard errors. For each factor, the table is formed by taking averages of the fitted values over the other factors in the model, weighting according to the numbers of observations at each combination of the factors. Hence the proportions summarize the effect of a factor, taking account of the distribution of observations, in this sample, over the levels of the other factors. This process can be applied to form summaries of any generalized linear model, with balanced or unbalanced data (see [8]). Table 2 shows how the average proportion of subjects responding to D (0.280) changes at the different levels of each of the other factors.

Table 2
Factor   Level 1          Level 2
E        0.125 (0.046)    0.386 (0.051)
N        0.191 (0.040)    0.500 (0.076)
M        0.241 (0.037)    0.698 (0.127)
B        0.232 (0.038)    0.800 (0.199)

The deviances and tabulated values above are not affected by random zeros, so long as the computer program used deals sensibly with zero estimates. All the results required to summarize the analysis can therefore be presented without any reference to the parameter estimates.
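The averaging that produces a table like Table 2 can be sketched as follows (our own illustration in Python; the paper used the prediction facility of Genstat, and the weighting shown is one reading of the description above). For each level of the chosen factor, the fitted values on the natural scale are averaged over the cells at that level, weighted by the numbers of observations in those cells.

    import numpy as np

    def predicted_values(cells, fitted, factor, weights):
        """Weighted averages of fitted values at each level of `factor`.

        `cells` is a pandas DataFrame of factor combinations, `fitted` the
        fitted values on the natural scale (here proportions) and `weights`
        the numbers of observations at each combination, so the summary
        reflects the distribution of observations over the other factors.
        """
        d = cells.assign(_fit=np.asarray(fitted), _w=np.asarray(weights))
        return {level: np.average(group["_fit"], weights=group["_w"])
                for level, group in d.groupby(factor)}

    # Usage (illustrative): after fitting the binomial model of D on E, N, M and B,
    #   predicted_values(cells, fit.fittedvalues, "E", cells["n"])
    # gives one average proportion per level of E, of the kind shown in Table 2.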

5. Conclusion

The current literature on contingency tables perpetuates two misconceptions that complicate and invalidate their analysis. The correct method for calculating the degrees of freedom of the residual deviance is the usual method of subtracting from the number of observations the dimension of the parameter space under the given model. Formulae for adjustment in the presence of random zeros are unnecessary and seem to derive from confusion over the terms 'expected' and 'estimated'. Such adjustment may invalidate consequent test procedures. MLEs for μ always exist and are unique, though it may not always be possible to express them in a log-linear form, in particular when there are random zeros in the data. A simple adjustment to the usual NR algorithm allows it to be used in all cases, though the parameters become a little more complex. This adjustment will not concern the analyst using the algorithm, since the most suitable statistics for interpreting the data (the deviance, the residuals and predicted values in particular) are unaffected by the choice of parameterization. Many of the most commonly-used statistical packages incorrectly analyze contingency table data by erroneously adjusting degrees of freedom in the presence of zero entries.

Acknowledgements

The authors wish to thank Bob Gilchrist for the original stimulating discussion that led to the writing of this paper. Special thanks are also due to Murray Aitkin, Robin Plackett and Jeff Wood for comments on an earlier draft of the paper.

References

[1] Alvey et al., Genstat Manual (Rothamsted Experimental Station, Harpenden, Herts, 1977).
[2] R.J. Baker and J.A. Nelder, The GLIM System, Release 3 (Numerical Algorithms Group, Oxford, 1978).
[3] Y.M.M. Bishop, S.E. Fienberg and P.W. Holland, Discrete Multivariate Analysis: Theory and Practice, 2nd Ed. (MIT Press, Cambridge, MA, 1976).
[4] M.B. Brown and C. Fuchs, On maximum likelihood estimation in sparse contingency tables, Comput. Statist. Data Anal. 1 (1983) 3-15.
[5] S.E. Fienberg, The Analysis of Cross-Classified Categorical Data (MIT Press, Cambridge, MA, 1977).
[6] S.J. Haberman, The Analysis of Frequency Data (University of Chicago Press, 1974).
[7] S.J. Haberman, Log-linear models and frequency tables with small expected cell counts, Ann. Statist. 5 (1977) 1148-1169.
[8] P.W. Lane and J.A. Nelder, Analysis of covariance and standardization as instances of prediction, Biometrics 38 (1982) 613-621.
[9] J.A. Nelder, Log-linear models for contingency tables: a generalization of classical least squares, Appl. Statist. 23 (1974) 323-329.
[10] J.A. Nelder and R.W.M. Wedderburn, Generalized linear models, J. Roy. Statist. Soc. Ser. A 135 (1972) 370-384.
[11] R.L. Plackett, The Analysis of Categorical Data, 2nd Ed. (Griffin, London, 1981).
