Parameter estimation in linear multinomial models


Computational Statistics & Data Analysis 7 (1988) 179-187
North-Holland

Douglas G. BONETT
Department of Statistics, University of Wyoming, Laramie, WY 82071, USA

J. Arthur WOODWARD
Department of Psychology, University of California, Los Angeles, CA 90024, USA

Received 2 June 1987
Revised 20 March 1988

Abstract: The minimum modified chi-square estimate, the linearized maximum likelihood estimate, and the maximum likelihood estimate of the parameter vector in a linear multinomial model are defined. A simple computational algorithm is based on the penalty function method of imposing constraints.

Keywords: Multinomial, Linear models, Minimum modified chi-square, Maximum likelihood, Penalty function.

1. Introduction

Hypotheses regarding marginal homogeneity, symmetry (see Bishop, Fienberg and Holland [2, Ch. 8]), and equiprobability (see Haberman [7, p. 31]) can be specified in terms of the parameter vector of the linear multinomial model

F = Xβ + E                                                              (1)

where F is a t × 1 multinomial random vector, X is a t × q full rank matrix of known constants, β is a q × 1 vector of unknown constants, and E is a t × 1 unobservable random vector of deviations from expectation. It is assumed that the expected value of E is equal to the null vector and that 1′Xβ = n, where 1 is a one vector of length t and n is the known sample size. Gokhale [3] and Haber [6] give algorithms for obtaining the maximum likelihood estimate of β in (1). In this paper the minimum modified chi-square estimate, the linearized maximum likelihood estimate, and the maximum likelihood estimate are defined and computational algorithms for each estimate are given. The algorithms presented give an estimate of the asymptotic covariance matrix of each estimator as a computational by-product.
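To fix notation, the following minimal numpy sketch (ours, not part of the original paper; the cell counts are invented for illustration) builds the pieces of (1) for a t = 4 cell table under an equiprobability hypothesis and checks the sampling constraint 1′Xβ = n.

```python
import numpy as np

# Observed realization f of the multinomial vector F, with t = 4 cells
# (the counts are invented for illustration).
f = np.array([18.0, 22.0, 31.0, 29.0])
n = f.sum()                            # known sample size, here 100

# Equiprobability: every expected cell frequency equals the same unknown
# constant, so X is a t x 1 column of ones and beta is a single value.
X = np.ones((4, 1))
beta = np.array([n / 4.0])             # value satisfying the sampling constraint

mu = X @ beta                          # expected cell frequencies E(F) = X beta
assert np.isclose(np.ones(4) @ mu, n)  # multinomial constraint 1'X beta = n
```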



2. Parameter estimation

Rao [11] defines several asymptotically equivalent methods of estimating the parameters of a multinomial model. We will consider two of these. In addition, we will define a linearized maximum likelihood estimate of β that, like the maximum likelihood estimate, has second order efficiency.

The minimum modified chi-square estimate of β in (1) is defined as the vector that minimizes the function

H₁(β) = (f − Xβ)′D(f)⁻¹(f − Xβ)                                         (2)

where f is the t × 1 sample realization of the random vector F and D(f) is a diagonal matrix with the elements of the vector f along the principal diagonal. A zero value of f in D(f) is replaced with a value of one (Rao [11]). Grizzle, Starmer and Koch [5] define a minimum modified chi-square estimate for the linear multinomial model AF = Xβ + E, where A is s × t (s < t).

The maximum likelihood estimate of β in (1) is defined as the vector that maximizes the Poisson likelihood function

H,(B)=l’[f.In(xB)-xS-ln(f!)l

(3)

where (·) denotes the Hadamard (elementwise) product and f! denotes elementwise factorials. The maximum likelihood estimate for the linear multinomial model has been given by Gokhale [3] and Haber [6].

We will want the value of β that minimizes (2) or maximizes (3) to respect the multinomial constraint 1′Xβ = n. In addition, we may want to impose other linear constraints on β, to represent marginal homogeneity or symmetry for example. We will represent these linear constraints on β, including the multinomial sampling constraint, by the equation h = Cβ, where h is an r × 1 vector of known constants and C is an r × q matrix of known constants. Optimization of (2) and (3) subject to Cβ = h can be accomplished in many ways; one simple method involves the use of the penalty function

G(β) = d(Cβ − h)′(Cβ − h)                                               (4)

where d is a sufficiently large constant to ensure a minimum of (2) or a maximum of (3) in the feasible region (see Judge et al. [10]). Adding (4) to (2), or subtracting (4) from (3), defines the modified objective functions

H₁*(β) = H₁(β) + G(β),                                                  (5)

H₂*(β) = H₂(β) − G(β),                                                  (6)

that can be optimized using a Newton-Raphson algorithm

βₜ₊₁ = βₜ − Pₜgₜ                                                        (7)

where Pₜ = (∂²H*(β)/∂β∂β′)⁻¹ and gₜ = ∂H*(β)/∂β, with derivatives evaluated at β = βₜ, and H* denotes either H₁* or H₂*.
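To make the objective functions concrete, here is a small numpy sketch of (2)-(6) (our illustration, not the authors' code; scipy.special.gammaln is used for ln(f!)):

```python
import numpy as np
from scipy.special import gammaln      # ln(f!) computed as gammaln(f + 1)

def H1(beta, f, X):
    """Modified chi-square objective (2); zeros in D(f) are replaced by ones."""
    w = np.where(f == 0, 1.0, f)
    r = f - X @ beta
    return r @ (r / w)

def H2(beta, f, X):
    """Poisson log likelihood (3): 1'[f*ln(X beta) - X beta - ln(f!)]."""
    mu = X @ beta
    return np.sum(f * np.log(mu) - mu - gammaln(f + 1.0))

def G(beta, C, h, d=1e6):
    """Penalty term (4) enforcing the linear constraints C beta = h."""
    v = C @ beta - h
    return d * (v @ v)

def H1_star(beta, f, X, C, h, d=1e6):  # modified objective (5)
    return H1(beta, f, X) + G(beta, C, h, d)

def H2_star(beta, f, X, C, h, d=1e6):  # modified objective (6)
    return H2(beta, f, X) - G(beta, C, h, d)
```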


The first derivative vectors are

X′D(f)⁻¹(μₜ − f) + dC′(Cβₜ − h),                                        (8)

X′D(μₜ)⁻¹(f − μₜ) − dC′(Cβₜ − h)                                        (9)

for (5) and (6) respectively, and the second derivative matrices are

X′D(f)⁻¹X + dC′C,                                                       (10)

−(X′D(f)D(μₜ)⁻²X + dC′C)                                                (11)

for (5) and (6) respectively, where μₜ = Xβₜ. Note that (7) gives a closed-form solution for the minimum modified chi-square estimate, which is equal to

β̂₁ = [X′D(f)⁻¹X + dC′C]⁻¹[X′D(f)⁻¹f + dC′h]                             (12)

where D( f)-‘f may be replaced with a one vector to speed computation. Note that since the optimizing equations CWi*/a/3 = 0 are linear, (12) can be derived without the Newton-Raphson method. Using (9) and (11) in (7) gives the Newton-Raphson algorithm

Using (9) and (11) in (7) gives the Newton-Raphson algorithm

βₜ₊₁ = βₜ + [X′D(f)D(μₜ)⁻²X + dC′C]⁻¹[X′D(μₜ)⁻¹(f − μₜ) − dC′(Cβₜ − h)]  (13)

for the maximum likelihood estimate, where (12) is used for t = 1 and ones replace zeros in D(f). Iteration of (13) may be terminated when (βₜ₊₁ − βₜ)′(βₜ₊₁ − βₜ) < k for a prespecified small nonzero value of k; the final value is the maximum likelihood estimate, denoted β̂₃. A single iteration of (13) gives a linearized maximum likelihood estimate (see Harvey [9]) which, as Rothenberg [12] shows, has second order efficiency when the starting vector is BAN. Thus, a closed-form estimate of β having second order efficiency is the linearized maximum likelihood estimate

β̂₂ = β̂₁ + [X′D(f)D(μ̂₁)⁻²X + dC′C]⁻¹[X′D(μ̂₁)⁻¹(f − μ̂₁) − dC′(Cβ̂₁ − h)]   (14)

where μ̂₁ = Xβ̂₁ is the minimum modified chi-square estimate of μ and β̂₁ is given in (12). Similarly, μ̂₂ = Xβ̂₂ and μ̂₃ = Xβ̂₃ are the linearized maximum likelihood and maximum likelihood estimates of the expected cell frequencies. Note that f converges in probability to μ (the vector of expected cell frequencies) and

Σ_β = [X′D(μ)⁻¹X + dC′C]⁻¹                                              (15)

defines the common asymptotic covariance matrix of the maximum likelihood, linearized maximum likelihood, and minimum modified chi-square estimators.
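A sketch of the iteration (13) and the one-step estimate (14) follows (ours; it reuses the min_mod_chisq helper from the sketch after (12) as the BAN starting value):

```python
import numpy as np

def newton_raphson_mle(f, X, C, h, d=1e6, k=1e-4, max_iter=200, one_step=False):
    """ML estimate via (13); with one_step=True, the linearized MLE (14)."""
    w = np.where(f == 0, 1.0, f)              # ones replace zeros in D(f)
    beta = min_mod_chisq(f, X, C, h, d)       # BAN starting value from (12)
    for _ in range(max_iter):
        mu = X @ beta
        # negative Hessian X'D(f)D(mu)^-2 X + dC'C and gradient (9):
        A = X.T @ (X * (w / mu**2)[:, None]) + d * (C.T @ C)
        g = X.T @ ((f - mu) / mu) - d * (C.T @ (C @ beta - h))
        step = np.linalg.solve(A, g)
        beta = beta + step
        if one_step or step @ step < k:       # stopping rule of Section 2
            break
    return beta
```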


Letting B = (X’D(p)-‘X), the asymptotic rearranged to yield (Seber 113, p. 5191). (B + dC’C)-’

= B-’ - B-‘C’(d-‘I+

covariance matrix in (15) can be

CB-‘C’)-‘CB-’

(16)

and as d approaches infinity this reduces to B-1 _ B-‘C’(CB-‘C’)-lCB-’ which is the asymptotic constrained maximum likelihood estimator matrix given by Aitchison and Silvey [l] since B = -E(~H,(j3)/@3i3/3’)

and

C= a(@

covariance

- h)/a/3’.

Thus, when d is sufficiently large, the penalty function method of imposing constraints described here is equivalent to the method of Lagrange described by Aitchison and Silvey [l] and used by Gokhale [3] and Haber [6]. An estimate of the asymptotic covariance matrix of the estimator of /3, denoted as Es is obtained by replacing p with its estimate in (15). An estimate of the asymptotic covariance matrix of the expected cell frequency estimator is defined as xri = XXfiX’

(17)

and the square roots of the diagonal elements of (17) gives the standard errors described by Haberman [7].
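The covariance estimates (15) and (17) and the associated standard errors can be sketched as follows (our illustration; mu_hat is any one of the three fitted cell-frequency vectors):

```python
import numpy as np

def covariances(mu_hat, X, C, d=1e6):
    """Estimated asymptotic covariances (15) and (17), plus standard errors."""
    Sigma_beta = np.linalg.inv(X.T @ (X / mu_hat[:, None]) + d * (C.T @ C))
    Sigma_mu = X @ Sigma_beta @ X.T           # covariance of fitted frequencies
    se = np.sqrt(np.diag(Sigma_mu))           # standard errors (Haberman [7])
    return Sigma_beta, Sigma_mu, se
```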

3. Test statistics

In this section we present the modified Pearson and likelihood ratio test statistics. These test statistics are well known and are presented here only for completeness. The modified Pearson chi-square test statistic is defined as

H₁(β̂) = (f − μ̂)′D(f)⁻¹(f − μ̂)                                           (18)

and the likelihood ratio test statistic is defined as

H₃(β̂) = 2·1′[f · ln(f/μ̂)]                                               (19)

where f/μ̂ denotes elementwise division. The sample value of (18) or (19) is compared with the point on the chi-square distribution with t − q + r − 1 degrees of freedom that is exceeded with a given probability.
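A sketch of (18) and (19) (ours; the critical value is obtained from scipy.stats.chi2, a tooling choice of our own rather than anything prescribed by the paper):

```python
import numpy as np
from scipy.stats import chi2

def test_statistics(f, mu_hat, t, q, r, alpha=0.05):
    """Modified Pearson statistic (18) and likelihood ratio statistic (19)."""
    w = np.where(f == 0, 1.0, f)                  # D(f) with zeros replaced
    pearson = (f - mu_hat) @ ((f - mu_hat) / w)   # (18)
    ratio = np.where(f > 0, f / mu_hat, 1.0)      # convention 0 * ln(0) = 0
    lr = 2.0 * np.sum(f * np.log(ratio))          # (19)
    df = t - q + r - 1
    return pearson, lr, df, chi2.ppf(1.0 - alpha, df)
```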

4. Computational example

Consider the 3 × 3 contingency table of observed frequency counts shown in Table 1. This table gives a cross-classification of the number of lambs born to 227 ewes in two consecutive years (Bishop et al. [2, p. 288]).

Table 1
Cross-classification of 227 ewes according to number of lambs born in 1952 and 1953

                      1953
1952          0        1        2
0            58       52        1
1            26       58        3
2             8       12        9

Source: Bishop, Fienberg and Holland (1975, p. 288).

We will estimate the expected cell frequencies under the linear constraints of marginal homogeneity, symmetry, and equality of the number of lambs born in 1952 and 1953. For these constraints it will be most convenient to let the design matrix X in (1) equal the identity matrix, so that the unknown parameter vector β contains the nine expected cell frequencies, taken row by row. None of these constraints involve the diagonal cells, so we will follow Bishop et al. [2, Ch. 5] and specify the multinomial sampling constraint in terms of the off-diagonal cells; the off-diagonal total is 227 − 58 − 58 − 9 = 102. In each example, we set d = 10⁶ and k = 10⁻⁴.

The multinomial sampling constraint and the marginal homogeneity constraints are defined as Cβ = h, where

    [ 0  1  1  1  0  1  1  1  0 ]        [ 102 ]
C = [ 0  1  1 -1  0  0 -1  0  0 ]    h = [   0 ]
    [ 0 -1  0  1  0  1  0 -1  0 ]        [   0 ]
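Assuming the helper functions sketched in Section 2, the marginal homogeneity fit can be reproduced along the following lines (our code; the row-by-row ordering of the nine cells follows the convention stated above):

```python
import numpy as np

# Observed 3x3 table (Table 1), flattened row by row.
f = np.array([58., 52., 1., 26., 58., 3., 8., 12., 9.])
X = np.eye(9)            # beta holds the nine expected cell frequencies

# Off-diagonal multinomial constraint and the two marginal homogeneity
# constraints, matching the C and h displayed above.
C = np.array([[0,  1,  1,  1, 0,  1,  1,  1, 0],   # off-diagonal total = 102
              [0,  1,  1, -1, 0,  0, -1,  0, 0],   # row 0 margin = column 0 margin
              [0, -1,  0,  1, 0,  1,  0, -1, 0]],  # row 1 margin = column 1 margin
             dtype=float)
h = np.array([102., 0., 0.])

beta1 = min_mod_chisq(f, X, C, h)                      # eq. (12)
beta2 = newton_raphson_mle(f, X, C, h, one_step=True)  # linearized MLE, eq. (14)
beta3 = newton_raphson_mle(f, X, C, h)                 # MLE, eq. (13)
```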

Table 2
Predicted cell frequencies under constraints of marginal homogeneity

                             1953
1952    0                    1                    2
0       58.00 58.00 58.00    44.34 40.36 39.30     1.80  1.76  2.43
        (7.62)               (2.13)                (1.26)
1       41.08 36.54 37.77    58.00 58.00 58.00     6.49 10.79 10.03
        (2.68)               (7.62)                (2.21)
2        5.06  5.58  3.97     3.23  6.97  8.50     9.00  9.00  9.00
        (1.96)               (2.12)                (3.00)

Note: The first, second, and third entries in each cell are the modified chi-square, maximum likelihood, and linearized maximum likelihood estimates, respectively. Estimated standard errors are in parentheses, computed from Σ̂_β in (15).


The predicted cell frequencies subject to the above constraints are presented in Table 2. The likelihood ratio and modified Pearson statistics equal 18.65 and 22.06, respectively, with 2 degrees of freedom.

We will now estimate the expected cell frequencies under the multinomial sampling constraint and the constraints of symmetry. These constraints are defined by letting

    [ 0  1  1  1  0  1  1  1  0 ]        [ 102 ]
C = [ 0  1  0 -1  0  0  0  0  0 ]    h = [   0 ]
    [ 0  0  1  0  0  0 -1  0  0 ]        [   0 ]
    [ 0  0  0  0  0  1  0 -1  0 ]        [   0 ]

Table 3
Predicted cell frequencies under constraints of symmetry

                             1953
1952    0                    1                    2
0       58.00 58.00 58.00    42.87 39.00 39.00     2.20  4.50  4.50
        (7.62)               (2.14)                (1.43)
1       42.87 39.00 39.00    58.00 58.00 58.00     5.94  7.50  7.50
        (2.14)               (7.62)                (1.79)
2        2.20  4.50  4.50     5.94  7.50  7.50     9.00  9.00  9.00
        (1.43)               (1.79)                (3.00)

Note: The first, second, and third entries in each cell are the modified chi-square, maximum likelihood, and linearized maximum likelihood estimates, respectively. Estimated standard errors are in parentheses, computed from Σ̂_β in (15).

The predicted cell frequencies subject to these constraints are presented in Table 3. Note that the linearized maximum likelihood and maximum likelihood estimates are identical. The likelihood ratio and modified Pearson statistics equal 20.81 and 24.13, respectively, with 3 degrees of freedom.

Finally, we estimate the expected cell frequencies under the multinomial constraint and the constraint that the numbers of lambs born in 1952 and 1953 are equal. The constraints are defined by letting

C = [ 0  1  1  1  0  1  1  1  0 ]    h = [ 102 ]
    [ 0 -1 -2  1  0 -1  2  1  0 ]        [   0 ]

The predicted cell frequencies subject to these constraints are presented in Table 4. The likelihood ratio and modified Pearson statistics equal 0.069 and 0.070, respectively, with 1 degree of freedom.


Table 4
Predicted cell frequencies under constraint of equal number of lambs born in 1952 and 1953

                             1953
1952    0                    1                    2
0       58.00 58.00 58.00    50.83 50.83 50.83     0.95  0.96  0.96
        (7.62)               (2.35)                (0.94)
1       26.62 26.61 26.61    58.00 58.00 58.00     2.93  2.93  2.93
        (3.83)               (7.62)                (1.65)
2        8.38  8.39  8.39    12.29 12.28 12.28     9.00  9.00  9.00
        (2.39)               (3.14)                (3.00)

Note: The first, second, and third entries in each cell are the modified chi-square, maximum likelihood, and linearized maximum likelihood estimates, respectively. Estimated standard errors are in parentheses, computed from Σ̂_β in (15).
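For completeness, the reported test statistics can be recomputed with the earlier sketches (our code, shown for the marginal homogeneity fit, where t = q = 9 and r = 3):

```python
# Continuing the marginal homogeneity example (t = 9, q = 9, r = 3):
mu3 = X @ beta3                                  # fitted cell frequencies
pearson, lr, df, crit = test_statistics(f, mu3, t=9, q=9, r=3)
print(df, round(lr, 2), round(pearson, 2))       # 2 degrees of freedom; values
                                                 # reported above: 18.65, 22.06
```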

As noted above, all computations are based on d = 10⁶. Letting d be as small as 10² produces inaccuracies in the values of Tables 2, 3, and 4 in the second decimal place. Letting d = 10³ produces inaccuracies in the third decimal place.

5. Summary

The modified chi-square, linearized maximum likelihood, and maximum likelihood estimates have been defined for the linear multinomial model. The modified chi-square estimates are important because they can be expressed in closed form and therefore are the easiest to compute. Also, the modified chi-square estimates are required in the computation of the closed-form linearized maximum likelihood estimator and provide BAN starting values for the maximum likelihood iterations. The linearized maximum likelihood estimator has second order efficiency like the maximum likelihood estimator but requires less computational effort. For tests of symmetry and equiprobability, the linearized maximum likelihood and maximum likelihood estimates are identical since the maximum likelihood estimate converges on the second iteration. In fact, for these models, the closed-form estimate given in (12), with the identity matrix replacing D(f), gives the maximum likelihood estimate. It should be noted that while the Newton-Raphson iteration for the log-linear multinomial model converges very rapidly, convergence to the maximum likelihood estimates under marginal homogeneity in the linear multinomial model can be slow (using a convergence criterion of k = 10⁻⁴, six iterations were required to obtain the estimates in Table 2). Thus, the closed-form linearized


maximum likelihood estimate presented here is an important alternative to the maximum likelihood estimate from a computational standpoint.

The computational formulas given here are expressed in matrix notation and will be particularly useful in the increasingly popular matrix-based languages such as APL, SAS/IML, SPEAKEASY, and GAUSS. The estimation algorithm given here also provides an estimate of the asymptotic covariance matrix of the parameter estimates as a computational by-product.

The results presented here represent an extension of earlier work on computational methods of parameter estimation in linear multinomial models. The minimum modified chi-square estimate has been given by Grizzle, Starmer and Koch [5]. Apparently, the linearized maximum likelihood estimate has not previously been defined for linear multinomial models. As noted above, this estimate is important since it can be expressed in closed form yet possesses the second order efficiency of the maximum likelihood estimate.

The penalty function method, rather than the commonly used method of Lagrange, is used here. As shown in Section 2, the method of Lagrange and the penalty function method become equivalent in linear models as d → ∞. In practice, we let d be as large as can be accurately represented in the computing device, so that computational results will be virtually identical for both methods.

As noted by Graybill [4] and others, the method of substitution may be simpler to implement than the method of Lagrange for special kinds of constraints, such as constraints that interaction or main effects equal zero. With simple constraints of this type, one simply generates a design matrix that excludes the columns that code the effects constrained to equal zero. For more complex constraints, the method of substitution requires the computation of a generalized inverse of an orthogonal completion of the constraint matrix C (Graybill [4, p. 186]). The alternative method of Lagrange (or the equivalent penalty function method) will be preferred when the constraints cannot easily be substituted into the model. The results presented here allow both substituted constraints, which give a reduced model as in (1), and constraints on the parameter space of (1) imposed by the penalty function method. In contrast, both Haber [6] and Gokhale [3] consider only constraints on the cell probabilities. Thus, in a 2¹⁰ contingency table, for example, a main effects model requires 1024 − 11 = 1013 constraints on the cell probabilities, and a matrix of order 1013 must be inverted at each iteration in the algorithms given by Haber [6] and Gokhale [3]. A computationally more efficient method involves the generation of a 1024 × 11 design matrix in (1) containing only a one vector and 10 columns coding the 10 main effects, so that a matrix of order 11 is inverted at each iteration of (13). For computational purposes, rather than computing (X′D(f)⁻¹X + dC′C)⁻¹ in (12), for example, (16) can be used instead to compute

(X’Nf

Lo-’

xC(X’D(

-(X3(

f)X)-'C'(d-'I+

C(X'D(

f)X)-'C')-'

f)K)_'

which requires the inversion of a

q x q

matrix and an r

x r

matrix. For the

D.G. Bonett, J.A. Woodward

/ Parameter

estimation

187

computation example of Section 4, we let X be the identity matrix so that q = 9 and r is small but the inversion of the 9 X 9 matrix (X’D( f )X)-l = (ID( f )I)-’ = D( f )-’ is trivial.
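A sketch of this shortcut (ours) applies the identity (16) so that only B and an r × r matrix are inverted:

```python
import numpy as np

def penalized_inverse(f, X, C, d=1e6):
    """(X'D(f)^-1 X + dC'C)^-1 computed via the identity (16)."""
    w = np.where(f == 0, 1.0, f)              # zeros in D(f) replaced by ones
    B = X.T @ (X / w[:, None])                # B = X'D(f)^-1 X, q x q
    Binv = np.linalg.inv(B)                   # diagonal, hence trivial, when X = I
    r = C.shape[0]
    mid = np.linalg.inv(np.eye(r) / d + C @ Binv @ C.T)   # only an r x r inverse
    return Binv - Binv @ C.T @ mid @ C @ Binv
```

With X = I, B⁻¹ is just D(f), so the r × r solve carries essentially all of the cost.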

References

[1] J. Aitchison and S.D. Silvey, Maximum likelihood estimation of parameters subject to restraints, Annals of Mathematical Statistics 29 (1958) 813-828.
[2] Y.M.M. Bishop, S.E. Fienberg and P.W. Holland, Discrete Multivariate Analysis: Theory and Practice (MIT Press, Cambridge, 1975).
[3] D.V. Gokhale, Iterative maximum likelihood estimation for discrete distributions, Sankhya B 35 (1973) 293-298.
[4] F.A. Graybill, Theory and Application of the Linear Model (Duxbury Press, Boston, 1976).
[5] J.E. Grizzle, C.F. Starmer and G.G. Koch, Analysis of categorical data by linear models, Biometrics 25 (1969) 489-504.
[6] M. Haber, Maximum likelihood methods for linear and log-linear models in categorical data, Computational Statistics & Data Analysis 3 (1985) 1-10.
[7] S.J. Haberman, The Analysis of Frequency Data (University of Chicago Press, Chicago, 1974).
[8] S.J. Haberman, Analysis of Qualitative Data, Vol. 1 (Academic Press, New York, 1978).
[9] A.C. Harvey, The Econometric Analysis of Time Series (Wiley, New York, 1981).
[10] G.G. Judge, W.E. Griffiths, R.C. Hill and T.-C. Lee, The Theory and Practice of Econometrics (Wiley, New York, 1980).
[11] C.R. Rao, Criteria of estimation in large samples, Sankhya A 25 (1963) 189-206.
[12] T.J. Rothenberg, Approximating the distributions of econometric estimators and test statistics, in: Z. Griliches and M.D. Intriligator (Eds.), Handbook of Econometrics, Vol. II (North-Holland, Amsterdam, 1984).
[13] G.A.F. Seber, Multivariate Observations (Wiley, New York, 1984).
