Estimating King's ecological inference normal model via the EM Algorithm¹

Rogério Silva de Mattos
Universidade Federal de Juiz de Fora
Faculdade de Economia e Administração
Juiz de Fora, Brazil
[email protected]

Álvaro Veiga
Universidade Católica do Rio de Janeiro
Departamento de Engenharia Elétrica
Rio de Janeiro, Brazil
[email protected]−rio.br

Recently, King (1997) introduced a new model for ecological inference (EI), based on a truncated bivariate normal distribution, which he estimates by maximum likelihood and uses to simulate the predictive densities of the disaggregate data. This paper reviews King's model and its assumption of truncated normality, with the aim of implementing maximum likelihood estimation of his model, and disaggregate data prediction, in an alternative fashion via the EM Algorithm. In addition, we highlight and discuss important modeling issues related to the possibility of non-existence of maximum likelihood estimates, and to the degree to which corrections for this non-existence by means of suitably chosen priors are effective. At the end, a Monte Carlo simulation study is run in order to compare the two approaches.

Keywords: ecological inference, disaggregate data, exponential families, truncated normal, EM Algorithm.

1. Introduction

Ecological Inference (EI) research is concerned with developing techniques for the recovery of information on individual behavior from aggregate data. As such, EI techniques are suitable for a number of problems of disaggregate data estimation and prediction that arise in a variety of areas of the social sciences. The typical problem for which EI techniques were developed is the estimation/prediction of the cell contents of contingency (or values) tables when only the marginal totals of the tables, say, the column and row sums, are known. In this paper, we present some findings of a study² of ours on recent contributions to EI that emerged from political methodology. We will be particularly interested here in the basic parametric model proposed by King (1997).

¹ Prepared for presentation at the 2000 Midwest Political Science Association Meeting (April 27-30). The paper is based on ongoing research developed as part of the Ph.D. thesis work of the first author (http://www.ufjf.br/~rmattos), who acknowledges support from the PICDT-UFJF/CAPES Program and from the Faculdade de Economia e Administração of the Universidade Federal de Juiz de Fora.

² Our basic motivation for undertaking this study is a set of important challenges, in large part associated with the scarcity of disaggregate data, that some developing countries face nowadays in implementing decentralized and more democratic systems of public policy management. In the particular case of Brazil, after the decline of the system of highly centralized decision making that marked the military dictatorship period, a "municipalization" process was launched, by means of which responsibilities for policy decisions regarding social development, formerly under control of the federal government, have been progressively transferred to state and local authorities. A major challenge to efficient implementation of the new decentralized system is the absence of sub-regional and local level data to guide public policy management and planning at these levels.


Though innovative in various respects, King's model displays, as its major distinguishing feature relative to other EI models, the use of a bivariate normal distribution truncated over a rectangle in R² to represent the disaggregate data generating process. For this reason, we will call it here the EI normal model (EINM).³ Even though it is a quite recent contribution, King's method is now well known among political scientists and other social scientists who work with ecological data and models. Its launching had a sound impact on research by motivating a number of new EI studies, theoretical and applied (Cho, 1997; Rivers and Cho, 1997; Lewis, 1998; Penubarti and Schuessler, 1998; and King, Rosen and Tanner, 1999). However, it also provoked a good amount of controversy and debate (Cho, 1998; Freedman et al., 1998; and King, 1999). Among the various arguments set forth in the discussion, there is a complaint against King's method regarding a lack of practical guidance, insofar as the diagnostics and the methodological checklist proposed by this author are not taken to be effective. Setting aside possible misunderstandings of King's method, our view is that improvements in diagnostics and practical guidance for using King's method, or any new EI method, will develop from a deeper understanding of the intrinsic characteristics of the underlying statistical model. The still preliminary results we present here (ours is an ongoing research effort) run in this direction. We studied a different approach to estimating King's EINM based on the Expectation-Maximization Algorithm (EMA). The EMA is an alternative optimization technique for maximizing likelihood/posterior functions in incomplete data estimation problems. We are not proposing a different EI model, just an alternative way to implement the same maximum likelihood/posterior estimation of the EINM parameters as King (1997) does, though with a slightly different approach to predicting the disaggregate data. An important consequence of our efforts was that we had to work with the likelihood function of the unobservable disaggregate data (the complete data, in the EMA formalism), which is an essential ingredient for the EMA to undertake the maximization of the aggregate (incomplete) data likelihood (or posterior, if a prior is combined). For the new (complete data) likelihood function, we can state its statistical properties in precise terms (such as the conditions for existence and non-existence of a stationary point and for a unique maximum), since, as will be shown here, it belongs to a regular exponential family of probability densities. We (and King) were not so lucky with regard to the likelihood function of the aggregate (incomplete) data, which does not belong to an exponential family.⁴

³ In addition, we shall refer to King's EI method as the set formed by the EINM, its parameter estimation and disaggregate data prediction approaches, the diagnostic procedures, and the methodological checklist, which are all explained in King (1997).

⁴ King (1997, p. 311) says: "There is usually little uncertainty about convergence, which in my experience occurs almost every time. The main exceptions I find are artificially generated data sets that the model does not remotely fit."


The rest of this paper is organized as follows: in Section 2, we state the EI problem and introduce some notation; in Section 3, the EINM is briefly described for reference purposes; in Section 4, the EM Algorithm and its advantages for regular exponential family cases are reviewed; in Section 5, the representation of the EINM under the EM Algorithm formalism is introduced; in Section 6, properties of the TBNR (the bivariate normal truncated over a rectangle) and the possibility of non-existence of a solution to the complete data likelihood equations are discussed; in Section 7, corrections for non-existence are considered; in Section 8, a comparison of the methods by Monte Carlo simulation is presented; and finally, in Section 9, concluding comments are made.

2 The EI problem

In technical terms, the EI problem represents a situation where the analyst/planner is interested in the cell data of one or more contingency tables (or values tables) but knows only the row and column totals of the table(s). These totals are called the aggregate (or ecological) data. The analyst's goal is to determine the contents of the table cells. Table 1 depicts this situation for the simplest case, the 2×2 tables problem.

Table 1: Representation of the EI problem for the 2×2 tables case

                         Variable II
                    1            2            Totals
Variable I    1     B_i          1 − B_i      X_i
              2     W_i          1 − W_i      1 − X_i
Totals              T_i          1 − T_i      1

where: B_i = proportion of the 1st category of variable II within the 1st category of variable I; W_i = proportion of the 1st category of variable II within the 2nd category of variable I; X_i = proportion of the 1st category of variable I in the total of its two categories; T_i = proportion of the 1st category of variable II in the total of its two categories. The variables in Table 1 are defined as proportions, instead of absolute values, because this allows a direct interpretation of results. For instance, in voting behavior studies, variable I might be race (e.g., blacks and whites) and variable II the partisan candidate (e.g., Republican or Democrat); thus, B_i would represent the proportion of blacks, and W_i the proportion of whites, voting for the Republican candidate. In turn, X_i would be the proportion of blacks, and T_i the proportion of people voting for the Republican candidate, both relative to the total turnout of the voting age population. The notation in Table 1 is general and applicable to a variety of contexts: in economics, variable I might be levels of family income, and variable II the number of goods


purchased; in sociology, variable I might be the number of crimes by city region, and variable II the number of crimes by type; in transport planning, variable I might be the number of residents by residential zone, and variable II the number of jobs by trade area. Thus, EI techniques are suited to a wide range of problems where the lack of disaggregate information is a major drawback. EI research is also concerned with developing techniques for R×C tables problems, though the implementation of such techniques is still limited in this respect (King, 1999). In Table 1, T_i and X_i represent the known aggregate data. The subscript i indexes the tables, or sample units, ranging from 1 to P, the number of tables used in the EI analysis. In turn, B_i and W_i represent the unknown disaggregate data and the target of EI problem solving; for this reason, B_i and W_i are called the quantities of interest. Once they become known, the contents of all cells in all tables also become known. Figure 1 illustrates this distinct feature of EI techniques, as compared to others that allow for just one table at a time.

[Figure 1 displays a sequence of 2×2 tables like Table 1, one for each sample unit i = 1, ..., P: table i has cells B_i, 1 − B_i, W_i, 1 − W_i and margins X_i, 1 − X_i, T_i, 1 − T_i, 1.]

Figure 1: The use of various tables. Adapted from Mattos and Veiga (1999).

The EI problem is solved in such a way that the cell contents in all tables are determined simultaneously. The proportions appearing outside the 2×2 tables of Figure 1, say, the pairs {(T_i, X_i): i = 1, ..., P}, are the known aggregate data used to estimate the quantities of interest in each table. Using various tables allows for more observations (each table is a sample unit) and for "borrowing strength" from the information in other tables when the cell values of a particular table are estimated. This latter aspect may bring efficiency to estimation if it happens that all tables have "something in common" (King, 1997). In practice, such a commonality does not always exist, at least among all P tables, and King's model admits extensions that allow for different mean patterns of the quantities of interest in different tables.
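To fix ideas, the short sketch below (our own illustration with hypothetical numbers, not part of King's method) builds P such tables: it draws disaggregate proportions B_i and W_i, aggregates them into T_i through the margins of Table 1, and keeps only the pairs (T_i, X_i) that an EI analyst would actually observe.

```python
import numpy as np

rng = np.random.default_rng(42)
P = 100                              # number of tables (sample units)

# Unobserved disaggregate proportions (hypothetical values).
B = rng.uniform(size=P)              # e.g., share of blacks voting Republican
W = rng.uniform(size=P)              # e.g., share of whites voting Republican

# Known marginals: X_i is given; T_i follows from Table 1's margins.
X = rng.uniform(0.05, 0.95, size=P)
T = B * X + W * (1.0 - X)            # aggregation implied by Table 1

# The EI analyst observes only (T_i, X_i); B and W must be inferred.
print(np.column_stack([T, X])[:3])
```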


3 King's EI normal model

In order to predict the quantities of interest, King (1997) proposed a new method that makes extensive use of the information available in the EI problem. Indeed, King's EINM is a statistical model that combines deterministic features of the EI problem with mathematical and probabilistic assumptions. For purposes of further reference, we describe it briefly in this section.

3.1 Model features

The deterministic features of the EINM consist of the following facts:

a) Accounting identity: $T_i = B_i X_i + W_i (1 - X_i)$ is an (everywhere) true relationship between the two aggregate variables $T_i$ and $X_i$, mediated by the disaggregate variables $B_i$ and $W_i$, that is valid for any table like Table 1;

b) Cell bounds: given the aggregate data, the quantities of interest may belong to intervals narrower than $[0,1]$, in a way that is easy to compute; this matters because it may promote a substantial reduction in the uncertainty regarding the prediction of the cell values. Using the notations L and U to denote the lower and upper bounds, respectively, for a quantity of interest, we have that $B_i \in [L_i^b, U_i^b]$ and $W_i \in [L_i^w, U_i^w]$, where the bounds of these intervals are computed as:⁵

$$L_i^b = \max\left\{0,\; \frac{T_i - (1 - X_i)}{X_i}\right\} \ge 0, \qquad U_i^b = \min\left\{\frac{T_i}{X_i},\; 1\right\} \le 1$$

$$L_i^w = \max\left\{0,\; \frac{T_i - X_i}{1 - X_i}\right\} \ge 0, \qquad U_i^w = \min\left\{\frac{T_i}{1 - X_i},\; 1\right\} \le 1 \qquad (3)$$

For more details and examples, see Duncan and Davis (1953), who introduced the method of bounds into the EI literature, Achen and Shively (1995, pp. 190-193), King (1997), and Mattos and Veiga (1999). A computational sketch is given at the end of this subsection.

In turn, the probabilistic features of the EINM consist of the following assumptions:

1. Non-stochastic regressors: the $X_i$, $i = 1, \ldots, P$, are non-random, deterministic variables;

2. Truncated normality: $(B_i, W_i)^T$, $i = 1, \ldots, P$, follows a bivariate normal distribution truncated on the unit square $[0,1] \times [0,1] \subset R^2$:

$$f_{bw}(b_i, w_i \mid \breve{\psi}) = \frac{N_{bw}(b_i, w_i \mid \breve{\psi})}{R(\breve{\psi})}, \qquad i = 1, \ldots, P \qquad (4)$$

where $\breve{\psi} = [\breve{\mu}_b, \breve{\mu}_w, \breve{\sigma}_b^2, \breve{\sigma}_w^2, \breve{\rho}]^T$ and

$$R(\breve{\psi}) = \int_0^1 \!\!\int_0^1 N_{bw}(b_i, w_i \mid \breve{\psi})\, db_i\, dw_i \qquad (5)$$

is a normalizing factor that assures $f_{bw}$ integrates to one (the breve symbol over the parameters indicates that they belong to the untruncated distribution, which was truncated to produce (4));

3. Constant means: the means $(\mu_b, \mu_w)^T$, $i = 1, \ldots, P$, do not depend functionally on the regressors $X_i$ and $1 - X_i$ (the notations $\mu_b$ and $\mu_w$, without the breve, indicate means of the truncated variables);

4. Spatial independence: the conditional random variables $T_i \mid X_i$ are independent across different tables or sample units.

⁵ The formula for the accounting identity in a) and the bounds formulas in (3) can be generalized to R×C tables (King, 1997, Chapter 15).
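Both ingredients above are mechanical to compute. The sketch below (our own illustration, not King's code; parameter values and function names are hypothetical) implements the bounds (3) in plain Python and evaluates the normalizing factor R(ψ̆) of (5) by numerical integration of the untruncated bivariate normal density over the unit square.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.integrate import dblquad

def ei_bounds(t, x):
    """Duncan-Davis bounds (3) for B_i and W_i, given aggregates (t, x)."""
    lb = max(0.0, (t - (1.0 - x)) / x)         # L_i^b
    ub = min(t / x, 1.0)                       # U_i^b
    lw = max(0.0, (t - x) / (1.0 - x))         # L_i^w
    uw = min(t / (1.0 - x), 1.0)               # U_i^w
    return (lb, ub), (lw, uw)

def tbn_normalizer(mu_b, mu_w, sig_b, sig_w, rho):
    """R(psi) in (5): mass of the untruncated N2 on the unit square."""
    cov = [[sig_b**2, rho * sig_b * sig_w],
           [rho * sig_b * sig_w, sig_w**2]]
    pdf = multivariate_normal(mean=[mu_b, mu_w], cov=cov).pdf
    # Integrate w (inner) and b (outer) over [0, 1] x [0, 1].
    R, _ = dblquad(lambda w, b: pdf([b, w]), 0.0, 1.0,
                   lambda b: 0.0, lambda b: 1.0)
    return R

print(ei_bounds(t=0.4, x=0.3))                 # bounds for one table
print(tbn_normalizer(0.5, 0.5, 0.3, 0.3, 0.2)) # close to 1 when mass is inside
```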

3.2 Estimation and prediction

From the facts and assumptions above, King (1997) developed an implementation of the EINM that works in two stages: first, the model parameters are estimated; second, the estimated model is used to generate point and interval predictions of the disaggregate data variables.

1. Parameter estimation: the likelihood function based on the observed aggregate data $t = [t_1, \ldots, t_P]^T$ is given by:

$$L(\breve{\psi} \mid t) = \prod_{i=1}^{P} \frac{S(\breve{\psi}, t_i, x_i)}{R(\breve{\psi})}\, N_{t_i}\!\left(\mu_i(\breve{\psi}, x_i),\, \sigma_i^2(\breve{\psi}, x_i)\right), \qquad t_i \in [0,1] \qquad (6)$$

where $N_{t_i}$ is the untruncated normal density of the aggregate data variable $T_i$; $\mu_i$ and $\sigma_i^2$ are both functions of $\breve{\psi}$ and $x_i$; $S(\breve{\psi}, t_i, x_i)$ is the normalizing factor of $f_{b|t}(b_i \mid t_i, \breve{\psi})$ (which is a doubly truncated normal density); and $R(\breve{\psi})$ is the TBN factor defined in (5). However, a reparameterization $\phi = c(\breve{\psi})$ is adopted, where $\phi = [\phi_1, \phi_2, \phi_3, \phi_4, \phi_5]^T$ is such that:

$$\phi_1 = \frac{\breve{\mu}_b - 0.5}{\breve{\sigma}_b^2 + 0.25}, \qquad -\infty < \phi_1 < \infty \qquad (7)$$

$$\phi_2 = \frac{\breve{\mu}_w - 0.5}{\breve{\sigma}_w^2 + 0.25}, \qquad -\infty < \phi_2 < \infty \qquad (8)$$

$$\phi_3 = \ln \breve{\sigma}_b, \qquad -\infty < \phi_3 < \infty \qquad (9)$$

$$\phi_4 = \ln \breve{\sigma}_w, \qquad -\infty < \phi_4 < \infty \qquad (10)$$

$$\phi_5 = 0.5 \ln\!\left(\frac{1 + \breve{\rho}}{1 - \breve{\rho}}\right), \qquad -\infty < \phi_5 < \infty \qquad (11)$$

In addition, the likelihood is combined with a Bayesian prior for $\phi$, which leads to the posterior function:

$$p(\phi \mid t) \propto p(\phi)\, L\!\left(c^{-1}(\phi) \mid t\right) \qquad (12)$$


where $p(\phi)$ is a prior density for $\phi$. Parameter estimates $\hat{\phi}$ are then obtained by maximizing (12) with respect to $\phi$. This maximization is carried out through an iterative search algorithm, because the posterior (12) (and the likelihood) is nonlinear in $\phi$. The produced estimates are:

$$\hat{\phi} = \arg\max_{\phi}\, p(\phi \mid t) \qquad (13)$$
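The transformations (7)-(11) map the truncated-normal parameters to an unconstrained space, which is convenient for the iterative search in (13). A direct transcription (our sketch; function and variable names are ours):

```python
import numpy as np

def c_psi(mu_b, mu_w, sig_b, sig_w, rho):
    """Reparameterization phi = c(psi) of eqs. (7)-(11); each phi_j is unbounded."""
    phi1 = (mu_b - 0.5) / (sig_b**2 + 0.25)
    phi2 = (mu_w - 0.5) / (sig_w**2 + 0.25)
    phi3 = np.log(sig_b)
    phi4 = np.log(sig_w)
    phi5 = 0.5 * np.log((1.0 + rho) / (1.0 - rho))   # Fisher's z-transform of rho
    return np.array([phi1, phi2, phi3, phi4, phi5])

def c_inv(phi):
    """Inverse map: sigmas and rho recover directly, then the means follow."""
    sig_b, sig_w = np.exp(phi[2]), np.exp(phi[3])
    rho = np.tanh(phi[4])                            # inverse of Fisher's z
    mu_b = phi[0] * (sig_b**2 + 0.25) + 0.5
    mu_w = phi[1] * (sig_w**2 + 0.25) + 0.5
    return mu_b, mu_w, sig_b, sig_w, rho

phi = c_psi(0.5, 0.4, 0.3, 0.25, 0.2)
print(phi, c_inv(phi))                               # round trip recovers psi
```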

2. Disaggregate data prediction: given the estimated parameter vectors $\hat{\phi}$ and $\hat{\breve{\psi}} = c^{-1}(\hat{\phi})$, the conditional (on $t$ and $\hat{\phi}$) distributions $p(b_i \mid t, \hat{\phi})$ and a normal approximation to the posterior function $p(\hat{\phi} \mid t)$ are used in composition to simulate the conditional (on $t_i$ only) predictive densities $p_b(b_i \mid t_i)$ and $p_w(w_i \mid t_i)$ of the disaggregate data variables; then, point and interval predictions for $B_i$ and $W_i$ are produced by computing the means and the standard deviations of simulated values from those predictive densities, as follows:

$$\hat{B}_i = \frac{1}{K} \sum_{k=1}^{K} \tilde{B}_i^{(k)}, \qquad S_{\hat{B}_i}^2 = \frac{1}{K} \sum_{k=1}^{K} \left(\tilde{B}_i^{(k)} - \hat{B}_i\right)^2 \qquad (14)$$

$$\hat{W}_i = \frac{1}{K} \sum_{k=1}^{K} \tilde{W}_i^{(k)}, \qquad S_{\hat{W}_i}^2 = \frac{1}{K} \sum_{k=1}^{K} \left(\tilde{W}_i^{(k)} - \hat{W}_i\right)^2 \qquad (15)$$

where $\tilde{B}_i^{(k)}$ and $\tilde{W}_i^{(k)}$ are the simulated values of $B_i$ and $W_i$ (the computation is sketched below).
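In code, (14) and (15) are just sample means and standard deviations over the K simulated draws. A minimal sketch of ours (the uniform draws below are placeholders standing in for draws from the actual predictive densities):

```python
import numpy as np

def summarize_draws(draws):
    """Point prediction and spread, eqs. (14)-(15): mean and std over K draws."""
    point = draws.mean(axis=0)                 # \hat{B}_i (or \hat{W}_i)
    spread = draws.std(axis=0)                 # S_{\hat{B}_i}
    return point, spread

rng = np.random.default_rng(0)
K, P = 1000, 50
B_tilde = rng.uniform(size=(K, P))             # stand-in for simulated B_i^(k)
B_hat, S_B = summarize_draws(B_tilde)
lo, hi = B_hat - 1.96 * S_B, B_hat + 1.96 * S_B  # a rough interval prediction
```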

King (1997) also developed an extended version of the EINM that allows for the effects of explanatory variables. Since we discuss here an implementation only of the basic version of the EINM (with no explanatory variables), we refer the reader to King (1997, Chapter 9) for more details on the extended version.

4 The EM Algorithm

The Expectation-Maximization Algorithm (EMA) is an optimization device for the maximization of likelihood/posterior functions in incomplete data problems. In principle, it is no more than an alternative to other optimization algorithms (like Newton-Raphson and quasi-Newton based algorithms), one specifically designed for finding modes of incomplete data likelihoods/posteriors. The principles of the EMA had been applied in statistical analyses for a long time, but it was after the seminal paper of Dempster, Laird and Rubin (1977), DLR for short, that the EMA saw widespread use. DLR introduced a structured formalism for applying the methodology, proved its main useful properties, and gave the algorithm the name by which it is well known today. Since DLR's paper, the EMA theory has evolved in important respects. For instance, its formalism has been exploited as a data augmentation technique (Tanner,


1996) and applied beyond specific incomplete/missing data environments.⁶ Also, a variety of extensions, like the Monte Carlo EM and the Expectation Conditional Maximization (ECM) algorithms, were developed, together with other practical improvements for speeding up the convergence of the EMA and for computing the variance-covariance matrix of the estimated parameters. Comprehensive reviews of such developments can be found in the book by McLachlan and Krishnan (1997) and in Meng and van Dyk (1997), the latter a paper celebrating the 20th anniversary of DLR's paper. The EMA can be useful for tackling the EI problem, since the latter can be suitably described as an incomplete data problem. In particular, the EMA is of value for the estimation of EI models because, in addition to providing estimates of model parameters, it also produces predictions of the quantities of interest. Moreover, it induces the exploitation of the modeling assumptions regarding the disaggregate data generating process, which can increase comprehension of the intrinsic features of an EI model.

4.1 EMA concepts and functioning

EMA concepts are easy to understand. In briefly explaining them here, we take an incomplete/missing data perspective (and not the general data augmentation point of view). Key concepts of the EM Algorithm formalism are those of complete and incomplete data. Sometimes we are interested in a random variable for which observations are unavailable or taking measurements is impossible. This represents a variety of situations: for instance, a time series with missing data for some periods, a survey research database with unanswered entries, or a set of variables for which we know only their sum. In a word, the complete data can be viewed as the sample information we would need in order to estimate the parameters of a model; actually, however, we cannot observe it. In turn, the incomplete data is an associated sample that we can observe, but whose information content for parameter estimation is lower than that of the complete data. Implicit in the relation between the two forms of the data is a many-to-one mapping linking them, in the sense that usually there are many, even infinitely many, (unobservable) complete data samples associated with each single (observable) sample of incomplete data. In the EI problem, for instance, we observe a single sample of aggregate data (row and column totals for a number of tables); but we know there may be many, indeed infinitely many, samples of unobserved disaggregate data (cell contents) consistent with our (unique) sample of observed aggregate data. In a more formal fashion, suppose we are interested in a phenomenon described by a continuous (it might also be discrete) random variable X that follows a known probabilistic model $f_X(x^* \mid \theta)$, with $x^*$ an observation from X and $\theta$ a vector of

⁶ The data augmentation view of the EMA is a broader perspective on this technique. As stressed by Martin Tanner: "In the EM [algorithm], the data analyst augments the observed data with latent data to simplify the computations in the analysis. These latent data may be 'missing' data (...), parameter values (...) and values of sufficient statistics." (Tanner, 1996, p. 2). We believe that the particular incomplete/missing data perspective is the most sound way to look at the EI problem, since what we are searching for, ultimately, is the unobserved (thus, not available) disaggregate data.


unknown parameters. Suppose, in addition, that we cannot observe X; however, we can write its likelihood function $f(x \mid \theta)$, where $x = [x_1^*, \ldots, x_P^*]^T$ represents a non-observable random sample from X. The vector $x$ is called the complete data, and $f(x \mid \theta)$ the likelihood of the complete data (LCD). Assume also that we can observe the sample vector $y = [y_1, \ldots, y_Q]^T$, which is related to $x$ in a deterministic way such that $y = h(x)$ and $Q < P$. The vector $y$ is called the incomplete data. Now, let $\Xi$ be the sample space of $x$, and $\Upsilon$ the sample space of $y$. Since $Q < P$, it is easy to have many $x \in \Xi$ associated with each $y \in \Upsilon$, so that $h: \Xi \to \Upsilon$ is a many-to-one mapping. That is, a given single point $y \in \Upsilon$ has associated with it a subset of $\Xi$, namely $\Xi(y)$, the inverse image of $y$ under the mapping $h$, containing many points $x \in \Xi$. This situation is illustrated in Figure 3.

Figure 3: Many-to-one mapping [the original figure, not reproduced here, depicts the mapping h from Ξ onto Υ]

From the above, we are allowed to write the likelihood of the incomplete data (LID) as:

$$g(y \mid \theta) = \int_{\Xi(y)} f(x \mid \theta)\, dx \qquad (16)$$
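Equation (16) says that the incomplete-data likelihood is obtained by integrating the complete-data likelihood over the inverse image Ξ(y). For a toy case of ours (not part of the EINM) where $x = (x_1, x_2)$ are iid $N(\theta, 1)$ and we observe only $y = x_1 + x_2$, the set Ξ(y) is the line $x_1 + x_2 = y$, and the integral reduces to the known marginal $y \sim N(2\theta, 2)$. The sketch below checks this numerically:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

theta, y = 1.3, 3.0

# g(y | theta): integrate f(x | theta) over Xi(y), parameterized by x1.
g_numeric, _ = quad(lambda x1: norm.pdf(x1, theta) * norm.pdf(y - x1, theta),
                    -np.inf, np.inf)
g_closed = norm.pdf(y, loc=2 * theta, scale=np.sqrt(2.0))
print(g_numeric, g_closed)     # the two values agree
```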

What the EM Algorithm does is to find estimates of $\theta$ by using the incomplete data $y$ and the form of the LCD to indirectly maximize the LID. This is not undertaken at once, but in a sequence of iterations, where, in each iteration, two steps are performed: the Expectation Step (E-Step) and the Maximization Step (M-Step). The general EM scheme is as follows:

S1. Assume a guess for $\theta$, say $\theta_k$;

S2. E-Step: use $\theta_k$ and the observed incomplete data $y$ to compute:

$$Q(\theta, \theta_k) = E[\log f(x \mid \theta) \mid \theta_k, y] = \int_{\Xi(y)} \log f(x \mid \theta) \times f(x \mid \theta_k, y)\, dx \qquad (17)$$

S3. M-Step: maximize $Q(\theta, \theta_k)$ with respect to $\theta$, finding:

$$\hat{\theta} = \arg\max_{\theta}\, Q(\theta, \theta_k) \qquad (18)$$

S4. Set $\theta_{k+1} = \hat{\theta}$;

S5. Repeat steps S1-S4 a number of times until convergence, say, until:


$$\log g(y \mid \theta_{k+1}) - \log g(y \mid \theta_k) \approx 0 \qquad (19)$$

and/or:

$$\theta_{k+1} - \theta_k \approx 0 \qquad (20)$$

The EM Algorithm scheme S1-S5 presents good convergence properties. For instance, provided that $Q(\theta_{k+1}, \theta_k) \ge Q(\theta_k, \theta_k)$, as is guaranteed by the M-Step, the LID never decreases from one iteration to the next; and if, in addition, the LID is bounded above, the EMA will always converge to a stationary point. We can also use, in place of the LCD and the LID defined above, the posterior of $\theta$ based on the complete data (PCD), written as $p(\theta \mid x) \propto p(\theta) f(x \mid \theta)$, and the posterior of $\theta$ based on the incomplete data (PID), written as $p(\theta \mid y) \propto p(\theta) g(y \mid \theta)$. For a detailed treatment of the EM Algorithm in Bayesian analysis, see Gelman et al. (1995, Chapters 7 and 9).
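As a concrete instance of the S1-S5 scheme, the toy below (ours; it is not the EINM) runs EM for the mean of a $N(\theta, 1)$ sample in which some entries are missing: the E-step replaces each missing value by its conditional expectation $\theta_k$, the M-step averages the completed data, and iteration stops by rule (20).

```python
import numpy as np

rng = np.random.default_rng(7)
x_full = rng.normal(2.0, 1.0, size=200)    # complete data (partly unobservable)
observed = x_full[:150]                    # incomplete data y: 150 entries seen
n_missing = 50

theta = 0.0                                # S1: initial guess theta_0
for _ in range(100):
    # S2 (E-step): E[x_miss | y, theta_k] = theta_k under the N(theta, 1) model
    filled_sum = observed.sum() + n_missing * theta
    # S3 (M-step): maximize Q by averaging the completed data
    theta_new = filled_sum / (len(observed) + n_missing)
    if abs(theta_new - theta) < 1e-10:     # S5: convergence criterion (20)
        theta = theta_new
        break
    theta = theta_new                      # S4: theta_{k+1} = theta_hat
print(theta)                               # converges to the mean of the observed data
```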

4.2 EMA for exponential families

The most general way to implement the EMA, say, for any probability model, involves the direct computation of the Q-function using the definition in (17). If the Q-function is easy to compute in the E-Step and there exists a closed form solution for the maximization undertaken in the M-Step, then the EMA can be attractive (DLR, 1977). Otherwise, the opposite may be true. Particular instances where the EMA tends to be simpler and more attractive occur when the LCD comes from a regular exponential family of probability densities. In such a case, we can write the LCD as:

$$f(x \mid \theta) = b(x)\, \frac{\exp\!\left(\theta^T z(x)\right)}{a(\theta)} \qquad (21)$$

where $x \in \Xi$; $\theta \in \Theta \subseteq R^d$; $b(x) \ge 0$ is some real valued function of $x$; and $z(x)$ is a minimal vector of jointly sufficient statistics for the d-dimensional canonical parameter vector $\theta$, each element of $z(x)$ being a real valued function of $x$. Consider also that the parameter space $\Theta \subseteq R^d$ is a convex open set (Barndorff-Nielsen, 1978). By taking the conditional expected log on both sides of (21), we find:

$$Q(\theta, \theta_k) = E[\log b(X) \mid y, \theta_k] + \theta^T E[z(X) \mid y, \theta_k] - \log a(\theta) \qquad (22)$$

Now, by differentiating (22) with respect to $\theta$, equating the result to zero, and making a few algebraic manipulations, we arrive at the following modified⁷ system of likelihood equations:

$$E[z(X) \mid \theta] = E[z(X) \mid y, \theta_k] \qquad (23)$$

Expression (23) implies a great simplification of the EMA. In order to maximize the Q-function at each iteration, we need only solve the system above using, on the right hand side, the expected values of the (complete data based) sufficient statistics $z(x)$

⁷ We use the term "modified" because in a true system of likelihood equations for a distribution from an exponential family, the right hand side is given by the observed sufficient statistics, and not by their conditional expected values, as is the case here.


conditioned on the observed incomplete data $y$ and on the current parameter guess $\theta_k$. For a number of regular exponential families, $E[z(X) \mid y, \theta_k]$ is easy to compute and the solution to the equation system (23) exists in closed form. The consequence is that the E-Step and the M-Step of the EMA scheme S1-S5 may be substituted by the simpler steps:

S2*. E-Step (exponential family): use $\theta_k$ and $y$ to compute:

$$z_k = E[z(X) \mid y, \theta_k] \qquad (24)$$

S3*. M-Step (exponential family): maximize $Q(\theta, \theta_k)$ with respect to $\theta$ by solving the equation system:

$$E[z(X) \mid \theta] = z_k \qquad (25)$$

Even in cases where no closed form solution exists for the equation system in (23) (or (25)), iterative search algorithms may run fast in the M-Step, due to the well known strict convexity of minus the logarithm of a likelihood function belonging to a regular exponential family (Barndorff-Nielsen, 1978). Everything we discussed above regarding likelihoods is also valid for posterior functions. But, in spite of the simplicity gained when the PCD or the LCD belongs to a regular exponential family, some difficulties remain. As pointed out by DLR (1977, p. 4), the equation system (23) does not always have a solution for $\theta$ in $\Theta$; in these cases, the maximand vector $\hat{\theta}$ lies somewhere on the boundary of $\Theta$ (or the algorithm may not converge at all). However, provided that a solution for $\theta$ exists in $\Theta$, it must be unique, due to the strict convexity property mentioned above.
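The simplification in S2*-S3* is that the M-Step reduces to moment matching on the sufficient statistics. For a $N(\mu, \sigma^2)$ complete-data model with missing entries, $z(x) = (\sum x_i, \sum x_i^2)$, the E-Step fills in $E[x \mid y, \theta_k] = \mu_k$ and $E[x^2 \mid y, \theta_k] = \mu_k^2 + \sigma_k^2$, and the M-Step solves (25) in closed form. A toy sketch of ours:

```python
import numpy as np

rng = np.random.default_rng(3)
obs = rng.normal(5.0, 2.0, size=80)        # observed part of the sample
m, n = 20, 100                             # missing count, total sample size

mu, var = 0.0, 1.0                         # theta_0
for _ in range(200):
    # S2*: conditional expectations of z = (sum x, sum x^2) given y and theta_k
    z1 = obs.sum() + m * mu
    z2 = (obs**2).sum() + m * (mu**2 + var)
    # S3*: solve E[z | theta] = z_k; for the normal this is moment matching
    mu_new = z1 / n
    var_new = z2 / n - mu_new**2
    if abs(mu_new - mu) + abs(var_new - var) < 1e-12:
        mu, var = mu_new, var_new
        break
    mu, var = mu_new, var_new
print(mu, var)
```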

5 The EINM under the EMA formalism

A major reason making the EMA attractive for EI is that it produces, jointly with the parameter estimates, predictions of the disaggregate data variables. According to the EMA scheme S1-S5, this is done in the E-Step performed at every iteration. Of interest are the complete data estimates (which correspond to the predictions of the disaggregate data variables) generated at the last EMA iteration, after convergence has been achieved. By this property, a path is opened to the use of the EMA with EI models. For instance, to write King's EINM under the EM formalism, we may assume:

1. Complete (disaggregate) data vectors: $b = [b_1, \ldots, b_P]^T$ and $w = [w_1, \ldots, w_P]^T$;

2. Incomplete (aggregate) data vector: $t = [t_1, \ldots, t_P]^T$;

3. Parameter vector: $\breve{\psi} = [\breve{\mu}_b, \breve{\mu}_w, \breve{\sigma}_b^2, \breve{\sigma}_w^2, \breve{\rho}]^T$ (or $\phi = [\phi_1, \phi_2, \phi_3, \phi_4, \phi_5]^T$);

4. Many-to-one mapping:⁸ $t = h(b, w) = Xb + [I_P - X]w$, where $I_P$ is the P×P identity matrix and $X = \mathrm{diag}([x_1, \ldots, x_P]^T)$ (see the sketch after this list);

5. LCD: $f_{CD}(b, w \mid \breve{\psi}) = \prod_{i=1}^{P} f_{bw}(b_i, w_i \mid \breve{\psi})$;

⁸ Note that the many-to-one mapping is given by the accounting identity, presented in generalized (matrix) form in item 4.


6. LID: $f_{ID}(t \mid \breve{\psi}) = \prod_{i=1}^{P} f_{t_i}(t_i \mid \breve{\psi})$;

7. PCD: $p_{CD}(\breve{\psi} \mid b, w) \propto p(\breve{\psi})\, f_{CD}(b, w \mid \breve{\psi})$;

8. PID: $p_{ID}(\breve{\psi} \mid t) \propto p(\breve{\psi})\, f_{ID}(t \mid \breve{\psi})$;

9. Q-function: $Q(\breve{\psi}, \breve{\psi}_k) = E[\log p_{CD}(\breve{\psi} \mid b, w) \mid t, \breve{\psi}_k]$.
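Item 4 is just the accounting identity stacked over the P tables. A minimal sketch of ours:

```python
import numpy as np

def aggregate(b, w, x):
    """Many-to-one mapping of item 4: t = X b + (I_P - X) w, with X = diag(x)."""
    P = len(x)
    X = np.diag(x)
    return X @ b + (np.eye(P) - X) @ w    # elementwise: t_i = x_i b_i + (1 - x_i) w_i

b = np.array([0.6, 0.2, 0.9])
w = np.array([0.1, 0.5, 0.3])
x = np.array([0.3, 0.7, 0.5])
print(aggregate(b, w, x))                 # the observed aggregate data t
```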

5.1 Using properties of the exponential family

A central feature of the EINM is that the disaggregate data variables follow a TBN with support on the unit square. Such a TBN is a particular case of a bivariate normal truncated over a rectangular region (TBNR) of Euclidean space. Since a TBNR belongs to a regular exponential family, the joint density $f_{CD}(b, w \mid \breve{\psi})$, which is a product of TBNRs, also belongs to a regular exponential family. Indeed, it is easy to check that this last function can be written as:

$$f_{CD}(b, w \mid \breve{\psi}) = q(b, w)\, \frac{\exp\!\left(c(b, w; \breve{\psi})\right)}{a(\breve{\psi})} \qquad (26)$$

which is the exponential family representation of a multivariate density. Using the TBNR assumption, let us write it as:

$$f_{CD}(b, w \mid \breve{\psi}) = \frac{\exp\!\left(g(b, w; \breve{\psi})\right)}{\left(2\pi \sqrt{|\Sigma(\breve{\psi})|}\; R(\breve{\psi})\right)^{P}} \qquad (27)$$

where:

$$g(b, w; \breve{\psi}) = -\frac{1}{2(1 - \breve{\rho}^2)} \sum_{i=1}^{P} \left[ \left(\frac{b_i - \breve{\mu}_b}{\breve{\sigma}_b}\right)^{2} - 2\breve{\rho} \left(\frac{b_i - \breve{\mu}_b}{\breve{\sigma}_b}\right)\!\left(\frac{w_i - \breve{\mu}_w}{\breve{\sigma}_w}\right) + \left(\frac{w_i - \breve{\mu}_w}{\breve{\sigma}_w}\right)^{2} \right] \qquad (28)$$

and note that $g(\cdot,\cdot)$ can be written as:

$$g(b, w; \breve{\psi}) = c(b, w; \breve{\psi}) - k(\breve{\psi}) \qquad (29)$$

where:

$$c(b, w; \breve{\psi}) = \frac{\breve{\mu}_b \breve{\sigma}_w^2 - \breve{\mu}_w \breve{\rho}\, \breve{\sigma}_b \breve{\sigma}_w}{|\Sigma(\breve{\psi})|} \sum_{i=1}^{P} b_i + \frac{\breve{\mu}_w \breve{\sigma}_b^2 - \breve{\mu}_b \breve{\rho}\, \breve{\sigma}_b \breve{\sigma}_w}{|\Sigma(\breve{\psi})|} \sum_{i=1}^{P} w_i - \frac{\breve{\sigma}_w^2}{2|\Sigma(\breve{\psi})|} \sum_{i=1}^{P} b_i^2 - \frac{\breve{\sigma}_b^2}{2|\Sigma(\breve{\psi})|} \sum_{i=1}^{P} w_i^2 + \frac{\breve{\rho}\, \breve{\sigma}_b \breve{\sigma}_w}{|\Sigma(\breve{\psi})|} \sum_{i=1}^{P} b_i w_i \qquad (30)$$

$$k(\breve{\psi}) = P\, \frac{\breve{\mu}_b^2 \breve{\sigma}_w^2 - 2\breve{\mu}_b \breve{\mu}_w \breve{\rho}\, \breve{\sigma}_b \breve{\sigma}_w + \breve{\mu}_w^2 \breve{\sigma}_b^2}{2|\Sigma(\breve{\psi})|} \qquad (31)$$

Substituting (29) into (27), we find:

$$f_{CD}(b, w \mid \breve{\psi}) = \frac{\exp\!\left(c(b, w; \breve{\psi})\right)}{\exp\!\left(k(\breve{\psi})\right) \left(2\pi \sqrt{|\Sigma(\breve{\psi})|}\; R(\breve{\psi})\right)^{P}} \qquad (32)$$

Now, by defining $q(b, w; \breve{\psi}) = 1$ and $a(\breve{\psi}) = \exp\!\left(k(\breve{\psi})\right) \times \left[2\pi \sqrt{|\Sigma(\breve{\psi})|}\; R(\breve{\psi})\right]^{P}$, (32) and (26) are the same. Taking into account that $c(b, w; \breve{\psi}) = \gamma(\breve{\psi})^T z(b, w)$, we can go a step further and write (32) in canonical form:

$$f_{CD}(b, w \mid \breve{\psi}) = \frac{\exp\!\left(\gamma(\breve{\psi})^T z(b, w)\right)}{\exp\!\left(k(\breve{\psi})\right) \left(2\pi \sqrt{|\Sigma(\breve{\psi})|}\; R(\breve{\psi})\right)^{P}} \qquad (32')$$

In (32'), $z(b, w)$ is a 5-dimensional vector of minimal sufficient statistics for $\gamma = [\gamma_1, \gamma_2, \gamma_3, \gamma_4, \gamma_5]^T$, where $\gamma = m(\breve{\psi})$ is a vector of canonical parameters and $m: \Psi \to \Upsilon$ is a bijection from the parameter space $\Psi$ (of $\breve{\psi}$) to the canonical parameter space $\Upsilon$ (of $\gamma$). The vector of sufficient statistics is given by:

$$z(b, w) = \left[\sum_i b_i,\; \sum_i w_i,\; \sum_i b_i^2,\; \sum_i w_i^2,\; \sum_i b_i w_i\right]^{T} \qquad (33)$$

and the canonical parameter vector $\gamma$ by:

$$\gamma_1 = \frac{\breve{\mu}_b \breve{\sigma}_w^2 - \breve{\mu}_w \breve{\rho}\, \breve{\sigma}_b \breve{\sigma}_w}{|\Sigma(\breve{\psi})|}, \qquad -\infty < \gamma_1 < \infty \qquad (34)$$

$$\gamma_2 = \frac{\breve{\mu}_w \breve{\sigma}_b^2 - \breve{\mu}_b \breve{\rho}\, \breve{\sigma}_b \breve{\sigma}_w}{|\Sigma(\breve{\psi})|}, \qquad \gamma_3 = \frac{-\breve{\sigma}_w^2}{2|\Sigma(\breve{\psi})|}, \qquad \gamma_4 = \frac{-\breve{\sigma}_b^2}{2|\Sigma(\breve{\psi})|}, \qquad \gamma_5 = \frac{\breve{\rho}\, \breve{\sigma}_b \breve{\sigma}_w}{|\Sigma(\breve{\psi})|}$$

where:

$$|\Sigma(\breve{\psi})| = \breve{\sigma}_b^2 \breve{\sigma}_w^2 (1 - \breve{\rho}^2)$$
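As a numerical check of this canonical representation, one can compute $z(b, w)$ and $\gamma = m(\breve{\psi})$ as defined above and verify that $\gamma^T z(b, w)$ reproduces $c(b, w; \breve{\psi})$ in (30). A sketch of ours, with hypothetical parameter values:

```python
import numpy as np

def z_stats(b, w):
    """Minimal sufficient statistics z(b, w) of the TBNR exponential family, eq. (33)."""
    return np.array([b.sum(), w.sum(), (b**2).sum(), (w**2).sum(), (b * w).sum()])

def gamma_of_psi(mu_b, mu_w, sig_b, sig_w, rho):
    """Canonical parameters gamma = m(psi); det is |Sigma| = sig_b^2 sig_w^2 (1 - rho^2)."""
    det = sig_b**2 * sig_w**2 * (1.0 - rho**2)
    return np.array([
        (mu_b * sig_w**2 - mu_w * rho * sig_b * sig_w) / det,   # gamma_1
        (mu_w * sig_b**2 - mu_b * rho * sig_b * sig_w) / det,   # gamma_2
        -sig_w**2 / (2.0 * det),                                # gamma_3
        -sig_b**2 / (2.0 * det),                                # gamma_4
        rho * sig_b * sig_w / det,                              # gamma_5
    ])

rng = np.random.default_rng(1)
b, w = rng.uniform(size=10), rng.uniform(size=10)
gamma = gamma_of_psi(0.5, 0.4, 0.3, 0.25, 0.2)
print(gamma @ z_stats(b, w))     # equals c(b, w; psi) of eq. (30)
```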