Approximately Exact Inference in Dynamic Panel Models: a QUEST for Unbiasedness




Working Paper Series

National Centre of Competence in Research Financial Valuation and Risk Management Working Paper No. 305

Approximately Exact Inference in Dynamic Panel Models: A Quest for Unbiasedness

Simon Broda, Marc S. Paolella, Yianna Tchopourian

First version: June 2006. Current version: June 2006. This research has been carried out within the NCCR FINRISK project on "Financial Econometrics for Risk Management".

Simon A. Broda∗, Marc S. Paolella, Yianna Tchopourian

Swiss Banking Institute, University of Zurich, Switzerland

First draft: June 2006. Current draft: June 2006.

Abstract

Based on the notion of a quantile unbiased estimating function, this paper develops a general method for conducting exact small-sample inference in models which allow the estimator of the (scalar) parameter of interest to be expressed as the root of an estimating function, and which is particularly simple to implement for linear models with a covariance matrix depending on a single parameter. Our method of quantile unbiased estimation, or QUEST, involves the computation of tail probabilities of the estimating function. In the context of dynamic panel models, both the least squares and maximum likelihood paradigms give rise to estimating functions involving sums of ratios of quadratic forms in normal variates, the distribution of which cannot be straightforwardly computed. We overcome this obstacle by deriving a saddlepoint approximation that is both readily evaluated and remarkably accurate. A simulation study demonstrates the validity of the procedure and shows the resulting estimators to be vastly superior to existing ones.

Keywords: Dynamic Panel Data, Bias Correction, Estimating Equation, Saddlepoint Approximation.

∗ Corresponding author. E-mail address: [email protected]. Part of the research of M. Paolella and Y. Tchopourian has been carried out within the National Centre of Competence in Research “Financial Valuation and Risk Management” (NCCR FINRISK), which is a research program supported by the Swiss National Science Foundation.

1 Introduction

Dynamic panel data (DPD) models receive a considerable amount of attention in both the theoretical and the applied literature (see, for example, the references in Arellano, 2003). Due to its tractability and wide applicability, the first-order DPD model is by far the most popular. Since the seminal work of Anderson and Hsiao (1981), the literature has mainly focused on generalized method of moments (GMM) procedures for its estimation. The first-difference GMM estimator introduced by Arellano and Bond (1991) is now one of the standard procedures used in empirical applications. Ahn and Schmidt (1995) exploit additional moment conditions in the presence of exogenous variables. More recently, Kruiniger (2002) examines various estimators under different specifications of the individual effects and derives conditions for consistency. A recent addition to this strand of literature is Moon and Phillips (2004).

One of the reasons why GMM procedures enjoy such popularity is the "incidental parameters" problem and the associated asymptotic bias, first discussed by Neyman and Scott (1948) and pertinent to the least squares and maximum likelihood estimation of (fixed effects) DPD models (Nickell, 1981). Addressing this issue, Hsiao et al. (2002) propose a transformed likelihood approach along with a minimum distance estimator, which they demonstrate to outperform other commonly used estimators. Choi et al. (2004) propose a bias reduction technique based on recursive mean adjustment, which is applicable to panel AR(p) models. In some recent publications, attempts have been made at correcting the bias of the least squares estimator (Kiviet, 1995; Hahn and Kuersteiner, 2002; Bun and Carree, 2005). These methods rely heavily on asymptotic results, typically as the number of individuals tends to infinity. As Hahn and Kuersteiner (2002, p. 1647) note, "Unfortunately, our bias-corrected estimator does not completely remove the bias. This suggests that an even more careful small sample analysis based on higher order expansions of the distribution might be needed to account for the entire bias."

The present manuscript aims to provide such an analysis. We develop the concept of quantile unbiased estimation as a general method for median unbiased point estimation and the construction of exact confidence intervals, and apply it to the first-order DPD model. A saddlepoint approximation for the tail probabilities of the requisite estimating functions obviates the need for burdensome simulations as used by Phillips and Sul (2003). The work of these latter authors is in line with our paper in that it relies on median-bias correction. However, their methodology is based on the least squares estimator of the model, whereas our approach relies on exact maximum likelihood estimation and enables us to allow both for a more general set of exogenous regressors and for non-homogeneous individual error variances. The use of our otherwise exact inferential procedure in conjunction with the proposed saddlepoint approximation gives rise to the seemingly contradictory nomenclature approximately exact inference, a term originally coined by Strawderman and Wells (1998).

The remainder of this paper is organized as follows: Section 2 presents the general method for point and interval estimation. Section 3 introduces the dynamic panel model. Sections 4 and 5 apply the estimation methodology to the least squares and maximum likelihood estimation of the model, respectively. Section 6 contains numerical results. Section 7 concludes.

2 Quantile Unbiased Estimation

This section develops a general procedure for conducting exact inference in models that allow the estimator of the parameter of interest to be defined as the unique root of an estimating function. It generalizes the approach of Andrews (1993) and is related to the adjusted profile likelihood of McCullagh and Tibshirani (1990). In contrast to the latter, our approach uses quantiles, rather than moments, of the distribution. This has two advantages: i) under certain conditions, the resulting estimator is exactly median unbiased, as opposed to approximately mean unbiased, and ii) it facilitates the construction of confidence intervals.

Consider a parametric model $\{\Upsilon, \boldsymbol\vartheta\}$, where $\Upsilon$ is the data, $\boldsymbol\vartheta' = (\theta, \boldsymbol\delta')$, $\theta \in [\underline\theta, \bar\theta]$ is the scalar parameter of interest, $\boldsymbol\delta$ is a (possibly empty) set of nuisance parameters, and $\underline\theta, \bar\theta$ are possibly infinite. Consider an estimator of $\theta$ defined as the (unique) root of a continuously differentiable estimating function $\eta(\theta, \Upsilon)$ that does not involve $\boldsymbol\delta$, i.e., $\hat\theta(\Upsilon)$ solves the estimating equation

$$\hat\theta(\Upsilon) = \begin{cases} \underline\theta, & \text{if } \eta(\underline\theta, \Upsilon) < 0,\\ \bar\theta, & \text{if } \eta(\bar\theta, \Upsilon) > 0,\\ c: \eta(c, \Upsilon) = 0, & \text{otherwise}. \end{cases} \tag{1}$$

Let $\Pr_\theta(B)$ and $\operatorname{Med}_\theta(X)$ denote the probability of $B$ and the median of $X$ when the true parameter is $\theta$, respectively. In analogy to the notion of (mean) unbiased estimating functions, it is natural to call an estimating function $\eta$ median unbiased if $\operatorname{Med}_\theta\bigl(\eta(\theta, \Upsilon)\bigr) = 0$. More generally, if for a fixed value $q \in (0,1)$, $\eta$ satisfies

$$\Pr_\theta\bigl(\eta(\theta, \Upsilon) \le 0\bigr) = q, \tag{2}$$

we shall refer to it (and to the corresponding estimating equation (1)) as 100q% quantile unbiased.

Our motivation to consider quantile unbiased estimation is the following. It is well known that mean unbiased estimating functions do not, in general, lead to mean unbiased estimators. If, however, we denote by $C(\Upsilon) \equiv \{c \in (\underline\theta, \bar\theta): \eta(c, \Upsilon) = 0\}$ and make the additional assumption

$$\bigl|C(\Upsilon)\bigr| \le 1 \quad\text{almost surely, and}\quad \frac{d}{dc}\,\eta(c, \Upsilon) < 0 \;\;\forall\, c \in C(\Upsilon) \quad\text{almost surely}, \tag{A1}$$

then if $\eta$ is 100q% quantile unbiased, it follows that

$$q = \Pr_\theta\bigl(\eta(\theta, \Upsilon) \le 0\bigr) = \Pr_\theta\bigl(\hat\theta(\Upsilon) \le \theta\bigr), \tag{3}$$

i.e., $\hat\theta$ is a 100q% quantile unbiased estimator for $\theta$. In particular, if $\eta$ satisfies (2) with q = 0.5, then its unique root is a median unbiased estimator of $\theta$, while with $q = (1 \pm \tau)/2$, it constitutes the left (right) endpoint of an equal-tails $100\tau$% confidence interval.

The following proposition shows how a quantile unbiased estimating function can be constructed for any value of $q \in (0,1)$, and will serve as the main tool in the construction of our estimators. Here and in the sequel, the dependence of $\eta$ on the data will not generally be made explicit; rather, if $\Upsilon$ appears explicitly, then $\eta(c, \Upsilon)$ will be understood as the (observed) sample value of the corresponding statistic.

Proposition 1. Let $\eta(c): (\underline\theta, \bar\theta) \mapsto \mathbb{R}$ be a continuously differentiable estimating function for $\theta$. Assume that, for all $c$, its distribution function is constant in $\boldsymbol\delta$, and denote it by $F_{\eta(c)}(\,\cdot\,; \theta)$. Then

$$\eta^*(c) \equiv \eta(c) - F_{\eta(c)}^{-1}(q; c) \tag{4}$$

is a 100q% quantile unbiased estimating function for $\theta$. If, in addition, $\eta^*$ satisfies (A1), then its unique root is a 100q% quantile unbiased estimator for $\theta$.

Proof. For all values of $c$,

$$\Pr_c\bigl(\eta^*(c) \le 0\bigr) = \Pr_c\bigl(\eta(c) \le F_{\eta(c)}^{-1}(q; c)\bigr) = F_{\eta(c)}\bigl(F_{\eta(c)}^{-1}(q; c); c\bigr) = q,$$

which, in particular, also holds for $c = \theta$, i.e., $\eta^*$ satisfies (2). The second assertion follows at once from (3). ∎

The root of the unbiased estimating function (4), say $\hat\theta_q$, can also be expressed as

$$c: \Pr_c\bigl(\eta(c) \le \eta(c, \Upsilon)\bigr) = q, \tag{5}$$

which will be convenient for our purposes, as it obviates the need to calculate the inverse distribution function appearing in (4). It is important to note that in (5), $c$ occurs both as the argument of the estimating function and as the (hypothesized) true parameter.

We close this section with a few remarks concerning related schemes of bias correction. Firstly, if the estimator $\hat\theta$ can be expressed in closed form, then it can be written as the solution to $\hat\theta(\Upsilon) - c = 0$, and $\hat\theta_q$ solves

$$c: \Pr_c\bigl(\hat\theta \le \hat\theta(\Upsilon)\bigr) = q.$$

In this special case, our technique yields the same estimator as that used by, e.g., Andrews (1993) and Phillips and Sul (2003). Their requirement that the quantile function of $\hat\theta$ be strictly increasing in $\theta$ translates into our assumption (A1). As noted by Andrews (1993), it is not apparent how this can be formally proven; for our model, however, numerical results appear to confirm this assumption.

Secondly, it appears natural to construct another bias-corrected point estimator by replacing $F_{\eta(c)}^{-1}(q; c)$ in equation (4) by $E_c[\eta(c)]$, i.e., the expected value of $\eta(c)$ if the true parameter is $c$, thus giving rise to the mean unbiased estimating function $\eta^{**}(c) \equiv \eta(c) - E_c[\eta(c)]$. This is the idea behind the adjusted profile likelihood of McCullagh and Tibshirani (1990), except that here we are concerned with a general estimating function that need not be a profile score function. We shall refer to the resulting estimator as mean adjusted and denote it by $\hat\theta_{\text{Mean}}$. If the estimator in question is available in closed form, $\hat\theta_{\text{Mean}}$ solves

$$c: \hat\theta(\Upsilon) - c - E_c\bigl[\hat\theta - c\bigr] = 0, \tag{6}$$

which is the nonlinear-bias-correcting estimator of MacKinnon and Smith (1998).
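In practice, the root search in (5) reduces to repeatedly evaluating a distribution function at the observed statistic and solving a one-dimensional equation in c. As a hedged illustration (not the DPD implementation of the later sections), the following sketch applies (5) to a toy model in which the requisite distribution function is available in closed form: the variance of an i.i.d. N(0, σ²) sample, where η(c) = n⁻¹Σxᵢ² − c and Pr_c(η(c) ≤ η(c, Υ)) = F_{χ²ₙ}(n s²/c). All function and variable names here are illustrative only.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def quantile_unbiased_variance(x, q=0.5):
    """Solve the 100q% quantile unbiased estimating equation (5) for the
    variance of an i.i.d. N(0, sigma^2) sample (illustrative toy model).

    Here eta(c) = mean(x^2) - c, and under a hypothesized true parameter c,
    Pr_c(eta(c) <= eta(c, data)) = F_{chi2_n}(n * s2 / c), so (5) becomes
    a one-dimensional root search in c."""
    n = len(x)
    s2 = float(np.mean(x**2))                      # observed root of eta
    g = lambda c: chi2.cdf(n * s2 / c, df=n) - q   # decreasing in c
    return brentq(g, 1e-8 * s2, 1e8 * s2)

rng = np.random.default_rng(0)
x = rng.normal(scale=2.0, size=20)
est = quantile_unbiased_variance(x, q=0.5)   # median unbiased point estimate
lo = quantile_unbiased_variance(x, q=0.95)   # left endpoint of 90% equal-tails CI
hi = quantile_unbiased_variance(x, q=0.05)   # right endpoint
```

The two interval endpoints use q = (1 ± 0.9)/2, exactly as described below (3); the same c-search structure carries over to the DPD estimating functions, with the χ² distribution function replaced by the saddlepoint approximation of Section 4.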

3 The Model

We consider a first-order DPD model, with or without fixed effects. For each of the $N \in \mathbb{N}^+$ individuals, the model is characterized by an observed and a latent equation, given respectively by

$$y_{i,t} = \mathbf{x}_{i,t}'\boldsymbol{\beta} + y_{i,t}^\ell, \quad t \in \{0, \dots, T\}, \qquad y_{i,t}^\ell = \alpha y_{i,t-1}^\ell + u_{i,t}, \quad t \in \{1, \dots, T\}, \tag{7}$$

where $\alpha \in (-1, 1]$, $\mathbf{x}_{i,t} = (x_{1i,t}, \dots, x_{ki,t})'$ is a vector of regressors with $k < NT$, $\boldsymbol{\beta} = (\beta_1, \dots, \beta_k)'$, the error components $u_{i,t} \overset{\mathrm{iid}}{\sim} \mathrm{N}(0, \sigma_i^2)$, and each initialization $y_{i,0}^\ell \sim \mathrm{N}\!\bigl(0, \sigma_i^2/(1-\alpha^2)\bigr)$ if $\alpha \in (-1, 1)$, and an arbitrary constant or random variable if $\alpha = 1$. In matrix form, the observed and latent panels become

$$\mathbf{Y}_0 = \mathbf{X}_0\boldsymbol{\beta} + \mathbf{Y}_0^\ell, \qquad \mathbf{Y}^\ell = \alpha\mathbf{Y}_{-1}^\ell + \mathbf{U},$$

respectively, where $\mathbf{Y}_0 = [\mathbf{Y}_{1,0}', \dots, \mathbf{Y}_{N,0}']'$, $\mathbf{Y}_{i,0} = [y_{i,0}, \dots, y_{i,T}]'$, $\mathbf{X}_0 = [\mathbf{X}_{1,0}', \dots, \mathbf{X}_{N,0}']'$, $\mathbf{X}_{i,0} = [\mathbf{x}_{i,0}, \dots, \mathbf{x}_{i,T}]'$, $\mathbf{Y}^\ell = [\mathbf{Y}_1^{\ell\prime}, \dots, \mathbf{Y}_N^{\ell\prime}]'$, $\mathbf{Y}_i^\ell = [y_{i,1}^\ell, \dots, y_{i,T}^\ell]'$, $\mathbf{Y}_{-1}^\ell = [\mathbf{Y}_{1,-1}^{\ell\prime}, \dots, \mathbf{Y}_{N,-1}^{\ell\prime}]'$, $\mathbf{Y}_{i,-1}^\ell = [y_{i,0}^\ell, \dots, y_{i,T-1}^\ell]'$, $\mathbf{Y}_0^\ell = [\mathbf{Y}_{1,0}^{\ell\prime}, \dots, \mathbf{Y}_{N,0}^{\ell\prime}]'$, $\mathbf{Y}_{i,0}^\ell = [y_{i,0}^\ell, \dots, y_{i,T}^\ell]'$, and $\mathbf{X}_0$ is assumed to have full column rank.

By combining the observable and latent equations, the model can equivalently be written as

$$y_{i,t} = \alpha y_{i,t-1} + \mathbf{x}_{i,t}'\boldsymbol{\beta} - \mathbf{x}_{i,t-1}'\boldsymbol{\beta}\alpha + u_{i,t}, \quad t = 1, \dots, T, \tag{8}$$

or, in matrix form,

$$\mathbf{Y} = \alpha\mathbf{Y}_{-1} + \mathbf{Z}\boldsymbol{\gamma} + \mathbf{U}, \tag{9}$$

where $\boldsymbol{\gamma} = [\boldsymbol{\beta}', -\boldsymbol{\beta}'\alpha]'$, $\mathbf{Y}_{-1} = [\mathbf{Y}_{1,-1}', \dots, \mathbf{Y}_{N,-1}']'$, $\mathbf{Y}_{i,-1} = [y_{i,0}, \dots, y_{i,T-1}]'$, $\mathbf{Z} = [\mathbf{X}, \mathbf{X}_{-1}]$, $\mathbf{X} = [\mathbf{X}_1', \dots, \mathbf{X}_N']'$, $\mathbf{X}_i = [\mathbf{x}_{i,1}, \dots, \mathbf{x}_{i,T}]'$, $\mathbf{X}_{-1} = [\mathbf{X}_{1,-1}', \dots, \mathbf{X}_{N,-1}']'$, and $\mathbf{X}_{i,-1} = [\mathbf{x}_{i,0}, \dots, \mathbf{x}_{i,T-1}]'$.

We are particularly concerned with the following two special cases:

I) $\sigma_i^2 = \sigma^2$ for all $i$ (arbitrary regressors, identical variance);

II) $\mathbf{X}_0 = \mathbf{I}_N \otimes \mathbf{X}_{1,0}$, $\boldsymbol{\beta} = (\boldsymbol{\beta}_1', \dots, \boldsymbol{\beta}_N')'$ (identical regressors, arbitrary variance).

Here and in the sequel, we denote by ⊗ the Kronecker product.
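For concreteness, the data-generating process (7)–(8) can be simulated directly. The sketch below assumes stationary initialization (|α| < 1); function and variable names are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def simulate_dpd(alpha, beta, X, sigma, rng):
    """Simulate the first-order DPD model (7): y_it = x_it' beta + y^l_it,
    with latent AR(1) y^l_it = alpha * y^l_{i,t-1} + u_it and stationary
    initialization y^l_i0 ~ N(0, sigma_i^2 / (1 - alpha^2)), |alpha| < 1.

    X: (N, T+1, k) regressor array; sigma: (N,) error standard deviations."""
    N, T1, _ = X.shape
    ylat = np.empty((N, T1))
    # draw the latent initial condition from its stationary distribution
    ylat[:, 0] = rng.normal(scale=sigma / np.sqrt(1.0 - alpha**2))
    for t in range(1, T1):
        ylat[:, t] = alpha * ylat[:, t - 1] + rng.normal(scale=sigma)
    return X @ beta + ylat          # observed panel y_it

rng = np.random.default_rng(1)
N, T1 = 50, 10
# identical regressors x_it = [1 t] for every individual, as in Section 6
X = np.broadcast_to(np.column_stack([np.ones(T1), np.arange(T1)]), (N, T1, 2))
Y = simulate_dpd(alpha=0.5, beta=np.array([1.0, 0.1]),
                 X=X, sigma=np.ones(N), rng=rng)
```

Setting `sigma` to a non-constant vector produces data under special case II (unequal individual variances) instead of case I.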

4 Estimation by Least Squares

4.1 Case I: Arbitrary Regressors, Identical Variance

This section applies the method of quantile unbiased estimating functions to the least squares estimation of the model with arbitrary regressors and equal individual variances, thus slightly generalizing the procedure of Phillips and Sul (2003) and embedding it in our methodology. By the Frisch–Waugh theorem, the least squares estimator $\hat\alpha_{LS}$ can be expressed as

$$c: \eta_1(c) \equiv \frac{\mathbf{Y}_{-1}'\mathbf{M}\mathbf{Y}}{\mathbf{Y}_{-1}'\mathbf{M}\mathbf{Y}_{-1}} - c = 0, \tag{10}$$

where $\mathbf{M} = \mathbf{I}_{NT} - \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'$. Generalizing a result of Phillips and Sul (2003), it is shown in the appendix that the distribution of $\eta_1$ is free of nuisance parameters, as required by Proposition 1.

4.1.1 Quantile Unbiased Estimation

Computation of (5) requires a method of evaluating the distribution function of $\eta_1$. With $\Upsilon = \{\mathbf{Y}_0, \mathbf{X}_0\}$, $\theta = \alpha$, and $\boldsymbol\delta = \{\boldsymbol\beta, \{\sigma_i\}\}$, it is shown in the appendix that the quantile unbiased estimator solves

$$\Pr_c\left(\frac{\mathbf{U}_0'\mathbf{A}\mathbf{U}_0}{\mathbf{U}_0'\mathbf{B}\mathbf{U}_0} \le \hat\theta(\Upsilon)\right) - q = 0, \tag{11}$$

where the matrices $\mathbf{A} = \mathbf{A}(c)$ and $\mathbf{B} = \mathbf{B}(c)$ are as in (35), and $\hat\theta(\Upsilon) = \mathbf{Y}_{-1}'\mathbf{M}\mathbf{Y} / \mathbf{Y}_{-1}'\mathbf{M}\mathbf{Y}_{-1}$ is the observed OLS estimator.

While Andrews (1993) uses the Imhof (1961) algorithm to evaluate the requisite distribution function and Phillips and Sul (2003) resort to simulation, we replace these time-consuming processes by a saddlepoint approximation. More generally, and as will be required in the non-homoscedastic case, it is shown in the appendix that an approximation to the distribution function of the mean of N i.i.d. ratios of quadratic forms in standard normal variates is given by

$$\Pr\left(\frac{1}{N}\sum_{i=1}^N \frac{\mathbf{U}_i'\mathbf{A}_1\mathbf{U}_i}{\mathbf{U}_i'\mathbf{A}_2\mathbf{U}_i} \le \bar r\right) \approx \Phi\left(\hat w_n + \frac{1}{\hat w_n}\log\frac{\hat u_n}{\hat w_n}\right), \qquad \bar r \ne \mu,$$

where

$$\hat w_n = \sqrt{N\log|\mathbf{D}|}\,\operatorname{sgn}(\bar r - \mu), \qquad \hat u_n = \hat s\sqrt{2N\operatorname{tr}\mathbf{K}_3^2}\left[\frac{\sqrt{\bigl(2\hat s\operatorname{tr}\mathbf{K}_2\mathbf{K}_3 + \operatorname{tr}\mathbf{K}_2\bigr)^2 - 4\hat s^2\operatorname{tr}\mathbf{K}_2^2\operatorname{tr}\mathbf{K}_3^2}}{\operatorname{tr}^2\mathbf{K}_2}\right]^{(N-1)/2},$$

$\mathbf{D} = \mathbf{I} - 2\hat s\mathbf{A}_3$, $\mathbf{A}_3 = \mathbf{A}_1 - \bar r\mathbf{A}_2$, $\mu = \operatorname{tr}\mathbf{A}_1/\operatorname{tr}\mathbf{A}_2$, $\mathbf{K}_i = \mathbf{A}_i\mathbf{D}^{-1}$, $i \in \{2, 3\}$, $\Phi$ denotes the standard normal cdf, and, with $\lambda_i$ denoting the eigenvalues of $\mathbf{A}_3$, the saddlepoint $\hat s$ solves

$$\operatorname{tr}\mathbf{K}_3 \equiv \sum_{i=1}^{T} \frac{\lambda_i}{1 - 2\hat s\lambda_i} = 0.$$
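The N = 1 building block of this approximation (the Lieberman, 1994, case) is straightforward to implement: the event $\{\mathbf{U}'\mathbf{A}_1\mathbf{U}/\mathbf{U}'\mathbf{A}_2\mathbf{U} \le r\}$ equals $\{\mathbf{U}'(\mathbf{A}_1 - r\mathbf{A}_2)\mathbf{U} \le 0\}$, whose saddlepoint quantities follow from the eigenvalues of $\mathbf{A}_3$. The sketch below uses the same $\Phi(\hat w + \hat w^{-1}\log(\hat u/\hat w))$ form as above; it assumes r lies strictly inside the support of the ratio (so that $\mathbf{A}_3$ is indefinite) and is an illustration, not the paper's general-N routine.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def sp_cdf_ratio(A1, A2, r):
    """Saddlepoint approximation to Pr(U'A1 U / U'A2 U <= r), U ~ N(0, I),
    for a single ratio (N = 1). The event is {U'(A1 - r A2)U <= 0}, with
    cumulant generating function K(s) = -0.5 * sum log(1 - 2 s lam_i)."""
    A3 = A1 - r * A2
    A3 = 0.5 * (A3 + A3.T)                      # symmetrize
    lam = np.linalg.eigvalsh(A3)
    mu = np.trace(A1) / np.trace(A2)
    if np.isclose(r, mu):                       # removable singularity at the mean
        return 0.5
    # K'(s) = sum lam_i / (1 - 2 s lam_i) increases from -inf to +inf on (lo, hi)
    Kp = lambda s: np.sum(lam / (1.0 - 2.0 * s * lam))
    lo = 1.0 / (2.0 * lam.min()) + 1e-10
    hi = 1.0 / (2.0 * lam.max()) - 1e-10
    s = brentq(Kp, lo, hi)                      # saddlepoint: K'(s_hat) = 0
    K = -0.5 * np.sum(np.log1p(-2.0 * s * lam))
    Kpp = 2.0 * np.sum(lam**2 / (1.0 - 2.0 * s * lam) ** 2)
    w = np.sign(r - mu) * np.sqrt(-2.0 * K)
    u = s * np.sqrt(Kpp)
    return norm.cdf(w + np.log(u / w) / w)

# with A1 = diag(1,1,0,0,0) and A2 = I5, the ratio is distributed Beta(1, 3/2)
p = sp_cdf_ratio(np.diag([1., 1., 0., 0., 0.]), np.eye(5), 0.3)
```

In the Beta(1, 3/2) check above, the exact cdf is $1 - (1-r)^{3/2}$, and the approximation agrees with it closely, illustrating the accuracy claimed in the text even in five dimensions.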

This generalizes the result for the N = 1 case given in Lieberman (1994) and is of interest in itself, as it is potentially applicable in numerous other modelling contexts. Also, unlike in the N = 1 case, no exact methods exist for evaluating the distribution function, rendering the saddlepoint approximation the only practical means of its computation. Calculation of $\hat\alpha_q$ thus only requires a univariate root search over $c$ in (11). Efficient estimates and approximate confidence intervals for $\boldsymbol\beta$ can be computed as usual from a GLS estimation using $\alpha = \hat\alpha_{0.5}$.

4.1.2 Mean Adjusted Estimation

We now turn to the construction of the mean adjusted estimator of $\alpha$ in model (7). It is given as the solution to $\eta_1(c) - E_c[\eta_1(c)] = 0$. In order to evaluate

$$E_c\bigl[\eta_1(c)\bigr] = E_c\left(\frac{\mathbf{U}_0'\mathbf{A}\mathbf{U}_0}{\mathbf{U}_0'\mathbf{B}\mathbf{U}_0}\right) - c, \tag{12}$$

we make use of the expression for the mean of a ratio of quadratic forms in normal variates given in Sawa (1978). Let $\mathbf{P}\boldsymbol\Lambda\mathbf{P}'$ be the spectral decomposition of $\mathbf{B}$ and set $\mathbf{C} = \mathbf{P}'\mathbf{A}\mathbf{P}$. Then

$$E\left[\frac{\mathbf{U}_0'\mathbf{A}\mathbf{U}_0}{\mathbf{U}_0'\mathbf{B}\mathbf{U}_0}\right] = \int_0^\infty \sum_{j=1}^{T+1} \frac{c_j}{(1+2\lambda_j t)^{3/2}} \prod_{k\ne j}(1+2\lambda_k t)^{-1/2}\,dt, \tag{13}$$

where $c_j$ and $\lambda_j$ denote the $j$th diagonal elements of $\mathbf{C}$ and $\boldsymbol\Lambda$, respectively. The integrand in (13) dies off quickly, so that the integral is straightforward to evaluate numerically.
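Because the integrand in (13) is smooth and rapidly decaying, any standard quadrature rule suffices. A minimal sketch (function name illustrative) exploits the identity $c_j d_j^{-3/2}\prod_{k\ne j} d_k^{-1/2} = (c_j/d_j)\prod_k d_k^{-1/2}$ with $d_j = 1 + 2\lambda_j t$:

```python
import numpy as np
from scipy.integrate import quad

def mean_ratio_quadratic_forms(A, B):
    """E[U'A U / U'B U] for U ~ N(0, I) and B positive definite, via the
    one-dimensional integral representation (13) of Sawa (1978)."""
    lam, P = np.linalg.eigh(B)            # spectral decomposition B = P Lam P'
    c = np.diag(P.T @ A @ P)              # diagonal elements of C = P'AP
    def integrand(t):
        d = 1.0 + 2.0 * lam * t
        # sum_j c_j d_j^{-3/2} prod_{k != j} d_k^{-1/2}
        return np.sum(c / d) / np.sqrt(np.prod(d))
    val, _ = quad(integrand, 0.0, np.inf)
    return val

# sanity check: with A = B the ratio is identically 1, so the mean is 1
m = mean_ratio_quadratic_forms(np.eye(3), np.eye(3))
```

As a second check, with $\mathbf{A} = \operatorname{diag}(1, 0, 0, 0)$ and $\mathbf{B} = \mathbf{I}_4$, the ratio is $U_1^2/\sum_k U_k^2$ with mean 1/4, which the routine reproduces.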


This method of bias correction was employed, in the pure time-series case, by Tanizaki (2000), who, however, used simulation to evaluate the mean function.

In view of Equation (6), it is apparent that $E_c[\eta_1(c)] \equiv b_{\hat\alpha_{LS}}(c)$ is just the bias of the least squares estimator under a true parameter of $c$. If the model includes only individual dummies, this bias is asymptotically (Nickell, 1981; Hahn and Kuersteiner, 2002)

$$\lim_{N\to\infty} b_{\hat\alpha_{LS}}(c) = \begin{cases} -\dfrac{1+c}{T-1}\left(1-\dfrac{1-c^T}{T(1-c)}\right)\left(1-\dfrac{2c}{(1-c)(T-1)}\left(1-\dfrac{1-c^T}{T(1-c)}\right)\right)^{-1}, & \text{if } N/T \to \infty,\\[2ex] -\dfrac{1+c}{T}, & \text{if } N/T \to k < \infty,\ N/T^3 \to 0. \end{cases}$$

Replacing $E_c[\eta_1(c)]$ in (12) with its limit as $N/T \to \infty$ and $N/T \to k < \infty$, we obtain the estimators of Bun and Carree (2005) and Hahn and Kuersteiner (2002), respectively. As such, our estimator can be seen as a version of these estimators that remains valid when both N and T are small, and for arbitrary sets of exogenous regressors.

4.2 Case II: Identical Regressors, Arbitrary Variance

In this section, we relax the assumption of homoscedasticity under some additional constraints. In particular, we drop the assumption that $\sigma_i^2 = \sigma^2$, so that now, for $i = 1, \dots, N$, the errors of the individual series satisfy $u_{i,t} \overset{\mathrm{iid}}{\sim} \mathrm{N}(0, \sigma_i^2)$, $t = 1, \dots, T$.

The regressor matrix is assumed to be block diagonal, so that $\mathbf{X}_0 = \mathbf{I}_N \otimes \mathbf{X}_{1,0}$, where $\mathbf{X}_{1,0}$ represents the $(T+1) \times k_1$ individual regressor matrix and $k_1 \equiv k/N$ is the number of parameters in $\boldsymbol\beta$ corresponding to each individual. This restricted model encompasses the standard fixed effects model, which includes a dummy regressor for each individual, as well as the models considered by Phillips and Sul (2003).

Let $\mathbf{V}_u = \operatorname{diag}\bigl(\sigma_1^2, \dots, \sigma_N^2\bigr) \otimes \mathbf{I}_T$. The model in matrix form is then given by

$$\mathbf{Y}_0 = \mathbf{X}_0\boldsymbol\beta + \mathbf{Y}_0^\ell, \qquad \mathbf{Y}^\ell = \alpha\mathbf{Y}_{-1}^\ell + \mathbf{U}, \qquad \mathbf{U} \sim \mathrm{N}(\mathbf{0}, \mathbf{V}_u), \tag{14}$$

with $\boldsymbol\beta = (\boldsymbol\beta_1', \dots, \boldsymbol\beta_N')'$, or, equivalently,

$$\mathbf{Y} = \alpha\mathbf{Y}_{-1} + \mathbf{Z}\boldsymbol\gamma + \mathbf{U}, \qquad \mathbf{U} \sim \mathrm{N}(\mathbf{0}, \mathbf{V}_u), \tag{15}$$

where now $\mathbf{Z} = \mathbf{I}_N \otimes \mathbf{Z}_1$, $\mathbf{Z}_1 = [\mathbf{X}_1\ \mathbf{X}_{1,-1}]$, $\mathbf{X}_1$ and $\mathbf{X}_{1,-1}$ are defined analogously to $\mathbf{X}_{1,0}$ by omitting the observations at time $t = 0$ and $t = T$, respectively, $\boldsymbol\gamma = (\boldsymbol\gamma_1', \dots, \boldsymbol\gamma_N')'$, and $\boldsymbol\gamma_i = (\boldsymbol\beta_i', -\alpha\boldsymbol\beta_i')'$. If $\mathbf{Z}_1$ is singular, it should be replaced by a full-rank matrix spanning the same column space.

Applying the Frisch–Waugh theorem to (15) premultiplied by $\mathbf{V}_u^{-1/2} = \operatorname{diag}\bigl(\sigma_1^{-1}, \dots, \sigma_N^{-1}\bigr) \otimes \mathbf{I}_T$, we arrive at the GLS estimator $\hat\alpha_{GLS}$, which solves

$$c: \frac{\mathbf{Y}_{-1}'\mathbf{V}_u^{-1/2}\mathbf{M}\mathbf{V}_u^{-1/2}\mathbf{Y}}{\mathbf{Y}_{-1}'\mathbf{V}_u^{-1/2}\mathbf{M}\mathbf{V}_u^{-1/2}\mathbf{Y}_{-1}} - c = 0,$$

where

$$\mathbf{M} = \mathbf{I}_{NT} - \mathbf{V}_u^{-1/2}\mathbf{Z}\bigl(\mathbf{Z}'\mathbf{V}_u^{-1}\mathbf{Z}\bigr)^{-1}\mathbf{Z}'\mathbf{V}_u^{-1/2} = \mathbf{I}_{NT} - \mathbf{I}_N \otimes \mathbf{Z}_1\bigl(\mathbf{Z}_1'\mathbf{Z}_1\bigr)^{-1}\mathbf{Z}_1' \equiv \mathbf{I}_N \otimes \mathbf{M}_1.$$

Due to the simple structure of the matrices $\mathbf{M}$ and $\mathbf{V}_u$, $\hat\alpha_{GLS}$ can equivalently be written as the solution of

$$c: \frac{\sum_{i=1}^N \frac{1}{\sigma_i^2}\,\mathbf{Y}_{i,-1}'\mathbf{M}_1\mathbf{Y}_i}{\sum_{i=1}^N \frac{1}{\sigma_i^2}\,\mathbf{Y}_{i,-1}'\mathbf{M}_1\mathbf{Y}_{i,-1}} - c = 0.$$

Upon estimating the individual variances, for given $c$, by

$$\hat\sigma_i^2 = \frac{\mathbf{Y}_{i,-1}'\mathbf{M}_1\mathbf{Y}_{i,-1}}{\operatorname{tr}\mathbf{R}_1'\mathbf{M}_1\mathbf{R}_1},$$

where $\mathbf{R}_1 = \mathbf{R}_1(c)$ is as in (34), we obtain the feasible GLS estimator

$$c: \eta_2(c) \equiv \frac{1}{N}\sum_{i=1}^N \frac{\mathbf{Y}_{i,-1}'\mathbf{M}_1\mathbf{Y}_i}{\mathbf{Y}_{i,-1}'\mathbf{M}_1\mathbf{Y}_{i,-1}} - c = 0. \tag{16}$$

As $\eta_2(c)$ is the average of the individual OLS estimating functions, its independence of $\boldsymbol\beta$ and the $\sigma_i$ follows from the corresponding property of the OLS estimating function proven in the appendix.

4.2.1 Quantile Unbiased Estimation

Following through the same steps that led to (35) shows that

$$\Pr_c\bigl(\eta_2(c) \le \eta_2(c, \Upsilon)\bigr) = \Pr_c\left(\frac{1}{N}\sum_{i=1}^N \frac{\mathbf{U}_{i,0}'\mathbf{A}_1\mathbf{U}_{i,0}}{\mathbf{U}_{i,0}'\mathbf{B}_1\mathbf{U}_{i,0}} \le \hat\theta(\Upsilon)\right),$$

where $\mathbf{A}_1 = \mathbf{A}_1(c) = \frac12\bigl[\mathbf{A}_1^* + \mathbf{A}_1^{*\prime}\bigr]$, $\mathbf{A}_1^* = \mathbf{R}_1'\mathbf{D}_{T-1}'\mathbf{M}_1\mathbf{D}_T\mathbf{R}_1$, $\mathbf{B}_1 = \mathbf{B}_1(c) = \mathbf{R}_1'\mathbf{D}_{T-1}'\mathbf{M}_1\mathbf{D}_{T-1}\mathbf{R}_1$, $\mathbf{U}_{i,0} \overset{\mathrm{iid}}{\sim} \mathrm{N}\bigl(\mathbf{0}, \mathbf{I}_{T+1}\bigr)$, and $\hat\theta(\Upsilon) = \frac{1}{N}\sum_{i=1}^N \mathbf{Y}_{i,-1}'\mathbf{M}_1\mathbf{Y}_i / \mathbf{Y}_{i,-1}'\mathbf{M}_1\mathbf{Y}_{i,-1}$ is the observed feasible GLS estimator. This can be evaluated by means of the saddlepoint approximation for the mean of N i.i.d. ratios of quadratic forms as discussed in Section 4.1.1.

4.2.2 Mean Adjusted Estimation

The mean unbiased estimating equation is given by $\eta_2(c) - E_c[\eta_2(c)] = 0$. It is immediate that

$$E_c\bigl[\eta_2(c)\bigr] = E_c\left[\frac{\mathbf{U}_0'\mathbf{A}_1\mathbf{U}_0}{\mathbf{U}_0'\mathbf{B}_1\mathbf{U}_0}\right] - c,$$

which can be evaluated by (13), with the matrices $\mathbf{A}_1$ and $\mathbf{B}_1$ replacing $\mathbf{A}$ and $\mathbf{B}$, respectively.

5 Estimation by Maximum Likelihood

This section is concerned with the estimation of the model by maximum likelihood methods, in both the homoscedastic and heteroscedastic settings. The general idea is to concentrate all nuisance parameters out of the log likelihood, giving rise to the profile log likelihood and profile score functions. The latter takes the form of a sum of ratios of quadratic forms, to which our methodology can be applied.

5.1 Case I: Arbitrary Regressors, Identical Variance

The log likelihood of model (7), after dropping the constant, is given by

$$\ell(\alpha, \boldsymbol\beta, \sigma^2) = -\frac{N(T+1)}{2}\log\sigma^2 - \frac12\log|\boldsymbol\Sigma| - \frac{1}{2\sigma^2}(\mathbf{Y}_0 - \mathbf{X}_0\boldsymbol\beta)'\boldsymbol\Sigma^{-1}(\mathbf{Y}_0 - \mathbf{X}_0\boldsymbol\beta),$$

where $\boldsymbol\Sigma = \boldsymbol\Sigma(\alpha) = \mathbf{R}\mathbf{R}' = \mathbf{I}_N \otimes \mathbf{R}_1\mathbf{R}_1'$, with $\mathbf{R}$ and $\mathbf{R}_1 = \mathbf{R}_1(\alpha)$ as in (34). The score functions are

$$\dot\ell_{\boldsymbol\beta}(\alpha, \boldsymbol\beta, \sigma^2) = \frac{1}{\sigma^2}\mathbf{X}_0'\boldsymbol\Sigma^{-1}(\mathbf{Y}_0 - \mathbf{X}_0\boldsymbol\beta),$$

$$\dot\ell_{\sigma^2}(\alpha, \boldsymbol\beta, \sigma^2) = -\frac{N(T+1)}{2\sigma^2} + \frac{1}{2\sigma^4}(\mathbf{Y}_0 - \mathbf{X}_0\boldsymbol\beta)'\boldsymbol\Sigma^{-1}(\mathbf{Y}_0 - \mathbf{X}_0\boldsymbol\beta),$$

and

$$\dot\ell_\alpha(\alpha, \boldsymbol\beta, \sigma^2) = -\frac12\operatorname{tr}\bigl(\boldsymbol\Sigma^{-1}\dot{\boldsymbol\Sigma}_\alpha\bigr) + \frac{1}{2\sigma^2}(\mathbf{Y}_0 - \mathbf{X}_0\boldsymbol\beta)'\boldsymbol\Sigma^{-1}\dot{\boldsymbol\Sigma}_\alpha\boldsymbol\Sigma^{-1}(\mathbf{Y}_0 - \mathbf{X}_0\boldsymbol\beta),$$

where $\dot{\boldsymbol\Sigma}_\alpha$ denotes the elementwise derivative of $\boldsymbol\Sigma$ with respect to $\alpha$, given by $\dot{\boldsymbol\Sigma}_\alpha = \mathbf{I}_N \otimes \bigl(\mathbf{R}_1\dot{\mathbf{R}}_1' + \dot{\mathbf{R}}_1\mathbf{R}_1'\bigr)$, where

$$\dot{\mathbf{R}}_1 = \begin{bmatrix} b' & 0 & 0 & \cdots & 0 & 0\\ b'\alpha + b & 0 & 0 & \cdots & 0 & 0\\ b'\alpha^2 + 2b\alpha & 1 & 0 & \cdots & 0 & 0\\ b'\alpha^3 + 3b\alpha^2 & 2\alpha & 1 & \cdots & 0 & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\ b'\alpha^T + Tb\alpha^{T-1} & (T-1)\alpha^{T-2} & (T-2)\alpha^{T-3} & \cdots & 1 & 0 \end{bmatrix},$$

and $b' = \alpha\bigl(1-\alpha^2\bigr)^{-3/2}$ if $\alpha \in (-1, 1)$ and zero if $\alpha = 1$. It is computationally advantageous to note that $\boldsymbol\Sigma^{-1} = \mathbf{I}_N \otimes \mathbf{R}_1^{-\mathsf{T}}\mathbf{R}_1^{-1}$, and that $\mathbf{R}_1^{-1}$ is a matrix that has top left element $b^{-1}$, ones on the rest of the main diagonal, $-\alpha$ on the first subdiagonal, and zeroes everywhere else. Solving for $\boldsymbol\beta$ and $\sigma^2$, we obtain

$$\hat{\boldsymbol\beta}(\alpha) = \bigl(\mathbf{X}_0'\boldsymbol\Sigma^{-1}\mathbf{X}_0\bigr)^{-1}\mathbf{X}_0'\boldsymbol\Sigma^{-1}\mathbf{Y}_0 \quad\text{and}\quad \hat\sigma^2(\alpha) = \frac{1}{N(T+1)}\bigl(\mathbf{Y}_0 - \mathbf{X}_0\hat{\boldsymbol\beta}(\alpha)\bigr)'\boldsymbol\Sigma^{-1}\bigl(\mathbf{Y}_0 - \mathbf{X}_0\hat{\boldsymbol\beta}(\alpha)\bigr),$$

which can be substituted back into $\dot\ell_\alpha$ to yield the profile score function

$$\dot\ell_\alpha(\alpha, \boldsymbol\beta, \sigma^2)\Big|_{\hat{\boldsymbol\beta}(\alpha), \hat\sigma^2(\alpha)} = -\frac12\operatorname{tr}\bigl(\boldsymbol\Sigma^{-1}\dot{\boldsymbol\Sigma}_\alpha\bigr) + \frac{N(T+1)}{2}\,\frac{\mathbf{Y}_0'\mathbf{M}_{\boldsymbol\Sigma}'\boldsymbol\Sigma^{-1}\dot{\boldsymbol\Sigma}_\alpha\boldsymbol\Sigma^{-1}\mathbf{M}_{\boldsymbol\Sigma}\mathbf{Y}_0}{\mathbf{Y}_0'\mathbf{M}_{\boldsymbol\Sigma}'\boldsymbol\Sigma^{-1}\mathbf{M}_{\boldsymbol\Sigma}\mathbf{Y}_0} = -N\operatorname{tr}\bigl(\mathbf{R}_1^{-1}\dot{\mathbf{R}}_1\bigr) + \frac{N(T+1)}{2}\,\frac{\mathbf{Y}_0'\mathbf{M}_{\boldsymbol\Sigma}'\boldsymbol\Sigma^{-1}\dot{\boldsymbol\Sigma}_\alpha\boldsymbol\Sigma^{-1}\mathbf{M}_{\boldsymbol\Sigma}\mathbf{Y}_0}{\mathbf{Y}_0'\mathbf{M}_{\boldsymbol\Sigma}'\boldsymbol\Sigma^{-1}\mathbf{M}_{\boldsymbol\Sigma}\mathbf{Y}_0},$$

with idempotent matrix $\mathbf{M}_{\boldsymbol\Sigma} \equiv \mathbf{I}_{N(T+1)} - \mathbf{X}_0\bigl(\mathbf{X}_0'\boldsymbol\Sigma^{-1}\mathbf{X}_0\bigr)^{-1}\mathbf{X}_0'\boldsymbol\Sigma^{-1}$.

5.1.1 Quantile Unbiased Estimation

Denote by $\eta_3(\alpha)$ the stochastic part of the profile score function, i.e., let

$$\eta_3(c) \equiv \frac{\mathbf{Y}_0'\mathbf{M}_{\boldsymbol\Sigma}'\boldsymbol\Sigma^{-1}\dot{\boldsymbol\Sigma}_c\boldsymbol\Sigma^{-1}\mathbf{M}_{\boldsymbol\Sigma}\mathbf{Y}_0}{\mathbf{Y}_0'\mathbf{M}_{\boldsymbol\Sigma}'\boldsymbol\Sigma^{-1}\mathbf{M}_{\boldsymbol\Sigma}\mathbf{Y}_0}, \tag{17}$$

where, for the sake of notational continuity, we have changed the argument of $\eta_3$ (and of the matrices involved) from $\alpha$ to $c$. We require an expression for the probability $\Pr_c\bigl(\eta_3(c) \le \eta_3(c, \Upsilon)\bigr)$, where $\eta_3(c, \Upsilon)$ is the observed value of the estimating function at $c$. It is clear from scaling and the fact that $\mathbf{M}_{\boldsymbol\Sigma}\mathbf{X}_0 = \mathbf{0}$ that the distribution of $\eta_3(c)$ is free of $(\boldsymbol\beta, \sigma^2)$, so that

$$\Pr_c\bigl(\eta_3(c) \le \eta_3(c, \Upsilon)\bigr) = \Pr_c\left(\frac{\mathbf{Y}_0^{\ell\prime}\mathbf{M}_{\boldsymbol\Sigma}'\boldsymbol\Sigma^{-1}\dot{\boldsymbol\Sigma}_c\boldsymbol\Sigma^{-1}\mathbf{M}_{\boldsymbol\Sigma}\mathbf{Y}_0^\ell}{\mathbf{Y}_0^{\ell\prime}\mathbf{M}_{\boldsymbol\Sigma}'\boldsymbol\Sigma^{-1}\mathbf{M}_{\boldsymbol\Sigma}\mathbf{Y}_0^\ell} \le \eta_3(c, \Upsilon)\right) = \Pr\left(\frac{\mathbf{U}_0'\mathbf{R}'\mathbf{M}_{\boldsymbol\Sigma}'\boldsymbol\Sigma^{-1}\dot{\boldsymbol\Sigma}_c\boldsymbol\Sigma^{-1}\mathbf{M}_{\boldsymbol\Sigma}\mathbf{R}\mathbf{U}_0}{\mathbf{U}_0'\mathbf{R}'\mathbf{M}_{\boldsymbol\Sigma}'\mathbf{R}^{-\mathsf{T}}\mathbf{R}^{-1}\mathbf{M}_{\boldsymbol\Sigma}\mathbf{R}\mathbf{U}_0} \le \eta_3(c, \Upsilon)\right) = \Pr\left(\frac{\mathbf{U}_0'\mathbf{M}_{\mathbf{R}}\mathbf{R}^{-1}\dot{\boldsymbol\Sigma}_c\mathbf{R}^{-\mathsf{T}}\mathbf{M}_{\mathbf{R}}\mathbf{U}_0}{\mathbf{U}_0'\mathbf{M}_{\mathbf{R}}\mathbf{U}_0} \le \eta_3(c, \Upsilon)\right),$$

where the last equality follows with

$$\mathbf{R}^{-1}\mathbf{M}_{\boldsymbol\Sigma}\mathbf{R} = \mathbf{R}^{-1}\bigl(\mathbf{I}_{N(T+1)} - \mathbf{X}_0(\mathbf{X}_0'\boldsymbol\Sigma^{-1}\mathbf{X}_0)^{-1}\mathbf{X}_0'\boldsymbol\Sigma^{-1}\bigr)\mathbf{R} = \mathbf{R}^{-1}\bigl(\mathbf{I}_{N(T+1)} - \mathbf{X}_0(\mathbf{X}_0'\boldsymbol\Sigma^{-1}\mathbf{X}_0)^{-1}\mathbf{X}_0'\mathbf{R}^{-\mathsf{T}}\mathbf{R}^{-1}\bigr)\mathbf{R} = \mathbf{I}_{N(T+1)} - \mathbf{R}^{-1}\mathbf{X}_0\bigl(\mathbf{X}_0'\boldsymbol\Sigma^{-1}\mathbf{X}_0\bigr)^{-1}\mathbf{X}_0'\mathbf{R}^{-\mathsf{T}} \equiv \mathbf{M}_{\mathbf{R}},$$

with the symmetric and idempotent matrix $\mathbf{M}_{\mathbf{R}}$ so defined. Therefore, the quantile unbiased estimating equation becomes

$$c: \Pr\left(\frac{\mathbf{U}_0'\mathbf{A}(c)\mathbf{U}_0}{\mathbf{U}_0'\mathbf{B}(c)\mathbf{U}_0} \le \eta_3(c, \Upsilon)\right) - q = 0, \tag{18}$$

where $\mathbf{A}(c) = \mathbf{M}_{\mathbf{R}}\mathbf{R}^{-1}\dot{\boldsymbol\Sigma}_c\mathbf{R}^{-\mathsf{T}}\mathbf{M}_{\mathbf{R}}$ and $\mathbf{B}(c) = \mathbf{M}_{\mathbf{R}}$. This lends itself to calculation by means of our proposed saddlepoint approximation.

5.1.2 Mean Adjusted Estimation

A mean unbiased estimating function can be constructed by subtracting from (17) its mean under a true parameter of $c$, given by

$$E_c\bigl[\eta_3(c)\bigr] = E_c\left[\frac{\mathbf{Y}_0'\mathbf{M}_{\boldsymbol\Sigma}'\boldsymbol\Sigma^{-1}\dot{\boldsymbol\Sigma}_c\boldsymbol\Sigma^{-1}\mathbf{M}_{\boldsymbol\Sigma}\mathbf{Y}_0}{\mathbf{Y}_0'\mathbf{M}_{\boldsymbol\Sigma}'\boldsymbol\Sigma^{-1}\mathbf{M}_{\boldsymbol\Sigma}\mathbf{Y}_0}\right] = E_c\left[\frac{\mathbf{U}_0'\mathbf{M}_{\mathbf{R}}\mathbf{R}^{-1}\dot{\boldsymbol\Sigma}_c\mathbf{R}^{-\mathsf{T}}\mathbf{M}_{\mathbf{R}}\mathbf{U}_0}{\mathbf{U}_0'\mathbf{M}_{\mathbf{R}}\mathbf{U}_0}\right] = \frac{\operatorname{tr}\bigl(\mathbf{M}_{\mathbf{R}}\mathbf{R}^{-1}\dot{\boldsymbol\Sigma}_c\mathbf{R}^{-\mathsf{T}}\mathbf{M}_{\mathbf{R}}\bigr)}{\operatorname{tr}\mathbf{M}_{\mathbf{R}}} = \frac{2\operatorname{tr}\bigl(\bigl(\mathbf{I}_N \otimes \mathbf{R}_1^{-1}\dot{\mathbf{R}}_1\bigr)\mathbf{M}_{\mathbf{R}}\bigr)}{N(T+1)-k}, \tag{19}$$

where the last equality follows because, due to its special form, the ratio of quadratic forms appearing in the profile score is independent of its own denominator (Pitman, 1937), whence the mean of the ratio equals the ratio of the means. This has the advantage that, unlike in the least squares based counterpart (13), there are no unsolved integrals in (19), allowing for massive savings in computation time. The mean adjusted estimator therefore solves

$$c: \eta_3(c) - \frac{2\operatorname{tr}\bigl(\bigl(\mathbf{I}_N \otimes \mathbf{R}_1^{-1}\dot{\mathbf{R}}_1\bigr)\mathbf{M}_{\mathbf{R}}\bigr)}{N(T+1)-k} = 0. \tag{20}$$

This is the adjusted profile likelihood estimator of McCullagh and Tibshirani (1990), which, in this case, coincides with the marginal likelihood estimator of Wilson (1989). However, McCullagh and Tibshirani derive (20) only as an approximation, overlooking the fact that the mean of the ratio equals the ratio of the means. If there are no exogenous regressors, i.e., $\mathbf{M}_{\mathbf{R}} = \mathbf{I}_{N(T+1)}$ and $k = 0$, then the so-defined estimator coincides with the MLE of $\alpha$.

5.2 Case II: Identical Regressors, Arbitrary Variance

For the model with unequal variances and identical regressors, the log likelihood becomes, after dropping the constant,

$$\ell\bigl(\alpha, \boldsymbol\beta, \{\sigma_i^2\}\bigr) = -\frac12\log|\mathbf{V}_u \otimes \boldsymbol\Sigma_1| - \frac12(\mathbf{Y}_0 - \mathbf{X}_0\boldsymbol\beta)'\bigl(\mathbf{V}_u \otimes \boldsymbol\Sigma_1\bigr)^{-1}(\mathbf{Y}_0 - \mathbf{X}_0\boldsymbol\beta)$$
$$= -\frac{T+1}{2}\log|\mathbf{V}_u| - \frac{N}{2}\log|\boldsymbol\Sigma_1| - \frac12(\mathbf{Y}_0 - \mathbf{X}_0\boldsymbol\beta)'\bigl(\mathbf{V}_u^{-1} \otimes \boldsymbol\Sigma_1^{-1}\bigr)(\mathbf{Y}_0 - \mathbf{X}_0\boldsymbol\beta)$$
$$= -\frac{T+1}{2}\sum_{i=1}^N \log\sigma_i^2 - \frac{N}{2}\log|\boldsymbol\Sigma_1| - \sum_{i=1}^N \frac{1}{2\sigma_i^2}(\mathbf{Y}_{i,0} - \mathbf{X}_{i,0}\boldsymbol\beta_i)'\boldsymbol\Sigma_1^{-1}(\mathbf{Y}_{i,0} - \mathbf{X}_{i,0}\boldsymbol\beta_i),$$

where $\boldsymbol\Sigma_1 = \boldsymbol\Sigma_1(\alpha) \equiv \mathbf{R}_1\mathbf{R}_1'$, and $\mathbf{R}_1$ is as in (34). The score functions are

$$\dot\ell_{\boldsymbol\beta_i}\bigl(\alpha, \boldsymbol\beta_i, \sigma_i^2\bigr) = \frac{1}{\sigma_i^2}\mathbf{X}_{i,0}'\boldsymbol\Sigma_1^{-1}(\mathbf{Y}_{i,0} - \mathbf{X}_{i,0}\boldsymbol\beta_i),$$

$$\dot\ell_{\sigma_i^2}\bigl(\alpha, \boldsymbol\beta, \sigma_i^2\bigr) = -\frac{T+1}{2\sigma_i^2} + \frac{1}{2\sigma_i^4}(\mathbf{Y}_{i,0} - \mathbf{X}_{i,0}\boldsymbol\beta_i)'\boldsymbol\Sigma_1^{-1}(\mathbf{Y}_{i,0} - \mathbf{X}_{i,0}\boldsymbol\beta_i),$$

and

$$\dot\ell_\alpha\bigl(\alpha, \boldsymbol\beta, \{\sigma_i^2\}\bigr) = -N\operatorname{tr}\bigl(\mathbf{R}_1^{-1}\dot{\mathbf{R}}_1\bigr) + \sum_{i=1}^N \frac{1}{\sigma_i^2}(\mathbf{Y}_{i,0} - \mathbf{X}_{i,0}\boldsymbol\beta_i)'\mathbf{R}_1^{-\mathsf{T}}\mathbf{R}_1^{-1}\dot{\mathbf{R}}_1\mathbf{R}_1^{-1}(\mathbf{Y}_{i,0} - \mathbf{X}_{i,0}\boldsymbol\beta_i),$$

from which

$$\hat{\boldsymbol\beta}_i(\alpha) = \bigl(\mathbf{X}_{i,0}'(\mathbf{R}_1\mathbf{R}_1')^{-1}\mathbf{X}_{i,0}\bigr)^{-1}\mathbf{X}_{i,0}'(\mathbf{R}_1\mathbf{R}_1')^{-1}\mathbf{Y}_{i,0}$$

and

$$\hat\sigma_i^2(\alpha) = \frac{1}{T+1}\bigl(\mathbf{Y}_{i,0} - \mathbf{X}_{i,0}\hat{\boldsymbol\beta}_i(\alpha)\bigr)'(\mathbf{R}_1\mathbf{R}_1')^{-1}\bigl(\mathbf{Y}_{i,0} - \mathbf{X}_{i,0}\hat{\boldsymbol\beta}_i(\alpha)\bigr).$$

Substituting back into $\dot\ell_\alpha$, we obtain the profile score function

$$\dot\ell_\alpha\bigl(\alpha, \boldsymbol\beta, \{\sigma_i^2\}\bigr)\Big|_{\hat{\boldsymbol\beta}(\alpha), \hat\sigma^2(\alpha)} = -N\operatorname{tr}\bigl(\mathbf{R}_1^{-1}\dot{\mathbf{R}}_1\bigr) + (T+1)\sum_{i=1}^N \frac{\mathbf{Y}_{i,0}'\mathbf{M}_{\boldsymbol\Sigma_1}'\mathbf{R}_1^{-\mathsf{T}}\mathbf{R}_1^{-1}\dot{\mathbf{R}}_1\mathbf{R}_1^{-1}\mathbf{M}_{\boldsymbol\Sigma_1}\mathbf{Y}_{i,0}}{\mathbf{Y}_{i,0}'\mathbf{M}_{\boldsymbol\Sigma_1}'(\mathbf{R}_1\mathbf{R}_1')^{-1}\mathbf{M}_{\boldsymbol\Sigma_1}\mathbf{Y}_{i,0}},$$

with idempotent matrix $\mathbf{M}_{\boldsymbol\Sigma_1} = \mathbf{I}_{T+1} - \mathbf{X}_{i,0}\bigl(\mathbf{X}_{i,0}'\mathbf{R}_1^{-\mathsf{T}}\mathbf{R}_1^{-1}\mathbf{X}_{i,0}\bigr)^{-1}\mathbf{X}_{i,0}'\mathbf{R}_1^{-\mathsf{T}}\mathbf{R}_1^{-1}$.

5.2.1 Quantile Unbiased Estimation

Let

$$\eta_4(c) = \frac{1}{N}\sum_{i=1}^N \frac{\mathbf{Y}_{i,0}'\mathbf{M}_{\boldsymbol\Sigma_1}'\mathbf{R}_1^{-\mathsf{T}}\mathbf{R}_1^{-1}\dot{\mathbf{R}}_1\mathbf{R}_1^{-1}\mathbf{M}_{\boldsymbol\Sigma_1}\mathbf{Y}_{i,0}}{\mathbf{Y}_{i,0}'\mathbf{M}_{\boldsymbol\Sigma_1}'(\mathbf{R}_1\mathbf{R}_1')^{-1}\mathbf{M}_{\boldsymbol\Sigma_1}\mathbf{Y}_{i,0}} \tag{21}$$

denote the stochastic part of the profile score function. From scaling and the fact that $\mathbf{M}_{\boldsymbol\Sigma_1}\mathbf{X}_{i,0} = \mathbf{0}$, the distribution of $\eta_4(c)$ is free of $\bigl(\boldsymbol\beta, \{\sigma_i^2\}\bigr)$, and we have that

$$\Pr\bigl(\eta_4(c) \le \eta_4(c, \Upsilon)\bigr) = \Pr\left(\frac{1}{N}\sum_{i=1}^N \frac{\mathbf{U}_{i,0}'\mathbf{M}_{\mathbf{R}_1}\bigl(\mathbf{R}_1^{-1}\dot{\mathbf{R}}_1\bigr)'\mathbf{M}_{\mathbf{R}_1}\mathbf{U}_{i,0}}{\mathbf{U}_{i,0}'\mathbf{M}_{\mathbf{R}_1}\mathbf{U}_{i,0}} \le \eta_4(c, \Upsilon)\right),$$

with symmetric and idempotent matrix

$$\mathbf{M}_{\mathbf{R}_1} = \mathbf{I}_{T+1} - \mathbf{R}_1^{-1}\mathbf{X}_{i,0}\bigl(\mathbf{X}_{i,0}'\mathbf{R}_1^{-\mathsf{T}}\mathbf{R}_1^{-1}\mathbf{X}_{i,0}\bigr)^{-1}\mathbf{X}_{i,0}'\mathbf{R}_1^{-\mathsf{T}}.$$

Hence, the quantile unbiased estimator solves

$$c: \Pr\left(\frac{1}{N}\sum_{i=1}^N \frac{\mathbf{U}_{i,0}'\mathbf{A}(c)\mathbf{U}_{i,0}}{\mathbf{U}_{i,0}'\mathbf{B}(c)\mathbf{U}_{i,0}} \le \eta_4(c, \Upsilon)\right) - q = 0,$$

where $\mathbf{A}(c) = \mathbf{M}_{\mathbf{R}_1}\bigl(\mathbf{R}_1^{-1}\dot{\mathbf{R}}_1\bigr)'\mathbf{M}_{\mathbf{R}_1}$ and $\mathbf{B}(c) = \mathbf{M}_{\mathbf{R}_1}$.

5.2.2 Mean Adjusted Estimation

Similar to Section 5.1.2, the mean of $\eta_4(c)$ in (21) under a true parameter of $c$ is

$$E_c\bigl[\eta_4(c)\bigr] = \frac{\operatorname{tr}\bigl(\bigl(\mathbf{R}_1^{-1}\dot{\mathbf{R}}_1\bigr)'\mathbf{M}_{\mathbf{R}_1}\bigr)}{T+1-k_1},$$

so that the mean unbiased estimating equation can be written as

$$c: \eta_4(c, \Upsilon) - \frac{\operatorname{tr}\bigl(\bigl(\mathbf{R}_1^{-1}\dot{\mathbf{R}}_1\bigr)'\mathbf{M}_{\mathbf{R}_1}\bigr)}{T+1-k_1} = 0.$$

This is a panel version of the adjusted profile likelihood estimator of McCullagh and Tibshirani (1990), and also of the marginal likelihood estimator of Wilson (1989). As before, if there are no exogenous regressors, then the estimator coincides with the MLE of $\alpha$.

6 Numerical Results

6.1 Point Estimation

In order to exemplify the virtues of our proposed point estimators, a simulation study was conducted with 1,000 samples from a model with individual dummies and time trends, and standard normal innovations. The sample sizes used included all possible combinations of N ∈ {10, 25, 50} and T + 1 ∈ {10, 25, 50}. For all sample sizes under investigation, the relative performances of the estimators were extremely similar; as such, we only report the results for N = 50 and T + 1 = 10, a fairly typical setup when working with panel data. The results for the other configurations are available from the authors upon request.

In Figure 1, solid lines represent the MLE and its median unbiased counterpart, whereas dashed lines refer to the OLS-based estimators. To unclutter the graphs, the results for the mean adjusted estimators are not shown: their performance was generally very close to that of the median unbiased ML-based estimator.

Regarding the plain ML and OLS estimators, the top row of Figure 1 holds little surprise: they are asymptotically (in T) equivalent, and asymptotically (in N) biased. This is to be contrasted with the median unbiased point estimators, both of which exhibit a small mean bias only near α = 1, and are virtually median unbiased. This naturally has dramatic consequences for the root mean squared error (RMSE), shown in the rightmost column. In terms of this latter measure, the ML-based median unbiased estimator has a slight edge over the OLS-based one, while both improve on the RMSE of the standard estimators by a factor of up to 6.

The bottom row of the same figure depicts the results for the model with unequal individual variances. As the theory suggests, the ML-based median unbiased estimator maintains its outstanding performance over the ML and OLS estimators in terms of both bias and RMSE. The OLS-based median-"unbiased" estimator shows a somewhat different picture: while still far ahead of the plain MLE and OLSE, it is no longer (approximately exactly) median unbiased, especially for values of α close to the stationarity border; this must be attributed to the loss of accuracy of the associated saddlepoint approximation, as explained in the appendix. The (not shown) OLS-based mean adjusted estimator, on the other hand, does not suffer from this deficiency and still performs close to the ML-based variant.

As a final remark, a comparison of the top and bottom rows of Figure 1 shows that the ML-based median unbiased estimator suffers almost no loss in accuracy as the assumption of equal individual variances is dropped.

[Figure 1 here. Panel columns: Mean Bias, Median Bias, RMSE; panel rows: Equal Variances, Unequal Variances.]

Figure 1: Bias and Root Mean Squared Error of MLE (solid) and OLS (dashes) estimators and their median unbiased counterparts, for a model with N = 50, T + 1 = 10, $\mathbf{x}_{it} = [1\ t]$.

6.2 Interval Estimation

This section investigates the quality of the ML and OLS–based quantile unbiased interval estimates. For all combinations of N ∈ {10, 25, 50} and T + 1 ∈ {10, 25, 50}, and using the same

samples as in Section 6.1, 1,000 90% equal-tail intervals for α were computed based on (5) with q = (1 ± 0.9)/2. Figure 2 contains the results for the homoscedastic case, with the average length

of the intervals being shown on the left scale, and the empirical coverage on the right scale. The

dotted lines represent 5% critical values of a two-sided binomial test of the null hypothesis that the true coverage of each intervals equals its nominal value, i.e., 90%. Similar to the results for the point estimators in the homoscedastic case, the results for the OLS (dashes) and MLE (solid) based intervals do not differ much, with empirical coverages lying well within the critical values for all sample sizes shown. The differences between estimators are substantially more pronounced for the heteroscedastic case. Figure 3 contains the results. While for the ML-based confidence intervals, the null hypothesis that the true coverage equals the nominal value is again well supported, this is not true for the OLS–based intervals, even though the latter tend to be considerably longer on average. As before, this must be attributed to the decreased accuracy of the associated saddlepoint approximation. In particular, the accuracy of the approximation deteriorates as i) N increases, 14

ii) T decreases, and iii) α approaches the stationarity border. This is exactly reflected in the performance of the OLS–based interval estimates.

The comments made in the previous section regarding the relative performance of the ML-based estimators in the homoscedastic and heteroscedastic cases remain valid in the context of interval estimation. Considering that in many, if not most, empirical applications, the assumption of equal individual variances is questionable at best, this behavior is fortunate, as it will allow researchers to abandon this restriction altogether.
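The 5% critical values of the two-sided binomial coverage test mentioned above are easy to reproduce; a minimal sketch assuming, as in the text, 1,000 intervals of nominal coverage 90%:

```python
from scipy.stats import binom

n, p0, level = 1000, 0.90, 0.05   # intervals, nominal coverage, test size

# Equal-tail critical values for the observed coverage proportion:
# reject H0: coverage = 0.90 if the empirical coverage falls outside [lo, hi].
lo = binom.ppf(level / 2, n, p0) / n
hi = binom.ppf(1 - level / 2, n, p0) / n
print(f"acceptance band for empirical coverage: [{lo:.3f}, {hi:.3f}]")
```

These bounds correspond to the dotted bands drawn in Figures 2 and 3.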

Figure 2: Mean length (left scale) and empirical coverage (right scale) of OLS–based (dashes) and ML-based (solid) δ = 90% equal-tail confidence intervals, for the case with equal individual variances. The dotted lines represent 5% critical values of a two-sided binomial test of H0 : δ = 0.9.



Figure 3: Mean length (left scale) and empirical coverage (right scale) of OLS–based (dashes) and ML-based (solid) δ = 90% equal-tail confidence intervals, for the case with unequal individual variances. The dotted lines represent 5% critical values of a two-sided binomial test of H0 : δ = 0.9.


7 Conclusions

This paper developed a general method for median unbiased estimation and the construction of exact confidence intervals. The two cornerstones of our approach are the notion of a quantile unbiased estimating function, and a saddlepoint approximation to the tail probabilities of the original estimating function. The method was demonstrated to be capable of extremely accurate inference in the first-order dynamic panel model, without having to rely on any asymptotic arguments. As regards point estimation, it not only successfully tackles the problem of asymptotic bias, but essentially removes the bias, even for samples as small as N = 10, T + 1 = 10. Furthermore, the method was shown to produce valid confidence intervals regardless of sample size.

We have demonstrated how the estimator of Phillips and Sul (2003) can be embedded in our methodology. In the case with equal individual variances, the results from their least-squares based approach are very similar to our likelihood-based method. However, in the setting with unequal individual variances, this is no longer the case, at least if the time-consuming simulations are to be replaced with our saddlepoint approximation. We consider it an important feature of our likelihood-based method that the properties of both the point and interval estimates are virtually unaltered as the assumption of equal individual variances is dropped, as it allows us to recommend that this restriction be abandoned wherever it is maintained for the sole sake of feasibility.

The methods developed herein are not restricted to the first-order DPD model. The concept of quantile unbiased estimating functions is quite general, and especially useful in linear models with a covariance matrix depending on a single parameter. The saddlepoint approximation to the mean of ratios of quadratic forms has other applications as well, certainly in panel model contexts. We leave those for future research.


Appendices

A A Saddlepoint Approximation for the Mean of i.i.d. Ratios of Quadratic Forms

This section develops saddlepoint approximations for the density and distribution function of the mean of N i.i.d. ratios of quadratic forms in standard normal variates. Saddlepoint approximations typically require that the cumulant generating function (cgf) K_X(s) of the random variable X in question be available in a serviceable form. In such cases, the approximations extend straightforwardly to the mean of N such random variables, the reason being that the cgf of a sum of N i.i.d. copies of X is simply N K_X(s). The random variables to be dealt with here do not, however, permit a closed form for their cgf, and thus require custom treatment. A saddlepoint approximation for the density of

$$R \equiv \frac{X'A_1X}{X'A_2X}, \qquad A_2 \geq 0, \qquad X \sim N(0, \sigma^2 I_T), \tag{22}$$

with A₁, A₂ symmetric, is given by (Lieberman, 1994)

$$\hat f_1(r) = \frac{\phi(\hat w)\operatorname{tr} K_2}{\sqrt{2\operatorname{tr} K_3^2}}, \tag{23}$$

where ŵ = (log |D|)^{1/2} sgn(r − µ), D = I − 2ŝA₃, A₃ = A₁ − rA₂, µ = tr A₁ / tr A₂, Kᵢ = AᵢD⁻¹, i ∈ {2, 3}, φ denotes the standard normal pdf, and, with λᵢ denoting the eigenvalues of A₃, the saddlepoint ŝ solves

$$\operatorname{tr} K_3 = \sum_{i=1}^{T} \frac{\lambda_i}{1 - 2\hat s\lambda_i} = 0. \tag{24}$$
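The saddlepoint equation (24) has a unique root: tr K₃ is strictly increasing in ŝ on the interval where D stays positive definite, so a bracketed root-finder suffices. A minimal numerical sketch (the matrices below are illustrative, not from the paper's empirical setup):

```python
import numpy as np
from scipy.optimize import brentq

def solve_saddlepoint(A1, A2, r):
    """Solve eq. (24): tr K3 = sum_i lam_i / (1 - 2 s lam_i) = 0,
    where lam_i are the eigenvalues of A3 = A1 - r*A2."""
    lam = np.linalg.eigvalsh(A1 - r * A2)
    lam = lam[np.abs(lam) > 1e-12]            # drop numerically zero eigenvalues
    tr_K3 = lambda s: np.sum(lam / (1.0 - 2.0 * s * lam))
    # D = I - 2 s A3 is positive definite for s in (1/(2 min lam), 1/(2 max lam));
    # tr K3 runs from -inf to +inf on that interval and is strictly increasing.
    eps = 1e-9
    return brentq(tr_K3, 1.0 / (2.0 * lam.min()) + eps,
                  1.0 / (2.0 * lam.max()) - eps)

# At r = mu = tr A1 / tr A2 the eigenvalues of A3 sum to zero, so s = 0 solves (24).
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A1 = (B + B.T) / 2.0
A2 = np.eye(5)
mu = np.trace(A1) / np.trace(A2)
s_at_mu = solve_saddlepoint(A1, A2, mu)        # essentially zero
s_right = solve_saddlepoint(A1, A2, mu + 0.1)  # positive for r > mu
```

The bracket excludes all poles of tr K₃, so the sign change required by the root-finder is guaranteed whenever A₃ has eigenvalues of both signs.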

Interest centers on the distribution of the mean R̄ of N identical and independent copies of R. For ease of exposition, we first consider the random variable

$$S = \sum_{i=1}^{N} R_i,$$

where each of the Rᵢ is identically and independently distributed as (22). Tierney et al. (1989a) show that for an N-dimensional random vector y having density proportional to f(y) = b(y) exp(−H(y)), a saddlepoint approximation to the marginal density of a (sufficiently smooth) scalar function x = g(y) is given by

$$\hat f(x) = \frac{1}{\sqrt{2\pi}} \left[ \frac{|H''(\hat y)|}{|H''(\hat y_x)| \, \nabla g(\hat y_x)' H''(\hat y_x)^{-1} \nabla g(\hat y_x)} \right]^{1/2} \frac{f(\hat y_x)}{f(\hat y)}, \tag{25}$$

where H′′ denotes the Hessian of H, ŷ is a local minimizer of H, and ŷₓ minimizes H subject to g(y) = x. In the i.i.d. case, the joint density takes the form

$$f(y) = \prod_i f_1(y_i) = \left[ \prod_i b_1(y_i) \right] \exp\left( -\sum_i h(y_i) \right),$$

and the Hessian becomes H′′(y) = diag(h′′(y₁), …, h′′(y_N)). If g(y) = Σᵢ yᵢ, then ∇g(y) = (1, …, 1), and, under certain regularity conditions, ŷₓ = (x/N, …, x/N). Furthermore, ŷ = (ŷ₁, …, ŷ₁), and we can write

$$\hat f(x) = \frac{1}{\sqrt{2\pi}} \left[ \frac{h''(\hat y_1)^N}{h''(x/N)^{N-1} N} \right]^{1/2} \frac{f_1(x/N)^N}{f_1(\hat y_1)^N}.$$

Taking f₁(yᵢ) ≡ f̂₁(yᵢ), i.e., the saddlepoint density given in (23), we have h(yᵢ) = ½ log |D|. Note that the matrix D, as well as A₃, Kᵢ, and the quantities ŝ and ŵ, depend on the argument of f̂₁; however, in a slight abuse of notation, we shall not make this explicit. We then have

$$h'(y_i) = \frac{1}{2} \operatorname{tr} D^{-1} \left( -2 \frac{\partial \hat s}{\partial y_i} A_3 + 2\hat s A_2 \right) = \hat s \operatorname{tr} K_2 \tag{26}$$

and

$$h''(y_i) = \frac{\partial \hat s}{\partial y_i} \operatorname{tr} K_2 - \hat s \operatorname{tr} A_2 D^{-1} \left( -2 \frac{\partial \hat s}{\partial y_i} A_3 + 2\hat s A_2 \right) D^{-1} = \frac{\partial \hat s}{\partial y_i} \operatorname{tr} K_2 + 2\hat s \frac{\partial \hat s}{\partial y_i} \operatorname{tr} K_2K_3 - 2\hat s^2 \operatorname{tr} K_2^2.$$

Differentiating (24),

$$\frac{\partial \hat s}{\partial y_i} = \frac{2\hat s \operatorname{tr} K_2K_3 + \operatorname{tr} K_2}{2 \operatorname{tr} K_3^2}, \tag{27}$$

and, plugging in,

$$h''(y_i) = 2 \left( \frac{\partial \hat s}{\partial y_i} \right)^2 \operatorname{tr} K_3^2 - 2\hat s^2 \operatorname{tr} K_2^2.$$

For the case at hand, ŷ₁ = µ, where ŝ = 0, ŵ = 0, K₂ = A₂, and K₃ = A₁ − µA₂, so that

$$\hat f_1(\hat y_1) = \frac{1}{\sqrt{2\pi}} \frac{\operatorname{tr} A_2}{\sqrt{2 \operatorname{tr}\left( (A_1 - \mu A_2)^2 \right)}}.$$

Also,

$$h''(\hat y_1) = (\operatorname{tr} A_2)^2 / \left( 2 \operatorname{tr}\left( (A_1 - \mu A_2)^2 \right) \right),$$

and we obtain

$$\hat f(s) = \left[ \frac{(2\pi)^{N-1}}{h''(s/N)^{N-1} N} \right]^{1/2} \hat f_1(s/N)^N$$

for the sum of ratios. Similarly, for the mean R̄ = S/N, a univariate transformation shows that

$$\hat f(\bar r) = \left[ \frac{N(2\pi)^{N-1}}{h''(\bar r)^{N-1}} \right]^{1/2} \hat f_1(\bar r)^N = \left[ \frac{N(2\pi)^{N-1}}{h''(\bar r)^{N-1}} \right]^{1/2} \frac{\phi(\hat w)^N \operatorname{tr}^N K_2}{(2 \operatorname{tr} K_3^2)^{N/2}}$$

$$= \sqrt{\frac{N}{2\pi}} \left[ \frac{\operatorname{tr} K_2}{\sqrt{2 \operatorname{tr} K_3^2}} \right] \left[ \frac{\operatorname{tr}^2 K_2}{(2\hat s \operatorname{tr} K_2K_3 + \operatorname{tr} K_2)^2 - 4\hat s^2 \operatorname{tr} K_2^2 \operatorname{tr} K_3^2} \right]^{(N-1)/2} \exp\left( -\frac{N}{2} \hat w^2 \right).$$

The approximate cdf can be written

$$\hat F(x) = \int_{-\infty}^{x} \sqrt{\frac{N}{2\pi}} \left[ \frac{\operatorname{tr} K_2}{\sqrt{2 \operatorname{tr} K_3^2}} \right] \left[ \frac{\operatorname{tr}^2 K_2}{(2\hat s \operatorname{tr} K_2K_3 + \operatorname{tr} K_2)^2 - 4\hat s^2 \operatorname{tr} K_2^2 \operatorname{tr} K_3^2} \right]^{(N-1)/2} \exp\left( -\frac{N}{2} \hat w^2 \right) d\bar r$$

$$= \sqrt{\frac{N}{2\pi}} \int_{-\infty}^{\hat w(x)} q(\hat w) \exp\left( -\frac{N}{2} \hat w^2 \right) d\hat w, \tag{28}$$

where

$$q(\hat w) = \left[ \frac{\operatorname{tr} K_2}{\sqrt{2 \operatorname{tr} K_3^2}} \right] \left[ \frac{\operatorname{tr}^2 K_2}{(2\hat s \operatorname{tr} K_2K_3 + \operatorname{tr} K_2)^2 - 4\hat s^2 \operatorname{tr} K_2^2 \operatorname{tr} K_3^2} \right]^{(N-1)/2} \frac{d\bar r}{d\hat w}.$$

Temme (1982) shows that the value of an integral of the form (28) is approximately

$$\hat F(x) \approx q(0) \, \Phi\left( \sqrt{N} \hat w(x) \right) - \frac{q(\hat w(x)) - q(0)}{\sqrt{N} \hat w(x)} \, \phi\left( \sqrt{N} \hat w(x) \right),$$

where Φ is the standard normal cdf. It is easily seen that

$$\frac{d\hat w}{d\bar r} = \frac{\hat s \operatorname{tr} K_2}{\hat w},$$

so that

$$q(\hat w(x)) = \left[ \frac{\hat w(x)}{\hat s \sqrt{2 \operatorname{tr} K_3^2}} \right] \left[ \frac{\operatorname{tr}^2 K_2}{(2\hat s \operatorname{tr} K_2K_3 + \operatorname{tr} K_2)^2 - 4\hat s^2 \operatorname{tr} K_2^2 \operatorname{tr} K_3^2} \right]^{(N-1)/2},$$

and by applying l'Hôpital's rule twice to the quantity ŵ/ŝ, we find q(0) = 1. By defining

$$\hat w_n = \sqrt{N \log |D|} \, \operatorname{sgn}(\bar r - \mu)$$

and

$$\hat u_n \equiv \frac{\hat w_n}{q(\hat w)} = \hat s \sqrt{2N \operatorname{tr} K_3^2} \left[ \frac{(2\hat s \operatorname{tr} K_2K_3 + \operatorname{tr} K_2)^2 - 4\hat s^2 \operatorname{tr} K_2^2 \operatorname{tr} K_3^2}{\operatorname{tr}^2 K_2} \right]^{(N-1)/2},$$

and writing r̄ instead of x in the final result, the approximation can be expressed in the more familiar form

$$\hat F(\bar r) = \Phi(\hat w_n) + \phi(\hat w_n) \left( \frac{1}{\hat w_n} - \frac{1}{\hat u_n} \right).$$

However, for the case at hand, an alternative form of the approximate cdf due to Barndorff-Nielsen (1986, 1990) turns out to be more reliable: it is given by

$$\hat F^*(\bar r) = \Phi(w_n^*), \qquad w_n^* \equiv \hat w_n + \frac{1}{\hat w_n} \log \frac{\hat u_n}{\hat w_n}, \tag{29}$$

and is the approximation used throughout the paper. Note that, at r̄ = µ, ûₙ = ŵₙ = 0, so that (29) is not meaningful, and wₙ* should be replaced by its limit, which is given by

$$\lim_{\bar r\to\mu} w_n^* = \sqrt{\frac{2}{N \operatorname{tr} A_{3,\mu}^2}} \left[ \frac{\operatorname{tr} A_{3,\mu}^3}{3 \operatorname{tr} A_{3,\mu}^2} + (N-1) \frac{\operatorname{tr} A_2A_{3,\mu}}{\operatorname{tr} A_2} \right]. \tag{30}$$

This can be shown as follows. Since ŵₙ/ûₙ = q(ŵ) and q(ŵ(µ)) = q(0) = 1,

$$\lim_{\bar r\to\mu} \left( \frac{1}{\hat w_n} \log \frac{\hat u_n}{\hat w_n} \right) = \lim_{\bar r\to\mu} \frac{-\log q(\hat w_n)}{\hat w_n} = \left( \frac{0}{0} \right).$$

Using l'Hôpital's rule,

$$\lim_{\bar r\to\mu} \left( \frac{1}{\hat w_n} \log \frac{\hat u_n}{\hat w_n} \right) = \lim_{\bar r\to\mu} \frac{\hat u_n'\hat w_n - \hat u_n\hat w_n'}{\hat u_n\hat w_n\hat w_n'} = \lim_{\bar r\to\mu} \frac{\hat u_n'/\hat u_n - \hat w_n'/\hat w_n}{\hat w_n'} = \left( \frac{0}{0} \right)$$

$$\stackrel{\text{l'H}}{=} \lim_{\bar r\to\mu} \frac{\hat u_n''\hat w_n - \hat u_n\hat w_n''}{\hat u_n'\hat w_n\hat w_n' + \hat u_n(\hat w_n')^2 + \hat u_n\hat w_n\hat w_n''} = \left( \frac{0}{0} \right)$$

$$\stackrel{\text{l'H}}{=} \lim_{\bar r\to\mu} \frac{\hat u_n'''\hat w_n + \hat u_n''\hat w_n' - \hat u_n'\hat w_n'' - \hat u_n\hat w_n'''}{\hat u_n\hat w_n\hat w_n''' + 3\hat u_n\hat w_n'\hat w_n'' + 2\hat u_n'\hat w_n\hat w_n'' + 2\hat u_n'(\hat w_n')^2 + \hat u_n''\hat w_n\hat w_n'} = \lim_{\bar r\to\mu} \frac{\hat u_n''\hat w_n' - \hat u_n'\hat w_n''}{2\hat u_n'(\hat w_n')^2},$$

or, as 1 = lim_{r̄→µ} ûₙ/ŵₙ = lim_{r̄→µ} ûₙ′/ŵₙ′ by a further application of l'Hôpital's rule,

$$\lim_{\bar r\to\mu} \left( \frac{1}{\hat w_n} \log \frac{\hat u_n}{\hat w_n} \right) = \lim_{\bar r\to\mu} \frac{\hat u_n'' - \hat w_n''}{2(\hat u_n')^2},$$

evaluating which requires expressions for lim_{r̄→µ} ûₙ′, lim_{r̄→µ} ûₙ′′ and lim_{r̄→µ} ŵₙ′′. Straightforwardly, ûₙ′ = ŝ′(∗) + ŝ(∗)′, where

$$(\ast) = \sqrt{2N \operatorname{tr} K_3^2} \left[ \frac{(2\hat s \operatorname{tr} K_2K_3 + \operatorname{tr} K_2)^2 - 4\hat s^2 \operatorname{tr} K_2^2 \operatorname{tr} K_3^2}{\operatorname{tr}^2 K_2} \right]^{(N-1)/2}.$$

It is easy to see that lim_{r̄→µ} K₃ = A₃,µ, where A₃,µ = A₁ − µA₂. Thus,

$$\lim_{\bar r\to\mu} \hat u_n' = \lim_{\bar r\to\mu}(\hat s') \sqrt{2N \operatorname{tr} A_{3,\mu}^2}.$$

From (27) and using lim_{r̄→µ} K₂ = A₂,

$$\lim_{\bar r\to\mu} \hat s' = \lim_{\bar r\to\mu} \frac{2\hat s \operatorname{tr} K_2K_3 + \operatorname{tr} K_2}{2 \operatorname{tr} K_3^2} = \frac{\operatorname{tr} A_2}{2 \operatorname{tr} A_{3,\mu}^2},$$

so that

$$\lim_{\bar r\to\mu} \hat u_n' = \operatorname{tr} A_2 \sqrt{\frac{N}{2 \operatorname{tr} A_{3,\mu}^2}}.$$

It easily follows from (26) that

$$\hat w_n' = \frac{N\hat s \operatorname{tr} K_2}{\hat w_n},$$

and thus

$$\hat w_n'' = \frac{N\hat s' \operatorname{tr} K_2 \, \hat w_n + N\hat s \operatorname{tr} \dot K_2 \, \hat w_n - N\hat s \operatorname{tr} K_2 \, \hat w_n'}{\hat w_n^2},$$

where a matrix with a dot denotes the elementwise derivative. In the limit,

$$\lim_{\bar r\to\mu} \hat w_n'' = \lim_{\bar r\to\mu} \frac{N\hat s' \operatorname{tr} K_2 \, \hat w_n + N\hat s \operatorname{tr} \dot K_2 \, \hat w_n - N\hat s \operatorname{tr} K_2 \, \hat w_n'}{\hat w_n^2} = \lim_{\bar r\to\mu} \frac{N\hat s' \operatorname{tr} K_2 + N\hat s \operatorname{tr} \dot K_2 - (\hat w_n')^2}{\hat w_n} = \left( \frac{0}{0} \right)$$

$$\stackrel{\text{l'H}}{=} \lim_{\bar r\to\mu} \frac{N\hat s'' \operatorname{tr} K_2 + 2N\hat s' \operatorname{tr} \dot K_2 + N\hat s \operatorname{tr} \ddot K_2 - 2\hat w_n'\hat w_n''}{\hat w_n'} = \lim_{\bar r\to\mu} \frac{N\hat s'' \operatorname{tr} K_2 + 2N\hat s' \operatorname{tr} \dot K_2}{\hat w_n'} - 2 \lim_{\bar r\to\mu} \hat w_n''$$

$$\Rightarrow \quad \lim_{\bar r\to\mu} \hat w_n'' = \lim_{\bar r\to\mu} \frac{N\hat s'' \operatorname{tr} K_2 + 2N\hat s' \operatorname{tr} \dot K_2}{3\hat w_n'},$$

where K̇₂ = 2ŝ′K₂K₃ − 2ŝK₂², and therefore lim_{r̄→µ} K̇₂ = (tr A₂ / tr A₃,µ²) A₂A₃,µ.

Differentiating (27),

$$\hat s'' = \frac{4\hat s' \operatorname{tr} K_2K_3 \operatorname{tr} K_3^2 + 4\hat s \left( \operatorname{tr} \dot K_2K_3 + \operatorname{tr} K_2\dot K_3 \right) \operatorname{tr} K_3^2 + 2 \operatorname{tr} \dot K_2 \operatorname{tr} K_3^2}{4(\operatorname{tr} K_3^2)^2} - \frac{8\hat s \operatorname{tr} K_2K_3 \operatorname{tr} K_3\dot K_3 + 4 \operatorname{tr} K_2 \operatorname{tr} K_3\dot K_3}{4(\operatorname{tr} K_3^2)^2},$$

and, using K₃K̇₃ = −K₃K₂ + 2ŝ′K₃³ − 2ŝK₃²K₂,

$$\hat s'' = \frac{4\hat s' \operatorname{tr} K_2K_3 \operatorname{tr} K_3^2 + 4\hat s \left( \operatorname{tr} \dot K_2K_3 + \operatorname{tr} K_2\dot K_3 \right) \operatorname{tr} K_3^2 + 4\hat s' \operatorname{tr} K_2K_3 \operatorname{tr} K_3^2 - 4\hat s \operatorname{tr} K_2^2 \operatorname{tr} K_3^2}{4(\operatorname{tr} K_3^2)^2}$$

$$\qquad - \frac{8\hat s \operatorname{tr} K_2K_3 \operatorname{tr} K_3\dot K_3 - 4 \operatorname{tr} K_2 \operatorname{tr} K_3K_2 + 8\hat s' \operatorname{tr} K_2 \operatorname{tr} K_3^3 - 8\hat s \operatorname{tr} K_2 \operatorname{tr} K_3^2K_2}{4(\operatorname{tr} K_3^2)^2}.$$

In the limit, using the limiting expressions for K₂, K₃, and ŝ′,

$$\lim_{\bar r\to\mu} \hat s'' = \frac{2 \lim_{\bar r\to\mu}(\hat s') \operatorname{tr} A_2A_{3,\mu} \operatorname{tr} A_{3,\mu}^2 + \operatorname{tr} A_2 \operatorname{tr} A_{3,\mu}A_2 - 2 \lim_{\bar r\to\mu}(\hat s') \operatorname{tr} A_2 \operatorname{tr} A_{3,\mu}^3}{(\operatorname{tr} A_{3,\mu}^2)^2} = \frac{2 \operatorname{tr} A_2 \operatorname{tr} A_{3,\mu}A_2}{(\operatorname{tr} A_{3,\mu}^2)^2} - \frac{(\operatorname{tr} A_2)^2 \operatorname{tr} A_{3,\mu}^3}{(\operatorname{tr} A_{3,\mu}^2)^3}.$$

We now require an expression for lim_{r̄→µ} ûₙ′′. It is immediate that

$$\hat u_n'' = \hat s''(\ast) + 2\hat s'(\ast)' + \hat s(\ast)''$$

$$\Rightarrow \quad \lim_{\bar r\to\mu} \hat u_n'' = \lim_{\bar r\to\mu}(\hat s'') \sqrt{2N \operatorname{tr} A_{3,\mu}^2} + \frac{\operatorname{tr} A_2}{\operatorname{tr} A_{3,\mu}^2} \lim_{\bar r\to\mu}(\ast)',$$

and straightforward but tedious calculations reveal that

$$\lim_{\bar r\to\mu}(\ast)' = \sqrt{2N} \lim_{\bar r\to\mu} \left[ \frac{\operatorname{tr} K_3\dot K_3}{\sqrt{\operatorname{tr} K_3^2}} + (N-1)\sqrt{\operatorname{tr} K_3^2} \, \frac{\operatorname{tr} \dot K_2}{\operatorname{tr} K_2} \right] = \sqrt{\frac{2N}{\operatorname{tr} A_{3,\mu}^2}} \left[ \frac{\operatorname{tr} A_2 \operatorname{tr} A_{3,\mu}^3}{\operatorname{tr} A_{3,\mu}^2} + (N-2) \operatorname{tr} A_2A_{3,\mu} \right].$$

Plugging in and simplifying yields (30).

An important special case occurs if

$$A_2A_2 = A_2 \qquad \text{and} \qquad A_1 = A_2GA_2. \tag{31}$$

This constellation appears not only in our ML-based quantile unbiased estimation procedure, but also in the computation of, e.g., the null distribution of the Durbin-Watson statistic. In that case, we have that tr K₂ = rank A₂, and ûₙ and lim_{r̄→µ} wₙ* simplify to

$$\hat u_n = \hat s \sqrt{2N \operatorname{tr} K_3^2} \left[ \frac{2\hat s \operatorname{tr} K_2K_3 + \operatorname{rank} A_2}{\operatorname{rank} A_2} \right]^{(N-1)/2}$$

and

$$\lim_{\bar r\to\mu} w_n^* = \sqrt{\frac{2}{N}} \, \frac{\operatorname{tr} A_{3,\mu}^3}{3 \left[ \operatorname{tr} A_{3,\mu}^2 \right]^{3/2}},$$

respectively. It is further computationally advantageous to note that the nonzero eigenvalues λᵢ of A₃ in (24) satisfy λᵢ = ωᵢ − r̄, where ωᵢ are the eigenvalues of A₁.

The aforementioned case is also the one where the approximation performs most favorably. This can be attributed to the fact that (25) is based on a Laplace approximation to the marginalizing integral over the joint density, which assumes that the dominant contribution to the integral is from a neighborhood of the maximum of the integrand, and that the location of this maximum is dominated by the exponential term. Now in our application, the maximum of the exponential occurs at µ = tr A₁ / tr A₂, where ŵ = 0. If A₁ and A₂ satisfy the above conditions, then µ coincides with the mean of R̄, which, to order N⁻¹, maximizes the joint density.

The problems arising in other cases (lower accuracy, and a poorly normalized pdf) could potentially be remedied by incorporating the term tr K₂ / (2 tr K₃²)^{1/2} into the exponent of the component saddlepoint density, i.e.,

$$\hat f_1(r) = \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \hat w^2 - \frac{1}{2} \log\left( 2 \operatorname{tr} K_3^2 \right) + \log \operatorname{tr} K_2 \right\}, \tag{32}$$

and proceeding from there. This is what Tierney et al. (1989b) refer to as a fully exponential Laplace approximation. We shall not pursue this here, as, besides leading to less compact formulae, it would entail having to numerically find the mode of (32), where for every evaluation

of (32), the saddlepoint equation (24) would have to be solved, thus eroding the time savings otherwise associated with the use of the approximation.

In order to exemplify the virtues of our proposed approximation, we consider the mean of N Durbin-Watson statistics. This can be seen as a version of Bhargava et al. (1982)'s test for serial correlation in panels, adapted for the case of unequal individual variances. We need to evaluate

$$\Pr\left( \frac{1}{N} \sum_{i=1}^{N} \frac{U_i'R_1'M_1AM_1R_1U_i}{U_i'R_1'M_1R_1U_i} < \bar r \right),$$

where the (T + 1) × (T + 1) matrix A is given by

$$A = \begin{pmatrix} 1 & -1 & & & & \\ -1 & 2 & -1 & & & \\ & -1 & 2 & -1 & & \\ & & \ddots & \ddots & \ddots & \\ & & & -1 & 2 & -1 \\ & & & & -1 & 1 \end{pmatrix},$$

M₁ = I − X(X′X)⁻¹X′, and X is a (T + 1) × k₁ regressor matrix. Under the null of no serial correlation, R₁ = I, and the statistic is precisely in the form (31). Under the alternative of a serial correlation of α, R₁ is as in (34). For a model with T + 1 = 25 and a constant and a time trend as regressors, the simulated and saddlepoint null distributions of the mean of N ∈ {1, 10, 50} independent copies of the Durbin-Watson statistic depicted in the left panel of Figure 4 are graphically almost indistinguishable.

The right panel of the same figure shows the distribution under the alternative hypothesis that α = 0.95, demonstrating the loss of accuracy incurred when the statistic at hand is not in the form (31).
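Putting the pieces together for the Durbin-Watson illustration: the sketch below implements the general form of the approximation, that is, it solves (24), forms K₂ and K₃, and combines ŵₙ and ûₙ via (29). This is our reading of the formulas above, not the authors' code; function names are ours.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def bn_cdf_mean_ratio(A1, A2, rbar, N):
    """Approximate P(mean of N i.i.d. ratios X'A1X / X'A2X <= rbar)
    by F*(rbar) = Phi(w_n*), eq. (29)."""
    A3 = A1 - rbar * A2
    lam = np.linalg.eigvalsh(A3)
    nz = lam[np.abs(lam) > 1e-10]
    trK3 = lambda s: np.sum(nz / (1.0 - 2.0 * s * nz))
    eps = 1e-9
    s = brentq(trK3, 1/(2*nz.min()) + eps, 1/(2*nz.max()) - eps)  # eq. (24)
    D = np.eye(A3.shape[0]) - 2.0 * s * A3
    Dinv = np.linalg.inv(D)
    K2, K3 = A2 @ Dinv, A3 @ Dinv
    t2, t22 = np.trace(K2), np.trace(K2 @ K2)
    t23, t32 = np.trace(K2 @ K3), np.trace(K3 @ K3)
    mu = np.trace(A1) / np.trace(A2)
    w_n = np.sign(rbar - mu) * np.sqrt(N * np.linalg.slogdet(D)[1])
    delta = (2*s*t23 + t2)**2 - 4*s**2*t22*t32
    u_n = s * np.sqrt(2*N*t32) * (delta / t2**2) ** ((N - 1) / 2)
    return norm.cdf(w_n + np.log(u_n / w_n) / w_n)                # eq. (29)

# Null distribution of a single Durbin-Watson statistic (N = 1),
# T + 1 = 25, constant and trend regressors, evaluated at rbar = 2.
Tp1 = 25
A = 2.0 * np.eye(Tp1) - np.eye(Tp1, k=1) - np.eye(Tp1, k=-1)
A[0, 0] = A[-1, -1] = 1.0
X = np.column_stack([np.ones(Tp1), np.arange(1.0, Tp1 + 1)])
M1 = np.eye(Tp1) - X @ np.linalg.solve(X.T @ X, X.T)
p = bn_cdf_mean_ratio(M1 @ A @ M1, M1, 2.0, 1)  # A2 = M1 is idempotent: case (31)
```

Near r̄ = µ, ŵₙ and ûₙ both vanish and wₙ* must be replaced by its limit (30); the sketch does not handle that boundary.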

B Proofs

α̂_LS in (10) has a distribution free of (β′, σ). Neglecting the common factor restrictions on the 2k parameters in γ, computation of (10) requires the NT × 2k matrix Z to have full column rank, which may fail to hold, e.g., if the regressors include N individual dummies and T time dummies. Thus, if r = rank(Z) < 2k, replace Z by

$$\tilde Z = QW,$$

where QWV′ is the singular value decomposition of Z, i.e., Q and V are NT × r and 2k × r matrices, respectively, of full column rank r, and W is an r × r diagonal matrix of full rank, such that Z = QWV′ and Q′Q = V′V = I_r.¹

¹ Clearly, only r different parameters in γ are identified.
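The reduced-rank replacement Z̃ = QW is a one-line application of the thin SVD; a hedged sketch (the helper name and the collinear toy design are ours):

```python
import numpy as np

def reduced_rank_design(Z):
    """Return (Z_tilde, V) with Z_tilde = Q W of full column rank r = rank(Z),
    where Z = Q W V' is the thin SVD with numerically zero singular values dropped."""
    Q, w, Vt = np.linalg.svd(Z, full_matrices=False)
    r = int(np.sum(w > w.max() * max(Z.shape) * np.finfo(float).eps))
    return Q[:, :r] * w[:r], Vt[:r].T

# Toy design with an exact collinearity: the third column is the sum of the first two.
Z = np.column_stack([np.ones(6), np.arange(6.0), 1.0 + np.arange(6.0)])
Z_tilde, V = reduced_rank_design(Z)
```

The residual-maker built from Z̃ coincides with the one built from Z via a generalized inverse: I − Z̃(Z̃′Z̃)⁻¹Z̃′ = I − QQ′.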



Figure 4: Empirical (solid) and saddlepoint (dotted) cdf of the mean of N ∈ {1, 10, 50} Durbin-Watson statistics under H0 : α = 0 (left panel) and H1 : α = 0.95 (right panel). The empirical cdf was obtained by simulation with 10,000 replications.

Letting γ̃ = V′γ, rewrite (9) as

$$Y = \alpha Y_{-1} + \tilde Z\tilde\gamma + U, \tag{33}$$

and the OLS estimator for α can now be obtained as in (10) with M replaced by M = I_NT − Z̃(Z̃′Z̃)⁻¹Z̃′ = I_NT − QQ′.

To show that α̂_LS does not depend on the value of β, it suffices to show that the first-step residuals MY and MY₋₁ do not depend on β. Since MY = MXβ + MYℓ and MY₋₁ = MX₋₁β + MYℓ₋₁, this amounts to showing that MX = 0 and MX₋₁ = 0, because neither MYℓ nor MYℓ₋₁ depends on β. Partitioning V′ into the first k and the last k columns V′₁ and V′₂, respectively, we obtain Z = [X, X₋₁] = [QWV′₁, QWV′₂] and, thus, X = QWV′₁ and X₋₁ = QWV′₂. Now, MX = (I_NT − QQ′)QWV′₁ = (Q − Q)WV′₁ = 0 and MX₋₁ = (I_NT − QQ′)QWV′₂ = (Q − Q)WV′₂ = 0, which proves the invariance of α̂_LS with respect to β. The invariance of α̂_LS with respect to σ² (and to Y_i0 if α = 1) can be proved as outlined by Andrews (1993). ∎

Representation of η1 as a ratio of quadratic forms. W.l.o.g., we can assume β = 0 and σ 2 = 1 by virtue of the above invariance property. Defining the T × (T + 1) selection matrices DT = [0 | IT ] and DT −1 = [IT | 0], we have that MY = M [IN ⊗ DT ] Y0ℓ and MY−1 = M [IN ⊗ DT −1 ] Y0ℓ .


With U₀ ∼ N(0, I_{NT+N}), we have that Y₀ℓ = [I_N ⊗ R₁]U₀ ≡ RU₀, where

$$R_1 = R_1(\alpha) = \begin{pmatrix} b & 0 & 0 & \cdots & 0 & 0 \\ b\alpha & 1 & 0 & \cdots & 0 & 0 \\ b\alpha^2 & \alpha & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ b\alpha^T & \alpha^{T-1} & \alpha^{T-2} & \cdots & \alpha & 1 \end{pmatrix}, \tag{34}$$

and b = (1 − α²)^{−1/2} if α ∈ (−1, 1) and zero if α = 1. Substituting this into (10) yields

$$\eta_1(\alpha) = \frac{(Y_0^\ell)'[I_N \otimes D_{T-1}]'M[I_N \otimes D_T]Y_0^\ell}{(Y_0^\ell)'[I_N \otimes D_{T-1}]'M[I_N \otimes D_{T-1}]Y_0^\ell} - \alpha = \frac{U_0'[I_N \otimes D_{T-1}R_1]'M[I_N \otimes D_TR_1]U_0}{U_0'[I_N \otimes D_{T-1}R_1]'M[I_N \otimes D_{T-1}R_1]U_0} - \alpha = \frac{U_0'A^*U_0}{U_0'BU_0} - \alpha,$$

where A* and B are so defined. Letting A = ½[A*′ + A*], we can thus write

$$\Pr_\alpha\left( \eta_1(\alpha) \leq \eta_1(\alpha, \Upsilon) \right) = \Pr_\alpha\left( \frac{U_0'AU_0}{U_0'BU_0} \leq \frac{Y_{-1}'MY}{Y_{-1}'MY_{-1}} \right). \tag{35}$$

∎
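The matrix R₁(α) in (34) admits a direct construction; a small sketch (the function name is ours):

```python
import numpy as np

def build_R1(alpha, T):
    """Build the (T+1) x (T+1) matrix R1(alpha) of eq. (34): first column
    b, b*alpha, ..., b*alpha^T with b = (1 - alpha^2)^(-1/2) for |alpha| < 1
    and b = 0 for alpha = 1; entry (i, j) for 1 <= j <= i is alpha^(i - j)."""
    b = 0.0 if alpha == 1 else (1.0 - alpha ** 2) ** -0.5
    R = np.zeros((T + 1, T + 1))
    for i in range(T + 1):
        R[i, 0] = b * alpha ** i
        for j in range(1, i + 1):
            R[i, j] = alpha ** (i - j)
    return R

# For |alpha| < 1, R1 @ R1.T is the stationary AR(1) autocovariance matrix
# (unit innovation variance): every diagonal entry equals 1 / (1 - alpha^2).
R = build_R1(0.5, 9)
```

Thus Y₀ℓ = [I_N ⊗ R₁]U₀ carries the required AR(1) covariance structure within each cross-sectional unit.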

References

Ahn, S. C. and P. Schmidt (1995): "Efficient Estimation of Models for Dynamic Panel Data," Journal of Econometrics, 68, 5–27.

Anderson, T. W. and C. Hsiao (1981): "Estimation of Dynamic Models with Error Components," Journal of the American Statistical Association, 76, 598–606.

Andrews, D. W. K. (1993): "Exactly Median-Unbiased Estimation of First Order Autoregressive/Unit Root Models," Econometrica, 61, 139–165.

Arellano, M. (2003): Panel Data Econometrics, Oxford University Press.

Arellano, M. and S. Bond (1991): "Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations," Review of Economic Studies, 58, 277–297.

Barndorff-Nielsen, O. E. (1986): "Inference on Full and Partial Parameters Based on the Standardized Signed Log Likelihood Ratio," Biometrika, 73, 307–322.

——— (1990): "Approximate Interval Probabilities," Journal of the Royal Statistical Society B, 52, 485–496.

Bhargava, A., L. Franzini, and W. Narendranathan (1982): "Serial Correlation and the Fixed Effects Model," The Review of Economic Studies, 49, 533–549.

Bun, M. J. G. and M. A. Carree (2005): "Bias-Corrected Estimation in Dynamic Panel Data Models," Journal of Business and Economic Statistics, 23, 200–210.

Choi, C.-Y., N. C. Mark, and D. Sul (2004): "Bias Reduction by Recursive Mean Adjustment in Dynamic Panel Data Models," Economics Working Paper Archive at WUSTL.

Hahn, J. and G. Kuersteiner (2002): "Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects when both n and T are large," Econometrica, 70, 1639–1657.

Hsiao, C., M. H. Pesaran, and A. K. Tahmiscoglu (2002): "Maximum Likelihood Estimation of Fixed Effects Dynamic Panel Data Models Covering Short Time Periods," Journal of Econometrics, 109, 107–150.

Imhof, J. P. (1961): "Computing the Distribution of Quadratic Forms in Normal Variables," Biometrika, 48, 419–426.

Kiviet, J. F. (1995): "On Bias, Inconsistency, and Efficiency of Various Estimators in Dynamic Panel Data Models," Journal of Econometrics, 68, 53–78.

Kruiniger, H. (2002): "On the Estimation of Panel Regression Models with Fixed Effects," Working Paper, Queen Mary, University of London.

Lieberman, O. (1994): "Saddlepoint Approximation for the Distribution of a Ratio of Quadratic Forms in Normal Variables," Journal of the American Statistical Association, 89, 924–928.

MacKinnon, J. G. and A. A. Smith, Jr. (1998): "Approximate Bias Correction in Econometrics," Journal of Econometrics, 85, 205–230.

McCullagh, P. and R. J. Tibshirani (1990): "A Simple Method for the Adjustment of Profile Likelihoods," Journal of the Royal Statistical Society Series B, 52, 325–344.

Moon, H. R. and P. C. B. Phillips (2004): "GMM Estimation of Autoregressive Roots Near Unity with Panel Data," Econometrica, 72, 467–522.

Neyman, J. and E. Scott (1948): "Consistent Estimates Based on Partially Consistent Observations," Econometrica, 16, 1–32.

Nickell, S. (1981): "Biases in Dynamic Models with Fixed Effects," Econometrica, 49, 1417–1426.

Phillips, P. C. B. and D. Sul (2003): "Dynamic Panel Estimation and Homogeneity Testing Under Cross Section Dependence," Econometrics Journal, 6, 217–259.

Pitman, E. J. G. (1937): "The 'Closest' Estimates of Statistical Parameters," Proceedings of the Cambridge Philosophical Society, 33, 212–222.

Sawa, T. (1978): "The Exact Moments of the Least Squares Estimator for the Autoregressive Model," Journal of Econometrics, 8, 159–172.

Strawderman, R. L. and M. T. Wells (1998): "Approximately Exact Inference for the Common Odds Ratio in Several 2 × 2 Tables," Journal of the American Statistical Association, 93, 1294–1307.

Tanizaki, H. (2000): "Bias Correction of OLSE in the Regression Model with Lagged Dependent Variables," Journal of Computational Statistics and Data Analysis, 34, 495–511.

Temme, N. M. (1982): "The Uniform Asymptotic Expansion of a Class of Integrals Related to Cumulative Distribution Functions," SIAM Journal on Mathematical Analysis, 13, 239–253.

Tierney, L., R. E. Kass, and J. B. Kadane (1989a): "Approximate Marginal Densities of Nonlinear Functions," Biometrika, 76, 425–433.

——— (1989b): "Fully Exponential Laplace Approximations to Expectations and Variances of Nonpositive Functions," Journal of the American Statistical Association, 84, 710–716.

Wilson, G. T. (1989): "On the use of Marginal Likelihood in Time Series Model Estimation," Journal of the Royal Statistical Society B, 51, 15–27.
