Nonparametric conditional efficiency measures: asymptotic properties


Institut de Statistique, Université catholique de Louvain

Discussion Paper 0604

NONPARAMETRIC CONDITIONAL EFFICIENCY MEASURES: ASYMPTOTIC PROPERTIES S.-O. JEONG, B.U. PARK and L. SIMAR

http://www.stat.ucl.ac.be


by

Seok-Oh Jeong Department of Statistics, Hankuk University of Foreign Studies, South Korea.

Byeong U. Park Department of Statistics, Seoul National University, South Korea.

Léopold Simar Institut de statistique, Université catholique de Louvain, Belgium.

E-mail: [email protected]

Phone: +32-10-474308

March 1, 2006


Abstract

Daraio and Simar (2005a, b) developed a conditional frontier model that incorporates environmental factors into the measurement of the efficiency of a production process. They also provided the corresponding nonparametric efficiency measures: the conditional FDH estimator, the conditional DEA estimator and the conditional order-m estimator. The aim of this paper is to provide an asymptotic analysis of the first two estimators.

Keywords: Frontier model, environmental variable, conditional frontier, conditional DEA, conditional FDH, asymptotic distribution

Introduction

The performance of a production unit is quantified by efficiency measures, which are of primary interest in productivity analysis. The efficiency of a producer is usually defined by its distance to the frontier formed by the best production scenarios. A production scenario is composed of two kinds of factors: input factors and output factors. For example, labor and capital are typical input factors, and profit is an output counterpart. More recently, environmental factors have also been taken into account in order to assess the performance of a production unit properly. Environmental variables are exogenous factors that are neither inputs nor outputs of a production process but affect its performance. Several approaches have been developed for this purpose: see Banker and Morey (1986), Adolphson, Cornia and Walters (1991), Fried, Lovell and Vanden Eeckaut (1993), McCarty and Yaisawarng (1993), Bhattacharyya, Lovell and Sahay (1997), Fried, Schmidt and Yaisawarng (1999), Daraio and Simar (2005a, b). Among them, the most recent, Daraio and Simar (2005a, b), suggested a fully nonparametric approach for frontier models with environmental variables, which overcomes drawbacks of the other approaches. They defined a conditional frontier model and a conditional efficiency measure, and then proposed the corresponding nonparametric estimators: the conditional FDH, conditional DEA and conditional order-m estimators. This paper aims at providing the asymptotic distributions of the first two estimators. The asymptotics of the order-m estimator were analyzed in Cazals, Florens and Simar (2002) and Park, Jeong and Lee (2006).

This paper is organized as follows. In Section 1 we provide a quick review of the frontier model and of the nonparametric estimators, FDH and DEA, that are used for analyzing efficiency in productivity analysis. In Section 2 we summarize the probabilistic formulation developed in Daraio and Simar (2005a, b), which is useful for introducing a conditional argument when defining the data generating process of the production process. In Section 3, the basic idea and definition of the conditional frontier and the corresponding nonparametric estimators are presented. Section 4 is devoted to an asymptotic analysis of the conditional estimators. Finally, Section 5 concludes.

1 Frontier Analysis

1.1 The model

Suppose that the activities of production units are characterized by pairs of inputs $x \in \mathbb{R}_+^p$ and outputs $y \in \mathbb{R}_+^q$. The production set $\Psi$ is defined as the set of all technically feasible pairs $(x, y)$:

$$\Psi = \{(x, y) \in \mathbb{R}_+^p \times \mathbb{R}_+^q \mid x \text{ can produce } y\}.$$

It is very common in economics to assume that $\Psi$ is free disposable, which means that it is always technically feasible to produce less output using more input. Precisely, a set $\Psi$ is said to be free disposable if $(x, y) \in \Psi$ implies $(x', y') \in \Psi$ for any $(x', y')$ such that $x' \ge x$ and $y' \le y$. Throughout this paper, inequalities between vectors are to be understood component-wise. Convexity is also often assumed for the production set $\Psi$, i.e., it is assumed that if $(x, y) \in \Psi$ and $(x', y') \in \Psi$ then

$$(\alpha x + (1 - \alpha) x',\ \alpha y + (1 - \alpha) y') \in \Psi \quad \text{for any } \alpha \in [0, 1].$$

When the output is scalar, we may define a frontier function $g(\cdot)$ which forms the roof of the production set $\Psi$:

$$g(x) = \sup\{y \mid (x, y) \in \Psi\}, \quad x \in \mathbb{R}_+^p.$$

Then the production set $\Psi$ is characterized by the frontier function in such a way that

$$\Psi = \{(x, y) \in \mathbb{R}_+^p \times \mathbb{R}_+^1 \mid y \le g(x)\}.$$

Free disposability of $\Psi$ implies that the frontier function $g(x)$ is monotone increasing in $x$, and convexity of $\Psi$ entails that $g$ is a concave function of $x$. Since the boundary of $\Psi$ defines the locus of optimal production scenarios, one may assess the efficiency of a given level of input and output by measuring its distance to the boundary of $\Psi$. In particular, when the output is scalar, one may measure the efficiency with reference to the frontier function $g$, since $g$ defines the boundary of $\Psi$. For example, the efficiency of a production unit $(x_0, y_0) \in \Psi \subset \mathbb{R}_+^{p+1}$ can be measured by $g(x_0) - y_0$ or $g(x_0) / y_0$. When there are multiple outputs, however, efficiency cannot be measured in this way. In that case it is convenient to define the efficiency scores in a radial way: given a level of input and output $(x_0, y_0) \in \Psi$,

$$\theta_0 = \theta(x_0, y_0) = \inf\{\theta > 0 \mid (\theta x_0, y_0) \in \Psi\}, \tag{1}$$

$$\lambda_0 = \lambda(x_0, y_0) = \sup\{\lambda > 0 \mid (x_0, \lambda y_0) \in \Psi\}. \tag{2}$$

Since $\theta_0$ is the proportionate reduction of inputs needed for a production unit $(x_0, y_0)$ to be technically efficient, it is called the input efficiency score. It is always less than or equal to one, and $\theta_0 = 1$ means that no proportionate reduction of inputs is feasible, so that $(x_0, y_0)$ is efficient in terms of input orientation. In parallel, $\lambda_0$ is the technically feasible proportionate increase of outputs needed for $(x_0, y_0)$ to be efficient, and it is referred to as the output efficiency score. It is always greater than or equal to one, and $\lambda_0 = 1$ indicates that $(x_0, y_0)$ is efficient in terms of output orientation. Note that $\theta_0$ and $\lambda_0$ are the reciprocals of Shephard's input and output distance functions, respectively; see Shephard (1970). By construction, both $(\theta_0 x_0, y_0)$ and $(x_0, \lambda_0 y_0)$ lie on the boundary of $\Psi$. Therefore

$\theta_0 x_0$ is the technically efficient input level for producing the output level $y_0$ among the input levels proportional to $x_0$. Similarly, $\lambda_0 y_0$ is the efficient output level that can be produced with the input level $x_0$ among the output levels proportional to $y_0$.

1.2 Nonparametric estimation

Unfortunately, the production set $\Psi$ is unknown in general. Hence we have no reference set for measuring efficiency in the way defined in the previous section. Instead, we observe a set of input and output levels achieved by given production units,

$$S_n = \{(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\},$$

which can be considered as a random sample drawn from a joint distribution (or density) of $(X, Y) \in \mathbb{R}_+^{p+q}$ supported on a set $D$. We assume $\Psi \equiv D$ for technical convenience. We are interested in estimating the frontier of the production set $\Psi$, or the efficiency of a given production unit $(x_0, y_0)$, based on $S_n$. For identifiability, we assume a deterministic frontier model for the data generating process. This means that no noise is allowed in the observation of $S_n$, so that $P(S_n \subset \Psi) = 1$. Under this assumption, it is natural to estimate $\Psi$ by enveloping $S_n$. Among the existing methods for doing this, the data envelopment analysis (DEA) and the free disposal hull (FDH) estimators are the most popular nonparametric estimators. Under the free disposability assumption on $\Psi$, Deprins, Simar and Tulkens (1984) proposed the FDH estimator, defined as the smallest free disposable set that contains $S_n$:

$$\hat{\Psi}_{FDH} = \bigcup_{i=1}^{n} \{(x, y) \in \mathbb{R}_+^{p+q} \mid x \ge X_i,\ y \le Y_i\}.$$

Assuming convexity of $\Psi$ as well as free disposability, the DEA estimator of $\Psi$ is defined as the smallest convex and free disposable set containing $S_n$:

$$\hat{\Psi}_{DEA} = \Big\{(x, y) \in \mathbb{R}_+^{p+q} \ \Big|\ x \ge \sum_{i=1}^{n} \xi_i X_i,\ y \le \sum_{i=1}^{n} \xi_i Y_i \ \text{for some } \xi_i \ge 0,\ i = 1, 2, \ldots, n, \ \text{such that } \sum_{i=1}^{n} \xi_i = 1\Big\}.$$

Farrell (1957) is considered the first empirical study using the DEA approach, and Charnes, Cooper and Rhodes (1978) popularized it by adopting linear programming techniques. Using these estimates we can define the corresponding efficiency scores of a production unit $(x_0, y_0)$ as well:

$$\hat{\theta}(x_0, y_0) = \min\{\theta > 0 \mid (\theta x_0, y_0) \in \hat{\Psi}\},$$

$$\hat{\lambda}(x_0, y_0) = \max\{\lambda > 0 \mid (x_0, \lambda y_0) \in \hat{\Psi}\}.$$

When $\hat{\Psi} = \hat{\Psi}_{FDH}$, the resulting estimates are the FDH efficiency scores. If $\hat{\Psi} = \hat{\Psi}_{DEA}$, we get the DEA efficiency scores. In particular, it is easily seen that the FDH efficiency scores can be written in explicit form:

$$\hat{\theta}_{FDH}(x_0, y_0) = \min_{i:\, Y_i \ge y_0}\ \max_{1 \le k \le p} \frac{X_i^{(k)}}{x_0^{(k)}},$$

$$\hat{\lambda}_{FDH}(x_0, y_0) = \max_{i:\, X_i \le x_0}\ \min_{1 \le k \le q} \frac{Y_i^{(k)}}{y_0^{(k)}},$$

where $a^{(k)}$ denotes the $k$-th component of a vector $a$. The DEA efficiency scores are expressed as

$$\hat{\theta}_{DEA}(x_0, y_0) = \min\Big\{\theta > 0 \ \Big|\ \theta x_0 \ge \sum_{i=1}^{n} \xi_i X_i,\ y_0 \le \sum_{i=1}^{n} \xi_i Y_i \ \text{for some } \xi_i \ge 0 \ \text{such that } \sum_{i=1}^{n} \xi_i = 1\Big\};$$

$$\hat{\lambda}_{DEA}(x_0, y_0) = \max\Big\{\lambda > 0 \ \Big|\ x_0 \ge \sum_{i=1}^{n} \xi_i X_i,\ \lambda y_0 \le \sum_{i=1}^{n} \xi_i Y_i \ \text{for some } \xi_i \ge 0 \ \text{such that } \sum_{i=1}^{n} \xi_i = 1\Big\}.$$
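The explicit form of the FDH efficiency scores translates directly into a few lines of code. The following is a minimal sketch in Python, not part of the original paper; the function names and the toy data are hypothetical and serve only to illustrate the min-max and max-min formulas.

```python
from typing import Optional, Sequence

Vector = Sequence[float]


def fdh_input_score(X: Sequence[Vector], Y: Sequence[Vector],
                    x0: Vector, y0: Vector) -> Optional[float]:
    """FDH input score: min over units with Y_i >= y0 of max_k X_i^(k) / x0^(k)."""
    ratios = [
        max(xi_k / x0_k for xi_k, x0_k in zip(Xi, x0))
        for Xi, Yi in zip(X, Y)
        if all(yi_k >= y0_k for yi_k, y0_k in zip(Yi, y0))
    ]
    # None signals that no observed unit produces at least y0.
    return min(ratios) if ratios else None


def fdh_output_score(X: Sequence[Vector], Y: Sequence[Vector],
                     x0: Vector, y0: Vector) -> Optional[float]:
    """FDH output score: max over units with X_i <= x0 of min_k Y_i^(k) / y0^(k)."""
    ratios = [
        min(yi_k / y0_k for yi_k, y0_k in zip(Yi, y0))
        for Xi, Yi in zip(X, Y)
        if all(xi_k <= x0_k for xi_k, x0_k in zip(Xi, x0))
    ]
    # None signals that no observed unit uses at most x0.
    return max(ratios) if ratios else None


# Hypothetical toy sample with one input and one output (p = q = 1).
X = [[2.0], [3.0], [6.0]]
Y = [[1.0], [3.0], [4.0]]
theta_hat = fdh_input_score(X, Y, x0=[4.0], y0=[2.0])
lambda_hat = fdh_output_score(X, Y, x0=[4.0], y0=[2.0])
print(theta_hat, lambda_hat)  # prints: 0.75 1.5
```

In this toy sample the unit $(4, 2)$ is dominated by the observation $(3, 3)$, which is why the input score is $3/4$ and the output score is $3/2$.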

1.3 Statistical inference

Tools for statistical inference on these efficiency scores are available. Park, Simar and Weiner (2000) showed that the FDH efficiency scores have a Weibull limit distribution. Kneip, Park and Simar (1998) proved the consistency of the DEA efficiency scores in a quite general setup and obtained their rate of convergence. Gijbels et al. (1999) derived the explicit form of the limit distribution of the DEA estimator when the input and output variables are both scalar. Methods for approximating the sampling distribution of the DEA estimator in a general multidimensional setup were investigated by Kneip, Simar and Wilson (2003), Jeong (2004), and Jeong and Park (2006). Jeong and Simar (2006) proposed a hybrid version of FDH and DEA, called LFDH, which is defined by interpolating the vertices of the FDH. For a general review of statistical inference in nonparametric frontier models, see Simar and Wilson (2000) and Park, Jeong and Lee (2006).

2 Probabilistic formulation of frontier models

In Section 1.2, we pointed out that the production set $\Psi$ can be identified with the support of the density of $(X, Y)$. Precisely,

$$\Psi = \{(x, y) \in \mathbb{R}_+^{p+q} \mid f_{XY}(x, y) > 0\},$$

where $f_{XY}$ is the joint density of $(X, Y)$. Define a probability function $H_{XY}$ by

$$H_{XY}(x, y) = P(X \le x,\ Y \ge y).$$

Then, we may also assume the identity

$$\Psi = \{(x, y) \in \mathbb{R}_+^{p+q} \mid H_{XY}(x, y) > 0\},$$

which implies free disposability of $\Psi$. If the conditional probability

$$H_{X|Y}(x \mid y) = P(X \le x \mid Y \ge y)$$

exists, we may consider the following decomposition:

$$H_{XY}(x, y) = H_{X|Y}(x \mid y)\, S_Y(y),$$

where $S_Y$ denotes the survival function of $Y$, i.e. $S_Y(y) = P(Y \ge y)$. Likewise, conditioning on $X$, we have

$$H_{XY}(x, y) = H_{Y|X}(y \mid x)\, F_X(x)$$

if $H_{Y|X}(y \mid x) = P(Y \ge y \mid X \le x)$ exists, where $F_X$ is the distribution function of $X$, i.e., $F_X(x) = P(X \le x)$. Now suppose the following identities for $\Psi$ hold:

$$\Psi = \{(x, y) \in \mathbb{R}_+^{p+q} \mid H_{X|Y}(x \mid y) > 0\} = \{(x, y) \in \mathbb{R}_+^{p+q} \mid H_{Y|X}(y \mid x) > 0\}.$$

Then, together with (1) and (2), given a production unit $(x_0, y_0)$ we have

$$\theta(x_0, y_0) = \inf\{\theta > 0 \mid H_{X|Y}(\theta x_0 \mid y_0) > 0\}; \tag{3}$$

$$\lambda(x_0, y_0) = \sup\{\lambda > 0 \mid H_{Y|X}(\lambda y_0 \mid x_0) > 0\}. \tag{4}$$

Interestingly, replacing $H_{X|Y}$ and $H_{Y|X}$ by their corresponding empirical versions

$$\hat{H}_{X|Y}(x \mid y) = \frac{\sum_{i=1}^{n} I(X_i \le x,\ Y_i \ge y)}{\sum_{i=1}^{n} I(Y_i \ge y)}; \tag{5}$$

$$\hat{H}_{Y|X}(y \mid x) = \frac{\sum_{i=1}^{n} I(X_i \le x,\ Y_i \ge y)}{\sum_{i=1}^{n} I(X_i \le x)} \tag{6}$$

in (3) and (4), where $I(\cdot)$ denotes the indicator function, we obtain the FDH efficiency scores $\hat{\theta}_{FDH}(x_0, y_0)$ and $\hat{\lambda}_{FDH}(x_0, y_0)$.
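The plug-in statement can be checked numerically on a small sample. The sketch below, in Python, is an illustration only and not taken from the paper: the function names and data are hypothetical. It evaluates the empirical version of $H_{X|Y}$ and scans the finitely many candidate values of $\theta$ at which the infimum in (3) can be attained.

```python
from typing import Optional, Sequence

Vector = Sequence[float]


def leq(a: Vector, b: Vector) -> bool:
    """Component-wise a <= b."""
    return all(ak <= bk for ak, bk in zip(a, b))


def h_hat_x_given_y(X: Sequence[Vector], Y: Sequence[Vector],
                    x: Vector, y: Vector) -> Optional[float]:
    """Empirical version of P(X <= x | Y >= y); None if no Y_i >= y."""
    n_y = sum(1 for Yi in Y if leq(y, Yi))
    if n_y == 0:
        return None
    n_xy = sum(1 for Xi, Yi in zip(X, Y) if leq(Xi, x) and leq(y, Yi))
    return n_xy / n_y


def input_score_via_h_hat(X: Sequence[Vector], Y: Sequence[Vector],
                          x0: Vector, y0: Vector) -> Optional[float]:
    """Smallest theta with H_hat(theta * x0 | y0) > 0.

    The empirical probability only changes at the ratios
    max_k X_i^(k) / x0^(k), so it suffices to scan these candidates.
    Caveat: theta * x0 is compared with X_i in floating point; the
    toy ratios below are exactly representable, which avoids
    rounding issues in this illustration.
    """
    candidates = sorted(
        max(xi_k / x0_k for xi_k, x0_k in zip(Xi, x0)) for Xi in X
    )
    for theta in candidates:
        scaled = [theta * x0_k for x0_k in x0]
        value = h_hat_x_given_y(X, Y, scaled, y0)
        if value is not None and value > 0:
            return theta
    return None


# Hypothetical toy sample with p = q = 1.
X = [[2.0], [3.0], [6.0]]
Y = [[1.0], [3.0], [4.0]]
theta_plug_in = input_score_via_h_hat(X, Y, x0=[4.0], y0=[2.0])
print(theta_plug_in)  # prints: 0.75
```

For these data the explicit FDH formula gives $\min\{3/4, 6/4\} = 0.75$, the same value as the plug-in scan.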

3 Conditional frontier model

3.1 Introduction

When comparing production units by their efficiency measures, we should keep in mind the environmental factors that might cause differences in efficiency. Such environmental factors do affect the production process, but they are not under the control of production managers. Hence understanding how they give rise to differences in efficiency is important for productivity analysis. For a detailed discussion of this topic, see the references cited in the Introduction. In this section we introduce the conditional approach suggested by Daraio and Simar (2005a, b).

3.2 The model

For brevity we confine attention to the input-orientation case from now on. The output-orientation case can be treated in a very similar way. Extending the probabilistic formulation in Section 2, Daraio and Simar (2005a, b) considered a general model that involves an environmental variable. Let $Z \in \mathbb{R}^r$ denote the environmental variable. The basic idea of Daraio and Simar (2005a, b) is that, when the environmental variable takes the value $Z = z_0$, the conditional distribution of $(X, Y)$ given $Z = z_0$ still defines the data generating process, which takes into account the exogenous environment represented by $Z$. Given $Z = z_0$, let $\Psi_{z_0}$ be the conditional production set, i.e.

$$\Psi_{z_0} = \{(x, y) \in \mathbb{R}_+^{p+q} \mid f_{XY|Z}(x, y \mid z_0) > 0\}, \tag{7}$$

where $f_{XY|Z}(x, y \mid z)$ denotes the conditional density of $(X, Y)$ given $Z = z$. Putting $H_{X|YZ}(x \mid y, z) = P(X \le x \mid Y \ge y,\ Z = z)$, we assume the following identity as in the previous section:

$$\Psi_{z_0} = \{(x, y) \in \mathbb{R}_+^{p+q} \mid H_{X|YZ}(x \mid y, z_0) > 0\}. \tag{8}$$

Let $(x_0, y_0, z_0)$ be a triple of input, output and environmental factor levels of a production unit. As in (3) and (4), the conditional efficiency score of a production unit $(x_0, y_0, z_0)$ is defined by

$$\theta(x_0, y_0 \mid z_0) = \inf\{\theta > 0 \mid (\theta x_0, y_0) \in \Psi_{z_0}\} = \inf\{\theta > 0 \mid H_{X|YZ}(\theta x_0 \mid y_0, z_0) > 0\}. \tag{9}$$

3.3 Nonparametric estimation

With slight abuse of notation, let $S_n$ denote a set of i.i.d. copies of $(X, Y, Z) \in \mathbb{R}_+^p \times \mathbb{R}_+^q \times \mathbb{R}^r$:

$$S_n = \{(X_1, Y_1, Z_1), (X_2, Y_2, Z_2), \ldots, (X_n, Y_n, Z_n)\}.$$

Given a production unit $(x_0, y_0, z_0)$, we wish to estimate the conditional efficiency score $\theta(x_0, y_0 \mid z_0)$ using $S_n$.

Given a bandwidth $h > 0$ such that $h \to 0$ and $nh^r \to \infty$ as $n \to \infty$, let $I(z_0, h)$ be the set of indices defined by $I(z_0, h) = \{i \mid \|Z_i - z_0\| \le h/2\}$, where $\|a\|$ is the norm of a vector $a$. Then we have an empirical version of $H_{X|YZ}(\cdot \mid \cdot, \cdot)$ given by

$$\hat{H}_{X|YZ}(x \mid y, z) = \frac{\sum_{i=1}^{n} I(X_i \le x,\ Y_i \ge y,\ \|Z_i - z\| \le h/2)}{\sum_{i=1}^{n} I(Y_i \ge y,\ \|Z_i - z\| \le h/2)} = \frac{\sum_{i \in I(z, h)} I(X_i \le x,\ Y_i \ge y)}{\sum_{i \in I(z, h)} I(Y_i \ge y)}. \tag{10}$$

The conditional FDH estimator is then obtained by plugging this empirical version of $H_{X|YZ}(\cdot \mid \cdot, \cdot)$ into (9):

$$\hat{\theta}_{FDH}(x_0, y_0 \mid z_0) = \min_{i:\, Y_i \ge y_0,\ i \in I(z_0, h)}\ \max_{1 \le k \le p} \frac{X_i^{(k)}}{x_0^{(k)}}. \tag{11}$$

This is a version of the FDH estimator that uses only the points whose $Z$ values lie in the neighborhood of $z_0$. Along the same lines, the conditional DEA estimator is given by

$$\hat{\theta}_{DEA}(x_0, y_0 \mid z_0) = \min\Big\{\theta > 0 \ \Big|\ \theta x_0 \ge \sum_{i \in I(z_0, h)} \xi_i X_i,\ y_0 \le \sum_{i \in I(z_0, h)} \xi_i Y_i \ \text{for some } \xi_i \ge 0 \ \text{such that } \sum_{i \in I(z_0, h)} \xi_i = 1\Big\}. \tag{12}$$
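As an illustration of (11), the conditional FDH score can be computed by first selecting the observations whose environmental value falls in the window around $z_0$ and then applying the usual min-max formula. The Python sketch below is not from the paper: the function name and the data are hypothetical, and the Euclidean norm is assumed for $\|Z_i - z_0\|$ (the text only speaks of "the norm").

```python
import math
from typing import Optional, Sequence

Vector = Sequence[float]


def conditional_fdh_input_score(X: Sequence[Vector], Y: Sequence[Vector],
                                Z: Sequence[Vector], x0: Vector, y0: Vector,
                                z0: Vector, h: float) -> Optional[float]:
    """Conditional FDH input score, using only units with ||Z_i - z0|| <= h/2."""
    # Index set I(z0, h); Euclidean norm is an assumption of this sketch.
    window = [i for i, Zi in enumerate(Z) if math.dist(Zi, z0) <= h / 2]
    ratios = [
        max(xi_k / x0_k for xi_k, x0_k in zip(X[i], x0))
        for i in window
        if all(yi_k >= y0_k for yi_k, y0_k in zip(Y[i], y0))
    ]
    # None signals that no unit in the window produces at least y0.
    return min(ratios) if ratios else None


# Hypothetical toy sample with p = q = r = 1.
X = [[2.0], [3.0], [6.0], [1.0]]
Y = [[1.0], [3.0], [4.0], [5.0]]
Z = [[0.1], [0.2], [0.0], [0.9]]

narrow = conditional_fdh_input_score(X, Y, Z, [4.0], [2.0], [0.1], h=0.4)
wide = conditional_fdh_input_score(X, Y, Z, [4.0], [2.0], [0.1], h=2.0)
print(narrow, wide)  # prints: 0.75 0.25
```

With the narrow window the highly productive unit observed at $Z = 0.9$ is excluded, so the evaluated unit is benchmarked only against units operating in a similar environment. With the wide window the estimator coincides with the unconditional FDH score.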

4 Statistical analysis of conditional FDH and DEA estimators

Strictly speaking, the conditional FDH and DEA estimators in (11) and (12) do not target $\theta(x_0, y_0 \mid z_0)$ in (9), but

$$\theta^h(x_0, y_0 \mid z_0) = \inf\{\theta > 0 \mid (\theta x_0, y_0) \in \Psi^h_{z_0}\},$$

where

$$\Psi^h_{z_0} = \{(x, y) \in \mathbb{R}_+^{p+q} \mid f^h_{XY|Z}(x, y \mid z_0) > 0\} = \{(x, y) \in \mathbb{R}_+^{p+q} \mid H^h_{X|YZ}(x \mid y, z_0) > 0\},$$

$f^h_{XY|Z}(\cdot, \cdot \mid z)$ is the conditional density of $(X, Y)$ given that $\|Z - z\| \le h/2$, and

$$H^h_{X|YZ}(x \mid y, z) = P(X \le x \mid Y \ge y,\ \|Z - z\| \le h/2).$$

Hence, we need the following conditions for a proper statistical analysis of the conditional FDH and DEA estimators.

Assumption 1F Both $\Psi_{z_0}$ and $\Psi^h_{z_0}$ are free disposable.

Assumption 1D Both $\Psi_{z_0}$ and $\Psi^h_{z_0}$ are free disposable and convex in $\mathbb{R}_+^{p+q}$.

Assumption 2 As $n \to \infty$ it holds that

$$\theta^h(x_0, y_0 \mid z_0) - \theta(x_0, y_0 \mid z_0) = \begin{cases} o\big((nh^r)^{-1/(p+q)}\big) & \text{for conditional FDH;} \\ o\big((nh^r)^{-2/(p+q+1)}\big) & \text{for conditional DEA.} \end{cases}$$

Note that free disposability of $\Psi_{z_0}$ and $\Psi^h_{z_0}$ in Assumptions 1F and 1D is a direct consequence of the monotonicity of $H_{X|YZ}$ and $H^h_{X|YZ}$, respectively. Since the cardinality of $I(z_0, h)$ is proportional to $nh^r$, we may expect the convergence rates of the conditional FDH and DEA estimators to be $(nh^r)^{-1/(p+q)}$ and $(nh^r)^{-2/(p+q+1)}$, respectively. By virtue of Assumption 2, when we investigate the sampling distributions of the conditional FDH and DEA estimators, we only need to consider the limit behavior of the deviations

$$(nh^r)^{1/(p+q)} \{\hat{\theta}_{FDH}(x_0, y_0 \mid z_0) - \theta^h(x_0, y_0 \mid z_0)\} \quad \text{and} \quad (nh^r)^{2/(p+q+1)} \{\hat{\theta}_{DEA}(x_0, y_0 \mid z_0) - \theta^h(x_0, y_0 \mid z_0)\}$$

instead of

$$(nh^r)^{1/(p+q)} \{\hat{\theta}_{FDH}(x_0, y_0 \mid z_0) - \theta(x_0, y_0 \mid z_0)\} \quad \text{and} \quad (nh^r)^{2/(p+q+1)} \{\hat{\theta}_{DEA}(x_0, y_0 \mid z_0) - \theta(x_0, y_0 \mid z_0)\},$$

respectively. For the conditional FDH and DEA estimators to be well defined, the set $\{(X_i, Y_i, Z_i) \mid i \in I(z_0, h)\}$ must of course be nonempty. Moreover, for a proper asymptotic analysis, we need sufficiently many $Z_i$ around $z_0$, which is ensured by the following condition:

Assumption 3 $Z$ has a continuous marginal density $f_Z$ such that $f_Z(z_0)$ is bounded away from zero.

Proposition 1 Given any finite integer $C \ge 0$, let $E_{n,C}$ be the event that the cardinality of $I(z_0, h)$ is less than or equal to $C$. Then, under Assumption 3, $P(E_{n,C})$ tends to zero as $n \to \infty$.

Proof.

$$\begin{aligned}
P(E_{n,C}) &= \sum_{k=0}^{C} P(\text{the cardinality of } I(z_0, h) = k) \\
&= \sum_{k=0}^{C} \frac{n!}{k!\,(n-k)!} \{P(\|Z - z_0\| \le h/2)\}^{k} \{1 - P(\|Z - z_0\| \le h/2)\}^{n-k} \\
&\le M \cdot n^{C} \{1 - P(\|Z - z_0\| \le h/2)\}^{n} \quad \text{for a constant } M > 0 \\
&= M \cdot n^{C} \exp\{-nh^r f_Z(z_0) \cdot \{1 + o(1)\}\} = o(1).
\end{aligned}$$
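The binomial sum in the proof is easy to evaluate numerically. The Python sketch below is an illustration under assumptions that are not in the paper: $Z$ is taken to be uniform on $(0, 1)$ with $r = 1$ and $z_0 = 0.5$, so that $P(|Z - z_0| \le h/2) = h$, and the bandwidth is chosen as $h = n^{-1/2}$, which satisfies $h \to 0$ and $nh \to \infty$. The function names are hypothetical.

```python
import math


def prob_window_count_at_most(n: int, p_window: float, C: int) -> float:
    """P(cardinality of I(z0, h) <= C) when each Z_i falls in the window
    independently with probability p_window (a binomial tail sum)."""
    return sum(
        math.comb(n, k) * p_window ** k * (1.0 - p_window) ** (n - k)
        for k in range(C + 1)
    )


def window_prob_uniform(h: float) -> float:
    """P(|Z - 0.5| <= h/2) for Z ~ Uniform(0, 1); an assumption of this sketch."""
    return min(h, 1.0)


sample_sizes = (100, 1000, 10000)
probs = []
for n in sample_sizes:
    h = n ** -0.5  # hypothetical bandwidth choice with h -> 0 and n*h -> infinity
    probs.append(prob_window_count_at_most(n, window_prob_uniform(h), C=5))
    print(n, h, probs[-1])
```

In this setting the expected number of points in the window is $nh = \sqrt{n}$, and the probability of having at most five points in the window shrinks rapidly as $n$ grows, in line with Proposition 1.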



This proposition ensures that sufficiently many data points are available in $\Psi^h_{z_0}$ as the sample size grows. Therefore an asymptotic analysis of the conditional FDH and DEA estimators can be justified. Next we investigate the sampling distributions of the conditional FDH and DEA estimators. For this we assume additionally:

Assumption 4 $(X, Y, Z)$ has a joint density $f_{XYZ}(\cdot, \cdot, \cdot)$, which is continuous on its support.

Assumption 5 For $z$ in a neighborhood of $z_0$, the conditional density $f_{XY|Z}(\cdot, \cdot \mid z)$ of $(X, Y)$ given $Z = z$ exists and satisfies $f_{XY|Z}(\theta(x, y \mid z) x, y \mid z) > 0$ for all $(x, y, z)$ in a neighborhood of $(x_0, y_0, z_0)$. Moreover, $f^h_{XY|Z}(\cdot, \cdot \mid z)$ converges to $f_{XY|Z}(\cdot, \cdot \mid z)$ as $n \to \infty$.

Assumption 6F $\theta(\cdot, \cdot \mid z_0)$ and $\theta^h(\cdot, \cdot \mid z_0)$ are continuously differentiable in a neighborhood of $(x_0, y_0)$, and the elements of the vector of their first partial derivatives at $(x_0, y_0)$ are all nonzero.

Assumption 6D $\theta(\cdot, \cdot \mid z_0)$ and $\theta^h(\cdot, \cdot \mid z_0)$ are twice continuously differentiable in a neighborhood of $(x_0, y_0)$, and their Hessian matrices evaluated at $(x_0, y_0)$ are positive definite.

The next theorem is the conditional version of the sampling distribution of the FDH estimator provided by Park, Simar and Weiner (2000):

Theorem 1 Under Assumptions 1F, 2-5 and 6F, the conditional FDH efficiency score $\hat{\theta}_{FDH}(x_0, y_0 \mid z_0)$ in (11) has a Weibull limit distribution.

Proof. Let $c_{NW}$ be a positive constant and

$$NW^h_{z_0}(x, y) = \Psi^h_{z_0} \cap \{(u, v) \in \mathbb{R}^{p+q} \mid u \le x,\ v \ge y\}.$$

For $t' = (nh^r)^{-1/(p+q)} t > 0$,

$$\begin{aligned}
P\big(\hat{\theta}(x_0, y_0 \mid z_0) - \theta^h(x_0, y_0 \mid z_0) \ge t'\big)
&= P\big(\text{no pair } (X_i, Y_i) \in NW^h_{z_0}(t' x_0, y_0) \text{ with } \|Z_i - z_0\| \le h/2\big) \\
&= \big\{1 - P\big((X, Y) \in NW^h_{z_0}(t' x_0, y_0) \mid \|Z - z_0\| \le h/2\big)\, P(\|Z - z_0\| \le h/2)\big\}^{n} \\
&= \big\{1 - (t')^{p+q} c_{NW}\, f_{XY|Z}(x_0, y_0 \mid z_0) \cdot h^r f_Z(z_0) \times \{1 + o(1)\}\big\}^{n} \\
&= \exp\{-t^{p+q} c_{NW}\, f_{XYZ}(x_0, y_0, z_0)\} \times \{1 + o(1)\}. \qquad \blacksquare
\end{aligned}$$

Next, we present a large sample approximation procedure for the sampling distribution of the conditional DEA estimator. The following procedure is an extension of the procedure in Jeong (2004) based on a conditional argument. Let $P(x_0)$ be a $p \times (p - 1)$ matrix whose columns form an orthonormal basis for $x_0^{\perp} = \{x \in \mathbb{R}^p \mid x^T x_0 = x_0^T x = 0\}$. Consider the transformation which maps $x \in \mathbb{R}^p$ to $(u^T, w)^T \in \mathbb{R}^{p-1} \times \mathbb{R}$:

$$u = P(x_0)^T x; \qquad w = \frac{x_0^T x}{\|x_0\|},$$

where $\|a\|$ denotes the Euclidean norm of a vector $a$. This transformation is one-to-one and its inverse is given by

$$x = P(x_0) u + w \frac{x_0}{\|x_0\|}.$$
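To make the change of coordinates concrete, the Python sketch below builds one possible $P(x_0)$ by Gram-Schmidt orthogonalization and checks that the inverse formula recovers $x$. It is an illustration only: the paper does not prescribe how $P(x_0)$ is constructed (any orthonormal basis of $x_0^{\perp}$ will do), and all function names and the numerical tolerance are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(ak * bk for ak, bk in zip(a, b))


def orthonormal_complement(x0: Vector) -> List[List[float]]:
    """Return p-1 orthonormal vectors spanning the orthogonal complement of x0.

    These vectors play the role of the columns of P(x0). They are obtained by
    Gram-Schmidt applied to the standard basis after x0 / ||x0||.
    """
    p = len(x0)
    norm = math.sqrt(dot(x0, x0))
    basis = [[a / norm for a in x0]]
    for j in range(p):
        if len(basis) == p:
            break
        e = [1.0 if k == j else 0.0 for k in range(p)]
        for b in basis:
            proj = dot(e, b)
            e = [ek - proj * bk for ek, bk in zip(e, b)]
        e_norm = math.sqrt(dot(e, e))
        if e_norm > 1e-10:  # skip directions already spanned by the basis
            basis.append([ek / e_norm for ek in e])
    return basis[1:]


def forward(x: Vector, x0: Vector) -> Tuple[List[float], float]:
    """Map x to (u, w) with u = P(x0)^T x and w = x0^T x / ||x0||."""
    columns = orthonormal_complement(x0)
    u = [dot(col, x) for col in columns]
    w = dot(x0, x) / math.sqrt(dot(x0, x0))
    return u, w


def inverse(u: Vector, w: float, x0: Vector) -> List[float]:
    """Map (u, w) back to x = P(x0) u + w x0 / ||x0||."""
    columns = orthonormal_complement(x0)
    norm = math.sqrt(dot(x0, x0))
    x = [w * a / norm for a in x0]
    for uj, col in zip(u, columns):
        x = [xk + uj * ck for xk, ck in zip(x, col)]
    return x


x0 = [3.0, 4.0, 0.0]
x = [1.0, 2.0, 3.0]
u, w = forward(x, x0)
x_back = inverse(u, w, x0)
print(u, w, x_back)
```

Note that $x_0$ itself is mapped to $u = 0_{p-1}$ and $w = \|x_0\|$, so that moving along the ray $\theta x_0$ only changes the coordinate $w$. This is what makes the reduction to a function of $(u, v)$ in the next lemma possible.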

Lemma 1 Let $g_0$ and $g_0^h$ be the functions defined by

$$g_0(u, v) = \inf\Big\{w > 0 \ \Big|\ \Big(P(x_0) u + w \frac{x_0}{\|x_0\|},\ v + y_0\Big) \in \Psi_{z_0}\Big\},$$

$$g_0^h(u, v) = \inf\Big\{w > 0 \ \Big|\ \Big(P(x_0) u + w \frac{x_0}{\|x_0\|},\ v + y_0\Big) \in \Psi^h_{z_0}\Big\}.$$

Then we have

$$\theta(x_0, y_0 \mid z_0) = \|x_0\|^{-1} g_0(0_{p-1}, 0_q), \qquad \theta^h(x_0, y_0 \mid z_0) = \|x_0\|^{-1} g_0^h(0_{p-1}, 0_q),$$

where $0_n$ denotes the $n$-vector with all elements being zero. Moreover, under Assumptions 1D, 2 and 6D, both $g_0$ and $g_0^h$ are convex in $(u, v)$ and it follows that

$$g_0(0_{p-1}, 0_q) - g_0^h(0_{p-1}, 0_q) = o\big((nh^r)^{-2/(p+q+1)}\big).$$

By this lemma, estimation of $\theta^h(x_0, y_0 \mid z_0)$ reduces to that of the convex function $g_0^h$ at $0_{p-1+q}$. Now consider the transformed data

$$S_n' = \Big\{(U_i, V_i, W_i, Z_i) \ \Big|\ \begin{pmatrix} U_i \\ V_i \end{pmatrix} = \begin{pmatrix} P(x_0)^T X_i \\ Y_i - y_0 \end{pmatrix},\ W_i = \frac{x_0^T X_i}{\|x_0\|},\ (X_i, Y_i, Z_i) \in S_n,\ i \in I(z_0, h)\Big\}.$$

By the definition of $g_0^h$, $(U_i, V_i, W_i)$ satisfies $W_i \ge g_0^h(U_i, V_i)$ for $i \in I(z_0, h)$. Hence, $g_0^h$ can be estimated from the transformed data $S_n'$.

Lemma 2 Define

$$\hat{g}_0^h(u, v) = \min\Big\{\sum_{i \in I(z_0, h)} \xi_i W_i \ \Big|\ u = \sum_{i \in I(z_0, h)} \xi_i U_i,\ v = \sum_{i \in I(z_0, h)} \xi_i V_i \ \text{for some } \xi_i \ge 0 \ \text{such that } \sum_{i \in I(z_0, h)} \xi_i = 1\Big\}.$$

Then, with probability tending to one, it follows that

$$\hat{\theta}_{DEA}(x_0, y_0 \mid z_0) = \|x_0\|^{-1} \hat{g}_0^h(0_{p-1}, 0_q).$$

Thus, by Lemmas 1 and 2, for the sampling distribution of $\hat{\theta}_{DEA}(x_0, y_0 \mid z_0)$, we may investigate that of $\hat{g}_0^h(0_{p-1}, 0_q)$ instead.

Let $\nabla$ denote the partial differential operator. Along the lines of Jeong (2004), consider the canonical transform on $\{(U_i, V_i, W_i) \mid i \in I(z_0, h)\}$ given by: for $i \in I(z_0, h)$,

$$\begin{pmatrix} U_i^* \\ V_i^* \end{pmatrix} = (nh^r)^{1/(p+q+1)} \Big(\tfrac{1}{2} G^h_{2,0}\Big)^{1/2} \begin{pmatrix} U_i \\ V_i \end{pmatrix},$$

$$W_i^* = (nh^r)^{2/(p+q+1)} \Big\{W_i - g^h_{0,0} - g^{hT}_{1,0} \begin{pmatrix} U_i \\ V_i \end{pmatrix}\Big\},$$

where $g^h_{0,0} = g_0^h(0_{p-1}, 0_q)$, $g^h_{1,0} = \nabla g_0^h(0_{p-1}, 0_q)$, and $G^h_{2,0} = \nabla^2 g_0^h(0_{p-1}, 0_q)$.

Lemma 3 Let $\mathrm{Conv}(\cdot, \cdot \mid z_0, h)$ be the lower boundary of the convex hull built by $\{(U_i^*, V_i^*, W_i^*) \mid i \in I(z_0, h)\}$:

$$\mathrm{Conv}(u^*, v^* \mid z_0, h) = \min\Big\{\sum_{i \in I(z_0, h)} \xi_i W_i^* \ \Big|\ u^* = \sum_{i \in I(z_0, h)} \xi_i U_i^*,\ v^* = \sum_{i \in I(z_0, h)} \xi_i V_i^* \ \text{for some } \xi_i \ge 0 \ \text{such that } \sum_{i \in I(z_0, h)} \xi_i = 1\Big\}.$$

Then, with probability tending to one it follows that

$$\mathrm{Conv}(0_{p-1}, 0_q \mid z_0, h) = (nh^r)^{2/(p+q+1)} \{\hat{g}_0^h(0_{p-1}, 0_q) - g_0^h(0_{p-1}, 0_q)\}.$$

By combining Lemmas 1, 2 and 3, we may show that

$$(nh^r)^{2/(p+q+1)} \{\hat{\theta}_{DEA}(x_0, y_0 \mid z_0) - \theta(x_0, y_0 \mid z_0)\} \quad \text{and} \quad \|x_0\|^{-1} \mathrm{Conv}(0_{p-1}, 0_q \mid z_0, h)$$

have the same limit distribution as $n \to \infty$. Note, however, that the sampling distribution of $\mathrm{Conv}(0_{p-1}, 0_q \mid z_0, h)$ is not yet at hand. Next we present a procedure for the large sample approximation of the distribution of $\mathrm{Conv}(0_{p-1}, 0_q \mid z_0, h)$.

Note that $\{(U_i^*, V_i^*, W_i^*) \mid i \in I(z_0, h)\}$ has the new lower boundary $w^* = g_0^{h*}(u^*, v^*)$ in the coordinate system $(u^*, v^*, w^*)$ such that

$$g_0^{h*}(u^*, v^*) = u^{*T} u^* + v^{*T} v^* + o(1)$$

uniformly for $(u^*, v^*)$ in any compact set contained in $\mathbb{R}^{p-1} \times \mathbb{R}^q$, as $n \to \infty$.

Write $f_0^{h*}$ for the conditional density of $(U_i^*, V_i^*, W_i^*)$, in the coordinate system $(u^*, v^*, w^*)$, given $\|Z - z_0\| \le h/2$. Then, via the change of variable technique, we have by Assumptions 4 and 5

$$\sup{}' \Big| nh^r \big\{\det\big(G^h_{2,0} / 2\big)\big\}^{1/2} f_0^{h*}(u^*, v^*, w^*) - f_0^h \Big| \to 0$$

for any $\varepsilon_n \downarrow 0$, where $\sup'$ denotes the supremum over $(u^*, v^*, w^*)$ such that

$$\|(u^*, v^*)\| \le (nh^r)^{1/(p+q+1)} \varepsilon_n \quad \text{and} \quad u^{*T} u^* + v^{*T} v^* \le w^* \le (nh^r)^{2/(p+q+1)} \varepsilon_n,$$

$f_0^h$ is the conditional density of $(U, V, W)$ at the boundary point $(0_{p-1}, 0_q, g^h_{0,0})$ given $\|Z - z_0\| \le h/2$, which equals $f^h_{XY|Z}(x_0, y_0 \mid z_0)$, and $\det(A)$ denotes the determinant of

a matrix $A$. Now we are ready to describe the procedure to simulate the limit distribution of $\mathrm{Conv}(0_{p-1}, 0_q \mid z_0, h)$. Define

$$\kappa = \Big\{\frac{\det\big(G^h_{2,0} / 2\big)}{(f_0^h)^2}\Big\}^{1/(p+q+1)}$$

and

$$B_\kappa = \Big\{(u^*, v^*, w^*) \ \Big|\ (u^*, v^*) \in \Big[-\frac{\sqrt{\kappa}}{2} (nh^r)^{1/(p+q+1)},\ \frac{\sqrt{\kappa}}{2} (nh^r)^{1/(p+q+1)}\Big]^{p-1+q} \ \text{and} \ u^{*T} u^* + v^{*T} v^* \le w^* \le u^{*T} u^* + v^{*T} v^* + \kappa (nh^r)^{2/(p+q+1)}\Big\}.$$

Let $\lfloor a \rceil$ be the nearest integer to $a \in \mathbb{R}$. Consider a new random sample $\{(U_i^u, V_i^u, W_i^u) \mid i = 1, 2, \ldots, \lfloor nh^r \rceil\}$ which is generated from the uniform distribution on $B_\kappa$. Since $B_\kappa$ has volume $nh^r \kappa^{(p+q+1)/2}$, the uniform density is equal to

$$(nh^r)^{-1} \kappa^{-(p+q+1)/2} = (nh^r)^{-1} \big\{\det\big(G^h_{2,0} / 2\big)\big\}^{-1/2} f_0^h.$$

Let $\mathrm{Conv}_u(\cdot, \cdot \mid z_0, h)$ be the version of $\mathrm{Conv}(\cdot, \cdot \mid z_0, h)$ built by the new sample $\{(U_i^u, V_i^u, W_i^u) \mid i = 1, 2, \ldots, \lfloor nh^r \rceil\}$.

Lemma 4 Under Assumptions 1D, 2, 5 and 6D, $\mathrm{Conv}(0_{p-1}, 0_q \mid z_0, h)$ and $\mathrm{Conv}_u(0_{p-1}, 0_q \mid z_0, h)$ have the same limit distribution.

Theorem 2 Suppose Assumptions 1D, 2, 5 and 6D hold. For $z > 0$,

$$P\Big((nh^r)^{2/(p+q+1)} \{\hat{\theta}_{DEA}(x_0, y_0 \mid z_0) - \theta(x_0, y_0 \mid z_0)\} \le z\Big) \to F(z)$$

as $n \to \infty$, where

$$F(z) = \lim_{n \to \infty} P\big(\|x_0\|^{-1} \mathrm{Conv}_u(0_{p-1}, 0_q \mid z_0, h) \le z\big).$$
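The simulation procedure can be sketched for the simplest case $p = q = 1$, where $(u^*, v^*)$ reduces to the scalar $v^*$ and $p + q + 1 = 3$. The Python code below is an illustration under several assumptions that are not in the paper. It takes $B_\kappa$ to be the region with $v^* \in [-(\sqrt{\kappa}/2) m^{1/3}, (\sqrt{\kappa}/2) m^{1/3}]$ and $v^{*2} \le w^* \le v^{*2} + \kappa m^{2/3}$, with $m = nh^r$, which is the reading of $B_\kappa$ consistent with the stated uniform density. The value of $\kappa$ is supplied by the user; in practice it depends on the unknown quantities $f_0^h$ and $G^h_{2,0}$, whose estimation is not addressed here. The function name is hypothetical.

```python
import math
import random


def draw_conv_u_at_zero(m: float, kappa: float, rng: random.Random) -> float:
    """One draw of Conv_u(0 | z0, h) for p = q = 1, with m standing for n * h^r.

    A sample of round(m) points is drawn uniformly on B_kappa, and the lower
    boundary of its convex hull is evaluated at v* = 0.
    """
    size = int(math.floor(m + 0.5))  # nearest integer to m
    half_width = 0.5 * math.sqrt(kappa) * m ** (1.0 / 3.0)
    height = kappa * m ** (2.0 / 3.0)

    # The vertical slice of B_kappa has constant height, so drawing v* and the
    # vertical offset independently and uniformly is uniform on B_kappa.
    points = []
    for _ in range(size):
        v = rng.uniform(-half_width, half_width)
        w = v * v + rng.uniform(0.0, height)
        points.append((v, w))

    left = [(v, w) for v, w in points if v <= 0.0]
    right = [(v, w) for v, w in points if v >= 0.0]

    # In one dimension the lower hull at 0 is the lowest chord joining a point
    # on the left of 0 to a point on the right of 0.
    best = math.inf
    for vl, wl in left:
        for vr, wr in right:
            if vr == vl:  # both points sit exactly at v* = 0
                value = min(wl, wr)
            else:
                t = vr / (vr - vl)  # weight on the left point: t*vl + (1-t)*vr = 0
                value = t * wl + (1.0 - t) * wr
            best = min(best, value)
    return best  # math.inf if 0 is not covered by the sample


rng = random.Random(0)
x0_norm = 4.0  # hypothetical ||x0||
draws = [draw_conv_u_at_zero(m=50.0, kappa=1.0, rng=rng) / x0_norm
         for _ in range(200)]
draws.sort()
print(draws[len(draws) // 2])  # Monte Carlo median of ||x0||^{-1} Conv_u
```

The empirical distribution of such draws gives a Monte Carlo approximation of $F$ for a given $\kappa$. For general $p$ and $q$ the evaluation of the lower hull at the origin requires solving a linear program rather than scanning pairs of points.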

5 Concluding remarks

In this paper, we analyzed the asymptotics of the conditional FDH and DEA estimators. We established consistency of those estimators and obtained their proper limit distributions. By means of these results, we are able to correct their biases and construct confidence intervals for use in practice. However, as is typical in nonparametric function estimation problems, these procedures require additional information that depends on unknown quantities. In particular, further statistical inference with the conditional FDH estimator based on its asymptotic properties may suffer from a severe discrepancy between its finite sample behavior and the asymptotic results, as already pointed out in Park, Simar and Weiner (2000), Jeong and Simar (2006), and others. To avoid this problem, it is natural to consider bootstrapping. For example, for the choice of the bandwidth $h$, one may use the minimizer of a consistent bootstrap approximation of

$$E\big(\hat{\theta}(x_0, y_0 \mid z_0) / \theta(x_0, y_0 \mid z_0) - 1\big)^2.$$

A detailed study of this is left for future research.

Notes

Byeong U. Park’s work was supported by the SRC/ERC program of MOST/KOSEF (R112000-073-00000). Léopold Simar gratefully acknowledges the research support from the “Interuniversity Attraction Pole”, Phase V (No. P5/24) from the Belgian Science Policy.

References

Adolphson, D. L., G. C. Cornia, and L. C. Walters. (1991). “A unified framework for classifying DEA models,” Operational Research 90, 647-657.

Banker, R. D. and R. C. Morey. (1986). “Efficiency analysis for exogenously fixed inputs and outputs,” Operations Research 34, 513-521.

Bhattacharyya, A., C. A. K. Lovell, and P. Sahay. (1997). “The impact of liberalization on the productive efficiency of Indian commercial banks,” European Journal of Operational Research 98, 332-347.

Cazals, C., J. P. Florens, and L. Simar. (2002). “Nonparametric frontier estimation: a robust approach,” Journal of Econometrics 106, 1-25.

Charnes, A., W. W. Cooper, and E. Rhodes. (1978). “Measuring the efficiency of decision making units,” European Journal of Operational Research 2, 429-444.

Daraio, C. and L. Simar. (2005a). “Introducing environmental variables in nonparametric frontier models: a probabilistic approach,” Journal of Productivity Analysis 24, 93-121.

Daraio, C. and L. Simar. (2005b). “Conditional nonparametric frontier models for convex and nonconvex technologies: a unifying approach,” Discussion Paper #0502, Institut de Statistique, UCL, Belgium (http://www.stat.ucl.ac.be).

Deprins, D., L. Simar, and H. Tulkens. (1984). “Measuring labor inefficiency in post offices.” In Marchand, M., P. Pestieau, and H. Tulkens (eds.), The Performance of Public Enterprises: Concepts and Measurements. North-Holland: Amsterdam.

Farrell, M. J. (1957). “The measurement of productive efficiency,” Journal of the Royal Statistical Society: Series A 120, 253-281.

Fried, H. O., C. A. K. Lovell, and P. Vanden Eeckaut. (1993). “Evaluating the performance of U. S. credit unions,” Journal of Banking and Finance 17, 251-265.

Fried, H. O., S. S. Schmidt, and S. Yaisawarng. (1999). “Incorporating the operating environment into a nonparametric measure of technical efficiency,” Journal of Productivity Analysis 12, 249-267.

Gijbels, I., E. Mammen, B. U. Park, and L. Simar. (1999). “On estimation of monotone and concave frontier functions,” Journal of the American Statistical Association 94, 220-228.

Jeong, S.-O. (2004). “Asymptotic distribution of DEA efficiency scores,” Journal of the Korean Statistical Society 33, 449-458.

Jeong, S.-O., and B. U. Park. (2006). “Large sample approximation of the limit distribution of convex-hull estimators of boundaries,” Scandinavian Journal of Statistics 33, 139-151.

Jeong, S.-O., and L. Simar. (2006). “Linearly interpolated FDH efficiency score for nonconvex frontiers,” Journal of Multivariate Analysis, in print.

Kneip, A., B. U. Park, and L. Simar. (1998). “A note on the convergence of nonparametric DEA estimators for production efficiency scores,” Econometric Theory 14, 783-793.

Kneip, A., L. Simar, and P. W. Wilson. (2003). “Asymptotics for DEA Estimators in Nonparametric Frontier Models,” Discussion Paper #0317, Institut de Statistique, UCL, Belgium (http://www.stat.ucl.ac.be).

McCarty, T. and S. Yaisawarng. (1993). “Technical efficiency in New Jersey school districts.” In H. O. Fried, C. A. K. Lovell, and S. S. Schmidt (eds.), The Measurement of Productive Efficiency: Techniques and Applications. New York: Oxford University Press.

Park, B. U., S.-O. Jeong, and Y. K. Lee. (2006). “Nonparametric estimation of production efficiency,” to appear in the volume in honor of Peter Bickel’s 65th birthday.

Park, B. U., L. Simar, and Ch. Weiner. (2000). “The FDH estimator for productivity efficiency scores: Asymptotic properties,” Econometric Theory 16, 855-877.

Shephard, R. W. (1970). Theory of Cost and Production Functions, Princeton University Press.

Simar, L., and P. W. Wilson. (2000). “Statistical inference in nonparametric frontier models: The state of the art,” Journal of Productivity Analysis 13, 49-78.
