
Computational Statistics & Data Analysis 51 (2007) 2559 – 2572 www.elsevier.com/locate/csda

Quantile estimation in two-phase sampling

María del Mar Rueda (a, ∗), Antonio Arcos (a), Juan Francisco Muñoz (a), Sarjinder Singh (b)

(a) Department of Statistics and O.R., University of Granada, 18071 Granada, Spain
(b) Department of Statistics, St. Cloud State University, 720 Fourth Avenue South, St. Cloud, MN 56301-4498, USA

Received 12 January 2005; received in revised form 3 January 2006; accepted 3 January 2006 Available online 24 January 2006

Abstract

The estimation of quantiles in two-phase sampling with an arbitrary sampling design in each of the two phases is investigated. Several ratio- and exponentiation-type estimators, which provide the optimum estimate of a quantile based on an optimum exponent α, are proposed. Properties of these estimators are studied under large-sample approximations, and the use of double sampling for stratification to estimate quantiles is also considered. The performance of these estimators is evaluated for the three quartiles on the basis of data from two real populations using different sampling designs. The simulation study shows that the proposed estimators can be very satisfactory in terms of relative bias and efficiency.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Auxiliary information; Finite population quantiles; Two-phase sampling; Stratified random sampling

1. Introduction

The problem of estimating a population mean in the presence of an auxiliary variable has been widely discussed in the finite population sampling literature. For the problem of estimating a population median, however, the situation is quite different, and this problem has only recently been discussed. Rao et al. (1990) proposed ratio and difference estimators for the median using a design-based approach. Kuk and Mak (1989) proposed two estimators for which it is only necessary to know the median of the auxiliary variable for the whole population. More recently, Rueda et al. (1998) and Rueda and Arcos (2001) proposed confidence intervals for quantiles based on ratio and difference estimators of the distribution function. In Rueda et al. (2003, 2004), the population information is used, through difference-type estimators, via a quantile of the auxiliary variable of the same order as, or a different order from, the quantile of the main variable being estimated.

The above estimators are based on prior knowledge of the median $Q_x(0.5)$ of the auxiliary characteristic. In many cases $Q_x(0.5)$ may not be known, and selecting the sample in two phases is then an attractive solution. Two-phase sampling is a good compromise for surveys in which no prior knowledge about the population is available. A key to successful two-phase sampling is the creation of a highly informative frame for the part of the population from which the subsample is drawn. The estimation of the median in two-phase sampling was developed by Singh et al. (2001), Singh (2003) and Allen et al. (2002). Swamy et al. (2005) showed that auxiliary information, even without knowledge of its true functional form, can be used to reduce bias when estimating the relation between the federal funds rate and the Federal Reserve's expectations about future values of certain policy variables. These papers were developed using simple random sampling. Sample surveys of economic variables (such as income) with highly skewed distributions are almost always complex in structure, and methods such as stratification and sampling with probability proportional to size are commonplace. In this article we propose various estimators of a β-quantile in two-phase sampling with arbitrary sampling designs in each of the two phases.

∗ Corresponding author. Departamento de Estadística e I.O., Facultad de Ciencias, Avda. Fuentenueva, Universidad de Granada, 18071, Granada, Spain. Tel.: +34 958240494; fax: +34 958243267.

0167-9473/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.csda.2006.01.002

2. Quantile estimation in two-phase sampling

This study has been carried out under the fixed-population approach. Let U be a finite population with N distinct elements, where $y_1, \ldots, y_N$ are the values of the variable of interest y, and

$$F_y(t) = N^{-1}\sum_{i=1}^{N}\Delta(t - y_i), \qquad -\infty < t < \infty,$$

is the population distribution function, where $\Delta(a)$ takes the value 1 if $a \ge 0$ and the value 0 otherwise. Let x be an auxiliary variable and $x_i$ $(i = 1, \ldots, N)$ the value of its ith population unit.

The first-phase sample $s'$ of size $n'$ is drawn according to a sampling design $d_1$, such that $p_{d_1}(s')$ is the probability that $s'$ is chosen, and the corresponding first- and second-order inclusion probabilities are $\pi'_i$ and $\pi'_{ij}$ for $i, j \in U$. For the elements in $s'$, information on the auxiliary variable can be recorded. Given $s'$, the second-phase sample $s$ of size $n$ is drawn according to the design $d_2$, such that $p(s \mid s')$ is the conditional probability of choosing $s$. The inclusion probabilities under this design are denoted by $\pi_{i|s'}$ and $\pi_{ij|s'}$.

A particular case arises when the variable x is used to stratify $s'$ into L strata, denoted by $s'_h$ $(h = 1, \ldots, L)$, with $n'_h$ elements in the hth stratum. A sample $s_h$ of size $n_h$ is then drawn from $s'_h$ according to a design $p_h(\cdot \mid s')$, independently in each stratum. The final sample is $s = \bigcup_{h=1}^{L} s_h$. This particular design is called two-phase sampling for stratification.

2.1. Direct estimation

Without using auxiliary information, the natural candidate to estimate the β-quantile $Q_y(\beta)$ is $\hat Q_y(\beta) = \inf\{t \mid \hat F_{HTy}(t) \ge \beta\} = \hat F^{-1}_{HTy}(\beta)$, where $\hat F_{HTy}(t) = N^{-1}\sum_{i\in s}\Delta(t - y_i)/\pi_i$ is the Horvitz and Thompson (1952) type estimator of $F_y(t)$, and the inclusion probability of the ith element is given by $\pi_i = \sum_{s' \ni i} p_{d_1}(s')\,\pi_{i|s'}$.
Consequently, to determine $\pi_i$ we must know the probabilities $\pi_{i|s'}$ for every $s'$, which we ordinarily do not, because $\pi_{i|s'}$ may depend on the outcome of phase one (for example, if the second-phase sample is drawn with probability proportional to an auxiliary variable). Because the Horvitz–Thompson estimator of a mean cannot always be used in practice in two-phase sampling, Särndal et al. (1992) proposed the use of π* estimators. Using this idea, we introduce the quantities

$$\pi'_i = \sum_{s' \ni i} p_{d_1}(s'), \qquad \pi'_{ij} = \sum_{s' \ni i,j} p_{d_1}(s'), \qquad \pi^*_i = \pi'_i\,\pi_{i|s'}, \qquad \pi^*_{ij} = \pi'_{ij}\,\pi_{ij|s'},$$

to define the π*-estimator of the distribution function as

$$\hat F^*_{HTy}(t) = \frac{1}{N}\sum_{i\in s}\frac{\Delta(t - y_i)}{\pi^*_i},$$

and thus we suggest the following direct estimator of the β-quantile:

$$\hat Q^*_y(\beta) = \hat F^{*-1}_{HTy}(\beta). \qquad (1)$$

Note that $\hat Q^*_y(\beta)$ does not generally agree with the estimator $\hat Q_y(\beta)$ except in rare cases, but it makes direct calculation possible for all sampling designs $d_1$ and $d_2$ used in each phase, since $\pi^*_i$ only requires $\pi_{i|s'}$ for the first-phase sample actually drawn.
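The direct estimator (1) only needs the second-phase values $y_i$ and the products $\pi^*_i = \pi'_i\,\pi_{i|s'}$. The following Python/NumPy sketch is illustrative only: the function names are hypothetical, and it assumes the $\pi^*_i$ have already been computed and N is known.

```python
import numpy as np


def pi_star_cdf(t, y, pi_star, N):
    """pi*-estimator of F_y(t): (1/N) * sum_{i in s} Delta(t - y_i) / pi*_i,
    with Delta(a) = 1 if a >= 0 and 0 otherwise."""
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(pi_star, dtype=float)      # design weights 1/pi*_i
    return float(np.sum(w * (y <= t)) / N)


def pi_star_quantile(beta, y, pi_star, N):
    """Direct estimator Q*_y(beta) = inf{t : F*_HTy(t) >= beta}, Eq. (1).

    F*_HTy is a step function that jumps only at sample values, so the
    infimum is attained at one of the sorted y_i."""
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(pi_star, dtype=float)
    order = np.argsort(y, kind="stable")
    cum = np.cumsum(w[order]) / N                   # F*_HTy at each sorted y
    idx = int(np.searchsorted(cum, beta, side="left"))  # first position with cum >= beta
    # F*_HTy need not reach 1; fall back to the largest value (a convention).
    idx = min(idx, len(y) - 1)
    return float(y[order][idx])
```

For example, with y = (3, 1, 4, 2), $\pi^*_i = 0.5$ for every unit and N = 8, the estimated distribution function takes the values 0.25, 0.5, 0.75, 1 at the sorted sample values, so the estimated median is 2. Returning the largest sample value when no t satisfies $\hat F^*_{HTy}(t) \ge \beta$ is an assumption of this sketch, not part of the paper.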


2.2. Properties of the $\hat Q^*_y(\beta)$ estimator

We now study the properties of the $\hat Q^*_y(\beta)$ estimator. For this, a linear approximation is needed because $\hat Q^*_y(\beta)$ is not a continuous function. By the Bahadur representation (see Chambers and Dunstan, 1986), $\hat Q^*_y(\beta)$ can be expressed asymptotically as a linear function of the estimated distribution function evaluated at the quantile $Q_y(\beta)$:

$$\hat Q^*_y(\beta) - Q_y(\beta) = \frac{1}{f_y(Q_y(\beta))}\left[\beta - \hat F^*_{HTy}(Q_y(\beta))\right] + O(n^{-1/2}), \qquad (2)$$

where $f_y(\cdot)$ denotes the derivative of the limiting value of $F_y(\cdot)$ as $N \to \infty$. This linear approximation, previously used by Kuk and Mak (1989) and Chen and Wu (2002), helps to study the asymptotic properties of the estimator.

On the one hand, the estimator $\hat Q^*_y(\beta)$ is asymptotically unbiased because $\hat F^*_{HTy}(t)$ is an unbiased estimator of $F_y(t)$. Indeed, $E[\beta - \hat F^*_{HTy}(Q_y(\beta))] = 0$, and by using (2) it can be seen that $E[\hat Q^*_y(\beta)] = Q_y(\beta) + O(n^{-1/2})$.

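On the other hand, from (2) we obtain the asymptotic variance of $\hat Q^*_y(\beta)$, to the first degree of approximation, as

$$V(\hat Q^*_y(\beta)) = \frac{1}{N^2 f_y^2(Q_y(\beta))}\left(\sum_{i,j\in U}(\pi'_{ij} - \pi'_i\pi'_j)\frac{\Delta(Q_y(\beta) - y_i)}{\pi'_i}\frac{\Delta(Q_y(\beta) - y_j)}{\pi'_j} + E_{d_1}\left[\sum_{i,j\in s'}(\pi_{ij|s'} - \pi_{i|s'}\pi_{j|s'})\frac{\Delta(Q_y(\beta) - y_i)}{\pi^*_i}\frac{\Delta(Q_y(\beta) - y_j)}{\pi^*_j}\right]\right),$$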

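and one can construct an approximately unbiased estimator of the variance as

$$\hat V(\hat Q^*_y(\beta)) = \frac{1}{N^2 f_y^2(Q_y(\beta))}\left(\sum_{i,j\in s}\frac{\pi'_{ij} - \pi'_i\pi'_j}{\pi^*_{ij}}\frac{\Delta(\hat Q^*_y(\beta) - y_i)}{\pi'_i}\frac{\Delta(\hat Q^*_y(\beta) - y_j)}{\pi'_j} + \sum_{i,j\in s}\frac{\pi_{ij|s'} - \pi_{i|s'}\pi_{j|s'}}{\pi_{ij|s'}}\frac{\Delta(\hat Q^*_y(\beta) - y_i)}{\pi^*_i}\frac{\Delta(\hat Q^*_y(\beta) - y_j)}{\pi^*_j}\right).$$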
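The variance estimator above can be sketched in Python/NumPy when the joint probabilities of the sampled units are available as n × n matrices (diagonals holding the first-order probabilities). Two ingredients are assumptions of this sketch rather than the paper's prescription: the unknown density $f_y(Q_y(\beta))$ is replaced by a design-weighted Gaussian kernel estimate at $\hat Q^*_y(\beta)$ with Silverman's rule-of-thumb bandwidth, and the helper for SRSWOR in both phases is only a convenient test case. All function names are hypothetical.

```python
import numpy as np


def srswor_two_phase_probs(n, n1, N):
    """pi'_i, pi'_ij, pi_{i|s'}, pi_{ij|s'} for the n sampled units when both
    phases are SRSWOR (n1 = n', n = second-phase size). Diagonals hold pi_i."""
    pi1 = np.full(n, n1 / N)
    pi1_joint = np.full((n, n), n1 * (n1 - 1) / (N * (N - 1)))
    np.fill_diagonal(pi1_joint, n1 / N)
    pi2 = np.full(n, n / n1)
    pi2_joint = np.full((n, n), n * (n - 1) / (n1 * (n1 - 1)))
    np.fill_diagonal(pi2_joint, n / n1)
    return pi1, pi1_joint, pi2, pi2_joint


def var_hat_cdf(d, N, pi1, pi1_joint, pi2, pi2_joint):
    """Two-term variance estimator of F*_HTy at a point, given d_i = Delta(t - y_i), i in s."""
    d = np.asarray(d, dtype=float)
    pi_star, pi_star_joint = pi1 * pi2, pi1_joint * pi2_joint
    a, b = d / pi1, d / pi_star
    first = np.sum((pi1_joint - np.outer(pi1, pi1)) / pi_star_joint * np.outer(a, a))
    second = np.sum((pi2_joint - np.outer(pi2, pi2)) / pi2_joint * np.outer(b, b))
    return (first + second) / N**2


def kde_at(t, v, w):
    """Weighted Gaussian kernel density at t, Silverman rule-of-thumb bandwidth.
    Assumes continuous data (a zero IQR would give a zero bandwidth)."""
    v, w = np.asarray(v, dtype=float), np.asarray(w, dtype=float)
    iqr = np.subtract(*np.percentile(v, [75, 25]))
    h = 0.9 * min(v.std(ddof=1), iqr / 1.34) * len(v) ** (-0.2)
    z = (t - v) / h
    return np.sum(w * np.exp(-0.5 * z**2)) / (np.sum(w) * h * np.sqrt(2.0 * np.pi))


def var_hat_quantile(q_hat, y, N, pi1, pi1_joint, pi2, pi2_joint):
    """Estimated variance of Q*_y(beta): CDF-variance at q_hat over f_hat(q_hat)^2."""
    y = np.asarray(y, dtype=float)
    d = (y <= q_hat).astype(float)                    # Delta(q_hat - y_i)
    f_hat = kde_at(q_hat, y, 1.0 / (pi1 * pi2))       # design-weighted density estimate
    return var_hat_cdf(d, N, pi1, pi1_joint, pi2, pi2_joint) / f_hat**2
```

With SRSWOR in both phases the distribution-function part collapses algebraically to $(1/n - 1/N)\,s_d^2$, where $s_d^2$ is the sample variance of the indicators $d_i$; this offers a quick consistency check of the two-term formula.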

An approximate value of $f_y(Q_y(\beta))$ can be obtained by applying standard methods such as the kernel or the kth nearest neighbour methods (Silverman, 1986). The variance estimator is stated in explicit form (it does not depend on an expected value over the first-phase design), which makes direct calculation possible.

3. Estimation using auxiliary information

In the previous section an estimator was defined without using auxiliary information. We now define a class of estimators that takes the auxiliary variable into account. Assuming simple random sampling without replacement (SRSWOR) and a known median of the variable x, Kuk and Mak (1989) proposed the ratio estimator of the population median

$$\hat Q_{ry}(0.5) = \hat Q_y(0.5)\,\frac{Q_x(0.5)}{\hat Q_x(0.5)}.$$

Kuk and Mak (1989) also proposed other estimators of quantiles under the SRSWOR design, called position and stratification estimators, but their extension to more complex sampling designs is very difficult. Rueda et al. (2003, 2004) proposed, for any sampling design d and any β, difference and exponentiation methods to estimate a β-quantile. Singh et al. (2001) suggested ratio, regression, position and stratification estimators of the median when the sample is drawn in two phases using SRSWOR in both phases. Under this sampling design, Allen et al. (2002) proposed two classes of estimators of the population median using information on two auxiliary variables x and z in double sampling when the population median of z is known.


3.1. Proposed estimators

Here we present a class of estimators of finite population quantiles when the sample is drawn using the general two-phase sampling described earlier:

$$\hat Q^H_y(\beta) = H(\hat Q^*_y(\beta), t^*), \qquad (3)$$

with $t^* = \hat Q^*_x(\beta)/\hat Q'_x(\beta)$, where $\hat Q'_x(\beta)$ is the estimator of $Q_x(\beta)$ from the first phase of sampling, i.e. $\hat Q'_x(\beta) = \inf\{t \mid \hat F'_{HTx}(t) \ge \beta\}$ with $\hat F'_{HTx}(t) = N^{-1}\sum_{i\in s'}\Delta(t - x_i)/\pi'_i$. The function H satisfies the following conditions:

(1) $(q, t^*)$ assumes values in a closed convex subset $C \subset \mathbb{R}^2$ that contains the point $(Q_y(\beta), 1)$;
(2) H is a continuous function in C such that $H(Q_y(\beta), 1) = Q_y(\beta)$; and
(3) the first- and second-order partial derivatives of H exist and are continuous in C, with
$$H_{10}(Q_y(\beta), 1) = \left.\frac{\partial H(q, t^*)}{\partial q}\right|_{(q, t^*) = (Q_y(\beta), 1)} = 1.$$

A particular case within the class H is the ratio-type estimator

$$\hat Q^*_{yr}(\beta) = \hat Q^*_y(\beta)\,\frac{\hat Q'_x(\beta)}{\hat Q^*_x(\beta)},$$

which corresponds to the choice $H(q, t^*) = q/t^*$. Another estimator of the β-quantile, called the exponentiation estimator, is

$$\hat Q^*_{ye}(\beta) = \hat Q^*_y(\beta)\left(\frac{\hat Q'_x(\beta)}{\hat Q^*_x(\beta)}\right)^{\alpha},$$

with α a fixed constant, which corresponds to the choice $H(q, t^*) = q/(t^*)^{\alpha}$.

Note 1. If α = 0 then $\hat Q^*_{ye}(\beta) = \hat Q^*_y(\beta)$, i.e. $\hat Q^*_{ye}(\beta)$ coincides with the π*-estimator; if α = 1 then $\hat Q^*_{ye}(\beta) = \hat Q^*_{yr}(\beta)$; and if α = −1 then $\hat Q^*_{ye}(\beta) = \hat Q^*_{yp}(\beta)$, which we can call a product estimator.

Note 2. If SRSWOR is used in each phase and β = 0.5, the proposed estimators $\hat Q^*_{yr}(\beta)$ and $\hat Q^*_{ye}(\beta)$ reduce, respectively, to the estimators $\hat M^{(a)}_y$ and $\hat M^{(b)}_y$ proposed by Singh et al. (2001).

3.2. Properties of the class of estimators

Any estimator in H is asymptotically unbiased for $Q_y(\beta)$. This result can be obtained from the following expressions:



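$$\hat Q^*_y(\beta) - Q_y(\beta) = \frac{1}{f_y(Q_y(\beta))}\left[\beta - \hat F^*_{HTy}(Q_y(\beta))\right] + O(n^{-1/2}),$$

$$\hat Q^*_x(\beta) - Q_x(\beta) = \frac{1}{f_x(Q_x(\beta))}\left[\beta - \hat F^*_{HTx}(Q_x(\beta))\right] + O(n^{-1/2}),$$

$$\hat Q'_x(\beta) - Q_x(\beta) = \frac{1}{f_x(Q_x(\beta))}\left[\beta - \hat F'_{HTx}(Q_x(\beta))\right] + O(n'^{-1/2}),$$

and by using the first-order Taylor series expansion of H about the point $(Q_y(\beta), 1)$:

$$\hat Q^H_y(\beta) = H(Q_y(\beta), 1) + (\hat Q^*_y(\beta) - Q_y(\beta))\,H_{10}(Q_y(\beta), 1) + (t^* - 1)\,H_{01}(Q_y(\beta), 1) + O(n^{-1}), \qquad (4)$$

where $H_{10}$ and $H_{01}$ denote the first-order partial derivatives of H with respect to q and $t^*$, respectively.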


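Since $\hat F^*_{HTy}(t)$ and $\hat F^*_{HTx}(t)$ are unbiased estimators of $F_y(t)$ and $F_x(t)$, respectively, any estimator in H is asymptotically unbiased for $Q_y(\beta)$.

3.3. Asymptotic expression of variances

Consider the Taylor series expansion (4); since $H_{10}(Q_y(\beta), 1) = 1$, it gives

$$\hat Q^H_y(\beta) - Q_y(\beta) = \hat Q^*_y(\beta) - Q_y(\beta) + \left(\frac{\hat Q^*_x(\beta)}{\hat Q'_x(\beta)} - 1\right)H_{01}(Q_y(\beta), 1) + O(n^{-1}).$$

Then we have

$$\begin{aligned}
\hat Q^H_y(\beta) - Q_y(\beta) &= Q_y(\beta)\,e_0 + \frac{e_1 - e_2}{1 + e_2}\,H_{01}(Q_y(\beta), 1)\\
&\approx Q_y(\beta)\,e_0 + (e_1 - e_2)(1 - e_2)\,H_{01}(Q_y(\beta), 1)\\
&= Q_y(\beta)\,e_0 + (e_1 - e_2)\,H_{01}(Q_y(\beta), 1) - e_2(e_1 - e_2)\,H_{01}(Q_y(\beta), 1),
\end{aligned}$$

where

$$e_0 = \hat Q^*_y(\beta)/Q_y(\beta) - 1, \qquad e_1 = \hat Q^*_x(\beta)/Q_x(\beta) - 1 \qquad \text{and} \qquad e_2 = \hat Q'_x(\beta)/Q_x(\beta) - 1,$$

and we obtain, to the first order of approximation, the variance

$$V(\hat Q^H_y(\beta)) = Q_y^2(\beta)\,V(e_0) + H_{01}^2(Q_y(\beta), 1)\,V(e_1 - e_2) + 2H_{01}(Q_y(\beta), 1)\,Q_y(\beta)\,\mathrm{Cov}(e_0, e_1 - e_2).$$

On the other hand, in two-phase sampling

$$V(\hat Q^H_y(\beta)) = V_{d_1}\left[E(\hat Q^H_y(\beta) \mid s')\right] + E_{d_1}\left[V(\hat Q^H_y(\beta) \mid s')\right]$$

reflects the variation due to each of the two phases of sampling. Using the known properties of the Horvitz–Thompson estimator and its variance, and denoting $\Delta'_{ij} = \pi'_{ij} - \pi'_i\pi'_j$ and $\Delta_{ij|s'} = \pi_{ij|s'} - \pi_{i|s'}\pi_{j|s'}$, we obtain

$$V_{d_1}\left[E(\hat Q^H_y(\beta) \mid s')\right] = \frac{1}{N^2 f_y^2(Q_y(\beta))}\sum_{i,j\in U}\Delta'_{ij}\,\frac{\Delta(Q_y(\beta) - y_i)}{\pi'_i}\frac{\Delta(Q_y(\beta) - y_j)}{\pi'_j}$$

and

$$\begin{aligned}
E_{d_1}\left[V(\hat Q^H_y(\beta) \mid s')\right] = E_{d_1}\Bigg[\;&\frac{1}{N^2 f_y^2(Q_y(\beta))}\sum_{i,j\in s'}\Delta_{ij|s'}\,\frac{\Delta(Q_y(\beta) - y_i)}{\pi^*_i}\frac{\Delta(Q_y(\beta) - y_j)}{\pi^*_j}\\
&+ \frac{H_{01}^2(Q_y(\beta), 1)}{Q_x^2(\beta)}\,\frac{1}{N^2 f_x^2(Q_x(\beta))}\sum_{i,j\in s'}\Delta_{ij|s'}\,\frac{\Delta(Q_x(\beta) - x_i)}{\pi^*_i}\frac{\Delta(Q_x(\beta) - x_j)}{\pi^*_j}\\
&+ 2\,\frac{H_{01}(Q_y(\beta), 1)}{Q_x(\beta)}\,\frac{1}{N^2 f_y(Q_y(\beta))\,f_x(Q_x(\beta))}\sum_{i,j\in s'}\Delta_{ij|s'}\,\frac{\Delta(Q_y(\beta) - y_i)}{\pi^*_i}\frac{\Delta(Q_x(\beta) - x_j)}{\pi^*_j}\Bigg].
\end{aligned}$$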

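The last variance is not stated explicitly, but as an expected value over the first-phase design. This causes no problem for variance estimation: the first-phase component

$$\sum_{i,j\in U}(\pi'_{ij} - \pi'_i\pi'_j)\,\frac{\Delta(Q_y(\beta) - y_i)}{\pi'_i}\frac{\Delta(Q_y(\beta) - y_j)}{\pi'_j}$$

can be estimated by

$$\sum_{i,j\in s}\frac{\pi'_{ij} - \pi'_i\pi'_j}{\pi^*_{ij}}\,\frac{\Delta(\hat Q^*_y(\beta) - y_i)}{\pi'_i}\frac{\Delta(\hat Q^*_y(\beta) - y_j)}{\pi'_j},$$

and

$$E_{d_1}\left(\sum_{i,j\in s'}(\pi_{ij|s'} - \pi_{i|s'}\pi_{j|s'})\,\frac{\Delta(Q_y(\beta) - y_i)}{\pi^*_i}\frac{\Delta(Q_y(\beta) - y_j)}{\pi^*_j}\right)$$

by

$$\sum_{i,j\in s}\frac{\pi_{ij|s'} - \pi_{i|s'}\pi_{j|s'}}{\pi_{ij|s'}}\,\frac{\Delta(\hat Q^*_y(\beta) - y_i)}{\pi^*_i}\frac{\Delta(\hat Q^*_y(\beta) - y_j)}{\pi^*_j},$$

while $f_x(Q_x(\beta))$ and $f_y(Q_y(\beta))$ can be estimated following Silverman (1986). The asymptotic variances of the ratio, product and exponentiation estimators, corresponding to $H(q, t^*) = q/t^*$, $H(q, t^*) = q\,t^*$ and $H(q, t^*) = q/(t^*)^{\alpha}$, respectively, can then be derived.

3.4. Optimal estimators

In this section we derive the expression of the optimal estimator in the class $\hat Q^*_{ye}(\beta)$. Optimality is defined in the sense of minimizing the (asymptotic) variance of these estimators. This leads to the optimal value of α given by

$$\alpha_{opt} = \frac{Q_x(\beta)\left[\mathrm{Cov}(\hat Q^*_y(\beta), \hat Q^*_x(\beta)) - \mathrm{Cov}(\hat Q^*_y(\beta), \hat Q'_x(\beta))\right]}{Q_y(\beta)\left[V(\hat Q^*_x(\beta)) + V(\hat Q'_x(\beta)) - 2\,\mathrm{Cov}(\hat Q^*_x(\beta), \hat Q'_x(\beta))\right]}.$$

By using the properties of two-phase sampling, the following expression can be obtained: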

  

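$$\alpha_{opt} = \frac{Q_x(\beta)\,f_x(Q_x(\beta))\;E_{d_1}\left(\sum_{i,j\in s'}(\pi_{ij|s'} - \pi_{i|s'}\pi_{j|s'})\,\frac{\Delta(Q_y(\beta) - y_i)}{\pi^*_i}\frac{\Delta(Q_x(\beta) - x_j)}{\pi^*_j}\right)}{Q_y(\beta)\,f_y(Q_y(\beta))\;E_{d_1}\left(\sum_{i,j\in s'}(\pi_{ij|s'} - \pi_{i|s'}\pi_{j|s'})\,\frac{\Delta(Q_x(\beta) - x_i)}{\pi^*_i}\frac{\Delta(Q_x(\beta) - x_j)}{\pi^*_j}\right)},$$

and then

$$\hat Q_{yopt}(\beta) = \hat Q^*_y(\beta)\left(\frac{\hat Q'_x(\beta)}{\hat Q^*_x(\beta)}\right)^{\alpha_{opt}}.$$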

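It can easily be seen that

$$V(\hat Q^H_y(\beta)) \ge V(\hat Q_{yopt}(\beta)) = V(\hat Q^*_y(\beta)) - K_1 = V(\hat Q^*_y(\beta)) - \frac{\left[\mathrm{Cov}(\hat Q^*_y(\beta), \hat Q^*_x(\beta)) - \mathrm{Cov}(\hat Q^*_y(\beta), \hat Q'_x(\beta))\right]^2}{V(\hat Q^*_x(\beta)) + V(\hat Q'_x(\beta)) - 2\,\mathrm{Cov}(\hat Q^*_x(\beta), \hat Q'_x(\beta))}, \qquad (5)$$

that is, the lower bound of the variance of $\hat Q^H_y(\beta)$ is the variance of the exponentiation estimator with $\alpha_{opt}$. Eq. (5) shows that the proposed estimator $\hat Q_{yopt}(\beta)$ is never less efficient than the direct estimator $\hat Q^*_y(\beta)$, since $K_1 \ge 0$. Specifically, $K_1$ is the amount by which the variance is reduced when we use the exponentiation estimator with an optimal α instead of the $\hat Q^*_y(\beta)$ estimator.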
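To see where $\alpha_{opt}$ and $K_1$ come from, the following worked derivation (ours, stated under the same first-order approximation as Section 3.3) takes the exponentiation choice $H(q, t^*) = q\,(t^*)^{-\alpha}$ in the variance of $\hat Q^H_y(\beta)$. Because that variance depends on H only through $H_{01}(Q_y(\beta), 1)$, minimizing over α gives the bound in (5) for the whole class H; α = 1 and α = −1 give the ratio and product variances by direct substitution.

$$
\begin{aligned}
H(q, t^*) = q\,(t^*)^{-\alpha}:\quad H_{10}(Q_y(\beta), 1) &= 1, \qquad H_{01}(Q_y(\beta), 1) = -\alpha\,Q_y(\beta),\\
V(\hat Q^*_{ye}(\beta)) &= Q_y^2(\beta)\left[V(e_0) - 2\alpha\,\mathrm{Cov}(e_0, e_1 - e_2) + \alpha^2\,V(e_1 - e_2)\right],\\
\frac{\partial V(\hat Q^*_{ye}(\beta))}{\partial \alpha} = 0 \;&\Longrightarrow\; \alpha_{opt} = \frac{\mathrm{Cov}(e_0, e_1 - e_2)}{V(e_1 - e_2)} = \frac{Q_x(\beta)\left[\mathrm{Cov}(\hat Q^*_y(\beta), \hat Q^*_x(\beta)) - \mathrm{Cov}(\hat Q^*_y(\beta), \hat Q'_x(\beta))\right]}{Q_y(\beta)\left[V(\hat Q^*_x(\beta)) + V(\hat Q'_x(\beta)) - 2\,\mathrm{Cov}(\hat Q^*_x(\beta), \hat Q'_x(\beta))\right]},\\
V(\hat Q_{yopt}(\beta)) &= Q_y^2(\beta)\,V(e_0) - Q_y^2(\beta)\,\frac{\mathrm{Cov}^2(e_0, e_1 - e_2)}{V(e_1 - e_2)} = V(\hat Q^*_y(\beta)) - K_1.
\end{aligned}
$$

The second equality for $\alpha_{opt}$ uses $\mathrm{Cov}(e_0, e_1 - e_2) = [\mathrm{Cov}(\hat Q^*_y, \hat Q^*_x) - \mathrm{Cov}(\hat Q^*_y, \hat Q'_x)]/(Q_y Q_x)$ and $V(e_1 - e_2) = [V(\hat Q^*_x) + V(\hat Q'_x) - 2\,\mathrm{Cov}(\hat Q^*_x, \hat Q'_x)]/Q_x^2$ (arguments β omitted), which also reproduces the expression for $K_1$ in (5).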


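In practice the optimal value of α is unknown. Nevertheless, the sample data can be used to estimate it. Thus, an estimator of the optimal value of α is given by

$$\hat\alpha = \frac{\hat Q^*_x(\beta)\,\hat f_x(\hat Q^*_x(\beta))\sum_{i,j\in s}\dfrac{\pi_{ij|s'} - \pi_{i|s'}\pi_{j|s'}}{\pi_{ij|s'}}\,\dfrac{\Delta(\hat Q^*_y(\beta) - y_i)}{\pi^*_i}\dfrac{\Delta(\hat Q^*_x(\beta) - x_j)}{\pi^*_j}}{\hat Q^*_y(\beta)\,\hat f_y(\hat Q^*_y(\beta))\sum_{i,j\in s}\dfrac{\pi_{ij|s'} - \pi_{i|s'}\pi_{j|s'}}{\pi_{ij|s'}}\,\dfrac{\Delta(\hat Q^*_x(\beta) - x_i)}{\pi^*_i}\dfrac{\Delta(\hat Q^*_x(\beta) - x_j)}{\pi^*_j}}. \qquad (6)$$

We can define an optimal estimator of the β-quantile as

$$\tilde Q_y(\beta) = \hat Q^*_y(\beta)\left(\frac{\hat Q'_x(\beta)}{\hat Q^*_x(\beta)}\right)^{\hat\alpha}.$$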
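The exponentiation family and the plug-in exponent (6) reduce to a few array operations. The following Python/NumPy sketch uses hypothetical function names; it assumes the caller supplies $\pi^*_i$, the conditional second-phase probabilities (as an n × n matrix with $\pi_{i|s'}$ on the diagonal), the estimated quantiles, and density estimates $\hat f_y$, $\hat f_x$ at those quantiles (for instance kernel estimates as discussed in Section 2.2).

```python
import numpy as np


def exponentiation_estimator(q_star_y, q_star_x, q_first_x, alpha):
    """Q*_ye(beta) = Q*_y(beta) * (Q'_x(beta) / Q*_x(beta)) ** alpha.

    alpha = 0: direct estimator; alpha = 1: ratio; alpha = -1: product;
    alpha = alpha_hat from Eq. (6): the estimated-optimum estimator."""
    return q_star_y * (q_first_x / q_star_x) ** alpha


def alpha_hat(y, x, pi_star, pi2, pi2_joint, q_star_y, q_star_x, f_y, f_x):
    """Plug-in estimate of alpha_opt following Eq. (6).

    pi_star, pi2: pi*_i and pi_{i|s'} for i in s; pi2_joint: n x n matrix of pi_{ij|s'};
    f_y, f_x: density estimates at q_star_y and q_star_x, supplied by the caller."""
    dy = (np.asarray(y, dtype=float) <= q_star_y) / pi_star   # Delta(Q*_y - y_i) / pi*_i
    dx = (np.asarray(x, dtype=float) <= q_star_x) / pi_star   # Delta(Q*_x - x_i) / pi*_i
    c = (pi2_joint - np.outer(pi2, pi2)) / pi2_joint          # (pi_ij|s' - pi_i|s' pi_j|s') / pi_ij|s'
    num = q_star_x * f_x * (dy @ c @ dx)
    den = q_star_y * f_y * (dx @ c @ dx)
    return num / den
```

As a sanity check, passing identical y and x values with equal quantiles and densities returns $\hat\alpha = 1$, i.e. the ratio estimator.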



Following the procedure discussed in Allen et al. (2002), it can be shown that $E(\tilde Q_y(\beta)) = Q_y(\beta) + o(n^{-1})$ and, to the first degree of approximation, $V(\tilde Q_y(\beta)) = V(\hat Q_{yopt}(\beta))$; i.e., the estimators $\tilde Q_y(\beta)$ and $\hat Q_{yopt}(\beta)$ are asymptotically equivalent.

4. Two-phase sampling for stratification

In Section 2 we presented a particular case of two-phase sampling in which the first-phase sample is stratified using the auxiliary variable. This sampling design is called two-phase sampling for stratification. We now define an estimator of the quantile $Q_y(\beta)$ under this sampling design and analyse several of its properties. First, we define the following estimator of the distribution function:

$$\hat F^*_{st}(t) = \frac{1}{N}\sum_{h=1}^{L}\sum_{i\in s_h}\frac{\Delta(t - y_i)}{\pi^*_i},$$

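and we suggest estimating the quantile $Q_y(\beta)$ by $\hat Q^*_{st}(\beta) = \hat F^{*-1}_{st}(\beta)$, where the inverse $\hat F^{*-1}_{st}$ is defined in the same way as $\hat F^{-1}_{HTy}$ above. To study the properties of the $\hat Q^*_{st}(\beta)$ estimator, we first analyse the properties of the $\hat F^*_{st}(t)$ estimator. Note that $\hat F^*_{st}(t)$ is unbiased and its variance is given by

$$V(\hat F^*_{st}(t)) = \frac{1}{N^2}\left(\sum_{i,j\in U}(\pi'_{ij} - \pi'_i\pi'_j)\,\frac{\Delta(t - y_i)}{\pi'_i}\frac{\Delta(t - y_j)}{\pi'_j} + E_{d_1}\left[\sum_{h=1}^{L}\sum_{i,j\in s'_h}(\pi_{ij|s'} - \pi_{i|s'}\pi_{j|s'})\,\frac{\Delta(t - y_i)}{\pi^*_i}\frac{\Delta(t - y_j)}{\pi^*_j}\right]\right). \qquad (7)$$

Thus, an unbiased estimator of this variance is given by

$$\hat V(\hat F^*_{st}(t)) = \frac{1}{N^2}\left(\sum_{i,j\in s}\frac{\pi'_{ij} - \pi'_i\pi'_j}{\pi^*_{ij}}\,\frac{\Delta(t - y_i)}{\pi'_i}\frac{\Delta(t - y_j)}{\pi'_j} + \sum_{h=1}^{L}\sum_{i,j\in s_h}\frac{\pi_{ij|s'} - \pi_{i|s'}\pi_{j|s'}}{\pi_{ij|s'}}\,\frac{\Delta(t - y_i)}{\pi^*_i}\frac{\Delta(t - y_j)}{\pi^*_j}\right), \qquad (8)$$

because each component of (8) is unbiased for its counterpart in (7).

Similarly to Section 2.2, the $\hat Q^*_{st}(\beta)$ estimator can be expressed as a linear function of $\hat F^*_{st}(Q_y(\beta))$. In addition, because $\hat F^*_{st}(t)$ is an unbiased estimator of $F_y(t)$, we deduce that $\hat Q^*_{st}(\beta)$ is asymptotically unbiased. An approximately unbiased estimator of its variance is given by

$$\hat V(\hat Q^*_{st}(\beta)) = \frac{1}{N^2 f_y^2(Q_y(\beta))}\left(\sum_{i,j\in s}\frac{\pi'_{ij} - \pi'_i\pi'_j}{\pi^*_{ij}}\,\frac{\Delta(\hat Q^*_{st}(\beta) - y_i)}{\pi'_i}\frac{\Delta(\hat Q^*_{st}(\beta) - y_j)}{\pi'_j} + \sum_{h=1}^{L}\sum_{i,j\in s_h}\frac{\pi_{ij|s'} - \pi_{i|s'}\pi_{j|s'}}{\pi_{ij|s'}}\,\frac{\Delta(\hat Q^*_{st}(\beta) - y_i)}{\pi^*_i}\frac{\Delta(\hat Q^*_{st}(\beta) - y_j)}{\pi^*_j}\right).$$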
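Because the double sum in $\hat F^*_{st}(t)$ is a single weighted sum over s, the design enters only through the stratum-specific $\pi^*_i$. The following Python/NumPy sketch is hypothetical in its names and, for simplicity, assumes SRSWOR of $n'$ units in phase one and SRSWOR of $n_h$ units from the $n'_h$ first-phase units of each stratum in phase two (the paper's section allows any within-stratum design).

```python
import numpy as np


def stratified_pi_star(strata, n1_h, n_h, n1, N):
    """pi*_i = pi'_i * pi_{i|s'} for two-phase sampling for stratification, assuming
    SRSWOR of n1 = n' units in phase one and SRSWOR of n_h[h] units from the
    n1_h[h] = n'_h first-phase units of stratum h in phase two."""
    return np.array([(n1 / N) * n_h[h] / n1_h[h] for h in strata], dtype=float)


def stratified_quantile(beta, y, pi_star, N):
    """Q*_st(beta) = inf{t : F*_st(t) >= beta}, where
    F*_st(t) = (1/N) * sum_h sum_{i in s_h} Delta(t - y_i) / pi*_i.
    The strata can be pooled because F*_st is a plain weighted sum over s."""
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(pi_star, dtype=float)
    order = np.argsort(y, kind="stable")
    cum = np.cumsum(w[order]) / N                    # F*_st at each sorted y
    idx = min(int(np.searchsorted(cum, beta, side="left")), len(y) - 1)
    return float(y[order][idx])
```

Under these within-stratum SRSWOR assumptions the weights $1/\pi^*_i$ sum exactly to N (each stratum contributes $N n'_h/n'$), so $\hat F^*_{st}$ reaches 1 at the largest sampled value.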


Table 1
Description and references of populations

| Population | Description | Variables | $\rho_{yx}$ | References |
|---|---|---|---|---|
| Fam1500 (N = 1500) | Families of Andalucía (Spain) | y: Feeding expenses; x1: Family incomes; x2: Other expenses | 0.848 (x1); 0.546 (x2) | Fernández and Mayor (1994) |
| Counties (N = 304) | Counties in Carolina and Georgia | y: Population in 1970; x1: Population in 1960; x2: Households in 1960 | 0.982 (x1); 0.982 (x2) | Royall and Cumberland (1981); Valliant et al. (2000) |

5. Empirical study

The present investigation proposes several estimators of quantiles in two-phase sampling with unequal probabilities. The use of two-phase sampling for stratification to estimate quantiles has also been considered. In this section we carry out a simulation study to examine the behaviour of these estimators and to identify the most efficient one. For this purpose, we examined two natural populations used previously in finite population sampling: Fam1500 and Counties. A brief description and the references for these populations are given in Table 1. In these populations there are several auxiliary variables with different linear correlation coefficients with the variable of interest y, so the behaviour of the estimators can be observed under both strong and weak relationships between the variables.

We generated 1000 independent samples under different methods in each phase. The first-phase sample size, n′, is fixed at 150, and the second-phase sample size, n, varies from 10 to 100. The methods used are:

(1) (SRSWOR.M) The first phase is SRSWOR of size n′. The second phase uses the Midzuno–Sen method (Singh, 2003, p. 390) to extract samples with unequal probabilities:
$$\pi'_i = \frac{n'}{N}, \qquad \pi_{i|s'} = \frac{n' - n}{n' - 1}\,\frac{x_i}{\sum_{j\in s'}x_j} + \frac{n - 1}{n' - 1} \;\rightarrow\; \pi^*_i = \pi'_i\,\pi_{i|s'}.$$

(2) (SRSWOR.P) The first phase is SRSWOR of size n′. The second phase uses Poisson sampling (Singh, 2003, p. 499) with conditional inclusion probability proportional to x:
$$\pi'_i = \frac{n'}{N}, \qquad \pi_{i|s'} = n\,\frac{x_i}{\sum_{j\in s'}x_j} \;\rightarrow\; \pi^*_i = \pi'_i\,\pi_{i|s'}.$$

(3) (ST.M) Two-phase sampling for stratification: in the first phase, a sample is drawn by SRSWOR, and for the elements in s′ information is recorded that permits a stratification. From stratum h, a sample $s_h$ of size $n_h$ is drawn with unequal probabilities using the Midzuno–Sen method:
$$\pi'_i = \frac{n'}{N}, \qquad \pi_{i|s'_h} = \frac{n'_h - n_h}{n'_h - 1}\,\frac{x_i}{\sum_{j\in s'_h}x_j} + \frac{n_h - 1}{n'_h - 1} \;\rightarrow\; \pi^*_i = \pi'_i\,\pi_{i|s'_h}, \qquad i \in s'_h.$$
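The conditional probabilities in designs (1) to (3) are simple functions of the first-phase x values. The following Python/NumPy sketch uses hypothetical helper names; the two samplers encode the usual textbook descriptions (Midzuno–Sen: one unit drawn with probability proportional to x, then n − 1 units by SRSWOR from the rest; Poisson: independent Bernoulli trials with probability $\pi_{i|s'}$), which we assume correspond to the schemes cited from Singh (2003). For ST.M, the Midzuno–Sen function would be applied separately within each first-phase stratum with $n = n_h$.

```python
import numpy as np


def midzuno_sen_conditional(x_first, n):
    """pi_{i|s'} = (n'-n)/(n'-1) * x_i / sum_{j in s'} x_j + (n-1)/(n'-1), design (1)."""
    x = np.asarray(x_first, dtype=float)
    n1 = len(x)                                       # n' (or n'_h within a stratum)
    return (n1 - n) / (n1 - 1) * x / x.sum() + (n - 1) / (n1 - 1)


def poisson_conditional(x_first, n):
    """pi_{i|s'} = n * x_i / sum_{j in s'} x_j, design (2); expected sample size n."""
    x = np.asarray(x_first, dtype=float)
    p = n * x / x.sum()
    if np.any(p > 1):
        raise ValueError("n * x_i / sum(x) exceeds 1 for some unit; probabilities invalid")
    return p


def midzuno_sen_sample(x_first, n, rng):
    """One unit with probability proportional to x, then n - 1 of the rest by SRSWOR."""
    x = np.asarray(x_first, dtype=float)
    first = rng.choice(len(x), p=x / x.sum())
    rest = rng.choice(np.delete(np.arange(len(x)), first), size=n - 1, replace=False)
    return np.concatenate(([first], rest))


def poisson_sample(p, rng):
    """Independent Bernoulli(p_i) trials; the realised sample size is random."""
    return np.flatnonzero(rng.random(len(p)) < p)
```

In both SRSWOR.M and SRSWOR.P the final weights follow as $\pi^*_i = (n'/N)\,\pi_{i|s'}$. Both sets of conditional probabilities sum to n over $s'$, which is a quick check on an implementation.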

The performance of the proposed estimators is evaluated for the three quartiles, β = 0.25, 0.50, 0.75, in terms of relative bias in percent (RB) and relative efficiency (RE), with Monte Carlo approximations derived from the B = 1000 independent samples:

$$RB_i = 100 \times \frac{1}{B}\sum_{b=1}^{B}\frac{\hat Q_{iy}(\beta)_b - Q_y(\beta)}{Q_y(\beta)}, \qquad RE_i = \frac{MSE(\hat Q_{iy}(\beta))}{MSE(\hat Q^*_y(\beta))},$$

where b indexes the bth simulation run and $\hat Q_{iy}(\beta)$ denotes the ith proposed estimator, with

• $\hat Q_{1y}(\beta) = \hat Q^*_y(\beta)\,\hat Q'_x(\beta)/\hat Q^*_x(\beta)$,
• $\hat Q_{2y}(\beta) = \hat Q^*_y(\beta)\,(\hat Q'_x(\beta)/\hat Q^*_x(\beta))^{\hat\alpha}$, where $\hat\alpha$ is given in (6),
• $\hat Q_{3y}(\beta) = \hat Q^*_y(\beta)\,(\hat Q'_x(\beta)/\hat Q^*_x(\beta))^{\alpha_{opt}}$,
• $\hat Q_{4y}(\beta) = \hat Q^*_{st}(\beta)$.

Here $MSE(\hat Q_{iy}(\beta)) = B^{-1}\sum_{b=1}^{B}(\hat Q_{iy}(\beta)_b - Q_y(\beta))^2$, and $MSE(\hat Q^*_y(\beta))$ is similarly defined for $\hat Q^*_y(\beta)$, the direct estimator defined in (1), which does not use the auxiliary information; RE values below 1 therefore favour the proposed estimator. The random generations, calculations and all the estimators were obtained using the R program. Programming details are available from the authors.

Figs. 1–4 show the RE of the $\hat Q_{1y}(\beta)$, $\hat Q_{2y}(\beta)$ and $\hat Q_{3y}(\beta)$ estimators in the two populations under the SRSWOR.M and SRSWOR.P designs. These figures show the behaviour of the estimators as the second-phase sample size increases while the first-phase sample size remains fixed. If there is a high linear correlation coefficient between y and the auxiliary variable, then all estimators are more efficient than the $\hat Q^*_y(\beta)$ estimator (shown with horizontal dotted lines). The gain in relative efficiency decreases as the second-phase sample size increases. This result is logical: if the second-phase sample is small, it carries less information on the y variable and the $\hat Q^*_y(\beta)$ estimator has a larger error, whereas the ratio-type estimators are more efficient because they use more information. As n increases, $\hat Q^*_y(\beta)$ improves and comes closer to the ratio estimator.

Note that for the Fam1500 population under the SRSWOR.P sampling design with first-phase sample size n′ = 150, as the second-phase sample size n increases from 10 to 100, the RE shows two peaks: at n = 25 and n = 80 for β = 0.25; at n = 55 and n = 80 for β = 0.5; and at n = 60 and n = 100 for β = 0.75. It appears that, as far as the efficiency of the proposed estimators is concerned, a larger second-phase sample size may be required when a higher quartile is estimated.

$\hat Q_{3y}(\beta)$ is the most efficient estimator in many cases. This is expected, because this estimator is asymptotically optimum in the class (3). Nevertheless, the estimator $\hat Q_{2y}(\beta)$ has very similar values and does not depend on unknown quantities. $\hat Q_{1y}(\beta)$ is usually less efficient than the other proposed estimators. When the linear relation between the variables is weaker, $\hat Q_{1y}(\beta)$ is even less efficient than the direct estimator, while $\hat Q_{2y}(\beta)$ and $\hat Q_{3y}(\beta)$ continue to perform better. In short, the use of the exponentiation estimator improves the estimates, especially if there is a weak relationship between the study and auxiliary variables.

On the other hand, the Poisson method of sampling produces more efficient results in the sense of RE than the Midzuno–Sen method with regard to $\hat Q^*_y(\beta)$, because under the Poisson method the direct estimator gives widely dispersed estimates, caused by the heterogeneity of the inclusion probabilities. The proposed estimators are almost equivalent in the Counties population because the linear correlation coefficients are larger. In fact, the RE of the proposed estimators in this population is better than in the Fam1500 population.

Bias is another important aspect, particularly for ratio estimators, which can show underestimation or overestimation. The RB values in the Fam1500 population are all within a reasonable range, with $\hat Q^*_y(\beta)$ having the largest, at 3%, as seen in Fig. 5.

Fig. 1. RE for the Fam1500 population under the SRSWOR.M sampling design, n′ = 150. [Plot: panels β = 0.25, 0.5, 0.75; RE against n for Estimators 1, 2 and 3. (*) x1 is used as an auxiliary variable and x2 is used to assign probabilities. (**) x2 is used as an auxiliary variable and x1 is used to assign probabilities.]

Fig. 2. RE for the Fam1500 population under the SRSWOR.P sampling design, n′ = 150. [Plot: panels β = 0.25, 0.5, 0.75; RE against n for Estimators 1, 2 and 3. (*) x1 is used as an auxiliary variable and x2 is used to assign probabilities. (**) x2 is used as an auxiliary variable and x1 is used to assign probabilities.]

Fig. 3. RE for the Counties population under the SRSWOR.M sampling design, n′ = 150. [Plot: panels β = 0.25, 0.5, 0.75; RE against n for Estimators 1, 2 and 3. (*) x1 is used as an auxiliary variable and x2 is used to assign probabilities. (**) x2 is used as an auxiliary variable and x1 is used to assign probabilities.]

Fig. 4. RE for the Counties population under the SRSWOR.P sampling design, n′ = 150. [Plot: panels β = 0.25, 0.5, 0.75; RE against n for Estimators 1, 2 and 3. (*) x1 is used as an auxiliary variable and x2 is used to assign probabilities. (**) x2 is used as an auxiliary variable and x1 is used to assign probabilities.]
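The RB and RE summaries used in these comparisons reduce to a few array operations over the B Monte Carlo replicates. A minimal Python/NumPy sketch follows; the function names are hypothetical, and it takes the B simulated values of an estimator, the B simulated values of the direct estimator $\hat Q^*_y(\beta)$, and the known population quantile $Q_y(\beta)$.

```python
import numpy as np


def relative_bias_pct(estimates, q_true):
    """RB = 100 * (1/B) * sum_b (Q_hat_b - Q_y(beta)) / Q_y(beta)."""
    est = np.asarray(estimates, dtype=float)
    return 100.0 * np.mean((est - q_true) / q_true)


def mse(estimates, q_true):
    """Monte Carlo MSE: (1/B) * sum_b (Q_hat_b - Q_y(beta))^2."""
    est = np.asarray(estimates, dtype=float)
    return np.mean((est - q_true) ** 2)


def relative_efficiency(estimates, direct_estimates, q_true):
    """RE = MSE(proposed) / MSE(direct estimator Q*_y); RE < 1 favours the proposed one."""
    return mse(estimates, q_true) / mse(direct_estimates, q_true)
```

For instance, three replicates 11, 9 and 12 around a true quartile of 10 give an RB of about 6.67%, and replicates (11, 9) against direct replicates (12, 8) give RE = 1/4 = 0.25.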
The RB values for the Counties population when x1 is used as the auxiliary variable and x2 is used to assign probabilities are shown in Fig. 6. The $\hat Q^*_y(\beta)$ estimator clearly leads to serious overestimation, especially when the sample size is small and under the SRSWOR.P sampling design, whereas the absolute RBs of the proposed estimators are less than 7% for the SRSWOR.M sampling design and less than 13% for the SRSWOR.P sampling design, except at small sample sizes for the $\hat Q_{2y}(\beta)$ estimator, which reaches 25%. In short, the study of the RB reveals that the proposed estimators are better than the direct estimator.

Fig. 7 is an example of two-phase sampling for stratification. The proposed estimator is compared with the direct estimator that ignores the strata. It can be observed that the use of stratification is recommended, because the estimates are more efficient, especially as the second-phase sample size decreases. In all cases the proposed estimators show an improvement over the direct estimator irrespective of the linear relationship between the variables, although the gain in RE is greater when this coefficient is larger. The gain in efficiency is assured here because the strata are well designed, i.e., the strata are homogeneous within and heterogeneous between each other.

Fig. 5. RB in percent for the Fam1500 population when x1 is used as an auxiliary variable and x2 is used to assign probabilities, n′ = 150. [Plot: panels β = 0.25, 0.5, 0.75; RB against n for the direct estimator and Estimators 1, 2 and 3. (*) SRSWOR.M sampling design. (**) SRSWOR.P sampling design.]

Fig. 6. RB in percent for the Counties population when x1 is used as an auxiliary variable and x2 is used to assign probabilities, n′ = 150. The RB values for the direct estimator in (**) are larger than 97.6%, 74.6% and 21.5% for β = 0.25, 0.5 and 0.75, respectively, and are omitted. [Plot: panels β = 0.25, 0.5, 0.75; RB against n for the direct estimator and Estimators 1, 2 and 3. (*) SRSWOR.M sampling design. (**) SRSWOR.P sampling design.]

Fig. 7. RE for the Fam1500 and Counties populations under the ST.M sampling design, n′ = 150. [Plot: panels β = 0.25, 0.5, 0.75; RE against n for the proposed estimator using variable x1 and the proposed estimator using variable x2. (*) Fam1500 population. (**) Counties population.]

Fig. 8. RB in percent for the Fam1500 and Counties populations under the ST.M sampling design when the variable x1 is used, n′ = 150. [Plot: panels β = 0.25, 0.5, 0.75; RB against n for the direct estimator and the proposed estimator. (*) Fam1500 population. (**) Counties population.]


As far as the RB is concerned, the proposed estimator $\hat Q^*_{st}(\beta)$ is better than $\hat Q^*_y(\beta)$, as can be observed in Fig. 8. The RB values of $\hat Q^*_{st}(\beta)$ are less than 10%, whereas the $\hat Q^*_y(\beta)$ estimator leads to a weak overestimation for the Counties population. In fact, the Fam1500 population produces better estimates than the Counties population in terms of RB. The estimators show similar behaviour when the variable x2 is used, and consequently these figures are not shown.

Acknowledgements

The authors would like to thank the referee and the Associate Editor for their many helpful comments and suggestions. The authors are also thankful to a professional English editor, Ms. Melissa Lindsey, St. Cloud State University, for editing the manuscript. This work was supported by the Spanish Ministry of Education and Science (Contract no. MTM2004-04038).

References

Allen, J., Singh, H.P., Singh, S., Smarandache, F., 2002. A general class of estimators of population median using two auxiliary variables in double sampling. INTERSTAT.
Chambers, R.L., Dunstan, R., 1986. Estimating distribution functions from survey data. Biometrika 73, 597–604.
Chen, J., Wu, C., 2002. Estimation of distribution function and quantiles using the model-calibrated pseudo-empirical likelihood method. Statist. Sin. 12, 1223–1239.
Fernández, F.R., Mayor, J.A., 1994. Muestreo en Poblaciones Finitas: Curso Básico. P.P.U., Barcelona.
Horvitz, D.G., Thompson, D.J., 1952. A generalization of sampling without replacement from a finite universe. J. Amer. Statist. Assoc. 47, 663–685.
Kuk, A., Mak, T.K., 1989. Median estimation in the presence of auxiliary information. J. Roy. Statist. Soc. B 51, 261–269.
Rao, J.N.K., Kovar, J.G., Mantel, H.J., 1990. On estimating distribution functions and quantiles from survey data using auxiliary information. Biometrika 77, 365–375.
Royall, R.M., Cumberland, W.G., 1981. An empirical study of the ratio estimator and estimator of its variance. J. Amer. Statist. Assoc. 76, 66–88.
Rueda, M., Arcos, A., 2001. On estimating the median from survey data using multiple auxiliary information. Metrika 54, 59–76.
Rueda, M., Arcos, A., Artés, E., 1998. Quantile interval estimation in finite population using a multivariate ratio estimator. Metrika 47, 203–213.
Rueda, M., Arcos, A., Martínez-Miranda, M.D., 2003. Difference estimators of quantiles in finite populations. Test 12, 481–496.
Rueda, M., Arcos, A., Martínez-Miranda, M.D., Román, Y., 2004. Some improved estimators of finite population quantile using auxiliary information in sample surveys. Comput. Statist. Data Anal. 45, 825–848.
Särndal, C.E., Swensson, B., Wretman, J., 1992. Model Assisted Survey Sampling. Springer, New York.
Silverman, B.W., 1986. Density Estimation for Statistics and Data Analysis. Chapman & Hall, London.
Singh, S., Joarder, A.H., Tracy, D.S., 2001. Median estimation using double sampling. Austral. New Zealand J. Statist. 43, 33–46.
Singh, S., 2003. Advanced Sampling Theory with Applications: How Michael "Selected" Amy. Kluwer Academic Publishers, The Netherlands.
Swamy, P.A.V.B., Tavlas, G.S., Chang, I.L., 2005. How stable are monetary policy rules: estimating the time-varying coefficients in monetary policy reaction function for the US. Comput. Statist. Data Anal. 49, 575–590.
Valliant, R., Dorfman, A.H., Royall, R.M., 2000. Finite Population Sampling and Inference: A Prediction Approach. Wiley Series in Probability and Statistics, Survey Methodology Section. Wiley, New York.
