Kernel-based functional principal components


Statistics & Probability Letters 48 (2000) 335–345

Graciela Boente^{a,b,∗}, Ricardo Fraiman^{a,c}

^a Universidad de Buenos Aires, Argentina
^b CONICET, Argentina
^c Universidad de San Andrés, Argentina

Received November 1998; received in revised form November 1999

Abstract

In this paper, we propose kernel-based smooth estimates of the functional principal components when data are continuous trajectories of stochastic processes. Strong consistency and the asymptotic distribution are derived under mild conditions. © 2000 Elsevier Science B.V. All rights reserved.

MSC: primary 62G07; 62H25

Keywords: Functional principal components; Kernel methods; Hilbert–Schmidt operators; Eigenfunctions

1. Introduction

In many situations, the individual observed responses are curves rather than finite-dimensional vectors, as, for instance, in some growth curve models. In this context, the observable response of each individual may be modeled as a sample path $X(t,\omega)$, $\omega \in \Omega$, of a stochastic process with expected value $\mu(t)$ and covariance function $\gamma(t,s)$, for $t, s$ in a finite interval $I$. Ramsay (1982), Hart and Wehrly (1986), Rice and Silverman (1991), Ramsay and Dalzell (1991) and Fraiman and Pérez Iribarren (1991), among others, discussed further examples and applications in this general setting. Some of these works focused on the problem of estimating the mean curve and the covariance function of the underlying process, while others went further and also analyzed the covariance structure through the so-called functional principal component analysis. The functional principal component problem is essentially principal component analysis when the data are curves instead of finite-dimensional vectors. This problem was analyzed by Dauxois et al. (1982), where asymptotic properties of non-smooth principal components of functional data were derived.

☆ This research was partially supported by Grants EX-038 and TX-49 from the Universidad de Buenos Aires, PIP #4186 from CONICET and PICT #03-00000-00576 from ANPCYT at Buenos Aires, Argentina.
∗ Correspondence address: Departamento de Matemática, Facultad de Ciencias Exactas y Naturales, Ciudad Universitaria, Pabellón 1, Buenos Aires, 1428, Argentina. E-mail address: [email protected] (G. Boente).


Further analysis of this problem has been developed by Besse and Ramsay (1986), Rice and Silverman (1991), Ramsay and Dalzell (1991), Pezzulli and Silverman (1993), Silverman (1996) and Ramsay and Silverman (1997), where, in particular, smooth principal components for functional data, based on roughness penalty methods, were considered. Several examples and applications can be found in these references. In Ramsay and Silverman (1997), it is pointed out that principal component analysis of functional data is a key technique in functional data analysis for exploring the data and detecting the features that characterize typical functions. There, it is stated that "some indication of the complexity of the data is also required, in the sense of how many types of curves and characteristics are to be found. Principal components analysis serves these ends admirably, ...". They also argue for smoothness properties: "A second issue is that of regularization; for many data sets, PCA of functional data is more revealing if some type of smoothness is required of the principal components themselves".

In this paper, we propose kernel-based principal components for functional data and study their asymptotic properties. There are two common ways of performing smooth principal component analysis. The first is to smooth the functional data and then perform PCA. The second is to define smoothed principal components directly; this can be achieved, for example, by adding a penalty term to the sample variance and maximizing the penalized sample variance, as described in Ramsay and Silverman (1997). It will be shown that, if a kernel-based smoothing method is used, both approaches coincide. On the other hand, the kernel-based approach allows one to derive the asymptotic distribution of the smooth principal components, which, as in other non-parametric settings, is unknown for penalized methods. It is also shown that the degree of regularity of the kernel-based principal components is inherited from that of the kernel function used. In Sections 3 and 4, strong consistency and the asymptotic distribution are derived under mild conditions.

2. Notation and background

Let $\{X(t):\ t \in I\}$ be a stochastic process defined on $(\Omega, \mathcal{A}, P)$ with continuous trajectories, zero mean and finite second moment, i.e.

$$E(X(t)) = 0, \qquad E(X^2(t)) < \infty \quad \text{for } t \in I, \tag{1}$$

where $I \subset \mathbb{R}$ is a finite interval. Without loss of generality, we may assume that $I = [0,1]$. We will denote by $\gamma(t,s) = E(X(t)X(s))$ its covariance function, which is just the functional version of the variance–covariance matrix of classical multivariate analysis. As in the finite-dimensional case, the covariance function has an associated linear operator $\Gamma: L^2(0,1) \to L^2(0,1)$ defined as

$$(\Gamma u)(t) = \int_0^1 \gamma(t,s)\,u(s)\,ds \quad \forall u \in L^2(0,1). \tag{2}$$

Throughout the paper, we will assume that

$$\int_0^1 \int_0^1 \gamma^2(t,s)\,dt\,ds < \infty. \tag{3}$$

The Cauchy–Schwarz inequality implies that $|\Gamma u|^2 \le \|\gamma\|^2 |u|^2$, where $|u|$ stands for the usual norm in the space $L^2[0,1]$, while $\|\gamma\|$ denotes the norm in the space $L^2([0,1] \times [0,1])$. Therefore, $\Gamma$ is a self-adjoint continuous linear operator. Moreover, (3) implies that $\Gamma$ is a Hilbert–Schmidt operator.
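Although the paper works at the operator level, a discretized illustration may help. The following minimal sketch (not part of the original paper) shows how $\Gamma$ acts after discretization on an equispaced grid; the grid size, the Brownian-motion covariance $\gamma(t,s) = \min(t,s)$ and all variable names are illustrative assumptions.

```python
import numpy as np

# A minimal sketch (not from the paper): on an equispaced grid of [0, 1]
# with spacing dt, the integral (Gamma u)(t) = int_0^1 gamma(t, s) u(s) ds
# becomes the matrix-vector product (G * dt) @ u, G[i, j] = gamma(t_i, t_j).
m = 200                                   # grid size (illustrative choice)
t = np.linspace(0.0, 1.0, m)
dt = t[1] - t[0]

# Example covariance: Brownian motion, gamma(t, s) = min(t, s).
G = np.minimum.outer(t, t)

def apply_gamma(u):
    """Apply the discretized covariance operator to a function sampled on t."""
    return (G * dt) @ u

# Condition (3), the squared Hilbert-Schmidt norm of gamma, approximated by
# a Riemann sum; it is finite here, so Gamma is Hilbert-Schmidt.
hs_norm_sq = np.sum(G**2) * dt * dt
print(hs_norm_sq)   # approx int int min(t, s)^2 dt ds = 1/6
```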

G. Boente, R. Fraiman / Statistics & Probability Letters 48 (2000) 335 – 345

337

$F$ will stand for the Hilbert space of such operators, with inner product defined by

$$(\Gamma_1, \Gamma_2)_F = \operatorname{trace}(\Gamma_1 \Gamma_2) = \sum_{j=1}^{\infty} (\Gamma_1 u_j, \Gamma_2 u_j),$$

where $\{u_j:\ j \ge 1\}$ is any orthonormal basis of $L^2(0,1)$ and $(u,v)$ denotes the usual inner product in $L^2(0,1)$. Choosing a basis $\{\phi_j:\ j \ge 1\}$ of eigenfunctions of $\Gamma$, we have that

$$\|\Gamma\|_F^2 = \sum_{j=1}^{\infty} \lambda_j^2 = \int_0^1 \int_0^1 \gamma^2(t,s)\,dt\,ds < \infty,$$

where $\{\lambda_j:\ j \ge 1\}$ are the eigenvalues of $\Gamma$. (The last equality is a consequence of the Schmidt theorem.) As in the classical multivariate case, the linear operator $\Gamma$ can be difficult to interpret and does not always give a fully comprehensive presentation of the structure of the variability of the observed data directly. A principal component analysis provides a way of looking at the covariance structure which can be much more informative and which complements the direct examination of the variance operator. So, following Dauxois et al. (1982), we define the population functional principal components as follows. For any random variable $Y$ defined through a linear combination of the process $\{X(s)\}$, i.e.

$$Y = \int_0^1 \alpha(t)\,X(t)\,dt = (\alpha, X), \qquad \alpha \in L^2(0,1),$$

we have that

$$\operatorname{var}(Y) = E(Y^2) = \int_0^1 \int_0^1 \alpha(t)\,\gamma(t,s)\,\alpha(s)\,ds\,dt = (\Gamma \alpha, \alpha).$$

The first principal component is defined as the random variable $Y_0 = (\alpha_0, X)$ such that

$$\operatorname{var}(Y_0) = \sup_{\{\alpha:\ |\alpha|=1\}} \operatorname{var}((\alpha, X)) = \sup_{\{\alpha:\ |\alpha|=1\}} (\Gamma \alpha, \alpha). \tag{4}$$

Therefore, if $\lambda_j \ge \lambda_{j+1}$, the Riesz theorem (Riesz and Nagy, 1965, p. 230) entails that the solution of (4) is an eigenfunction associated with the largest eigenvalue of the operator $\Gamma$, i.e., $\alpha_0 = \phi_1$ and $\operatorname{var}(Y_0) = \lambda_1$. Moreover, if $A_k = \{\alpha \in L^2(0,1):\ |\alpha| = 1,\ (\alpha, \phi_i) = 0,\ 1 \le i \le k-1\}$, then, since

$$\sup_{\alpha \in A_k} \operatorname{var}((\alpha, X)) = \sup_{\alpha \in A_k} (\Gamma \alpha, \alpha) = (\Gamma \phi_k, \phi_k) = \lambda_k, \tag{5}$$

the $k$th population functional principal component is just $(\phi_k, X)$.
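To make (4) concrete numerically, the following continuation of the sketch above (again an illustration, not part of the paper) finds the maximizer of $(\Gamma\alpha, \alpha)$ over unit-norm $\alpha$ by power iteration on the discretized operator; for Brownian motion the known solution is $\lambda_1 = 4/\pi^2$ with $\phi_1(t) = \sqrt{2}\sin(\pi t/2)$.

```python
# Continuation of the sketch above: the supremum in (4) is attained at the
# leading eigenfunction, which power iteration on (G * dt) recovers.
rng = np.random.default_rng(0)
a = rng.standard_normal(m)
a /= np.sqrt(np.sum(a**2) * dt)           # unit norm in L2(0, 1)

for _ in range(500):
    a = apply_gamma(a)
    a /= np.sqrt(np.sum(a**2) * dt)

var_Y0 = np.sum(a * apply_gamma(a)) * dt  # (Gamma a, a) at the maximizer
print(var_Y0, 4 / np.pi**2)               # both close to lambda_1
```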

If all the eigenvalues have multiplicity one, the solution is uniquely defined. As in the finite-dimensional case, from (5) we get the following geometrical interpretation. For any fixed integer $k$, let $H \subset L^2[0,1]$ be a linear subspace of dimension $k$ and denote by $X_H^*$ the orthogonal projection of $X$ on $H$. Then, $E(|X(t) - X_H^*(t)|^2)$ is minimized by taking $H$ to be the linear space spanned by $\phi_1, \ldots, \phi_k$.

Non-smooth estimators of the eigenfunctions and eigenvalues of $\Gamma$ were considered by Dauxois et al. (1982), in a natural way, through the empirical covariance operator. More precisely, let $\gamma_n(t,s)$ denote the empirical covariance function, i.e.,

$$\gamma_n(t,s) = \frac{1}{n} \sum_{i=1}^{n} X_i(t)\,X_i(s),$$

and $\Gamma_n$ the related linear operator. Then, if $V_i$ stands for the linear operator given by

$$(V_i u)(t) = \int_0^1 X_i(t)\,X_i(s)\,u(s)\,ds,$$

we have that, for $1 \le i \le n$,

$$\Gamma_n = \frac{1}{n} \sum_{i=1}^{n} V_i \quad \text{and} \quad E(V_i) = \Gamma. \tag{6}$$
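As a numerical counterpart of (6), the sketch below (continuing the illustrative script; the simulated Brownian paths stand in for data and are an assumption) computes the non-smooth estimates by an eigendecomposition of the discretized $\Gamma_n$; dividing the eigenvectors by $\sqrt{dt}$ makes them orthonormal in $L^2(0,1)$.

```python
# Non-smooth functional PCA in the sense of Dauxois et al. (1982): a sketch
# assuming n i.i.d. sample paths observed without error on the common grid t.
n = 500
# Simulated Brownian paths as stand-in data (an assumption of this sketch).
X = np.cumsum(rng.standard_normal((n, m)) * np.sqrt(dt), axis=1)

G_n = (X.T @ X) / n                       # gamma_n(t_i, t_j); zero-mean process
evals, evecs = np.linalg.eigh(G_n * dt)   # symmetric eigenproblem
evals = evals[::-1]                       # hat-lambda_j, decreasing
phis = evecs[:, ::-1] / np.sqrt(dt)       # hat-phi_j, orthonormal in L2(0, 1)

# Brownian benchmark: lambda_k = 4 / ((2k - 1)^2 pi^2).
print(evals[:3], 4 / (np.array([1, 3, 5])**2 * np.pi**2))
```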

Dauxois et al. (1982) defined non-smooth estimators of the population functional principal components, taking as estimator of $\phi_k$ the eigenfunction $\hat\phi_k$ related to the $k$th largest eigenvalue $\hat\lambda_k$ of the random operator $\Gamma_n$. There, they derived strong consistency results for the eigenvalues and their associated eigenmanifolds from the fact that $\|\Gamma_n - \Gamma\|_F \to 0$ almost surely, which follows directly from the strong law of large numbers in the space $F$. Using the central limit theorem in Hilbert spaces, they have also shown that $\sqrt{n}(\Gamma_n - \Gamma)$ converges in distribution to a zero mean Gaussian random element $U$ of $F$ with covariance operator $\Lambda$, and derived from it the asymptotic distribution of the non-smooth estimates of the eigenvalues and of the associated eigenmanifolds of the linear operator $\Gamma$. Smooth versions of the previous estimates have been defined, through roughness penalties on the sample variance or on the $L^2$-norm, respectively, by Rice and Silverman (1991) and by Silverman (1996), where consistency results were obtained. See also Ramsay and Dalzell (1991) and Ramsay and Silverman (1997).

3. A kernel-based smooth approach

Our aim is to define smooth estimates of the principal components using a kernel method. This approach is equivalent to smoothing the functional data and then performing PCA. We begin by defining a smooth version of the estimated covariance operator. Let $K_h(\cdot) = h^{-1} K(\cdot/h)$ be a non-negative kernel function with smoothing factor $h$, such that $\int K(u)\,du = 1$ and $\int K^2(u)\,du < \infty$. Given a sample $\{X_1(t), \ldots, X_n(t)\}$, $0 \le t \le 1$, of i.i.d. trajectories of the stochastic process $\{X(t),\ 0 \le t \le 1\}$, define smoothed trajectories, via convolution, as

$$X_{ih}(t) = \int K_h(t-s)\,X_i(s)\,ds \quad \text{for } 0 \le t \le 1, \tag{7}$$

where we extend $X_i(s)$ as $X_i(0)$ or $X_i(1)$ for $s < 0$ or $s > 1$, respectively. Define also the empirical covariance function of the smoothed trajectories,

$$\gamma_{n,h}(t,s) = \frac{1}{n} \sum_{i=1}^{n} X_{ih}(t)\,X_{ih}(s).$$

Remark 1. It is worthwhile noting that $\gamma_{n,h}(t,s)$ is a smooth version of the empirical covariance function of the observed process, $\gamma_n(t,s)$, since

$$\gamma_{n,h}(t,s) = \int\!\!\int \bar K_h(t-u,\,s-v)\,\gamma_n(u,v)\,du\,dv$$

with $\bar K_h(u,v) = K_h(u)\,K_h(v)$. Similarly, we can define the smoothed process

$$X_h(t) = (K_h * X)(t) = \int K_h(t-s)\,X(s)\,ds \quad \text{for } 0 \le t \le 1 \tag{8}$$

and

$$\gamma_h(t,s) = E(X_h(t)\,X_h(s)),$$

which is well defined since $X_h(t) \in L^2(P)$.
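The convolution in (7) is straightforward to implement. Continuing the sketch (the Epanechnikov kernel and the bandwidth $h = 0.05$ are illustrative choices, not prescriptions of the paper), each path is extended constantly beyond $[0,1]$, as required above, and smoothed by a Riemann-sum convolution.

```python
# Sketch of (7): constant boundary extension followed by convolution with K_h.
h = 0.05                                  # smoothing factor (illustrative)

def K(u):
    """Epanechnikov kernel: non-negative, integrates to 1, finite L2 norm."""
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

pad = int(np.ceil(h / dt)) + 1            # grid points needed outside [0, 1]
s_ext = np.concatenate([t[0] - dt * np.arange(pad, 0, -1), t,
                        t[-1] + dt * np.arange(1, pad + 1)])
W = K((t[:, None] - s_ext[None, :]) / h) / h   # K_h(t - s) weight matrix

def smooth_paths(paths):
    """Row-wise X_ih(t) = int K_h(t - s) X_i(s) ds, constant extension."""
    ext = np.hstack([np.repeat(paths[:, :1], pad, axis=1), paths,
                     np.repeat(paths[:, -1:], pad, axis=1)])
    return (ext @ W.T) * dt

Xh = smooth_paths(X)                      # smoothed trajectories X_ih
```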

Thus, we have that

$$E(\gamma_{n,h}(t,s)) = \gamma_h(t,s) \quad \text{and} \quad \gamma_h(t,s) = (\bar K_h * \gamma)(t,s),$$

and so the covariance function of the smoothed process $X_h(t)$ is just a smooth version of the covariance function of the original process. Define $\Gamma_{nh}$ as the random linear operator associated with $\gamma_{n,h}$, i.e.

$$(\Gamma_{nh} u)(t) = \int_0^1 \gamma_{n,h}(t,s)\,u(s)\,ds \quad \text{for } u \in L^2(0,1), \tag{9}$$

and its expected value $\Gamma_h$ defined by

$$(\Gamma_h u)(t) = \int_0^1 \gamma_h(t,s)\,u(s)\,ds \quad \text{for } u \in L^2(0,1). \tag{10}$$

Then, $\Gamma_{nh}$, which is a smooth estimate of the covariance operator, can be written as

$$\Gamma_{nh} = \frac{1}{n} \sum_{i=1}^{n} V_{ih}, \tag{11}$$

where $V_{ih}$ is the linear operator defined through

$$(V_{ih} u)(t) = \int_0^1 X_{ih}(t)\,X_{ih}(s)\,u(s)\,ds.$$
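The two equivalent constructions mentioned in the Introduction can be checked numerically. Continuing the sketch, route (i) below performs PCA on the smoothed paths, i.e. computes the eigenpairs of the discretized $\Gamma_{nh}$, while route (ii) smooths the raw $\gamma_n$ with the product kernel $\bar K_h$ of Remark 1; up to discretization and boundary effects (route (ii) here omits the boundary extension for brevity), the two matrices agree.

```python
# Route (i): eigenpairs of gamma_{n,h} built from the smoothed paths.
G_nh = (Xh.T @ Xh) / n
evals_h, evecs_h = np.linalg.eigh(G_nh * dt)
evals_h = evals_h[::-1]                   # hat-lambda_{jh}, decreasing
phis_h = evecs_h[:, ::-1] / np.sqrt(dt)   # smooth eigenfunctions hat-phi_{jh}

# Route (ii), Remark 1: smooth gamma_n in both variables with K_h(u) K_h(v).
Wt = K((t[:, None] - t[None, :]) / h) / h # kernel weights on [0, 1] only
G_nh2 = (Wt * dt) @ G_n @ (Wt * dt).T     # (bar K_h * gamma_n)(t, s)

inner = slice(int(2 * h / dt), m - int(2 * h / dt))
print(np.max(np.abs(G_nh - G_nh2)[inner, inner]))  # small away from boundary
```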

Note that $V_{ih}$ has only one non-null eigenvalue, $\lambda_{ih} = |X_{ih}|^2$, with related eigenfunction $X_{ih}/|X_{ih}|$, and that $E(V_{ih}) = \Gamma_h$ for $1 \le i \le n$. Natural smooth estimates of the eigenfunctions and eigenvalues defining the functional principal components, i.e., of the eigenfunctions and eigenvalues of $\Gamma$, are then the eigenfunctions $\hat\phi_{jh}$ and the eigenvalues $\hat\lambda_{jh}$ of the random operator $\Gamma_{nh}$. Thus, with a kernel approach it is equivalent to first smooth the data and then perform a principal component analysis, or to consider the covariance operator related to the raw data, smooth it, and then perform the PCA. One could also think of directly smoothing the principal functions, but with that approach the orthogonality between the smooth estimates of the eigenfunctions is lost.

3.1. Smoothness properties

The following lemma shows that smoothness properties are attained. Higher regularity of the eigenfunctions can be obtained from a more regular behavior of the kernel $K$.

Lemma 1. Let $\hat\phi_{jh}$ be an eigenfunction associated to a non-null eigenvalue $\hat\lambda_{jh}$ of the operator $\Gamma_{nh}$.
(a) If $K$ is continuous, then the eigenfunction $\hat\phi_{jh}$ can be chosen to be continuous.
(b) If $K$ is Lipschitz continuous, then the eigenfunction $\hat\phi_{jh}$ can be chosen to be Lipschitz continuous.

Proof. It is easy to see that there exist eigenfunctions $\hat\phi_{jh}$ satisfying

$$\hat\lambda_{jh}\,\hat\phi_{jh}(t) = \int_0^1 \gamma_{n,h}(t,s)\,\hat\phi_{jh}(s)\,ds = (\Gamma_{nh}\,\hat\phi_{jh})(t) \quad \text{for all } t \in [0,1]. \tag{12}$$

Effectively, if $\phi_{jh}(t)$ is an eigenfunction of $\Gamma_{nh}$, then $\phi_{jh}$ satisfies (12) a.s. Defining $\hat\phi_{jh}(t) = \hat\lambda_{jh}^{-1}(\Gamma_{nh}\,\phi_{jh})(t)$, $\hat\phi_{jh}$ satisfies (12) for all $t$.
(a) The continuity of $K$ entails the continuity of $\gamma_{n,h}$ and, therefore, $M_1 = \sup_{[0,1]\times[0,1]} |\gamma_{n,h}(t,s)| < \infty$. Let $\{t_l,\ l \ge 1\}$ be a sequence converging to $t$; then $\gamma_{n,h}(t_l,s) \to \gamma_{n,h}(t,s)$ and $|\gamma_{n,h}(t_l,s)\,\hat\phi_{jh}(s)| \le M_1\,|\hat\phi_{jh}(s)|$.

Since $\hat\phi_{jh}(s)$ is integrable, applying the dominated convergence theorem in (12) we get that $\hat\phi_{jh}(t_l)$ converges to $\hat\phi_{jh}(t)$.
(b) Denote by $C$ the Lipschitz constant of $K$. We then have

$$|\gamma_{n,h}(t,s) - \gamma_{n,h}(t',s)| \le \frac{C}{h^2}\,|t - t'|\,M_1 \int_0^1 |K_h(v-s)|\,dv = A\,|t - t'| \tag{13}$$

for a suitable constant $A$. Therefore, $\gamma_{n,h}$ is Lipschitz in each variable. Finally, using (12) and (13) we obtain

$$|\hat\phi_{jh}(t) - \hat\phi_{jh}(t')| \le \frac{A}{\hat\lambda_{jh}}\,|t - t'| \int_0^1 |\hat\phi_{jh}(s)|\,ds,$$

which concludes the proof.

4. Asymptotic results

4.1. Consistency

In order to get the strong consistency of the eigenvalues and their associated eigenmanifolds, by Propositions 2 and 4 of Dauxois et al. (1982) it is enough to show that

$$\|\Gamma_{nh} - \Gamma\|_F \to 0 \quad \text{a.s.} \tag{14}$$

In the non-smooth case, the result analogous to (14) follows directly from the strong law of large numbers in the space $F$. In the smooth case, we will use a Bernstein inequality for Hilbert-valued random elements due to Yurinskii (1976), which we include for the sake of completeness. In a similar way, we will derive strong rates of convergence for the estimates of the eigenvalues and their related eigenmanifolds.

Proposition 1 (Yurinskii, 1976). Let $\xi_i$ be independent random elements taking values in a Hilbert space $F$. Assume that $E(\xi_i) = 0$ and that

$$E\|\xi_i\|_F^m \le \frac{m!}{2}\,b_i^2\,A^{m-2} \quad \text{for all } m \ge 2.$$

Then, if $B_n^2 = \sum_{i=1}^{n} b_i^2$, we have that

$$P\left( \Big\| \sum_{i=1}^{n} \xi_i \Big\|_F \ge x B_n \right) \le 2 \exp\left\{ -x^2 \left[ 2 + 3.24\,\frac{xA}{B_n} \right]^{-1} \right\}. \tag{15}$$

Proposition 2. Assume that $K$ is a non-negative kernel function with $\int K(u)\,du = 1$ and that

$$E(|X_1|^{2m}) \le \frac{m!}{2}\,b^2 A^{m-2}, \quad \text{with } b^2 = E(|X_1|^2) \text{ and } A > 0, \text{ for } m \ge 2.$$

Then, for any sequence $\{\theta_n\}_{n \ge 1}$ such that $\theta_n = o(n/\log n)$, we have

$$\sqrt{\theta_n}\,\|\Gamma_{nh} - \Gamma_h\|_F \to 0 \quad \text{completely.} \tag{16}$$

Proof. Since $E(\Gamma_{nh}) = E(V_{ih}) = \Gamma_h$ for all $i$, we have that

$$\|\Gamma_{nh} - \Gamma_h\|_F = \frac{1}{n}\,\Big\| \sum_{i=1}^{n} (V_{ih} - E(V_{ih})) \Big\|_F.$$

On the other hand, Young's inequality entails that

$$\|V_{ih}\|_F = \lambda_{ih} = |X_{ih}|^2 \le |X_i|^2 \left( \int |K_h(t)|\,dt \right)^2 \le |X_i|^2, \tag{17}$$

which implies

$$E(\|V_{ih} - E(V_{ih})\|_F^m) \le 2^{m+1}\,E(|X_i|^{2m}) \le 2^m\,m!\,b^2 A^{m-2} = \frac{m!}{2}\,b_1^2 A_1^{m-2},$$

where $b_1 = 2\sqrt{2}\,b$ and $A_1 = 2A$, and thus $B_n^2 = 8 b^2 n$. Therefore, using (15) with $x = \epsilon\,\sqrt{n/\theta_n}/b_1$, one gets

$$P\left( \sqrt{\theta_n}\,\|\Gamma_{nh} - \Gamma_h\|_F \ge \epsilon \right) = P\left( \Big\| \sum_{i=1}^{n} (V_{ih} - E(V_{ih})) \Big\|_F \ge x B_n \right) \le 2 \exp\left\{ -\epsilon^2\,\frac{n}{\theta_n b_1^2} \left[ 2 + 3.24\,\frac{\epsilon A_1}{b_1^2 \sqrt{\theta_n}} \right]^{-1} \right\},$$

which entails (16), since $\theta_n = o(n/\log n)$. Note that no condition on $h$ is needed.

Proposition 3. Assume that $E(|X_1|^2) < \infty$.
(a) If $h \to 0$, we have that $\|\Gamma_h - \Gamma\|_F \to 0$.
(b) If $\int |t|\,K(t)\,dt < \infty$ and

$$|\gamma(t,u) - \gamma(t,t)| \le C\,|t - u|, \tag{18}$$

then $\theta_n h \to 0$ implies that $\sqrt{\theta_n}\,\|\Gamma_h - \Gamma\|_F \to 0$.
(c) If $\int t^2 K(t)\,dt < \infty$, $\int t\,K(t)\,dt = 0$, and the covariance function $\gamma(t,u)$ is continuously differentiable with

$$g(t,u_0) = \frac{\partial}{\partial u}\gamma(t,u)\Big|_{u=u_0} - \frac{\partial}{\partial u}\gamma(t,u)\Big|_{u=t} \quad \text{satisfying} \quad |g(t,u_0)| \le C\,|t - u_0|, \tag{19}$$

then $\theta_n h^2 \to 0$ entails that $\sqrt{\theta_n}\,\|\Gamma_h - \Gamma\|_F \to 0$.

Proof. (a) Since $\Gamma_h = E(V_{1h})$ and $\Gamma = E(V_1)$, we have that

$$\|\Gamma_h - \Gamma\|_F = \|E(V_{1h} - V_1)\|_F \le E(\|V_{1h} - V_1\|_F). \tag{20}$$

On the other hand, since $\|V_{1h} - V_1\|_F \le 2|X_1|^2$, the dominated convergence theorem will entail the desired result if we show that

$$\|V_{1h} - V_1\|_F \to 0 \quad \text{a.s.} \tag{21}$$

For each $\omega \in \Omega$, from the fact that $X_{1h}(t) = (K_h * X_1)(t)$ and that $X_1(t) = X_1(t,\omega)$ is a continuous function of $t$, we obtain that

$$\int_0^1 (X_{1h}(t,\omega) - X_1(t,\omega))^2\,dt = |X_{1h} - X_1|^2 \to 0 \quad \text{as } h \to 0.$$

Therefore, since $\|V_{1h} - V_1\|_F^2 = |X_{1h}|^4 + |X_1|^4 - 2(X_{1h}, X_1)^2$, the continuity of the inner product with respect to the norm entails that (21) holds for each $\omega \in \Omega$.

342

G. Boente, R. Fraiman / Statistics & Probability Letters 48 (2000) 335 – 345

(b) Using that |X1h |6|X1 |, it follows easily that 2

kV1h − V1 kF = |X1h |4 + |X1 |4 − 2(X1h ; X1 )2 63|X1 |2 |X1h − X1 |2 :

(22)

Therefore, using (20), (22) and Cauchy–Schwartz inequality, we obtain that p p √ n k h − kF 6 3[E(|X1 |2 )]1=2 n [E(|X1h − X1 |2 )]1=2 ; which entails that it will be enough to show that (23) n E(|X1h − X1 |2 ) → 0: R R Denote by ˜h (t; t) = Kh (t − u) (t; u) du. Then, from (18) we get | ˜h (t; t) − (t; t)|6C h |z|K(z) d z = C1 h which entails Z 1 | ˜h (t; t) − (t; t)| dt6C1 h: (24) 0

On the other hand, using again (18) we obtain ZZ ZZ Z 1 | h (t; t) − (t; t)| dt 6 K(u)K(z)|u|h du d z + K(u)K(z)|z|h du d z: 0

6 C2 h:

(25)

Therefore, since E(X1 (t)X1h (t)) = ˜h (t; t), we have Z 1 Z 1 2 [ h (t; t) − (t; t)] dt − 2 [ ˜h (t; t) − (t; t)] dt; E(|X1h − X1 | ) = 0

0

(26)

which together with (24) and (25) entails that E(|X1h −X1 |2 )6C3 h, which concludes the proof since n h → 0. (c) As in (b) it will be R enough to show (23). From (19) and using uK(u) du = 0, similar arguments as those used in (b) led to E(|X1h − X1 |2 )6C3 h2 , which concludes the proof since n h2 → 0. Remark 2. The previous proposition provides convergence rates for the bias term. Note that (b) and (c) show that now, the trade-o is between the regularity of the covariance function and the speed at which the smoothing parameter converges to 0. Propositions 2 and 3(a) entail that no compromise between n and h is needed in order to get just consistency of the smoothed covariance operator nh . Order the eigenvalues of and nh as decreasing sequences, i.e., j ¿j+1 and ˆjh ¿ˆ( j+1); h and denote by Mj = {k¿1: k = j } and mj = #Mj , where #M stands for the number of elements of the ÿnite set M. Deÿne Pj and Pˆ jh as the projection operators associated with j and ˆjh , respectively, more precisely, X X k (t) (u; k ) and (Pˆ jh u)(t) = ˆjh ˆ kh (t)(u; ˆ kh ): (Pj u)(t) = j k∈Mj

k∈Mj

The following theorem summarizes our consistency results. Theorem 1. Let KhR(:) = h−1 K(:=h) be a non-negative kernel function with smoothing factor h → 0; such that R K(u) du = 1 and K 2 (u) du ¡ ∞. Assume that (1) and (3) hold and that for some positive constant A m! E(|X1 |2 ) Am−2 ∀m¿2: 2 Then; we have that; for each j¿1; if j has multiplicity mj ; E(|X1 |2m )6

(a) there exist $m_j$ sequences $\{\hat\lambda_{kh},\ k \in M_j\}$ converging to $\lambda_j$, a.s.;
(b) $\hat P_{jh}$ converges to $P_j$ in $F$, a.s.;
(c) in particular, when $m_j = 1$, $\hat\phi_{jh}$ converges to $\phi_j$ in $L^2[0,1]$, a.s.

Remark 3. It is worthwhile noting that there is no trade-off between bias and variance: it is only needed that $h \to 0$. Unlike in the non-parametric regression setting, for functional principal components the non-smooth solution is also consistent.

Theorem 2 (Strong convergence rates). Let $\{\theta_n\}_{n \ge 1}$ be a sequence such that $\theta_n = o(n/\log n)$. Under the assumptions of Theorem 1 together with (b) or (c) of Proposition 3, we have that, for each $j \ge 1$, if $\lambda_j$ has multiplicity $m_j$,
(a) with probability one, there exist $m_j$ sequences $\{\hat\lambda_{kh},\ k \in M_j\}$ such that $\sqrt{\theta_n}\,(\hat\lambda_{kh} - \lambda_j)$ converges to 0, for any $k \in M_j$;
(b) $\sqrt{\theta_n}\,\|\hat P_{jh} - P_j\|_F \to 0$ a.s.;
(c) in particular, when $m_j = 1$, $\sqrt{\theta_n}\,|\hat\phi_{jh} - \phi_j| \to 0$ (in $L^2[0,1]$), a.s.

The proof of Theorem 2 follows easily from Propositions 2 and 3(b) or (c), together with the inequalities relating the distance between the corresponding eigenvalues (or projection operators) of two operators to the norm of the difference between the operators themselves, used in Proposition 2 (respectively, Proposition 3) of Dauxois et al. (1982).

4.2. Asymptotic distribution

As mentioned above, using the central limit theorem in Hilbert spaces, Dauxois et al. (1982) have shown that $\sqrt{n}(\Gamma_n - \Gamma)$ converges in distribution to a zero mean Gaussian random element $U$ of $F$ with covariance operator $\Lambda$, and derived from it the asymptotic distribution of the non-smooth estimates of the eigenvalues and of the associated eigenmanifolds of the linear operator $\Gamma$. Therefore, in order to obtain the same result for the smooth estimates, it will suffice to show that

$$\|\sqrt{n}(\Gamma_{nh} - \Gamma_n)\|_F \to 0 \quad \text{in probability.}$$

Proposition 4. If $E(|X_1|^4) < \infty$ and $h = h_n \to 0$, then

$$\|Z_n\|_F = \sqrt{n}\,\|(\Gamma_{nh} - \Gamma_h) - (\Gamma_n - \Gamma)\|_F \to 0 \quad \text{in probability.}$$

Proof. Using Chebyshev's inequality, we obtain that

$$P(\|Z_n\|_F > \epsilon) \le \frac{1}{\epsilon^2}\,E\left( \Big\| \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \eta_{ih} \Big\|_F^2 \right), \tag{27}$$

where $\eta_{ih} = (V_{ih} - E(V_{ih})) - (V_i - E(V_i))$. Since $E(\eta_{ih}) = 0$ and $\{V_{ih} - V_i:\ 1 \le i \le n\}$ are independent random operators, we have that

$$E\left( \Big\| \sum_{i=1}^{n} \eta_{ih} \Big\|_F^2 \right) = \sum_{i,j} E\left( (\eta_{ih}, \eta_{jh})_F \right) = n\,E(\|\eta_{1h}\|_F^2). \tag{28}$$

Thus, (27) and (28) entail that $P(\|Z_n\|_F > \epsilon) \le (1/\epsilon^2)\,E(\|\eta_{1h}\|_F^2)$. Therefore, the result will follow if we show that

$$E(\|\eta_{1h}\|_F^2) \to 0. \tag{29}$$

Since $\|\eta_{1h}\|_F \le \|V_{1h} - V_1\|_F + E\|V_{1h} - V_1\|_F$, we have that $E(\|\eta_{1h}\|_F^2) \le 2\,[E(\|V_{1h} - V_1\|_F^2) + (E\|V_{1h} - V_1\|_F)^2]$. On the other hand, from (21) we have $\|V_{1h} - V_1\|_F \to 0$ a.s., and since $\|V_{1h} - V_1\|_F \le 2|X_1|^2$, the dominated convergence theorem entails (29).

The following theorem gives the asymptotic marginal distribution of the smoothed eigenvalues and eigenmanifolds. The proof is a consequence of Proposition 3(b) or (c) (with $\theta_n = n$), which deals with the bias term, Proposition 4, and the results in Section 2.1 of Dauxois et al. (1982).

Theorem 3. Let $K_h(\cdot) = h^{-1} K(\cdot/h)$ be a non-negative kernel function with smoothing factor $h = h_n$, such that $\int K(u)\,du = 1$. Assume that (1) and (3) hold and that $E(|X_1|^4) < \infty$. Let $U$ be a zero mean Gaussian random element of $F$ with covariance operator $\Lambda$. Then, if the assumptions given in (b) or (c) of Proposition 3 hold for $\theta_n = n$, we have that, for each $j \ge 1$:
(a) $\sqrt{n}\,(\hat P_{jh} - P_j)$ converges in distribution in $F$ to the zero mean Gaussian random element $W_j U P_j + P_j U W_j$, where $W_j$ stands for the linear operator

$$(W_j u)(t) = \sum_{k \notin M_j} \frac{1}{\lambda_k - \lambda_j}\,\phi_k(t)\,(u, \phi_k).$$

(b) $(\sqrt{n}\,(\hat\lambda_{kh} - \lambda_k))_{k \in M_j}$ converges in distribution to the decreasing ordered eigenvalues of $P_j U P_j$. Moreover, its joint asymptotic density is given by

$$f_{m_j}(t_1, \ldots, t_{m_j}) = C \exp\left[ -\sum_{l=1}^{m_j} \frac{t_l^2}{4\lambda_j^2} \right] \prod_{k < l} (t_k - t_l),$$

where

$$C^{-1} = 2^{m_j(m_j+3)/4} \left\{ \prod_{l=1}^{m_j} \Gamma\!\left( m_j + \frac{1-l}{2} \right) \right\} \lambda_j^{m_j(m_j+1)/2}$$

and $\Gamma(p) = \int_0^{+\infty} x^{p-1} e^{-x}\,dx$ denotes the Gamma function. In particular, if $m_j = 1$:
(c) $\sqrt{n}\,(\hat\lambda_{jh} - \lambda_j)$ converges in distribution to a normal distribution with zero mean and variance $2\lambda_j^2$, as in the finite-dimensional case.
(d) $\sqrt{n}\,(\hat\phi_{jh} - \phi_j)$ converges in distribution to a zero mean Gaussian random function in $L^2[0,1]$, with covariance function $\beta(t,s)$ given by

$$\beta(t,s) = \lambda_j \sum_{k \ne j} \frac{\lambda_k}{(\lambda_j - \lambda_k)^2}\,\phi_k(t)\,\phi_k(s).$$
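A small Monte Carlo experiment, again within the illustrative Brownian-motion setting of the earlier sketches, can be used to visualize Theorem 3(c); note that the theorem requires a shrinking bandwidth $h = h_n$, while here $h$ stays fixed for simplicity, so some smoothing and discretization bias remains.

```python
# Monte Carlo sketch of Theorem 3(c): for Brownian motion, lambda_1 = 4/pi^2
# is simple, so sqrt(n)(hat-lambda_{1h} - lambda_1) should be approximately
# N(0, 2 * lambda_1^2).  A fixed h is used only for illustration; the theorem
# takes h = h_n with n * h_n -> 0 under (b), which gamma = min(t, s) satisfies.
lam1 = 4 / np.pi**2
stats = []
for rep in range(200):
    Xr = np.cumsum(rng.standard_normal((n, m)) * np.sqrt(dt), axis=1)
    Xrh = smooth_paths(Xr)
    top = np.linalg.eigvalsh((Xrh.T @ Xrh) / n * dt)[-1]
    stats.append(np.sqrt(n) * (top - lam1))

print(np.var(stats), 2 * lam1**2)         # sample variance vs 2 * lambda_1^2
```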

Remark 4. Theorem 3 shows that in this problem a root-$n$ rate of convergence is attained, even though some smoothing is performed. Note also that, when estimating the cumulative distribution function through a smoothed empirical distribution, a root-$n$ speed of convergence is likewise obtained. As in that case, non-smooth estimates are consistent, unlike in classical density estimation. Nevertheless, the problem of bandwidth selection, which is not addressed in this paper, is crucial. In Pezzulli and Silverman (1993), there is some discussion of whether (using roughness penalty methods) smoothing will actually help, based on a second-order study of the mean square error.


Acknowledgements

The authors would like to thank an anonymous referee for his/her valuable suggestions, which improved the presentation of the paper.

References

Besse, P., Ramsay, J.O., 1986. Principal component analysis of sampled functions. Psychometrika 51, 285–311.
Dauxois, J., Pousse, A., Romain, Y., 1982. Asymptotic theory for the principal component analysis of a vector random function: some applications to statistical inference. J. Multivariate Anal. 12, 136–154.
Fraiman, R., Pérez Iribarren, G., 1991. Nonparametric regression estimation in models with weak error's structure. J. Multivariate Anal. 21, 180–196.
Hart, J.D., Wehrly, T.E., 1986. Kernel regression estimation using repeated measurements data. J. Amer. Statist. Assoc. 81, 1080–1088.
Pezzulli, S.D., Silverman, B.W., 1993. Some properties of smoothed principal components analysis for functional data. Comput. Statist. Data Anal. 8, 1–16.
Ramsay, J.O., 1982. When the data are functions. Psychometrika 47, 379–396.
Ramsay, J.O., Dalzell, C.J., 1991. Some tools for functional data analysis (with discussion). J. Roy. Statist. Soc. Ser. B 53, 539–572.
Ramsay, J.O., Silverman, B.W., 1997. Functional Data Analysis. Springer Series in Statistics. Springer, New York.
Rice, J., Silverman, B.W., 1991. Estimating the mean and covariance structure nonparametrically when the data are curves. J. Roy. Statist. Soc. Ser. B 53, 233–243.
Riesz, F., Nagy, B., 1965. Leçons d'analyse fonctionnelle. Gauthier-Villars, Paris.
Silverman, B.W., 1996. Smoothed functional principal components analysis by choice of norm. Ann. Statist. 24, 1–24.
Yurinskii, V.V., 1976. Exponential inequalities for sums of random vectors. J. Multivariate Anal. 6, 473–499.
