Maxbias Curves for Multivariate Regression Estimators

June 13, 2017 | Author: Stefan Van Aelst | Category: Multivariate Regression, Robust Regression, Breakdown Point

C. Croux¹, K. Mahieu¹, and S. Van Aelst²

¹ Faculty of Business and Economics, K.U.Leuven, Naamsestraat 69, 3000 Leuven, Belgium
² Department of Applied Mathematics and Computer Science, Ghent University, Krijgslaan 281 S9, 9000 Gent, Belgium

Keywords: Breakdown point, Maxbias Curve, Multivariate Regression, Robust Regression, Robustness, Scatter Matrix

1 Introduction

A maxbias curve is a powerful tool to describe the robustness of an estimator. It tells how much an estimator can change due to a given fraction of contamination. In this paper, maxbias curves are computed for several multivariate regression estimators. A first class of them is directly based on the covariance structure of the joint distribution of carriers and responses. A second class minimizes the determinant of the scatter matrix of the residuals. We compare the two approaches, and show how their maxbias curves depend on the maxbias of the underlying scatter matrix estimators.

2 Model and functionals

In the multivariate regression model, the design variable $X$ is $p$-dimensional, while the response $Y$ is $q$-dimensional. The multivariate regression model is given by

$$Y = B^t X + \varepsilon, \tag{1}$$

where $B \in \mathbb{R}^{p \times q}$ is the regression parameter. Denote by $H$ the distribution of $Z = (X^t, Y^t)^t$ and let $T$ be a regression functional for estimating $B$. The regression functionals we consider are all affine, regression and scale equivariant. The maxbias curves will be computed at an elliptically symmetric model distribution $H_0$ with location parameter $\mu$ and scatter $\Sigma$. Due to the equivariance of the regression functionals, we can take $\mu = 0$ and $\Sigma = I_{p+q}$ without any loss of generality, so that $B = 0$ at the model. The maxbias curve of the regression functional $T$ at the model distribution $H_0$ is defined as

$$\mathrm{Maxbias}(\varepsilon, T; H_0) = \sup_{G} \left\| T\big((1 - \varepsilon) H_0 + \varepsilon G\big) \right\|,$$

where the supremum is taken over all possible contaminating distributions $G$, and $\varepsilon > 0$ indicates the level of contamination (here $\varepsilon$ is the contamination fraction, not the error term of (1)).

A first class of multivariate regression estimators is based on the estimation of the joint scatter matrix of carriers and responses. Take $C$ an affine equivariant scatter functional in dimension $p + q$, which we decompose as

$$C(H) = \begin{pmatrix} C_{XX}(H) & C_{XY}(H) \\ C_{YX}(H) & C_{YY}(H) \end{pmatrix}$$

for any $H$. The regression functional we consider is

$$T_C(H) = C_{XX}(H)^{-1} C_{XY}(H). \tag{2}$$

This regression functional has been considered in several papers. Maronna and Morgenthaler (1986) used M-estimators of scatter, Croux et al. (2001) used multivariate S-estimators of scatter, Ollila et al. (2003) used rank covariance matrices, and Rousseeuw et al. (2004) used the Minimum Covariance Determinant as a choice for $C$. We derive a general expression for the maxbias curve of this class of functionals, as a function of the maxbias curve of the scatter functional $C$.
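As a rough illustration of functional (2) and of the quantity inside the supremum of the maxbias definition, the sketch below computes $T_C$ from a scatter matrix and evaluates its norm on samples from $N(0, I_{p+q})$ in which a fraction of points is replaced by a single outlying point. Several choices here are assumptions made only for illustration: the classical sample covariance stands in for $C$ (it is not robust), the contamination is one hypothetical point mass rather than the worst-case $G$, and the Frobenius norm is used. The helper names `regression_from_scatter` and `contaminated_sample` are hypothetical.

```python
import numpy as np

def regression_from_scatter(C, p):
    """Return T_C = C_XX^{-1} C_XY from a (p+q) x (p+q) scatter matrix C."""
    C = np.asarray(C, dtype=float)
    C_xx = C[:p, :p]   # block for the carriers X
    C_xy = C[:p, p:]   # cross block between X and Y
    # Solve C_xx @ B = C_xy rather than forming the inverse explicitly.
    return np.linalg.solve(C_xx, C_xy)

def contaminated_sample(n, p, q, eps, outlier, rng):
    """Draw n points from N(0, I_{p+q}); replace a fraction eps by one fixed point."""
    Z = rng.standard_normal((n, p + q))
    m = int(round(eps * n))
    Z[:m] = outlier
    return Z

rng = np.random.default_rng(0)
p, q = 2, 3
outlier = np.full(p + q, 10.0)   # hypothetical location of the point mass

norms = {}
for eps in (0.0, 0.05, 0.10):
    Z = contaminated_sample(20000, p, q, eps, outlier, rng)
    C = np.cov(Z, rowvar=False)          # classical covariance, used as a stand-in for C
    B_hat = regression_from_scatter(C, p)
    norms[eps] = np.linalg.norm(B_hat)   # Frobenius norm (an assumption)
    print(eps, norms[eps])
```

Because the true parameter is $B = 0$ at this model, the printed norm is the bias caused by that particular contamination; the maxbias would require the supremum over all $G$, which this sketch does not attempt.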


A second class of estimators is based on the minimization of the determinant of the scatter matrix of the residuals. Let $Y - B^t X$ be the residual vector for every parameter candidate $B$. Then we define the regression functional as

$$T_{CR}(H) = \operatorname*{argmin}_{B} \det C(H^B), \tag{3}$$

where $H^B$ is the distribution of $Y - B^t X$. Also here, it is possible to obtain a general expression for the maxbias of the regression functional, as a function of the maxbias of the scatter matrix used. The obtained formula coincides with the one of Berrendero and Zamar (2001) for univariate regression, where $q = 1$. Different types of residual based multivariate regression estimators were studied by Ben et al. (2006), Van Aelst and Willems (2005), and Agulló et al. (2008).

In the figure below we plot the maxbias curves for $C$ the Minimum Volume Ellipsoid (MVE) scatter functional, for $q = 3$, and several values of $p$, at a multivariate normal distribution. The maxbias of the residual based estimator $T_{CR}$ does not depend on $p$, and is lower than for the fully covariance based functional $T_C$. This result favors the use of residual based multivariate regression estimators.

[Figure 1 plot, panel title "MVE scatter": maxbias (vertical axis, 0 to 100) against epsilon (horizontal axis, 0 to 0.5), with curves labelled residual, p=1, p=2, p=3.]

FIGURE 1. Maxbias curve for $T_{CR}$ (residual) and $T_C$, for $C$ the MVE functional, $H_0 = N(0, I_{p+q})$, for $p = 1, 2, 3$ and $q = 3$.
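To make the residual based definition (3) concrete, the following sketch evaluates the objective $\det C(H^B)$ on a sample and checks numerically that, when the classical sample covariance is used as $C$, the minimizer coincides with the covariance based functional (2). The use of the sample covariance is an assumption for illustration only; with a robust choice such as the MVE the two functionals differ and the minimization would need a numerical search. The function names and the simulated data are hypothetical.

```python
import numpy as np

def sample_cov(R):
    """Classical covariance of the rows of R (a non-robust stand-in for C)."""
    return np.cov(R, rowvar=False)

def residual_scatter_det(B, X, Y, scatter):
    """Objective of (3): determinant of the scatter of the residuals Y - X B."""
    residuals = Y - X @ B   # row i holds y_i - B^t x_i
    return np.linalg.det(scatter(residuals))

rng = np.random.default_rng(1)
n, p, q = 500, 2, 3
B_true = rng.standard_normal((p, q))            # hypothetical regression parameter
X = rng.standard_normal((n, p))
Y = X @ B_true + rng.standard_normal((n, q))

# Covariance based functional (2) computed from the joint sample covariance.
C = sample_cov(np.hstack([X, Y]))
B_c = np.linalg.solve(C[:p, :p], C[:p, p:])

# Objective of (3) at B_c and at randomly perturbed candidates.
best = residual_scatter_det(B_c, X, Y, sample_cov)
perturbed = [
    residual_scatter_det(B_c + 0.1 * rng.standard_normal((p, q)), X, Y, sample_cov)
    for _ in range(200)
]
print(best <= min(perturbed))
```

The check works because, for the sample covariance, the residual scatter equals the Schur complement of $C_{XX}$ plus a positive semidefinite term that vanishes only at $B = C_{XX}^{-1} C_{XY}$.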

References

J. Agulló, C. Croux, and S. Van Aelst (2008). The multivariate least-trimmed squares estimator. Journal of Multivariate Analysis, 99, 311–338.
M.G. Ben, E. Martinez, and V.J. Yohai (2006). Robust estimation for the multivariate linear model based on tau-scale. Journal of Multivariate Analysis, 7, 1600–1622.
J.R. Berrendero and R.H. Zamar (2001). Maximum bias curves for robust regression with nonelliptical regressors. Annals of Statistics, 29, 224–251.
C. Croux, C. Dehon, S. Van Aelst, and P. Rousseeuw (2001). Robust Estimation of the Conditional Median Function at Elliptical Models. Statistics and Probability Letters, 51, 361–368.
R.A. Maronna and S. Morgenthaler (1986). Robust regression through robust covariances. Communications in Statistics-Theory and Methods, 15, 1347–1365.
E. Ollila, H. Oja, and V. Koivunen (2003). Estimates of regression coefficients based on rank covariance matrix. Journal of the American Statistical Association, 97, 136–159.
P.J. Rousseeuw, S. Van Aelst, K. Van Driessen, and J. Agulló (2004). Robust multivariate regression. Technometrics, 46, 293–305.
S. Van Aelst and G. Willems (2005). Multivariate regression S-estimators for robust estimation and inference. Statistica Sinica, 15, 981–1001.
