Automatic model selection by cross-validation for Probabilistic PCA

Ezequiel López-Rubio, Juan Miguel Ortiz-de-Lazcano-Lobato
School of Computer Engineering, University of Málaga
Bulevar Louis Pasteur, 35. 29071 Málaga, SPAIN
Phone: (+34) 95 213 71 55. Fax: (+34) 95 213 13 97
{ezeqlr,jmortiz}@lcc.uma.es

Abstract. The Mixture of Probabilistic Principal Components Analyzers (MPPCA) is a multivariate analysis technique which defines a Gaussian probabilistic model at each unit. The numbers of units and principal directions in each unit are not learned in the original approach. Variational Bayesian approaches have been proposed for this purpose, which rely on assumptions about the probability distributions of the MPPCA parameters. Here we present a different way to solve this problem, where cross-validation and simulated annealing are combined to guide the search for an optimal model selection, providing a structured strategy to escape from suboptimal configurations. This makes it possible to learn the model architecture without any assumptions other than those of the basic PPCA framework. Experimental results are presented, which show the probability density estimation and missing value imputation features of the proposal.


Keywords. Probabilistic Principal Components Analysis (PPCA), dimensionality reduction, cross-validation, simulated annealing, missing value imputation, probability density estimation.

1 Introduction

The original Mixtures of Probabilistic PCA (MPPCA) models [29] do not address the problem of selecting the optimal number of units (mixture components) M nor the number of basis vectors qi for each unit i. This problem has been studied in the context of global PCA [6]. It has also been considered in the context of Bayesian PCA ([13], [18], [23]). Other strategies include split and merge procedures [30] and Markov chain Monte Carlo algorithms [33]. The basic MPPCA framework obtains the optimal model parameters Θ that maximize the data likelihood p(t | Θ). This maximum likelihood (ML) strategy fails to take into account the problem of model complexity: more complex models are not penalized, which produces overfitting.

The Variational Bayesian PPCA ([7], [13]) treats the model parameters Θ as random variables, and averages over the range of models they define. Thus, the data evidence p(t) is used in addition to the data likelihood p(t | Θ). However, these averages produce integrals that are analytically intractable, so we end up with various more or less exact approximations. Moreover, since the model parameters Θ are now random variables, the averaging process requires the assumption of a probability model for Θ. Hence we get an approximation of a probability model of the parameters of a probability model of the real data. This means that Variational Bayes adds an extra level of abstraction over the basic MPPCA model, which may move it away from the real input probability distribution.

We propose to avoid these approximations by using a data-driven strategy, namely cross-validation (CV), which has been considered by Miloslavsky & van der Laan [20] to select the number of mixture components for univariate Gaussian mixtures. It has also been used by Krzanowski [17] and Scarponi et al. [26] to select the number of principal components in global PCA, and it has been tuned to withstand the effects of outlying observations [15]. A survey of CV proposals for global PCA can be found in [9]. Here our aim is to provide an alternative solution to the MPPCA optimization problem without imposing a probability model on its parameters. It has recently been proven [31] that likelihood-based CV procedures are asymptotically optimal, in the sense that the CV selector performs asymptotically as well (w.r.t. the Kullback-Leibler distance to the true density) as an optimal benchmark model selector which depends on the true density. A similar result was obtained by Györfi et al. for the squared error loss function [14]. Our proposal falls into Pavlic & van der Laan's category of model selectors [25], so it has this important property.

Nevertheless, as the range of competing models actually compared by CV is always finite, in practice there is the danger of getting stuck in a model which is not the global minimum of the validation set likelihood. This would happen if all the models compared by CV are worse than the current model, but there are better models which have not been included in the comparison. To overcome this problem we propose the use of simulated annealing as a global optimization method, which guarantees asymptotic convergence to the global minimum of the validation set likelihood. This dynamic global search makes our proposal completely different from the previous usages of CV discussed above, where CV is used to select among a fixed (static) set of options. Furthermore, we develop procedures for computing the optimal parameters of the PPCA models quickly by taking advantage of the already considered models. The simulated annealing search space is discrete, since it comprises the possible numbers of units and numbers of basis vectors in each unit, which is a considerable advantage. A related search technique, deterministic annealing, has been used for incremental model refinement by Meinicke & Ritter [21], who use two objective likelihood functions in turn to control the growth of an ever-expanding MPPCA model. This approach differs significantly from ours, which only uses one likelihood function and shrinks the model when necessary.

The outline of the paper is as follows. Section 2 is devoted to the original MPPCA model. In Section 3 we present the new learning scheme, called Dynamic Mixtures of Probabilistic Principal Components Analyzers (DMPPCA), as sketched below. The application of this proposal to the missing value problem is considered in Section 4. Section 5 is devoted to computational experiments. Finally, conclusions are presented in Section 6.
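At a high level, DMPPCA alternates EM fitting of candidate architectures with annealed moves through the discrete configuration space. The following Python fragment is a minimal sketch of this loop under assumptions of our own; fit_em, propose_neighbour, validation_nll and accept_move are hypothetical placeholder names, not the authors' implementation.

    def search_model(train, valid, init_config, t0=1.0, cooling=0.95, steps=100):
        """Annealed search over MPPCA configurations (units, basis vectors per unit)."""
        config = init_config
        model = fit_em(config, train)          # EM on the training split
        loss = validation_nll(model, valid)    # criterion L of Eq. (13)
        temperature = t0
        for _ in range(steps):
            candidate = propose_neighbour(config)       # grow or shrink the model
            new_model = fit_em(candidate, train)
            new_loss = validation_nll(new_model, valid)
            if accept_move(loss, new_loss, temperature):  # simulated annealing test
                config, model, loss = candidate, new_model, new_loss
            temperature *= cooling             # geometric cooling schedule
        return model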

2 The MPPCA model

2.1 Mixture model

Each unit i of the mixture stores a PPCA model [7] to perform a dimensionality reduction from the observed (input) space of dimension d to the latent (reduced) subspace of dimension qi, with qi < d.

[...] > 0 is arbitrarily small. Finally, the EM algorithm is applied to yield the optimum parameters for the training set likelihood, as in subsections 3.1 and 3.2.
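Although the formal definitions are elided in the extract above, each unit's density follows the standard PPCA parameterisation of Tipping & Bishop [7], whose covariance is C = W Wᵀ + σ²I. Below is a minimal sketch of the per-unit Gaussian log-density, with variable names of our own choosing:

    import numpy as np

    def ppca_log_density(t, mu, W, sigma2):
        """ln N(t | mu, W W^T + sigma2*I): t is a d-vector, W a d x q basis matrix."""
        d = mu.shape[0]
        C = W @ W.T + sigma2 * np.eye(d)           # PPCA model covariance
        diff = t - mu
        _, logdet = np.linalg.slogdet(C)           # numerically stable log-determinant
        maha = diff @ np.linalg.solve(C, diff)     # Mahalanobis term
        return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)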

3.4 Model selection

To evaluate a possible change in the number of basis vectors (subsections 3.1 and 3.2) or the number of units (subsection 3.3), we use the criterion of the minimum negative log-likelihood L measured over the validation set,

  L   ln pˆ t n    ln   i pˆ t n | i  n n  i  9

(13)

which is guaranteed to lead to the optimal selection as the number of samples grows, as proven in subsection 3.6. Since the validation set is fixed in a practical setup, the negative log-likelihood L is a deterministic function of the MPPCA model to be evaluated. Hence, we have a deterministic minimization problem in the configuration space of possible numbers of units and possible numbers of principal directions of each unit. If we could evaluate L for all the points of this space, we would obtain the best possible MPPCA model (as seen in subsection 3.6). But this is impossible, since this space is infinite. So we have to use an optimization algorithm to guide our search in this configuration space. Any standard global optimization technique could be applied here, such as evolutionary algorithms [11] or ant colony optimization [10].
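For concreteness, here is a small sketch of how criterion (13) might be evaluated over a validation set, assuming per-unit log-density functions such as the ppca_log_density sketch above; the log-sum-exp rearrangement is ours, added for numerical stability, and is not part of the paper.

    import numpy as np

    def validation_nll(valid_set, priors, log_densities):
        """L = -sum_n ln sum_i pi_i p(t_n | i); log_densities[i](t) returns ln p(t | i)."""
        total = 0.0
        for t in valid_set:
            logs = np.array([np.log(pi) + logp(t)
                             for pi, logp in zip(priors, log_densities)])
            m = logs.max()                          # log-sum-exp for stability
            total += m + np.log(np.exp(logs - m).sum())
        return -total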

We have selected simulated annealing [16] because of its desirable property of asymptotic convergence to the global optimum. Other global search algorithms will be tested in future work. Hence, if the current model and the new model have validation negative log-likelihoods Lold and Lnew, respectively, the change to the new model is always accepted if Lnew < Lold.
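The acceptance test is the standard Metropolis rule used by simulated annealing: a strictly better model is always accepted, while a worse one survives with probability exp(−(Lnew − Lold)/T). A minimal sketch, with names of our own choosing:

    import math
    import random

    def accept_move(l_old, l_new, temperature):
        """Accept the candidate model: always on improvement, else stochastically."""
        if l_new < l_old:
            return True                             # strict improvement in L
        return random.random() < math.exp((l_old - l_new) / temperature)

As the temperature is lowered the test becomes greedy; under a suitable cooling schedule this is what yields the asymptotic convergence to the global optimum mentioned above.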