Fuzzy Gaussian Process Classification Model


Eman Ahmed¹, Neamat El Gayar¹,², Amir F. Atiya³, and Iman A. El Azab¹

¹ Faculty of Computers and Information, Cairo University, 12613 Giza, Egypt
{e.ahmed, n.elgayar, i.elazab}@fci-cu.edu.eg
² School of Communication and Information Technology, Nile University, Giza, Egypt
[email protected]
³ Faculty of Engineering, Cairo University, Giza, Egypt
[email protected]

Abstract. Soft labels allow a pattern to belong to multiple classes with different degrees. In many real-world applications the association of a pattern with multiple classes is more realistic, as it describes overlap and uncertainty in class belongingness. The objective of this work is to develop a fuzzy Gaussian process model for the classification of soft-labeled data. Gaussian process models have gained popularity in recent years for classification and regression problems and are an example of a flexible, probabilistic, non-parametric model with uncertainty predictions. Here we derive a fuzzy Gaussian process model for a two-class problem and then explain how it can be extended to multiple classes. The derived model is tested on different fuzzified data sets to show that it can adapt to various classification problems. Results reveal that our model outperforms the Fuzzy K-Nearest Neighbor (FKNN) classifier applied to the fuzzified data, as well as the Gaussian process and K-Nearest Neighbor models used with crisp labels.

Keywords: Fuzzy Classification, Gaussian Processes, Soft Labels.

1 Introduction

Dealing with vagueness is a common issue in many pattern recognition problems. Vagueness exists in real applications where classes have no sharp boundaries but rather overlap. Crisp labels are hard to obtain in such applications; in addition, they fail to reflect the natural grouping and uncertainty among classes. This gave rise to soft labels, which allow a pattern to belong to multiple classes with different degrees. Soft labels are useful where the feature space has overlapping or ill-defined classes, to accommodate the uncertainty of an external teacher about certain patterns, and to model the opinions of several experts [1].

Due to the growing importance of soft labels, many classification algorithms have been adapted to handle soft-labeled data. Earlier models include fuzzy MLP [2], fuzzy RBF networks [3] and fuzzy KNN [4][5]. More recent models have also been developed [6][7][8]. Lin and Wang [6] developed a fuzzy SVM model to solve special problems like weighting the samples in time series or decreasing the impact of outliers. The output of this model, however, consists of hard labels, and the model cannot be used when the training data carries only soft labels. In an alternative attempt, Borasca et al. [7] present a fuzzy-input fuzzy-output support vector machine technique to deal with the multi-class problem in the classification of remote sensing images. A similar but computationally less demanding model is presented in [8]; this latter model was tested on the fuzzy classification of emotion in recordings of spoken sentences. Further studies incorporate fuzzy labels in the learning of prototype-based classifiers [9][10][11][12].

In addition to the usefulness of learning with soft labels in many real-world problems such as speech recognition, remote sensing and medical diagnostics [8][9][10], several studies have reported that fuzzy approaches are clearly more robust to noise and errors in the learning labels than their hard (i.e., crisp) alternatives [1][12].

Motivated by the need of real-world applications for learning models that accept soft labels in the training data, and by the reported robustness of such models in noisy and uncertain learning problems, the aim of this work is to develop a new model based on Gaussian processes that takes soft labels as input and produces a probabilistic output. A Gaussian process is a supervised learning technique that has been used for regression and classification [13]. It is a stochastic model that governs the properties of functions and is fully specified by a mean and a covariance function. It is based on assigning a prior in the form of a multivariate Gaussian density that imposes a smoothness constraint on the underlying function; for the classification problem, this underlying function is the posterior probability [14].

In this paper we derive a Fuzzy Gaussian Process model for learning with soft labels. We test the derived model on benchmark data sets, discuss parameter selection, and outline potential applications of the model. The paper is organized as follows: Section 2 provides an overview of Gaussian processes. In Section 3 the derivation of the Fuzzy Gaussian Process model is presented, and details related to parameter setting and multi-class classification are outlined. Section 4 describes the data sets used, outlines the experiments conducted, and presents and discusses the results. Finally, the paper is concluded in Section 5.

2 Review of Gaussian Process Classification

In this section we briefly review the basic theory behind Gaussian process classification, as a foundation for the Fuzzy Gaussian Process model proposed in the next section. Given a training set S of n observations, S = {(xᵢ, yᵢ) | i = 1...n}, xᵢ denotes an input vector of dimension D and yᵢ denotes the target class of the corresponding input sample i. X refers to the matrix of all training samples, y denotes the vector of class labels for all training samples, and f represents the vector of prior latent functions for all training samples.

One would like to predict the class membership probability of a test sample x∗. This is achieved by obtaining the distribution of the latent function of the test sample, f∗, given the class memberships of the training samples. Since the underlying function corresponds to the posterior probability of class 1, the unrestricted latent function is passed through a squashing function in order to map its value into the unit interval. The Gaussian process is specified by an a priori multivariate distribution over the latent functions of the training and testing samples. This distribution has a covariance function that ensures that the latent functions of nearby samples are closely correlated; the covariance decreases as the distance between samples increases, and this behaviour is controlled by hyper-parameters that need to be estimated. During the training phase, the mean and the covariance of the latent function are calculated for each training sample using the algorithms in [14]. The probability that the test sample belongs to class 1 is calculated as:

$$
P(y_* = 1 \mid X, y, x_*) = \int_{-\infty}^{\infty} p(y_* = 1, f_* \mid X, y, x_*) \, df_*
= \int_{-\infty}^{\infty} p(y_* = 1 \mid f_*, X, y, x_*) \, p(f_* \mid X, y, x_*) \, df_*
$$

where

$$
p(f_* \mid X, y, x_*) = \int_{-\infty}^{\infty} p(f_* \mid f, X, y) \, p(f \mid X, y) \, df \qquad (1)
$$

The sample is assigned to the class with the maximum probability. Since this integration is intractable, an approximation algorithm must be used; several approximation techniques, such as the Laplace approximation, expectation propagation and Markov Chain Monte Carlo (MCMC), have been exploited [13]. In what follows we present an extension of the Gaussian process model to work with soft labels.
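As a concrete illustration of the crisp model reviewed above, the following minimal sketch uses scikit-learn's GaussianProcessClassifier, which handles the intractable integral with the Laplace approximation. The toy data, kernel choice and parameter values are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of crisp binary GP classification (Laplace approximation
# inside scikit-learn). Toy data: two overlapping Gaussian blobs.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) - 1.0, rng.randn(20, 2) + 1.0])
y = np.array([0] * 20 + [1] * 20)

# Squared-exponential (RBF) covariance; hyper-parameters are refined by
# maximizing the marginal likelihood during fit().
gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0)).fit(X, y)

x_star = np.array([[0.0, 0.0]])
print(gpc.predict_proba(x_star))  # class membership probabilities for the test sample
```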

3 Fuzzy Gaussian Process Classification Model

A soft label m(x) of a pattern x is usually defined as a K-dimensional vector with entries in [0, 1] indicating the degree to which pattern x is a member of each class, where K is the number of classes in the application at hand. Our aim is to extend the Gaussian process model to work on soft-labeled data. As the basic Gaussian process model is mainly concerned with discriminating between two classes, we focus on the binary case first; the extension to any desired number of classes is presented later in this section. In our approach, each training sample xᵢ ∈ S has two membership values, mᵢ⁺ and mᵢ⁻, which indicate to what extent the sample belongs to class 1 and class -1, respectively. M is the matrix containing all the membership values of the n training samples for all K classes.

Since, in the presence of soft labels, each sample has a degree of membership in each class, the probability that sample i belongs to class 1 given its prior latent function can be written as:

$$
P(y_i^{true} = 1 \mid f_i) = m_i^+ P(y_i = 1 \mid f_i) + m_i^- P(y_i = -1 \mid f_i) \qquad (2)
$$

where $y_i^{true}$ represents the true class membership, which is unknown. The class 1 membership value $m_i^+$ represents $p(y_i^{true} = 1 \mid y_i = 1)$, and the class -1 membership value $m_i^-$ represents $p(y_i^{true} = 1 \mid y_i = -1)$. Since

$$
m_i^- = 1 - m_i^+ \qquad (3)
$$

$$
P(y_i = -1 \mid f_i) = 1 - P(y_i = 1 \mid f_i) \qquad (4)
$$

it follows that

$$
P(y_i^{true} = 1 \mid f_i) = P(y_i = 1 \mid f_i)(2m_i^+ - 1) - m_i^+ + 1 \qquad (5)
$$

Substituting into Equation (1) of the Gaussian process model, we get:

$$
P(y_* = 1 \mid X, y, x_*, M) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} P(y_* = 1 \mid f_*) \, p(f_* \mid f, X, x_*) \left[ \frac{\prod_{i=1}^{n} \left( P(y_i = 1 \mid f_i)(2m_i^+ - 1) - m_i^+ + 1 \right) p(f \mid X)}{p(y \mid X)} \right] df \, df_*
$$

Using the activation function, we get the final model:

$$
P(y_* = 1 \mid X, y, x_*, M) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \sigma(y_* f_*) \, p(f_* \mid f, X, x_*) \left[ \frac{\prod_{i=1}^{n} \left( \sigma(y_i f_i)(2m_i^+ - 1) - m_i^+ + 1 \right) p(f \mid X)}{p(y \mid X)} \right] df \, df_*
$$

The difference from the ordinary Gaussian process model is that we use the membership values of the training samples; the membership values used are those of the class under investigation. So far, the Fuzzy Gaussian Process model deals with only two classes at a time. To extend it to a multi-class case with K classes, we use the one-against-all architecture: K Fuzzy Gaussian Process models are built, each capable of separating one class c from all the others. The input to the Fuzzy Gaussian Process model for class c is the soft-labeled data together with its membership values for class c. This is repeated for all classes, and the test sample is assigned to the class whose model yields the maximum probability.
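To make the likelihood term of Equation (5) and the one-against-all decision rule concrete, the following minimal sketch implements both in NumPy. The use of the logistic sigmoid as the squashing function and all function names are our illustrative assumptions; the paper does not prescribe a particular implementation.

```python
# Sketch of the fuzzy likelihood term (Eq. 5) and the one-against-all
# decision rule; illustrative, not the authors' reference implementation.
import numpy as np

def sigmoid(z):
    """Logistic squashing function mapping a latent value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def fuzzy_likelihood(f, m_plus):
    """P(y_true = 1 | f) = sigma(f) * (2 m+ - 1) - m+ + 1  (Eq. 5)."""
    return sigmoid(f) * (2.0 * m_plus - 1.0) - m_plus + 1.0

def one_against_all_predict(class_probs):
    """Assign each test sample to the class whose binary fuzzy GP returned
    the highest probability. class_probs: array of shape (K, n_test)."""
    return np.argmax(class_probs, axis=0)

# Sanity check: with a crisp label (m+ = 1) the term reduces to sigma(f);
# with m+ = 0.5 the sample is uninformative and the term is constant 0.5.
f = np.array([-2.0, 0.0, 2.0])
print(fuzzy_likelihood(f, m_plus=1.0))   # equals sigmoid(f)
print(fuzzy_likelihood(f, m_plus=0.5))   # [0.5, 0.5, 0.5]
```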

4 Experimental Results

Experiments were performed on two benchmark data sets. Both data sets exhibit natural class overlap to some extent, so we believe that expressing the data using soft labels is sensible. The first is the Iris data set, describing different kinds of Iris plants and consisting of 150 samples; each sample has four real-valued features (sepal length, sepal width, petal length, petal width) and is assigned one of three classes (Iris Setosa, Iris Versicolour, Iris Virginica) [15][16]. The second is the Cone-torus data set, a synthetic two-dimensional data set [5] of 400 samples. It consists of 3 classes generated from three differently shaped distributions; the classes are not equally represented, with patterns distributed with frequencies of 0.25, 0.25 and 0.5. Table 1 summarizes the data sets used.

Table 1. Datasets

Dataset      Features  Classes  Samples
Iris                4        3      150
Cone-torus          2        3      400

Since the data sets are crisply labeled, we use a K-nearest neighbor approach to assign soft labels to each pattern according to its similarity to its K nearest neighbors. This labeling technique, as opposed to the Keller labeling technique [5], does not guarantee that patterns retain their true class labels if the soft labels are "hardened" by the maximum membership rule; refer to [1] for more details on the different labeling techniques. For fuzzifying the labels we used K = 7 in the K-nearest neighbor procedure, as recommended empirically in [1] in a study on the same data sets. The purpose of our experiments was mainly to validate the classification power of the developed fuzzy Gaussian process (fuzzy GP) model. We therefore compared the performance of the fuzzy GP model with crisp classifiers, namely the crisp GP model and the traditional KNN model; the KNN classifier is popular for its simplicity of use and implementation, its robustness to noisy data and its wide applicability. We also compared the fuzzy GP model with the FKNN classifier, using a simple version of FKNN [5] trained with soft labels. In all our experiments we used the same fuzzy-labeled data and trained the models using 5-fold cross validation. For the crisp models (GP and KNN) the fuzzy labels were hardened, i.e. the models were trained with the class having the maximum class membership. The accuracy of the final trained models was calculated by hardening the output for the test samples and comparing it with their hardened soft labels. For the KNN and FKNN classification models we used K = 3, again as determined empirically in the study in [1].
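For illustration, here is a minimal sketch of one plausible KNN-based fuzzification scheme: the membership of a pattern in class c is taken as the fraction of its K nearest neighbors carrying crisp label c. The exact weighting used in [1] may differ; this particular scheme and the function name are our assumptions.

```python
# Sketch of KNN-based soft labeling: membership in class c is the fraction
# of a pattern's K nearest neighbors whose crisp label is c.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_soft_labels(X, y_crisp, n_classes, k=7):
    """Return an (n, n_classes) matrix M of membership degrees in [0, 1].
    X: (n, D) feature matrix; y_crisp: (n,) integer labels in 0..n_classes-1."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1 to include the point itself
    _, idx = nn.kneighbors(X)
    neighbor_labels = y_crisp[idx[:, 1:]]            # drop the self-neighbor column
    M = np.zeros((X.shape[0], n_classes))
    for c in range(n_classes):
        M[:, c] = (neighbor_labels == c).mean(axis=1)
    return M
```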

For the crisp GP and the fuzzy GP models, the covariance function was set to the squared exponential, and its hyper-parameters were calculated by maximizing the marginal likelihood as illustrated in the algorithm in [14]. The hyper-parameters of the covariance function are the characteristic length scale l, which specifies the distance over which samples remain correlated with the test sample, and the signal variance σvar. Both parameters are first initialized and then optimized. We observed in our experiments that the initialization of the characteristic length scale should not exceed the minimum average distance between samples. The results presented below were obtained using empirically chosen initial values for the length scale and the signal variance. Table 2 summarizes the performance of KNN, FKNN, the crisp GP and the proposed fuzzy GP model. The proposed Fuzzy Gaussian Process model clearly outperforms the crisp models (GP and KNN) as well as the fuzzy classifier FKNN. Although experiments were conducted on only two benchmark data sets, they demonstrate the effectiveness of the proposed model for learning with fuzzy labels. We plan to exploit our model in the future for real-world applications, particularly applications related to medical diagnostics, where a clear (crisp) classification of training data may be difficult or impossible, since the assignment of a patient to a certain disorder can frequently be done only in a probabilistic (fuzzy) manner.

Table 2. Results for the Iris and Cone-torus data sets

Classifier                     Iris            Cone-torus
KNN (K = 3)                    98.67 ± 1.82    87.75 ± 3.47
FKNN (K = 3)                   99.33 ± 1.49    88.25 ± 3.01
crisp GP (l = 1, σvar = 1)     99.33 ± 1.49    89.25 ± 4.38
fuzzy GP (l = 1, σvar = 1)     100             90.00 ± 2.93
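The length-scale initialization heuristic noted above (initialize l at no more than the minimum average distance between samples) could be computed as in the sketch below; reading "minimum average distance" as the smallest per-sample mean distance to the other samples is our assumption.

```python
# Sketch of one reading of the length-scale initialization heuristic:
# the smallest per-sample mean Euclidean distance to all other samples.
import numpy as np
from scipy.spatial.distance import cdist

def init_length_scale(X):
    """Return an upper bound for the initial characteristic length scale l."""
    D = cdist(X, X)                  # pairwise Euclidean distances, D[i, i] = 0
    n = X.shape[0]
    avg = D.sum(axis=1) / (n - 1)    # mean distance excluding the zero self-distance
    return avg.min()
```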

5 Conclusions and Future Work

In this work we present a new Fuzzy Gaussian Process model to deal with fuzzy labels in classification. Our proposed Fuzzy Gaussian Process model performed better than the standard hard-trained Gaussian process model, and was also found to be superior to the popular K-Nearest Neighbor and Fuzzy K-Nearest Neighbor classification models. Currently, we are investigating optimal initialization values for the covariance function hyper-parameters, due to their great effect on the performance of our model. We also plan to conduct comparisons with more crisp and fuzzy classification models, and to apply our model in remote sensing, speech recognition and medical applications. In addition, we are devising several measures of performance that compare outputs based on the fuzzy classification directly, rather than after turning them into hard labels; we believe such measures can better demonstrate the power of models that learn with soft labels and can compare them more effectively to other models.

Acknowledgment. This work was supported by DFG (German Research Society) grants SCHW 623/3-2 and SCHW 623/4-2.

References

1. N. El Gayar, F. Schwenker, and G. Palm, "A study of the robustness of KNN classifiers trained using soft labels," in F. Schwenker and S. Marinai (eds.): ANNPR 2006, LNAI 4087, Springer-Verlag, Berlin Heidelberg, pp. 67–80, 2006.
2. S. Pal and S. Mitra, "Multilayer perceptron, fuzzy sets and classification," IEEE Transactions on Neural Networks, vol. 3, pp. 683–697, September 1992.
3. N. El Gayar, Fuzzy Neural Network Models for Unsupervised and Confidence-Based Learning. PhD thesis, Dept. of Computer Science, University of Alexandria, 1999.
4. J. Keller, M. Gray, and J. Givens, "A fuzzy k-nearest neighbor algorithm," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-15, pp. 580–585, 1985.
5. L. Kuncheva, Fuzzy Classifier Design. Physica-Verlag, 2000.
6. C. Lin and S. Wang, "Fuzzy support vector machines," IEEE Transactions on Neural Networks, vol. 13, pp. 464–471, 2002.
7. B. Borasca, L. Bruzzone, L. Carlin, and M. Zusi, "A fuzzy-input fuzzy-output SVM technique for classification of hyperspectral remote sensing images," in NORSIG 2006, Reykjavík, 2006.
8. C. Thiel, S. Scherer, and F. Schwenker, "Fuzzy-input fuzzy-output one-against-all support vector machines," in B. Apolloni, R. J. Howlett, and L. Jain (eds.): KES 2007, Part III, LNCS (LNAI), vol. 4694, pp. 156–165, 2007.
9. S. Seo and K. Obermayer, "Soft learning vector quantization," Neural Computation, vol. 15, pp. 1589–1604, 2003.
10. T. Villmann, B. Hammer, F.-M. Schleif, and T. Geweniger, "Fuzzy labeled neural gas for fuzzy classification," in WSOM 2005, Paris, France, pp. 283–290, September 2005.
11. T. Villmann, F.-M. Schleif, and B. Hammer, "Fuzzy labeled soft nearest neighbor classification with relevance learning," in ICMLA 2005, Los Angeles, USA, pp. 11–15, IEEE Press, December 2005.
12. C. Thiel, B. Sonntag, and F. Schwenker, "Experiments with supervised fuzzy LVQ," in L. Prevost, S. Marinai, and F. Schwenker (eds.): ANNPR 2008, LNAI 5064, pp. 125–132, 2008.
13. H. Nickisch and C. E. Rasmussen, "Approximations for binary Gaussian process classification," Journal of Machine Learning Research, vol. 9, pp. 2035–2078, October 2008.
14. C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. The MIT Press, 2006.
15. R. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, vol. 7, Part II, pp. 179–188, 1936.
16. P. Murphy and D. W. Aha, UCI Repository of Machine Learning Databases. Irvine, CA: University of California, Dept. of Information and Computer Science, 1992.

This article was processed using the LaTeX macro package with the LLNCS style.
