A granular, parametric KNN classifier


A Granular, Parametric KNN Classifier

Vassilis Th. Tsoukalas, Human-Machines Interaction (HMI) Lab, TEI of Kavala, Kavala 65404, Greece
Vassilis G. Kaburlasos, HMI Lab, Dept. of Industrial Informatics, TEI of Kavala, Kavala 65404, Greece
Christos Skourlas, InfoDat KM Lab, Dept. of Informatics, TEI of Athens, Athens 12210, Greece

ABSTRACT

This work presents a granular K Nearest Neighbor, or grKNN for short, classifier in the metric lattice of Intervals' Numbers (INs). An IN here represents a population of numeric data samples. We detail how the grKNN classifier can be parameterized towards optimizing it. The capacity of a preliminary grKNN classifier is demonstrated, comparatively, in four benchmark classification problems. The far-reaching potential of the proposed classification scheme is discussed.

Categories and Subject Descriptors

I.5.2 [Pattern Recognition]: Design Methodology—Classifier design and evaluation

General Terms

Algorithms, Design, Experimentation

Keywords

Classification, granular computing, intervals' number (IN), k nearest neighbor (KNN), lattice computing (LC)

1. INTRODUCTION

Pattern recognition and machine learning are topics of high interest in practical applications [3, 4]. In this context, this work introduces a parametric, granular K Nearest Neighbor, or grKNN for short, classifier in the metric lattice of Intervals' Numbers (INs). An Intervals' Number, or IN for short, is a mathematical object that may represent either a fuzzy interval or a distribution of numeric data samples. The space of INs has been studied in the literature. In particular, it has been shown that the set F of INs is a metric lattice with cardinality ℵ1, where "ℵ1" is the cardinality of the set R of real numbers; the space F is a cone in a linear space; furthermore, non-linear transformations of INs can be pursued [8].

An IN may be regarded as an information granule for dealing with vagueness. Recall that the term (information) granule was introduced in fuzzy set theory to denote a clump of values drawn together by indistinguishability, similarity, proximity or functionality. Computing with granules, namely Granular Computing [11], is important in practice because granules may accommodate ambiguity and/or uncertainty. The techniques described here fall within the framework of Lattice Computing (LC), which has lately been defined as "an evolving collection of tools and methodologies that process lattice ordered data including logic values, numbers, sets, symbols, graphs, etc" [6, 7, 8, 10].

Different versions of the grKNN classifier have already been presented in the literature, with emphasis on specific applications including sugar production prediction [5, 13] and human fingerprint recognition [10]. This work emphasizes sound mathematical notation, including new mathematical results, towards casting a pattern recognition problem as a parameter-optimization problem; furthermore, the applicability of the proposed grKNN classifier is demonstrated here in four benchmark classification problems.

This work is organized as follows. Section 2 summarizes the mathematical background, including INs. Section 3 presents algorithm CALCIN for inducing an IN from a population of data samples. Section 4 describes the proposed KNN classifier. Section 5 presents, comparatively, computational experiments and results. Section 6 concludes by summarizing our contribution and discussing future work extensions. The Appendix includes proofs of propositions.

2. THE MATHEMATICAL BACKGROUND

This section summarizes mathematical definitions and results regarding Intervals’ Numbers, or INs for short [8].

2.1 General

A binary relation ⊑ on a set P is a partial order iff it satisfies the following three conditions for all x, y, z ∈ P: x ⊑ x (reflexivity); x ⊑ y and y ⊑ x ⇒ x = y (antisymmetry); and x ⊑ y and y ⊑ z ⇒ x ⊑ z (transitivity). In this case (P, ⊑) is called a partially ordered set, or poset for short. A lattice is a poset (L, ⊑) with the additional property that any two elements x, y ∈ L have both a greatest lower bound, namely infimum, denoted by x ⊓ y, and a least upper bound, namely supremum, denoted by x ⊔ y. If every pair x, y in a lattice (L, ⊑) satisfies either x ⊑ y or y ⊑ x, then we say that the lattice (L, ⊑) is totally ordered. A lattice (L, ⊑) is called complete iff each of its subsets X has both an infimum and a supremum in L (hence, taking X = L, we see that a complete lattice has both a least element and a greatest element, denoted by O and I, respectively) [2].

We will deal with the reference set R̄ = R ∪ {−∞, +∞}, namely the set of extended real numbers. It turns out that (R̄, ≤) is a complete as well as a totally ordered lattice, where ≤ is the "usual" order relation of real numbers. There are both a least element, denoted by "−∞", and a greatest element, denoted by "+∞", in (R̄, ≤); moreover, the corresponding infimum and supremum operators are denoted by ∧ and ∨, respectively.

Definition 1. Given a1, a2 ∈ R̄ with a1 ≤ a2, an interval A = [a1, a2] is defined as [a1, a2] = {x : x ∈ R̄ and a1 ≤ x ≤ a2}.

The empty set (∅) is also considered an interval, namely the empty interval; we denote the collection of intervals of R̄ (including the empty interval) by I(R̄), or simply by I. (The empty interval may be denoted as [a1, a2] with any a1, a2 such that a1 > a2; in particular, the empty interval in I(R̄) is typically represented by [+∞, −∞].)

The structure (I, ⊆) is an ordered set. In fact it is a lattice, as shown by the following well-known theorem.

Theorem 1. The structure (I, ⊆) is a complete lattice with respect to the order ⊆ (i.e., set-theoretic inclusion). The least element of I is ∅; the greatest element of I is R̄ = [−∞, +∞]. Given nonempty intervals A = [a1, a2] and B = [b1, b2], their infimum and supremum in I are given by

A ∩ B = [a1 ∨ b1, a2 ∧ b2] and A ∪ B = [a1 ∧ b1, a2 ∨ b2].

2.2 Fuzzy Intervals

A fuzzy subset F of R is identified with its membership function F : R → [0, 1]. The number F(x) denotes the degree to which x belongs to F. A partial order can be defined for fuzzy subsets as follows:

F ≤ G ⇔ (∀x : F(x) ≤ G(x))    (1)

It is easy to check that the infimum and the supremum of two fuzzy sets F, G are fuzzy sets, denoted by F ∧ G and F ∨ G, respectively, and defined for every x ∈ R as follows:

(F ∧ G)(x) = F(x) ∧ G(x)    (2)

(F ∨ G)(x) = F(x) ∨ G(x)    (3)

Given a fuzzy subset F, the h-cut of F is the set Fh = {x : F(x) ≥ h}. (We use the term "h-cut" instead of the equivalent term "α-cut" used in the fuzzy sets literature; the rationale for the new term stems from the two different interpretations of an Intervals' Number (IN) [8].) It is well known that a fuzzy subset is fully determined by the family {Fh}h∈[0,1] of its h-cuts, as shown next.

Theorem 2. For fuzzy membership functions F and G it is: (∀h ∈ [0, 1] : Fh = Gh) ⇔ (∀x : F(x) = G(x)).

Consider fuzzy intervals or, equivalently, fuzzy numbers.

Definition 2. A fuzzy interval is a fuzzy subset F whose every h-cut is a closed interval: (∀h ∈ [0, 1] : Fh ∈ I). We denote the set of all fuzzy intervals by F0.

It is known that the set F0 of fuzzy intervals, equipped with the fuzzy-sets order ≤, is a complete lattice.

Theorem 3. The structure (F0, ≤) is a complete lattice. The infimum operation is ∧ as defined in (2). The supremum operation (∨) is defined as

F ∨ G = inf{H : H ∈ F0, F ≤ H, G ≤ H}.

That is, F ∨ G is the smallest fuzzy interval which is greater than both F and G.
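As a minimal illustration of Theorem 1, the following Python sketch computes the interval meet and join; intervals are plain (a1, a2) tuples, None stands for the empty interval, and the names are ours rather than the paper's.

```python
# A minimal sketch of the interval lattice (I, ⊆) of Theorem 1.
# Intervals are (a1, a2) tuples with a1 <= a2; None stands for the empty interval.

def interval_meet(A, B):
    """Infimum A ∩ B = [a1 ∨ b1, a2 ∧ b2]; empty if the bounds cross."""
    if A is None or B is None:
        return None
    a1, a2 = A
    b1, b2 = B
    lo, hi = max(a1, b1), min(a2, b2)
    return (lo, hi) if lo <= hi else None

def interval_join(A, B):
    """Supremum A ∪ B = [a1 ∧ b1, a2 ∨ b2], the smallest interval containing both."""
    if A is None:
        return B
    if B is None:
        return A
    a1, a2 = A
    b1, b2 = B
    return (min(a1, b1), max(a2, b2))

# Example: the meet of [1, 4] and [3, 7] is [3, 4]; their join is [1, 7].
assert interval_meet((1, 4), (3, 7)) == (3, 4)
assert interval_join((1, 4), (3, 7)) == (1, 7)
```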

2.3 Intervals' Numbers (INs)

We now introduce Intervals' Numbers, or INs for short.

Definition 3. An Intervals' Number, or IN for short, is a function F : [0, 1] → I which satisfies h1 ≥ h2 ⇒ Fh1 ⊆ Fh2, and ∀X ⊆ [0, 1] : ∩h∈X Fh = F∨X.

We denote the class of all INs by F. The following theorem shows that we can associate every IN with a fuzzy interval.

Theorem 4. Given an IN E ∈ F, define a fuzzy set Ẽ as ∀x : Ẽ(x) = sup{h : x ∈ Eh}. The h-cuts of Ẽ are denoted by Ẽh and, by definition, satisfy ∀h ∈ [0, 1] : Ẽh = {x : Ẽ(x) ≥ h}. Then, for all h ∈ [0, 1] we have Ẽh = Eh. Hence, Ẽ is a fuzzy interval.

In other words, the h-cuts of the fuzzy set Ẽ are the intervals of the original IN E. Hence, Theorem 4 indicates two equivalent representations for an IN, namely the interval-representation and the membership-function-representation (Fig. 1). An advantage of the interval-representation is that it enables useful algebraic operations, whereas an advantage of the membership-function-representation is that it enables convenient (e.g., fuzzy logic) interpretations.

Just like fuzzy intervals are equipped with a partial order ≤, INs are equipped with a partial order ⪯ as follows.

Definition 4. For every pair F, G ∈ F we define the relation ⪯ as follows: F ⪯ G ⇔ (∀h ∈ [0, 1] : Fh ⊆ Gh).

The isomorphism of (F0, ≤) and (F, ⪯) is a consequence of the following theorem.

Theorem 5. For all F, G ∈ F we have F ⪯ G ⇔ (∀h ∈ [0, 1] : Fh ⊆ Gh) ⇔ (∀x ∈ R : F(x) ≤ G(x)).

The relation ⪯ is a lattice order and the lattice (F, ⪯) of INs is complete. If we denote the infimum operation by ⊓ and the supremum operation by ⊔, then ∀h ∈ [0, 1] we have

(F ⊓ G)h = Fh ∩ Gh and (F ⊔ G)h = Fh ∪ Gh.

Figure 1: Two equivalent representations for an Intervals' Number (IN) E include (a) the interval-representation E = Eh, h ∈ [0, 1], and (b) the membership-function-representation E = E(x), x ∈ R.
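Continuing the sketch above, an IN in its interval-representation can be stored as a list of h-level intervals (32 levels are used later, in Section 3), with the order of Definition 4 and the lattice operations applied level by level; the names below are ours.

```python
# A sketch of the interval-representation of an IN: a list of closed (a, b)
# intervals, one per equally spaced h-level from h = 0 up to h = 1
# (32 levels in Section 3). Lattice operations act level by level.

def in_order(F, G):
    """F ⪯ G  ⇔  Fh ⊆ Gh at every level h (Definition 4)."""
    return all(g1 <= f1 and f2 <= g2 for (f1, f2), (g1, g2) in zip(F, G))

def in_meet(F, G):
    """(F ⊓ G)h = Fh ∩ Gh level by level; a level may turn out empty (lo > hi)."""
    return [(max(f1, g1), min(f2, g2)) for (f1, f2), (g1, g2) in zip(F, G)]

def in_join(F, G):
    """(F ⊔ G)h = Fh ∪ Gh level by level, i.e. the smallest enclosing interval."""
    return [(min(f1, g1), max(f2, g2)) for (f1, f2), (g1, g2) in zip(F, G)]
```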

2.4 Metric Functions in the Lattice of INs

A metric is a function d : S × S → R₀⁺ which satisfies the following three conditions for any x, y, z ∈ S:

(C1) d(x, y) = 0 ⇔ x = y.
(C2) d(x, y) = d(y, x) (symmetry).
(C3) d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality).

The following two functions will be useful towards defining metric functions in the lattice (F, ⪯):

(A1) a strictly increasing function v : R → R, and
(A2) a strictly decreasing function θ : R → R.

Based on two functions v and θ that satisfy properties A1-A2 above, a metric function dI : I × I → R₀⁺ is defined as follows:

dI([a, b], [c, d]) = v(θ(a ∧ c)) − v(θ(a ∨ c)) + v(b ∨ d) − v(b ∧ d)    (4)

A metric function dF : F × F → R₀⁺ is defined as

dF(F, G) = ∫_0^1 dI(Fh, Gh) dh    (5)

The k-Minkowski metric d^k : F^N × F^N → R₀⁺ is defined as

d^k(F, G) = [ Σ_{i=1}^{N} dF^k(Fi, Gi) ]^{1/k}    (6)

where F = (F1, . . . , FN), G = (G1, . . . , GN) and k ∈ {1, 2, . . . , ∞}.

2.5 Distances in a Metric Space Power-set

Our interest here is in defining (metric) distances between finite sets of elements in a metric space (M, d), where M is a set and d : M × M → R₀⁺ is a metric. First, we consider the Hausdorff metric dH, which turns the set of non-empty compact subsets of a metric space into a metric space in its own right. In particular, given two different (finite) sets A = {A1, . . . , AM} and B = {B1, . . . , BN} of elements in a metric space (M, d), the Hausdorff metric dH is defined as follows:

dH(A, B) = max{ ⋁_i ⋀_j d(Ai, Bj), ⋁_j ⋀_i d(Bj, Ai) }    (7)

Second, another two distances, namely dmin and dK, are proposed here next:

dmin(A, B) = min{ ⋁_i ⋀_j d(Ai, Bj), ⋁_j ⋀_i d(Bj, Ai) }    (8)

dK(A, B) = ⋁_i ⋀_j d(Ai, Bj) + ⋁_j ⋀_i d(Bj, Ai)    (9)

On one hand, distance dmin is not a metric, since equality dmin(A, B) = 0 does not necessarily imply A = B, as demonstrated for A = {A1} and B = {A1, B2}. On the other hand, regarding distance dK, consider the following Proposition.

Proposition 1. The distance function dK(., .) satisfies both conditions (C1) and (C2) of a metric function.

The proof of Proposition 1 is given in the Appendix. It remains to (dis)prove metric function condition (C3) in a future work; in other words, it remains to show whether the distance function dK(., .) is a metric or not. Distances dH, dmin and dK will be engaged below in the metric lattice (F, dF) of INs.

2.6 Optimization

The next Proposition can be useful towards optimization.

Proposition 2. Assume n positive numbers x1, x2, . . . , xn such that x1 + x2 + · · · + xn = 1. Then, the objective function (product) x1 x2 . . . xn is maximized for x1 = x2 = · · · = xn.

The proof of Proposition 2 is given in the Appendix.
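For concreteness, the following is a minimal Python sketch of the distances in Sections 2.4 and 2.5. It assumes an IN is stored as a list of equally spaced h-level intervals (the interval-representation of Section 3), uses the choices v(x) = x and θ(x) = −x adopted later in the experiments, and approximates the integral in (5) by an average over the levels; all function names are ours, not the paper's software.

```python
# A sketch only: an IN is assumed to be a list of (a, b) intervals, one per
# equally spaced h-level; v and theta default to v(x) = x and theta(x) = -x,
# the choices used in the experiments of Section 5.

def d_I(A, B, v=lambda x: x, theta=lambda x: -x):
    """Interval metric (4): v(theta(a∧c)) - v(theta(a∨c)) + v(b∨d) - v(b∧d)."""
    (a, b), (c, d) = A, B
    return (v(theta(min(a, c))) - v(theta(max(a, c)))
            + v(max(b, d)) - v(min(b, d)))

def d_F(F, G):
    """IN metric (5): the integral over h approximated by a mean over the levels."""
    return sum(d_I(Fh, Gh) for Fh, Gh in zip(F, G)) / len(F)

def _directed(A, B, d):
    """Maximum over a in A of the distance from a to its nearest b in B."""
    return max(min(d(a, b) for b in B) for a in A)

def d_H(A, B, d=d_F):
    """Hausdorff distance (7) between two finite sets of INs."""
    return max(_directed(A, B, d), _directed(B, A, d))

def d_min(A, B, d=d_F):
    """Distance (8): the minimum, instead of the maximum, of the two directed terms."""
    return min(_directed(A, B, d), _directed(B, A, d))

def d_K(A, B, d=d_F):
    """Distance (9): the sum of the two directed terms."""
    return _directed(A, B, d) + _directed(B, A, d)
```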

3. INDUCING INS FROM DATA

An algorithm, namely CALCIN, for inducing an IN from a population of data samples [5, 10] is described next.

Consider a vector x = (x1, . . . , xn) of data samples. Two entries xi, xj of the vector x are called successive iff there is no other entry xk, k ∈ {1, . . . , n}, such that xi ∧ xj < xk < xi ∨ xj, where ∧ and ∨ are the min and max operators, respectively. A strictly increasing, cumulative real function c : R → R₀⁺ is computed from vector x by, first, defining

c(xi) = (1/n) |{xj : xj ≤ xi}|, where i, j ∈ {1, . . . , n},    (10)

and |S| denotes the cardinality of the set S. Then, function c : R → R₀⁺ is completed by straight-line connecting the points (xi, c(xi)) and (xj, c(xj)), where xi, xj are successive entries of vector x. Obviously, there is a unique real number x0.5 such that c(x0.5) = 0.5. In conclusion, an IN is calculated from function c(.) such that for values less than or equal to x0.5 the corresponding IN envelope function equals 2c(x), whereas for values larger than x0.5 the corresponding IN envelope function equals 2(1 − c(x)). This concludes the description of algorithm CALCIN.

Fig. 2 demonstrates the application of algorithm CALCIN as explained in the following. Fig. 2(a) displays a series of data samples over the range [0, 10]. Fig. 2(b) displays the cumulative function c(x) computed from all the data in Fig. 2(a); moreover, Fig. 2(c) displays the corresponding IN envelope function E. Fig. 2(d) demonstrates the computation of two cumulative functions c1(x) and c2(x) corresponding to two data clusters over the ranges [0, 4] and [4, 10], respectively; that is, in Fig. 2(d) we have partitioned the data samples of Fig. 2(a) into two (different) clusters over the ranges [0, 4] and [4, 10], respectively. Fig. 2(e) displays the corresponding two IN envelope functions E1 and E2.

Given n different real data samples x1 < x2 < · · · < xn, the idea is, first, to partition them and, second, to engage algorithm CALCIN in order to compute one IN per partition part. Recall that the number of partitions of a finite set {x1, x2, . . . , xn} is given by the very rapidly increasing Bell number Bn [14], where Bn > 2^(n−2) for n ≥ 2. Since our interest here is in inducing nonoverlapping INs, we have considered solely nonoverlapping (partition) parts; that is, all the data in a partition part are either (strictly) larger or (strictly) smaller than all the data in another partition part. Table 1 displays the number of partitions of n different real numbers x1 < x2 < · · · < xn into N nonoverlapping parts.

Table 1: Number of partitions of n real numbers x1 < · · · < xn into N nonoverlapping (partition) parts
N = 2: n − 1
N = 3: Σ_{k=2}^{n−1} (n − k)
N = 4: Σ_{i=1}^{n−3} Σ_{k=2}^{n−1−i} (n − k − i)
N = 5: Σ_{i2=1}^{n−4} Σ_{i1=1}^{n−3−i2} Σ_{k=2}^{n−1−i1−i2} (n − k − i1 − i2)

In its interval-representation, we typically represent an IN envelope by 32 equally spaced intervals from h = 0 to h = 1 included. Hence, any population of data samples in a vector x can be represented by 32 intervals. This is, potentially, a very useful property of the IN representation because it provides a fixed-size representation for any data vector length. In other words, a fixed-size IN can represent any data population including all order data statistics [6, 7, 10].
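Below is a minimal Python sketch of CALCIN under the 32-level interval-representation; the dense evaluation grid and the function name are our own choices, not taken from the authors' software.

```python
# A sketch of algorithm CALCIN, producing the interval-representation of an IN
# as `levels` equally spaced h-level intervals.
import numpy as np

def calcin(samples, levels=32):
    """Induce one IN (a list of `levels` intervals) from a population of samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    # Empirical cumulative function c(xi) = |{xj : xj <= xi}| / n, eq. (10);
    # between successive samples c is completed by straight-line interpolation.
    c = np.arange(1, n + 1) / n

    def c_of(t):
        return np.interp(t, x, c, left=0.0, right=1.0)

    # The unique point x_half where c equals 0.5.
    x_half = np.interp(0.5, c, x)

    # Envelope: 2*c(t) for t <= x_half, and 2*(1 - c(t)) for t > x_half.
    def envelope(t):
        return 2 * c_of(t) if t <= x_half else 2 * (1 - c_of(t))

    # Interval-representation: for each level h, the h-cut [left, right] of the envelope.
    grid = np.linspace(x[0], x[-1], 1000)
    env = np.array([envelope(t) for t in grid])
    intervals = []
    for h in np.linspace(0.0, 1.0, levels):
        inside = grid[env >= h]
        if inside.size:
            intervals.append((inside[0], inside[-1]))
        else:
            intervals.append((x_half, x_half))  # degenerate interval at the top level
    return intervals
```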

4. THE PROPOSED KNN CLASSIFIER

A version of the grKNN classifier is presented in this section by Algorithm 1 (training) and Algorithm 2 (testing). At the end of training (Algorithm 1), a class k ∈ {1, . . . , Nclasses} is represented by a set {Fk} of N-dimensional INs. In particular, in the j-th dimension of class k appears a set {Fk,j} of INs. The details of computing the set {Fk,j} are application-specific, as explained below.

Algorithm 1 grKNN Training
1: Assume the training data, in Ndimensions dimensions, partitioned into Nclasses classes;
2: for k = 1 to Nclasses do
3:   for j = 1 to Ndimensions do
4:     Induce clusters from the training data, per j, per k;
5:     Use algorithm CALCIN to represent each cluster by an IN;
6:   end for
7: end for

Algorithm 2 grKNN Testing
1: Assume NtestingData testing data vectors. Let {Fk} denote a set of N-dimensional INs (clusters) representing class k; moreover, let {Fk,j} denote the set of clusters (INs) in the j-th dimension for class k.
2: for i = 1 to NtestingData do
3:   for k = 1 to Nclasses do
4:     for j = 1 to Ndimensions do
5:       dj = d(Xi,j, {Fk,j});
6:     end for
7:     Compute the distance d(Xi, {Fk}) from the dj's;
8:   end for
9:   Let the class of Xi be J = argmin_k {d(Xi, {Fk})};
10: end for

The complexity of Algorithms 1 and 2 is determined by their corresponding loops. More specifically, the complexity of Algorithm 1 equals O(Nclasses × Ndimensions); moreover, the complexity of Algorithm 2 equals O(NtestingData × Nclasses × Ndimensions). Note that the computation of a distance in the inner loop of Algorithm 2 takes constant time, whereas the induction of clusters in the inner loop of Algorithm 1 also depends on the number NtrainingData of training data.
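As a rough illustration of Algorithms 1 and 2, the following sketch reuses calcin() and a set distance such as d_min() from the earlier sketches; it treats a numeric test value as a trivial (degenerate) IN and combines the per-dimension distances by a plain sum (the k = 1 case of metric (6)), which is only one of the application-specific choices the paper leaves open. The cluster_fn argument is a placeholder for any cluster-induction routine, e.g., a single all-data cluster or the heuristic of Section 5; all names are illustrative.

```python
# A sketch of grKNN training (Algorithm 1) and testing (Algorithm 2), assuming
# calcin() and a set distance (e.g., d_min) from the earlier sketches.

def grknn_train(train_vectors, train_labels, cluster_fn):
    """Return model[k][j] = list of INs (clusters) for class k, dimension j."""
    classes = sorted(set(train_labels))
    n_dims = len(train_vectors[0])
    model = {}
    for k in classes:
        rows = [x for x, y in zip(train_vectors, train_labels) if y == k]
        model[k] = []
        for j in range(n_dims):
            column = [row[j] for row in rows]
            clusters = cluster_fn(column)            # list of 1-D sample populations
            model[k].append([calcin(c) for c in clusters])
    return model

def grknn_classify(x, model, set_distance):
    """Assign x to the class k minimizing the summed per-dimension distances."""
    best_class, best_dist = None, float("inf")
    for k, per_dim_INs in model.items():
        total = 0.0
        for j, clusters in enumerate(per_dim_INs):
            # A numeric value x[j] is treated as a trivial (degenerate) IN
            # with the same number of levels as calcin() produces (32).
            trivial_in = [(x[j], x[j])] * 32
            total += set_distance([trivial_in], clusters)
        if total < best_dist:
            best_class, best_dist = k, total
    return best_class
```

For instance, grknn_train(X_train, y_train, cluster_fn=lambda column: [column]) induces a single IN per dimension per class, and grknn_classify(x, model, set_distance=d_min) then mirrors step 9 of Algorithm 2.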

5. EXPERIMENTS AND RESULTS

We considered four benchmark classification problems from the UCI Repository at “http://archive.ics.uci.edu/ml/”. In particular, we considered the “Iris”, “Wine”, “Glass” and “Ionosphere” benchmark datasets, where each dataset includes numeric data instances without missing data values. In summary, first, the “Iris” dataset includes 150 instances (with 4 attributes per instance) partitioned in 3 classes; second, the “Wine” dataset includes 178 instances (with 13 attributes per instance) partitioned in 3 classes; third, the “Glass” dataset includes 163 instances (with 9 attributes per instance) partitioned in 2 classes, namely float and nonfloat, respectively; fourth, the “Ionosphere” dataset includes 351 instances (with 34 attributes per instance) partitioned in 2 classes. Only the latter dataset (“Ionosphere”) explicitly specifies both a training dataset and a testing dataset including 200 and 151 instances, respectively. Therefore, when a training/testing dataset was not given explicitly, in our computational experiments we engaged a randomly selected 75% of the data instances for training, whereas the remaining 25% data instances were used for testing. Tables 2, 3, 4 and 5 display results reported in the literature [12] by alternative classifiers on the “Iris”, “Wine”, “Glass” and “Ionosphere” benchmark datasets, respectively.
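A minimal sketch of this evaluation protocol follows, assuming the fixed Ionosphere split is handled separately; the seed and the function name are our own choices, added only for reproducibility of the 75%/25% random split.

```python
# A sketch of the random 75% / 25% train/test split used when a benchmark
# does not ship a fixed split.
import random

def random_split(instances, labels, train_fraction=0.75, seed=0):
    idx = list(range(len(instances)))
    random.Random(seed).shuffle(idx)
    cut = int(train_fraction * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([instances[i] for i in train], [labels[i] for i in train],
            [instances[i] for i in test],  [labels[i] for i in test])
```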

Figure 2: (a) A series of data samples. (b) The corresponding (single) cumulative function c. (c) Computation of IN E from function c above. (d) Two different cumulative functions c1 and c2, i.e., one cumulative function per different data cluster. (e) Computation of INs E1 and E2 from functions c1 and c2, respectively.

Table 2: Testing-data accuracy by various classifiers regarding the Iris benchmark dataset
Classifier Name: Accuracy (%)
SLF: 95.84
Back Propagation: 94.67
Min-max neural net for clustering: 92.67
CMOC: 92.00

Table 3: Testing-data accuracy by various classifiers regarding the Wine benchmark dataset
Classifier Name: Accuracy (%)
Rprop: 100.00
RDA: 100.00
QDA: 99.40
LDA: 98.90
1NN: 96.10
CLUSTER: 84.09

Table 4: Testing-data accuracy by various classifiers regarding the Glass benchmark dataset
Classifier Name: Accuracy (%)
Rprop: 95.23
FASTCLUS: 89.25
Nearest Neighbor: 82.82
Beagle: 82.20
Discriminant Analysis: 73.62

Table 5: Testing-data accuracy by various classifiers regarding the Ionosphere benchmark dataset
Classifier Name: Accuracy (%)
IB3: 96.70
Rprop: 96.23
Backpropagation: 96.00
C4: 94.00
Multiresolution Algorithm: 92.87
Nearest Neighbor: 92.10
Nonlinear Perceptron: 92.00
Linear Perceptron: 90.70

In this preliminary work, using Algorithm 1 for training, we located clusters per data dimension per class according to the following heuristic. First, we sorted the differences of all successive training data in descending order. Second, we considered up to the four largest differences, conditioned upon a (standard-deviation-based) user-defined threshold. Finally, we cut the training data into clusters at the aforementioned (largest) differences. In all cases, we also considered a single cluster containing all the training data. In conclusion, we represented each training data cluster by an IN. The application of Algorithm 2 for testing is straightforward.

Table 6 reports the best classification results using the proposed grKNN classifier for three different distance functions, namely dmin, dK and dH, respectively. In all cases, both functions v(x) = x and θ(x) = −x have been used per data dimension.

Table 6: Best (non-optimized) grKNN classifier testing-data accuracy (%) in four datasets using the three different distance functions dmin, dK and dH
Distance function: Iris, Wine, Glass, Ionosphere
dmin: 100, 86.7, 85, 93.4
dK: 100, 80.0, 75, 86.8
dH: 100, 73.3, 70, 82.1

Our computational experiments have demonstrated that the proposed grKNN classifier has potential in practical applications. In particular, the grKNN classifier with the dmin distance performed best on all four benchmark datasets employed here, followed by the grKNN classifier with the dK distance; the latter was followed by the grKNN classifier using the dH distance. We have also completed the development of fully functional computer software for evaluating the performance of alternative versions of the grKNN classifier. Fig. 3 displays the output of our software using the grKNN classifier with distance dmin on the Iris benchmark dataset. More specifically, Fig. 3(a) displays a single IN induced from the training data per data dimension per class, whereas Fig. 3(b) displays up to five INs per data dimension per class induced from the (same) training data.
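A minimal Python sketch of the cluster-location heuristic described earlier in this section follows, under our own reading of it: the gaps between successive sorted values are ranked, at most the four largest gaps exceeding a standard-deviation-based threshold are kept as cut points, and the data are split there. The specific threshold (one standard deviation of the gaps) and the function name are assumptions, since the paper only states that the threshold is standard-deviation based and user-defined.

```python
# A sketch of the gap-based clustering heuristic of Section 5 (our reading).
import numpy as np

def gap_clusters(values, max_cuts=4, std_factor=1.0):
    """Split a 1-D population into up to max_cuts + 1 nonoverlapping clusters."""
    x = np.sort(np.asarray(values, dtype=float))
    if len(x) < 2:
        return [x.tolist()]
    gaps = np.diff(x)
    threshold = std_factor * np.std(gaps)   # assumed threshold; user-defined in the paper
    # Indices of the (at most) max_cuts largest gaps that exceed the threshold.
    candidates = np.argsort(gaps)[::-1][:max_cuts]
    cut_points = sorted(i for i in candidates if gaps[i] > threshold)
    clusters, start = [], 0
    for i in cut_points:
        clusters.append(x[start:i + 1].tolist())
        start = i + 1
    clusters.append(x[start:].tolist())
    return clusters
```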

6. CONCLUSIONS

This preliminary work has presented a granular K Nearest Neighbor (grKNN) classifier in the metric lattice of Intervals' Numbers (INs). The capacity of different grKNN classifier versions was demonstrated, comparatively, on four benchmark classification problems with promising results.

Future work improvements regard the computation of data clusters by considering more than one data dimension simultaneously. Further improvement will be pursued by optimal estimation of the parameters of the functions v(x) and θ(x) per data dimension. For instance, it could be v(x) = A/(1 + e^(−λ·(x−µ))) and θ(x) = 2µ − x, where λ ∈ R⁺, µ ∈ R. Of special interest is optimization based on Proposition 2. In this manner, a pattern classification problem could be formulated as an objective-function parameter-optimization problem [1]. Note that especially promising is the development of machine learning schemes in the space of INs in the context of a Riesz space [9]; the latter (space) is defined as a lattice-ordered vector space. We remark that a unique advantage of the proposed grKNN classifier is its capacity to rigorously deal with granular data (INs) toward accommodating vagueness in practice.
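As an illustration only, the parametric choices just mentioned could be sketched as follows and passed to the interval metric (4) from the earlier sketch; the default parameter values are placeholders.

```python
# A sketch of the parametric choices mentioned above: a sigmoid
# v(x) = A / (1 + exp(-lam * (x - mu))), which is strictly increasing, and
# theta(x) = 2*mu - x, which is strictly decreasing. A, lam, mu are free
# parameters to be tuned per data dimension; the defaults are placeholders.
import math

def make_v(A=1.0, lam=1.0, mu=0.0):
    return lambda x: A / (1.0 + math.exp(-lam * (x - mu)))

def make_theta(mu=0.0):
    return lambda x: 2.0 * mu - x

# Usage with the earlier sketch, e.g.:
# d_I((1, 4), (3, 7), v=make_v(A=1, lam=0.5, mu=2), theta=make_theta(mu=2))
```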

7. ACKNOWLEDGMENTS

This research has been co-funded by the European Union (Social Fund) and Greek national resources under the framework of the “Archimedes III: Funding of Research Groups in TEI of Athens” project of the “Education & Lifelong Learning” Operational Programme.

8. REFERENCES

[1] D. P. Bertsekas. Convex Optimization Theory. Athena Scientific, Boston, MA, 2009.
[2] G. Birkhoff. Lattice Theory. American Mathematical Society, Providence, RI, 1967.
[3] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, New York, NY, 2006.
[4] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification, 2nd ed. John Wiley & Sons, Inc., New York, NY, 2001.
[5] V. G. Kaburlasos. FINs: lattice theoretic tools for improving prediction of sugar production from populations of measurements. IEEE Trans. Syst., Man, and Cybern. - B, 34(2):1017–1030, April 2004.
[6] V. G. Kaburlasos and A. Kehagias. Fuzzy inference system (FIS) extensions based on lattice theory. IEEE Trans. Fuzzy Systems, in press.
[7] V. G. Kaburlasos, S. E. Papadakis, and G. A. Papakostas. Lattice computing extension of the FAM neural classifier for human facial expression recognition. IEEE Trans. Neural Networks and Learning Systems, in press.
[8] V. G. Kaburlasos, G. A. Papakostas, T. Pachidis, and A. Athinelis. Intervals' numbers (INs) interpolation/extrapolation. In FUZZ-IEEE Conference Proceedings. IEEE, in press.
[9] W. A. J. Luxemburg and A. C. Zaanen. Riesz Spaces, vol. 1. North-Holland, Amsterdam, NL, 1971.
[10] T. Pachidis and V. G. Kaburlasos. Person identification based on lattice computing k-nearest-neighbor fingerprint classification. In KES Conference Proceedings, pages 1720–1729. IOS Press, Sept. 2012.
[11] W. Pedrycz, A. Skowron, and V. Kreinovich. Handbook of Granular Computing. John Wiley & Sons Ltd, Chichester, England, 2008.
[12] V. Petridis and V. G. Kaburlasos. Fuzzy lattice neural network (FLNN): a hybrid model for learning. IEEE Trans. Neural Networks, 9(5):877–890, Sept. 1998.
[13] V. Petridis and V. G. Kaburlasos. FINkNN: a fuzzy interval number k-nearest neighbor classifier for prediction of sugar production from populations of samples. Journal of Machine Learning Research, 4:17–37, April 2003.
[14] M. Z. Spivey. A generalized recurrence for Bell numbers. Journal of Integer Sequences, 11, 2008.

Figure 3: Interval-representation of INs induced, per data dimension (in 4 dimensions) per class (namely, Iris-setosa, Iris-versicolor, and Iris-virginica), from the (same) training data regarding the Iris benchmark. (a) A single IN induced unconditionally. (b) Up to five INs induced conditionally.

APPENDIX

A. PROOFS OF PROPOSITIONS

Proposition 1. The distance function dK(., .) satisfies both conditions (C1) and (C2) of a metric function.

Proof. Let A = {A1, . . . , AM} and B = {B1, . . . , BN}.

(C1) In the one direction, we show that dK(A, B) = 0 ⇒ A = B. From dK(A, B) = ⋁_i ⋀_j d(Ai, Bj) + ⋁_j ⋀_i d(Bj, Ai) = 0 it follows that ⋁_i ⋀_j d(Ai, Bj) = 0 = ⋁_j ⋀_i d(Bj, Ai). Equality ⋁_i ⋀_j d(Ai, Bj) = 0 implies that, ∀i, it holds ⋀_j d(Ai, Bj) = 0. That is, ∀i, ∃Ji : d(Ai, BJi) = 0 ⇒ Ai = BJi. Likewise for ⋁_j ⋀_i d(Bj, Ai); that is, ∀j, ∃Ij : Bj = AIj. Hence, A = B. In the other direction, we show that dK(A, A) = 0: dK(A, A) = ⋁_i ⋀_j d(Ai, Aj) + ⋁_j ⋀_i d(Aj, Ai) = 0.

(C2) Next, we show that dK(A, B) = dK(B, A): dK(A, B) = ⋁_i ⋀_j d(Ai, Bj) + ⋁_j ⋀_i d(Bj, Ai) = ⋁_j ⋀_i d(Bj, Ai) + ⋁_i ⋀_j d(Ai, Bj) = dK(B, A).

Proposition 2. Assume n positive numbers x1, x2, . . . , xn such that x1 + x2 + · · · + xn = 1. Then, the objective function (product) x1 x2 . . . xn is maximized for x1 = x2 = · · · = xn.

Proof. Consider the objective function f(x1, x2, . . . , xn) = x1 x2 . . . xn = x1 x2 . . . xn−1 (1 − x1 − x2 − · · · − xn−1), where xn = 1 − x1 − x2 − · · · − xn−1. We pursue maximization of function f; hence, we set the corresponding partial derivatives to zero:

∂f/∂x1 = x2 . . . xn−1 (1 − x1 − x2 − · · · − xn−1) − x1 x2 . . . xn−1 = x2 . . . xn−1 (1 − 2x1 − x2 − · · · − xn−1) = 0
. . .
∂f/∂xn−1 = x1 . . . xn−2 (1 − x1 − x2 − · · · − xn−1) − x1 x2 . . . xn−1 = x1 . . . xn−2 (1 − x1 − x2 − · · · − 2xn−1) = 0

The above equations imply

2x1 + x2 + · · · + xn−1 = 1
. . .
x1 + x2 + · · · + 2xn−1 = 1

By subtracting each equation from the one above it, it follows that x1 − x2 = 0, . . . , xn−2 − xn−1 = 0, hence x1 = x2 = · · · = xn−1. Next, we solve the above maximization problem using x1 = 1 − x2 − x3 − · · · − xn instead; it follows that x2 = x3 = · · · = xn. In conclusion, x1 = x2 = · · · = xn.
