A fuzzy clustering algorithm enhancing local model interpretability


J. L. Díez, J. L. Navarro, A. Sala
Departamento de Ingeniería de Sistemas y Automática, Universidad Politécnica de Valencia, Camino de Vera 14, 46022 Valencia, Spain
Email: {jldiez,joseluis,asala}@isa.upv.es

Abstract

In this work, simple modifications of the cost index of particular local-model fuzzy clustering algorithms are proposed in order to improve the readability of the resulting models. The final goal is to simultaneously provide local linear models (reasonably close to the plant's Jacobian) and a clustering of the input space such that desirable characteristics (regarding final model accuracy, and convexity and smoothness of the membership functions) are improved with respect to other proposals in the literature. Some examples illustrate the proposed approach.

Key words: fuzzy system identification, fuzzy clustering, interpretability, local models.

1 Introduction

Modelling and identification of a system is essential in applications such as controller design, supervision and fault detection. However, classic linear modelling techniques are not applicable to complex systems such as those in biochemical and chemical engineering, social models, etc. Many of these systems are characterised by significant nonlinearities and/or noisy data, so linear models must be local in nature.

Fuzzy models are widely used in this context because of their universal function approximation capabilities [29] and their parallelism to human reasoning processes, so that, hopefully, the readability of the obtained representations is satisfactory. Fuzzy identification tries to adjust those models to available data sets.

Two main fuzzy identification approaches from data are usually adopted [30]: minimization of a "global" prediction error [5,21], and minimization of a "local" error of a number of local models that represent the system in a region [22,20,17].

Identifying fuzzy models, as any identification procedure [28], first requires establishing a model structure for a subsequent parameter identification step. This second step can be done, for example, by least mean squares [2] if the system is linear in the parameters. That is the case if the antecedent membership functions (interpretable as validity regions for the local descriptions) have been provided beforehand. However, if that is not the case (the antecedents must be estimated from data), a problem that is nonlinear in the parameters usually appears, which can be considered a particular case of a general structure identification task [26]. Then, in general, a worse readability of the resulting model will be achieved (readability/accuracy tradeoff), apart from requiring greater data lengths to achieve a successful generalisation capability.

The objective of this paper is to obtain interpretable local linear models (i.e. linearizations that would be valid, for example, for linear control techniques) using fuzzy clustering, given, among other advantages [11], the systematic methodology these techniques provide for fuzzy identification [2,26,12].

In this work, the applicability of fuzzy clustering to local and global modelling, and the readability of the obtained models, are discussed. Some modifications of the cost index of particular local-model clustering algorithms are proposed in order to overcome the aforementioned readability and noise-sensitivity problems. The modifications require some variations of the usual iterative minimization procedures, which are also discussed.

The structure of the contribution is as follows. Notation and basic definitions are introduced in the next section. Section 3 briefly reviews the most widespread modelling approaches and clustering algorithms, discussing the application of fuzzy clustering to fuzzy model identification and its relation to local identification. Section 4 presents a modified objective function for local modelling, so that a balance between precision and interpretability can be established. The iterative algorithm for the minimization of the modified cost index is also detailed, and some additional issues regarding this new clustering algorithm are remarked upon. The capabilities of the modified clustering algorithm are demonstrated on different identification examples in Section 5, justifying the suitability of the proposed approach.

2 Preliminaries

Identification techniques try to find an "approximation" y = f(z) to an elusive "real" system y = F^*(z) from which a set of input-output data has been gathered.

If a general discrete-time dynamic system x_{k+1} = F(x_k, u_k) + e(k) is considered, the problem can be posed in the previous framework provided the arguments of F are measurable (for instance, if x_k were a collection of past input-output values [18]). The variable e(k) represents exogenous disturbances.

Let us assume that a set of N input-output samples from the system is available. In particular, the samples are arranged in the form

z = \{(z_k, y_k),\; k = 1, \dots, N\}

where z_k \in R^n are input samples and y_k \in R are output ones.

As is widely known, a fuzzy model is usually expressed by:

f(z) = \frac{\sum_{i=1}^{c} \mu_i(z) f_i(z)}{\sum_{i=1}^{c} \mu_i(z)}    (1)

\sum_{i} \mu_i(z) = 1, \qquad 0 \le \mu_i(z) \le 1    (2)

denoting a set of c rules (2 \le c \le N), with input membership functions \mu_i (ranging from 0 to 1) and consequent functions f_i(z). As the denominator adds to one by (2), it will not be written in the expressions in the sequel.

In the following, \mu_i will be assumed to be parameterised as \mu_i(z, \theta_i), and \theta_i will be identified. Also, the consequent will be a linear function f_i(z, \beta_i) = \beta_{i0} + \sum_{j=1}^{n} z_{kj} \beta_{ij} (\beta_i is the vector of adjustable parameters), i.e., conforming a linear Takagi-Sugeno model [27].

A "readable" fuzzy model would be one in which the f_i were interpretable as "local" behaviours and the membership functions \mu_i were "convex" (in the sense that their \alpha-cuts [29] were convex sets, generalising the notion of a fuzzy number on the real line). Such readability issues motivate the discussion in the section below and the proposals in Section 4.
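To fix notation, the following minimal Python/NumPy sketch evaluates the Takagi-Sugeno model (1) with linear consequents; the membership callables are placeholders for whatever parameterisation \mu_i(z, \theta_i) is chosen, and are an assumption of this illustration:

    import numpy as np

    def ts_model(z, mu_list, beta):
        """Evaluate the Takagi-Sugeno model (1) at a single input z (shape (n,)).

        mu_list : list of c callables, mu_i(z) -> membership degree in [0, 1]
        beta    : array (c, n+1), row i = [beta_i0, beta_i1, ..., beta_in]
        """
        z = np.asarray(z, dtype=float)
        mu = np.array([m(z) for m in mu_list])         # memberships mu_i(z)
        f_loc = beta[:, 0] + beta[:, 1:] @ z           # local linear models f_i(z)
        return float(np.sum(mu * f_loc) / np.sum(mu))  # normalised blend, eq. (1)

The division by the membership sum implements the denominator of (1); when the partition condition (2) holds it equals one and can be omitted, as noted above.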

3 Discussion on fuzzy modelling techniques

The main criteria used to carry out fuzzy modelling will be reviewed below, and discussed in the context of the trade-off between readability and accuracy [13].

First, the global model approach minimizes the objective function:

J_g = \sum_{k=1}^{N} (y_k - f(z_k))^2    (3)

where a fuzzy model (1) is used for f:

f(z_k) = \sum_{i=1}^{c} \mu_i(z_k; \theta_i) f_i(z_k; \beta_i)    (4)

An alternative option for modelling complex systems is to partition the operating regime space into a number of regions (either user-defined or generated in the identification process), and to minimize the "local" error of a number of local models f_i that represent the system in each of the regions [2], so that they approximate the plant behaviour on each one. The objective function to be minimized in this case is:

J_i^l = \sum_{k=1}^{N} \mu_i(z_k; \theta_i) (y_k - f_i(z_k; \beta_i))^2, \qquad i = 1, \dots, c    (5)

On one hand, the global approach (4) has the advantage of easy training [6,23,28] and good accuracy between the real and the estimated output of the process. On the other hand, the local one (5) improves the readability of the identified model, as the individual f_i tend to adjust the data at points with high membership.

The posterior use of the models depends on which of the two approaches is taken. For instance, if the objective is the application to local linear control design, or the models are designed for human interfacing, user (control engineer) interpretability is fundamental and local modelling criteria might be more suitable [10].

This work is focused on interpretable modelling, so the local approach will be used. Note, however, that getting global models from the local ones is not straightforward, and alternative convex interpolation must be used [3]. The local and global approaches to intelligent modelling coincide in the case of crisp ({0,1}) membership functions, and they do so approximately in those setups where transition zones are small compared to core regions.

3.1 Fuzzy clustering

The objective of fuzzy clustering is to partition the input space into regions so that each one contains data points corresponding to a "similar" behaviour [16]: clustering methods are thus related to local model identification (5). In the following, the notation below will be used:

E_{ik} = (y_k - f_i(z_k, \beta_i))^2    (6)

The analysis can be extended to multiple outputs by replacing the above formula by the Euclidean norm. In clustering techniques, each data point gets an individual membership assignment: the length of \theta_i in (5) is equal to the number of data. Basically, the objective function to minimise is J = \sum_k \mu_{ik} E_{ik}, where \mu_{ik} = \mu_i(z_k); but the objective function must be modified to obtain meaningful results (otherwise, \mu_i(z_k) = 0 is a trivial solution). This is usually achieved by incorporating the so-called (add-1) fuzzy partition condition in (2). Let us briefly review the most widely-known fuzzy clustering algorithms.

Fuzzy C-Means (FCM) is a fixed-distance clustering algorithm usually formulated as minimising, under the restriction (2), the cost [7]:

J(Z; U, C) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^m D_{ik}^2    (7)

D_{ik}^2 = (z_k - c_i)^T B (z_k - c_i)    (8)

where U denotes a matrix of adjustable memberships \mu_{ik}, C is a matrix whose rows c_i \in R^{n+1} correspond to the cluster centers (prototypes) to be determined, m \ge 1 is a weighting exponent, and D_{ik}^2 is a metric defined by a symmetric positive-definite matrix B. For example, if B = I then D_{ik}^2 is the usual squared Euclidean distance. The major FCM drawback is that all identified clusters tend to have similar dimensions and shape.

The Gustafson-Kessel (GK) algorithm [14] is the FCM extension with adaptive distance most used in identification. This algorithm selects a different B_i (i.e. a different metric) for each cluster and has the advantage of looking for ellipsoids of variable size and orientation:

D_{ikB_i}^2 = (z_k - c_i)^T B_i (z_k - c_i)    (9)

where B_i changes from iteration to iteration, and is updated using the expression B_i = [\rho_i \det(F_i)]^{1/n} F_i^{-1}, where F_i is the fuzzy covariance matrix of the i-th cluster (the covariance matrix of the data z_k weighted by the membership degrees of cluster i in U), and \rho_i is a fixed measure of cluster volume. D_{ikB_i}^2 is therefore a generalized Mahalanobis distance norm. See [14,4] for details.
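For orientation, a sketch of the two metrics (8) and (9) follows; this is only an illustration of the formulas above, not a full FCM/GK implementation, and the volume parameter rho is assumed fixed:

    import numpy as np

    def fcm_distances(Z, C, B=None):
        """Squared distances D_ik^2 = (z_k - c_i)^T B (z_k - c_i), eq. (8)."""
        if B is None:
            B = np.eye(Z.shape[1])                    # B = I: Euclidean case
        diff = Z[None, :, :] - C[:, None, :]          # shape (c, N, n)
        return np.einsum('ikn,nm,ikm->ik', diff, B, diff)

    def gk_metrics(Z, U, C, m=2.0, rho=1.0):
        """Per-cluster GK matrices B_i = [rho_i det(F_i)]^(1/n) F_i^{-1}, for eq. (9)."""
        c, n = C.shape
        Bs = []
        for i in range(c):
            w = U[i] ** m                             # membership weights of cluster i
            d = Z - C[i]
            F = (w[:, None] * d).T @ d / w.sum()      # fuzzy covariance F_i
            Bs.append((rho * np.linalg.det(F)) ** (1.0 / n) * np.linalg.inv(F))
        return Bs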

Although the use of FCM and GK in fuzzy modelling is quite widespread, it should be emphasized that the "local modelling" criterion implicit in (5) does not directly apply to (7) or (9), as it states "similarity" in terms of "distance to a center" instead of "distance to a model f_i". To obtain local linear models, the hyperellipsoidal clusters must be adjusted to linear hyperplanes by eliminating the least significant eigenvalue and eigenvector.

To partly address those issues, some clustering algorithms that explicitly look for linear prototypes (input-output models) should be considered, such as the Fuzzy C-Regression Models (FCRM) algorithm [15], with objective function:

J(Z; U, \beta) = \sum_{i=1}^{c} \sum_{k=1}^{N} \mu_{ik}^m E_{ik}(\beta_i)    (10)

very similar to the local objective function (5). However, FCRM does not behave well in applications: initialisation affects convergence, and unusable clusters often result (it connects distant clusters, and noise corruption is very noticeable).

Adaptive Fuzzy C-Regression models (AFCR) [25] builds on FCRM, but D_{ik} is substituted by a new mixed distance using a combination of the FCRM (E_{ik}) and FCM (D_{ik}) distances, with a fixed weighting factor \eta and a changing \alpha_i \in [0, 1]:

\delta a_{ik} = \alpha_i E_{ik}(\beta_i) + (1 - \alpha_i)\, \eta D_{ik}^2    (11)

J_a(Z; U, C, \beta) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^m \delta a_{ik}    (12)

The selection of \alpha_i is dynamically determined by the algorithm, and it is closer to 1 as the cluster structure becomes progressively more linear:

\alpha_i = 1 - \frac{\min_l \{\lambda_{il}\}}{\max_l \{\lambda_{il}\}}, \qquad i = 1, 2, \dots, c    (13)

where the \lambda_{il} are the l eigenvalues of the covariance of cluster i.

AFCR overcomes some FCRM problems because it keeps the basic FCRM idea (it includes the modelling error in the clustering process), but initially the cost is biased towards the distance criterion, avoiding initialization problems. However, as the algorithm progresses in finding local linear models (decreasing the importance of the "distance term"), the readability of the obtained membership functions may degrade in exchange for a fictitiously lower modelling error in the case of high data noise.

The previous discussion on the advantages and drawbacks of clustering algorithms with regard to the global and local criteria used in modelling, and on how suitable modifications can lead to better fuzzy identification results, is briefly summarized in Table 1. In the next section, further modifications to the cost index try to better tune the readability/accuracy tradeoff.
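A small sketch of the AFCR ingredients (11)-(13) follows, assuming the per-cluster model errors E_ik, squared distances D_ik^2 and fuzzy covariances F_i have already been computed (e.g., with the routines sketched above):

    import numpy as np

    def afcr_alpha(F_list):
        """alpha_i = 1 - min_l(lambda_il) / max_l(lambda_il), eq. (13)."""
        alphas = []
        for F in F_list:
            lam = np.linalg.eigvalsh(F)                 # eigenvalues of cluster covariance
            alphas.append(1.0 - lam.min() / lam.max())  # -> 1 as the cluster flattens
        return np.array(alphas)

    def afcr_distance(E, D2, alpha, eta):
        """Mixed distance delta_a_ik of eq. (11): blend of model error and distance."""
        a = alpha[:, None]                              # shape (c, 1), broadcast over k
        return a * E + (1.0 - a) * eta * D2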

Table 1 Fuzzy clustering techniques for local linear model identification.

Algorithm   Distance   Advantages                                         Drawbacks
FCM         Fixed      Simplicity                                         Cluster shape and orientation
GK          Adaptive   Robust to parameter initialization                 Hyperellipsoidal clusters
FCRM        Linear     Hyperplane detection, linear function adjustment   Connection of distant clusters
AFCR        Mixed      Advantages of FCRM, FCM and GK                     Cluster readability, noise sensitivity

4 A cost index improving membership function convexity

Modifying the AFCR objective function by including in it some criteria for the "convexity" of \mu leads to improvements over the original behaviour, towards a better performance and membership function interpretability. Of course, those additions come at the expense of a possibly higher modelling error if the number of clusters is preserved. This idea can be conceived as a sort of regularization [28].

With this idea in mind, two new additional terms are added to the performance index defined in (12): a first term to penalise high membership of points far from a prototype (J_{far,ik}), and a second term to penalise low membership of points near to a prototype (J_{near,ik}):

J_{far}(U, C) = \sum_{i=1}^{c} \sum_{k=1}^{N} \mu_{ik}^m \left[1 - \exp\left(-\frac{L_{ik}^2}{2\sigma_f^2}\right)\right]    (14)

J_{near}(U, C) = \sum_{i=1}^{c} \sum_{k=1}^{N} (1 - \mu_{ik})^m \left[\exp\left(-\frac{L_{ik}^2}{2\sigma_n^2}\right)\right]    (15)

L_{ik}^2 = (z_k - p_i)^T B_i (z_k - p_i)    (16)

where the distance L_{ik} and the prototypes p_i are measured in the input space, using the first n - 1 components of the prototype vectors c_i.

In this way, the modified objective function is:

J_C(Z; U, C, \beta) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^m \frac{\alpha_i}{3} E_{ik}(\beta_i)
  + \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^m (1 - \alpha_i)\, \eta D_{ik}^2
  + \sum_{i=1}^{c} \sum_{k=1}^{N} \mu_{ik}^m \frac{\alpha_i}{3} \gamma_f \left[1 - \exp\left(-\frac{L_{ik}^2}{2\sigma_f^2}\right)\right]
  + \sum_{i=1}^{c} \sum_{k=1}^{N} (1 - \mu_{ik})^m \frac{\alpha_i}{3} \gamma_n \left[\exp\left(-\frac{L_{ik}^2}{2\sigma_n^2}\right)\right]    (17)

where J_{far} and J_{near} are multiplied in the new objective function (17) by the parameters \gamma_f and \gamma_n, respectively, with the purpose of balancing the diverse cost index terms, and the parameter \alpha_i defined in (13) is also included (\alpha_i/3 appears because the same importance is given to all terms). Minimization of the index (17) leads to a solution biased towards smoother and more convex membership function shapes.

The meanings of the various parameters are summarized below (see Section 4.3 for a further discussion on tuning; a numerical sketch of the new terms follows the list):

– \sigma_f is related to a target maximum cluster size r_f (see Figure 1): points farther away than r_f are penalized if they belong to the support of the cluster.
– \sigma_n is related to a target minimum cluster size r_n (see Figure 1): points nearer than r_n are penalized if they do not lie in the core of the obtained cluster.
– \eta provides a balance between cluster size and modelling error at startup.
– \gamma_f and \gamma_n weight the new index terms against the AFCR ones.
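As an illustration, the two penalty terms (14)-(15) can be evaluated as follows (a sketch; L2 holds the squared input-space distances of (16), assumed precomputed):

    import numpy as np

    def penalty_terms(U, L2, sigma_f, sigma_n, m=2.0):
        """Convexity-enhancing terms of the modified index (17).

        U  : (c, N) memberships; L2 : (c, N) squared input-space distances, eq. (16).
        Returns (J_far, J_near) as in eqs. (14) and (15).
        """
        D_f = 1.0 - np.exp(-L2 / (2.0 * sigma_f ** 2))  # penalises far support points
        D_n = np.exp(-L2 / (2.0 * sigma_n ** 2))        # penalises low membership near core
        J_far = np.sum(U ** m * D_f)
        J_near = np.sum((1.0 - U) ** m * D_n)
        return J_far, J_near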

[Fig. 1 Plot of the D_f = [1 - exp(-L_{ik}^2/(2\sigma_f^2))] and D_n = [exp(-L_{ik}^2/(2\sigma_n^2))] terms as a function of the distance to the prototype L_{ik}; the target radii r_f and r_n correspond approximately to 3\sigma_f and 3\sigma_n.]

4.1 Performance index minimization

The minimization of the cost index J_C in (17) can be done following a procedure similar to the one used for AFCR [25], deriving updating expressions for the prototypes c_i and the memberships \mu_{ik} at each iteration step. The consequent parameters \beta_i are adjusted by least mean squares.

The procedure is based on iterating a successive minimization over the memberships (constrained by the partition condition (2)), the prototype centroids (unconstrained) and the consequents (unconstrained). This procedure guarantees convergence of the algorithm to a (maybe local) minimum, as later stated in Lemma 1.

To perform the minimization, (17) can be rewritten in the more suitable compact form:

J_C(Z; U, C, \beta) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^m D_I(i,k) + \sum_{i=1}^{c} \sum_{k=1}^{N} (1 - \mu_{ik})^m D_{II}(i,k)    (18)

where:

D_I(i,k) = \frac{\alpha_i}{3} E_{ik}(\beta_i) + (1 - \alpha_i)\, \eta D_{ik}^2 + \frac{\alpha_i}{3} \gamma_f \left[1 - \exp\left(-\frac{L_{ik}^2}{2\sigma_f^2}\right)\right]    (19)

D_{II}(i,k) = \frac{\alpha_i}{3} \gamma_n \left[\exp\left(-\frac{L_{ik}^2}{2\sigma_n^2}\right)\right]

Let us discuss the above minimization steps (minimizing J_C(U, C, \beta) for each variable (U, C, \beta) independently, assuming the others to be constant). The following theorem gives necessary conditions for the minimization of J_C(U, C, \beta) for m = 2.

Theorem 1 (J_C minimum) Assume the parameters c, \alpha, \sigma_f, \sigma_n, \gamma_f, \gamma_n, \eta are fixed, m = 2, B is strictly positive definite, and let the number N of datapoints in Z be strictly greater than the number of clusters c. If (U, C, \beta) is globally minimal for J_C(Z; U, C, \beta), then:

\mu_{ik} = \frac{1 + \sum_{j=1}^{c} \frac{D_{II}(i,k) - D_{II}(j,k)}{D_I(j,k) + D_{II}(j,k)}}{\sum_{j=1}^{c} \frac{D_I(i,k) + D_{II}(i,k)}{D_I(j,k) + D_{II}(j,k)}}    (20)

c_i = \frac{\sum_{k=1}^{N} \mu_{ik}^m K_I z_k + (1 - \mu_{ik})^m K_{II} z_k}{\sum_{k=1}^{N} \mu_{ik}^m K_I + (1 - \mu_{ik})^m K_{II}}    (21)

where:

K_I(i,k) = 2B \left((1 - \alpha_i)\eta + \frac{\alpha_i}{3} \gamma_f \frac{\exp(-L_{ik}^2/(2\sigma_f^2))}{2\sigma_f^2}\right)    (22)

K_{II}(i,k) = -2B \frac{\alpha_i}{3} \gamma_n \frac{\exp(-L_{ik}^2/(2\sigma_n^2))}{2\sigma_n^2}

and U = [\mu_{ik}] satisfies the fuzzy partition condition (2).

Proof First, fix the prototypes C \in R^{c \times n} and the coefficients \beta, and define J_{CU}(U) = J_C(U, C, \beta) for suitable fuzzy partitions U \in \mathcal{U}. The objective is to obtain

U = \arg\min_{U \in \mathcal{U}} J_{CU}(U), \qquad \min_{U \in \mathcal{U}} J_{CU}(U) = \sum_{k=1}^{N} \min_{U} J_{CUk}(U)    (23)

where

J_{CUk}(U) = \sum_{i=1}^{c} (\mu_{ik})^m D_I(i,k) + \sum_{i=1}^{c} (1 - \mu_{ik})^m D_{II}(i,k)    (24)

The partition condition (2) affects the column sums of the membership matrix U, but different columns are independent, so the optimisation can be carried out independently for each k. In this way, for a particular data sample k, adding the partition condition through a Lagrange multiplier transforms the problem above into the global minimization of:

H_k(\lambda_k, \mu_{ik}) = \sum_{i=1}^{c} (\mu_{ik})^m D_I(i,k) + \sum_{i=1}^{c} (1 - \mu_{ik})^m D_{II}(i,k) - \lambda_k \left(\sum_{i=1}^{c} \mu_{ik} - 1\right)    (25)

and setting its gradient equal to zero yields:

\frac{\partial H_k}{\partial \lambda_k} = \sum_{i=1}^{c} \mu_{ik} - 1 = 0    (26)

\frac{\partial H_k}{\partial \mu_{ik}} = m (\mu_{ik})^{m-1} D_I(i,k) - m (1 - \mu_{ik})^{m-1} D_{II}(i,k) - \lambda_k = 0, \qquad i = 1, \dots, c    (27)

From these c + 1 conditions (note that they are linear in \mu_{ik} and \lambda_k for m = 2), the updating expression for the membership functions results in (20).

The second step of the J_C(U, C, \beta) minimization is to minimize for C with fixed fuzzy partition matrix U \in R^{c \times N} and coefficients \beta. As J_C is a summation of terms in i, the minimization over c_i can be carried out independently for each cluster by unconstrained minimization of the objective function J_{CCi}(c_i):

J_{CCi}(c_i) = \sum_{k=1}^{N} (\mu_{ik})^m D_I(i,k) + \sum_{k=1}^{N} (1 - \mu_{ik})^m D_{II}(i,k)    (28)

This minimization is done by nulling the derivatives, and results in the prototype expression (21).

The last step is to minimize J_C(U, C, \beta) for \beta with fixed U and C. This can be cast as a weighted least mean squares [2] problem for each cluster i. The result is the consequent parameter vector given by:

\beta_i = (Z_e^T U_i Z_e)^{-1} Z_e^T U_i Y    (29)

where \beta_i is the vector of coefficients of the linear model for cluster i, Z_e \in R^{N \times (n+1)} is a matrix containing the input data extended by a column of ones (corresponding to the estimated offset parameter \beta_{i0}), U_i \in R^{N \times N} is a diagonal matrix whose k-th diagonal element is \mu_{ik}^m, and Y \in R^N is the vector of output data. \square
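The weighted least-squares step (29) just derived admits a direct sketch (np.linalg.solve is used here instead of the explicit inverse, a standard numerical precaution):

    import numpy as np

    def consequents(Z, Y, U, m=2.0):
        """Weighted least-squares consequent update, eq. (29), one beta_i per cluster."""
        N, n = Z.shape
        Ze = np.hstack([np.ones((N, 1)), Z])        # column of ones -> offset beta_i0
        beta = []
        for ui in U:                                # U: (c, N) memberships
            w = ui ** m                             # diagonal of U_i
            A = Ze.T @ (w[:, None] * Ze)            # Ze^T U_i Ze
            b = Ze.T @ (w * Y)                      # Ze^T U_i Y
            beta.append(np.linalg.solve(A, b))
        return np.vstack(beta)                      # shape (c, n+1)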

Remark 1 The updating equations of the AFCR algorithm (minimization of (12)) are a particular case of (20) and (21): setting \gamma_f and \gamma_n to zero in (20) and (21) and identifying analogous terms yields the expressions of [15]:

\mu_{ik} = \frac{1}{\sum_{j=1}^{c} (\delta a_{ik}^2 / \delta a_{jk}^2)}, \qquad c_i = \frac{\sum_{k=1}^{N} (\mu_{ik})^m z_k}{\sum_{k=1}^{N} (\mu_{ik})^m}    (30)

Remark 2 Note that (20) makes use of a convex combination of D_I and D_{II} (weighted by \mu and (1 - \mu), respectively). As D_I and D_{II} cannot be zero simultaneously, this prevents (27) from "singularities" like those appearing, for example, in the proof of the FCM algorithm [7]. However, a singularity can occur if \gamma_n = 0, because the index then becomes similar to AFCR. A singularity in that case would appear when D_I is zero (i.e., a data point coincides with a prototype), because if D_{II} is not present, computing \mu from (27) would entail a division by zero. In that case, the usual approach is to set the membership to 1, as in the original FCM algorithm [7].

Remark 3 (Possibilistic clustering) Possibilistic clustering algorithms [9] relax the fuzzy partition condition, replacing (2) by the milder condition \sum_{i=1}^{c} \mu_{ik} \le 1. However, in that case the objective functions have a minimum at \mu_{ik} = 0. To avoid this trivial solution, additional terms must be added to the cost index [8,19]; an interesting alternative is the use of terms similar to (14) and (15), then leading to non-trivial solutions with more convex membership functions.

4.2 Summary: AFCRC algorithm

A new algorithm, to be named AFCRC (Adaptive Fuzzy C-Regression models with Convexity enhancement), can now be presented. It performs iterations with the updating expressions (20), (21) and (29) (an illustrative sketch of this loop is given after the list):

1. fix the user-defined parameters (\eta, \sigma_f, \sigma_n, \gamma_f and \gamma_n),
2. initialize U randomly,
3. calculate the \alpha_i with (13) and the prototypes C with (21),
4. determine the \beta coefficients by (29),
5. update U using (20),
6. compare the new U with its value at the previous iteration, and repeat from step 3 if it changes significantly.

The effect of the AFCRC parameters and a suitable methodology for parameter selection are studied in Section 4.3.
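The sketch below strings the updating expressions together into the AFCRC loop, under simplifying assumptions (m = 2 and B = B_i = I, so L_ik^2 coincides with D_ik^2; the default parameter values are illustrative only). It relies on the consequents() routine sketched in Section 4.1 and is an illustration of the steps above, not the authors' reference implementation:

    import numpy as np

    def afcrc(Z, Y, c, eta=1.0, sigma_f=0.15, sigma_n=0.07,
              gamma_f=1e-3, gamma_n=1e-3, max_iter=200, tol=1e-4, seed=0):
        """Illustrative AFCRC iteration following steps 1-6 of Section 4.2."""
        rng = np.random.default_rng(seed)
        N, n = Z.shape
        U = rng.random((c, N)); U /= U.sum(axis=0)       # step 2: random partition
        C = Z[rng.choice(N, size=c, replace=False)].copy()
        for _ in range(max_iter):
            U_old = U.copy()
            # alpha_i from fuzzy cluster covariances, eq. (13)
            alpha = np.empty(c)
            for i in range(c):
                w = U[i] ** 2
                d = Z - C[i]
                F = (w[:, None] * d).T @ d / w.sum()
                lam = np.linalg.eigvalsh(F)
                alpha[i] = 1.0 - lam.min() / max(lam.max(), 1e-12)
            a = alpha[:, None]
            # step 4: consequent parameters by weighted LS, eq. (29)
            beta = consequents(Z, Y, U)
            # residuals and distances for the current iterate
            E = (Y[None, :] - (beta[:, :1] + beta[:, 1:] @ Z.T)) ** 2   # eq. (6)
            D2 = ((Z[None, :, :] - C[:, None, :]) ** 2).sum(axis=-1)   # eq. (8), B = I
            L2 = D2                                                    # eq. (16), B_i = I
            ef = np.exp(-L2 / (2 * sigma_f ** 2))
            en = np.exp(-L2 / (2 * sigma_n ** 2))
            DI = a / 3 * E + (1 - a) * eta * D2 + a / 3 * gamma_f * (1 - ef)  # eq. (19)
            DII = a / 3 * gamma_n * en
            # step 3: prototype update, eqs. (21)-(22) (scalar K's since B = I)
            KI = 2 * ((1 - a) * eta + a / 3 * gamma_f * ef / (2 * sigma_f ** 2))
            KII = -2 * a / 3 * gamma_n * en / (2 * sigma_n ** 2)
            w = U ** 2 * KI + (1 - U) ** 2 * KII
            C = (w[:, :, None] * Z[None, :, :]).sum(axis=1) / w.sum(axis=1)[:, None]
            # step 5: membership update, eq. (20)
            S = np.maximum(DI + DII, 1e-12)
            inv_sum = (1.0 / S).sum(axis=0)
            num = 1.0 + DII * inv_sum - (DII / S).sum(axis=0)
            U = num / ((DI + DII) * inv_sum)
            U = np.clip(U, 0.0, 1.0); U /= U.sum(axis=0)  # numerical guard on (2)
            # step 6: stop when the partition settles
            if np.abs(U - U_old).max() < tol:
                break
        return U, C, beta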

The proof of convergence of the algorithm for the general case has as its main difficulty the variation of the parameter \alpha_i in (13), which depends on the condition number of the estimated cluster covariances. The particular case of \alpha constant after a prefixed number of iterations is discussed below.

Let us denote by T_C the functional that implements the AFCRC calculations in each iteration step of the algorithm. Explicitly, the action of T_C on an available set of fuzzy partition and prototype matrices and linear model coefficients at iteration step l is:

(U^{(l+1)}, c^{(l+1)}, \beta^{(l+1)}) = T_C(U^{(l)}, c^{(l)}, \beta^{(l)})    (31)

This functional is formed by the composition of the updating equations (20), (29) and (21). Lemma 1 below adapts Theorem (12.1) in [7] to the particular case of AFCRC.

Lemma 1 Let Z be bounded in R^n. If there exists an \varepsilon such that, for each l, i and k,

D_I(i,k) + \frac{1 - \mu_{ik}^{(l)}}{\mu_{ik}^{(l)}} D_{II}(i,k) \ge \varepsilon > 0    (32)

then for m > 1 and any random initial triple (U^{(0)}, c^{(0)}, \beta^{(0)}), either

– T_C^{(l)}(U^{(0)}, c^{(0)}, \beta^{(0)}) converges to a local minimum or saddle point in the set of solutions S^* = (U^*, c^*, \beta^*) of J_C as l \to \infty, or
– every convergent subsequence of T_C^{(l)}(U^{(0)}, c^{(0)}, \beta^{(0)}) has its limit at a local minimum or saddle point of J_C.

Proof The complete proof, outlined in the following, is long but quite straightforward, and is analogous to the proof of Theorem (12.5) for FCM in [7]. If the stated assumptions hold, then:

– Theorem (12.2) in [7] gives the conditions needed to prove that J_C is a descent functional for (T_C, S^*),
– the continuity of T_C can be established using Theorem (12.3) in [7],
– any iterate sequence generated by the AFCRC operator T_C lies in a compact subset of the domain of J_C (see Theorem (12.4) in [7]),

and these three conditions entail convergence, because in this case Theorem (12.1) in [7] can be directly applied to prove the lemma. \square

In plain terms, the above lemma states that if the metric defined in the clustering algorithm is always positive, then the use of the functional defined in (31) at each iteration step makes the algorithm converge to a local minimum of J_C for any random initialization.

This lemma assures theoretical convergence to local extrema of J_C, but does not ensure the global minimum, as (20) and (21) are only necessary conditions for strict local minima of J_C. For the varying-\alpha case, simulation experiments show satisfactory results (although an increase of J_C may occur in the initial iterations).

4.3 AFCRC parameter selection

As previously discussed, the values of \sigma_n and \sigma_f are related, respectively, to a sort of limit radius at which a significant increase in the cost index is noticed when the membership of the data included in a cluster does not fit the expected shape of its core and support. Let us express the AFCRC cost in the form:

J_{AFCRC} = J_{AFCR} + \gamma_f J_{far} + \gamma_n J_{near}    (33)

From simulation studies, the smaller \sigma_f is, the narrower the resulting membership function cores are. The recommended importance of the \gamma_f J_{far} term (when tuning its multiplying factor) is up to 10% of the achieved J_{AFCR}; otherwise, cluster shapes may degenerate.

On the other hand, the same simulation experiments seem to indicate that \gamma_n J_{near} is effective if its value is around 1% of the total error. Its main effect is to soften the variations in \mu for the points of the cluster with high membership.

Based on the above, a simple procedure for parameter selection is proposed in the following to ease AFCRC tuning (a sketch of these rules of thumb follows the list):

– Normalise the data to unit variance and determine a candidate number of clusters c [26].
– Obtain the maximum feasible distance d_max within the available data.
– Determine the proportion of that distance to be covered by a reasonably sized cluster to estimate \sigma_f. Then, the quotient between \sigma_n and \sigma_f is determined by the expected sizes of the core and support of the clusters. For instance, d_max \sqrt{n} / c \approx 6\sigma_f and \sigma_n \approx \sigma_f / 2 have worked reasonably well in simulations (n is the input space dimension).
– Determine, iteratively, appropriate values for \gamma_f and \gamma_n so that the contribution of J_{far} and J_{near} to the AFCRC cost index does not go beyond 10%. The higher \gamma_f and \gamma_n are, the more modelling error the local models will have, in exchange for a smoother and more readable shape of the membership functions. Usually, the noisier the data, the bigger those parameters should be for an acceptable membership function shape.

In low-dimensional data sets, the above steps can be supported by visualizations of the obtained cluster shapes. The same guidelines can be used for higher dimensional data sets (more than two inputs) but, in those cases, the results cannot be visualized; numerical validation indexes may be used [1] instead.

In the next section, and also in other studies that the authors have performed, this structure for parameter determination has been used with good results.
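A sketch of these rules of thumb follows; only the \sigma estimates are automated here, while \gamma_f and \gamma_n still require the iterative check described above:

    import numpy as np

    def afcrc_parameters(Z, c):
        """Heuristic tuning of Section 4.3 (a sketch of the stated rules of thumb)."""
        Zn = (Z - Z.mean(axis=0)) / Z.std(axis=0)   # normalise to unit variance
        n = Zn.shape[1]
        # maximum feasible distance over the available data
        d_max = 0.0
        for k in range(len(Zn)):
            d_max = max(d_max, np.sqrt(((Zn - Zn[k]) ** 2).sum(axis=1)).max())
        sigma_f = d_max * np.sqrt(n) / (6.0 * c)    # d_max * sqrt(n) / c ~= 6 sigma_f
        sigma_n = sigma_f / 2.0
        # gamma_f, gamma_n: start small, raise until J_far, J_near reach ~10% / ~1%
        return Zn, sigma_f, sigma_n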

5 Case studies

In this section, some examples show the suitability of the proposed algorithm. The AFCRC algorithm performs well in those cases where the classical algorithms (FCM, GK, AFCR) give good results. Additionally, as the examples below show, the AFCRC results improve on those of the aforementioned algorithms in some cases where the former are not able to provide sufficiently interpretable (in the sense of soft and convex) membership functions.

5.1 Growth rate kinetics identification

All the suggested clustering algorithms have been tested on a real data set corresponding to a nonlinear one-dimensional static function: the identification from data of the growth rate kinetics of a bioreactor [24] (in hours^{-1}) as a function of the substrate concentration (in g/l). Four clusters will be identified, as this number seems to be adequate by visual inspection.

The application of the parameter tuning procedure of the previous section leads to the following parameter values: \sigma_f^2 = 0.02, \gamma_f = 10^{-3}, \sigma_n^2 = 0.005, and \gamma_n = 10^{-3}.

Results are shown in the following figures for GK (Figure 2(a), with results almost identical to those obtained by AFCR) and AFCRC (Figure 2(b)). The figures shown represent the most frequent shape of the results; note that all the clustering algorithms considered (even the ones in the prior literature) converge to local minima, hence random initialization sometimes drives the parameters towards spurious results.

[Fig. 2 Clustering algorithm detection of the bioreactor data structure (circles: experimental data, lines: identified local models; upper panels: function mapping, lower panels: membership functions): (a) GK (≈ AFCR); (b) AFCRC with the initial parameter values.]

Noticing the abnormal shape of the clusters in Figure 2(a), the AFCRC plot shows how it significantly improves the readability (in terms of \mu convexity) of the obtained clusters, while the local linear model estimation is still accurate enough.

The results of the AFCRC algorithm for variations of up to 10^2 times in \sigma_f^2 and \sigma_n^2 (related to the cluster radii) are quite similar to those presented in Figure 2(b), showing a relative insensitivity of the algorithm to these parameters. Much more important is the effect of variations in the relative importance of the new terms in the cost index, managed by \gamma_f and \gamma_n. For example, increasing \gamma_f to \gamma_f = 10^{-2} leads to the results shown in Figure 3(a). A further increase of \gamma_f to values of \gamma_f = 10^{-1} leads to clusters with too narrow shapes, as shown in Figure 3(b). A similar study can be carried out for \gamma_n variations, showing the intuitively expected results.

[Fig. 3 AFCRC detection of the bioreactor data structure (circles: experimental data, lines: identified local models) for different \gamma_f values: (a) \gamma_f = 10^{-2}; (b) \gamma_f = 10^{-1}.]
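For orientation, the experiment of this subsection would be run along the following lines, using the afcrc() sketch of Section 4.2; the file names are hypothetical, and note that the paper reports \sigma_f^2 and \sigma_n^2, hence the square roots:

    import numpy as np

    # hypothetical data files: substrate concentration (g/l), growth rate (1/h)
    substrate = np.loadtxt('bioreactor_substrate.txt')
    growth = np.loadtxt('bioreactor_growth.txt')

    Z = substrate.reshape(-1, 1)                     # single input dimension
    U, C, beta = afcrc(Z, growth, c=4,
                       sigma_f=np.sqrt(0.02), sigma_n=np.sqrt(0.005),
                       gamma_f=1e-3, gamma_n=1e-3)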

5.2 Two-dimensional nonlinear function identification

The algorithm has been tested on a nonlinear system with two input variables x_1 and x_2 and a single output y, taken from [26,12] and given by:

y = (1 + x_1^{-2} + x_2^{-1.5})^2, \qquad 1 \le x_1, x_2 \le 5    (34)

with a data set of 50 points (x_1, x_2, y) to which some white noise has been added (uniformly distributed in [-0.5, 0.5]). The two-dimensional surface to identify is shown in Figure 4.

[Fig. 4 Nonlinear system to be identified.]

Figures 5(a) and 5(b) show how the data points are assigned to clusters by the AFCR and AFCRC algorithms, respectively (a contour has been drawn around the points with highest membership to a particular cluster, indicated by a suitable symbol, and the cluster "center" has been marked with a + sign).

[Fig. 5 Data-to-cluster assignment by different algorithms: (a) AFCR (≈ GK); (b) AFCRC (\sigma_f = 0.04, \sigma_n = 0.01, \gamma_f = 0.01 and \gamma_n = 0.001).]

Note that AFCR identifies one cluster whose validity function surrounds two other ones, i.e., it is non-convex and has hardly any meaning. However, AFCRC in Figure 5(b) achieves a very readable interpretation of the underlying model.
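For reproducibility, a sketch generating a data set like the one used here (the seed is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(1.0, 5.0, size=(50, 2))           # 50 input points in the domain of (34)
    y = (1.0 + x[:, 0] ** -2 + x[:, 1] ** -1.5) ** 2  # nonlinear benchmark, eq. (34)
    y += rng.uniform(-0.5, 0.5, size=50)              # additive uniform noise, as stated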

6 Conclusions

In this paper, after defining different criteria for fuzzy model identification, a review of the main approaches to fuzzy clustering for local-model identification has been carried out, highlighting the main characteristics a clustering algorithm should have in order to find interpretable models (in the sense of local linearizations with convex membership functions). After that, a cost index modification has been proposed in which the balance between interpretability and modelling error can be taken into account, and whose minimization leads to local linear models and more convex membership functions. Suitable updating expressions for the prototypes and memberships have been presented, as well as guidelines for the tuning of the objective function parameters.

As shown in different examples, the modified clustering algorithm overcomes some problems present in current ones, enhancing the readability of the obtained membership functions.

References

1. T. Ahvenlampi, J. L. Díez, and J. L. Navarro. New methods for validation of local models in fuzzy clustering identification. Proc. of IFAC Conference on Intelligent Control and Signal Processing, pages 9–12, 2003.
2. R. Babuska. Fuzzy Modeling for Control. Kluwer Academic, Boston, USA, 1998.
3. R. Babuska, C. Fantuzzi, and H. B. Verbruggen. Improved inference for Takagi-Sugeno models. Proc. IEEE Conf. on Fuzzy Systems, New Orleans, pages 653–664, 1996.
4. R. Babuska, P. J. van der Veen, and U. Kaymak. Improved covariance estimation for Gustafson-Kessel clustering. Proc. IEEE Conf. on Fuzzy Systems, Honolulu, 2002.
5. L. F. B. Baptistella and A. Ollero. Fuzzy methodologies for interactive multicriteria optimization. IEEE Transactions on Systems, Man and Cybernetics, 10:355–365, 1980.
6. D. P. Bertsekas. Nonlinear Programming. Springer-Verlag, London, UK, 1999.
7. J. C. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York, USA, 1987.
8. R. N. Davé. Characterization and detection of noise in clustering. Pattern Recognition Letters, 12:657–664, 1991.
9. R. N. Davé and R. Krishnapuram. Robust clustering methods: a unified view. IEEE Transactions on Fuzzy Systems, 5:270–293, 1997.
10. J. L. Díez, J. L. Navarro, and A. Sala. Identification for local model control with fuzzy clustering. Proc. IFAC Workshop on Advanced Fuzzy-Neural Control, pages 99–104, 2001.
11. R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley & Sons, New York, USA, 2000.
12. M. R. Emami, I. B. Türksen, and A. A. Goldenberg. Development of a systematic methodology of fuzzy logic modeling. IEEE Transactions on Fuzzy Systems, 6:346–366, 1998.
13. S. Guillaume. Designing fuzzy inference systems from data: an interpretability-oriented review. IEEE Transactions on Fuzzy Systems, 9:426–443, 2001.
14. E. E. Gustafson and W. C. Kessel. Fuzzy clustering with a fuzzy covariance matrix. Proc. of IEEE CDC, San Diego, California, pages 761–766, 1979.
15. R. J. Hathaway and J. C. Bezdek. Switching regression models and fuzzy clustering. IEEE Transactions on Fuzzy Systems, 1:195–204, 1993.
16. T. A. Johansen and R. Murray-Smith. The operating regime approach to nonlinear modelling and control. In R. Murray-Smith and T. A. Johansen, editors, Multiple Model Approaches to Modelling and Control. Taylor & Francis, London, UK, 1997.
17. T. A. Johansen, R. Shorten, and R. Murray-Smith. On the interpretation and identification of dynamic Takagi-Sugeno fuzzy models. IEEE Transactions on Fuzzy Systems, 8:297–313, 2000.
18. R. Johansson. System Modeling and Identification. Prentice-Hall, Information and System Sciences Series, New Jersey, USA, 1993.
19. R. Krishnapuram and J. M. Keller. A possibilistic approach to clustering. IEEE Transactions on Fuzzy Systems, 1:98–110, 1993.
20. S. Lee and G. E. Yen. On the local interpretation of Takagi-Sugeno fuzzy models from a dynamical systems view. Proc. of the American Control Conference, pages 519–524, 2002.
21. L. Ljung. System Identification. Prentice-Hall, New Jersey, USA, 1999.
22. R. Murray-Smith and T. A. Johansen. Local learning in local model networks. In R. Murray-Smith and T. A. Johansen, editors, Multiple Model Approaches to Modelling and Control. Taylor & Francis, London, UK, 1997.
23. J. Nocedal and S. J. Wright. Numerical Optimization. Springer-Verlag, London, UK, 1999.
24. J. A. Romero, J. L. Navarro, and J. M. Bruno. Estimación de parámetros del modelo de un biorreactor aplicando algoritmos genéticos. ITecnike Journal (in Spanish), Innovación e Investigación en la Ingeniería, 2:31–34, 2002.
25. M. Ryoke and Y. Nakamori. Simultaneous analysis of classification and regression by adaptive fuzzy clustering. Japanese Journal of Fuzzy Theory and Systems, 8:99–113, 1996.
26. M. Sugeno and T. Yasukawa. A fuzzy-logic-based approach to qualitative modeling. IEEE Transactions on Fuzzy Systems, 1:7–31, 1993.
27. T. Takagi and M. Sugeno. Fuzzy identification of systems and its applications to modeling and control. IEEE Transactions on Systems, Man and Cybernetics, 15:116–132, 1985.
28. E. Walter and L. Pronzato. Identification of Parametric Models from Experimental Data. Springer-Verlag, London, UK, 1997.
29. L.-X. Wang. A Course in Fuzzy Systems and Control. Prentice-Hall, New Jersey, USA, 1997.
30. J. Yen, L. Wang, and C. W. Gillespie. Improving the interpretability of TSK fuzzy models by combining global learning and local learning. IEEE Transactions on Fuzzy Systems, 6:530–537, 1998.
