Information fractals for evidential pattern classification


IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, VOL. 20, NO. 5, SEPTEMBER/OCTOBER 1990, p. 1103

Information Fractals for Evidential Pattern Classification

Abstract: Proposed is a new model of belief functions based on fractal theory. The model is first justified in qualitative, intuitive terms, then formally defined. Also, the application of the model to the design of an evidential classifier is described. The proposed classification scheme is illustrated by a simple example dealing with robot sensing. The approach followed is motivated by applications to the design of intelligent systems, such as sensor-based dexterous manipulators, that must operate in unstructured, highly uncertain environments. Sensory data are assumed to be 1) incomplete and 2) gathered at multiple levels of resolution.

I. INTRODUCTION

THE overall objective of the research reported in this paper is to derive an algorithmic framework for the integration of perception and action in the intelligent control of dexterous robot manipulators operating in complex, unstructured, highly uncertain environments. More specifically, our goal is to develop an evidential reasoning scheme to determine the most suitable grasp configuration for a given object and manipulation task. This scheme is at the basis of sensorimotor control and is based on a classification algorithm that generates a grasp situation by recognizing its two components: 1) the graspable part(s) of three-dimensional (3-D) objects in the environment; 2) the required grasp characteristics for a given task. An early version of such an algorithm has been described in [24]. The effectiveness of a sensor-based system depends on its ability to extract meaningful information from a complex environment under a broad range of sensing conditions, while requiring a minimal amount of prior knowledge about the environment. The importance of uncertainty has been increasingly recognized in the last few years by the robotics community [3], [6]. The sensory information available to an intelligent robot is gathered at multiple levels of resolution. It is typically sparse, imprecise, incomplete, and possibly inconsistent. The procedural knowledge for sensory data interpretation, task planning, or high-level control may be incompletely specified.

Manuscript received November 3, 1988; revised May 11, 1990. A. M. Erkmen was with the Department of Electrical and Computer Engineering, George Mason University, Fairfax, VA 22030. She is now with the Middle East Technical University, Ankara, Turkey 06531. H. E. Stephanou was with the Department of Electrical and Computer Engineering, George Mason University, Fairfax, VA 22030. He is now with Rensselaer Polytechnic Institute, Troy, NY 12180. IEEE Log Number 9037412.

Also, the manipulation environment is not accurately known. These considerations have motivated our investigation of evidential reasoning techniques for sensor-based robotics. The major thrust of this paper is on the fractal modeling of belief functions. Early results from the incorporation of this model into evidential reasoning, and its application to robot sensing, are also outlined to illustrate the model. Section II summarizes the related work that motivated our research. Section III lays the theoretical foundation of our fractal approach to the representation of belief functions. The concept of interaction between belief functions is introduced in Section IV and measured as an interaction dimension. In Section V, the interaction between a belief function and an aggregate of belief functions is analyzed, leading to the derivation of a conductivity measure. Section VI describes a simple application to a robot sensing problem.

II. BELIEF FUNCTIONS

Pearl [17] points out that Bayesian methods begin drawing inferences when the underlying probabilistic model is complete. Rather than completing the model, the Dempster-Shafer (DS) theory computes provability or plausibility indices. The theory estimates how close the evidence is to forcing the truth of the hypothesis, instead of estimating how close the hypothesis is to being true. The stronger the evidence, the more likely it is that a complete proof will be assembled. As such, it resigns itself to providing partial answers rather than full answers to probabilistic queries. If a large body of knowledge has been acquired but a few parameters are missing, Bayesian methods assume some reasonable values for the missing parameters. DS does not. Thus although Bayesian methods have the capability to tolerate total ignorance, they lack the flexibility to accommodate partial information.
Since our fractal model of uncertainty is based on belief functions, we briefly review the basic concepts and definitions underlying Dempster-Shafer (DS) theory.

A. Frame of Discernment

Let a frame of discernment F be the set of all possible values of some numerical or symbolic variable x. Let b(q) denote the degree of belief that the true value of x is in the subset q of F, and in no proper subset of q. It is



convenient to visualize x as a mass of weight b(q) that is confined to q but can move anywhere inside q.

B. Basic Probability Assignments

A function b: 2^F → [0, 1] is called a basic probability assignment (bpa) if

b(∅) = 0,  0 < b(q_i) ≤ 1,  Σ_i b(q_i) = 1

where ∅ is the empty set, q_i (i = 1, …, n) are subsets of F, and 2^F denotes the set of all subsets of F. b(q_i) is a measure of the belief committed exactly to q_i and to no proper subset of q_i.

C. Focal Elements

The subsets q_1, …, q_n are called focal elements. The set Q = {q_1, …, q_n} of all the focal elements is called the core. A subset of F is a focal element if it is nonempty and if the bpa assigned to it is nonzero. There are no other restrictions on Q. In particular, one of the focal elements can be the entire frame of discernment. Also, the focal elements need not be disjoint, and their union need not cover the entire frame of discernment.
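As an aside, the three bpa conditions above can be checked mechanically. The sketch below is our own illustration (not part of the original formulation); it encodes focal elements as Python frozensets, so that the empty set simply never appears as a key.

```python
def is_bpa(b, frame):
    """Check the three bpa conditions: no mass on the empty set (it is
    simply absent from b), each mass in (0, 1], and masses summing to 1."""
    nonempty_subsets = all(q and q <= frame for q in b)
    masses_in_range = all(0 < m <= 1 for m in b.values())
    sums_to_one = abs(sum(b.values()) - 1.0) < 1e-9
    return nonempty_subsets and masses_in_range and sums_to_one

F = frozenset({"cube", "sphere", "cylinder"})
b = {frozenset({"cube"}): 0.5,
     frozenset({"cube", "sphere"}): 0.2,
     F: 0.3}                     # mass on the whole frame F is allowed
print(is_bpa(b, F))              # True
```

Note that the focal elements here overlap and do not partition F, which the definition explicitly permits.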

D. Belief Functions

A belief function X = [{Q}; {b}] consists of a core and a set of bpa's assigned to its focal elements. The definition of a belief function establishes a one-to-one correspondence between subsets of F and logical propositions. Thus the notions of conjunction, disjunction, implication, and negation are equivalent to the set-theoretic notions of intersection, union, inclusion, and complementation. A belief function corresponds to the intuitive idea that a portion of one's belief can be committed to set unions but need not be fully committed to any one set or its complement.

Special Cases:

1) A simple support function consists of a core Q = {q, F}, where q is any proper nonempty subset of F, and a bpa such that b(q) + b(F) = 1.

2) A vacuous belief function is given by b(F) = 1 and b(q_i) = 0 for all q_i other than F. This function describes total ignorance, since no portion of one's belief is committed to any proper subset of F. Thus the true value of x can be anywhere inside F.

3) When the core consists of all the singletons in F, the belief function is equivalent to a Bayesian probability density function.

An important aspect of the Dempster-Shafer theory is its

ability to differentiate between the dual concepts of total belief and plausibility, on one hand, and disbelief and lack of belief on the other hand. These four concepts are defined in the following.

Total Belief: The total belief B(q_i) = Σ_j b(q_j) is the sum of the bpa's of all subsets q_j of q_i. It corresponds to the sum of all the masses that are confined to proposition q_i, and reflects the weight of evidence confirming the truth of proposition q_i.

Plausibility: The total belief is generally smaller than the plausibility P(q_i) = 1 − B(F − q_i), where F − q_i is the complement of q_i. The plausibility corresponds to the sum of all masses that may enter q_i and reflects the absence of evidence disconfirming proposition q_i.

Disbelief: Similarly, the disbelief D(q_i) = B(F − q_i) is the total belief in the complement of q_i, i.e., the sum of all masses that cannot enter q_i. It reflects the weight of the evidence disconfirming q_i.

Lack of Belief: The disbelief is generally smaller than the lack of belief L(q_i) = 1 − B(q_i), which is the sum of all masses that cannot be confined in q_i, and reflects the absence of evidence confirming q_i.
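In computational terms, the four quantities reduce to sums over the bpa under the mass interpretation given above. The following sketch is our own illustration of these definitions (the frozenset encoding is an assumption, not part of the paper):

```python
def total_belief(b, q):
    """B(q): total mass confined to q (bpa's of all subsets of q)."""
    return sum(m for s, m in b.items() if s <= q)

def plausibility(b, frame, q):
    """P(q) = 1 - B(F - q): total mass that may enter q."""
    return 1.0 - total_belief(b, frame - q)

def disbelief(b, frame, q):
    """D(q) = B(F - q): total mass that cannot enter q."""
    return total_belief(b, frame - q)

def lack_of_belief(b, q):
    """L(q) = 1 - B(q): total mass not confined to q."""
    return 1.0 - total_belief(b, q)

F = frozenset({"cube", "sphere", "cylinder"})
b = {frozenset({"cube"}): 0.5, frozenset({"cube", "sphere"}): 0.2, F: 0.3}
q = frozenset({"cube", "sphere"})
print(round(total_belief(b, q), 6), round(plausibility(b, F, q), 6))  # 0.7 1.0
```

The printed pair illustrates the duality: the belief 0.7 is a lower envelope and the plausibility 1.0 an upper envelope on the probability of q.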

E. Dempster's Rule of Combination

Dempster's rule [20] is the basic mechanism for the combination of several belief functions. It can be interpreted as the generation of a consensus and has been shown [25] to reduce the entropy of a belief system. Consider two belief functions X = [{Q}; {A}] and Y = [{R}; {B}] based on independent bodies of evidence. Their orthogonal sum Z = [{S}; {C}] is obtained by applying Dempster's rule of combination as follows:

Z = [{S}; {C}] = [{Q}; {A}] ⊕ [{R}; {B}]

where

c(s_k) = [ Σ_{q_i ∩ r_j = s_k} a(q_i) b(r_j) ] / [ 1 − Σ_{q_i ∩ r_j = ∅} a(q_i) b(r_j) ].
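For finite frames, the orthogonal sum can be sketched directly from the rule above; the normalization by the total nonconflicting mass corresponds to the 1 − Σ∅ term in the denominator. The encoding below (dicts of frozensets) is our own implementation choice, not the paper's.

```python
from itertools import product

def orthogonal_sum(a, b):
    """Dempster's rule of combination for two bpa's given as dicts
    mapping frozenset focal elements to masses."""
    raw, conflict = {}, 0.0
    for (q, mq), (r, mr) in product(a.items(), b.items()):
        s = q & r
        if s:
            raw[s] = raw.get(s, 0.0) + mq * mr
        else:
            conflict += mq * mr          # mass committed to the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting bodies of evidence")
    return {s: m / (1.0 - conflict) for s, m in raw.items()}

F = frozenset({"cube", "sphere"})
x = {frozenset({"cube"}): 0.8, F: 0.2}      # simple support for "cube"
y = {frozenset({"sphere"}): 0.6, F: 0.4}    # conflicting simple support
z = orthogonal_sum(x, y)
print(round(sum(z.values()), 6))            # 1.0
```

Because the operation is associative and commutative, repeated calls to `orthogonal_sum` combine any number of bodies of evidence in any order.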

This rule, which is associative and commutative, allows the sequential combination of multiple bodies of evidence. The DS rule of combination assumes a particularly convenient form when several pieces of evidence bear on the same proposition (or its negation). However, more complex computation is required when pieces of evidence impinge on different propositions.

F. Discussion

Unlike Bayesian theory, the theory of belief functions distinguishes between the notions of belief and plausibility, as well as those of disbelief and lack of belief. This is due to the property that, in general,

B(q_i) + B(F − q_i) ≤ 1.

The Shafer-Dempster theory is particularly well suited


for applications where uncertainty is due to the incomplete but not necessarily random nature of the evidence. Since it deals with set functions rather than point functions, it also allows the representation of uncertain knowledge at multiple levels of granularity (coarsening or refining), thus facilitating reasoning with a hierarchy of hypotheses.

III. ENTROPY

The information theoretic measures that we use in our approach are based on the concept of entropy. The essential mathematical and statistical nature of information theory has been emphasized by Fisher, Shannon, and Wiener [1]. Information theory has its mathematical roots in the concept of disorder or entropy in thermodynamics and statistical mechanics [12]. It brings a measure-based approach into the evaluation of cost. The information cost measured by the entropy formalism has been developed by analogy to the concept of energy.
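Before these measures are formalized, the discrete versions of the basic quantities used in this section can be sketched numerically. This is our own minimal illustration; the paper works with the continuous (density) forms.

```python
import math

def shannon_entropy(p):
    """H = -sum p log p, with the convention 0 log 0 = 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def directed_divergence(f1, f2):
    """I(1:2) = sum f1 log(f1/f2): mean information in an observation
    for discrimination in favor of hypothesis 1 against hypothesis 2."""
    return sum(a * math.log(a / b) for a, b in zip(f1, f2) if a > 0)

def j_divergence(f1, f2):
    """J(1,2) = I(1:2) + I(2:1): symmetric, but lacks the triangle
    inequality, hence not a distance."""
    return directed_divergence(f1, f2) + directed_divergence(f2, f1)

uniform = [0.25] * 4
peaked = [0.7, 0.1, 0.1, 0.1]
print(round(shannon_entropy(uniform), 4))   # 1.3863  (= log 4, the maximum)
print(j_divergence(uniform, peaked) > 0)    # True
```

The uniform distribution attains the maximum entropy log m for m elements, which is the "maximum disorder due to ignorance" discussed below.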


A. Axiomatic Characterization of Entropy

Entropic measures have been derived in the literature [1] as variations of the initial form H(x) = −log p(x), where 0 ≤ p(x) ≤ 1. The properties of such measures include monotonicity in sign and monotonicity in hierarchy. Monotonicity in sign requires two conditions:

1) entropy is strictly positive in the space of real numbers except at two points, 0 and 1, where it becomes zero; in short, H(x) ≥ 0;

2) entropy is additive; let X be a set partitioned into smaller subsets x_i; the entropy of the superset is the sum of the entropies of its subsets; therefore H(X) = Σ_i H(x_i).

The additivity property also expresses monotonicity in hierarchy: the entropy of a superset is higher than that of its subsets. Monotonicity in hierarchy involves the union of sets. This property states that the union of non-nested sets that are not necessarily disjoint has a higher entropy than the entropy of any individual set which contributes to the union, i.e.,

H(X_1 ∪ X_2) > H(X_k),  k = 1, 2.

For all nonempty sets of m elements, the entropy assumes its maximum value when p(x) is a uniform distribution among the m set elements. This case shows maximum disorder due to ignorance. Shannon calls this an averaging operation. Information dissimilarity has been expressed by cross-entropy measures, which are based on the evaluation of relative individual entropies. Several cross-entropy measures have been derived in the literature and are found to satisfy the entropy criteria of monotonicity discussed in this subsection.

B. Cross-Entropy

Kullback [12] has introduced a measure of information for a set of continuum cardinality for which Shannon's entropy is a limiting case. The concept of relative measure is developed into a cross-entropy formulation as follows. The logarithm of the likelihood ratio log(f_1(x)/f_2(x)), where f_i(x) is the generalized probability density corresponding to hypothesis H_i, is defined as the information in a variable x for discrimination in favor of hypothesis 1 and against hypothesis 2. Good [8] has described it as the weight of evidence for hypothesis 1 given x. A measure of divergence between hypotheses 1 and 2, based on the notion of cross-entropy, has been investigated within the following formulation [12], [4]:

J(1, 2) = I(1:2) + I(2:1)

where I(1:2) = ∫ f_1(x) log(f_1(x)/f_2(x)) dx, and I(2:1) is defined symmetrically. J(1, 2) is a measure of the difficulty of discriminating between the hypotheses. This divergence J(1, 2) has all the properties of a metric as defined in topology except the triangle inequality property, and is therefore not called a distance. The information measures I(1:2) and I(2:1) represent directed divergences. For pattern classification, the notion of closest hypothesis suggests the minimization of divergence. Directed divergences have also been called relative entropy or cross-entropy. The principle of minimum cross-entropy has been investigated in [21] and [23] and applied to clustering [22].

C. Evidential Entropy Measures

The concept of entropy has been extended to the framework of the Dempster-Shafer theory of evidence as a measure of uncertainty [27]. Various aspects of uncertainty within the framework of belief and plausibility measures have been analyzed by Klir and Folger [11]. They include the following.

1) Dissonance: Conflict or dissonance in evidence is encountered whenever nonzero degrees of evidence are allocated to disjoint subsets of the frame of discernment. Dissonance is derived as

E(b) = −Σ_i b(q_i) log P(q_i)

where q_i is a focal element and P(q_i) is the plausibility of q_i.

2) Confusion: Since plausibility and belief measures are duals, i.e., P(q_i) = 1 − B(F − q_i), a measure of confusion may be derived from dissonance by replacing the plausibility by its dual:

C(b) = −Σ_i b(q_i) log B(q_i)

where B(q_i) is the total belief. Since B(q_i) ≥ b(q_i), an upper bound for confusion is

C(b) ≤ −Σ_i b(q_i) log b(q_i).

The term on the right side is the belief entropy in our fractal model of uncertainty. When b(·) represents a probability measure, then b(q_i) = P(q_i) = B(q_i) for all focal elements. Consequently, the confusion measure and the dissonance measure become Shannon's entropy.

3) Nonspecificity: Yager [27] has introduced a measure of specificity associated with a possibility distribution. Klir and Folger [11] have analyzed several properties of a measure of nonspecificity defined by

V(b) = Σ_i b(q_i) log(q_i#)

where q_i# is the cardinality of q_i.

Measures 1) and 2) are generalizations of Shannon's entropy in the framework of the theory of evidence. The measure of nonspecificity is a generalization of the Hartley measure. The concept of core entropy in our fractal model is also a generalization of the Hartley measure.

IV. FRACTALS

A. Information Flow and Fractals

Physical systems typically involve a large number of degrees of freedom. For analysis, implementation, and simulation, coarsening is usually performed to reduce the number of system variables. Thus coarsening assumes a multigranular approach to system dynamics. The reduced set of variables evolves nondeterministically in time. Moreover, the variables exhibit multiple levels of specificity, since dynamical analysis and implementation cannot deal with infinitely precise numbers. Chaotic behavior is related to two types of uncertainty: multigranular and multispecific. Multigranularity is associated with the partitioning of the problem domain into degrees of freedom, while multispecificity determines the internal structure (degree of precision) of the variables involved in the degrees of freedom.

The Kolmogorov-Sinai entropy is associated with a partition of a mapping, thereby refining it at low levels of granularity. For a partition α = {A_i}, the Kolmogorov-Sinai entropy function is

H(α) = −Σ_i μ(A_i) log μ(A_i)

where μ(A_i) is an invariant measure associated with the interval A_i. Grassberger [10] has related information flow in dissipative chaotic systems to fractal dimensions and to the Kolmogorov-Sinai entropy. The information I needed to specify a function during an interval [t_1, t_2] consists of

1) the information needed to specify it at the initial state t_1 of the interval, and
2) the measure of uncertainty during the interval.

This can be written as

I(t_1, t_2) = I(t_1) + (t_2 − t_1)K

where K is the Kolmogorov-Sinai entropy. For predictable systems K = 0 (no uncertainty), while for Brownian motion K = ∞ [19]. The information at a state t_1, I(t_1), is related to the fractal information dimension D of an attractor (convergence point):

I(t_1) ≈ D log(1/ε)  for ε → 0

where ε is the estimated error. D is a non-integer and is closely related to the Hausdorff dimension, which is an inherent characteristic of fractal sets.

B. Fractal Sets

The theory of sets of fractional dimension has been developed by mathematicians early in this century. Recently there has been a substantial increase in the importance of fractal sets in many fields. Mandelbrot [14] pioneered their use to model a wide variety of scientific phenomena, from the molecular to the astronomical. A fractal is defined as a scale-invariant structure whose statistical properties are unchanged under dilation, or change of spatial length scale [15]. Important investigations of propagation in fractals have been carried out in the fields of fractal crystal growth and viscous fingering [7], [13], [16]. The propagation of a flow of particles in a fractal environment depends on the physical interactions of particles with the fractal structure in terms of reflections, refractions, or chemical, thermal, or dynamic reactions. This propagation is expressed solely as a function of the dimension of the fractal structure. A rigorous study of sets of fractional dimension is given in [5]. There are two types of fractal sets: 1) irregular continuous sets and 2) sets with discontinuities (gaps).


The size and density of such sets can only be estimated by nonfractal sets, often called regular sets. These regular sets have an exact topology that is used in the estimation of fractal dimensions.
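The estimation of a fractal dimension by regular covers can be illustrated numerically. The sketch below is our own (not from the paper): it applies a box-counting estimate, a practical stand-in for the Hausdorff dimension defined next, to the middle-thirds Cantor set, recovering the known value log 2 / log 3. Integer arithmetic keeps the cover counts exact.

```python
import math

def cantor_lefts(depth):
    """Integer left endpoints (in units of 3**-depth) of the 2**depth
    intervals of the middle-thirds Cantor construction."""
    lefts = [0]
    for _ in range(depth):
        lefts = [3 * x for x in lefts] + [3 * x + 2 for x in lefts]
    return lefts

def box_dimension(depth, level):
    """Box-counting dimension estimate at box size 3**-level
    (level <= depth): log N(boxes) / log(1/box size)."""
    lefts = cantor_lefts(depth)
    boxes = {x // 3 ** (depth - level) for x in lefts}
    return math.log(len(boxes)) / math.log(3 ** level)

print(round(box_dimension(8, 8), 3))  # 0.631  (log 2 / log 3 = 0.6309...)
```

Coarsening the cover (smaller `level`) leaves the estimate unchanged here, which is exactly the scale invariance that characterizes fractal sets.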

C. Hausdorff Dimension

The dimension of fractal sets is an estimate of the size or density of such sets. It is a fractional entity. Sets of dimension s are called s-sets. These are sets of nonzero but finite s-dimensional Hausdorff measure. If V is a nonempty subset of R^n (Euclidean n-space), the diameter of V is defined as

|V| = sup{|x − y|: x, y ∈ V}.

If E ⊂ ∪_i V_i and 0 < |V_i| ≤ δ for each i, then the set {V_i} is a δ-cover of E. Intuitively, a δ-cover is an upper-bound estimate of the size of the set E by the union of amorphous sets with maximum diameters |V_i|. Let E be a subset of R^n and s > 0. For δ > 0, the Hausdorff measure is defined as

H_δ^s(E) = inf Σ_{i=1}^∞ |V_i|^s

where the infimum is taken over all (countable) δ-covers {V_i} of E. The Hausdorff s-dimensional measure of E is obtained when δ → 0. Thus

H^s(E) = lim_{δ→0} H_δ^s(E).

The δ-cover of E is an estimate of order δ. As δ tends toward zero, a closer estimate for E is obtained and is called H^s(E). The more accurate the estimation, the larger the number of sets V_i needed to cover E. At the limit, H^s(E) will be infinite. There exists a unique value, dim(E), called the Hausdorff dimension of E, such that

H^s(E) = ∞  if 0 ≤ s < dim(E)
H^s(E) = 0  if dim(E) < s.

V. A FRACTAL MODEL OF BELIEF FUNCTIONS

Let X = [{Q}; {B}] be

a belief function with core Q = {q_i} and basic probability assignments B = {b(q_i)}. We define the within-set entropy as a measure of information gaps within set elements (here, focal elements). It is composed of two components.

A. Belief Entropy and Core Entropy

The first component is the belief entropy

H_b(X) = −Σ_i b(q_i) log b(q_i).

The partial belief assignment takes a value between zero and one, leading to an entropy value ranging from zero to infinity. The lowest entropy is achieved for a singleton carrying a belief of one, for total certainty. Maximum entropy for a given collection of hypotheses is achieved if evidence supports them equally.

The second component is the core entropy, which is defined in terms of a dispersion variable h(q_i) and the cardinality of q_i. The variable h(q_i) ranges from zero to one, while the core entropy takes values between zero and infinity. h(q_i) defines the dispersion of choice in more or less dissimilar focal elements. As the focal elements become more and more similar, h(q_i) tends toward one. Consequently, the dispersion is reduced and the core entropy is lowered, due to a crisper choice allocation. Maximum entropy for a finite collection of focal elements occurs for uniform cardinality dispersion: choice allocation is disparate.

B. Within Set Entropy

The within-set entropy is

H_W(X) = H_b(X) + H_c(X)

where H_c(X) is the core entropy. This entropy measure ranges between zero and infinity. Maximum entropy for a given set of focal elements occurs for uniform distribution of belief and cardinality. All three entropy measures satisfy the entropy axioms stated in Section III-A.

C. Focal Divergence

In Euclidean space, the ε-capacity [5] of a set A depends on a parameter that carries information about distant elements of A. Analogously, we introduce the concept of focal divergence as a measure of the "distance" between focal elements of a belief function. The focal divergence of X is a measure of the degree of uncertainty due to the dissimilarity between its focal elements. We define it as

D(X) = Σ_i Σ_j d(q_i, q_j)

where the focal distance d(q_i, q_j) between q_i and q_j is based on a generalization of the cross-entropy concept. The similarity sim(q_i; q_j) between focal elements q_i and q_j is based on [15]:

sim(q_i; q_j) = {a(q_i) ∩ a(q_j)}# / {a(q_i) ∪ a(q_j)}#

where a(q) is a list of attributes of q and # denotes cardinality. Since the focal divergence is a weighted sum of cross-entropies, it also satisfies the entropy criteria of monotonicity discussed in Section III-A.

D. Inner Entropy

We now develop the concept of inner entropy as a total measure of information gaps in the belief function. It is formalized as the measure of the total information cavity due to uncertainty within focal elements (H_W) and between focal elements (D), i.e.,

H_in(X) = H_W(X) + D(X).

The inner entropy thus consists of two terms:

1) the within-set entropy H_W(X), i.e., the entropy of the information-carrying elements; and
2) the focal divergence D(X), i.e., the information waste due to distant focal elements.

This entropy measure is derived from the sum of an entropy measure and a modified cross-entropy term. It therefore satisfies the entropy criteria of positivity, additivity, and monotonicity.


E. Fractal Dimension

The fractal dimension provides an estimate of the spread of a fractal set. Due to high irregularities or random cavities, the scale of the set cannot be directly determined. The scale of highly irregular fractal sets is estimated as a function of the distances of elements of that set to its boundary. For sets with random cavities, the scale is estimated as a function of two volumes: that of the nonempty subsets of the set and that of the holes. In our fractal model of belief functions, the ratio of size to distance becomes a ratio of within-set entropy to inner entropy. We therefore compute the fractal dimension of the belief function X as

dim(X) = H_W(X) / H_in(X).

The numerator is a measure of the entropy due to the spread of the focal elements. The divergence component in the denominator captures the entropy due to distances between focal elements.

F. Capacity and Projection

We define the capacity cap(X) of a belief function X by analogy with the ε-capacity of a fractal set [5]. We use the capacity of a belief function to project a belief function dimension into another belief function of lower dimension. The projection of the dimension of a belief function is the fraction of that dimension that can be viewed by another belief function of lower dimension. We define the projection of the dimension s of a belief function Y on a belief function X of dimension t (t < s) as

proj[dim(Y)] = dim(Y) · cap(X).

This projection is used to scale an interaction dimension down to a frame. Interaction will be introduced in the next section and can be interpreted as a potential difference between an interacting input Y and a frame to which X belongs. By projecting the interaction, we seek to measure the effect viewed by the frame, which can be thought of as the amount of "charge" that can be absorbed by a frame of capacity cap(X).

VI. INTERACTION BETWEEN BELIEF FUNCTIONS


A. Interaction of Fractals with the Environment

Fractal aggregates modify external phenomena in the surrounding space [14]. Examples of such modifications are encountered in different fields such as electricity (a conducting fractal modifies the electric field around it), diffusion processes (an absorbing fractal modifies the local concentration of some diffusing species), fluid dynamics (a fractal immersed in fluid modifies the surrounding flow), etc. The laws governing these phenomena can be expressed solely in terms of a fractal dimension.

For example, consider the interaction of a fractal aggregate with a ray of light. A tiny light source is placed at a point inside the fractal. The probability that it is visible from a given point outside the fractal aggregate is equal to one minus the probability that the light ray from the source to the observer intersects the fractal aggregate. A similar problem deals with the escape of a diffusing particle or random walker from an absorbing cluster. If the particle is released from the interior of the cluster, its probability of escape is equal to one minus the probability of intersection between the cluster and an infinitely long random walk. The intersection probabilities have been derived in the literature as powers of the fractal dimension [19], [26].

Similarly, we view information fractals (in the form of evidence represented by belief functions) as modifying the knowledge environment. In our approach, this modification takes the form of an interaction between belief functions in the environment. The interaction between belief functions is defined in the next subsection.

B. Interaction Between Evidence and a Knowledge Base

Let the interaction environment consist of a piece of evidence (represented by a belief function) and a knowledge base (represented by an aggregation of prototypical belief functions). The evidential fractal is obtained from uncertain sensory data. Consider an evidential fractal interacting with the aggregation of fractals that constitutes the knowledge base. This interaction reflects the information value carried by the evidence. More specifically, this information value is a measure of the relevance of the aggregation fractals to the belief function fractal. We formalize this relevance measure as a conductivity, reflecting the extent to which the aggregate is conductive (i.e., relevant) to the evidence input.

Conductivity is caused by the one-to-one interaction between the evidential fractal and each belief function in the aggregate. We formalize the degree of interaction between these two entities as an interaction dimension. If the interaction dimension is larger than the aggregate dimension, then the aggregate cannot view the interaction in its entirety, but only a fraction of it. This fraction of the dimension is equal to the projected interaction dimension. It is equal to one when the interaction dimension is smaller than the aggregate dimension.

C. Interaction Between Two Belief Functions

1) Interaction Dimension: Consider two interacting fractal belief functions X and Y. We define the fractal dimension of the interaction between two belief functions as

dim(X; Y) = [H_in(X) + H_in(Y)] / [H_in(X) + H_in(Y) + div(X; Y)].
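Given the inner entropies and the divergence, the interaction dimension is a one-line computation. The sketch below is ours, and the numerical values in the example are hypothetical:

```python
def interaction_dimension(h_in_x, h_in_y, div_xy):
    """dim(X;Y): equals 1 when div(X;Y) = 0 (indistinguishable bodies
    of evidence) and decays toward 0 as the divergence grows."""
    total = h_in_x + h_in_y
    return total / (total + div_xy)

print(interaction_dimension(1.2, 0.8, 0.0))   # 1.0
print(interaction_dimension(1.2, 0.8, 2.0))   # 0.5
```

The two printed cases bracket the behavior discussed in the divergence properties below: zero divergence gives full interaction, while an infinite divergence (zero similarity between some focal elements) drives the dimension to zero.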


2) Divergence Between Two Belief Functions: Consider two belief functions

X = [{q_1, …, q_n}; {b(q_1), …, b(q_n)}]
Y = [{r_1, …, r_m}; {b(r_1), …, b(r_m)}].

We define the divergence between X and Y as

div(X; Y) = div(Y; X) = ddiv(X; Y) + ddiv(Y; X)

where

ddiv(Y; X) = Σ_j [1 / max_i sim(r_j; q_i)] b(r_j) log [b(r_j) / b(q_{i*})]

ddiv is a directed divergence, and i* (j*) indicates the value of i (j) that maximizes sim(q_i; r_j).

3) Divergence Properties: The computation of divergence involves the weighted cross-entropy between the most similar pairs of focal elements, where the weighting factor is the inverse of the similarity between the elements of each pair. Several special cases are now considered:

1) ddiv(X_1; X_2) < 0: the directed divergence points from a focal element with low belief to a very similar one with higher belief (in the target frame). The directed divergence contributes to the increase of the interaction dimension by reducing DIV and subsequently reducing the denominator of the interaction dimension.

2) ddiv(X_1; X_2) > 0: the directed divergence points from a focal element with high belief to a very similar one with lower belief. The directed divergence contributes to the decrease of the interaction dimension by increasing DIV, thus increasing the denominator of the interaction dimension.

3) DIV = Σ_i ddiv_i > 0: the individual components of the sum can be positive or negative, thus causing belief enhancement or attenuation, and the increase or decrease of the interaction dimension.

4) If one of the focal elements of X_2 has a similarity of zero with all the focal elements of X_1, then the directed divergence tends to infinity, leading to DIV → ∞, which in turn causes a zero interaction dimension.

4) Belief Function Aggregates: We now extend our entropy measures to frames structured as belief function aggregates. Let Γ = {X_1, …, X_n} denote an aggregate of belief functions. The within-set entropy of Γ is

H_W(Γ) = Σ_i H_in(X_i).

The internal divergence of Γ is

D(Γ) = Σ_i Σ_j div(X_j; X_i)  for i > j.

The inner entropy of Γ is

H_in(Γ) = H_W(Γ) + D(Γ).

The aggregate dimension of Γ is

dim(Γ) = H_W(Γ) / H_in(Γ).

The aggregate capacity cap(Γ) of Γ is defined analogously to the capacity of a single belief function.

VII. CONDUCTIVITY OF A BELIEF FUNCTION

A. Definition

In this section, we introduce the conductivity of a belief function Y in an aggregate Γ = {X_1, …, X_n} of n belief functions as a measure of the degree of relevance of the internal knowledge structure of Γ to the belief function Y. Our concept of conductivity results from the degree of interaction, viewed by the aggregate, between the belief function Y and each of the belief functions in Γ. We define the dimension of the interaction between a belief function Y and the ith belief function X_i ∈ Γ by

dim(Y; X_i) = [H_in(Y) + H_in(X_i)] / [H_in(Y) + H_in(X_i) + div(Y; X_i)].

The numerator is the within-set entropy of the pair, i.e., the sum of their individual inner entropy measures. The denominator is the total entropy in the pair, i.e., the sum of the within-set entropy of the pair and the entropy between the elements of the pair, as measured by the divergence.

B. Algorithm

Let σ_i denote the conductivity of a belief function Y in X_i (the ith belief function of Γ). We compute σ_i as follows.

1) Compute the dimension dim(Γ) of the aggregate.

2) Compute the interaction dimension dim(Y; X_i).

3) If dim(Y; X_i) < dim(Γ), then σ_i = dim(Y; X_i). In this case, the intensity of the interaction is totally viewed by the aggregate. This leads to strong conductivity.

4) If dim(Y; X_i) > dim(Γ), then compute proj[dim(Y; X_i)]. In this case, the intensity of the interaction is only partially viewed by the space of Γ, as a projection of the interaction dimension on the dimension of Γ. This leads to weak conductivity. The projection of an interaction dimension is the fraction of the interaction dimension viewed by a lower dimensional set:

proj[dim(Y; X_i)] = dim(Y; X_i) · cap(Γ).

The set capacity cap(Γ) reflects the ability of the aggregate to view the effect of a higher dimensional

ERKMEN AND STEPHANOU: INFORMATION FRACTALS FOR EVIDENTIAL P A T E R N CLASSIFICATION

1111

interaction. If projecting yields a lower interaction dimension than the dimension of the aggregate, conductivity can be directly computed. If, however, projection leads to an interaction dimension higher than the dimension of the aggregate, then the aggregate cannot view the projection and the corresponding conductivity is zero. Thus if pro j[dim(Y; Xi)] G dim(r), then ai(Y; Xi) = pro j[dim(Y; Xi)] if pro j[dim(Y; Xi)] > dim(r), then uJY; Xi) = 0. 5) Compute the total aggregate r conductivity for Y , U (Y ;r) = C u i ( Y ;x i ) . Z
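Steps 1)-5) can be sketched as a single function. This is an illustration under stated assumptions, not the paper's implementation: dim(Γ) and cap(Γ) are taken as precomputed inputs, and `interaction_dims` holds dim(Y; X_i) for each prototype X_i in Γ.

```python
def conductivity(interaction_dims, dim_gamma, cap_gamma):
    """Total conductivity of the aggregate Gamma for a belief function Y."""
    total = 0.0
    for d in interaction_dims:
        if d < dim_gamma:
            # Interaction fully viewed by the aggregate: strong conductivity.
            total += d
        else:
            # Project the interaction dimension onto the aggregate.
            p = d * cap_gamma
            # The projection contributes only if the aggregate can view it.
            total += p if p <= dim_gamma else 0.0
    return total
```

As a check against the example of Section VIII: for class 2, all three interaction dimensions (0.88, 0.91, 0.38) fall below the aggregate dimension 1.06, so the conductivity is simply their sum, 2.17, matching Table III regardless of the capacity value.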

C. Properties of Conductivity

Conductivity is a measure. A measure σ [5] is a function defined on a field of subsets E_j of a set X and taking values in the range [0, ∞] such that

    σ(∅) = 0    and    σ(∪_{j∈J} E_j) = Σ_{j∈J} σ(E_j).

A nonempty collection L = {E_j} of subsets of X is a field if L is closed under complementation and under countable union, i.e.,

    if E_j ∈ L, then its complement is in L;
    if E_1, E_2, ... ∈ L, then ∪_{j=1}^∞ E_j ∈ L.

The superset 2^F of a frame F is therefore a field. Conductivity is defined on the field 2^F composed of subsets (focal elements) of the frame F. When the frame has a fractal dimension lower than the conductivity, the conductivity is projected. Projection transforms its properties to those of an outer-measure [5]. The projected conductivity is an outer-measure. An outer-measure σ′ has the following properties:

    σ′(∅) = 0;
    A ⊆ A′ ⇒ σ′(A) ≤ σ′(A′).

This property provides an upper bound for estimating the total projected conductivity of an aggregate. Raising the interaction dimension with this upper bound is a factor in smoothing the discontinuity created by the projection and is used in the calculation of the total conductivity.

VIII. EXAMPLE

In this section, we use a simple example to illustrate our conductivity analysis procedure applied to the classification of 3-D objects from incomplete sensory data. The "objects" considered here are the graspable parts of more complex objects in a manipulation environment. Knowledge is represented as an aggregate of prototypes for each of the object classes. Each prototype is a belief function in a frame of attributes. These attributes are extracted from a hierarchical decomposition of a 3-D object into faces, edges, and vertices. The evidential input is a belief function derived from imprecise and incomplete sensory data.

Fig. 1. Graspable objects in manipulation environment.

A. Object Representation

Fig. 1 shows four simple objects that may be encountered in a "coffee cup and a cube of sugar on a saucer" environment. The object frame of discernment consists of four singletons, corresponding to four object classes (pyramid, L-shape, handle, and cylinder), and is denoted by

    FO = {PYRD, LSHP, HNDL, CYLD}.

The primitives of such objects can be represented at three different levels of granularity: faces, edges, and vertices; the pyramid, for instance, can be decomposed at all three levels. At the face level, 3-D objects are decomposed into groups of primitive shapes which determine the specificity of that level. Each group is a triplet (represented as a singleton in the frame of discernment FF) of three attributes: the number (fno), type (ftype), and curvature (fcurv) of faces, where

    1 ≤ fno ≤ 8
    ftype ∈ {T(riangle), S(quare), R(ectangle), B(racket), C(ircle), D(isk)}
    fcurv ∈ {P(lanar), C(urved)}.

The knowledge base used in this example consists of labeled prototypical objects. The labels refer to the predetermined classes of objects. Each class is represented by an aggregate of belief functions, each corresponding to a prototypical object in that class. Sample aggregates of belief functions representing prototypical objects (at the face level) for four classes are listed in Table I. Each prototypical object is represented by a list of attributes weighted by the degree to which they characterize the object. The weight assigned to the frame of discernment, FF, reflects the incompleteness of the list of attributes: it accounts for additional (hidden) attributes that may characterize the same prototype. In Table I, (4,T,P) stands for four planar triangular faces; (1,S,P) for one planar square face; (1,R,P) for one planar rectangular face; (2,B,P) for two planar bracket faces; (2,C,P) for two planar circle faces; (1,R,C) for one curved rectangular face; etc.

TABLE I
PROTOTYPICAL OBJECTS

Class 1: Pyramid
  PYRD1 = [{(4,T,P), (1,S,P), FF}; {0.50, 0.30, 0.20}]
  PYRD2 = [{(4,T,P), (1,R,P), FF}; {0.50, 0.25, 0.25}]
  PYRD3 = [{(4,T,P), FF}; {0.70, 0.30}]
Class 2: L-Shape
  LSHP1 = [{(4,T,P), (2,L,P), (5,R,P), (1,S,P), FF}; {0.2, 0.4, 0.1, 0.2, 0.1}]
  LSHP2 = [{(2,T,P), (1,S,P), (5,R,P), FF}; {0.4, 0.2, 0.1, 0.3}]
  LSHP3 = [{(6,R,P), FF}; {0.5, 0.5}]
Class 3: Handle
  HNDL1 = [{(2,B,P), (6,R,P), (2,S,P), FF}; {0.4, 0.2, 0.2, 0.2}]
  HNDL2 = [{(2,C,P), (1,R,C), FF}; {0.4, 0.4, 0.2}]
Class 4: Cylinder
  CYLD1 = [{(2,C,P), (1,R,C), FF}; {0.4, 0.4, 0.2}]
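The prototypes above are basic probability assignments: each focal element is either a face triplet or the whole frame FF, and the masses sum to one. A hypothetical encoding (the dictionary layout and names are ours, not the paper's):

```python
FF = "FF"  # the face frame of discernment (ignorance term)

# A few of the Table I prototypes, keyed by (fno, ftype, fcurv) triplets.
PYRD1 = {(4, "T", "P"): 0.50, (1, "S", "P"): 0.30, FF: 0.20}
PYRD3 = {(4, "T", "P"): 0.70, FF: 0.30}
HNDL2 = {(2, "C", "P"): 0.40, (1, "R", "C"): 0.40, FF: 0.20}

def is_valid_bpa(bpa, tol=1e-9):
    """A basic probability assignment must be nonnegative and sum to 1."""
    masses = list(bpa.values())
    return all(m >= 0.0 for m in masses) and abs(sum(masses) - 1.0) < tol
```

Keeping the ignorance mass on FF explicit makes the incompleteness of each attribute list directly visible in the data structure.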

B. Sensory Evidence

The input to the recognition algorithm is a belief function in FF, reflecting the uncertainty due to the imperfections of sensing and low-level processing algorithms. This section describes the formation of this belief function from sensory evidence. Input evidence is the smallest cluster of sensory data that, after appropriate preprocessing, may have an impact on the current state of knowledge. In this example, the uncertain sensory evidence is modeled by a set of four observed simple support functions representing groups of shape primitives:

    EV1 = [{(1,R,P), FF}; {0.8, 0.2}]
    EV2 = [{(1,S,P), FF}; {0.6, 0.4}]
    EV3 = [{(2,S,P), FF}; {0.7, 0.3}]
    EV4 = [{(2,T,P), FF}; {0.5, 0.5}].

The first step in the formation of the evidential input is to cluster the four primitive focal elements (1,R,P), (1,S,P), (2,S,P), and (2,T,P), which, for notational simplicity, are denoted by R, S, 2S, and 2T, respectively. There are three possible partitions of the focal elements into two component clusters. From the focal distances between all pairs of primitive focal elements, the scatter of each partition can be computed:

    γ1 = [d(R;S) + d(2S;2T)] / [d(R;2S) + d(R;2T) + d(S;2S) + d(S;2T)] = 0.99
    γ2 = [d(R;2S) + d(S;2T)] / [d(R;2S) + d(R;2T) + d(S;2S) + d(S;2T)] = 0.24
    γ3 = [d(R;2T) + d(S;2S)] / [d(R;2S) + d(R;2T) + d(S;2S) + d(S;2T)] = 0.45.

The second partition yields the smallest scatter and is selected. The evidential input for this example is therefore the belief function EV = [{Q}; {B}], where

    Q = {((1,R,P), (2,S,P)), ((1,S,P), (2,T,P)), FF}
    B = {b(1,R,P) + b(2,S,P), b(1,S,P) + b(2,T,P), b(FF)} · 0.25.

The ignorance term b(FF) is the sum of the four ignorance terms in the input simple support functions.

TABLE II
INTERACTION DIMENSIONS

         Class 1   Class 2   Class 3   Class 4
  BF1     0.69      0.88      1.37      0.27
  BF2     0.72      0.91      0.48       —
  BF3     1.03      0.38       —         —

TABLE III
PROJECTIONS AND CONDUCTIVITY

         Class 1   Class 2   Class 3   Class 4
  BF1     0.13      0.88      0.13      0.27
  BF2     0.14      0.91      0.48       —
  BF3     0.20      0.38       —         —
  σ       0.47      2.17      0.61      0.27

C. Conductivity Based Classification

The output of the classifier is a belief function in FO. The classification algorithm requires the computation of the conductivity of each prototypical object (described in Table I) to the evidential input. Details of this calculation are described in this section. Performing the conductivity analysis yields the classification results summarized in Tables II and III. Table II shows the interaction dimensions between the input belief function and the nine prototypical belief functions.
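The partition-selection step used to form the evidential input can be sketched as follows. This is one reading of the γ formulas above, taking the scatter of a partition as the ratio of within-cluster to between-cluster focal distances; the distance values in the usage check are hypothetical, since the paper does not list the individual focal distances.

```python
from itertools import combinations

def dist(d, a, b):
    """Symmetric lookup of the focal distance between two elements."""
    return d[(a, b)] if (a, b) in d else d[(b, a)]

def scatter(cluster_a, cluster_b, d):
    """Within-cluster distance divided by between-cluster distance."""
    within = sum(dist(d, x, y) for x, y in combinations(cluster_a, 2))
    within += sum(dist(d, x, y) for x, y in combinations(cluster_b, 2))
    between = sum(dist(d, a, b) for a in cluster_a for b in cluster_b)
    return within / between

def best_pairing(e, d):
    """Evaluate the three pairings of four elements; keep the smallest scatter."""
    pairings = [((e[0], e[1]), (e[2], e[3])),
                ((e[0], e[2]), (e[1], e[3])),
                ((e[0], e[3]), (e[1], e[2]))]
    return min(pairings, key=lambda p: scatter(p[0], p[1], d))
```

With distances that make R close to 2S and S close to 2T, the second pairing wins, as in the example.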

Computing the fractal dimension for each aggregate of prototypical belief functions yields

    dim(Γ1) = 0.65,  dim(Γ2) = 1.06,  dim(Γ3) = 0.52,  dim(Γ4) = 0.85.

Finally, the interaction dimensions are projected following the procedure outlined in Section III, and the conductivity of each class to EV is then computed. The evidential input has the largest conductivity for class 2, and is therefore classified as an L-shape.

D. Interpretation of the Results

Conductivity depends on the number of attributes of EV that are common to the focal elements of the prototypes, on the beliefs assigned to these focal elements, and on the number of prototypes in each class. The highest interaction dimension of EV (1.37 in Table II) is with BF1 of class 3 (HNDL1), because their common attributes have high bpa's. The next highest interaction dimension (1.03) is with BF3 of class 1 (PYRD3). Class 2 has two prototypes (LSHP1 and LSHP2) with relatively high interaction dimensions (0.88 and 0.91, respectively). These two prototypes have the same number of common attributes (T, R, S) with EV, but the presence of an additional attribute (L) in LSHP1 reduces its relevance to EV.

Table III shows the projected interaction dimensions. The relevance of each prototypical belief function to EV has now been scaled by the ability of the given frame of discernment to view the relevance within its own dimension. The conductivity of the ith class to EV is based on the ability of that class to view the relevance. The dependence of conductivity on the number of prototypes is implicitly normalized by the projection. From Table III, class 2 (L-shape) has the highest conductivity (2.17) to EV, i.e., it has the highest ability to view the interaction of its prototypes with the evidence. Since the interaction dimensions for class 2 (0.88, 0.91, and 0.38) are lower than the dimension of class 2 (1.06), the aggregate of prototypes in class 2 can view the interaction in its integrity. This is not true of class 1, which has the highest interaction dimensions but, due to its low conductivity (0.47), can only partially view these interactions.

IX. DISCUSSION The focus of this paper is on the fractal modeling of belief functions for approximate reasoning with incomplete evidence and imprecise knowledge. The model is built upon the Dempster-Shafer theory of belief functions and the theory of fractal sets, and is aimed at the representation of information at multiple levels of granularity as well as multiple levels of specificity. After a qualitative justification and interpretation of this model, we have formally defined several concepts and tools needed for its incorporation into evidential reasoning. A particularly important concept is that of conductivity, as it provides the basis of partial evidential matching in our


approach to reasoning by analogy. A conductivity analysis algorithm was derived and was illustrated by an application to a simple object classification problem. The fractal model provides potentially powerful mechanisms for i) a quantitative measure of relevance of a piece of evidence to a knowledge base, and ii) a systematic approach to the coarsening and refining of frames of discernment.

REFERENCES

[1] J. Aczel and Z. Daroczy, On Measures of Information and Their Characterizations. New York: Academic Press, 1975.
[2] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum Press, 1981.
[3] R. C. Brost, "Automatic grasp planning in the presence of uncertainty," in Proc. IEEE Conf. Robotics and Automation, 1986, pp. 1575-1581.
[4] P. Diaconis and S. L. Zabell, "Updating subjective probability," J. Amer. Statist. Assn., vol. 77, no. 380, pp. 822-830.
[5] K. J. Falconer, The Geometry of Fractal Sets. Cambridge, U.K.: Cambridge University Press, 1985.
[6] H. Farreny and H. Prade, "Tackling uncertainty and imprecision in robotics," in Robotics Research, O. D. Faugeras and G. Giralt, Eds. Cambridge, MA: MIT Press, pp. 85-92.
[7] P. Garik, R. Richter, J. Hautman, and P. Ramanlal, "Deterministic solutions of fractal growth," Phys. Rev. A, vol. 32, no. 5, pp. 3156-3159.
[8] I. J. Good, "Information, weight of evidence, the singularity between probability measures and signal detection," with D. B. Osteyee, Lecture Notes in Mathematics, pp. 39-53.
[9] J. Gordon and E. H. Shortliffe, "A method for managing evidential reasoning in a hierarchical hypothesis space," Artif. Intell., vol. 26, pp. 323-357, 1985.
[10] P. Grassberger, "Estimating the fractal dimensions and entropies of strange attractors," in Chaos, A. V. Holden, Ed. Princeton, NJ: Princeton University Press, 1986, pp. 291-311.
[11] G. J. Klir and T. A. Folger, Fuzzy Sets, Uncertainty and Information. Prentice Hall, 1988.
[12] S. Kullback, Information Theory and Statistics. Dover, 1959.
[13] J. S. Langer, "Instabilities and pattern formation in crystal growth," Rev. Mod. Phys., vol. 52, no. 1, pp. 1-28, 1980.
[14] B. B. Mandelbrot, The Fractal Geometry of Nature. New York: W. H. Freeman, 1983.
[15] K. Nakamura, A. P. Sage, and S. Iwai, "An intelligent database interface using psychological similarity between data," IEEE Trans. Syst. Man Cybern., vol. SMC-13, no. 4, pp. 558-567, 1983.
[16] J. Nittmann, G. Daccord, and H. E. Stanley, "Fractal growth of viscous fingers: quantitative characterization of a fluid instability phenomenon," Nature, vol. 314.
[17] J. Pearl, Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.
[18] H. O. Peitgen and P. H. Richter, The Beauty of Fractals. Springer-Verlag, 1986.
[19] L. M. Sander, "Fractal growth processes," Nature, vol. 322, pp. 789-793, 1986.
[20] G. Shafer, A Mathematical Theory of Evidence. Princeton, NJ: Princeton University Press, 1976.
[21] J. E. Shore and R. W. Johnson, "Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy," IEEE Trans. Inform. Theory, vol. IT-26, no. 1, pp. 26-36, 1980.
[22] J. E. Shore and R. M. Gray, "Minimum cross entropy pattern classification and cluster analysis," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-4, no. 1, pp. 11-17, 1982.
[23] J. E. Shore, "Relative entropy, probabilistic inference and AI," in Uncertainty in Artificial Intelligence, L. N. Kanal and J. F. Lemmer, Eds. Elsevier Science Publishers, 1986.
[24] H. E. Stephanou and A. M. Erkmen, "Evidential classification of dexterous grasps for the integration of perception and action," J. Robotic Syst., vol. 5, no. 4, pp. 309-336, 1988.
[25] H. E. Stephanou and S. Y. Lu, "Measuring consensus effectiveness by a generalized entropy criterion," IEEE Trans. Pattern Anal. Machine Intell., vol. 10, no. 4, pp. 544-554, 1988.
[26] T. A. Witten and M. E. Cates, "Tenuous structures from disorderly growth processes," Science, vol. 232, pp. 1607-1612, 1986.
[27] R. R. Yager, "Entropy and specificity in a mathematical theory of evidence," Int. J. Gen. Syst., vol. 9, pp. 249-260, 1983.

Aydan M. Erkmen received the B.S.E.E. degree in 1978 from Bogazici University, Turkey, the M.S.E.E. degree in 1981 from Drexel University, Philadelphia, PA, and the Ph.D. degree in 1989 from George Mason University, Fairfax, VA. She is an Assistant Professor of electrical engineering at Middle East Technical University (M.E.T.U.), Turkey. Her current interests include robotics, intelligent control, and chaotic dynamics, with a particular focus on predictive reasoning under uncertainty and sensor fusion. At M.E.T.U., she has developed and taught graduate courses on chaotic dynamics and the dynamics of manipulation with robot hands. Dr. Erkmen is a member of the Turkish Society of Electrical Engineers and Eta Kappa Nu.

Harry E. Stephanou (S'73-M'76-SM'87) received the Ph.D. degree in electrical engineering in 1976 from Purdue University, West Lafayette, IN. From 1976 to 1985, he was with the Exxon Production Research Company, where he headed the Systems Research Section, Long Range Research Division. In 1985, he joined George Mason University's School of Information Technology and Engineering. From 1987 to 1988, he also served as Program Director for Robotics and Machine Intelligence at the National Science Foundation. In June 1990, he joined Rensselaer Polytechnic Institute as Director of the New York State Center for Advanced Technology in Automation and Robotics. His research interests are primarily in the area of intelligent robotic systems and include reasoning under uncertainty, multisensor fusion, dextrous manipulation, and intelligent control. Dr. Stephanou is a senior member of the IEEE, Vice-President of the IEEE Society for Robotics and Automation, Chairman of the IEEE Control Systems Society Technical Committee on Intelligent Control, Associate Editor for the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, and Associate Editor for the IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION. He was General Chairman of the 1988 IEEE International Symposium on Intelligent Control.
