Multilevel Hierarchical Kernel Spectral Clustering for Real-Life Large Scale Complex Networks

Share Embed


Descripción

Multilevel Hierarchical Kernel Spectral Clustering for Real-Life Large Scale Complex Networks Raghvendra Mall*, Rocco Langone, Johan A. K. Suykens ESAT-STADIUS, KU Leuven, Leuven, Belgium

Abstract Kernel spectral clustering corresponds to a weighted kernel principal component analysis problem in a constrained optimization framework. The primal formulation leads to an eigen-decomposition of a centered Laplacian matrix at the dual level. The dual formulation allows to build a model on a representative subgraph of the large scale network in the training phase and the model parameters are estimated in the validation stage. The KSC model has a powerful out-of-sample extension property which allows cluster affiliation for the unseen nodes of the big data network. In this paper we exploit the structure of the projections in the eigenspace during the validation stage to automatically determine a set of increasing distance thresholds. We use these distance thresholds in the test phase to obtain multiple levels of hierarchy for the large scale network. The hierarchical structure in the network is determined in a bottom-up fashion. We empirically showcase that real-world networks have multilevel hierarchical organization which cannot be detected efficiently by several state-of-theart large scale hierarchical community detection techniques like the Louvain, OSLOM and Infomap methods. We show that a major advantage of our proposed approach is the ability to locate good quality clusters at both the finer and coarser levels of hierarchy using internal cluster quality metrics on 7 real-life networks. Citation: Mall R, Langone R, Suykens JAK (2014) Multilevel Hierarchical Kernel Spectral Clustering for Real-Life Large Scale Complex Networks. PLoS ONE 9(6): e99966. doi:10.1371/journal.pone.0099966 Editor: Rodrigo Huerta-Quintanilla, Cinvestav-Merida, Mexico Received March 7, 2014; Accepted May 20, 2014; Published June 20, 2014 Copyright: ß 2014 Mall et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: The authors confirm that all data underlying the findings are fully available without restriction. http://snap.stanford.edu/data/ https://sites. google.com/site/santofortunato/inthepress2. Funding: This work was supported by Research Council KUL: ERC AdG A-DATADRIVE-B, GOA/11/05 Ambiorics, GOA/10/09MaNet, CoE EF/05/006 Optimization in Engineering(OPTEC), IOF-SCORES4CHEM, several PhD/postdoc and fellow grants; Flemish Government: FWO: PhD/postdoc grants, projects: G0226- .06 (cooperative systems & optimization), G0321.06 (Tensors), G.0302.07 (SVM/Kernel), G.0320.08 (convex MPC), G.0558.08 (Robust MHE), G.0557.08 (Glycemia2), G.0588.09 (Brain-machine) G.0377. 12 (structured models) research communities (WOG:ICCoS, ANMMM, MLDM); G.0377.09 (Mechatronics MPC) IWT: PhD Grants, Eureka-Flite+, SBO LeCoPro, SBO Climaqs, SBO POM, O&O-Dsquare; Belgian Federal Science Policy Office: IUAP P6/04 (DYSCO, Dynamical systems, control and optimization, 2007-2011); EU: ERNSI; FP7-HD-MPC (INFSO-ICT-223854), COST intelliCIS, FP7-EMBOCON (ICT-248940); Contract Research: AMINAL; Other:Helmholtz: viCERP, ACCM, Bauknecht, Hoerbiger. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: No companies are involved in the project and the authors declare that this does not alter their adherence to PLOS ONE policies on sharing data and materials. * Email: [email protected]

coarser levels of granularity in this multilevel hierarchical system of the real-life networks. A state-of-the-art hierarchical community detection technique for large scale networks is the Louvain method [15]. It uses a popular quality function namely modularity (Q) [3,5,6,16] for locating modular structures in the network in a hierarchical fashion. Modularity measures the difference between a given partition of a network and the expectation of the same partition for a random network. By optimizing modularity, they obtain the modular structures in the network. However, it suffers from a drawback namely the resolution limit problem [17–19]. The issue of resolution limit arises because the optimization of modularity beyond a certain resolution is unable to identify modules even as distinct as cliques which are completely disconnected from the rest of the network. This is because modularity fixes a global resolution to identify modules which works for some networks but not others. Recently the authors of [20] show that methods trying to use variants of modularity to overcome the resolution limit problem, still suffer from the resolution limit. They propose an alternative algorithm namely OSLOM [21] to avoid the issue of resolution. However, in our experiments we observe that OSLOM works well for benchmark synthetic networks [4] but in case of real-life

Introduction Large scale complex networks are ubiquitous in the modern era. Their presence spans a wide range of domains including social networks, trust networks, biological networks, collaboration networks, financial networks etc. A complex network can be represented as a graph G~(V ,E) where V represent the vertices or nodes and E represents the edges or interaction between these nodes in this network. Many real-life complex networks are scalefree [1], follow the power law [2] and exhibit community like structure. By community like structure one means that nodes within one community are densely connected to each other and sparsely connected to nodes outside that community. The large scale network consists of several such communities. This problem of community detection in graphs has received wide attention from several perspectives [3–14]. The community structure exhibited by the real world complex networks often have an inherent hierarchical organization. This suggests that there should be multiple levels of hierarchy in these real-life networks with good quality clusters at each level. In other words, there exist meaningful communities at refined as well as

PLOS ONE | www.plosone.org

1

June 2014 | Volume 9 | Issue 6 | e99966

Multilevel Hierarchical KSC for Large Scale Complex Networks

Figure 1. Steps undertaken by the MH-KSC algorithm. doi:10.1371/journal.pone.0099966.g001

with meaningful clusters at each level. The KSC method has been used effectively to obtain flat partitioning in real-world networks [24,25,27]. In this paper, we exploit the structure of the eigenprojections derived from the KSC model. The projections of the validation set nodes in the eigenspace is used to create an iterative set of affinity matrices resulting in a set of increasing distance thresholds (T ). Since the validation set of nodes is a representative subset of the large scale network [26], we use these distance thresholds (ti [ T ) on the projections of the entire network obtained as a result of the out-of-sample extension property of the KSC model. These distance thresholds, when applied in an iterative manner, provide a multilevel hierarchical organization for the entire network in a bottom-up fashion. We show that our proposed approach is able to discover good quality coarse as well as refined clusters for real-life networks. There are some methods that optimize weighted graph cut objectives [29–31] to provide multilevel clustering for the large scale network. However, these methods suffer from the problem of determining the right value of k which is user defined. In realworld networks the value of k is not known beforehand. So in our experiments, we evaluate the proposed multilevel hierarchical kernel spectral clustering (MH-KSC) algorithm against the Louvain, Infomap and OSLOM methods. These methods automatically determine the number of clusters (k) at each level of hierarchy. Figure 1 provides an overview of steps involved in the MH-KSC algorithm and Figure 2 depicts the result of our proposed MH-KSC approach on email network (Enron). In all our experiments we consider unweighted and undirected networks. All the experiments were performed on a machine with 12 Gb RAM, 2.4 GHz Intel Xeon processor. The maximum size of the kernel matrix that is allowed to be stored in the memory of our PC is 10,000610,000. Thus, the maximum cardinality of our training and validation sets can be 10,000. We use 15% of the total nodes as size of training and validation set (if less than 10,000) based on experimental findings in [32]. We make use of the procedure provided in [25] to divide the data into chunks in order to extend our proposed approach to large scale networks. There are several steps in the proposed methodology which can be implemented on a distributed environment. We describe this in detail later.

networks it is unable to detect quality coarse clusters. We also evaluate another state-of-the-art hierarchical community detection technique called the Infomap method [7]. The Infomap method uses an information theoretic approach to hierarchical community detection. It uses the probability flow of random walks as a substitute for information flow in real-life networks. It then fragments the network into modules by compressing a description of the probability flow. Spectral clustering methods [10–14] belong to the family of unsupervised learning algorithms where clustering information is obtained by the eigen-decomposition of the Laplacian matrix derived from the affinity matrix (S) for the given data. A drawback of these methods is the construction of the large affinity matrix for the entire data which limits the feasibility of the approach to small sized data. To overcome this problem, a kernel spectral clustering (KSC) formulation based on weighted kernel principal component analysis (kPCA) in a primal-dual framework was proposed in [22]. The weighted kPCA problem is formulated in the primal in the context of least squares support vector machines [23] which results in eigen-decomposition of a centered Laplacian matrix in the dual. As a result, a clustering model is obtained in the dual. This model is build on a subset of the original data and has a powerful out-ofsample extension property. This property allows cluster affiliation for unseen data. The KSC method was applied for community detection in graphs by [24]. However, their subset and model selection approach was computationally expensive and memory inefficient. Recently, the KSC method was extended for big data networks in [25]. The method works by building a model on a representative subgraph of the large scale network. This subgraph is obtained by the fast and unique representative subset (FURS) selection technique as proposed in [26]. During the model selection stage, the model parameters are estimated along with determining the number of clusters k in the network. A self-tuned KSC model for big data networks was proposed in [27]. The major advantage of the KSC method is that it creates a model which has a powerful out-of-sample extensions property. Using this property, we can infer community affiliation for unseen nodes of the whole network. In [28], the authors used multiple scales of the kernel parameter s to determine the hierarchies in the data using KSC approach. However, in this approach the clustering model is trained for different values of (k,s) and evaluated for the entire dataset using the out-of-sample extension property. Then, a map is created to match the clusters at two levels of hierarchy. As stated by the authors in [28], during a merge there might be some data points of the merging clusters that go into a non-merging cluster which is then forced to join the merging cluster of the majority. In this paper, we overcome this problem and generate a natural hierarchical organization of the large scale network in an agglomerative fashion. The purpose of hierarchical community detection is to automatically locate multiple levels of granularity in the network PLOS ONE | www.plosone.org

Kernel Spectral Clustering (KSC) Method We first summarize the notations used in the paper.

Notations 1.

2

A graph is mathematically represented as G~(V ,E) where V represents the set of nodes and E(V |V represents the set of edges in the network. Physically, the nodes represent the entities in the network and the edges represent the relationship between these entities. June 2014 | Volume 9 | Issue 6 | e99966

Multilevel Hierarchical KSC for Large Scale Complex Networks

2. 3. 4. 5.

6.

maxk is the maximum number of eigenvectors that we want to evaluate. 7. K(:,:) represents the positive definite kernel function. 8. The matrix S represents the affinity or similarity matrix. 9. P represents the latent variable matrix containing the eigenprojections. 10. h represents the hth level of hierarchy and maxh stands for the coarsest level of hierarchy.

The cardinality of the set V is denoted as N. The training, validation and test set of nodes is given by Vtr , Vvalid and Vtest respectively. The cardinality of the training, validation and test set is given Ntr , Nvalid , Ntest . The adjacency list corresponding to each vertex vi [V is given by xi ~A( : ,i).

Figure 2. Result of proposed MH-KSC approach on the Enron network. doi:10.1371/journal.pone.0099966.g002

PLOS ONE | www.plosone.org

3

June 2014 | Volume 9 | Issue 6 | e99966

Multilevel Hierarchical KSC for Large Scale Complex Networks

Figure 3. Algorithm 1: MH-KSC Algorithm. doi:10.1371/journal.pone.0099966.g003

11. Set C comprises multilevel hierarchical clustering information. 12. Coarsest level of hierarchy corresponds to fine grained clusters and finer levels of hierarchy correspond to coarse clusters.

KSC methodology Given a graph G, we use the fast and unique representative subset (FURS) selection [26] technique to obtain training and validation set of nodes Vtr and Vvalid . FURS [26] is a deterministic subgraph selection technique where nodes with high degree centrality are greedily selected from most or all the communities in the network. Nodes with high degree centrality are usually located

Figure 4. Algorithm 2: GreedyMaxOrder. doi:10.1371/journal.pone.0099966.g004

PLOS ONE | www.plosone.org

4

June 2014 | Volume 9 | Issue 6 | e99966

Multilevel Hierarchical KSC for Large Scale Complex Networks

Figure 5. Algorithm 3: GreedyFirstOrder. doi:10.1371/journal.pone.0099966.g005 Ntr |Ntr maxk-1 score variables. D{1 is the inverse of the degree V [R matrix associated to the kernel matrix V with     Vij ~K xi ,xj ~wðxi Þ> w xj . W is the Ntr |dh feature matrix h  > i and cl [Rz is the regularsuch that W~ wðx1 Þ> ; . . . ; w xNtr

at the center, away from the periphery of the network and can better capture the inherent community structure. Since our goal is a locate multilevel hierarchical clustering in the large scale network, it is essential that the training and validation set are representative of the underlying community structure of the network. A detailed description of the FURS approach and its comparison with other state-of-the-art subset selection techniques is provided in [26]. We use 15% of the total nodes as size of training and validation set (if less than 10,000 otherwise 10,000 nodes) based on experimental findings in [32]. Firstly, we apply FURS to obtain the training set of nodes Vtr . Once these nodes are selected in the training set we remove these nodes from the network but maintain the topology (degree distribution) of the network. We then apply FURS again to obtain the validation set of nodes Vvalid . Thus, both these sets Vtr and Vvalid are selected such that they retain the inherent community structure of the large scale network. We then use the entire large scale network as the test set Vtest . tr For Vtr training nodes the dataset is given by D~fxi gN i~1 , N xi [R . The adjacency list xi can efficiently be stored into memory as real-world networks are highly sparse and have limited connections for each node vi . Given D and maxk, the primal formulation of the weighted kernel PCA [22] is given by:

min

wðl Þ ,eðl Þ ,bl

such that

X X 1 maxk{1 1 maxk{1 | | ðl Þ wðl Þ wðl Þ { c eðl Þ D{1 V e 2 l~1 2Ntr l~1 l ðl Þ

ization constant. We note that Ntr %N i.e. the number of nodes in the training set is much less than the total number of nodes in the large scale network. The kernel matrix V is constructed by calculating the similarity between the adjacency list of each pair of nodes in the training set. x> x

Each element of V, defined as Vij ~ kx ki xj is calculated by i k jk estimating the cosine similarity between the adjacency lists xi and xj using notions of set intersection and union. This corresponds to > using a normalized linear kernel function K ðx,zÞ~ kxxkkzzk [23]. The primal clustering model is then represented by: ðl Þ

ei ~wðl Þ> wðxi Þzbl ,i~1, . . . ,Ntr ,

ð2Þ

where w : RN ?Rdh is the feature map i.e. a mapping to highdimensional feature space dh and bl are the bias terms, l~1, . . . ,maxk-1. For large scale networks we can utilize the explicit expression of the underlying feature map as shown in [25] and set dh ~N. The dual problem corresponding to this primal formulation is given by: (l) (l) D{1 V MD Va ~ll a ,

ð1Þ

ð3Þ

ðl Þ

e ~Ww zbl 1Ntr ,l~1, . . . ,maxk{1,

where MD is the centering 1 matrix which is defined as 0

h i ðl Þ ðl Þ > where eðl Þ ~ e1 , . . . ,eNtr are the projections onto the eigen-

MD ~INtr {@

space, l~1, . . . ,maxk-1 indicates the number of score variables required to encode the maxk clusters. However, it was shown in [27] that we can discover more than maxk communities using these

and the kernel function K : RN |RN ?R plays the role of similarity function. The dual predictive model is:

PLOS ONE | www.plosone.org

5

1N 1> D{1 tr Ntr V 1> D{1 1N tr Ntr V

A. The a(l) are the dual variables

June 2014 | Volume 9 | Issue 6 | e99966

Multilevel Hierarchical KSC for Large Scale Complex Networks

Figure 6. Result of MH-KSC algorithm on benchmark Net1 network. doi:10.1371/journal.pone.0099966.g006

which provides clustering inference for the adjacency list x corresponding to the validation/test node v.

where CosDist(:,:) function calculates the cosine distance between 2 vectors and takes values between [0,2]. Nodes which belong to the same community will have CosDist(ei,ej ) closer to 0, ∀ i, j in the same cluster. It was shown in [27] that a rotation of the Svalid matrix has a block diagonal structure. This block diagonal structure was used to identify the ideal number of clusters k in the network using the concept of entropy and balanced clusters.

Multilevel Hierarchical KSC

Determining the Distance Thresholds

We use the predictive KSC model in the dual to get the latent variable matrix for the validation set Vvalid represented as Pvalid ~½e1 , . . . ,eNvalid > and the test set Vtest (entire network) denoted by Ptest . In [27] the authors create an affinity matrix Svalid using the latent variable matrix Pvalid which is a Nvalid |(maxk-1) matrix, as:

We propose an iterative bottom-up approach on the validation set to determine the set of distance thresholds T . In our approach, we refer to the affinity matrix at the ground level of hierarchy as (0) (0) . The Svalid matrix is obtained by calculating the CosDist(:,:) Svalid between each element of the latent variable matrix Pvalid as mentioned earlier. After several empirical evaluations, we observe that distance threshold at level 0 of hierarchy can be set to values between [0.1,0.2]. In our experiments we set t(0) ~0:15. This allows to make the approach tractable to large scale networks which will be explained later.

^e(l) (x)~

Ntr X

a(l) i K(x,xi )zbl ,

ð4Þ

i~1

    Svalid ði,j Þ~CosDist ei ,ej ~1{cos ei ,ej ~1{

PLOS ONE | www.plosone.org

e> i ej  , kei kej 

ð5Þ

6

June 2014 | Volume 9 | Issue 6 | e99966

Multilevel Hierarchical KSC for Large Scale Complex Networks

Figure 7. Result of MH-KSC algorithm on benchmark Net2 network. doi:10.1371/journal.pone.0099966.g007

To obtain the clusters at the next level of hierarchy we treat the communities at the previous levels as nodes. We then calculate the average cosine distance between these nodes using the information present in them. At each level h of hierarchy we create a new affinity matrix as:

We then use a greedy approach to select the validation node with maximum number of similar nodes in the latent space i.e we select the projection ei which has a maximum number of (0) projections ej satisfying Svalid (i,j)vt(0) . We put the indices of (0) these nodes in C1 representing the 1st cluster at level 0 of hierarchy. We then remove these nodes and corresponding entries (0) to obtain a reduced matrix. This process is repeated from Svalid (0) iteratively until Svalid becomes empty. Thus, we obtain the set C (0) ~fC1(0) , . . . ,Cq(0) g where q is the total number of clusters at ground level of hierarchy. The set C (0) has communities along with the indices of the nodes in these communities.

PLOS ONE | www.plosone.org

P (h) Svalid (i,j)~

P (h{1) (h{1) (h{1) Svalid (m,l) m[C l[C i j , DCi(h{1) D|DCj(h{1) D

ð6Þ

where D:D represents the cardinality of the set. In order to determine the threshold at level h of hierarchy, we estimate the minimum

7

June 2014 | Volume 9 | Issue 6 | e99966

PLOS ONE | www.plosone.org 0.878 1.0

15

9

1

8

9

10

2.19

0.0

0.324

0.056

-

-

0.0

0.786 5.0e-04

5.01e-04

5.021e-04

0.765

4.834e-04 4.856e-04

1

5

13

44

87

97

103

4.86e-04

106

4.74e-04

112

-

-

0.675

0.669

0.668

0.62

-

-

0.0

0.12

1.0

0.636

0.47

0.53

0.595

0.61

0.625

2.544

1.643

0.0

0.74

0.90

0.77

0.692

0.667

0.643

0.612

VI

0.0

0.376

0.82

0.773

0.722

0.706

0.694

0.691

0.685

0.66

Q

2.0e-05

2.12e-05

2.0e-05

1.99e-05

1.99e-05

1.99e-05

1.98e-05

1.99e-05

1.99e-05

1.98e-05

CC

8 39 9

5

9

2

2

38

6

2

1

8

9

1

32

2

k

1

Level

Net1

0.016 0.0

1.0

0.0

1.0 0.996

0.037

1.965

0.988

0.192

0.132

0.0

0.915

0.215

1.0

VI

0.84

ARI

0.786

0.67

0.786

0.655

0.487

0.771

0.786

0.693

Q

5.01e-04

4.83e-04

5.01e-04

4.839e-04

5.07e-04

5.03e-04

5.01e-04

4.87e-05

CC

10

3

2

1

3

1

3

1

Level

Net2

The best results w.r.t. various quality metrics when compared with the ground truth communities for each benchmark network is highlighted. doi:10.1371/journal.pone.0099966.t002

MH-KSC

OSLOM

Infomap

Louvain

Method

13

134

29

141

13

590

13

135

k

1.0

0.685

0.74

0.0

0.612

0.633

0.214

0.0

1.0 0.96

8.58

0.0

0.396

VI

0.003

1.0

0.853

ARI

Table 2. 2 best level of hierarchy obtained by Louvain, Infomap, OSLOM and MH-KSC methods on Net1 and Net2 benchmark networks.

0.82

0.66

076

0.64

0.82

0.003

0.82

0.687

Q

2.0e-05

1.98e-05

2.08e-05

2.07e-05

2.0e-05

1.98e-05

2.0e-05

1.98e-05

CC

The number of clusters close to the actual number, the best and second best results are highlighted. For Net1 only 7 levels of hierarchy are identified by MH-KSC, rest are represented by ‘-’. The MH-KSC method provides more insight by identifying several meaningful levels of hierarchy with good clusters w.r.t. quality metrics like ARI, VI, Q and CC. doi:10.1371/journal.pone.0099966.t001

0.0

0.965

0.016

0.996

39

0.018

0.11

-

0.996

37

40

5

0.972

-

6

63

4

7

-

3

-

-

0.685

-

-

134

-

-

1

2

CC

ARI

Q

k

VI

k

Hierarchy

ARI

Net2

Net1

Table 1. Number of clusters (k) for top 10 levels of hierarchy by MH-KSC method.

Multilevel Hierarchical KSC for Large Scale Complex Networks

June 2014 | Volume 9 | Issue 6 | e99966

Multilevel Hierarchical KSC for Large Scale Complex Networks

Table 3. Nodes (V), Edges (E) and Clustering Coefficients (CCF) for each network.

Network

Nodes

Edges

CCF

Facebook (Fb)

4,039

88,234

0.6055

PGPnet (PGP)

10,876

39,994

0.008

Cond-mat (Cond)

23,133

186,936

0.6334

Enron (Enr)

36,692

367,662

0.497

Epinions (Epn)

75,879

508,837

0.1378

Imdb-Actor (Imdb)

383,640

1,342,595

0.453

Youtube (Utube)

1,134,890

2,987,624

0.081

doi:10.1371/journal.pone.0099966.t003

level in the test case. In our experiments, we observed that the interval any value between [0.1,0.2] is good choice for the initial threshold value at level 0 of hierarchy. To be consistent we chose t(0) ~0:15 for all the networks.

cosine distance between each individual cluster and the other clusters (not considering itself). Then, we select the mean of these values as the new threshold for that level to combine clusters. This makes the approach different from the classical single-link clustering where we combine two clusters which are closest to each other at a given level of hierarchy and the average-link agglomerative clustering where we combine based on the average distance between all the clusters. The reason for using mean of these minimum cosine distance values as the new threshold is that if we consider the minimum of all the distance values then there is a risk of only combining 2 clusters at that level. However, it is desirable to combine multiple sets of different clusters. Thus, the new threshold t(h) at level h is set as: (h) (i,j))),i=j: t(h) ~mean(minj (Svalid

Multilevel Hierarchical KSC for Test Nodes The validation set is a representative subset of the whole network as shown in [26]. Thus, the threshold set T can be used to obtain a hierarchical clustering for the entire network. To make the proposed approach self-tuned, we use t(i) wt(0) w0:15, i.0, during the test phase. In order to prevent creating the affinity matrix for the large network we follow a greedy procedure. We select the projection of the first test node and calculate its similarity with the projections of all the test nodes. We then locate the indices (j) of those projections s.t. CosDist(e1 ,ej )vt(1) . If the total number of such indices is less than 10,000 then we put them in cluster C1(1) otherwise we select the first 10,000 indices and place them in cluster C1(1) . This is due to the constraint that the size of a cluster (C1(1) ) at ground level cannot exceed 10,000. We then remove entries corresponding to those projections in Ptest to obtain a reduced matrix. We perform this procedure iteratively until Ptest is empty to obtain C (1) ~fC1(1) , . . . ,Cr(1) g where r is the total number of clusters at hierarchical level 1. After the 1st level, we use the same procedure that was for validation set i.e. creating an affinity matrix at each level using the cluster information along with the threshold set T to obtain the hierarchical structure in an agglomerative fashion. The cluster memberships are propagated iteratively from the 1st level to the highest level of hierarchy. The multilevel hierarchical kernel spectral clustering (MH-KSC) method is described in Figure 3 which refers to Algorithm 2 and Algorithm 3 in Figure 4 and Figure 5 respectively.

ð7Þ

We use this process iteratively till we reach the coarsest cluster where we have 1 cluster containing all the nodes. As a consequence we obtain the hierarchical clustering C ~fC (0) , . . . ,C (maxh) g automatically. As we move from one level of hierarchy to another the value of distance threshold increases since we are merging large clusters at coarser levels of hierarchy. We finally end up with a set of increasing distance thresholds T ~ft(0) , . . . ,t(maxh) g.

Requirements for Feasibility to Large Scale Networks The whole large scale network is used as test set. The latent variable matrix for the test set is obtained by out-of-sample extensions of the predictive KSC model and defined as Ptest ~½e1 , . . . ,eNtest > . Since we use the entire network as test set, therefore, Ntest ~N. The Ptest matrix is a N|(maxk-1) dimensional matrix. So, we can store this Ptest matrix in memory but cannot create an affinity matrix of size N|N due to memory constraints. To make the approach feasible to large scale network we put a condition that the maximum size of a cluster at ground level cannot exceed 10,000 (depending on the available computer memory) and the maximum number of clusters allowed at the ground level is 10,000. This limits the size of the affinity matrix at that level of hierarchy to be less than 10,000610,000. It also effects the choice of the initial value of the distance threshold t(0) . If we set t(0) too high (&0:2) then majority of the nodes at the ground level in the test case will fall in one community resulting in one giant connected component. If we set the value of t(0) too low (%0:1) then we will end up with lot of singleton clusters at the ground PLOS ONE | www.plosone.org

Time Complexity Analysis The two steps in our proposed approach which require the maximum computation time are the out-of-sample extensions for the test set and the creation of the affinity matrix from the ground level clusters. Since we use the entire network as test set the time required for out-of-sample extension is O(Ntr |N). Our greedy procedure to obtain the clustering information at the ground level C (1) requires O(r|N) computations where r is the number of clusters at 1st level of hierarchy for the test set. This is because for each cluster C1(1) [C (1) we remove all the indices belonging in that cluster from the matrix Ptest . As a result the size of Ptest decreases till it reduces to zero resulting in O(r|N) computations. The affinity matrix 9

June 2014 | Volume 9 | Issue 6 | e99966

PLOS ONE | www.plosone.org

10 0.439 3.0e-07

0.524 2.65e-07

2185

CC

9984

k

2.78e-06

0.47

1609

3.1e-06

Q

0.357 1.43e-06

Q

7431

k

CC

1.4e-06

CC

0.156

3133

8808

k 0.105

3.18e-05

Q

0.388

0.30 1.19e-05

1002

2.6e-05

0.567

1171

9.84e-05

Q

2208

k

274 0.693

CC

2.49e-05

CC

2676

k 0.5

8.48e-05

CC

Q

0.682

345

k

Q

1.56e-04

2.47e-05

CC

192 0.764

0.604

358

k

Level 2

Q

Level 1

Metrics

1.3e-06

0.679

529

2.79e-06

0.473

890

6.4e-06

0.158

1964

3.1e-05

0.444

464

3.7e-05

0.586

621

5.88e-05

0.705

202

2.38e-04

0.769

152

Level 3

The best results corresponding to each metric for individual networks are highlighted. doi:10.1371/journal.pone.0099966.t004

Utube

Imdb

Epn

Enr

Cond

PGP

Fb

Network

Hierarchical Organization

0.427

351

0.599 1.0e-06

0.682 2.4e-06

180

4.24e-06

0.503

313

9.5e-06

0.184

0.508

7.6e-06

0.491

131

1.03e-5

0.486

100

6.42e-06

0.521 5.6e-06

130

1.07e-05

0.483

71

1.99e-06

0.514

72

9.0e-06

0.184 7.0e-06

97 0.186

1.651e-04

0.325

76

3.45e-05

0.582

58

2.33e-05

0.306

46

7.46e-06

0.513

46

2.42e-05

0.146

66

2.56e-05

0.328

59

1.43e-05

0.574

41

4.13e-04

0.701

1.07e-04

24

0.729

2.44e-04

0.821

37

Level 9

46

1.76e-04

0.818

43

Level 8

166

2.2e-03

0.43

119

2.37e-05

0.582

80

1.0e-04

0.728

59

2.16e-04

0.812

71

Level 7

200

1.26e-05

0.183

220

2.69e-04

0.454 7.04e-05

163

211

5.86e-05

0.614

0.615 3.6e-05

102

8.03e-05

0.727

83

1.63e-04

0.81

90

Level 6

171

7.2e-05

0.725

129

1.95e-04

0.792

105

Level 5

274

5.6e-06

0.485

468

7.0e-06

0.176

957

5.3e-05

0.451

303

3.52e-05

0.611

324

1.38e-04

0.715

156

1.91e-04

0.789

121

Level 4

Table 4. Results on MH-KSC algorithm on 7 real-life networks using quality metrics Q and CC.

1.55e-04

0.303

26

9.2e-07

0.406

21

7.8e-06

0.006

26

5.46e-05

0.271

48

1.4e-05

0.515

24

4.89e-05

0.698

19

2.4e-04

0.83

21

Level 10

Multilevel Hierarchical KSC for Large Scale Complex Networks

June 2014 | Volume 9 | Issue 6 | e99966

PLOS ONE | www.plosone.org

11 2.22e-06

1.38e-06

CC

11587

3.25e-06

0.714

6964

1.0e-06

0.711

33623

k

0.727

1.0e-06

4544

4.25e-06

0.323

3.98e-06

0.715

6450

1.85e-06

0.729

3910

5.57e-06

0.324

1325

1.88e-05

1.28e-05 1574

0.608

0.546

1433

2.97e-05

1.56e-05 4001

0.7

0.56

1825

8.66e-05

4.95e-05 6732

0.857

4.06e-06

0.715

6369

2.5e-06

0.729

3815

6.75e-06

0.324

1301

4.58e-05

0.613

1237

3.49e-05

0.731

1066

6.8e-05

0.882

154

1.33e-04

9.88e-05 566

0.846

155

Level 5

0.82

225

Level 4

0.705

2392

-

-

-

Level 3

0.591

0.696

-

22613

4.2e-06

0.319

2818

-

-

-

-

-

-

-

-

-

-

-

-

Level 2

Q

-

CC

-

k

Q

1.86e-06

CC

0.287

10351

k

Q

-

CC

-

k -

-

CC

Q

-

-

k

Q

-

CC

k -

-

CC

Q

-

-

k

Q

Level 1

Metrics

The best results are highlighted and ‘-’ is used in of absence of available partitions. doi:10.1371/journal.pone.0099966.t005

Utube

Imdb

Epn

Enr

Cond

PGP

Fb

Network

Hierarchical Organization

Table 5. Results of Louvain method on 7 real-life networks indicating the top 6 levels of hierarchy.

9.96e-06

0.715

6364

2.82e-06

0.729

3804

1.13e-05

0.324

1300

6.48e-05

0.614

1230

4.15e-05

0.732

1011

1.0e-04

0.884

100

1.32e-04

0.847

151

Level 6

Multilevel Hierarchical KSC for Large Scale Complex Networks

June 2014 | Volume 9 | Issue 6 | e99966

PLOS ONE | www.plosone.org

12

Level 2

1.77e-05

8.39e-04

1.83e-05 14170

CC

k

1.23e-05

-

0.698 5.56e-06

0.035 1.38e-06

Q

CC

1.52e-06

0.396

10703

k

18539

4.72e-06

1.23e-06

CC 976

0.707

0.04

Q

-

14308

k

-

4.63e-05

3.97e-06

CC

0.162

1693

-

3238

4.48e-04

5.3e-06

Q

50

0.151

0.015

Q

-

1920

k

-

2.78e-05

1.71e-05

CC

0.483

1084

0.027

0.648

Q

4092

1.74e-04

173

1009

k

1.40e-04

1.66e-04

CC

0.748

0.041

Q

431

85

k

-

65

2.3e-04

2.86e-05

0.862

0.763

0.055

CC

-

Level 1 -

131

Q

Level 1 325

Metrics

k

Hierarchical Info

Hierarchical Info

The best results for each method corresponding to each network is highlighted and ‘-’ represent not applicable cases. doi:10.1371/journal.pone.0099966.t006

Utube

Imdb

Epn

Enr

Cond

PGP

Fb

Network

OSLOM

Infomap

Table 6. Results of Infomap and OSLOM methods.

Level 2

3.1e-07

0.53

6547

1.35e-06

0.045

7469

9.75e-06

0.226

584

1.75e-05

0.317

3149

2.48e-05

0.574

2211

5.32e-05

0.799

143

2.0e-04

0.045

161

Level 3

2.72e-07

0.588

4184

2.03e-06

0.092

2639

2.45e-05

0.239

206

4.96e-05

0.382

2177

3.04e-05

0.615

1745

2.06e-04

0.709

51

2.0e-04

0.133

50

Level 4

6.1e-06

0.487

2003

7.95r-06

0.1

2017

8.2e-06

0.098

30

9.92e-05

0.412

2014

6.56e-05

0.615

1613

1.56e-04

0.709

48

3.0e-04

0.352

27

Level 5

5.69e-06

0.027

1908

1.17e-05

0.115

2082

7.9e-06

0.019

25

7.22e-05

0.442

1970

1.16e-05

0.05

1468

6.64e-05

0.709

45

3.0e-04

0.415

21

Multilevel Hierarchical KSC for Large Scale Complex Networks

June 2014 | Volume 9 | Issue 6 | e99966

Multilevel Hierarchical KSC for Large Scale Complex Networks

Figure 8. Tree based visualization of the multilevel hierarchical organization prevalent in 2 real-life networks. doi:10.1371/journal.pone.0099966.g008 (1) Stest is a symmetric matrix so we only need to compute the upper or the lower triangular matrix. The number of cluster-cluster r|(r{1) where the size of similarities that we have to calculate is 2 each cluster at ground level can be maximum 10,000.

PLOS ONE | www.plosone.org

However, as shown in [25], we can perform the out-of-sample extensions in parallel on n computers and rows of the affinity matrix can also be calculated in parallel thereby reducing the 1 complexity by . n 13

June 2014 | Volume 9 | Issue 6 | e99966

Multilevel Hierarchical KSC for Large Scale Complex Networks

Figure 9. MH-KSC algorithm for the PGP network. Communities with same colour belong to one cluster. doi:10.1371/journal.pone.0099966.g009

evaluate the communities obtained by our proposed MH-KSC approach using an external quality metric like Adjusted Rand Index (ARI) and Variation of Information (VI) [33]. We also evaluate the cluster information using internal cluster quality metrics like Modualrity (Q) [3] and Cut-Conductance (CC) [29]. We compare MH-KSC with Louvain, Infomap and OSLOM. Figures 6 and 7 showcase the result of MH-KSC algorithm on the Net1 and Net2 respectively. From Figures 6a and 7a, we observe the affinity matrices generated corresponding to the test set for Net1 and Net2 respectively. From Figures 6b and 7b, we can observe the communities prevalent in the original network and the communities estimated by MH-KSC method for Net1 and Net2 respectively. In Net1 there are 9 macro communities and 37

Experimental Results We conducted experiments on 2 synthetic datasets obtained from the toolkit in [4] and 7 real-world networks obtained from Stanford SNAP library (http://snap.stanford.edu/data/index. html).

Synthetic Network Experiments The synthetic networks are referred as Net1 and Net2 and have 2,000 and 50,000 nodes respectively. The ground truth for these 2 benchmark networks are known at 2 levels of hierarchy. These 2 levels of hierarchy for these benchmark networks are obtained by using 2 different mixing parameters i.e. m1 and m2 for macro and micro communities. We fixed m1 ~0:1 and m2 ~0:2 in our experiments. Since the ground truth is known beforehand, we PLOS ONE | www.plosone.org

14

June 2014 | Volume 9 | Issue 6 | e99966

Multilevel Hierarchical KSC for Large Scale Complex Networks

Figure 10. Results of Louvain, Infomap and OSLOM methods for PGP network. doi:10.1371/journal.pone.0099966.g010

micro communities while in Net2 there are 13 macro communities and 141 micro communities as depicted by Figures 6b and 7b. Table 1 illustrates the first 10 levels of hierarchy for Net1 and Net2 and evaluates the clusters obtained at each level of hierarchy w.r.t. quality metrics ARI, VI, Q and CC. Higher values of ARI (close to 1) and lower values of VI (close to 0) represent good quality clusters. Both these external quality metrics are normalized as shown in [33]. Higher values of modularity (Q close to 1) and lower values of cut-conductance (CC close to 0) indicate better clustering information.

PLOS ONE | www.plosone.org

Table 2 provides the result of Louvain, Infomap and OSLOM methods and compares it with the best levels of hierarchy for Net1 and Net2 . The Louvain, Infomap and OSLOM methods require multiple runs as in each iteration they result in a different partition. We perform 10 runs and report the mean results in Table 2. From Table 2, it can be observed that the best results for Louvain and Infomap methods generally occur at finer levels of hierarchy w.r.t. to ARI, VI and Q metric. Thus, these two methods work well to identify macro communities. The Louvain method works the better than MH-KSC for Net2 at macro and micro

15

June 2014 | Volume 9 | Issue 6 | e99966

Multilevel Hierarchical KSC for Large Scale Complex Networks

Figure 11. Representing the 2 best levels of hierarchy for Epn network w.r.t. modularity criterion. doi:10.1371/journal.pone.0099966.g011

level. However, it cannot obtain similar quality micro communities when compared with MH-KSC method for Net1 as inferred from Table 2. The Infomap method performs the worst among all the methods w.r.t. detection of communities at coarser levels of granularity. OSLOM performs well w.r.t. to locating both macro communities for Net1 and micro communities for Net2 as observed from Table 2. It performs better than any method w.r.t. locating micro communities for Net2 w.r.t. ARI and VI metric. However, it performs worst while trying to identify the macro communities for the same benchmark network. The MHKSC performs best on Net1 while it performs better w.r.t. locating macro communities for Net2 .

about topological characteristics of these real-life networks. The Fb and Epn networks are social networks, PGP is a trust based network, Cond is a collaboration network between researchers, Enr is an email network, Imdb is an actor-actor collaboration network and Utube is a web graph depicting friendship between the users of Youtube. In case of real-life networks the true hierarchical structure is not known beforehand. Hence, it is important to show whether they exhibit hierarchical organization which can be tested by identifying good quality clusters w.r.t. internal quality metrics like Q and CC at multiple levels of hierarchy. We showcase the results for 10 levels of hierarchy in a bottomup fashion for the MH-KSC method in Table 4. The finest level of hierarchy has all nodes in one community and is not very insightful. Clusters at finer levels of granularity comprises giant connected components. So, it is more meaningful to give more emphasis to fine grained clusters at coarser levels of hierarchy. To

Real-Life Network Experiments We experimented on 7 real-life networks from the Stanford SNAP datasets. These networks are anonymous networks and are converted to undirected and unweighted networks before performing experiments on them. Table 3 provides information PLOS ONE | www.plosone.org

16

June 2014 | Volume 9 | Issue 6 | e99966

Multilevel Hierarchical KSC for Large Scale Complex Networks

Figure 12. Representing the 2 best levels of hierarchy for Epn network w.r.t. cut-conductance criterion. doi:10.1371/journal.pone.0099966.g012

2 levels of hierarchy is quite drastic. This reflects that the Infomap method is not very consistent w.r.t. various quality metrics. We compare the performance of MH-KSC method with OSLOM in detail. From Tables 4 and 5 we observe that the MH-KSC technique outperforms OSLOM w.r.t. both quality metrics for Fb, Enr, Imdb and Utube networks while OSLOM does the same only for Cond network. In case of PGP, Cond and Epn networks OSLOM results in better Q than MH-KSC. However, MH-KSC approach has better CC value for PGP and Epn networks. For large scale networks like Enr, Imdb and Utube, OSLOM cannot identify good quality coarser clusters i.e. number of clusters detected are always .1000.

show that real-life networks exhibit hierarchy we evaluate our proposed MH-KSC approach in Table 4. We compare MH-KSC algorithm with Louvain [15], Infomap [7] and OSLOM [21]. We perform 10 runs for each of these methods as they generate a separate partition each time when they are executed. The mean results of Louvain method is reported in Table 5. Table 6 showcases the results for Infomap and OSLOM method. From Table 5 it is evident that the Louvain method works best w.r.t. the modularity (Q) criterion. This aligns with methodology as it is trying to optimize for Q. However, the Louvain method always performs worse than MH-KSC algorithm w.r.t. cut-conductance CC as observed from Tables 4 and 5. Another issue with the Louvain method is that except for the Fb and PGP networks it is not able to detect (,1000 clusters) high quality clusters at finer levels of granularity. This is attributed to the resolution limit problem suffered by Louvain method. From Table 6 we observe that the Infomap method produces only 2 levels of hierarchy. In most of the cases, the clusters at one level of hierarchy perform good w.r.t. only 1 quality metric except the PGP and Cond networks. The difference between the quality of the clusters at the

PLOS ONE | www.plosone.org

Visualization and Illustrations We provide a tree based visualization of the multilevel hierarchical organization for Fb and Enr networks in Figure 8. The hierarchical structure is depicted as tree for Fb and Enr network in Figures 8a and 8b respectively. We plot the results corresponding to fine, intermediate and coarse levels of hierarchy for PGP network using the software provided in [21]. The software requires all the nodes in the

17

June 2014 | Volume 9 | Issue 6 | e99966

Multilevel Hierarchical KSC for Large Scale Complex Networks

network along with 2 levels of hierarchy. In Figure 9 we plot the results for PGP net corresponding to MH-KSC algorithm using 2 fine, 4 intermediate and 2 coarse levels of the hierarchical organization. For Louvain method we use 3rd and 4th level of hierarchy as inputs for the fine clusters, 4th and 5th level of hierarchy as inputs for intermediate clusters and 5th and 6th level of hierarchy as inputs for plotting coarsest clusters. The Infomap method only generates 2 level of hierarchy which correspond to a plot for coarse clusters. Similarly, for OSLOM we plot coarse and fine clusters. The results for Louvain, Infomap and OSLOM methods are depicted in Figure 10. Figures 9 and 10 show that MH-KSC algorithm allows to depict richer structures than the other methods. It has more flexibility and allows the visualization at coarser, intermediate and finer levels of granularity. From Figures 10a, 10b, 10c and Table 5, we observe that the Louvain method can only detect quality clusters at coarser levels of granularity and cannot detect less than 1,00 communities. While the Infomap method can only locate giant connected components for the PGP network as observed from Figure 10d and Table 6. The OSLOM method also seems to work reasonably well as observed from Figures 10e and 10f. However, it detects fewer levels of hierarchy and thus has less flexibility in terms of selection for the level of hierarchy than the proposed MHKSC approach. We provide a visualization of the 2 best layers of hierarchy for Epn network based on the Q and the CC criterion for MH-KSC,

Louvain, Infomap and OSLOM methods respectively in Figures 11 and 12. The result for Infomap method in both the figures is the same as it only generates 2 levels of hierarchy.

Conclusions We proposed a new multilevel hierarchical kernel spectral clustering (MH-KSC) algorithm. The approach relies on the KSC primal-dual formulation and exploits the structure of the projections in the eigenspace. The projections of the validation set provided a set (T) of increasing distance thresholds. These distance thresholds were used along with affinity matrix obtained from the projections in an iterative procedure to obtain a multilevel hierarchical organization in a bottom-up fashion. We highlighted some of the necessary conditions for the feasibility of the approach to large scale networks. We showed that many reallife networks exhibit hierarchical structure. Our proposed approach was able to identify good quality clusters for both coarse as well as fine levels of granularity. We compared and evaluated our MH-KSC approach against several state-of-the-art large scale hierarchical community detection techniques.

Author Contributions Conceived and designed the experiments: RM RL JS. Performed the experiments: RM. Analyzed the data: RM. Contributed reagents/ materials/analysis tools: RM RL. Wrote the paper: RM.

References 18. Kumpula M, Saramaki J, Kaski K, Kertesz J (2007) Limited resolution and multiresolution methods in complex network community detection. Fluctuation Noise Letters, 7(209). 19. Good H, Montjoye AD, Clauset A (2010) Performance of modularity maximization in practical contexts. Physical Review E, 81(046106). 20. Lanchichinetti A, Fortunato S (2011) Limits of modularity maximization in community detection. Physical Review E, 84(066122). 21. Lanchichinetti A, Radicchi F, Ramasco J, Fortunato S (2011) Finding statistically significant communities in networks. PLOS ONE, 6(e18961). 22. Alzate C, Suykens JAK (2009) Multiway spectral clustering with out-of-sample extensions through weighted kernel PCA. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(2):335–347. 23. Suykens JAK, Gestel TV, Brabanter JD, Moor BD, Vandewalle J (2002) Least squares support vector machines. World Scientific. 24. Langone R, Alzate C, Suykens JAK (2012) Kernel spectral clustering for community detection in complex networks. In IEEE WCCI/IJCNN, pp. 2596– 2603. 25. Mall R, Langone R, Suykens JAK (2013) Kernel Spectral Clustering for Big Data Networks, Entropy (Special Issue: Big Data), 15(5):1567–1586. 26. Mall R, Langone R, Suykens JAK (2013) FURS: Fast and Unique Representative Subset selection retaining large scale community structure. Social Network Analysis and Mining, 3(4):1075–1095. 27. Mall R, Langone R, Suykens JAK (2013) Self-Tuned Kernel Spectral Clustering for Large Scale Networks. In Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2013), pp. 385–393. 28. Alzate C, Suykens JAK (2012) Hierarchical kernel spectral clustering. Neural Networks, 35:21–30. 29. Dhillon I, Guan Y, Kulis B (2007) Weighted Graph Cuts without Eigenvectors: A Multilevel Approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(11):1944–1957. 30. Karypis G, Kumar V (1999) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Scientific Computing, 20(1):359–392. 31. Kushnir D, Galun M, Brandt A (2006) Fast multiscale clustering and manifold identification. Pattern Recognition, 39(10):1876–1891. 32. Leskovec J, Faloutsos C (2006) Sampling from large graphs. KDD, pp. 631–636. 33. Rabbany R, Takaffoli M, Fagnan J, Zaiane OR, Campello RJGB (2012) Relative Validity Criteria for Community Mining Algorithms. International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 258–265.

1. Baraba´si A, Albert R (1999) Emergence of scaling in random networks. Science, 286(5439):509–512. 2. Clauset A, Rohilla SC, Newman ME (2009) Power-law distribution in empirical data. SIAM Review, 51:661–703. 3. Girvan M, Newman ME (2002) Community structure in social and biological networks. PNAS, 99(12):7821–7826. 4. Fortunato S (2009) Community detection in graphs. Physics Reports 486:75– 174. 5. Danaon L, Dia´z-Guilera A, Duch J, Arenas A (2005) Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment, 09(P09008+). 6. Clauset A, Newman ME, Moore C (2004) Finding community structure in very large scale networks. Physical Review E, 70(066111). 7. Rosvall M, Bergstrom C (2008) Maps of random walks on complex networks reveal community structure. PNAS, 105:1118–1123. 8. Schaeffer S (2006) Algorithms for Nonuniform Networks. Phd thesis, Helsinki University of Technology. 9. Lancichinetti A, Fortunato S (2009) Community detection algorithms: a comparitive analysis. Physical Review E, 80(056117). 10. Ng AY, Jordan MI, Weiss Y (2002) On spectral clustering: analysis and an algorithm. Dietterich, TG, Becker, S, Ghahramani, Z, editors. MIT Press: Cambridge, MA, In Proceedings of the Advances in Neural Information Processing Systems, pp. 849–856. 11. Shi J, Malik J (2000) Normalized cuts and image segmentation.IEEE Transactions on Pattern Analysis and Intelligence, 22(8):888–905. 12. von Luxburg U (2007) A tutorial on Spectral clustering. Stat. Comput, 17:395– 416. 13. Chung FRK (1997) Spectral Graph Theory. American Mathematical Society. 14. Zelnik-Manor L, Perona P (2005) Self-tuning spectral clustering. Saul, LK, Weiss, Y, Bottou, L, editors; MIT Press: Cambridge, MA. Advances in Neural Information Processing Systems, pp. 1601–1608. 15. Blondel V, Guillaume J, Lambiotte R, Lefebvre L (2008) Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 10:P10008. 16. Newman ME (2004) Analysis of weighted networks. Physical Review E, 70(056131). 17. Fortunato S, Barthe´lemy M (2007) Resolution limit in community detection. PNAS, 104(36).

PLOS ONE | www.plosone.org

18

June 2014 | Volume 9 | Issue 6 | e99966

Lihat lebih banyak...

Comentarios

Copyright © 2017 DATOSPDF Inc.