Entropy Based Community Detection in Augmented Social Networks

J. Cruz (1), C. Bothorel (1), F. Poulet (2)

(1) LUSSI Department, Telecom – Bretagne, France
(2) IRISA, Rennes 1 University, France

Outline
   1     Introduction
            Motivation
            Related Work
            Augmented Networks
   2     Clustering Algorithm
   3     Experiments and Results
   4     Conclusions

page 2         CRUZ, BOTHOREL, POULET             Entropy Based Community Detection
Motivation

A social network is composed of actors, persons or organizations, and the links between them.
Social networks have been “simplified” to fit into graph structures, leaving behind any additional information...
That information corresponds to the semantic, and yet social, aspects of the network.

[Figure: an augmented network, with node attributes]

The question is:
How can we use both the graph and the social information to detect communities?

Related Work

Data Clustering
Unsupervised clustering algorithms using some (dis)similarity measure between points in some n-dimensional space.
    Hierarchical clustering [1].
    k-means, fuzzy c-means [1].
    Self-organizing maps [2].

Community Detection
Algorithms designed to find community structures in graphs using information from edges.
    Modularity optimization: Newman [3], Blondel [4] ...
    Overlapping communities using GAs: Pizzuti [5] ...
    Community detection using attributes and structural information [6].

Quality Measures / Data Types

Type     Objective                                         Examples
Data     Reduce the distance between the members of the    Manhattan L1, Euclidean L2,
         same group while the distance between groups      Chebyshev L∞, Entropy H
         is increased.
Graphs   Increase the number of edges within each          Coverage γ, Conductance ϕ,
         community while the number of edges between       Performance perf, Modularity Q
         communities is reduced.

The selected measures:
Entropy measures the disorder of each group: the more similar the objects, the more ordered the group (i.e. the lower its entropy).
Modularity measures the fraction of edges falling into the groups minus the expected fraction of such edges if the edges were placed at random [7].

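As a rough illustration of the modularity measure, the sketch below computes Q for an undirected, unweighted graph in the Newman and Girvan form: for each community, the fraction of edges inside it minus the squared fraction of edge ends attached to it. The function name `modularity` and the edge-list and dictionary inputs are assumptions made for this example, not part of the original slides.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Modularity Q of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each edge listed once (hypothetical input format).
    community_of: dict mapping each node to its community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends inside community c
    degree_sum = defaultdict(int)  # sum of node degrees inside community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Observed fraction of internal edges minus the fraction expected at random.
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Two triangles joined by a single edge, split into the two triangles.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, two_groups))
```

Putting every node in a single community gives Q = 0 with this definition, which is why a partition only scores well when it has more internal edges than chance would predict.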
Semantic Information

Given an augmented network G(V, E, F_V):
    Given a subset of features of the nodes: $F_V^* \in \mathcal{P}(F_V)$,
    Each node is associated with a vector $\xi$ of f attributes, $\xi \in \mathbb{R}^f$.
    The union of all the vectors $\xi_{F_V^*}$ is the vectorial representation of the node set: $AS_{F_V^*}$

Node   Attr 1   Attr 2   ...   Attr f
1      ξ11      ξ12      ...   ξ1f
2      ξ21      ξ22      ...   ξ2f
...    ...      ...      ...   ...
n      ξn1      ξn2      ...   ξnf

The attribute set AS is the matrix representation of the augmented information from the network.

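A minimal sketch of how such an attribute matrix could be assembled for a chosen feature subset. The function name `attribute_matrix`, the dictionary layout and the numeric encoding of the attributes are all assumptions made for illustration; the slides do not say how categorical attributes are encoded.

```python
def attribute_matrix(node_features, selected):
    """Build the n x f matrix for the selected feature subset.

    node_features: dict node_id -> dict feature_name -> numeric value
                   (hypothetical layout, values assumed already encoded as numbers).
    selected: list of feature names, the chosen subset of all node features.
    Returns (node_ids, matrix) with one row per node, in sorted node order.
    """
    node_ids = sorted(node_features)
    matrix = [[float(node_features[n][name]) for name in selected]
              for n in node_ids]
    return node_ids, matrix

features = {
    2: {"gender": 1, "major": 12, "year": 2006},
    1: {"gender": 0, "major": 7, "year": 2005},
}
ids, rows = attribute_matrix(features, ["gender", "major"])
print(ids, rows)
```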
Data Entropy

Given a group C of N = |C| elements, the entropy H(C) of the group is given by:

$$H(C) = -\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \left[ s_{ij} \ln s_{ij} + (1 - s_{ij}) \ln (1 - s_{ij}) \right]$$

where $s_{ij}$ is a similarity measure of nodes i and j.

Similarity measures?
Entropy measures the (dis)order of a partition; however, it is necessary to calculate the distance between the nodes. This is done using metrics such as the cosine distance and the Jaccard distance, among others.

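The entropy of a group can be sketched directly from the double sum over node pairs. The code below uses cosine similarity as the similarity measure; the function names, the clamping of similarities to [0, 1] and the convention 0 ln 0 = 0 are assumptions of this example rather than details given in the slides.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two attribute vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        return 0.0
    return dot / (nx * ny)

def pair_term(s):
    """s ln s + (1 - s) ln(1 - s), with 0 ln 0 taken as 0 (assumed convention)."""
    s = min(max(s, 0.0), 1.0)  # clamp: guards against rounding and negative cosines
    total = 0.0
    if s > 0.0:
        total += s * math.log(s)
    if s < 1.0:
        total += (1.0 - s) * math.log(1.0 - s)
    return total

def group_entropy(vectors, similarity=cosine_similarity):
    """H(C): minus the sum of pair terms over all unordered pairs i < j."""
    n = len(vectors)
    return -sum(pair_term(similarity(vectors[i], vectors[j]))
                for i in range(n - 1) for j in range(i + 1, n))

print(group_entropy([(1.0, 0.0), (1.0, 0.0)]))  # identical members
print(group_entropy([(1.0, 0.0), (1.0, 1.0)]))  # partially similar members
```

Each pair term is largest in magnitude at a similarity of 0.5 and vanishes at 0 and 1, so a group whose members are all identical has zero entropy under this formula.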
General Architecture

This is the general architecture of the algorithm, which finds communities using structural and semantic criteria at the same time, both extracted from G(V, E, F_V).

[Diagram: Augmented Network]

Using the social graph G(V, E), the algorithm finds a first partition C0 with optimal modularity.

[Diagram: Augmented Network, G, Modularity Optimization (First Step), First Partition C0]

The modularity optimization starts from the structure of the social network, in which each node is a community.
The algorithm takes a random node and puts it into a random community. If the movement increases the modularity, the node is assigned to that community; otherwise the node is returned.
The result is the partition C0.

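The random-move step described above can be sketched as follows. This is a naive illustration that follows the wording of the slide and recomputes the modularity from scratch after every trial move; it is not the optimized procedure of [4]. The function names, the fixed number of trials and the way a random community is chosen (the community of a random node) are assumptions.

```python
import random
from collections import defaultdict

def modularity(edges, community_of):
    """Modularity Q of an undirected, unweighted graph given as an edge list."""
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

def random_move_partition(edges, nodes, trials=2000, seed=0):
    """Start with one community per node, then try random moves (assumed stop rule: trial budget)."""
    rng = random.Random(seed)
    nodes = list(nodes)
    community_of = {n: n for n in nodes}  # each node is its own community
    best = modularity(edges, community_of)
    for _ in range(trials):
        node = rng.choice(nodes)
        # Random existing community, picked as the community of a random node.
        target = community_of[rng.choice(nodes)]
        previous = community_of[node]
        if target == previous:
            continue
        community_of[node] = target
        candidate = modularity(edges, community_of)
        if candidate > best:
            best = candidate                # keep the move
        else:
            community_of[node] = previous   # return the node
    return community_of, best

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition, q = random_move_partition(edges, range(6), trials=500, seed=1)
print(partition, q)
```

Because a move is kept only when Q strictly increases, the returned modularity can never be lower than that of the starting partition.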
The entropy optimization algorithm uses the partition C0 as initial configuration and the point of view $PoV_{F_V^*}$ from the augmented network to move nodes across the groups.

[Diagram: Augmented Network, G, Modularity Optimization (First Step), First Partition C0; $AS_{F_V^*}$, Entropy Optimization, Entropy Partition $C_H$]

Given an initial partition C0 from the first step of the modularity optimization, take a random point and insert it into a random group.
If the entropy is reduced, leave the point in its new group; otherwise take the point back to its original group.
The result is the partition $C_H$.

The partition $C_H$ has the same number of groups as C0 but with a different configuration. The modularity optimization algorithm will continue with $C_H$.

[Diagram: Augmented Network, G, Modularity Optimization (First Step), First Partition C0; $AS_{F_V^*}$, Entropy Optimization, Entropy Partition $C_H$; Community Aggregation, Final Partition $C_k$]

[Diagram: Community Detection, Entropy Optimization, Community Aggregation. Adapted from [4].]

Experimental Setup

Data used:
Each graph in this data set contains a set of semantic information for each node:
    Student faculty
    Gender
    Major
    Second major/minor
    House
    Year
    High school

The graph contains 6386 nodes and 435324 edges. It has an initial modularity of −2.8629 × 10^−4.
In each case, the initial entropy has been calculated using different criteria:

AS   Feature   H0       Classes
1    Gender    0.2286   3
2    Major     0.2318   77

30 executions of the experiments were performed for each point of view.

Results

There is a trade-off between the entropy and the modularity.
There are 7 communities for each attribute set AS:
    • From 3 classes in AS1
    • From 77 classes in AS2

Results – Measures
AS    Exp.      Average Q             Average Entropy
AS1   CFU       0.4180 (±0)           0.2286 (±0)
AS1   CFU+Ent   0.2565 (±0.006065)    0.1381 (±0.0025741)
AS2   CFU       0.4180 (±0)           0.2318 (±0)
AS2   CFU+Ent   0.2440 (±0.004242)    0.1356 (±0.001493)

Results – Rand Index
Pair                  Rand Index
AS∅ − ASGender        0.4232
AS∅ − ASMajor         0.3070
ASGender − ASMajor    0.3919

Each partition configuration is different for each attribute set.
Non-topological information changes the result of the clustering process.

[Figures: attribute labels Faculty, Gender, Major, Minor, House, Year, H. S.; panels labelled Major and Gender]

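For reference, the Rand index between two partitions of the same node set is the fraction of node pairs on which both partitions agree, that is, pairs placed together in both or apart in both. A minimal sketch, in which the function name and the label-list input format are assumptions:

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of node pairs treated the same way by both partitions.

    labels_a, labels_b: community label of each node, in the same node order
    (hypothetical input format; at least two nodes are required).
    """
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
        total += 1
    return agree / total

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same grouping, different label names
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

A value of 1 means the two partitions are identical up to renaming of the groups, so the values in the table indicate that the partitions obtained with different attribute sets differ substantially.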
Algorithm Complexity Considerations

The complexity of the entropy calculation is, in general, $O(n^2 \times f)$ (n points and f features).
Using only the contribution of a point to the group entropy, the complexity is reduced to a near-linear behavior.
Using a fixed number of nodes (6386) and varying only the number of features, this linear behavior is observed.

[Figure: Algorithm Execution Time. x-axis: Number of Features (0 to 100); y-axis: Execution Time (ms) (0 to 60000); series: Simple Matching Coefficient, Cosine Distance]

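One way to read "using only the contribution of a point" is that, when a point moves between two groups, only the pair terms involving that point change, so the change in entropy can be computed without re-summing all pairs. The sketch below illustrates this under that assumption and checks it against a full recomputation; the function names are hypothetical and cosine similarity is used only as an example measure.

```python
import math

def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 0.0 if nx == 0 or ny == 0 else dot / (nx * ny)

def pair_term(s):
    """s ln s + (1 - s) ln(1 - s), with 0 ln 0 taken as 0 (assumed convention)."""
    s = min(max(s, 0.0), 1.0)
    total = 0.0
    if s > 0.0:
        total += s * math.log(s)
    if s < 1.0:
        total += (1.0 - s) * math.log(1.0 - s)
    return total

def group_entropy(vectors):
    """Full pairwise entropy of one group: quadratic in the group size."""
    n = len(vectors)
    return -sum(pair_term(cosine_similarity(vectors[i], vectors[j]))
                for i in range(n - 1) for j in range(i + 1, n))

def contribution(point, group):
    """Entropy added by `point` to `group`: one pair term per existing member."""
    return -sum(pair_term(cosine_similarity(point, other)) for other in group)

def move_delta(point, source_rest, target):
    """Entropy change when `point` leaves source_rest + [point] and joins target."""
    return contribution(point, target) - contribution(point, source_rest)

p = (1.0, 0.2)
source_rest = [(1.0, 0.0), (0.9, 0.3)]
target = [(0.1, 1.0), (0.0, 1.0)]
before = group_entropy(source_rest + [p]) + group_entropy(target)
after = group_entropy(source_rest) + group_entropy(target + [p])
print(after - before, move_delta(p, source_rest, target))
```

The incremental form touches only the members of the two groups involved, each at a cost proportional to f, instead of all pairs in the data set.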
In general, the memory usage is linear; however, the SMC graph is steeper than the cosine distance graph.
For the SMC, near 40 features, the memory used grows, coinciding with the execution time increase.
The behavior of the graphs is due to Java’s memory management system. In any case, the usage never explodes.

[Figure: Algorithm Memory Usage. Two panels, SMC and Cosine Distance, each plotted against a Memory Baseline. x-axis: Number of Features (0 to 100); y-axis: Memory Used (Mb) (300 to 900)]

Conclusions

Each type of information in the augmented network has different representations and different measures of similarity; those measures behave oppositely.
An entropy-based algorithm has been proposed to cluster an augmented network.
Using different points of view, it is possible to obtain different partition configurations from the same social graph.
The overall complexity of the algorithm is linear in the number of features used to calculate the entropy.
The memory used increases, although it does not explode, when the number of attributes is increased.

Thank you.

Do you have any questions?

Bibliography

[1] M. Kantardzic, Data Mining: Concepts, Models, Methods, and Algorithms. Wiley-IEEE Press, 1 ed., Oct. 2002.
[2] T. Kohonen, Self-Organizing Maps. Springer, 1997.
[3] M. E. Newman, “Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality,” Physical Review. E, Statistical Nonlinear and Soft Matter Physics, vol. 64, p. 7, July 2001.
[4] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, p. P10008 (12pp), 2008.
[5] C. Pizzuti, “Overlapped community detection in complex networks,” in GECCO ’09: Proceedings of the 11th Annual conference on Genetic and evolutionary computation, (New York, NY, USA), pp. 859–866, ACM, 2009.
[6] Y. Zhou, H. Cheng, and J. X. Yu, “Graph clustering based on structural/attribute similarities,” Proc. VLDB Endow., vol. 2, pp. 718–729, August 2009.
[7] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical Review. E, Statistical Nonlinear and Soft Matter Physics, vol. 69, p. 026113, Feb 2004.
[8] T. Li, S. Ma, and M. Ogihara, “Entropy-based criterion in categorical clustering,” in Proceedings of the twenty-first international conference on Machine learning, ICML ’04, (New York, NY, USA), pp. 68–, ACM, 2004.

Entropy Minimization Algorithm [8]

Given a partition C:
1. Calculate the set’s initial entropy
2. Take a random point from a random group and insert it into another random cluster
3. Has the entropy improved?
   3.1 Yes: leave the point in its new cluster
   3.2 No: take the point back to its original cluster
4. Go to 2 until no further changes can be made

[Diagram: three groups labelled A, B and C]

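The numbered steps above can be sketched as a short loop. Several details here are assumptions made for the example and are not stated in the slides: the entropy of a partition is taken as the sum of its group entropies, "no further changes" is approximated by a fixed budget of trials, moves that would empty a group are skipped so the number of groups stays the same, and cosine similarity is used as the similarity measure.

```python
import math
import random

def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 0.0 if nx == 0 or ny == 0 else dot / (nx * ny)

def pair_term(s):
    """s ln s + (1 - s) ln(1 - s), with 0 ln 0 taken as 0 (assumed convention)."""
    s = min(max(s, 0.0), 1.0)
    total = 0.0
    if s > 0.0:
        total += s * math.log(s)
    if s < 1.0:
        total += (1.0 - s) * math.log(1.0 - s)
    return total

def group_entropy(vectors):
    n = len(vectors)
    return -sum(pair_term(cosine_similarity(vectors[i], vectors[j]))
                for i in range(n - 1) for j in range(i + 1, n))

def partition_entropy(groups):
    """Assumed aggregate: sum of the entropies of all groups."""
    return sum(group_entropy(g) for g in groups)

def minimize_entropy(groups, trials=1000, seed=0):
    """Random-move entropy minimization over a partition with at least two groups."""
    rng = random.Random(seed)
    groups = [list(g) for g in groups]             # work on a copy
    current = partition_entropy(groups)            # step 1
    for _ in range(trials):                        # step 4, bounded by a trial budget
        src, dst = rng.sample(range(len(groups)), 2)
        if len(groups[src]) <= 1:
            continue                               # keep the number of groups fixed
        point = groups[src].pop(rng.randrange(len(groups[src])))  # step 2
        groups[dst].append(point)
        candidate = partition_entropy(groups)      # step 3
        if candidate < current:
            current = candidate                    # 3.1: leave the point
        else:
            groups[dst].pop()                      # 3.2: take the point back
            groups[src].append(point)
    return groups, current

start = [[(1.0, 0.0), (1.0, 0.1), (0.0, 1.0)],
         [(0.1, 1.0), (0.0, 1.0), (1.0, 0.0)]]
result, h = minimize_entropy(start, trials=200, seed=1)
print(partition_entropy(start), h)
```

This version recomputes the whole partition entropy after every trial move for clarity; the per-point contribution discussed under the complexity considerations would avoid that cost.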

PDF
Community structure in social and biological structures
PPTX
Community Extracting Using Intersection Graph and Content Analysis in Complex...
PDF
Iscc2011 ioannis stavrakakis_ keynote
PPTX
Presentation on Graph Clustering (vldb 09)
PDF
Node similarity
PDF
Formulation of modularity factor for community detection applying
PPTX
Community Structure-based Audience Expansion for Digital Advertising
PPTX
Networkx & Gephi Tutorial #Pydata NYC
PDF
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...
PDF
Parallel Community Detection for Massive Graphs
PDF
Social Networks
PDF
Application Areas of Community Detection: A Review : NOTES
PDF
Community Detection
PPTX
Computational Social Science, Lecture 06: Networks, Part II
PPTX
Algorithm in Social network of graph and social network analysis
PDF
IRJET - Exploring Agglomerative Spectral Clustering Technique Employed for...
PDF
Machine Learning
PDF
Community Detection in Networks Using Page Rank Vectors
PDF
Community Detection in Networks Using Page Rank Vectors
Collaborative Similarity Measure for Intra-Graph Clustering
Community structure in social and biological structures
Community Extracting Using Intersection Graph and Content Analysis in Complex...
Iscc2011 ioannis stavrakakis_ keynote
Presentation on Graph Clustering (vldb 09)
Node similarity
Formulation of modularity factor for community detection applying
Community Structure-based Audience Expansion for Digital Advertising
Networkx & Gephi Tutorial #Pydata NYC
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...
Parallel Community Detection for Massive Graphs
Social Networks
Application Areas of Community Detection: A Review : NOTES
Community Detection
Computational Social Science, Lecture 06: Networks, Part II
Algorithm in Social network of graph and social network analysis
IRJET - Exploring Agglomerative Spectral Clustering Technique Employed for...
Machine Learning
Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors

Recently uploaded (20)

PDF
Ensemble model-based arrhythmia classification with local interpretable model...
PDF
Auditboard EB SOX Playbook 2023 edition.
PPTX
SGT Report The Beast Plan and Cyberphysical Systems of Control
PDF
ment.tech-Siri Delay Opens AI Startup Opportunity in 2025.pdf
PDF
Early detection and classification of bone marrow changes in lumbar vertebrae...
PDF
Transform-Quality-Engineering-with-AI-A-60-Day-Blueprint-for-Digital-Success.pdf
PDF
The-Future-of-Automotive-Quality-is-Here-AI-Driven-Engineering.pdf
PDF
A symptom-driven medical diagnosis support model based on machine learning te...
PDF
Transform-Your-Supply-Chain-with-AI-Driven-Quality-Engineering.pdf
PDF
AI.gov: A Trojan Horse in the Age of Artificial Intelligence
PPTX
Presentation - Principles of Instructional Design.pptx
PDF
4 layer Arch & Reference Arch of IoT.pdf
PPTX
AI-driven Assurance Across Your End-to-end Network With ThousandEyes
PDF
EIS-Webinar-Regulated-Industries-2025-08.pdf
PDF
CEH Module 2 Footprinting CEH V13, concepts
PDF
The-2025-Engineering-Revolution-AI-Quality-and-DevOps-Convergence.pdf
PDF
“The Future of Visual AI: Efficient Multimodal Intelligence,” a Keynote Prese...
PDF
Build Real-Time ML Apps with Python, Feast & NoSQL
PDF
LMS bot: enhanced learning management systems for improved student learning e...
PDF
Advancing precision in air quality forecasting through machine learning integ...
Ensemble model-based arrhythmia classification with local interpretable model...
Auditboard EB SOX Playbook 2023 edition.
SGT Report The Beast Plan and Cyberphysical Systems of Control
ment.tech-Siri Delay Opens AI Startup Opportunity in 2025.pdf
Early detection and classification of bone marrow changes in lumbar vertebrae...
Transform-Quality-Engineering-with-AI-A-60-Day-Blueprint-for-Digital-Success.pdf
The-Future-of-Automotive-Quality-is-Here-AI-Driven-Engineering.pdf
A symptom-driven medical diagnosis support model based on machine learning te...
Transform-Your-Supply-Chain-with-AI-Driven-Quality-Engineering.pdf
AI.gov: A Trojan Horse in the Age of Artificial Intelligence
Presentation - Principles of Instructional Design.pptx
4 layer Arch & Reference Arch of IoT.pdf
AI-driven Assurance Across Your End-to-end Network With ThousandEyes
EIS-Webinar-Regulated-Industries-2025-08.pdf
CEH Module 2 Footprinting CEH V13, concepts
The-2025-Engineering-Revolution-AI-Quality-and-DevOps-Convergence.pdf
“The Future of Visual AI: Efficient Multimodal Intelligence,” a Keynote Prese...
Build Real-Time ML Apps with Python, Feast & NoSQL
LMS bot: enhanced learning management systems for improved student learning e...
Advancing precision in air quality forecasting through machine learning integ...
