Social Networks

The course syllabus for Social Network Analysis outlines a 12-week program covering key concepts such as network types, handling real-world datasets, and the strength of weak ties. It includes topics like link analysis, power laws, and the small world phenomenon, emphasizing the importance of understanding relationships within networks. The course aims to provide insights into how information and behaviors spread through social networks, utilizing various datasets and analytical techniques.

Course Syllabus

Course Title: Social Network Analysis

Weekly Breakdown:

●​ Week 1: Introduction
●​ Week 2: Handling Real-world Network Datasets
●​ Week 3: Strength of Weak Ties
●​ Week 4: Strong and Weak Relationships (Continued) & Homophily​
Further exploration of tie strength and the concept of homophily (people connecting
due to similarity).​

●​ Week 5: Homophily Continued and +Ve / -Ve Relationships​


Positive vs. negative ties, structural balance theory.​

●​ Week 6: Link Analysis​


Concepts like centrality, PageRank, hubs and authorities.​

●​ Week 7: Cascading Behaviour in Networks​


Information diffusion, contagion models, and how ideas/viruses spread.​

●​ Week 8: Link Analysis (Continued)​


Advanced link analysis metrics and algorithms.​

●​ Week 9: Power Laws and Rich-Get-Richer Phenomena​


Preferential attachment, scale-free networks, and their implications.​

● Week 10: Power Laws (Continued) and Epidemics


Epidemic models like SI, SIR, and SIS; role of network topology in disease spread.​

●​ Week 11: Small World Phenomenon​


Watts-Strogatz model, six degrees of separation, clustering.​

●​ Week 12: Pseudocore (How to Go Viral on the Web)​


Core-periphery structure, influence maximization, viral marketing strategies.

engineerstudyhub.in
Week 1: Introduction
Overview of social network​

A social network is a structure made up of individuals or organizations (called nodes)
connected by one or more specific types of interdependency, such as friendship,
communication, or influence (called edges or links). ​

Social network analysis (SNA) is a method used to study these relationships and
understand how information, behaviors, or trends spread within a network.

Key concepts ​

SNA is a broad topic, but these are some of the essential terms, concepts, and theories
you need to know to understand how it works. In Social Network Analysis (SNA), the
network is represented using a graph, which consists of two main components: nodes
and edges.

1. Nodes (Vertices):

●​ Nodes represent the entities or actors in a network.​

● These can be individuals, organizations, webpages, or any unit of analysis
depending on the context.

●​ For example:​

○​ In a social media network like Facebook, a node can represent a user.​

○​ In a citation network, a node may represent a research paper.​

2. Edges (Links or Connections):

●​ Edges represent the relationships or interactions between nodes.​

● These can be directed (one-way) or undirected (two-way), weighted (with value)
or unweighted.

●​ For example:​

○​ A friendship connection between two people (undirected edge).​

○​ A follower link on Twitter (directed edge).​

○​ The number of messages sent between two people (weighted edge).​

Example:

Consider a small social network of 3 people: A, B, and C

●​ Nodes: A, B, C​

●​ Edges:​

○​ A is friends with B​

○​ B is friends with C​

This forms a simple graph with 3 nodes and 2 edges.

Nodes and edges are the fundamental building blocks of social networks. They help in
visualizing and analyzing relationships and interactions within any social system.​
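
As a rough illustration, the three-person example above can be stored as an adjacency map in Python. This is only a sketch: the helper name `build_undirected_graph` and the variable names are assumptions made for illustration, not part of the course material.

```python
from collections import defaultdict

def build_undirected_graph(edges):
    """Return an adjacency map {node: set(neighbors)} for an undirected graph."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)  # record the tie in both directions
        adjacency[v].add(u)
    return dict(adjacency)

# The example network: A is friends with B, and B is friends with C.
friendships = [("A", "B"), ("B", "C")]
graph = build_undirected_graph(friendships)

num_nodes = len(graph)
# Every undirected edge appears twice in the adjacency map, so halve the total.
num_edges = sum(len(neighbors) for neighbors in graph.values()) // 2
print(num_nodes, num_edges)
```

Each friendship is stored in both directions because the tie is undirected.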



Network types​

In Social Network Analysis, a network is a collection of nodes (also called vertices) and
edges (also called links) that represent entities and the relationships or interactions
between them.

●​ Nodes: Represent individuals, organizations, or any objects.​

●​ Edges: Represent relationships like friendship, communication, influence, etc.​

A network can be represented mathematically using graph theory, where it is visualized
as a graph of points (nodes) connected by lines (edges).

Types of Networks

Networks can be classified into different types based on direction, weight, and structure:

1.​ Undirected Network:​

○​ Edges have no direction.​

○​ Relationship is mutual (e.g., Facebook friends).​

○​ Represented as simple lines.​

○​ Example: A–B means A is connected to B and vice versa.​

2.​ Directed Network (Digraph):​

○​ Edges have a direction.​

○​ Relationship is one-way (e.g., Twitter followers).​

○​ Represented with arrows.​

○​ Example: A → B means A follows B.​

3.​ Weighted Network:​

○ Each edge carries a value or weight representing the strength, frequency,
or cost of interaction.

○​ Example: Number of emails sent between two people.​

4.​ Unweighted Network:​

○​ All edges are treated equally without any weight or value.​

○​ Only presence or absence of a link is considered.​

5.​ Homogeneous Network:​

○​ All nodes are of the same type.​

○​ Example: A friendship network of students.​

6.​ Heterogeneous Network:​

○​ Nodes belong to different types.​

○ Example: A network of students and courses, where students enroll in
courses.

Networks and their types help researchers understand various forms of relationships
and behaviors in social, technological, and biological systems. Choosing the right type of
network is crucial for effective analysis.
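
One possible way to see the difference between these types is in how the same two people would be stored. The sketch below uses plain Python dictionaries; the variable names, the helper `has_edge`, and the weight of 12 emails are made up for illustration.

```python
# Undirected: the tie is stored in both directions (e.g., Facebook friends).
undirected = {"A": {"B"}, "B": {"A"}}

# Directed: A -> B means A follows B; B does not follow A back.
directed = {"A": {"B"}, "B": set()}

# Weighted: each neighbor maps to a weight, e.g., number of emails exchanged.
weighted = {"A": {"B": 12}, "B": {"A": 12}}

def has_edge(adjacency, source, target):
    """True if there is an edge from source to target."""
    return target in adjacency.get(source, ())

print(has_edge(undirected, "B", "A"))  # the friendship is mutual
print(has_edge(directed, "B", "A"))    # the follow is one-way
print(weighted["A"]["B"])              # strength of the tie
```

An unweighted network is the same structure with the weights dropped, so only the presence or absence of a link is recorded.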


Week 2: Handling Real-world Network Datasets
Handling Real-world Network Datasets

Introduction to Dataset

❖​ Real-world network datasets are crucial for understanding the structure and
dynamics of various interconnected systems in the world.
❖​ These datasets represent entities as nodes (vertices) and relationships between
them as edges (links).
❖​ The analysis of such datasets is essential for applications in fields like social
media analysis, biological networks, transportation, and more.
❖​ Real-world networks can range from simple connections like social media
followers to complex biological interactions.

Ingredient Network

An ingredient network is a type of dataset that represents relationships between
different ingredients used in recipes.

Nodes represent ingredients, and edges represent the co-occurrence of ingredients in
the same recipe.

This type of network helps understand food pairings, common ingredient combinations,
and trends in culinary preferences.

Key features:

●​ Nodes: Ingredients​

●​ Edges: Co-occurrence of ingredients in the same recipe​

●​ Applications: Recipe recommendation systems, culinary analysis​
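
A minimal sketch of how such a network might be built from a list of recipes is shown below. The recipes are invented sample data, and the function name `cooccurrence_edges` is an assumption for illustration; the edge weight is taken to be the number of recipes in which a pair appears together.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(recipes):
    """Count how many recipes each unordered pair of ingredients shares."""
    weights = Counter()
    for ingredients in recipes:
        # set() drops repeated ingredients; sorted() gives each pair one canonical order
        for pair in combinations(sorted(set(ingredients)), 2):
            weights[pair] += 1
    return weights

# Invented sample data: each inner list is one recipe.
recipes = [
    ["tomato", "basil", "garlic"],
    ["tomato", "garlic", "onion"],
    ["basil", "lemon"],
]
edges = cooccurrence_edges(recipes)
for (first, second), weight in sorted(edges.items()):
    print(first, second, weight)
```

Pairs that never share a recipe simply have no edge, which is why sparse pairings stand out in the resulting network.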

Synonym Network

A synonym network consists of words or phrases that are related through synonyms. In
this network, nodes represent words, and edges represent the synonym relationship
between them. ​

A synonym network helps understand language structure and is useful in NLP tasks like
text summarization, machine translation, and sentiment analysis.
Key features:

●​ Nodes: Words or phrases​

●​ Edges: Synonym relationships between words​

●​ Applications: NLP tasks, language modeling, thesaurus generation​

Web Graph

● A web graph shows how web pages (nodes) are linked through hyperlinks (edges).
● Because a hyperlink points from one page to another, the web graph is a directed network.
● Search engines use it to crawl the web and to rank pages with algorithms like Google's PageRank.
● It is also used in SEO to understand how websites are connected and to improve search ranking.

Key features:

●​ Nodes: Web pages or websites


●​ Edges: Hyperlinks between pages or websites
●​ Applications: Search engine algorithms, website analysis, link prediction
Social Network Dataset

A social network dataset is a type of real-world network that represents social
relationships between individuals or groups.

In this network, nodes represent people or entities, and edges represent social
connections such as friendships, follows, or collaborations. ​

Social networks are used in applications such as social media analysis, recommendation
systems, and influence modeling.

Key features:

●​ Nodes: Individuals, organizations, or entities​

●​ Edges: Social connections like friendships, follows, or interactions​

● Applications: Social media analysis, community detection, recommendation
systems

Handling real-world network datasets requires a solid understanding of graph theory
and the specific domain of the dataset.

The ability to interpret these networks enables applications across various fields such as
social media analysis, recommendation systems, and even culinary or linguistic
research.​

By using tools like graph analytics and machine learning techniques, these datasets can
provide valuable insights into complex interconnected systems.












Datasets: Different Formats

Datasets can come in different formats, including:

●​ CSV (Comma-Separated Values): A simple text format for tabular data.​

●​ JSON (JavaScript Object Notation): A lightweight data format used for storing
data in key-value pairs.​

● XML (Extensible Markup Language): A flexible, text-based format for structured
data.

●​ Excel (XLS/XLSX): A spreadsheet format for tabular data with more advanced
features.​

●​ Parquet: A columnar storage format used for large-scale data processing.​

●​ SQL: Data stored in relational databases using structured query language.​

Each format has its uses depending on the data and how it's processed.
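
For network data, a CSV file often holds one edge per row. The sketch below reads such an edge list with Python's standard `csv` module; the column names `source`, `target`, and `weight` are an assumed layout, not a standard, and an in-memory string stands in for a real file.

```python
import csv
import io

def read_edge_list(csv_file):
    """Read rows of source,target,weight into a list of (str, str, float) tuples."""
    reader = csv.DictReader(csv_file)
    return [(row["source"], row["target"], float(row["weight"])) for row in reader]

# With a real dataset this would be open("edges.csv", newline="");
# StringIO keeps the sketch self-contained.
sample = io.StringIO("source,target,weight\nA,B,3\nB,C,1\n")
edges = read_edge_list(sample)
print(edges)
```

The same list of tuples could then be turned into an adjacency map for analysis.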

Introduction: Emergence of Connectedness

In Social Network Analysis (SNA), the "Emergence of Connectedness" refers to the
process by which individual nodes (such as people or organizations) form links
(like relationships or interactions) that gradually develop into a larger, more cohesive
network. It describes the early phase of network development when isolated individuals
or small groups start becoming interconnected.

Connectedness:

● In network terms, connectedness is the degree to which nodes are linked
together.

●​ A connected network means there is a path (direct or indirect) between any two
nodes.​
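
Whether a network is connected can be checked with a breadth-first search. The following is a sketch under the assumption that the graph is undirected and stored as an adjacency map; the function names and the two toy graphs are illustrative.

```python
from collections import deque

def connected_components(adjacency):
    """Return a list of components, each a set of mutually reachable nodes."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        components.append(component)
    return components

def is_connected(adjacency):
    """A graph is connected when it has at most one component."""
    return len(connected_components(adjacency)) <= 1

# Two separate pairs, then the same nodes after one extra tie B-C is added.
fragmented = {"A": {"B"}, "B": {"A"}, "C": {"D"}, "D": {"C"}}
joined = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(is_connected(fragmented), is_connected(joined))
```

A single added tie is enough to merge the two groups, which is the kind of shift from fragmented to interconnected that the idea of emergence describes.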

Emergence:
●​ "Emergence" refers to how a complex structure or pattern arises from simpler
individual interactions.​

● In SNA, it describes how individual connections give rise to larger network
structures like clusters, communities, or a fully connected network.

Importance:

●​ Helps understand how information spreads across a network.​

●​ Reveals how influence or power begins to centralize in certain nodes.​

●​ Explains the growth of communities and the formation of social capital.​

● Useful in identifying tipping points, when a network shifts from being
fragmented to interconnected.

Advanced Material : Emergence of Connectedness

Connectedness is how things link together to form a system. Here's a short summary:

1.​ Social Networks: People or groups are connected, forming networks with a few
important nodes (influencers) and many less connected ones.​

2.​ Biological Networks: In nature, things like cells or animals are connected to help
them survive, often without any leader.​

3.​ Technology Networks: The internet connects billions of devices, with key hubs
like popular websites.​

4.​ Mathematics: Graph theory helps study how things are connected and how
networks grow.​

5.​ Why It Matters:​

○​ Resilience: Good connectedness makes systems stronger.​

○​ Efficiency: More connections can make things work better.​


○ Vulnerability: Dense connections can also let failures spread quickly through a system.

6.​ Key Ideas:​

○​ Percolation: How things spread in a network.​

○ Phase Transitions: Networks can change drastically when connections
increase.

Connectedness shapes how systems grow, adapt, and work.


Week 3: Strength of Weak Ties
Granovetter's Strength of weak ties​

❖​ Mark Granovetter, a sociologist, introduced the concept in 1973.​

❖​ He explained how weak ties (casual acquaintances) can be more helpful than
strong ties (close friends and family) in some situations.​

❖​ Weak ties act as bridges between different social groups and help spread new
information.​

❖​ People with only strong ties are often part of the same circle, so they usually know
the same things.​

❖​ But weak ties connect us to new people and new opportunities, like jobs, ideas,
or resources.​

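❖ For example, in Granovetter's study of job seekers, many people heard about their
new jobs through acquaintances rather than close friends.

❖ Granovetter argued that new information travels farther through weak ties, because
they reach into social circles that strong ties do not.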

❖​ This idea is now used in networking, social media, marketing, and job hunting.


Triads, clustering coefficient and neighborhood overlap​

Triads

●​ A triad is a group of three nodes (people) in a network.​

●​ These three nodes can have different connection patterns:​

○ All three possible ties exist (closed triad, or triangle).

○ Only two of the three possible ties exist, e.g., A–B and B–C but not A–C (open triad).

● Studying triads helps us understand how relationships form and grow in a
network.

Clustering Coefficient
●​ The clustering coefficient measures how tightly-knit a node’s friends are.​

●​ It shows the likelihood that two friends of a person are also friends with each
other.​

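● Mathematically, for a node with k neighbors:
Clustering coefficient = (Number of links between the node's neighbors) / (k(k - 1) / 2)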

●​ A high clustering coefficient means a tightly connected group (like close friend
circles).​
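
The sketch below computes this measure for a single node, assuming the usual definition: the fraction of pairs of a node's neighbors that are themselves linked. Returning 0 for nodes with fewer than two neighbors is a convention chosen here, and the small graph is invented.

```python
from itertools import combinations

def clustering_coefficient(adjacency, node):
    """Fraction of pairs of the node's neighbors that are directly linked."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention: no pairs of neighbors to compare
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return 2 * links / (k * (k - 1))

# A is friends with B, C, and D; of those, only B and C know each other.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(clustering_coefficient(graph, "A"))
print(clustering_coefficient(graph, "B"))
```

For A, one of the three possible pairs of friends is linked; for B, the only pair of friends (A and C) is linked.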

Neighborhood Overlap

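● Neighborhood overlap checks how much the neighbors (friends) of two
connected people overlap.

● It's calculated as:
Neighborhood overlap of edge A-B = (Number of nodes that are neighbors of both A and B) /
(Number of nodes that are neighbors of at least one of A or B)
where A and B themselves are not counted.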

●​ High overlap suggests the connection is in a tight community, while low overlap
may indicate a bridge between different communities (like a weak tie).​
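
A small sketch of this calculation follows, assuming the common definition in which the two endpoints are left out of both counts. The function name and the toy graph are illustrative.

```python
def neighborhood_overlap(adjacency, a, b):
    """Shared neighbors of a and b divided by all their neighbors (excluding a and b)."""
    neighbors_a = adjacency[a] - {b}
    neighbors_b = adjacency[b] - {a}
    union = neighbors_a | neighbors_b
    if not union:
        return 0.0  # neither endpoint has any other neighbor
    return len(neighbors_a & neighbors_b) / len(union)

graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "E"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": {"B"},
}
print(neighborhood_overlap(graph, "A", "B"))  # C is shared; D and E are not
print(neighborhood_overlap(graph, "A", "D"))  # no shared neighbors
```

An overlap of zero, as for the A–D tie here, is the signature of a tie that reaches outside a tightly knit group.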

Structure of Weak Ties, Bridges, and Local Bridges

1.​ Weak Ties​

○ These are casual or infrequent connections, like old classmates, distant
colleagues, or online acquaintances.

○ They link people from different social circles.

○ Weak ties help in spreading new information across different parts of a
network.

2.​ Bridges​
○ A bridge is a tie between two nodes (people) that is the only connection between
two otherwise separate groups.

○​ Without this bridge, the two groups would be disconnected.​

○​ Bridges are essential for the flow of new information across a network.​

3.​ Local Bridges​

○ A local bridge is a tie where the two connected individuals have no mutual
friends.

○ If the tie were removed, the two people would be more than two steps apart, so it is
the shortest route between their social circles.

○​ Local bridges are usually weak ties, and they play a crucial role in finding
new jobs, opportunities, or knowledge.
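
The difference between a bridge and a local bridge can be made concrete with a short sketch. It assumes an undirected graph stored as an adjacency map; the function names and the example network (two triangles joined by a direct tie C–D and by a longer detour through G and H, plus a pendant node X) are invented for illustration.

```python
from collections import defaultdict, deque

def from_edges(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)

def is_local_bridge(adjacency, a, b):
    """True if a and b are tied but share no mutual neighbor."""
    return b in adjacency[a] and not (adjacency[a] & adjacency[b])

def is_bridge(adjacency, a, b):
    """True if deleting the tie a-b would leave no path from a to b."""
    if b not in adjacency[a]:
        return False
    seen = {a}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if node == a and neighbor == b:
                continue  # pretend the tie a-b has been removed
            if neighbor == b:
                return False  # b is still reachable another way
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return True

graph = from_edges([
    ("A", "B"), ("A", "C"), ("B", "C"),      # first friend circle
    ("D", "E"), ("D", "F"), ("E", "F"),      # second friend circle
    ("C", "D"),                              # direct tie between the circles
    ("C", "G"), ("G", "H"), ("H", "D"),      # longer detour between the circles
    ("F", "X"),                              # X is attached only through F
])
print(is_local_bridge(graph, "C", "D"), is_bridge(graph, "C", "D"))
print(is_local_bridge(graph, "F", "X"), is_bridge(graph, "F", "X"))
```

Here C–D is a local bridge but not a bridge, because the detour through G and H keeps the circles connected, while F–X is both.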


Validation of Granovetter's Theory Using Cell Phone Data

●​ Researchers used mobile phone call records from millions of users to analyze
social networks at a large scale.​

●​ Each call and text created a link (tie) between users — with frequency and
duration indicating tie strength.​

●​ The researchers studied the relationship between tie strength and the diversity
of contacts.​

●​ They found that people who communicated more frequently (strong ties) often
had similar social circles.​

● However, weaker ties (less frequent communication) connected users to very
different social groups.

●​ These weak ties helped spread information more broadly across the network —
just like Granovetter had predicted.​

●​ One famous study by Onnela et al. (2007) used mobile phone data and showed
that removing weak ties caused a large drop in overall network connectivity.​

Embeddedness

● Embeddedness refers to how much a person's connections are deeply rooted in a
social group or network.

● A relationship is highly embedded if the two people have many mutual
connections.

● Granovetter argued that economic behavior is embedded in social relationships:
trust, reputation, and social norms affect decisions.

●​ Embedded ties often offer strong trust and support but may lack new or diverse
information.​
Structural Holes (by Ronald Burt)

●​ A structural hole is a gap between two social groups that are not directly
connected.​

●​ A person who connects two disconnected groups acts as a bridge across the
structural hole.​

●​ This position gives them information and control advantage, because they can
access non-redundant information from both sides.​

● Burt argues that people who span structural holes are well placed to become innovators,
leaders, or influencers, because they see information that others in either group do not.

Social Capital

●​ Social capital refers to the resources and benefits people get from their social
relationships.​

● These can include information, help, trust, emotional support, and
opportunities.

●​ Social capital is built through networks, norms, and trust that facilitate
cooperation.​

●​ Strong ties offer emotional support; weak ties and bridges offer access to new
information and opportunities — both types contribute to social capital.

Tie Strength

●​ Tie strength refers to how close and active a relationship is between two people.​

● Strong ties: Close friends, family – frequent communication, emotional
closeness.

●​ Weak ties: Acquaintances, old colleagues – less frequent interaction, but broader
reach.​

●​ Tie strength affects how information, support, and influence spread in networks.
Social Media and Tie Strength

●​ Social media platforms like Facebook, Instagram, Twitter, and LinkedIn allow us
to maintain both strong and weak ties.​

●​ You interact more deeply with strong ties (likes, comments, DMs).​

●​ But you stay connected with weak ties through occasional updates – and these
ties are often useful for new opportunities (jobs, trends, ideas).​

●​ Social media blurs the line between strong and weak ties – people can quickly
reconnect or build new ties.

Passive Engagement

● Passive engagement means viewing content without actively interacting (e.g.,
just scrolling, watching stories, reading posts).

●​ It’s a common way to maintain weak ties – even without direct messaging or
commenting.​

●​ Passive engagement helps people stay updated about others’ lives, which keeps
weak ties alive and relevant.​

●​ Studies show even passive interaction can influence emotions, social comparison,
and information flow.​

Betweenness Measures

●​ Betweenness centrality measures how often a node (person) lies on the shortest
path between other nodes in a network.​

●​ A node with high betweenness acts like a bridge or connector between different
parts of the network.​

●​ Such nodes often have influence or control over information flow because they
connect otherwise separate groups.
●​ Use case: Helps find key influencers, network bottlenecks, or important
connectors in social networks, transport, or communication systems.​
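
One standard way to compute betweenness is Brandes' algorithm, sketched below for an unweighted, undirected graph. The notes do not say which algorithm the course uses, so this choice is an assumption; the scores are left unnormalized (raw counts of node pairs).

```python
from collections import deque

def betweenness_centrality(adjacency):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    centrality = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Forward pass: breadth-first search that counts shortest paths from source.
        order = []
        predecessors = {node: [] for node in adjacency}
        num_paths = {node: 0 for node in adjacency}
        num_paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[node] + 1:
                    num_paths[neighbor] += num_paths[node]
                    predecessors[neighbor].append(node)
        # Backward pass: push path shares from the farthest nodes toward the source.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                dependency[pred] += num_paths[pred] / num_paths[node] * (1 + dependency[node])
            if node != source:
                centrality[node] += dependency[node]
    # Each unordered pair was counted once from each endpoint, so halve the totals.
    return {node: value / 2 for node, value in centrality.items()}

# A star: B sits between every pair of the other three people.
star = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B"}}
scores = betweenness_centrality(star)
print(scores["B"], scores["A"])
```

In the star, B lies on the only path between each of the three pairs of outer nodes, while the outer nodes lie between nobody.
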
Graph Partitioning

● Graph partitioning means dividing a network into smaller groups or
communities (clusters), where nodes in the same group are tightly connected,
and nodes across groups have fewer connections.

● Purpose: To simplify large networks, detect communities, or improve efficiency
in tasks like routing, data storage, or load balancing.

●​ Popular methods:​

○​ Min-cut: Cuts the least number of edges to divide the graph.​

○ Spectral clustering: Uses matrix properties (like eigenvalues and eigenvectors
of the graph Laplacian) to detect structure.

○ Modularity-based partitioning: Groups nodes so that there are more intra-group
links than would be expected by chance, which keeps inter-group links few.

● In social networks, partitioning helps to find interest groups, friend circles, or
communities.

● Combined with betweenness, you can identify bridges between partitions or
detect community leaders.

Finding Communities in a Graph (Brute Force Method)

●​ In graph theory, a community is a set of nodes within a graph that are more
densely connected to each other than to nodes outside the community.
●​ Identifying communities helps in analyzing complex networks like social media,
biological networks, and organizational structures.
● The Brute Force Method is one of the simplest techniques to detect communities
in a graph, but it is practical only for very small graphs because the number of
subsets grows exponentially with the number of nodes.

Steps Involved in the Brute Force Method:

1.​ Generate All Possible Subsets:​


○​ First, generate all possible subsets of nodes from the graph. Each subset
represents a potential community.​

○​ For an undirected graph with n nodes, there are 2^n subsets (including the
empty set and the entire set of nodes).​

2.​ Measure Internal Connectivity:​

○ For each subset, evaluate the internal connectivity, i.e., the number of
edges between nodes within the subset.

○​ The more edges that exist between nodes inside the subset, the stronger
the community.​

3.​ Measure External Connectivity:​

○​ For each subset, evaluate the external connectivity, which is the number of
edges that connect nodes in the subset to nodes outside it.​

○​ A good community will have many internal edges and few external edges.​

4.​ Calculate the Modularity:​

○ The modularity of a subset is a measure that compares the internal edges
to the expected number of edges if nodes were randomly placed in the
network.

○ Communities with higher modularity values indicate strong cohesion and
are more likely to be actual communities.

5.​ Select the Best Communities:​

○​ Among all subsets, identify those that have high modularity (i.e., high
internal connectivity and low external connectivity).​

○​ These subsets represent potential communities.
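
The steps above can be sketched for the simplest case, splitting a graph into two communities. The sketch scores every split of the nodes into a subset and "everything else" with Newman's modularity and keeps the best one. Restricting the search to two-way splits, and the function names, are simplifying assumptions; the example graph is invented.

```python
from itertools import combinations

def bipartition_modularity(edges, community):
    """Modularity of the split {community, all other nodes}."""
    m = len(edges)
    internal = [0, 0]  # edges fully inside the community / fully inside the rest
    degree = [0, 0]    # total degree on each side
    for u, v in edges:
        side_u = 0 if u in community else 1
        side_v = 0 if v in community else 1
        degree[side_u] += 1
        degree[side_v] += 1
        if side_u == side_v:
            internal[side_u] += 1
    # Observed share of internal edges minus the share expected at random.
    return sum(internal[s] / m - (degree[s] / (2 * m)) ** 2 for s in (0, 1))

def best_bipartition(nodes, edges):
    """Try every two-way split and return (community, modularity) for the best."""
    nodes = sorted(nodes)
    first, others = nodes[0], nodes[1:]  # pin one node so each split is tried once
    best_score, best_community = float("-inf"), None
    for size in range(len(others)):      # never take all nodes: the rest must be non-empty
        for extra in combinations(others, size):
            community = {first, *extra}
            score = bipartition_modularity(edges, community)
            if score > best_score:
                best_score, best_community = score, community
    return best_community, best_score

# Two triangles joined by the single edge C-D.
edges = [("A", "B"), ("A", "C"), ("B", "C"),
         ("D", "E"), ("D", "F"), ("E", "F"),
         ("C", "D")]
community, score = best_bipartition("ABCDEF", edges)
print(sorted(community), round(score, 3))
```

Even this restricted search examines about 2^(n-1) splits, which is why the brute force approach stops being usable after a few dozen nodes.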

Girvan-Newman Algorithm for Community Detection

1. Objective: The Girvan-Newman algorithm is used to detect communities
(clusters) in a graph by progressively removing edges that are most likely to be
between different communities.

2.​ Key Idea:​

○ Communities are groups of nodes that are densely connected internally
and sparsely connected externally.

○ The Girvan-Newman algorithm focuses on identifying edges that bridge
communities and removes them one by one.

○​ By removing these edges, the graph will naturally split into separate
components, which correspond to different communities.​

Steps of the Girvan-Newman Algorithm:

1.​ Step 1: Calculate Edge Betweenness Centrality​

○ Edge betweenness centrality measures how many shortest paths between
pairs of nodes in the graph pass through an edge.

○ An edge with high betweenness centrality is likely to be a "bridge"
between communities.

2.​ Step 2: Remove the Edge with the Highest Betweenness​

○​ After calculating the betweenness centrality for all edges, remove the edge
with the highest betweenness.​

○​ This edge is assumed to be the most significant link between two different
communities.​

3.​ Step 3: Recalculate Betweenness Centrality​

○​ After removing the edge, recalculate the betweenness centrality for all
remaining edges, as the removal of one edge might change the shortest
paths.​

4.​ Step 4: Repeat the Process​


○​ Continue removing edges with the highest betweenness centrality and
recalculating until the graph splits into separate connected components
(each representing a community).​

5.​ Step 5: Stop When Components Are Disconnected​

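○ In practice, the process stops once the graph has split into the desired number of
disconnected components, which are taken as the final community groups. If removal
continues until no edges remain, the successive splits form a hierarchy of nested
communities.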
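
The steps above can be sketched as follows for an unweighted, undirected graph. Edge betweenness is computed with a Brandes-style breadth-first search, and the loop stops once a requested number of components is reached. The parameter `target_communities` and all function names are assumptions made for this sketch.

```python
from collections import deque

def edge_betweenness(adjacency):
    """Score each edge by the shortest paths that pass through it."""
    scores = {}
    for source in adjacency:
        order, distance = [], {source: 0}
        num_paths = {node: 0 for node in adjacency}
        num_paths[source] = 1
        predecessors = {node: [] for node in adjacency}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[node] + 1:
                    num_paths[neighbor] += num_paths[node]
                    predecessors[neighbor].append(node)
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = num_paths[pred] / num_paths[node] * (1 + dependency[node])
                edge = frozenset((pred, node))
                scores[edge] = scores.get(edge, 0.0) + share
                dependency[pred] += share
    return scores  # every pair is counted from both ends; the ranking is unaffected

def components(adjacency):
    """Return the connected components as a list of node sets."""
    seen, result = set(), []
    for start in adjacency:
        if start in seen:
            continue
        group, stack = {start}, [start]
        while stack:
            for neighbor in adjacency[stack.pop()]:
                if neighbor not in group:
                    group.add(neighbor)
                    stack.append(neighbor)
        seen |= group
        result.append(group)
    return result

def girvan_newman(adjacency, target_communities=2):
    """Remove highest-betweenness edges until enough components appear."""
    graph = {node: set(neighbors) for node, neighbors in adjacency.items()}  # work on a copy
    while len(components(graph)) < target_communities:
        scores = edge_betweenness(graph)  # recalculated after every removal
        if not scores:
            break  # no edges left to remove
        u, v = max(scores, key=scores.get)
        graph[u].discard(v)
        graph[v].discard(u)
    return components(graph)

# Two triangles joined by the single edge C-D.
graph = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
communities = girvan_newman(graph)
print(sorted(sorted(group) for group in communities))
```

In this example the edge C–D carries every shortest path between the two triangles, so it is removed first and the graph splits into the two triangles.
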
Week 4: Strong and Weak Relationships & Homophily
Introduction to Homophily

Homophily is the tendency of individuals to associate and bond with others who are
similar to themselves.​

The phrase "birds of a feather flock together" captures this idea.

Types of Homophily:

1.​ Status Homophily: Based on social status (age, gender, education, religion, etc.)
2.​ Value Homophily: Based on shared beliefs, attitudes, or values.

Why Homophily Happens:

●​ People find it easier to communicate with those who are similar.


●​ Shared experiences and perspectives lead to more trust and comfort.

Effects of Homophily:

●​ Leads to the formation of tightly knit communities.


●​ Encourages echo chambers where only similar ideas circulate.
●​ Can limit diversity in thoughts, opportunities, and information flow.​

Should You Watch Your Company?​

◆​ Yes—being surrounded only by similar people can restrict your growth and exposure
to new ideas.​

◆​ Diverse networks (including weak ties and people from different backgrounds) offer
new perspectives and opportunities.​

◆​ It’s important to balance comfort with diversity in your social and professional
circles.


Selection and Social Influence​

Selection

● People choose friends or connections based on similar interests, behaviors, or
characteristics.

● Example: A student who enjoys studying may befriend others who also study
seriously.

● This leads to homophily: connected people are similar because they selected
similar others, not because they changed after connecting.

Social Influence

●​ After forming relationships, people tend to influence each other's behavior.​

●​ Over time, individuals in a group may adopt similar habits, attitudes, or beliefs.​

●​ Example: A person might start exercising regularly if their friends are into fitness.


Interplay between Selection and Social Influence

❖ Selection and social influence don’t happen in isolation; they often occur
together in social networks.
❖​ People choose friends or connections who are already similar to them in behavior,
values, or interests.
❖​ After the connection is made, people begin to influence each other, becoming
even more similar over time.
❖​ This creates a feedback loop: select similar people → become more alike through
influence → continue surrounding oneself with similar people.
❖​ In real-world data, it is often difficult to distinguish whether people are similar
because of selection or due to influence after forming a connection.
Homophily - Definition and Measurement​

Definition of Homophily

●​ Homophily is the tendency of individuals to form ties with others who are similar
to themselves.​

●​ Captured by the phrase: “Birds of a feather flock together.”​

Types of Homophily

● Status Homophily: Based on formal characteristics like age, gender, race,
education, etc.

●​ Value Homophily: Based on internal beliefs, attitudes, or values.

Measurement of Homophily:​

● Homophily can be measured by comparing the similarity between connected
nodes (people) in a network.

Two Common Ways to Measure:​

●​ E-I Index (External-Internal Index):​

○​ Compares the number of external (different) ties vs internal (similar) ties.​

○​ Formula: E-I = (E - I) / (E + I)​

■​ E = number of edges to dissimilar others​

■​ I = number of edges to similar others​

○​ Value ranges from -1 (complete homophily) to +1 (complete heterophily).​

●​ Assortativity Coefficient:​
○​ Measures the tendency of nodes to connect with others of the same type
or attribute.​

○​ Value ranges from -1 to +1.​

○ Closer to +1 means strong homophily, closer to -1 means strong
heterophily.
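
As a small worked example, the E-I index formula above can be computed directly from an edge list and a group label for each node. The function name `ei_index`, the labels, and the sample network are made up for illustration.

```python
def ei_index(edges, group_of):
    """E-I index: (external - internal) / (external + internal)."""
    if not edges:
        raise ValueError("E-I index is undefined for a network with no edges")
    external = sum(1 for u, v in edges if group_of[u] != group_of[v])
    internal = len(edges) - external
    return (external - internal) / (external + internal)

# Invented example: two groups, with a single tie (B-C) crossing between them.
group_of = {"A": "x", "B": "x", "C": "y", "D": "y", "E": "y"}
edges = [("A", "B"), ("C", "D"), ("D", "E"), ("B", "C")]
print(ei_index(edges, group_of))
```

With one external tie and three internal ties, the index is (1 - 3) / (1 + 3) = -0.5, which indicates a fairly homophilous network.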

Foci Closure and Membership Closure​



Foci Closure

●​ Foci closure refers to the idea that people form connections because they share a
common focus (or activity/place).
●​ A focus is any shared context like a school, workplace, club, gym, or online group.
●​ If person A and person B both go to the same gym (focus), they are more likely to
become friends.
●​ Two parents who meet at their child’s school may become friends because the
school acts as the common focus.

Membership Closure

● Membership closure is the reverse of foci closure: it refers to a person joining a
group or focus because their friends are already members.
●​ If many of your friends are part of a book club, you are more likely to join that
book club too.
●​ A student joins a coding club because their close friends are already part of it.


Fatman Model in Social Networks

The Fatman Model was introduced by Alain Barrat, Marc Barthélemy, and Alessandro
Vespignani in 2004. ​

It explains the growth and evolution of social networks by considering two main
factors:

1.​ Preferential Attachment​


○​ Nodes tend to connect to others that already have a large number of
connections.​

○​ This leads to the “rich get richer” effect in social networks.​

2.​ Fitness​

○ Fitness refers to the intrinsic properties of a node (individual), such as age,
income, education, etc., which affect its chances of gaining new
connections.

○​ Nodes with higher fitness values are more likely to receive more
connections over time.​

Fat-Tailed Distribution

●​ The model is named the Fatman Model because the fitness values follow a
fat-tailed distribution.​

●​ In this distribution:​

○ A few nodes have very high fitness and so tend to attract many connections.

○ Most nodes have low fitness and tend to attract few connections.

Applications

●​ Applied to various networks such as:​

○​ Online social networks​

○​ Scientific collaboration networks​

○​ Transportation networks​

●​ It has shown that networks with a fat-tailed fitness distribution are more robust
and realistic compared to networks with uniform distributions.​
Flow of Fatman Evolutionary Model

1.​ Initialize the Network:​

○​ Create a base network of individuals and their connections.​

2.​ Assign Attributes:​

○​ Give each node attributes like age, gender, occupation, etc.​

3.​ Define Interaction Rules:​

○ Rules to decide how connections are formed or broken based on fitness
and existing ties.

4.​ Simulate Evolution:​

○​ Run the model over time to simulate the network’s natural growth and
changes.​

5.​ Analyze Properties:​

○​ Check degree distribution, clustering, and communities in the network.​

6.​ Refine the Model:​

○​ Adjust rules or attributes to better match real-world network behavior.​

7.​ Repeat:​

○​ Iterate until the simulated network matches real-world observations.

The Fatman Model effectively captures the dual impact of popularity (preferential
attachment) and personal attributes (fitness) on social network growth. It provides
insights into how structure, robustness, and inequalities emerge in real-world social
systems.
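
The notes do not give the exact rules of the model, so the following is only a sketch of the two factors described above. It assumes that each new node links to one existing node, chosen with probability proportional to that node's degree multiplied by its fitness, and that fitness values are drawn from a Pareto (fat-tailed) distribution. Both choices, and the function name `grow_network`, are assumptions.

```python
import random

def grow_network(num_nodes, fitness, seed=0):
    """Grow a network node by node; newcomers prefer well-connected, high-fitness nodes."""
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}  # start from a single tie between node 0 and node 1
    edges = [(0, 1)]
    for new_node in range(2, num_nodes):
        candidates = list(degree)
        # Assumed rule: attractiveness = current degree * intrinsic fitness.
        weights = [degree[node] * fitness[node] for node in candidates]
        target = rng.choices(candidates, weights=weights, k=1)[0]
        edges.append((new_node, target))
        degree[target] += 1
        degree[new_node] = 1
    return edges, degree

# Assumed fat-tailed fitness: most values are small, a few are large.
fitness_rng = random.Random(1)
fitness = [fitness_rng.paretovariate(2.0) for _ in range(50)]
edges, degree = grow_network(50, fitness)
print(sorted(degree.values(), reverse=True)[:5])  # the five highest degrees
```

Analyzing the resulting degree distribution, and adjusting the rule until it resembles real data, corresponds to the analyze and refine steps in the flow above.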

Quantifying the Effect of Triadic Closure​



Quantifying the effect of triadic closure is an important task in social network
analysis.

Triadic closure refers to the phenomenon where if two individuals (A and B) are both
connected to a third individual (C), there is a high probability that A and B will form a
direct connection with each other. ​

The effect of triadic closure is often used to understand how social connections form and
strengthen over time.​

Identify Triads in the Network

●​ Triads are groups of three nodes in a network.​

● To identify triads, check if two nodes (A and B) are connected to a common third
node (C).

● This can be done by finding all pairs of nodes that share a common neighbor.

Triadic Closure Index

●​ Define a triadic closure index to measure how likely a triad is to close.​

● A simple triadic closure index can be defined as the ratio of closed triads to
all triads that could be closed, i.e., open and closed triads together.

Triadic Closure Index (TCI) = (Number of closed triads) / (Number of open triads +
Number of closed triads)

●​ A triad is closed when all three nodes (A, B, and C) are directly connected to each
other.
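
A minimal sketch of this index is given below. It assumes that the "possible triads" are groups of three nodes with at least two ties among them (open or closed); other definitions exist, so treat this reading as an assumption. The example graph is invented.

```python
from itertools import combinations

def triadic_closure_index(adjacency):
    """Closed triads divided by all triads with at least two ties."""
    closed = open_triads = 0
    for a, b, c in combinations(sorted(adjacency), 3):
        ties = (b in adjacency[a]) + (c in adjacency[a]) + (c in adjacency[b])
        if ties == 3:
            closed += 1       # all three know each other
        elif ties == 2:
            open_triads += 1  # one tie is missing and could still form
    total = closed + open_triads
    return closed / total if total else 0.0

graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(triadic_closure_index(graph))
```

Here A–B–C is closed, while A–B–D and A–C–D are open, so one of three triads has closed. Checking every group of three nodes is slow on large networks; the loop is kept simple for clarity.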


Week 5: +Ve / -Ve Relationships
Spatial Segregation: Simulation of the Schelling Model​

❖​ Schelling Model is a simple simulation that shows how individual preferences can
lead to segregation in society.​

❖​ Each person (or agent) wants to live near people similar to them (based on religion,
community, etc.).​

❖​ Even with a small preference for similarity, the result can be large-scale segregation
over time.​

❖​ The simulation takes place on a grid where agents of different types are randomly
placed.​

❖​ An agent checks its neighborhood — if it’s unhappy (not enough similar neighbors),
it moves to a new empty spot.​

❖​ This process repeats until most or all agents are satisfied with their neighborhood.​

❖​ Over time, this leads to clustering — similar agents group together, creating
segregated zones.​

❖​ The model shows how local individual choices can lead to unintentional large-scale
separation.
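
A minimal sketch of the simulation described above, under stated assumptions: two agent types ("X" and "O"), an 8-cell neighborhood, a similarity threshold chosen by the caller, and unhappy agents moving to a uniformly random empty cell. None of these details are fixed by the notes, and all function names are illustrative.

```python
import random

def make_grid(size, empty_frac, rng):
    """Random square grid of 'X', 'O', or None (empty)."""
    cells = [None if rng.random() < empty_frac else rng.choice(["X", "O"])
             for _ in range(size * size)]
    return [cells[r * size:(r + 1) * size] for r in range(size)]

def neighbors(grid, r, c):
    """Contents of the up-to-8 cells surrounding (r, c)."""
    n = len(grid)
    return [grid[r + dr][c + dc]
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr or dc) and 0 <= r + dr < n and 0 <= c + dc < n]

def is_unhappy(grid, r, c, threshold):
    """Unhappy if the share of similar occupied neighbors is below threshold."""
    occupied = [x for x in neighbors(grid, r, c) if x is not None]
    if not occupied:
        return False
    similar = sum(1 for x in occupied if x == grid[r][c])
    return similar / len(occupied) < threshold

def step(grid, threshold, rng):
    """Move every agent that was unhappy at the start of the round; return moves made."""
    n = len(grid)
    unhappy = [(r, c) for r in range(n) for c in range(n)
               if grid[r][c] is not None and is_unhappy(grid, r, c, threshold)]
    empty = [(r, c) for r in range(n) for c in range(n) if grid[r][c] is None]
    rng.shuffle(unhappy)
    moved = 0
    for r, c in unhappy:
        if not empty:
            break
        i = rng.randrange(len(empty))
        er, ec = empty[i]
        grid[er][ec], grid[r][c] = grid[r][c], None
        empty[i] = (r, c)  # the vacated cell becomes available
        moved += 1
    return moved

def simulate(grid, threshold, rng, max_rounds=100):
    """Repeat rounds until nobody moves or max_rounds is reached."""
    for round_no in range(max_rounds):
        if step(grid, threshold, rng) == 0:
            return round_no
    return max_rounds

rng = random.Random(0)
grid = make_grid(20, 0.2, rng)
rounds_used = simulate(grid, threshold=0.3, rng=rng)
```

Printing the grid before and after the run is usually enough to see clusters of like agents form, even with a modest threshold.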

Positive and Negative Relationships - Introduction

In the context of social networks, relationships between individuals or entities can
generally be classified as positive or negative, depending on the nature of the
interaction or connection.

Positive Relationships:

● Individuals or entities collaborate for common goals, resulting in personal or
collective growth.

● Built on trust, leading to collaborative actions.

● Provides emotional, social, or financial support in times of need.

● Enhances social bonds, fostering community and interconnectedness.

● Represented by edges with positive weight in a graph, indicating strong, positive
connections.

Negative Relationships:

● Conflict and distrust arise from misunderstandings, opposing interests, or
competition.

● Common in competitive situations for limited resources, causing hostility.

● Creates stress, anxiety, and tension among individuals or groups.

● Represented by edges with negative weight or the absence of connections,
hindering social cohesion.

Structural Balance

●​ Definition:​
Structural balance is a concept from social network theory that examines the
stability of relationships in a network, especially in triads (groups of three nodes).​

●​ Basic Idea:​
It focuses on whether the pattern of positive (friendship) and negative (hostility)
relationships in a triad creates harmony or tension.​

●​ Balanced Triads:​
A triad is considered balanced if:​

○​ All three relationships are positive (+ + +), or​

○​ Two relationships are negative and one is positive (+ – –)​


These configurations are psychologically stable.

Balance Theorem

Statement:​

A complete signed graph is said to be balanced (every triangle in it is balanced) if and
only if either:

● All edges in the graph are positive, or

● The set of nodes can be divided into two mutually hostile groups such that:

○​ All edges within each group are positive, and​

○​ All edges between the two groups are negative.​

Proof of Balance Theorem

(⇒) If a graph is balanced, then it satisfies the above condition:

1.​ Consider a complete signed graph where all triangles (triads) are balanced.​

2.​ According to balance theory, only the triads with signs (+ + +) or (+ – –) are
balanced.​

3.​ Pick any node A and classify the rest of the nodes based on their relationship with
A:​

○​ Group X: Nodes connected to A by positive edges.​

○​ Group Y: Nodes connected to A by negative edges.​

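4. To keep every triangle that contains A balanced:

○ Any two nodes in Group X must be positively connected to each other: together
with A they form a triangle that already has two positive edges, so the third edge
must be positive (+ + +).

○ Any two nodes in Group Y must also be positively connected to each other:
together with A they form a triangle that already has two negative edges, so the
third edge must be positive (+ – –).

○ Every edge between Group X and Group Y must be negative: together with A such
a pair forms a triangle with one positive and one negative edge, so the third edge
must be negative (+ – –).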

5. Hence, the graph satisfies the condition: A together with Group X forms one friendly
group and Group Y forms the other, with negative relationships between them. If
Group Y is empty, all edges in the graph are positive.

(⇐) If the graph satisfies the above condition, then it is balanced:

1.​ Suppose the graph is divided into two groups with:​

○​ Positive edges within each group.​

○​ Negative edges between groups.​

2.​ Any triangle (triad) in the graph will fall into one of two categories:​

○​ All nodes from one group → edges are all positive → triad is balanced.​

○​ Two nodes from one group and one from the other → edges form a (+ – –)
pattern → also balanced.​

3.​ Thus, all triads in the graph are balanced.​

Therefore, a complete signed graph is balanced if and only if it satisfies the conditions
stated in the balance theorem.
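
The construction used in the proof can be turned into a small check. The sketch below is illustrative only: it assumes the signed graph is complete and stored as a dictionary from unordered node pairs to +1 or -1 (a representation chosen for this example), and the function name is made up.

```python
from itertools import combinations

def balanced_partition(nodes, sign):
    """Return (X, Y) if the complete signed graph is balanced, else None.

    sign maps frozenset({u, v}) to +1 or -1 for every pair of nodes.
    """
    a = nodes[0]
    # Group X: A and everyone A likes; Group Y: everyone A dislikes.
    x = {a} | {v for v in nodes[1:] if sign[frozenset((a, v))] > 0}
    y = set(nodes) - x
    # Balance requires + inside each group and - across the groups.
    for u, v in combinations(nodes, 2):
        same_group = (u in x) == (v in x)
        expected = 1 if same_group else -1
        if sign[frozenset((u, v))] != expected:
            return None
    return x, y

nodes = ["A", "B", "C", "D"]
friends = {frozenset(("A", "B")), frozenset(("C", "D"))}
sign = {frozenset(p): (1 if frozenset(p) in friends else -1)
        for p in combinations(nodes, 2)}
groups = balanced_partition(nodes, sign)
```

Here A and B are friends, C and D are friends, and every cross pair is hostile, so the function recovers the two camps.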

Positive and Negative Edges in Signed Graphs

In signed graphs, each edge between nodes carries a sign—either positive (+) or
negative (–)—to represent the type of relationship between the connected nodes.

Positive Edges (+)

●​ A positive edge indicates a friendly, cooperative, or harmonious relationship.​

●​ Common in social networks where individuals trust or support each other.​

●​ Example: Friendship, alliance, agreement.​

●​ Graphically, it may be shown with a solid line or a label (+).

Negative Edges (–)

●​ A negative edge represents a hostile, competitive, or conflicting relationship.​

●​ Seen in cases of rivalry, distrust, or opposition between entities.​


●​ Example: Enemy relation, disagreement, conflict.​

●​ Graphically, it may be shown with a dashed line or a label (–).

Role in Social Network Analysis

●​ Signed edges help model real-world social dynamics more accurately.​

●​ They are essential in analyzing structural balance, conflict resolution, and group
dynamics

Example:

Consider a triangle with nodes A, B, and C:

●​ If all edges are positive (+): The group is fully friendly.​

●​ If two edges are negative (–) and one is positive (+): Still considered structurally
balanced under balance theory.
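
One compact way to test a single triangle, offered here as a sketch: encode each edge as +1 or -1 and multiply the three signs. The product is positive exactly for the (+ + +) and (+ – –) patterns. The encoding and function name are assumptions of this example.

```python
def is_balanced_triad(sign_ab, sign_bc, sign_ac):
    """True when the triangle has zero or two negative edges."""
    return sign_ab * sign_bc * sign_ac > 0

all_friends = is_balanced_triad(+1, +1, +1)      # (+ + +)
common_enemy = is_balanced_triad(+1, -1, -1)     # (+ - -)
one_conflict = is_balanced_triad(+1, +1, -1)     # (+ + -)
all_enemies = is_balanced_triad(-1, -1, -1)      # (- - -)
```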


Week 6: Link Analysis
The Web Graph​

The Web Graph is a directed graph that represents the structure of the World Wide Web.
In this graph:

●​ Nodes (or vertices) represent web pages (URLs).​

●​ Edges (or links) represent hyperlinks from one web page to another.​

This graph is very large, complex, and dynamic, and is crucial in understanding how web
pages are connected, helping in search engine indexing, ranking algorithms, and web
crawling.

Collecting the Web Graph​



Collecting the Web Graph involves creating this graph structure by scanning or crawling
the web. This is done using Web Crawlers (Spiders), which:

●​ Start from a set of seed URLs.​

●​ Fetch the content of these pages.​

●​ Extract all the hyperlinks.​

●​ Add new discovered links to the crawling queue.​

●​ Store the connection information (from → to) as graph edges.
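
The crawl loop above can be sketched as a breadth-first traversal. To keep the example runnable without network access, it crawls a hypothetical in-memory "web" (`TOY_WEB`); the `fetch_links` callback stands in for the real fetch-and-extract step and is an assumption of this sketch.

```python
from collections import deque

def crawl(seed_urls, fetch_links, max_pages=100):
    """Breadth-first crawl; returns the web graph as a list of (from, to) edges."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    edges = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        fetched += 1
        for target in fetch_links(url):    # fetch the page and extract hyperlinks
            edges.append((url, target))    # store the connection (from -> to)
            if target not in seen:         # newly discovered link
                seen.add(target)
                queue.append(target)
    return edges

# Hypothetical toy web: URL -> list of outgoing links.
TOY_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
}
edges = crawl(["a.example"], lambda url: TOY_WEB.get(url, []))
```

A real crawler would also need politeness delays, robots.txt handling, and URL normalization, none of which are shown here.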

Equal Coin Distribution​



Equal Coin Distribution is a classic problem in which a set of coins (or items) must be
evenly distributed among individuals, ensuring that all individuals receive the same
number of coins, possibly by moving coins from one to another.

The aim is to:

●​ Minimize the number of moves or transactions.​

●​ Determine whether equal distribution is possible.​


●​ Sometimes calculate the minimum effort or steps required for equal sharing.

Problem Definition:

Given an array where each element represents the number of coins a person has, the goal
is to redistribute the coins such that every person has the same number of coins.

Example:

Array: [4, 1, 7]​


Total Coins = 12, People = 3​
Average = 4 coins per person​
Redistribution: Person 3 gives 3 coins to Person 2.

Conditions for Equal Distribution:

Equal distribution is only possible if the total number of coins is divisible by the
number of people.

If Total Coins % Number of People ≠ 0, then equal distribution is not possible.
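
A small sketch of the divisibility check and one possible redistribution. It pairs people with a surplus against people with a deficit greedily; this yields a valid set of transfers but is not claimed to minimize the number of moves. Indices are 0-based, and the function name is illustrative.

```python
def equalize(coins):
    """Return a list of (giver, receiver, amount) moves, or None if impossible."""
    total, n = sum(coins), len(coins)
    if n == 0 or total % n != 0:
        return None                       # total not divisible by number of people
    avg = total // n
    givers = [[i, c - avg] for i, c in enumerate(coins) if c > avg]
    takers = [[i, avg - c] for i, c in enumerate(coins) if c < avg]
    moves = []
    gi = ti = 0
    while gi < len(givers) and ti < len(takers):
        amount = min(givers[gi][1], takers[ti][1])
        moves.append((givers[gi][0], takers[ti][0], amount))
        givers[gi][1] -= amount
        takers[ti][1] -= amount
        if givers[gi][1] == 0:
            gi += 1
        if takers[ti][1] == 0:
            ti += 1
    return moves

moves = equalize([4, 1, 7])   # the person at index 2 gives 3 coins to index 1
```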

Random Walk Coin Distribution​



Random Walk Coin Distribution is a probabilistic model where coins are passed
between individuals (or nodes) based on random walks. It is used to study the
convergence of coin distribution and how randomness affects balance over time.

2. What is a Random Walk?

A random walk is a mathematical process where a system moves step-by-step in a
random direction. In this context:

● Each person or node passes a coin to a randomly chosen neighbor at each step.

● This is repeated over many iterations.

● Eventually, the coin distribution tends toward uniformity under certain
conditions.

3. Model Explanation:

● Initial Setup: A set of nodes, each having some coins (possibly uneven).

● Random Process: At each time unit, a node gives a coin to one of its neighbors
chosen uniformly at random.

●​ This process continues, simulating a random walk of coins on a graph.
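
The process can be simulated directly. The sketch below assumes that, in each time unit, every node currently holding at least one coin gives exactly one coin away; the notes leave this detail open. The ring network and starting coins are made up for illustration.

```python
import random

def random_walk_step(coins, neighbors, rng):
    """One time unit: each node holding a coin passes one to a random neighbor."""
    new = dict(coins)
    for node, count in coins.items():
        if count > 0 and neighbors[node]:
            target = rng.choice(neighbors[node])
            new[node] -= 1
            new[target] += 1
    return new

# Ring of four nodes; node 0 starts with all 12 coins.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
coins = {0: 12, 1: 0, 2: 0, 3: 0}
rng = random.Random(7)
for _ in range(100):
    coins = random_walk_step(coins, neighbors, rng)
```

The total number of coins never changes; only their distribution across nodes does.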

Google Page Ranking Using Web Graph​



Google PageRank is a link analysis algorithm developed by Larry Page and Sergey Brin
to rank web pages in search engine results. It uses the Web Graph, where:

●​ Nodes = web pages​

●​ Edges = hyperlinks between pages​

The core idea is: a page is important if it is linked to by other important pages.

2. Role of Web Graph:

The Web Graph helps model the entire web as a directed graph. In this graph:

●​ Each page points to others via hyperlinks.​

●​ The PageRank algorithm computes a ranking score for each page using the
structure of this graph.

3. PageRank Algorithm Concept:

●​ A web surfer starts on a random page and either:​

○​ Follows a link from the current page (probability d, typically 0.85)​

○​ Jumps to any random page (probability 1 - d)​

The PageRank (PR) of a page A is given by:

$$PR(A) = \frac{1 - d}{N} + d \sum_{i} \frac{PR(T_i)}{C(T_i)}$$

Where:

●​ N = total number of pages​

●​ d = damping factor (usually 0.85)​

●​ Tᵢ = pages linking to A​

●​ C(Tᵢ) = number of outbound links from Tᵢ​

4. Steps Involved:

1.​ Construct the Web Graph by crawling web pages.​

2.​ Initialize PageRank of all pages equally (e.g., 1/N).​

3.​ Iteratively update PageRank values using the formula.​

4.​ Continue until values converge (changes become very small).​
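
A minimal sketch of steps 2 to 4, using the standard damped update PR(A) = (1 - d)/N + d · Σ PR(T)/C(T) over the pages T that link to A. Two choices here are assumptions of the example rather than something stated in the notes: pages with no outgoing links spread their score evenly over all pages, and convergence is measured by the total absolute change.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """links: dict mapping each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    out = {p: links.get(p, []) for p in pages}
    pr = {p: 1.0 / n for p in pages}               # step 2: equal initial scores
    for _ in range(max_iter):
        # Assumption: a page without outlinks shares its score with everyone.
        dangling = sum(pr[p] for p in pages if not out[p])
        new = {p: (1 - d) / n + d * dangling / n for p in pages}
        for p in pages:
            for target in out[p]:
                new[target] += d * pr[p] / len(out[p])
        if sum(abs(new[p] - pr[p]) for p in pages) < tol:   # step 4: converged
            return new
        pr = new
    return pr

ranks = pagerank({"A": ["B"], "B": ["A"], "C": ["A"]})
```

In this toy graph page C has no incoming links, so it ends with the minimum score (1 - d)/N, while A outranks B because it receives links from both B and C.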

DegreeRank versus PageRank


DegreeRank

Definition:

DegreeRank is a simple ranking method based on the number of incoming or outgoing
links a page has in the web graph.

Types:

●​ In-Degree Rank: Number of incoming links (more popular measure).​

●​ Out-Degree Rank: Number of outgoing links.​

Formula:

DegreeRank (A)=In-Degree of Page A


Advantages:

●​ Simple to calculate.​

●​ Fast and requires no iteration.​

●​ Useful for quick estimation of importance.​

Limitations:

●​ Does not consider quality of incoming links.​

●​ Easily manipulatable (spam pages linking to each other).
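
For comparison, in-degree ranking needs only a single pass over the edge list. A short sketch, with an illustrative function name and toy edges:

```python
from collections import Counter

def degree_rank(edges):
    """In-degree of every page that appears in a list of (from, to) edges."""
    in_degree = Counter(target for _, target in edges)
    pages = {page for edge in edges for page in edge}
    return {page: in_degree.get(page, 0) for page in pages}

scores = degree_rank([("A", "C"), ("B", "C"), ("C", "A")])
```

Every incoming link counts the same here, which is exactly why a group of spam pages linking to each other can inflate the score.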

PageRank

Definition:

PageRank is an iterative algorithm developed by Google, which assigns a ranking score
to each page based on the quality and quantity of its backlinks.

Formula:

$$PR(A) = \frac{1 - d}{N} + d \sum_{i} \frac{PR(T_i)}{C(T_i)}$$

Where:

● d = damping factor (usually 0.85)

● N = total pages

● Tᵢ = pages linking to A

● C(Tᵢ) = number of outbound links from Tᵢ

Advantages:

● Considers both quantity and quality of links.

● More resistant to spam and link farms than DegreeRank.

● Provides a more realistic measure of importance.

Limitations:

●​ Computationally expensive (requires multiple iterations).​

●​ Slower for large graphs.

4. Key Differences Table:

DegreeRank | PageRank
Number of incoming/outgoing links | Quality and quantity of links
Simple (non-iterative) | Iterative and computationally intensive
Less accurate | More accurate and reliable
Low | High
No | Yes

Week 7: Cascading Behaviour in Networks
Diffusion in Networks​

Diffusion in networks refers to the process by which something (information,
influence, disease, innovation, or behavior) spreads through the nodes and edges of a
network. It models how entities interact and propagate effects through a connected
structure.​

Types of Diffusion:

1.​ Information Diffusion:​

○ Spread of news, tweets, posts, or knowledge on platforms like Twitter,
WhatsApp, etc.

2.​ Disease/Contagion Diffusion:​

○​ Models the transmission of diseases (like COVID-19) in epidemiology.​

3.​ Innovation Diffusion:​

○​ How new technologies, ideas, or products spread in a population.​

4.​ Influence Diffusion:​

○​ How individuals influence others’ behavior (e.g., in marketing or politics).

Modeling Diffusion​

Common Models for Modeling Diffusion:

1. Independent Cascade Model (ICM):

●​ Each active node gets one chance to activate each of its inactive neighbors with a
certain probability (p).​

●​ If successful, the neighbor becomes active in the next time step.​

●​ Process continues until no new activations occur.​


Used for: Viral marketing, product recommendation.
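
A short sketch of the Independent Cascade Model as described above. It assumes a single activation probability p shared by all edges; the function name and the toy chain are illustrative.

```python
import random

def independent_cascade(neighbors, seeds, p, rng):
    """Return the set of nodes active when the cascade stops."""
    active = set(seeds)
    frontier = list(seeds)          # nodes activated in the previous time step
    while frontier:
        next_frontier = []
        for node in frontier:
            # Each newly active node gets one chance per inactive neighbor.
            for nbr in neighbors.get(node, []):
                if nbr not in active and rng.random() < p:
                    active.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier    # stop when no new activations occur
    return active

chain = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
everyone = independent_cascade(chain, ["A"], p=1.0, rng=random.Random(0))
nobody_new = independent_cascade(chain, ["A"], p=0.0, rng=random.Random(0))
```

With p = 1 the cascade reaches every node reachable from the seed; with p = 0 it never leaves the seed set.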

2. Linear Threshold Model (LTM):

●​ Each node has a threshold value.​

●​ A node becomes active when the total influence from its active neighbors exceeds
its threshold.​

●​ Influence values are assigned to each edge.​

Used for: Opinion dynamics, social influence.
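
A sketch of the Linear Threshold Model. It activates a node once the summed influence of its active neighbors reaches the threshold (using ≥, a common convention, where the wording above says "exceeds"). The weights and thresholds are made-up values for illustration.

```python
def linear_threshold(weights, thresholds, seeds):
    """weights[v][u] is the influence of neighbor u on node v."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, incoming in weights.items():
            if v in active:
                continue
            influence = sum(w for u, w in incoming.items() if u in active)
            if influence >= thresholds[v]:
                active.add(v)
                changed = True
    return active

weights = {
    "B": {"A": 0.6},
    "C": {"A": 0.3, "B": 0.3},
    "D": {"C": 0.2},
}
thresholds = {"B": 0.5, "C": 0.5, "D": 0.5}
adopters = linear_threshold(weights, thresholds, seeds={"A"})
```

B adopts because of A alone, C adopts only after both A and B are active, and D never receives enough influence.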

3. Epidemic Models:

1.​ SIR Model (Susceptible-Infected-Recovered):​

○​ Nodes move from Susceptible → Infected → Recovered​

○​ Once recovered, they can't be infected again.​

2.​ SIS Model (Susceptible-Infected-Susceptible):​

○​ After being infected, a node can become susceptible again.​

Used for: Disease spread, rumor propagation.
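
A discrete-time sketch of the SIR model on a network. The per-contact infection probability `beta` and per-step recovery probability `gamma` are parameters introduced for this example; the notes do not name them.

```python
import random

def sir_step(state, neighbors, beta, gamma, rng):
    """Advance one time step; state maps node -> 'S', 'I', or 'R'."""
    new_state = dict(state)
    for node, status in state.items():
        if status != "I":
            continue
        for nbr in neighbors[node]:
            if state[nbr] == "S" and rng.random() < beta:
                new_state[nbr] = "I"          # Susceptible -> Infected
        if rng.random() < gamma:
            new_state[node] = "R"             # Infected -> Recovered
    return new_state

def run_sir(neighbors, infected, beta, gamma, rng, max_steps=1000):
    """Run until no node is infected; return the list of states over time."""
    state = {n: ("I" if n in infected else "S") for n in neighbors}
    history = [state]
    for _ in range(max_steps):
        if "I" not in state.values():
            break
        state = sir_step(state, neighbors, beta, gamma, rng)
        history.append(state)
    return history

path = {0: [1], 1: [0, 2], 2: [1]}
history = run_sir(path, infected={0}, beta=1.0, gamma=1.0, rng=random.Random(0))
```

Changing the recovery line to set the node back to "S" instead of "R" would give an SIS variant, where nodes can be infected again.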

Impact of Communities on Diffusion

● Tight-knit communities can either speed up internal diffusion or slow down
spreading to other communities.

● Information spreads faster within a community due to strong connections, but
slower between communities because of fewer cross-links.

● Nodes that connect different communities (called bridge nodes) play a key role in
spreading diffusion across communities.

● In tightly connected communities, people may keep sharing the same
information, causing redundancy but not reaching new people (echo chamber
effect).

● Highly modular networks with strong community divisions may limit the overall
reach of diffusion.

● Marketers or health officials must target key nodes in each community to
maximize diffusion.

●​ For example, a viral tweet may spread quickly among students but take time to
reach professionals unless someone shares it across communities.

Cascade and Clusters​



Cascade

1.​ A cascade occurs when a small initial action (like one node becoming active)
triggers a chain reaction in which many other nodes become active.​

2.​ Cascades spread across the network as each node influences others to also
activate.​

3.​ The effectiveness of a cascade depends on the network structure and the
strength of connections between nodes.​

4.​ Cascades can be self-propagating, where one event leads to another, eventually
impacting a large portion of the network.​

5.​ External triggers or initial seed nodes are crucial in starting a cascade.​

6.​ Cascades can sometimes fail if the network structure doesn’t allow enough
influence to spread.​

7.​ Example: A viral marketing campaign where a single influential user’s post
encourages many others to share it.​

Clusters

1. Clusters are groups of tightly connected nodes in a network, where most nodes
are directly or indirectly connected to each other.

2. Clusters are often formed based on similar characteristics or shared interests,
creating communities within the larger network.

3.​ In a clustered network, nodes are more likely to interact with others in the same
cluster than with nodes in different clusters.​

4.​ Clusters can act as barriers to diffusion, especially if they have fewer connections
to other clusters (i.e., weak inter-cluster connections).​

5.​ A strong cluster can resist external influence, making it difficult for information
or behaviors to spread beyond the cluster.​

6.​ The size and density of a cluster affect how easily diffusion can move across it.​

7.​ Example: In social media, users who share similar interests form clusters, and
trends within one cluster may not immediately spread to other clusters without
key influencers.

Knowledge, Thresholds, and Collective Action​



Knowledge

1. Knowledge refers to the information or understanding that is shared or
transferred across nodes in a network.

2. The spread of knowledge can happen through interactions between nodes or
through media, like social networks or educational platforms.

3. Knowledge diffusion is often dependent on social influence, where individuals
adopt information based on peers or trusted sources.

4.​ In some cases, specialized knowledge might only be accessible to certain nodes,
creating a knowledge gap in the network.​

5.​ Example: A new research paper can spread through academic networks, where
experts share and discuss the findings.​

6. Knowledge diffusion is often non-linear, meaning it may not spread evenly across
the network.

7. The spread of knowledge can be accelerated by influential nodes (e.g., experts or
thought leaders) in the network.

Thresholds

1.​ Thresholds refer to the minimum level of influence a node requires from its
neighbors to take action or adopt a behavior.​

2.​ High thresholds mean a node requires a lot of influence from others to adopt
something, while low thresholds mean it needs less influence.​

3.​ Nodes with low thresholds are more likely to adopt behaviors or spread
information quickly, while those with high thresholds may resist adoption until a
critical mass of neighbors adopts.​

4.​ The average threshold in a network impacts the speed and extent of diffusion
across the network.​

5.​ Thresholds are often used in social influence models like the Linear Threshold
Model (LTM).​

6.​ Example: A person might need to see at least five of their friends using a new app
(threshold) before deciding to download it.​

7. Network effects make thresholds easier to reach: each additional adopter among a
node's neighbors increases the likelihood that the node itself adopts.

Collective Action

1.​ Collective action refers to the efforts of multiple individuals in a network coming
together to achieve a common goal.​

2. Collective action often depends on the coordination and cooperation of
individuals who might be pursuing their self-interest but align on shared goals.

3. The success of collective action is influenced by incentives, social influence, and
group dynamics.

4. Free rider problems can emerge, where some individuals benefit from the actions
of others without contributing themselves.

5.​ In social networks, collective action is often triggered by shared interests, like a
social cause, political movement, or protest.​

6.​ Example: A group of users in a social media campaign work together to spread
awareness about an environmental issue.​

7.​ The critical mass of participants is essential for collective action to succeed, as it
generates the momentum needed to drive change.
Week 8: Link Analysis (Continued)
Hubs and Authorities​

Link Analysis involves analyzing the structure of links between nodes to identify
important hubs and authorities in a network, which is particularly useful in search
engine ranking and recommendation systems.​

Hubs are nodes (web pages, individuals, etc.) that have a large number of outgoing links
to other nodes (web pages or resources).​

●​ Example: A webpage that links to many other pages within a topic or domain.​

Authorities are nodes that receive many incoming links from hubs, indicating that they
are considered important or authoritative on a particular subject.​

● Example: A well-cited research paper or a high-quality, relevant webpage that
others frequently link to.

Conservation and Convergence​



Conservation in PageRank:

●​ Conservation refers to the idea that the total PageRank score across all pages in a
network remains constant or conserved.​

● At each iteration, a page's PageRank is divided equally among its outgoing links
and passed on to the pages it links to.

●​ In the steady state, the total sum of all PageRank values across all pages is equal
to the initial sum, which is typically 1 (if normalized).

Convergence in PageRank:

●​ Convergence refers to the process where, after several iterations, the PageRank
scores stabilize.​

● Initially, PageRank scores are assigned randomly or equally, but after applying the
algorithm iteratively, the scores converge to a final set of values.

● Convergence is reached when the PageRank values no longer change significantly
between iterations.

Convergence in Repeated Matrix Multiplication

What is Convergence in Repeated Matrix Multiplication?

When we multiply a vector by a matrix repeatedly, we might notice that the result starts
to stay the same after some time. This means the vector settles into a steady state. This
steady state is called convergence.

When Does Convergence Happen?

Convergence happens when:

1. The matrix A is column-stochastic (its entries are non-negative probabilities and each column adds up to 1).

2.​ The matrix is irreducible, meaning you can get from any state (or page) to any
other state (or page).​

3.​ The matrix is aperiodic, meaning there’s no fixed cycle or repeating pattern.​

Example: PageRank

In PageRank (the algorithm Google uses to rank web pages), the web is represented as a
big matrix where each page is connected to other pages via links. We repeatedly multiply
a "rank vector" by this matrix, and the vector will eventually converge to a steady state.

●​ This steady state tells us how important each page is in the web. Once the vector
stops changing, we have the final PageRank for each page.​

How Does It Look?

1.​ Start with an initial guess, like all pages having the same rank.​

2.​ Multiply by the matrix repeatedly (this is like the random surfer moving around
the web).​

3. After some iterations, the ranks stabilize, and that’s when we have convergence.

In simple terms, after enough multiplications, the system stops changing and gives us
the final result.

Why It Works

When you keep multiplying, the vector eventually "learns" how the web is connected
and stabilizes to a final set of values, which tells you the importance (or PageRank) of
each page. This is the converged vector.
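
A tiny sketch of repeated matrix multiplication settling into a steady state. The 2×2 column-stochastic matrix is a made-up example; its steady state can be checked by hand as (5/6, 1/6).

```python
def mat_vec(m, v):
    """Multiply matrix m (list of rows) by column vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def power_iterate(m, v, tol=1e-12, max_iter=10_000):
    """Multiply repeatedly until the vector stops changing; return (vector, steps)."""
    for steps in range(1, max_iter + 1):
        nxt = mat_vec(m, v)
        if max(abs(a - b) for a, b in zip(nxt, v)) < tol:
            return nxt, steps
        v = nxt
    return v, max_iter

# Each column sums to 1, every state reaches every other, and there is no cycle.
M = [[0.9, 0.5],
     [0.1, 0.5]]
steady, steps = power_iterate(M, [0.5, 0.5])
```

Starting from a different probability vector, such as [1.0, 0.0], leads to the same steady state, which is the point of the convergence conditions listed earlier.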

PageRank as a Matrix Operation

●​ PageRank is a famous algorithm used by Google to rank web pages based on their
importance.
●​ It uses a graph model where web pages are represented as nodes and hyperlinks
as directed edges between these nodes.
●​ The PageRank algorithm can be described as a matrix operation, where the
matrix represents the structure of the web.

Representation of the Web as a Matrix

In the PageRank algorithm, the web is represented as a directed graph where each web
page is a node, and each hyperlink is a directed edge. ​

The link structure of the web can be represented by an adjacency matrix A of size n×n,
where n is the number of web pages. In this matrix:

● A[i][j] = 1 if page i has a hyperlink to page j.

● A[i][j] = 0 otherwise.

However, a simple adjacency matrix may not be enough to apply PageRank directly. We
need to transform this matrix into a form that can be used in the PageRank calculation.
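
One common way to carry out this transformation is sketched below. It assumes the convention that `adj[i][j] = 1` means page i links to page j, treats pages without outgoing links as linking to every page, and mixes in uniform random jumps with damping factor d. These choices are assumptions of the example, not details given in the notes.

```python
def google_matrix(adj, d=0.85):
    """Column-stochastic matrix G where G[j][i] is the chance of moving from i to j."""
    n = len(adj)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):                      # i = source page
        out_degree = sum(adj[i])
        for j in range(n):                  # j = target page
            if out_degree:
                link_share = adj[i][j] / out_degree
            else:
                link_share = 1.0 / n        # assumption: dangling page jumps anywhere
            g[j][i] = d * link_share + (1 - d) / n
    return g

adj = [[0, 1, 1],    # page 0 links to pages 1 and 2
       [0, 0, 1],    # page 1 links to page 2
       [0, 0, 0]]    # page 2 has no outgoing links
G = google_matrix(adj)
column_sums = [sum(G[j][i] for j in range(3)) for i in range(3)]
```

Every column of the result sums to 1, so multiplying a probability vector by G keeps it a probability vector.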
Week 9: Power Laws and Rich-Get-Richer Phenomena
Introduction to Power Law

🔹 What is a Power Law?


A Power Law describes a distribution in which a few items have very large values, while most items
have small ones.

In network terms:

A few nodes (people, websites, etc.) have lots of connections, while most nodes have only a
few.

🔸 Mathematical Form:
$P(k) \propto k^{-\gamma}$

Where:

●​ P(k) = probability that a node has k connections (degree)​

●​ γ (gamma) = a constant (usually between 2 and 3)​

This means P(k) falls as k increases, but more slowly than an exponential would, so nodes with very high degree remain possible.

🔍 Real-life Examples:
●​ A few websites (like Google, Facebook) get millions of visits, while most get very few.​

●​ In social networks, some users have thousands of followers, but most have only a few.​

●​ City populations: a few are huge, most are small.

📊 In Network Graphs:
●​ If you plot number of nodes vs. their degree (number of connections) on a log-log scale, a power
law appears as a straight line.​

●​ Such networks are called "scale-free networks".​


✅ Key Features of Power Law Networks:
Feature | Description
Few hubs | Some nodes have very high degrees
Long tail | Many nodes have very few connections
Robustness | Network stays connected even if random nodes fail
Vulnerability | If a hub is removed, the network can break down

📌 Summary:
●​ A power law shows uneven distribution — few with a lot, many with little.​

●​ Common in social networks, internet, biology, and economics.​

●​ Important for understanding how networks grow, behave, and break.

Why do Normal Distributions Appear?

🔹 Why Do Normal Distributions Appear?


They appear because of something called the Central Limit Theorem (CLT).

🧠 Central Limit Theorem (in simple words):


When you add up a lot of small, random things, their total tends to follow a normal
distribution, even if the original things aren’t normal.

🔸 Example:
Imagine you:

● Roll a die once → the result is uniform (1 to 6).

● Roll it 100 times and take the average → repeat this many times, and the averages will look like a normal distribution.

Same with:

●​ Heights of people​

●​ Test scores​

●​ Measurement errors​

●​ IQ scores​

●​ Weight of packed items​

These are all sums or averages of many tiny factors (genes, environment, skill, chance...), so they
naturally form a bell curve.
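
The dice example is easy to try out. The sketch below averages 100 rolls, repeats that 2000 times, and summarizes the resulting averages; the seed and sample sizes are arbitrary choices for illustration.

```python
import random
import statistics

def average_of_rolls(n_rolls, rng):
    """Average of n_rolls fair six-sided dice."""
    return sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls

rng = random.Random(42)
averages = [average_of_rolls(100, rng) for _ in range(2000)]
center = statistics.mean(averages)     # should sit close to 3.5
spread = statistics.stdev(averages)    # should sit close to 0.17
```

A histogram of `averages` would show the bell shape: a single roll is uniform, but the averages pile up around 3.5.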

✅ Key Reasons Why Normal Distributions Happen:


Reason | Explanation
CLT | Sums/averages of many small things = normal distribution
Natural randomness | Small effects combine to form a predictable pattern
Error & noise | Measurement errors are often normally distributed
Biological & social traits | Traits like height, weight, IQ come from many small causes

📊 Shape:
The graph of a normal distribution is symmetric and bell-shaped, centered around the mean (average),
with most values close to the mean and fewer as you move away.

📌 Summary:
●​ Normal distributions appear because many random factors add up.​

●​ Thanks to the Central Limit Theorem, the result often looks like a bell curve.​

●​ That’s why we see them in nature, exams, measurements, and more.


Power Law emerges in WWW graphs

🌐 Power Law in WWW Graphs


In the World Wide Web, every website is a node, and a hyperlink from one site to another is a connection
(edge).

Over time, the web grows in a way that follows a power law distribution:

A few websites get millions of links, while most get only a few.

🔹 Why Does This Happen?


This happens mainly because of a concept called “preferential attachment”:

New websites are more likely to link to popular websites.

For example:

●​ A new blog is more likely to link to Google, Wikipedia, or YouTube than a random unknown site​

●​ The more links a website already has, the more likely it is to get new ones.​

This creates a rich-get-richer effect, and over time it forms a power law distribution.

📊 Power Law Pattern:


If you count how many links each website has and plot it:

●​ Most websites have very few incoming links.​

●​ A few websites (like Google or Facebook) have millions.​

● The graph of this data on a log-log scale becomes a straight line, showing a power law.

✅ Real-World Example:
Website | Approx. Inbound Links
Google | Millions
Wikipedia | Millions
YourBlog.com | Maybe 5–10

📌 Summary:
Aspect | Explanation
Network Type | Web = Graph of pages and links
Link Distribution | Few pages with lots of links, many with few
Growth Pattern | Follows "preferential attachment"
Result | Power law emerges naturally

Detecting the Presence of Power Law

🔍 How to Detect a Power Law


To see if your data follows a power law, you look at the degree distribution (how many connections each
node has) and analyze its pattern.

✅ Steps to Detect Power Law:


1. Collect Degree Data

● For a network (like WWW, social network), count how many connections (degree) each node has.

2. Plot Degree Distribution

●​ Make a plot of:​

○​ x-axis = degree (k)​

○​ y-axis = number of nodes with that degree​

This gives you a degree distribution graph.

3. Use a Log-Log Plot

●​ Re-plot the same data using a log-log scale (log x-axis and log y-axis).​

📈 If the points form a straight line on this plot, your data likely follows a power law.

4. Fit the Power Law

●​ Use a mathematical model to fit the power law:​


$P(k) \sim k^{-\gamma}$

● You can use software like Python + NetworkX, or the powerlaw library, to fit the data.

5. Check with Statistical Tests

●​ Use tests like the Kolmogorov–Smirnov (K-S) test to check how well the data fits a power law.​

●​ Compare with other models (exponential, log-normal) to confirm it's not something else.​
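
As a rough first pass on steps 3 and 4, the exponent can be estimated from the slope of a straight line fitted on the log-log plot. The sketch below uses plain least squares on (log k, log count), which is only a quick diagnostic; a dedicated fitting library with proper statistical tests is more reliable. The synthetic data is made up so that the true exponent is known.

```python
import math

def fit_power_law_exponent(degree_counts):
    """Estimate gamma from {degree k: number of nodes with degree k}."""
    points = [(math.log(k), math.log(c))
              for k, c in degree_counts.items() if k > 0 and c > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx          # slope of the line on the log-log plot
    return -slope              # P(k) ~ k^(-gamma), so gamma = -slope

# Synthetic counts that follow count(k) = 1000 * k^(-2) exactly.
synthetic = {k: 1000 * k ** -2.0 for k in range(1, 11)}
gamma = fit_power_law_exponent(synthetic)
```

On real degree data the points scatter, especially in the tail, so the fitted slope should be read as an estimate rather than proof of a power law.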
🧠 Tools You Can Use:
Tool/Library | Purpose
Python + NetworkX | To extract degrees from a graph
Matplotlib/Seaborn | To plot log-log graphs
Powerlaw (Python) | To fit and test power law models
Gephi | Visual and statistical graph analysis

Rich Get Richer Phenomenon

💰 What is the Rich Get Richer Phenomenon?


It means:

Things that already have a lot, tend to get even more over time.

In network terms:

Nodes (like websites or people) that already have many links or connections are more likely
to get even more connections.

📈 Also Called:
●​ Preferential Attachment​

●​ Cumulative Advantage​

● Matthew Effect (named after a verse in the Gospel of Matthew, often summarized as “the rich get richer”)


🔗 Example in Real Life:
Context | How It Works
Websites | Popular sites like Google get more backlinks.
Social Media | Influencers get more followers easily.
Jobs/Wealth | Rich people get more opportunities/income.
Academic Papers | Famous papers get cited more often.

🧠 How It Happens in Networks:


When a new node joins the network, it tends to connect to nodes that are already popular.

This forms a power law distribution:

●​ A few nodes (hubs) with many links​

●​ Many nodes with only a few​

🔄 Mechanism: Preferential Attachment


1.​ Start with a small network​

2.​ New nodes arrive one by one​

3.​ Each new node prefers to connect to already popular nodes​

4.​ This feedback loop leads to some nodes becoming very rich in connections​
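
The four steps can be sketched as a short simulation. It assumes each new node adds exactly one link and picks its target with probability proportional to the target's current degree; both are simplifying assumptions for illustration, and the function names are made up.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, rng):
    """Grow a network one node at a time; return the list of edges."""
    edges = [(0, 1)]            # step 1: start with a small network
    endpoints = [0, 1]          # each node appears once per edge it touches
    for new in range(2, n_nodes):             # step 2: nodes arrive one by one
        target = rng.choice(endpoints)        # step 3: chance proportional to degree
        edges.append((new, target))
        endpoints.extend([new, target])       # step 4: popular nodes gain weight
    return edges

def degrees(edges):
    """Degree of every node in an edge list."""
    return Counter(node for edge in edges for node in edge)

edges = preferential_attachment(500, random.Random(1))
deg = degrees(edges)
biggest_hub = max(deg.values())
```

Sampling from the `endpoints` list is what implements "prefer popular nodes": a node with ten links appears ten times in the list, so it is ten times as likely to be chosen as a node with one.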
Week 10: Power Law (Continued) and Epidemics
Rich Get Richer - A Possible Reason

🔹 "Rich Get Richer" – A Possible Reason (in simple words)


The phrase “Rich Get Richer” means that people or things that are already popular or successful tend to gain even
more over time.

✅ In Network Terms (like social or web networks):


Nodes (people, pages, etc.) that already have more connections are more likely to get even more
new connections.

This is called Preferential Attachment, a key idea behind the Barabási–Albert model of network growth.

🧠 Simple Example:
●​ Imagine a new user joins a social network.​

●​ Who will they follow?​

○​ Probably someone who is already popular (has many followers).​

●​ So the popular person gets even more followers.​

●​ The rich (popular) get richer.​

🔄 Why does this happen?


●​ Visibility: Popular nodes are more visible.​

●​ Trust: People trust things that others already like.​

●​ Influence: More connections = more influence = more growth.​

📊 Real-Life Examples:
Domain | Rich Get Richer Example
Social Media | Influencers keep gaining followers
Web Pages | Popular websites get more backlinks
Money/Wealth | Rich people invest and earn even more
Videos (YouTube) | Viral videos keep getting more views

📌 Conclusion:
The “Rich Get Richer” effect is a natural result of how humans behave in networks — we tend to follow, link to, or
trust things that are already popular.

Epidemics - An Introduction

🔹 Epidemics – An Introduction (in Simple Words)


An epidemic is when something (like a disease, rumor, or idea) spreads quickly from person to person in a
network — like a chain reaction.

🧠 In Real Life:
●​ A virus spreads when infected people meet healthy ones.​

●​ A rumor spreads when people tell their friends.​

●​ A viral post spreads when people keep sharing it.​

So epidemics don’t just happen with diseases, they also apply to information, trends, and technology.

🔗 In Network Science:
●​ People are nodes.​

●​ Their connections (like friendships or contacts) are edges.​

● The structure of the network affects how fast and how far an epidemic spreads.

🔄 Epidemic Spread Models (Basics):
1.​ SIR Model – People move through three stages:​

○​ Susceptible (can get infected)​

○​ Infected (currently spreading)​

○​ Recovered (no longer spreading)​

2.​ SIS Model – People can be reinfected after recovering:​

○​ Susceptible → Infected → Susceptible again​

📌 Key Factors That Influence Spread:


●​ Number of connections (degree)​

●​ Network density​

●​ Hubs or highly connected people​

●​ Transmission rate (how easily it spreads)​

✅ Summary:
Term | Meaning
Epidemic | Fast, wide spread of something (disease/info)
Network | People (nodes) and their connections (edges)
Spread Rate | How quickly something moves through the network
Models | SIR, SIS – show how infection moves between people


Simple Branching Process for Modeling Epidemics

🔹 Simple Branching Process – What is it?


The branching process is a basic way to model how an epidemic spreads.​
It shows how one infected person can infect others, and how the infection grows like a tree (or branches).

🌳 How it works (step-by-step):


1.​ Start with 1 infected person (called generation 0).​

2.​ That person can infect a few others (generation 1).​

3.​ Each of them infects more people (generation 2), and so on...​

4. The process continues until a generation produces no new infections.

It’s like a chain reaction that can grow or die out.

🧠 Key Idea: Reproduction Number (R)


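● R = average number of people infected by 1 person

● If R > 1, the epidemic can keep growing (each generation is larger on average, although an unlucky outbreak can still die out early)

● If R < 1, the epidemic dies out (each generation is smaller on average)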

Example:
Let’s say each person infects 2 people on average (R = 2):

Gen 0: 1 person

Gen 1: 2 people

Gen 2: 4 people

Gen 3: 8 people
Total = 1 + 2 + 4 + 8 = 15 people infected

If R = 0.5 (each person infects only 0.5 people on average), it might look like:

Gen 0: 1

Gen 1: 1

Gen 2: 0
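
The two examples above can be checked with a short sketch. It assumes a simple version of the branching process in which every infected person meets `k` new people and infects each with probability `p`, so that R = k × p; the parameter names `k` and `p` and both function names are assumptions made for illustration.

```python
import random

def expected_generation_sizes(r, generations):
    """Expected number of infected people in generations 0..generations."""
    return [r ** g for g in range(generations + 1)]

def simulate_branching(k, p, max_generations, rng):
    """One random outbreak: each infected person meets k new people and
    infects each of them independently with probability p (R = k * p)."""
    sizes = [1]                      # generation 0: one infected person
    for _ in range(max_generations):
        infected_now = sizes[-1]
        if infected_now == 0:        # nobody left to pass it on
            break
        new_cases = sum(1 for _ in range(infected_now * k) if rng.random() < p)
        sizes.append(new_cases)
    return sizes

sizes_r2 = expected_generation_sizes(2, 3)
print(sizes_r2)        # [1, 2, 4, 8]
print(sum(sizes_r2))   # 15

rng = random.Random(1)
died_out = sum(
    simulate_branching(k=2, p=0.25, max_generations=50, rng=rng)[-1] == 0
    for _ in range(1000)
)
print("R = 0.5 outbreaks that died out within 50 generations:", died_out, "of 1000")
```

The first part reproduces the R = 2 table exactly; the second part shows that with R = 0.5 individual outbreaks vary in length but die out.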

Basic Reproductive Number

🔹 What is the Basic Reproductive Number (R₀)?


R₀ (pronounced "R naught") is a number that tells us:

👉 On average, how many people one infected person will infect in a fully healthy population.

✅ Why is R₀ important?
It helps us predict whether a disease will spread, slow down, or stop.

📊 How to Understand R₀:


R₀ Value | What It Means
R₀ > 1 | Epidemic will grow (infection spreads)
R₀ = 1 | Epidemic stays steady
R₀ < 1 | Epidemic will die out


🧠 Example:
If R₀ = 3:

● Each infected person gives the disease to 3 other people on average

● Those 3 infect about 3 more each → the disease spreads quickly

If R₀ = 0.5:

● Each person infects fewer than 1 person on average

● The disease slows down and disappears

🔬 What affects R₀?


●​ How easily the disease spreads (transmission rate)​

●​ How many people the infected person contacts​

●​ How long they are contagious​
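
A common back-of-the-envelope way to combine these three factors is to multiply them. The sketch below assumes that simplification (real estimates of R₀ are more involved), and the function names, parameter names, and input values are hypothetical.

```python
def basic_reproduction_number(transmission_prob, contacts_per_day, infectious_days):
    """Rough R0: (chance of infection per contact) x (contacts per day)
    x (number of days the person is contagious)."""
    return transmission_prob * contacts_per_day * infectious_days

def outlook(r0):
    """Translate an R0 value into the three cases from the table above."""
    if r0 > 1:
        return "grows"
    if r0 < 1:
        return "dies out"
    return "stays steady"

# Two hypothetical scenarios: many contacts vs. few contacts.
for p, c, d in [(0.05, 10, 6), (0.05, 2, 5)]:
    r0 = basic_reproduction_number(p, c, d)
    print(f"p={p}, contacts/day={c}, days={d} -> R0={r0:.2f}: {outlook(r0)}")
```

Lowering any one factor (fewer contacts, lower transmission, shorter contagious period) lowers R₀, which is why control measures target all three.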

✅ Summary:
Term | Meaning
R₀ | Avg. number of new infections from 1 person
R₀ > 1 | Disease spreads
R₀ < 1 | Disease fades out
Used For | Predicting & controlling epidemics

SIR and SIS Spreading Models

🧪 1. SIR Model (Susceptible → Infected → Recovered)


This model describes how diseases spread and people recover.

👥 Population Groups:
●​ S (Susceptible): Healthy people who can get infected​

●​ I (Infected): People who have the disease and can spread it​

●​ R (Recovered): People who recovered or died — they don’t spread or catch the disease again​

🔁 Flow:
S→I→R

✅ Used for:
● Diseases like measles, where people usually don’t get re-infected once recovered. COVID-19 is often modeled this way over short periods, although reinfection does occur.
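
The S → I → R flow can be sketched as a simple simulation on a contact network. The version below makes several assumptions for illustration: time moves in rounds, each infected person is contagious for exactly one round, and the ring-shaped network and the name `simulate_sir` are hypothetical.

```python
import random

def simulate_sir(graph, patient_zero, p_infect, rng):
    """Discrete-time SIR on a contact network.

    Each infected node gets one round to infect each susceptible neighbor
    (probability p_infect per neighbor), then moves to Recovered for good.
    """
    state = {node: "S" for node in graph}
    state[patient_zero] = "I"
    infected = [patient_zero]
    while infected:
        newly_infected = []
        for node in infected:
            for neighbor in graph[node]:
                if state[neighbor] == "S" and rng.random() < p_infect:
                    state[neighbor] = "I"
                    newly_infected.append(neighbor)
            state[node] = "R"          # S -> I -> R, never back to S
        infected = newly_infected
    return state

# Hypothetical contact network: 30 people in a ring, each linked to 2 neighbors per side.
n = 30
graph = {i: [(i + d) % n for d in (-2, -1, 1, 2)] for i in range(n)}
final = simulate_sir(graph, patient_zero=0, p_infect=0.5, rng=random.Random(42))
print("recovered:", sum(s == "R" for s in final.values()))
print("never infected:", sum(s == "S" for s in final.values()))
```

Because recovered nodes never return to S, the outbreak always ends, and everyone finishes as either R (was infected at some point) or S (never reached).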

🔁 2. SIS Model (Susceptible → Infected → Susceptible again)


This model is used when people don’t gain long-term immunity after infection.

👥 Population Groups:
●​ S (Susceptible)​

●​ I (Infected)​
🔁 Flow:
S → I → S (again)

After recovering, people go back to being susceptible, and can get infected again.

✅ Used for:
●​ Diseases like common cold, flu, or STDs, where people can get the disease multiple
times.​
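
The S → I → S loop can be illustrated with a small numerical sketch. It assumes a well-mixed population (everyone can meet everyone) rather than a specific network, and the rates `beta` and `gamma` are made-up values chosen only to show the two possible outcomes.

```python
def simulate_sis(beta, gamma, infected_fraction, steps):
    """Well-mixed SIS model in discrete time.

    beta:  new infections caused per infected person per step
           (when everyone else is susceptible)
    gamma: fraction of infected people who recover per step and
           become susceptible again
    """
    history = [infected_fraction]
    i = infected_fraction
    for _ in range(steps):
        s = 1.0 - i                      # only two groups: S and I
        new_infections = beta * s * i
        recoveries = gamma * i
        i = i + new_infections - recoveries
        history.append(i)
    return history

spreading = simulate_sis(beta=0.3, gamma=0.1, infected_fraction=0.01, steps=300)
fading = simulate_sis(beta=0.05, gamma=0.1, infected_fraction=0.01, steps=300)
print(f"beta/gamma = 3.0 -> long-run infected fraction {spreading[-1]:.3f}")
print(f"beta/gamma = 0.5 -> long-run infected fraction {fading[-1]:.3f}")
```

Unlike SIR, the infection does not have to burn out: when beta is larger than gamma the infected fraction settles at a steady level (1 - gamma/beta in this simplified model), and when beta is smaller it fades to zero.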

📊 Quick Comparison:
Feature | SIR Model | SIS Model
Immunity After Recovery | Yes (Recovered) | No (Can get sick again)
Groups | S, I, R | S, I
Real-world use | Measles, COVID-19 | Cold, Flu

📌 Summary:
●​ SIR: Once recovered, you're safe​

●​ SIS: You can get sick again and again


Comparison between SIR and SIS Spreading Models

📊 Quick Comparison:
Feature | SIR Model | SIS Model
Groups | 3 (Susceptible, Infected, Recovered) | 2 (Susceptible, Infected)
Immunity | Yes (Recovered individuals are immune) | No (Individuals can get reinfected)
Reinfection | No | Yes
Examples of Diseases | Measles, COVID-19, Smallpox | Common Cold, Flu, STDs
Use | Long-term immunity diseases | Short-term immunity diseases


Percolation Model

🔹 What is the Percolation Model?


The Percolation Model helps us understand how something spreads (like a disease, information, or fire) through a
network, especially when some connections or nodes are missing or blocked.

It's like asking:

💬 “How likely is it that the spread can reach a large part of the network?”

🧠 Think of it like this:


Imagine pouring water on a sponge:

●​ If the sponge has many open holes, water flows through easily.​

●​ If too many holes are blocked, water can’t pass through — it stops.​

In a similar way:

●​ If many people are connected in a network → a disease or idea spreads easily.​

●​ If not enough people are connected → the spread stops or slows down.​

🔁 In Networks:
●​ Nodes = people (or devices, pages, etc.)​

●​ Edges = connections (friendships, links, etc.)​

● Percolation threshold = the critical fraction of connections (or nodes) that must be open for the network to be “connected” enough for large-scale spreading.

📊 Used for Studying:


●​ Epidemics (will a disease spread or not?)​
●​ Rumor or information flow​

●​ Network robustness (what happens when parts fail?)​

✅ Key Terms:
Term | Meaning
Percolation | Spread through a network
Threshold (pc) | Critical point where spreading becomes possible
Above threshold | Spread reaches most of the network
Below threshold | Spread dies out quickly

📌 Example:
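● If only 30% of contacts can pass the infection on → the network may be below the threshold, so something like COVID-19 fizzles out.

● But if 70% of contacts can pass it on → the network is likely above the threshold, and a large-scale outbreak may happen. (These percentages are only illustrative; the real threshold depends on how the network is wired.)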
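
The threshold effect can be seen in a small experiment. The sketch below assumes a hypothetical random contact network with about 4 links per person, keeps each link "open" with a chosen probability, and measures the largest connected cluster; for this particular setup the threshold should sit near 25% open links, but that figure applies to this toy network only.

```python
import random

def largest_component_fraction(n, edges):
    """Fraction of nodes in the largest connected component (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    sizes = {}
    for node in range(n):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n

rng = random.Random(7)
n = 2000
# Hypothetical contact network: 4000 random links, about 4 contacts per person.
all_edges = [(rng.randrange(n), rng.randrange(n)) for _ in range(2 * n)]

results = {}
for keep_prob in (0.1, 0.3, 0.7, 0.9):
    open_edges = [e for e in all_edges if rng.random() < keep_prob]
    results[keep_prob] = largest_component_fraction(n, open_edges)
    print(f"{keep_prob:.0%} of links open -> largest cluster covers "
          f"{results[keep_prob]:.1%} of people")
```

Below the threshold the largest cluster is a tiny fraction of the population, so a spread starting anywhere stays local; above it, one cluster spans most of the network.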


Week 11: Small World Phenomenon
🔹 Milgram’s Experiment (1960s)
Stanley Milgram, a psychologist, conducted an experiment to test how connected people are in the real world.

🧪 What He Did:
●​ He gave letters to random people in the U.S.​

●​ They had to forward the letter to a specific target person (a stockbroker in Boston)…​

●​ Only through people they personally knew.​

📊 What He Found:
● Among the letters that reached the target, the chain took about 6 steps (6 people); many letters never arrived at all.

●​ This led to the idea of “6 degrees of separation” — we are all connected through a short chain of
people.​

❓ The Reason Behind the Experiment:


Milgram wanted to explore:

“How small is the world, really? Can people find short paths to each other in a large network?”

His goal was to understand how social connections work, and whether people could navigate these networks
without knowing the full map.

⚙️ The Generative Model (Watts-Strogatz Model)


To explain how networks can be both clustered and have short paths, Duncan Watts and Steven Strogatz proposed a model in 1998:

✅ Features of the Watts-Strogatz Small-World Model:


●​ High clustering: Friends of your friends are likely your friends.​

●​ Short paths: You can reach any person in just a few steps.​

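This model shows how a network can feel small (any two people are only a few steps apart) even when it contains a very large number of people. It starts from a ring in which everyone is linked to their nearest neighbors and randomly rewires a small fraction of the links; those few long-range shortcuts shrink path lengths while clustering stays high.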
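
Both features can be measured on a small example. The sketch below is a simplified take on the Watts-Strogatz construction (a ring lattice whose links are rewired with probability `p`); the function names, network size, and rewiring probability are illustrative assumptions.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Each node is linked to its k nearest neighbors on each side of a ring."""
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            j = (i + d) % n
            graph[i].add(j)
            graph[j].add(i)
    return graph

def rewire(graph, p, rng):
    """With probability p, replace each edge (i, j) by (i, random other node)."""
    n = len(graph)
    new = {i: set(nbrs) for i, nbrs in graph.items()}
    for i in range(n):
        for j in sorted(graph[i]):
            if i < j and rng.random() < p:
                target = rng.randrange(n)
                if target != i and target not in new[i]:
                    new[i].discard(j)
                    new[j].discard(i)
                    new[i].add(target)
                    new[target].add(i)
    return new

def average_clustering(graph):
    """Average fraction of a node's neighbor pairs that are themselves linked."""
    total = 0.0
    for nbrs in graph.values():
        nbrs = list(nbrs)
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a in range(k) for b in range(a + 1, k)
                    if nbrs[b] in graph[nbrs[a]])
        total += links / (k * (k - 1) / 2)
    return total / len(graph)

def average_path_length(graph):
    """Mean shortest-path length over all reachable pairs (BFS from every node)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

rng = random.Random(5)
lattice = ring_lattice(200, 2)
small_world = rewire(lattice, 0.1, rng)
for name, g in [("ring lattice", lattice), ("10% rewired", small_world)]:
    print(f"{name}: clustering={average_clustering(g):.2f}, "
          f"average path length={average_path_length(g):.1f}")
```

Rewiring only a tenth of the links should cut the average path length sharply while leaving most of the clustering in place.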
🌐 Decentralized Search
This means finding a target in a network using only local information — like Milgram’s participants.

People didn’t have a map of the full network — they only knew who they were connected to, and made
decisions based on that.

✅ Why it’s important:


●​ It shows how people (or computers) can search and navigate in a big network without global
knowledge.​

●​ Helps design efficient algorithms for search, routing, or message delivery in networks like social media or
the Internet.​

📌 Summary Table:
Topic | Meaning
Milgram’s Experiment | Showed that people are connected by short chains (~6 steps)
The Reason | To understand how social networks connect people
Generative Model | Watts-Strogatz model explains small-world networks (clustered + short)
Decentralized Search | Finding paths using local knowledge (not full network map)
Week 12: Pseudocore (How to Go Viral on the Web)
🌐 Small World Networks: Introduction
A Small World Network is a type of network where:

●​ Most nodes are not directly connected, but​

●​ You can reach any node from any other in a few steps.​

📌 Key Features:
●​ High clustering: Friends of your friends are likely your friends.​

●​ Short average path: You can reach distant parts of the network in a few hops.​

📊 Real-life examples: Social media, brain networks, internet, transport systems.

🔍 Myopic Search
Myopic Search is a greedy, local search method in a network.

🔧 How it works:
●​ A person/node doesn’t know the full network.​

●​ It only knows its neighbors.​

●​ It passes the message to the neighbor who seems closest to the target.​

📦 Like delivering a letter by only asking your friends to forward it to someone they think is closer to
the recipient.

⚖️ Myopic Search vs Optimal Search


Feature | Myopic Search | Optimal Search
Information used | Local (neighbors only) | Global (full network map)
Efficiency | May not always find the shortest path | Always finds the shortest path
Realistic? | Yes, in real-world scenarios | No, needs full network knowledge


Myopic search is more realistic, especially in human or social networks.
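
The difference between the two searches can be sketched in code. The example assumes a hypothetical network of people arranged on a ring, each with one random long-range acquaintance, and it assumes that "closest to the target" means closest in ring position (a stand-in for something like geographic distance). All function names are illustrative.

```python
import random
from collections import deque

def ring_distance(a, b, n):
    """Number of ring steps between positions a and b."""
    return min((a - b) % n, (b - a) % n)

def build_network(n, rng):
    """Ring of n people; each also gets one random long-range acquaintance."""
    graph = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
    for i in range(n):
        j = rng.randrange(n)
        if j != i:
            graph[i].add(j)
            graph[j].add(i)
    return graph

def myopic_search(graph, source, target, n):
    """Greedy routing: always forward to the neighbor closest to the target."""
    hops, current = 0, source
    while current != target:
        current = min(graph[current], key=lambda v: ring_distance(v, target, n))
        hops += 1
    return hops

def shortest_path_length(graph, source, target):
    """Optimal search: breadth-first search over the full network map."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return dist[v]
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return None

n = 1000
graph = build_network(n, random.Random(3))
source, target = 0, n // 2
myopic_hops = myopic_search(graph, source, target, n)
optimal_hops = shortest_path_length(graph, source, target)
print("ring distance:", ring_distance(source, target, n))
print("myopic hops:  ", myopic_hops)
print("optimal hops: ", optimal_hops)
```

Myopic search can never beat the optimal path, but because each hop moves strictly closer to the target it always arrives, and long-range links usually let it skip most of the ring.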

⏱️ Time Taken by Myopic Search


●​ Depends on how well-connected the network is.​

●​ In Small World Networks, Myopic Search is surprisingly fast.​

● It usually takes only a few steps (hops) to reach the target.

Milgram’s experiment (6 degrees of separation) showed this: even with local decisions, people can
reach targets quickly.

🔹 PseudoCores: Introduction
In large networks, core nodes are very well connected and help in spreading messages fast.

But not all important nodes are part of this core...

➕ Enter PseudoCores:
●​ These are clusters of nodes that behave like cores but aren’t centrally located.​

●​ They still help spread information fast.​

● They act as hubs for efficient message passing.

🧠 Who Are the Right Key Nodes?


Key nodes are:

●​ Highly connected​

●​ Well-positioned in the network​

●​ Important for fast spreading and navigation​


They can be influencers, central hubs, or bridge nodes connecting different parts of the network.

🔍 Finding the Right Key Nodes (the Core)


You can find them using:

●​ Degree centrality (most connections)​

●​ Betweenness centrality (on many shortest paths)​

●​ Closeness centrality (shortest total distance to all others)​

These help locate true cores and pseudo cores.
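
The three measures can be computed with plain breadth-first search. The sketch below uses a made-up 6-node network (two triangles joined by one bridge link), assumes the network is connected and unweighted, and computes betweenness with Brandes' algorithm; all function names are illustrative.

```python
from collections import deque

def degree_centrality(graph):
    """Number of direct connections of each node."""
    return {v: len(nbrs) for v, nbrs in graph.items()}

def closeness_centrality(graph):
    """(n - 1) / (sum of distances to all others); assumes a connected graph."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        result[source] = (len(graph) - 1) / sum(dist.values())
    return result

def betweenness_centrality(graph):
    """Number of shortest paths passing through each node (Brandes' algorithm)."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        order = []                          # nodes in order of distance from s
        preds = {v: [] for v in graph}      # predecessors on shortest paths
        sigma = {v: 0 for v in graph}       # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while order:                        # walk back from the farthest nodes
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}   # undirected: each pair counted twice

# Hypothetical network: triangle 0-1-2 and triangle 3-4-5, bridged by the link 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
betweenness = betweenness_centrality(graph)
for v in graph:
    print(f"node {v}: degree={degree[v]}, closeness={closeness[v]:.2f}, "
          f"betweenness={betweenness[v]:.1f}")
```

In this toy network the two bridge nodes (2 and 3) score highest on all three measures, which matches the idea of key nodes as well-connected bridges between parts of the network.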

🔸 Pseudo Core
●​ Not the actual central core, but acts like it in message spreading.​

●​ Helps in decentralized routing (like Myopic Search).​

●​ Improves speed and reach of information flow.​

🧩 Summary Table:
Concept | Description
Small World Network | Short paths, high clustering
Myopic Search | Greedy, local search using only neighbor info
Myopic vs Optimal | Myopic = realistic; Optimal = shortest but needs full info
PseudoCore | Non-central group of nodes helping fast spread like a core
Key Nodes | Nodes critical for flow; found using centrality measures
