Skip to content

This Repository covers the final project of our five-week workshop on Graph Convolutional Networks, a niche within graph-based machine learning.

Notifications You must be signed in to change notification settings

pouyasattari/Graph-Link-Prediction-algorithms-by-Cypher-Query-in-Neo4j

Repository files navigation

Table of Contents

  1. โš›๏ธŽ Graph Database Creation in Neo4j
  2. ๐Ÿง  Query to Show the Graph
  3. ๐Ÿงช Link Prediction Algorithms
  4. ๐Ÿš€ Let's Run the Algorithms Query
  5. ๐Ÿ“Ÿ Algorithms Results in CSV

 Relational Link Prediction Algorithms on Graph Database !!

Cover Image

โš›๏ธŽ Graph Database Creation in Neo4j

To create your database you can use this Cypher Query. nodes + relationships

// Person nodes
CREATE (alice:Person {name: 'Alice', gender: 'Female', age: 30, bornIn: 'New York'}),
       (bob:Person {name: 'Bob', gender: 'Male', age: 32, bornIn: 'London'}),
       (carol:Person {name: 'Carol', gender: 'Female', age: 25, bornIn: 'Sydney'}),
       (dave:Person {name: 'Dave', gender: 'Male', age: 28, bornIn: 'Toronto'}),
       (eve:Person {name: 'Eve', gender: 'Female', age: 22, bornIn: 'Paris'}),
       (frank:Person {name: 'Frank', gender: 'Male', age: 35, bornIn: 'Berlin'}),
       (grace:Person {name: 'Grace', gender: 'Female', age: 29, bornIn: 'Tokyo'}),
       (hank:Person {name: 'Hank', gender: 'Male', age: 40, bornIn: 'Cape Town'}),
       (irene:Person {name: 'Irene', gender: 'Female', age: 24, bornIn: 'Moscow'}),
       (jack:Person {name: 'Jack', gender: 'Male', age: 27, bornIn: 'Rio de Janeiro'})

// Pets
CREATE (rocky:Pet {name: 'Rocky', species: 'Dog', breed: 'Golden Retriever'}),
       (bella:Pet {name: 'Bella', species: 'Cat', breed: 'Siamese'}),
       (max:Pet {name: 'Max', species: 'Parrot', breed: 'Macaw'}),
       (lucy:Pet {name: 'Lucy', species: 'Rabbit', breed: 'Holland Lop'})

// Products
CREATE (smartphone:Product {name: 'Smartphone', brand: 'TechBrand'}),
       (laptop:Product {name: 'Laptop', brand: 'CompTech'}),
       (book:Product {name: 'Book', title: 'The Great Adventure'})

// Activities
CREATE (jogging:Activity {name: 'Jogging', fact: 'Can increase lifespan by 3 years'}),
       (painting:Activity {name: 'Painting', fact: 'Improves mental health and creativity'}),
       (cooking:Activity {name: 'Cooking', fact: 'Is a form of therapeutic art'})

// relationships
CREATE (alice)-[:MARRIED_TO {MarriedOn: '2020-02-25'}]->(bob),
       (bob)-[:DIVORCED_FROM {DivorcedOn: '2023-09-02'}]->(alice),
       (carol)-[:SURPRISED]->(dave),
       (eve)-[:OWNS]->(rocky),
       (frank)-[:OWNS]->(bella),
       (grace)-[:OWNS]->(max),
       (hank)-[:OWNS]->(lucy),
       (irene)-[:FRIENDS_WITH {since: '2021-09-19'}]->(jack),
       (jack)-[:MARRIED_TO]->(irene),
       (alice)-[:BOUGHT]->(smartphone),
       (bob)-[:BOUGHT]->(laptop),
       (carol)-[:READS]->(book),
       (dave)-[:PARTICIPATES_IN]->(jogging),
       (eve)-[:PRACTICES]->(painting),
       (frank)-[:ENJOYS]->(cooking),
       (grace)-[:USES]->(smartphone),
       (hank)-[:WORKS_ON]->(laptop),
       (irene)-[:STUDIES]->(book),
       (jack)-[:EXERCISES_WITH]->(jogging),
       (max)-[:CARED_FOR_BY]->(grace),
       (lucy)-[:PLAYS_WITH]->(jack),
       (laptop)-[:NEEDED_FOR]->(cooking),
       (jogging)-[:LIKED_BY]->(hank),
       (painting)-[:CHOSEN_BY]->(carol),
       (cooking)-[:LEARNED_BY]->(alice),
       (hank)-[:FRIENDS_WITH]->(jack)
RETURN *


๐Ÿง  Query to Show the Graph:

match (n) return n
Common Neighbors Formula

โ›“๐Ÿ–‡ Link Prediction Algorithms

โš™๏ธ Adamic-Adar Algorithm

The Adamic-Adar algorithm was introduced in 2003 by Lada Adamic and Eytan Adar. It's designed to predict links in a social network by measuring the closeness between pairs of nodes. The algorithm calculates the similarity between nodes based on the commonality of their connections, using the following formula:

Adamic-Adar Formula

The formula essentially sums the inverse logarithm of the degree of common neighbors between two nodes. Here, N(u) represents the set of nodes adjacent to node u. A computed value of 0 indicates that two nodes are not close, suggesting no common neighbors, while higher values signify a closer relationship, indicating multiple shared connections.

Key Points:

  • Purpose: Predicts the likelihood of a future link between two nodes in a social network.
  • Interpretation: A higher Adamic-Adar score implies a stronger relationship or connection likelihood between the nodes based on their shared neighbors.
  • Usage: The library provides a function to calculate the closeness between two nodes.

๐Ÿ‘ฅ Common Neighbors Algorithm

The Common Neighbors algorithm is based on the premise that two nodes (individuals) that share a mutual friend or connection are more likely to be introduced than those without any mutual connections. This concept is fundamental in social network analysis for predicting potential links.

Common Neighbors Formula

The formula counts the number of shared neighbors between two nodes, where N(x) and N(y) represent the sets of nodes adjacent to nodes x and y, respectively. A computed value of 0 signifies no common neighbors, whereas higher values indicate a stronger connection due to multiple shared neighbors.

Key Points:

  • Purpose: Identifies potential connections between two nodes based on mutual acquaintances.
  • Interpretation: Higher values suggest a higher likelihood of forming a connection.
  • Usage: Function available in the library to calculate mutual connections.

๐Ÿ”— Preferential Attachment Algorithm

Preferential Attachment reflects the principle that the more connections a node has, the more likely it is to acquire new connections. This concept was popularized by Albert-Lรกszlรณ Barabรกsi and Rรฉka Albert and is crucial for understanding the growth of scale-free networks.

Preferential Attachment Formula

The formula calculates the product of the degrees (number of connections) of two nodes, indicating that nodes with higher degrees are more likely to connect. |N(u)| and |N(v)| denote the degrees of nodes u and v, respectively.

Key Points:

  • Purpose: Predicts the formation of new links between nodes based on existing connections.
  • Interpretation: Higher values indicate a greater potential for link formation.
  • Usage: Function available in the library for assessing potential link formation.

๐Ÿ’ก Resource Allocation Algorithm

Introduced in 2009 by Tao Zhou, Linyuan Lรผ, and Yi-Cheng Zhang, the Resource Allocation algorithm is a link prediction measure that simulates how resources would be distributed through shared neighbors between two nodes.

Resource Allocation Formula

This algorithm sums the inverse of the degree of common neighbors, similar to the Adamic-Adar algorithm but applied in a different context. N(u) represents the set of nodes adjacent to node u.

Key Points:

  • Purpose: Assesses the closeness of nodes based on how resources would be allocated through their common connections.
  • Interpretation: Higher values suggest a closer relationship due to efficient resource sharing.
  • Usage: Function available in the library for calculating resource allocation efficiency.

๐Ÿ‘ซ Same Community Algorithm

The Same Community algorithm determines whether two nodes belong to the same community, which is crucial for understanding the structure of networks and predicting future connections based on community membership.

Related Techniques and Metrics:

  • Production-quality: Ensures that the algorithm is suitable for use in real-world applications.

    • Conductance metric: Measures the quality of a graph partitioning into communities.

    • K-Core Decomposition: Identifies the core structure of the network by peeling off layers of the graph.

    • K-1 Coloring: A graph coloring method that helps in distinguishing communities.

    • K-Means Clustering: A partitioning method that can be used to group nodes into communities based on their features.

    • Label Propagation: Utilizes node labels to propagate and determine communities rapidly.

    • Leiden: An algorithm that refines the partitioning of a network into communities for higher quality results.

    • Local Clustering Coefficient: Measures the degree to which nodes in a graph tend to cluster together.

    • Louvain: A popular method for detecting communities based on modularity optimization.

    • Modularity metric: Quantifies the strength of division of a network into communities.

    • Modularity Optimization: Aims to maximize the modularity metric for better community detection.

    • Strongly Connected Components: Identifies groups of nodes where every node is reachable from every other node in the group.

    • Triangle Count: Can indicate tightly-knit groups of nodes, suggesting potential communities.

    • Weakly Connected Components: Finds groups of nodes connected by paths regardless of direction.

  • Alpha: A parameter in some algorithms that influences the detection of communities.

    • Approximate Maximum k-cut: A method for dividing the graph into k groups to maximize the weight of the edges within each group.

    • Speaker-Listener Label Propagation (SLPA): An algorithm that considers the dynamics of speaking and listening behaviors of nodes for community detection.

This method provides a binary outcome: 0 indicates nodes are in different communities, while 1 signifies that both nodes are within the same community.

Key Points:

  • Purpose: Identifies whether two nodes are part of the same social or functional group.
  • Interpretation: A value of 1 indicates a strong likelihood of connection due to shared community membership.
  • Usage: Function in the library for detecting community-based connections.

๐ŸŒ Total Neighbors Algorithm

Total Neighbors measures the closeness of nodes based on the total number of unique neighbors they have, underlining the idea that nodes with more connections are more likely to form new links.

Total Neighbors Formula

This formula calculates the combined unique set of neighbors for two nodes, where N(x) and N(y) represent the sets of nodes adjacent to nodes x and y, respectively.

Key Points:

  • Purpose: Evaluates potential connections based on the breadth of an individual's network.
  • Interpretation: Higher values indicate a greater likelihood of forming new connections.
  • Usage: Library function for assessing the extent of node connections.

๐Ÿš€ Let's Run the Algorithms

MATCH (p1:Person {name: 'Hank'}), (p2:Person {name: 'Jack'})
WITH p1, p2
RETURN 
    gds.alpha.linkprediction.adamicAdar(p1, p2) AS AdamicAdar,
    gds.alpha.linkprediction.commonNeighbors(p1, p2) AS CommonNeighbors,
    gds.alpha.linkprediction.preferentialAttachment(p1, p2) AS PreferentialAttachment,
    gds.alpha.linkprediction.resourceAllocation(p1, p2) AS ResourceAllocation,
    gds.alpha.linkprediction.sameCommunity(p1, p2) AS SameCommunity,
    gds.alpha.linkprediction.totalNeighbors(p1, p2) AS TotalNeighbors
Common Neighbors Formula

This Cypher Query is accessible by link prediction algorithms Query.cypher


Results in a single CSV file which is downloadable at link prediction Algorithms RESULT csv

AdamicAdar CommonNeighbors PreferentialAttachment ResourceAllocation SameCommunity TotalNeighbors
2.352934267515801 2.0 20.0 0.8333333333333333 0.0 6.0

Tip

One of the interesting topics nowadays is using Neo4j as a Graph Database for โœ… Event Extraction by LLM models.
So, keep researching this topic for your next step!
I also want to work on this topic for my Master's thesis.
I'd be really happy if we could connect if you're working in this area.
hello@sattari.org

linkedin logo ยท SATTARI.org

About

This Repository covers the final project of our five-week workshop on Graph Convolutional Networks, a niche within graph-based machine learning.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages