⚛︎ Graph Database Creation in Neo4j

To create your database you can use this Cypher Query. nodes + relationships

// Person nodes
CREATE (alice:Person {name: 'Alice', gender: 'Female', age: 30, bornIn: 'New York'}),
       (bob:Person {name: 'Bob', gender: 'Male', age: 32, bornIn: 'London'}),
       (carol:Person {name: 'Carol', gender: 'Female', age: 25, bornIn: 'Sydney'}),
       (dave:Person {name: 'Dave', gender: 'Male', age: 28, bornIn: 'Toronto'}),
       (eve:Person {name: 'Eve', gender: 'Female', age: 22, bornIn: 'Paris'}),
       (frank:Person {name: 'Frank', gender: 'Male', age: 35, bornIn: 'Berlin'}),
       (grace:Person {name: 'Grace', gender: 'Female', age: 29, bornIn: 'Tokyo'}),
       (hank:Person {name: 'Hank', gender: 'Male', age: 40, bornIn: 'Cape Town'}),
       (irene:Person {name: 'Irene', gender: 'Female', age: 24, bornIn: 'Moscow'}),
       (jack:Person {name: 'Jack', gender: 'Male', age: 27, bornIn: 'Rio de Janeiro'})

// Pets
CREATE (rocky:Pet {name: 'Rocky', species: 'Dog', breed: 'Golden Retriever'}),
       (bella:Pet {name: 'Bella', species: 'Cat', breed: 'Siamese'}),
       (max:Pet {name: 'Max', species: 'Parrot', breed: 'Macaw'}),
       (lucy:Pet {name: 'Lucy', species: 'Rabbit', breed: 'Holland Lop'})

// Products
CREATE (smartphone:Product {name: 'Smartphone', brand: 'TechBrand'}),
       (laptop:Product {name: 'Laptop', brand: 'CompTech'}),
       (book:Product {name: 'Book', title: 'The Great Adventure'})

// Activities
CREATE (jogging:Activity {name: 'Jogging', fact: 'Can increase lifespan by 3 years'}),
       (painting:Activity {name: 'Painting', fact: 'Improves mental health and creativity'}),
       (cooking:Activity {name: 'Cooking', fact: 'Is a form of therapeutic art'})

// relationships
CREATE (alice)-[:MARRIED_TO {MarriedOn: '2020-02-25'}]->(bob),
       (bob)-[:DIVORCED_FROM {DivorcedOn: '2023-09-02'}]->(alice),
       (carol)-[:SURPRISED]->(dave),
       (eve)-[:OWNS]->(rocky),
       (frank)-[:OWNS]->(bella),
       (grace)-[:OWNS]->(max),
       (hank)-[:OWNS]->(lucy),
       (irene)-[:FRIENDS_WITH {since: '2021-09-19'}]->(jack),
       (jack)-[:MARRIED_TO]->(irene),
       (alice)-[:BOUGHT]->(smartphone),
       (bob)-[:BOUGHT]->(laptop),
       (carol)-[:READS]->(book),
       (dave)-[:PARTICIPATES_IN]->(jogging),
       (eve)-[:PRACTICES]->(painting),
       (frank)-[:ENJOYS]->(cooking),
       (grace)-[:USES]->(smartphone),
       (hank)-[:WORKS_ON]->(laptop),
       (irene)-[:STUDIES]->(book),
       (jack)-[:EXERCISES_WITH]->(jogging),
       (max)-[:CARED_FOR_BY]->(grace),
       (lucy)-[:PLAYS_WITH]->(jack),
       (laptop)-[:NEEDED_FOR]->(cooking),
       (jogging)-[:LIKED_BY]->(hank),
       (painting)-[:CHOSEN_BY]->(carol),
       (cooking)-[:LEARNED_BY]->(alice),
       (hank)-[:FRIENDS_WITH]->(jack)
RETURN *

This Cypher Query is accessible by database creation Query.cypher

🧠 Query to Show the Graph:

match (n) return n

⛓🖇 Link Prediction Algorithms

⚙️ Adamic-Adar Algorithm

The Adamic-Adar algorithm was introduced in 2003 by Lada Adamic and Eytan Adar. It's designed to predict links in a social network by measuring the closeness between pairs of nodes. The algorithm calculates the similarity between nodes based on the commonality of their connections, using the following formula:

The formula essentially sums the inverse logarithm of the degree of common neighbors between two nodes. Here, N(u) represents the set of nodes adjacent to node u. A computed value of 0 indicates that two nodes are not close, suggesting no common neighbors, while higher values signify a closer relationship, indicating multiple shared connections.

Key Points:

Purpose: Predicts the likelihood of a future link between two nodes in a social network.
Interpretation: A higher Adamic-Adar score implies a stronger relationship or connection likelihood between the nodes based on their shared neighbors.
Usage: The library provides a function to calculate the closeness between two nodes.

👥 Common Neighbors Algorithm

The Common Neighbors algorithm is based on the premise that two nodes (individuals) that share a mutual friend or connection are more likely to be introduced than those without any mutual connections. This concept is fundamental in social network analysis for predicting potential links.

The formula counts the number of shared neighbors between two nodes, where N(x) and N(y) represent the sets of nodes adjacent to nodes x and y, respectively. A computed value of 0 signifies no common neighbors, whereas higher values indicate a stronger connection due to multiple shared neighbors.

Key Points:

Purpose: Identifies potential connections between two nodes based on mutual acquaintances.
Interpretation: Higher values suggest a higher likelihood of forming a connection.
Usage: Function available in the library to calculate mutual connections.

🔗 Preferential Attachment Algorithm

Preferential Attachment reflects the principle that the more connections a node has, the more likely it is to acquire new connections. This concept was popularized by Albert-László Barabási and Réka Albert and is crucial for understanding the growth of scale-free networks.

The formula calculates the product of the degrees (number of connections) of two nodes, indicating that nodes with higher degrees are more likely to connect. |N(u)| and |N(v)| denote the degrees of nodes u and v, respectively.

Key Points:

Purpose: Predicts the formation of new links between nodes based on existing connections.
Interpretation: Higher values indicate a greater potential for link formation.
Usage: Function available in the library for assessing potential link formation.

💡 Resource Allocation Algorithm

Introduced in 2009 by Tao Zhou, Linyuan Lü, and Yi-Cheng Zhang, the Resource Allocation algorithm is a link prediction measure that simulates how resources would be distributed through shared neighbors between two nodes.

This algorithm sums the inverse of the degree of common neighbors, similar to the Adamic-Adar algorithm but applied in a different context. N(u) represents the set of nodes adjacent to node u.

Key Points:

Purpose: Assesses the closeness of nodes based on how resources would be allocated through their common connections.
Interpretation: Higher values suggest a closer relationship due to efficient resource sharing.
Usage: Function available in the library for calculating resource allocation efficiency.

👫 Same Community Algorithm

The Same Community algorithm determines whether two nodes belong to the same community, which is crucial for understanding the structure of networks and predicting future connections based on community membership.

Related Techniques and Metrics:

Production-quality: Ensures that the algorithm is suitable for use in real-world applications.
- Conductance metric: Measures the quality of a graph partitioning into communities.
- K-Core Decomposition: Identifies the core structure of the network by peeling off layers of the graph.
- K-1 Coloring: A graph coloring method that helps in distinguishing communities.
- K-Means Clustering: A partitioning method that can be used to group nodes into communities based on their features.
- Label Propagation: Utilizes node labels to propagate and determine communities rapidly.
- Leiden: An algorithm that refines the partitioning of a network into communities for higher quality results.
- Local Clustering Coefficient: Measures the degree to which nodes in a graph tend to cluster together.
- Louvain: A popular method for detecting communities based on modularity optimization.
- Modularity metric: Quantifies the strength of division of a network into communities.
- Modularity Optimization: Aims to maximize the modularity metric for better community detection.
- Strongly Connected Components: Identifies groups of nodes where every node is reachable from every other node in the group.
- Triangle Count: Can indicate tightly-knit groups of nodes, suggesting potential communities.
- Weakly Connected Components: Finds groups of nodes connected by paths regardless of direction.
Alpha: A parameter in some algorithms that influences the detection of communities.
- Approximate Maximum k-cut: A method for dividing the graph into k groups to maximize the weight of the edges within each group.
- Speaker-Listener Label Propagation (SLPA): An algorithm that considers the dynamics of speaking and listening behaviors of nodes for community detection.

This method provides a binary outcome: 0 indicates nodes are in different communities, while 1 signifies that both nodes are within the same community.

Key Points:

Purpose: Identifies whether two nodes are part of the same social or functional group.
Interpretation: A value of 1 indicates a strong likelihood of connection due to shared community membership.
Usage: Function in the library for detecting community-based connections.

🌐 Total Neighbors Algorithm

Total Neighbors measures the closeness of nodes based on the total number of unique neighbors they have, underlining the idea that nodes with more connections are more likely to form new links.

This formula calculates the combined unique set of neighbors for two nodes, where N(x) and N(y) represent the sets of nodes adjacent to nodes x and y, respectively.

Key Points:

Purpose: Evaluates potential connections based on the breadth of an individual's network.
Interpretation: Higher values indicate a greater likelihood of forming new connections.
Usage: Library function for assessing the extent of node connections.

🚀 Let's Run the Algorithms

MATCH (p1:Person {name: 'Hank'}), (p2:Person {name: 'Jack'})
WITH p1, p2
RETURN 
    gds.alpha.linkprediction.adamicAdar(p1, p2) AS AdamicAdar,
    gds.alpha.linkprediction.commonNeighbors(p1, p2) AS CommonNeighbors,
    gds.alpha.linkprediction.preferentialAttachment(p1, p2) AS PreferentialAttachment,
    gds.alpha.linkprediction.resourceAllocation(p1, p2) AS ResourceAllocation,
    gds.alpha.linkprediction.sameCommunity(p1, p2) AS SameCommunity,
    gds.alpha.linkprediction.totalNeighbors(p1, p2) AS TotalNeighbors

This Cypher Query is accessible by link prediction algorithms Query.cypher

Results in a single CSV file which is downloadable at link prediction Algorithms RESULT csv

AdamicAdar	CommonNeighbors	PreferentialAttachment	ResourceAllocation	SameCommunity	TotalNeighbors
2.352934267515801	2.0	20.0	0.8333333333333333	0.0	6.0

Tip

One of the interesting topics nowadays is using Neo4j as a Graph Database for ✅ Event Extraction by LLM models.
So, keep researching this topic for your next step!
I also want to work on this topic for my Master's thesis.
I'd be really happy if we could connect if you're working in this area.
hello@sattari.org

·

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
Images		Images
Database Creation Query.cypher		Database Creation Query.cypher
README.md		README.md
link prediction Algorithms RESULT.csv		link prediction Algorithms RESULT.csv
link prediction algorithms Query.cypher		link prediction algorithms Query.cypher

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Table of Contents

⚛︎ Graph Database Creation in Neo4j

🧠 Query to Show the Graph:

⛓🖇 Link Prediction Algorithms

⚙️ Adamic-Adar Algorithm

Key Points:

👥 Common Neighbors Algorithm

Key Points:

🔗 Preferential Attachment Algorithm

Key Points:

💡 Resource Allocation Algorithm

Key Points:

👫 Same Community Algorithm

Related Techniques and Metrics:

Key Points:

🌐 Total Neighbors Algorithm

Key Points:

🚀 Let's Run the Algorithms

Results in a single CSV file which is downloadable at link prediction Algorithms RESULT csv

About

Releases

Packages

Languages

pouyasattari/Graph-Link-Prediction-algorithms-by-Cypher-Query-in-Neo4j

Folders and files

Latest commit

History

Repository files navigation

Table of Contents

⚛︎ Graph Database Creation in Neo4j

🧠 Query to Show the Graph:

⛓🖇 Link Prediction Algorithms

⚙️ Adamic-Adar Algorithm

Key Points:

👥 Common Neighbors Algorithm

Key Points:

🔗 Preferential Attachment Algorithm

Key Points:

💡 Resource Allocation Algorithm

Key Points:

👫 Same Community Algorithm

Related Techniques and Metrics:

Key Points:

🌐 Total Neighbors Algorithm

Key Points:

🚀 Let's Run the Algorithms

Results in a single CSV file which is downloadable at link prediction Algorithms RESULT csv

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages