Graph Data - CSMODEL PDF

Uploaded by PatientPermutation
Summary
This presentation provides an overview of graph data and related concepts such as graph embeddings. The discussion includes algorithms like DeepWalk and Node2Vec, and their applications in various fields. It also covers representations such as adjacency lists and matrices.
Full Transcript
Graph Data
CSMODEL

Graphs

Graphs are a general language for describing systems of interacting entities. A graph is a collection of objects where some pairs of objects are connected by links.

Networks (Natural Graphs):
Communication systems link electronic devices.
Interactions between genes/proteins regulate life.
Thoughts are connected through neurons in our brain.

Information Graphs:
Information/knowledge is organized and linked.
Scene graphs: how objects in a scene relate.
Similarity networks: take data, connect similar points.

Components of a Graph
Objects: nodes, vertices
Interactions: links, edges
System: network, graph

A graph is connected if every two vertices have a path between them. G1 is a connected graph, while G2 is not.

Networks or Graphs
"Network" often refers to real systems: the Web, social networks, metabolic networks. Jargon: network, node, link.
"Graph" is a mathematical representation of a network: web graph, social graph, knowledge graph. Jargon: graph, vertex, edge.

Undirected Graph

Undirected graphs are composed of edges which do not have any specific direction. Each edge instead represents a two-way relationship between the vertices. (Graph G1)

The undirected graph G can be represented as G = (V, E), where V is the set of vertices and E is the set of edges.
G1 = (V1, E1)
V1 = {A, B, C, D, E}
E1 = {(A, C), (A, D), (B, D), (B, E), (C, E)}

Degree refers to the number of edges at a vertex. In G1, all vertices have degree 2. In G2, all vertices have degree 3.

The maximum number of edges in any n-vertex undirected graph is n(n - 1)/2. An n-vertex undirected graph with exactly n(n - 1)/2 edges is a complete undirected graph.
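As a minimal sketch (not from the slides), the undirected graph G1 defined above can be written down directly from its vertex and edge sets, and the degree and complete-graph formulas checked against it:

```python
# Undirected graph G1 from this section.
V1 = {"A", "B", "C", "D", "E"}
E1 = {("A", "C"), ("A", "D"), ("B", "D"), ("B", "E"), ("C", "E")}

def degree(v, edges):
    """Number of edges incident to vertex v."""
    return sum(v in e for e in edges)

n = len(V1)
max_edges = n * (n - 1) // 2  # maximum edges in an n-vertex undirected graph

print(sorted(degree(v, E1) for v in V1))  # every vertex of G1 has degree 2
print(max_edges)  # 10, so G1 (with 5 edges) is not complete
```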
Graph G1 is a complete undirected graph.

Directed Graph

Directed graphs are composed of edges which have a specified direction. Thus, at most 2 edges of different directions may be used to connect 2 different vertices. (Graph G1)

The directed graph G can be represented as G = (V, E), where V is the set of vertices and E is the set of edges.
G1 = (V1, E1)
V1 = {A, B, C, D, E, F, G, H}
E1 = {<A, C>, <B, E>, <C, A>, <C, F>, <C, G>, <D, F>, <D, H>, <E, B>, <E, G>, <E, H>, <F, D>, <G, F>, <G, H>, <H, D>}

Out-degree: the number of arrows originating from a vertex.
Out-degree of vertex A is 1.
Out-degree of vertex D is 2.
Out-degree of vertex C is 3.

In-degree: the number of arrows pointing to a vertex.
In-degree of vertex B is 1.
In-degree of vertex G is 2.
In-degree of vertex F is 3.

The maximum number of edges in any n-vertex directed graph is n(n - 1). An n-vertex directed graph with exactly n(n - 1) edges is a complete directed graph. Graph G1 is a complete directed graph.

Weighted Graphs

Graphs for which each edge may have an associated weight, typically given by a weight function w: E → R.

Graph Representations

Collection of Adjacency Lists: the adjacency list representation is usually preferred since it provides a compact way to represent sparse graphs (i.e., |E| much less than |V|^2).
Adjacency Matrix: the adjacency matrix representation is preferred if the graph is dense (i.e., |E| close to |V|^2).
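A small sketch (not from the slides) of the out-degree and in-degree definitions, applied to the directed graph G1 in this section, with edges stored as ordered pairs:

```python
# Directed graph G1 from this section; each tuple (u, v) is the edge <u, v>.
E1 = {("A", "C"), ("B", "E"), ("C", "A"), ("C", "F"), ("C", "G"),
      ("D", "F"), ("D", "H"), ("E", "B"), ("E", "G"), ("E", "H"),
      ("F", "D"), ("G", "F"), ("G", "H"), ("H", "D")}

def out_degree(v, edges):
    """Arrows originating from v."""
    return sum(1 for (u, _) in edges if u == v)

def in_degree(v, edges):
    """Arrows pointing to v."""
    return sum(1 for (_, w) in edges if w == v)

print(out_degree("A", E1), out_degree("D", E1), out_degree("C", E1))  # 1 2 3
print(in_degree("B", E1), in_degree("G", E1), in_degree("F", E1))     # 1 2 3
```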
Adjacency List: The adjacency list representation of a graph G = (V, E) consists of an array A of |V| lists, one for each vertex in V. For each vertex u in V, the adjacency list A[u] contains all vertices v such that there is an edge (u, v) in E. The adjacency list representation's memory requirement is Theta(V + E).

Adjacency List - Undirected: In an undirected graph, the sum of the lengths of all adjacency lists is 2|E|, since if (u, v) is an undirected edge, then u appears in v's adjacency list and vice versa.

Adjacency List - Directed: In a directed graph, the sum of the lengths of all the adjacency lists is |E|, since an edge of the form <u, v> is represented by having v appear in A[u].

Adjacency List - Weighted Undirected: The weight w(u, v) of the edge (u, v) in E is stored with vertex v in u's adjacency list, and vice versa.

Adjacency List - Weighted Directed: The weight w(u, v) of the edge (u, v) in E is stored with vertex v in u's adjacency list.

Adjacency Matrix: The adjacency matrix representation of a graph G = (V, E) consists of a |V| x |V| matrix A = (a_ij) such that:
a_ij = 1 if (i, j) in E, and a_ij = 0 otherwise.
The adjacency matrix representation of a graph requires Theta(V^2) memory, independent of the number of edges in the graph.
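The two representations just described can be contrasted with a minimal sketch (not from the slides) on a small undirected graph; note that the lists total 2|E| entries, while the matrix is always |V| x |V|:

```python
V = ["A", "B", "C", "D", "E"]
E = [("A", "C"), ("A", "D"), ("B", "D"), ("B", "E"), ("C", "E")]

# Adjacency list: one list per vertex; each undirected edge (u, v)
# puts v in u's list and u in v's list, so total length is 2|E|.
adj = {v: [] for v in V}
for u, v in E:
    adj[u].append(v)
    adj[v].append(u)

# Adjacency matrix: |V| x |V| zero/one matrix, Theta(V^2) memory
# regardless of how many edges the graph has.
idx = {v: i for i, v in enumerate(V)}
A = [[0] * len(V) for _ in V]
for u, v in E:
    A[idx[u]][idx[v]] = 1
    A[idx[v]][idx[u]] = 1

print(sum(len(lst) for lst in adj.values()))  # 2|E| = 10
```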
Adjacency Matrix - Undirected

    A  B  C  D  E  F
A   0  0  1  1  0  0
B   0  0  0  1  1  0
C   1  0  0  1  0  1
D   1  1  1  0  1  0
E   0  1  0  1  0  1
F   0  0  1  0  1  0

Adjacency Matrix - Undirected Weighted

    A  B  C  D  E  F
A   0  0  2  2  0  0
B   0  0  0  2  2  0
C   2  0  0  3  0  5
D   2  2  3  0  3  0
E   0  2  0  3  0  5
F   0  0  5  0  5  0

Adjacency Matrix - Directed

    A  B  C  D  E  F  G
A   0  0  1  0  0  0  0
B   0  0  0  1  0  0  0
C   1  0  0  0  1  1  0
D   0  1  0  0  0  1  1
E   0  0  0  0  0  0  0
F   1  1  0  0  1  0  1
G   0  0  0  0  0  0  0

Adjacency Matrix - Directed Weighted

    A  B  C  D  E  F  G
A   0  0  1  0  0  0  0
B   0  0  0  1  0  0  0
C   1  0  0  0  3  5  0
D   0  1  0  0  0  5  3
E   0  0  0  0  0  0  0
F   6  6  0  0  3  0  3
G   0  0  0  0  0  0  0

Graph Embeddings

A graph embedding transforms a graph into a lower-dimensional representation while preserving its topology. Its goal is to turn graphs into a format that machine learning algorithms can understand and process. Machine learning algorithms are tuned for continuous data; thus we need to convert graphs, which are discrete by nature, into a continuous vector space.

Recent algorithms used to produce graph embeddings:
DeepWalk (Perozzi et al., 2014)
Node2Vec (Grover & Leskovec, 2016)

DeepWalk (Perozzi et al., 2014)

DeepWalk belongs to the family of graph embedding techniques that use walks.

Graphs are like texts. Just as a word can be one-hot encoded over a vocabulary, a vertex can be encoded over the vertex set. For example, over the vocabulary (the, dog, is, cute, cat, also, red, but, blue, not), the word "cat" is encoded as [0 0 0 0 1 0 0 0 0 0]; similarly, over vertices A through J, vertex A's row in the adjacency matrix is [0 0 0 1 0 1 0 0 0 0].

Language modeling estimates the likelihood of a specific sequence of words appearing in a corpus.
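The "graphs are like texts" analogy in this section can be sketched in a few lines (an illustration, not slide code): a word's one-hot vector over a vocabulary has the same shape as a vertex's row in an adjacency matrix.

```python
# Vocabulary from the slide's example sentence fragments.
vocab = ["the", "dog", "is", "cute", "cat", "also", "red", "but", "blue", "not"]

def one_hot(word, vocab):
    """One-hot encode a word over a fixed vocabulary."""
    return [1 if w == word else 0 for w in vocab]

print(one_hot("cat", vocab))  # [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
```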
Suppose we have a sequence of words:
W_1^n = (w_0, w_1, ..., w_n)
We want to estimate the likelihood of:
P(w_n | w_0, w_1, ..., w_{n-1})

DeepWalk generalizes language modeling to explore the graph through a stream of short random walks. Suppose we have a sequence of visited vertices:
W_1^n = (v_0, v_1, ..., v_n)
We want to estimate the likelihood of:
P(v_n | v_0, v_1, ..., v_{n-1})

The goal is to learn a latent representation, not only a probability distribution of node co-occurrences. Thus, the authors introduced the mapping function:
Phi: v in V → R^{|V| x d}
which represents the latent social representation associated with each vertex v in the graph.

Thus, DeepWalk estimates the likelihood:
P(Phi(v_i) | Phi(v_0), Phi(v_1), ..., Phi(v_{i-1}))
The goal is to estimate the likelihood of observing node v_i given all the previous nodes visited so far in the random walk.

Node2Vec (Grover & Leskovec, 2016)

Node2Vec is one of the first deep learning attempts to learn embeddings from graph data. Like DeepWalk, Node2Vec utilizes walks to learn graph embeddings. Compared to DeepWalk, Node2Vec incorporates a search bias variable alpha, parameterized by p and q, which allows it to interpolate between BFS and DFS.

Formally, given a source node u, simulate a random walk of fixed length l. Let c_i denote the i-th node in the walk, starting with c_0 = u.
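A minimal sketch of the short random walks DeepWalk feeds to its language model, assuming (as in the original algorithm) that each next vertex is drawn uniformly from the current vertex's neighbors; the graph here is a hypothetical 5-vertex example:

```python
import random

# Hypothetical undirected graph, stored as an adjacency list.
adj = {"A": ["C", "D"], "B": ["D", "E"], "C": ["A", "E"],
       "D": ["A", "B"], "E": ["B", "C"]}

def random_walk(adj, start, length, rng=random):
    """Uniform random walk of `length` vertices starting at `start`."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

walk = random_walk(adj, "A", 5)
print(walk)  # e.g. ['A', 'C', 'E', ...] -- sequences like this play the role of sentences
```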
Nodes c_i are generated by the following distribution:
P(c_i = x | c_{i-1} = v) = pi_vx / Z if (v, x) in E, and 0 otherwise,
where pi_vx is the transition probability between v and x and Z is the normalizing constant.

Suppose the walk has just traversed the edge (t, v) and now resides at node v. To decide on the next step, the walk evaluates the transition probability pi_vx on the edges (v, x) leading from v:
pi_vx = alpha_pq(t, x) * w_vx
where w_vx is the weight of the edge going from v to x.

Search bias:
alpha_pq(t, x) = 1/p if d_tx = 0
alpha_pq(t, x) = 1   if d_tx = 1
alpha_pq(t, x) = 1/q if d_tx = 2
where d_tx denotes the shortest-path distance between nodes t and x.

BFS is ideal for learning local neighborhoods: the neighborhood N_S is restricted to nodes which are immediate neighbors of the source.

DFS is better for learning the global structure of the graph: the neighborhood consists of nodes sequentially sampled at increasing distances from the source node.

Applications

Network Compression
Clustering
Link Prediction
Node Classification
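The Node2Vec transition rule described in this section can be sketched as follows (an illustration, not the reference implementation; the neighborhood names t, x1, x2 are hypothetical):

```python
def alpha(p, q, d_tx):
    """Search bias alpha_pq(t, x) as a function of the shortest-path
    distance d_tx between the previous node t and candidate node x."""
    if d_tx == 0:       # returning to the previous node t
        return 1.0 / p
    if d_tx == 1:       # a node also adjacent to t (local, BFS-like step)
        return 1.0
    return 1.0 / q      # d_tx == 2: moving further away (DFS-like step)

def transition_probs(p, q, weights, dists):
    """Unnormalized pi_vx = alpha_pq(t, x) * w_vx, then normalized by Z."""
    pi = {x: alpha(p, q, dists[x]) * w for x, w in weights.items()}
    z = sum(pi.values())
    return {x: val / z for x, val in pi.items()}

# Hypothetical neighborhood of v after traversing edge (t, v):
# unit edge weights; x1 is adjacent to t (d = 1), x2 is not (d = 2).
probs = transition_probs(p=1, q=2,
                         weights={"t": 1, "x1": 1, "x2": 1},
                         dists={"t": 0, "x1": 1, "x2": 2})
print(probs)  # with q > 1, x1 is favoured over the more distant x2
```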