A single directed edge consisting of a source id, target id, and the data associated with the edge.
ED: the type of the edge attribute
srcId: the vertex id of the source vertex
dstId: the vertex id of the target vertex
attr: the attribute associated with the edge
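A minimal sketch of building and reading a single edge, assuming the Spark GraphX jar is on the classpath (for example inside spark-shell). The attribute type (String) and the value "follows" are illustrative choices, not part of the API.

```scala
import org.apache.spark.graphx.Edge

// Vertex ids are Longs; the edge attribute type ED is String in this example.
val follows = Edge(1L, 2L, "follows")

// Read back the three fields described above.
println(s"${follows.srcId} -[${follows.attr}]-> ${follows.dstId}")
```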
Represents an edge along with its neighboring vertices and allows sending messages along the edge. Used in Graph#aggregateMessages.
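As a hedged illustration of how an edge context is used, the sketch below computes in-degrees with Graph#aggregateMessages. It assumes a spark-shell session in which `sc` is the provided SparkContext; the toy edge list is hypothetical.

```scala
import org.apache.spark.graphx._

// Hypothetical toy graph: 1 -> 2, 1 -> 3, 2 -> 3. Every vertex gets the default attribute 0.
val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(1L, 3L, 1), Edge(2L, 3L, 1)))
val graph = Graph.fromEdges(edges, defaultValue = 0)

// sendMsg receives the edge context; here every edge sends the value 1 to its destination.
// mergeMsg sums the messages that arrive at the same vertex, which yields the in-degree.
val inDegree: VertexRDD[Int] = graph.aggregateMessages[Int](
  ctx => ctx.sendToDst(1),
  _ + _
)

inDegree.collect().sortBy(_._1).foreach { case (id, n) => println(s"vertex $id: in-degree $n") }
```

Vertices that receive no message (vertex 1 in this toy graph) do not appear in the result.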
The direction of a directed edge relative to a vertex.
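The effect of an edge direction can be sketched with collectNeighborIds, assuming a spark-shell session in which `sc` is the provided SparkContext. The edge list is a hypothetical example.

```scala
import org.apache.spark.graphx._

// Hypothetical toy graph: 1 -> 2, 1 -> 3, 2 -> 3.
val graph = Graph.fromEdgeTuples(
  sc.parallelize(Seq((1L, 2L), (1L, 3L), (2L, 3L))), defaultValue = 0)

// EdgeDirection.Out: neighbors reached by edges leaving the vertex.
val outNbrs = graph.collectNeighborIds(EdgeDirection.Out)
// EdgeDirection.In: neighbors whose edges arrive at the vertex.
val inNbrs = graph.collectNeighborIds(EdgeDirection.In)

outNbrs.collect().sortBy(_._1).foreach { case (id, ns) =>
  println(s"vertex $id has out-neighbors ${ns.sorted.mkString(", ")}")
}
```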
EdgeRDD[ED, VD] extends RDD[Edge[ED]] by storing the edges in columnar format on each partition for performance. It may additionally store the vertex attributes associated with each edge to provide the triplet view. Shipping of the vertex attributes is managed by impl.ReplicatedVertexView.
An edge triplet represents an edge along with the vertex attributes of its neighboring vertices.
VD: the type of the vertex attribute
ED: the type of the edge attribute
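A short sketch of the triplet view, assuming a spark-shell session in which `sc` is the provided SparkContext. The user names and relation labels are hypothetical data.

```scala
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Hypothetical vertex and edge data.
val users: RDD[(VertexId, String)] =
  sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
val relations: RDD[Edge[String]] =
  sc.parallelize(Seq(Edge(1L, 2L, "follows"), Edge(2L, 3L, "blocks")))
val graph = Graph(users, relations)

// Each triplet exposes the edge attribute together with both endpoint attributes.
val sentences = graph.triplets
  .map(t => s"${t.srcAttr} ${t.attr} ${t.dstAttr}")
  .collect()

sentences.foreach(println)
```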
The Graph abstractly represents a graph with arbitrary objects associated with vertices and edges. The graph provides basic operations to access and manipulate the data associated with vertices and edges as well as the underlying structure. Like Spark RDDs, the graph is a functional data structure in which mutating operations return new graphs.
VD: the vertex attribute type
ED: the edge attribute type
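The functional behavior described above can be sketched as follows, assuming a spark-shell session in which `sc` is the provided SparkContext. The edge list and the doubling function are illustrative only.

```scala
import org.apache.spark.graphx._

// Hypothetical toy graph; every vertex starts with the attribute 1.
val graph: Graph[Int, Int] =
  Graph.fromEdgeTuples(sc.parallelize(Seq((1L, 2L), (2L, 3L))), defaultValue = 1)

// mapVertices does not modify `graph`; it returns a new graph with transformed attributes.
val doubled: Graph[Int, Int] = graph.mapVertices((id, attr) => attr * 2)

println(graph.vertices.collect().sortBy(_._1).mkString(", "))
println(doubled.vertices.collect().sortBy(_._1).mkString(", "))
```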
GraphOps contains additional convenience operations and graph algorithms.
Contains additional functionality for Graph. All operations are expressed in terms of the efficient GraphX API. This class is implicitly constructed for each Graph object.
VD: the vertex attribute type
ED: the edge attribute type
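Because GraphOps is constructed implicitly, its members can be called directly on a Graph. A minimal sketch, assuming a spark-shell session in which `sc` is the provided SparkContext and a hypothetical toy edge list:

```scala
import org.apache.spark.graphx._

// Hypothetical toy graph: 1 -> 2, 1 -> 3, 2 -> 3.
val graph = Graph.fromEdgeTuples(
  sc.parallelize(Seq((1L, 2L), (1L, 3L), (2L, 3L))), defaultValue = 0)

// numVertices, numEdges and outDegrees are GraphOps members,
// reached through the implicit conversion from Graph.
val nVertices: Long = graph.numVertices
val nEdges: Long = graph.numEdges
val maxOutDegree: Int = graph.outDegrees.map(_._2).reduce(_ max _)

println(s"$nVertices vertices, $nEdges edges, max out-degree $maxOutDegree")
```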
Integer identifier of a graph partition. Must be less than 2^30.
Represents the way edges are assigned to edge partitions based on their source and destination vertex IDs.
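A hedged sketch of applying a partition strategy, assuming a spark-shell session in which `sc` is the provided SparkContext. EdgePartition2D is one of the built-in strategies; the edge list and the partition count of 4 are arbitrary example values.

```scala
import org.apache.spark.graphx._

// Hypothetical toy graph: 1 -> 2, 1 -> 3, 2 -> 3.
val graph = Graph.fromEdgeTuples(
  sc.parallelize(Seq((1L, 2L), (1L, 3L), (2L, 3L))), defaultValue = 0)

// Redistribute the edges according to the chosen strategy.
val repartitioned = graph.partitionBy(PartitionStrategy.EdgePartition2D)

// A strategy maps (source id, destination id, number of partitions) to a partition id.
val pid: PartitionID = PartitionStrategy.EdgePartition2D.getPartition(1L, 2L, 4)
println(s"edge 1 -> 2 would be placed in partition $pid of 4")
```

partitionBy returns a new graph whose edges have been redistributed; the vertex and edge data are unchanged.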
A 64-bit vertex identifier that uniquely identifies a vertex within a graph. It does not need to follow any ordering or any constraints other than uniqueness.
Extends RDD[(VertexId, VD)] by ensuring that there is only one entry for each vertex and by pre-indexing the entries for fast, efficient joins. Two VertexRDDs with the same index can be joined efficiently. All operations except reindex preserve the index. To construct a VertexRDD, use the VertexRDD object.
Additionally, stores routing information to enable joining the vertex attributes with an EdgeRDD.
VD: the vertex attribute associated with each vertex in the set
Construct a VertexRDD from a plain RDD:
// Construct an initial vertex set
val someData: RDD[(VertexId, SomeType)] = loadData(someFile)
val vset = VertexRDD(someData)
// If there were redundant values in someData we would use a reduceFunc
val vset2 = VertexRDD(someData, reduceFunc)
// Finally we can use the VertexRDD to index another dataset
val otherData: RDD[(VertexId, OtherType)] = loadData(otherFile)
val vset3 = vset2.innerJoin(otherData) { (vid, a, b) => b }
// Now we can construct very fast joins between the two sets.
// leftJoin takes a merge function; vertices missing from vset3 yield None.
val vset4: VertexRDD[(SomeType, Option[OtherType])] =
  vset.leftJoin(vset3) { (vid, a, bOpt) => (a, bOpt) }
A set of EdgeDirections.
The Graph object contains a collection of routines used to construct graphs from RDDs.
Provides utilities for loading Graphs from files.
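A minimal sketch of loading a graph from an edge list file with GraphLoader.edgeListFile, assuming a spark-shell session in which `sc` is the provided SparkContext running against the local file system. The temporary file and its contents are made up for the example.

```scala
import java.nio.file.Files
import org.apache.spark.graphx.GraphLoader

// Write a hypothetical edge list: one "sourceId targetId" pair per line.
// Lines starting with '#' are treated as comments by the loader.
val path = Files.createTempFile("edges", ".txt")
Files.write(path, "# src dst\n1 2\n1 3\n2 3\n".getBytes("UTF-8"))

// Vertex and edge attributes of the loaded graph are both set to 1.
val graph = GraphLoader.edgeListFile(sc, path.toUri.toString)

println(s"loaded ${graph.numVertices} vertices and ${graph.numEdges} edges")
```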
Collection of built-in PartitionStrategy implementations.
Implements a Pregel-like bulk-synchronous message-passing API.
Unlike the original Pregel API, the GraphX Pregel API factors the sendMessage computation over edges, enables the message sending computation to read both vertex attributes, and constrains messages to the graph structure. These changes allow for substantially more efficient distributed execution while also exposing greater flexibility for graph-based computation.
We can use the Pregel abstraction to implement PageRank:
// resetProb and numIter are assumed to be defined by the caller.
val pagerankGraph: Graph[Double, Double] = graph
  // Associate the degree with each vertex
  .outerJoinVertices(graph.outDegrees) { (vid, vdata, deg) => deg.getOrElse(0) }
  // Set the weight on the edges based on the degree
  .mapTriplets(e => 1.0 / e.srcAttr)
  // Set the vertex attributes to the initial pagerank values
  .mapVertices((id, attr) => 1.0)

def vertexProgram(id: VertexId, attr: Double, msgSum: Double): Double =
  resetProb + (1.0 - resetProb) * msgSum
// The send function receives only the edge triplet
def sendMessage(edge: EdgeTriplet[Double, Double]): Iterator[(VertexId, Double)] =
  Iterator((edge.dstId, edge.srcAttr * edge.attr))
def messageCombiner(a: Double, b: Double): Double = a + b
val initialMessage = 0.0
// Execute Pregel for a fixed number of iterations.
Pregel(pagerankGraph, initialMessage, numIter)(
  vertexProgram, sendMessage, messageCombiner)
The VertexRDD singleton is used to construct VertexRDDs.
Various analytics functions for graphs.
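As one example of these analytics functions, the sketch below runs connected components through the lib package, assuming a spark-shell session in which `sc` is the provided SparkContext. The edge list is a hypothetical graph with two components.

```scala
import org.apache.spark.graphx._
import org.apache.spark.graphx.lib.ConnectedComponents

// Hypothetical graph with two components: {1, 2, 3} and {4, 5}.
val graph = Graph.fromEdgeTuples(
  sc.parallelize(Seq((1L, 2L), (2L, 3L), (4L, 5L))), defaultValue = 0)

// Each vertex is labeled with the lowest vertex id in its connected component.
val components = ConnectedComponents.run(graph)

components.vertices.collect().sortBy(_._1).foreach { case (id, label) =>
  println(s"vertex $id belongs to component $label")
}
```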
Collection of utilities used by GraphX.
ALPHA COMPONENT. GraphX is a graph processing framework built on top of Spark.