pyspark.RDD.fullOuterJoin

RDD.fullOuterJoin(other, numPartitions=None)

Perform a full outer join of self and other.

For each element (k, v) in self, the resulting RDD will either contain all pairs (k, (v, w)) for w in other, or the pair (k, (v, None)) if no elements in other have key k.

Similarly, for each element (k, w) in other, the resulting RDD will either contain all pairs (k, (v, w)) for v in self, or the pair (k, (None, w)) if no elements in self have key k.
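The two rules above can be modeled in plain Python. This is only a sketch of the join semantics on in-memory lists of (key, value) pairs, not Spark's implementation; the helper name `full_outer_join` is hypothetical and is not part of the PySpark API.

```python
from collections import defaultdict


def full_outer_join(left, right):
    """Model full outer join semantics on two lists of (key, value) pairs.

    Hypothetical helper for illustration only; it is not part of PySpark.
    """
    left_by_key = defaultdict(list)
    right_by_key = defaultdict(list)
    for k, v in left:
        left_by_key[k].append(v)
    for k, w in right:
        right_by_key[k].append(w)

    result = []
    # Visit every key that appears on either side.
    for k in left_by_key.keys() | right_by_key.keys():
        # A side with no values for this key contributes a single None.
        vs = left_by_key.get(k) or [None]
        ws = right_by_key.get(k) or [None]
        # Matching keys yield every (v, w) combination.
        result.extend((k, (v, w)) for v in vs for w in ws)
    return result


pairs = full_outer_join([("a", 1), ("a", 3), ("b", 4)], [("a", 2), ("c", 8)])
print(sorted(pairs))
```

Note that a key occurring several times on one side produces one output pair per combination of values, so the result can contain more elements than either input.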

Hash-partitions the resulting RDD into the given number of partitions.
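As a rough illustration of what hash partitioning means here, the sketch below assigns each (key, value) pair to a bucket by hashing its key modulo the partition count. It is a simplified model under stated assumptions: the helper name `hash_partition` is hypothetical, and Python's built-in `hash` stands in for the hash function Spark actually uses.

```python
def hash_partition(pairs, num_partitions):
    """Group (key, value) pairs into buckets by hash(key) % num_partitions.

    Hypothetical helper for illustration only; built-in hash() is a
    stand-in for Spark's own partitioning hash.
    """
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


# Integer keys are used because their built-in hash is stable across runs.
buckets = hash_partition([(1, "x"), (2, "y"), (3, "z"), (4, "w")], 2)
print(buckets)
```

All pairs that share a key land in the same bucket, which is the property a partitioned join result relies on.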

New in version 1.2.0.

Parameters
other : RDD

another RDD

numPartitions : int, optional

the number of partitions in the new RDD

Returns
RDD

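an RDD containing all pairs of elements with matching keys, plus a pair for each element whose key appears in only one of the two RDDs, with None in place of the missing value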

Examples

>>> rdd1 = sc.parallelize([("a", 1), ("b", 4)])
>>> rdd2 = sc.parallelize([("a", 2), ("c", 8)])
>>> sorted(rdd1.fullOuterJoin(rdd2).collect())
[('a', (1, 2)), ('b', (4, None)), ('c', (None, 8))]