RDD.subtract(other, numPartitions=None)
Return each value in self that is not contained in other.
New in version 0.9.1.
Parameters
other : RDD
    another RDD
numPartitions : int, optional
    the number of partitions in the new RDD
Returns
RDD
    an RDD with the elements from this RDD that are not in other
See also
RDD.subtractByKey()
Examples
>>> rdd1 = sc.parallelize([("a", 1), ("b", 4), ("b", 5), ("a", 3)])
>>> rdd2 = sc.parallelize([("a", 3), ("c", None)])
>>> sorted(rdd1.subtract(rdd2).collect())
[('a', 1), ('b', 4), ('b', 5)]
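The example above compares whole elements, so ("a", 1) survives even though rdd2 contains the key "a". The following plain-Python sketch models that behaviour on local lists and contrasts it with key-based removal in the style of RDD.subtractByKey(). It is only an illustration of the documented semantics, not Spark's implementation: the helper names local_subtract and local_subtract_by_key are hypothetical, and the sketch assumes all elements are hashable.

```python
# Plain-Python model of the documented semantics; no Spark required.
# local_subtract and local_subtract_by_key are hypothetical helper names.

def local_subtract(this, other):
    """Keep every element of `this` that does not appear in `other`."""
    excluded = set(other)  # assumes elements are hashable
    return [x for x in this if x not in excluded]


def local_subtract_by_key(this, other):
    """Keep every (key, value) pair of `this` whose key is absent from `other`."""
    excluded_keys = {k for k, _ in other}
    return [(k, v) for k, v in this if k not in excluded_keys]


data1 = [("a", 1), ("b", 4), ("b", 5), ("a", 3)]
data2 = [("a", 3), ("c", None)]

whole_element = sorted(local_subtract(data1, data2))
by_key = sorted(local_subtract_by_key(data1, data2))
print(whole_element)  # [('a', 1), ('b', 4), ('b', 5)]
print(by_key)         # [('b', 4), ('b', 5)]
```

In this model, duplicates in the first collection are kept as long as they do not appear in the second, which matches the two ("b", ...) pairs retained in the example.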