pyspark.sql.functions.sum_distinct

pyspark.sql.functions.sum_distinct(col: ColumnOrName) → pyspark.sql.column.Column

Aggregate function: returns the sum of distinct values in the expression.

New in version 3.2.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
col : Column or str

target column to compute on.

Returns
Column

the column for computed results.

Examples

>>> from pyspark.sql.functions import col, sum_distinct
>>> df = spark.createDataFrame([(None,), (1,), (1,), (2,)], schema=["numbers"])
>>> df.select(sum_distinct(col("numbers"))).show()
+---------------------+
|sum(DISTINCT numbers)|
+---------------------+
|                    3|
+---------------------+
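
The result above is 3 rather than 4 because the null row is skipped and the duplicate 1 is counted once. The following plain-Python sketch models that behaviour for illustration only. The helper name `sum_distinct_model` is hypothetical and not part of PySpark, and returning None for an all-null input is an assumption based on standard SQL SUM semantics.

```python
from typing import Iterable, Optional


def sum_distinct_model(values: Iterable[Optional[int]]) -> Optional[int]:
    """Plain-Python model of sum(DISTINCT ...); hypothetical helper, not Spark code."""
    # Nulls (None) are skipped, and each remaining value is kept once.
    distinct = {v for v in values if v is not None}
    # Assumption: like SQL SUM, return None when there is nothing to add.
    return sum(distinct) if distinct else None


numbers = [None, 1, 1, 2]
print(sum_distinct_model(numbers))  # 3, matching the table above
```

This models only the semantics of the aggregate on a single column of integers; it says nothing about how Spark plans or executes the query.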