pyspark.sql.functions.sum_distinct
pyspark.sql.functions.sum_distinct(col: ColumnOrName) → pyspark.sql.column.Column
Aggregate function: returns the sum of the distinct non-null values in the expression.
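To make these semantics concrete without a Spark session, here is a minimal pure-Python sketch. The helper `sum_distinct_reference` is hypothetical: it is not part of PySpark and is not how Spark computes the aggregate. It models the behaviour for integer input only. Nulls (`None`) are skipped, duplicate values are counted once, and an input with no non-null values gives `None`, matching SQL's NULL result.

```python
from typing import Iterable, Optional


def sum_distinct_reference(values: Iterable[Optional[int]]) -> Optional[int]:
    """Hypothetical pure-Python model of SUM(DISTINCT ...) for integers.

    Nulls (None) are ignored, duplicates are collapsed, and the remaining
    values are summed. If no non-null value remains, None is returned,
    mirroring SQL's NULL result for an aggregate over no non-null input.
    """
    # Drop nulls and collapse duplicates in one step.
    distinct_non_null = {v for v in values if v is not None}
    if not distinct_non_null:
        return None
    return sum(distinct_non_null)


# Same input as the DataFrame example below: None, 1, 1, 2 -> 1 + 2
print(sum_distinct_reference([None, 1, 1, 2]))  # 3
print(sum_distinct_reference([None, None]))  # None
```

A plain `sum` over the same non-null input would count the repeated `1` twice and return 4. The distinct variant returns 3.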
New in version 3.2.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
  - col : Column or str
    Target column to compute on.
- Returns
  - Column
    A column containing the sum of the distinct non-null values.
Examples
>>> from pyspark.sql.functions import col, sum_distinct
>>> df = spark.createDataFrame([(None,), (1,), (1,), (2,)], schema=["numbers"])
>>> df.select(sum_distinct(col("numbers"))).show()
+---------------------+
|sum(DISTINCT numbers)|
+---------------------+
|                    3|
+---------------------+
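The doctest above aggregates over a single group. The following pure-Python sketch contrasts `sum_distinct` with a plain sum when rows are grouped by a key. Roughly, it models what one might expect from an aggregation such as `df.groupBy("key").agg(sum("value"), sum_distinct("value"))`. The function `grouped_sums` and the `key`/`value` column names are hypothetical and used only for illustration. This is a model of the expected results, not Spark's execution strategy.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Row = Tuple[str, Optional[int]]


def grouped_sums(
    rows: Iterable[Row],
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Hypothetical model of per-group (sum, sum of distinct values).

    Nulls are skipped; a group whose values are all null yields (None, None).
    """
    groups: Dict[str, List[int]] = {}
    for key, value in rows:
        # Register the key even when its value is null, so all-null groups appear.
        bucket = groups.setdefault(key, [])
        if value is not None:
            bucket.append(value)

    result: Dict[str, Tuple[Optional[int], Optional[int]]] = {}
    for key, values in groups.items():
        if not values:
            result[key] = (None, None)
        else:
            # Plain sum counts duplicates; the distinct sum counts each value once.
            result[key] = (sum(values), sum(set(values)))
    return result


rows = [("a", 1), ("a", 1), ("a", 2), ("b", None), ("b", 5), ("c", None)]
print(grouped_sums(rows))  # {'a': (4, 3), 'b': (5, 5), 'c': (None, None)}
```

The two aggregates differ only for groups containing repeated non-null values, as in group `a`.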