pyspark.sql.DataFrame.drop
- DataFrame.drop(*cols)
Returns a new DataFrame without the specified columns. This is a no-op if the schema doesn’t contain the given column name(s).
New in version 1.4.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
- cols: str or :class:`Column`
The name of a column, or the Column to be dropped.
- Returns
- DataFrame
A new DataFrame without the specified columns.
Notes
When an input is a column name, it is treated literally without further interpretation. Otherwise, it is matched against the equivalent expression. So dropping a column by its name, drop(colName), has different semantics from directly dropping the column, drop(col(colName)).
Examples
Example 1: Drop a column by name.
>>> df = spark.createDataFrame(
...     [(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
>>> df.drop('age').show()
+-----+
| name|
+-----+
|  Tom|
|Alice|
|  Bob|
+-----+
Example 2: Drop a column by Column object.
>>> df.drop(df.age).show()
+-----+
| name|
+-----+
|  Tom|
|Alice|
|  Bob|
+-----+
Example 3: Drop the column that both DataFrames were joined on.
>>> df2 = spark.createDataFrame([(80, "Tom"), (85, "Bob")], ["height", "name"])
>>> df.join(df2, df.name == df2.name).drop('name').sort('age').show()
+---+------+
|age|height|
+---+------+
| 14|    80|
| 16|    85|
+---+------+
>>> df3 = df.join(df2)
>>> df3.show()
+---+-----+------+----+
|age| name|height|name|
+---+-----+------+----+
| 14|  Tom|    80| Tom|
| 14|  Tom|    85| Bob|
| 23|Alice|    80| Tom|
| 23|Alice|    85| Bob|
| 16|  Bob|    80| Tom|
| 16|  Bob|    85| Bob|
+---+-----+------+----+
Example 4: Drop two columns with the same name.
>>> df3.drop("name").show()
+---+------+
|age|height|
+---+------+
| 14|    80|
| 14|    85|
| 23|    80|
| 23|    85|
| 16|    80|
| 16|    85|
+---+------+
Example 5: Cannot drop col('name') due to an ambiguous reference.
>>> from pyspark.sql import functions as sf
>>> df3.drop(sf.col("name")).show()
Traceback (most recent call last):
...
pyspark.errors.exceptions.captured.AnalysisException: [AMBIGUOUS_REFERENCE] Reference...
Example 6: Cannot find a column matching the expression "a.b.c".
>>> from pyspark.sql import functions as sf
>>> df4 = df.withColumn("a.b.c", sf.lit(1))
>>> df4.show()
+---+-----+-----+
|age| name|a.b.c|
+---+-----+-----+
| 14|  Tom|    1|
| 23|Alice|    1|
| 16|  Bob|    1|
+---+-----+-----+
>>> df4.drop("a.b.c").show()
+---+-----+
|age| name|
+---+-----+
| 14|  Tom|
| 23|Alice|
| 16|  Bob|
+---+-----+
>>> df4.drop(sf.col("a.b.c")).show()
+---+-----+-----+
|age| name|a.b.c|
+---+-----+-----+
| 14|  Tom|    1|
| 23|Alice|    1|
| 16|  Bob|    1|
+---+-----+-----+