pyspark.pandas.Index.drop_duplicates
Index.drop_duplicates(keep: Union[bool, str] = 'first') → pyspark.pandas.indexes.base.Index
Return Index with duplicate values removed.
- Parameters
- keep : {‘first’, ‘last’, False}, default ‘first’
  Method to handle dropping duplicates:
  - ‘first’ : Drop duplicates except for the first occurrence.
  - ‘last’ : Drop duplicates except for the last occurrence.
  - False : Drop all duplicates.
- Returns
- deduplicated : Index
See also
Series.drop_duplicates
Equivalent method on Series.
DataFrame.drop_duplicates
Equivalent method on DataFrame.
Examples
Generate an Index with duplicate values.
>>> idx = ps.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx.drop_duplicates().sort_values()
Index(['beetle', 'cow', 'hippo', 'lama'], dtype='object')
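The three `keep` modes described under Parameters can also be illustrated without a Spark session. The following is a minimal pure-Python sketch of the documented semantics, not the pandas-on-Spark implementation; `dedupe` is a hypothetical helper introduced only for illustration, and it operates on a plain list rather than an Index.

```python
from collections import Counter


def dedupe(values, keep="first"):
    """Hypothetical helper: mimic the documented `keep` semantics on a list."""
    if keep == "first":
        # dict.fromkeys keeps the first occurrence of each key, in order.
        return list(dict.fromkeys(values))
    if keep == "last":
        # Deduplicate the reversed list, then restore the original direction.
        return list(dict.fromkeys(reversed(values)))[::-1]
    if keep is False:
        # Drop every value that occurs more than once.
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]
    raise ValueError("keep must be 'first', 'last', or False")


labels = ['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo']

print(dedupe(labels, keep="first"))  # ['lama', 'cow', 'beetle', 'hippo']
print(dedupe(labels, keep="last"))   # ['cow', 'beetle', 'lama', 'hippo']
print(dedupe(labels, keep=False))    # ['cow', 'beetle', 'hippo']
```

Note that ‘first’ and ‘last’ retain the same set of values and differ only in which occurrence survives, so the two results are identical once sorted. The sketch preserves list order; the Index example above sorts its result before displaying it, so the sketch's ordering should not be read as a statement about the order returned by `Index.drop_duplicates`.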