Catalog.
refreshByPath
Invalidates and refreshes all the cached data (and the associated metadata) for any DataFrame that contains the given data source path.
New in version 2.2.0.
the path to refresh the cache.
Examples
The example below caches a table, and then removes the data.
>>> import tempfile >>> with tempfile.TemporaryDirectory() as d: ... _ = spark.sql("DROP TABLE IF EXISTS tbl1") ... _ = spark.sql( ... "CREATE TABLE tbl1 (col STRING) USING TEXT LOCATION '{}'".format(d)) ... _ = spark.sql("INSERT INTO tbl1 SELECT 'abc'") ... spark.catalog.cacheTable("tbl1") ... spark.table("tbl1").show() +---+ |col| +---+ |abc| +---+
Because the table is cached, it computes from the cached data as below.
>>> spark.table("tbl1").count() 1
After refreshing the table by path, it shows 0 because the data does not exist anymore.
>>> spark.catalog.refreshByPath(d) >>> spark.table("tbl1").count() 0
>>> _ = spark.sql("DROP TABLE tbl1")