pyspark.pandas.
read_json
Convert a JSON string to DataFrame.
File path
Read the file as a JSON object per line. It should be always True for now.
Index column of table in Spark.
All other options passed directly into Spark’s data source.
Examples
>>> df = ps.DataFrame([['a', 'b'], ['c', 'd']], ... columns=['col 1', 'col 2'])
>>> df.to_json(path=r'%s/read_json/foo.json' % path, num_files=1) >>> ps.read_json( ... path=r'%s/read_json/foo.json' % path ... ).sort_values(by="col 1") col 1 col 2 0 a b 1 c d
>>> df.to_json(path=r'%s/read_json/foo.json' % path, num_files=1, lineSep='___') >>> ps.read_json( ... path=r'%s/read_json/foo.json' % path, lineSep='___' ... ).sort_values(by="col 1") col 1 col 2 0 a b 1 c d
You can preserve the index in the roundtrip as below.
>>> df.to_json(path=r'%s/read_json/bar.json' % path, num_files=1, index_col="index") >>> ps.read_json( ... path=r'%s/read_json/bar.json' % path, index_col="index" ... ).sort_values(by="col 1") col 1 col 2 index 0 a b 1 c d