Reading and Writing the Apache Parquet Format

To Nha Notes | Sept. 15, 2022, 9:40 p.m.

https://arrow.apache.org/docs/python/parquet.html
 

Elapsed time and maximum memory usage by chunk size

Chunk size (rows)   Elapsed time (s)   Max memory usage (MB)
              100             52.745                 114.285
             1000             23.624                 116.504
            10000             21.460                 145.227
           100000             21.470                 424.836
          1000000             21.929                2111.645
 
How to merge Parquet files

https://gist.github.com/l1x/76dab6445b6d55396c622f915c755a17

https://gist.github.com/NickCrews/7a47ef4083160011e8e533531d73428c

https://splunktool.com/merge-parquet-files-with-different-schema-using-pandas-and-dask

Using pyarrow, how do you append to a Parquet file?

https://www.appsloveworld.com/pandas/100/440/pandas-merge-parquet-files-with-different-column-dtypes-write-parquet-with-pre