To Nha Notes | Sept. 15, 2022, 9:40 p.m.
https://arrow.apache.org/docs/python/parquet.html
Elapsed time
chunk size : 100 rows, elapsed time : 52.745 s
chunk size : 1000 rows, elapsed time : 23.624 s
chunk size : 10000 rows, elapsed time : 21.460 s
chunk size : 100000 rows, elapsed time : 21.470 s
chunk size : 1000000 rows, elapsed time : 21.929 s
Max memory usage
chunk size : 100 rows, max memory usage : 114.285 MB
chunk size : 1000 rows, max memory usage : 116.504 MB
chunk size : 10000 rows, max memory usage : 145.227 MB
chunk size : 100000 rows, max memory usage : 424.836 MB
chunk size : 1000000 rows, max memory usage : 2111.645 MB
https://gist.github.com/l1x/76dab6445b6d55396c622f915c755a17
https://gist.github.com/NickCrews/7a47ef4083160011e8e533531d73428c
https://splunktool.com/merge-parquet-files-with-different-schema-using-pandas-and-dask