How can I use AWS DMS to migrate data to Amazon S3 in Parquet format?

To Nha Notes | Oct. 5, 2022, 10:20 a.m.

How can I use AWS Database Migration Service (AWS DMS) to migrate data in Apache Parquet (.parquet) format to Amazon Simple Storage Service (Amazon S3)?

You can use AWS DMS to migrate data to an S3 bucket in Apache Parquet format if you use replication engine version 3.1.3 or later. The default Parquet version is Parquet 1.0.

Create a target Amazon S3 endpoint from the AWS DMS console, and then add the following extra connection attribute (ECA). Also, check the other extra connection attributes that you can use for storing Parquet objects in an S3 target.

dataFormat=parquet;

Use the following extra connection attribute to specify the Parquet version of the output file:

parquetVersion=PARQUET_2_0;
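
If you create the endpoint from a script instead of the console, the two attributes above can be joined into one ECA string. The following is a minimal Python sketch, not an official recipe. The helper names build_eca and create_parquet_s3_target are hypothetical, and the endpoint identifier, bucket name, and role ARN are placeholders that you supply. The second function assumes boto3 is installed and AWS credentials are configured, and it isn't called here.

```python
from typing import Mapping


def build_eca(attributes: Mapping[str, str]) -> str:
    """Join attribute name/value pairs into the 'name=value;' ECA form."""
    return "".join(f"{name}={value};" for name, value in attributes.items())


def create_parquet_s3_target(endpoint_id: str, bucket: str, role_arn: str) -> dict:
    """Create an S3 target endpoint that writes Parquet 2.0 files.

    All three arguments are placeholders for your own values.
    """
    import boto3  # imported here so build_eca works without boto3 installed

    dms = boto3.client("dms")
    return dms.create_endpoint(
        EndpointIdentifier=endpoint_id,
        EndpointType="target",
        EngineName="s3",
        S3Settings={"BucketName": bucket, "ServiceAccessRoleArn": role_arn},
        ExtraConnectionAttributes=build_eca(
            {"dataFormat": "parquet", "parquetVersion": "PARQUET_2_0"}
        ),
    )


if __name__ == "__main__":
    # Prints the ECA string only; no AWS call is made.
    print(build_eca({"dataFormat": "parquet", "parquetVersion": "PARQUET_2_0"}))
```

Leaving out parquetVersion keeps the default, Parquet 1.0.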

After you have the output in Parquet format, you can inspect the output file by installing parquet-cli, a command line tool for Parquet files that provides the parq command:

pip install parquet-cli --user

Then, inspect the file format:

parq LOAD00000001.parquet 
 # Metadata 
 <pyarrow._parquet.FileMetaData object at 0x10e948aa0>
  created_by: AWS
  num_columns: 2
  num_rows: 2
  num_row_groups: 1
  format_version: 1.0
  serialized_size: 169
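
If you can't install extra packages, a rough check is still possible with the Python standard library alone. A standard Parquet file starts and ends with the 4-byte magic value PAR1, and the 4 bytes before the trailing magic hold the footer length as a little-endian integer. The sketch below relies only on that layout. The function name parquet_footer_length is hypothetical, and the check doesn't parse the metadata itself.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"


def parquet_footer_length(path: str) -> int:
    """Return the footer metadata length in bytes, or raise ValueError."""
    # Smallest possible layout: 4-byte magic + 4-byte length + 4-byte magic.
    if os.path.getsize(path) < 12:
        raise ValueError(f"{path} is too small to be a Parquet file")
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, os.SEEK_END)  # last 8 bytes: footer length, then magic
        tail = f.read(8)
    if head != PARQUET_MAGIC or tail[4:] != PARQUET_MAGIC:
        raise ValueError(f"{path} does not have the Parquet magic bytes")
    (footer_len,) = struct.unpack("<I", tail[:4])
    return footer_len


if __name__ == "__main__":
    import sys

    # Usage: python check_parquet.py LOAD00000001.parquet
    print(parquet_footer_length(sys.argv[1]))
```

This confirms only that the file has the Parquet container layout. Fields such as format_version or num_rows still need a Parquet reader such as parq.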

References

https://aws.amazon.com/premiumsupport/knowledge-center/dms-s3-parquet-format/

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html#CHAP_Target.S3.Configuring