Querying S3 Data Directly via DuckDB: A Quick Guide

To Nha Notes | Aug. 9, 2024, 1:43 p.m.

DuckDB can query files stored in Amazon S3 directly, without downloading them first, which makes it convenient for analyzing large datasets in place. Here’s a quick guide:

  1. Start DuckDB: Launch the DuckDB CLI from your terminal (with no arguments it opens an in-memory database) by running:

    duckdb
    
  2. Create a Secret: Register your S3 credentials with DuckDB's secrets manager. By default the secret is temporary and is kept in memory for the current session only:

    CREATE SECRET secret1 (
        TYPE S3,
        KEY_ID 'your-access-key-id',
        SECRET 'your-secret-access-key',
        REGION 'your-region'
    );

  3. Query S3 Data: Read directly from an S3 URI. DuckDB picks up the matching secret automatically, so the query does not need to name it:

    SELECT * FROM read_parquet('s3://your-bucket/your-file.parquet');
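
The three steps above could be combined into a single session along the following lines. This is only a sketch under a few assumptions: the `SCOPE` value, the `your-prefix` path, and the `some_column` column are hypothetical placeholders that do not appear in the original steps, and the explicit `INSTALL`/`LOAD` is optional because recent DuckDB versions normally autoload `httpfs` the first time an `s3://` path is used.

```sql
-- Optional: install and load the httpfs extension explicitly.
INSTALL httpfs;
LOAD httpfs;

-- Same secret as in step 2, additionally limited to one bucket via SCOPE.
-- All values are placeholders; replace them with your own.
CREATE OR REPLACE SECRET secret1 (
    TYPE S3,
    KEY_ID 'your-access-key-id',
    SECRET 'your-secret-access-key',
    REGION 'your-region',
    SCOPE 's3://your-bucket'
);

-- List the registered secrets to confirm the one above exists.
SELECT name, type, scope FROM duckdb_secrets();

-- Inspect the file's schema before pulling any rows.
DESCRIBE SELECT * FROM read_parquet('s3://your-bucket/your-file.parquet');

-- Read several files at once with a glob and aggregate on the fly.
-- 'your-prefix' and some_column are hypothetical.
SELECT some_column, count(*) AS n
FROM read_parquet('s3://your-bucket/your-prefix/*.parquet')
GROUP BY some_column;
```

Selecting only the columns you need, rather than `SELECT *`, tends to reduce how much data is fetched from S3, because Parquet is a columnar format.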

References

https://duckdb.org/docs/extensions/httpfs/s3api

https://github.com/davidgasquez/awesome-duckdb?tab=readme-ov-file