Categories
amazon-kinesis-firehose amazon-s3 amazon-web-services json parquet

Write parquet from AWS Kinesis firehose to AWS S3

I would like to ingest data into S3 from Kinesis Firehose, formatted as Parquet. So far I have only found a solution that involves creating an EMR cluster, but I am looking for something cheaper and faster, such as storing the received JSON as Parquet directly from Firehose or using a Lambda function.

Thank you very much,
Javi.

Good news, this feature was released today!

Amazon Kinesis Data Firehose can convert the format of your input data
from JSON to Apache Parquet or Apache ORC before storing the data in
Amazon S3. Parquet and ORC are columnar data formats that save space
and enable faster queries.

To enable it, go to your Firehose delivery stream and click Edit. You should see a Record format conversion section there.

See the documentation for details: https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html
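If you prefer to script the change instead of clicking through the console, the same settings can be applied through the Firehose API. Below is a minimal sketch using boto3. It assumes the stream has a single Extended S3 destination and that an AWS Glue table describing your JSON records already exists, because record format conversion reads its schema from the Glue Data Catalog. The function names, role ARN, database, table, and stream names are hypothetical placeholders.

```python
"""Hypothetical sketch: build the record format conversion settings for an
Extended S3 destination and optionally apply them with boto3."""


def build_parquet_conversion_config(role_arn, database, table, region="us-east-1"):
    """Return a DataFormatConversionConfiguration dict that converts JSON
    input to Snappy-compressed Parquet using the schema of a Glue table."""
    return {
        "Enabled": True,
        "SchemaConfiguration": {
            "RoleARN": role_arn,       # role Firehose assumes to read the Glue table
            "DatabaseName": database,  # Glue database that holds the schema
            "TableName": table,        # Glue table describing the JSON records
            "Region": region,
            "VersionId": "LATEST",     # always use the newest table version
        },
        "InputFormatConfiguration": {
            # OpenX JSON SerDe deserializes the incoming JSON records
            "Deserializer": {"OpenXJsonSerDe": {}}
        },
        "OutputFormatConfiguration": {
            # Parquet output with Snappy compression
            "Serializer": {"ParquetSerDe": {"Compression": "SNAPPY"}}
        },
    }


def enable_parquet_conversion(stream_name, conversion_config):
    """Apply the conversion settings to the first destination of an existing
    delivery stream. Needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the builder above works without boto3

    firehose = boto3.client("firehose")
    stream = firehose.describe_delivery_stream(
        DeliveryStreamName=stream_name
    )["DeliveryStreamDescription"]
    firehose.update_destination(
        DeliveryStreamName=stream_name,
        CurrentDeliveryStreamVersionId=stream["VersionId"],
        DestinationId=stream["Destinations"][0]["DestinationId"],
        ExtendedS3DestinationUpdate={
            "DataFormatConversionConfiguration": conversion_config
        },
    )
```

For example, `enable_parquet_conversion("my-stream", build_parquet_conversion_config("arn:aws:iam::123456789012:role/my-firehose-role", "mydb", "events"))` would switch an existing stream to Parquet output. Keeping the dict builder separate from the API call makes it easy to review the settings before touching a live stream. Check the linked documentation for any additional constraints (for example on buffering) that apply once conversion is enabled.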