Exporting Data from RDS Postgres to S3 with NodeJs & Lambda
This article provides an approach to streaming query output from RDS Postgres to an S3 CSV file via a NodeJs Lambda function.
Alternatives to removing the function’s dependency on
pg-copy-streams were considered. These options and associated limitations will be briefly discussed.
A sample Lambda function is also included to assist in assessing the validity of the solution for a given use case.
The following alternatives to using
pg-copy-streams were considered but were disqualified for a variety of reasons.
“Streamless” Approach using Lambda Filesystem/Memory
In this scenario, interim data would be pre-processed/parsed in memory, with final CSV file saved to the Lambda’s local file system
/tmp before uploading to S3.
Developers with limited NodeJs/streaming experience may find this approach easier to work with despite its limitations. Regular review of memory consumption/allocation would be required to account for fluctuations in data volumes. Lambda’s
/tmp storage also has limit of 512MB. These factors, along with the performance overhead of in-memory parsing would only be suited for exports where projected growth in volumes is expected to be minimal.
AWS RDS PostgresQL Extension (aws_s3)
The aws_s3 RDS Postgres extension can be used to export to S3 via function
aws_s3.query_export_to_s3, however, the extension is only supported for PG database engines 10.14, 11.9, 12.4 and later. Engines running versions < 10.14 are not supported.
Proposed Solution — Limitations
Before deciding to adopt the outlined approach, assess whether the components/services used would fulfil processing/capacity needs of your requirements. For example, if the RDS query submitted by the Lambda function has an expected execution time of 30 mins, then Lambda may not be an appropriate service, given that timeout quota of 15 mins applies to functions. Refer to the list of quotas before making the final decision.
If applicable, consider implications of scaling and concurrency in context with RDS/database performance. Lambdas run in their own isolated environment. Multiple invocations of the function in response to scaling have the potential to overwhelm the database with connection requests, leading to performance degradation. To avoid this scenario, consider using RDS Proxy for connection pooling.
Sample Lambda Function
The function below streams output of query results from RDS Postgres to a CSV file in S3.
Source code assumes the following (mock values):
- Postgres Export Query:
select * from public.pg_sample_tbl
- RDS Endpoint:
- Database Name:
- Database Credentials: Note, these should be stored in secretsmanager and retrieved at runtime
- S3 Bucket for CSV Export:
- CSV Filename:
Lambda Source Code
Deployment and Testing — AWS Cloud
Separate knowledge articles will be posted in due course to cover environment creation and function deployment/testing using the AWS Free tier.
For now, we’ll assume this environment exists.
Prior to deployment, update mock values to suit the target environment.
Typical Deployment Overview
High level overview of deployment steps and components required are as follows:
- Create S3 bucket (location for CSV exports)
- Create Lambda zip deployment package
- Create new IAM Lambda Execution Role, with trust relationship allowing Lambda service to assume this role
- Create new IAM policy with permissions to resources accessed by the function
- Attach this policy to the Execution Role
- During function creation, specify the Execution Role and path to the zip package
To evaluate performance of the function, populate the source Postgres table with test data to reflect expected volumes. Include iterations to cover longer term data volume forecasts.
During the testing phase, relevant performance/resource metrics for function invocations should be checked to ensure values are acceptable when reviewed against AWS service quotas.
Several factors influence selection of services/languages used for a particular solution.
In summary, the decision to use the approach described should factor in relevant AWS service quotas and expected data volumes.
The Postgres to S3 export extension/method,
aws_s3.query_export_to_s3, is an option worth exploring, provided the database engine supports this feature.