2024-05-25
[Node.js] How to Handle Large Files on AWS S3
When handling large files in Node.js, it is essential to use stream processing to keep memory usage low.
Streams let you process data in chunks instead of loading the entire file into memory at once, which makes uploading and downloading large files efficient and stable.
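For example, copying a file through streams keeps only one chunk in memory at a time (a minimal sketch; the file paths are placeholders):
const fs = require('fs');

// Copy a large file without ever holding the whole file in memory
fs.createReadStream('path/to/large/input')
  .pipe(fs.createWriteStream('path/to/large/output'))
  .on('close', () => {
    console.log('Copy finished');
  });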
About highWaterMark in Node.js Stream Processing
In Node.js stream processing, you can specify a parameter called highWaterMark. highWaterMark sets the size of the stream's internal buffer, in bytes. The default is 16KB for most streams (fs read streams default to 64KB), but increasing this value can improve performance.
Increasing highWaterMark improves read and write throughput at the cost of higher memory consumption, so choose a value that fits your system's resources. In this article, 256KB (256 * 1024 bytes) is used.
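For example, the buffer size is passed as an option when creating a file stream (a minimal sketch; the file path is a placeholder):
const fs = require('fs');

// Read with a 256KB internal buffer instead of the default
const readStream = fs.createReadStream('path/to/large/file', {
  highWaterMark: 256 * 1024
});

readStream.on('data', (chunk) => {
  // Each chunk is at most 256KB
  console.log(chunk.length);
});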
Steps
Prerequisites
- Node.js is installed
- You have an AWS account and an S3 bucket has been created
Installing Required Packages
The download example uses the v2 aws-sdk package, while the upload example uses the v3 @aws-sdk/client-s3 and @aws-sdk/lib-storage packages.
npm install aws-sdk @aws-sdk/client-s3 @aws-sdk/lib-storage
Downloading Files
Create a read stream from the S3 object and pipe it into a write stream that saves the file locally.
const AWS = require('aws-sdk');
const fs = require('fs');

// S3 configuration
AWS.config.update({
  accessKeyId: 'your_access_key_id',
  secretAccessKey: 'your_secret_access_key',
  region: 'your_aws_region'
});

const s3 = new AWS.S3();

const downloadFile = (s3Key, downloadFilePath) => {
  const params = {
    Bucket: 'your_s3_bucket_name',
    Key: s3Key
  };
  // 256KB chunk size for the local write stream
  const fileStream = fs.createWriteStream(downloadFilePath, { highWaterMark: 256 * 1024 });

  return new Promise((resolve, reject) => {
    // Note: the v2 SDK's createReadStream() takes no options; buffering is tuned on the write stream
    s3.getObject(params).createReadStream()
      .on('error', (err) => {
        console.error(err);
        reject(err);
      })
      .pipe(fileStream)
      .on('close', () => {
        console.log('File downloaded: ' + downloadFilePath);
        resolve();
      });
  });
};
downloadFile('file/key/in/s3', 'path/to/downloaded/file');
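As a design note, Node's promise-based pipeline (available from Node.js 15 via stream/promises) can replace the manual .pipe() and error/close wiring above: it rejects if either stream fails and cleans both up. A minimal sketch reusing the s3 client from the example above (downloadFileWithPipeline is a hypothetical name):
const { pipeline } = require('stream/promises');

const downloadFileWithPipeline = async (s3Key, downloadFilePath) => {
  const params = {
    Bucket: 'your_s3_bucket_name',
    Key: s3Key
  };
  // pipeline resolves once the S3 read stream has been fully written to the local file
  await pipeline(
    s3.getObject(params).createReadStream(),
    fs.createWriteStream(downloadFilePath, { highWaterMark: 256 * 1024 }) // 256KB chunk size
  );
  console.log('File downloaded: ' + downloadFilePath);
};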
Uploading Files
When uploading, in addition to stream processing, @aws-sdk/lib-storage enables multipart upload (splitting the file into multiple parts and uploading them separately).
With multipart upload, you can specify the part size with partSize and the number of parts uploaded concurrently with queueSize. Each part is uploaded independently, and once all parts have been uploaded, S3 assembles them into the final object. Note that S3 requires every part except the last to be at least 5MB, and a single upload can consist of at most 10,000 parts.
Here is the code for uploading:
const { S3Client } = require('@aws-sdk/client-s3');
const { Upload } = require('@aws-sdk/lib-storage');
const fs = require('fs');
const path = require('path');

// S3 configuration (the v3 client takes credentials directly instead of AWS.config.update)
const s3Client = new S3Client({
  region: 'your_aws_region',
  credentials: {
    accessKeyId: 'your_access_key_id',
    secretAccessKey: 'your_secret_access_key'
  }
});

const uploadFile = (filePath) => {
  const uploadParams = {
    Bucket: 'your_s3_bucket_name',
    Key: path.basename(filePath),
    Body: fs.createReadStream(filePath, { highWaterMark: 256 * 1024 }) // 256KB chunk size
  };

  return new Promise((resolve, reject) => {
    const parallelUploads3 = new Upload({
      client: s3Client,
      params: uploadParams,
      leavePartsOnError: false, // Abort the multipart upload if any part fails
      queueSize: 4, // Maximum number of parts uploaded concurrently
      partSize: 1024 * 1024 * 20 // 20MB per part
    });

    parallelUploads3.on('httpUploadProgress', (progress) => {
      console.log(progress);
    });

    parallelUploads3.done().then(() => {
      console.log('File uploaded: ' + uploadParams.Key);
      resolve();
    }).catch((err) => {
      console.error(err);
      reject(err);
    });
  });
};
uploadFile('path/to/large/file');
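Since uploadFile returns a promise, the call can also be wrapped with async/await to handle failures explicitly (a short usage sketch):
(async () => {
  try {
    await uploadFile('path/to/large/file');
  } catch (err) {
    // Because leavePartsOnError is false, a failed multipart upload is aborted rather than leaving parts behind
    console.error('Upload failed:', err);
  }
})();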
Conclusion
By leveraging stream processing and multipart upload, you can handle large files efficiently even with limited memory.
Adjust parameters such as highWaterMark, partSize, and queueSize to match your system's resources.