I'll have 100s or 1,000s of tiny PDF files that I'll need to zip into one big zip file and upload to S3. My current solution is as follows:
- NodeJS service sends a request with JSON data describing all the PDF files I need to create and zip to a Lambda function
- Lambda function processes the data, creates each PDF file as a buffer, pushes the buffer into a zip archiver, finalizes the archive, and then streams the zip archive to S3 in chunks using a PassThrough stream
I basically copied the solution below:
https://gist.github.com/amiantos/16bacc9ed742c91151fcf1a41012445e?permalink_comment_id=3804034#gistcomment-3804034
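Roughly, the handler looks like the simplified sketch below (I'm showing AWS SDK v3's `Upload` helper from `@aws-sdk/lib-storage` plus the `archiver` package; `createPdfBuffer`, the bucket, and the key are placeholders for my real values):

```
const { S3Client } = require("@aws-sdk/client-s3");
const { Upload } = require("@aws-sdk/lib-storage");
const archiver = require("archiver");
const { PassThrough } = require("stream");

const s3 = new S3Client({});

exports.handler = async (event) => {
  const archive = archiver("zip", { zlib: { level: 9 } });
  const passThrough = new PassThrough();
  archive.pipe(passThrough);

  // Start the S3 upload right away so it consumes chunks as the archive emits them
  const upload = new Upload({
    client: s3,
    params: {
      Bucket: "my-bucket",            // placeholder
      Key: "archives/result.zip",     // placeholder
      Body: passThrough,
    },
  });

  // Each PDF is generated fully in memory and appended to the archive
  for (const file of event.files) {
    const pdfBuffer = await createPdfBuffer(file); // placeholder for the real PDF generation
    archive.append(pdfBuffer, { name: file.name });
  }

  await archive.finalize(); // no more entries; flush the zip
  await upload.done();      // wait for the streamed upload to finish
};
```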
Now although this is a working solution, it's not scalable: creating the PDF buffers, archiving the zip, and uploading to S3 all happen in a single Lambda execution, which takes 20-30 seconds or more depending on the size of the final archived zip file. I have set the Lambda to 10GB memory with the maximum 15-minute timeout, because roughly every 100MB of zip requires 1GB of memory; otherwise it runs out of resources and times out. My zip can sometimes be 800MB or more, which means it needs 8GB of memory or more.
I want to use AWS multipart upload and somehow invoke multiple parallel Lambda functions to achieve this. It's fine if I have to separate creating the PDF buffers, zipping, and S3 uploading into other Lambdas, but I need to somehow optimize this and make it run in parallel.
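What I have in mind for the multipart side is something like the sketch below (AWS SDK v3 `@aws-sdk/client-s3`; the bucket and key are placeholders, the part buffers would come from however the work gets split across workers, and every part except the last has to be at least 5 MB):

```
const {
  S3Client,
  CreateMultipartUploadCommand,
  UploadPartCommand,
  CompleteMultipartUploadCommand,
} = require("@aws-sdk/client-s3");

const s3 = new S3Client({});
const Bucket = "my-bucket";          // placeholder
const Key = "archives/result.zip";   // placeholder

// partBuffers: ordered list of Buffers, each >= 5 MB except possibly the last
async function uploadPartsInParallel(partBuffers) {
  const { UploadId } = await s3.send(new CreateMultipartUploadCommand({ Bucket, Key }));

  // Each of these UploadPart calls could instead run inside a separate worker Lambda,
  // as long as the worker knows the shared UploadId and its PartNumber
  const parts = await Promise.all(
    partBuffers.map(async (body, i) => {
      const { ETag } = await s3.send(
        new UploadPartCommand({ Bucket, Key, UploadId, PartNumber: i + 1, Body: body })
      );
      return { ETag, PartNumber: i + 1 };
    })
  );

  // S3 concatenates the parts into the single final zip object
  await s3.send(
    new CompleteMultipartUploadCommand({
      Bucket,
      Key,
      UploadId,
      MultipartUpload: { Parts: parts },
    })
  );
}
```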
I've seen this post's answer, which has some nice details and an example, but it seems to be for a single large file:
Stream and zip to S3 from AWS Lambda Node.JS
https://gist.github.com/vsetka/6504d03bfedc91d4e4903f5229ab358c
Any way I can optimize this? Any ideas and suggestions would be great. Keep in mind the end result needs to be one big zip file. Thanks