Compressing data before storing it on Google App Engine
I am trying to store 30-second user MP3 recordings as Blobs in my App Engine datastore. However, to enable this feature (App Engine has a 1MB limit per upload) and to keep costs down, I would like to compress the file before upload and decompress it every time it is requested. How would you suggest I accomplish this? (It can happen in the background via a task queue, by the way, but an efficient solution is always good.)
Based on my own tests and research, I see two possible approaches to accomplish this:
- Zlib
For this I need to compress a certain number of blocks at a time using a while loop. However, App Engine doesn't allow you to write to the file system. I thought about using a temporary file to accomplish this, but I haven't had any luck with this approach when trying to decompress the content from a temporary file.
- Gzip
From reading around the web, it appears that the App Engine URL fetch function already requests content gzipped and then decompresses it. Is there a way to stop the function from decompressing the content so that I can just put it in the datastore in gzipped format and then decompress it when I need to play it back to a user on demand?
Let me know how you would suggest using zlib or gzip or some other solution to accomplish this. Thanks
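A minimal sketch of the chunked zlib idea that avoids the file system entirely, assuming the recording is available as a file-like object or byte string in memory. The function names and `CHUNK_SIZE` are hypothetical; only the standard `zlib` and `io` modules are used. Note that MP3 data is already compressed, so the size reduction on real recordings is likely to be small.

```python
import io
import zlib

CHUNK_SIZE = 64 * 1024  # hypothetical block size, chosen for illustration


def compress_stream(source, chunk_size=CHUNK_SIZE):
    """Compress a file-like object block by block, keeping everything in memory."""
    compressor = zlib.compressobj()
    out = io.BytesIO()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        out.write(compressor.compress(chunk))
    # flush() emits whatever the compressor is still buffering
    out.write(compressor.flush())
    return out.getvalue()


def decompress_stream(source, chunk_size=CHUNK_SIZE):
    """Reverse compress_stream, again without touching the file system."""
    decompressor = zlib.decompressobj()
    out = io.BytesIO()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        out.write(decompressor.decompress(chunk))
    out.write(decompressor.flush())
    return out.getvalue()
```

Wrapping a byte string in `io.BytesIO` stands in for the temporary file: `compress_stream(io.BytesIO(data))` returns bytes that could be stored as a Blob, and `decompress_stream(io.BytesIO(stored))` restores them.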
Comments (5)
"Compressing before upload" implies doing it in the user's browser -- but no text in your question addresses that! It seems to be about compression in your GAE app, where of course the data only arrives after the upload. You could do it with a Firefox extension (or other browsers' equivalents), if you can develop those and convince your users to install them, but that has nothing much to do with GAE!-) Not to mention that, as @RageZ's comment mentions, MP3 is, essentially, already compressed, so there's little or nothing to gain (though maybe you could, again with a browser extension for the user, reduce the MP3's bit rate and thus the file's size; that could impact the audio quality, depending on your intended use for those audio files).
So, overall, I have to second @jldupont's suggestion (also in a comment) -- use a different server for storage of large files (S3, Amazon's offering, is surely a possibility though not the only one).
While the technical limitations (mentioned in other answers) of compressing MP3 files via standard compression or reencoding at a lower bitrate are correct, your aim is to store 30 seconds of MP3 encoded data. Assuming that you can enforce that on your users, you should be alright without applying additional compression techniques if the MP3 bitrate is 256kbit constant bitrate (CBR) or lower. At 256kbit CBR, 30 seconds of audio would require 256,000 bits/s × 30 s ÷ 8 = 960,000 bytes, or roughly 0.92MB.
The maximum standard bitrate is 320kbit which equates to 1.14MB, so you'd have to use 256 or less. The most commonly used bitrate in the wild is 128kbits.
There are additional overheads that will increase the final file size such as ID3 tags and framing, but you should be OK. If not, drop down to 224kbits as your maximum (30 secs = 0.80MB). There are other complexities such as variable bit rate encoding for which the file size is not so predictable and I am ignoring these.
So your problem is no longer how to compress MP3 files, but how to make sure your users know that they cannot upload more than 30 seconds of audio encoded at 256kbit CBR, and how to enforce that policy.
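The size arithmetic in this answer can be sketched as a small check, assuming 1 kbit = 1000 bits and 1MB = 1024 × 1024 bytes (the convention that reproduces the 1.14MB and 0.80MB figures above). The function names are hypothetical, and ID3 tags and framing overhead are ignored.

```python
UPLOAD_LIMIT_BYTES = 1024 * 1024  # the 1MB per-upload limit from the question


def cbr_size_bytes(bitrate_kbit, seconds):
    """Audio payload size of a constant-bitrate MP3, ignoring tags and framing."""
    return bitrate_kbit * 1000 * seconds // 8


def fits_upload_limit(bitrate_kbit, seconds=30, limit=UPLOAD_LIMIT_BYTES):
    """True if a CBR recording of this length stays within the upload limit."""
    return cbr_size_bytes(bitrate_kbit, seconds) <= limit


for rate in (128, 224, 256, 320):
    size = cbr_size_bytes(rate, 30)
    print("%d kbit/s: %d bytes (%.2f MB), fits: %s"
          % (rate, size, size / (1024.0 * 1024.0), fits_upload_limit(rate)))
```

A server-side check along these lines only covers the nominal size; since overhead and variable bitrate files are not modelled, comparing the actual length of the uploaded bytes against the limit remains the safer enforcement.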
You could try the new Blobstore API, which allows storing and serving files of up to 50MB:
http://www.cloudave.com/link/the-new-google-app-engine-blobstore-api-first-thoughts
http://code.google.com/appengine/docs/python/blobstore/
http://code.google.com/appengine/docs/java/blobstore/
As Aneto mentions in a comment, you will not be able to meaningfully compress MP3 data with a standard compression library like gzip or zlib. However, you could reencode the MP3 at a MUCH lower bitrate, possibly with LAME.
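A rough way to see why zlib gains nothing here, using random bytes as a stand-in for MP3 audio (an assumption for illustration: encoded audio, like random data, has very little redundancy left, though a real file may shrink slightly because of tags and headers). The helper name `compression_ratio` is hypothetical.

```python
import os
import zlib


def compression_ratio(data):
    """Compressed size divided by original size (1.0 means no gain)."""
    return len(zlib.compress(data, 9)) / float(len(data))


# Stand-in for MP3 audio: high-entropy bytes with almost no redundancy.
mp3_like = os.urandom(500000)
# Highly repetitive data, for contrast.
repetitive = b"\x00\x01" * 250000

print("high-entropy ratio: %.3f" % compression_ratio(mp3_like))
print("repetitive ratio:   %.3f" % compression_ratio(repetitive))
```

The high-entropy input comes out at about its original size (slightly larger, in fact, because of zlib's own framing), while the repetitive input collapses to a tiny fraction.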
You can store up to 10MB with a list of Blobs. Search for "google file service". It's much more versatile than BlobStore in my opinion, though I only started using the BlobStore API yesterday and I'm still figuring out whether it is possible to access the data bytewise, as in changing doc to pdf, jpeg to gif.
You can store Blobs of 1MB * 10 = 10MB (the max entity size, I think), or you can use the BlobStore API and get the same 10MB, or 50MB if you enable billing (you can enable it, and as long as you don't exceed the free quota you don't pay).
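The list-of-Blobs idea boils down to splitting the bytes into pieces that each fit under the per-Blob limit and joining them again on read. A minimal sketch of just that splitting step follows; the helper names and the `CHUNK_LIMIT` value are hypothetical, and the actual datastore calls are left out.

```python
CHUNK_LIMIT = 1000 * 1000  # hypothetical per-Blob budget, kept under the 1MB limit


def split_into_chunks(data, chunk_limit=CHUNK_LIMIT):
    """Split a byte string into ordered pieces no longer than chunk_limit."""
    return [data[i:i + chunk_limit] for i in range(0, len(data), chunk_limit)]


def join_chunks(chunks):
    """Reassemble the original byte string from its ordered pieces."""
    return b"".join(chunks)
```

Each piece would then be stored as one Blob in the list, in order, and `join_chunks` applied to the list when the recording is requested.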