How to handle large amounts of storage in the cloud (or some other way?)
I have written an application which does video encoding. The encoding is a pipelined process: first you fetch the video, then you encode it using ffmpeg, then you split the video into multiple parts, etc.
During the course of this, a 1 GB video balloons into several GB of intermediate data. This service is written so that a different program (via RabbitMQ) can handle each piece of the pipeline. Of course, the process doesn't have to run this way, which brings me to my question.
I'm looking at storage requirements for making the app "live". With cloud providers, you pay per GB of storage and per GB of transfer. So far so good.
When I transfer this 1 GB video blob from one cloud VM instance to another, or from the VM to the common storage service, does that count against my bandwidth? (I realize this answer will change depending on the host's terms of service.)
Would it make more sense to have 1 VM perform the entire process and then spin up multiple instances of that, as opposed to having each VM perform only a single task in the pipeline? I ask this question in terms of optimizing for cost (lowest storage cost, lowest cost of spinning up VMs). Because the encoding will happen in batch, I am less concerned about pushing out requests quickly.
This scenario is a little bit unique in that I have huge amounts of binary data which cannot be stored efficiently in, say, a database. Which raises a similar question: for those with experience, when your DB VM sends its results back to your web app, are you charged for that intermediate transfer?
Am I even asking the right questions? Is there a guide that I should read, short of calling hosting providers and asking them about pricing myself?
Comments (1)
The uniqueness of your scenario makes it rather interesting, I'd say!
Whether you pay for transferring data between virtual machines in the cloud depends on the provider and the locations involved. Amazon, for example, does not charge in EC2 for data transferred between its web services in the same location. So you can keep your transfer costs down to the initial upload and final download of your "big bunch of binary data".
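To make the transfer point concrete, here is a rough sketch of how you might estimate the per-video transfer bill. The function name, the stage sizes, and both rates are made up for illustration; plug in the numbers from your provider's price list.

```python
def transfer_cost(stage_gb, internal_rate, external_rate, input_gb, output_gb):
    """Estimate transfer cost (in dollars) for one video.

    stage_gb      -- GB handed from one pipeline stage to the next (hypothetical sizes)
    internal_rate -- $/GB between VMs/services inside the provider (assumed, often 0)
    external_rate -- $/GB for traffic entering or leaving the cloud (assumed)
    input_gb      -- size of the source video uploaded once
    output_gb     -- size of the final output downloaded once
    """
    internal = sum(stage_gb) * internal_rate
    external = (input_gb + output_gb) * external_rate
    return internal, external


if __name__ == "__main__":
    # Hypothetical pipeline: a 1 GB source grows to ~3 GB between stages.
    stages = [1.0, 3.0, 3.0]
    # Case A: internal traffic is free, so only upload/download is billed.
    print(transfer_cost(stages, 0.0, 0.10, input_gb=1.0, output_gb=2.0))
    # Case B: the provider bills internal traffic, so every hand-off costs money.
    print(transfer_cost(stages, 0.02, 0.10, input_gb=1.0, output_gb=2.0))
```

If the internal rate is zero, splitting the pipeline across several VMs costs nothing extra in transfer; if it is not, running the whole pipeline on one VM avoids the hand-off charges.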
Now, can your task be parallelized efficiently? If yes, consider spinning up lots of VMs at the same time to get the job done faster. This is certainly cost-effective if time = money, but I am hesitant to recommend it in your case, because you mention that you are less concerned about pushing out requests quickly. You can still have a main VM handling requests and coordinating batches, and start up and shut down other VMs to handle parts of the workload. You pay for as long as your VMs are running, like a utility.
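A back-of-the-envelope sketch of the "pay while it runs" model follows. It assumes the work splits perfectly across VMs and that the provider bills each VM in whole increments (one hour here); the function name, the rate, and the granularity are illustrative assumptions, not any provider's actual terms.

```python
import math


def batch_cost(total_vm_hours, n_vms, hourly_rate, billing_granularity_hours=1.0):
    """Return (wall_clock_hours, cost) for a batch split evenly over n_vms.

    Assumes perfect parallelism and that each VM's running time is rounded up
    to the next billing increment (both are simplifying assumptions).
    """
    per_vm_hours = total_vm_hours / n_vms
    increments = math.ceil(per_vm_hours / billing_granularity_hours)
    billed_per_vm = increments * billing_granularity_hours
    cost = billed_per_vm * n_vms * hourly_rate
    return per_vm_hours, cost


if __name__ == "__main__":
    # Hypothetical batch: 12 VM-hours of encoding at an assumed $0.50 per VM-hour.
    for n in (1, 4, 5):
        hours, cost = batch_cost(12, n, 0.50)
        print(f"{n} VM(s): {hours:.1f} h wall clock, ${cost:.2f}")
```

Under these assumptions the total bill is about the same whether you use one VM or several, because you pay for VM-hours either way. Parallelism buys time rather than savings, and rounding up partial billing increments can make it slightly more expensive.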
The good thing about your scenario is that this kind of batch task is ideal for cloud computing, and the pricing model is pretty straightforward. Such tasks are resource-intensive (CPU/RAM), so their "greediness" can be satisfied by the virtually unlimited resources a cloud can offer.