Performance of fetching a large number of small files from S3 to EC2

Posted 2024-07-18 04:37:31

I have a large collection of data chunks of 1 kB each (on the order of several hundred million), and need a way to store and query them. The data chunks are added, but never deleted or updated. Our service is deployed on the S3/EC2 platform.

I know Amazon SimpleDB exists, but I want a solution that is platform agnostic (in case we need to move out of AWS for example).

So my question is: what are the pros and cons of these two options for storing and retrieving data chunks? How would the performance compare?

  • Store the data chunks as files on S3 and GET them when needed
  • Store the data chunks on a MySQL Server cluster

Would there be that much of a performance difference?

Comments (2)

浮萍、无处依 2024-07-25 04:37:31

I tried using S3 as a sort of "database" using tiny XML files to hold my structured data objects, and relying on the S3 "keys" to look up these objects.

The performance was unacceptable, even from EC2 - the latency to S3 is just too high.

Running MySQL on an EBS device will be an order of magnitude faster, even with so many records.
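To make the latency argument concrete, here is a minimal back-of-envelope sketch in Python. It assumes each 1 kB read is dominated by a fixed per-request round trip. The two latency constants are hypothetical placeholders chosen only to mirror the "order of magnitude" claim above. They are not measurements, and should be replaced with numbers measured from your own EC2 instances.

```python
import math


def estimated_fetch_seconds(num_chunks: int, latency_s: float, concurrency: int = 1) -> float:
    """Estimate the wall-clock time to read num_chunks small objects.

    Model (an assumption): every read costs a fixed latency_s, and up to
    `concurrency` reads can run fully in parallel. Transfer time for a
    1 kB payload is treated as negligible.
    """
    if num_chunks < 0 or latency_s < 0 or concurrency < 1:
        raise ValueError("num_chunks and latency_s must be >= 0, concurrency >= 1")
    rounds = math.ceil(num_chunks / concurrency)
    return rounds * latency_s


# Hypothetical per-request latencies, NOT measured values.
ASSUMED_S3_GET_LATENCY_S = 0.050
ASSUMED_DB_LOOKUP_LATENCY_S = 0.005

if __name__ == "__main__":
    batch = 100_000  # chunks needed by one hypothetical job
    for label, latency in [
        ("S3 GET", ASSUMED_S3_GET_LATENCY_S),
        ("DB lookup", ASSUMED_DB_LOOKUP_LATENCY_S),
    ]:
        seconds = estimated_fetch_seconds(batch, latency, concurrency=10)
        print(f"{label}: about {seconds:.0f} s for {batch} chunks")
```

Under this model the total time scales linearly with per-request latency, so for tiny objects the round trip matters far more than bandwidth. Raising concurrency shortens both cases by the same factor without changing their ratio.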

舂唻埖巳落 2024-07-25 04:37:31

Do you need to provide access to these data chunks directly to the users of your application? If not, then S3 and HTTP GET requests are overkill. Also keep in mind that S3 is a secured service, so the per-request overhead of every GET (for just 1 kB of data) will be considerable.

A MySQL server cluster would be a better idea, but to run it in EC2 you need to use Elastic Block Store (EBS). Finally, do not rule out SimpleDB. It is perhaps the best solution for your problem. Design your system carefully and you will be able to migrate easily to other database systems (distributed or relational) in the future.
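One way to read "design your system carefully" is to hide the storage backend behind a small interface, so that swapping MySQL, SimpleDB, or S3 touches only one class. The sketch below is an illustration under assumptions, not a prescribed design. The names `ChunkStore`, `SqlChunkStore`, and `InMemoryChunkStore` are hypothetical, and Python's built-in `sqlite3` stands in for MySQL so the example runs without a server.

```python
import sqlite3
from typing import Dict, Optional, Protocol


class ChunkStore(Protocol):
    """Hypothetical backend-neutral interface for append-only chunks."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> Optional[bytes]: ...


class SqlChunkStore:
    """Relational backend. sqlite3 is used here as a stand-in for MySQL."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS chunks ("
            "chunk_key TEXT PRIMARY KEY, payload BLOB NOT NULL)"
        )

    def put(self, key: str, data: bytes) -> None:
        # Chunks are never updated, so a duplicate key is treated as an error.
        try:
            with self._conn:  # commits on success, rolls back on error
                self._conn.execute(
                    "INSERT INTO chunks (chunk_key, payload) VALUES (?, ?)",
                    (key, data),
                )
        except sqlite3.IntegrityError as exc:
            raise ValueError(f"chunk {key!r} already exists") from exc

    def get(self, key: str) -> Optional[bytes]:
        row = self._conn.execute(
            "SELECT payload FROM chunks WHERE chunk_key = ?", (key,)
        ).fetchone()
        return None if row is None else bytes(row[0])


class InMemoryChunkStore:
    """Second backend with the same interface, useful for tests."""

    def __init__(self) -> None:
        self._chunks: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        if key in self._chunks:
            raise ValueError(f"chunk {key!r} already exists")
        self._chunks[key] = data

    def get(self, key: str) -> Optional[bytes]:
        return self._chunks.get(key)


def copy_chunk(src: ChunkStore, dst: ChunkStore, key: str) -> bool:
    """Migrate one chunk between any two backends; False if it is missing."""
    data = src.get(key)
    if data is None:
        return False
    dst.put(key, data)
    return True
```

Application code that depends only on `put` and `get` does not need to change when the backend does, and a helper like `copy_chunk` is enough to move data from one store to another during a migration.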
