Can I use Terracotta to scale a RAM-intensive application?

Posted 2024-07-05 01:38:47 · 362 characters · 8 views · 0 comments


I'm evaluating Terracotta to help me scale up an application which is currently RAM-bounded. It is a collaborative filter and stores about 2 kilobytes of data per user. I want to use Amazon's EC2, which means I'm limited to 14 GB of RAM, which gives me an effective per-server upper bound of around 7 million users. I need to be able to scale beyond this.
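The back-of-envelope capacity math above works out as follows (a plain arithmetic sketch, nothing Terracotta-specific; the names are illustrative):

```java
public class CapacityEstimate {
    // bytes of RAM divided by the per-user footprint
    static long maxUsers(long ramGb, long bytesPerUser) {
        return ramGb * 1024L * 1024L * 1024L / bytesPerUser;
    }

    public static void main(String[] args) {
        // 14 GB per EC2 instance at ~2 KB per user -> about 7.3 million users
        System.out.println(maxUsers(14, 2 * 1024));
    }
}
```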

Based on my reading so far, I gather that Terracotta can have a clustered heap larger than the available RAM on each server. Would it be viable to have an effective clustered heap of 30 GB or more, where each of the servers only supports 14 GB?

The per-user data (the bulk of which are arrays of floats) changes very frequently, potentially hundreds of thousands of times per minute. It isn't necessary for every single one of these changes to be synchronized to other nodes in the cluster the moment they occur. Is it possible to only synchronize some object fields periodically?
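Whatever Terracotta itself ships per transaction, one hedged way to get this "periodic" behavior at the application level is to keep the hot values in a plain node-local buffer and copy them into the shared (clustered) object on a schedule, so only the flush becomes a clustered write. A minimal sketch, assuming the shared array would be a clustered root; all names are illustrative:

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: high-frequency updates hit only the local scratch array; a timer
// flushes it into the shared array, so the cluster sees one batched write
// per period instead of one per update.
public class PeriodicSync {
    final float[] shared;   // imagine this one is a clustered root
    final float[] local;    // plain, node-local scratch copy

    PeriodicSync(int size) {
        shared = new float[size];
        local = new float[size];
    }

    void update(int i, float delta) {
        local[i] += delta;  // cheap: never leaves this JVM
    }

    synchronized void flush() {
        // under a clustered lock this would travel as a single transaction
        System.arraycopy(local, 0, shared, 0, local.length);
    }

    void start(ScheduledExecutorService timer) {
        timer.scheduleAtFixedRate(this::flush, 1, 1, TimeUnit.SECONDS);
    }
}
```

The trade-off is that other nodes see data up to one flush interval stale, which the question suggests is acceptable here.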


1 Answer

听你说爱我 · answered 2024-07-12 01:38:47


I'd say the answer is a qualified yes for this. Terracotta does allow you to work with clustered heaps larger than the size of a single JVM although that's not the most common use case.

You still need to keep in mind a) the working set size and b) the amount of data traffic. For a), there is some set of data that must be in memory to perform the work at any given time, and if that working set size exceeds the heap size, performance will obviously suffer. For b), each piece of data added or updated in the clustered heap must be sent to the server. Terracotta is at its best when you are changing fine-grained fields in POJO graphs. Working with big arrays does not take best advantage of Terracotta's capabilities (which is not to say that people don't sometimes use it that way).
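The two shapes being contrasted can be sketched as plain Java (the class and method names are illustrative; the comments describe Terracotta's assumed behavior of grouping writes made under one lock into one transaction):

```java
// Sketch of the contrast above: a fine-grained POJO field update ships a
// tiny delta, while rewriting a big shared array produces far more change
// data, even when the writes are batched under a single lock.
class UserStats {
    volatile float affinity;                       // fine-grained: cheap to cluster
    final float[] featureVector = new float[512];  // big array: coarse to cluster

    synchronized void nudgeAffinity(float delta) {
        affinity += delta;  // one small field delta per transaction
    }

    synchronized void rewriteVector(float[] fresh) {
        // 512 element writes; grouped under one lock they travel as a single
        // (large) transaction rather than 512 tiny ones
        System.arraycopy(fresh, 0, featureVector, 0, featureVector.length);
    }
}
```

For per-user data dominated by float arrays, this suggests splitting the arrays into smaller per-user chunks, or batching writes as above, rather than sharing one huge array.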

If you are creating a lot of garbage, then the Terracotta memory manager and distributed garbage collector have to be able to keep up with that. It's hard to say without trying it whether your data volumes exceed the available bandwidth there.

Your application will benefit enormously if you run multiple servers and the data is partitioned by server or has some amount of locality of reference. In that case, you only need the data for one server's partition in heap, and the rest does not need to be faulted into memory. It will, of course, be faulted in if needed for failover/availability when other servers go down. What this means is that in the case of partitioned data, you are not broadcasting to all nodes, only sending transactions to the server.
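A partitioning scheme like the one described is application-level routing, not part of Terracotta itself; a minimal sketch (illustrative names) that pins each user to one app server, so each node only faults in its own slice of the clustered heap:

```java
// Illustrative partitioning: route each user to one app server so that only
// that server's slice of the clustered heap is ever faulted into its JVM.
public class UserPartitioner {
    static int partitionFor(long userId, int serverCount) {
        return Math.floorMod(Long.hashCode(userId), serverCount);
    }

    public static void main(String[] args) {
        // with 3 app servers, users land deterministically on partitions 0..2
        System.out.println(partitionFor(1234567L, 3));
    }
}
```

A stable mapping is what gives the locality of reference the answer mentions; rebalancing when servers are added or removed would need a richer scheme (e.g. consistent hashing).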

From a numbers point of view, it is possible to index 30GB of data, so that's not close to any hard limit.
