Coherence topology suggestion
Data to be cached:
- 100 GB of data
- Objects of size 500-5000 bytes
- 1000 objects updated/inserted on average per minute (peak 5000)
Need a suggestion for a Coherence topology in production and test (distributed with backup; see the config sketch after this list):
- number of servers
- nodes per server
- heap size per node
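For illustration, a minimal sketch of what a "distributed with backup" scheme might look like in a Coherence cache configuration file; the cache name and scheme name are placeholders, and `backup-count` of 1 matches the one-backup assumption used in the answers below:

```xml
<!-- Minimal cache-config sketch: a distributed (partitioned) cache with one backup.
     Names are placeholders; tune schemes and limits for a real deployment. -->
<cache-config>
  <caching-scheme-mapping>
    <cache-mapping>
      <cache-name>objects</cache-name>
      <scheme-name>distributed-with-backup</scheme-name>
    </cache-mapping>
  </caching-scheme-mapping>
  <caching-schemes>
    <distributed-scheme>
      <scheme-name>distributed-with-backup</scheme-name>
      <backup-count>1</backup-count>  <!-- one synchronous backup copy per partition -->
      <backing-map-scheme>
        <local-scheme/>               <!-- on-heap backing map -->
      </backing-map-scheme>
      <autostart>true</autostart>
    </distributed-scheme>
  </caching-schemes>
</cache-config>
```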
Questions
- How much free memory is needed per node, relative to the memory used by cached data (assuming 100% heap usage is not possible)?
- How much overhead will 1-2 additional indexes per cache element generate?
We do not know how many read operations will be performed; the solution will be used by clients for whom low response times are critical (more so than data consistency), and the read load depends on each use case. The cache will be updated from the DB by polling at a fixed frequency and populating the cache (the cache is the data master, not the system using the cache).
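As a rough illustration of that polling pattern, here is a minimal sketch using the Coherence NamedCache API; the cache name, polling interval, and the loadChangedRowsFromDb() helper are assumptions for illustration only:

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CachePoller {
    public static void main(String[] args) {
        final NamedCache cache = CacheFactory.getCache("objects"); // placeholder cache name
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        // Poll the DB at a fixed frequency and push the changes into the cache;
        // putAll batches updates instead of issuing one network call per object.
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                Map changed = loadChangedRowsFromDb();
                if (!changed.isEmpty()) {
                    cache.putAll(changed);
                }
            }
        }, 0, 10, TimeUnit.SECONDS); // 10-second interval is an assumption
    }

    // Hypothetical DAO call: returns key -> object for rows modified since the last poll.
    private static Map loadChangedRowsFromDb() {
        return java.util.Collections.emptyMap();
    }
}
```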
Comments (2)
The rule of thumb for sizing a JVM for Coherence is that the data is 1/3 of the heap, assuming 1 backup: 1/3 for cache data, 1/3 for backup, and 1/3 for indexes and overhead.
The biggest difficulty in sizing is that there are no good ways to estimate index sizes. You have to try with real-world data and measure.
A rule of thumb for JDK 1.6 JVMs is to start with 4GB heaps, so you would need 75 cache server nodes. Some people have been successful with much larger heaps (16GB), so it is worth experimenting. With large heaps (e.g., 16GB) you should not need as much as 1/3 for overhead and can hold more than 1/3 for data. With heaps greater than 16GB, garbage collector tuning becomes critical.
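To make the arithmetic explicit, here is a small sketch of the sizing rule above; the 1/3 rule and the 4GB starting heap come from this answer, while the helper itself is purely illustrative:

```java
public class CoherenceSizing {
    public static void main(String[] args) {
        double dataGb = 100.0;      // primary cached data (from the question)
        double heapPerNodeGb = 4.0; // JDK 1.6 rule-of-thumb starting heap

        // 1/3 rule: data + 1 backup + indexes/overhead => total heap = 3x the data size.
        double totalHeapGb = dataGb * 3;
        long nodes = (long) Math.ceil(totalHeapGb / heapPerNodeGb);

        System.out.printf("Total heap: %.0f GB, nodes needed: %d%n", totalHeapGb, nodes);
        // Prints: Total heap: 300 GB, nodes needed: 75
    }
}
```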
For maximum performance, you should have 1 core per node.
The number of server machines depends on practical limits of manageability, capacity (cores and memory), and failure. For example, even if you have a server that can handle 32 nodes, what happens to your cluster when a machine fails? The cluster will be machine-safe (backups are not on the same machine), but recovery would be very slow given the massive amount of data that has to be moved to new backups. On the other hand, 75 machines are hard to manage.
I've seen Coherence have latencies of 250 microseconds (not milliseconds) for a 1K object put, including network hops and backup. So the number of inserts and updates you are looking for should be achievable. Test with multiple threads inserting/updating, and make sure your test client is not the bottleneck.
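A minimal sketch of such a multithreaded put test; the thread count, key space, and 1K payload are assumptions chosen for illustration, and the cache name is a placeholder:

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PutLoadTest {
    public static void main(String[] args) throws Exception {
        final NamedCache cache = CacheFactory.getCache("objects"); // placeholder name
        final byte[] payload = new byte[1024];  // ~1K object, matching the latency figure above
        final int threads = 8;                  // assumption: enough to avoid a client-side bottleneck
        final int putsPerThread = 10000;

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(new Runnable() {
                public void run() {
                    // Each thread writes a disjoint key range to avoid artificial contention.
                    for (int i = 0; i < putsPerThread; i++) {
                        cache.put("key-" + id + "-" + i, payload);
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
        long elapsedMs = (System.nanoTime() - start) / 1000000;
        System.out.println((threads * putsPerThread) + " puts in " + elapsedMs + " ms");
    }
}
```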
A few more "rules of thumb":
1) For high availability, three nodes is a good minimum.
2) With Java 7, you can use larger heaps (e.g. 27GB) and the G1 garbage collector.
3) For 100GB of data, using David's guidelines, you will want 300GB total of heap. On servers with 128GB of memory, that could be done with 3 physical servers, each running 4 JVMs with 27GB heap each (~324GB total).
4) Index memory usage varies significantly with data type and arity. It is best to test with a representative data set, both with and without indexes, to see what the memory usage difference is (see the sketch below).
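As a rough way to measure that difference (run inside a storage-enabled node after loading a representative data set), the sketch below samples used heap before and after building an index; the getter name is hypothetical, and GC-based sampling gives only an approximate figure:

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.extractor.ReflectionExtractor;

public class IndexOverheadProbe {
    public static void main(String[] args) throws Exception {
        // Placeholder cache name; assumed to be pre-loaded with representative data.
        NamedCache cache = CacheFactory.getCache("objects");

        long before = usedHeap();
        // "getSymbol" is a hypothetical getter on the cached object type.
        cache.addIndex(new ReflectionExtractor("getSymbol"), /*ordered*/ false, /*comparator*/ null);
        long after = usedHeap();

        System.out.println("Approx. index overhead: " + ((after - before) / (1024 * 1024)) + " MB");
    }

    private static long usedHeap() throws InterruptedException {
        System.gc();        // best effort; results are approximate
        Thread.sleep(500);  // give the collector a moment to settle
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }
}
```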