Very large Mnesia tables in production
We are using Mnesia as the primary database for a very large system. Mnesia fragmented tables have behaved well over the testing period. The system has about 15 tables, each replicated across 2 sites (nodes), and each table is highly fragmented. During the testing phase (which focused on availability, efficiency, and load tests), we accepted that Mnesia, with its many advantages for complex structures, would do for us, given that all the applications running on top of the service are Erlang/OTP apps. We are running Yaws 1.91 as the main web server.
To configure fragmented tables efficiently, we consulted a number of references from people who have used mnesia in large systems. These are: the Mnesia One Year Later blog, Part 2 of that blog, a follow-up to it, and About Hashing. These blog posts helped us fine-tune here and there for better performance.
Now, the problem. Mnesia has table size limits, yes, we agree. However, limits on the number of fragments are not mentioned anywhere. For performance reasons, and to cater for large data, about how many fragments would keep mnesia "okay"?
In some of our tables we have 64 fragments, with n_disc_only_copies set to the number of nodes in the cluster so that each node has a copy of every fragment. This has helped us solve the problem of mnesia write failures when a given node is momentarily unreachable. Also, in the blog above, the author suggests that the number of fragments should be a power of 2; this statement, he says, follows from the way mnesia hashes records. We need more explanation of this, though, and of which powers of two are meant here: 2, 4, 16, 32, 64, 128, ...?
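For context, a table like the ones described above would be created roughly as follows; this is only a sketch, and the table name, attributes, and node pool are placeholders, not our actual schema:

```erlang
%% Sketch: creating a fragmented, disc-only table with one copy of
%% every fragment on every node. All names below are placeholders.
NodePool = [node() | nodes()],
mnesia:create_table(subscriber,
    [{attributes, [key, value]},
     {frag_properties,
        [{node_pool, NodePool},
         {n_fragments, 64},                        %% a power of two, per the blogs
         {n_disc_only_copies, length(NodePool)}]}  %% a copy per fragment per node
    ]).
```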
The system is intended to run on HP ProLiant G6 machines with Intel processors (2 processors, 4 cores each, 2.4 GHz per core, 8 MB cache), 20 GB of RAM, and 1.5 TB of disk space. We currently have 2 of these high-powered machines at our disposal, and the system database should be replicated across the two. Each server runs Solaris 10, 64-bit.
At what number of fragments might mnesia's performance start to degrade? Is it okay if we increase the number of fragments from 64 to 128 for a given table? How about 65536 fragments (2^16)? How do we scale out our mnesia to make use of the terabyte space by using fragmentation?
Please do provide answers to these questions, and feel free to advise on any other parameters that might enhance the system.
NOTE: All tables that are to hold millions of records are created with the disc_only_copies storage type, so there are no RAM problems; the RAM is enough for the few RAM tables we run. Other DBMSs, such as MySQL Cluster and CouchDB, will also hold data on the same hardware as our Mnesia DBMS. MySQL Cluster is replicated across the two servers (each holding two NDB nodes and a MySQL server), with the management node on a different host.
1 Answer
The hint of having a power-of-two number of fragments is simply related to the fact that the default fragmentation module, mnesia_frag, uses linear hashing, so using 2^n fragments ensures that records are distributed (more or less, obviously) evenly between fragments.

Regarding the hardware at your disposal, it's more a matter of performance testing.
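To make the power-of-two point concrete, here is a simplified model of linear hashing, written in the spirit of mnesia_frag_hash but not copied from its source: with N fragments, a key first hashes into the pre-split range of 2^(L-1) buckets, and buckets that have already been split are rehashed into the doubled range of 2^L buckets. Only when N is itself a power of two does every key see the same range:

```erlang
%% Simplified model of linear hashing, inspired by mnesia_frag_hash
%% (not the actual module source). Fragments are numbered 1..N.
key_to_frag(Key, N) when N >= 1 ->
    L = ceil_log2(N),
    Half = 1 bsl max(L - 1, 0),    %% 2^(L-1): size of the pre-split range
    P = N - Half,                  %% how many buckets have been split so far
    A = erlang:phash(Key, Half),
    if
        A =< P -> erlang:phash(Key, Half * 2);  %% split bucket: rehash
        true   -> A                             %% unsplit bucket: keep
    end.

%% Smallest L such that 2^L >= N.
ceil_log2(1) -> 0;
ceil_log2(N) -> 1 + ceil_log2((N + 1) div 2).
```

When N is not a power of two, the P already-split fragments each receive roughly half as many records as the unsplit ones; that skew is exactly what the power-of-two advice avoids.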
The factors that can reduce performance are many, and configuring a database like Mnesia is just one part of the general problem.
I simply advise you to stress test one server and then test the algorithm on both servers to understand whether it scales correctly.
When talking about scaling the number of Mnesia fragments, remember that with disc_only_copies most of the time is spent in two operations:
deciding which fragment holds a given record
retrieving the record from the corresponding dets table (the Mnesia backend)
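Both operations are hidden behind the mnesia_frag access module; every read or write on a fragmented table should go through it, e.g. (table and key names here are illustrative):

```erlang
%% Sketch: reading from a fragmented table. The mnesia_frag access
%% module resolves the fragment (the hash step) and then performs the
%% dets lookup inside it. Table and key are placeholders.
Read = fun(Key) -> mnesia:read({subscriber, Key}) end,
mnesia:activity(transaction, Read, [some_key], mnesia_frag).
```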
The first does not really depend on the number of fragments, given that by default Mnesia uses linear hashing.
The second is related more to hard disk latency than to other factors.
In the end, a good solution could be to have more fragments with fewer records per fragment, while at the same time trying to find a middle ground and not lose the advantage of hard-disk performance boosts such as buffers and caches.
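On the 64-to-128 question specifically: fragments can be added to a live table one at a time with mnesia:change_table_frag/2, each call splitting one fragment and moving roughly half of its records. A hedged sketch, where the table name and node list are placeholders:

```erlang
%% Sketch: growing a fragmented table from 64 to 128 fragments online.
%% Each add_frag splits one existing fragment and redistributes its
%% records. `subscriber` and the node list are placeholders.
Nodes = [node() | nodes()],
[mnesia:change_table_frag(subscriber, {add_frag, Nodes})
 || _ <- lists:seq(1, 64)].
```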