Internals of the Erlang remote procedure call module
I have several Erlang applications on Node A making rpc calls to Node B, on which I have my Mnesia stored procedures (database querying functions) as well as my Mnesia DB. Occasionally, the number of simultaneous processes making rpc calls to Node B for data can rise to 150. I have several questions:
Question 1: For each rpc call to a remote node, does Node A open a completely new connection (say TCP/IP or UDP, or whatever is used at the transport layer)? Or is there only one connection that all rpc calls share (since Node A and Node B are already connected [something to do with that epmd process])?
Question 2: If I have data-centric applications on one node and a centrally managed Mnesia database on another, and these applications' tables share the same schema (which may be replicated, fragmented, indexed, etc.), which is the better option: using rpc calls to fetch data from the data node to the application nodes, or developing a whole new framework over, say, TCP/IP (the way the Scalaris folks did for their failure detector) to cater for network latency problems?
Question 3: Has anyone tested or benchmarked rpc call efficiency in a way that can answer the following?
(a) What is the maximum number of simultaneous rpc calls an Erlang Node can push onto another without breaking down?
(b) Is there a way of increasing this number, either by a system configuration or operating system setting? (refer to Open Solaris for x86 in your answer)
(c) Is there any other way for applications to request data from Mnesia running on remote Erlang nodes, other than rpc? (say CORBA, REST [requires HTTP end-to-end], Megaco, SOAP, etc.)
1 Answer
Mnesia runs over Erlang distribution, and in Erlang distribution there is only one tcp/ip connection between any pair of nodes (usually in a full-mesh arrangement, so one connection for every pair of nodes). All rpc/internode communication happens over this distribution connection.
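You can observe this from the shell on Node A (the node names 'a@host' and 'b@host' below are hypothetical placeholders for your own):

```erlang
%% Run in the shell of node a@host (node names are hypothetical).
%% net_kernel:connect_node/1 sets up the single distribution
%% connection to B; every subsequent rpc:call reuses it --
%% no new socket is opened per call.
true = net_kernel:connect_node('b@host'),
['b@host'] = nodes(),                          % B appears once in the node list
Res1 = rpc:call('b@host', erlang, node, []),   % travels over that connection
Res2 = rpc:call('b@host', erlang, node, []).   % as does every other call
```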
Additionally, it's guaranteed that message ordering is preserved between any pair of communicating processes over distribution. Ordering between more than two processes is not defined.
Mnesia gives you a lot of options for data placement. If you want your persistent storage on node B, but processing done on node A, you could have disc_only_copies of your tables on B and ram_copies on node A. That way applications on node A can get quick access to data, and you'll still get durable copies on node B.
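As a sketch, a table declared with both copy types might look like this (the table name, attributes, and node names are illustrative assumptions, not from the question):

```erlang
%% Durable storage on the DB node, fast RAM replica on the app node.
mnesia:create_table(user,
    [{attributes,       [id, name]},     % record fields (hypothetical)
     {disc_only_copies, ['b@host']},     % persistent copy on node B
     {ram_copies,       ['a@host']}]).   % quick local reads on node A
```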
I'm assuming that the network between A and B is a reliable LAN that is seldom going to partition (otherwise you're going to spend a bunch of time getting mnesia back online after a partition).
If both A and B are running mnesia, then I would let mnesia do all the RPC for me - this is what mnesia is built for and it has a number of optimizations. I wouldn't roll my own RPC or distribution mechanism without a very good reason.
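In practice that means writing plain mnesia transactions on node A and letting mnesia route the cross-node traffic itself, e.g. (table and key are hypothetical):

```erlang
%% No explicit rpc:call needed: mnesia picks the replica to read from
%% and replicates writes to all copy holders inside the transaction.
Fun = fun() -> mnesia:read(user, 42) end,
{atomic, Rows} = mnesia:transaction(Fun).
```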
As for benchmarks, this is entirely dependent on your hardware, mnesia schema and network between nodes (as well as your application's data access patterns). No one can give you these benchmarks, you have to run them yourself.
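A minimal starting point for running such a benchmark yourself, using plain rpc:call timed with timer:tc (the module name and call are illustrative only):

```erlang
-module(rpc_bench).
-export([bench/2]).

%% Spawn N concurrent processes, each making one rpc call to Node,
%% and time how long it takes for all of them to finish.
bench(Node, N) ->
    Parent = self(),
    {Micros, ok} =
        timer:tc(fun() ->
                     [spawn(fun() ->
                                _ = rpc:call(Node, erlang, node, []),
                                Parent ! done
                            end) || _ <- lists:seq(1, N)],
                     [receive done -> ok end || _ <- lists:seq(1, N)],
                     ok
                 end),
    io:format("~p concurrent calls took ~p microseconds~n", [N, Micros]).
```

Calling `rpc_bench:bench('b@host', 150).` would mimic the 150-process load described in the question.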
As for other RPC mechanisms for accessing mnesia, I don't think there are any out of the box, but there are many RPC libraries you could use to present the mnesia API to the network with a small amount of effort on your part.