Optimizing Erlang process performance
In a test I'm building here, my goal is to create a parser. I've built a proof of concept that reads all messages from a file and, once they are all in memory, spawns one process to parse each message. Up to that point everything is fine, and I've got some nice results. But I can see that the Erlang VM is not using all my processor power (I have a quad core); in fact, it uses only about 25% of the CPU during the test. I've made a counter-test in C++ that uses four threads, and it obviously uses 100%, producing a better result (I kept the same queue model Erlang uses).
So I'm wondering what could be "slowing" my Erlang test. I know it's not a serialization issue, as I'm spawning one process per message. One thing I've considered is that my messages may be too small (about 10k each), so creating that many processes does not help performance.
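For readers following along, the process-per-message setup described above could look roughly like the sketch below. This is not the asker's code: the module name `per_message` and the comma-splitting `parse_message/1` are hypothetical stand-ins for the real parser, which the question does not show.

```erlang
-module(per_message).
-export([parse_all/1, parse_message/1]).

%% Hypothetical stand-in for the real parser (an assumption):
%% splits a binary message on commas.
parse_message(Msg) when is_binary(Msg) ->
    binary:split(Msg, <<",">>, [global]).

%% Spawn one process per message and collect the results in input order.
parse_all(Messages) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, parse_message(Msg)} end),
                Ref
            end || Msg <- Messages],
    %% Selective receive on each unique reference keeps results ordered.
    [receive {Ref, Parsed} -> Parsed end || Ref <- Refs].
```

With 106k messages this creates 106k short-lived processes, so the per-process setup and message-passing cost is paid once per message.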
Some facts about the test:
106k messages
On Erlang (25% processor power used) - 204 ms
On my C++ test (100% processor power used) - 80 ms
Yes, the difference isn't that great, but if there is more power available, there is certainly more room for improvement, right?
Also, I've done some profiling and wasn't able to find another way to optimize, since there are few function calls and most of them are string-to-object conversions.
Update:
Wow! Following Hassan Syed's idea, I've managed to get down to 35 ms against 80 ms for C++! This is awesome!
Comments (3)
It seems your Erlang VM is using only one core.
Try starting it with the -smp enable and +S 4 flags (that is, erl -smp enable +S 4):
The -smp enable flag tells Erlang to start the runtime system with SMP support enabled.
With +S 4 you start 4 Erlang schedulers (1 for each core).
You can see whether SMP is enabled in the banner printed when you start the shell:
A field such as [smp:2:2] means it is running with SMP enabled, with 2 schedulers, 2 of them online.
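The same information can also be queried from inside a running node. The sketch below uses `erlang:system_info/1`; the module name `sched_info` is made up for illustration, and the values returned depend on the machine and on how the VM was started.

```erlang
-module(sched_info).
-export([report/0]).

%% Print and return whether SMP support is available, how many
%% schedulers were started, and how many of them are online.
report() ->
    Smp = erlang:system_info(smp_support),
    Total = erlang:system_info(schedulers),
    Online = erlang:system_info(schedulers_online),
    io:format("smp=~p schedulers=~p online=~p~n", [Smp, Total, Online]),
    {Smp, Total, Online}.
```

If `Online` comes back as 1 on a quad-core machine, that would be consistent with the roughly 25% CPU usage reported in the question.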
If you have one source file and you spawn one process per "expression", you really do not understand when to parallelise. It costs FAR more to spawn a process and have it handle a single expression than to have one process handle the entire file. A suitable strategy would be one process per file rather than one process per expression.
An alternative strategy would be to split the file into two, three or x chunks and process those chunks. This of course assumes the source isn't linearly dependent, and each chunk's processing time needs to exceed the time it takes to create and spawn a process (usually by a wide margin, because time wasted in process X is time taken away from the rest of the machine).
-- Discussion: C++ vs. Erlang and your findings --
Erlang has a user-space kernel that emulates many of the primitives of the OS kernel, especially the scheduler and the blocking primitives. This means there is some overhead compared with using the same strategy in a raw procedural language such as C++. You must tune your task partitioning to every entry in the implementation space (CPU/memory/OS/programming language) according to its properties.
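A minimal sketch of the chunking strategy, under some assumptions: the input is already a list in memory, the items can be processed independently, and the names `chunked`, `split/2` and `pmap/3` are invented here rather than taken from the thread. One process is spawned per chunk instead of one per item.

```erlang
-module(chunked).
-export([pmap/3, split/2]).

%% Split List into at most N contiguous chunks of roughly equal size.
split(List, N) when is_integer(N), N > 0 ->
    Size = max(1, (length(List) + N - 1) div N),
    split_every(List, Size).

split_every([], _Size) -> [];
split_every(List, Size) when length(List) =< Size -> [List];
split_every(List, Size) ->
    {Chunk, Rest} = lists:split(Size, List),
    [Chunk | split_every(Rest, Size)].

%% Apply Fun to every element of List using one process per chunk,
%% returning the results in the original order.
pmap(Fun, List, N) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, lists:map(Fun, Chunk)} end),
                Ref
            end || Chunk <- split(List, N)],
    lists:append([receive {Ref, Result} -> Result end || Ref <- Refs]).
```

A call might look like `chunked:pmap(fun(X) -> X * 2 end, [1, 2, 3, 4, 5], 2)`. Choosing `N` close to `erlang:system_info(schedulers_online)` is a common starting point, though the best value depends on the workload.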
You should bind the schedulers to the CPU cores:
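The command this answer had in mind is not preserved here. One documented way to request binding is the `+sbt` emulator flag (for example `erl +sbt db` for the default bind type); treating that as what the answer meant is an assumption. Whether binding actually took effect can be checked from inside the node, as in this sketch with a made-up module name `bind_info`:

```erlang
-module(bind_info).
-export([report/0]).

%% Print and return the scheduler bind type requested at startup and
%% the current binding of each scheduler (unbound if not bound).
report() ->
    Type = erlang:system_info(scheduler_bind_type),
    Bindings = erlang:system_info(scheduler_bindings),
    io:format("bind type: ~p~nbindings: ~p~n", [Type, Bindings]),
    {Type, Bindings}.
```

Binding is not supported on every platform, so the result may show unbound schedulers even when a bind type was requested.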