The Erlang/OTP framework's error_logger hangs under fairly high load

Posted 2024-09-16 13:50:45

My application is basically a content-based router that routes MMS events.

The logger I am using is the error_logger that ships with the OTP framework, running in SASL mode.

The issue is:

I am using a client to generate MMS events with default values. This client (written in Java) can send a high load of events from multiple threads.

I am sending 100 events from 10 threads (each thread sending 10 MMS events) to my router, which is written in Erlang/OTP.

The problem is that when my router receives such a high load, my logger hangs, i.e. it stops updating my log file. The router itself is still able to route the events.

The conclusions I have come up with are:

  1. A scheduling problem in Erlang when such a high load of events is received (a separate process is spawned for each event).

  2. A very unlikely deadlock state.

  3. It might be due to sending events from multiple threads rather than sequentially. But I expect a router to be connected to multiple service provider boxes, so I chose to send events from threads.

Can anybody help me demystify the problem?



Comments (2)

清音悠歌 2024-09-23 13:50:45

You already have a good answer, but I'll add to the discussion.

error_logger by default uses cached write operations to disk. So one possibility is that you don't really notice this under low load, but under high load your writes get stuck in the cache for a while.

On a side note: there should be no problem having multiple threads making calls into Erlang.

Another way of testing this is to add your own handler to error_logger and see what happens, perhaps one that prints to the shell or does something else that is "fast".
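For illustration, here is a minimal sketch of such a handler. The module name my_shell_logger is made up; error_logger is a gen_event manager, so a handler like this can be attached with error_logger:add_report_handler(my_shell_logger).

%% Minimal sketch of a custom error_logger handler (hypothetical name).
%% It bypasses file I/O entirely and prints each event to the shell,
%% which helps separate "logger overloaded" from "disk writes stuck".
-module(my_shell_logger).
-behaviour(gen_event).

-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

init(_Args) ->
    {ok, []}.

%% Every error_logger event (error, error_report, info_msg, ...) lands here.
handle_event(Event, State) ->
    io:format("error_logger event: ~p~n", [Event]),
    {ok, State}.

handle_call(_Request, State) ->
    {ok, ok, State}.

handle_info(_Info, State) ->
    {ok, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.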


聽兲甴掵 2024-09-23 13:50:45

Which version of Erlang are you using? Prior to R14A (R13B4 maybe?) there was a performance penalty when you invoked a selective receive while the message queue contained a lot of messages. This behaviour meant that in a process that receives lots of messages (error_logger being the canonical example), if it was barely keeping up with the load, a small spike in load could cause the cost of processing to spike up and stay there, because the new processing cost was higher than the process could bear. This problem was solved in R14A.
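For context, a selective receive is any receive that matches only particular messages, so the runtime may have to scan past everything else queued ahead of them. A contrived sketch (not code from the router):

-module(ack_example).
-export([wait_for_ack/1]).

%% Only {ack, Ref} is matched; every call scans the mailbox from the
%% front, past any other messages queued ahead of it.
wait_for_ack(Ref) ->
    receive
        {ack, Ref} ->
            ok
    after 5000 ->
            {error, timeout}
    end.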

Secondly - why are you sending a high volume of events/calls/logs to a text logger? Formatting strings for output to a human-readable log file is a lot more expensive than, for instance, using a binary disk_log. Reducing the cost of logging will help, but reducing the volume of logs will help even more. Maybe investigate exactly why you need to log these things and see whether you can't record them in another (less expensive) way.
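As a rough sketch of that idea (the log name, file name and event term below are made up), a wrap disk_log stores raw Erlang terms instead of formatted text:

%% Open a wrap log of five 10 MB files that stores terms in the
%% binary "internal" format; no string formatting is involved.
{ok, _} = disk_log:open([{name, mms_log},
                         {file, "mms_events.LOG"},
                         {type, wrap},
                         {format, internal},
                         {size, {10 * 1024 * 1024, 5}}]),
%% Log one event as a plain term, then close the log.
ok = disk_log:log(mms_log, {mms_event, calendar:local_time(), routed_ok}),
ok = disk_log:close(mms_log).

The terms can later be read back with disk_log:chunk/2.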

Problems with error_logger are often symptoms of some other overload problem. Try looking at the message queue sizes of all your processes when the problem occurs and see whether something else is backed up too. The following Erlang shell code might help:

%% {Pid, QueueLength} for every live local process; an unusually large
%% queue points at whichever process is falling behind.
[ { P, element(2, process_info(P, message_queue_len)) }
  || P <- erlang:processes(), is_process_alive(P) ].
