Python/Redis multiprocessing

Asked 2024-10-07 17:42:21

I'm using Pool.map from the multiprocessing library to iterate through a large XML file and save word and ngram counts into a set of three Redis servers (which sit completely in memory). But for some reason all 4 CPU cores sit around 60% idle the whole time. The server has plenty of RAM, and iotop shows that no disk IO is happening.

I have 4 Python threads and 3 Redis servers running as daemons on three different ports. Each Python thread connects to all three servers.

The number of Redis operations on each server is well below what it's benchmarked as capable of.

I can't find the bottleneck in this program. What are the likely candidates?

1 Answer

娇妻 2024-10-14 17:42:21

Network latency may be contributing to the idle CPU time in your Python client application. If the network latency between client and server is even as little as 2 milliseconds, and you perform 10,000 Redis commands, your application must sit idle for at least 20 seconds, regardless of the speed of any other component.

Using multiple Python threads can help, but each thread will still go idle while a blocking command is in flight to the server. Unless you have a great many threads, they will often synchronize and all block waiting for a response. Because each thread connects to all three servers, the chances of this happening are reduced, but it still occurs whenever all of them happen to be waiting on the same server.

Assuming your requests are uniformly randomly distributed across the servers (by hashing on key names to implement sharding or partitioning), the odds that two in-flight requests hash to the same Redis server are inversely proportional to the number of servers: with 1 server they collide 100% of the time, with 2 servers 50% of the time, and with 3 servers 33% of the time. So with 4 threads spread over only 3 servers, much of the time at least some of your threads are queued behind a request to the same server. Redis is single-threaded when handling data operations, so it must process each request one after another. Your observation that CPU utilization only reaches 60% agrees with the probability that your requests are frequently blocked on network latency to the same server.
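
As a rough sanity check on that 60% figure, here is a small simulation of my own (not from the question). It assumes each of the 4 threads always has exactly one request in flight to a uniformly random server, and that each single-threaded server serves one request at a time, so only as many threads as there are distinct target servers make progress at any instant:

    import random

    THREADS, SERVERS, TRIALS = 4, 3, 100_000

    # Count how many distinct servers the 4 in-flight requests hit;
    # only that many threads can make progress at once, because each
    # single-threaded server handles one request at a time.
    busy = sum(
        len({random.randrange(SERVERS) for _ in range(THREADS)})
        for _ in range(TRIALS)
    )
    print(busy / (TRIALS * THREADS))  # ~0.60, i.e. roughly 60% busy

Analytically, the expected number of distinct servers hit is 3 * (1 - (2/3)^4), about 2.4, so roughly 2.4 of the 4 threads are busy at any moment: about 60%.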

Continuing the assumption that you are implementing client-side sharding by hashing on key names, you can eliminate the contention between threads by assigning each thread a single server connection, and evaluating the partitioning hash before passing a request to a worker thread. This ensures all threads are waiting on different servers' network latency; a sketch follows below. But there may be an even better improvement from pipelining.
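
For illustration, a minimal sketch of that scheme; the crc32-based hash and the queue routing are my assumptions, not details from the question:

    import zlib

    NUM_SHARDS = 3  # one per Redis daemon

    def shard_index(key):
        # Hash the key name so a given key always maps to the same server.
        return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS

    # Dispatcher side: evaluate the hash first, then hand the request
    # to the worker that owns the connection to that shard, e.g.
    #   work_queues[shard_index(word)].put(word)
    # Each worker then talks to exactly one server and never contends
    # with the others.
    print(shard_index("count:the"))  # 0, 1, or 2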

You can reduce the impact of network latency by using the pipeline feature of the redis-py module, if you don't need an immediate result from the server. This may be viable for you, since it sounds like you are only storing the results of data processing into Redis. To implement this using redis-py, periodically obtain a pipeline handle from an existing Redis connection object using the .pipeline() method, and invoke multiple store commands against that new handle the same as you would against the primary redis.Redis connection object. Then invoke .execute() to block on the replies. You can get orders-of-magnitude improvement by using pipelining to batch tens or hundreds of commands together; your client thread won't block until you issue the final .execute() call on the pipeline handle.
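
A minimal sketch of that pattern (the key names, batch size, and port are invented for illustration):

    import redis  # pip install redis

    r = redis.Redis(host="localhost", port=6379)
    pipe = r.pipeline()  # pipeline handle from the existing connection

    words = ["the", "quick", "brown", "fox"] * 50  # stand-in data

    for i, word in enumerate(words, start=1):
        pipe.incr("count:" + word)  # queued locally, nothing sent yet
        if i % 100 == 0:
            pipe.execute()  # one round trip flushes the whole batch

    pipe.execute()  # flush any leftover commands

Note that redis-py wraps a pipeline in MULTI/EXEC by default; if you don't need the batch to be atomic, passing transaction=False to .pipeline() gives plain pipelining.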

If you apply both changes, so that each worker thread communicates with just one server and pipelines multiple commands together (at least 5-10 per batch to see a significant result), you may see higher CPU usage in the client (nearer to 100%). The CPython GIL will still limit any single client process to one core, but it sounds like you are already using the other cores for XML parsing via the multiprocessing module.
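
To show how the pieces fit with the multiprocessing module the question already uses, here is a hedged sketch (the port, key names, and chunking are my inventions) in which each pool process opens its own connection and writes its counts in pipelined batches:

    import multiprocessing
    import redis  # pip install redis

    conn = None  # per-process connection, created by the initializer

    def init_worker(port):
        global conn
        # Each pool process opens its own connection; a connection must
        # not be shared across forked processes.
        conn = redis.Redis(host="localhost", port=port)

    def count_chunk(words):
        # One network round trip per chunk instead of one per word.
        pipe = conn.pipeline()
        for word in words:
            pipe.incr("count:" + word)
        pipe.execute()

    if __name__ == "__main__":
        chunks = [["the", "quick"], ["brown", "fox"]]  # stand-in data
        with multiprocessing.Pool(processes=4, initializer=init_worker,
                                  initargs=(6379,)) as pool:
            pool.map(count_chunk, chunks)

In a fully sharded setup, each worker would instead connect only to the server that owns its keys, with the dispatcher routing chunks by key hash as sketched earlier.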

There is a good writeup about pipelining on the redis.io site.
