
What analysis do I need to do to optimize a multi-step producer-consumer model?

Posted 2024-10-31 05:13:20


I have a 3-step producer/consumer setup.

Client creates JSON-encoded dictionaries and sends them to PipeServer via a named pipe.

Here are my threading.Thread subclasses:

PipeServer creates a named pipe and places incoming messages into a queue, unprocessed_messages.

Processor gets items from unprocessed_messages, processes them (via a lambda function argument), and puts them into a second queue, processed_messages.

Printer gets items from processed_messages, acquires a lock, prints the message, and releases the lock.

In the test script, I have one PipeServer, one Processor, and 4 Printers:

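import json
import queue
import threading

import pipetools       # the author's own module (not shown)
import threadedtools   # the author's own module (not shown)

# Assumed setup: the post does not show how the queues and lock are created.
unprocessed_messages = queue.Queue()
processed_messages = queue.Queue()
output_lock = threading.Lock()

pipe_name = '\\\\.\\pipe\\testpipe'
pipe_server = pipetools.PipeServer(pipe_name, unprocessed_messages)

# Decode each raw message from bytes and parse it as JSON.
json_loader = lambda x: json.loads(x.decode('utf-8'))
processor = threadedtools.Processor(unprocessed_messages,
                                    processed_messages,
                                    json_loader)

print_servers = []
for i in range(4):
    print_servers.append(threadedtools.Printer(processed_messages,
                                               output_lock,
                                               'PRINTER {0}'.format(i)))

pipe_server.start()
processor.start()
for print_server in print_servers:
    print_server.start()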

Question: in this kind of multi-step setup, how do I think through optimizing the number of Printer vs. Processor threads I should have? For example, how do I know if 4 is the optimal number of Printer threads to have? Should I have more processors?

I read through the Python Profilers docs, but didn't see anything that would help me think through these kinds of tradeoffs.


红玫瑰 2024-11-07 05:13:20

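In general, you want to maximize the throughput of your slowest component. Here that sounds like either the Client or the Printers. If it is the Client, you need only enough Printers and Processors to keep up with incoming messages (maybe just one of each!). Any more and you are wasting resources on threads you don't need.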
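As a rough illustration of that reasoning, the sketch below estimates which stage limits throughput and how many threads a stage would need to keep up with a given arrival rate. The function names and all timing figures are hypothetical, and it assumes a stage's capacity grows linearly with its thread count, which will not hold when threads contend for a shared lock (or, for CPU-bound work in CPython, for the GIL).

```python
import math


def threads_needed(arrival_rate, service_time):
    """Threads a stage needs to keep up with arrival_rate (items/sec),
    given service_time (sec/item). Assumes linear scaling with threads."""
    return max(1, math.ceil(arrival_rate * service_time))


def bottleneck(stages):
    """Return (name, capacity) of the stage with the lowest capacity.

    stages maps a stage name to (thread_count, seconds_per_item);
    capacity is measured in items per second.
    """
    capacities = {name: n / t for name, (n, t) in stages.items()}
    name = min(capacities, key=capacities.get)
    return name, capacities[name]


# Hypothetical measurements, not taken from the question.
stages = {"Processor": (1, 0.25), "Printer": (4, 0.5)}
print(bottleneck(stages))        # ('Processor', 4.0)
print(threads_needed(10, 0.25))  # 3
```

With these made-up numbers the single Processor caps the pipeline at 4 items per second, so adding Printers would change nothing until that stage is addressed.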

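If it is the Printers, then you need to optimize for the I/O that is taking place. A few variables to consider:

How many locks can you hold at the same time?
Do you have to hold the lock for the whole duration of a print transaction?
How long does a print operation take?

If you can only have one lock, then you should only have one Printer thread, since every additional thread would just wait for that lock; and so on.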
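The single-lock point can be checked with a small experiment. This is a minimal sketch, not the poster's Printer class: `run_printers` is a hypothetical helper, and `time.sleep` stands in for the real print call. Because the lock is held for the whole "print", the workers run one at a time regardless of how many there are.

```python
import queue
import threading
import time


def run_printers(n_threads, n_items, hold_time=0.01):
    """Drain n_items using n_threads workers that share one lock.

    Returns the elapsed wall-clock time in seconds.
    """
    q = queue.Queue()
    for i in range(n_items):
        q.put(i)
    lock = threading.Lock()

    def worker():
        while True:
            try:
                q.get_nowait()
            except queue.Empty:
                return  # no work left
            with lock:
                time.sleep(hold_time)  # stand-in for the print call

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


for n in (1, 4):
    print(n, "thread(s):", round(run_printers(n, 20), 3), "s")
```

If the two timings come out about the same, the extra threads are buying nothing. Shortening the time the lock is held (for example, formatting the message before acquiring it) is the lever that would change the result.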

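Then test under real-world conditions (it is hard to predict which combination of RAM, disk, and network activity will slow you down). Instrument your code so you can see how many threads are idle at any given time. Then create a test case that pushes data into the system at maximum throughput. Start with an arbitrary number of threads for each component. If the Client, Processor, or Printer threads are always busy, add more threads. If some threads are always idle, remove some.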
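One possible way to do that instrumentation is to have each worker record how long it spends blocked on its input queue versus handling items. The class below is a hypothetical stand-in for the poster's Processor and Printer classes (whose code is not shown), and the `None` shutdown sentinel is an assumption of this sketch.

```python
import queue
import threading
import time


class InstrumentedWorker(threading.Thread):
    """Hypothetical worker that tracks idle (waiting) and busy time."""

    def __init__(self, inbox, handler):
        super().__init__(daemon=True)
        self.inbox = inbox
        self.handler = handler
        self.idle_time = 0.0
        self.busy_time = 0.0

    def run(self):
        while True:
            t0 = time.perf_counter()
            item = self.inbox.get()  # blocks while there is no work
            t1 = time.perf_counter()
            self.idle_time += t1 - t0
            if item is None:  # sentinel: shut down
                break
            self.handler(item)
            self.busy_time += time.perf_counter() - t1

    def utilization(self):
        """Fraction of measured time spent handling items."""
        total = self.idle_time + self.busy_time
        return self.busy_time / total if total else 0.0


inbox = queue.Queue()
workers = [InstrumentedWorker(inbox, lambda _: time.sleep(0.005))
           for _ in range(2)]
for w in workers:
    w.start()
for i in range(10):
    inbox.put(i)
for _ in workers:
    inbox.put(None)  # one sentinel per worker
for w in workers:
    w.join()
for w in workers:
    print(w.name, "utilization:", round(w.utilization(), 2))
```

Utilization near 1.0 for every worker in a stage suggests that stage is saturated; values near 0 suggest it has threads to spare. Watching `qsize()` on the queues between stages gives a complementary signal, since a queue that keeps growing points at the stage consuming from it.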

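If you move the code to a different hardware environment, you may need to re-tune: a different number of processors, more memory, or different disks can all have an effect.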
