Notifying the consumer when the producer is done
I'm reading in a lot of data from ldap which needs to be compared to the respective records in the database. To minimize the number of SQL queries, I want to batch multiple ldap records into a single query.
All this is pretty simple: A thread to produce ldap results, and a thread to consume those results and run the SQL query.
    import Queue  # Python 2 module name; it is `queue` in Python 3

    ldap_results = Queue.Queue(10)

    def producer():
        # get_ldap_results() is a placeholder for the LDAP search
        for result in get_ldap_results():
            ldap_results.put(result)

    def consumer():
        buffer = []
        buffer_size = 5
        while True:
            record = ldap_results.get()
            buffer.append(record)
            if len(buffer) >= buffer_size:
                do_sql(buffer)
                buffer = []
The problem is: if ldap only returns, say, 3 results and buffer_size is 5, it'll end up blocking forever. I realize I could put some special token into the buffer, like None or "EOF", but that seems like bad design: "iterate until you're done, oh, unless you see this special value, that means you're done, too".
I came up with two alternative ideas. The first is to have a shared eof variable, but I don't know how to properly synchronize it.
    def producer():
        while data:
            buffer.put()
        eof = True

    def consumer():
        while not eof:
            buffer.get()
The second is to have a ProduceChunks(chunk_size) method for the producer, which would handle the batching up of results, but I don't like that because it assumes the producer knows how best to buffer up results, when, really, I think that is the responsibility of the consumer.
Does anyone have any guidance?
Comments (2)
I would follow the "Make it Run, Make it Right, Make it Fast, Make it Simple" pattern.
Can you implement this correctly without a special "EOF" token? If not, then you just have to use the EOF token; do not sweat it. Yes, the termination condition is more complex, but now it is "Right."
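For what it's worth, here is a minimal sketch of the EOF-token approach, written for Python 3 (where the module is named `queue`). The names `_DONE`, `handle_batch`, and `run_pipeline` are made up for illustration; `handle_batch` stands in for the question's `do_sql`. The marker is a private `object()` rather than None or "EOF", so it cannot be confused with a real record, and the consumer flushes whatever partial batch is left when it sees it.

```python
import queue
import threading

_DONE = object()  # unique end-of-stream marker; cannot collide with a real record


def producer(records, out_queue):
    """Push every record, then signal that no more are coming."""
    for record in records:
        out_queue.put(record)
    out_queue.put(_DONE)


def consumer(in_queue, handle_batch, batch_size=5):
    """Group records into batches; flush the partial batch at end of stream."""
    batch = []
    # iter(callable, sentinel) keeps calling in_queue.get() until it returns _DONE
    for record in iter(in_queue.get, _DONE):
        batch.append(record)
        if len(batch) >= batch_size:
            handle_batch(batch)
            batch = []
    if batch:  # fewer than batch_size records were left over
        handle_batch(batch)


def run_pipeline(records, batch_size=5):
    """Run one producer and one consumer; return the batches that were handled."""
    results = queue.Queue(10)
    batches = []
    worker = threading.Thread(
        target=consumer, args=(results, batches.append, batch_size)
    )
    worker.start()
    producer(records, results)
    worker.join()
    return batches
```

With 3 records and a batch size of 5, the consumer handles one batch of 3 and exits instead of blocking. The batching decision stays entirely in the consumer; the producer only has to say "that's all".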
The "EOF" approach is perfectly respectable. Let's look at the microcosm of an ANSI string. The null is the EOF. What's bad about that?
Alternatively, let's look at the microcosm of the BSTR. Instead of a trailing null, the first bytes tell you how many bytes are coming.
Either way is fine.
Doesn't matter.
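To make the BSTR analogy concrete, here is a rough sketch of the "length first" variant applied to a queue, again in Python 3. It assumes the producer can know the record count before it starts sending, which may not hold for a streaming LDAP search; `producer_with_count`, `consumer_with_count`, `handle_batch`, and `run_counted` are hypothetical names.

```python
import queue
import threading


def producer_with_count(records, out_queue):
    """Announce how many records are coming, then send them."""
    records = list(records)      # assumption: the whole result set fits in memory
    out_queue.put(len(records))  # the "length prefix"
    for record in records:
        out_queue.put(record)


def consumer_with_count(in_queue, handle_batch, batch_size=5):
    """Read the count first, then exactly that many records, batching as we go."""
    remaining = in_queue.get()   # first item is the count, not a record
    batch = []
    for _ in range(remaining):
        batch.append(in_queue.get())
        if len(batch) >= batch_size:
            handle_batch(batch)
            batch = []
    if batch:                    # flush the final partial batch
        handle_batch(batch)


def run_counted(records, batch_size=5):
    """Run one producer and one consumer; return the batches that were handled."""
    results = queue.Queue(10)
    batches = []
    worker = threading.Thread(
        target=consumer_with_count, args=(results, batches.append, batch_size)
    )
    worker.start()
    producer_with_count(records, results)
    worker.join()
    return batches
```

The trade-off mirrors the string case: a trailing marker works when the length is unknown up front, while a leading count needs the producer to know the total before sending anything.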