How to use multiprocessing Pool.starmap with multiple arguments
I have a question about using Pool.starmap:
p1 = pd.DataFrame(example1)
p2 = pd.DataFrame(example2)
pairs = itertools.product(p1.iterrows(), p2.iterrows())
pairs_len = len(p1) * len(p2)
tpairs = tqdm(pairs, desc='Make pair data..', total=pairs_len)

def mkpair(p1, p2, ext=False):
    result = {}
    if not ext:
        for idx, xcol in enumerate(p1.columns):
            result[f"D_{idx}"] = float(p1[xcol]) - float(p2[xcol])
    return result

pool = Pool(process=4)
pool.starmap(mkpair, tpairs)
pool.close()
pool.join()
I want each task in the pool to receive one element of p1.iterrows() and one element of p2.iterrows() from tpairs, passed as the p1 and p2 arguments. But I get "TypeError: tuple indices must be integers or slices, not str". I'm also wondering whether it's possible to pass the arguments as [p1, p2, ext] so I can add ext=True.
1 Answer
You have several issues with your code:

From pandas.iterrows() you get tuples t of length 2, where t[0] is the row index and t[1] is a pandas.Series instance. Your worker function, mkpair, will be passed two of these tuples, one from each dataframe, as arguments p1 and p2. But you are calling p1.columns, where p1 is a tuple, and tuples have no such attribute as columns. So this should have raised an altogether different exception (and a pandas.Series has no such attribute either), which means I don't see how you are getting the exception you claim from the actual code you posted. Moreover, your statement pool = Pool(process=4) is incorrect, as the correct keyword argument is processes, not process. So you could not possibly be executing this code (I will overlook the missing import statements, which I assume you actually have).

What you need to do, assuming you want the progress bar to advance as tasks are completed, is to use a method such as imap_unordered, or apply_async with a callback, either of which lets you update the bar as each task completes. If you want to store the results in a list in task-submission order rather than in completion order and you are using imap_unordered, then you need to pass an index to the worker function, which it returns along with the result. The code is simpler if you use apply_async, but that method does not allow you to submit tasks in "chunks", which becomes an issue if the total number of tasks is very large (see the chunksize argument of imap_unordered). Here is how you would use each method:

Using imap_unordered
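The answer's original code block was lost from this page; the following is a sketch of the imap_unordered approach described above. The sample dataframes and the D_{i} key naming stand in for the asker's example1/example2 data, which are not shown:

```python
from itertools import product
from multiprocessing import Pool

import pandas as pd
from tqdm import tqdm

# Placeholder data; the asker's example1/example2 inputs are not shown.
p1 = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 4.0]})
p2 = pd.DataFrame({'a': [0.5, 1.5], 'b': [2.5, 3.5]})

def mkpair(task):
    # imap_unordered passes a single argument, so each task is a tuple
    # (index, row1, row2). The rows are the pandas.Series halves of the
    # iterrows() tuples; the index is returned with the result so the
    # caller can restore submission order.
    index, row1, row2 = task
    result = {f"D_{i}": float(row1[col]) - float(row2[col])
              for i, col in enumerate(row1.index)}  # a row Series' index holds the column names
    return index, result

if __name__ == '__main__':
    # Strip the row-index element of each iterrows() tuple and attach
    # a task index before submission.
    tasks = ((i, r1, r2)
             for i, ((_, r1), (_, r2))
             in enumerate(product(p1.iterrows(), p2.iterrows())))
    n_tasks = len(p1) * len(p2)
    results = [None] * n_tasks
    with Pool(processes=4) as pool, \
         tqdm(total=n_tasks, desc='Make pair data..') as bar:
        for index, result in pool.imap_unordered(mkpair, tasks, chunksize=10):
            results[index] = result  # store in submission order
            bar.update(1)          # advance the bar as each task finishes
    print(results[0])  # {'D_0': 0.5, 'D_1': 0.5}
```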
Using apply_async
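Again reconstructing the lost code block: a sketch of the apply_async variant, with the same placeholder data. Note that extra arguments such as ext can simply be appended to args, which answers the second part of the question:

```python
from itertools import product
from multiprocessing import Pool

import pandas as pd
from tqdm import tqdm

# Placeholder data; the asker's example1/example2 inputs are not shown.
p1 = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 4.0]})
p2 = pd.DataFrame({'a': [0.5, 1.5], 'b': [2.5, 3.5]})

def mkpair(row1, row2, ext=False):
    # With apply_async the rows arrive as separate positional arguments,
    # so no index bookkeeping is needed in the worker.
    if ext:
        return {}  # placeholder for whatever the ext=True branch should do
    return {f"D_{i}": float(row1[col]) - float(row2[col])
            for i, col in enumerate(row1.index)}

if __name__ == '__main__':
    n_tasks = len(p1) * len(p2)
    with Pool(processes=4) as pool, \
         tqdm(total=n_tasks, desc='Make pair data..') as bar:
        # The callback runs in the main process as each task finishes,
        # advancing the progress bar. Results come back in submission
        # order because the AsyncResult objects are collected in order.
        async_results = [pool.apply_async(mkpair, args=(r1, r2, False),
                                          callback=lambda _: bar.update(1))
                         for (_, r1), (_, r2)
                         in product(p1.iterrows(), p2.iterrows())]
        results = [ar.get() for ar in async_results]
    print(results[0])  # {'D_0': 0.5, 'D_1': 0.5}
```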