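Why can't I assign directly inside a Python multiprocessing Manager Namespace?
Can anyone help explain why the dataframe can't be changed directly? add_new_derived_column_NOT_work does not work as expected.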
# -*- coding: UTF-8 -*-
import numpy as np
import pandas as pd
from multiprocessing import Manager, Process


def add_new_derived_column_work(ns):
    # Read the dataframe, modify it, then assign it back to the namespace.
    dataframe2 = ns.df
    dataframe2['new_column'] = dataframe2['A'] + dataframe2['B'] / 2
    print(dataframe2.head())
    ns.df = dataframe2


def add_new_derived_column_NOT_work(ns):
    # Modify the object returned by ns.df in place, without assigning it back.
    ns.df['new_column'] = ns.df['A'] + ns.df['B'] / 2
    print(ns.df.head())


if __name__ == "__main__":
    mgr = Manager()
    ns = mgr.Namespace()
    dataframe = pd.DataFrame(np.random.randn(100000, 2), columns=['A', 'B'])
    ns.df = dataframe
    print(dataframe.head())
    # then I pass the shared namespace to a multiprocessing.Process object
    process = Process(target=add_new_derived_column_work, args=(ns,))
    process.start()
    process.join()
    print(ns.df.head())
Comments (1)
As per the Python documentation:
a Namespace object has "writable attributes",
but about proxy objects you can read that "If standard (non-proxy) list or dict objects are contained in a referent, modifications to those mutable values will not be propagated through the manager because the proxy has no way of knowing when the values contained within are modified. However, storing a value in a container proxy does propagate through the manager and so to effectively modify such an item" (the documentation goes on to recommend re-assigning the modified value to the container proxy).
This is why, I think, add_new_derived_column_work works: it assigns a new dataframe to ns.df. In contrast, add_new_derived_column_NOT_work fails because it tries to add a column by mutating the dataframe in place, and that mutation only changes a local copy; it does not actually affect ns.df.
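The difference can be reproduced without pandas. The following is a minimal sketch, assuming a plain dict stands in for the dataframe; the function name demo and the attribute name data are made up for illustration and are not part of the original question.

```python
from multiprocessing import Manager


def demo():
    """Return two snapshots of ns.data: after mutation, after re-assignment."""
    with Manager() as mgr:
        ns = mgr.Namespace()
        ns.data = {"A": 1, "B": 2}

        # In-place mutation: reading ns.data hands back a local copy, so the
        # new key is added to that copy and never reaches the manager.
        ns.data["new_column"] = 3
        after_mutation = ns.data

        # Read, modify, re-assign: setting the attribute goes through the
        # proxy, so the manager stores the updated dict.
        local = ns.data
        local["new_column"] = 3
        ns.data = local
        after_reassign = ns.data

    return after_mutation, after_reassign


if __name__ == "__main__":
    after_mutation, after_reassign = demo()
    print(after_mutation)   # {'A': 1, 'B': 2}
    print(after_reassign)   # {'A': 1, 'B': 2, 'new_column': 3}
```

The same read, modify, re-assign pattern is what add_new_derived_column_work does with the dataframe.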