Why can't we modify a value directly in a Python multiprocessing Manager Namespace?

Posted 2025-02-03 20:00:11 · 957 characters · 3 views · 0 comments

Can anyone help me figure out why we can't change the DataFrame directly?
add_new_derived_column_NOT_work doesn't work as I expected.

# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
from multiprocessing import Manager, Process

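def add_new_derived_column_work(ns):
    # Read the DataFrame from the namespace, modify it, then assign it back.
    dataframe2 = ns.df
    dataframe2['new_column'] = dataframe2['A'] + dataframe2['B'] / 2
    print(dataframe2.head())
    ns.df = dataframe2

def add_new_derived_column_NOT_work(ns):
    # Modify the DataFrame returned by ns.df in place, without assigning it back.
    ns.df['new_column'] = ns.df['A'] + ns.df['B'] / 2
    print(ns.df.head())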

if __name__ == "__main__":

    mgr = Manager()
    ns = mgr.Namespace()

    dataframe = pd.DataFrame(np.random.randn(100000, 2), columns=['A', 'B'])
    ns.df = dataframe
    print(dataframe.head())

    # Pass the shared namespace to a multiprocessing.Process
    process = Process(target=add_new_derived_column_work, args=(ns,))
    process.start()
    process.join()

    print(ns.df.head())

Comments (1)

木格 2025-02-10 20:00:11

As per the Python documentation:

  • a namespace object has "writable attributes"

  • but about proxy objects, you can read that "If standard (non-proxy) list or dict objects are contained in a referent, modifications to those mutable values will not be propagated through the manager because the proxy has no way of knowing when the values contained within are modified. However, storing a value in a container proxy does propagate through the manager and so to effectively modify such an item, one could re-assign the modified value to the container proxy"

This is why, I think, add_new_derived_column_work works: it assigns a new DataFrame to ns.df. add_new_derived_column_NOT_work fails because it tries to add a column by mutating the DataFrame, but every read of ns.df returns a local copy of the stored object, so the mutation never actually affects ns.df.
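
The behaviour described above can be reproduced without pandas. The following is a minimal sketch, assuming a plain dict of lists stands in for the DataFrame; the names mutate_in_place, reassign, run and ns.data are hypothetical and chosen only for this illustration.

```python
from multiprocessing import Manager, Process


def mutate_in_place(ns):
    # Each read of ns.data returns a fresh local copy, so the new key is
    # added to a temporary object and never reaches the manager.
    ns.data["new_column"] = [a + b / 2 for a, b in zip(ns.data["A"], ns.data["B"])]


def reassign(ns):
    # Work on a local copy, then write it back: setting the attribute on
    # the proxy is what sends the updated value to the manager.
    data = ns.data
    data["new_column"] = [a + b / 2 for a, b in zip(data["A"], data["B"])]
    ns.data = data


def run(worker):
    """Run `worker` in a child process and return the sorted keys of ns.data."""
    with Manager() as mgr:
        ns = mgr.Namespace()
        # A plain dict is used here as a stand-in for the DataFrame (assumption).
        ns.data = {"A": [1.0, 2.0], "B": [4.0, 6.0]}
        process = Process(target=worker, args=(ns,))
        process.start()
        process.join()
        return sorted(ns.data)


if __name__ == "__main__":
    print(run(mutate_in_place))  # new_column is missing
    print(run(reassign))         # new_column is present
```

The same read, modify, write-back pattern is what add_new_derived_column_work does with the DataFrame.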
