Pandas DataFrame shallow copy doesn't reflect data changes?

Posted 01-18 16:56

I have a wrapper class for working with a specific DataFrame, plus some modifier functions/callables that operate on it.

import re
from typing import Callable

import pandas as pd


class PhoneNumberCleaner:
    def __init__(self, data: pd.DataFrame, pattern: str):
        self.data = data  # shallow copy?
        self.pattern = pattern

    def __call__(self, *args, **kwargs) -> pd.DataFrame:
        drop_mask = self.data['phoneNumber'].apply(
            lambda pn: not re.fullmatch(self.pattern, pn)
        )
        drop_mask_index = drop_mask[drop_mask].index
        return self.data.drop(drop_mask_index)


class Wrapper:
    def __init__(self, data: pd.DataFrame):
        self.data = data

    def modify(self, modifier: Callable, *args, **kwargs):
        self.data = modifier(*args, **kwargs)

Now, let's say I have the following data:

df_data = {
    'name': ['Mickey', 'Anna', 'Todd', 'Lee', 'Amanda', 'Jake'],
    'phoneNumber': [
        '0321111444---',
        '0335555666',
        '0330001234',
        '0330123456789',
        '0328888999',
        '0999999999999',
    ]
}
df = pd.DataFrame(df_data)

and I want to drop the rows where the phone number does not match the expected pattern:

wrapper = Wrapper(df)
number_cleaner = PhoneNumberCleaner(wrapper.data, r'\d{10}')
wrapper.modify(number_cleaner)

Printing the wrapper's data works as expected:

print(wrapper.data)

     name phoneNumber
1    Anna  0335555666
2    Todd  0330001234
4  Amanda  0328888999

However, when I access the same data through the PhoneNumberCleaner object (which is supposed to refer to the same DataFrame), I get the old data:

print(number_cleaner.data)

     name    phoneNumber
0  Mickey  0321111444---
1    Anna     0335555666
2    Todd     0330001234
3     Lee  0330123456789
4  Amanda     0328888999
5    Jake  0999999999999

I tried adding .copy(deep=False) when assigning data in the Wrapper and PhoneNumberCleaner classes, but it didn't help. What am I missing here?
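The behaviour comes down to Python name binding rather than pandas copy semantics: plain assignment never copies, and `.copy(deep=False)` always produces a new DataFrame object, so neither makes one attribute follow another attribute that is later rebound. A minimal sketch, using a small illustrative DataFrame rather than the question's full data:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Anna", "Jake"],
    "phoneNumber": ["0335555666", "0999999999999"],
})

alias = df                     # plain assignment: a second name for the same object
shallow = df.copy(deep=False)  # shallow copy: a new DataFrame object

print(alias is df)    # True
print(shallow is df)  # False

# drop() without inplace=True returns a new DataFrame; rebinding `alias`
# to it leaves `df` (and any other name still pointing at it) unchanged.
alias = alias.drop(index=1)
print(alias is df)          # False
print(len(alias), len(df))  # 1 2
```

The same thing happens in the question: `wrapper.data` is rebound to the DataFrame returned by `drop`, while `number_cleaner.data` keeps pointing at the original one.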


Comments (1)

熟人话多 2025-01-25 16:56:06

This line:

class PhoneNumberCleaner:
    def __call__(self, *args, **kwargs) -> pd.DataFrame:
        ...
        return self.data.drop(drop_mask_index)

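DataFrame.drop returns a new DataFrame by default (inplace=False), so the original DataFrame (self.data) is not modified. Wrapper.modify then rebinds wrapper.data to that new object, while number_cleaner.data still refers to the original one.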

Change it to:

class PhoneNumberCleaner:
    def __call__(self, *args, **kwargs) -> pd.DataFrame:
        ...
        self.data.drop(drop_mask_index, inplace=True)
        return self.data
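A self-contained version of this fix, as a sketch: the classes and data are the question's, with the imports the original snippets omit and only the end of `__call__` changed as suggested above.

```python
import re
from typing import Callable

import pandas as pd


class PhoneNumberCleaner:
    def __init__(self, data: pd.DataFrame, pattern: str):
        self.data = data  # a reference to the caller's DataFrame, not a copy
        self.pattern = pattern

    def __call__(self, *args, **kwargs) -> pd.DataFrame:
        # True for rows whose phone number does NOT fully match the pattern
        drop_mask = self.data['phoneNumber'].apply(
            lambda pn: not re.fullmatch(self.pattern, pn)
        )
        drop_mask_index = drop_mask[drop_mask].index
        # Mutate the shared DataFrame instead of returning a new one
        self.data.drop(drop_mask_index, inplace=True)
        return self.data


class Wrapper:
    def __init__(self, data: pd.DataFrame):
        self.data = data

    def modify(self, modifier: Callable, *args, **kwargs):
        self.data = modifier(*args, **kwargs)


df = pd.DataFrame({
    'name': ['Mickey', 'Anna', 'Todd', 'Lee', 'Amanda', 'Jake'],
    'phoneNumber': [
        '0321111444---', '0335555666', '0330001234',
        '0330123456789', '0328888999', '0999999999999',
    ],
})

wrapper = Wrapper(df)
number_cleaner = PhoneNumberCleaner(wrapper.data, r'\d{10}')
wrapper.modify(number_cleaner)

print(wrapper.data is number_cleaner.data)  # True: both attributes refer to df
print(number_cleaner.data)  # the three valid rows (index 1, 2 and 4)
```

Because `wrapper.data`, `number_cleaner.data` and `df` are now all the same object, `inplace=True` also removes the rows from `df` itself; pass `df.copy()` to `Wrapper` if the original must be preserved. An alternative that avoids in-place mutation altogether is to give the modifier the DataFrame at call time instead of storing it (for example a hypothetical `clean_numbers(data)` function called as `wrapper.modify(clean_numbers, wrapper.data)`), so there is no second reference that can go stale.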