Pandas DataFrame shallow copy doesn't react to data changes?
I have a wrapper class that holds a specific dataframe, and some modifier functions/callables that operate on it.
import re
from typing import Callable

import pandas as pd


class PhoneNumberCleaner:
    def __init__(self, data: pd.DataFrame, pattern: str):
        self.data = data  # shallow copy?
        self.pattern = pattern

    def __call__(self, *args, **kwargs) -> pd.DataFrame:
        # True for rows whose phone number does not fully match the pattern
        drop_mask = self.data['phoneNumber'].apply(
            lambda pn: not re.fullmatch(self.pattern, pn)
        )
        drop_mask_index = drop_mask[drop_mask].index
        return self.data.drop(drop_mask_index)


class Wrapper:
    def __init__(self, data: pd.DataFrame):
        self.data = data

    def modify(self, modifier: Callable, *args, **kwargs):
        self.data = modifier(*args, **kwargs)
Now, let's say I have the following data:
df_data = {
    'name': ['Mickey', 'Anna', 'Todd', 'Lee', 'Amanda', 'Jake'],
    'phoneNumber': [
        '0321111444---',
        '0335555666',
        '0330001234',
        '0330123456789',
        '0328888999',
        '0999999999999',
    ]
}
df = pd.DataFrame(df_data)
and I want to drop the rows where the phone number does not match the expected pattern:
wrapper = Wrapper(df)
number_cleaner = PhoneNumberCleaner(wrapper.data, r'\d{10}')
wrapper.modify(number_cleaner)
Printing wrapper data works fine:
print(wrapper.data)
     name phoneNumber
1    Anna  0335555666
2    Todd  0330001234
4  Amanda  0328888999
However, when I access the same data through the PhoneNumberCleaner object (which is supposed to refer to the same dataframe), I get the old data:
print(number_cleaner.data)
     name    phoneNumber
0  Mickey  0321111444---
1    Anna     0335555666
2    Todd     0330001234
3     Lee  0330123456789
4  Amanda     0328888999
5    Jake  0999999999999
I tried adding .copy(deep=False) when assigning data in the Wrapper and PhoneNumberCleaner classes, but it doesn't help. What am I missing here?
Comments (1)
This line, the call to DataFrame.drop in PhoneNumberCleaner.__call__, returns a new dataframe. The original dataframe (self.data) was not modified. Change it to:
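The replacement code did not survive in this copy of the answer. What follows is a minimal sketch of one possible fix, not necessarily the answerer's exact code. It assumes the goal is for wrapper.data and number_cleaner.data to remain the same object, so it drops the rows in place with inplace=True and returns self.data. Everything else is taken from the question.

```python
import re
from typing import Callable

import pandas as pd


class PhoneNumberCleaner:
    def __init__(self, data: pd.DataFrame, pattern: str):
        # Plain assignment: self.data is the caller's dataframe object, not a copy.
        self.data = data
        self.pattern = pattern

    def __call__(self, *args, **kwargs) -> pd.DataFrame:
        # True for rows whose phone number does not fully match the pattern
        drop_mask = self.data['phoneNumber'].apply(
            lambda pn: not re.fullmatch(self.pattern, pn)
        )
        # Assumed fix: mutate the shared dataframe instead of creating a new one.
        # drop(..., inplace=True) returns None, so return self.data explicitly.
        self.data.drop(drop_mask[drop_mask].index, inplace=True)
        return self.data


class Wrapper:
    def __init__(self, data: pd.DataFrame):
        self.data = data

    def modify(self, modifier: Callable, *args, **kwargs):
        self.data = modifier(*args, **kwargs)


df = pd.DataFrame({
    'name': ['Mickey', 'Anna', 'Todd', 'Lee', 'Amanda', 'Jake'],
    'phoneNumber': [
        '0321111444---',
        '0335555666',
        '0330001234',
        '0330123456789',
        '0328888999',
        '0999999999999',
    ],
})

wrapper = Wrapper(df)
number_cleaner = PhoneNumberCleaner(wrapper.data, r'\d{10}')
wrapper.modify(number_cleaner)

# Both attributes now point at the same, already cleaned dataframe.
print(number_cleaner.data is wrapper.data)
print(number_cleaner.data)
```

Note that with this approach the dataframe passed in (df) is modified as well, because all three names refer to one object. Adding .copy(deep=False) cannot help in either version: it creates a new DataFrame object, so the two classes would no longer hold the same object at all.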