pandas: apply a function with conditions to multiple columns and create new columns
I have a df with multiple columns like this (there are many more cols & rows):
import pandas as pd

df = pd.DataFrame([
    {'ID': 1,'date': '2022-01-01', 'fruit_code':'[100,99,300]', 'vegetable_code':'[1000,2000,3000]','supermarket':'xy'},
    {'ID': 2,'date': '2022-01-01', 'fruit_code':'[67,200,87]', 'vegetable_code':'[5000]','supermarket':'z, m'},
    {'ID': 3,'date': '2021-01-01', 'fruit_code':'[100,5,300,78]', 'vegetable_code':'[7000,2000,3000]','supermarket':'wf, z'},
    {'ID': 4,'date': '2020-01-01', 'fruit_code':'[77]', 'vegetable_code':'[1000]','supermarket':'wf'},
    {'ID': 5,'date': '2022-15-01', 'fruit_code':'[100,200,546,33]', 'vegetable_code':'[4000,2000,3000]','supermarket':'t, wf'},
    {'ID': 6,'date': '2002-12-01', 'fruit_code':'[64,2]', 'vegetable_code':'[6000,8000,1000]','supermarket':'k'},
    {'ID': 7,'date': '2018-12-01', 'fruit_code':'[5]', 'vegetable_code':'[6000,8000,1000]','supermarket':'p'}
])
My expected df should look like this in the end:
df = pd.DataFrame([
    {'ID': 1,'date': '2022-01-01', 'fruit_code':'[100,99,300]', 'vegetable_code':'[1000,2000,3000]','supermarket':'xy','new_col_1':'all'},
    {'ID': 2,'date': '2022-01-01', 'fruit_code':'[67,200,87]', 'vegetable_code':'[5000]','supermarket':'z, m','new_col_1':'[5000]'},
    {'ID': 3,'date': '2021-01-01', 'fruit_code':'[100,5,300,78]', 'vegetable_code':'[7000,2000,3000]','supermarket':'wf, z','new_col_1':'all'},
    {'ID': 4,'date': '2020-01-01', 'fruit_code':'[77]', 'vegetable_code':'[1000]','supermarket':'wf','new_col_1':'[77]'},
    {'ID': 5,'date': '2022-15-01', 'fruit_code':'[100,200,546,33]', 'vegetable_code':'[4000,2000,3000]','supermarket':'t, wf','new_col_1':'all'},
    {'ID': 6,'date': '2002-12-01', 'fruit_code':'[64,2]', 'vegetable_code':'[6000,8000,1000]','supermarket':'k', 'new_col_1':'[64]', 'new_col_2':'[2]'},
    {'ID': 7,'date': '2018-12-01', 'fruit_code':'[5]', 'vegetable_code':'[6000,8000,1000]','supermarket':'p','new_col_1':'all'}
])
Here are the conditions I want to apply to the fruit_code and vegetable_code columns to get two new columns:
UPDATE
def fruits_vegetable(row):
    if len(str(row['fruit_code'])) == 1: # fruit_code in new_col_1
        row['new_col_1'] = row['fruit_code']
    elif len(str(row['fruit_code'])) == 1 and len(str(row['vegetable_code'])) > 1: # write "all" in new_col_1
        row['new_col_1'] = 'all'
    elif len(str(row['fruit_code'])) > 2 and len(str(row['vegetable_code'])) == 1: # vegetable_code in new_col_1
        row['new_col_1'] = row['vegetable_code']
    elif len(str(row['fruit_code'])) > 3 and len(str(row['vegetable_code'])) > 1: # write "all" in new_col_1
        row['new_col_1'] = 'all'
    elif len(str(row['fruit_code'])) == 2 and len(str(row['vegetable_code'])) >= 0: # fruit 1 new_col_1 & fruit 2 new_col_2
        row['new_col_1'] = row['fruit_code'][0]
        row['new_col_2'] = row['fruit_code'][1]
    return row

df = df.apply(fruits_vegetable, axis=1)
I'm still stuck. I now get "all" in some rows of the first new column, but the second new column does not change.
If anyone has any insights, that would be great.
Thanks, much appreciated.
Comments (1)
First, convert the string representations of the lists into real lists with ast.literal_eval, then remove the str() casts so that len() counts list elements rather than characters. If you need one-element lists instead of scalars, wrap fruit[0] and fruit[1] in []. Finally, move the len(fruit) == 1 condition to the end, and change len(fruit) > 3 to len(fruit) > 2 so that the first row matches.
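The changes described above can be sketched roughly as follows. This is a reconstruction under assumptions, not necessarily the answerer's exact code: it follows the condition order described in the answer, uses a shortened copy of the question's data, and returns the two new values as a small Series instead of mutating the row, so that every row yields the same two columns. The names classify_codes and out are hypothetical.

```python
import ast

import pandas as pd

# Shortened copy of the question's data (only the columns the rules need).
df = pd.DataFrame([
    {'ID': 1, 'fruit_code': '[100,99,300]', 'vegetable_code': '[1000,2000,3000]'},
    {'ID': 2, 'fruit_code': '[67,200,87]', 'vegetable_code': '[5000]'},
    {'ID': 4, 'fruit_code': '[77]', 'vegetable_code': '[1000]'},
    {'ID': 6, 'fruit_code': '[64,2]', 'vegetable_code': '[6000,8000,1000]'},
    {'ID': 7, 'fruit_code': '[5]', 'vegetable_code': '[6000,8000,1000]'},
])


def classify_codes(row):
    # Parse the string representations into real lists so len() counts elements.
    fruit = ast.literal_eval(row['fruit_code'])
    vege = ast.literal_eval(row['vegetable_code'])

    new_col_1, new_col_2 = None, None
    if len(fruit) == 1 and len(vege) > 1:    # write "all" in new_col_1
        new_col_1 = 'all'
    elif len(fruit) > 2 and len(vege) == 1:  # vegetable_code in new_col_1
        new_col_1 = vege
    elif len(fruit) > 2 and len(vege) > 1:   # write "all" in new_col_1
        new_col_1 = 'all'
    elif len(fruit) == 2:                    # one fruit per new column
        new_col_1, new_col_2 = [fruit[0]], [fruit[1]]
    elif len(fruit) == 1:                    # fruit_code in new_col_1 (checked last)
        new_col_1 = fruit
    return pd.Series([new_col_1, new_col_2], index=['new_col_1', 'new_col_2'])


# Attach the two new columns to the original frame by index.
out = df.join(df.apply(classify_codes, axis=1))
print(out[['ID', 'new_col_1', 'new_col_2']])
```

Note that the new columns hold real lists (for example [5000]) rather than strings such as '[5000]'; wrapping the values in str() would give the string form shown in the expected output, if that is what is needed.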