Create a new column in a Pandas df where each row's value depends on the value of a different column in the row above it
Assume the following Pandas df:
# Import dependency.
import pandas as pd
# Create data for df.
data = {'Value': [1000, 1020, 1011, 1010, 1030, 950, 1001, 1100, 1121, 1131],
'Dummy_Variable': [0,0,1,0,0,0,1,0,1,1]
}
# Create DataFrame
df = pd.DataFrame(data)
display(df)
I want to add a new column to the df called 'Placeholder'. The value of 'Placeholder' would be derived from the 'Dummy_Variable' column according to the following rules:
- If all previous rows had a 'Dummy_Variable' value of 0, then the 'Placeholder' value for that row would be equal to the 'Value' for that row.
- If the 'Dummy_Variable' value for a row equals 1, then the 'Placeholder' value for that row would be equal to the 'Value' for that row.
- If the 'Dummy_Variable' value for a row equals 0 but the 'Placeholder' value for the row immediately above it is >0, then the 'Placeholder' value for the row would be equal to the 'Placeholder' value for the row immediately above it.
The desired result is a df with a new 'Placeholder' column that looks like the df generated by running the code below:
desired_data = {'Value': [1000, 1020, 1011, 1010, 1030, 950, 1001, 1100, 1121, 1131],
'Dummy_Variable': [0,0,1,0,0,0,1,0,1,1],
'Placeholder': [1000,1020,1011,1011,1011,1011,1001,1001,1121,1131]}
df1 = pd.DataFrame(desired_data)
display(df1)
I can do this easily in Excel, but I cannot figure out how to do it in Pandas without using a loop. Any help is greatly appreciated. Thanks!
You can use np.where for this:
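The code that went with this answer is not shown here, so what follows is only a minimal sketch of one way np.where could be applied, not necessarily the answerer's version. It assumes the df from the question; the names seen_flag and own_value are made up for illustration. The idea is to keep 'Value' on rows that set their own value (flagged rows, plus every row before the first flag), leave NaN elsewhere, and then forward-fill.

```python
import numpy as np
import pandas as pd

# Data from the question.
data = {'Value': [1000, 1020, 1011, 1010, 1030, 950, 1001, 1100, 1121, 1131],
        'Dummy_Variable': [0, 0, 1, 0, 0, 0, 1, 0, 1, 1]}
df = pd.DataFrame(data)

# True from the first row where Dummy_Variable is 1 onwards (hypothetical name).
seen_flag = df['Dummy_Variable'].cummax().eq(1)

# Rows that take their own 'Value': flagged rows, or rows before any flag
# has appeared (hypothetical name).
own_value = df['Dummy_Variable'].eq(1) | ~seen_flag

# Keep 'Value' on those rows, NaN elsewhere, then carry the last kept value
# down. NaN forces a float column, so cast back to the dtype of 'Value'.
kept = pd.Series(np.where(own_value, df['Value'], np.nan), index=df.index)
df['Placeholder'] = kept.ffill().astype(df['Value'].dtype)

print(df)
```

This avoids an explicit Python loop. The forward-fill is safe here because the first row always takes its own 'Value', so there is never a leading NaN left to fill.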