Create a new column in a Pandas df where each row's value depends on the value of a different column in the row above it

Posted on 2025-02-14 01:07:55

Assume the following Pandas df:

# Import dependency.
import pandas as pd

# Create data for df.
data = {'Value': [1000, 1020, 1011, 1010, 1030, 950, 1001, 1100, 1121, 1131],
        'Dummy_Variable': [0,0,1,0,0,0,1,0,1,1]
       }

# Create DataFrame
df = pd.DataFrame(data)
display(df)

I want to add a new column to the df called 'Placeholder'. The value of 'Placeholder' would be derived from the 'Dummy_Variable' column according to the following rules:

If all previous rows had a 'Dummy_Variable' value of 0, then the 'Placeholder' value for that row would be equal to the 'Value' for that row.
If the 'Dummy_Variable' value for a row equals 1, then the 'Placeholder' value for that row would be equal to the 'Value' for that row.
If the 'Dummy_Variable' value for a row equals 0 but the 'Placeholder' value for the row immediately above it is >0, then the 'Placeholder' value for the row would be equal to the 'Placeholder' value for the row immediately above it.

The desired result is a df with new 'Placeholder' column that looks like the df generated by running the code below:

desired_data = {'Value': [1000, 1020, 1011, 1010, 1030, 950, 1001, 1100, 1121, 1131],
        'Dummy_Variable': [0,0,1,0,0,0,1,0,1,1],
        'Placeholder': [1000,1020,1011,1011,1011,1011,1001,1001,1121,1131]}

df1 = pd.DataFrame(desired_data)
display(df1)

I can do this easily in Excel, but I cannot figure out how to do it in Pandas without using a loop. Any help is greatly appreciated. Thanks!
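
For reference, the three rules above can be written as a plain loop. This is only a sketch of the behaviour the question wants to reproduce without looping, and it is handy as a baseline for checking a vectorized answer; the names `placeholder` and `seen_one` are illustrative and not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({
    "Value": [1000, 1020, 1011, 1010, 1030, 950, 1001, 1100, 1121, 1131],
    "Dummy_Variable": [0, 0, 1, 0, 0, 0, 1, 0, 1, 1],
})

placeholder = []   # hypothetical helper list, one entry per row
seen_one = False   # becomes True once any row so far has Dummy_Variable == 1

for value, dummy in zip(df["Value"], df["Dummy_Variable"]):
    if dummy == 1:
        seen_one = True
    if dummy == 1 or not seen_one:
        # Rules 1 and 2: take the row's own Value.
        placeholder.append(value)
    else:
        # Rule 3: carry the Placeholder from the row immediately above.
        placeholder.append(placeholder[-1])

df["Placeholder"] = placeholder
```

On the sample data this reproduces the desired 'Placeholder' column listed above.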

霞映澄塘 2025-02-21 01:07:55

You can use np.where for this:

import pandas as pd
import numpy as np

data = {'Value': [1000, 1020, 1011, 1010, 1030, 950, 1001, 1100, 1121, 1131],
        'Dummy_Variable': [0,0,1,0,0,0,1,0,1,1]
       }

df = pd.DataFrame(data)

df['Placeholder'] = np.where((df.Dummy_Variable.cumsum() == 0) | (df.Dummy_Variable == 1), df.Value, np.nan)

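# Now forward fill the remaining NaNs. Assigning the result back avoids the
# chained in-place call, and fillna(method='ffill') is deprecated in recent pandas.
df['Placeholder'] = df['Placeholder'].ffill()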

df

   Value  Dummy_Variable  Placeholder
0   1000               0       1000.0
1   1020               0       1020.0
2   1011               1       1011.0
3   1010               0       1011.0
4   1030               0       1011.0
5    950               0       1011.0
6   1001               1       1001.0
7   1100               0       1001.0
8   1121               1       1121.0
9   1131               1       1131.0


# check output:
desired_data = {'Value': [1000, 1020, 1011, 1010, 1030, 950, 1001, 1100, 1121, 1131],
        'Dummy_Variable': [0,0,1,0,0,0,1,0,1,1],
        'Placeholder': [1000,1020,1011,1011,1011,1011,1001,1001,1121,1131]}

df1 = pd.DataFrame(desired_data)

check = df['Placeholder'] == df1['Placeholder']
check.sum()==len(df1)
# True
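
Because np.where mixes the integer values with np.nan, the resulting column is float (1000.0 and so on). As a hedged variant of the same idea, the mask can be applied with Series.where and the result cast back to the dtype of 'Value' after forward filling. The name `keep` is illustrative, and the cast assumes the first row always receives a value, which holds here because the first row either has 'Dummy_Variable' equal to 1 or a cumulative sum of 0.

```python
import pandas as pd

df = pd.DataFrame({
    "Value": [1000, 1020, 1011, 1010, 1030, 950, 1001, 1100, 1121, 1131],
    "Dummy_Variable": [0, 0, 1, 0, 0, 0, 1, 0, 1, 1],
})

# Rows that take their own Value: no 1 has appeared yet (cumsum is 0),
# or the row itself has Dummy_Variable == 1.
keep = df["Dummy_Variable"].cumsum().eq(0) | df["Dummy_Variable"].eq(1)

# Blank out the other rows, carry the last kept value forward,
# then restore the original integer dtype.
df["Placeholder"] = df["Value"].where(keep).ffill().astype(df["Value"].dtype)

print(df)
```

The cumulative sum stays at 0 only until the first 1 appears, which is what encodes the "all previous rows were 0" rule without a loop.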
