Count until a condition is reached in Pandas

Posted on 2025-01-19 05:23:28


I need some input from you. The idea is that I would like to see how long (in rows) it takes before

  1. a new value appears in column SUB_B1, and
  2. a new value appears in column SUB_B2,

i.e., how many steps there are between

  1. SUB_A1 and SUB_B1, and
  2. SUB_A2 and SUB_B2.

I have structured the data like this (I sort the rows in descending order by the result column; after that I split the index levels A and B and place the parts in new columns):

df.sort_values(['A','result'], ascending=[True,False]).set_index(['A','B'])
                 result  SUB_A1  SUB_A2  SUB_B1  SUB_B2
A      B
10_125 10_173  0.903257      10     125      10     173
       10_332  0.847333      10     125      10     332
       10_243  0.842802      10     125      10     243
       10_522  0.836335      10     125      10     522
       58_941  0.810760      10     125      58     941
...               ...     ...     ...     ...     ...
10_173 10_125  0.903257      10     173      10     125
       58_941  0.847333      10     173      58     941
       1_941   0.842802      10     173       1     941
       96_512  0.836335      10     173      96     512
       10_513  0.810760      10     173      10     513
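
For readers who want to reproduce this layout, here is a minimal sketch of the sort-and-split step described above. The miniature `raw` frame is hypothetical, and splitting the IDs at the underscore with `str.split` is an assumption about how the SUB_ columns were derived; the question does not show that step.

```python
import pandas as pd

# Hypothetical miniature of the question's data: IDs of the form "<prefix>_<suffix>".
raw = pd.DataFrame({
    "A": ["10_125", "10_125", "10_125", "10_173", "10_173"],
    "B": ["10_332", "10_173", "58_941", "58_941", "10_125"],
    "result": [0.847333, 0.903257, 0.810760, 0.847333, 0.903257],
})

# Sort by A, then by result in descending order within each A.
df = raw.sort_values(["A", "result"], ascending=[True, False]).reset_index(drop=True)

# Assumed split rule: cut each ID at the underscore into two integer columns.
df[["SUB_A1", "SUB_A2"]] = df["A"].str.split("_", expand=True).astype(int)
df[["SUB_B1", "SUB_B2"]] = df["B"].str.split("_", expand=True).astype(int)

df = df.set_index(["A", "B"])
print(df)
```

Keeping the parts as integers means later comparisons such as SUB_A1 == SUB_B1 compare numbers rather than strings.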

This is what I have done so far (edit: I think I need to iterate over values[]; however, I haven't managed to loop beyond the first rows yet...):


def func(group):
    if group.SUB_A1.values[0] == group.SUB_B1.values[0]:
        group.R1.values[0] = 1
    else:
        group.R1.values[0] = 0
    if group.SUB_A1.values[0] == group.SUB_B1.values[1] and group.R1.values[0] == 1:
        group.R1.values[1] = 2
    else:
        group.R1.values[1] = 0

df['R1'] = 0
df.groupby('A').apply(func)

Expected outcome:

                 result  SUB_B1  SUB_B2  R1  R2
A      B
10_125 10_173  0.903257      10     173   1   0
       10_332  0.847333      10     332   2   0
       10_243  0.842802      10     243   3   0
       10_522  0.836335      10     522   4   0
       58_941  0.810760      58     941   0   0
...               ...     ...     ...  ...  ...


Comments (3)

掌心的温暖 2025-01-26 05:23:28

Are you looking for something like this?

Sample dataframe:

import pandas as pd

df = pd.DataFrame(
    {"SUB_A": [1, -1, -2, 3, 3, 4, 3, 6, 6, 6],
     "SUB_B": [1, 2, 3, 3, 3, 3, 4, 6, 6, 6]},
    index=pd.MultiIndex.from_product([range(1, 3), range(1, 6)], names=("A", "B"))
)
     SUB_A  SUB_B
A B              
1 1      1      1
  2     -1      2
  3     -2      3
  4      3      3
  5      3      3
2 1      4      3
  2      3      4
  3      6      6
  4      6      6
  5      6      6

Now this

equal = df.SUB_A == df.SUB_B
df["R"] = equal.groupby(equal.groupby("A").diff().fillna(True).cumsum()).cumsum()

leads to

     SUB_A  SUB_B  R
A B                 
1 1      1      1  1
  2     -1      2  0
  3     -2      3  0
  4      3      3  1
  5      3      3  2
2 1      4      3  0
  2      3      4  0
  3      6      6  1
  4      6      6  2
  5      6      6  3
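
As a variant of the one-liner above, the following sketch builds the run identifiers from `shift` and `ne` on an integer flag, so it does not depend on how `diff` treats boolean values. The helper name `consecutive_matches` and the `keys` series are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {"SUB_A": [1, -1, -2, 3, 3, 4, 3, 6, 6, 6],
     "SUB_B": [1, 2, 3, 3, 3, 3, 4, 6, 6, 6]},
    index=pd.MultiIndex.from_product([range(1, 3), range(1, 6)], names=("A", "B")),
)


def consecutive_matches(equal, keys):
    """Running count of consecutive True values, restarting for each key."""
    flag = equal.astype(int)
    # A new block starts when the flag flips or when the group key changes.
    changed = flag.ne(flag.shift()) | keys.ne(keys.shift())
    block = changed.cumsum()
    # Inside a block of matches the cumulative sum counts 1, 2, 3, ...;
    # inside a block of mismatches it stays at 0.
    return flag.groupby(block).cumsum()


equal = df["SUB_A"].eq(df["SUB_B"])
# Group key (index level "A") as a Series aligned with the frame.
keys = pd.Series(df.index.get_level_values("A").to_numpy(), index=df.index)
df["R"] = consecutive_matches(equal, keys)
print(df)
```

On the sample frame this reproduces the R column shown above.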
栖迟 2025-01-26 05:23:28

Try using pandas.DataFrame.iterrows and pandas.DataFrame.shift.

You can iterate through the dataframe and compare the current row with the previous one, then add a condition:

# Previous row's value, so each row can be compared with its predecessor
df['SUB_A2_last'] = df['SUB_A2'].shift()
count = 0
# Fill column with zeros
df['count_series'] = 0
for index, row in df.iterrows():
    # Column names are case-sensitive and must match the frame exactly
    subA = row['SUB_A2']
    subA_last = row['SUB_A2_last']
    if subA == subA_last:
        count += 1
    else:
        count = 0
    df.loc[index, 'count_series'] = count

Then repeat for the B column. A better approach is probably possible using pandas.DataFrame.apply and a custom function.
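
The same count can also be computed without an explicit loop. This is a sketch under the assumption that the goal is "how many rows in a row has SUB_A2 repeated the previous value"; the small frame is made up for illustration.

```python
import pandas as pd

# Hypothetical data: only the column used by the loop above.
df = pd.DataFrame({"SUB_A2": [125, 125, 125, 173, 173]})

# True where a row repeats the previous row's value.
same_as_prev = df["SUB_A2"].eq(df["SUB_A2"].shift())

# Each row that does not repeat its predecessor opens a new block.
block = (~same_as_prev).cumsum()

# Position inside the block: 0 for the first row, then 1, 2, ...
df["count_series"] = df.groupby(block).cumcount()
print(df)
```

For large frames this usually runs much faster than `iterrows`, because the work is done column-wise.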

水中月 2025-01-26 05:23:28

Phew! Great! Thanks for the input, guys.


def func(group):
    # Work on a copy so the result does not rely on in-place mutation
    group = group.copy()
    first = group.SUB_A1.values[0]
    r1 = [0] * len(group)
    for each in range(len(group)):
        # Stop counting at the first row where SUB_B1 differs from SUB_A1
        if group.SUB_B1.values[each] != first:
            break
        r1[each] = each + 1
    group['R1'] = r1
    # Return after the loop so that every group yields a frame
    return group

df['R1'] = 0
df = df.groupby('A', group_keys=False).apply(func)
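
For comparison, here is a loop-free sketch of the same idea: count rows from the start of each A group for as long as SUB_B matches SUB_A, then stay at 0. The helper name `leading_match_count` is hypothetical, and the frame is rebuilt from the rows shown in the question.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("10_125", b) for b in ["10_173", "10_332", "10_243", "10_522", "58_941"]]
    + [("10_173", b) for b in ["10_125", "58_941", "1_941", "96_512", "10_513"]],
    names=("A", "B"),
)
df = pd.DataFrame(
    {"SUB_A1": [10] * 10,
     "SUB_A2": [125] * 5 + [173] * 5,
     "SUB_B1": [10, 10, 10, 10, 58, 10, 58, 1, 96, 10],
     "SUB_B2": [173, 332, 243, 522, 941, 125, 941, 941, 512, 513]},
    index=index,
)


def leading_match_count(frame, a_col, b_col, level="A"):
    """Count rows from the start of each group until the first mismatch."""
    match = frame[a_col].eq(frame[b_col]).astype(int)
    grouped = match.groupby(level=level)
    # cumprod stays 1 while every row so far matched, then drops to 0 for good.
    still_matching = grouped.cumprod()
    # 1-based position of each row inside its group.
    position = grouped.cumcount() + 1
    return still_matching * position


df["R1"] = leading_match_count(df, "SUB_A1", "SUB_B1")
df["R2"] = leading_match_count(df, "SUB_A2", "SUB_B2")
print(df[["SUB_B1", "SUB_B2", "R1", "R2"]])
```

Unlike a run-length count that restarts after every gap, this version never resumes counting once a group has seen its first mismatch.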
