
Count until a condition is reached in Pandas

Posted on 2025-01-19 05:23:28

I need some input. I would like to see how long (in rows) it takes before

a new value appears in column SUB_B1, and
a new value appears in column SUB_B2,

i.e. how many steps there are

between SUB_A1 and SUB_B1, and
between SUB_A2 and SUB_B2.

I have structured the data like this: within each A, I sort by the result column in descending order. After that I split the index levels A and B and place the parts in new columns.

df.sort_values(['A','result'], ascending=[True,False]).set_index(['A','B'])

		result	SUB_A1	SUB_A2	SUB_B1	SUB_B2
A	B
10_125	10_173	0.903257	10	125	10	173
	10_332	0.847333	10	125	10	332
	10_243	0.842802	10	125	10	243
	10_522	0.836335	10	125	10	522
	58_941	0.810760	10	125	58	941
	...	...	...	...	...	...
10_173	10_125	0.903257	10	173	10	125
	58_941	0.847333	10	173	58	941
	1_941	0.842802	10	173	1	941
	96_512	0.836335	10	173	96	512
	10_513	0.810760	10	173	10	513

This is what I have done so far (edit: I think I need to iterate over values[], but I haven't managed to loop beyond the first rows yet):


def func(group):
    if group.SUB_A1.values[0] == group.SUB_B1.values[0]:
        group.R1.values[0] = 1
    else:
        group.R1.values[0] = 0
    if group.SUB_A1.values[0] == group.SUB_B1.values[1] and group.R1.values[0] == 1:
        group.R1.values[1] = 2
    else:
        group.R1.values[1] = 0

df['R1'] = 0
df.groupby('A').apply(func)

Expected outcome:

		result	SUB_B1	SUB_B2	R1	R2
A	B
10_125	10_173	0.903257	10	173	1	0
	10_332	0.847333	10	332	2	0
	10_243	0.842802	10	243	3	0
	10_522	0.836335	10	522	4	0
	58_941	0.810760	58	941	0	0
	...	...	...	...	...	...


掌心的温暖 2025-01-26 05:23:28

Are you looking for something like this?

Sample dataframe:

import pandas as pd

df = pd.DataFrame(
    {"SUB_A": [1, -1, -2, 3, 3, 4, 3, 6, 6, 6],
     "SUB_B": [1, 2, 3, 3, 3, 3, 4, 6, 6, 6]},
    index=pd.MultiIndex.from_product([range(1, 3), range(1, 6)], names=("A", "B"))
)

     SUB_A  SUB_B
A B              
1 1      1      1
  2     -1      2
  3     -2      3
  4      3      3
  5      3      3
2 1      4      3
  2      3      4
  3      6      6
  4      6      6
  5      6      6

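Now flag the rows where SUB_A equals SUB_B, then take a running count within each run of identical flags, restarting at every new A group: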

equal = df.SUB_A == df.SUB_B
df["R"] = equal.groupby(equal.groupby("A").diff().fillna(True).cumsum()).cumsum()

This leads to:

     SUB_A  SUB_B  R
A B                 
1 1      1      1  1
  2     -1      2  0
  3     -2      3  0
  4      3      3  1
  5      3      3  2
2 1      4      3  0
  2      3      4  0
  3      6      6  1
  4      6      6  2
  5      6      6  3
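The one-liner above packs several steps into a single expression. As a rough sketch of the same idea spelled out step by step (the names flag, first_in_group, new_run and run_id are mine, not part of the answer), assuming the same sample dataframe:

```python
import pandas as pd

df = pd.DataFrame(
    {"SUB_A": [1, -1, -2, 3, 3, 4, 3, 6, 6, 6],
     "SUB_B": [1, 2, 3, 3, 3, 3, 4, 6, 6, 6]},
    index=pd.MultiIndex.from_product([range(1, 3), range(1, 6)], names=("A", "B")),
)

# 1 where the two columns agree, 0 otherwise.
flag = df["SUB_A"].eq(df["SUB_B"]).astype(int)

# True on the first row of every "A" group.
first_in_group = flag.groupby(level="A").cumcount().eq(0)

# A new run starts where the flag differs from the previous row
# or where a new "A" group begins.
new_run = flag.ne(flag.shift()) | first_in_group
run_id = new_run.cumsum()

# Running total of the flag inside each run: 1, 2, 3, ... for matches, 0 otherwise.
df["R"] = flag.groupby(run_id).cumsum()
print(df)
```

Working with integer flags instead of booleans sidesteps any question of how diff() treats boolean data, at the cost of a few extra lines.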


栖迟 2025-01-26 05:23:28

Try using pandas.DataFrame.iterrows and pandas.DataFrame.shift.

You can iterate through the dataframe and compare current row with the previous one, then add some condition:

df['SUB_A2_last'] = df['SUB_A2'].shift()
count = 0
# Fill the column with zeros
df['count_series'] = 0
for index, row in df.iterrows():
    # Column names are case-sensitive: 'SUB_A2', not 'sub_A2'
    subA = row['SUB_A2']
    subA_last = row['SUB_A2_last']
    if subA == subA_last:
        count += 1
    else:
        count = 0
    df.loc[index, 'count_series'] = count

Then repeat for the B column. It may be possible to get a better approach using pandas.DataFrame.apply and a custom function.
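For larger frames the row-by-row loop can get slow. A possible vectorised sketch of the same count, using shift, cumsum and groupby(...).cumcount() instead of apply (the small frame below is made-up illustration data, and run_id is a name I introduced):

```python
import pandas as pd

# Hypothetical sample; only the column name comes from the question.
df = pd.DataFrame({"SUB_A2": [125, 125, 125, 173, 173, 125]})

# A new run starts wherever the value differs from the previous row.
run_id = df["SUB_A2"].ne(df["SUB_A2"].shift()).cumsum()

# cumcount() numbers the rows of each run 0, 1, 2, ..., which matches the
# loop: 0 whenever the value changes, then +1 for every repeated value.
df["count_series"] = df.groupby(run_id).cumcount()
print(df)
```

Like the loop, this sketch does not restart the count at group boundaries; if that is needed, the group key would have to be part of the run detection as well.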


水中月 2025-01-26 05:23:28

Phew! Great! Thanks for the input, guys.


def func(group):
    group = group.copy()  # work on a copy instead of writing into .values
    r1 = []
    for each in range(len(group)):
        # Rows whose SUB_B1 equals the group's first SUB_A1 get their
        # 1-based position in the group; all other rows get 0.
        if group.SUB_A1.values[0] == group.SUB_B1.values[each]:
            r1.append(each + 1)
        else:
            r1.append(0)
    group['R1'] = r1
    return group  # return after the loop, not inside it

df = df.groupby('A', group_keys=False).apply(func)
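If the count is meant to stop at the first mismatch in each group (my reading of "count until", which is an assumption), a loop-free sketch could look like this. The sample rows are the first group from the question's table; leading_match_count is a hypothetical helper name:

```python
import pandas as pd

# First group (A == "10_125") from the question's table.
df = pd.DataFrame(
    {
        "A": ["10_125"] * 5,
        "B": ["10_173", "10_332", "10_243", "10_522", "58_941"],
        "SUB_A1": [10, 10, 10, 10, 10],
        "SUB_A2": [125, 125, 125, 125, 125],
        "SUB_B1": [10, 10, 10, 10, 58],
        "SUB_B2": [173, 332, 243, 522, 941],
    }
).set_index(["A", "B"])


def leading_match_count(match):
    """Number matching rows 1, 2, 3, ... per "A" group until the first mismatch."""
    # cummin() drops to 0 at the first mismatch and stays there for the group.
    still_matching = match.astype(int).groupby(level="A").cummin()
    # 1-based row position inside each group.
    position = match.groupby(level="A").cumcount() + 1
    return position * still_matching


df["R1"] = leading_match_count(df["SUB_A1"].eq(df["SUB_B1"]))
df["R2"] = leading_match_count(df["SUB_A2"].eq(df["SUB_B2"]))
print(df[["SUB_B1", "SUB_B2", "R1", "R2"]])
```

On these five rows this gives the R1 and R2 values listed in the expected outcome. Unlike the loop version, a match that appears after a mismatch stays at 0.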
