Using pandas to get the longest run of a specific value in each row

Posted on 2025-01-14 19:53:59

I have the following DataFrame:

   1   2   3   4   5   6   7  8  9
0  0   0   1   0   0   0   0  0  1
1  0   0   0   0   1   1   0  1  0
2  1   1   0   1   1   0   0  1  1
...

For each row, I want to get the length of the longest sequence of consecutive 0 values.
The expected result for this DataFrame is an array that looks like this:

[5,4,2,...]

In the first row, for example, the longest sequence of 0 values has length 5, and so on.

I have seen this post and tried, as a start, to get this for the first row only (though I would like to do it for the whole DataFrame at once), but I got an error:

s=df_day.iloc[0]
(~s).cumsum()[s].value_counts().max()

TypeError: ufunc 'invert' not supported for the input types, and the
inputs could not be safely coerced to any supported types according to
the casting rule ''safe''

When I entered the values manually, like this:

s=pd.Series([0,0,1,0,0,0,0,0,1])
(~s).cumsum()[s].value_counts().max()

>>>7

I got 7, which is the total number of 0s in the row, not the length of the longest sequence.
I don't understand why the first attempt raises an error and, more importantly, I would like to run this on the whole DataFrame, row by row.

My end goal: get the maximum number of uninterrupted occurrences of the value 0 in each row.
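
The two symptoms described above can be reproduced in a small sketch. It assumes the rows of df_day are stored as floats, which the question does not show; that is one plausible cause of the TypeError, not a confirmed one. The names s_int, s_float and is_zero are made up for illustration.

```python
import pandas as pd

s_int = pd.Series([0, 0, 1, 0, 0, 0, 0, 0, 1])

# On integers, ~ is a bitwise NOT (~0 == -1, ~1 == -2), not a logical negation,
# so (~s) is not a boolean mask and the cumsum idiom counts the wrong thing.
bitwise = ~s_int

# Assumption: a float-valued row reproduces the TypeError from the question.
s_float = s_int.astype(float)
try:
    ~s_float
    raised = False
except TypeError:
    raised = True

# An explicit comparison gives a real boolean mask of the zeros.
is_zero = s_int.eq(0)
# Every non-zero value starts a new group; summing the mask per group
# gives the length of each run of zeros.
longest = is_zero.groupby((~is_zero).cumsum()).sum().max()
print(longest)  # 5
```

Building the mask with eq(0) works the same way for integer and float data, so it avoids relying on ~ applied to the raw values.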

Comments (3)

吻风 2025-01-21 19:53:59

A vectorized solution that counts consecutive 0s in each row; the row-wise maximum of the DataFrame c then gives the longest run:

# more explanation: https://stackoverflow.com/a/52718619/2901002
m = df.eq(0)
b = m.cumsum(axis=1)
c = b.sub(b.mask(m).ffill(axis=1).fillna(0)).astype(int)
print (c)
   1  2  3  4  5  6  7  8  9
0  1  2  0  1  2  3  4  5  0
1  1  2  3  4  0  0  1  0  1
2  0  0  1  0  0  1  2  0  0

df['max_consecutive_0'] = c.max(axis=1)
print (df)
   1  2  3  4  5  6  7  8  9  max_consecutive_0
0  0  0  1  0  0  0  0  0  1                  5
1  0  0  0  0  1  1  0  1  0                  4
2  1  1  0  1  1  0  0  1  1                  2
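
The same steps can be packaged as a self-contained sketch that rebuilds the sample data from the question. The helper name longest_zero_run_per_row is hypothetical, and the extra fourth row (no zeros at all) is an added edge case, not part of the original data.

```python
import pandas as pd

def longest_zero_run_per_row(df: pd.DataFrame) -> pd.Series:
    """Length of the longest run of consecutive 0s in each row."""
    m = df.eq(0)                      # True where the value is 0
    b = m.cumsum(axis=1)              # running count of zeros along each row
    # At each non-zero cell, remember the count so far; carry it forward
    # so that subtracting it restarts the counter after every non-zero.
    offset = b.mask(m).ffill(axis=1).fillna(0)
    c = b.sub(offset).astype(int)     # current run length at each position
    return c.max(axis=1)

df = pd.DataFrame(
    [
        [0, 0, 1, 0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 1, 1, 0, 1, 0],
        [1, 1, 0, 1, 1, 0, 0, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1],  # added edge case: no zeros
    ],
    columns=range(1, 10),
)

result = longest_zero_run_per_row(df)
print(result.tolist())  # [5, 4, 2, 0]
```

Returning a Series instead of writing a new column leaves the input untouched, so the function can be called repeatedly without the result column being counted as data.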
心如狂蝶 2025-01-21 19:53:59

Use:

result = df.apply(lambda x: (x != x.shift()).astype(int).cumsum().where(x.eq(0)).dropna().value_counts().max(), axis=1)

Output:

0    5
1    4
2    2
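
To see what the chained expression does, it can be unrolled on the first row of the sample data. This is only an illustrative breakdown; the intermediate names starts, run_id and zero_runs are not part of the answer.

```python
import pandas as pd

x = pd.Series([0, 0, 1, 0, 0, 0, 0, 0, 1])

# True wherever the value differs from the previous one, i.e. a new run starts
starts = x != x.shift()
# Cumulative sum of the starts gives every run its own integer label
run_id = starts.astype(int).cumsum()
print(run_id.tolist())  # [1, 1, 2, 3, 3, 3, 3, 3, 4]

# Keep only the labels that belong to runs of zeros
zero_runs = run_id.where(x.eq(0)).dropna()
# The most frequent label is the longest run of zeros
longest = zero_runs.value_counts().max()
print(longest)  # 5
```

Note that for a row without any zeros, value_counts() is empty and max() returns NaN rather than 0.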
浊酒尽余欢 2025-01-21 19:53:59

The following code should do the job.

The function longest_streak measures every run of consecutive zeros in a row and returns the length of the longest one; you can use apply to run it on each row of your df.

from itertools import groupby

def longest_streak(row):
    # Length of every run of consecutive zeros in the row
    zero_runs = [sum(1 for _ in group) for value, group in groupby(row) if value == 0]
    # default=0 avoids a ValueError for rows that contain no zeros
    return max(zero_runs, default=0)

df.apply(longest_streak, axis=1)
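
As a quick check, here is a minimal runnable sketch of this groupby approach on the sample data from the question. The target parameter and the fourth row (no zeros) are additions for illustration and do not come from the answer.

```python
from itertools import groupby

import pandas as pd

def longest_streak(row, target=0):
    """Length of the longest run of `target` in `row` (0 if it never occurs)."""
    # groupby yields one group per run of equal consecutive values
    runs = [sum(1 for _ in group) for value, group in groupby(row) if value == target]
    return max(runs, default=0)

df = pd.DataFrame(
    [
        [0, 0, 1, 0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 1, 1, 0, 1, 0],
        [1, 1, 0, 1, 1, 0, 0, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1],  # added edge case: no zeros
    ],
    columns=range(1, 10),
)

result = df.apply(longest_streak, axis=1)
print(result.tolist())  # [5, 4, 2, 0]
```

This version loops in Python for every row, so on large frames it will likely be slower than a vectorized solution, but it is easy to read and to adapt to other values.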