How can I count consecutive same-sign columns in each row?

Posted 2025-01-23 21:32:23 · word count 459 · 0 views · 0 comments

How can I count the number of consecutive positive or negative elements in each row?

Suppose I have the following data:

ski    2020  2021  2022  2023  2024  2025
book    1.2   5.6   8.4    -2    -5     6
jar     4.2    -5    -8     2     4     6
kook     -4  -5.2  -2.3  -5.6    -7     8

The output is a list for each row giving the length of each run of same-sign values, with negative runs written as negative numbers. For example, the first row has 3 positive elements, then 2 negative ones, then 1 positive one, so the output is [3, -2, 1]. For the other two rows the output is:

 jar   [1,-2,3]
 kook   [-5,1]
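To pin down the expected output, here is a minimal plain-Python sketch (no pandas) that computes it for one row at a time. The helper name signed_run_lengths is hypothetical, and it assumes a zero counts as positive, which the question does not specify.

```python
from itertools import groupby


def signed_run_lengths(values):
    """Length of each run of same-sign values, negated for negative runs.

    Assumption: zero is treated as positive (the question does not say).
    """
    result = []
    # groupby starts a new group each time the key (is the value negative?) changes
    for is_negative, run in groupby(values, key=lambda v: v < 0):
        length = sum(1 for _ in run)
        result.append(-length if is_negative else length)
    return result


rows = {
    "book": [1.2, 5.6, 8.4, -2, -5, 6],
    "jar": [4.2, -5, -8, 2, 4, 6],
    "kook": [-4, -5.2, -2.3, -5.6, -7, 8],
}
for name, values in rows.items():
    print(name, signed_run_lengths(values))
```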

多彩岁月 2025-01-30 21:32:23

Use DataFrame.clip with a custom lambda function to count consecutive values:

# Only needed if ski is still an ordinary column
df = df.set_index('ski')
print(df)
      2020  2021  2022  2023  2024  2025
ski                                     
book   1.2   5.6   8.4  -2.0    -5     6
jar    4.2  -5.0  -8.0   2.0     4     6
kook  -4.0  -5.2  -2.3  -5.6    -7     8


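from itertools import groupby

# clip turns every value here into exactly 1 or -1 (all |x| >= 1 in this data);
# values strictly between -1 and 1 would be left unchanged.
# groupby then sees runs of equal keys, and summing the key over a run gives its signed length.
f = lambda x: [int(sum(key for _ in group)) for key, group in groupby(x)]
s = df.clip(upper=1, lower=-1).apply(f, axis=1)
print(s)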
ski
book    [3, -2, 1]
jar     [1, -2, 3]
kook       [-5, 1]
dtype: object
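DataFrame.clip only caps the magnitude: a value such as 0.5 stays 0.5 instead of becoming 1, so the approach above relies on every value in this data having |x| >= 1. A sketch of the same groupby idea using np.sign instead, which maps every cell to -1, 0 or 1 regardless of magnitude. The frame is rebuilt from the question's data and count_runs is a hypothetical name; note that a run of zeros would come out as a 0 entry.

```python
from itertools import groupby

import numpy as np
import pandas as pd

# The question's data, with ski as the index
df = pd.DataFrame(
    {
        "2020": [1.2, 4.2, -4.0],
        "2021": [5.6, -5.0, -5.2],
        "2022": [8.4, -8.0, -2.3],
        "2023": [-2.0, 2.0, -5.6],
        "2024": [-5, 4, -7],
        "2025": [6, 6, 8],
    },
    index=pd.Index(["book", "jar", "kook"], name="ski"),
)


def count_runs(row):
    # groupby yields one (key, run) pair per block of equal consecutive keys;
    # key is -1.0, 0.0 or 1.0, so key * run length is the signed count
    return [int(key * sum(1 for _ in run)) for key, run in groupby(row)]


# np.sign maps every cell to -1, 0 or 1, whatever its magnitude
s = np.sign(df).apply(count_runs, axis=1)
print(s)
```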
糖粟与秋泊 2025-01-30 21:32:23

Let us try:

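import numpy as np
import pandas as pd
s = np.sign(df.set_index('ski').stack())
# New run id whenever the sign changes; grouping by ski as well keeps runs inside one row
s.groupby([pd.Grouper(level=0), s.diff().ne(0).cumsum()]).sum().groupby(level=0).agg(list)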

ski
book    [3.0, -2.0, 1.0]
jar     [1.0, -2.0, 3.0]
kook         [-5.0, 1.0]
dtype: object
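The core of this answer is the run-id trick s.diff().ne(0).cumsum(). On a single row of signs it works like this: diff is nonzero (or NaN for the first cell) exactly where a new run starts, so the cumulative sum of those flags numbers the runs, and summing the signs per run gives the signed lengths. A small sketch on the first row of the question:

```python
import numpy as np
import pandas as pd

# Signs of the "book" row: [1, 1, 1, -1, -1, 1]
signs = pd.Series(np.sign([1.2, 5.6, 8.4, -2, -5, 6]))

# True where a new run starts (the first diff is NaN, and NaN != 0 is True)
run_id = signs.diff().ne(0).cumsum()
print(run_id.tolist())  # [1, 1, 1, 2, 2, 3]

# Summing +1/-1 within each run gives the signed run length
print(signs.groupby(run_id).sum().tolist())  # [3.0, -2.0, 1.0]
```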
╰沐子 2025-01-30 21:32:23

Use:

import math

import numpy as np
import pandas as pd

data = '''ski      2020    2021      2022     2023       2024      2025
book      1.2     5.6       8.4      -2         -5         6
jar       4.2      -5        -8      2          4           6
kook       -4      -5.2      -2.3    -5.6        -7        8'''
data = np.array([x.split() for x in data.split('\n')])

# First row holds the column labels, first column the row labels; cells parse as strings
df = pd.DataFrame(data[1:, 1:], columns=data[0, 1:], index=data[1:, 0]).astype(float)

output = []
for _, row in df.iterrows():
    out = []
    c = 0
    prev = math.copysign(1, row.iloc[0])
    # Append a sentinel with the opposite sign of the last value so the final run
    # is flushed into `out` inside the loop (Series.append was removed in pandas 2.0)
    temp = list(row) + [-math.copysign(1, row.iloc[-1])]
    for cell in temp:
        current_sign = math.copysign(1, cell)
        if current_sign == prev:
            c += current_sign
        else:
            prev = current_sign
            out.append(c)
            c = current_sign
    output.append(out)
print(output)

Output:

[[3.0, -2.0, 1.0], [1.0, -2.0, 3.0], [-5.0, 1.0]]
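The loop above walks every cell in Python. As an alternative technique (not part of this answer), the run boundaries can be found with NumPy in one pass: compare each sign with its predecessor, take the indices where they differ as run starts, and subtract consecutive starts to get lengths. A sketch, assuming zeros count as positive; signed_run_lengths_np is a hypothetical name.

```python
import numpy as np


def signed_run_lengths_np(values):
    """Signed run lengths of one row; zeros are treated as positive (assumption)."""
    signs = np.where(np.asarray(values, dtype=float) < 0, -1, 1)
    if signs.size == 0:
        return []
    # Index of the first element of every run after the first one
    change = np.flatnonzero(signs[1:] != signs[:-1]) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [signs.size]))
    # Run length times the sign of the run's first element
    return (signs[starts] * (ends - starts)).tolist()


rows = [
    [1.2, 5.6, 8.4, -2, -5, 6],
    [4.2, -5, -8, 2, 4, 6],
    [-4, -5.2, -2.3, -5.6, -7, 8],
]
output = [signed_run_lengths_np(r) for r in rows]
print(output)  # [[3, -2, 1], [1, -2, 3], [-5, 1]]
```

With a DataFrame, the same function can be applied to each row of df.to_numpy().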
痞味浪人 2025-01-30 21:32:23

I assume that ski is the index. If it is not, make it the index (df = df.set_index('ski')), which replaces the current one.

Start by defining a function to apply to each row:

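def myCounts(row):
    sgn = row.ge(0)  # True for values >= 0 (zeros count as positive)
    # A new group starts wherever the flag differs from the previous cell
    return sgn.groupby(sgn.ne(sgn.shift()).cumsum()).apply(
        lambda grp: grp.count() * (1 if grp.iloc[0] else -1)).tolist()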

Then apply it:

result = df.apply(myCounts, axis=1)

For your source data, I got:

ski
book    [3, -2, 1]
jar     [1, -2, 3]
kook       [-5, 1]
dtype: object

My solution is considerably shorter than the explicit-loop approach.
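A self-contained run of myCounts, starting from a frame where ski is still an ordinary column so that the set_index step mentioned above is visible. The frame construction is ours, with values copied from the question.

```python
import pandas as pd


def myCounts(row):
    sgn = row.ge(0)  # True for values >= 0, False for negative values
    # Group id increases each time the flag differs from the previous cell
    return sgn.groupby(sgn.ne(sgn.shift()).cumsum()).apply(
        lambda grp: grp.count() * (1 if grp.iloc[0] else -1)).tolist()


# Hypothetical starting point: ski is an ordinary column
df = pd.DataFrame({
    "ski": ["book", "jar", "kook"],
    "2020": [1.2, 4.2, -4.0],
    "2021": [5.6, -5.0, -5.2],
    "2022": [8.4, -8.0, -2.3],
    "2023": [-2.0, 2.0, -5.6],
    "2024": [-5, 4, -7],
    "2025": [6, 6, 8],
})

# set_index replaces the default RangeIndex with the ski column
result = df.set_index("ski").apply(myCounts, axis=1)
print(result)
```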

余厌 2025-01-30 21:32:23
df1.set_index('ski').stack().reset_index()\
    .assign(col1=lambda dd:np.sign(dd.iloc[:,2]))\
    .assign(col2=lambda dd2:(dd2.col1.diff()!=0).cumsum())\
    .groupby(['ski','col2'],as_index=False).col1.sum()\
    .groupby(['ski'],as_index=False).col1.agg(list).pipe(print)

  ski              col1
0  book  [3.0, -2.0, 1.0]
1   jar  [1.0, -2.0, 3.0]
2  kook       [-5.0, 1.0]
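The chain above relies on backslash continuations and on the positional dd.iloc[:,2] to reach the stacked values. A sketch of the same steps with named columns, assuming df1 holds ski as an ordinary column as in the question (the frame and the column names value, sign and run are ours):

```python
import numpy as np
import pandas as pd

# Hypothetical df1 with ski as an ordinary column (values from the question)
df1 = pd.DataFrame({
    "ski": ["book", "jar", "kook"],
    "2020": [1.2, 4.2, -4.0],
    "2021": [5.6, -5.0, -5.2],
    "2022": [8.4, -8.0, -2.3],
    "2023": [-2.0, 2.0, -5.6],
    "2024": [-5, 4, -7],
    "2025": [6, 6, 8],
})

# One row per (ski, year) cell; the stacked values get an explicit name
long = df1.set_index("ski").stack().rename("value").reset_index()
long["sign"] = np.sign(long["value"])
# Run id increases whenever the sign differs from the previous cell
long["run"] = long["sign"].ne(long["sign"].shift()).cumsum()
# Sum signs per (ski, run), then collect each ski's run totals into a list
result = long.groupby(["ski", "run"])["sign"].sum().groupby(level="ski").agg(list)
print(result)
```

Grouping by ski as well as run matters: a run id can be shared by the last cell of one row and the first cell of the next when both have the same sign.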