Is there a groupby in pandas?

Posted on 2025-01-19 08:31:13

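I'm having some trouble with my DataFrame, shown below. I am trying to group it so that one column is collapsed into ranges written with "-" and the other is simply joined with \n. The problem is that I only want to keep runs with a certain number of consecutive values (minimum 4).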

   a      b  c
0  a  Num_1  0
1  a  Num_1  1
2  a  Num_1  2
3  a  Num_2  5
4  a  Num_2  6
5  a  Num_2  7
6  a  Num_2  8
7  a  Num_2  9
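
The question does not show how the frame was built. A minimal sketch for reproducing it, assuming column c holds plain integers and using the name `d` that the code below expects:

```python
import pandas as pd

# Rebuild the example frame shown above (assumed dtypes: str, str, int).
d = pd.DataFrame({
    "a": ["a"] * 8,
    "b": ["Num_1"] * 3 + ["Num_2"] * 5,
    "c": [0, 1, 2, 5, 6, 7, 8, 9],
})

print(d)
```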

I wrote the following code:

def split_by_threshold(li):
    inds = [0]+[ind for ind,(i,j) in enumerate(zip(li,li[1:]),1) if j-i != 1]+[len(li)+1]
    rez = [li[i:j] for i,j in zip(inds,inds[1:])]
    return rez

def dropst(serie):
    serie = serie.to_numpy().tolist()
    serie = list(dict.fromkeys(serie))
    return '\n'.join(serie)

def joining_(series):
    series = series.to_numpy().tolist()
    if series:
        split_li = split_by_threshold(series)
        a=[]
        for x in split_li:
            if x[-1]-x[0]:
                a.append(str(x[0])+'-'+str(x[-1]))
        return '\n'.join(a)
    else:
        return 'None'

col_1, col_2, col_3 = d.columns
final = d.groupby([col_1], as_index = False).agg(
    {   col_1: 'first',
        col_2: dropst,
        col_3: joining_}
)

print(final)

The output I get is:

   a             b         c
0  a  Num_1\nNum_2  0-2\n5-9

What I need instead is:

   a      b    c
0  a  Num_2  5-9

新人笑 2025-01-26 08:31:13

IIUC, you can groupby a and b, plus an extra key that identifies runs of consecutive values, then agg with a custom function:

def join(s, thresh=4):
    MIN = s.min()
    MAX = s.max()
    return f'{MIN}-{MAX}' if MAX-MIN >= thresh else float('nan')

consecutive = df['c'].diff().ne(1).cumsum()
# could also be
# df.groupby(['a','b'])['c'].diff().ne(1).cumsum()
# but not required as we anyway group by those later

(df
 .groupby(['a', 'b', consecutive], as_index=False)
 ['c']
 .agg(join, thresh=4)
 .dropna(subset='c')
 )

Output:

   a      b    c
2  a  Num_2  5-9
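
One thing to watch in the answer above: `join` compares `MAX - MIN` with `thresh`, so `thresh=4` only keeps runs of at least five values (a run 5, 6, 7, 8 gives a difference of 3). If "minimum 4" means four values in a row, counting the run length may be closer to the intent. The following is only a sketch under that assumption; `MIN_RUN`, `run_id`, and the helper columns `start`, `end`, and `n` are illustrative names, not part of the original answer. The run key is renamed to `run` so it cannot be confused with column c.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["a"] * 8,
    "b": ["Num_1"] * 3 + ["Num_2"] * 5,
    "c": [0, 1, 2, 5, 6, 7, 8, 9],
})

MIN_RUN = 4  # assumed meaning: keep runs with at least 4 consecutive values

# A new run starts wherever the step from the previous value is not exactly 1.
run_id = df["c"].diff().ne(1).cumsum().rename("run")

# One row per (a, b, run) with the first value, last value, and run length.
runs = (
    df.groupby(["a", "b", run_id])["c"]
      .agg(start="min", end="max", n="count")
      .reset_index()
)

# Keep only long enough runs and format them as "start-end".
result = (
    runs[runs["n"] >= MIN_RUN]
    .assign(c=lambda t: t["start"].astype(str) + "-" + t["end"].astype(str))
    [["a", "b", "c"]]
    .reset_index(drop=True)
)

print(result)
```

With the sample data this should leave only the Num_2 row, with c equal to 5-9. If several runs per group need to end up in one cell, a second groupby on a and b that joins c with "\n" could follow.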
