Grouping rows by a specified property

Posted on 2025-02-09 21:25:08

I need to group the data so that consecutive rows belong to the same group when the difference between adjacent values in column a1 equals a pre-specified value. When the difference between two adjacent values is anything else, a new group starts at that row. For example, I have this data table:

import pandas as pd
import numpy as np

data = [
    [5, 2],
    [100, 23],
    [101, -2],
    [303, 9],
    [304, 4],
    [709, 14],
    [710, 3],
    [711, 3],
    [988, 21]
]
columns = ['a1', 'a2']
df = pd.DataFrame(data=data, columns=columns)

If the pre-specified difference between adjacent elements of column a1 is one, rows whose a1 values differ by exactly one belong to the same group, and the answer for this example is:

[[0], [1, 2], [3, 4], [5, 6, 7], [8]]

The output list stores indexes that correspond to rows from df.

It may also help that column a1 is already sorted. Thank you for your help!

Comments (3)

め七分饶幸 2025-02-16 21:25:08

Assuming that your data frame is sorted by a1 and that I understood your problem correctly, I think you could do something like this:

import pandas as pd
import numpy as np
from numba import njit

data = [
    [5, 2],
    [100, 23],
    [101, -2],
    [303, 9],
    [304, 4],
    [709, 14],
    [710, 3],
    [711, 3],
    [988, 21]
]
columns = ['a1', 'a2']
df = pd.DataFrame(data=data, columns=columns)

@njit
def get_groups(vals):
    # Assign a group id to every row; a new group starts whenever the
    # difference to the previous value is not exactly 1.
    # Assumes vals is non-empty.
    counter = 0
    group = [0]
    for i in range(1, len(vals)):
        if vals[i] - vals[i-1] != 1:
            counter += 1
        group.append(counter)
    return group
    
groups = get_groups(df['a1'].values)
assert len(groups) == len(df)

df['group'] = groups
final_ls = df.reset_index().groupby(['group']).agg({'index': list})['index'].to_list()
final_ls

------------------------------------------------------------
[[0], [1, 2], [3, 4], [5, 6, 7], [8]]
------------------------------------------------------------

The njit decorator from numba makes the looping approach efficient.
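
For comparison, a vectorized sketch that avoids both the explicit loop and the numba dependency might look like the following. It assumes, as above, that df is already sorted by a1. The names `step`, `starts_new_group` and `group_id` are introduced here for illustration only and do not come from the question.

```python
import pandas as pd

data = [[5, 2], [100, 23], [101, -2], [303, 9], [304, 4],
        [709, 14], [710, 3], [711, 3], [988, 21]]
df = pd.DataFrame(data=data, columns=['a1', 'a2'])

step = 1  # illustrative name for the pre-specified difference

# True wherever a row does NOT continue the previous group.
# The first row has a NaN difference, which also compares as "not equal".
starts_new_group = df['a1'].diff().ne(step)

# A running count of group starts gives one label per row
group_id = starts_new_group.cumsum()

# Collect the index labels of each group, in group order
groups = df.groupby(group_id).groups
index_groups = [groups[key].tolist() for key in sorted(groups)]
print(index_groups)
```

Changing `step` covers other pre-specified differences without touching the rest of the code.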

染墨丶若流云 2025-02-16 21:25:08

We sort the DataFrame by the "a1" column, then compute the difference between adjacent values. Once we have the differences, we can start grouping.

import pandas as pd

data = [
    [5, 2],
    [100, 23],
    [101, -2],
    [303, 9],
    [304, 4],
    [709, 14],
    [710, 3],
    [711, 3],
    [988, 21]
]
columns = ['a1', 'a2']
df = pd.DataFrame(data=data, columns=columns)

# To sort the values of "a1" column
df=df.sort_values(by=['a1'])
# To find the difference between the adjacent values
df['difference']=df['a1'].diff()

# Sorted index
indexs=df.index.tolist()

group=[]

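# Difference seen on the previous row
check=-1
for i,diff in enumerate(df['difference']):
    # Only a difference of exactly 1 keeps rows in the same group
    if diff==1:
        if check==1:
            # The previous row already belongs to the last group: extend it
            group[-1].append(indexs[i])
        else:
            # Start a new group from the previous and the current row
            group.append([indexs[i-1],indexs[i]])
    check=diff

# Add the indexes that are not in any group as single-element groups
z={w for x in group for w in x}
for t in indexs:
    if t not in z:
        group.append([t])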
print(group)

Result:

[[1, 2], [3, 4], [5, 6, 7], [0], [8]]
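
If the groups should come out in sorted order, as in the question, the same sort-then-compare idea can be sketched as a single pass without the leftover step. This is only an illustration: the helper name `group_by_step` and its parameters are made up here, and it works on plain lists rather than on the DataFrame.

```python
def group_by_step(values, labels, step=1):
    """Group labels whose values, once sorted, differ by exactly `step`."""
    # Sort (value, label) pairs by value so neighbours can be compared
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    groups = []
    previous = None
    for value, label in pairs:
        if previous is not None and value - previous == step:
            # Continues the current run: extend the last group
            groups[-1].append(label)
        else:
            # Any other difference (or the first row) opens a new group
            groups.append([label])
        previous = value
    return groups


a1 = [5, 100, 101, 303, 304, 709, 710, 711, 988]
print(group_by_step(a1, range(len(a1))))
```

With the DataFrame above it could be called as group_by_step(df['a1'].tolist(), df.index.tolist()).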

星星的軌跡 2025-02-16 21:25:08

The answers above led me to a fairly short and simple solution. Thank you all very much!

import pandas as pd
import numpy as np

data = [
    [5, 2],
    [100, 23],
    [101, -2],
    [303, 9],
    [304, 4],
    [709, 14],
    [710, 3],
    [711, 3],
    [988, 21]
]
columns = ['a1', 'a2']
df = pd.DataFrame(data=data, columns=columns)


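diff = 1
index_groups = [[df.index[0]]]
# Walk over consecutive pairs of index labels; unlike index-1, this also
# works when the index is not the default 0..n-1 range
for prev_index, index in zip(df.index[:-1], df.index[1:]):
    if df.loc[index, 'a1'] - df.loc[prev_index, 'a1'] == diff:
        index_groups[-1].append(index)
    else:
        index_groups.append([index])
index_groups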