How to group similar numbers by a range/condition and merge their IDs using DataFrames?

Posted on 2025-02-14 01:15:16

I have a dataframe that is sorted in ascending order. My goal is to average similar numbers (numbers that are within 10% of each other in 'both directions') and concatenate their 'Bell' names together. For example, the image shows the input and output dataframes. I tried coding it, but I am stuck on how to proceed.

[Image: dataframe]

   import pandas as pd
   def full_data_compare(self, df_full = pd.DataFrame()):
       for k in range(len(df_full)): #current rows
           for j in range(len(df_full)): #future rows
               size_k, size_j = int(df_full['Size'][k]), int(df_full['Size'][j])
               if size_k*0.9 <= size_j <= size_k*1.1 and size_j*0.9 <= size_k <= size_j*1.1: #within 10% in both directions
                   pass #stuck here: how to average 'Size' and join 'Bell' for the matching rows?

Comments (1)

长伴 2025-02-21 01:15:17

Assuming you really want to check in both directions that consecutive values are within 10% of each other, you need to compute two Series with pct_change, then combine them and pass the result to groupby.agg:

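#df = df.sort_values(by='Size') for non-consecutive grouping

# m1: True where the gap to the previous value exceeds 10% of the previous value
m1 = df['Size'].pct_change().abs().gt(0.1)
# m2: True where that same gap exceeds 10% of the current value (the other direction)
m2 = df['Size'].pct_change(-1).abs().shift().gt(0.1)

# a True in either mask starts a new group; cumsum turns the breaks into group ids
out = (df
 .groupby((m1|m2).cumsum())
 .agg({'Bell': ' '.join, 'Size': 'mean'})
)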

NB. If you want to group non-consecutive values, you first need to sort them: sort_values(by='Size')

Output:

          Bell          Size
Size                        
0        A1 A2   1493.500000
1     A1 A2 A3   5191.333333
2     A1 A3 A2  35785.333333
3           A2  45968.000000
4           A1  78486.000000
5           A3  41205.000000
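
For anyone who wants to try the approach without the original data, here is a minimal self-contained sketch. The DataFrame values, the helper name `group_similar`, and the `tol` parameter are made up for illustration and are not from the question; the grouping logic is the same two-mask `pct_change` idea as above, with the group key renamed to `group` so it does not share a name with the `Size` column.

```python
import pandas as pd

# Hypothetical sample data (not the asker's), already in ascending order
df = pd.DataFrame({
    'Bell': ['A1', 'A2', 'A3', 'A1', 'A2'],
    'Size': [100, 105, 200, 210, 400],
})

def group_similar(df, tol=0.1):
    """Average consecutive 'Size' values that are within `tol` of each other
    (checked relative to both neighbours) and join their 'Bell' names."""
    size = df['Size']
    # gap to the previous row, relative to the previous value
    m1 = size.pct_change().abs().gt(tol)
    # same gap, relative to the current value
    m2 = size.pct_change(-1).abs().shift().gt(tol)
    # every break starts a new group id
    groups = (m1 | m2).cumsum().rename('group')
    return df.groupby(groups).agg({'Bell': ' '.join, 'Size': 'mean'})

out = group_similar(df)
print(out)
# three groups: 'A1 A2' (mean 102.5), 'A3 A1' (mean 205.0), 'A2' (mean 400.0)
```

Note that the two masks can disagree: 100 and 110.5 differ by 10.5% of 100 but only about 9.5% of 110.5, so the first mask flags a break and the second does not. Combining them with `|` means a break in either direction splits the group.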