The five largest numbers in a three-column table

Posted on 2025-01-10 23:37:15

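a) How do I find the five largest SNR values when the ID is repeated? I also want all three columns in the output.
b) I also want the eliminated rows as a separate output.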

```
         FIT             ID                   SNR
    1011563.fit,  J16142485-3141000 ,       36
    1011729.fit,  J17210134-3757437 ,       18
    1011730.fit,  J17210134-3757437 ,       20
    1011731.fit,  J17210134-3757437 ,       20
    1011732.fit,  J17210134-3757437 ,       13
    1011914.fit,  J17210134-3757437 ,       38
    1011915.fit,  J17210134-3757437 ,       26
    1011916.fit,  J17210134-3757437 ,       19
    1011917.fit,  J17210134-3757437 ,       47
    1011918.fit,  J17210134-3757437 ,       25
```
  

Expected output for a):

```
          FITS                   ID  SNR
8  1011917.fit    J17210134-3757437    47
5  1011914.fit    J17210134-3757437    38
0  1011563.fit    J16142485-3141000    36
6  1011915.fit    J17210134-3757437    26
9  1011918.fit    J17210134-3757437    25
2  1011730.fit    J17210134-3757437    20
```

Output b):

```
          FITS                   ID  SNR
1  1011729.fit    J17210134-3757437    18
6  1011915.fit    J17210134-3757437    26
7  1011916.fit    J17210134-3757437    19
8  1011917.fit    J17210134-3757437    47
```

As you can see, the rows "6  1011915.fit    J17210134-3757437    26" and "8  1011917.fit    J17210134-3757437    47" appear in both outputs. I want them only in output a), not in b).



Comments (1)

别理我 2025-01-17 23:37:15

Use groupby + head on the sorted DataFrame to get the index labels of the top five rows per ID, then select those rows:

idx = df.sort_values(by='SNR', ascending=False).groupby('ID').head(5).index

df2 = df.loc[idx]

Output:

            FITS                 ID  SNR
0   1004234.fits  J16355032-2814188  714
4   1004238.fits  J16355032-2814188  690
11  1004245.fits  J16355032-2814188  645
8   1004242.fits  J16355032-2814188  635
9   1004243.fits  J16355032-2814188  522
17  1005114.fits  J22154748+4954052  227
16  1005113.fits  J22154748+4954052  212
13  1004476.fits  J22152631+4958343  162
19  1005116.fits  J22154748+4954052  160
18  1005115.fits  J22154748+4954052  148
15  1004478.fits  J22152631+4958343  103
14  1004477.fits  J22152631+4958343   76
12  1004475.fits  J22152631+4958343   62
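
To check this against the ten sample rows from the question, here is a minimal sketch that is not part of the original answer. It assumes the table has already been loaded into a DataFrame with the columns `FITS`, `ID` and `SNR`; the `kind="stable"` argument is an addition made here so that rows with equal SNR are ordered deterministically.

```python
import pandas as pd

# Sample rows copied from the question (first column named FITS, as in the expected output).
df = pd.DataFrame({
    "FITS": ["1011563.fit", "1011729.fit", "1011730.fit", "1011731.fit",
             "1011732.fit", "1011914.fit", "1011915.fit", "1011916.fit",
             "1011917.fit", "1011918.fit"],
    "ID": ["J16142485-3141000"] + ["J17210134-3757437"] * 9,
    "SNR": [36, 18, 20, 20, 13, 38, 26, 19, 47, 25],
})

# Sort by SNR (highest first), then keep at most five rows per ID.
# kind="stable" is an assumption added here to make ties (rows 2 and 3, both SNR 20)
# resolve in a reproducible way.
idx = (
    df.sort_values(by="SNR", ascending=False, kind="stable")
      .groupby("ID")
      .head(5)
      .index
)

df2 = df.loc[idx]  # all three columns, highest SNR first
print(df2)
```

With this data `df2` holds six rows: the single row for J16142485-3141000 and five rows for J17210134-3757437, since the limit of five applies to each ID separately.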

Other rows:

df3 = df.loc[df.index.difference(idx)]

Output:

            FITS                 ID  SNR
1   1004235.fits  J16355032-2814188  444
2   1004236.fits  J16355032-2814188  331
3   1004237.fits  J16355032-2814188  492
5   1004239.fits  J16355032-2814188  491
6   1004240.fits  J16355032-2814188  489
7   1004241.fits  J16355032-2814188  382
10  1004244.fits  J16355032-2814188  385
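
The question's concern was that some rows showed up in both outputs. One way to make the split explicit is a single boolean mask, so every row lands in exactly one of the two results. The following is a rough sketch of that alternative, not the answerer's method: it swaps groupby + head for a per-group rank, and the names `keep`, `top5` and `rest` are made up for illustration. It uses the sample rows from the question.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "FITS": ["1011563.fit", "1011729.fit", "1011730.fit", "1011731.fit",
             "1011732.fit", "1011914.fit", "1011915.fit", "1011916.fit",
             "1011917.fit", "1011918.fit"],
    "ID": ["J16142485-3141000"] + ["J17210134-3757437"] * 9,
    "SNR": [36, 18, 20, 20, 13, 38, 26, 19, 47, 25],
})

# Rank each row within its ID group, highest SNR first.
# method="first" gives tied values distinct ranks, so exactly five rows
# per ID can pass the cutoff.
rank = df.groupby("ID")["SNR"].rank(method="first", ascending=False)
keep = rank <= 5

top5 = df[keep].sort_values(by="SNR", ascending=False)  # output a)
rest = df[~keep]                                        # output b)

print(top5)
print(rest)
```

Because `top5` and `rest` come from the same mask and its negation, they cannot share a row, and together they cover the whole table. The mask also does not rely on the index labels being unique, which `df.index.difference(idx)` does.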
