How to rank each item at the same position across different lists in Python?

Posted on 2025-01-17 01:39:48

I am an R programmer, but I need to compute rankings in a table using Python. Say I have a column "test" that holds lists of numbers:

import pandas as pd
df = pd.DataFrame({"test":[[1,4,7], [4,2,6], [3,8,1]]})

I want to rank each item against the items at the same position in the other rows (lists), then average the ranks within each row to get a final score:

Expected:

       test      rank_list    final_score
0   [1, 4, 7]    [1, 2, 3]       2
1   [4, 2, 6]    [3, 1, 2]       2
2   [3, 8, 1]    [2, 3, 1]       2

I know this is not a great example, since all the final scores are the same, but with hundreds of rows the results will vary. I hope I have described the question clearly; if not, please feel free to ask.

I don't know whether this can be done in pandas. I tried zip + scipy, but scipy.stats.rankdata did not give me the rank of each item at the same index:

import scipy.stats

l = list(df["test"])
ranks_list = [scipy.stats.rankdata(x) for x in zip(*l)] # not right
estimated_rank = [sum(x) / len(x) for x in ranks_list]

I am open to any package, whichever is most convenient. Thank you!
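
One reading of the attempt above: `zip(*l)` already groups the values by position, so each ranking call does compare items at the same index. What comes back is grouped by position rather than by row, so it seems to need one more transpose before averaging. A minimal sketch of that idea in plain Python, where `average_ranks` is a hypothetical stand-in for `scipy.stats.rankdata` with its default tie handling (tied values share the average rank):

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions.

    Hypothetical helper, written only to keep this sketch dependency-free.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


rows = [[1, 4, 7], [4, 2, 6], [3, 8, 1]]

# Rank within each position: zip(*rows) yields one tuple per position.
ranks_by_position = [average_ranks(col) for col in zip(*rows)]

# Transpose back so that each rank list lines up with an original row.
rank_list = [list(r) for r in zip(*ranks_by_position)]

# Average the ranks within each row.
final_score = [sum(r) / len(r) for r in rank_list]
```

With the sample data this gives `rank_list` equal to `[[1.0, 2.0, 3.0], [3.0, 1.0, 2.0], [2.0, 3.0, 1.0]]`, matching the expected table.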

Comments (2)

说谎友 2025-01-24 01:39:48
import numpy as np

# Create a numpy array with one row per list
a = np.array([[1,4,7], [4,2,6], [3,8,1]])

# argsort alone returns sort indices, not ranks; applying it twice
# down each column (axis=0) gives 0-based ranks, so we add 1
rank_list = np.argsort(np.argsort(a, axis=0), axis=0) + 1

# calculate the average rank of each row
final_score = np.mean(rank_list, axis=1)
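
A note on `np.argsort`: it returns the indices that would sort the data, which is not the same thing as the rank of each element. The two happen to coincide for the 3×3 sample in the question, but not in general. A small sketch of the difference, using made-up values for a single column:

```python
import numpy as np

col = np.array([30, 10, 20])  # made-up values for illustration

# Indices that would sort col: the smallest value sits at index 1, and so on.
order = np.argsort(col)        # [1, 2, 0]

# Applying argsort to those indices gives each element's 0-based rank.
ranks = np.argsort(order) + 1  # [3, 1, 2]
```

Here `order + 1` would be `[2, 3, 1]`, which is not the ranking of `[30, 10, 20]`. Note that this approach does not average tied values; ties receive distinct ranks.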
天涯沦落人 2025-01-24 01:39:48

You can use the rank method to get the ranks, then agg to collect each row into a list for the rank_list column, and finally mean for final_score:

tmp = pd.DataFrame(df['test'].tolist()).apply('rank').astype(int)
df['rank_list'] = tmp.agg(list, axis=1)
df['final_score'] = tmp.mean(axis=1)

Output:

        test  rank_list  final_score
0  [1, 4, 7]  [1, 2, 3]          2.0
1  [4, 2, 6]  [3, 1, 2]          2.0
2  [3, 8, 1]  [2, 3, 1]          2.0
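
One caveat worth checking on real data: `rank` averages tied values by default, so ties produce fractional ranks such as 1.5, and `astype(int)` truncates them before the mean is taken. A hedged sketch with made-up data containing one tie, keeping the ranks as floats; passing `index=df.index` is an added assumption to keep the result aligned if `df` does not have a default index:

```python
import pandas as pd

# Made-up data with a tie at position 0 (rows 0 and 1 both hold 1).
df = pd.DataFrame({"test": [[1, 4, 7], [1, 2, 6], [3, 8, 1]]})

# One column per list position; reuse df's index so assignments align.
wide = pd.DataFrame(df["test"].tolist(), index=df.index)

# Default method="average": the two tied values both get rank 1.5.
ranks = wide.rank()

df["rank_list"] = pd.Series(ranks.to_numpy().tolist(), index=df.index)
df["final_score"] = ranks.mean(axis=1)

# For comparison: truncating to int first changes the score of tied rows.
truncated_score = ranks.astype(int).mean(axis=1)
```

If integer ranks are required, `rank(method="min")` or `rank(method="dense")` makes the tie rule explicit instead of relying on truncation.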