I have a pandas DataFrame df which has the top 10 documents from a corpus, ranked by their BM25 score and indexed by their Doc_ID.
Doc_ID | Rank | BM25 Score
1234 | 1 | 3.3472
5678 | 2 | 3.3238
I also have a list, documents, containing all of the documents paired up with their Doc_ID, such that the list is in the following form: [['First Doc_ID', 'First doc text'], ['Second Doc_ID', 'Second doc text'], ...].
I need to take the Doc_ID for each of the top 3 ranked documents in df, match it with the corresponding Doc_ID in documents, and print out the document text. I know how to get the Doc_ID for a particular rank from df by doing df.index[df['Rank'] == 1][0], but I'm unsure how to go from there to get the corresponding document text.
Comments (1)
You can convert your list to a DataFrame and merge it with df.
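The code from this answer did not survive on the page, so the following is only a minimal sketch of the merge approach, not the answerer's original snippet. The sample data, the Text column name, and the variable names docs_df and top3 are assumptions made for illustration; it also assumes the Doc_ID values have the same type (strings here) in both df and documents.

```python
import pandas as pd

# Hypothetical sample data mirroring the layout described in the question.
df = pd.DataFrame(
    {"Rank": [1, 2, 3, 4], "BM25 Score": [3.3472, 3.3238, 3.1, 2.9]},
    index=pd.Index(["1234", "5678", "9012", "3456"], name="Doc_ID"),
)
documents = [
    ["9012", "Third doc text"],
    ["1234", "First doc text"],
    ["3456", "Fourth doc text"],
    ["5678", "Second doc text"],
]

# Turn the list of [Doc_ID, text] pairs into a DataFrame.
# "Text" is an assumed column name, not one from the question.
docs_df = pd.DataFrame(documents, columns=["Doc_ID", "Text"])

# Keep the top 3 ranks, move Doc_ID from the index into a column,
# then left-merge so the rank order of df is preserved.
top3 = (
    df[df["Rank"] <= 3]
    .sort_values("Rank")
    .reset_index()
    .merge(docs_df, on="Doc_ID", how="left")
)

top3_texts = top3["Text"].tolist()
for text in top3_texts:
    print(text)
```

A left merge keeps every top-ranked row even if a Doc_ID is missing from documents; in that case the Text value would be NaN rather than the row being dropped.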
Alternatively, if you want a list using plain Python:
Output:
['First doc text', 'Second doc text', 'Third doc text']
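One way the plain-Python alternative could look is a dictionary lookup. This is a sketch under assumptions rather than the answerer's original code: the sample data is made up, and top_ids is a hypothetical stand-in for the top 3 IDs taken from df.

```python
# Hypothetical sample data in the question's [[Doc_ID, text], ...] form.
documents = [
    ["9012", "Third doc text"],
    ["1234", "First doc text"],
    ["3456", "Fourth doc text"],
    ["5678", "Second doc text"],
]

# Stand-in for the top 3 IDs in rank order; with the question's df this
# could come from df.sort_values("Rank").index[:3].tolist().
top_ids = ["1234", "5678", "9012"]

# Build a Doc_ID -> text lookup once, then index it for each top ID.
text_by_id = dict(documents)
top_texts = [text_by_id[doc_id] for doc_id in top_ids]

print(top_texts)
```

Building the dictionary once avoids rescanning the whole documents list for every ID. A Doc_ID that is absent from documents raises a KeyError here; text_by_id.get(doc_id) would return None instead.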