I am currently doing a project on person name disambiguation. The idea behind the project is that it will be able to identify the correct person when there are multiple people with the same name. I have used Wikipedia for this. I want to evaluate my project on some standard data, so I am looking for some test data. I am not familiar with the popular names on Wikipedia. Any idea where I can find this data? I am not looking for vast amounts of data, just about 100-500 examples.
Thank you
Adding more information to the question.
What I am looking for is people who share the same name but are actually different people. For example, Michael Jordan is a famous basketball player, and there is also a statistician with that name. I am looking for examples like this.
http://en.wikipedia.org/wiki/Michael_Jordan
http://en.wikipedia.org/wiki/Michael_I._Jordan
I hope the question is clearer now.
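One possible way to use 100-500 such examples for evaluation is to store each one as an ambiguous name, a short context snippet, and the gold Wikipedia URL of the intended person, then measure accuracy. This is a hypothetical sketch: `TestCase`, `accuracy`, and `keyword_baseline` are illustrative names, not part of any existing dataset or library, and the two context sentences are written only to illustrate the format.

```python
from dataclasses import dataclass


@dataclass
class TestCase:
    """One hypothetical gold-standard example: an ambiguous name in context."""
    name: str      # surface form, e.g. "Michael Jordan"
    context: str   # text surrounding the mention
    gold_url: str  # Wikipedia page of the intended person


def accuracy(cases, predict):
    """Fraction of cases where predict(name, context) returns the gold URL."""
    if not cases:
        return 0.0
    correct = sum(1 for c in cases if predict(c.name, c.context) == c.gold_url)
    return correct / len(cases)


# Illustrative cases built from the two pages mentioned in the question
cases = [
    TestCase("Michael Jordan",
             "He won six NBA championships with the Chicago Bulls.",
             "http://en.wikipedia.org/wiki/Michael_Jordan"),
    TestCase("Michael Jordan",
             "His research covers machine learning and statistics.",
             "http://en.wikipedia.org/wiki/Michael_I._Jordan"),
]


def keyword_baseline(name, context):
    # Hypothetical toy predictor: picks the statistician if the context mentions statistics
    if "statistic" in context.lower():
        return "http://en.wikipedia.org/wiki/Michael_I._Jordan"
    return "http://en.wikipedia.org/wiki/Michael_Jordan"


print(accuracy(cases, keyword_baseline))  # 1.0
```

A real system would replace `keyword_baseline` with the project's own disambiguator; keeping the gold URL per example makes the score independent of how the system works internally.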
Datasets for testing:
Good luck!
I'm wondering why you can't use the names of SO users: https://stackoverflow.com/users?tab=reputation
The list is already ranked by reputation, so you know the "popular names".
http://en.wikipedia.org/wiki/Category:Redirects_to_disambiguation_pages is a huge list of disambiguation pages on Wikipedia. Each page linked from it contains links to the pages of different things that share an ambiguous name. Is that what you're looking for?
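A minimal sketch of pulling those pages programmatically, assuming the MediaWiki Action API at `https://en.wikipedia.org/w/api.php` (`list=categorymembers` for the category, `prop=links` for the links on each disambiguation page). It fetches only one batch per request and ignores continuation; the helper names and the User-Agent string are hypothetical placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"


def category_members_url(category, limit=500):
    """API URL listing the pages in a category (single batch, no continuation)."""
    params = {"action": "query", "list": "categorymembers",
              "cmtitle": category, "cmlimit": limit, "format": "json"}
    return API + "?" + urlencode(params)


def page_links_url(title):
    """API URL listing article-namespace links on one page; redirects are followed."""
    params = {"action": "query", "prop": "links", "titles": title,
              "plnamespace": 0, "pllimit": "max", "redirects": 1, "format": "json"}
    return API + "?" + urlencode(params)


def member_titles(data):
    """Page titles from a parsed list=categorymembers response."""
    return [m["title"] for m in data.get("query", {}).get("categorymembers", [])]


def linked_titles(data):
    """Linked titles from a parsed prop=links response (links of all pages merged)."""
    pages = data.get("query", {}).get("pages", {})
    return [link["title"] for page in pages.values() for link in page.get("links", [])]


def fetch_json(url):
    # Network call; the User-Agent is a placeholder, replace it with your own contact info
    req = Request(url, headers={"User-Agent": "name-disambiguation-eval/0.1 (hypothetical)"})
    with urlopen(req) as resp:
        return json.load(resp)
```

For example, `member_titles(fetch_json(category_members_url("Category:Redirects to disambiguation pages", 100)))` should return up to 100 titles, and calling `linked_titles(fetch_json(page_links_url(title)))` on each gives the candidate pages for that ambiguous name. Not every disambiguation page is about people, so the candidates would still need filtering before use as person-name test data.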