Why does the Stanford NER Tagger give different tags for similar lists?

Posted 2025-02-10 19:02:25 | Words: 1592 | Views: 3 | Comments: 0

I would like to better understand why the Stanford NER (Named Entity Recognition) tagger yields different results for the same words, depending on the list of words you submit to it.

Here is an example:

from nltk.tag import StanfordNERTagger

# Placeholder paths: replace with the location of your own Stanford NER install.
user = "MYUSERPATH"
stpath = user + 'PATHTOSTANFORDTAGGER'

# 3-class model: tags are PERSON, ORGANIZATION, LOCATION, or O (no entity).
St = StanfordNERTagger(
    stpath + 'classifiers/english.all.3class.distsim.crf.ser.gz',
    stpath + 'stanford-ner.jar',
    encoding='utf-8',
)

words1 = ["I", "am", "amazed", "by", "Dylan", "van", "Baarle", "and", "Remco", "Evenepoel"]
words2 = ["I", "am", "amazed", "by", "Dylan", "van", "Baarle", "and", "Remco", "Evenepoel", "Paris", "Roubaix", "is", "a", "great", "race", "I", "watch", "on", "Eurosport"]

# Tag each token list in its own call.
text_pars = St.tag(words1)
text_pars2 = St.tag(words2)
print(words1)
print(text_pars)
print(text_pars2)

Here the list words2 is the concatenation of words1 and a second piece of a sentence. When I compare the tags for the two lists, the output for the words they share is not the same.

Here is the output of print(text_pars), which tagged words1. It correctly tagged "Remco" and "Evenepoel" as a PERSON.

[('I', 'O'), ('am', 'O'), ('amazed', 'O'), ('by', 'O'), ('Dylan', 'PERSON'), ('van', 'PERSON'), ('Baarle', 'PERSON'), ('and', 'O'), ('Remco', 'PERSON'), ('Evenepoel', 'PERSON')]

The second call, on words2, yields different results. It now tags "Remco" and "Evenepoel" as ORGANIZATION:

[('I', 'O'), ('am', 'O'), ('amazed', 'O'), ('by', 'O'), ('Dylan', 'PERSON'), ('van', 'PERSON'), ('Baarle', 'PERSON'), ('and', 'O'), ('Remco', 'ORGANIZATION'), ('Evenepoel', 'ORGANIZATION'), ('Paris', 'ORGANIZATION'), ('Roubaix', 'ORGANIZATION'), ('is', 'O'), ('a', 'O'), ('great', 'O'), ('race', 'O'), ('I', 'O'), ('watch', 'O'), ('on', 'O'), ('Eurosport', 'LOCATION')]
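To pinpoint exactly which tokens changed, the two outputs can be compared position by position over their shared prefix. This is only a minimal sketch: the tag lists are pasted in as literals from the outputs above, and the helper name `diff_tags` is made up for illustration.

```python
# Tagger outputs copied from the two runs shown above.
tags_short = [('I', 'O'), ('am', 'O'), ('amazed', 'O'), ('by', 'O'),
              ('Dylan', 'PERSON'), ('van', 'PERSON'), ('Baarle', 'PERSON'),
              ('and', 'O'), ('Remco', 'PERSON'), ('Evenepoel', 'PERSON')]
tags_long = [('I', 'O'), ('am', 'O'), ('amazed', 'O'), ('by', 'O'),
             ('Dylan', 'PERSON'), ('van', 'PERSON'), ('Baarle', 'PERSON'),
             ('and', 'O'), ('Remco', 'ORGANIZATION'), ('Evenepoel', 'ORGANIZATION'),
             ('Paris', 'ORGANIZATION'), ('Roubaix', 'ORGANIZATION'), ('is', 'O'),
             ('a', 'O'), ('great', 'O'), ('race', 'O'), ('I', 'O'),
             ('watch', 'O'), ('on', 'O'), ('Eurosport', 'LOCATION')]


def diff_tags(short, long):
    """Return (index, word, short_tag, long_tag) for shared tokens whose tags differ.

    Hypothetical helper: zip() stops at the shorter list, so only the
    common prefix of the two outputs is compared.
    """
    changed = []
    for i, ((w1, t1), (w2, t2)) in enumerate(zip(short, long)):
        if w1 != w2:
            raise ValueError(f"token mismatch at position {i}: {w1!r} vs {w2!r}")
        if t1 != t2:
            changed.append((i, w1, t1, t2))
    return changed


for row in diff_tags(tags_short, tags_long):
    print(row)
# (8, 'Remco', 'PERSON', 'ORGANIZATION')
# (9, 'Evenepoel', 'PERSON', 'ORGANIZATION')
```

Only the last two shared tokens flip; everything up to and including "and" is tagged identically in both runs.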

Why are they different? Does it have to do with the surroundings of the words (for example, the words after them also being tagged as ORGANIZATION)?

Comments (1)

瞳孔里扚悲伤 2025-02-17 19:02:25

The tagger works by taking features of the surrounding words. A city name late in a name is a tell that it is probably part of an ORGANIZATION, for example (Paris Hilton notwithstanding).
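To make "features of the surrounding words" concrete, here is a toy sketch of a context-window feature extractor. It is not the Stanford tagger's actual feature set; the function `window_features`, the window size of 2, and the `<PAD>` marker are all assumptions chosen for illustration. It only shows that the same token gets different input features once more words follow it.

```python
def window_features(tokens, i, size=2):
    """Toy features for tokens[i]: the word plus its neighbours within `size` positions.

    Hypothetical illustration only; the real tagger uses a much richer feature set.
    """
    feats = {"word": tokens[i]}
    for offset in range(-size, size + 1):
        if offset == 0:
            continue
        j = i + offset
        # Positions outside the list are filled with a padding marker.
        feats[f"word[{offset:+d}]"] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    return feats


words1 = ["I", "am", "amazed", "by", "Dylan", "van", "Baarle", "and", "Remco", "Evenepoel"]
words2 = words1 + ["Paris", "Roubaix", "is", "a", "great", "race",
                   "I", "watch", "on", "Eurosport"]

i = words1.index("Evenepoel")
f1 = window_features(words1, i)
f2 = window_features(words2, i)

# Print only the features that differ between the two inputs.
for key in f1:
    if f1[key] != f2[key]:
        print(key, f1[key], "->", f2[key])
# word[+1] <PAD> -> Paris
# word[+2] <PAD> -> Roubaix
```

In the short list "Evenepoel" ends the input; in the long list it is followed directly by "Paris" and "Roubaix". The model file name (`crf.ser.gz`) suggests a CRF, which scores the whole label sequence jointly, so a change in the likely label for one token can also pull the label of its neighbour "Remco" along with it.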

As a human who knows nothing about these entities, this looks like a completely reasonable decision anyway. Consider:

"I", "am", "amazed", "by", "Dylan", "van", "Baarle", "and", "Remco", "Evenepoel"

Okay, something ending with van Baarle is probably a person, and that context makes me think the second NE is also a person.

words2 = ["I", "am", "amazed", "by", "Dylan", "van", "Baarle", "and", "Remco", "Evenepoel", "Paris", "Roubaix", "is", "a", "great", "race", "I", "watch", "on", "Eurosport"]

I read this as a stream of consciousness with two separate thoughts in it:

  • Dylan van Baarle amazes you
  • You like watching a race named "Remco Evenepoel Paris Roubaix"
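If the two thoughts are meant to be separate sentences, one thing to try is handing them to the tagger as separate token lists, so the second sentence is not used as context for the first. The sketch below only does the splitting; `split_at` is a made-up helper, and the boundary (the length of words1) is taken from the question's statement that words2 is words1 plus a second piece. The commented-out line uses NLTK's `tag_sents`, which accepts a list of token lists; whether the tags actually change is not something this sketch can promise.

```python
def split_at(tokens, boundary):
    """Split one token list into two sentences at index `boundary` (hypothetical helper)."""
    return [tokens[:boundary], tokens[boundary:]]


words1 = ["I", "am", "amazed", "by", "Dylan", "van", "Baarle", "and", "Remco", "Evenepoel"]
words2 = words1 + ["Paris", "Roubaix", "is", "a", "great", "race",
                   "I", "watch", "on", "Eurosport"]

sentences = split_at(words2, len(words1))
for sentence in sentences:
    print(sentence)

# With the tagger object `St` from the question (needs the Stanford NER jar):
# tagged = St.tag_sents(sentences)
```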