How does one extract the verb phrase in spaCy?
For example:
Ultimate Swirly Ice Cream Scoopers are usually overrated when one considers all of the scoopers one could buy.
Here I'd like to pluck:
- Subject: "Ultimate Swirly Ice Cream Scoopers"
- Adverbial Clause: "When one considers all of the scoopers one could buy"
- Verb Phrase: "are usually overrated"
I have the following functions for subject, object, and adverbial clause:
import spacy

def get_subj(decomp):
    # Return the text span covered by the first subject token's subtree.
    for token in decomp:
        if ("subj" in token.dep_):
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            return str(decomp[start:end])

def get_obj(decomp):
    # Return the span of the first direct object or prepositional object.
    for token in decomp:
        if ("dobj" in token.dep_ or "pobj" in token.dep_):
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            return str(decomp[start:end])

def get_advcl(decomp):
    # Return the span of the first adverbial clause.
    for token in decomp:
        # print(f"pos: {token.pos_}; lemma: {token.lemma_}; dep: {token.dep_}")
        if ("advcl" in token.dep_):
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            return str(decomp[start:end])

phrase = "Ultimate Swirly Ice Cream Scoopers are usually overrated when one considers all of the scoopers one could buy."
nlp = spacy.load("en_core_web_sm")
decomp = nlp(phrase)

subj = get_subj(decomp)
obj = get_obj(decomp)
advcl = get_advcl(decomp)

print("subj: ", subj)
print("obj: ", obj)
print("advcl: ", advcl)
Output:
subj: Ultimate Swirly Ice Cream Scoopers
obj: all of the scoopers
advcl: when one considers all of the scoopers one could buy
However, the actual dependency type (.dep_) for the final word of the VP, "are usually overrated", is "ROOT". So the subtree technique fails, because the subtree of ROOT is the entire sentence.
Comments (1)
You want to construct something more like a "verb group", where you keep with the root verb only certain close dependents such as aux, cop, and advmod, but not ones like nsubj, obj, or advcl.
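One way this could look is sketched below. It is not a definitive implementation. The function only reads the attributes i, text, dep_, and children, which spaCy tokens expose, so it should also accept a parsed Doc. To keep the example runnable without downloading a model, it uses a hypothetical stand-in class Tok and a hand-written parse of the example sentence. That parse is an assumption about what a parser might return, not actual model output. The label set KEEP_DEPS is also an assumption: it extends the aux, cop, and advmod mentioned above with auxpass, neg, and prt.

```python
from dataclasses import dataclass, field

# Assumed set of "close" dependents to keep with the root verb.
KEEP_DEPS = {"aux", "auxpass", "cop", "advmod", "neg", "prt"}

def get_verb_group(tokens, keep_deps=KEEP_DEPS):
    """Return the root plus its direct children whose label is in keep_deps."""
    for token in tokens:
        if token.dep_ == "ROOT":
            # Only direct children are considered, so the subject, objects,
            # and clauses (and everything beneath them) are left out.
            group = [token] + [c for c in token.children if c.dep_ in keep_deps]
            group.sort(key=lambda t: t.i)  # restore sentence order
            return " ".join(t.text for t in group)
    return None

@dataclass
class Tok:
    """Hypothetical stand-in exposing the token attributes used above."""
    i: int
    text: str
    dep_: str
    head_i: int
    children: list = field(default_factory=list)

# Hand-written parse (an assumption): (text, dependency label, head index).
PARSE = [
    ("Ultimate", "compound", 4), ("Swirly", "compound", 4),
    ("Ice", "compound", 3), ("Cream", "compound", 4),
    ("Scoopers", "nsubjpass", 7), ("are", "auxpass", 7),
    ("usually", "advmod", 7), ("overrated", "ROOT", 7),
    ("when", "advmod", 10), ("one", "nsubj", 10),
    ("considers", "advcl", 7), ("all", "dobj", 10),
    ("of", "prep", 11), ("the", "det", 14),
    ("scoopers", "pobj", 12), ("one", "nsubj", 17),
    ("could", "aux", 17), ("buy", "relcl", 14), (".", "punct", 7),
]

tokens = [Tok(i, text, dep, head) for i, (text, dep, head) in enumerate(PARSE)]
for tok in tokens:
    if tok.head_i != tok.i:  # the root is its own head
        tokens[tok.head_i].children.append(tok)

print(get_verb_group(tokens))  # prints "are usually overrated"
```

With spaCy itself, calling get_verb_group(nlp(phrase)) should work the same way, though the result depends on the labels the loaded model assigns. Joining with spaces is a simplification: the kept tokens need not be contiguous, and the original whitespace is not preserved.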