How does one extract the verb phrase in spaCy?

Posted on 2025-02-07 07:01:37 · 1629 characters · 2 views · 0 comments

For example:

Ultimate Swirly Ice Cream Scoopers are usually overrated when one considers all of the scoopers one could buy.

Here I'd like to pluck:

  • Subject: "Ultimate Swirly Ice Cream Scoopers"
  • Adverbial Clause: "When one considers all of the scoopers one could buy"
  • Verb Phrase: "are usually overrated"

I have the following functions for subject, object, and adverbial clause:

import spacy

# Each helper returns the text spanned by the subtree of the first matching token.
def get_subj(decomp):
    for token in decomp:
        if ("subj" in token.dep_):
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            return str(decomp[start:end])

def get_obj(decomp):
    for token in decomp:
        if ("dobj" in token.dep_ or "pobj" in token.dep_):
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            return str(decomp[start:end])

def get_advcl(decomp):
    for token in decomp:
        # print(f"pos: {token.pos_}; lemma: {token.lemma_}; dep: {token.dep_}")
        if ("advcl" in token.dep_):
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            return str(decomp[start:end])

phrase = "Ultimate Swirly Ice Cream Scoopers are usually overrated when one considers all of the scoopers one could buy."

nlp = spacy.load("en_core_web_sm")
decomp = nlp(phrase)

subj = get_subj(decomp)
obj = get_obj(decomp)
advcl = get_advcl(decomp)

print("subj: ", subj)
print("obj: ", obj)
print("advcl: ", advcl)

Output:

subj:  Ultimate Swirly Ice Cream Scoopers
obj:  all of the scoopers
advcl:  when one considers all of the scoopers one could buy

However, the actual dependency type (.dep_) of the final word of the VP "are usually overrated" is "ROOT".

So, the subtree technique fails, as the subtree of ROOT returns the entire sentence.
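The failure can be reproduced without a trained model. The following is a minimal sketch on a hand-annotated parse; the shortened sentence, the head indices, and the dependency labels are assumptions written by hand for illustration, not model output. Every token descends from the ROOT, so the ROOT's subtree is the whole sentence.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hand-written parse (an assumption for illustration, not model output).
words = ["Scoopers", "are", "usually", "overrated", "."]
heads = [3, 3, 3, 3, 3]  # absolute index of each token's head; the root points to itself
deps = ["nsubjpass", "auxpass", "advmod", "ROOT", "punct"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

# The root's subtree contains the root plus all of its descendants.
root = next(token for token in doc if token.dep_ == "ROOT")
root_subtree = [token.text for token in root.subtree]
print(root_subtree)  # every word of the sentence, subject and punctuation included
```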

Comments (1)

十二 2025-02-14 07:01:37

You want to construct something more like a "verb group": keep with the root verb only certain close dependents such as aux, cop, and advmod, but not ones like nsubj, obj, or advcl.
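
One way this idea might be sketched is shown below. The function name get_verb_group and the KEEP_DEPS set are hypothetical, and the parse is annotated by hand (heads and labels are assumptions) so the example runs without downloading a model. The set also includes auxpass, on the assumption that this is the label the passive auxiliary "are" receives in spaCy's English models; adjust it to whatever labels your pipeline actually produces.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Dependents that stay with the root verb (hypothetical choice; auxpass is an
# assumption about how the passive auxiliary is labeled).
KEEP_DEPS = {"aux", "auxpass", "cop", "advmod"}

def get_verb_group(doc, keep_deps=KEEP_DEPS):
    """Return the root verb plus its direct children whose label is in keep_deps."""
    for token in doc:
        if token.dep_ == "ROOT":
            group = [token] + [c for c in token.children if c.dep_ in keep_deps]
            # Restore sentence order; the group may not be contiguous in general.
            group.sort(key=lambda t: t.i)
            return " ".join(t.text for t in group)
    return None

# Hand-written parse of the example sentence (an assumption, not model output).
words = ["Ultimate", "Swirly", "Ice", "Cream", "Scoopers", "are", "usually",
         "overrated", "when", "one", "considers", "all", "of", "the",
         "scoopers", "one", "could", "buy", "."]
heads = [4, 4, 3, 4, 7, 7, 7, 7, 10, 10, 7, 10, 11, 14, 12, 17, 17, 14, 7]
deps = ["compound", "compound", "compound", "compound", "nsubjpass", "auxpass",
        "advmod", "ROOT", "advmod", "nsubj", "advcl", "dobj", "prep", "det",
        "pobj", "nsubj", "aux", "relcl", "punct"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

verb_group = get_verb_group(doc)
print("verb group: ", verb_group)  # verb group:  are usually overrated
```

Only direct children of the root are considered, so the advmod "when" and the aux "could", which attach to other verbs, are left out. With a real pipeline, the same function could be called on nlp(phrase) instead of the hand-built Doc, though the result then depends on the parse the model produces.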
