Matcher pattern in spaCy returns an empty result

Posted on 2025-01-29 12:02:49

I was hoping to find some patterns with this simple code, but the result is empty.
Am I forgetting something?

for tk in doc[:30]:
    print(tk.text, ':', tk.pos_)

Método : NOUN
de : ADP
avaliaçãoSimulação : NOUN
computacional : ADJ
conforme : ADP
procedimentos : NOUN
apresentados : VERB
em : ADP
: SPACE
Edifi : PROPN
cações : NOUN
em : ADP
fase : NOUN
de : ADP
projetoA : NOUN
avaliação : NOUN
deve : VERB
ser : AUX
feita : VERB
para : ADP
um : NUM
dia : NOUN
típico : ADJ
de : ADP
projeto : NOUN
de : ADP
verão : NOUN
e : CCONJ
de : ADP

from spacy.matcher import Matcher

pattern = [
    {'POS': 'NOUN'},
    {'LOWER': 'ADP'},
]
# Matcher class object
matcher = Matcher(nlp.vocab)
matcher.add("matching_1", patterns=[pattern])

result = matcher(doc, as_spans=True) 

print(result)

[]

So I expected the POS-tag pattern 'NOUN' + 'ADP' to find the words:
'Método de',
'cações em',
'fase de',
'projeto de'.

九厘米的零° 2025-02-05 12:02:50

The following rule matches a token whose lowercased text equals "ADP". It will never match anything, because a lowercased token can never equal the uppercase string "ADP".

{'LOWER': 'ADP'},
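A minimal sketch of the difference, assuming spaCy v3 (needed for `as_spans=True`). Instead of loading a Portuguese model, it builds a `Doc` by hand on a blank `pt` pipeline and copies the POS tags from the question's printout, so the words and tags are a stand-in for the asker's real `doc`. The helper `find` is hypothetical; the only thing that changes between the two runs is the second token spec.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

# Assumption: a blank Portuguese pipeline plus a hand-built Doc stands in for
# the asker's tagged doc; the POS tags are copied from the question's printout.
nlp = spacy.blank("pt")
words = ["Método", "de", "avaliação", "computacional",
         "cações", "em", "fase", "de", "projeto", "de", "verão"]
pos = ["NOUN", "ADP", "NOUN", "ADJ",
       "NOUN", "ADP", "NOUN", "ADP", "NOUN", "ADP", "NOUN"]
doc = Doc(nlp.vocab, words=words, pos=pos)


def find(pattern):
    """Hypothetical helper: run a one-pattern Matcher over doc, return span texts."""
    matcher = Matcher(nlp.vocab)
    matcher.add("matching_1", [pattern])
    return [span.text for span in matcher(doc, as_spans=True)]


# Original rule: compares each token's lowercased TEXT with the string "ADP"
lower_rule = find([{"POS": "NOUN"}, {"LOWER": "ADP"}])

# Corrected rule: compares each token's part-of-speech tag with "ADP"
pos_rule = find([{"POS": "NOUN"}, {"POS": "ADP"}])

print(lower_rule)  # []
print(pos_rule)
```

The first list stays empty for the reason given above, while the second should contain the four phrases the question expected ('Método de', 'cações em', 'fase de', 'projeto de'). The part-of-speech tag is tested with the `POS` key, exactly as in the first token spec.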

I am not sure what this is supposed to match, maybe you want to match a lowercase word with POS = ADP? In that case you would want a rule like this:

{"POS": "ADP", "TEXT": {"REGEX": "^[a-z]+$"}}
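A sketch of the lowercase restriction in practice, assuming spaCy v3, where a regular expression is attached to a token attribute, e.g. `{"TEXT": {"REGEX": ...}}`. The tokens below, including the capitalized "De", are hypothetical and chosen so that one NOUN + ADP pair is rejected by the regex.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

# Hypothetical tokens: "De" is capitalized, "de" and "em" are lowercase
nlp = spacy.blank("pt")
words = ["Método", "De", "avaliação", "fase", "de", "cações", "em"]
pos = ["NOUN", "ADP", "NOUN", "NOUN", "ADP", "NOUN", "ADP"]
doc = Doc(nlp.vocab, words=words, pos=pos)

# Second token must be tagged ADP *and* consist only of ASCII lowercase letters
pattern = [
    {"POS": "NOUN"},
    {"POS": "ADP", "TEXT": {"REGEX": "^[a-z]+$"}},
]
matcher = Matcher(nlp.vocab)
matcher.add("noun_lower_adp", [pattern])
matches = [span.text for span in matcher(doc, as_spans=True)]
print(matches)
```

Only 'fase de' and 'cações em' should be returned: 'Método De' has the right tags but fails the lowercase check. Without the REGEX part, all three NOUN + ADP pairs would match.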

To restate what I said above: {'LOWER': 'ADP'} does not match a lowercase word with the ADP part of speech. You seem to be confused about what "LOWER" means or how rules work.

Let me give an example. {"LOWER": "dog"} will match words like "Dog", "DOG", or "dog". It will not match words with the part of speech "dog", because no such part of speech exists. "LOWER": value means "match words that look like value when they are made lowercase".
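A plain-Python illustration of what a `"LOWER": value` check compares. This is a sketch of the idea, not spaCy's internal code, and `lower_matches` is a hypothetical helper: the token text is lowercased, but the pattern value is used as written.

```python
def lower_matches(token_text: str, value: str) -> bool:
    """Mimic {"LOWER": value}: lowercase the token text, compare to value as-is."""
    return token_text.lower() == value


words = ["Dog", "DOG", "dog", "Dogs", "NOUN"]
print([w for w in words if lower_matches(w, "dog")])  # ['Dog', 'DOG', 'dog']

# A lowercased token never contains capitals, so "ADP" can never match
print([w for w in ["ADP", "adp", "Adp"] if lower_matches(w, "ADP")])  # []
```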

If you want to match lowercase words that have the ADP part of speech, use the rule above with the REGEX part.
