spaCy negation operator in pattern matching
I am trying to write a pattern in spaCy that matches "black" but not "black beans."
I tried the code below, but it also matches the token after "black" as long as that token is not "bean." How do I modify it to match only "black"?
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
# pattern = [{"LOWER": "black"}, {"LEMMA": {"NOT_IN": ["bean", "beans"]}}]
pattern = [{"LOWER": "black"}, {"LEMMA": "bean", "OP": "!"}]
matcher.add("blackbeans", [pattern])
doc = nlp("I liked the black beans, but the avocado was black making the whole meal blackish-looking and not good.")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # Get string representation
    span = doc[start:end]  # The matched span
    print(match_id, string_id, start, end, span.text)
There's no way to do this directly: the Matcher returns every token described by the input pattern, so the token consumed by the negated element is part of the returned span. The negated element also has to match an actual token, so your pattern will fail if "black" is the last token in the text.
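To make that behavior concrete, here is a small sketch. It assumes a blank English pipeline (tokenizer only) and compares LOWER instead of LEMMA so that no model download is needed; the helper `matched_texts` and the sample sentences are made up for illustration.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: tokenizer-only pipeline, so LOWER is used instead of LEMMA.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
pattern = [
    {"LOWER": "black"},
    # Negated element: it still has to consume one token.
    {"LOWER": {"IN": ["bean", "beans"]}, "OP": "!"},
]
matcher.add("black_not_beans", [pattern])


def matched_texts(text):
    """Hypothetical helper: return the text of every span the pattern matches."""
    doc = nlp(text)
    return [doc[start:end].text for _, start, end in matcher(doc)]


# "black" followed by another word: the span covers both tokens.
print(matched_texts("The avocado was black making a mess"))
# "black" as the final token: nothing is left for the negated element.
print(matched_texts("The avocado was black"))
# "black" followed by "beans": rejected, as intended.
print(matched_texts("I liked the black beans"))
```

Only the first call should produce a match, and its span is two tokens long rather than just "black".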
There are a couple of ways to work around this:
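One possible workaround, not necessarily one of those the answer goes on to list, is to match "black" on its own and then discard matches whose following token is "bean" or "beans". The sketch below assumes a blank English pipeline and a LOWER comparison so it runs without a model; with a full pipeline you could compare `token.lemma_` instead. The names `EXCLUDED_NEXT` and `black_without_beans` are hypothetical.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: tokenizer-only pipeline, so LOWER is compared instead of LEMMA.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
matcher.add("black", [[{"LOWER": "black"}]])

EXCLUDED_NEXT = {"bean", "beans"}  # hypothetical exclusion list


def black_without_beans(doc):
    """Return spans for "black" that are not directly followed by bean(s)."""
    spans = []
    for _, start, end in matcher(doc):
        # `end` indexes the token right after the match, if one exists.
        if end < len(doc) and doc[end].lower_ in EXCLUDED_NEXT:
            continue
        spans.append(doc[start:end])
    return spans


doc = nlp(
    "I liked the black beans, but the avocado was black making "
    "the whole meal blackish-looking and not good."
)
for span in black_without_beans(doc):
    print(span.start, span.end, span.text)
```

Because the filtering happens after matching, each returned span contains only "black", and a "black" at the very end of the text is still kept.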