POS pattern filter?
I'm writing some code that iterates over a sequence of POS tags (generated by pos_tag in NLTK) to search for POS patterns. Matching runs of POS tags are stored in a list for later processing. Surely a regex-style pattern filter already exists for a task like this, but a couple of initial Google searches didn't turn anything up.
Are there any code snippets out there that can do my POS pattern filtering for me?
Thanks,
Dave
EDIT: Complete solution (using RegexpParser, where message is any string)
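import nltk

text = nltk.word_tokenize(message)
tags = nltk.pos_tag(text)
grammar = r"""
    RULE_1: {<JJ>+<NNP>*<NN>*}
"""
chunker = nltk.RegexpParser(grammar)
chunked = chunker.parse(tags)

# Keep only the subtrees produced by RULE_1.
# NLTK 3 exposes the label as tree.label(); older releases used tree.node.
def is_rule_1(tree):
    return tree.label() == "RULE_1"

for s in chunked.subtrees(is_rule_1):
    print(s)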
Check out http://nltk.googlecode.com/svn/trunk/doc/book/ch07.html and http://www.regular-expressions.info/reference.html for more on creating the rules.
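As a rough illustration of what RULE_1 matches, the same idea can be sketched with only the standard `re` module: encode the tags as one string such as `<DT><JJ><NN>`, run an ordinary regex over it, and map each match back to the tokens. This is not how NLTK implements its chunker; the function name `find_pos_patterns` and the hand-tagged sentence are assumptions made up for the example, and the sketch assumes tags never contain `<` or `>`.

```python
import re

# Hypothetical helper, not part of NLTK: a stdlib-only sketch of
# regex-style matching over a sequence of (word, tag) pairs.
def find_pos_patterns(tagged, pattern=r"(?:<JJ>)+(?:<NNP>)*(?:<NN>)*"):
    """Return the runs of (word, tag) pairs whose tags match `pattern`."""
    # Encode the tag sequence as a single string, e.g. "<DT><JJ><NN>".
    pieces = ["<%s>" % tag for _, tag in tagged]
    tag_string = "".join(pieces)

    # Map the character offset of each tag boundary to a token index.
    boundaries = {}
    offset = 0
    for index, piece in enumerate(pieces):
        boundaries[offset] = index
        offset += len(piece)
    boundaries[offset] = len(pieces)  # boundary after the last tag

    # The pattern begins with "<" and ends with ">", so every match
    # starts and ends on a tag boundary.
    matches = []
    for m in re.finditer(pattern, tag_string):
        matches.append(tagged[boundaries[m.start()]:boundaries[m.end()]])
    return matches

# Hand-tagged input (an assumption for illustration; in practice the
# pairs would come from nltk.pos_tag).
tagged = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
          ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"),
          ("lazy", "JJ"), ("dog", "NN")]

chunks = find_pos_patterns(tagged)
for chunk in chunks:
    print(chunk)
```

The matched runs come back as plain lists, so they can be stored for later processing without depending on NLTK's Tree type. For anything beyond a single flat pattern, RegexpParser remains the more convenient tool.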
Comments (1)
I think you're looking for RegexpChunkParser.