What is a chunker in natural language processing?
Does anyone know what a chunker is in the context of text processing, and what it is used for?
Comments (3)
According to these slides, chunking is an alternative to full parsing: it produces a partial syntactic structure for a sentence, with limited tree depth, rather than a complete parse tree.
It is more limited than full parsing, but it is sufficient when the goal is to extract or ignore information, and it is often used for that purpose because it is faster and more robust than full parsing.
Much more information is available in the slides.
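To make the idea of a partial, flat structure concrete, here is a minimal sketch of a rule-based noun-phrase chunker. It is not taken from the slides: the rule (optional determiner, any adjectives, one or more nouns), the function name chunk_noun_phrases, and the hand-assigned Penn Treebank style tags are all assumptions chosen for illustration. Real chunkers are usually trained on annotated data rather than written as a single rule.

```python
# Assumed set of noun tags (Penn Treebank style); not from the original answer.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def chunk_noun_phrases(tagged):
    """Group (word, tag) pairs into flat NP chunks.

    Hypothetical rule: optional determiner, any adjectives, then one or
    more nouns. Tokens that do not start such a sequence are emitted
    on their own with the label "O" (outside any chunk).
    """
    chunks = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":                  # optional determiner
            j += 1
        while j < n and tagged[j][1] == "JJ":     # any number of adjectives
            j += 1
        k = j
        while k < n and tagged[k][1] in NOUN_TAGS:  # one or more nouns
            k += 1
        if k > j:
            # At least one noun was found: emit a single flat NP chunk.
            chunks.append(("NP", [word for word, _ in tagged[i:k]]))
            i = k
        else:
            # No noun follows: leave this token outside any chunk.
            chunks.append(("O", [tagged[i][0]]))
            i += 1
    return chunks


# Hand-tagged example sentence; the tags are assumed, not produced by a tagger.
sentence = [
    ("the", "DT"), ("morning", "NN"), ("flight", "NN"),
    ("from", "IN"), ("Denver", "NNP"), ("has", "VBZ"), ("arrived", "VBN"),
]
for label, words in chunk_noun_phrases(sentence):
    print(label, " ".join(words))
```

The result is a flat list of chunks: nothing is attached to anything else, and the relation between "from Denver" and "the morning flight" is never decided. That is the trade-off described above: less structure than a full parse, in exchange for a single cheap pass over the tokens.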
I don't disagree with the other answers, but Jurafsky and Martin give a slightly different definition. For them, chunking is specifically the kind of shallow parsing in which there are no recursive phrases.
One example they give is the phrase "the flight from Denver". A chunker would not generate the parse "[NP the flight [PP from [NP Denver]]]", because that structure nests one NP inside another and so implies a grammar with NP recursion.
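One practical consequence of ruling out recursion is that a chunked sentence can be written as one tag per token. The sketch below assumes a flat analysis of the same phrase, "[NP the flight] [PP from] [NP Denver]", and encodes it with the common IOB convention (B- for the first token of a chunk, I- for tokens inside it, O for tokens outside any chunk). The function name chunks_to_iob and the (label, words) data layout are hypothetical, chosen only for this illustration.

```python
def chunks_to_iob(chunks):
    """Convert flat (label, words) chunks into per-token IOB tags."""
    tagged = []
    for label, words in chunks:
        for position, word in enumerate(words):
            if label == "O":
                tag = "O"                      # token outside any chunk
            elif position == 0:
                tag = "B-" + label             # first token of a chunk
            else:
                tag = "I-" + label             # continuation of the chunk
            tagged.append((word, tag))
    return tagged


# Assumed flat (non-recursive) analysis of the phrase from the answer above.
flat = [("NP", ["the", "flight"]), ("PP", ["from"]), ("NP", ["Denver"])]
for word, tag in chunks_to_iob(flat):
    print(word, tag)
```

The nested parse "[NP the flight [PP from [NP Denver]]]" has no such encoding with a single tag per token, because "Denver" would have to belong to two NPs at once. A flat output is what lets chunking be treated as a sequence labeling task.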
It's a very simple type of parsing, also called shallow parsing. The OpenNLP project has a chunker module, and its documentation includes an example of chunking in action.