What is the difference between POS tagging and shallow parsing?
I'm currently taking a Natural Language Processing course at my university and am still confused by some basic concepts. I got the definition of POS tagging from the book Foundations of Statistical Natural Language Processing:
Tagging is the task of labeling (or tagging) each word in a sentence
with its appropriate part of speech. We decide whether each word is a
noun, verb, adjective, or whatever.
But I can't find a definition of shallow parsing in the book, although it does describe shallow parsing as one of the uses of POS tagging. So I searched the web and found no direct explanation of shallow parsing, except this from Wikipedia:
Shallow parsing (also chunking, "light parsing") is an analysis of a sentence which identifies the constituents (noun groups, verbs, verb groups, etc.), but does not specify their internal structure, nor their role in the main sentence.
Frankly, I don't see the difference, but that may be because of my English, or because I am failing to understand a simple basic concept. Can anyone please explain the difference between shallow parsing and POS tagging? Is shallow parsing often also called shallow semantic parsing?
Thanks in advance.
Comments (5)
POS tagging gives a POS tag to every word in the input sentence.
Parsing the sentence (using the Stanford PCFG parser, for example) converts the sentence into a tree whose leaves hold POS tags (which correspond to the words in the sentence), while the rest of the tree tells you exactly how these words join together to make up the overall sentence. For example, an adjective and a noun might combine into a noun phrase, which might combine with another adjective to form another noun phrase (e.g. quick brown fox). The exact way the pieces combine depends on the parser in question.
You can see what parser output looks like at http://nlp.stanford.edu:8080/parser/index.jsp
A shallow parser, or "chunker", sits somewhere between these two. A plain POS tagger is really fast but does not give you enough information, while a full-blown parser is slow and gives you too much. A POS tagger can be thought of as a parser that returns only the bottom-most tier of the parse tree. A chunker can be thought of as a parser that returns some other tier of the parse tree instead. Sometimes you just need to know that a group of words together forms a noun phrase, and you don't care about the sub-structure of the tree within those words (i.e. which words are adjectives, determiners, nouns, etc. and how they combine). In such cases you can use a chunker to get exactly the information you need instead of wasting time generating the full parse tree for the sentence.
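The "tiers of a parse tree" view can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tree below is written by hand (it is not real Stanford parser output), and the helper names `pos_tags` and `chunks` are hypothetical. A real tagger or chunker works directly on the text and never builds the full tree first; the point here is only to show which part of the tree each tool gives you.

```python
# Toy parse tree, hand-written for illustration (not real parser output).
# An inner node is (label, child, child, ...); a leaf is (pos_tag, word).
TREE = (
    "S",
    ("NP", ("DT", "the"),
           ("NP", ("JJ", "quick"),
                  ("NP", ("JJ", "brown"), ("NN", "fox")))),
    ("VP", ("VBZ", "jumps"),
           ("PP", ("IN", "over"),
                  ("NP", ("DT", "the"),
                         ("NP", ("JJ", "lazy"), ("NN", "dog"))))),
)


def is_leaf(node):
    """A leaf pairs a POS tag with a single word."""
    return len(node) == 2 and isinstance(node[1], str)


def pos_tags(node):
    """Bottom tier of the tree: what a POS tagger would give you."""
    if is_leaf(node):
        return [(node[1], node[0])]
    tagged = []
    for child in node[1:]:
        tagged.extend(pos_tags(child))
    return tagged


def chunks(node, labels=("NP",)):
    """An intermediate tier: the outermost phrases with a wanted label,
    returned as flat word lists with their internal structure discarded."""
    if is_leaf(node):
        return []
    if node[0] in labels:
        return [(node[0], [word for word, _ in pos_tags(node)])]
    found = []
    for child in node[1:]:
        found.extend(chunks(child, labels))
    return found


print(pos_tags(TREE))
print(chunks(TREE))
```

Here `pos_tags` returns one (word, tag) pair per word, while `chunks` returns only the two noun phrases ("the quick brown fox" and "the lazy dog") as flat groups, without saying how the adjectives and nouns inside them combine.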
POS tagging is the process of deciding the type of every token in a text, e.g. NOUN, VERB, DETERMINER, etc. A token can be a word or a punctuation mark.
Shallow parsing, or chunking, is the process of dividing a text into syntactically related groups.
POS tagging output
Chunking output
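To make the contrast between the two kinds of output concrete, here is a minimal sketch in Python. It assumes the sentence has already been tagged (the tags in `TAGGED` are written by hand using Penn Treebank-style labels), and the function `chunk_noun_phrases` is a hypothetical toy rule, "optional determiner, any adjectives, one or more nouns", not the method of any particular library.

```python
# Hand-tagged input, standing in for the output of a POS tagger.
TAGGED = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumps", "VBZ"), ("over", "IN"),
    ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"), (".", "."),
]


def chunk_noun_phrases(tagged):
    """Group runs matching DT? JJ* NN+ into NP chunks.

    Tokens outside any chunk are kept as single-word groups labeled
    with their own POS tag.
    """
    result = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if j < n and tagged[j][1] == "DT":      # optional determiner
            j += 1
        while j < n and tagged[j][1] == "JJ":   # any number of adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):  # one or more nouns
            k += 1
        if k > j:
            # At least one noun was found, so tokens i..k-1 form a noun phrase.
            result.append(("NP", [word for word, _ in tagged[i:k]]))
            i = k
        else:
            # No noun phrase starts here; emit the token on its own.
            result.append((tagged[i][1], [tagged[i][0]]))
            i += 1
    return result


print(TAGGED)                      # POS tagging output: one tag per token
print(chunk_noun_phrases(TAGGED))  # chunking output: tokens grouped into phrases
```

The tagger output has one label per token, whereas the chunker output groups "The quick brown fox" and "the lazy dog" into NP chunks and leaves the remaining tokens ungrouped.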
The Constraint Grammar framework is illustrative. In its simplest, crudest form, it takes POS-tagged text as input and adds what you could call part-of-clause tags. For an adjective, for example, it could add
@NN>
to indicate that it is part of an NP whose head word is to the right.
In a POS tagger, we tag words using a tagset such as {noun, verb, adj, adv, prob...},
while a shallow parser tries to identify sub-components of the sentence, such as named entities and phrases, for example:
"I'm currently (taking a Natural (Language Processing course) at (my University)) and (still confused with some basic concept.)"
D. Jurafsky and J. H. Martin say in their book that a shallow parse (partial parse) is a parse that does not extract all the possible information from the sentence, but only the information that is valuable in the specific case.
Chunking is just one of the approaches to shallow parsing. As mentioned, it extracts only information about basic non-recursive phrases (e.g. verb phrases or noun phrases).
Other approaches, for example, produce flattened parse trees. These trees may contain information about part-of-speech tags, but defer decisions that may require semantic or contextual factors, such as PP attachments, coordination ambiguities, and nominal compound analyses.
So, a shallow parse is a parse that produces a partial parse tree. Chunking is one example of such parsing.