NLTK chunking and walking the result tree
I'm using NLTK's RegexpParser to extract noun groups and verb groups from tagged tokens.
How do I walk the resulting tree to find only the chunks that are NP or V groups?
from nltk.chunk import RegexpParser
grammar = '''
NP: {<DT>?<JJ>*<NN>*}
V: {<V.*>}'''
chunker = RegexpParser(grammar)
token = [] ## Some tokens from my POS tagger
chunked = chunker.parse(tokens)
print chunked
#How do I walk the tree?
#for chunk in chunked:
# if chunk.??? == 'NP':
# print chunk
(S
(NP Carrier/NN)
for/IN
tissue-/JJ
and/CC
cell-culture/JJ
for/IN
(NP the/DT preparation/NN)
of/IN
(NP implants/NNS)
and/CC
(NP implant/NN)
(V containing/VBG)
(NP the/DT carrier/NN)
./.)
Comments (4)
Small mistake in token: the question's code assigns the list to token but then passes tokens to chunker.parse().
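A minimal sketch of the corrected setup, assuming NLTK 3 on Python 3 (hence print() as a function). The hand-tagged tokens are hypothetical stand-ins for a POS tagger's output; the relevant change is that the same name, tokens, is used for both the assignment and the parse call.

```python
from nltk.chunk import RegexpParser

grammar = r'''
NP: {<DT>?<JJ>*<NN>*}
V: {<V.*>}'''
chunker = RegexpParser(grammar)

# Hypothetical hand-tagged tokens standing in for a POS tagger's output.
tokens = [("the", "DT"), ("carrier", "NN"), ("contains", "VBZ"),
          ("the", "DT"), ("implant", "NN")]

# The name passed here matches the name assigned above.
chunked = chunker.parse(tokens)
print(chunked)
```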
Savino's answer is great, but it's also worth noting that subtrees can be accessed by index as well, e.g. chunked[0] is the first child of the root.
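Index access can be sketched as follows, assuming NLTK 3. The tree is built by hand here as a hypothetical stand-in for the result of chunker.parse(tokens), so the example does not depend on a tagger.

```python
from nltk.tree import Tree

# Hypothetical chunk tree, built by hand to mirror part of the question's output.
chunked = Tree("S", [
    Tree("NP", [("the", "DT"), ("preparation", "NN")]),
    ("of", "IN"),
    Tree("V", [("containing", "VBG")]),
])

first = chunked[0]       # first child of the root: the NP subtree
print(first.label())     # label of that subtree
print(first[1])          # second leaf inside it, a (word, tag) tuple
print(chunked[0, 1])     # the same leaf, addressed by a tree position
print(chunked[1])        # an unchunked token: a plain tuple, not a Tree
```

A Tree is a list subclass, so slicing and len() work on it as well. Children that were not chunked are plain (word, tag) tuples, so check the type before calling label() on an indexed child.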
This should work:
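A minimal sketch of one way to do the walk, not necessarily the answer's original code, assuming NLTK 3, where a subtree's label is read with label(). The tree is a hypothetical hand-built stand-in for chunker.parse(tokens).

```python
from nltk.tree import Tree

# Hypothetical chunk tree standing in for chunker.parse(tokens).
chunked = Tree("S", [
    Tree("NP", [("Carrier", "NN")]),
    ("for", "IN"),
    Tree("V", [("containing", "VBG")]),
    Tree("NP", [("the", "DT"), ("carrier", "NN")]),
    (".", "."),
])

found = []
for child in chunked:
    # Chunks are Tree objects; unchunked tokens are plain (word, tag) tuples.
    if isinstance(child, Tree) and child.label() in ("NP", "V"):
        found.append((child.label(), child.leaves()))

for label, leaves in found:
    print(label, leaves)

# Alternative: let Tree.subtrees() do the filtering. Unlike the loop above,
# it also descends into nested chunks, which matters for deeper grammars.
same = [(t.label(), t.leaves())
        for t in chunked.subtrees(filter=lambda t: t.label() in ("NP", "V"))]
```

In NLTK releases before 3.0 the label was exposed as the attribute node rather than the method label(), so code from that era compares chunk.node == 'NP' instead.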