Need help with trees and matching
I have implemented the following data structure:
class Node(object):
    """Rules:
    A node's children are ONLY an iterable of nodes.
    A leaf node must NOT have children and MUST have a word.
    """

    def __init__(self, tag, children=None, word=u""):
        assert isinstance(tag, unicode) and isinstance(word, unicode)
        self.tag = tag
        self.word = word
        self.parent = None  # Set when this node is passed as a child to another Node
        # Avoid a shared mutable default argument; copy into a fresh list.
        self.children = list(children) if children is not None else []
        for child in self.children:
            child.parent = self

    def matches(self, node):
        """Match RECURSIVELY down!"""
        # WILDCARD is assumed to be a module-level unicode constant defined elsewhere.
        if self.tag != node.tag:
            return False
        # Note: zip() stops at the shorter child list, so extra children are ignored.
        if not all(mine.matches(theirs)
                   for mine, theirs in zip(self.children, node.children)):
            return False
        if self.word != WILDCARD and node.word != WILDCARD:
            return self.word == node.word
        return True

    def __unicode__(self):
        childrenU = u", ".join(map(unicode, self.children))
        return u"(%s, %s, %s)" % (self.tag, childrenU, self.word)

    def __str__(self):
        return unicode(self).encode('utf-8')

    def __repr__(self):
        # In Python 2, __repr__ should return a byte string, not unicode.
        return str(self)
So a tree is basically a bunch of these nodes connected together.
I am parsing S-expressions like this one:
(VP
  (VP (VC w1)
    (NP
      (CP
        (IP
          (NP (NN w2))
          (VP
            (ADVP (AD w3))
            (VP (VA w4))))
        (DEC w5))
      (NP (NN w6))))
  (ADVP (AD w7)))
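For reference, a bracketed tree like the one above can be read into nested lists with a short recursive-descent parser. This is only a minimal sketch in Python 3, not the parser used in this question: `parse_sexpr` and `TOKEN` are hypothetical names, the nested-list output is an assumed representation, and malformed input (for example unbalanced parentheses) is not handled gracefully.

```python
import re

# A token is a single parenthesis or a run of non-space, non-parenthesis characters.
TOKEN = re.compile(r"[()]|[^\s()]+")


def parse_sexpr(text):
    """Parse one bracketed tree into nested lists: [tag, child, ...]."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token  # a bare tag or word
        items = []
        while tokens[pos] != ")":
            items.append(parse())
        pos += 1  # consume the closing ")"
        return items

    tree = parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the first tree")
    return tree


print(parse_sexpr("(VP (ADVP (AD w3)) (VP (VA w4)))"))
# → ['VP', ['ADVP', ['AD', 'w3']], ['VP', ['VA', 'w4']]]
```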
So I am interested in writing a method that matches a subtree against a bigger tree. The catch is that the subtree contains wildcards, and I would like to capture what those wildcards match as well.
For example:
If given a subtree,
(VP
  (ADVP (AD X))
  (VP (VA Y)))
The operation that "matches" the two should return { X: w3, Y: w4 }
Is anyone here able to recommend an efficient, simple solution?
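One possible approach, sketched under a few assumptions: walk every node of the big tree and try to match the pattern at that position, collecting wildcard bindings along the way. The `Node` class below is a stripped-down Python 3 stand-in for the one in the question, and `leaf`, `is_wildcard`, `match_here`, and `find_matches` are hypothetical names. The sketch assumes that a wildcard is a single uppercase letter and that a pattern node must have exactly as many children as the node it matches; neither is stated in the question.

```python
class Node:
    """Stripped-down stand-in for the Node class in the question."""

    def __init__(self, tag, children=(), word=""):
        self.tag = tag
        self.children = list(children)
        self.word = word


def leaf(tag, word):
    return Node(tag, word=word)


def is_wildcard(word):
    # Assumption: a wildcard is a single uppercase letter such as X or Y.
    return len(word) == 1 and word.isupper()


def match_here(pattern, node, bindings):
    """Match pattern against node at this exact position, filling bindings."""
    if pattern.tag != node.tag or len(pattern.children) != len(node.children):
        return False
    if not pattern.children:  # leaf: compare or capture the word
        if is_wildcard(pattern.word):
            # A repeated wildcard must capture the same word each time.
            return bindings.setdefault(pattern.word, node.word) == node.word
        return pattern.word == node.word
    return all(match_here(p, c, bindings)
               for p, c in zip(pattern.children, node.children))


def find_matches(pattern, tree):
    """Yield one bindings dict per subtree of tree that the pattern matches."""
    stack = [tree]
    while stack:
        node = stack.pop()
        bindings = {}  # fresh dict per attempt, so failed attempts leave no residue
        if match_here(pattern, node, bindings):
            yield bindings
        stack.extend(reversed(node.children))  # pre-order traversal


tree = Node("VP", [
    Node("VP", [
        leaf("VC", "w1"),
        Node("NP", [
            Node("CP", [
                Node("IP", [
                    Node("NP", [leaf("NN", "w2")]),
                    Node("VP", [
                        Node("ADVP", [leaf("AD", "w3")]),
                        Node("VP", [leaf("VA", "w4")]),
                    ]),
                ]),
                leaf("DEC", "w5"),
            ]),
            Node("NP", [leaf("NN", "w6")]),
        ]),
    ]),
    Node("ADVP", [leaf("AD", "w7")]),
])

pattern = Node("VP", [
    Node("ADVP", [leaf("AD", "X")]),
    Node("VP", [leaf("VA", "Y")]),
])

print(list(find_matches(pattern, tree)))
# → [{'X': 'w3', 'Y': 'w4'}]
```

Each tree node is visited once and does at most pattern-sized work, so the cost is roughly the tree size times the pattern size, which should be fine for parse trees of sentence length.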