Need help with trees and matching

Posted 2024-08-31 13:11:14 · 1786 characters · 2 views · 0 comments

I have implemented the following data structure:

class Node(object):
    """Rules:
    A node's child is ONLY an iterable of nodes
    A leaf node must NOT have children and MUST have word
    """
    def __init__(self, tag, children=None, word=u""):
        assert isinstance(tag, unicode) and isinstance(word, unicode)
        self.tag=tag
        self.word=word
        self.parent=None                #Set by recursive function
        # Avoid a shared mutable default: each node gets its own list
        self.children=list(children) if children is not None else []
        for child in self.children:
            child.parent=self

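    def matches(self, node):
        """Match RECURSIVELY down!"""
        # WILDCARD is not defined in this snippet; it is assumed to be a
        # module-level unicode constant. Note that zip() stops at the shorter
        # child list, so extra children on either side are ignored.
        if self.tag == node.tag:
            if all(child.matches(other) for child, other in zip(self.children, node.children)):
                if self.word != WILDCARD and node.word != WILDCARD:
                    return self.word == node.word
                else:
                    return True
        return False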

    def __unicode__(self):
        childrenU= u", ".join( map( unicode, self.children))
        return u"(%s, %s, %s)" % (self.tag, childrenU, self.word)

    def __str__(self):
        return unicode(self).encode('utf-8')

    def __repr__(self):
        return unicode(self)

So a tree is basically a bunch of these nodes connected together.

I am parsing S-expressions like this one:
    (VP
      (VP (VC w1)
        (NP
          (CP
            (IP
              (NP (NN w2))
              (VP
                (ADVP (AD w3))
                (VP (VA w4))))
            (DEC w5))
          (NP (NN w6))))
      (ADVP (AD w7)))
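
For reference, here is a minimal sketch of how a bracketed expression like the one above could be read into a tree. It is written for Python 3 with plain `str`, whereas the class above targets Python 2 `unicode`. The names `tokenize` and `parse_sexpr` are hypothetical helpers, and the nested `(tag, children, word)` tuples are an assumed stand-in for `Node`, chosen only to keep the example short.

```python
import re

def tokenize(text):
    """Split an S-expression into '(', ')' and symbol tokens."""
    return re.findall(r"\(|\)|[^\s()]+", text)

def parse_sexpr(text):
    """Parse one S-expression into nested (tag, children, word) tuples.

    Hypothetical helper: leaves look like (tag, [], word) and inner
    nodes like (tag, [child, ...], ""). Unbalanced input is not handled
    gracefully here (it surfaces as an IndexError).
    """
    tokens = tokenize(text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        tag = tokens[pos]
        pos += 1
        children, word = [], ""
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())  # nested subtree
            else:
                word = tokens[pos]             # bare symbol: the leaf's word
                pos += 1
        pos += 1  # consume the closing ')'
        return (tag, children, word)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the first expression")
    return tree

print(parse_sexpr("(VP (ADVP (AD w3)) (VP (VA w4)))"))
```

Building `Node` objects instead of tuples would only change the `return` line of `parse_node`.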

I would like to write a method that matches a subtree against a bigger tree. The catch is that the subtree contains wildcards in place of some words, and I want the match to report which word each wildcard matched.

For example, given the subtree

    (VP
      (ADVP (AD X))
      (VP (VA Y)))

the operation that matches it against the tree above should return { X: w3, Y: w4 }.
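
To make the expected behaviour concrete, here is one possible sketch in Python 3, not a definitive implementation. It assumes trees are nested `(tag, children, word)` tuples rather than `Node` objects, that wildcard names are passed in explicitly as a set, and that a pattern node must have exactly as many children as the node it matches. The names `match_at`, `find_matches`, `leaf` and `inner` are hypothetical.

```python
def match_at(pattern, node, variables, bindings):
    """Match `pattern` rooted at `node`; return new bindings or None."""
    p_tag, p_children, p_word = pattern
    n_tag, n_children, n_word = node
    if p_tag != n_tag or len(p_children) != len(n_children):
        return None
    if p_word in variables:
        # A variable binds to the word; a reused variable must agree.
        if bindings.get(p_word, n_word) != n_word:
            return None
        bindings = {**bindings, p_word: n_word}
    elif p_word != n_word:
        return None
    for p_child, n_child in zip(p_children, n_children):
        bindings = match_at(p_child, n_child, variables, bindings)
        if bindings is None:
            return None
    return bindings

def find_matches(pattern, tree, variables):
    """Yield a bindings dict for every node of `tree` where `pattern` fits."""
    stack = [tree]
    while stack:
        node = stack.pop()
        found = match_at(pattern, node, variables, {})
        if found is not None:
            yield found
        stack.extend(reversed(node[1]))  # visit children left to right

def leaf(tag, word):
    return (tag, [], word)

def inner(tag, *children):
    return (tag, list(children), "")

# The tree and the pattern from the question, built by hand.
tree = inner("VP",
    inner("VP", leaf("VC", "w1"),
        inner("NP",
            inner("CP",
                inner("IP",
                    inner("NP", leaf("NN", "w2")),
                    inner("VP",
                        inner("ADVP", leaf("AD", "w3")),
                        inner("VP", leaf("VA", "w4")))),
                leaf("DEC", "w5")),
            inner("NP", leaf("NN", "w6")))),
    inner("ADVP", leaf("AD", "w7")))
pattern = inner("VP",
    inner("ADVP", leaf("AD", "X")),
    inner("VP", leaf("VA", "Y")))

print(list(find_matches(pattern, tree, {"X", "Y"})))
# prints [{'X': 'w3', 'Y': 'w4'}]
```

This tries the pattern at every node, so the worst case is roughly the tree size times the pattern size, which is usually acceptable for parse trees of single sentences.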

Can anyone recommend an efficient, simple solution?
