如何操作解析树?

发布于 2024-09-18 09:08:06 字数 1040 浏览 5 评论 0原文

我一直在研究自然语言解析树并以各种方式操作它们。我一直在使用斯坦福大学的 Tregex 和 Tsurgeon 工具,但代码很混乱,并且不太适合我主要使用 Python 的环境(这些工具是 Java,不适合调整)。我想要一个工具集,当我需要更多功能时可以轻松进行黑客攻击。是否有其他工具非常适合在树上进行模式匹配,然后操作这些匹配的分支?

例如,我想将以下树作为输入:

(ROOT
  (S
    (NP
      (NP (NNP Bank))
      (PP (IN of)
        (NP (NNP America))))
    (VP (VBD used)
      (S
        (VP (TO to)
          (VP (VB be)
            (VP (VBN called)
              (NP
                (NP (NNP Bank))
                (PP (IN of)
                  (NP (NNP Italy)))))))))))

并且(这是一个简化的示例):

  1. 查找带有标签 NP 的任何节点,该节点具有带有标签 NP 的第一个子节点和一些名为“Bank”的后代,以及一个第二个孩子的标签是PP。
  2. 如果匹配,则获取 PP 节点的所有子节点并将它们移动到匹配的 NP 子节点的末尾。

例如,将树的这一部分:

(NP
  (NP (NNP Bank))
  (PP (IN of)
    (NP (NNP America))))

并将其变成这样:

(NP
  (NP (NNP Bank) (IN of) (NP (NNP America))))

由于我的输入树是 S 表达式,因此我考虑过使用 Lisp(嵌入到我的 Python 程序中),但是已经太久了,我已经在其中编写了任何重要的内容Lisp,我什至不知道从哪里开始。

描述模式的好方法是什么?描述操作的好方法是什么?思考这个问题的好方法是什么?

I've been playing around with natural language parse trees and manipulating them in various ways. I've been using Stanford's Tregex and Tsurgeon tools but the code is a mess and doesn't fit in well with my mostly Python environment (those tools are Java and aren't ideal for tweaking). I'd like to have a toolset that would allow for easy hacking when I need more functionality. Are there any other tools that are well suited for doing pattern matching on trees and then manipulation of those matched branches?

For example, I'd like to take the following tree as input:

(ROOT
  (S
    (NP
      (NP (NNP Bank))
      (PP (IN of)
        (NP (NNP America))))
    (VP (VBD used)
      (S
        (VP (TO to)
          (VP (VB be)
            (VP (VBN called)
              (NP
                (NP (NNP Bank))
                (PP (IN of)
                  (NP (NNP Italy)))))))))))

and (this is a simplified example):

  1. Find any node with the label NP that has a first child with the label NP and some descendent named "Bank", and a second child with the label PP.
  2. If that matches, then take all of the children of the PP node and move them to end of the matched NP's children.

For example, take this part of the tree:

(NP
  (NP (NNP Bank))
  (PP (IN of)
    (NP (NNP America))))

and turn it into this:

(NP
  (NP (NNP Bank) (IN of) (NP (NNP America))))

Since my input trees are S-expressions I've considered using Lisp (embedded into my Python program) but it's been so long that I've written anything significant in Lisp that I have no idea where to even start.

What would be a good way to describe the patterns? What would be a good way to describe the manipulations? What's a good way to think about this problem?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

娜些时光,永不杰束 2024-09-25 09:08:06

情人眼里出西施。但你从来没有说过 Tregex 或 Turgeon 的代码是多么混乱。听起来更像是您无法处理 Java 或更高的抽象,因此您正在寻找用 Python 编写的具体内容。

手写树匹配和变换函数没有什么问题。事实上,我们以前一直这样做。但在前几百个之后,似乎必须有更好的方法,因此我们转而使用 Tregex 和 Tsurgeon 的特定领域语言。这通常被视为一种值得称赞的编程风格。请参阅维基百科。它们是具有精确语法规范等的明确指定的语言。这是使用它们的示例。

Tree t = Tree.valueOf("(ROOT (S (NP (NP (NNP Bank)) (PP (IN of) (NP (NNP America)))) (VP (VBD used) (S (VP (TO to) (VP (VB be) (VP (VBN called) (NP (NP (NNP Bank)) (PP (IN of) (NP (NNP Italy)))))))))))");
TregexPattern pat = TregexPattern.compile("NP <1 (NP << Bank) <2 PP=remove");
TsurgeonPattern surgery = Tsurgeon.parseOperation("excise remove remove");
Tsurgeon.processPattern(pat, surgery, t).pennPrint();

请注意,Java 代码实际上比 Lisp 代码,正是因为使用了特定于领域的语言。很难看出这怎么可以更简单:指定模式,指定操作,应用。

但是,如果您更喜欢手写方法来匹配树上的模式并将其更改为 Python 中的其他树,那么我们非常欢迎您这样做。

Beauty is in the eye of the beholder. But you never say how the code of Tregex or Tsurgeon is a mess. It sounds more like you can't deal with Java or greater abstraction and so you're looking for something concrete written in Python.

There's nothing wrong with hand-writing tree matching and transformation functions. Indeed, we used to do that all the time. But after the first couple of hundred, it seemed like there had to be a better way, and hence we moved to using the domain-specific languages of Tregex and Tsurgeon. This is generally seen as a laudable programming style. See in Wikipedia. They're well-specified languages with an exact syntax specification, etc. Here is your example using them.

Tree t = Tree.valueOf("(ROOT (S (NP (NP (NNP Bank)) (PP (IN of) (NP (NNP America)))) (VP (VBD used) (S (VP (TO to) (VP (VB be) (VP (VBN called) (NP (NP (NNP Bank)) (PP (IN of) (NP (NNP Italy)))))))))))");
TregexPattern pat = TregexPattern.compile("NP <1 (NP << Bank) <2 PP=remove");
TsurgeonPattern surgery = Tsurgeon.parseOperation("excise remove remove");
Tsurgeon.processPattern(pat, surgery, t).pennPrint();

Note that the Java code is actually shorter than the Lisp code, precisely because of use of the domain-specific language. It's hard to see how this could be simpler: specify pattern, specify operation, apply.

But if you'd prefer to be hand-writing methods that match patterns on trees and change them into other trees in Python, then you're most welcome to go off and do that.

苏佲洛 2024-09-25 09:08:06

这是使用 Lisp 的典型案例。您需要一个将另一个函数映射到树上的函数。

这是一个使用 Common Lisp 的程序匹配示例。 Lisp 中有一些匹配器可以处理列表结构,可以使用它们来代替。使用列表匹配器可以简化示例(有关使用模式匹配器的示例,请参阅我的其他答案)。

代码:

(defun node-children (node)
  (rest node))

(defun node-name (node)
  (second node))

(defun node-type (node)
  (first node))


(defun treemap (tree matcher transformer)
  (cond ((null tree) nil)
        ((consp tree)
         (if (funcall matcher tree)
             (funcall transformer tree)
           (cons (node-type tree)
                 (mapcar (lambda (child)
                           (treemap child matcher transformer))
                         (node-children tree)))))
        (t tree))))

示例:

(defvar *tree*
  '(ROOT
    (S
     (NP
      (NP (NNP Bank))
      (PP (IN of)
          (NP (NNP America))))
     (VP (VBD used)
         (S
          (VP (TO to)
              (VP (VB be)
                  (VP (VBN called)
                      (NP
                       (NP (NNP Bank))
                       (PP (IN of)
                           (NP (NNP Italy))))))))))))



(defun example ()
  (pprint
   (treemap *tree*
            (lambda (node)
              (and (= (length (node-children node)) 2)
                   (eq (node-type (first (node-children node))) 'np)
                   (some (lambda (node)
                           (eq (node-name node) 'bank))
                         (children (first (node-children node))))
                   (eq (first (second (node-children node))) 'pp)))
            (lambda (node)
              (list (node-type node)
                    (append (first (node-children node))
                            (node-children (second (node-children node)))))))))

运行示例:

CL-USER 75 > (example)

(ROOT
 (S
  (NP
   (NP (NNP BANK) (IN OF) (NP (NNP AMERICA))))
  (VP
   (VBD USED)
   (S
    (VP
     (TO TO)
     (VP
      (VB BE)
      (VP
       (VBN CALLED)
       (NP
        (NP
         (NNP BANK)
         (IN OF)
         (NP (NNP ITALY)))))))))))

This is a typical case of using Lisp. You would need a function that maps another function over the tree.

Here is a procedural matching example using Common Lisp. There are matchers in Lisp that work over list structures, which could be used instead. Using a list matcher would simplify the example (see my other answer for an example using a pattern matcher).

The code:

(defun node-children (node)
  (rest node))

(defun node-name (node)
  (second node))

(defun node-type (node)
  (first node))


(defun treemap (tree matcher transformer)
  (cond ((null tree) nil)
        ((consp tree)
         (if (funcall matcher tree)
             (funcall transformer tree)
           (cons (node-type tree)
                 (mapcar (lambda (child)
                           (treemap child matcher transformer))
                         (node-children tree)))))
        (t tree))))

The example:

(defvar *tree*
  '(ROOT
    (S
     (NP
      (NP (NNP Bank))
      (PP (IN of)
          (NP (NNP America))))
     (VP (VBD used)
         (S
          (VP (TO to)
              (VP (VB be)
                  (VP (VBN called)
                      (NP
                       (NP (NNP Bank))
                       (PP (IN of)
                           (NP (NNP Italy))))))))))))



(defun example ()
  (pprint
   (treemap *tree*
            (lambda (node)
              (and (= (length (node-children node)) 2)
                   (eq (node-type (first (node-children node))) 'np)
                   (some (lambda (node)
                           (eq (node-name node) 'bank))
                         (children (first (node-children node))))
                   (eq (first (second (node-children node))) 'pp)))
            (lambda (node)
              (list (node-type node)
                    (append (first (node-children node))
                            (node-children (second (node-children node)))))))))

Running the example:

CL-USER 75 > (example)

(ROOT
 (S
  (NP
   (NP (NNP BANK) (IN OF) (NP (NNP AMERICA))))
  (VP
   (VBD USED)
   (S
    (VP
     (TO TO)
     (VP
      (VB BE)
      (VP
       (VBN CALLED)
       (NP
        (NP
         (NNP BANK)
         (IN OF)
         (NP (NNP ITALY)))))))))))
深海蓝天 2024-09-25 09:08:06

这是 Common Lisp 的第二个版本。这次我使用模式匹配器

我正在使用一个与 Lisp 数据匹配模式的函数。 PMATCH:MATCH 是 Winston/Horn, Lisp, 3rd Edition 书中的模式匹配器的增强版本。有类似的模式匹配功能可用。

数据和我另一个答案里的一样。

树映射函数更改为使用模式匹配器。如果匹配成功,PMATCH:MATCH 函数将返回 T 或绑定的关联列表。如果匹配不成功则返回NIL。 PMATCH:INSTANTIATE-PATTERN 采用一个模式和一组绑定。它返回一个新的列表结构,其中模式变量被绑定替换。

(defun treemapp (tree pattern transformer)
  (cond ((null tree) nil)
        ((consp tree)
         (let ((bindings (pmatch:match pattern tree)))
           (if bindings
               (pmatch:instantiate-pattern transformer bindings)
             (cons (node-type tree)
                   (mapcar (lambda (child)
                             (treemapp child pattern transformer))
                           (node-children tree))))))
        (t tree)))

示例使用 now 模式。

该模式是一个列表结构。 #?symbol 匹配单个项目并创建符号绑定。 #$symbol 匹配项目列表并为符号创建绑定。

转换器是一种将通过绑定实例化的模式。

(defun example1 ()
  (pprint (treemapp *tree*
                    '(NP (NP (#?type bank)) (PP #$children))
                    '(NP (NP (#?type bank) #$children)))))

运行此代码会返回与我的其他答案相同的结果。

Here is a second version in Common Lisp. This time I'm using a pattern matcher.

I'm using a function that matches a pattern against Lisp data. PMATCH:MATCH is an enhanced version of a pattern matcher found in the book Winston/Horn, Lisp, 3rd Edition. There are similar pattern matching functions available.

The data is as in my other answer.

The tree mapping function is changed to use the pattern matcher. The PMATCH:MATCH function returns T or an assoc list of bindings if the match is successful. It returns NIL if the match is not successful. The PMATCH:INSTANTIATE-PATTERN takes a pattern and a set of bindings. It returns a new list structure, where the pattern variables are replaced with the bindings.

(defun treemapp (tree pattern transformer)
  (cond ((null tree) nil)
        ((consp tree)
         (let ((bindings (pmatch:match pattern tree)))
           (if bindings
               (pmatch:instantiate-pattern transformer bindings)
             (cons (node-type tree)
                   (mapcar (lambda (child)
                             (treemapp child pattern transformer))
                           (node-children tree))))))
        (t tree)))

The example uses now patterns.

The pattern is a list structure. #?symbol matches a single item and creates a binding for symbol. #$symbol matches a list of items and creates a binding for symbol.

The transformer is a pattern that will be instantiated with the bindings.

(defun example1 ()
  (pprint (treemapp *tree*
                    '(NP (NP (#?type bank)) (PP #$children))
                    '(NP (NP (#?type bank) #$children)))))

Running this code returns the same result as in my other answer.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文