Trying to use the HPSG PET parser

Posted 2024-09-27 19:25:20

I'm trying to use the PET Parser, but the given documentation for usage is insufficient. Can anyone point me to a good article or tutorial on using PET? Does it support UTF-8?

Comments (2)

深府石板幽径 2024-10-04 19:25:20

To use the PET parser, first you have to load a grammar for the language of interest. The grammar must be authored in the TDL language, as used in the DELPH-IN consortium (wiki here). Large, compatible grammars are available for several languages, including English, Japanese, and German. There are also smaller grammars available, and you can write your own.

For this--and for working with these grammars--your best bet is Ann Copestake's book, "Implementing Typed Feature Structure Grammars" (CSLI 2002). The book provides a thorough introduction to TDL and grammars such as these which function via the unification of typed feature structures. The grammars support bidirectional mapping between syntax (surface strings) and semantics ("meaning," represented according to Copestake's MRS--Minimal Recursion Semantics). Note that these are precision grammars, which means that they are generally less tolerant of ungrammatical inputs than statistical systems.

The English Resource Grammar (ERG) is a large grammar of English which has broad, general-domain coverage. It's open source and you can download it from the website. An online demo, powered by the PET parser, can be found here.

The PET parser runs in two steps. The first, called flop, produces a "compiled" version of the grammar. The second step is the actual parsing, which uses the cheap program. You will need to obtain these two PET binaries for your Linux machine, or build them yourself. This step may not be easy if you're not familiar with building software on Linux. PET does not run on Windows (or Mac, to my knowledge).

Running flop is easy. Just go to your /erg directory, and type:

$ flop english.tdl

This will produce the english.grm file. Now you can parse sentences by running cheap:

$ echo the child has the flu. | cheap --mrs english.grm

This example produces a single semantic representation of the sentence in MRS (Minimal Recursion Semantics) format:

 [ LTOP: h1
   INDEX: e2 [ e SF: PROP TENSE: PRES MOOD: INDICATIVE PROG: - PERF: - ]
   RELS: <
          [ _the_q_rel<-1:-1>
            LBL: h3
            ARG0: x6 [ x PERS: 3 NUM: SG IND: + ]
            RSTR: h5
            BODY: h4 ]
          [ "_child_n_1_rel"<-1:-1>
            LBL: h7
            ARG0: x6 ]
          [ "_have_v_1_rel"<-1:-1>
            LBL: h8
            ARG0: e2
            ARG1: x6
            ARG2: x9 [ x PERS: 3 NUM: SG ] ]
          [ _the_q_rel<-1:-1>
            LBL: h10
            ARG0: x9
            RSTR: h12
            BODY: h11 ]
          [ "_flu_n_1_rel"<-1:-1>
            LBL: h13
            ARG0: x9 ] >
   HCONS: < h5 qeq h7 h12 qeq h13 > ]
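
If you want to post-process this output in a script, a rough Python sketch like the following pulls the predicate names and the qeq handle constraints out of the printed text. It assumes the output looks exactly like the sample above; the names predicates and qeq_pairs are made up for illustration, and this is a regular-expression shortcut, not a full MRS reader.

```python
import re

# The MRS text printed by cheap for "the child has the flu."
SAMPLE_MRS = '''
 [ LTOP: h1
   INDEX: e2 [ e SF: PROP TENSE: PRES MOOD: INDICATIVE PROG: - PERF: - ]
   RELS: <
          [ _the_q_rel<-1:-1>
            LBL: h3
            ARG0: x6 [ x PERS: 3 NUM: SG IND: + ]
            RSTR: h5
            BODY: h4 ]
          [ "_child_n_1_rel"<-1:-1>
            LBL: h7
            ARG0: x6 ]
          [ "_have_v_1_rel"<-1:-1>
            LBL: h8
            ARG0: e2
            ARG1: x6
            ARG2: x9 [ x PERS: 3 NUM: SG ] ]
          [ _the_q_rel<-1:-1>
            LBL: h10
            ARG0: x9
            RSTR: h12
            BODY: h11 ]
          [ "_flu_n_1_rel"<-1:-1>
            LBL: h13
            ARG0: x9 ] >
   HCONS: < h5 qeq h7 h12 qeq h13 > ]
'''

# Header of one elementary predication: an opening bracket, an optionally
# quoted predicate name, then a character span such as <-1:-1>.
PRED_RE = re.compile(r'\[\s*"?([^\s"<\[\]]+)"?<-?\d+:-?\d+>')

# One handle constraint such as "h5 qeq h7".
QEQ_RE = re.compile(r'(h\d+)\s+qeq\s+(h\d+)')


def predicates(mrs_text):
    """Return the predicate names in the order they appear in RELS."""
    return PRED_RE.findall(mrs_text)


def qeq_pairs(mrs_text):
    """Return (hole, label) handle pairs from the HCONS list."""
    return QEQ_RE.findall(mrs_text)
```

Calling predicates(SAMPLE_MRS) lists the five relations, and qeq_pairs(SAMPLE_MRS) shows which RSTR handle is tied to which label (h5 to the child noun's h7, h12 to the flu noun's h13).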

Copestake's book explains the specific syntax and linguistic formalism used in grammars that are compatible with PET. It also serves as a user's manual for the open-source LKB system, which is a more interactive system that can also parse with these grammars. In addition to parsing, the LKB can do the reverse: generate sentences from MRS semantic representations. The LKB is currently only supported on Linux/Unix. There are actually a total of four DELPH-IN compliant grammar processing engines, including LKB and PET.

For Windows, there is agree, a multi-threaded parser/generator that I've developed for .NET. If you need to work with the grammars interactively, you might want to consider using the LKB or agree in addition to--or instead of--PET. The interactive client front-ends for agree are mostly WPF-based, but the engine and a simple console client can run on any Mono platform.

ACE is another open-source DELPH-IN compatible parsing and generation system which is designed for high performance, and it is available for Linux and MacOS.

The LKB is written in Lisp, whereas PET and ACE are C/C++, so the latter are the faster parsers for production use. agree is also much faster than the LKB, but only becomes faster than PET when parsing complex sentences, where overheads from agree's lock-free concurrency become amortized.

[11/25/2011 edit: agree now supports generation as well as parsing]

陪你搞怪i 2024-10-04 19:25:20

PET does support UTF-8, depending on how it was configured when compiled. In addition to the wiki page, take a look at the mailing list, or post a question there.
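
If you drive cheap from a script rather than the shell, you can make the encoding explicit. The Python sketch below is only an illustration under several assumptions: that cheap is on your PATH and was built with UTF-8 support, that english.grm has already been produced by flop, that cheap accepts the --mrs flag and reads sentences from standard input as in the echo example in the other answer, and that it takes one sentence per line. The function names are hypothetical.

```python
import subprocess


def cheap_command(grammar_path, mrs=True):
    """Build the argument list for cheap (flag taken from the echo example)."""
    cmd = ["cheap"]
    if mrs:
        cmd.append("--mrs")
    cmd.append(grammar_path)
    return cmd


def encode_sentences(sentences):
    """Join sentences one per line (an assumption) and encode them as UTF-8."""
    lines = []
    for sentence in sentences:
        sentence = sentence.strip()
        if "\n" in sentence:
            raise ValueError("each sentence must fit on a single line")
        lines.append(sentence)
    return ("\n".join(lines) + "\n").encode("utf-8")


def parse(sentences, grammar_path="english.grm"):
    """Pipe UTF-8 encoded sentences to cheap and return its decoded output."""
    result = subprocess.run(
        cheap_command(grammar_path),
        input=encode_sentences(sentences),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if cheap exits with an error
    )
    return result.stdout.decode("utf-8")


# Example call (requires a working PET installation and a compiled grammar):
#     print(parse(["the child has the flu."]))
```

Passing bytes instead of relying on the terminal locale means the script sends the same UTF-8 input whatever the shell settings are; whether cheap interprets it correctly still depends on how the binary was built.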

Several input methods exist; I would recommend FSC (XML) or YY (s-exp), as they are the most modern. I'm unaware of any short tutorials, but you could also look at Heart of Gold for a complete end-to-end NLP package, in which PET is a component.

Are you parsing with the ERG?
