How do I create a parser that tokenizes a list of words taken from a file?

Posted 2024-09-06 17:57:55

I am trying to write a syntax checker for text for my compilers class. The idea is: I have some rules that are inherent to the language (in my case, Portuguese), like "A valid phrase is SUBJECT VERB ADJECTIVE", as in "Ruby is great".

OK, so first I have to tokenize the input "Ruby is great". I have a text file "verbs" with a lot of verbs, one per line. Then I have a text file "adjectives", one "pronouns", etc.

I am trying to use Ragel to create a parser, but I don't know how I could do something like:

%%{
  machine test;
  subject = <open-the-subjects-file-and-accept-each-one-of-them>;
  verb = <open-the-verbs-file-and-accept-each-one-of-them>;
  adjective = <open-the-adjective-file-and-accept-each-one-of-them>;
  main = subject verb adjective @ { print "Valid phrase!" } ;
}%%

I looked at ANTLR, Lex/Yacc, Ragel, etc. But couldn't find one that seemed to solve this problem. The only way to do this that I could think of was to preprocess Ragel's input file, so that my program reads the file and writes its contents at the right place. But I don't like this solution either.

Does anyone know how I could do this? It's no problem if it isn't with Ragel; I just want to solve this problem. I would like to use Ruby or Python, but that's not really necessary either.

Thanks.

述情 2024-09-13 17:57:55

If you want to read the files at compile time, write each word list as a Ragel definition, with every word a quoted literal:

subject = 'ruby' |
          'python' |
          'c++';

then use Ragel's 'include' statement to pull it in (check the manual for the exact syntax).
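The compile-time route means turning each plain word list into a Ragel definition before Ragel runs, which is essentially the preprocessing step mentioned in the question. Here is a minimal sketch of that step in Python; the function name `ragel_definition` is made up for illustration, and it assumes the words should become single-quoted Ragel string literals joined with `|`.

```python
def ragel_definition(name, words):
    """Render a word list as a Ragel definition: name = 'w1' | 'w2';"""
    def quote(word):
        # Escape backslashes and single quotes so the literal stays valid.
        return "'" + word.replace("\\", "\\\\").replace("'", "\\'") + "'"

    # Drop blank lines and surrounding whitespace (e.g. trailing newlines).
    cleaned = [w.strip() for w in words if w.strip()]
    if not cleaned:
        raise ValueError("word list for %r is empty" % name)
    return "%s =\n    %s;\n" % (name, " |\n    ".join(quote(w) for w in cleaned))


if __name__ == "__main__":
    # In practice the list would come from open("subjects.txt").
    print(ragel_definition("subject", ["ruby\n", "python\n", "c++\n"]))
```

The output would still need to be placed inside a machine specification in the generated file before Ragel can include it.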

If you want to check the list of subjects at run time, maybe just make Ragel read 3 words and attach an action to each word. The action can read the file and look up whether the word is valid at run time.

The action reads the text file and compares the word against its contents (the action bodies below are Python-style pseudocode):

%%{
machine test;

action startWord {
    lastWordStart = p;
}
action checkSubject {
    # Runs on the whitespace after the word, so p is one past the word's end.
    word = input[lastWordStart:p]
    # Strip the trailing newline from each line before comparing.
    if word not in [line.strip() for line in open('subjects.txt')]:
        # Or do whatever Ragel does to go to an error state.
        raise Exception("Invalid subject '%s'" % word)
    fgoto verb;
}
action checkVerb { .. exercise for reader .. ;) }
action checkAdjective { .. put adjective checking code here .. }

# The first two words are checked on the whitespace that ends them; the last
# word is checked on leaving, which at end of input requires eof to be set.
subject = space* . (alnum+) >startWord . space @checkSubject;
verb := space* . (alnum+) >startWord . space @checkVerb;
adjective := space* . (alnum+) >startWord %checkAdjective;
main := subject;
}%%
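Since the question says the solution does not have to use Ragel, the same run-time lookup can also be sketched in plain Python. This is only an illustration under stated assumptions: the names `load_words` and `check_phrase` are hypothetical, a phrase is taken to be exactly three whitespace-separated words, and matching is case-insensitive.

```python
def load_words(lines):
    """Build a lowercase set from an iterable of lines, e.g. an open file."""
    return {line.strip().lower() for line in lines if line.strip()}


def check_phrase(phrase, subjects, verbs, adjectives):
    """Return None if phrase is SUBJECT VERB ADJECTIVE, else an error message."""
    words = phrase.lower().split()
    if len(words) != 3:
        return "expected 3 words, got %d" % len(words)
    roles = ("subject", "verb", "adjective")
    for role, word, allowed in zip(roles, words, (subjects, verbs, adjectives)):
        if word not in allowed:
            return "invalid %s '%s'" % (role, word)
    return None


if __name__ == "__main__":
    # Lists stand in for files; with real files, pass open("subjects.txt") etc.
    subjects = load_words(["ruby\n", "python\n"])
    verbs = load_words(["is\n", "was\n"])
    adjectives = load_words(["great\n", "slow\n"])
    print(check_phrase("Ruby is great", subjects, verbs, adjectives))
    print(check_phrase("Ruby great is", subjects, verbs, adjectives))
```

Loading each list into a set once avoids re-reading the file for every word, which the per-action file read above would otherwise do.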
