OCaml lex: doesn't work at all, whatsoever

Posted on 2024-10-27 09:04:43

I am at the end of my rope here. I cannot get anything to work in ocamllex, and it is driving me nuts. This is my .mll file:

{

open Parser

}

rule next = parse
  | (['a'-'z'] ['a'-'z']*) as id { Identifier id }
  | '=' { EqualsSign }
  | ';' { Semicolon }
  | '\n' | ' ' { next lexbuf }
  | eof { EOF }

Here are the contents of the file I pass in as input:

a=b;

Yet, when I compile and run the thing, I get an error on the very first character, saying it's not valid. I honestly have no idea what's going on, and Google has not helped me at all. How can this even be possible? As you can see, I'm really stumped here.

EDIT:

I was working for so long that I gave up on the parser. Now this is the relevant code in my main file:

let parse_file filename =
  let l = Lexing.from_channel (open_in filename) in
    try
      Lexer.next l; ()
    with
      | Failure msg ->
        printf "line: %d, col: %d\n" l.lex_curr_p.pos_lnum l.lex_curr_p.pos_cnum

Prints out "line: 1, col: 1".

Comments (4)

简单爱 2024-11-03 09:04:43

Without the corresponding ocamlyacc parser, nobody will be able to find the issue with your code since your lexer works perfectly fine!

I have taken the liberty of writing the following tiny parser (parser.mly) that constructs a list of identifier pairs, e.g. input "a=b;" should give the singleton list [("a", "b")].

%{%}

%token <string> Identifier
%token EqualsSign
%token Semicolon
%token EOF

%start start
%type <(string * string) list> start

%%

start:
| EOF {[]}
| Identifier EqualsSign Identifier Semicolon start {($1, $3) :: $5}
;

%%

To test whether the parser does what I promised, we create another file (main.ml) that parses the string "a=b;" and prints the result.

let print_list = List.iter (fun (a, b) -> Printf.printf "%s = %s;\n" a b)
let () = print_list (Parser.start Lexer.next (Lexing.from_string "a=b;"))

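The code should compile without any complaints (e.g. with ocamlbuild main.byte), and the program should print "a = b;" (the format string in print_list puts spaces around the equals sign), confirming that the input "a=b;" was lexed and parsed as promised.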


In response to the latest edit:

In general, I don't believe that catching standard library exceptions that are meant to indicate failure or misuse (like Invalid_argument or Failure) is a good idea. The reason is that they are used ubiquitously throughout the library such that you usually cannot tell which function raised the exception and why it did so.

Furthermore, you are throwing away the only useful information: the error message! The error message should tell you what the source of the problem is (my best guess is an IO-related issue). Thus, you should either print the error message or let the exception propagate to the toplevel. Personally, I prefer the latter option.
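A minimal sketch of the first option (printing the message instead of discarding it). A lexer generated by ocamllex raises Failure "lexing: empty token" when no rule matches. To keep the example self-contained, a hypothetical stand-in function next_token plays the role of Lexer.next and fails the same way; describe is likewise an illustrative name, not part of the original code.

```ocaml
(* Hypothetical stand-in for the generated Lexer.next: it fails the way an
   ocamllex lexer does when no rule matches the input. *)
let next_token (input : string) : string =
  if input = "" then failwith "lexing: empty token" else input

(* Keep the message carried by Failure instead of throwing it away. *)
let describe (input : string) : string =
  match next_token input with
  | tok -> "token: " ^ tok
  | exception Failure msg -> "lexer failed: " ^ msg

let () = print_endline (describe "")
```

In the real main file, the same idea would amount to printing msg in the existing Failure handler, or dropping the handler so the exception reaches the toplevel.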

However, you probably still want to deal with syntactically ill-formed inputs in a graceful manner. For this, you can define a new exception in the lexer and add a default case that catches invalid tokens.

{
  exception Unexpected_token
}
...
| _ {raise Unexpected_token}

Now you can catch the newly defined exception in your main file and, unlike before, the exception is specific to syntactically invalid inputs. Consequently, you know both the source and the cause of the exception, which gives you the chance to do something far more meaningful than before.
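One way such a handler might look, as a sketch under assumptions: the exception here is a hypothetical variant, Unexpected_char, that carries the offending character and a Lexing.position (the Unexpected_token above carries no payload). The helper names column and report are illustrative. The column is computed as pos_cnum - pos_bol, because pos_cnum alone is the absolute character offset from the start of the input rather than a column.

```ocaml
(* Hypothetical exception carrying the offending character and its position. *)
exception Unexpected_char of char * Lexing.position

(* Column within the current line: absolute offset minus the offset of the
   beginning of the line. *)
let column (p : Lexing.position) : int =
  p.Lexing.pos_cnum - p.Lexing.pos_bol

(* Run a lexing action and turn a lexical error into a readable message. *)
let report (action : unit -> unit) : string =
  match action () with
  | () -> "ok"
  | exception Unexpected_char (c, p) ->
    Printf.sprintf "line %d, col %d: unexpected character %C"
      p.Lexing.pos_lnum (column p) c

let () =
  (* A hand-built position, standing in for lexbuf.lex_curr_p. *)
  let pos =
    { Lexing.pos_fname = "input.txt"; pos_lnum = 1; pos_bol = 0; pos_cnum = 3 }
  in
  print_endline (report (fun () -> raise (Unexpected_char ('?', pos))))
```

In an actual .mll file, the default case would raise the exception with the matched character and the lexbuf's current position; that wiring is omitted here because it depends on the generated lexer.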

A fairly random OCaml development hint: If you compile the program with debug information enabled, setting the environment variable OCAMLRUNPARAM to "b" (e.g. export OCAMLRUNPARAM=b) enables stack traces for uncaught exceptions!
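The same switch can also be flipped from inside the program with Printexc.record_backtrace, which may be handy when setting an environment variable is inconvenient. The sketch below uses a hypothetical function risky as the failing call; the backtrace text may be empty unless the program was compiled with debug information (-g).

```ocaml
(* Turn on backtrace recording programmatically, like OCAMLRUNPARAM=b. *)
let () = Printexc.record_backtrace true

(* Hypothetical failing call, standing in for the lexer invocation. *)
let risky () : unit = failwith "lexing: empty token"

let () =
  try risky ()
  with e ->
    (* Print the exception itself, then whatever backtrace was recorded. *)
    prerr_endline ("exception: " ^ Printexc.to_string e);
    prerr_string (Printexc.get_backtrace ())
```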

美羊羊 2024-11-03 09:04:43

By the way, ocamllex also supports the + operator for 'one or more' in regular expressions, so this

['a'-'z']+

is equivalent to your

['a'-'z']['a'-'z']*

千里故人稀 2024-11-03 09:04:43

I was just struggling with the same thing (which is how I found this question), only to finally realize that I had mistakenly specified the path to the input file as Sys.argv.(0) instead of Sys.argv.(1)! LOLs

I really hope it helps! :)
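A small sketch of how that mistake can be guarded against. Sys.argv.(0) is the program itself, so opening it feeds the executable to the lexer. The helper name input_path and the usage message are made up for illustration.

```ocaml
(* Pick the input path from an argv-style array. Index 0 is the program
   itself, so the first real argument is at index 1. *)
let input_path (argv : string array) : string option =
  if Array.length argv > 1 then Some argv.(1) else None

let () =
  match input_path Sys.argv with
  | Some path -> Printf.printf "would lex: %s\n" path
  | None -> prerr_endline "usage: main <input-file>"
```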

自由如风 2024-11-03 09:04:43

It looks like you have a space in the regular expression for identifiers. This could keep the lexer from recognizing a=b, although it should still recognize a = b ;
