解析器/词法分析器忽略不完整的语法规则
我有一个用 ocamlyacc 和 ocamllex 编写的解析器和词法分析器。如果要解析的文件提前结束,例如我忘记了行尾的分号,则应用程序不会引发语法错误。我意识到这是因为我正在引发和捕获 EOF,这使得词法分析器忽略未完成的规则,但是我应该如何执行此操作来引发语法错误?
这是我当前的解析器(简化)
%{
let parse_error s = Printf.ksprinf failwith "ERROR: %s" s
%}
%token COLON
%token SEPARATOR
%token SEMICOLON
%token <string> FLOAT
%token <string> INT
%token <string> LABEL
%type <Conf.config> command
%start command
%%
command:
| label SEPARATOR data SEMICOLON { Conf.Pair ($1,$3) }
| label SEPARATOR data_list { Conf.List ($1,$3) }
| label SEMICOLON { Conf.Single ($1) }
label :
| LABEL { Conf.Label $1 }
data :
| label { $1 }
| INT { Conf.Integer $1 }
| FLOAT { Conf.Float $1 }
data_list :
| star_data COMMA star_data data_list_ending
{ $1 :: $3 :: $4 }
data_list_ending:
| COMMA star_data data_list_ending { $2 :: $3 }
| SEMICOLON { [] }
和lexxer(简化),
{
open ConfParser
exception Eof
}
rule token = parse
| ['\t' ' ' '\n' '\010' '\013' '\012']
{ token lexbuf }
| ['0'-'9']+ ['.'] ['0'-'9']* ('e' ['-' '+']? ['0'-'9']+)? as n
{ FLOAT n }
| ['0'-'9']+ as n { INT n }
| '#' { comment lexbuf }
| ';' { SEMICOLON }
| ['=' ':'] { SEPARATOR }
| ',' { COMMA }
| ['_' 'a'-'z' 'A'-'Z']([' ']?['a'-'z' 'A'-'Z' '0'-'9' '_' '-' '.'])* as w
{ LABEL w }
| eof { raise Eof }
and comment = parse
| ['#' '\n'] { token lexbuf }
| _ { comment lexbuf }
示例输入文件,
one = two, three, one-hundred;
single label;
list : command, missing, a, semicolon
一种解决方案是在命令规则中对其自身添加递归调用,并添加一个空规则,所有这些都构建一个列表返回主程序。我想我可能将 Eof 解释为期望和结束条件,而不是词法分析器中的错误,这是正确的吗?
I have a parser and lexer written in ocamlyacc and ocamllex. If the file to parse ends prematurely, as in I forget a semicolon at the end of a line, the application doesn't raise a syntax error. I realize it's because I'm raising and catching EOF and that is making the lexer ignore the unfinished rule, but how should I be doing this to raise a syntax error?
Here is my current parser (simplified),
%{
let parse_error s = Printf.ksprinf failwith "ERROR: %s" s
%}
%token COLON
%token SEPARATOR
%token SEMICOLON
%token <string> FLOAT
%token <string> INT
%token <string> LABEL
%type <Conf.config> command
%start command
%%
command:
| label SEPARATOR data SEMICOLON { Conf.Pair ($1,$3) }
| label SEPARATOR data_list { Conf.List ($1,$3) }
| label SEMICOLON { Conf.Single ($1) }
label :
| LABEL { Conf.Label $1 }
data :
| label { $1 }
| INT { Conf.Integer $1 }
| FLOAT { Conf.Float $1 }
data_list :
| star_data COMMA star_data data_list_ending
{ $1 :: $3 :: $4 }
data_list_ending:
| COMMA star_data data_list_ending { $2 :: $3 }
| SEMICOLON { [] }
and lexxer (simplified),
{
open ConfParser
exception Eof
}
rule token = parse
| ['\t' ' ' '\n' '\010' '\013' '\012']
{ token lexbuf }
| ['0'-'9']+ ['.'] ['0'-'9']* ('e' ['-' '+']? ['0'-'9']+)? as n
{ FLOAT n }
| ['0'-'9']+ as n { INT n }
| '#' { comment lexbuf }
| ';' { SEMICOLON }
| ['=' ':'] { SEPARATOR }
| ',' { COMMA }
| ['_' 'a'-'z' 'A'-'Z']([' ']?['a'-'z' 'A'-'Z' '0'-'9' '_' '-' '.'])* as w
{ LABEL w }
| eof { raise Eof }
and comment = parse
| ['#' '\n'] { token lexbuf }
| _ { comment lexbuf }
example input file,
one = two, three, one-hundred;
single label;
list : command, missing, a, semicolon
One solution, is to add a recursive call in the command rule to itself at the end, and adding an empty rule, all of which build a list to return to the main program. I think I maybe interpreting Eof as a expectation, and ending condition, rather then an error in the lexer, is this correct?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
ocamlyacc
不一定消耗整个输入。如果您想在整个输入不可解析的情况下强制它失败,则需要在语法中匹配EOF
。不要在词法分析器中引发Eof
,而是添加标记EOF
并将start
符号更改为ocamlyacc
does not necessarily consume the whole input. If you want to force it to fail if the whole input is not parse-able, you need to matchEOF
in your grammar. Instead of raisingEof
in you lexer, add a tokenEOF
and change yourstart
symbol to