解析器/词法分析器忽略不完整的语法规则

发布于 2024-08-06 00:47:21 字数 2259 浏览 11 评论 0原文

我有一个用 ocamlyacc 和 ocamllex 编写的解析器和词法分析器。如果要解析的文件提前结束，例如我忘记了行尾的分号，则应用程序不会引发语法错误。我意识到这是因为我正在引发和捕获 EOF，这使得词法分析器忽略未完成的规则，但是我应该如何执行此操作来引发语法错误？

这是我当前的解析器（简化）

%{
    let parse_error s = Printf.ksprinf failwith "ERROR: %s" s
%}

%token COLON
%token SEPARATOR
%token SEMICOLON
%token <string> FLOAT
%token <string> INT
%token <string> LABEL

%type <Conf.config> command
%start command
%%
  command:
      | label SEPARATOR data SEMICOLON    { Conf.Pair ($1,$3)     }
      | label SEPARATOR data_list         { Conf.List ($1,$3)     }
      | label SEMICOLON                   { Conf.Single ($1)      }
  label :
      | LABEL                             { Conf.Label $1         }
  data :
      | label                             { $1                    }
      | INT                               { Conf.Integer $1       }
      | FLOAT                             { Conf.Float $1         }
  data_list :
      | star_data COMMA star_data data_list_ending
                                          { $1 :: $3 :: $4        }
  data_list_ending:
      | COMMA star_data data_list_ending  { $2 :: $3              }
      | SEMICOLON                         { []                    }

和lexxer（简化），

{
    open ConfParser
    exception Eof
}

rule token = parse
    | ['\t' ' ' '\n' '\010' '\013' '\012']
                        { token lexbuf   }
    | ['0'-'9']+ ['.'] ['0'-'9']* ('e' ['-' '+']? ['0'-'9']+)? as n
                        { FLOAT n        }
    | ['0'-'9']+ as n   { INT n          }
    | '#'               { comment lexbuf }
    | ';'               { SEMICOLON      }
    | ['=' ':']         { SEPARATOR      }
    | ','               { COMMA          }
    | ['_' 'a'-'z' 'A'-'Z']([' ']?['a'-'z' 'A'-'Z' '0'-'9' '_' '-' '.'])* as w
                        { LABEL w        }
    | eof               { raise Eof      }

and comment = parse
    | ['#' '\n']        { token lexbuf   }
    | _                 { comment lexbuf }

示例输入文件，

one = two, three, one-hundred;
single label;
list : command, missing, a, semicolon

一种解决方案是在命令规则中对其自身添加递归调用，并添加一个空规则，所有这些都构建一个列表返回主程序。我想我可能将 Eof 解释为期望和结束条件，而不是词法分析器中的错误，这是正确的吗？

原文

I have a parser and lexer written in ocamlyacc and ocamllex. If the file to parse ends prematurely, as in I forget a semicolon at the end of a line, the application doesn't raise a syntax error. I realize it's because I'm raising and catching EOF and that is making the lexer ignore the unfinished rule, but how should I be doing this to raise a syntax error?

Here is my current parser (simplified),

%{
    let parse_error s = Printf.ksprinf failwith "ERROR: %s" s
%}

%token COLON
%token SEPARATOR
%token SEMICOLON
%token <string> FLOAT
%token <string> INT
%token <string> LABEL

%type <Conf.config> command
%start command
%%
  command:
      | label SEPARATOR data SEMICOLON    { Conf.Pair ($1,$3)     }
      | label SEPARATOR data_list         { Conf.List ($1,$3)     }
      | label SEMICOLON                   { Conf.Single ($1)      }
  label :
      | LABEL                             { Conf.Label $1         }
  data :
      | label                             { $1                    }
      | INT                               { Conf.Integer $1       }
      | FLOAT                             { Conf.Float $1         }
  data_list :
      | star_data COMMA star_data data_list_ending
                                          { $1 :: $3 :: $4        }
  data_list_ending:
      | COMMA star_data data_list_ending  { $2 :: $3              }
      | SEMICOLON                         { []                    }

and lexxer (simplified),

{
    open ConfParser
    exception Eof
}

rule token = parse
    | ['\t' ' ' '\n' '\010' '\013' '\012']
                        { token lexbuf   }
    | ['0'-'9']+ ['.'] ['0'-'9']* ('e' ['-' '+']? ['0'-'9']+)? as n
                        { FLOAT n        }
    | ['0'-'9']+ as n   { INT n          }
    | '#'               { comment lexbuf }
    | ';'               { SEMICOLON      }
    | ['=' ':']         { SEPARATOR      }
    | ','               { COMMA          }
    | ['_' 'a'-'z' 'A'-'Z']([' ']?['a'-'z' 'A'-'Z' '0'-'9' '_' '-' '.'])* as w
                        { LABEL w        }
    | eof               { raise Eof      }

and comment = parse
    | ['#' '\n']        { token lexbuf   }
    | _                 { comment lexbuf }

example input file,

one = two, three, one-hundred;
single label;
list : command, missing, a, semicolon

One solution, is to add a recursive call in the command rule to itself at the end, and adding an empty rule, all of which build a list to return to the main program. I think I maybe interpreting Eof as a expectation, and ending condition, rather then an error in the lexer, is this correct?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

沉溺在你眼里的海 2024-08-13 00:47:21

ocamlyacc 不一定消耗整个输入。如果您想在整个输入不可解析的情况下强制它失败，则需要在语法中匹配 EOF 。不要在词法分析器中引发 Eof，而是添加标记 EOF 并将 start 符号更改为

%type <Conf.config list> main

main:
    EOF { [] }
  | command main { $1::$2 }

ocamlyacc does not necessarily consume the whole input. If you want to force it to fail if the whole input is not parse-able, you need to match EOF in your grammar. Instead of raising Eof in you lexer, add a token EOF and change your start symbol to

%type <Conf.config list> main

main:
    EOF { [] }
  | command main { $1::$2 }

回复收藏 0 原文

~没有更多了~

关于作者

月野兔

暂无简介

文章

26 人气

关注发私信

友情链接

文江博客

解析器/词法分析器忽略不完整的语法规则

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

关于作者

相关话题

热门标签

推荐作者

知足的幸福

我一向站在原地

慕烟庭风

秉忠贞之诚守退让之实

小兔几

mb_3y7WUgWY

友情链接

解析器/词法分析器忽略不完整的语法规则

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

关于作者

相关话题

热门标签

推荐作者

知足的幸福

我一向站在原地

慕烟庭风

秉忠贞之诚 守退让之实

小兔几

mb_3y7WUgWY

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

秉忠贞之诚守退让之实