Lexing newlines in Scala's StdLexical?

Posted on 2024-08-29 00:25:02

I'm trying to lex (and then parse) a C-like language. In C there are preprocessor directives, where line breaks are significant, and then the actual code, where line breaks are just whitespace.

One way of doing this would be a two-pass process, as in early C compilers: have a separate preprocessor for the # directives, then lex its output.

However, I wondered whether it is possible to do it in a single lexer. I'm pretty happy writing the Scala parser-combinator code, but I'm not so sure how StdLexical handles whitespace.

Could someone write some simple sample code that could lex, say, a #include line (using the newline) and some trivial code (ignoring the newline)? Or is this not possible, and is it better to go with the two-pass approach?
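For comparison, the two-pass idea mentioned above could be sketched roughly as follows. This is only an illustration under simplifying assumptions: `TwoPass`, `Split` and `isDirective` are hypothetical names, a directive is taken to be any line whose first non-blank character is `#`, and real preprocessing concerns such as line continuations and comments are ignored.

```scala
object TwoPass {
  // Hypothetical result type: the directive lines, plus the remaining source
  // with each directive line blanked out so line numbers stay stable.
  final case class Split(directives: List[String], code: String)

  // Assumption: a directive is a line whose first non-blank character is '#'.
  private def isDirective(line: String): Boolean =
    line.trim.startsWith("#")

  def split(src: String): Split = {
    val lines = src.linesIterator.toList
    val directives = lines.filter(isDirective).map(_.trim)
    val code = lines.map(l => if (isDirective(l)) "" else l).mkString("\n")
    Split(directives, code)
  }
}
```

The `code` part can then go to an ordinary lexer that treats every newline as whitespace, while the directive lines are handled separately.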

Comments (1)

放下 2024-09-05 00:25:02

OK, solved this myself, answer here for posterity.

StdLexical already lets you specify whitespace in your lexer. All you have to do is override the token method appropriately. Here is some sample code (with irrelevant bits removed):

override def token: CeeLexer.Parser[Token] = controlLine
  // | ... (where ... is whatever you want to keep of the original method)
def controlLine = hashInclude

// Matches #include "file" or #include <file>. The trailing '\n' is consumed as
// part of the token, so the line break is significant here and nowhere else.
def hashInclude: CeeLexer.Parser[HashInclude] =
  ('#' ~ word("include") ~ rep(nonEolws) ~ '\"' ~ rep(chrExcept('\"', '\n', EofCh)) ~ '\"' ~ '\n' |
   '#' ~ word("include") ~ rep(nonEolws) ~ '<' ~ rep(chrExcept('>', '\n', EofCh)) ~ '>' ~ '\n') ^^ {
    case hash ~ include ~ whs ~ openQ ~ fname ~ closeQ ~ eol => // code to handle #include
  }
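
For anyone who wants something to compile, here is a minimal self-contained sketch of the same idea. It is not the original code: the `HashInclude` fields, the `nonEolWs` helper, the `lexAll` driver, and the small delimiter and keyword sets are all assumptions made for illustration, and it needs the scala-parser-combinators library on the classpath. It also only recognises `\n` line endings and requires the directive line to end with a newline.

```scala
import scala.util.parsing.combinator.lexical.StdLexical
import scala.util.parsing.input.CharArrayReader.EofCh

object CeeLexer extends StdLexical {
  // Hypothetical token type for this sketch (not from the original answer).
  case class HashInclude(file: String, system: Boolean) extends Token {
    def chars: String = if (system) "<" + file + ">" else "\"" + file + "\""
  }

  // A tiny, assumed set of delimiters and keywords for the "ordinary code".
  delimiters ++= List("(", ")", "{", "}", ";")
  reserved ++= List("int", "return")

  // Whitespace allowed inside a directive: space or tab, never a newline.
  def nonEolWs: Parser[Char] =
    elem("space or tab", c => c == ' ' || c == '\t')

  // #include "file" or #include <file>, with the terminating '\n' consumed
  // as part of the token so the line break is significant here.
  def hashInclude: Parser[Token] =
    '#' ~> rep(nonEolWs) ~> accept("include".toList) ~> rep(nonEolWs) ~> (
      '"' ~> rep(chrExcept('"', '\n', EofCh)) <~ '"' ^^
        (cs => HashInclude(cs.mkString, system = false))
    | '<' ~> rep(chrExcept('>', '\n', EofCh)) <~ '>' ^^
        (cs => HashInclude(cs.mkString, system = true))
    ) <~ rep(nonEolWs) <~ '\n'

  // Try the directive first, then fall back to StdLexical's usual tokens.
  // Everywhere else newlines are skipped by the inherited `whitespace` parser.
  override def token: Parser[Token] = hashInclude | super.token

  // Small driver (an assumption of this sketch): collect all tokens.
  def lexAll(src: String): List[Token] = {
    def loop(s: Scanner): List[Token] =
      if (s.atEnd) Nil else s.first :: loop(s.rest)
    loop(new Scanner(src))
  }
}
```

Calling `CeeLexer.lexAll("#include <stdio.h>\nint\nx ;\n")` should give a `HashInclude` token followed by the ordinary keyword, identifier and delimiter tokens, with the newline after `int` treated as plain whitespace. Because the scanner skips whitespace before every token, this sketch also accepts a directive that does not start at the beginning of a line; enforcing that would need extra bookkeeping.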