用 C 语言构建词法分析器

发布于 2024-07-24 15:44:12 字数 199 浏览 14 评论 0原文

我想用 C 语言构建一个词法分析器,并且我正在关注 dragon book,我可以理解状态过渡,但如何实施?

有更好的书吗?

事实上,我必须通过多个状态来解析字符串,以便我可以判断该字符串是否可接受!

I want to build a lexer in C and I am following the dragon book, I can understand the state transitions but how to implement them?

Is there a better book?

The fact that I have to parse a string through a number of states so that I can tell whether the string is acceptable or not!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

远昼 2024-07-31 15:44:13

如果您正在寻找比龙书更现代的治疗方法:Andrew W. Appel 和 Maia Ginsburg,《现代》C 语言编译器实现,剑桥大学出版社,2008 年。

第 2 章重点介绍词法分析:词法标记、正则表达式、有限自动机; 非确定性有限自动机; 词法分析器生成器

查看目录

If you're looking for a more modern treatment than the dragon book(s) : Andrew W. Appel and Maia Ginsburg, Modern Compiler Implementation in C, Cambridge University Press, 2008.

Chapter 2 is focused on Lexical Analysis : Lexical tokens, Regular expressions, Finite automata; Nondeterministic Finite Automata; Lexical analyzer generators

Look at the Table of Contents

執念 2024-07-31 15:44:13

程序 flex(lex 的克隆)将为您创建一个词法分析器。

给定一个带有词法分析器规则的输入文件,它将生成一个 C 文件,其中包含这些规则的词法分析器的实现。

因此,您可以检查 flex 的输出,了解如何用 C 编写词法分析器。也就是说,如果您不仅仅想使用 flex 的词法分析器...

The program flex (a clone of lex) will create a lexer for you.

Given an input file with the lexer rules, it will produce a C file with an implementation of a lexer for those rules.

You can thus check the output of flex for how to write a lexer in C. That is, if you don't just want to use flex's lexer...

送君千里 2024-07-31 15:44:12

您可以使用单个状态变量实现简单的状态转换,例如,如果您想循环状态 start->part1->part2->end 那么您可以使用枚举来跟踪当前状态并使用您想要在每个状态下运行的代码的 switch 语句。

enum state { start=1, part1, part2, end} mystate;

// ...
mystate = start;
do {
  switch (mystate) {
    case start:
      // ...
    case part1:
      // ...
    case part2:
      // ...
      if (part2_end_condition) mystate = end; // state++ will also work
      // Note you could also set the state back to part1 on some condition here
      // which creates a loop
      break;
  }
} while (mystate != end);

对于依赖于多个变量的更复杂的状态转换,您应该使用如下表/数组:

var1    var2    var_end    next_state
0       0       0          state1
0       1       0          state2
1       0       0          state3
1       1       0          state4
-1      -1      1          state_end // -1 represents "doesn't matter" here

You can implement simple state transitions with a single state variable, for example if you want to cycle through the states start->part1->part2->end then you can use an enum to keep track of the current state and use a switch statement for the code you want to run in each state.

enum state { start=1, part1, part2, end} mystate;

// ...
mystate = start;
do {
  switch (mystate) {
    case start:
      // ...
    case part1:
      // ...
    case part2:
      // ...
      if (part2_end_condition) mystate = end; // state++ will also work
      // Note you could also set the state back to part1 on some condition here
      // which creates a loop
      break;
  }
} while (mystate != end);

For more complex state transitions that depend on several variables, you should use tables/arrays like this:

var1    var2    var_end    next_state
0       0       0          state1
0       1       0          state2
1       0       0          state3
1       1       0          state4
-1      -1      1          state_end // -1 represents "doesn't matter" here
メ斷腸人バ 2024-07-31 15:44:12

天啊,

假设您指的是有关编译器设计的《The Dragon》一书,我建议您浏览一下此页面 关于编译工具。

该页面本身很小,但提供了有关词法分析器的各种优秀资源的链接。

HTH

欢呼,

G'day,

Assuming you mean The Dragon book on compiler design, I'd recommend having a look around this page on compiler tools.

The page itself is quite small but has links through to various excellent resources on lexical analysers.

HTH

cheers,

把人绕傻吧 2024-07-31 15:44:12

有不止一种方法可以做到这一点。 每个正则表达式都直接对应于一个简单的结构化程序。 例如,数字的表达式可能是这样的:

// regular expression
digit* [.digit*]

相应的 C 代码将是:

// corresponding code
while(DIGIT(*pc)) pc++;
if (*pc=='.'){
    pc++;
    while(DIGIT(*pc)) pc++;
}

在我看来,构建词法分析器的转换表方式不必要地复杂,并且显然运行速度较慢。

There's more than one way to do it. Every regular expression corresponds directly to a simple structured program. For example, an expression for numbers could be this:

// regular expression
digit* [.digit*]

and the corresponding C code would be:

// corresponding code
while(DIGIT(*pc)) pc++;
if (*pc=='.'){
    pc++;
    while(DIGIT(*pc)) pc++;
}

The transition-table way of building lexers is, in my opinion, needlessly complicated, and obviously runs slower.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文