PLY: Token shifting problem in C parser

Posted 2024-07-05 13:35:00

I'm writing a C parser using PLY, and recently ran into a problem.
This code:

typedef int my_type;
my_type x;

is valid C, because my_type is defined as a type before it is used as one.
I handle this by filling a type symbol table in the parser; the lexer
consults that table to distinguish type names from plain identifiers.

However, although the type declaration rule ends with SEMI (the ';' token), PLY shifts the token my_type from the second line before deciding that it is done with the first one. As a result, I have no chance to pass the type symbol table update to the lexer in time, and the lexer sees my_type as an identifier rather than a type.
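
The timing problem can be reproduced without PLY. The following is a minimal sketch, assuming a lazy lexer that classifies each identifier at the moment the parser pulls it; TYPE_NAMES, tokens and lookahead_after_first_declaration are hypothetical names made up for this illustration and are not part of PLY or pycparser.

```python
import re

# Hypothetical type symbol table shared by the "parser" and the lexer.
TYPE_NAMES = set()


def tokens(source):
    """Lazily yield (kind, text) pairs.

    An identifier is classified as TYPEID or ID when it is pulled from
    the generator, not when the source string is handed over.
    """
    for text in re.findall(r"[A-Za-z_]\w*|;", source):
        if text == ";":
            yield ("SEMI", text)
        elif text in ("typedef", "int"):
            yield (text.upper(), text)
        elif text in TYPE_NAMES:
            yield ("TYPEID", text)
        else:
            yield ("ID", text)


def lookahead_after_first_declaration(source, register_before_lookahead):
    """Return the token that follows the first declaration's SEMI."""
    TYPE_NAMES.clear()
    stream = tokens(source)
    first_decl = [next(stream) for _ in range(4)]  # typedef int my_type ;
    new_type = first_decl[2][1]
    if register_before_lookahead:
        TYPE_NAMES.add(new_type)   # action runs first
        return next(stream)        # then the lookahead is fetched
    lookahead = next(stream)       # lookahead is fetched first
    TYPE_NAMES.add(new_type)       # action runs too late for that token
    return lookahead


SOURCE = "typedef int my_type; my_type x;"
late = lookahead_after_first_declaration(SOURCE, register_before_lookahead=False)
early = lookahead_after_first_declaration(SOURCE, register_before_lookahead=True)
print(late)   # ('ID', 'my_type')
print(early)  # ('TYPEID', 'my_type')
```

The only difference between the two runs is whether the table is updated before or after the next token is requested, which is the ordering issue described above.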

Any ideas for a fix?

The full code is at: http://code.google.com/p/pycparser/source/browse/trunk/src/c_parser.py
Not sure how I can create a smaller example out of this.

Edit:

Problem solved. See my solution below.

Comments (3)

っ〆星空下的拥抱 2024-07-12 13:35:00

Not sure why you're doing that level of analysis in your lexer.

Lexical analysis should probably be used to separate the input stream into lexical tokens (number, line-change, keyword and so on). It's the parsing phase that should be doing that level of analysis, including table lookups for typedefs and such.

That's the way I've always separated the duties between lex and yacc, my tools of choice.

画▽骨i 2024-07-12 13:35:00

With some help from Dave Beazley (PLY's creator), my problem was solved.

The idea is to use special sub-rules and perform the actions in them. In my case, I split the declaration rule into:

def p_decl_body(self, p):
    """ decl_body : declaration_specifiers init_declarator_list_opt
    """
    # <<Handle the declaration here>>        

def p_declaration(self, p):
    """ declaration : decl_body SEMI 
    """
    p[0] = p[1]

decl_body is always reduced before the token that follows SEMI is shifted in, so my action runs at the right time and the type symbol table is already updated when the lexer classifies that token.
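
To illustrate the ordering, here is a minimal sketch that uses a hand-written one-token-lookahead loop rather than PLY itself. It assumes a toy grammar in which every declaration is a run of words followed by ';'. The Lexer class and the parse function are hypothetical names for this example only. The point it shows is that the declaration body is handled while SEMI is still the lookahead, so the lexer has not yet been asked for the token after SEMI.

```python
import re


class Lexer:
    """Toy lexer that consults a shared set of type names (hypothetical)."""

    def __init__(self, source, type_names):
        self._words = iter(re.findall(r"[A-Za-z_]\w*|;", source))
        self._type_names = type_names

    def token(self):
        text = next(self._words, None)
        if text is None:
            return ("EOF", "")
        if text == ";":
            return ("SEMI", text)
        if text in ("typedef", "int"):
            return (text.upper(), text)
        kind = "TYPEID" if text in self._type_names else "ID"
        return (kind, text)


def parse(source):
    """Return the token kinds of each declaration body, in order."""
    type_names = set()
    lexer = Lexer(source, type_names)
    declarations = []
    lookahead = lexer.token()
    while lookahead[0] != "EOF":
        body = []
        # decl_body: collect tokens until the lookahead is SEMI.
        while lookahead[0] not in ("SEMI", "EOF"):
            body.append(lookahead)
            lookahead = lexer.token()
        if lookahead[0] != "SEMI":
            raise SyntaxError("expected ';'")
        # "Reduce" decl_body here. SEMI is only the lookahead, so the
        # token after it has not been requested from the lexer yet.
        if body and body[0][0] == "TYPEDEF":
            type_names.add(body[-1][1])
        declarations.append([kind for kind, _ in body])
        # declaration : decl_body SEMI -> consume SEMI, fetch next token.
        lookahead = lexer.token()
    return declarations


result = parse("typedef int my_type; my_type x;")
print(result)  # [['TYPEDEF', 'INT', 'ID'], ['TYPEID', 'ID']]
```

In the second declaration my_type arrives as TYPEID, because the table was updated before the lexer was asked for the token after the first SEMI.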

第几種人 2024-07-12 13:35:00

I think you need to move the check for whether an ID is a TYPEID from c_lexer.py to c_parser.py.

As you said, since the parser is looking ahead 1 token, you can't make that decision in the lexer.

Instead, alter your parser to check IDs to see whether they are TYPEIDs in declarations and, if they aren't, generate an error.

As Pax Diablo said in his excellent answer, the lexer/tokenizer's job isn't to make those kinds of decisions about tokens. That's the parser's job.
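
A rough sketch of that alternative, assuming a lexer that emits only plain words and a parser that owns the type table. The function parse_declarations is a hypothetical name, and the sketch handles only the two declaration shapes from the question ("typedef int NAME;" and "TYPE NAME;"), not the full C grammar.

```python
import re


def parse_declarations(source):
    """Parse 'typedef BASE NAME;' and 'TYPE NAME;' declarations.

    The lexer step (re.findall) never distinguishes types from other
    identifiers; the parser checks names against its own type table.
    """
    words = re.findall(r"[A-Za-z_]\w*|;", source)
    type_names = {"int"}          # built-in type known up front
    declared = []                 # (type, variable) pairs found so far
    pos = 0
    while pos < len(words):
        if ";" not in words[pos:]:
            raise SyntaxError("expected ';'")
        end = words.index(";", pos)
        decl = words[pos:end]
        if len(decl) == 3 and decl[0] == "typedef":
            _, base, name = decl
            if base not in type_names:
                raise SyntaxError(f"unknown type {base!r}")
            type_names.add(name)  # the parser records the new type itself
        elif len(decl) == 2:
            type_name, variable = decl
            if type_name not in type_names:
                raise SyntaxError(f"unknown type {type_name!r}")
            declared.append((type_name, variable))
        else:
            raise SyntaxError(f"unsupported declaration: {' '.join(decl)}")
        pos = end + 1
    return declared


variables = parse_declarations("typedef int my_type; my_type x;")
print(variables)  # [('my_type', 'x')]
```

Because the check happens in the parser, the one-token lookahead no longer matters here; a declaration such as "other y;" raises SyntaxError since other was never registered as a type.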
