Using PLY to provide multiple lexers for a single parser?
I'm trying to implement a Python parser using PLY for the Kconfig language, which is used to generate the configuration options for the Linux kernel.
There's a keyword called source which performs an inclusion, so what I do is that when the lexer encounters this keyword, I change the lexer state to create a new lexer that will lex the sourced file:
import ply.lex as lex

def t_begin_source(t):
    r'source '
    # Enter the 'source' state when the source keyword is seen
    t.lexer.begin('source')

def t_source_path(t):
    r'[^\n]+\n+'
    t.lexer.begin('INITIAL')
    global path
    # Build a second lexer just for the sourced file
    source_lexer = lex.lex(errorlog=lex.NullLogger())
    source_file_name = path + t.value.strip(' "\n')
    sourced_file = open(source_file_name).read()
    source_lexer.input(sourced_file)
    # The sourced file's tokens are consumed here but never
    # reach the parser; that is the problem described below
    while True:
        tok = source_lexer.token()
        if not tok:
            break
Somewhere else I have this line:
lexer = lex.lex(errorlog=lex.NullLogger())
This is the "main" or "root" lexer which is going to be called by the parser.
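(For reference, this is roughly how such a root lexer gets wired to the parser in standard PLY usage; data here stands for the top-level Kconfig text and the grammar rules are assumed to be defined elsewhere:)

import ply.lex as lex
import ply.yacc as yacc

lexer = lex.lex(errorlog=lex.NullLogger())   # the root lexer
parser = yacc.yacc()

# The parser pulls its tokens from whichever lexer object is passed here
result = parser.parse(data, lexer=lexer)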
My problem is that I don't know how to tell the parser to use a different lexer or to tell the "source_lexer" to return something...
Maybe the clone function should be used...
Thanks
3 Answers
By an interesting coincidence, a link from the same Google search that led me to this question explains how to write your own lexer for a PLY parser. The post explains it simply and well: it comes down to four instance variables and a single token method.
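(A minimal sketch of that interface; the class names are mine, but the four attributes, type, value, lineno and lexpos, are what PLY's parser expects on every token:)

class Token(object):
    def __init__(self, type, value, lineno, lexpos):
        # The four instance variables every PLY token must carry
        self.type = type
        self.value = value
        self.lineno = lineno
        self.lexpos = lexpos

class TokenStream(object):
    def __init__(self, tokens):
        self.tokens = iter(tokens)

    def token(self):
        # The single method the parser calls; None signals end of input
        return next(self.tokens, None)

Any object with such a token() method can be handed to the parser as its lexer (with no input string), since token() is all the parser ever calls on it.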
OK, so what I've done is build a list of all the tokens before the actual parsing.
The parser no longer calls the lexer, because you can override the get-token function the parser uses via the tokenfunc parameter when calling the parse function.
My function, which is now the one called to fetch the next token, iterates over the previously built token list.
As for the lexing, when I encounter a source keyword, I clone my lexer and change the input to include the sourced file.
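(A sketch of that arrangement; tokenfunc and clone() are real PLY features, but the SOURCE token name and the file handling are my guesses at the details, and data stands for the top-level file's contents:)

import ply.lex as lex
import ply.yacc as yacc

def gather_tokens(lexer, data, out):
    # Lex data to exhaustion, recursing into sourced files
    lexer.input(data)
    while True:
        tok = lexer.token()
        if tok is None:
            break
        if tok.type == 'SOURCE':              # hypothetical token name
            # Clone so the outer lexer's position is left untouched
            sub = lexer.clone()
            gather_tokens(sub, open(tok.value).read(), out)
        else:
            out.append(tok)

tokens_list = []
gather_tokens(lex.lex(errorlog=lex.NullLogger()), data, tokens_list)
it = iter(tokens_list)

parser = yacc.yacc()
# tokenfunc replaces the parser's internal get-token call
result = parser.parse(tokenfunc=lambda: next(it, None))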
I don't know about the details of PLY, but in other systems like this that I've built, it made the most sense to have a single lexer which managed the stack of include files. So the lexer would return a unified stream of tokens, opening and closing include files as they were encountered.
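(A sketch of that design with made-up names; only the token()-method convention comes from PLY, and the SOURCE token type is hypothetical:)

class IncludeStackLexer(object):
    # One lexer facade managing a stack of include files; the parser
    # sees a single uninterrupted token stream
    def __init__(self, make_lexer, path):
        self.make_lexer = make_lexer      # factory for fresh PLY lexers
        self.stack = []
        self.push(path)

    def push(self, path):
        lexer = self.make_lexer()
        lexer.input(open(path).read())
        self.stack.append(lexer)

    def token(self):
        while self.stack:
            tok = self.stack[-1].token()
            if tok is None:
                self.stack.pop()          # finished an include file
            elif tok.type == 'SOURCE':    # hypothetical include token
                self.push(tok.value)      # descend into the sourced file
            else:
                return tok
        return None                       # whole stream exhausted

Because the parser only ever sees the one token() method, include handling stays completely invisible to the grammar.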