How to parse indents and dedents with pyparsing?
Here is a subset of the Python grammar:
single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE
stmt: simple_stmt | compound_stmt
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
small_stmt: pass_stmt
pass_stmt: 'pass'
compound_stmt: if_stmt
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
(You can read the full grammar in the Python SVN repository: http://svn.python.org/.../Grammar)
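For reference, CPython does not match INDENT and DEDENT against characters at all. Its tokenizer keeps a stack of indentation levels and emits INDENT and DEDENT tokens, so the grammar above only ever sees them as terminals. The standard library's tokenize module makes this visible. The sample source is an arbitrary if_stmt chosen for illustration, and token_names is a helper name introduced here.

```python
import io
import tokenize

# Sample input (chosen for illustration): an if_stmt whose suite takes the
# "NEWLINE INDENT stmt+ DEDENT" branch, followed by a top-level pass_stmt.
SOURCE = "if x:\n    pass\n    pass\npass\n"


def token_names(source):
    """Return the names of the token types CPython's tokenizer emits."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]


print(token_names(SOURCE))
```

The printed list should show INDENT immediately after the NEWLINE that ends "if x:" and a single DEDENT before the final pass. Those are exactly the positions the suite rule expects.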
I am trying to use this grammar to generate a parser for Python, in Python. What I am having trouble with is how to express the INDENT and DEDENT tokens as pyparsing objects.
Here is how I have implemented the other terminals:
import pyparsing as p
string_start = (p.Literal('"""') | "'''" | '"' | "'")
string_token = ('\\' + p.CharsNotIn("",exact=1) | p.CharsNotIn('\\',exact=1))
string_end = p.matchPreviousExpr(string_start)
terminals = {
    'NEWLINE': p.Literal('\n').setWhitespaceChars(' \t')
        .setName('NEWLINE').setParseAction(terminal_action('NEWLINE')),
    'ENDMARKER': p.stringEnd.copy().setWhitespaceChars(' \t')
        .setName('ENDMARKER').setParseAction(terminal_action('ENDMARKER')),
    'NAME': (p.Word(p.alphas + "_", p.alphanums + "_", asKeyword=True))
        .setName('NAME').setParseAction(terminal_action('NAME')),
    'NUMBER': p.Combine(
        p.Word(p.nums) + p.CaselessLiteral("l") |
        (p.Word(p.nums) + p.Optional("." + p.Optional(p.Word(p.nums))) | "." + p.Word(p.nums)) +
        p.Optional(p.CaselessLiteral("e") + p.Optional(p.Literal("+") | "-") + p.Word(p.nums)) +
        p.Optional(p.CaselessLiteral("j"))
    ).setName('NUMBER').setParseAction(terminal_action('NUMBER')),
    'STRING': p.Combine(
        p.Optional(p.CaselessLiteral('u')) +
        p.Optional(p.CaselessLiteral('r')) +
        string_start + p.ZeroOrMore(~string_end + string_token) + string_end
    ).setName('STRING').setParseAction(terminal_action('STRING')),
    # I can't find a good way of parsing indents/dedents.
    # The Grammar just has the tokens NEWLINE, INDENT and DEDENT scattered across the rules.
    # A single NEWLINE would be translated to NEWLINE + PEER (from pyparsing.indentedBlock()), unless followed by INDENT or DEDENT.
    # That NEWLINE and IN/DEDENT could be split across rule boundaries (see the 'suite' rule).
    'INDENT': (p.LineStart() + p.Optional(p.Word(' '))).setName('INDENT'),
    'DEDENT': (p.LineStart() + p.Optional(p.Word(' '))).setName('DEDENT')
}
terminal_action is a function that returns the corresponding parse action, depending on its arguments.
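The linked source is not reproduced here. The following is therefore a purely hypothetical stand-in for terminal_action, meant only to show the shape of such a factory. It returns a closure with the (s, loc, toks) signature that pyparsing accepts for parse actions, and that closure replaces the matched text with a (name, text) pair. The asker's real function may do something quite different.

```python
# Hypothetical stand-in for the asker's terminal_action (the real one lives in
# the linked source and may behave differently).
def terminal_action(name):
    """Build a parse action that tags the matched text with its terminal name."""

    def action(s, loc, toks):
        # pyparsing passes the whole input string, the match location and the
        # matched tokens. Zero-width terminals such as ENDMARKER (a copy of
        # stringEnd) match no text, so toks may be empty.
        text = toks[0] if len(toks) else ""
        return (name, text)

    return action


# Called directly, the way pyparsing would call it after matching "spam".
print(terminal_action("NAME")("spam = 1", 0, ["spam"]))
```

Handling the empty-token case matters here because ENDMARKER, and any zero-width INDENT/DEDENT expression, would otherwise raise an IndexError inside the parse action.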
I am aware of the pyparsing.indentedBlock helper function, but I can't figure out how to adapt it to a grammar that has no PEER token.
(Look at the pyparsing source code to see what I am talking about.)
You can see my full source code here: http://pastebin.ca/1609860
Answer:
There are a couple of examples on the pyparsing wiki Examples page that could give you some insights, including indentedGrammarExample.py.
To use pyparsing's indentedBlock, I think you would define suite as an indentedBlock wrapping stmt.
Note that indentedGrammarExample.py pre-dates the inclusion of indentedBlock in pyparsing, so it does its own implementation of indent parsing.
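indentedBlock is not the only option. A different approach, closer to what CPython itself does, is to resolve indentation before parsing. You rewrite the source so that every INDENT and DEDENT becomes a marker character, and the grammar can then match each marker as an ordinary terminal. The sketch below assumes space-only indentation. mark_indentation, INDENT_MARK and DEDENT_MARK are hypothetical names and are not part of pyparsing.

```python
# Hypothetical marker characters: anything that cannot appear in the source works.
INDENT_MARK = "\x01"
DEDENT_MARK = "\x02"


def mark_indentation(source):
    """Replace leading indentation with explicit INDENT/DEDENT markers.

    Follows the idea of CPython's tokenizer: keep a stack of indentation
    columns; a deeper line pushes a level and gets one INDENT_MARK, a
    shallower line pops levels and gets one DEDENT_MARK per level closed.
    Sketch limitations: spaces only (no tabs), blank lines are dropped, and
    comments or bracketed continuation lines are not treated specially.
    """
    stack = [0]
    out = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines never produce NEWLINE/INDENT/DEDENT
        stripped = line.lstrip(" ")
        col = len(line) - len(stripped)
        prefix = ""
        if col > stack[-1]:
            stack.append(col)
            prefix = INDENT_MARK
        else:
            while col < stack[-1]:
                stack.pop()
                prefix += DEDENT_MARK
            if col != stack[-1]:
                raise IndentationError(
                    f"unindent does not match any outer level: {line!r}")
        out.append(prefix + stripped)
    # Close blocks still open at the end of the input, before ENDMARKER.
    return "\n".join(out) + "\n" + DEDENT_MARK * (len(stack) - 1)


print(repr(mark_indentation("if x:\n    pass\npass\n")))
```

Each marker lands right after the '\n' of the preceding line, which matches the NEWLINE INDENT order of the suite rule. INDENT and DEDENT could then be plain terminals such as p.Literal(INDENT_MARK) and p.Literal(DEDENT_MARK), with no PEER token needed. This assumes '\n' has been removed from pyparsing's default whitespace (for example with p.ParserElement.setDefaultWhitespaceChars(' \t')) so that the NEWLINE terminal still sees every line break.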