Debugging a Pyparsing grammar

Posted 2024-08-13 05:00:42

I'm building a parser for an imaginary programming language called C-- (not the actual C-- language). I've reached the stage where I need to translate the language's grammar into something Pyparsing can accept. Unfortunately, when I parse my input string (which is correct and should not cause Pyparsing to raise an error), it does not parse correctly. I fear this is due to errors in my grammar, but as I'm using Pyparsing for the first time, I can't see where I'm going wrong.

I've uploaded the grammar that I'm translating from here for people to have a read through.

EDIT: Updated with the advice from Paul.

This is the grammar I've currently got (I know the top two lines of the syntax definition are terribly bad):

# Lexical structure definition
ifS = Keyword('if')
elseS = Keyword('else')
whileS = Keyword('while')
returnS = Keyword('return')
intVar = Keyword('int')
voidKeyword = Keyword('void')
sumdiff = Literal('+') | Literal('-')
prodquot = Literal('*') | Literal('/')
relation = Literal('<=') | Literal('<') | Literal('==') | \
           Literal('!=') | Literal('>') | Literal('=>')
lbrace = Literal('{')
rbrace = Literal('}')
lparn = Literal('(')
rparn = Literal(')')
semi = Literal(';')
comma = Literal(',')
number = Word(nums)
identifier = Word(alphas, alphanums)

# Syntax definition
term = ''
statement = ''
variable    =   intVar + identifier + semi
locals      =   ZeroOrMore(variable)
expr        =   term | OneOrMore(Group(sumdiff + term))
args        =   ZeroOrMore(OneOrMore(Group(expr + comma)) | expr)
funccall    =   Group(identifier + lparn + args + rparn)
factor      =   Group(lparn + expr + rparn) | identifier | funccall | number
term        =   factor | OneOrMore(prodquot + factor)
cond        =   Group(lparn + expr + relation + expr + rparn)
returnState =   Group(returnS + semi) | Combine(returnS + expr + semi)
assignment  =   Group(identifier + '=' + expr + semi)
proccall    =   Group(identifier + lparn + args + rparn + semi)
block       =   Group(lbrace + locals + statement + rbrace)
iteration   =   Group(whileS + cond + block)
selection   =   Group(ifS + cond + block) | Group(ifS + cond + block + elseS + block)
statement   =   OneOrMore(proccall | assignment | selection | iteration | returnState)
param       =   Group(intVar + identifier)
paramlist   =   OneOrMore(Combine(param + comma)) | param
params      =   paramlist | voidKeyword
procedure   =   Group(voidKeyword + identifier + lparn + params + rparn + block)
function    =   Group(intVar + identifier + lparn + params + rparn + block)
declaration =   variable | function | procedure
program     =   OneOrMore(declaration)

I'd like to know whether I've made any mistakes in translating the grammar, and what I could do to simplify it while still adhering to the grammar I've been given.

EDIT 2: Updated to include the new error.

Here is the input string I am parsing:

int larger ( int first , int second ) { 
if ( first > second ) { 
return first ; 
} else { 
return second ; 
} 
} 

void main ( void ) { 
int count ; 
int sum ; 
int max ; 
int x ; 

x = input ( ) ; 
max = x ; 
sum = 0 ; 
count = 0 ; 

while ( x != 0 ) { 
count = count + 1 ; 
sum = sum + x ; 
max = larger ( max , x ) ; 
x = input ( ) ; 
} 

output ( count ) ; 
output ( sum ) ; 
output ( max ) ; 
} 

And this is the error message I get when running my program from Terminal:

/Users/Joe/Documents/Eclipse Projects/Parser/src/pyparsing.py:1156: SyntaxWarning: null string passed to Literal; use Empty() instead
other = Literal( other )
/Users/Joe/Documents/Eclipse Projects/Parser/src/pyparsing.py:1258: SyntaxWarning: null string passed to Literal; use Empty() instead
other = Literal( other )
Expected ")" (at char 30), (line:6, col:26)
None


Comments (1)

望笑 2024-08-20 05:00:42

1) Change Literal("if") to Keyword("if") (and so on, down to Literal("void")), to prevent matching the leading "if" of a variable named "ifactor".
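To make the difference concrete, here is a minimal sketch of how the two classes treat the input "ifactor". It assumes pyparsing is installed and uses the older camelCase method name parseString, as in the question; the variable names are only for illustration.

```python
from pyparsing import Keyword, Literal, ParseException

# Literal matches the characters "if" at the parse position, even when
# they are only the start of a longer word.
literal_if = Literal("if")
literal_result = literal_if.parseString("ifactor").asList()

# Keyword also requires that the match is not immediately followed by
# another identifier character, so the "if" in "ifactor" is rejected.
keyword_if = Keyword("if")
try:
    keyword_if.parseString("ifactor")
    keyword_matched = True
except ParseException:
    keyword_matched = False

# A real "if" followed by a non-identifier character still matches.
keyword_result = keyword_if.parseString("if (x)").asList()

print(literal_result, keyword_matched, keyword_result)
```

With Literal, an identifier such as "ifactor" would be split into the keyword "if" plus leftover text, which is the failure mode this change avoids.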

2) nums, alphas, and alphanums are not expressions; they are strings that can be used with the Word class to define some typical sets of characters when defining "words", such as "a number is a word made up of nums", or "an identifier is a word that starts with an alpha, followed by zero or more alphanums." So instead of:

number = nums
identifier = alphas + OneOrMore(alphanums)

you want

number = Word(nums)
identifier = Word(alphas, alphanums)
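As a quick check of this point, the sketch below confirms that nums is a plain string and shows what the two Word expressions return. It assumes pyparsing is installed; the sample inputs are made up for illustration.

```python
from pyparsing import Word, alphas, alphanums, nums

# nums, alphas and alphanums are ordinary Python strings of characters,
# not parser expressions.
nums_is_string = isinstance(nums, str)

number = Word(nums)                    # one or more digits
identifier = Word(alphas, alphanums)   # a letter, then zero or more letters or digits

number_result = number.parseString("42").asList()
# Parsing stops at the first character that cannot extend the word.
identifier_result = identifier.parseString("count2 = 0").asList()

print(nums_is_string, number_result, identifier_result)
```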

3) Instead of Combine, I think you want Group. Use Combine when you want the matched tokens to be contiguous with no intervening whitespace; it concatenates the tokens and returns them as a single string. Combine is often used in cases like this:

realnum = Combine(Word(nums) + "." + Word(nums))

Without Combine, parsing "3.14" would return the list of strings ['3', '.', '14'], so we add Combine so that the parsed result for realnum is '3.14' (which you could then pass to a parse action to convert to the actual floating value 3.14). Combine's enforcement of no intervening whitespace also keeps us from accidentally parsing 'The answer is 3. 10 is too much.' and thinking the "3. 10" represents a real number.
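The sketch below puts Combine and Group side by side. It assumes pyparsing is installed; the param expressions are simplified stand-ins for the one in the question, not the asker's full grammar.

```python
from pyparsing import (Combine, Group, Keyword, ParseException, Word,
                       alphas, alphanums, nums)

realnum = Combine(Word(nums) + "." + Word(nums))
uncombined = Word(nums) + "." + Word(nums)

combined_result = realnum.parseString("3.14").asList()      # one joined string
separate_result = uncombined.parseString("3.14").asList()   # three separate strings

def matches(expr, text):
    """Return True if expr can parse the start of text."""
    try:
        expr.parseString(text)
        return True
    except ParseException:
        return False

# Combine rejects whitespace between the pieces it joins.
spaced_real_matched = matches(realnum, "3. 10")

# The same rule applies to a parameter such as "int first": wrapped in
# Combine, the space between the two words makes the match fail.
combined_param = Combine(Keyword("int") + Word(alphas, alphanums))
combined_param_matched = matches(combined_param, "int first")

# Group keeps the tokens separate, nests them in a sublist, and still
# skips whitespace between them.
grouped_param = Group(Keyword("int") + Word(alphas, alphanums))
grouped_result = grouped_param.parseString("int first").asList()

print(combined_result, separate_result, grouped_result)
```

This is why Group is the better fit for structural pieces such as parameters and statements, where the tokens are normally separated by spaces.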

4) This should not cause your error, but your input string has lots of extra spaces. If you get your grammar working, you should be able to parse "int x;" just as well as "int x ;".
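A small sketch of that whitespace behaviour, assuming pyparsing is installed and using a Group-wrapped version of the variable rule from the question:

```python
from pyparsing import Group, Keyword, Literal, Word, alphas, alphanums

# Simplified variable declaration: "int", a name, then a semicolon.
variable = Group(Keyword("int") + Word(alphas, alphanums) + Literal(";"))

# pyparsing skips whitespace between tokens by default, so both
# spellings produce the same tokens.
tight = variable.parseString("int x;").asList()
spaced = variable.parseString("int   x  ;").asList()

print(tight, spaced)
```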

Hope some of these hints get you going. Have you read any online pyparsing articles or tutorials? And please look through the online examples. You'll need to get a good grasp of how Word, Literal, Combine, etc. perform their individual parsing tasks.

5) You have mis-implemented the recursive definitions for term and statement. Instead of assigning '' to them, write:

term = Forward()
statement = Forward()

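Then, when you actually define them with their recursive definitions, use the << operator, and be sure to enclose the right-hand side in parentheses. In Python, << binds more tightly than |, so without the parentheses only the first alternative would be assigned to the Forward.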

term << (... term definition ...)
statement << (... statement definition ...)
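Here is a minimal sketch of the Forward pattern applied to a cut-down expression grammar. It assumes pyparsing is installed. The rule shapes (term + ZeroOrMore(...) rather than the question's term | OneOrMore(...)) are my guess at the intended grammar, and function calls are left out for brevity. Newer pyparsing releases also accept <<= in place of <<.

```python
from pyparsing import Forward, Group, Literal, Word, ZeroOrMore, alphas, alphanums, nums

number = Word(nums)
identifier = Word(alphas, alphanums)
lparn, rparn = Literal("("), Literal(")")
sumdiff = Literal("+") | Literal("-")
prodquot = Literal("*") | Literal("/")

# Declare term as a placeholder so that expr can refer to it before
# its real definition is known.
term = Forward()

expr = term + ZeroOrMore(sumdiff + term)
factor = Group(lparn + expr + rparn) | identifier | number

# Fill in the placeholder. The parentheses keep the whole right-hand
# side together as the definition of term.
term << (factor + ZeroOrMore(prodquot + factor))

result = expr.parseString("sum + (x - 1) * 2").asList()
print(result)
```

The parenthesised sub-expression comes back as a nested list because of the Group around it, which shows the recursion through term and expr is working.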

You can find an example of a recursive parser here, and a presentation on basic pyparsing usage here - see the section titled "Parsing Lists" for more step-by-step on how the recursion is handled.
