Fixing a bad JSON grammar
I've just started learning about parsing, and I wrote this simple parser in Haskell (using parsec) to read JSON and construct a simple tree for it. I am using the grammar in RFC 4627.
However, when I try parsing the string `{"x":1 }`, I get the output:
parse error at (line 1, column 8): unexpected "}" expecting whitespace character or ","
This only seems to happen when I have spaces before a closing bracket (`]`) or mustachio (`}`).
What have I done wrong? If I avoid whitespace before a closing symbol, it works perfectly.
Comments (2)
Parsec doesn't rewind and backtrack automatically. When you write `sepBy member valueSeparator`, the `valueSeparator` consumes white space. So once the member `"x":1` has been parsed, `sepBy` tries `valueSeparator`, which consumes the space and then fails on `}` because it expects a `,`.
When `valueSeparator` fails, Parsec won't go back and try a different combination of parses, because one character has already been matched in `valueSeparator`.
You have two options to solve your problem:
1. Make `tok` consume white space only after the char, so its definition is `tok c = char c *> ws` (`(*>)` is from `Control.Applicative`); apply the same rule to all the other parsers. Since that way you never consume white space after having entered the "wrong parser", you won't end up having to backtrack.
2. Use Parsec's backtracking by putting `try` in front of parsers that might consume more than one character and that should rewind their input if they fail.
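Both options can be sketched on a deliberately tiny stand-in grammar. The asker's actual parser is not shown here, so everything except `tok`, `ws`, `sepBy` and `try` is an assumption made for illustration: `tokBoth`, `stringLit`, `number`, `member`, and an "object" that only maps string keys to unsigned integers.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Insignificant JSON whitespace (space, tab, LF, CR).
ws :: Parser ()
ws = skipMany (oneOf " \t\n\r")

-- Hypothetical problematic token: skips whitespace on BOTH sides, so it can
-- consume a space and then fail on the character that follows.
tokBoth :: Char -> Parser ()
tokBoth c = ws *> char c *> ws

-- Option 1: skip whitespace only AFTER the significant character.
tok :: Char -> Parser ()
tok c = char c *> ws

-- Simplified literals (no escapes, no signs); assumptions for this sketch.
stringLit :: Parser String
stringLit = char '"' *> many (noneOf "\"") <* char '"'

number :: Parser Integer
number = read <$> many1 digit

-- A member built from a given token parser and a given "trailing skip".
member :: (Char -> Parser ()) -> Parser () -> Parser (String, Integer)
member t trailing = (,) <$> (stringLit <* trailing) <* t ':' <*> (number <* trailing)

-- Reproduces the reported failure: the separator eats the space before '}'
-- and then fails, and sepBy does not rewind.
objectBad :: Parser [(String, Integer)]
objectBad = tokBoth '{' *> sepBy (member tokBoth (pure ())) (tokBoth ',') <* tokBoth '}'

-- Option 1: every token skips trailing whitespace; leading whitespace is
-- skipped once, at the very start.
objectGood :: Parser [(String, Integer)]
objectGood = ws *> tok '{' *> sepBy (member tok ws) (tok ',') <* tok '}' <* eof

-- Option 2: keep the two-sided token but let the separator rewind with try.
objectTry :: Parser [(String, Integer)]
objectTry = tokBoth '{' *> sepBy (member tokBoth (pure ())) (try (tokBoth ',')) <* tokBoth '}'
```

With these definitions, `parse objectBad "" "{\"x\":1 }"` fails at the closing `}`, while `objectGood` and `objectTry` both accept the same input. Option 1 tends to be the cheaper choice because no input is ever re-read.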
A general solution is to have all your parsers skip trailing whitespace. Check out `lexeme` (in `ParsecToken`) in the Parsec docs for a neat way to do this, or just whip up a simple version yourself. Then use that function on all of your tokens (like numeric literals). This way you only ever have to worry about the whitespace at the very beginning of an expression.
For more info about `ParsecToken` and friends, look at the "Lexical Analysis" section of the Parsec docs.
It makes sense to skip whitespace only after a token, except at the very beginning of the input, where you can skip it manually. You should take this approach even if you end up not using the `ParsecToken` module.
It seems you already have `tok`, which acts like such a `lexeme` except that it consumes whitespace on both sides. Change it to consume whitespace only after the token, and skip the whitespace at the very beginning of the input manually. That should (ideally :)) fix the problem.
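A simple hand-rolled `lexeme` might look like the sketch below. This is not necessarily the version the answer originally showed. The helper names `symbol`, `natural` and `naturals` are hypothetical, and `spaces` from Parsec skips any Unicode whitespace, which is slightly broader than JSON's definition.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Run a parser, then skip whatever whitespace follows it.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- Hypothetical tokens built on top of lexeme.
symbol :: Char -> Parser Char
symbol = lexeme . char

natural :: Parser Integer
natural = lexeme (read <$> many1 digit)

-- A bracketed, comma-separated list of naturals. Leading whitespace is
-- skipped once, manually, at the very beginning of the input.
naturals :: Parser [Integer]
naturals = spaces *> between (symbol '[') (symbol ']') (natural `sepBy` symbol ',') <* eof
```

Because every token cleans up after itself, a failing alternative such as `symbol ','` fails on its first character without consuming anything, so no `try` is needed. For example, `parse naturals "" "[1 , 2 ]"` succeeds even though there is a space before the closing bracket.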