Fixing a faulty JSON grammar

Posted on 2024-12-24 01:31:24

I've just started learning about parsing, and I wrote this simple parser in Haskell (using parsec) to read JSON and construct a simple tree for it. I am using the grammar in RFC 4627.

However, when I try parsing the string {"x":1 }, I'm getting the output:

parse error at (line 1, column 8):
unexpected "}"
expecting whitespace character or ","

This only seems to happen when there are spaces before a closing bracket (]) or closing brace (}).

What have I done wrong? If I avoid whitespace before a closing symbol, it works perfectly.

原野 2024-12-31 01:31:24

Parsec doesn't do rewinding and backtracking automatically. When you write sepBy member valueSeparator, the valueSeparator consumes white space, so the parser will parse your value like so:

{"x":1 }
[------- object
%        beginObject
 [-]     name
    %    nameSeparator
     %   jvalue
      [- valueSeparator
       X In valueSeparator: unexpected "}"

Legend:
[--]     full match
%        full char match
[--      incomplete match
X        incomplete char match

When valueSeparator fails, Parsec won't go back and try a different combination of parses, because valueSeparator has already consumed one character (the space) before failing.
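To make the failure concrete, here is a minimal sketch that should reproduce it. The question's code is not shown, so ws, tok, member, object and parseObject below are assumed stand-ins (with integer values only), not the asker's actual definitions; the key assumption is that tok skips whitespace on both sides of the character.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumed whitespace parser; succeeds even if there is nothing to skip.
ws :: Parser ()
ws = skipMany (oneOf " \t\r\n")

-- Assumed shape of the question's tok: whitespace on BOTH sides.
tok :: Char -> Parser ()
tok c = ws *> char c *> ws

-- A simplified member: a quoted name, a colon, and an integer value.
member :: Parser (String, Int)
member = do
  name <- char '"' *> many (noneOf "\"") <* char '"'
  tok ':'
  value <- read <$> many1 digit
  return (name, value)

-- tok ',' plays the role of valueSeparator. On "1 }" it consumes the
-- space, then fails on '}', so sepBy fails instead of stopping cleanly.
object :: Parser [(String, Int)]
object = tok '{' *> (member `sepBy` tok ',') <* tok '}'

parseObject :: String -> Either ParseError [(String, Int)]
parseObject = parse (object <* eof) "json"
```

With these definitions, parseObject "{\"x\":1}" should succeed, while parseObject "{\"x\":1 }" should return a Left, because the separator fails after consuming input.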

You have two options to solve your problem:

1. Since white space is insignificant in JSON, always consume white space after a significant token, never before. A tok should therefore only consume white space after the char, so its definition becomes tok c = char c *> ws ((*>) is from Control.Applicative); apply the same rule to all the other parsers. Because you then never consume white space after having entered the "wrong parser", you never need to backtrack.
2. Use backtracking in Parsec by adding try in front of parsers that might consume more than one character and that should rewind their input if they fail.
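Both options can be sketched side by side. This is only an illustration under assumptions: the names tokAfter, tokBoth, rawName, rawInt, objectA, objectB, parseA and parseB are made up here, and values are restricted to integers to keep it short.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

ws :: Parser ()
ws = skipMany (oneOf " \t\r\n")

-- Option 1: skip whitespace only AFTER the character.
tokAfter :: Char -> Parser ()
tokAfter c = char c *> ws

-- Assumed original style: whitespace skipped on both sides.
tokBoth :: Char -> Parser ()
tokBoth c = ws *> char c *> ws

rawName :: Parser String
rawName = char '"' *> many (noneOf "\"") <* char '"'

rawInt :: Parser Int
rawInt = read <$> many1 digit

-- Option 1: every token eats its own trailing whitespace, so the
-- separator fails on '}' without having consumed anything.
objectA :: Parser [(String, Int)]
objectA = tokAfter '{' *> (memberA `sepBy` tokAfter ',') <* tokAfter '}'
  where
    memberA = (,) <$> (rawName <* ws) <* tokAfter ':' <*> (rawInt <* ws)

-- Option 2: keep tokBoth, but wrap the separator in try so that a
-- failed separator rewinds over the whitespace it skipped.
objectB :: Parser [(String, Int)]
objectB = tokBoth '{' *> (memberB `sepBy` try (tokBoth ',')) <* tokBoth '}'
  where
    memberB = (,) <$> rawName <* tokBoth ':' <*> rawInt

parseA, parseB :: String -> Either ParseError [(String, Int)]
parseA = parse (ws *> objectA <* eof) "json"  -- leading whitespace skipped once
parseB = parse (objectB <* eof) "json"
```

Both parseA and parseB should accept "{\"x\":1 }". Option 1 tends to scale better, since it avoids sprinkling try through the grammar and keeps error messages closer to the real point of failure.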

EDIT: updated ASCII graphic to make more sense.

丢了幸福的猪 2024-12-31 01:31:24

A general solution would be to have all your parsers skip trailing whitespace. Check out lexeme (in ParsecToken) in the Parsec docs for a neat way to do this or just whip up a simple version yourself:

 lexeme parser = do result <- parser
                    spaces
                    return result

Then use this function on all of your tokens (like numerical literals). This way you only ever have to worry about the whitespace at the very beginning of an expression.

For more info about ParsecToken and friends, look at the "Lexical Analysis" section of the Parsec docs.

It makes sense to only skip whitespace after a token except at the very beginning where you can skip it manually. You should take this approach even if you end up not using the ParsecToken module.

It seems you already have tok, which acts like my lexeme except that it consumes whitespace on both sides. Change it to consume whitespace only after the token, and skip the whitespace at the very beginning of the input manually. That should (ideally :)) fix the problem.
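As a rough sketch of how this looks in use, here is the hand-rolled lexeme applied to a small parser for arrays of integers. The helper names symbol, number, array and parseArray are hypothetical, and the module names are those of current parsec (Text.Parsec), which may differ from the older ParsecToken naming mentioned above.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Run a parser, then skip any trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme parser = do
  result <- parser
  spaces
  return result

-- A single-character token followed by optional whitespace.
symbol :: Char -> Parser Char
symbol = lexeme . char

-- An unsigned integer literal followed by optional whitespace.
number :: Parser Int
number = lexeme (read <$> many1 digit)

array :: Parser [Int]
array = between (symbol '[') (symbol ']') (number `sepBy` symbol ',')

-- Leading whitespace is skipped once, manually, at the entry point.
parseArray :: String -> Either ParseError [Int]
parseArray = parse (spaces *> array <* eof) "json"
```

Because every token cleans up after itself, a separator or closing bracket never has to skip whitespace before deciding whether it matches, so input such as "  [1 , 2 ,3 ]" should parse without any try.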
