Snowflake error when trying to use parse_json with valid JSON - Error parsing JSON: missing comma, line 6, pos 17
This is valid JSON (I've run it against two JSON validators and also parsed it using powershell):
{
"actionCD": "error",
"NotesTXT": "\"Exception call timeout\""
}
This is not valid JSON:
{
"actionCD": "error",
"NotesTXT": "\\"Exception call timeout\\""
}
However, the parse_json function yields a failure with the first example:
SELECT '{ "actionCD": "error", "NotesTXT": "\"Exception call timeout\"" }' as json_str
,PARSE_JSON(json_str) as json;
Error parsing JSON: missing comma, pos 38
And unexpectedly, the snowflake parse_json function works with the invalid json:
SELECT '{ "actionCD": "error", "NotesTXT": "\\"Exception call timeout\\"" }' as json_str
,PARSE_JSON(json_str) as json;
<No Errors>
This is leaving me thoroughly flummoxed and uncertain how to proceed. I'm using PowerShell to programmatically create valid JSON, and then trying to insert that valid JSON into Snowflake using INSERT INTO ()...SELECT ...
Here is the insert statement I'm trying to build in powershell:
INSERT INTO DBNAME.SCHEMANAME.TABLENAME(
RunID
,jsonLogTXT
) SELECT
'$RunID'
,parse_json('$($mylogdata | ConvertTo-Json)')
;
# where $($mylogdata | ConvertTo-Json) outputs valid JSON, and from time to time includes \" to escape double quotes.
# But Snowflake fails, because Snowflake wants \\" to escape the double quotes.
Is this expected? (Obviously I find it unexpected :-).) What is the recommendation here? (Should I search my json-stored-as-a-string in PowerShell for " and replace it with \" before sending it on to Snowflake? That feels really hacky, though.)
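For anyone wanting to reproduce this outside Snowflake: the failure comes from two layers of parsing. The sketch below (Python standing in for Snowflake's string-literal parser, purely as an illustration) strips one level of backslash escapes the way the SQL parser does, then hands the result to a JSON parser:

```python
import codecs
import json

# The body of the SQL string literal exactly as written between the single
# quotes in the SELECT above: it still contains literal \" sequences.
sql_literal_body = r'{ "actionCD": "error", "NotesTXT": "\"Exception call timeout\"" }'

# The SQL string parser consumes backslash escapes first, so \" becomes ".
after_sql_layer = codecs.decode(sql_literal_body, "unicode_escape")
print(after_sql_layer)

# The inner quotes are now bare, so the JSON parser rejects the text.
try:
    json.loads(after_sql_layer)
except json.JSONDecodeError as exc:
    print("JSON error:", exc)
```

The JSON parser's complaint lands at roughly the same position (column 38) as Snowflake's "missing comma, pos 38" above, which is a strong hint that the backslash was consumed before PARSE_JSON ever saw the string.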
2 Answers
The code you posted shows the answer: what you see is not what "you entered", so what PARSE_JSON is actually parsing is not the "valid JSON" you noted.

This situation is common to many computing environments: the environment reads your input and acts on some of it. Here, the SQL parser reads your SQL, sees the single \ in your valid JSON, thinks you are starting an escape sequence, and then complains about commas being in the wrong place. Bash (or PowerShell), Python, even Java all require you to understand the difference between the content of a string (a.k.a. your valid JSON) and how that content must be represented so it gets past the language parser.

So how should one "INSERT JSON into Snowflake"? One general answer: not via INSERT commands, if the volume is high. Alternatively, if you don't want to make the string parser-safe, you can Base64-encode the data (in PowerShell) and insert
base64_decode(awesomestring)
which looks like
eyAiYWN0aW9uQ0QiOiAiZXJyb3IiLCAiTm90ZXNUWFQiOiAiXCJFeGNlcHRpb24gY2FsbCB0aW1lb3V0XCIiIH0=
and thus decodes back to the original JSON.
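The Base64 route can be sketched end to end. Python is used here just to show the mechanics; in PowerShell the equivalent encode step would be something like [Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes($json)), and the table and RunID names below are the question's placeholders. Note that Snowflake's decode function is spelled BASE64_DECODE_STRING:

```python
import base64

json_str = r'{ "actionCD": "error", "NotesTXT": "\"Exception call timeout\"" }'

# Base64 output contains no quotes or backslashes, so it passes through the
# SQL string parser untouched -- no escaping round trip to get wrong.
b64 = base64.b64encode(json_str.encode("utf-8")).decode("ascii")

# DBNAME.SCHEMANAME.TABLENAME and '$RunID' mirror the question's INSERT.
sql = (
    "INSERT INTO DBNAME.SCHEMANAME.TABLENAME (RunID, jsonLogTXT) "
    f"SELECT '$RunID', PARSE_JSON(BASE64_DECODE_STRING('{b64}'));"
)
print(sql)
```

Encoding the sample JSON this way yields exactly the eyAiYWN0aW9uQ0Qi... string shown above.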
This is MOSTLY expected. Snowflake strings use backslash as an escape character BEFORE the JSON parsing happens. As such,
"\\"content\\""
would get parsed by Snowflake as
"\"content\""
which is what gets fed into the JSON parser and treated as valid JSON. Similar issues can come up with single quotes.

Replacing \ with \\ before sending it to Snowflake might work, although when I've run into these types of issues I've found they are often accompanied by other escaping/parsing errors. I find it's usually more appropriate to change the approach and have Snowflake parse a file that contains the JSON, for example; then you don't have the extra round of character escaping going on. That's a bigger change to your process, though. Snowflake's documentation has a quick note on this topic: https://docs.snowflake.com/en/sql-reference/functions-regexp.html#escape-characters-and-caveats
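If you do go the replace-before-send route, the important detail is to double every backslash first, then double every single quote, before splicing the JSON into the single-quoted SQL literal. A minimal sketch of that helper (the function name is my own, not part of any library):

```python
def escape_for_snowflake_literal(text: str) -> str:
    """Prepare text for embedding in a Snowflake single-quoted string
    literal: double backslashes first, then double single quotes."""
    return text.replace("\\", "\\\\").replace("'", "''")

json_str = r'{ "actionCD": "error", "NotesTXT": "\"Exception call timeout\"" }'
print(escape_for_snowflake_literal(json_str))
```

Snowflake's string parser then halves the doubled backslashes back to single ones, so PARSE_JSON receives exactly the JSON that ConvertTo-Json produced; the equivalent .Replace() chain in PowerShell would do the same.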