Escaping a regex in JSON for Hive
I have a JSON file in which I store a SQL query that is later run automatically on Hive.
The structure of the JSON is as follows:
{
"name": "query1",
"query": "select regexp_extract(column, '(.*)\\s\\|', 1) as column_one from data"
}
The idea is to extract everything up to the first space followed by a vertical bar.
For a given example, when I try it directly in Hive, it works as expected:
select regexp_extract('First part | Second Part', '(.*)\\s\\|', 1) as column_one;
First part
As you can see, you already need one extra backslash for \s and \|, otherwise it does not work in Hive.
However, when run automatically using the JSON file, I get the following:
Fir
Then I understood that you need another backslash to escape it in JSON, so I went with the following:
{
"name": "query1",
"query": "select regexp_extract(column, '(.*)\\\s\\\|', 1) as column_one from data"
}
But it still gives me Fir instead of First part.
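The escaping layers described above can be peeled off one at a time outside Hive. The following is only a minimal sketch in Python, not the actual pipeline: `assumed_hive_unescape` is a hypothetical helper that models Hive's string-literal handling as "drop each backslash and keep the character after it", and Python's `re` module stands in for the Java regex engine behind `regexp_extract`. The constants `TWO`, `THREE` and `FOUR` are illustrative names for the file contents with two, three and four backslashes per escape.

```python
import json
import re

SAMPLE = "First part | Second Part"

# File contents as they sit on disk (raw strings keep every backslash).
TWO = r'''{"name": "query1", "query": "select regexp_extract(column, '(.*)\\s\\|', 1) as column_one from data"}'''
THREE = r'''{"name": "query1", "query": "select regexp_extract(column, '(.*)\\\s\\\|', 1) as column_one from data"}'''
FOUR = r'''{"name": "query1", "query": "select regexp_extract(column, '(.*)\\\\s\\\\|', 1) as column_one from data"}'''


def assumed_hive_unescape(literal_body):
    # ASSUMPTION: simplified model of a Hive string literal, where a backslash
    # escapes the next character and is then dropped.
    return re.sub(r"\\(.)", r"\1", literal_body)


def pattern_after_all_layers(json_text):
    # Layer 1: JSON decoding turns each \\ in the file into a single \.
    query = json.loads(json_text)["query"]
    # Take the text between the first pair of single quotes (the regex literal).
    literal_body = re.search(r"'(.*?)'", query).group(1)
    # Layer 2: the (assumed) SQL string-literal unescaping.
    return assumed_hive_unescape(literal_body)


def extract(json_text, text=SAMPLE):
    # Layer 3: the regex engine sees whatever survived the first two layers.
    match = re.search(pattern_after_all_layers(json_text), text)
    return match.group(1) if match else None


print(pattern_after_all_layers(TWO))  # (.*)s|
print(extract(TWO))                   # Fir
print(extract(FOUR))                  # First part

try:
    json.loads(THREE)
except json.JSONDecodeError as err:
    # \s is not a valid JSON escape, so a strict parser refuses the file.
    print("strict JSON parser rejects three backslashes:", err)
```

Under these assumptions, the two-backslash file ends up as the pattern (.*)s| which would explain Fir: the greedy group stops just before the last lowercase s in the sample. With four backslashes per escape, the pattern that reaches the regex engine is the same one that worked when typed directly into Hive.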
Comments (1)
You can use a pattern built from the following parts. Note that:
([^|]*?) - captures any zero or more chars other than |, as few as possible
[[:space:]]* - zero or more whitespace chars
[|] - a literal | char (inside a character class, [...], the | char is treated as a literal pipe symbol).
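The point of writing the pattern with character classes is that it contains no backslashes, so no escaping layer can alter it. A minimal sketch in Python of that idea follows. `PATTERN` is an assumption: it is assembled from the three parts listed above, with `[ ]*` substituted for `[[:space:]]*` because Python's `re` module does not support POSIX bracket classes. The query text mirrors the one in the question.

```python
import json
import re

# ASSUMPTION: pattern assembled from the parts in the answer; [ ]* is a
# stand-in for [[:space:]]*, which Python's re module does not understand.
PATTERN = "([^|]*?)[ ]*[|]"
QUERY = "select regexp_extract(column, '" + PATTERN + "', 1) as column_one from data"

# Serialize as a JSON writer would: the query needs no escaping at all.
json_text = json.dumps({"name": "query1", "query": QUERY})

# Decoding hands back exactly the same query, so every later layer
# (SQL string literal, regex engine) sees the pattern unchanged.
decoded_query = json.loads(json_text)["query"]

match = re.search(PATTERN, "First part | Second Part")

print(json_text)
print(decoded_query == QUERY)  # True
print(match.group(1))          # First part
```

Support for bracket expressions such as `[[:space:]]` differs between regex engines, so it is worth confirming how the engine used by your Hive version treats it; a plain `[ ]` class is a backslash-free alternative when only spaces need to be skipped.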