Out-of-the-box JSON parser for Hive

Posted 2024-11-28 21:22:33

I have a text file containing JSON records that I would like to load into Hive. My JSON looks like:

{"vr":1,"tm":1312816191516,"tms":"08-08-2011 15:09:51.516 GMT","as":1002,"pb":1102,"cts":[1204,1205],"ctgs":[1304,1305],"op":1400,"ev":2,"dv":1503,"dvgs":[1605,1606],"cnt":"cnt5","usr":"usr8","atts":[{"id":8002,"val":"ccc"},{"id":8003,"val":"ddd"}],"sel":{"cm":2102,"ty":"PRE","ag":3002,"ad":4002,"fl":5002,"fla":6002,"hg":7002,"mc":"WAP","pr":0.1}}

As you can see, the JSON is nested and contains arrays of primitives and arrays of objects.

Is it possible to load it as-is into Hive using any built-in function?

Yosi


Comments (3)

初心未许 2024-12-05 21:22:33

You should be able to load it into Hive as is.
You may need to escape the double quotes. I haven't loaded JSON into Hive myself, so I'm not 100% sure whether any escaping is needed.

To access the JSON elements once the data is in Hive, use the built-in get_json_object function, which is documented at:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-getjsonobject
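
To make the path argument of get_json_object more concrete, here is a rough Python sketch that resolves a few `$.key[index]` style paths against the record from the question. It is only a local emulation of a small subset of the path syntax, not Hive's implementation: `resolve` is a made-up helper, and the table `raw_json` and column `line` mentioned in the comments are hypothetical names. Note that in Hive get_json_object returns its result as a string, whereas this sketch returns native Python values.

```python
import json
import re

# The record from the question, split across lines only for readability.
SAMPLE = (
    '{"vr":1,"tm":1312816191516,"tms":"08-08-2011 15:09:51.516 GMT",'
    '"as":1002,"pb":1102,"cts":[1204,1205],"ctgs":[1304,1305],"op":1400,'
    '"ev":2,"dv":1503,"dvgs":[1605,1606],"cnt":"cnt5","usr":"usr8",'
    '"atts":[{"id":8002,"val":"ccc"},{"id":8003,"val":"ddd"}],'
    '"sel":{"cm":2102,"ty":"PRE","ag":3002,"ad":4002,"fl":5002,'
    '"fla":6002,"hg":7002,"mc":"WAP","pr":0.1}}'
)

# One path step: either ".name" or "[index]".
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def resolve(record_text, path):
    """Resolve a '$.a.b[0]' style path; return None if it does not match.

    Hypothetical helper covering only child keys and array indexes.
    In Hive the rough equivalent would be something like:
        SELECT get_json_object(line, '$.sel.ty') FROM raw_json;
    where raw_json has a single STRING column named line (assumed names).
    """
    if not path.startswith("$"):
        return None
    node = json.loads(record_text)
    pos = 1
    while pos < len(path):
        step = _STEP.match(path, pos)
        if step is None:
            return None  # syntax this sketch does not support
        key, index = step.group(1), step.group(2)
        if key is not None:
            if not isinstance(node, dict) or key not in node:
                return None
            node = node[key]
        else:
            i = int(index)
            if not isinstance(node, list) or i >= len(node):
                return None
            node = node[i]
        pos = step.end()
    return node


if __name__ == "__main__":
    for p in ("$.sel.ty", "$.cts[0]", "$.atts[1].val"):
        print(p, "->", resolve(SAMPLE, p))
```

With the sample record, `$.sel.ty` resolves to `PRE`, `$.cts[0]` to `1204`, and `$.atts[1].val` to `ddd`. The same three paths are the kind of thing you would pass as the second argument of get_json_object.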

与往事干杯 2024-12-05 21:22:33

You can use a custom SerDe to read JSON files into Hive tables. See the following SerDe on GitHub:
https://github.com/rcongiu/Hive-JSON-Serde
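
With a SerDe you declare the nested structure as Hive column types (ARRAY, STRUCT), so the main chore is writing the DDL. Below is a small Python sketch that guesses Hive types from one sample record and prints a CREATE TABLE statement. Treat it as a starting point only: the helper names and the table name `events` are made up, the record is a shortened version of the one in the question, and the SerDe class name `org.openx.data.jsonserde.JsonSerDe` is what I believe the Hive-JSON-Serde project uses, so check its README. Inferring types from a single record is a heuristic; a field that happens to be a small integer here could need BIGINT in other records.

```python
import json

# Shortened version of the question's record (subset of fields, assumed).
SAMPLE = (
    '{"vr":1,"tm":1312816191516,"cts":[1204,1205],'
    '"atts":[{"id":8002,"val":"ccc"}],"sel":{"ty":"PRE","pr":0.1}}'
)


def hive_type(value):
    """Guess a Hive type for one parsed JSON value (heuristic)."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "BOOLEAN"
    if isinstance(value, int):
        return "INT" if -2**31 <= value < 2**31 else "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, list):
        # Assumes arrays are homogeneous; empty arrays fall back to STRING.
        return "ARRAY<%s>" % (hive_type(value[0]) if value else "STRING")
    if isinstance(value, dict):
        fields = ",".join("%s:%s" % (k, hive_type(v)) for k, v in value.items())
        return "STRUCT<%s>" % fields
    return "STRING"  # strings and nulls


def create_table_ddl(table, record_text):
    """Build a CREATE TABLE statement from one sample JSON record."""
    record = json.loads(record_text)
    # Backticks guard against keys that are reserved words, such as "as"
    # in the full record from the question.
    cols = ",\n".join("  `%s` %s" % (k, hive_type(v)) for k, v in record.items())
    return (
        "CREATE TABLE %s (\n%s\n)\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        "STORED AS TEXTFILE;" % (table, cols)
    )


if __name__ == "__main__":
    print(create_table_ddl("events", SAMPLE))
```

Once such a table exists, nested values should be reachable with ordinary Hive syntax such as `sel.ty` or `cts[0]` rather than JSON path strings.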

中二柚 2024-12-05 21:22:33

Also check out Brickhouse: https://github.com/klout/brickhouse
It has quite decent UDFs for JSON (such as json_split and json_map).
With Brickhouse and get_json_object / json_tuple (also mentioned by Nija here), you can even avoid using a custom SerDe such as Hive-JSON-Serde.
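
For the json_tuple route, the usual pattern is a LATERAL VIEW over a table that keeps each raw JSON record in one string column. The Python sketch below just assembles such a query for a list of top-level keys. The helper name, the table `raw_json`, and the column `line` are assumptions for illustration; the Brickhouse UDFs are not used here because I have not checked their signatures. As far as I know json_tuple only extracts top-level keys, so nested values like `sel.ty` would still need get_json_object or another UDF.

```python
def json_tuple_query(table, json_column, keys):
    """Build a Hive query that extracts top-level JSON keys with json_tuple.

    Hypothetical helper. Keys are used as column aliases without quoting,
    so keys that are reserved words (such as "as") would need backticks.
    """
    quoted_keys = ", ".join("'%s'" % k for k in keys)
    aliases = ", ".join(keys)
    selected = ", ".join("t.%s" % k for k in keys)
    return (
        "SELECT %s\n"
        "FROM %s r\n"
        "LATERAL VIEW json_tuple(r.%s, %s) t AS %s;"
        % (selected, table, json_column, quoted_keys, aliases)
    )


if __name__ == "__main__":
    # raw_json and line are assumed names for a one-column staging table.
    print(json_tuple_query("raw_json", "line", ["vr", "usr", "cnt"]))
```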
