Hive out-of-the-box JSON parser
I have a text file containing JSON records that I would like to load into Hive. My JSON looks like:
{"vr":1,"tm":1312816191516,"tms":"08-08-2011 15:09:51.516 GMT","as":1002,"pb":1102,"cts":[1204,1205],"ctgs":[1304,1305],"op":1400,"ev":2,"dv":1503,"dvgs":[1605,1606],"cnt":"cnt5","usr":"usr8","atts":[{"id":8002,"val":"ccc"},{"id":8003,"val":"ddd"}],"sel":{"cm":2102,"ty":"PRE","ag":3002,"ad":4002,"fl":5002,"fla":6002,"hg":7002,"mc":"WAP","pr":0.1}}
As you can see, I have nested JSON with arrays of primitives and arrays of objects.
Is it possible to load it as is into Hive using any built-in function?
Yosi
Comments (3)
You should be able to load it into Hive as is. You may need to escape the " characters; I haven't loaded JSON into Hive myself, so I'm not 100% sure whether any escaping is needed.
Once the JSON is in Hive, you can access its elements with the built-in get_json_object function, which is described in detail at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-getjsonobject
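A minimal sketch of that approach, assuming each line of the file is one JSON record and is loaded into a single STRING column. The table name events_raw, the column name json_line, and the file path are placeholders, not anything from the question.

```sql
-- Hypothetical staging table: one JSON record per row, kept as raw text.
CREATE TABLE IF NOT EXISTS events_raw (json_line STRING);

-- Placeholder path; point it at the actual file.
LOAD DATA LOCAL INPATH '/tmp/events.json' INTO TABLE events_raw;

-- get_json_object takes the JSON string and a JSONPath-like expression
-- and returns the matched value as a string.
SELECT
  get_json_object(json_line, '$.vr')          AS vr,
  get_json_object(json_line, '$.usr')         AS usr,
  get_json_object(json_line, '$.cts[0]')      AS first_ct,        -- element of a primitive array
  get_json_object(json_line, '$.atts[1].val') AS second_att_val,  -- field of an object inside an array
  get_json_object(json_line, '$.sel.mc')      AS sel_mc           -- field of a nested object
FROM events_raw;
```

Every extracted value comes back as a string, so numeric fields would need a CAST before arithmetic or comparisons.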
You can use a custom SerDe to read JSON files into Hive tables. See the following SerDe on GitHub:
https://github.com/rcongiu/Hive-JSON-Serde
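As a rough sketch of how a table over the question's records might look with this SerDe: the jar path and the table name events are placeholders, and only a subset of the fields is declared, on the assumption that JSON keys without a matching column are simply ignored. The "as" field is left out because AS is a reserved word in HiveQL; the SerDe's README covers how to handle such names.

```sql
-- Placeholder path to the SerDe jar built from the repository above.
ADD JAR /path/to/json-serde-with-dependencies.jar;

-- Column names match the JSON keys; nested JSON maps to ARRAY and STRUCT types.
CREATE TABLE events (
  vr   INT,
  tm   BIGINT,
  tms  STRING,
  cts  ARRAY<INT>,
  ctgs ARRAY<INT>,
  cnt  STRING,
  usr  STRING,
  atts ARRAY<STRUCT<id:INT, val:STRING>>,
  sel  STRUCT<cm:INT, ty:STRING, ag:INT, ad:INT, fl:INT,
              fla:INT, hg:INT, mc:STRING, pr:DOUBLE>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;

-- Nested values are then addressed with ordinary Hive syntax.
SELECT usr, cts[0], atts[1].val, sel.mc
FROM events;
```

Compared with get_json_object, this keeps the types in the table definition, at the cost of depending on an external jar.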
Also check out Brickhouse: https://github.com/klout/brickhouse
It has quite decent UDFs for JSON (like json_split and json_map).
With Brickhouse and get_json_object / json_tuple (also mentioned by Nija here), you can even avoid using a custom SerDe such as Hive-JSON-Serde.
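For the built-in part of that suggestion, here is a small sketch using json_tuple, assuming the raw records sit in a single STRING column. The table events_raw and column json_line are hypothetical names, and the Brickhouse UDFs are deliberately not shown because their exact signatures are not given in this thread.

```sql
-- Hypothetical staging table holding one raw JSON record per row.
CREATE TABLE IF NOT EXISTS events_raw (json_line STRING);

-- json_tuple pulls several top-level keys in one pass over the record.
-- It is a table-generating function, so it is used through LATERAL VIEW.
SELECT
  t.vr,
  t.usr,
  get_json_object(t.sel, '$.mc') AS sel_mc  -- drill into the nested object
FROM events_raw r
LATERAL VIEW json_tuple(r.json_line, 'vr', 'usr', 'sel') t AS vr, usr, sel;
```

json_tuple only reaches top-level keys, which is why the nested sel object is passed on to get_json_object here.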