如何解析多层嵌套 (5/6) JSON 对象并将其转换为数据帧?

发布于 2025-01-15 06:21:03 字数 2236 浏览 3 评论 0原文

问题:我有一个多层嵌套的 JSON 文件,需要解析它并将其转换为 pandas 数据帧,其中每个字段都是一列。我采取了两种方法:

  1. 将原始文件转换为数据字典
  2. 将原始文件转换为 JSON 对象

对于我尝试过的数据字典:

  • df = pd.json_normalize(data)

    • 这给我留下了一个解析并加载到 field12 的 DF(请参阅下面的虚拟 JSON),其余数据将加载到该单个单元格中。还尝试添加 max_level=,无论使用什么数字,我都会得到相同的结果
  • 我尝试过 dt = dt.explode('field12')< /p>

    • 这使得第三个嵌套中的一列标题没有数据,其余数据丢失

For我尝试过的 JSON 对象:

  • pd.read_json(json_object)
    • 这给我留下了一个 4x9 的表格,其中第一列是第二个嵌套中的标题,各列是第一个嵌套中的标题,其余数据再次存储在单个单元格中

请帮忙,对这个非常迷失!

下面是一个虚拟 JSON 对象,它是我正在使用的文件的精确结构副本:

{
"field1":"dummyString",
"field2": null,
"field3":" dummyString",
"field4":{
"field5":" dummyString",
"field6":" dummyString",
"field7":" dummyString",
"field8":" dummyString",
"field9":"dummyNumber",
"field10":null,
"field11":" dummyString",
"field12":[
{
"field13":" dummyString",
"field14":" dummyString",
"field15":{
"field16":" dummyString",
"field17":" dummyString",
"field18":" dummyString"
},
"field19":" dummyString",
"field20":" dummyString",
"field21":" dummyString",
"field22":" dummyString",
"field23":" dummyString",
"field24": "dummyNumber",
"field25": "dummyNumber",
"field26":null,
"field27":" dummyString",
"field28":" dummyString",
"field29":" dummyString",
"field30": "dummyBoolean",
"field31": "dummyNumber",
"field32":" dummyString",
"field33":null,
"field34":[
{
"field35":"dummyString",
"field36":null,
"field37":{
"field38":"dummyString",
"field39":"dummyString",
"field40":" dummyString",
"field41":" dummyString",
"field42":" dummyString",
"field43":" dummyString",
"field44":null,
"field45":" dummyString",
"field46":" dummyString"
},
"field47":null,
"field48":null,
"field49":null,
"field50":null,
"field51": "dummyNumber",
"field52":"dummyString",
"field53":"dummyString",
"field54":"dummyBoolean",
"field55": "dummyBoolean",
"field56":"dummyString",
"field57":null,
"field58":null
},{
"field59":{
"field60":null,
"field61":null,
"field62":null

    } 
}
]

}
],
"field63":"dummyStringl"


}
} 

Problem: I have a multi nested JSON file that I need to parse and convert to a pandas dataframe where every field is a column. I've taken 2 approaches:

  1. Convert raw file to data dictionary
  2. Convert raw file to JSON object

For data dictionary I've tried:

  • df = pd.json_normalize(data)

    • This leaves me with a DF that parses and loads up to field12 (see dummy JSON below), the rest of the data is loaded into that single cell. Have also tried adding max_level= and I get the same result no matter what number I use
  • I've tried dt = dt.explode('field12')

    • This leaves me with a single column of the headers in the 3rd nest with no data and the rest of the data is missing

For JSON object I've tried:

  • pd.read_json(json_object)
    • This leaves me with a 4x9 table with the 1st column being the headers in the 2nd nest and the columns being the headers in the 1st nest, the rest of the data is stored in a single cell again

Please help, very lost with this one!

Below is a dummy JSON object that is an exact structure replica of the file I'm working with:

{
"field1":"dummyString",
"field2": null,
"field3":" dummyString",
"field4":{
"field5":" dummyString",
"field6":" dummyString",
"field7":" dummyString",
"field8":" dummyString",
"field9":"dummyNumber",
"field10":null,
"field11":" dummyString",
"field12":[
{
"field13":" dummyString",
"field14":" dummyString",
"field15":{
"field16":" dummyString",
"field17":" dummyString",
"field18":" dummyString"
},
"field19":" dummyString",
"field20":" dummyString",
"field21":" dummyString",
"field22":" dummyString",
"field23":" dummyString",
"field24": "dummyNumber",
"field25": "dummyNumber",
"field26":null,
"field27":" dummyString",
"field28":" dummyString",
"field29":" dummyString",
"field30": "dummyBoolean",
"field31": "dummyNumber",
"field32":" dummyString",
"field33":null,
"field34":[
{
"field35":"dummyString",
"field36":null,
"field37":{
"field38":"dummyString",
"field39":"dummyString",
"field40":" dummyString",
"field41":" dummyString",
"field42":" dummyString",
"field43":" dummyString",
"field44":null,
"field45":" dummyString",
"field46":" dummyString"
},
"field47":null,
"field48":null,
"field49":null,
"field50":null,
"field51": "dummyNumber",
"field52":"dummyString",
"field53":"dummyString",
"field54":"dummyBoolean",
"field55": "dummyBoolean",
"field56":"dummyString",
"field57":null,
"field58":null
},{
"field59":{
"field60":null,
"field61":null,
"field62":null

    } 
}
]

}
],
"field63":"dummyStringl"


}
} 

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

忘年祭陌 2025-01-22 06:21:03

我写了一个函数来扁平化像你这样的字典。

def flatten(d):
    ret = {**d}
    for k, v in d.items():
        if isinstance(v, dict):
            ret.pop(k)
            ret = {**ret, **flatten(v)}
        elif isinstance(v, list):
            ret.pop(k)
            for item in v:
                ret = {**ret, **flatten(item)}
            
    return ret

df = pd.json_normalize(flatten(j))

输出:

>>> df
        field1 field2        field3        field5        field6        field7        field8       field9 field10  ...       field41       field42       field43 field44       field45       field46 field60 field61 field62
0  dummyString   None   dummyString   dummyString   dummyString   dummyString   dummyString  dummyNumber    None  ...   dummyString   dummyString   dummyString    None   dummyString   dummyString    None    None    None

[1 rows x 57 columns]

I wrote a function to flatten a dict like yours.

def flatten(d):
    ret = {**d}
    for k, v in d.items():
        if isinstance(v, dict):
            ret.pop(k)
            ret = {**ret, **flatten(v)}
        elif isinstance(v, list):
            ret.pop(k)
            for item in v:
                ret = {**ret, **flatten(item)}
            
    return ret

df = pd.json_normalize(flatten(j))

Output:

>>> df
        field1 field2        field3        field5        field6        field7        field8       field9 field10  ...       field41       field42       field43 field44       field45       field46 field60 field61 field62
0  dummyString   None   dummyString   dummyString   dummyString   dummyString   dummyString  dummyNumber    None  ...   dummyString   dummyString   dummyString    None   dummyString   dummyString    None    None    None

[1 rows x 57 columns]
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文