Converting JSON to a DataFrame

Posted on 2025-01-12 15:41:05

I am working with Python and I have the following JSON which I need to convert to a Dataframe:

JSON:

{"Results": 
        {"forecast": [2.1632421537363355, 16.35421956127545], 
         "prediction_interval": ["[-114.9747272420262, 119.30121154949884]", 
                                 "[-127.10990770140964, 159.8183468239605]"], 
         "index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0}, 
                   {"SaleDate": 1644364800000, "OfferingGroupId": 1}]
        }
}

Expected Dataframe output:

Forecast                 SaleDate     OfferingGroupId
2.1632421537363355       2022-02-08    0
16.35421956127545        2022-02-09    1

I have tried a few things without getting anywhere close. My last attempt was:

import json
import pandas as pd

string = '{"Results": {"forecast": [2.1632421537363355, 16.35421956127545], "prediction_interval": ["[-114.9747272420262, 119.30121154949884]", "[-127.10990770140964, 159.8183468239605]"], "index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0}, {"SaleDate": 1644364800000, "OfferingGroupId": 1}]}}'
json_obj = json.loads(string)
df = pd.DataFrame(json_obj)
print(df)

df = pd.concat([df['Results']], axis=0)
df = pd.concat([df['forecast'], df['index'].apply(pd.Series)], axis=1)

which resulted in an error:

AttributeError: 'list' object has no attribute 'apply'
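
For what it's worth, the error can be traced by inspecting the intermediate objects. The following is a minimal sketch using the same JSON; the names `raw`, `outer`, `series`, and `cell` are illustrative and not part of the original attempt.

```python
import json
import pandas as pd

raw = '{"Results": {"forecast": [2.1632421537363355, 16.35421956127545], "prediction_interval": ["[-114.9747272420262, 119.30121154949884]", "[-127.10990770140964, 159.8183468239605]"], "index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0}, {"SaleDate": 1644364800000, "OfferingGroupId": 1}]}}'

# A dict of dicts becomes a single column ("Results") whose row labels are
# the inner keys; each cell holds a whole Python list.
outer = pd.DataFrame(json.loads(raw))
print(outer.shape)  # (3, 1): one row per inner key, one column

# Selecting the column gives a Series of lists, so a label lookup returns
# a plain list rather than a pandas object.
series = outer["Results"]
cell = series["index"]
print(type(cell).__name__)  # list

# A plain list has no .apply method, which is what the AttributeError reports.
print(hasattr(cell, "apply"))  # False
```

In other words, the frame is built one level too high: passing the value under "Results" to `pd.DataFrame` instead gives one row per forecast.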

Comments (4)

爱本泡沫多脆弱 2025-01-19 15:41:06

One possible approach is to create a DataFrame from the value under "Results" (this will create a column named "index") and build another DataFrame with the "index" column and join it back to the original DataFrame:

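import pandas as pd

# `data` is assumed to be the parsed JSON from the question, e.g. json.loads(string)
df = pd.DataFrame(data['Results'])
df = df.join(pd.DataFrame(df['index'].tolist())).drop(columns=['prediction_interval', 'index'])
df['SaleDate'] = pd.to_datetime(df['SaleDate'], unit='ms')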

Output:

    forecast   SaleDate  OfferingGroupId
0   2.163242 2022-02-08                0
1  16.354220 2022-02-09                1
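
A related sketch, not taken from the answer above: the expansion of the "index" dicts can also be done with `pd.json_normalize`, which turns a list of dicts into columns directly. This assumes pandas 1.0 or later, where `json_normalize` is available at the top level; the names `raw` and `results` are illustrative.

```python
import json
import pandas as pd

raw = '{"Results": {"forecast": [2.1632421537363355, 16.35421956127545], "prediction_interval": ["[-114.9747272420262, 119.30121154949884]", "[-127.10990770140964, 159.8183468239605]"], "index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0}, {"SaleDate": 1644364800000, "OfferingGroupId": 1}]}}'
results = json.loads(raw)["Results"]

# Each dict under "index" becomes one row with SaleDate and OfferingGroupId columns.
df = pd.json_normalize(results["index"])

# Put the forecasts in front; the list has one value per row.
df.insert(0, "forecast", results["forecast"])

# The timestamps are epoch milliseconds.
df["SaleDate"] = pd.to_datetime(df["SaleDate"], unit="ms")
print(df)
```

This skips the intermediate frame that still contains the "index" and "prediction_interval" columns, so nothing has to be dropped afterwards.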

深陷 2025-01-19 15:41:06

It's not very pretty, but you can discard all the nesting that complicates things by first forcing the data into a list of aligned tuples and then building the DataFrame from that:

import json
import pandas as pd

string = '{"Results": {"forecast": [2.1632421537363355, 16.35421956127545], "prediction_interval": ["[-114.9747272420262, 119.30121154949884]", "[-127.10990770140964, 159.8183468239605]"], "index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0}, {"SaleDate": 1644364800000, "OfferingGroupId": 1}]}}'
results_dict = json.loads(string)["Results"]
results_tuples = zip(results_dict["forecast"],
                     [d["SaleDate"] for d in results_dict["index"]],
                     [d["OfferingGroupId"] for d in results_dict["index"]])
df = pd.DataFrame(results_tuples, columns=["Forecast", "SaleDate", "OfferingGroupId"])
df['SaleDate'] = pd.to_datetime(df['SaleDate'], unit='ms')
print(df)
>     Forecast   SaleDate  OfferingGroupId
  0   2.163242 2022-02-08                0
  1  16.354220 2022-02-09                1

Or the same idea but forcing it into an aligned dict format:

string = '{"Results": {"forecast": [2.1632421537363355, 16.35421956127545], "prediction_interval": ["[-114.9747272420262, 119.30121154949884]", "[-127.10990770140964, 159.8183468239605]"], "index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0}, {"SaleDate": 1644364800000, "OfferingGroupId": 1}]}}'
results_dict = json.loads(string)["Results"]
results_dict = {"Forecast": results_dict["forecast"],
                "SaleDate": [d["SaleDate"] for d in results_dict["index"]],
                "OfferingGroupId": [d["OfferingGroupId"] for d in results_dict["index"]]}
df = pd.DataFrame.from_dict(results_dict)
df['SaleDate'] = pd.to_datetime(df['SaleDate'], unit='ms')
print(df)
>     Forecast   SaleDate  OfferingGroupId
  0   2.163242 2022-02-08                0
  1  16.354220 2022-02-09                1

In my experience, letting pandas read an input format it was not designed for and then fixing the result with pandas methods causes far more headaches than building a dict or a list of tuples as an intermediate step and reading that. But that might just be personal preference.
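
One caveat with the intermediate-step idea: `zip()` silently stops at the shorter input, so a payload with mismatched list lengths would lose rows without any warning. A minimal sketch of a guard, where `results_to_frame` is a hypothetical helper name and not something from pandas:

```python
import json
import pandas as pd


def results_to_frame(results: dict) -> pd.DataFrame:
    """Flatten the "Results" payload into one row per forecast (hypothetical helper)."""
    forecast = results["forecast"]
    index = results["index"]

    # zip() would truncate to the shorter list, so fail loudly instead.
    if len(forecast) != len(index):
        raise ValueError(
            f"forecast has {len(forecast)} items but index has {len(index)}"
        )

    rows = [
        (value, entry["SaleDate"], entry["OfferingGroupId"])
        for value, entry in zip(forecast, index)
    ]
    df = pd.DataFrame(rows, columns=["Forecast", "SaleDate", "OfferingGroupId"])
    df["SaleDate"] = pd.to_datetime(df["SaleDate"], unit="ms")
    return df


raw = '{"Results": {"forecast": [2.1632421537363355, 16.35421956127545], "prediction_interval": ["[-114.9747272420262, 119.30121154949884]", "[-127.10990770140964, 159.8183468239605]"], "index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0}, {"SaleDate": 1644364800000, "OfferingGroupId": 1}]}}'
frame = results_to_frame(json.loads(raw)["Results"])
print(frame)
```

On Python 3.10 or later, `zip(forecast, index, strict=True)` would raise on a length mismatch as well; the explicit check above just keeps the sketch usable on older versions.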

萌能量女王 2025-01-19 15:41:06

Just load the index as a column, then use tolist() to export it as two columns and create a new DataFrame. Combine the new dataframe with the original via pd.concat().

In this example, I also included columns for prediction_interval because I figured you might want that, too.

import json
import pandas as pd

d = {"Results":
        {"forecast": [2.1632421537363355, 16.35421956127545],
         "prediction_interval": ["[-114.9747272420262, 119.30121154949884]", "[-127.10990770140964, 159.8183468239605]"],
         "index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0}, {"SaleDate": 1644364800000, "OfferingGroupId": 1}]
        }
}

res = pd.DataFrame(d['Results'])

sd = pd.DataFrame(res['index'].tolist())
sd['SaleDate'] = pd.to_datetime(sd['SaleDate'], unit='ms')

pi = pd.DataFrame(res['prediction_interval'].map(json.loads).tolist(), columns=['pi_start', 'pi_end'])

df = pd.concat((res, pi, sd), axis=1).drop(columns=['index', 'prediction_interval'])
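
The `prediction_interval` step works because each interval string happens to be a valid JSON array, so `json.loads` turns it into a two-element list. A small sketch of just that step in isolation; the names `intervals` and `bounds` are illustrative, and the approach assumes every string has exactly two numbers in JSON array syntax.

```python
import json
import pandas as pd

# The interval strings from the question, held in a Series as they would be
# in the 'prediction_interval' column.
intervals = pd.Series([
    "[-114.9747272420262, 119.30121154949884]",
    "[-127.10990770140964, 159.8183468239605]",
])

# json.loads parses each string into [lower, upper]; tolist() then gives a
# list of two-element lists that maps onto two columns.
bounds = pd.DataFrame(intervals.map(json.loads).tolist(),
                      columns=["pi_start", "pi_end"])
print(bounds)
```

If the strings ever used a different notation, such as parentheses, `json.loads` would raise and a custom parser would be needed instead.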

等风来 2025-01-19 15:41:06

You must use the pandas library:

import json
import pandas as pd

with open('data.json') as f:
    data = json.load(f)
print(data)
df = pd.read_json('data.json')
df
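
Loading the file is only the first step; the nested "Results" object still has to be flattened to get the table the question asks for. A minimal sketch, assuming the file holds the JSON from the question. It writes the sample to a temporary `data.json` only so that the example is self-contained; the names `payload`, `path`, and `results` are illustrative.

```python
import json
import os
import tempfile

import pandas as pd

payload = {"Results": {
    "forecast": [2.1632421537363355, 16.35421956127545],
    "prediction_interval": ["[-114.9747272420262, 119.30121154949884]",
                            "[-127.10990770140964, 159.8183468239605]"],
    "index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0},
              {"SaleDate": 1644364800000, "OfferingGroupId": 1}],
}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")
    # Write the sample so there is a file to read back.
    with open(path, "w") as f:
        json.dump(payload, f)
    with open(path) as f:
        data = json.load(f)

results = data["Results"]

# A list of dicts maps straight onto rows and columns.
df = pd.DataFrame(results["index"])
df.insert(0, "Forecast", results["forecast"])
df["SaleDate"] = pd.to_datetime(df["SaleDate"], unit="ms")
print(df)
```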