Convert JSON to a DataFrame
I am working with Python and I have the following JSON which I need to convert to a Dataframe:
JSON:
{"Results":
{"forecast": [2.1632421537363355, 16.35421956127545],
"prediction_interval": ["[-114.9747272420262, 119.30121154949884]",
"[-127.10990770140964, 159.8183468239605]"],
"index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0},
{"SaleDate": 1644364800000, "OfferingGroupId": 1}]
}
}
Expected Dataframe output:
Forecast SaleDate OfferingGroupId
2.1632421537363355 2022-02-08 0
16.35421956127545 2022-02-09 1
I have tried a few things but not getting anywhere close, my last attempt was:
import json
import pandas as pd
string = '{"Results": {"forecast": [2.1632421537363355, 16.35421956127545], "prediction_interval": ["[-114.9747272420262, 119.30121154949884]", "[-127.10990770140964, 159.8183468239605]"], "index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0}, {"SaleDate": 1644364800000, "OfferingGroupId": 1}]}}'
json_obj = json.loads(string)
df = pd.DataFrame(json_obj)
print(df)
df = pd.concat([df['Results']], axis=0)
df = pd.concat([df['forecast'], df['index'].apply(pd.Series)], axis=1)
which resulted in an error:
AttributeError: 'list' object has no attribute 'apply'
One possible approach is to create a DataFrame from the value under "Results" (this will create a column named "index"), build another DataFrame from the "index" column, and join it back to the original DataFrame.
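The code for this answer did not survive on this page, so here is a minimal sketch of the join approach it describes. The variable names (`raw`, `expanded`) and the final rename to `Forecast` are my own choices for illustration, and the `SaleDate` values are assumed to be epoch milliseconds.

```python
import json
import pandas as pd

raw = '{"Results": {"forecast": [2.1632421537363355, 16.35421956127545], "prediction_interval": ["[-114.9747272420262, 119.30121154949884]", "[-127.10990770140964, 159.8183468239605]"], "index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0}, {"SaleDate": 1644364800000, "OfferingGroupId": 1}]}}'
data = json.loads(raw)

# One row per forecast; at this point the "index" column holds dicts.
df = pd.DataFrame(data["Results"])

# Expand the dicts into their own columns, then join on the row index.
expanded = pd.DataFrame(df["index"].tolist())
df = df.join(expanded).drop(columns=["index", "prediction_interval"])

# Assumption: SaleDate is a Unix timestamp in milliseconds.
df["SaleDate"] = pd.to_datetime(df["SaleDate"], unit="ms")
df = df.rename(columns={"forecast": "Forecast"})
print(df)
```

The key point is that `pd.DataFrame` is given `data["Results"]` rather than the whole parsed object, so each list becomes a column instead of a single cell.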
Not very pretty, but I guess you can throw out all the nesting that makes this complicated by forcing the data into a list of aligned tuples and then building the DataFrame from that:
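The original snippet is missing here, so this is only a sketch of what the tuple-list idea could look like. The names `results` and `rows` are mine, and the date conversion assumes `SaleDate` is in epoch milliseconds.

```python
import json
import pandas as pd

raw = '{"Results": {"forecast": [2.1632421537363355, 16.35421956127545], "prediction_interval": ["[-114.9747272420262, 119.30121154949884]", "[-127.10990770140964, 159.8183468239605]"], "index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0}, {"SaleDate": 1644364800000, "OfferingGroupId": 1}]}}'
results = json.loads(raw)["Results"]

# Pair each forecast with its index entry and flatten into one tuple per row.
rows = [
    (forecast, entry["SaleDate"], entry["OfferingGroupId"])
    for forecast, entry in zip(results["forecast"], results["index"])
]

df = pd.DataFrame(rows, columns=["Forecast", "SaleDate", "OfferingGroupId"])
# Assumption: SaleDate is a Unix timestamp in milliseconds.
df["SaleDate"] = pd.to_datetime(df["SaleDate"], unit="ms")
print(df)
```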
Or the same idea, but forcing it into an aligned dict format:
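Again the answer's own code is not shown, so take this as a rough sketch of the dict variant: one key per output column, each holding a list of the same length. The name `columns` is just for illustration.

```python
import json
import pandas as pd

raw = '{"Results": {"forecast": [2.1632421537363355, 16.35421956127545], "prediction_interval": ["[-114.9747272420262, 119.30121154949884]", "[-127.10990770140964, 159.8183468239605]"], "index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0}, {"SaleDate": 1644364800000, "OfferingGroupId": 1}]}}'
results = json.loads(raw)["Results"]

# Build a flat dict of equal-length lists, one per output column.
columns = {
    "Forecast": results["forecast"],
    "SaleDate": [entry["SaleDate"] for entry in results["index"]],
    "OfferingGroupId": [entry["OfferingGroupId"] for entry in results["index"]],
}

df = pd.DataFrame(columns)
# Assumption: SaleDate is a Unix timestamp in milliseconds.
df["SaleDate"] = pd.to_datetime(df["SaleDate"], unit="ms")
print(df)
```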
In my experience, letting pandas read an input format it was not designed for and then fixing the result with pandas methods causes far more of a headache than building a dict or list of tuples as an intermediate step and reading that. But that might just be personal preference.
Just load the index as a column, then use tolist() to export it as two columns and create a new DataFrame. Combine the new DataFrame with the original via pd.concat(). In this example, I also included columns for prediction_interval because I figured you might want that, too.
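The example this answer refers to is not preserved on this page. A sketch along the lines it describes might look like the following. The column names `interval_lower` and `interval_upper` are hypothetical, and parsing the interval strings with `json.loads` assumes they are always valid JSON lists, as they are in the sample.

```python
import json
import pandas as pd

raw = '{"Results": {"forecast": [2.1632421537363355, 16.35421956127545], "prediction_interval": ["[-114.9747272420262, 119.30121154949884]", "[-127.10990770140964, 159.8183468239605]"], "index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0}, {"SaleDate": 1644364800000, "OfferingGroupId": 1}]}}'
data = json.loads(raw)

df = pd.DataFrame(data["Results"])

# Export the dicts in "index" as their own columns.
index_df = pd.DataFrame(df["index"].tolist())

# Each prediction_interval is a string such as "[-114.97, 119.30]";
# parse it and split it into two columns (hypothetical names).
interval_df = pd.DataFrame(
    df["prediction_interval"].apply(json.loads).tolist(),
    columns=["interval_lower", "interval_upper"],
)

# Stitch the pieces together side by side.
out = pd.concat([df[["forecast"]], interval_df, index_df], axis=1)
# Assumption: SaleDate is a Unix timestamp in milliseconds.
out["SaleDate"] = pd.to_datetime(out["SaleDate"], unit="ms")
print(out)
```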
You must use the pandas library:
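The code that followed this answer is not shown, so what the answerer actually wrote is unknown. As one possibility that stays within pandas, here is a sketch using `pd.json_normalize` on the "index" list and then attaching the forecasts. The `SaleDate` conversion assumes epoch milliseconds.

```python
import json
import pandas as pd

raw = '{"Results": {"forecast": [2.1632421537363355, 16.35421956127545], "prediction_interval": ["[-114.9747272420262, 119.30121154949884]", "[-127.10990770140964, 159.8183468239605]"], "index": [{"SaleDate": 1644278400000, "OfferingGroupId": 0}, {"SaleDate": 1644364800000, "OfferingGroupId": 1}]}}'
results = json.loads(raw)["Results"]

# Flatten the list of dicts into SaleDate / OfferingGroupId columns.
df = pd.json_normalize(results["index"])

# Put the forecasts in front; the lists are aligned by position.
df.insert(0, "Forecast", results["forecast"])

# Assumption: SaleDate is a Unix timestamp in milliseconds.
df["SaleDate"] = pd.to_datetime(df["SaleDate"], unit="ms")
print(df)
```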