How to get a pandas DataFrame from Instagram JSON data
I am quite new to all this. I took a short Python bootcamp a while back and am now struggling to get some Instagram data into a format I understand.
Using the following code:
# Importing packages
import json
import re
import collections
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
# Loading downloaded Instagram data
json_data = {}
data_path = "C:/Users/etc.json"
with open(data_path) as file:
    json_data = json.load(file)
print(json_data)
I get the following output which looks promising:
{'relationships_followers': [{'title': '', 'media_list_data': [], 'string_list_data': [{'href': 'https://www.instagram.com/username1', 'value': 'username1', 'timestamp': 1655411505}]}, {'title': '', 'media_list_data': [], 'string_list_data': [{'href': 'https://www.instagram.com/username2', 'value': 'username2', 'timestamp': 1655149264}]}, {'title': '', 'media_list_data': [], 'string_list_data': [{'href': 'https://www.instagram.com/username3', 'value': 'username3', 'timestamp': 1655129904}]}, etc.....
type = dict
But when I try to convert it into a pandas DataFrame, it comes out strangely:
dfp = pd.read_json(data_path, orient = 'records')
print(dfp)
print(type(dfp))
Output:
relationships_followers
0 {'title': '', 'media_list_data': [], 'string_l...
1 {'title': '', 'media_list_data': [], 'string_l...
2 {'title': '', 'media_list_data': [], 'string_l...
3 {'title': '', 'media_list_data': [], 'string_l...
4 {'title': '', 'media_list_data': [], 'string_l...
.. ...
575 {'title': '', 'media_list_data': [], 'string_l...
576 {'title': '', 'media_list_data': [], 'string_l...
577 {'title': '', 'media_list_data': [], 'string_l...
578 {'title': '', 'media_list_data': [], 'string_l...
579 {'title': '', 'media_list_data': [], 'string_l...
[580 rows x 1 columns]
<class 'pandas.core.frame.DataFrame'>
How do I stop "relationships_followers" from being read as a single lonely column?
I am trying to get output like the below:
href value timestamp
0 www.inst... username1 DDMMYY
1 www.inst... username2 DDMMYY
2 www.inst... username3 DDMMYY
3 www.inst... username4 DDMMYY
...
578 www.inst... username578 DDMMYY
579 www.inst... username579 DDMMYY
Comments (2)
Try doing this to your master dict.
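One possible reading of this suggestion, offered as a sketch rather than the answerer's actual code: walk the master dict by hand and collect every entry found under string_list_data. The inline `sample` dict is a stand-in for the `json_data` loaded in the question, and the names `rows` and `followers` are hypothetical.

```python
import pandas as pd

# Stand-in for json_data from the question (same nesting, two followers only).
sample = {
    "relationships_followers": [
        {"title": "", "media_list_data": [], "string_list_data": [
            {"href": "https://www.instagram.com/username1",
             "value": "username1", "timestamp": 1655411505}]},
        {"title": "", "media_list_data": [], "string_list_data": [
            {"href": "https://www.instagram.com/username2",
             "value": "username2", "timestamp": 1655149264}]},
    ]
}

# Each follower item holds a list of dicts under "string_list_data";
# flatten those inner dicts into one list of rows.
rows = [
    entry
    for item in sample["relationships_followers"]
    for entry in item["string_list_data"]
]

# A list of flat dicts maps directly onto DataFrame columns.
followers = pd.DataFrame(rows)

# The timestamps look like Unix seconds (an assumption based on their size).
followers["timestamp"] = pd.to_datetime(followers["timestamp"], unit="s")
print(followers)
```

With the real file, `sample` would simply be replaced by the `json_data` dict already loaded with json.load.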
In this case you can use pd.json_normalize() to extract the href, value, and timestamp columns from the string_list_data entries.
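A minimal sketch of how pd.json_normalize() might be applied here, assuming the file has exactly the structure printed in the question. The inline `sample` dict stands in for the loaded `json_data`; the choice of `record_path` and the DDMMYY formatting step are assumptions about what the asker wants, not something the answer spelled out.

```python
import pandas as pd

# Stand-in for json.load(file); same shape as the output shown in the question.
sample = {
    "relationships_followers": [
        {"title": "", "media_list_data": [], "string_list_data": [
            {"href": "https://www.instagram.com/username1",
             "value": "username1", "timestamp": 1655411505}]},
        {"title": "", "media_list_data": [], "string_list_data": [
            {"href": "https://www.instagram.com/username2",
             "value": "username2", "timestamp": 1655149264}]},
        {"title": "", "media_list_data": [], "string_list_data": [
            {"href": "https://www.instagram.com/username3",
             "value": "username3", "timestamp": 1655129904}]},
    ]
}

# record_path makes json_normalize emit one row per dict found inside
# each item's "string_list_data" list, instead of one row per outer item.
df = pd.json_normalize(
    sample["relationships_followers"],
    record_path="string_list_data",
)

# Treat the timestamps as Unix seconds, then format them as DDMMYY strings.
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="s").dt.strftime("%d%m%y")
print(df)
```

pd.read_json produced a single column because the top-level object has only one key, relationships_followers; pointing json_normalize at the list under that key, and at the nested list inside each item, is what yields the three flat columns.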