Grouping based on a CSV column
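I have attached a CSV file. I have written a Python script that reads the CSV file, iterates over the resulting DataFrame, processes the contents, and inserts them into MongoDB.
Right now, all of the data is inserted into the database.
Is there a way to iterate over the Python dict and take only the data for the first 10 ranks (the group rank column)? That column is grouped, as you can see in the attached image.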
import pandas as pd
import pymongo

# `request` is the incoming web request object, as in the original snippet
file = request.files['file']
client = pymongo.MongoClient("mongodb://localhost:27017")
df = pd.read_csv(file)

# Group the rows by the value in the second CSV column
final_dict = {}
for _, row in df.iterrows():
    cluster_name = row.iloc[1]
    print(cluster_name)
    # Create the group's entry the first time it is seen, then append the row
    final_dict.setdefault(cluster_name, {"queries": []})["queries"].append(
        {"cluster_name": row.iloc[0], "cluster_rank": row.iloc[1],
         "cluster_size": row.iloc[2]})

# Insert one document per group
db = client["db_name"]
for key in final_dict:
    db.testing.insert_one(final_dict[key])
Comments (1)
To keep only the rows whose group rank is less than or equal to 10, you can filter the DataFrame with .loc before building the dict.
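A minimal sketch of that filter, under stated assumptions: the real CSV is not shown in the post, so the sample data, the column names (`query`, `group_rank`, `cluster_size`), and the helper `top_ranked_groups` are all hypothetical and only illustrate the idea of filtering with `.loc` first and grouping afterwards.

```python
import io

import pandas as pd

# Hypothetical sample data; the real CSV and its column names are not shown in the question.
SAMPLE_CSV = """query,group_rank,cluster_size
shoes,1,40
red shoes,1,40
boots,2,35
hats,11,5
scarves,12,3
"""


def top_ranked_groups(df, rank_col="group_rank", max_rank=10):
    """Keep rows with rank <= max_rank and build one document per rank."""
    # Boolean mask inside .loc keeps only the rows that satisfy the condition
    top = df.loc[df[rank_col] <= max_rank]
    docs = []
    # groupby iterates over the remaining ranks in ascending order
    for rank, group in top.groupby(rank_col):
        docs.append({
            "cluster_rank": int(rank),
            "queries": group.to_dict(orient="records"),
        })
    return docs


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
docs = top_ranked_groups(df)
print([d["cluster_rank"] for d in docs])
```

With the real file, the same function could be called on the DataFrame returned by pd.read_csv(file), and the resulting list could then be passed to something like db.testing.insert_many(docs). Depending on the pandas version, the values in each record may need converting to plain Python types before insertion.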