Grouping based on a CSV column

Posted on 2025-01-29 07:07:04

I have attached a CSV file. I have written a Python script that reads the CSV file into a pandas DataFrame, iterates over its rows, processes the contents, and inserts them into MongoDB.

Right now, all of the data is inserted into the database.

Is there a way to iterate over the Python dict and take only the data for the first 10 ranks (the group rank column)? This column is grouped, as you can see in the attached image.

import pandas as pd
import pymongo
from flask import request  # assumed: Flask's request object, used inside a view function

file = request.files['file']
client = pymongo.MongoClient("mongodb://localhost:27017")

df = pd.read_csv(file)

final_dict = {}
for _, row in df.iterrows():
    # Column 1 holds the rank; it is the key the rows are grouped by.
    cluster_rank = row.iloc[1]
    print(cluster_rank)
    # Create the group the first time this rank is seen, then append the row.
    group = final_dict.setdefault(cluster_rank, {"queries": []})
    group["queries"].append(
        {"cluster_name": row.iloc[0], "cluster_rank": row.iloc[1],
         "cluster_size": row.iloc[2]})

db = client["db_name"]

# Insert one document per rank group.
for key in final_dict:
    db.testing.insert_one(final_dict[key])
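
One possible way to keep only the first 10 ranks is to filter while grouping, before anything is sent to MongoDB. The sketch below is only an illustration under stated assumptions: `group_top_ranks` is a hypothetical helper, each row is assumed to be a `(name, rank, size)` tuple in the same column order the script reads, rank values are assumed to be sortable, and "first 10 ranks" is taken to mean the 10 smallest distinct rank values. The sample data is made up.

```python
from collections import defaultdict


def group_top_ranks(rows, top_n=10):
    """Group (name, rank, size) rows by rank, keeping only the top_n lowest ranks.

    Hypothetical helper: assumes 3-tuples in CSV column order and sortable ranks.
    """
    rows = list(rows)
    # The top_n smallest distinct rank values present in the data.
    wanted = set(sorted({rank for _, rank, _ in rows})[:top_n])

    grouped = defaultdict(lambda: {"queries": []})
    for name, rank, size in rows:
        if rank in wanted:
            grouped[rank]["queries"].append(
                {"cluster_name": name, "cluster_rank": rank, "cluster_size": size}
            )
    return dict(grouped)


# Made-up sample data: ranks 1..12, two rows per rank.
sample = [(f"q{r}{s}", r, 2) for r in range(1, 13) for s in "ab"]
docs = group_top_ranks(sample)
print(sorted(docs))  # only the ten lowest ranks remain
```

If the CSV has exactly these three columns, the rows could be supplied from the DataFrame as `df.itertuples(index=False, name=None)`, and each value of the returned dict could then be passed to `db.testing.insert_one` as in the script.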
