Grouping based on a CSV column

Posted 2025-01-29 07:07:04 · word count 1101 · views 3 · comments 0

I have attached a CSV file. I have written a Python script that reads the CSV file into a DataFrame, iterates over it, processes the contents, and inserts them into MongoDB.

Right now, all of the data is inserted into the database.

Is there a way to iterate over the Python dict and keep only the data for the first 10 ranks (the group rank column)? That column is grouped, as you can see in the attached image.


import pandas as pd
import pymongo
from flask import request  # assumption: this snippet runs inside a Flask view

file = request.files['file']
client = pymongo.MongoClient("mongodb://localhost:27017")

df = pd.read_csv(file)

# Group rows by the value in the second CSV column (position 1).
# Note: that same value is stored below under "cluster_rank".
final_dict = {}
for _, row in df.iterrows():
    cluster_name = row.iloc[1]
    print(cluster_name)
    # Create the entry for this group the first time it is seen, then append.
    group = final_dict.setdefault(cluster_name, {"queries": []})
    group["queries"].append(
        {"cluster_name": row.iloc[0], "cluster_rank": row.iloc[1],
         "cluster_size": row.iloc[2]})

db = client["db_name"]

# Insert one document per group.
for key in final_dict:
    db.testing.insert_one(final_dict[key])

Comments (1)

绮烟 2025-02-05 07:07:04

To keep only the rows whose group rank is less than or equal to 10, you can filter with .loc:

df = df.loc[df['group rank'] <= 10]
df
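
As a rough illustration of how that filter could feed the grouping step from the question, here is a minimal sketch. The sample data, the column names ("cluster name", "group rank", "cluster size"), and the helper `top_ranked_documents` are all assumptions made for the example; the real CSV is only shown in the question's image, so adjust the names to match it.

```python
import io

import pandas as pd

# Hypothetical sample data; the real column names come from the asker's CSV.
CSV_TEXT = """cluster name,group rank,cluster size
alpha,1,40
beta,2,35
gamma,11,5
alpha,1,40
delta,10,7
"""


def top_ranked_documents(df, rank_column="group rank", max_rank=10):
    """Keep rows with rank <= max_rank, then build one document per rank."""
    # Boolean mask with .loc, as in the answer above.
    filtered = df.loc[df[rank_column] <= max_rank]
    documents = []
    # groupby sorts the group keys by default, so ranks come out ascending.
    for rank, group in filtered.groupby(rank_column):
        documents.append({
            "group_rank": int(rank),
            "queries": group.to_dict(orient="records"),
        })
    return documents


df = pd.read_csv(io.StringIO(CSV_TEXT))
documents = top_ranked_documents(df)
```

With a live connection, the resulting list could then be passed to something like `db.testing.insert_many(documents)` instead of inserting every row; this sketch leaves that call out so it runs without a database.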
