Python dict: checking for duplicates based on date

Posted 2025-01-09 15:42:12


So I am looping over a directory, reading some JSON files. From those files I parse out four keys and then create a CSV file with all the parsed-out data.

It happens that I have duplicate entries, so I want to eliminate the duplicates based on date (keeping the newer one) and then re-write the CSV; I'm not sure how to implement it.

e.g.:

import csv
import json
import os
import time
from datetime import datetime


def mdy_to_ymd(d):
    # convert the date into a comparable struct_time
    cor_date = datetime.strptime(d, '%b %d %Y').strftime('%d/%m/%Y')
    return time.strptime(cor_date, "%d/%m/%Y")


def date_converter(date):  # convert the date to a readable string for the csv
    return datetime.strptime(date, '%b %d %Y').strftime('%d/%m/%Y')


def csv_generator(path):  # collecting the csv rows
    ffresult = []
    duplicate_dict = {}
    for file in os.listdir(path):  # iterating through the directory with the files
        fresult = []
        with open(f"{path}/{file}", "r") as result:  # opening the json file
            templates = json.load(result)
            hostname_str = file.split(".")
            site_code_str = file[:5]
            # datetime_str2 is the raw date string parsed from the JSON
            # (its definition is omitted here)
            datetime_str3 = mdy_to_ymd(datetime_str2)  # converting the date to comparable data
            duplicate_dict[hostname_str[0]] = datetime_str3
            # ?? here I am creating a dictionary whose key is the hostname and
            # whose value is the date, but it doesn't work: when the same
            # hostname appears again it just overwrites the current key, so
            # there are no duplicates, but it doesn't guarantee only the newest
            # entry (based on date) is kept
            fresult.append(site_code_str)
            fresult.append(hostname_str[0])
            fresult.append(templates["execution_status"])
            fresult.append(date_converter(datetime_str2))
            fresult.append(templates["protocol_name"])
            fresult.append(templates["protocol_version"])
            ffresult.append(fresult)
    return ffresult  # writerows() below needs the collected rows


# I append the values I need into the row list, then write it out:
with open("jsondicts.csv", "w", newline="") as dst:
    writetoit = csv.writer(dst)
    writetoit.writerows(csv_generator(directory))
# this is how I write the csv, so right now I have duplicate values in it

I want to keep only unique values based on hostname, and of those only the newest ones based on date, of course together with the other parsed-out data (protocol name, site code, etc.).
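The dict-based idea described in the question's comment can be made to work by comparing dates before overwriting: keep one dict keyed by hostname and only replace an entry when the incoming row is newer. A minimal sketch, assuming rows shaped like the ones `csv_generator` builds (the column positions and the sample data below are illustrative assumptions, not from the original):

```python
from datetime import datetime


def dedupe_newest(rows, host_idx=1, date_idx=3, date_fmt="%d/%m/%Y"):
    """Keep only the newest row per hostname.

    host_idx/date_idx are the assumed column positions of the hostname
    and the date string in each row.
    """
    newest = {}
    for row in rows:
        host = row[host_idx]
        when = datetime.strptime(row[date_idx], date_fmt)
        # replace only if we haven't seen this host, or this row is newer
        if host not in newest or when > newest[host][0]:
            newest[host] = (when, row)
    return [row for _, row in newest.values()]


# illustrative rows: hostA appears twice with different dates
rows = [
    ["SITE1", "hostA", "ok", "01/05/2022", "proto", "v1"],
    ["SITE1", "hostA", "ok", "03/05/2022", "proto", "v2"],
    ["SITE2", "hostB", "ok", "02/05/2022", "proto", "v1"],
]
deduped = dedupe_newest(rows)
```

Running this keeps the 03/05/2022 row for `hostA` and the single `hostB` row; the result can then be passed to `writer.writerows()` as before.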



Comments (1)

旧时浪漫 2025-01-16 15:42:12


This solves it; I had to use the pandas lib though:

result_pan_xls = result_pan.sort_values(by="Execution_Date").drop_duplicates(subset="HOSTNAME", keep="last")
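For context, the answer's one-liner can be sketched end-to-end. `result_pan` and the column names `Execution_Date`/`HOSTNAME` come from the answer; building the DataFrame and the sample rows below are assumptions. One caveat worth noting: if `Execution_Date` is still a `%d/%m/%Y` string, `sort_values` sorts it lexicographically, so parsing it with `pd.to_datetime` first keeps the sort chronological:

```python
import pandas as pd

# Rows in the shape the question's csv_generator produces (illustrative data).
result_pan = pd.DataFrame(
    [
        ["SITE1", "hostA", "ok", "01/05/2022", "proto", "v1"],
        ["SITE1", "hostA", "ok", "03/05/2022", "proto", "v2"],
        ["SITE2", "hostB", "ok", "02/05/2022", "proto", "v1"],
    ],
    columns=["SITE_CODE", "HOSTNAME", "STATUS", "Execution_Date", "PROTOCOL", "VERSION"],
)

# Parse the date column so sorting is chronological, not lexicographic
# on the "%d/%m/%Y" strings.
result_pan["Execution_Date"] = pd.to_datetime(
    result_pan["Execution_Date"], format="%d/%m/%Y"
)

# Sort oldest-to-newest, then keep the last (newest) row per hostname.
result_pan_xls = result_pan.sort_values(by="Execution_Date").drop_duplicates(
    subset="HOSTNAME", keep="last"
)

# Re-write the CSV without the duplicates.
result_pan_xls.to_csv("jsondicts.csv", index=False)
```

`keep="last"` works because the sort is ascending, so the last occurrence of each hostname is its newest row.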