Python dict: checking for duplicates based on date

Posted 2025-01-09 15:42:12


So I am looping over a directory, reading some JSON files. From those files I parse out four keys and then create a CSV file with all the parsed-out data.

It happens that I have duplicate entries, so I want to eliminate the duplicates based on date (keeping the newer one) and then re-write the CSV; I'm not sure how to implement it.

e.g.:

import csv
import json
import os
import time
from datetime import datetime


def mdy_to_ymd(d):
    # convert the date into a comparable struct_time
    cor_date = datetime.strptime(d, '%b %d %Y').strftime('%d/%m/%Y')
    return time.strptime(cor_date, "%d/%m/%Y")


def date_converter(date):  # convert the date to a readable string for the csv
    return datetime.strptime(date, '%b %d %Y').strftime('%d/%m/%Y')


def csv_generator(path):  # collecting the csv rows
    ffresult = []
    duplicate_dict = {}
    for file in os.listdir(path):  # iterating through the directory with the files
        fresult = []
        with open(f"{path}/{file}", "r") as result:  # opening the json file
            templates = json.load(result)
            hostname_str = file.split(".")
            site_code_str = file[:5]
            # datetime_str2 is the raw date string parsed from the JSON
            # (its definition is omitted here)
            datetime_str3 = mdy_to_ymd(datetime_str2)  # converting the date to comparable data
            duplicate_dict[hostname_str[0]] = datetime_str3
            # ?? here I am creating a dictionary whose key is the hostname and
            # whose value is the date, but it doesn't work: when the same
            # hostname appears again it just overwrites the current key, so
            # there are no duplicates, but it doesn't guarantee only the newest
            # entry (based on date) is kept
            fresult.append(site_code_str)
            fresult.append(hostname_str[0])
            fresult.append(templates["execution_status"])
            fresult.append(date_converter(datetime_str2))
            fresult.append(templates["protocol_name"])
            fresult.append(templates["protocol_version"])
            ffresult.append(fresult)
    return ffresult  # writerows() below needs the collected rows


# I append the values I need into the row list, then write it out:
with open("jsondicts.csv", "w", newline="") as dst:
    writetoit = csv.writer(dst)
    writetoit.writerows(csv_generator(directory))
# this is how I write the csv, so right now I have duplicate values in it

I want to keep only unique values based on hostname, and of those only the newest ones based on date, of course together with the other parsed-out data (protocol name, site code, etc.).
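The dict-based idea described in the question's comment can be made to work by comparing dates before overwriting: keep one dict keyed by hostname and only replace an entry when the incoming row is newer. A minimal sketch, assuming rows shaped like the ones `csv_generator` builds (the column positions and the sample data below are illustrative assumptions, not from the original):

```python
from datetime import datetime


def dedupe_newest(rows, host_idx=1, date_idx=3, date_fmt="%d/%m/%Y"):
    """Keep only the newest row per hostname.

    host_idx/date_idx are the assumed column positions of the hostname
    and the date string in each row.
    """
    newest = {}
    for row in rows:
        host = row[host_idx]
        when = datetime.strptime(row[date_idx], date_fmt)
        # replace only if we haven't seen this host, or this row is newer
        if host not in newest or when > newest[host][0]:
            newest[host] = (when, row)
    return [row for _, row in newest.values()]


# illustrative rows: hostA appears twice with different dates
rows = [
    ["SITE1", "hostA", "ok", "01/05/2022", "proto", "v1"],
    ["SITE1", "hostA", "ok", "03/05/2022", "proto", "v2"],
    ["SITE2", "hostB", "ok", "02/05/2022", "proto", "v1"],
]
deduped = dedupe_newest(rows)
```

Running this keeps the 03/05/2022 row for `hostA` and the single `hostB` row; the result can then be passed to `writer.writerows()` as before.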



Comments (1)

旧时浪漫 2025-01-16 15:42:12


This solves it; I had to use the pandas lib though:

result_pan_xls = result_pan.sort_values(by="Execution_Date").drop_duplicates(subset="HOSTNAME", keep="last")
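For context, the answer's one-liner can be sketched end-to-end. `result_pan` and the column names `Execution_Date`/`HOSTNAME` come from the answer; building the DataFrame and the sample rows below are assumptions. One caveat worth noting: if `Execution_Date` is still a `%d/%m/%Y` string, `sort_values` sorts it lexicographically, so parsing it with `pd.to_datetime` first keeps the sort chronological:

```python
import pandas as pd

# Rows in the shape the question's csv_generator produces (illustrative data).
result_pan = pd.DataFrame(
    [
        ["SITE1", "hostA", "ok", "01/05/2022", "proto", "v1"],
        ["SITE1", "hostA", "ok", "03/05/2022", "proto", "v2"],
        ["SITE2", "hostB", "ok", "02/05/2022", "proto", "v1"],
    ],
    columns=["SITE_CODE", "HOSTNAME", "STATUS", "Execution_Date", "PROTOCOL", "VERSION"],
)

# Parse the date column so sorting is chronological, not lexicographic
# on the "%d/%m/%Y" strings.
result_pan["Execution_Date"] = pd.to_datetime(
    result_pan["Execution_Date"], format="%d/%m/%Y"
)

# Sort oldest-to-newest, then keep the last (newest) row per hostname.
result_pan_xls = result_pan.sort_values(by="Execution_Date").drop_duplicates(
    subset="HOSTNAME", keep="last"
)

# Re-write the CSV without the duplicates.
result_pan_xls.to_csv("jsondicts.csv", index=False)
```

`keep="last"` works because the sort is ascending, so the last occurrence of each hostname is its newest row.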