How do I make my CSV/Excel file compile data from multiple outputs?

Posted on 2025-02-04 05:14:29

I'm writing code that scans my emails for a certain pattern. My goal is to build a single CSV file that lists every occurrence, but my code writes ONLY the last email's matches to the CSV. Here's the code that finds and writes the matches:

pattern = re.compile(
            r"([a-zA-Z]+[0-9]+) Line ([0-9]+) Seq ([0-9]) ([0-9]+/[0-9]+/[0-9]+)")
matches = pattern.finditer(body)

with open("data.csv", "w") as f_out:
    writer = csv.writer(f_out)
    writer.writerows(map(lambda m: m.groups(), matches))

The emails it runs through are the following:

First email:

PUU128378 Line 20 Seq 1 5/22/2023

PUN102939 Line 100 Seq 8 11/1/2024

PUU012939 Line 120 Seq 4 1/1/2025

Second email:

PUU128377 Line 20 Seq 1 5/22/2023

PUN102938 Line 100 Seq 8 11/1/2024

PUU012938 Line 120 Seq 4 1/1/2025

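The Excel file currently contains only the rows matched in the last email processed.

I would like it to contain the rows from every email, one after another.

The rest of my code: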

for i in range(messages, messages-N, -1):
    # fetch the email message by ID
    res, msg = imap.fetch(str(i), "(RFC822)")
    for response in msg:
        if isinstance(response, tuple):
            # parse a bytes email into a message object
            msg = email.message_from_bytes(response[1])
            # decode the email subject
            subject, encoding = decode_header(msg["Subject"])[0]
            if isinstance(subject, bytes):
                # if it's a bytes, decode to str
                subject = subject.decode(encoding)
            # decode email sender
            From, encoding = decode_header(msg.get("From"))[0]
            if isinstance(From, bytes):
                From = From.decode(encoding)
            print("Subject:", subject)
            print("From:", From)
            
            # if the email message is multipart
            if msg.is_multipart():
                # iterate over email parts
                for part in msg.walk():
                    # extract content type of email
                    content_type = part.get_content_type()
                    content_disposition = str(part.get("Content-Disposition"))
                    try:
                        # get the email body
                        body = part.get_payload(decode=True).decode()
                    except:
                        pass
                    
                    if content_type == "text/plain" and "attachment" not in content_disposition:
                        # print text/plain emails and skip attachments
                            pattern = re.compile(r"([a-zA-Z]+[0-9]+) Line ([0-9]+) Seq ([0-9]) ([0-9]+/[0-9]+/[0-9]+)")
                            matches = pattern.finditer(body)
                            with open("data.csv", "w") as f_out:
                                writer = csv.writer(f_out)
                                writer.writerows(map(lambda m: m.groups(), matches))
                            for match in matches:
                                print(match)

New Edit

with open("data.csv", "w") as f_out:
    writer = csv.writer(f_out)

    for i in range(messages, messages-N, -1):
        # fetch the email message by ID
        res, msg = imap.fetch(str(i), "(RFC822)")
        for response in msg:
            if isinstance(response, tuple):
                # parse a bytes email into a message object
                msg = email.message_from_bytes(response[1])
                # decode the email subject
                subject, encoding = decode_header(msg["Subject"])[0]
                if isinstance(subject, bytes):
                    # if it's a bytes, decode to str
                    subject = subject.decode(encoding)
                # decode email sender
                From, encoding = decode_header(msg.get("From"))[0]
                if isinstance(From, bytes):
                    From = From.decode(encoding)
                print("Subject:", subject)
                print("From:", From)

                # iterate over email parts
                for part in msg.walk():
                    # extract content type of email
                    content_type = part.get_content_type()
                    content_disposition = str(part.get("Content-Disposition"))
                    payload = part.get_payload(decode=True)
                    if payload is None:
                        continue
                    body = payload.decode()

                    pattern = re.compile(
                        r"([a-zA-Z]+[0-9]+) Line ([0-9]+) Seq ([0-9]) ([0-9]+/[0-9]+/[0-9]+)")
                    matches = pattern.finditer(body)
                    writer.writerows(map(lambda m: m.groups(), matches))

多谢你的绝情让我学会死心 2025-02-11 05:14:29

Only the last email ends up in the CSV file because your code reopens data.csv in "w" mode every time it processes a message, and opening in "w" mode truncates whatever the file already contained. The solution is to open the file for writing only once, outside the message retrieval loop.
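
To see the overwrite in isolation, here is a minimal sketch that is separate from the IMAP code. The `batches` list, the `read_rows` helper, and the temporary file names are made up for illustration; each batch stands in for the rows matched in one email.

```python
import csv
import os
import tempfile

# Hypothetical stand-ins for the rows matched in two separate emails.
batches = [
    [("PUU128378", "20", "1", "5/22/2023")],
    [("PUU128377", "20", "1", "5/22/2023")],
]


def read_rows(path):
    """Read a CSV file back as a list of tuples of strings."""
    with open(path, newline="") as f:
        return [tuple(row) for row in csv.reader(f)]


with tempfile.TemporaryDirectory() as tmp:
    # Reopening in "w" mode for each batch truncates the file every time.
    reopened = os.path.join(tmp, "reopened.csv")
    for rows in batches:
        with open(reopened, "w", newline="") as f_out:
            csv.writer(f_out).writerows(rows)
    rows_reopened = read_rows(reopened)

    # Opening once, outside the loop, keeps every batch.
    opened_once = os.path.join(tmp, "opened_once.csv")
    with open(opened_once, "w", newline="") as f_out:
        writer = csv.writer(f_out)
        for rows in batches:
            writer.writerows(rows)
    rows_opened_once = read_rows(opened_once)

print(rows_reopened)
print(rows_opened_once)
```

With these two batches, `rows_reopened` should hold only the second batch's row, while `rows_opened_once` holds both.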

Below is an outline of what I'm suggesting based on the code you added to your question:

# newline="" is what the csv module documentation recommends for files passed to csv.writer
with open("data.csv", "w", newline="") as f_out:
    writer = csv.writer(f_out)
    # compile the pattern once instead of once per message part
    pattern = re.compile(
        r"([a-zA-Z]+[0-9]+) Line ([0-9]+) Seq ([0-9]) ([0-9]+/[0-9]+/[0-9]+)")

    for i in range(messages, messages-N, -1):
        # fetch the email message by ID
        res, msg = imap.fetch(str(i), "(RFC822)")
        for response in msg:
            ...
            try:
                body = part.get_payload(decode=True).decode()
            except:  # NOTE it is bad to have non-specific except clauses like this.
                pass
            ...
            if content_type == "text/plain" and "attachment" not in content_disposition:
                # write matches from text/plain parts and skip attachments
                # finditer() returns a one-shot iterator, so materialize it
                # before both writing and printing the matches
                matches = list(pattern.finditer(body))
                writer.writerows(m.groups() for m in matches)
                for match in matches:
                    print(match)
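
If restructuring the loop is awkward, another option (not what the outline above does) would be to collect the rows from every email in a list and write the file once at the end. This is a rough sketch that assumes the bodies have already been decoded. The names `bodies`, `collect_rows`, and the in-memory `out` buffer are illustrative stand-ins, and the sample text is copied from the question.

```python
import csv
import io
import re

# Sample bodies copied from the question. In the real script each body would
# come from part.get_payload(decode=True).decode().
bodies = [
    "PUU128378 Line 20 Seq 1 5/22/2023\n\n"
    "PUN102939 Line 100 Seq 8 11/1/2024\n\n"
    "PUU012939 Line 120 Seq 4 1/1/2025\n",
    "PUU128377 Line 20 Seq 1 5/22/2023\n\n"
    "PUN102938 Line 100 Seq 8 11/1/2024\n\n"
    "PUU012938 Line 120 Seq 4 1/1/2025\n",
]

pattern = re.compile(
    r"([a-zA-Z]+[0-9]+) Line ([0-9]+) Seq ([0-9]) ([0-9]+/[0-9]+/[0-9]+)")


def collect_rows(bodies):
    """Return one tuple of captured groups per match, across all bodies."""
    rows = []
    for body in bodies:
        rows.extend(m.groups() for m in pattern.finditer(body))
    return rows


rows = collect_rows(bodies)

# In-memory stand-in for open("data.csv", "w", newline="").
out = io.StringIO()
csv.writer(out).writerows(rows)

print(out.getvalue())
```

Swapping the `io.StringIO()` buffer for `open("data.csv", "w", newline="")` would write the same rows to disk. The trade-off is that all rows are held in memory until the end, which should be fine for a handful of emails.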
