Error when decoding a UTF-8 encoded CSV file

Posted on 2025-01-17 17:22:50


First time posting here :)

I have an issue while trying to download all the attachments from an email inbox.
I download them, then write each one to a file at a path I specify.
This works perfectly well for .png files, which are written straight to disk, but for a .csv file I get this error message:

OSError: [Errno 22] Invalid argument: 'C:\Users\antoi\OneDrive\Bureau\python_secge\=?UTF-8?B?RXh0cmFjdCBzZXggZ8OpIHB1YmxpYy0yMDIyLTAzLTI2LTAwLTAwLTI2LmNzdg==?='

I think the name of the CSV file is not being decoded properly, but I don't know why.

Thanks for your help!

My code is below, if you want to look at it:

import smtplib
import imaplib
import base64
import os
import email

smtp_address = 'smtp.gmail.com'
smtp_port = 465
email_user = 'XXXX'
email_pass = 'XXXXX'

mail = imaplib.IMAP4_SSL('imap.gmail.com',993)
mail.login(email_user, email_pass)
mail.select('Inbox')
type, data = mail.search(None, 'ALL')
mail_ids=data[0]
idlist=mail_ids.split()


for num in data[0].split():
    typ, data = mail.fetch(num, '(RFC822)' )
    raw_email = data[0][1]
    # decodes the raw message bytes into a str
    raw_email_string = raw_email.decode('utf-8')
    email_message = email.message_from_string(raw_email_string)
    # downloading attachments
    for part in email_message.walk():
        if part.get_content_maintype() == 'multipart':
            continue
        if part.get('Content-Disposition') is None:
            continue
        fileName = part.get_filename()
        
        if bool(fileName):
            filePath = os.path.join(r'C:\Users\antoi\OneDrive\Bureau\python_secge', fileName)
            if not os.path.isfile(filePath) :
                fp = open(filePath, 'wb')
                fp.write(part.get_payload(decode=True))
                fp.close()
            subject = str(email_message).split("Subject: ", 1)

I tried changing the name of the CSV file. It then downloaded without an error, but its content looked as if it had not been decoded:

              #   #*%%*525EE\ÿÂ ¿ A"  ÿÄ7 ÿÚ  iÙWúßóÓ ÅIq«‚ÙÊŸ§ˆ˜²‚6`Ø¶ p²#áíîŸÐà ¼ïDù÷.ŽéCÅ >ªþ®|…dÕË' <å8
!õÑàäH¬


温馨耳语 2025-01-24 17:22:50

Adapted from Encoded-word Syntax¹ (https://dmorgan.info/posts/encoded-word-syntax/):

import re
import base64
import quopri

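def encoded_words_to_text(encoded_words):
    # Matches one RFC 2047 encoded word: =?charset?encoding?encoded-text?=
    encoded_word_regex = r'=\?([^?]+)\?([BbQq])\?([^?]*)\?='
    match = re.match(encoded_word_regex, encoded_words)
    if match is None:
        # Not an encoded word, so return the input unchanged
        return encoded_words
    charset, encoding, encoded_text = match.groups()
    if encoding.upper() == 'B':
        byte_string = base64.b64decode(encoded_text)
    else:
        # In headers, "Q" encoding writes spaces as underscores
        byte_string = quopri.decodestring(encoded_text, header=True)
    return byte_string.decode(charset)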

Apply as follows:

filename = '=?UTF-8?B?RXh0cmFjdCBzZXggZ8OpIHB1YmxpYy0yMDIyLTAzLTI2LTAwLTAwLTI2LmNzdg==?='
encoded_words_to_text(filename)

'Extract sex gé public-2022-03-26-00-00-26.csv'
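If you would rather not maintain a regex, the standard library can decode the same syntax. Here is a minimal sketch using email.header.decode_header and email.header.make_header; the helper name decode_mime_header is made up for illustration and is not part of the original answer.

```python
from email.header import decode_header, make_header


def decode_mime_header(value):
    """Decode RFC 2047 encoded words in a header value (hypothetical helper)."""
    # decode_header splits the value into (bytes_or_str, charset) pairs;
    # make_header joins them back into a Header that converts to str.
    return str(make_header(decode_header(value)))


filename = '=?UTF-8?B?RXh0cmFjdCBzZXggZ8OpIHB1YmxpYy0yMDIyLTAzLTI2LTAwLTAwLTI2LmNzdg==?='
print(decode_mime_header(filename))
# Extract sex gé public-2022-03-26-00-00-26.csv
```

A value that contains no encoded word passes through unchanged, so the helper can be applied to every filename without checking first.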

¹ Modified to eliminate SyntaxWarning: "is" with a literal. Did you mean "=="?
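To connect this back to the loop in the question, here is a rough sketch of how the decoding step might slot into the attachment download. It assumes the raw message bytes come from mail.fetch as in the question. The function name save_attachments and the list of characters replaced for Windows are my own choices, not something from the original code. It parses the bytes directly with email.message_from_bytes instead of decoding the whole message as UTF-8 first.

```python
import email
import os
import re
from email.header import decode_header, make_header


def save_attachments(raw_email, target_dir):
    """Write the attachments of one raw message into target_dir (sketch)."""
    # Parse the bytes directly; not every message is valid UTF-8.
    message = email.message_from_bytes(raw_email)
    saved = []
    for part in message.walk():
        if part.get_content_maintype() == 'multipart':
            continue
        if part.get('Content-Disposition') is None:
            continue
        raw_name = part.get_filename()
        if not raw_name:
            continue
        # Decode =?charset?B?...?= style names into readable text.
        name = str(make_header(decode_header(raw_name)))
        # Assumption: replace characters that Windows rejects in file names.
        name = re.sub(r'[<>:"/\\|?*]', '_', name)
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        path = os.path.join(target_dir, name)
        if not os.path.isfile(path):
            with open(path, 'wb') as fp:
                fp.write(payload)
        saved.append(path)
    return saved
```

Inside the fetch loop this would be called as save_attachments(data[0][1], r'C:\Users\antoi\OneDrive\Bureau\python_secge'). The '?' characters in the undecoded name are not allowed in Windows file names, which is a likely cause of the Errno 22 in the question.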

About the author

蓝天白云
