UnicodeEncodeError when writing to a file

Posted 2024-11-27 15:21:40

I am trying to write some strings to a file (the strings have been given to me by the HTML parser BeautifulSoup).

I can use "print" to display them, but when I use file.write() I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 6: ordinal not in range(128)

How can I fix this?
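A minimal Python 3 sketch that reproduces the error: when a file's encoding is ASCII (for example because the locale's default encoding is ASCII), write() fails on the pound sign, while the same text written as UTF-8 succeeds. The sample string and the helper write_with() are hypothetical; the question itself appears to use Python 2, where the same message comes from an implicit ASCII conversion.

```python
import os
import tempfile

text = 'Cost: \xa35'                 # hypothetical string: 'Cost: ' + pound sign + '5'

def write_with(encoding, path):
    """Write text to path with the given encoding; return the error, or None."""
    try:
        with open(path, 'w', encoding=encoding) as f:
            f.write(text)
    except UnicodeEncodeError as exc:
        return exc
    return None

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'out.txt')
    err = write_with('ascii', path)    # behaves like a file whose encoding is ASCII
    ok = write_with('utf-8', path)     # UTF-8 can encode every character

print(err)  # 'ascii' codec can't encode character '\xa3' in position 6: ordinal not in range(128)
```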

Comments (3)

焚却相思 2024-12-04 15:21:40

This error occurs when you pass a Unicode string containing non-ASCII characters (code points 128 and above) to something that expects an ASCII byte string. In Python 2 the default encoding for byte strings is ASCII, "which handles exactly 128 (English) characters", i.e. code points 0 to 127. That is why converting any character at code point 128 or higher produces the error.
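To see which characters trigger the error, a small Python 3 helper (the name non_ascii_chars is hypothetical) can list every character whose code point is 128 or higher. The sample string is also hypothetical, chosen so that the pound sign sits at position 6 as in the question's error message. The last lines show the boundary: code point 127 still encodes as ASCII, 128 does not.

```python
def non_ascii_chars(s):
    """Return (position, code point) for each character ASCII cannot encode."""
    return [(i, 'U+%04X' % ord(ch)) for i, ch in enumerate(s) if ord(ch) >= 128]

sample = 'Cost: \xa35'               # hypothetical: 'Cost: ' + pound sign + '5'
print(non_ascii_chars(sample))       # [(6, 'U+00A3')]

# Code point 127 is the last one the ASCII codec can encode.
assert '\x7f'.encode('ascii') == b'\x7f'
try:
    '\x80'.encode('ascii')           # code point 128: the first one that fails
except UnicodeEncodeError as exc:
    print(exc.reason)                # ordinal not in range(128)
```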

In Python 2, the unicode() constructor has the signature unicode(string[, encoding, errors]). All of its arguments should be 8-bit strings.

The first argument is converted to Unicode using the specified encoding; if you leave off the encoding argument, the ASCII encoding is used for the conversion, so characters greater than 127 will be treated as errors.
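Python 3 has no unicode(); decoding an 8-bit (bytes) string is spelled str(raw, encoding) or raw.decode(encoding, errors). A sketch of the behaviour described above, using a hypothetical byte string. Note that bytes.decode() defaults to UTF-8 in Python 3, so ASCII has to be requested explicitly to mimic Python 2's unicode(raw).

```python
raw = b'La Pe\xf1a'                 # 8-bit string: 'La Pe\xf1a' encoded as Latin-1

# Python 3 spelling of unicode(raw, 'latin-1')
text = str(raw, 'latin-1')          # 'La Pe\xf1a', i.e. 'La Peña'

# What Python 2's unicode(raw) did without an encoding: decode as ASCII
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    print(exc.start, exc.reason)    # 5 ordinal not in range(128)

# The errors argument chooses what happens instead of raising
replaced = raw.decode('ascii', errors='replace')   # 'La Pe\ufffda' (U+FFFD marks the bad byte)
```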

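For example (Python 2 syntax):

s = u'La Pe\xf1a'
print s.encode('latin-1')

or, with f a file opened for writing:

f.write(s.encode('latin-1'))

Both encode the string to Latin-1 bytes before output. Latin-1 only covers U+0000 to U+00FF (which includes the '£' from the question); s.encode('utf-8') works for any character.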
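The same encoding step in Python 3, as a sketch: s.encode() returns bytes, so the file has to be opened in binary mode ('wb'); writing bytes to a text-mode file raises TypeError there. It also shows Latin-1's limit, using the euro sign as an example of a character outside U+0000 to U+00FF. The temporary file name is hypothetical.

```python
import os
import tempfile

s = 'La Pe\xf1a'                     # 'La Peña'

latin1 = s.encode('latin-1')         # b'La Pe\xf1a'      (one byte per character)
utf8 = s.encode('utf-8')             # b'La Pe\xc3\xb1a'  (the n-tilde takes two bytes)

# Latin-1 stops at U+00FF, so the euro sign (U+20AC) cannot be encoded.
try:
    '\u20ac'.encode('latin-1')
except UnicodeEncodeError as exc:
    latin1_error = exc.reason        # 'ordinal not in range(256)'

# encode() returns bytes, so the file must be opened in binary mode.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'out.txt')   # hypothetical file name
    with open(path, 'wb') as f:
        f.write(latin1)
    with open(path, 'rb') as f:
        stored = f.read()
```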

迷路的信 2024-12-04 15:21:40

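I tried this and it works fine (Python 3, where the built-in open() accepts an encoding argument):

with open(r"C:\rag\sampleoutput.txt", 'w', encoding="utf-8") as f:
    f.write(text)  # text: the string to save, e.g. one returned by BeautifulSoup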
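A self-contained Python 3 sketch of that answer, writing to a temporary file instead of the Windows path and reading the raw bytes back to confirm the pound sign was stored as the UTF-8 sequence b'\xc2\xa3'. The sample text is hypothetical.

```python
import os
import tempfile

text = 'Cost: \xa35'                 # hypothetical string: 'Cost: ' + pound sign + '5'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sampleoutput.txt')
    # encoding='utf-8' overrides the locale default, which may be ASCII.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    with open(path, 'rb') as f:
        raw = f.read()               # b'Cost: \xc2\xa35'
    with open(path, encoding='utf-8') as f:
        round_trip = f.read()        # decodes back to the original str
```

On Python 2, io.open(path, 'w', encoding='utf-8') behaves similarly, but it expects unicode strings rather than byte strings.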

余生再见 2024-12-04 15:21:40

The answer to your question is "use codecs". The appended code also shows some gettext magic, FWIW. http://wiki.wxpython.org/Internationalization

# Python 2 + wxPython (gettext's install(unicode=True) exists only in Python 2)
import codecs
import gettext

import wx

localedir = './locale'
langid = wx.LANGUAGE_DEFAULT  # use OS default; or use LANGUAGE_JAPANESE, etc.
domain = "MyApp"
logfile_name = 'myapp.log'    # hypothetical log file path

mylocale = wx.Locale(langid)
mylocale.AddCatalogLookupPathPrefix(localedir)
mylocale.AddCatalog(domain)

translator = gettext.translation(domain, localedir,
                                 [mylocale.GetCanonicalName()], fallback=True)
# install() puts the gettext _() function into the built-in namespace;
# unicode=True makes _() return unicode strings.
translator.install(unicode=True)

msg = _("A message that gettext will translate, probably putting Unicode in here")

# codecs.open() returns a file object that encodes unicode strings to UTF-8 on write()
Logfile = codecs.open(logfile_name, 'w', encoding='utf-8')
Logfile.write(msg + '\n')
Logfile.close()
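Stripped of the wx/gettext part, the codecs.open() approach looks roughly like this in Python 3. This is a sketch: the message and log file name are hypothetical stand-ins for the answer's msg and logfile_name. codecs.open() always opens the underlying file in binary mode, so '\n' is written without newline translation. In Python 3 the built-in open(..., encoding='utf-8') does the same job and is the more common choice.

```python
import codecs
import os
import tempfile

# Hypothetical stand-in for the translated gettext message.
msg = 'A message with a non-ASCII character: \xa3'

with tempfile.TemporaryDirectory() as tmp:
    logfile_name = os.path.join(tmp, 'myapp.log')   # hypothetical log file
    # codecs.open() returns a stream that encodes each str to UTF-8 on write().
    with codecs.open(logfile_name, 'w', encoding='utf-8') as logfile:
        logfile.write(msg + '\n')
    with open(logfile_name, 'rb') as f:
        raw = f.read()               # ends with b'\xc2\xa3\n'
```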

Despite Google being full of hits on this problem, I found it rather hard to find this simple solution (it is actually in the Python docs about Unicode, but rather buried).

So ... HTH...

GaJ
