UnicodeEncodeError when writing to a file

Posted 2024-11-27 15:21:40

I am trying to write some strings to a file (the strings have been given to me by the HTML parser BeautifulSoup).

I can use "print" to display them, but when I use file.write() I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 6: ordinal not in range(128)

How can I parse this?

焚却相思 2024-12-04 15:21:40

This error occurs when you pass a Unicode string containing non-ASCII characters (code points 128 and above, such as u'\xa3', the pound sign) to something that expects an ASCII byte string. In Python 2, the default encoding used for implicit conversion between unicode and byte strings is ASCII, which handles exactly 128 characters (code points 0 to 127). That is why converting any character beyond that range produces the error.
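
A minimal sketch that reproduces the same failure. It assumes Python 3 (the code in this thread is Python 2), where str is Unicode text and .encode() produces bytes. Encoding a string that contains '£' with the ASCII codec fails at the same position as in the question, while UTF-8 succeeds. The variable names are illustrative only.

```python
# Python 3: str holds Unicode text; .encode() converts it to bytes.
price = "Cost: \xa35"  # '\xa3' is the pound sign, code point 163 (> 127)

try:
    price.encode("ascii")  # ASCII only covers code points 0-127
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character that could not be encoded
    print(exc.reason, "at position", exc.start)

# UTF-8 can encode every Unicode code point, so this succeeds
utf8_bytes = price.encode("utf-8")
print(utf8_bytes)
```

The pound sign sits at index 6, which matches "position 6" in the error from the question.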

The Python 2 unicode() constructor has the signature unicode(string[, encoding, errors]). All of its arguments should be 8-bit strings.

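The first argument is converted to Unicode using the specified encoding; if you leave off the encoding argument, ASCII is used for the conversion, so characters greater than 127 are treated as errors. The same default applies in the other direction: writing a unicode object to a file opened with Python 2's built-in open() implicitly encodes it as ASCII, which is what raises the UnicodeEncodeError in the question. (print succeeds because, when output goes to a terminal, Python 2 encodes with the terminal's encoding instead.)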
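
In Python 3 there is no unicode() builtin; the rough equivalent is bytes.decode(encoding, errors) or str(bytes_obj, encoding). The sketch below assumes Python 3 and shows the three cases described above: an explicit encoding succeeds, ASCII fails on a byte above 127, and the errors argument substitutes characters instead of raising.

```python
raw = b"La Pe\xf1a"  # Latin-1 bytes; 0xF1 is 'ñ'

# Explicit encoding: decodes successfully (same as str(raw, "latin-1"))
text = raw.decode("latin-1")
print(text)

# ASCII cannot decode the byte 0xF1 (> 127), so this raises
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print("ascii failed:", exc.reason)

# The errors argument chooses a fallback instead of raising:
# "replace" inserts U+FFFD, "ignore" drops the undecodable byte
print(repr(raw.decode("ascii", errors="replace")))
print(repr(raw.decode("ascii", errors="ignore")))
```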

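For example, encoding explicitly with Latin-1 (Python 2):

s = u'La Pe\xf1a'
print s.encode('latin-1')

with open('output.txt', 'w') as f:  # 'output.txt' is a placeholder file name
    f.write(s.encode('latin-1'))

This encodes the string to Latin-1 bytes before it is printed or written. Latin-1 only covers code points up to U+00FF, so for arbitrary text 'utf-8' is the safer choice.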
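
A Python 3 sketch of the same idea, assuming a throwaway temporary file in place of a real output path. Opening the file in binary mode ('wb') means write() accepts bytes, so the encoding is chosen explicitly by the .encode() call. It also shows the Latin-1 limitation: '€' (U+20AC) is outside its range, while UTF-8 handles it.

```python
import os
import tempfile

s = "La Pe\xf1a"     # 'ñ' is U+00F1, inside Latin-1's range
euro = "\u20ac10"    # '€' is U+20AC, outside Latin-1's range

# Throwaway file for the demo (assumption: any writable path works the same way)
path = os.path.join(tempfile.mkdtemp(), "out.bin")

# Binary mode takes bytes, so we encode explicitly before writing
with open(path, "wb") as f:
    f.write(s.encode("latin-1"))

with open(path, "rb") as f:
    data = f.read()
print(data)

# Latin-1 has no byte for '€', so encoding fails
try:
    euro.encode("latin-1")
except UnicodeEncodeError:
    print("latin-1 cannot encode", repr(euro[0]))

# UTF-8 encodes it as a three-byte sequence
print(euro.encode("utf-8"))
```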

迷路的信 2024-12-04 15:21:40

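I tried this and it works fine (in Python 3, or with io.open in Python 2):

with open(r"C:\rag\sampleoutput.txt", 'w', encoding="utf-8") as f:
    f.write(text)  # text: the Unicode string to save, e.g. one from BeautifulSoup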
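
A runnable Python 3 version of that approach. It assumes a temporary file in place of the Windows path and uses a hypothetical sample string standing in for the BeautifulSoup output. In text mode with encoding="utf-8", write() takes str and encodes it. Reading the file back with the same encoding returns the original string.

```python
import os
import tempfile

# Hypothetical stand-in for text extracted by BeautifulSoup
text = "Price: \xa35 \u2013 La Pe\xf1a"

# Temporary path replaces C:\rag\sampleoutput.txt so the sketch runs anywhere
path = os.path.join(tempfile.mkdtemp(), "sampleoutput.txt")

# Text mode with an explicit encoding: write() accepts str and encodes to UTF-8
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading with the same encoding gives back the original string
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
print(round_trip == text)

# On disk the file holds UTF-8 bytes; '£' is stored as b'\xc2\xa3'
with open(path, "rb") as f:
    raw = f.read()
print(raw[:9])
```

Passing the encoding explicitly avoids depending on the platform default (locale.getpreferredencoding()), which is often not UTF-8 on Windows.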

余生再见 2024-12-04 15:21:40

The answer to your question is "use codecs". The appended code also shows some gettext magic, FWIW. http://wiki.wxpython.org/Internationalization

import codecs
import gettext

import wx

app = wx.App(False)  # create the wx.App before other wx objects such as wx.Locale

localedir = './locale'
langid = wx.LANGUAGE_DEFAULT  # use OS default; or use LANGUAGE_JAPANESE, etc.
domain = "MyApp"
mylocale = wx.Locale(langid)
mylocale.AddCatalogLookupPathPrefix(localedir)
mylocale.AddCatalog(domain)

translater = gettext.translation(domain, localedir,
                                 [mylocale.GetCanonicalName()], fallback=True)
translater.install(unicode=True)  # Python 2 only; Python 3's install() has no unicode argument

# translater.install() installs the gettext _() translation function into the builtins...

msg = _("A message that gettext will translate, probably putting Unicode in here")

# use codecs.open() to encode Unicode strings as UTF-8 when they are written
logfile_name = 'myapp.log'  # placeholder log file path
Logfile = codecs.open(logfile_name, 'w', encoding='utf-8')
Logfile.write(msg + '\n')
Logfile.close()
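
codecs.open() works by wrapping the underlying file with the codec's stream writer. The sketch below shows that mechanism directly with codecs.getwriter() in Python 3, using an in-memory io.BytesIO buffer in place of a real log file and a placeholder string in place of the gettext output. In Python 3 the built-in open(..., encoding=...) from the previous answer is the more common choice.

```python
import codecs
import io

# In-memory binary buffer standing in for a file opened in 'wb' mode
buffer = io.BytesIO()

# codecs.getwriter() returns the StreamWriter class for the named encoding;
# wrapping the byte stream makes write() accept str and encode it as UTF-8
Writer = codecs.getwriter("utf-8")
log = Writer(buffer)

msg = "Translated message: \xa3 \u00e9"  # placeholder for the gettext output
log.write(msg + "\n")

print(buffer.getvalue())
```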

Despite Google being full of hits on this problem, I found it rather hard to find this simple solution (it is actually in the Python docs about Unicode, but rather buried).

So ... HTH...

GaJ
