UnicodeEncodeError: 'ascii' codec can't encode character

Posted 2024-08-09 17:22:43

I'm trying to pass large strings of arbitrary HTML through regular expressions, and my Python 2.6 script is choking on this:

UnicodeEncodeError: 'ascii' codec can't encode character

I traced it back to the trademark superscript at the end of this word: Protection™. I expect to encounter other characters like it in the future.

Is there a module for processing non-ASCII characters? Or, what is the best way to handle or escape non-ASCII text in Python?

Thanks!
Full error:

E
======================================================================
ERROR: test_untitled (__main__.Untitled)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python26\Test2.py", line 26, in test_untitled
    ofile.write(Whois + '\n')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 1005: ordinal not in range(128)

Full Script:

from selenium import selenium
import unittest, time, re, csv, logging

class Untitled(unittest.TestCase):
    def setUp(self):
        self.verificationErrors = []
        self.selenium = selenium("localhost", 4444, "*firefox", "http://www.BaseDomain.com/")
        self.selenium.start()
        self.selenium.set_timeout("90000")

    def test_untitled(self):
        sel = self.selenium
        spamReader = csv.reader(open('SubDomainList.csv', 'rb'))
        for row in spamReader:
            sel.open(row[0])
            time.sleep(10)
            Test = sel.get_text("//html/body/div/table/tbody/tr/td/form/div/table/tbody/tr[7]/td")
            Test = Test.replace(",","")
            Test = Test.replace("\n", "")
            ofile = open('TestOut.csv', 'ab')
            ofile.write(Test + '\n')
            ofile.close()

    def tearDown(self):
        self.selenium.stop()
        self.assertEqual([], self.verificationErrors)

if __name__ == "__main__":
    unittest.main()

穿透光 2024-08-16 17:22:43

You're trying to convert unicode to ascii in "strict" mode:

>>> help(str.encode)
Help on method_descriptor:

encode(...)
    S.encode([encoding[,errors]]) -> object

    Encodes S using the codec registered for encoding. encoding defaults
    to the default encoding. errors may be given to set a different error
    handling scheme. Default is 'strict' meaning that encoding errors raise
    a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
    'xmlcharrefreplace' as well as any other name registered with
    codecs.register_error that is able to handle UnicodeEncodeErrors.

You probably want something like one of the following:

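s = u'Protection™'  # in a Python 2 source file, this literal needs a "# -*- coding: utf-8 -*-" line at the top

print s.encode('ascii', 'ignore')             # drops the ™
print s.encode('ascii', 'replace')            # replaces the ™ with ?
print s.encode('ascii', 'xmlcharrefreplace')  # turns the ™ into the XML character reference &#8482;
print s.encode('ascii', 'strict')             # raises UnicodeEncodeError (the default behavior)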
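
To make the difference between the error handlers concrete, here is a minimal sketch that captures the result of each call instead of printing it. The variable names are illustrative only, and the trademark sign is written as the escape `\u2122` so the snippet does not depend on the source file's encoding. It should run on Python 3 as written; Python 2.6 also accepts the `u''` and `b''` prefixes and the `print(...)` call used here.

```python
# Sample text from the question, with the trademark sign written as an escape.
sample = u'Protection\u2122'

ignored = sample.encode('ascii', 'ignore')               # character is dropped
replaced = sample.encode('ascii', 'replace')             # character becomes '?'
as_entity = sample.encode('ascii', 'xmlcharrefreplace')  # character becomes &#8482;

# 'strict' is the default handler and is what produced the original traceback.
try:
    sample.encode('ascii', 'strict')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(ignored, replaced, as_entity, strict_failed)
```

Which handler is appropriate depends on what will read the output: `xmlcharrefreplace` only makes sense if the consumer interprets XML or HTML character references.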

泅人 2024-08-16 17:22:43

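You're trying to pass a bytestring to something, but from the little information you give it's impossible to tell what you're passing it to. You start with a Unicode string that cannot be encoded as ASCII (the default codec), so you'll have to encode it with some other codec (or transliterate it, as @R.Pate suggests). But we can't say which codec you should use, because we don't know what you're passing the bytestring to, and therefore don't know which codecs that unknown subsystem can accept and process correctly.

Given how little we have to go on, utf-8 is a reasonable blind guess: it can represent any Unicode string exactly as a bytestring, and it is the standard codec for many purposes, such as XML. It remains only a blind guess until you tell us more about what you're passing that bytestring to, and for what purpose.

Passing thestring.encode('utf-8') rather than the bare thestring will definitely avoid the particular error you're seeing right now, but it may produce peculiar displays (or peculiar results in whatever you're doing with that bytestring!) unless the recipient is ready, willing and able to accept utf-8 encoding. And how could we know, having absolutely no idea what the recipient might be?!-)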
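
As a rough illustration of that suggestion applied to the question's `ofile.write(...)` call, the sketch below encodes the text to UTF-8 before writing it to a file opened in binary append mode. It assumes that whatever reads the file later expects UTF-8, which is exactly the guess the answer warns about. The helper name `append_line_utf8` and the temporary demo file are hypothetical and stand in for the question's `TestOut.csv`.

```python
import os
import tempfile


def append_line_utf8(path, text):
    """Append text plus a newline to path as UTF-8 bytes (hypothetical helper)."""
    # Encode explicitly so that no implicit ASCII conversion happens at write time.
    data = (text + u'\n').encode('utf-8')
    with open(path, 'ab') as handle:  # binary append, as in the question's script
        handle.write(data)


# Demo against a temporary file rather than the question's real output file.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_out.csv')
append_line_utf8(demo_path, u'Protection\u2122')

with open(demo_path, 'rb') as handle:
    written = handle.read()
```

The trademark sign ends up as the three bytes `\xe2\x84\xa2`, so a reader that assumes a single-byte encoding such as Latin-1 will display it as three unrelated characters.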

陪你搞怪i 2024-08-16 17:22:43

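The "best" way always depends on your requirements, so what are yours? Is ignoring non-ASCII appropriate? Should you replace ™ with "(tm)"? (That looks fancy for this example but quickly breaks down for other code points, though it may be just what you want.) Could the exception be exactly what you need, so that you now just have to handle it in some way?

Only you can really answer this question.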
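
If replacing symbols with ASCII stand-ins does turn out to be what you want, one possible shape for it is sketched below. The `REPLACEMENTS` table and the `to_ascii` helper are hypothetical, not part of any library. The accent stripping through `unicodedata.normalize('NFKD', ...)` is an extra assumption about what "acceptable" output means, and anything still outside ASCII falls back to the chosen error handler.

```python
import unicodedata

# Hypothetical replacement table; extend it with whatever symbols matter to you.
REPLACEMENTS = {0x2122: u'(tm)', 0x00AE: u'(r)', 0x00A9: u'(c)'}


def to_ascii(text, errors='replace'):
    """Transliterate known symbols, strip accents, then encode to ASCII."""
    mapped = text.translate(REPLACEMENTS)
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFKD', mapped)
    # Drop the combining marks so that only the base letters remain.
    stripped = u''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Whatever is still non-ASCII is handled by the chosen error handler.
    return stripped.encode('ascii', errors)


trademark = to_ascii(u'Protection\u2122')
accented = to_ascii(u'caf\xe9')
unmapped = to_ascii(u'\u4e2d')
```

This shows the trade-off the answer describes: the table works well for the handful of symbols you anticipated, while every other code point degrades to `?` (or disappears with `errors='ignore'`).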

幸福还没到 2024-08-16 17:22:43

First, try installing the English language pack (or another language, if needed):

sudo apt-get install language-pack-en

which provides translation data updates for all supported packages (including Python).

Also make sure you use the right encoding in your code.

For example (the built-in open() accepts an encoding argument only in Python 3; on Python 2.6, io.open() takes the same argument):

open(foo, encoding='utf-8')

Then double-check your system configuration, such as the value of LANG or the locale settings in /etc/default/locale, and remember to log out and back in so the changes take effect.
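
For the `encoding` argument mentioned above, here is a minimal sketch using `io.open`, which exists from Python 2.6 onward and is the same function as the built-in `open` in Python 3. The temporary file path is only for demonstration. A file opened this way accepts Unicode text directly and does the encoding itself, so no explicit `.encode()` call is needed.

```python
import io
import os
import tempfile

# Demo file in a fresh temporary directory (path is illustrative only).
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Text mode with an explicit encoding: write Unicode, get UTF-8 bytes on disk.
with io.open(demo_path, 'w', encoding='utf-8') as handle:
    handle.write(u'Protection\u2122\n')

# Reading with the same encoding returns the original Unicode text.
with io.open(demo_path, 'r', encoding='utf-8') as handle:
    round_tripped = handle.read()
```

Stating the encoding explicitly also makes the script independent of the locale settings discussed above.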
