UnicodeEncodeError:“ascii”编解码器无法对字符 u'\xef' 进行编码位置 0:序号不在范围内 (128)

发布于 2024-10-19 13:34:50 字数 1944 浏览 5 评论 0原文

我想解析我的 XML 文档。所以我已经存储了我的 XML 文档,如下所示

class XMLdocs(db.Expando):  
   id = db.IntegerProperty()    
   name=db.StringProperty()  
   content=db.BlobProperty()  

现在我的下面是我的代码

parser = make_parser()     
curHandler = BasketBallHandler()  
parser.setContentHandler(curHandler)  
for q in XMLdocs.all():  
        parser.parse(StringIO.StringIO(q.content))

我收到以下错误

'ascii' codec can't encode character u'\xef' in position 0: ordinal not in range(128)
Traceback (most recent call last):  
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 517, in __call__
    handler.post(*groups)   
  File "/base/data/home/apps/parsepython/1.348669006354245654/mapreduce/base_handler.py", line 59, in post
    self.handle()   
  File "/base/data/home/apps/parsepython/1.348669006354245654/mapreduce/handlers.py", line 168, in handle
    scan_aborted = not self.process_entity(entity, ctx)   
  File "/base/data/home/apps/parsepython/1.348669006354245654/mapreduce/handlers.py", line 233, in process_entity
    handler(entity)   
  File "/base/data/home/apps/parsepython/1.348669006354245654/parseXML.py", line 71, in process
    parser.parse(StringIO.StringIO(q.content))   
  File "/base/python_runtime/python_dist/lib/python2.5/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)   
  File "/base/python_runtime/python_dist/lib/python2.5/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)  
  File "/base/python_runtime/python_dist/lib/python2.5/xml/sax/expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)   
  File "/base/data/home/apps/parsepython/1.348669006354245654/parseXML.py", line 136, in characters   
    print ch   
UnicodeEncodeError: 'ascii' codec can't encode character u'\xef' in position 0: ordinal not in range(128)   

I want to parse my XML document. So I have stored my XML document as below

class XMLdocs(db.Expando):  
   id = db.IntegerProperty()    
   name=db.StringProperty()  
   content=db.BlobProperty()  

Now my below is my code

parser = make_parser()     
curHandler = BasketBallHandler()  
parser.setContentHandler(curHandler)  
for q in XMLdocs.all():  
        parser.parse(StringIO.StringIO(q.content))

I am getting below error

'ascii' codec can't encode character u'\xef' in position 0: ordinal not in range(128)
Traceback (most recent call last):  
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 517, in __call__
    handler.post(*groups)   
  File "/base/data/home/apps/parsepython/1.348669006354245654/mapreduce/base_handler.py", line 59, in post
    self.handle()   
  File "/base/data/home/apps/parsepython/1.348669006354245654/mapreduce/handlers.py", line 168, in handle
    scan_aborted = not self.process_entity(entity, ctx)   
  File "/base/data/home/apps/parsepython/1.348669006354245654/mapreduce/handlers.py", line 233, in process_entity
    handler(entity)   
  File "/base/data/home/apps/parsepython/1.348669006354245654/parseXML.py", line 71, in process
    parser.parse(StringIO.StringIO(q.content))   
  File "/base/python_runtime/python_dist/lib/python2.5/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)   
  File "/base/python_runtime/python_dist/lib/python2.5/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)  
  File "/base/python_runtime/python_dist/lib/python2.5/xml/sax/expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)   
  File "/base/data/home/apps/parsepython/1.348669006354245654/parseXML.py", line 136, in characters   
    print ch   
UnicodeEncodeError: 'ascii' codec can't encode character u'\xef' in position 0: ordinal not in range(128)   

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

南渊 2024-10-26 13:34:50

此问题的实际最佳答案取决于您的环境,特别是您的终端期望的编码。

最快的一行解决方案是将您打印的所有内容编码为 ASCII,您的终端几乎肯定会接受它,同时丢弃无法打印的字符:

print ch #fails
print ch.encode('ascii', 'ignore')

更好的解决方案是将终端的编码更改为 utf-8,并将所有内容编码为打印前使用 utf-8。您应该养成每次打印或读取字符串时考虑 unicode 编码的习惯。

The actual best answer for this problem depends on your environment, specifically what encoding your terminal expects.

The quickest one-line solution is to encode everything you print to ASCII, which your terminal is almost certain to accept, while discarding characters that you cannot print:

print ch #fails
print ch.encode('ascii', 'ignore')

The better solution is to change your terminal's encoding to utf-8, and encode everything as utf-8 before printing. You should get in the habit of thinking about your unicode encoding EVERY time you print or read a string.

盗琴音 2024-10-26 13:34:50

您似乎遇到了 UTF-8 字节顺序标记 (BOM)。尝试使用此 unicode 字符串并提取出 BOM:

import codecs

content = unicode(q.content.strip(codecs.BOM_UTF8), 'utf-8')
parser.parse(StringIO.StringIO(content))

我使用 strip 而不是 lstrip 因为在您的情况下,您多次出现 BOM,可能是由于串联的文件内容。

It seems you are hitting a UTF-8 byte order mark (BOM). Try using this unicode string with BOM extracted out:

import codecs

content = unicode(q.content.strip(codecs.BOM_UTF8), 'utf-8')
parser.parse(StringIO.StringIO(content))

I used strip instead of lstrip because in your case you had multiple occurences of BOM, possibly due to concatenated file contents.

半﹌身腐败 2024-10-26 13:34:50

这对我有用:

from django.utils.encoding import smart_str
content = smart_str(content)

This worked for me:

from django.utils.encoding import smart_str
content = smart_str(content)
所谓喜欢 2024-10-26 13:34:50

根据您的回溯,问题在于 parseXML.py 第 136 行的 print 语句。不幸的是,您认为不适合发布这部分代码,但我猜它只是用于调试。如果您将其更改为:

print repr(ch)

那么您至少应该看到您要打印的内容。

The problem according to your traceback is the print statement on line 136 of parseXML.py. Unfortunately you didn't see fit to post that part of your code, but I'm going to guess it is just there for debugging. If you change it to:

print repr(ch)

then you should at least see what you are trying to print.

完美的未来在梦里 2024-10-26 13:34:50

问题是您正在尝试将 unicode 字符打印到可能的非 unicode 终端。在打印之前,您需要使用'replace选项对其进行编码,例如print ch.encode(sys.stdout.encoding, 'replace')

The problem is that you're trying to print an unicode character to a possibly non-unicode terminal. You need to encode it with the 'replace option before printing it, e.g. print ch.encode(sys.stdout.encoding, 'replace').

南冥有猫 2024-10-26 13:34:50

解决此问题的一个简单解决方案是将默认编码设置为 utf8。下面是一个例子

import sys

reload(sys)
sys.setdefaultencoding('utf8')

An easy solution to overcome this problem is to set your default encoding to utf8. Follow is an example

import sys

reload(sys)
sys.setdefaultencoding('utf8')
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文