UnicodeEncodeError: 'ascii' codec can't encode character u'\xef' in position 0: ordinal not in range(128)
I want to parse my XML documents. I have stored them using the following model:
from google.appengine.ext import db
class XMLdocs(db.Expando):
    id = db.IntegerProperty()
    name = db.StringProperty()
    content = db.BlobProperty()
This is my parsing code:
import StringIO
from xml.sax import make_parser
parser = make_parser()
curHandler = BasketBallHandler()
parser.setContentHandler(curHandler)
for q in XMLdocs.all():
    parser.parse(StringIO.StringIO(q.content))
I am getting the following error:
'ascii' codec can't encode character u'\xef' in position 0: ordinal not in range(128)
Traceback (most recent call last):
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 517, in __call__
    handler.post(*groups)
  File "/base/data/home/apps/parsepython/1.348669006354245654/mapreduce/base_handler.py", line 59, in post
    self.handle()
  File "/base/data/home/apps/parsepython/1.348669006354245654/mapreduce/handlers.py", line 168, in handle
    scan_aborted = not self.process_entity(entity, ctx)
  File "/base/data/home/apps/parsepython/1.348669006354245654/mapreduce/handlers.py", line 233, in process_entity
    handler(entity)
  File "/base/data/home/apps/parsepython/1.348669006354245654/parseXML.py", line 71, in process
    parser.parse(StringIO.StringIO(q.content))
  File "/base/python_runtime/python_dist/lib/python2.5/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/base/python_runtime/python_dist/lib/python2.5/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/base/python_runtime/python_dist/lib/python2.5/xml/sax/expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
  File "/base/data/home/apps/parsepython/1.348669006354245654/parseXML.py", line 136, in characters
    print ch
UnicodeEncodeError: 'ascii' codec can't encode character u'\xef' in position 0: ordinal not in range(128)
Comments (6)
The actual best answer for this problem depends on your environment, specifically what encoding your terminal expects.
The quickest one-line solution is to encode everything you print to ASCII, which your terminal is almost certain to accept, while discarding characters that you cannot print:
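A minimal sketch of that approach, assuming the value being printed is a unicode string such as the ch from the traceback. The sample value below is made up for illustration:

```python
# Hypothetical sample value; u'\xef' is the character named in the error message.
ch = u'\xefHello'

# The 'ignore' error handler silently drops every character that has
# no ASCII representation instead of raising UnicodeEncodeError.
safe = ch.encode('ascii', 'ignore')

print(safe.decode('ascii'))
```

The trade-off is data loss: the non-ASCII character is simply gone from the output.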
The better solution is to change your terminal's encoding to utf-8, and encode everything as utf-8 before printing. You should get in the habit of thinking about your unicode encoding EVERY time you print or read a string.
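As a rough illustration of the encoding step, assuming the terminal has already been switched to UTF-8 (the sample value is hypothetical):

```python
# Hypothetical sample value containing the character from the error message.
ch = u'\xefHello'

# Encoding to UTF-8 never fails for valid unicode text and loses nothing;
# the non-ASCII character simply becomes a multi-byte sequence.
encoded = ch.encode('utf-8')

# Decoding the bytes again recovers the original string unchanged.
round_trip = encoded.decode('utf-8')

print(repr(encoded))
```

Under Python 2, passing the encoded byte string to print sends those bytes to the terminal as they are, which is why the terminal itself must expect UTF-8.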
It seems you are hitting a UTF-8 byte order mark (BOM). Try using this unicode string with the BOM stripped out:
I used strip instead of lstrip because in your case there were multiple occurrences of the BOM, possibly due to concatenated file contents.
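The BOM stripping described above can be sketched as follows. This is an illustration under assumptions, not necessarily the answer's exact code: strip_utf8_bom and the sample bytes are hypothetical, and the stored content is assumed to be UTF-8 encoded bytes.

```python
import codecs


def strip_utf8_bom(raw):
    """Strip UTF-8 BOM bytes from both ends of raw, then decode to unicode.

    Note that strip() treats its argument as a set of bytes, so repeated
    BOMs at either end are removed in one call. It would also remove a
    trailing 0xEF, 0xBB or 0xBF byte that belongs to real content, which
    is unlikely for XML that ends in '>'.
    """
    return raw.strip(codecs.BOM_UTF8).decode('utf-8')


# Hypothetical blob content: two BOMs in front, as concatenation might produce.
raw = codecs.BOM_UTF8 + codecs.BOM_UTF8 + b'<team>Lakers</team>'
text = strip_utf8_bom(raw)

print(text)
```

The resulting unicode string is what would then be handed to the parser in place of the raw blob.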
This worked for me:
According to your traceback, the problem is the print statement on line 136 of parseXML.py. Unfortunately you didn't post that part of your code, but I'm going to guess it is just there for debugging. If you change it to:
then you should at least see what you are trying to print.
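One common way to inspect such a value safely, assuming an escaped representation is what was intended here, is to print its repr. The sample value is hypothetical, and the fallback logic exists only so the sketch also runs on Python 3:

```python
# Python 2's repr() already escapes non-ASCII characters in unicode strings.
# Python 3's repr() does not, so ascii() is used there instead.
try:
    escape = ascii
except NameError:
    escape = repr

# Hypothetical value: the three BOM bytes mis-decoded as separate characters.
ch = u'\xef\xbb\xbf<team>'

# The escaped form contains only ASCII, so printing it cannot raise
# UnicodeEncodeError regardless of the terminal's encoding.
escaped = escape(ch)

print(escaped)
```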
The problem is that you're trying to print a unicode character to a possibly non-unicode terminal. You need to encode it with the 'replace' option before printing it, e.g. print ch.encode(sys.stdout.encoding, 'replace').
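A slightly more defensive sketch of the same idea. The helper name printable and the fallback to ASCII are assumptions added here, because sys.stdout.encoding can be None when output is not attached to a terminal:

```python
import sys


def printable(text, stream=sys.stdout):
    """Encode text for stream, replacing unencodable characters with '?'."""
    # Fall back to ASCII when the stream does not report an encoding.
    encoding = getattr(stream, 'encoding', None) or 'ascii'
    return text.encode(encoding, 'replace')


# Hypothetical sample value containing the character from the error message.
ch = u'\xefHello'

print(repr(printable(ch)))
```

Unlike the 'ignore' handler, 'replace' leaves a visible placeholder where a character could not be encoded.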
An easy solution to overcome this problem is to set your default encoding to utf8. The following is an example:
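A sketch of how this is usually done on Python 2, assuming that is the runtime in use. The version guard is an addition so that the snippet is a no-op on Python 3, where the default encoding is already UTF-8. Changing the process-wide default is widely regarded as a workaround rather than a fix, since it can mask encoding bugs elsewhere:

```python
import sys

if sys.version_info[0] == 2:
    # site.py removes sys.setdefaultencoding at startup;
    # reloading the module makes it available again.
    reload(sys)
    sys.setdefaultencoding('utf-8')

print(sys.getdefaultencoding())
```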