How to store non-ASCII characters in the Google App Engine datastore

Posted on 2024-11-04 11:15:16

I've tried no fewer than 5 different "solutions" and I can't get it to work. Please help.

This is the error:

  'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
  Traceback (most recent call last):
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 636, in __call__
    handler.post(*groups)
  File "/base/data/home/apps/elmovieplace/1.350096827241428223/script/pftv.py", line 114, in post
    movie.put()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 984, in put
    return datastore.Put(self._entity, config=config)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 455, in Put
    return _GetConnection().async_put(config, entities, extra_hook).get_result()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1219, in async_put
    for pbs in pbsgen:
  File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1070, in __generate_pb_lists
    pb = value_to_pb(value)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 239, in entity_to_pb
    return entity._ToPb()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 841, in _ToPb
    properties = datastore_types.ToPropertyPb(name, values)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore_types.py", line 1672, in ToPropertyPb
    pbvalue = pack_prop(name, v, pb.mutable_value())
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore_types.py", line 1485, in PackString
    pbvalue.set_stringvalue(unicode(value).encode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

This is the part of the code that's giving me problems.

if imdbValues[5] == 'N/A':
    movie.director = ''
else:
    movie.director = imdbValues[5]

...

movie.put()

In this case, imdbValues[5] is equal to Claudio Fäh.

Comments (2)

望喜 2024-11-11 11:15:16

The exception is raised by this line of code:

pbvalue.set_stringvalue(unicode(value).encode('utf-8'))

When you assign a value to movie.director, that value is first converted to unicode with:

unicode(value)

Then the result is encoded with encode('utf-8').

The unicode() function typically uses ASCII as its default encoding for decoding, which means you are safe only when passing these kinds of values:

  1. A unicode string
  2. An 8-bit string that contains only ASCII characters

Your code is probably passing a byte string in some other encoding, which unicode(value) fails to decode as ASCII.
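
The failure can be reproduced outside App Engine. The sketch below is only an illustration, not the datastore's own code: it spells out explicitly the ASCII decode that unicode(value) performs implicitly on a byte string under Python 2's default encoding. The input bytes are an assumption (the UTF-8 encoding of the director's name from the question), and try_ascii_decode is a hypothetical helper. It should run on both Python 2.6+ and Python 3.

```python
# UTF-8 bytes for u"Claudio F\xe4h"; 0xc3 0xa4 is the two-byte encoding of "ä".
# Assumed input: the question does not show the raw bytes it received.
raw = b"Claudio F\xc3\xa4h"


def try_ascii_decode(data):
    """Return (text, None) on success, or (None, error) if data is not pure ASCII."""
    try:
        return data.decode("ascii"), None
    except UnicodeDecodeError as exc:
        return None, exc


# Non-ASCII bytes: the decode fails, as in the traceback.
text, error = try_ascii_decode(raw)
# error.start is the index of the first byte that could not be decoded.
offending_byte = raw[error.start:error.start + 1]

# Pure ASCII bytes: the same decode succeeds.
ascii_text, ascii_error = try_ascii_decode(b"Claudio")
```

Here offending_byte is the same 0xc3 byte that the traceback complains about, while the pure ASCII input decodes without error.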

Recommendation:
If you are dealing with byte strings, you MUST know their encoding, or your program will suffer from this kind of encoding/decoding problem.

How to fix it:
Find out which encoding the byte strings you are dealing with use (utf-8?) and convert them to unicode strings.
If, for example, imdbValues is a list returned by some IMDb Python library and contains utf-8 encoded byte strings, you should convert them using:

 movie.director = imdbValues[5].decode('utf-8')

你的往事 2024-11-11 11:15:16

You should start using unicode for your textual data.

Wherever your data comes from, it is Unicode characters encoded as bytes. The encoding could be UTF-8, UTF-16, Windows-1252, ISO-8859-1, or one of many others. If the data lives on your system, you know the encoding. If it came from a web page, the encoding is given in the response headers, and often at the beginning of the page as well. Using that encoding, .decode the bytes into a unicode Python object and use that in your code.
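
As a rough sketch of reading the encoding from a response header, the standard library's email.message.Message can parse a Content-Type value. The header and body below are made-up sample data, charset_from_content_type is a hypothetical helper, and the UTF-8 fallback is an assumption rather than a rule.

```python
from email.message import Message


def charset_from_content_type(header_value, default="utf-8"):
    """Extract the charset parameter from a Content-Type header value.

    Falls back to `default` (an assumption) when no charset is declared.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset in lower case, or None.
    return msg.get_content_charset() or default


# Hypothetical response data: a Latin-1 encoded body and its header.
header = "text/html; charset=ISO-8859-1"
body = b"Claudio F\xe4h"

encoding = charset_from_content_type(header)
director = body.decode(encoding)  # now a text (unicode) object
```

The same bytes decoded with the wrong encoding would either raise an error or silently produce the wrong characters, which is why the declared charset is worth checking first.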

Decode on input, encode (if necessary) on output. It's not necessary to encode before using the data with App Engine.
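
A minimal sketch of that rule, assuming the incoming bytes are UTF-8. The helper name to_text and the sample values are hypothetical, and the code should run on both Python 2.6+ and Python 3.

```python
def to_text(value, encoding="utf-8"):
    """Decode byte strings at the input boundary; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Bytes as they might arrive from an HTTP response (assumed to be UTF-8).
incoming = b"Claudio F\xc3\xa4h"

director = to_text(incoming)      # decode once, on input
label = u"Director: " + director  # text + text, so no implicit ASCII decoding
outgoing = label.encode("utf-8")  # encode only when bytes are needed again
```

With this pattern, the assignment in the question would become movie.director = to_text(imdbValues[5]), provided the values really are UTF-8 byte strings.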

PS that answer in a Unicode-related question might be of help.
