Subclassing db.TextProperty to store a Python dict as JSON and set the default encoding to something other than ASCII

Posted on 2024-11-04 03:38:28

Using Google App Engine (Python SDK), I created a custom JSONProperty() as a subclass of db.TextProperty(). My goal is to store a Python dict on the fly as JSON and retrieve it easily. I followed various examples found via Google, and setting up the custom Property class and methods is pretty easy.

However, some of my dict values (strings) are encoded in UTF-8. When saving the model into the datastore, I get the dreaded Unicode error (since the default encoding for a datastore text property is ASCII). Subclassing db.BlobProperty didn't solve the issue.

Basically, my code does the following: it stores Resource entities in the datastore (with the URL as a StringProperty and the POST/GET payloads stored in a dict as a JSONProperty) and fetches them later (code not included). I chose not to use pickle for storing payloads because I'm a JSON freak and have no use for storing objects.

Custom JSONProperty:

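from google.appengine.ext import db
from django.utils import simplejson as json  # assumption: the original doesn't show which JSON module is imported

class JSONProperty(db.TextProperty):
    def get_value_for_datastore(self, model_instance):
        # Serialize the dict to a JSON string before it is stored
        value = super(JSONProperty, self).get_value_for_datastore(model_instance)
        return json.dumps(value)

    def make_value_from_datastore(self, value):
        # Turn the stored JSON string back into a dict
        if value is None:
            return {}
        if isinstance(value, basestring):
            return json.loads(value)
        return value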
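The two hooks above can be checked without App Engine. What follows is a minimal Python 3 sketch that rewrites them as plain functions. The names value_for_datastore and value_from_datastore are hypothetical. str stands in for Python 2's unicode/basestring, and the sketch assumes the standard json module. Because ensure_ascii defaults to True, the serialized text contains only ASCII characters.

```python
import json
from typing import Any, Optional


def value_for_datastore(value: Any) -> str:
    # Serialize to JSON text. With ensure_ascii=True (the default), every
    # non-ASCII character is escaped as \uXXXX, so the result is pure ASCII.
    return json.dumps(value)


def value_from_datastore(value: Optional[str]) -> Any:
    # Mirror the property's read path: None becomes an empty dict,
    # strings are parsed as JSON, anything else is returned unchanged.
    if value is None:
        return {}
    if isinstance(value, str):
        return json.loads(value)
    return value


payload = {"name": "SomeField", "default": "éàôfoobarç"}
stored = value_for_datastore(payload)
print(stored)  # {"name": "SomeField", "default": "\u00e9\u00e0\u00f4foobar\u00e7"}
print(value_from_datastore(stored) == payload)  # True
```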

Putting the model into the datastore:

res = Resource()
res.init_payloads()
res.url = "http://www.somesite.com/someform/"
res.param = { 'name': "SomeField", 'default': u"éàôfoobarç" }
res.put()

This throws a UnicodeDecodeError related to ASCII encoding. It may be worth noting that I only get this error (every time) on the production server. I'm using Python 2.5.2 on dev.

Traceback (most recent call last):
  File "/base/data/home/apps/delpythian/1.350065314722833389/core/handlers/ResetHandler.py", line 68, in _res_one
    return res_one.put()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 984, in put
    return datastore.Put(self._entity, config=config)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 455, in Put
    return _GetConnection().async_put(config, entities, extra_hook).get_result()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1219, in async_put
    for pbs in pbsgen:
  File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1070, in __generate_pb_lists
    pb = value_to_pb(value)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 239, in entity_to_pb
    return entity._ToPb()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 841, in _ToPb
    properties = datastore_types.ToPropertyPb(name, values)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore_types.py", line 1672, in ToPropertyPb
    pbvalue = pack_prop(name, v, pb.mutable_value())
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore_types.py", line 1485, in PackString
    pbvalue.set_stringvalue(unicode(value).encode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 32: ordinal not in range(128)
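The last frame shows the mechanism. PackString calls unicode(value) without naming a codec. In Python 2 that decodes a byte string with the default encoding, which is normally ASCII. Byte 0xc3 is the first byte of the UTF-8 encoding of characters such as é, so ASCII cannot decode it. Below is a minimal Python 3 reproduction. It uses an explicit decode("ascii") as a stand-in for Python 2's implicit conversion, and the helper names are hypothetical.

```python
from typing import Optional


def implicit_ascii_decode(value: bytes) -> str:
    # Stand-in for Python 2's unicode(value) on a byte string: with no codec
    # named, Python 2 falls back to the default encoding (normally ASCII).
    return value.decode("ascii")


def decode_error(value: bytes) -> Optional[str]:
    # Return the error message if ASCII decoding fails, otherwise None.
    try:
        implicit_ascii_decode(value)
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


utf8_bytes = "éàôfoobarç".encode("utf-8")  # starts with 0xc3 0xa9 (é)
print(decode_error(utf8_bytes))
# 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
print(decode_error(b"plain ascii"))  # None
```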

My question is the following: is there a way to subclass db.TextProperty() and set/enforce a custom encoding? Or am I doing something wrong? I try to avoid using str() and follow the "Decode early, Unicode everywhere, encode late" rule.

Update: added code and stack trace.


Comments (2)

薄荷→糖丶微凉 2024-11-11 03:38:28

Here's a minimal example of moving a unicode string from a dictionary to a serialized JSON string to a TextProperty:

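from google.appengine.ext import db, webapp
from django.utils import simplejson  # assumption: the answer doesn't show its imports

class Thing(db.Model):
  json = db.TextProperty()

class MainHandler(webapp.RequestHandler):
  def get(self):
    data = {'word': u"r\xe9sum\xe9"}
    json = simplejson.dumps(data, ensure_ascii=False)
    Thing(json=json).put()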

This works for me in both dev and prod.
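The following sketch shows what ensure_ascii changes. It uses Python 3's standard json module, which is an assumption, since the answer uses App Engine's simplejson. With the default ensure_ascii=True, non-ASCII characters are escaped as \uXXXX. With ensure_ascii=False they are kept as-is. The Python 2 documentation notes that with ensure_ascii=False, dumps may return a unicode instance rather than a byte string.

```python
import json

data = {"word": "r\u00e9sum\u00e9"}  # same value as the answer's u"r\xe9sum\xe9"

escaped = json.dumps(data)                       # ensure_ascii=True (default)
verbatim = json.dumps(data, ensure_ascii=False)  # keep non-ASCII characters

print(escaped)   # {"word": "r\u00e9sum\u00e9"}
print(verbatim)  # {"word": "résumé"}

# Both forms decode back to the same dict.
print(json.loads(escaped) == json.loads(verbatim) == data)  # True
```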

巷子口的你 2024-11-11 03:38:28

Looking at this line in PackString:

pbvalue.set_stringvalue(unicode(value).encode('utf-8'))
UnicodeDecodeError: 'ascii'

It seems that App Engine expects all string values to be unicode. The call unicode(value) doesn't specify an encoding, so it will probably default to ASCII unless value is already a unicode object, e.g.:

>>> u = u"ąęćźż"
>>> s = u.encode('utf-8')
>>> unicode(u) # fine
>>> unicode(s, 'utf-8') # fine
>>> unicode(s) # blows up (tries ascii) (on my interpreter)

json.dumps returns a byte string (str) rather than a unicode object, and if that string contains UTF-8-encoded bytes, unicode() can't decode it with the default ASCII codec.

try this:

return unicode(json.dumps(...), 'utf-8')

and you should be fine.
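Here is the same fix in Python 3 terms, as a minimal sketch: decode bytes with an explicitly named codec instead of relying on the default. The helper as_text is hypothetical. The .encode("utf-8") step only simulates the Python 2 byte string that json.dumps would hand back.

```python
import json
from typing import Union


def as_text(value: Union[bytes, str], encoding: str = "utf-8") -> str:
    # Counterpart of Python 2's unicode(value, 'utf-8'): decode bytes with an
    # explicitly named codec and pass text through unchanged.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Simulate the byte string Python 2's json.dumps can return: JSON text
# encoded as UTF-8 bytes.
dumped = json.dumps({"default": "éàôfoobarç"}, ensure_ascii=False).encode("utf-8")

text = as_text(dumped)
print(text)  # {"default": "éàôfoobarç"}
print(as_text("already text"))  # already text
```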

As for why App Engine blows up while your interpreter is fine, my guess would be some local settings: the docstring for unicode() says it defaults to the current default encoding, which apparently is UTF-8 for you and ASCII for GAE.
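The default encoding this answer refers to can be checked with sys.getdefaultencoding(). In CPython 2 it normally reports 'ascii', while Python 3 always reports 'utf-8'. The small Python 3 sketch below uses a hypothetical function name. It also shows that naming the codec explicitly makes the result independent of that setting.

```python
import sys


def default_encoding() -> str:
    # Codec used for implicit conversions when none is named.
    # CPython 2 normally reports 'ascii'; Python 3 always reports 'utf-8'.
    return sys.getdefaultencoding()


raw = "r\u00e9sum\u00e9".encode("utf-8")  # UTF-8 bytes, like a Python 2 str

# Naming the codec explicitly makes the result independent of the default.
text = raw.decode("utf-8")
print(default_encoding(), text)
```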
