如何 JSON 序列化集合？

发布于 2024-12-17 09:02:20 字数 1562 浏览 1 评论 0原文

我有一个 Python set ，其中包含具有 __hash__ 和 __eq__ 方法的对象，以确保集合中不包含重复项。

我需要对这个结果 set 进行 json 编码，但是即使将空的 set 传递给 json.dumps 方法也会引发 TypeError.

  File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python2.7/json/encoder.py", line 178, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: set([]) is not JSON serializable

我知道我可以创建一个具有自定义 default 方法的 json.JSONEncoder 类的扩展，但我什至不确定从哪里开始通过 进行转换>设置。我应该根据默认方法中的 set 值创建一个字典，然后返回该字典的编码吗？理想情况下，我想让默认方法能够处理原始编码器阻塞的所有数据类型（我使用 Mongo 作为数据源，因此日期似乎也会引发此错误）

任何正确方向的提示都是赞赏。

编辑：

感谢您的回答！也许我应该更准确一些。

我利用（并赞成）此处的答案来解决正在翻译的 set 的限制，但内部密钥也是一个问题。

set 中的对象是转换为 __dict__ 的复杂对象，但它们本身也可以包含其属性值，这些值可能不适合 json 编码器中的基本类型。

这个集合中有很多不同的类型，散列基本上计算实体的唯一ID，但本着NoSQL的真正精神，无法确切地知道子对象包含什么。

一个对象可能包含开始的日期值，而另一个对象可能具有一些其他模式，其中不包含包含“非原始”对象的键。

这就是为什么我能想到的唯一解决方案是扩展 JSONEncoder 来替换 default 方法以打开不同的情况 - 但我不知道如何去做这和文档是不明确的。在嵌套对象中，从 default 返回的值是按键返回的，还是只是查看整个对象的通用包含/丢弃？该方法如何适应嵌套值？我浏览了之前的问题，似乎找不到特定情况编码的最佳方法（不幸的是，这似乎是我需要在这里做的）。

原文

I have a Python set that contains objects with __hash__ and __eq__ methods in order to make certain no duplicates are included in the collection.

I need to json encode this result set, but passing even an empty set to the json.dumps method raises a TypeError.

  File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python2.7/json/encoder.py", line 178, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: set([]) is not JSON serializable

I know I can create an extension to the json.JSONEncoder class that has a custom default method, but I'm not even sure where to begin in converting over the set. Should I create a dictionary out of the set values within the default method, and then return the encoding on that? Ideally, I'd like to make the default method able to handle all the datatypes that the original encoder chokes on (I'm using Mongo as a data source so dates seem to raise this error too)

Any hint in the right direction would be appreciated.

EDIT:

Thanks for the answer! Perhaps I should have been more precise.

I utilized (and upvoted) the answers here to get around the limitations of the set being translated, but there are internal keys that are an issue as well.

The objects in the set are complex objects that translate to __dict__, but they themselves can also contain values for their properties that could be ineligible for the basic types in the json encoder.

There's a lot of different types coming into this set, and the hash basically calculates a unique id for the entity, but in the true spirit of NoSQL there's no telling exactly what the child object contains.

One object might contain a date value for starts, whereas another may have some other schema that includes no keys containing "non-primitive" objects.

That is why the only solution I could think of was to extend the JSONEncoder to replace the default method to turn on different cases - but I'm not sure how to go about this and the documentation is ambiguous. In nested objects, does the value returned from default go by key, or is it just a generic include/discard that looks at the whole object? How does that method accommodate nested values? I've looked through previous questions and can't seem to find the best approach to case-specific encoding (which unfortunately seems like what I'm going to need to do here).

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

睡美人的小仙女 2024-12-24 09:02:20

您可以创建一个自定义编码器，当遇到集合时，该编码器会返回列表。这是一个示例：

import json
class SetEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, set):
            return list(obj)
        return json.JSONEncoder.default(self, obj)

data_str = json.dumps(set([1,2,3,4,5]), cls=SetEncoder)
print(data_str)
# Output: '[1, 2, 3, 4, 5]'

您也可以通过这种方式检测其他类型。如果您需要保留列表实际上是一个集合，则可以使用自定义编码。像 return {'type':'set', 'list':list(obj)} 之类的东西可能会起作用。

为了说明嵌套类型，请考虑对其进行序列化：

class Something(object):
    pass
json.dumps(set([1,2,3,4,5,Something()]), cls=SetEncoder)

这会引发以下错误：

TypeError: <__main__.Something object at 0x1691c50> is not JSON serializable

这表明编码器将获取返回的 list 结果，并在其子级上递归调用序列化器。要为多种类型添加自定义序列化器，您可以执行以下操作：

class SetEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, set):
            return list(obj)
        if isinstance(obj, Something):
            return 'CustomSomethingRepresentation'
        return json.JSONEncoder.default(self, obj)
 
data_str = json.dumps(set([1,2,3,4,5,Something()]), cls=SetEncoder)
print(data_str)
# Output: '[1, 2, 3, 4, 5, "CustomSomethingRepresentation"]'

You can create a custom encoder that returns a list when it encounters a set. Here's an example:

import json
class SetEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, set):
            return list(obj)
        return json.JSONEncoder.default(self, obj)

data_str = json.dumps(set([1,2,3,4,5]), cls=SetEncoder)
print(data_str)
# Output: '[1, 2, 3, 4, 5]'

You can detect other types this way too. If you need to retain that the list was actually a set, you could use a custom encoding. Something like return {'type':'set', 'list':list(obj)} might work.

To illustrate nested types, consider serializing this:

class Something(object):
    pass
json.dumps(set([1,2,3,4,5,Something()]), cls=SetEncoder)

This raises the following error:

TypeError: <__main__.Something object at 0x1691c50> is not JSON serializable

This indicates that the encoder will take the list result returned and recursively call the serializer on its children. To add a custom serializer for multiple types, you can do this:

class SetEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, set):
            return list(obj)
        if isinstance(obj, Something):
            return 'CustomSomethingRepresentation'
        return json.JSONEncoder.default(self, obj)
 
data_str = json.dumps(set([1,2,3,4,5,Something()]), cls=SetEncoder)
print(data_str)
# Output: '[1, 2, 3, 4, 5, "CustomSomethingRepresentation"]'

回复收藏 0 原文

北城半夏 2024-12-24 09:02:20

JSON 表示法只有少数本机数据类型（对象、数组、字符串、数字、布尔值和 null），因此，任何以 JSON 形式序列化的内容都需要表示为这些类型之一。

如 json 模块文档所示，此转换可以通过 JSONEncoder 和 JSONDecoder，但是这样你就会放弃一些你可能需要的其他结构（如果你将集合转换为列表，那么你就失去了恢复常规列表的能力；如果你使用将集合转换为字典dict.fromkeys(s) 然后你就失去了恢复字典的能力）。

更复杂的解决方案是构建可以与其他本机 JSON 类型共存的自定义类型。这使您可以存储包括列表、集合、字典、小数、日期时间对象等的嵌套结构：

from json import dumps, loads, JSONEncoder, JSONDecoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        try:
            return {'_python_object': pickle.dumps(obj).decode('latin-1')}
        except pickle.PickleError:
            return super().default(obj)

def as_python_object(dct):
    if '_python_object' in dct:
        return pickle.loads(dct['_python_object'].encode('latin-1'))
    return dct

这是一个示例会话，显示它可以处理列表、字典和集合：

>>> data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]

>>> j = dumps(data, cls=PythonObjectEncoder)

>>> loads(j, object_hook=as_python_object)
[1, 2, 3, set(['knights', 'say', 'who', 'ni']), {'key': 'value'}, Decimal('3.14')]

或者，使用更通用的方法可能会很有用目的序列化技术，例如 YAML、Twisted Jelly，或 Python 的 pickle 模块。它们都支持更广泛的数据类型。

JSON notation has only a handful of native datatypes (objects, arrays, strings, numbers, booleans, and null), so anything serialized in JSON needs to be expressed as one of these types.

As shown in the json module docs, this conversion can be done automatically by a JSONEncoder and JSONDecoder, but then you would be giving up some other structure you might need (if you convert sets to a list, then you lose the ability to recover regular lists; if you convert sets to a dictionary using dict.fromkeys(s) then you lose the ability to recover dictionaries).

A more sophisticated solution is to build-out a custom type that can coexist with other native JSON types. This lets you store nested structures that include lists, sets, dicts, decimals, datetime objects, etc.:

from json import dumps, loads, JSONEncoder, JSONDecoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        try:
            return {'_python_object': pickle.dumps(obj).decode('latin-1')}
        except pickle.PickleError:
            return super().default(obj)

def as_python_object(dct):
    if '_python_object' in dct:
        return pickle.loads(dct['_python_object'].encode('latin-1'))
    return dct

Here is a sample session showing that it can handle lists, dicts, and sets:

>>> data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]

>>> j = dumps(data, cls=PythonObjectEncoder)

>>> loads(j, object_hook=as_python_object)
[1, 2, 3, set(['knights', 'say', 'who', 'ni']), {'key': 'value'}, Decimal('3.14')]

Alternatively, it may be useful to use a more general purpose serialization technique such as YAML, Twisted Jelly, or Python's pickle module. These each support a much greater range of datatypes.

回复收藏 0 原文

一百个冬季 2024-12-24 09:02:20

您不需要创建自定义编码器类来提供 default 方法 - 它可以作为关键字参数传入：

import json

def serialize_sets(obj):
    if isinstance(obj, set):
        return list(obj)

    return obj

json_str = json.dumps(set([1,2,3]), default=serialize_sets)
print(json_str)

结果为 [1, 2, 3]在所有受支持的 Python 版本中。

You don't need to make a custom encoder class to supply the default method - it can be passed in as a keyword argument:

import json

def serialize_sets(obj):
    if isinstance(obj, set):
        return list(obj)

    return obj

json_str = json.dumps(set([1,2,3]), default=serialize_sets)
print(json_str)

results in [1, 2, 3] in all supported Python versions.

回复收藏 0 原文

凯凯我们等你回来 2024-12-24 09:02:20

如果您确定唯一的不可序列化数据将是 set，那么有一个非常简单（且肮脏）的解决方案：

json.dumps({"Hello World": {1, 2}}, default=tuple)

只有不可序列化数据将使用 给出的函数进行处理默认，因此只有set会被转换为tuple。

If you know for sure that the only non-serializable data will be sets, there's a very simple (and dirty) solution:

json.dumps({"Hello World": {1, 2}}, default=tuple)

Only non-serializable data will be treated with the function given as default, so only the set will be converted to a tuple.

回复收藏 0 原文

遥远的绿洲 2024-12-24 09:02:20

我将 Raymond Hettinger 的解决方案改编为 python 3。

以下是更改的内容：

unicode 消失
更新了通过 super() 调用父级的 default
，使用 base64 将 bytes 类型序列化为str （因为python 3中的bytes似乎无法转换为JSON）

from decimal import Decimal
from base64 import b64encode, b64decode
from json import dumps, loads, JSONEncoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, int, float, bool, type(None))):
            return super().default(obj)
        return {'_python_object': b64encode(pickle.dumps(obj)).decode('utf-8')}

def as_python_object(dct):
    if '_python_object' in dct:
        return pickle.loads(b64decode(dct['_python_object'].encode('utf-8')))
    return dct

data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]
j = dumps(data, cls=PythonObjectEncoder)
print(loads(j, object_hook=as_python_object))
# prints: [1, 2, 3, {'knights', 'who', 'say', 'ni'}, {'key': 'value'}, Decimal('3.14')]

I adapted Raymond Hettinger's solution to python 3.

Here is what has changed:

unicode disappeared
updated the call to the parents' default with super()
using base64 to serialize the bytes type into str (because it seems that bytes in python 3 can't be converted to JSON)

from decimal import Decimal
from base64 import b64encode, b64decode
from json import dumps, loads, JSONEncoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, int, float, bool, type(None))):
            return super().default(obj)
        return {'_python_object': b64encode(pickle.dumps(obj)).decode('utf-8')}

def as_python_object(dct):
    if '_python_object' in dct:
        return pickle.loads(b64decode(dct['_python_object'].encode('utf-8')))
    return dct

data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]
j = dumps(data, cls=PythonObjectEncoder)
print(loads(j, object_hook=as_python_object))
# prints: [1, 2, 3, {'knights', 'who', 'say', 'ni'}, {'key': 'value'}, Decimal('3.14')]

回复收藏 0 原文

最近可好 2024-12-24 09:02:20

如果您只需要快速转储并且不想实现自定义编码器。您可以使用以下内容：

json_string = json.dumps(data, iterable_as_array=True)

这会将所有集合（和其他可迭代对象）转换为数组。请注意，当您解析回 JSON 时，这些字段将保留为数组。如果你想保留类型，你需要编写自定义编码器。

另请确保已安装并需要 simplejson。
您可以在 PyPi 上找到它。

If you need just quick dump and don't want to implement custom encoder. You can use the following:

json_string = json.dumps(data, iterable_as_array=True)

This will convert all sets (and other iterables) into arrays. Just beware that those fields will stay arrays when you parse the JSON back. If you want to preserve the types, you need to write custom encoder.

Also make sure to have simplejson installed and required.
You can find it on PyPi.

回复收藏 0 原文

吐个泡泡 2024-12-24 09:02:20

JSON 中仅提供字典、列表和原始对象类型（int、string、bool）。

回复收藏 0 原文

倾城花音 2024-12-24 09:02:20

@AnttiHaapala 的缩短版本：

json.dumps(dict_with_sets, default=lambda x: list(x) if isinstance(x, set) else x)

Shortened version of @AnttiHaapala:

json.dumps(dict_with_sets, default=lambda x: list(x) if isinstance(x, set) else x)

回复收藏 0 原文

向日葵 2024-12-24 09:02:20

如果您只需要对集合进行编码，而不是一般的 Python 对象，并且希望使其易于人类阅读，则可以使用 Raymond Hettinger 答案的简化版本：

import json
import collections

class JSONSetEncoder(json.JSONEncoder):
    """Use with json.dumps to allow Python sets to be encoded to JSON

    Example
    -------

    import json

    data = dict(aset=set([1,2,3]))

    encoded = json.dumps(data, cls=JSONSetEncoder)
    decoded = json.loads(encoded, object_hook=json_as_python_set)
    assert data == decoded     # Should assert successfully

    Any object that is matched by isinstance(obj, collections.Set) will
    be encoded, but the decoded value will always be a normal Python set.

    """

    def default(self, obj):
        if isinstance(obj, collections.Set):
            return dict(_set_object=list(obj))
        else:
            return json.JSONEncoder.default(self, obj)

def json_as_python_set(dct):
    """Decode json {'_set_object': [1,2,3]} to set([1,2,3])

    Example
    -------
    decoded = json.loads(encoded, object_hook=json_as_python_set)

    Also see :class:`JSONSetEncoder`

    """
    if '_set_object' in dct:
        return set(dct['_set_object'])
    return dct

If you only need to encode sets, not general Python objects, and want to keep it easily human-readable, a simplified version of Raymond Hettinger's answer can be used:

import json
import collections

class JSONSetEncoder(json.JSONEncoder):
    """Use with json.dumps to allow Python sets to be encoded to JSON

    Example
    -------

    import json

    data = dict(aset=set([1,2,3]))

    encoded = json.dumps(data, cls=JSONSetEncoder)
    decoded = json.loads(encoded, object_hook=json_as_python_set)
    assert data == decoded     # Should assert successfully

    Any object that is matched by isinstance(obj, collections.Set) will
    be encoded, but the decoded value will always be a normal Python set.

    """

    def default(self, obj):
        if isinstance(obj, collections.Set):
            return dict(_set_object=list(obj))
        else:
            return json.JSONEncoder.default(self, obj)

def json_as_python_set(dct):
    """Decode json {'_set_object': [1,2,3]} to set([1,2,3])

    Example
    -------
    decoded = json.loads(encoded, object_hook=json_as_python_set)

    Also see :class:`JSONSetEncoder`

    """
    if '_set_object' in dct:
        return set(dct['_set_object'])
    return dct

回复收藏 0 原文

城歌 2024-12-24 09:02:20

公认的解决方案的一个缺点是它的输出非常特定于Python。即它的原始 json 输出无法被人类观察或被其他语言（例如 javascript）加载。
示例：

db = {
        "a": [ 44, set((4,5,6)) ],
        "b": [ 55, set((4,3,2)) ]
        }

j = dumps(db, cls=PythonObjectEncoder)
print(j)

会得到你：

{"a": [44, {"_python_object": "gANjYnVpbHRpbnMKc2V0CnEAXXEBKEsESwVLBmWFcQJScQMu"}], "b": [55, {"_python_object": "gANjYnVpbHRpbnMKc2V0CnEAXXEBKEsCSwNLBGWFcQJScQMu"}]}

我可以提出一个解决方案，将集合降级为包含出路列表的字典，并在使用相同编码器加载到 python 时返回到集合，从而保留可观察性和语言不可知论：

from decimal import Decimal
from base64 import b64encode, b64decode
from json import dumps, loads, JSONEncoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, int, float, bool, type(None))):
            return super().default(obj)
        elif isinstance(obj, set):
            return {"__set__": list(obj)}
        return {'_python_object': b64encode(pickle.dumps(obj)).decode('utf-8')}

def as_python_object(dct):
    if '__set__' in dct:
        return set(dct['__set__'])
    elif '_python_object' in dct:
        return pickle.loads(b64decode(dct['_python_object'].encode('utf-8')))
    return dct

db = {
        "a": [ 44, set((4,5,6)) ],
        "b": [ 55, set((4,3,2)) ]
        }

j = dumps(db, cls=PythonObjectEncoder)
print(j)
ob = loads(j)
print(ob["a"])

这可以得到你：

{"a": [44, {"__set__": [4, 5, 6]}], "b": [55, {"__set__": [2, 3, 4]}]}
[44, {'__set__': [4, 5, 6]}]

请注意，序列化包含带有键 "__set__" 元素的字典将破坏此机制。所以 __set__ 现在已经成为保留的 dict 键。显然，可以随意使用另一个更深度混淆的密钥。

One shortcoming of the accepted solution is that its output is very python specific. I.e. its raw json output cannot be observed by a human or loaded by another language (e.g. javascript).
example:

db = {
        "a": [ 44, set((4,5,6)) ],
        "b": [ 55, set((4,3,2)) ]
        }

j = dumps(db, cls=PythonObjectEncoder)
print(j)

Will get you:

{"a": [44, {"_python_object": "gANjYnVpbHRpbnMKc2V0CnEAXXEBKEsESwVLBmWFcQJScQMu"}], "b": [55, {"_python_object": "gANjYnVpbHRpbnMKc2V0CnEAXXEBKEsCSwNLBGWFcQJScQMu"}]}

I can propose a solution which downgrades the set to a dict containing a list on the way out, and back to a set when loaded into python using the same encoder, therefore preserving observability and language agnosticism:

from decimal import Decimal
from base64 import b64encode, b64decode
from json import dumps, loads, JSONEncoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, int, float, bool, type(None))):
            return super().default(obj)
        elif isinstance(obj, set):
            return {"__set__": list(obj)}
        return {'_python_object': b64encode(pickle.dumps(obj)).decode('utf-8')}

def as_python_object(dct):
    if '__set__' in dct:
        return set(dct['__set__'])
    elif '_python_object' in dct:
        return pickle.loads(b64decode(dct['_python_object'].encode('utf-8')))
    return dct

db = {
        "a": [ 44, set((4,5,6)) ],
        "b": [ 55, set((4,3,2)) ]
        }

j = dumps(db, cls=PythonObjectEncoder)
print(j)
ob = loads(j)
print(ob["a"])

Which gets you:

{"a": [44, {"__set__": [4, 5, 6]}], "b": [55, {"__set__": [2, 3, 4]}]}
[44, {'__set__': [4, 5, 6]}]

Note that serializing a dictionary which has an element with a key "__set__" will break this mechanism. So __set__ has now become a reserved dict key. Obviously feel free to use another, more deeply obfuscated key.

回复收藏 0 原文

请远离我 2024-12-24 09:02:20

>>> import json
>>> set_object = set([1,2,3,4])
>>> json.dumps(list(set_object))
'[1, 2, 3, 4]'

>>> import json
>>> set_object = set([1,2,3,4])
>>> json.dumps(list(set_object))
'[1, 2, 3, 4]'

回复收藏 0 原文

淡写薰衣草的香 2024-12-24 09:02:20

你应该尝试 jsonwhatever

https://pypi.org/project/jsonwhatever/

pip install jsonwhatever

from jsonwhatever import JsonWhatEver

set_a = {1,2,3}

jsonwe = JsonWhatEver()

string_res = jsonwe.jsonwhatever('set_string', set_a)

print(string_res)

you should try jsonwhatever

https://pypi.org/project/jsonwhatever/

pip install jsonwhatever

from jsonwhatever import JsonWhatEver

set_a = {1,2,3}

jsonwe = JsonWhatEver()

string_res = jsonwe.jsonwhatever('set_string', set_a)

print(string_res)

回复收藏 0 原文

~没有更多了~