SimpleJson: handling entities with the same name

Posted on 2024-12-10 23:52:32

I'm using the Alchemy API on App Engine, so I'm using the simplejson library to parse the responses. The problem is that the responses contain entries with the same name:

 {
    "status": "OK",
    "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
    "url": "",
    "language": "english",
    "entities": [
        {
            "type": "Person",
            "relevance": "0.33",
            "count": "1",
            "text": "Michael Jordan",
            "disambiguated": {
                "name": "Michael Jordan",
                "subType": "Athlete",
                "subType": "AwardWinner",
                "subType": "BasketballPlayer",
                "subType": "HallOfFameInductee",
                "subType": "OlympicAthlete",
                "subType": "SportsLeagueAwardWinner",
                "subType": "FilmActor",
                "subType": "TVActor",
                "dbpedia": "http://dbpedia.org/resource/Michael_Jordan",
                "freebase": "http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000000029161",
                "umbel": "http://umbel.org/umbel/ne/wikipedia/Michael_Jordan",
                "opencyc": "http://sw.opencyc.org/concept/Mx4rvViVq5wpEbGdrcN5Y29ycA",
                "yago": "http://mpii.de/yago/resource/Michael_Jordan"
            }
        }
    ]
}

The problem is that "subType" is repeated, so the dict that loads() returns contains only "TVActor" for that key rather than a list of all the values. Is there any way to work around this?
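
The behaviour described above can be reproduced with a few lines. This is a minimal sketch that uses the standard-library json module (simplejson's loads behaves the same way here) and a shortened, hand-written version of the response; the variable names are only for illustration.

```python
import json

# Shortened version of the response above, with a duplicated "subType" key
text = '{"name": "Michael Jordan", "subType": "Athlete", "subType": "TVActor"}'

# The default decoder builds a plain dict, so later values overwrite earlier ones
data = json.loads(text)
print(data["subType"])  # TVActor
```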

Comments (2)

殊姿 2024-12-17 23:52:32

RFC 4627, which defines application/json, says:

An object is an unordered collection of zero or more name/value pairs

And:

The names within an object SHOULD be unique.

This means AlchemyAPI should not return multiple "subType" names inside the same object and still claim that the response is JSON.

You could request the same data in XML format (outputMode=xml) to avoid ambiguity in the results, or convert the values of duplicate keys into lists:

import simplejson as json
from collections import defaultdict

def multidict(ordered_pairs):
    """Convert duplicate keys values to lists."""
    # read all values into lists
    d = defaultdict(list)
    for k, v in ordered_pairs:
        d[k].append(v)

    # unpack lists that have only 1 item
    for k, v in d.items():
        if len(v) == 1:
            d[k] = v[0]
    return dict(d)

# `text` is the JSON string to decode, e.g. the one in the example below
print(json.JSONDecoder(object_pairs_hook=multidict).decode(text))

Example

text = """{
  "type": "Person",
  "subType": "Athlete",
  "subType": "AwardWinner"
}"""

Output

{u'subType': [u'Athlete', u'AwardWinner'], u'type': u'Person'}
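
The hook is called for every JSON object in the document, including nested ones, so the same idea carries over to a full response. The following is a sketch only, assuming Python 3 and the standard-library json module (simplejson accepts the same object_pairs_hook argument); the response is a trimmed, hand-written version of the one in the question, and as_list is a hypothetical helper that is not part of either library.

```python
import json
from collections import defaultdict

def multidict(ordered_pairs):
    """Same idea as above: collect values of duplicate keys into lists."""
    d = defaultdict(list)
    for k, v in ordered_pairs:
        d[k].append(v)
    # Keys that occurred once keep their plain value
    return {k: v[0] if len(v) == 1 else v for k, v in d.items()}

def as_list(value):
    """Hypothetical helper: wrap a single value so callers can always iterate."""
    return value if isinstance(value, list) else [value]

# Trimmed version of the AlchemyAPI response from the question
response = """{
  "status": "OK",
  "entities": [
    {
      "type": "Person",
      "text": "Michael Jordan",
      "disambiguated": {
        "name": "Michael Jordan",
        "subType": "Athlete",
        "subType": "FilmActor",
        "subType": "TVActor"
      }
    }
  ]
}"""

data = json.loads(response, object_pairs_hook=multidict)
for entity in data["entities"]:
    # "subType" may be missing, a single string, or a list of strings
    sub_types = as_list(entity["disambiguated"].get("subType", []))
    print(entity["text"], sub_types)
```

One caveat with this approach: after decoding, a key that appeared once with an array value cannot be told apart from a key that was repeated, so it is best applied to fields that are known to hold scalar values, as "subType" does here.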

聚集的泪 2024-12-17 23:52:32

RFC 4627, which defines the application/json media type, recommends unique keys but does not explicitly forbid duplicates:

The names within an object SHOULD be unique.

From RFC 2119:

SHOULD This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.
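
Because duplicates are only discouraged, the default decoder accepts them silently. If you would rather find out when a response contains them, one option is a hook that refuses repeated keys. This is a minimal sketch using the standard-library json module; reject_duplicates is a hypothetical name, not something the library provides.

```python
import json

def reject_duplicates(pairs):
    """Hypothetical hook: build a dict, but fail loudly on a repeated key."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError("duplicate key in JSON object: %r" % (key,))
        result[key] = value
    return result

error_message = None
try:
    json.loads('{"subType": "Athlete", "subType": "TVActor"}',
               object_pairs_hook=reject_duplicates)
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```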

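This is a known problem.

You can work around it by renaming the duplicate key or by keeping its values in an array.
The code below takes a simple approach: when an object contains duplicate keys, it returns the raw list of pairs instead of a dict.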

import json

def parse_object_pairs(pairs):
    """
    Take a list of (key, value) tuples and check it for duplicate keys.
    If a key is repeated, return the list of pairs unchanged;
    otherwise return a dict built from the pairs.

    >>> parse_object_pairs([("color", "red"), ("size", 3)])
    {'color': 'red', 'size': 3}

    >>> parse_object_pairs([("color", "red"), ("size", 3), ("color", "blue")])
    [('color', 'red'), ('size', 3), ('color', 'blue')]

    :param pairs: list of (key, value) tuples.
    :return: dict, or the original list of pairs if a key is repeated.
    dict_without_duplicate = dict()
    for k, v in pairs:
        if k in dict_without_duplicate:
            return pairs
        else:
            dict_without_duplicate[k] = v

    return dict_without_duplicate

decoder = json.JSONDecoder(object_pairs_hook=parse_object_pairs)

str_json_can_be_with_duplicate_keys = '{"color": "red", "size": 3, "color": "red"}'

data_after_decode = decoder.decode(str_json_can_be_with_duplicate_keys)
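
With this hook the decoded value is either a dict or a list of pairs, so calling code has to check which one it received. The sketch below, assuming the standard-library json module and a condensed copy of the hook, shows one way to do that and to pull out every value of a repeated key; the sample strings and variable names are only for illustration.

```python
import json

def parse_object_pairs(pairs):
    """Condensed copy of the hook above."""
    seen = {}
    for k, v in pairs:
        if k in seen:
            return pairs  # duplicate found: hand back the raw pairs
        seen[k] = v
    return seen

decoder = json.JSONDecoder(object_pairs_hook=parse_object_pairs)

unique = decoder.decode('{"color": "red", "size": 3}')
duplicated = decoder.decode('{"color": "red", "size": 3, "color": "blue"}')

print(type(unique).__name__)      # dict
print(type(duplicated).__name__)  # list

# Collect every value that was given for one key in the pair list
colors = [v for k, v in duplicated if k == "color"]
print(colors)  # ['red', 'blue']
```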