How can I parse (read) and use JSON in Python?

Posted on 2024-12-10 02:29:06

My Python program receives JSON data, and I need to get bits of information out of it. How can I parse the data and use the result? I think I need to use json.loads for this task, but I can't understand how to do it.

For example, suppose that I have jsonStr = '{"one" : "1", "two" : "2", "three" : "3"}'. Given this JSON, and an input of "two", how can I get the corresponding data, "2"?


Beware that .load is for files; .loads is for strings. See also: Reading JSON from a file.

Occasionally, a JSON document is intended to represent tabular data. If you have something like this and are trying to use it with Pandas, see Python - How to convert JSON File to Dataframe.

Some data superficially looks like JSON, but is not JSON.

For example, sometimes the data comes from applying repr to native Python data structures. The result may use quotes differently, use title-cased True and False rather than JSON-mandated true and false, etc. For such data, see Convert a String representation of a Dictionary to a dictionary or How to convert string representation of list to a list.
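
As a minimal sketch of that difference (the sample string below is made up for illustration): text produced by repr is rejected by json.loads, while the standard library's ast.literal_eval can read Python literal syntax.

```python
import ast
import json

# Hypothetical sample: the output of repr() on a Python dict, not JSON.
python_repr = "{'one': '1', 'flag': True, 'nothing': None}"

try:
    json.loads(python_repr)
    is_json = True
except json.JSONDecodeError:
    # Single quotes and True/None are not valid JSON syntax.
    is_json = False

# ast.literal_eval evaluates Python literal syntax only, not arbitrary code.
parsed_repr = ast.literal_eval(python_repr)
print(is_json, parsed_repr['one'])
```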

Another common variant format puts separate valid JSON-formatted data on each line of the input. (Proper JSON cannot be parsed line by line, because it uses balanced brackets that can be many lines apart.) This format is called JSONL. See Loading JSONL file as JSON objects.
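
A small sketch of reading such input, assuming every non-blank line is a complete JSON document (the sample data and variable names are invented for this example):

```python
import json

# Hypothetical JSONL input: one complete JSON document per line.
jsonl_text = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'

# Parse each non-blank line separately with json.loads.
records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
print(records[1]['name'])
```

For a real file, the same comprehension can iterate over the open file object instead of splitlines().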

Sometimes JSON data from a web source is padded with some extra text. In some contexts, this works around security restrictions in browsers. This is called JSONP and is described at What is JSONP, and why was it created?. In other contexts, the extra text implements a security measure, as described at Why does Google prepend while(1); to their JSON responses?. Either way, handling this in Python is straightforward: simply identify and remove the extra text, and proceed as before.
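
A rough sketch of the "identify and remove" step, assuming the padding has one of the two shapes shown (the callback name, the prefix, and the sample payloads are made up; real sources may use different padding):

```python
import json

# Hypothetical padded responses for illustration.
jsonp_text = 'callback({"one": "1", "two": "2"});'
prefixed_text = 'while(1);{"one": "1", "two": "2"}'

# JSONP: keep only what lies between the first '(' and the last ')'.
start = jsonp_text.index('(') + 1
end = jsonp_text.rindex(')')
from_jsonp = json.loads(jsonp_text[start:end])

# Known prefix: strip it before parsing.
prefix = 'while(1);'
if prefixed_text.startswith(prefix):
    prefixed_text = prefixed_text[len(prefix):]
from_prefixed = json.loads(prefixed_text)

print(from_jsonp['two'], from_prefixed['two'])
```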

Comments (6)

恏ㄋ傷疤忘ㄋ疼 2024-12-17 02:29:06

Very simple:

import json
data = json.loads('{"one" : "1", "two" : "2", "three" : "3"}')
print(data['two'])  # or `print data['two']` in Python 2
愁以何悠 2024-12-17 02:29:06

Sometimes your JSON is not a string. For example, if you are getting a JSON from a URL like this:

import urllib.request  # Python 3; this module was urllib2 in Python 2
j = urllib.request.urlopen('http://example.com/data.json')

You will need to use json.load(), not json.loads():

j_obj = json.load(j)

(It is easy to forget: the 's' is for 'string')

維他命╮ 2024-12-17 02:29:06

For a URL or a file, use json.load(). For a string with JSON content, use json.loads().

#! /usr/bin/python

import json
# from pprint import pprint

json_file = 'my_cube.json'
cube = '1'

with open(json_file) as json_data:
    data = json.load(json_data)

# pprint(data)

print("Dimension: ", data['cubes'][cube]['dim'])
print("Measures:  ", data['cubes'][cube]['meas'])
浅唱々樱花落 2024-12-17 02:29:06

The following is a simple example that may help you:

json_string = """
{
    "pk": 1, 
    "fa": "cc.ee", 
    "fb": {
        "fc": "", 
        "fd_id": "12345"
    }
}"""

import json
data = json.loads(json_string)
if data["fa"] == "cc.ee":
    data["fb"]["new_key"] = "cc.ee was present!"

print(json.dumps(data))

The output for the above code (on Python 3.7+, where dicts preserve insertion order) will be:

{"pk": 1, "fa": "cc.ee", "fb": {"fc": "", "fd_id": "12345", "new_key": "cc.ee was present!"}}

Note that you can set the indent argument of dumps to pretty-print it like so (for example, when using print(json.dumps(data, indent=4))):

{
    "pk": 1,
    "fa": "cc.ee",
    "fb": {
        "fc": "",
        "fd_id": "12345",
        "new_key": "cc.ee was present!"
    }
}
毁梦 2024-12-17 02:29:06

Parsing the data

Using the standard library json module

For string data, use json.loads:

import json

text = '{"one" : "1", "two" : "2", "three" : "3"}'
parsed = json.loads(text)

For data that comes from a file, or other file-like object, use json.load:

import io, json
# create an in-memory file-like object for demonstration purposes.
text = '{"one" : "1", "two" : "2", "three" : "3"}'
stream = io.StringIO(text)
parsed = json.load(stream) # load, not loads

It's easy to remember the distinction: the trailing s of loads stands for "string". (This is, admittedly, probably not in keeping with standard modern naming practice.)

Note that json.load does not accept a file path:

>>> json.load('example.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'

Both of these functions provide the same set of additional options for customizing the parsing process. Since 3.6, the options are keyword-only.

For string data, it is also possible to use the JSONDecoder class provided by the library, like so:

import json
text = '{"one" : "1", "two" : "2", "three" : "3"}'
decoder = json.JSONDecoder()
parsed = decoder.decode(text)

The same keyword parameters are available, but now they are passed to the constructor of the JSONDecoder, not the .decode method. The main advantage of the class is that it also provides a .raw_decode method, which will ignore extra data after the end of the JSON:

import json
text_with_junk = '{"one" : "1", "two" : "2", "three" : "3"} ignore this'
decoder = json.JSONDecoder()
# `amount` will count how many characters were parsed.
parsed, amount = decoder.raw_decode(text_with_junk)
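
As a short follow-up sketch, the second return value can be used to recover whatever came after the JSON (the name leftover is introduced here only for illustration):

```python
import json

text_with_junk = '{"one" : "1", "two" : "2", "three" : "3"} ignore this'
decoder = json.JSONDecoder()
parsed, amount = decoder.raw_decode(text_with_junk)

# Slicing at `amount` separates the parsed JSON text from the remainder.
leftover = text_with_junk[amount:]
print(parsed['two'], repr(leftover))
```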

Using requests or other implicit support

When data is retrieved from the Internet using the popular third-party requests library, it is not necessary to extract .text (or create any kind of file-like object) from the Response object and parse it separately. Instead, the Response object directly provides a .json method which will do this parsing:

import requests
response = requests.get('https://www.example.com')
parsed = response.json()

This method accepts the same keyword parameters as the standard library json functionality.

Using the results

Parsing by any of the above methods will result, by default, in a perfectly ordinary Python data structure, composed of the perfectly ordinary built-in types dict, list, str, int, float, bool (JSON true and false become Python constants True and False) and NoneType (JSON null becomes the Python constant None).

Working with this result, therefore, works the same way as if the same data had been obtained using any other technique.

Thus, to continue the example from the question:

>>> parsed
{'one': '1', 'two': '2', 'three': '3'}
>>> parsed['two']
'2'

I emphasize this because many people seem to expect that there is something special about the result; there is not. It's just a nested data structure, though dealing with nesting is sometimes difficult to understand.

Consider, for example, a parsed result like result = {'a': [{'b': 'c'}, {'d': 'e'}]}. To get 'e' requires following the appropriate steps one at a time: looking up the 'a' key in the dict gives a list [{'b': 'c'}, {'d': 'e'}]; the second element of that list (index 1) is {'d': 'e'}; and looking up the 'd' key in there gives the 'e' value. Thus, the corresponding code is result['a'][1]['d']: each indexing step is applied in order.
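
The same steps, written out one at a time as a small sketch (the intermediate names items, second, and value are chosen here just for illustration):

```python
result = {'a': [{'b': 'c'}, {'d': 'e'}]}

# Apply one indexing step at a time, naming each intermediate value.
items = result['a']      # the list [{'b': 'c'}, {'d': 'e'}]
second = items[1]        # the dict {'d': 'e'}
value = second['d']      # the string 'e'

# The chained form performs exactly the same steps in order.
assert value == result['a'][1]['d']
print(value)
```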

See also How can I extract a single value from a nested data structure (such as from parsing JSON)?.

Sometimes people want to apply more complex selection criteria, iterate over nested lists, filter or transform the data, etc. These are more complex topics that will be dealt with elsewhere.

Common sources of confusion

JSON lookalikes

Before attempting to parse JSON data, it is important to ensure that the data actually is JSON. Check the JSON format specification to verify what is expected. Key points:

  • The document represents one value (normally a JSON "object", which corresponds to a Python dict, but every other type represented by JSON is permissible). In particular, it does not have a separate entry on each line - that's JSONL.

  • The data is human-readable after using a standard text encoding (normally UTF-8). Almost all of the text is contained within double quotes, and uses escape sequences where appropriate.
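
One way to check these points in practice is simply to attempt the parse and catch the error. The helper below is a hypothetical sketch (try_parse is not part of the json module), and the sample inputs are made up:

```python
import json

def try_parse(text):
    """Return (True, value) if text is valid JSON, else (False, error message)."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as exc:
        return False, str(exc)

# Hypothetical inputs for illustration.
ok, value = try_parse('{"one": "1"}')
bad, message = try_parse("{'one': '1'}")     # Python-style single quotes
multi, _ = try_parse('{"a": 1}\n{"b": 2}')   # JSONL: two documents, not one
print(ok, bad, multi)
```

The error message from json.JSONDecodeError usually points at the line and column where parsing stopped, which helps identify which kind of lookalike the data is.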

Dealing with embedded data

Consider an example file that contains:

{"one": "{\"two\": \"three\", \"backslash\": \"\\\\\"}"}

The backslashes here are for JSON's escape mechanism.
When parsed with one of the above approaches, we get a result like:

>>> example = input()
{"one": "{\"two\": \"three\", \"backslash\": \"\\\\\"}"}
>>> parsed = json.loads(example)
>>> parsed
{'one': '{"two": "three", "backslash": "\\\\"}'}

Notice that parsed['one'] is a str, not a dict. As it happens, though, that string itself represents "embedded" JSON data.

To replace the embedded data with its parsed result, simply access the data, use the same parsing technique, and proceed from there (e.g. by updating the original result in place):

>>> parsed['one'] = json.loads(parsed['one'])
>>> parsed
{'one': {'two': 'three', 'backslash': '\\'}}

Note that the '\\' part here is the representation of a string containing one actual backslash, not two. This is following the usual Python rules for string escapes, which brings us to...

JSON escaping vs. Python string literal escaping

Sometimes people get confused when trying to test code that involves parsing JSON, and supply input as an incorrect string literal in the Python source code. This especially happens when trying to test code that needs to work with embedded JSON.

The issue is that the JSON format and the string literal format each have separate policies for escaping data. Python will process escapes in the string literal in order to create the string, which then still needs to contain escape sequences used by the JSON format.

In the above example, I used input at the interpreter prompt to show the example data, in order to avoid confusion with escaping. Here is one analogous example using a string literal in the source:

>>> json.loads('{"one": "{\\"two\\": \\"three\\", \\"backslash\\": \\"\\\\\\\\\\"}"}')
{'one': '{"two": "three", "backslash": "\\\\"}'}

To use a double-quoted string literal instead, double-quotes in the string literal also need to be escaped. Thus:

>>> json.loads("{\"one\": \"{\\\"two\\\": \\\"three\\\", \\\"backslash\\\": \\\"\\\\\\\\\\\"}\"}")
{'one': '{"two": "three", "backslash": "\\\\"}'}

Each sequence of \\\" in the input becomes \" in the actual JSON data, which becomes " (embedded within a string) when parsed by the JSON parser. Similarly, \\\\\\\\\\\" (five pairs of backslashes, then an escaped quote) becomes \\\\\" (five backslashes and a quote; equivalently, two pairs of backslashes, then an escaped quote) in the actual JSON data, which becomes \\" (two backslashes and a quote) when parsed by the JSON parser, which becomes \\\\" (two escaped backslashes and a quote) in the string representation of the parsed result (since now, the quote does not need escaping, as Python can use single quotes for the string; but the backslashes still do).
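
One way to avoid the extra layer of escaping in test code, not used in the examples above, is a raw string literal. In this sketch the JSON text appears exactly as it would in the file:

```python
import json

# Raw string literal: Python leaves the backslashes alone, so the text
# is exactly the JSON document from the example file.
raw = r'{"one": "{\"two\": \"three\", \"backslash\": \"\\\\\"}"}'

outer = json.loads(raw)           # outer['one'] is still a str
inner = json.loads(outer['one'])  # parse the embedded JSON
print(len(inner['backslash']))    # number of actual backslash characters
```

This only works when the JSON text does not contain the quote character used to delimit the literal, and a raw string literal cannot end in a single backslash.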

Simple customization

Aside from the strict option, the keyword options available for json.load and json.loads should be callbacks. The parser will call them, passing in portions of the data, and use whatever is returned to create the overall result.

The "parse" hooks are fairly self-explanatory. For example, we can specify to convert floating-point values to decimal.Decimal instances instead of using the native Python float:

>>> import decimal
>>> json.loads('123.4', parse_float=decimal.Decimal)
Decimal('123.4')

or use floats for every value, even if they could be converted to integer instead:

>>> json.loads('123', parse_int=float)
123.0

or refuse to convert JSON's representations of special floating-point values:

>>> def reject_special_floats(value):
...     raise ValueError
... 
>>> json.loads('Infinity')
inf
>>> json.loads('Infinity', parse_constant=reject_special_floats)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/json/__init__.py", line 370, in loads
    return cls(**kw).decode(s)
  File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.8/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
  File "<stdin>", line 2, in reject_special_floats
ValueError

Customization example using object_hook and object_pairs_hook

object_hook and object_pairs_hook can be used to control what the parser does when given a JSON object, rather than creating a Python dict.
A supplied object_pairs_hook will be called with one argument, which is a list of the key-value pairs that would otherwise be used for the dict. It should return the desired dict or other result:

>>> def process_object_pairs(items):
...     return {k: f'processed {v}' for k, v in items}
... 
>>> json.loads('{"one": 1, "two": 2}', object_pairs_hook=process_object_pairs)
{'one': 'processed 1', 'two': 'processed 2'}

A supplied object_hook will instead be called with the dict that would otherwise be created, and the result will substitute:

>>> def make_items_list(obj):
...     return list(obj.items())
... 
>>> json.loads('{"one": 1, "two": 2}', object_hook=make_items_list)
[('one', 1), ('two', 2)]

If both are supplied, the object_hook will be ignored and only the object_pairs_hook will be used.
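
A small sketch to confirm that priority, recording which hook actually runs (the hook functions and the calls list are made up for this demonstration):

```python
import json

calls = []

def pairs_hook(items):
    # Receives a list of (key, value) tuples for each JSON object.
    calls.append('object_pairs_hook')
    return dict(items)

def plain_hook(obj):
    # Would receive the already-built dict; not expected to run here.
    calls.append('object_hook')
    return obj

parsed = json.loads('{"one": 1, "two": 2}',
                    object_hook=plain_hook,
                    object_pairs_hook=pairs_hook)
print(calls)
```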

Text encoding issues and bytes/unicode confusion

JSON is fundamentally a text format. Input data should be converted from raw bytes to text first, using an appropriate encoding, before the file is parsed.

In 3.x (since 3.6), loading from a bytes object is supported; the data is expected to be encoded as UTF-8 (UTF-16 and UTF-32 are also detected):

>>> json.loads('"text"')
'text'
>>> json.loads(b'"text"')
'text'
>>> json.loads('"\xff"') # Unicode code point 255
'ÿ'
>>> json.loads(b'"\xff"') # Not valid UTF-8 encoded data!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/json/__init__.py", line 343, in loads
    s = s.decode(detect_encoding(s), 'surrogatepass')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 1: invalid start byte

UTF-8 is generally considered the default for JSON. While the original specification, ECMA-404, does not mandate an encoding (it only describes "JSON text", rather than JSON files or documents), RFC 8259 demands:

JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8 [RFC3629].

In such a "closed ecosystem" (i.e. for local documents that are encoded differently and will not be shared publicly), explicitly apply the appropriate encoding first:

>>> json.loads(b'"\xff"'.decode('iso-8859-1'))
'ÿ'

Similarly, JSON files should be opened in text mode, not binary mode. If the file uses a different encoding, simply specify that when opening it:

with open('example.json', encoding='iso-8859-1') as f:
    print(json.load(f))
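
To try this without an existing file, here is a self-contained sketch that first writes a small ISO-8859-1 file to a temporary location and then reads it back (the temporary file and its contents are only for demonstration):

```python
import json
import os
import tempfile

# Write a small JSON file encoded as ISO-8859-1.
with tempfile.NamedTemporaryFile('w', suffix='.json', encoding='iso-8859-1',
                                 delete=False) as tmp:
    tmp.write('{"char": "\u00ff"}')
    path = tmp.name

try:
    # Open in text mode with the matching encoding, then use json.load.
    with open(path, encoding='iso-8859-1') as f:
        loaded = json.load(f)
finally:
    os.remove(path)

print(loaded['char'] == '\u00ff')
```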

In 2.x, strings and byte-sequences were not properly distinguished, which resulted in a lot of problems and confusion particularly when working with JSON.

Actively maintained 2.x codebases (please note that 2.x itself has not been maintained since Jan 1, 2020) should consistently use unicode values to represent text and str values to represent raw data (bytes is an alias for str in 2.x), and accept that the repr of unicode values will have a u prefix (after all, the code should be concerned with what the value actually is, not what it looks like at the REPL).

Historical note: simplejson

simplejson is simply the standard library json module, but maintained and developed externally. It was originally created before JSON support was added to the Python standard library. In 2.6, the simplejson project was incorporated into the standard library as json. Current development maintains compatibility back to 2.5, although there is also an unmaintained, legacy branch that should support as far back as 2.2.

The standard library generally uses quite old versions of the package; for example, my 3.8.10 installation reports

>>> json.__version__
'2.0.9'

whereas the most recent release (as of this writing) is 3.18.1. (The tagged releases in the GitHub repository only go as far back as 3.8.2; the 2.0.9 release dates to 2009.)

I have as yet been unable to find comprehensive documentation of which simplejson versions correspond to which Python releases.

黑色毁心梦 2024-12-17 02:29:06

pathlib.Path objects are a safe, efficient, and versatile way of dealing with file paths.

Hence, here is another solution: a one-liner that reads JSON from a file using a pathlib.Path object:

import json
from pathlib import Path

# Create the path object that points to the file which contains json formatted data
json_file_path = Path("mydata.json")

# One-liner for reading the data and converting it to a `dict`
data = json.loads(json_file_path.read_text())

print(data["two"])

Disclaimer: neither this question nor its reading-from-file duplicate has a one-liner solution using pathlib.Path objects.
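
A self-contained variation of the same idea, as a sketch: it creates the file in a temporary directory first so that it can be run anywhere, and passes the encoding explicitly (the file contents are made up for this example):

```python
import json
import tempfile
from pathlib import Path

# The temporary directory and file contents are placeholders for this sketch.
with tempfile.TemporaryDirectory() as tmp_dir:
    json_file_path = Path(tmp_dir) / "mydata.json"
    json_file_path.write_text('{"one": "1", "two": "2"}', encoding="utf-8")

    # Passing the encoding explicitly avoids depending on the platform default.
    data = json.loads(json_file_path.read_text(encoding="utf-8"))

print(data["two"])
```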
