How to write JSON null values as empty lines in a new file (converting JSON-based logs to a column format, i.e. one file per column)

Posted on 2025-01-09 00:17:42

Example of a log file:

{"timestamp": "2022-01-14T00:12:21.000", "Field1": 10, "Field_Doc": {"f1": 0}}
{"timestamp": "2022-01-18T00:15:51.000", "Field_Doc": {"f1": 0, "f2": 1.7, "f3": 2}}

It will generate 5 files:

1. timestamp.column
2. Field1.column
3. Field_Doc.f1.column
4. Field_Doc.f2.column
5. Field_Doc.f3.column

Example content of timestamp.column:

2022-01-14T00:12:21.000
2022-01-18T00:15:51.000

I'm running into a problem when the value of a key is null or undefined, for example:

{"timestamp": "2022-01-14T00:12:21.000", "Field1": null, "Field_Doc": {"f1": undefined}}

Can someone help me out here?

Comments (1)

夏末染殇 2025-01-16 00:17:43

Note that the input file is actually NDJSON (newline-delimited JSON). See the docs.

That being said, since furas already gave an excellent answer on how to process the NDJSON log file, I'm going to skip that part. Do note that there's a library for dealing with NDJSON files. See PyPI.

His code needs only minimal adjustment to deal with the undefined edge case. null is a valid JSON value, so his code doesn't break on that. undefined, however, is not valid JSON, so json.loads() raises an error on it.

You can fix this easily by calling str.replace() on each line before passing it to json.loads(), so that the line becomes valid JSON. Then, while writing, check whether the value is None and replace it with an empty string. Note that None is the Python equivalent of JSON's null.

Please note the inclusion of ': ' in the replace call. It is there to reduce false positives, such as the word undefined appearing inside a string value.
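To illustrate why the ': ' prefix matters, here is a small sketch. The input string is made up for the demonstration (the "msg" key is not part of the original log format).

```python
import json

# Hypothetical line: "undefined" appears both inside a string and as a bare token.
raw = '{"msg": "value is undefined", "Field_Doc": {"f1": undefined}}'

# A naive replacement also rewrites the text inside the string value.
naive = json.loads(raw.replace('undefined', 'null'))

# Anchoring on ': ' only touches the bare token that follows a key.
anchored = json.loads(raw.replace(': undefined', ': null'))

print(naive["msg"])                 # value is null
print(anchored["msg"])              # value is undefined
print(anchored["Field_Doc"]["f1"])  # None
```

This is still a text-level workaround: a string value that itself contains ': undefined' would be rewritten as well, so it only suits logs where that cannot happen.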

Main loop logic:

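import json

# file_obj and process_dict are assumed to be defined as in furas's answer.
for line in file_obj:
    # replacing the bare token undefined with null makes the line valid JSON
    data = json.loads(line.replace(': undefined', ': null'))
    print(data)  # debug output, can be removed
    process_dict(data, write_func)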

write_func() adjustment:

def write_func(key, value):
    # append one line per value to the file named after the key
    with open(key + '.column', "a") as f:
        # None (JSON null) is written as an empty line
        if value is None:
            value = ''
        f.write(str(value) + "\n")
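One detail worth spelling out: the check has to target None specifically. A truthiness test such as `if not value` would also blank out 0, which is a real value of Field_Doc.f1 in the sample data. A minimal sketch, using a hypothetical helper name format_value:

```python
def format_value(value):
    """Return the text to write for one column cell (hypothetical helper)."""
    # Only None (JSON null) becomes an empty line; 0, 0.0 and False are kept.
    return '' if value is None else str(value)

print(repr(format_value(None)))  # ''
print(repr(format_value(0)))     # '0'
print(repr(format_value(1.7)))   # '1.7'
```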

I used the following as the input string:

{"timestamp": "2022-01-14T00:12:21.000", "Field1": 10, "Field_Doc": {"f1": 0}}
{"timestamp": "2022-01-18T00:15:51.000", "Field_Doc": {"f1": 0, "f2": 1.7, "f3": 2}}
{"timestamp": "2022-01-14T00:12:21.000", "Field1": null, "Field_Doc": {"f1": undefined}}