How to delete the text between two delimiters in Python

Posted on 2025-01-21 08:49:58

I am trying to remove all text between the [] brackets that follow the phrase '"segmentation":'. Please see the snippet below from the file for context.

 "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "segmentation": [
                [
                    621.63,
                    1085.67,
                    621.63,
                    1344.71,
                    841.66,
                    1344.71,
                    841.66,
                    1085.67
                ]
            ],
            "iscrowd": 0,
            "bbox": [
                621.63,
                1085.67,
                220.02999999999997,
                259.03999999999996
            ],
            "area": 56996,
            "category_id": 1124044
        },
        {
            "id": 2,
            "image_id": 1,
            "segmentation": [
                [
                    887.62,
                    1355.7,
                    887.62,
                    1615.54,
                    1114.64,
                    1615.54,
                    1114.64,
                    1355.7
                ]
            ],
            "iscrowd": 0,
            "bbox": [
                887.62,
                1355.7,
                227.0200000000001,
                259.8399999999999
            ],
            "area": 58988,
            "category_id": 1124044
        },
        {
            "id": 3,
            "image_id": 1,
            "segmentation": [
                [
                    1157.61,
                    1411.84,
                    1157.61,
                    1661.63,
                    1404.89,
                    1661.63,
                    1404.89,
                    1411.84
                ]
            ],
            "iscrowd": 0,
            "bbox": [
                1157.61,
                1411.84,
                247.2800000000002,
                249.7900000000002
            ],
            "area": 61768,
            "category_id": 1124044
        },
        ........... and so on.....

Ultimately, I just want to delete all text between the square brackets that follow the word segmentation. In other words, I want the output to look like this (for the first instance):

"annotations": [
            {
                "id": 1,
                "image_id": 1,
                "segmentation": [],
                "iscrowd": 0,
                "bbox": [
                    621.63,
                    1085.67,
                    220.02999999999997,
                    259.03999999999996
                ],
                "area": 56996,
                "category_id": 1124044
            },

I've tried the code below, but haven't had any luck so far. Am I getting something wrong because of the newlines?

import re
f = open('samplfile.json')
text = f.read()
f.close()

clean = re.sub('"segmentation":(.*)\]', '', text)

print(clean)

f = open('cleanedfile.json', 'w')
f.write(clean)
f.close()

I appreciate that the exact positioning of the [ and ] characters in the clean line may not be quite right, but this code isn't removing anything at the moment.

Comments (2)

柒七 2025-01-28 08:49:58

Python has a built-in json module for parsing and modifying JSON. A regular expression is likely to be fragile here and more of a headache than it's worth.

You can do the following:

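import json

with open('samplfile.json') as input_file, open('output.json', 'w') as output_file:
    # Parse the whole file into Python dicts and lists
    data = json.load(input_file)
    # Replace every annotation's segmentation with an empty list
    for annotation in data['annotations']:
        annotation['segmentation'] = []

    # Write the modified structure back out, pretty-printed
    json.dump(data, output_file, indent=4)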

Then, output.json contains:

{
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "segmentation": [],
            "iscrowd": 0,
            "bbox": [
                621.63,
                1085.67,
                220.02999999999997,
                259.03999999999996
            ],
            "area": 56996,
            "category_id": 1124044
        },
        {
            "id": 2,
            "image_id": 1,
            "segmentation": [],
            "iscrowd": 0,
            "bbox": [
                887.62,
                1355.7,
                227.0200000000001,
                259.8399999999999
            ],
            "area": 58988,
            "category_id": 1124044
        },
        {
            "id": 3,
            "image_id": 1,
            "segmentation": [],
            "iscrowd": 0,
            "bbox": [
                1157.61,
                1411.84,
                247.2800000000002,
                249.7900000000002
            ],
            "area": 61768,
            "category_id": 1124044
        }
    ]
}
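
The same idea can be tried without touching any files by working on a JSON string. The sketch below is only an illustration: the helper name clear_segmentation_json and the shortened sample are made up here, and it assumes the top level of the document is an object with an "annotations" list, as in the question.

```python
import json

def clear_segmentation_json(text, indent=4):
    """Return the JSON text with every annotation's segmentation emptied.

    Hypothetical helper; assumes the top-level value is an object that
    holds an "annotations" list, as in the question's sample.
    """
    data = json.loads(text)
    for annotation in data.get("annotations", []):
        annotation["segmentation"] = []
    return json.dumps(data, indent=indent)

# Shortened, made-up sample in the same shape as the question's file
json_sample = """
{
    "annotations": [
        {"id": 1, "segmentation": [[621.63, 1085.67]], "bbox": [621.63, 1085.67]},
        {"id": 2, "segmentation": [[887.62, 1355.7]], "bbox": [887.62, 1355.7]}
    ]
}
"""

cleaned_json = clear_segmentation_json(json_sample)
print(cleaned_json)
```

Because the text is parsed rather than pattern-matched, this keeps working however the brackets are nested or the whitespace is laid out.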
久随 2025-01-28 08:49:58

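Your approach is mostly correct, but by default . in a Python regex does not match \n. To fix this, pass flags=re.DOTALL to re.sub(). Note that once . matches newlines, the greedy .* will run all the way to the last ] in the file, so the pattern also needs tightening (for example, with a non-greedy .*?).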

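By the way, you would only need \" instead of " if the pattern were written inside a double-quoted string; inside single quotes, as in your code, a plain " is fine.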
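
A minimal sketch of the regex route, combining re.DOTALL with a non-greedy match, might look like the following. The pattern, the name clear_segmentation_regex and the shortened sample are illustrative assumptions, not part of the original code. It assumes every "segmentation" value is a non-empty list of lists, as in the question; a value that is already [] would make the pattern over-match into later data.

```python
import json
import re

# Assumption: each "segmentation" value is a non-empty list of lists.
# The group keeps the key; the rest matches from the outer [ up to the
# first "]" that is followed (after optional whitespace) by another "]".
SEGMENTATION_RE = re.compile(r'("segmentation":\s*)\[.*?\]\s*\]', flags=re.DOTALL)

def clear_segmentation_regex(text):
    """Replace each segmentation value with [] using a regex (hypothetical helper)."""
    return SEGMENTATION_RE.sub(r'\g<1>[]', text)

# Shortened, made-up sample in the same shape as the question's file
regex_sample = '''{
    "annotations": [
        {
            "id": 1,
            "segmentation": [
                [
                    621.63,
                    1085.67
                ]
            ],
            "bbox": [
                621.63,
                1085.67
            ]
        },
        {
            "id": 2,
            "segmentation": [
                [
                    887.62,
                    1355.7
                ]
            ],
            "bbox": [
                887.62,
                1355.7
            ]
        }
    ]
}'''

cleaned_text = clear_segmentation_regex(regex_sample)
print(cleaned_text)
```

This relies on the file's exact bracket layout, which is why parsing the file as JSON is usually the safer choice.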
