How to delete text between two delimiters in Python
I am trying to remove all text between the [] brackets after the phrase '"segmentation":'. Please see the snippet below from the file for context.
"annotations": [
{
"id": 1,
"image_id": 1,
"segmentation": [
[
621.63,
1085.67,
621.63,
1344.71,
841.66,
1344.71,
841.66,
1085.67
]
],
"iscrowd": 0,
"bbox": [
621.63,
1085.67,
220.02999999999997,
259.03999999999996
],
"area": 56996,
"category_id": 1124044
},
{
"id": 2,
"image_id": 1,
"segmentation": [
[
887.62,
1355.7,
887.62,
1615.54,
1114.64,
1615.54,
1114.64,
1355.7
]
],
"iscrowd": 0,
"bbox": [
887.62,
1355.7,
227.0200000000001,
259.8399999999999
],
"area": 58988,
"category_id": 1124044
},
{
"id": 3,
"image_id": 1,
"segmentation": [
[
1157.61,
1411.84,
1157.61,
1661.63,
1404.89,
1661.63,
1404.89,
1411.84
]
],
"iscrowd": 0,
"bbox": [
1157.61,
1411.84,
247.2800000000002,
249.7900000000002
],
"area": 61768,
"category_id": 1124044
},
........... and so on.....
Ultimately, I just want to delete all text between the square brackets that follow the word segmentation. In other words, I want the output to look like this (for the first instance):
"annotations": [
{
"id": 1,
"image_id": 1,
"segmentation": [],
"iscrowd": 0,
"bbox": [
621.63,
1085.67,
220.02999999999997,
259.03999999999996
],
"area": 56996,
"category_id": 1124044
},
I've tried using the code below, but I'm not having any luck so far. Am I getting something wrong because of the newlines?
import re
f = open('samplfile.json')
text = f.read()
f.close()
clean = re.sub('"segmentation":(.*)\]', '', text)
print(clean)
f = open('cleanedfile.json', 'w')
f.write(clean)
f.close()
I appreciate that the exact positioning I have for the [s in the clean line may not be quite right, but this code isn't removing anything at the moment.
Comments (2)
Python has a built-in json module for parsing and modifying JSON. A regular expression is likely to be fragile and more headache than it's worth. You can do the following:
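The snippet that belongs to this answer is not present on the page, so what follows is only a minimal sketch of the json-module approach it describes. It assumes the file is a valid JSON object with a top-level "annotations" list, as the question's snippet suggests; the helper names clear_segmentations and clean_file are hypothetical, and the sample data is shortened from the question.

```python
import json


def clear_segmentations(data):
    """Set every annotation's "segmentation" value to an empty list, in place."""
    # Assumption: annotations live under a top-level "annotations" key.
    for annotation in data.get("annotations", []):
        if "segmentation" in annotation:
            annotation["segmentation"] = []
    return data


def clean_file(src_path, dst_path):
    """Read JSON from src_path, empty the segmentations, write to dst_path."""
    with open(src_path) as f:
        data = json.load(f)
    clear_segmentations(data)
    with open(dst_path, "w") as f:
        json.dump(data, f, indent=4)


# In-memory demonstration with data shortened from the question.
sample = {
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "segmentation": [[621.63, 1085.67, 621.63, 1344.71]],
            "iscrowd": 0,
            "bbox": [621.63, 1085.67, 220.02999999999997, 259.03999999999996],
            "area": 56996,
            "category_id": 1124044,
        }
    ]
}
cleaned = clear_segmentations(sample)
print(json.dumps(cleaned, indent=4))
```

For the files named in this thread, the call would be clean_file('samplfile.json', 'output.json'). Because the data is parsed rather than pattern-matched, nested brackets and line breaks need no special handling.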
Then, output.json contains the same annotations, with each "segmentation" value reduced to [].
Your approach is mostly correct, but in a Python regex the . metacharacter does not match \n by default. To fix that, pass flags=re.DOTALL as an argument to re.sub(). By the way, you may need to use \" instead of " in the regex.
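As a hedged sketch of the regex route: re.DOTALL alone is probably not enough, because a greedy .* would then run to the last ] in the whole file, and an empty replacement would also delete the "segmentation" key itself. The version below uses a non-greedy .*? and puts the key back with an empty list. It assumes every "segmentation" value is a non-empty list of lists, as in the question; an already-empty [] would make the pattern overrun into later fields. The sample text is shortened from the question.

```python
import re

# Shortened copy of the question's data, kept as raw text.
text = '''"annotations": [
{
"id": 1,
"image_id": 1,
"segmentation": [
[
621.63,
1085.67,
621.63,
1344.71
]
],
"iscrowd": 0,
"bbox": [
621.63,
1085.67,
220.02999999999997,
259.03999999999996
],
"area": 56996,
"category_id": 1124044
},
{
"id": 2,
"image_id": 1,
"segmentation": [
[
887.62,
1355.7
]
],
"iscrowd": 0,
"bbox": [
887.62,
1355.7,
227.0200000000001,
259.8399999999999
],
"area": 58988,
"category_id": 1124044
}
]'''

# Group 1 keeps the key; the rest matches the outer [ ... ] lazily,
# stopping at the first "]" that is followed by another "]".
# Assumption: each segmentation is a non-empty list of lists.
pattern = re.compile(r'("segmentation":\s*)\[.*?\]\s*\]', flags=re.DOTALL)

# Put the key back, followed by an empty list.
clean = pattern.sub(r'\1[]', text)
print(clean)
```

The "bbox" lists are left alone because the pattern only starts matching at the "segmentation" key.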