How to remove blocks enclosed in curly braces with Python

Posted on 2024-08-16 04:20:55

Sample text: String -> content within the rev tag (via lxml).

I'm trying to remove the {{BLOCKS}} within the text.

I've used the following regex to remove simple, one-line blocks:

import re
# "." does not match newlines by default, so this only removes single-line blocks
p = re.compile(r'\{\{*.*\}\}')
nonBracketedString = p.sub('', bracketedString)

However, this does not remove the first multi-line bracketed section at the beginning of the content. How can one remove multi-line, curly-bracketed blocks?

EDIT:

Solution from answer:

p = re.compile(r'\{\{*?.*?\}\}', re.DOTALL)
nonBracketedString = p.sub('', bracketedString)

波浪屿的海角声 2024-08-23 04:20:55

Set the DOTALL flag.

p = re.compile(r'\{\{*.*?\}\}', re.DOTALL)
nonBracketedString = p.sub('', bracketedString)

In the default mode, . matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline.

http://docs.python.org/library/re.html

Also, you'll need a non-greedy match between the brackets: .*?
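
To illustrate the two points above (DOTALL plus a non-greedy match), here is a minimal sketch on a made-up sample string. The sample text and the snake_case variable names are assumptions for illustration, and the pattern is simplified to `\{\{.*?\}\}`, which should behave the same as the pattern above for this input.

```python
import re

# Hypothetical sample: one multi-line block followed by a single-line block.
bracketed_string = (
    "{{Infobox\n"
    "| name = Example\n"
    "}}\n"
    "Intro text {{citation needed}} continues."
)

# re.DOTALL lets "." cross newlines; ".*?" stops at the nearest closing "}}".
pattern = re.compile(r"\{\{.*?\}\}", re.DOTALL)
non_bracketed_string = pattern.sub("", bracketed_string)

print(repr(non_bracketed_string))
```

Both blocks are removed, while the whitespace that surrounded them is left in place, so a follow-up cleanup of doubled spaces or leading newlines may be wanted.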

妥活 2024-08-23 04:20:55

>>> import urllib2
>>> import re
>>> s = "".join(urllib2.urlopen('http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Italian%20War%20of%201542-1546&redirects&rvprop=content&format=xml').readlines())
>>> p = re.compile('\{\{.*?\}\}', re.DOTALL)
>>> re.sub(p, '', s)
'<?xml version="1.0"?><api><query><redirects><r from="Italian War of 1542-1546" to="Italian War of 1542\xe2\x80\x931546" /></redirects><pages><page pageid="3719774" ns="0" title="Italian War of 1542\xe2\x80\x931546"><revisions><rev xml:space="preserve">\n\n\n\nThe \'\'\'Italian War of 1542\xe2\x80\x9346\'\'\' was a conflict late in the [[Italian Wars]], ...

I've truncated the output here, but there's enough to see that it's working.
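
One caveat worth sketching: Wikipedia markup can contain templates nested inside other templates, and a non-greedy regex stops at the first closing `}}`, which can leave a stray `}}` behind. The following is a rough alternative that tracks nesting depth instead of using a regex. The function name `strip_braced_blocks` and the sample string are hypothetical, not part of the answer above.

```python
def strip_braced_blocks(text: str) -> str:
    """Remove {{...}} blocks, including blocks nested inside other blocks."""
    kept = []
    depth = 0  # number of currently open "{{" pairs
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            # Keep characters only when we are outside every block.
            if depth == 0:
                kept.append(text[i])
            i += 1
    return "".join(kept)


# Hypothetical sample with one template nested inside another.
sample = "{{Infobox | flag = {{flagicon|France}} }}The war began."
print(strip_braced_blocks(sample))
```

Note that with this approach an unclosed `{{` causes everything after it to be dropped, so it assumes the braces in the input are balanced.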

俏︾媚 2024-08-23 04:20:55

Set the DOTALL flag; this allows . to match newlines.

p = re.compile(r'\{\{*.*\}\}', re.DOTALL)
nonBracketedString = p.sub('', bracketedString)
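
One thing to watch with the greedy `.*` once DOTALL is on: it can match from the first `{{` all the way to the last `}}`, taking any text in between with it. The comparison below is a small sketch on a made-up string; the sample text and variable names are assumptions, and the patterns are simplified to `\{\{.*\}\}` and `\{\{.*?\}\}`.

```python
import re

# Hypothetical sample with two separate blocks and text to keep between them.
text = "{{first\nblock}} keep this {{second}} and this"

greedy = re.compile(r"\{\{.*\}\}", re.DOTALL)   # longest possible match
lazy = re.compile(r"\{\{.*?\}\}", re.DOTALL)    # shortest possible match

greedy_result = greedy.sub("", text)
lazy_result = lazy.sub("", text)

print(repr(greedy_result))
print(repr(lazy_result))
```

The greedy version removes "keep this" along with both blocks, while the non-greedy version removes only the two blocks themselves.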

About the author: 紫南
