Python re.search (regex): match only words with a pattern like {{world}}

Posted on 2024-11-04 17:10:12

I have an HTML file in which I have inserted custom tags like {{name}} and {{surname}}. Now I want to find only the tags that exactly match the pattern {{world}}, and not {world}}, {{world}, {world}, { word }, {{ world }}, etc.
I wrote this small piece of code for it:

re.findall(r'\{(\w.+?)\}', html_string)

It returns the words that follow any of the patterns {{world}}, {world}, {world}},
which I don't want. I want to match exactly {{world}}. Can anybody please guide me?
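
The behaviour described in the question can be reproduced with a short sketch. The sample string below is an assumption made up for illustration; only the pattern comes from the question.

```python
import re

# Hypothetical input: one well-formed tag followed by malformed variants.
html_string = "{{world}} {world} {world}} {{world} { word } {{ world }}"

# The pattern from the question requires only ONE brace on each side,
# so any "{word...}" fragment qualifies, including the malformed tags.
matches = re.findall(r'\{(\w.+?)\}', html_string)
print(matches)  # ['world', 'world', 'world', 'world']
```

The two spaced variants are skipped only because \w cannot match the space after the opening brace. The brace count is never checked.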

Comments (5)

紅太極 2024-11-11 17:10:12

Um, shouldn't the regex be:

'\{\{(\w.+?)\}\}'

Ok, after the comments, I understand your requirements more:

'\{\{\w+?\}\}'

should work for you.

Basically, you want {{any number of word characters, including underscores}}. You don't actually need the lazy match here, because \w can never match a closing brace, so you may remove the ? from the expression.

Something like {{keyword1}} other stuff {{keyword2}} will no longer match as a whole.

To get only the keyword, without the {{}}, use the following:

'(?<=\{\{)\w+?(?=\}\})'
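
As a minimal sketch of the two patterns above (with the lazy ? dropped, as suggested), the following compares matching the whole tag with extracting only the keyword through lookarounds. The sample text is an assumption for illustration.

```python
import re

# Hypothetical input mixing valid tags with malformed ones.
text = "{{keyword1}} other stuff {{keyword2}} {bad}} {{bad} {{ spaced }}"

# Whole tag, braces included: exactly two braces on each side, word chars only.
with_braces = re.findall(r'\{\{\w+\}\}', text)

# Keyword only: the lookbehind and lookahead check for the braces
# without making them part of the match.
names_only = re.findall(r'(?<=\{\{)\w+(?=\}\})', text)

print(with_braces)  # ['{{keyword1}}', '{{keyword2}}']
print(names_only)   # ['keyword1', 'keyword2']
```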

眼眸印温柔 2024-11-11 17:10:12

How about this?

re.findall(r'{{(\w+)}}', html_string)

Or, if you want the curly braces included in the results:

re.findall(r'({{\w+}})', html_string)

If you're trying to accomplish html templating, though, I recommend using a good template engine.
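
For very small cases, the same pattern can also drive a substitution. The sketch below is only an illustration of that idea, not a replacement for a real template engine. The `values` dictionary, the `fill` helper and the sample template are all hypothetical names introduced here.

```python
import re

# Hypothetical replacement values; the keys are illustrative only.
values = {"name": "Ada", "surname": "Lovelace"}

def fill(match):
    """Return the value for a matched tag, or the tag itself if unknown."""
    key = match.group(1)
    return values.get(key, match.group(0))

template = "<p>Hello {{name}} {{surname}}, {{unknown}} {name}</p>"

# Only well-formed {{word}} tags reach fill(); {name} is left alone.
rendered = re.sub(r'\{\{(\w+)\}\}', fill, template)
print(rendered)  # <p>Hello Ada Lovelace, {{unknown}} {name}</p>
```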

死开点丶别碍眼 2024-11-11 17:10:12

This pattern does not allow any curly braces inside the captured result. Is that what you want?

'\{\{(\w[^\{\}]+?)\}\}'

http://rubular.com/r/79YwR13MS0
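
A quick sketch of how this pattern behaves on a few inputs may help when deciding whether it fits. The sample strings are assumptions for illustration. Note that \w followed by [^\{\}]+? needs at least two characters between the braces and accepts spaces after the first character.

```python
import re

pattern = re.compile(r'\{\{(\w[^\{\}]+?)\}\}')

normal = pattern.findall("{{world}}")        # ordinary tag
single = pattern.findall("{{a}}")            # one-character tag
spaced = pattern.findall("{{first name}}")   # space inside the tag
nested = pattern.findall("{{wor{ld}}")       # stray brace inside the tag

print(normal)  # ['world']
print(single)  # []
print(spaced)  # ['first name']
print(nested)  # []
```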

我只土不豪 2024-11-11 17:10:12

If you want to match doubled curly brackets, you should specify them in your regex:

re.findall(r'\{\{(\w[^}]*?)\}\}', html_string)

猥︴琐丶欲为 2024-11-11 17:10:12

You say the other answers don't work, but they seem to work for me:

>>> import re
>>> html_string = '{{realword}} {fake1}} {{fake2} {fake3} fake4'
>>> re.findall(r'\{\{(\w.+?)\}\}', html_string)
['realword']

If it doesn't work for you, you'll need to give more details.

Edit: How about the following? Getting rid of the dot (.) and using only \w also lets you use a greedy quantifier, and it works for the example HTML from your comment:

>>> html_string = '<html>\n <head>\n </head>\n <title>\n </title>\n <body>\n <h1>\n T - Shirts\n </h1>\n <img src="March-Tshirts/skull_headphones_tshirt.jpg" />\n <img src="/March-Tshirts/star-wars-t-shirts-6.jpeg" />\n <h2>\n we - we - we\n </h2>\n {{unsubscribe}} -- {{tracking_beacon} -- {web_url}} -- {name} \n </body>\n</html>\n'
>>> re.findall(r'\{\{(\w+)\}\}', html_string)
['unsubscribe']

\w matches alphanumeric characters and the underscore; if you need to match more characters, you can add them to a character set (e.g., [\w\+] to also match the plus sign).
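
To illustrate the character-set idea, here is a small sketch that also admits plus signs and hyphens in tag names. The sample text and the choice of extra characters are assumptions for illustration.

```python
import re

# Hypothetical tags, some containing characters outside \w.
text = "{{first-name}} {{c++}} {{plain}} {{ spaced }}"

# \w alone rejects tags containing '-' or '+'.
word_only = re.findall(r'\{\{(\w+)\}\}', text)

# Inside a set, '+' is literal and a trailing '-' is literal too.
extended = re.findall(r'\{\{([\w+-]+)\}\}', text)

print(word_only)  # ['plain']
print(extended)   # ['first-name', 'c++', 'plain']
```

Spaces are still not in the set, so {{ spaced }} is rejected by both patterns.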
