python re.search (regex): match only words with a pattern like {{world}}

Posted 2024-11-04 17:10:12

I have an HTML file in which I have inserted custom tags like {{name}} and {{surname}}. Now I want to find only the tags that exactly match a pattern like {{world}}, and not {world}}, {{world}, {world}, { word }, {{ world }}, etc.
I wrote this small piece of code for it:

re.findall(r'\{(\w.+?)\}', html_string)

It returns the words that follow any of the patterns {{world}}, {world} and {world}}, which I don't want. I want to match exactly {{world}}. Can anybody please guide me?
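
For illustration, the behaviour described above can be reproduced on a made-up string. This is only a sketch; the tag names in `sample` are hypothetical and not taken from the real HTML file.

```python
import re

# Hypothetical sample: one well-formed tag followed by three malformed ones.
sample = "{{name}} {city}} {{zip} {age}"

# The pattern from the question only requires a single brace on each side,
# so every variant is picked up.
loose = re.findall(r"\{(\w.+?)\}", sample)
print(loose)  # ['name', 'city', 'zip', 'age']
```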

紅太極 2024-11-11 17:10:12

Um, shouldn't the regex be:

'\{\{(\w.+?)\}\}'

OK, after reading the comments, I understand your requirements better:

'\{\{\w+?\}\}'

should work for you.

Basically, you want {{any number of word characters, including underscores}}. You don't actually need the lazy match in this case, so you may remove the ? from the expression.

Something like {{keyword1}} other stuff {{keyword2}} will no longer match as a single tag.

To get only the keyword without the surrounding {{ }}, use the following:

'(?<=\{\{)\w+?(?=\}\})'
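
A short sketch of how the last two patterns behave with re.findall, using the greedy form (without the ?) as suggested above. The sample string is made up for illustration.

```python
import re

# Hypothetical sample text; the tag names are invented for this example.
sample = "{{keyword1}} other stuff {{keyword2}} {broken}} {{ spaced }}"

# Whole tags, braces included.
whole_tags = re.findall(r"\{\{\w+\}\}", sample)

# Names only: the lookbehind and lookahead check for the braces
# without making them part of the match.
names_only = re.findall(r"(?<=\{\{)\w+(?=\}\})", sample)

print(whole_tags)  # ['{{keyword1}}', '{{keyword2}}']
print(names_only)  # ['keyword1', 'keyword2']
```

The malformed {broken}} and the padded {{ spaced }} are skipped by both patterns.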

眼眸印温柔 2024-11-11 17:10:12

How about this?

re.findall(r'{{(\w+)}}', html_string)

Or, if you want the curly braces included in the results:

re.findall(r'({{\w+}})', html_string)

If you're trying to accomplish html templating, though, I recommend using a good template engine.
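
If the goal is only to fill in a handful of tags, a rough sketch with re.sub might look like the following. The function name `fill_tags` and the `values` dictionary are hypothetical, and this does no HTML escaping of the inserted values, which is one of the reasons a real template engine is the safer choice.

```python
import re

# Hypothetical values to substitute; the keys are assumed tag names.
values = {"name": "Ada", "surname": "Lovelace"}

def fill_tags(html_string, values):
    """Replace each well-formed {{tag}} with its value; leave unknown tags as they are."""
    def replace(match):
        # group(1) is the tag name, group(0) is the whole {{tag}} text.
        return values.get(match.group(1), match.group(0))
    return re.sub(r"\{\{(\w+)\}\}", replace, html_string)

filled = fill_tags("<p>{{name}} {{surname}} {{unknown}} {name}</p>", values)
print(filled)  # <p>Ada Lovelace {{unknown}} {name}</p>
```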

死开点丶别碍眼 2024-11-11 17:10:12

This will not match any curly braces inside the result. Is that what you want?

'\{\{(\w[^\{\}]+?)\}\}'

http://rubular.com/r/79YwR13MS0
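
Two properties of this pattern are worth checking before using it: `\w` followed by `[^\{\}]+?` needs at least two characters between the braces, and the negated set also accepts spaces after the first character. A small sketch with made-up tags:

```python
import re

# Hypothetical tags chosen to probe the edges of the pattern.
sample = "{{a}} {{ab}} {{first name}} {{ world }}"

matches = re.findall(r"\{\{(\w[^\{\}]+?)\}\}", sample)
print(matches)  # ['ab', 'first name']
```

Here {{a}} is skipped because it holds a single character, and {{ world }} is skipped because the first character after the braces is not a word character.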

我只土不豪 2024-11-11 17:10:12

If you want to match doubled curly brackets, you should specify them in your regex:

re.findall(r'\{\{(\w[^}]*)\}\}', html_string)

猥︴琐丶欲为 2024-11-11 17:10:12

You say the other answers don't work, but they seem to work for me:

>>> import re
>>> html_string = '{{realword}} {fake1}} {{fake2} {fake3} fake4'
>>> re.findall(r'\{\{(\w.+?)\}\}', html_string)
['realword']

If it doesn't work for you, you'll need to give more details.

Edit: How about the following? Getting rid of the dot (.) and using only \w also allows you to use a greedy quantifier, and it works for the example HTML from your comment:

>>> html_string = 'html>\n <head>\n </head>\n <title>\n </title>\n <body>\n <h1>\n T - Shirts\n </h1>\n <img src="March-Tshirts/skull_headphones_tshirt.jpg" />\n <img src="/March-Tshirts/star-wars-t-shirts-6.jpeg" />\n <h2>\n we - we - we\n </h2>\n {{unsubscribe}} -- {{tracking_beacon} -- {web_url}} -- {name} \n </body>\n</html>\n'
>>> re.findall(r'\{\{(\w+)\}\}', html_string)
['unsubscribe']

\w matches alphanumeric characters and the underscore; if you need to match more characters, you can add them to a character set (e.g., [\w\+] to also match the plus sign).
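
A quick sketch of the character-set idea, using a made-up string. Inside a character class the plus sign needs no backslash, so `[\w+]` and `[\w\+]` are equivalent here.

```python
import re

# Hypothetical tags: one containing a plus, one a hyphen, one an underscore.
sample = "{{a+b}} {{c-d}} {{e_f}}"

# The set accepts word characters and '+', but not '-'.
with_plus = re.findall(r"\{\{([\w+]+)\}\}", sample)
print(with_plus)  # ['a+b', 'e_f']
```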
