Capturing only part of a string without the formatting

Posted on 2024-12-13 12:20:20

I'm trying to capture only the digits between the <em> and </em> tags, without the <b> and </b> tags, using basic regex. I've tried to think of ways to do it, maybe with lookarounds, but I'm just not that skilled yet. Here's an example of the raw HTML:

<em>4<b>4</b>9/<b>5</b>-<b>7</b>0</em>

Here is what I'd like the result to be:

The problem is sometimes these strings have the formatting HTML, and sometimes not. Sometimes there are extra - and / symbols, sometimes not. I'm using .*<\/em> which is about as simple as it gets!

Thanks for your help :)

柏拉图鍀咏恒 2024-12-20 12:20:20

As has been said before, regex is probably not the easiest solution for this. But, if you really want to use it then you're probably best doing it in two passes:

echo "<em>4<b>4</b>9/<b>5</b>-<b>7</b>0</em>" | sed 's|<[^>]\+>||g' | sed 's|[^0-9]||g'

The first sed operation removes all HTML tags. The second removes every remaining non-digit character. (Note that \+ is a GNU sed extension to basic regular expressions.)
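
For readers who are not working in a shell, the same two-pass idea might be sketched in JavaScript as follows. The function name digitsOnly is hypothetical (it does not come from the answer), and the tag pattern assumes simple tags with no ">" character inside attribute values.

```javascript
// Hypothetical helper mirroring the two sed passes above.
function digitsOnly(html) {
  // Pass 1: remove anything that looks like a tag, e.g. <em> or </b>.
  const withoutTags = html.replace(/<[^>]+>/g, '');
  // Pass 2: remove every character that is not a digit.
  return withoutTags.replace(/[^0-9]/g, '');
}

console.log(digitsOnly('<em>4<b>4</b>9/<b>5</b>-<b>7</b>0</em>')); // prints 449570
```

Because the second pass drops everything except digits, the same function also handles input that has no formatting tags at all.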

柠檬心 2024-12-20 12:20:20

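First: as always, you probably shouldn't be using regex on HTML. There will always be some edge case it fails to catch.

This is even more true if you're using some flavor of plain regex, and since you didn't specify anything else, I'm assuming that's what you're using. So really, don't use regex.

That said, I would do this as two regexes: capture the string, then substitute out any tags you don't want from the captured string (remember to use non-greedy matching when matching them!).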
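
A minimal JavaScript sketch of that two-regex approach, assuming the text of interest sits inside a single <em>...</em> pair as in the question's sample. The function name emText is hypothetical, and like any regex applied to HTML it will miss edge cases such as nested <em> elements.

```javascript
// Hypothetical helper: capture the <em> contents, then strip inner tags.
function emText(html) {
  // Regex 1: non-greedy capture of everything between <em> and </em>.
  const match = html.match(/<em>(.*?)<\/em>/);
  if (match === null) {
    return null; // no <em>...</em> pair found
  }
  // Regex 2: non-greedy match of each remaining tag, replaced with nothing.
  return match[1].replace(/<.*?>/g, '');
}

console.log(emText('<em>4<b>4</b>9/<b>5</b>-<b>7</b>0</em>')); // prints 449/5-70
```

This version keeps the - and / separators. Chaining .replace(/[^0-9]/g, '') onto the result would reduce it to digits only.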

晨曦慕雪 2024-12-20 12:20:20

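For example, if you're using JavaScript, try this:

var str = "<em>4<b>4</b>9<b>5</b><b>7</b>0</em>";
// Capture everything inside <em>...</em> (non-greedy, so inner text may
// contain the letter "e"), then drop every non-digit character.
str.replace(/<em>(.*?)<\/em>/g, function(match, emInner) {
  console.log(emInner.replace(/[^0-9]/g, ''));
});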

This prints 449570.
