Replacing HTML font tags with a regular expression

Posted on 2024-10-06 21:41:21

I would like to replace all occurrences of the HTML <font> tag in a string.

Example string:

Line1<div><font class="blablabla" color="#33FF33">Line2</font></div><div>Line3

or:

Line1<div><font color="#33FF33">Line2</font></div><div><font color="#FF3300">Li</font>ne3

The opening font tag should be replaced by its color value, so that the two examples become:

Line1<div>33FF33Line2</font></div><div>Line3
Line1<div>33FF33Line2</font></div><div>FF3300Li</font>ne3

I've tried the following (among others :P):

preg_replace('/<font.*color="#([0-9a-fA-F]){6}">/', '{1}', $string)

I think I'm heading in the right direction, but it feels like so close yet so far away :)

When I use it on the string with only one font tag in it, it removes the font tag (I must have messed something up with the replacement {1}).
When I use it on the string with multiple font tags in it, it does the same, but it removes not only the first font tag but everything from the first font tag up to the next (or last) font tag.

Ok.

Let's just forget about the HTML code parsing discussion for a sec.

What if I had the following texts:

This colorcode (#333333) is so cool
This colorcode (orange: #ff3300) is way cooler

And I wanted the texts to become:

This colorcode 333333 is so cool
This colorcode ff3300 is way cooler

Same situation as I see it, or am I being ignorant now?

Comments (2)

左岸枫 2024-10-13 21:41:21
preg_replace('~<font[^>]*\scolor="#([0-9a-fA-F]{6})"[^>]*>~', '$1', $string);

* and other quantifiers are greedy by default, which is why you got the unintended contraction of the string with multiple font tags; it's just matching too much. You can make them non-greedy by adding a question mark (.*?), but other factors can still cause them to consume more than you want. It's better in this case to use a more specific expression ([^>]*) that can't match beyond the tag it starts in.
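To make the difference concrete, here is a small sketch comparing the three variants. The sample string is made up for illustration (it is not from the question): it puts a font tag without a color in front of one with a color, which is one of the cases where even the non-greedy version still eats too much.

```php
<?php
// Illustrative input: a colorless font tag followed by a colored one.
$string = '<font size="2">A</font><font color="#FF3300">B</font>';

// Greedy .* : the match starts at the first <font and runs to the last color attribute.
$greedy = preg_replace('/<font.*color="#([0-9a-fA-F]{6})">/', '$1', $string);

// Lazy .*? : stops at the first color attribute, but still crosses the tag boundary.
$lazy = preg_replace('/<font.*?color="#([0-9a-fA-F]{6})">/', '$1', $string);

// [^>]* : cannot leave the tag it started in, so the first tag is left untouched.
$scoped = preg_replace('~<font[^>]*\scolor="#([0-9a-fA-F]{6})"[^>]*>~', '$1', $string);

echo $greedy, "\n"; // FF3300B</font>
echo $lazy, "\n";   // FF3300B</font>
echo $scoped, "\n"; // <font size="2">A</font>FF3300B</font>
```

Only the last variant leaves the first tag and its content alone.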

Besides that, in the code you posted you were using {1} instead of $1 for the backreference, and you had the quantifier ({6}) outside the parentheses, so you would only ever capture the last digit, not all six as you intended. That code shouldn't have returned the result you posted, to say nothing of the correct result.
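Both points can be checked in isolation. The snippet below is only a quick demonstration; the variable names are arbitrary.

```php
<?php
$tag = '<font color="#33FF33">';

// Quantifier outside the group: the group is overwritten on every repetition,
// so it ends up holding only the last digit.
preg_match('/#([0-9a-fA-F]){6}/', $tag, $outside);

// Quantifier inside the group: the group holds all six digits.
preg_match('/#([0-9a-fA-F]{6})/', $tag, $inside);

// '{1}' is not a backreference in PHP, so it is inserted literally.
$literal = preg_replace('/<font[^>]*>/', '{1}', $tag);

echo $outside[1], "\n"; // 3
echo $inside[1], "\n";  // 33FF33
echo $literal, "\n";    // {1}
```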

As for your updated question:

preg_replace('~\([^)]*#([0-9a-fA-F]{6})[^)]*\)~', '$1', $string);
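
Applied to the two sample lines from the question, that looks roughly like this (preg_replace accepts an array as the subject and returns an array; $lines and $result are just names picked for the example):

```php
<?php
$lines = [
    'This colorcode (#333333) is so cool',
    'This colorcode (orange: #ff3300) is way cooler',
];

// The whole parenthesized group is replaced by the six hex digits it contains.
$result = preg_replace('~\([^)]*#([0-9a-fA-F]{6})[^)]*\)~', '$1', $lines);

print_r($result);
// [0] => This colorcode 333333 is so cool
// [1] => This colorcode ff3300 is way cooler
```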

终难遇 2024-10-13 21:41:21

RegEx is nice and convenient, but I would question whether or not you could catch every case using RegEx. What about tags within strings, etc?

I wrote some spidering code and ended up just parsing the entire document, element by element. That was the only way I found to make it reliable.

See: http://blackbeltcoder.com/Articles/strings/parsing-html-tags-in-c/
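
The linked article is about C#. In PHP, a parser-based version could look something like the sketch below, which uses the built-in DOM extension. It is not the code from the article, and replaceFontTagsWithColor is a hypothetical helper name. Note that it also differs from what the question asks for: because it works on elements rather than on tags, the closing </font> disappears together with the opening tag, and unclosed tags in the input are closed by the parser.

```php
<?php
// Hypothetical helper: replaces every <font color="#RRGGBB"> element with its
// color code followed by the element's original content.
function replaceFontTagsWithColor(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    // Wrap the fragment in one container element so it can be extracted again.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list first, because the loop modifies the tree.
    foreach (iterator_to_array($doc->getElementsByTagName('font')) as $font) {
        if (!preg_match('/^#([0-9a-fA-F]{6})$/', $font->getAttribute('color'), $m)) {
            continue; // leave font tags without a six-digit color alone
        }
        $parent = $font->parentNode;
        $parent->insertBefore($doc->createTextNode($m[1]), $font);
        while ($font->firstChild !== null) {
            $parent->insertBefore($font->firstChild, $font); // move content out of the tag
        }
        $parent->removeChild($font);
    }

    // Serialize the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo replaceFontTagsWithColor('Line1<div><font color="#33FF33">Line2</font></div>'), "\n";
```

Whether this is worth the extra machinery depends on how messy the input is; for a fixed, known tag format the regex is probably enough.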
