Wrap bold/strong tags around the first occurrence of a keyword in a content fragment?

Posted 2024-10-15 03:36:02


I'm looking for the simplest way to wrap bold tags around the first appearance of a predefined keyword phrase, when that phrase does not appear in a heading tag or as an HTML attribute value. Once the first match is found, the routine should exit.

For example, if the keyword is "blue widgets", and the content was:

blue widgets and accessories for blue widgets can be found here

Then after the routine filters the content, it would return:

<b>blue widgets</b> and accessories for blue widgets can be found here
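
For the plain-text case alone, where there is no markup to worry about, a minimal PHP sketch is preg_replace() with its fourth argument ($limit) set to 1, so it stops after the first replacement. The variable names are illustrative, and the \b word boundaries plus the case-insensitive i flag are assumptions about what should count as a match.

```php
<?php
// Plain text only: headings and attribute values are NOT handled here.
$content = 'blue widgets and accessories for blue widgets can be found here';
$keyword = 'blue widgets';

// Escape the keyword for use inside a regex and require word boundaries around it.
$pattern = '~\b' . preg_quote($keyword, '~') . '\b~i';

// $limit = 1 stops after the first replacement.
$result = preg_replace($pattern, '<b>$0</b>', $content, 1);
echo $result, "\n";
```

With the sample text this prints the expected output shown above. Skipping headings and attribute values is the hard part, and that is what the answer below tackles.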

However, if the first occurrence of the phrase "blue widgets" is in an attribute value or a heading tag, the routine should skip it and move on to the next one. For example,

<img src="foo.png" title="A site about blue widgets" alt="blue-widget" />
<h2>This is a site about blue widgets</h2>
<p>We've got lots of blue widgets and blue widget accessories...

In the above content, only the occurrence of the keyword in the sentence "We've got lots of blue widgets and blue widget accessories..." would be bolded.

Can someone give me an example of how this can be done?

糖粟与秋泊 2024-10-22 03:36:02

If you're still thinking about using a regex, check this out:

$source = <<<EOS
<img src="foo.png" title="A site about blue widgets" alt="blue-widget" />
<h2>This is a site about blue widgets</h2>
<p>We've got lots of blue widgets and blue widget accessories...
EOS;

$term = 'blue widgets';

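// convert the search term into a regex fragment: escape special characters,
// add \b where the term starts/ends with a word character, and allow any run of whitespace between words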
$term0 = preg_replace(array('~\A\b~', '~\b\z~', '~\s+~'), 
                      array('\b', '\b', '\s+'),
                      preg_quote(trim($term), '~'));

$regex = <<<EOR
~\A   # anchoring at string start ensures only one match can occur
(?>
   <(h[1-6])[^>]*>.*?</\\1>   # a complete h<n> element
 | </?\w+[^>]*+>              # any other tag
 | (?:(?!<|{$term0}).)*+      # anything else, but stop before '<' or the search term
)*+
\K    # pretend the match really started here; only the next part gets replaced
{$term0}
~isx
EOR;

echo preg_replace($regex, "<strong>$0</strong>", $source);

Run it on ideone.com

I wasn't even sure it was possible to do this with a regex, which is why I went to the trouble of working it out. Hideous as this solution is, it's about as simple as I could make it, and to get there I had to ignore many factors that can break it: CDATA sections, SGML comments, <script> elements, and angle brackets in attribute values, to name a few. And that's just in valid HTML.
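
To make two of those failure modes concrete, here is a sketch that wraps the same pattern in a helper. The name bold_first_by_regex() is hypothetical, and the pattern is rebuilt by string concatenation so the x flag and its inline comments are not needed. It handles the question's sample, but a leading HTML comment stops the scan so nothing gets bolded, and a '<' inside a quoted attribute value causes text inside that attribute to be bolded.

```php
<?php
// Hypothetical helper: the answer's regex, rebuilt without the x flag.
function bold_first_by_regex(string $html, string $term): string
{
    // Same term conversion as in the answer: "blue widgets" -> \bblue\s+widgets\b
    $t = preg_replace(['~\A\b~', '~\b\z~', '~\s+~'], ['\b', '\b', '\s+'],
                      preg_quote(trim($term), '~'));
    $regex = '~\A(?>'
           . '<(h[1-6])[^>]*>.*?</\1>'    // a complete h<n> element
           . '|</?\w+[^>]*+>'             // any other tag
           . '|(?:(?!<|' . $t . ').)*+'   // other text, stopping before '<' or the term
           . ')*+\K' . $t . '~is';        // \K: only the term itself is replaced
    return preg_replace($regex, '<strong>$0</strong>', $html);
}

$sample = "<img src=\"foo.png\" title=\"A site about blue widgets\" alt=\"blue-widget\" />\n"
        . "<h2>This is a site about blue widgets</h2>\n"
        . "<p>We've got lots of blue widgets and blue widget accessories...";

echo bold_first_by_regex($sample, 'blue widgets'), "\n";                     // bolds the <p> occurrence
echo bold_first_by_regex('<!-- note --><p>blue widgets</p>', 'blue widgets'), "\n"; // comment: no match at all
echo bold_first_by_regex('<img alt="<i>blue widgets">', 'blue widgets'), "\n";     // '<' in attribute: attribute text bolded
```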

Fun as this was, I hope it persuades you once and for all to forget about regexes and use a dedicated tool, as the other responders advised.
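
For comparison, here is a minimal sketch of the dedicated-tool route using PHP's DOM extension (DOMDocument and DOMXPath). The helper name bold_first_outside_headings() and the wrapper div are my own choices, not something from the thread, and it assumes UTF-8 input. Attribute values and comments are never text nodes, so they are skipped automatically. Only the h1-h6 exclusion needs to be written into the XPath query; excluding script and style as well is an extra assumption. Like the regex, it matches whole words case-insensitively.

```php
<?php
// Hypothetical helper: wrap the first whole-word, case-insensitive match of $term
// that is not inside h1-h6, <script> or <style> in <b> tags.
function bold_first_outside_headings(string $html, string $term): string
{
    $doc = new DOMDocument();
    $prevErrors = libxml_use_internal_errors(true);   // tolerate sloppy HTML quietly
    // The meta tag tells libxml the input is UTF-8; the div is a known container
    // so that only the fragment is serialized afterwards.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
                   . '<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($prevErrors);

    $xpath = new DOMXPath($doc);
    $wrapper = $xpath->query('(//div)[1]')->item(0);  // our wrapper comes first in document order
    if ($wrapper === null) {
        return $html;
    }
    // Text nodes only: attribute values and comments are never selected.
    $textNodes = $xpath->query(
        './/text()[not(ancestor::h1 or ancestor::h2 or ancestor::h3 or ancestor::h4'
        . ' or ancestor::h5 or ancestor::h6 or ancestor::script or ancestor::style)]',
        $wrapper
    );
    $pattern = '~\b' . preg_quote($term, '~') . '\b~iu';

    foreach ($textNodes as $node) {                   // document order
        if (preg_match($pattern, $node->data, $m, PREG_OFFSET_CAPTURE) !== 1) {
            continue;
        }
        [$match, $offset] = $m[0];                    // byte offset into the UTF-8 text
        $before = substr($node->data, 0, $offset);
        $after = substr($node->data, $offset + strlen($match));

        // Replace the text node with: before-text, <b>match</b>, after-text.
        $bold = $doc->createElement('b');
        $bold->appendChild($doc->createTextNode($match));
        $parent = $node->parentNode;
        if ($before !== '') {
            $parent->insertBefore($doc->createTextNode($before), $node);
        }
        $parent->insertBefore($bold, $node);
        if ($after !== '') {
            $parent->insertBefore($doc->createTextNode($after), $node);
        }
        $parent->removeChild($node);
        break;                                        // first match only
    }

    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Because the parser has already separated the markup into element, attribute and comment nodes, a comment in front of the keyword does not derail it the way it derails the regex. Note that the output is re-serialized by libxml, so details such as the self-closing "/>" on the img tag may change.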
