PHP preg_replace: replace data between tags while respecting other tags in the document

Posted 2024-10-22 22:07:08


There is probably a very simple answer to this, but I want to be as detailed as possible so that you do not need me to clarify.

I am trying to collect the contents of every block of this form:

<content><div>CONTENT</div></content>

The content needs to be returned as a backreference ($1). Both the content and the div tags carry varying attributes (such as style="color: white;"). These attributes are unimportant, but they are present nonetheless.
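A minimal sketch of one way to make the opening-tag patterns indifferent to attributes, assuming PHP's PCRE (preg_*) functions: writing `\b[^>]*` after the tag name accepts any attributes, or none. `$strict` reproduces the question's div pattern; `$lenient` is an illustrative alternative, not the only option.

```php
<?php
// $strict mirrors the question's opening-tag pattern; $lenient is an assumed
// alternative that accepts any attributes, or none at all.
$strict  = '~<div \w+\s*=\s*".*?">~';
$lenient = '~<div\b[^>]*>~';

$tags = [
    '<div>',                                     // no attributes
    '<div style="color: white;">',               // one attribute
    '<div class="post" style="color: white;">',  // two attributes
    '<divider>',                                 // a different tag entirely
];

foreach ($tags as $tag) {
    // preg_match() returns 1 on a match and 0 otherwise.
    echo $tag, ' strict=', preg_match($strict, $tag), ' lenient=', preg_match($lenient, $tag), "\n";
}
```

The question's pattern needs at least one attribute, so it rejects a bare `<div>`; its `.*?` can run across later attributes, so several attributes still match. The `\b` in `$lenient` keeps it from matching longer tag names such as `<divider>`.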

The complication is that the div may contain child divs. Those are not important either, but they conflict with my current regex, which stops the match early.

Here is a sample of the markup; imagine it copied and pasted several times, with varying formatting.

<entry> 
<title>A general title of a post</title> 
<content type="xhtml"> 
    <div xmlns="http://www.w3.org/1999/xhtml"> 
    This is a description of the title. It may <b>contain bold text</b> or <div>even divs</div>, and everything else. It is not quite important to save these tags, but they exist nonetheless.
    </div> 
</content> 
</entry>
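Because the nested div in this sample is what trips up a simple pattern, one option (a sketch, not a full XML parser) is PCRE recursion: the named group `div` calls itself through `(?&div)` for every nested div, so the match only ends at the `</div>` that balances the outer one. The group names `div` and `inner` are illustrative, and the sample text is shortened.

```php
<?php
// Match <content ...><div ...>...</div></content> even when the outer div
// contains nested divs, using PCRE recursion. The x flag allows comments.
$pattern = '~
    <content\b[^>]*>\s*
    (?<div>                         # one balanced div element
        <div\b[^>]*>
        (?<inner>
            (?: [^<]++              # text without any tags
              | <(?!/?div\b)        # a tag other than <div or </div
              | (?&div)             # a nested div, matched recursively
            )*
        )
        </div>
    )
    \s*</content>
~x';

$xml = <<<XML
<entry>
<title>A general title of a post</title>
<content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
    It may <b>contain bold text</b> or <div>even divs</div>, and more.
    </div>
</content>
</entry>
XML;

preg_match_all($pattern, $xml, $matches);
// The named group "inner" holds the contents of the outermost div.
echo trim($matches['inner'][0]), "\n";
```

Attribute values containing `>`, comments, or CDATA sections would still confuse this pattern; for arbitrary markup a real XML/DOM parser is the safer choice.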

Currently, I am using two regexes: one for the opening tags and one for the closing tags. That works, but now I need to run code on the contents, so I plan to use preg_replace_callback(). What I can't figure out is how to combine the two patterns so that the part in between is passed to the callback.
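One way to connect the two patterns so that only the middle reaches the callback, sketched under the assumption that the real closing div is always followed (after optional whitespace) by `</content>`: `\K` drops the opening tags from the reported match, and a lookahead checks for the closing tags without consuming them. `trim(strip_tags(...))` is only a placeholder for the actual processing.

```php
<?php
// \K resets the start of the reported match, so the opening tags are kept;
// the lookahead requires the closing tags but does not consume them.
$pattern = '~<content\b[^>]*>\s*<div\b[^>]*>\K(.*?)(?=</div>\s*</content>)~s';

$xml = <<<XML
<entry>
<title>A general title of a post</title>
<content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
    It may <b>contain bold text</b> or <div>even divs</div>, and more.
    </div>
</content>
</entry>
XML;

$result = preg_replace_callback($pattern, function (array $m): string {
    // $m[1] holds the contents between the tags (placeholder processing).
    return trim(strip_tags($m[1]));
}, $xml);

echo $result, "\n";
```

Because the lazy `(.*?)` stops at the first `</div>` that is followed by `</content>`, the nested `</div>` in the middle of the text does not end the match early.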

Declaration:

<content \w+\s*=\s*\".*?\">[\r\n\s]{0,}<div \w+\s*=\s*\".*?\">

Closing:

</div>[\r\n\s]{0,}</content>
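Joined directly, the two patterns above need only a lazy capture group between them. This sketch keeps the question's attribute patterns, shortens `[\r\n\s]{0,}` to the equivalent `\s*`, and runs against a made-up two-entry feed; the contents end up in group 1, the `$1` the question asks for.

```php
<?php
// Declaration pattern + lazy capture group + closing pattern.
$pattern = '~<content \w+\s*=\s*".*?">\s*<div \w+\s*=\s*".*?">'
         . '([\w\W]*?)'
         . '</div>\s*</content>~';

$feed = <<<XML
<entry>
<title>First post</title>
<content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">First <b>body</b>.</div>
</content>
</entry>
<entry>
<title>Second post</title>
<content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">Second <div>body</div>.</div>
</content>
</entry>
XML;

preg_match_all($pattern, $feed, $matches);
// $matches[1] holds group 1 (the contents) for every entry.
print_r($matches[1]);
```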

I need these combined, with the contents passed to the callback. I have tried something like ([\w\W]{0,}), which captures absolutely everything, but the match doesn't stop at the right closing div.
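The difference is greediness: `[\w\W]{0,}` (the same as `[\w\W]*`) grabs as much as it can and then backtracks only as far as the last `</div></content>` in the subject, while the lazy `[\w\W]*?` stops at the first one. A small sketch with two made-up entries:

```php
<?php
// Two made-up entries; the second contains a nested div.
$xml = '<content type="xhtml"><div class="a">first</div></content>' . "\n"
     . '<content type="xhtml"><div class="b">second <div>nested</div></div></content>';

// Greedy: runs to the end of the subject, then backtracks to the LAST closing pair.
$greedy = '~<content\b[^>]*>\s*<div\b[^>]*>([\w\W]*)</div>\s*</content>~';
// Lazy: extends one character at a time and stops at the FIRST closing pair.
$lazy   = '~<content\b[^>]*>\s*<div\b[^>]*>([\w\W]*?)</div>\s*</content>~';

$greedyCount = preg_match_all($greedy, $xml, $g);
$lazyCount   = preg_match_all($lazy, $xml, $l);

echo "greedy: $greedyCount match(es)\n";  // one match swallowing both entries
echo "lazy:   $lazyCount match(es)\n";    // one match per entry
print_r($l[1]);
```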

Then I found out about the \b word-boundary syntax (as in \bFULLWORD\b) and tried \bdiv\b, but I have had no success getting it to work. Perhaps PHP does not support it? Or I am stupid.
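For what it's worth, `\b` is supported by PHP's PCRE functions, but it is a zero-width word-boundary assertion: it only checks whether "div" is a whole word, so it cannot tell which `</div>` closes which `<div>`. A quick check:

```php
<?php
// \b matches between a word character and a non-word character.
var_dump(preg_match('~\bdiv\b~', '<div>'));       // int(1): "div" is a whole word here
var_dump(preg_match('~\bdiv\b~', '<divider>'));   // int(0): "div" is followed by a word char
// It finds every div tag, but says nothing about nesting.
var_dump(preg_match_all('~</?div\b~', '<div>a <div>b</div></div>'));  // int(4)
```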

I do not know.

Please help!
