PHP preg_replace data between tags, respecting other tags in the document

Posted on 2024-10-22 22:07:08

There is probably a very simple answer to this, but I want to be as detailed as possible so that you do not need me to clarify.

I am trying to collect the contents of every

<content><div>CONTENT</div></content>

The content needs to be returned as a backreference ($1). Both the content and the div have differing parameters (such as style="color: white;"). These parameters are unimportant, but exist nonetheless.

The complication is that the div may contain child divs. These are not important, but they conflict with my current regex, stopping the match early.

Here is a sample of the code, imagine this copy/pasted several times and formatted differently.

<entry> 
<title>A general title of a post</title> 
<content type="xhtml"> 
    <div xmlns="http://www.w3.org/1999/xhtml"> 
    This is a description of the title. It may <b>contain bold text</b> or <div>even divs</div>, and everything else. It is not quite important to save these tags, but they exist nonetheless.
    </div> 
</content> 
</entry>

Currently, I am using two regular expressions: one for the declaration, and one for the closing tags. This works, but now I need to execute code on the contents. So I will use preg_replace_callback(), but I can't figure out how to connect the two so that the middle is passed to the callback.

Declaration:

<content \w+\s*=\s*\".*?\">[\r\n\s]{0,}<div \w+\s*=\s*\".*?\">

Closing:

</div>[\r\n\s]{0,}</content>

I need these combined, with the contents returned as a callback. I have tried something like ([\w\W]{0,}), which returns absolutely everything, but this match doesn't stop at the closing div.

So I found out about the \bFULLWORD\b command, and threw \bdiv\b on that... But I have had no success getting that to work. Perhaps it is not supported by PHP? Or I am stupid.

I do not know.

Please help!
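
For reference, a minimal sketch of how the two patterns above could be joined around a lazy capture group. It assumes each <content> wraps exactly one outer <div>, and strtoupper() is only a placeholder for the real processing. Because the lazy group must be followed by `</div>\s*</content>`, an inner `</div>` that is followed by more text does not end the match. Note that `\b` is a word-boundary assertion, not a command, and PCRE (and so PHP) does support it.

```php
<?php
// Hypothetical sample input modelled on the <entry> in the question.
$feed = <<<XML
<content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
    It may <b>contain bold text</b> or <div>even divs</div>, and everything else.
    </div>
</content>
XML;

// Group 1: opening <content> and <div> tags (any attributes).
// Group 2: inner content, matched lazily; the "s" flag lets "." cross newlines.
// Group 3: the closing </div> that is directly followed by </content>.
$pattern = '~(<content\b[^>]*>\s*<div\b[^>]*>)(.*?)(</div>\s*</content>)~s';

$result = preg_replace_callback(
    $pattern,
    function (array $m): string {
        // Placeholder transformation: replace with the real processing.
        return $m[1] . strtoupper($m[2]) . $m[3];
    },
    $feed
);

echo $result, PHP_EOL;
```

This sketch breaks down if a <content> holds several sibling divs or if the markup is not formed as shown, which is the kind of fragility a DOM parser avoids.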

Comments (2)

_蜘蛛 2024-10-29 22:07:08

It's been said before and it's being said now, and unfortunately it's going to be said again. Regex is a wonderful tool. It's great for manipulating strings and pattern matching of regular expressions.

HTML is not a string. HTML is a markup language, not a regular language. It's not truly a string, but it can be interpreted as one (which is why we can technically use regex to manipulate HTML). HTML is its own language based on element nodes, and you need to manipulate those elements if you're going to change something.

As pointed out in the comments, you can easily use the DOM class in PHP.

You want to do this for quite a few reasons:

  • It's easier, you don't need to make some crazy pattern that looks like a cat walked across your keyboard
  • It's easier (again), you can navigate to the specific node, not work with the whole document.
  • It's safer, you don't accidentally change something you didn't want to.
  • It's safer (again), the source data can change, and you can detect it and account for it.
  • It's safer (again again), you can fail gracefully.

How?
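
One possible way, as a minimal sketch rather than a definitive implementation: load the feed with PHP's built-in DOMDocument, select each div that is a direct child of a content element with DOMXPath, and serialize its children. The sample input and the <feed> wrapper are assumptions modelled on the question, and the XPath uses local-name() so that it works whether or not the elements carry a namespace.

```php
<?php
// Hypothetical sample input modelled on the <entry> in the question.
$xml = <<<XML
<feed>
<entry>
<title>A general title of a post</title>
<content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
    This is a description. It may <b>contain bold text</b> or <div>even divs</div>.
    </div>
</content>
</entry>
</feed>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);

// Select every <div> that is a direct child of a <content> element,
// ignoring namespaces by comparing local names only.
$divs = $xpath->query('//*[local-name()="content"]/*[local-name()="div"]');

$contents = [];
foreach ($divs as $div) {
    // Serialize the children of the outer div, nested tags included.
    $inner = '';
    foreach ($div->childNodes as $child) {
        $inner .= $doc->saveXML($child);
    }
    $contents[] = trim($inner);
}

// Each entry of $contents can now be handed to whatever callback is needed.
print_r($contents);
```

Nested divs need no special handling here, because the parser already knows which closing tag belongs to which element.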

断舍离 2024-10-29 22:07:08

Use a DOM parser. Here's an example: http://htmlparsing.com/php.html
