Convert everything between tags to HTML entities using PHP

Posted on 2024-11-29 10:24:30

How can I convert everything between a given pair of tags to HTML entities?

Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
sed diam nonumy eirmod tempor invidunt ut labore et dolore
magna aliquyam erat, sed diam voluptua.
<code class="highlight sql">
    CREATE TABLE `comments`
</code>

<h1>Next step</h1>

Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
sed diam nonumy eirmod tempor invidunt ut labore et
dolore magna aliquyam erat, sed diam voluptua.
At vero eos et accusam et justo duo dolores et ea rebum.
<b>Stet clita kasd gubergren, no sea takimata sanctus</b> est Lorem
dolor sit amet. Lorem ipsum dolor sit amet, consetetur
sadipscing elitr, sed diam nonumy eirmod tempor invidunt
ut labore et dolore magna aliquyam erat, sed diam voluptua:
<code class="highlight php">
    <?php
        $host = "localhost";
    ?>
</code>

Lorem ipsum dolor sit amet, consetetur sadipscing elitr.

Note: The example above is a string that I want to convert in PHP.

Comments (3)

陪你搞怪i 2024-12-06 10:24:30

This comes down to a regex for me. And before you start shouting: it is possible to reliably match and replace subsets of HTML, as long as there are no nested tags.

This is the easy way, to be honest: a regex that matches a tag from start to end, plus a function applied to the matches that encodes what we need and replaces it.

Here's the code:

<?php
$string = 'Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
sed diam nonumy eirmod tempor invidunt ut labore et dolore
magna aliquyam erat, sed diam voluptua.
<code class="highlight sql">
    CREATE TABLE `comments`&
</code>

<h1>Next step</h1>

Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
sed diam nonumy eirmod tempor invidunt ut labore et
dolore magna aliquyam erat, sed diam voluptua.
At vero eos et accusam et justo duo dolores et ea rebum.
<b>Stet clita kasd gubergren&, no sea takimata sanctus</b> est Lorem
dolor sit amet. Lorem ipsum dolor sit amet, consetetur
sadipscing elitr, sed diam nonumy " eirmod " tempor invidunt
ut labore et dolore magna aliq&uyam erat, sed diam voluptua:
<code class="highlight php">
    <?php
       * $host = "localhost";
    ?>&
</code>

Lorem ipsum dolor sit amet, consetetur sadipscing elitr.';

echo preg_replace("/(<code[^>]*?>)(.*?)(<\/code>)/se", "
    stripslashes('$1').
    htmlentities(stripslashes('$2')).
    stripslashes('$3')
", $string);

And here's a working test case on codepad:

http://codepad.org/MhKwfOQl

This will work as long as there are no nasty nested tags or corrupted HTML.

I would still advise you to store the data the way you want it displayed, encoded where needed.

If you want to replace between a different set of tags, change the regex.

Update: It seemed that $host was being parsed by PHP, and of course we don't want that. This happened because PHP evaluates the replacement string as PHP code, which then executes the given functions and passes the matched strings into them. If a matched string is wrapped in double quotes, PHP parses the variables in it too. What a hassle.

Another problem then arises: PHP escapes single and double quotes in the matches so that they won't generate parse errors. This meant that the slashes had to be stripped from any quotes in the matches too, resulting in the pretty long replacement string.
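The `e` modifier used in the code above was deprecated in PHP 5.5 and removed in PHP 7.0, so that snippet no longer runs on current PHP versions. A minimal sketch of the same idea with `preg_replace_callback()` follows. The function name `encodeCodeBlocks` and the sample string are made up for illustration, and the explicit `ENT_QUOTES` and `'UTF-8'` arguments are assumptions about the desired encoding. Because the matches reach the callback as plain strings, nothing is evaluated as PHP and no `stripslashes()` calls are needed.

```php
<?php
// Hypothetical helper: entity-encode everything between <code ...> and </code>.
// Like the regex above, it assumes <code> tags are never nested.
function encodeCodeBlocks(string $html): string
{
    $result = preg_replace_callback(
        '/(<code\b[^>]*>)(.*?)(<\/code>)/s',
        function (array $m): string {
            // $m[1] is the opening tag, $m[2] the content, $m[3] the closing tag.
            return $m[1] . htmlentities($m[2], ENT_QUOTES, 'UTF-8') . $m[3];
        },
        $html
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

// Sample input invented for this sketch.
$sample = '<b>Stet & clita</b> <code class="highlight php"><?php $host = "localhost"; ?></code>';
echo encodeCodeBlocks($sample), "\n";
```

Only the content of the `<code>` element is encoded; the `<b>` element and the bare `&` outside it are left untouched.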

前事休说 2024-12-06 10:24:30

Although a regular expression or parser may give you a solution to this puzzle, I think you may be going about your goal the wrong way.

Taken from the comments below the question:

@Poru How is that string generated?

@Phil: Fetched from database. It's
the content of a tutorial. It's an own development "CMS".

If you are storing this string in a database, and its function is to return HTML content, you should be storing the content ready to serve as HTML, which means you must escape the appropriate characters with their equivalent HTML entities.

This was the advice already offered to you in this question: https://stackoverflow.com/questions/7059776/include-source-code-in-html-valid/7059834

The characters that must be escaped are explained here (among various other references):

http://php.net/manual/en/function.htmlspecialchars.php

The translations performed are:

  • '&' (ampersand) becomes '&amp;'
  • '"' (double quote) becomes '&quot;' when ENT_NOQUOTES is not set.
  • "'" (single quote) becomes '&#039;' only when ENT_QUOTES is set.
  • '<' (less than) becomes '&lt;'
  • '>' (greater than) becomes '&gt;'
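The effect of the quote flags in the list above can be checked with a short sketch. The input string is invented for illustration. The flags are passed explicitly because, as far as I know, the default changed in PHP 8.1 to include `ENT_QUOTES`, so relying on the default gives different results across versions.

```php
<?php
// Sample input chosen for illustration; it is not taken from the thread.
$input = '<a href="x">Tom & Jerry\'s</a>';

// ENT_COMPAT converts double quotes and leaves single quotes alone.
$compat = htmlspecialchars($input, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES converts both double and single quotes.
$quotes = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// ENT_NOQUOTES leaves both kinds of quotes alone.
$noQuotes = htmlspecialchars($input, ENT_NOQUOTES, 'UTF-8');

echo $compat, "\n", $quotes, "\n", $noQuotes, "\n";
```

In all three cases `&`, `<` and `>` are converted; only the handling of the quotes differs.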

If this is in fact the case, and the string is supposed to be HTML output and has no other function, it doesn't make any sense to save it as invalid HTML, or at least as HTML that is not what you intend it to be.

If you must store your code examples unescaped, consider a separate database table for these snippets, and simply run htmlspecialchars() on them before outputting them to the HTML document. You could even assign a language to each record, and use the appropriate syntax highlighting tool for each case automatically.
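A rough sketch of that idea, escaping a stored snippet only at output time. The record shape (`language` and `code` keys) and the function name `renderSnippet` are hypothetical; a real CMS would fetch the record from its own snippets table. The wrapper markup copies the `<code class="highlight ...">` convention from the question.

```php
<?php
// Hypothetical record: one row per snippet, with the code stored unescaped.
$snippet = [
    'language' => 'php',
    'code'     => '<?php $host = "localhost"; ?>',
];

// Escape the snippet when it is rendered, not when it is stored.
function renderSnippet(array $snippet): string
{
    $language = htmlspecialchars($snippet['language'], ENT_QUOTES, 'UTF-8');
    $code     = htmlspecialchars($snippet['code'], ENT_QUOTES, 'UTF-8');

    return '<code class="highlight ' . $language . '">' . "\n" . $code . "\n" . '</code>';
}

echo renderSnippet($snippet), "\n";
```

The `language` value is escaped as well because it ends up inside an attribute; a syntax highlighter could be selected from the same field.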

What you are attempting, in my opinion, is not the appropriate solution to this particular problem in this context. Escaping the characters and having your HTML content ready to be output to the screen in its current form is the way to go.

高跟鞋的旋律 2024-12-06 10:24:30
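$dom = new DOMDocument;
// Load the HTML string (the argument is elided in the original answer).
$dom->loadHTML(...);

// 'tag' is a placeholder for the element name to process, e.g. 'code'.
$tags = $dom->getElementsByTagName('tag');
foreach ($tags as $tag) {
    // Replace the element's content with its entity-encoded text.
    $tag->nodeValue = htmlentities($tag->nodeValue);
}
// saveHTML() returns the serialized document, so print it.
echo $dom->saveHTML();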