Reverse htmlentities / html_entity_decode

Posted 2024-11-19 16:49:44

Basically I want to turn a string like this:

<code> &lt;div&gt; blabla &lt;/div&gt; </code>

into this:

&lt;code&gt; <div> blabla </div> &lt;/code&gt;

How can I do it?


The use case (since some people were curious):

A page with a list of allowed HTML tags and examples. For example, <code> is an allowed tag, and this would be its sample:

<code>&lt;?php echo "Hello World!"; ?&gt;</code>

I wanted a reverse function because there are many such tags with samples. I store them all in an array and iterate over it in a single loop, instead of handling each one individually.
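For context, neither built-in function does this on its own. htmlentities() encodes the real tags but also re-encodes the `&` of the existing entities, while html_entity_decode() decodes the entities but leaves the real tags untouched. A quick check on the example string from the question:

```php
<?php
// The example input from the question: real <code> tags around entity-encoded markup
$input = '<code> &lt;div&gt; blabla &lt;/div&gt; </code>';

// htmlentities() encodes the real tags, but also re-encodes the '&' of the existing entities
echo htmlentities($input), "\n";
// &lt;code&gt; &amp;lt;div&amp;gt; blabla &amp;lt;/div&amp;gt; &lt;/code&gt;

// html_entity_decode() turns the entities into tags, but leaves the real <code> tags alone
echo html_entity_decode($input), "\n";
// <code> <div> blabla </div> </code>
```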

陪你到最终 2024-11-26 16:49:44

My version using regular expressions:

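$string = '<code> &lt;div&gt; blabla &lt;/div&gt; </code>';
// The /e modifier was removed in PHP 7.0; preg_replace_callback does the same job
$new_string = preg_replace_callback(
    '/(.*?)(<.*?>|$)/s',
    function ($m) {
        // $m[1] is the text before a tag (decode it), $m[2] is the tag itself (encode it)
        return html_entity_decode($m[1]) . htmlentities($m[2]);
    },
    $string
);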

It matches every tag and text node, then applies htmlentities to the tags and html_entity_decode to the text nodes.
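The same tag/text-node split can also be written with preg_split() and PREG_SPLIT_DELIM_CAPTURE, which keeps each captured tag as its own array element so the two kinds of piece are handled explicitly. This is a sketch, not the answer's code: the function name invert_tags_and_text is made up, and like the regex above it assumes no literal `>` appears inside a tag's attributes.

```php
<?php
// Hypothetical helper: encode real tags, decode entity-encoded text between them.
function invert_tags_and_text(string $html): string
{
    // Split on tags; the capture group keeps each tag in the result,
    // so the pieces alternate text, tag, text, tag, ..., text.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($parts as $i => $part) {
        // Odd indexes are the captured tags, even indexes are the text between them
        $out .= ($i % 2 === 1) ? htmlentities($part) : html_entity_decode($part);
    }
    return $out;
}

echo invert_tags_and_text('<code> &lt;div&gt; blabla &lt;/div&gt; </code>'), "\n";
// &lt;code&gt; <div> blabla </div> &lt;/code&gt;
```

Without PREG_SPLIT_NO_EMPTY, preg_split keeps the empty text pieces at the start and end, which is what keeps the odd/even alternation reliable.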

咿呀咿呀哟 2024-11-26 16:49:44

There isn't an existing function, but have a look at this.
So far I've only tested it on your example, but this function should work for all HTML entities.

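function html_entity_invert($string) {
    $matches = $store = array();
    // Find everything that looks like an entity: &lt; &amp; &#39; ...
    preg_match_all('/(&(#?\w){2,6};)/', $string, $matches, PREG_SET_ORDER);

    // Swap each entity for a unique placeholder, remembering its decoded form
    foreach ($matches as $i => $match) {
        $key = '__STORED_ENTITY_' . $i . '__';
        $store[$key] = html_entity_decode($match[0]);
        $string = str_replace($match[0], $key, $string);
    }

    // Encode the remaining raw markup, then put the decoded entities back
    return str_replace(array_keys($store), $store, htmlentities($string));
}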

Update:

  • Thanks to @Mike for taking the time to test my function with other strings. I've updated my regex from /(\&(.+)\;)/ to /(\&([^\&\;]+)\;)/ which should take care of the issue he raised.

  • I've also added {2,6} to limit the length of each match to reduce the possibility of false positives.

  • Changed regex from /(\&([^\&\;]+){2,6}\;)/ to /(&([^&;]+){2,6};)/ to remove unnecessary escaping.

  • Whooa, brainwave! Changed the regex from /(&([^&;]+){2,6};)/ to /(&(#?\w){2,6};)/ to reduce probability of false positives even further!
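To see what the final pattern /(&(#?\w){2,6};)/ does and does not treat as an entity, here is a quick check with preg_match(). The sample strings are made up for illustration; note that the `{2,6}` limit also excludes a few long named entities.

```php
<?php
// The entity pattern from html_entity_invert(): '&', then 2 to 6 word characters
// (each optionally preceded by '#'), then ';'
$pattern = '/(&(#?\w){2,6};)/';

$samples = [
    '&lt;',         // named entity: matches
    '&#39;',        // numeric entity: matches
    '&hellip;',     // 6-letter name: matches
    '&thetasym;',   // 8-letter name: too long, not matched
    'Tom & Jerry;', // bare ampersand followed by a space: not matched
    '&abc;',        // not a real entity, but matches anyway (false positive)
];

foreach ($samples as $s) {
    printf("%-14s %s\n", $s, preg_match($pattern, $s) ? 'match' : 'no match');
}
```

A false positive such as `&abc;` is harmless here: html_entity_decode() leaves unknown entities unchanged, so it simply passes through without being encoded.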

花间憩 2024-11-26 16:49:44

A single replacement will not be enough for you, whether you use regular expressions or simple string replacement. If you replace the < and > signs and then the &lt; and &gt; entities (or vice versa), the second pass also catches the output of the first, and you end up with everything encoded or everything decoded (all &lt; and &gt;, or all < and > signs).

So you have to set one group aside first (I replaced it with a placeholder), do the other replacement, then put the set-aside group back with a final replacement.

$str = "<code> &lt;div&gt; blabla &lt;/div&gt; </code>";
$search = array("&lt;","&gt;");

//placeholders for &lt; and &gt; (assumes the text contains no literal [ or ])
$replace = array("[","]");

//first replace: sub out &lt; and &gt; for [ and ] respectively
$str = str_replace($search, $replace, $str);

//second replace: turn the original < and > into &lt; and &gt;
$search = array("<",">");
$replace = array("&lt;","&gt;");
$str = str_replace($search, $replace, $str);

//third replace: turn [ and ] into < and >
$search = array("[","]");
$replace = array("<",">");

$str = str_replace($search, $replace, $str);

echo $str;
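A swapped-in alternative to the three str_replace() passes, not part of the answer above: strtr() with an array argument tries the longest keys first and never re-scans text it has already replaced, so `<`/`>` and `&lt;`/`&gt;` can be swapped in a single pass without placeholders (which would misfire if the text already contained `[` or `]`). A minimal sketch that, like the answer, only handles these two entity pairs; the function name swap_angle_brackets is hypothetical.

```php
<?php
// Hypothetical helper: swap real angle brackets and their entities in one pass.
function swap_angle_brackets(string $str): string
{
    // strtr() replaces the longest keys first and does not touch text it has
    // already replaced, so a '<' produced from '&lt;' is never turned back into '&lt;'.
    return strtr($str, [
        '&lt;' => '<',
        '&gt;' => '>',
        '<'    => '&lt;',
        '>'    => '&gt;',
    ]);
}

echo swap_angle_brackets('<code> &lt;div&gt; blabla &lt;/div&gt; </code>'), "\n";
// &lt;code&gt; <div> blabla </div> &lt;/code&gt;
```

Applying the function twice gives back the original string, which is the "going backwards" property discussed further down the thread.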
予囚 2024-11-26 16:49:44

I think I have a small solution: why not break the string into an array of HTML tags and text, and then compare and change each piece as needed?

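function invertHTML($str) {
    // $res holds alternating chunks: text, tag, text, tag, ...
    $res = array('');
    for ($i = 0, $j = 0; $i < strlen($str); $i++) {
        // $str[$i] rather than $str{$i}: curly-brace offsets were removed in PHP 8
        if ($str[$i] == "<") {
            // start a new chunk for the tag unless the current one is still empty
            if (strlen($res[$j]) > 0) {
                $j++;
                $res[$j] = '';
            }
            $pos = strpos($str, ">", $i);
            if ($pos === false) {
                // unterminated "<": take the rest of the string as the tag
                $pos = strlen($str) - 1;
            }
            $res[$j] .= substr($str, $i, $pos - $i + 1);
            $i = $pos;
            $j++;
            $res[$j] = '';
            continue;
        }
        $res[$j] .= $str[$i];
    }

    $newString = '';
    foreach ($res as $html) {
        // a chunk that changes when decoded contains entities: decode it, otherwise encode it
        $change = html_entity_decode($html);
        if ($change != $html) {
            $newString .= $change;
        } else {
            $newString .= htmlentities($html);
        }
    }
    return $newString;
}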

Modified .... with no errors.
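One side effect of the last step in invertHTML(): every chunk without entities goes through htmlentities(), which also converts non-ASCII characters such as é into named entities. If only the markup characters should change, htmlspecialchars() is one possible replacement for that branch. A small sketch of the per-chunk decision, assuming UTF-8 input; invert_chunk is a hypothetical name, not part of the answer.

```php
<?php
// Hypothetical per-chunk step: decode chunks that contain entities,
// otherwise escape only the HTML special characters (& < > " ').
function invert_chunk(string $chunk): string
{
    $decoded = html_entity_decode($chunk);
    if ($decoded !== $chunk) {
        return $decoded; // the chunk contained entities
    }
    return htmlspecialchars($chunk); // leaves é and other non-ASCII text alone
}

var_dump(htmlentities('café'));  // string(11) "caf&eacute;"
var_dump(invert_chunk('café'));  // string(5) "café"
var_dump(invert_chunk('<div>')); // string(11) "&lt;div&gt;"
```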

夜访吸血鬼 2024-11-26 16:49:44

Although other people here have recommended regular expressions, which may well be the right way to go, I wanted to post this, as it is sufficient for the question you asked.

Assuming that your input is always well-formed, HTML-esque markup:

 $str = '<code> &lt;div&gt; blabla &lt;/div&gt; </code>';
 xml_parse_into_struct(xml_parser_create(), $str, $nodes);
 foreach($nodes as $node) { 
     echo htmlentities('<' . $node['tag'] . '>') . html_entity_decode($node['value']) . htmlentities('</' . $node['tag'] . '>');
 }

Gives me the following output:

&lt;CODE&gt; <div> blabla </div> &lt;/CODE&gt;

I'm fairly certain this wouldn't support going backwards again, as the other posted solutions would, in the sense of:

 $orig = '<code> &lt;div&gt; blabla &lt;/div&gt; </code>';
 $modified = '&lt;CODE&gt; <div> blabla </div> &lt;/CODE&gt;';
 $modifiedAgain = '<code> &lt;div&gt; blabla &lt;/div&gt; </code>';
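The uppercase CODE in that output comes from the XML parser's case folding, which is on by default. The sketch below turns it off with xml_parser_set_option() and XML_OPTION_CASE_FOLDING so the tag name keeps its original case. Like the answer, it assumes a single well-formed root element that contains only text; the function name invert_single_element is hypothetical.

```php
<?php
// Hypothetical helper for input shaped like '<tag> ...escaped markup... </tag>'.
function invert_single_element(string $str): string
{
    $parser = xml_parser_create();
    // Keep tag names as written instead of upper-casing them
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    if (!xml_parse_into_struct($parser, $str, $nodes)) {
        throw new InvalidArgumentException('Input is not well-formed XML');
    }

    $out = '';
    foreach ($nodes as $node) {
        // Same output expression as in the answer; 'value' is absent for empty elements
        $out .= htmlentities('<' . $node['tag'] . '>')
              . html_entity_decode($node['value'] ?? '')
              . htmlentities('</' . $node['tag'] . '>');
    }
    return $out;
}

echo invert_single_element('<code> &lt;div&gt; blabla &lt;/div&gt; </code>'), "\n";
// &lt;code&gt; <div> blabla </div> &lt;/code&gt;
```

Nested real tags would still be flattened by this loop, so the single-element assumption matters.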
明月夜 2024-11-26 16:49:44

Edit: It appears that I haven't fully answered your question. There is no built-in PHP function that does what you want, but you can do a find-and-replace with regular expressions or even plain string replacement: str_replace, preg_replace.
