How do I correctly substr a string that contains HTML entities?

Posted on 2024-08-29 13:17:44

I have something like this:

$mytext="that&#039;s really &quot;confusing&quot; and &lt;absolutly&gt; silly";
echo substr($mytext,0,6);

The output in this case will be: that&# instead of that's

What I want is to count each HTML entity as 1 character and then substr, because I always end up with broken HTML or some obscure characters at the end of the text.

Please don't suggest that I HTML-decode it, then substr, then encode it again; I want a clean method :)

Thanks

Comments (6)

眼波传意 2024-09-05 13:17:44

There are two ways of doing this:

  1. You can decode the HTML entities, substr() and then encode; or

  2. You can use a regular expression.

(1) uses html_entity_decode() and htmlentities():

$s = html_entity_decode($mytext);
$sub = substr($s, 0, 6);
echo htmlentities($sub);

(2) might be something like:

if (preg_match('!^([^&]|&(?:.*?;)){0,5}!s', $mytext, $match)) {
  echo $match[0];
}

What this is saying is: find me up to 5 occurrences of the preceding expression from the beginning of the string. The preceding expression is either:

  • any character that isn't an ampersand; or

  • an ampersand, followed by anything up to and including a semicolon (i.e. an HTML entity).

This isn't perfect (a bare & that is not part of an entity would swallow everything up to the next semicolon), so I would favour (1).
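
As a rough sketch of how the regular-expression idea in (2) could be tightened, the snippet below splits the string into tokens, where a token is either one well-formed entity or one character, and keeps the first N tokens. The helper name entity_substr and the entity pattern are assumptions made for illustration, not part of the answer above, and the pattern only checks the shape of an entity, not whether the name actually exists.

```php
<?php
// Hypothetical helper: return the first $length "characters" of $text,
// counting each HTML entity as a single character.
function entity_substr(string $text, int $length): string
{
    // A token is a well-formed entity (named, decimal or hex) or, failing
    // that, any single character. The u flag makes "." match whole UTF-8
    // characters; the s flag lets it match newlines too.
    $pattern = '/&(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);|./su';

    // preg_match_all() returns 0 for an empty string and false on invalid
    // UTF-8; both cases fall back to an empty result here.
    if ($length <= 0 || !preg_match_all($pattern, $text, $matches)) {
        return '';
    }

    return implode('', array_slice($matches[0], 0, $length));
}

$mytext = 'that&#039;s really &quot;confusing&quot; and &lt;absolutly&gt; silly';
echo entity_substr($mytext, 6), "\n";  // that&#039;s
```

Because the string is never decoded, the entities in the result are exactly the ones in the input, and a stray & that does not start an entity is simply counted as one character.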

初心未许 2024-09-05 13:17:44

function encoded_substr($string, $param, $param2){
  $s = html_entity_decode($string);
  $sub = substr($s, $param, $param2);
  return htmlentities($sub);
}

There, I copy-pasted cletus' code into a function for you. Now you can call a very straightforward 3-line function with 1 line of code. If this isn't "clean", then I'm confused about what "clean" means.

久光 2024-09-05 13:17:44

Be advised that multibyte characters break the proposed decode + substr() + encode approach, because substr() counts bytes and can cut a character in half.

Example

$string=html_entity_decode("Workin&rsquo; on my Fitness&hellip;In the Backyard.");

echo $string;
echo substr($string,0,25);
echo htmlentities(substr($string,0,25));

Will output:

  • Workin’ on my Fitness…In the Backyard.
  • Workin’ on my Fitness�
  • (empty string)

The solution

Use mb_substr().

echo mb_substr($string,0,25);
echo htmlentities(mb_substr($string,0,25));

Will output:

  • Workin’ on my Fitness…In
  • Workin&rsquo; on my Fitness&hellip;In
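
To make the byte-versus-character difference concrete, here is a small sketch. It assumes UTF-8 text and that the mbstring extension is loaded; the string is written with \u{...} escapes (PHP 7+) so the result does not depend on how the file itself is saved.

```php
<?php
// "Workin’ on my Fitness…In", with the two non-ASCII characters escaped.
$s = "Workin\u{2019} on my Fitness\u{2026}In";

echo strlen($s), "\n";              // 28 (bytes)
echo mb_strlen($s, 'UTF-8'), "\n";  // 24 (characters)

// Byte offset 25 falls inside the three-byte ellipsis, so substr() leaves an
// incomplete UTF-8 sequence at the end of its result.
var_dump(mb_check_encoding(substr($s, 0, 25), 'UTF-8'));             // bool(false)

// mb_substr() counts characters and never splits one.
var_dump(mb_check_encoding(mb_substr($s, 0, 25, 'UTF-8'), 'UTF-8')); // bool(true)
```

An invalid sequence like the one substr() produces here is what makes htmlentities() return an empty string in the example above.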

琴流音 2024-09-05 13:17:44

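Please try the following function.

<?php

$mytext="that&#039;s really &quot;confusing&quot; and &lt;absolutly&gt; silly";

echo limit_text($mytext,6);

function limit_text($text,$limit){
   // Collect every entity (ungreedy match from "&" to the next ";").
   preg_match_all("/&(.*)\;/U", $text, $pat_array);
   $additional=0;

   // Extend the limit by the extra length of each counted entity.
   foreach ($pat_array[0] as $key => $value) {
     if($key <$limit){$additional += (strlen($value)-1);}
   }
   $limit+=$additional;

   if(strlen($text)>$limit){
     $text = substr( $text,0,$limit );
     // Cut back to the last space, if there is one, to avoid ending mid-word or mid-entity.
     $lastSpace = strrpos($text,' ');
     if($lastSpace !== false){$text = substr( $text,0,$lastSpace );}
   }
   return $text;

}

?>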

戏舞 2024-09-05 13:17:44

Well, there is only one clean method:
Don't work with entities at all.
There is not a single reason to substr an entity-encoded string. Such a string is good for output only.
So, first substr, then encode.
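
A minimal sketch of that order of operations, assuming the raw, unencoded text is still available (for example as stored in a database) and is UTF-8. The variable names are illustrative only.

```php
<?php
// $raw stands in for the text before any HTML encoding (example data).
$raw = 'that\'s really "confusing" and <absolutly> silly';

// Cut the plain text first, counting characters rather than bytes...
$short = mb_substr($raw, 0, 6, 'UTF-8');

// ...and encode only at output time.
echo htmlspecialchars($short, ENT_QUOTES, 'UTF-8'), "\n"; // that&#039;s
```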

灼痛 2024-09-05 13:17:44

Here is a corrected version of the code above. Use mb_substr to avoid surprises such as an HTML entity ending up with fewer characters than expected, or character counting not working the way it should; in my case, Sábado became Sá:

function encoded_substr($string, $param, $param2){
  $s = html_entity_decode($string);
  $sub = mb_substr($s, $param, $param2);
  return htmlentities($sub);
}
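
One possible refinement, offered as a sketch rather than as part of the answer above, is to spell out the flags and the character set instead of relying on defaults. As far as I know, PHP versions before 8.1 default to ENT_COMPAT, which leaves &#039; undecoded, so the result could otherwise differ between versions. The function name encoded_mb_substr is made up for this example, and UTF-8 input is assumed.

```php
<?php
// Hypothetical variant of encoded_substr() with explicit flags and charset.
function encoded_mb_substr(string $string, int $start, ?int $length = null): string
{
    // ENT_QUOTES makes decoding and encoding treat single quotes as well.
    $flags = ENT_QUOTES | ENT_HTML401;

    $decoded = html_entity_decode($string, $flags, 'UTF-8');
    $sub = mb_substr($decoded, $start, $length, 'UTF-8');

    return htmlentities($sub, $flags, 'UTF-8');
}

echo encoded_mb_substr('that&#039;s really &quot;confusing&quot;', 0, 6), "\n"; // that&#039;s
echo encoded_mb_substr('S&aacute;bado', 0, 2), "\n";                             // S&aacute;
```

Note that the output is re-encoded by htmlentities(), so characters that were literal in the input may come back as named entities.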