How do I substr HTML entities correctly?

Posted on 2024-08-29 13:17:44

I have this:

$mytext="that&#039;s really &quot;confusing&quot; and &lt;absolutly&gt; silly";
echo substr($mytext,0,6);

The output in this case is that&# instead of that's.

What I want is to count each HTML entity as one character and then take the substring, because I always end up with broken HTML or stray characters at the end of the text.

Please don't suggest decoding the HTML, taking the substring, and then re-encoding it. I want a clean method :)

Thanks


眼波传意 2024-09-05 13:17:44

There are two ways of doing this:

You can decode the HTML entities, substr() and then encode; or
You can use a regular expression.

(1) uses html_entity_decode() and htmlentities():

$s = html_entity_decode($mytext);
$sub = substr($s, 0, 6);
echo htmlentities($sub);

(2) might be something like:

if (preg_match('!^([^&]|&(?:.*?;)){0,5}!s', $mytext, $match)) {
  echo $match[0];
}

What this says is: find up to 5 occurrences of the preceding expression, starting from the beginning of the string. The preceding expression is either:

any character that isn't an ampersand; or
an ampersand, followed by anything up to and including a semicolon (i.e. an HTML entity).

This isn't perfect (a bare ampersand that does not start an entity would swallow everything up to the next semicolon), so I would favour (1).
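As a rough illustration of the regular-expression route, the sketch below treats each well-formed entity as one unit and every other character as one unit, then slices the list of units. The function name entity_substr and the entity pattern are assumptions made for this example, not something from the thread, and the sketch assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper (not from the thread): an entity-aware substring.
function entity_substr(string $text, int $start, int $length): string
{
    // One unit is either a well-formed entity (&name; &#123; &#x1F;)
    // or any single character; the /u flag makes "." match a whole
    // UTF-8 code point and /s lets it match newlines too.
    $unit = '/&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);|./su';

    // preg_match_all() returns false on failure, e.g. invalid UTF-8 input.
    if (preg_match_all($unit, $text, $matches) === false) {
        return '';
    }

    // Slice by units rather than bytes, then glue the units back together.
    return implode('', array_slice($matches[0], $start, $length));
}

$mytext = "that&#039;s really &quot;confusing&quot; and &lt;absolutly&gt; silly";
echo entity_substr($mytext, 0, 6), "\n";   // that&#039;s
echo entity_substr($mytext, 0, 13), "\n";  // that&#039;s really
```

Because the entity alternative requires a proper name or number before the semicolon, a stray ampersand simply counts as one ordinary character instead of consuming text up to the next semicolon.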


初心未许 2024-09-05 13:17:44


function encoded_substr($string, $start, $length){
  // Decode entities, cut the plain text, then encode again.
  $s = html_entity_decode($string);
  $sub = substr($s, $start, $length);
  return htmlentities($sub);
}

There, I copy-pasted cletus' code into a function for you. Now you can call a very straightforward 3-line function with 1 line of code. If this isn't "clean", then I'm confused about what "clean" means.


久光 2024-09-05 13:17:44

Be advised that some characters break the proposed decoding + encoding if you use substr(), because substr() counts bytes and can cut a multibyte character in half.

Example

$string=html_entity_decode("Workin&rsquo; on my Fitness&hellip;In the Backyard.");

echo $string;
echo substr($string,0,25);
echo htmlentities(substr($string,0,25));

Will output:

Workin’ on my Fitness…In the Backyard.
Workin’ on my Fitness�
(empty string)

The solution

Use mb_substr().

echo mb_substr($string,0,25);
echo htmlentities(mb_substr($string,0,25));

Will output:

Workin’ on my Fitness…In
Workin&rsquo; on my Fitness&hellip;In


琴流音 2024-09-05 13:17:44

Please try the following function.

<?php

$mytext="that&#039;s really &quot;confusing&quot; and &lt;absolutly&gt; silly";

echo limit_text($mytext,6);

function limit_text($text,$limit){
   // Collect every entity-like run (ungreedy: "&" up to the next ";").
   preg_match_all("/&(.*);/U", $text, $pat_array);
   $additional=0;

   // For each of the first $limit entities found, extend the limit by the
   // entity's length minus the one character it displays as.
   foreach ($pat_array[0] as $key => $value) {
     if($key <$limit){$additional += (strlen($value)-1);}
   }
   $limit+=$additional;

   if(strlen($text)>$limit){
     $text = substr( $text,0,$limit );
     // Drop the trailing partial word; keep the text if it has no space.
     $lastSpace = strrpos($text,' ');
     if($lastSpace !== false){$text = substr( $text,0,$lastSpace );}
   }
   return $text;

}

?>


戏舞 2024-09-05 13:17:44

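Well, there is only one clean method:
don't use entities at all.
There is not a single reason to run substr on an entity-encoded string. Encoding is for output only.
So substr first, then encode.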
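A minimal sketch of that order of operations, assuming the text is kept unencoded as UTF-8 and that the mbstring extension is available. The variable names and the sample value are illustrative only.

```php
<?php
// Raw, unencoded text as it might be stored (illustrative value).
$raw = 'that\'s really "confusing" and <absolutly> silly';

// Truncate the raw text first; mb_substr() counts characters, not bytes.
$short = mb_substr($raw, 0, 6, 'UTF-8');

// Encode only at output time.
$html = htmlspecialchars($short, ENT_QUOTES, 'UTF-8');
echo $html, "\n";   // that&#039;s
```

Since the cut happens before any entity exists, there is nothing for it to break.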


灼痛 2024-09-05 13:17:44

Here is a correction of the function above. Use mb_substr to avoid surprises such as the decoded text having fewer characters than bytes, in which case character counting does not work the way it should; in my case, Sábado was becoming Sá:

function encoded_substr($string, $start, $length){
  // Decode entities, cut by characters (not bytes), then encode again.
  $s = html_entity_decode($string);
  $sub = mb_substr($s, $start, $length);
  return htmlentities($sub);
}
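One further caveat, offered as a sketch rather than a definitive version: the default flags of html_entity_decode() and htmlentities() changed in PHP 8.1 (before that, ENT_COMPAT left &#039; undecoded), and the default charset depends on configuration. Passing both explicitly makes the round trip predictable. The name encoded_substr_utf8 and the choice of UTF-8 are assumptions made for this example.

```php
<?php
// Hypothetical variant of encoded_substr() with explicit flags and charset.
function encoded_substr_utf8(string $string, int $start, ?int $length = null): string
{
    // ENT_QUOTES handles both quote styles; ENT_HTML401 keeps &#039;
    // as the encoded form of a single quote.
    $flags = ENT_QUOTES | ENT_HTML401;

    $decoded = html_entity_decode($string, $flags, 'UTF-8');
    $sub = mb_substr($decoded, $start, $length, 'UTF-8');

    return htmlentities($sub, $flags, 'UTF-8');
}

echo encoded_substr_utf8("that&#039;s really &quot;confusing&quot;", 0, 6), "\n"; // that&#039;s
echo encoded_substr_utf8("S&aacute;bado", 0, 3), "\n";                             // S&aacute;b
```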

