How to substr a string containing HTML entities properly?
I have something like this:
$mytext = "that&#039;s really &quot;confusing&quot; and &lt;absolutly&gt; silly";
echo substr($mytext, 0, 6);
The output in this case will be: that&#
instead of: that's
What I want is to count each HTML entity as 1 character and then substr, because I always end up with broken HTML or some obscure characters at the end of the text.
Please don't suggest that I HTML-decode it, then substr, then encode it again; I want a clean method :)
Thanks
There are two ways of doing this:
(1) You can decode the HTML entities, substr() and then encode; or
(2) You can use a regular expression.
(1) uses html_entity_decode() and htmlentities().
(2) might use a pattern that says: find me up to 5 occurrences of the preceding expression from the beginning of the string. The preceding expression is either:
- any character that isn't an ampersand; or
- an ampersand, followed by anything up to and including a semicolon (i.e. an HTML entity).
This isn't perfect, so I would favour (1).
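As a rough sketch of option (2), the description above could be turned into something like the following. The helper name entity_aware_prefix and the exact pattern are assumptions for illustration, not necessarily the expression the answer had in mind, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: returns the first $limit "characters" of $html,
// treating each HTML entity (&...;) as a single character.
function entity_aware_prefix(string $html, int $limit): string
{
    $limit = max(0, $limit);
    // Either an ampersand followed by anything up to and including ";",
    // or any single character that is not an ampersand. The "u" modifier
    // makes the pattern work on UTF-8 characters rather than bytes.
    $pattern = '/^(?:&[^;]*;|[^&]){0,' . $limit . '}/su';
    if (preg_match($pattern, $html, $match) === 1) {
        return $match[0];
    }
    return ''; // preg_match failed, e.g. on invalid UTF-8
}

$mytext = 'that&#039;s really &quot;confusing&quot; and &lt;absolutly&gt; silly';
echo entity_aware_prefix($mytext, 6), "\n"; // that&#039;s
```

This shares the weakness the answer mentions: a bare "&" that is not part of an entity either swallows everything up to the next semicolon or stops the match early.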
There, I copy-pasted cletus' code into a function for you. Now you can call a very straightforward 3-line function with 1 line of code. If this isn't "clean" then I'm confused about what "clean" means.
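A wrapper along those lines might look like the sketch below. The function name html_substr is hypothetical, and the explicit ENT_QUOTES flag and 'UTF-8' charset are assumptions added here because the defaults of these functions differ between PHP versions.

```php
<?php
// Hypothetical wrapper around the decode / substr / encode steps.
// Assumes the input is UTF-8 text with HTML entities.
function html_substr(string $html, int $start, int $length): string
{
    $decoded = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
    $cut = substr($decoded, $start, $length); // note: counts bytes, not characters
    return htmlentities($cut, ENT_QUOTES, 'UTF-8');
}

echo html_substr('that&#039;s really &quot;confusing&quot;', 0, 6), "\n"; // that&#039;s
```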
Be advised that some characters break the proposed decoding + encoding if you use substr(). substr() counts bytes, so it can cut a multibyte character in half, and htmlentities() may then return an empty string for the invalid input.
Example
Will output:
(empty string)
The solution
Use mb_substr().
Will output:
’
on my Fitness…
In
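To illustrate the point about substr() versus mb_substr(), here is a small sketch. The input string is hypothetical (it is not the example from the answer), UTF-8 and the mbstring extension are assumed, and the flags are passed explicitly so that htmlentities() does not substitute invalid bytes.

```php
<?php
// Hypothetical input, chosen so that the first character is multibyte.
$html = '&rsquo;on my Fitness&hellip;';
$decoded = html_entity_decode($html, ENT_QUOTES, 'UTF-8');

// substr() counts bytes: the decoded right single quote is three bytes in
// UTF-8, so taking one byte leaves an invalid sequence, and htmlentities()
// without ENT_SUBSTITUTE or ENT_IGNORE returns an empty string for it.
$byteCut = htmlentities(substr($decoded, 0, 1), ENT_QUOTES, 'UTF-8');

// mb_substr() counts characters, so the multibyte character stays whole.
$charCut = htmlentities(mb_substr($decoded, 0, 1, 'UTF-8'), ENT_QUOTES, 'UTF-8');

var_dump($byteCut); // string(0) ""
var_dump($charCut); // string(7) "&rsquo;"
```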
Please try the following functions.
Well, there is only one clean method:
Not to use entities at all.
There is not a single reason to substr an entity-encoded string. Entities should be used for output only.
So, first substr, then encode.
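In code, that order could look like the sketch below, assuming the raw, unencoded text is available as UTF-8. Using mb_substr() instead of substr() is an assumption added here so that multibyte characters are not split.

```php
<?php
// Assumes $raw holds the unencoded text, e.g. as stored in the database.
$raw = 'that\'s really "confusing" and <absolutly> silly';

// Cut the raw text first, counting characters rather than bytes.
$short = mb_substr($raw, 0, 6, 'UTF-8');

// Encode only at output time.
echo htmlspecialchars($short, ENT_QUOTES, 'UTF-8'), "\n"; // that&#039;s
```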
Here is a correction for the code with the syntax error. Use mb_substr to avoid surprises such as an HTML entity coming out with fewer characters, or character counting not working the way it should; in my case Sábado was becoming Sá:
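A possible shape for such a corrected function is sketched below. The name html_mb_substr is hypothetical, and UTF-8 input and the mbstring extension are assumed.

```php
<?php
// Hypothetical helper: decode, cut by characters, then encode again.
function html_mb_substr(string $html, int $start, int $length): string
{
    $decoded = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
    $cut = mb_substr($decoded, $start, $length, 'UTF-8'); // counts characters
    return htmlentities($cut, ENT_QUOTES, 'UTF-8');
}

$decoded = html_entity_decode('S&aacute;bado', ENT_QUOTES, 'UTF-8');
echo substr($decoded, 0, 3), "\n";                // Sá (3 bytes, but only 2 characters)
echo html_mb_substr('S&aacute;bado', 0, 3), "\n"; // S&aacute;b
```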