How to strlen a multi-language string
I want to get the strlen() of Shift-JIS and UTF-8 strings, then compare them.
A string can be mixed, for example "ああ12345678sdfdszzz". I tried strlen, but it gives different results. mb_strlen doesn't help either, because this is a mixed string.
For example:
ああ12345678 >> strlen() = 24 chars
ああああああああああああああああ >> strlen() = 48 chars
ああああああああああああああああああ >> strlen() = 54 chars
There seems to be no rule. So what is the best way to calculate the string length and compare the results across languages?
Comments (3)
strlen only counts bytes, so it is only useful for single-byte character encodings; for multi-byte character encodings, use mb_strlen, which counts the actual characters instead.
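As a rough illustration of the difference, here is a minimal sketch. It assumes PHP 7 or later with the mbstring extension enabled; the variable names are only for illustration. The sample string is built from escape sequences so that the byte counts do not depend on how the script file itself is saved.

```php
<?php
// "ああ12345678" encoded as UTF-8 (U+3042 is the hiragana "あ").
$utf8 = "\u{3042}\u{3042}12345678";
// The same text converted to Shift_JIS.
$sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');

// strlen() counts bytes, so the result depends on the encoding.
var_dump(strlen($utf8)); // int(14): 2 characters * 3 bytes + 8 ASCII bytes
var_dump(strlen($sjis)); // int(12): 2 characters * 2 bytes + 8 ASCII bytes

// mb_strlen() counts characters when told which encoding the bytes are in.
var_dump(mb_strlen($utf8, 'UTF-8')); // int(10)
var_dump(mb_strlen($sjis, 'SJIS'));  // int(10)
```

With the encoding passed explicitly, both calls report the same character count, which is what makes the two strings comparable.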
I would write a function to find where each particular encoding begins and ends in the string.
Then I would split the string by encoding, run mb_strlen on each part, and sum the sizes afterwards. Then repeat for the second string and compare.
I guess you understand my point ;)
PS: Use mb_detect_encoding to detect the encoding (see the comments on its manual page for further ideas from the PHP community).
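A simplified sketch of the detect-then-count idea might look like the following. It does not split a string into differently encoded segments; it assumes each string as a whole is in one encoding from a candidate list. The function name `char_length` and the candidate list are hypothetical, chosen only for this example. Detection is a guess: the same bytes can be valid in more than one encoding, so passing the known encoding directly to mb_strlen is more reliable whenever you have it.

```php
<?php
// Hypothetical helper: character count of a string whose encoding is not known in advance.
// Assumes the whole string uses a single encoding from $candidates.
function char_length(string $s, array $candidates = ['ASCII', 'UTF-8', 'SJIS']): int
{
    // Strict mode: only accept an encoding in which $s is entirely valid.
    $enc = mb_detect_encoding($s, $candidates, true);
    if ($enc === false) {
        throw new InvalidArgumentException('Could not detect the encoding');
    }
    // Count characters using the detected encoding.
    return mb_strlen($s, $enc);
}

// Compare two strings by character count rather than by byte count.
function same_length(string $a, string $b): bool
{
    return char_length($a) === char_length($b);
}
```

Usage would be along the lines of `same_length($fromShiftJis, $fromUtf8)`, where both variable names are placeholders for your own input.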