Question about the meaning of PHP iconv_strlen()

Posted 2024-11-09 03:01:40

I was wondering what the following sentence means, in simple terms, for us dummies.

And what is a byte sequence? How many characters are in a byte?

iconv_strlen() counts the occurrences of characters in the given byte sequence str on the basis of the specified character set, the result of which is not necessarily identical to the length of the string in byte.


—━☆沉默づ 2024-11-16 03:01:40

Let's take the Japanese character 'こ' as an example. Assuming UTF-8 encoding, this is a 3-byte character (0xE3 0x81 0x93). Let's see what happens when we use strlen:

$ php -r 'echo strlen("こ") . "\n";'
3

The result is 3, since strlen is counting bytes. However, this is only a single character according to UTF-8 encoding. That's where iconv_strlen comes in. It knows that in UTF-8, this is a single character, even though it's made up of 3 bytes. So if we try this instead:

$ php -r 'echo iconv_strlen("こ", "UTF-8") . "\n";'
1

We get 1. That's what that explanation is meant to point out.
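A slightly longer version of the same experiment, as a minimal sketch: it assumes the script is saved as UTF-8 and that PHP's iconv extension is enabled. `bin2hex()` prints the raw bytes in hex, so you can see the three bytes behind 'こ', and the mixed string shows that each ASCII letter takes one byte while each kana takes three.

```php
<?php
// Assumes this file is saved as UTF-8 and the iconv extension is loaded.
$str = "こ";

// The raw bytes of the string, two hex digits per byte.
echo bin2hex($str), "\n";                  // e38193

// strlen() counts bytes.
echo strlen($str), "\n";                   // 3

// iconv_strlen() decodes the bytes as UTF-8 and counts characters.
echo iconv_strlen($str, "UTF-8"), "\n";    // 1

// Mixing ASCII and Japanese: 3 ASCII letters (1 byte each) + 5 kana (3 bytes each).
$mixed = "abcこんにちは";
echo strlen($mixed), "\n";                 // 18
echo iconv_strlen($mixed, "UTF-8"), "\n";  // 8
```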

感情废物 2024-11-16 03:01:40

A string has a particular length in bytes. The number of characters in that string will be equal to the number of bytes if and only if each character in the string is represented by a single byte. This is true, for example, for English letters in ASCII (or UTF-8). For representations (i.e., encodings) that use more than one byte to represent some or all characters, the number of characters will be less than the number of bytes*. It is not possible, for example, to represent all possible Chinese characters with a single byte.

So, given an encoding, iconv_strlen will try to count the number of characters in the string. The byte sequence is simply the string's bytes, in order. For a string containing Chinese text in UTF-8, for example, you might have a 20-byte string that has only 14 characters.

*It could be more, if a character is represented by less than one byte.
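To make the 20-byte/14-character example concrete, here is a sketch with a made-up string and a hypothetical helper, `count_utf8_chars()`, that counts characters the way a UTF-8 decoder sees them: every byte of the form 10xxxxxx is a continuation of the previous character, and every other byte starts a new one. It is only an illustration of the idea, assumes valid UTF-8 input, and is not how iconv_strlen() is implemented internally.

```php
<?php
/**
 * Hypothetical helper, for illustration only: counts the characters in a
 * UTF-8 string by counting every byte that is NOT a continuation byte.
 * Continuation bytes look like 10xxxxxx (0x80..0xBF); any other byte
 * starts a new character. Assumes the input is valid UTF-8.
 */
function count_utf8_chars(string $bytes): int
{
    $count = 0;
    $len = strlen($bytes); // length in bytes
    for ($i = 0; $i < $len; $i++) {
        // ord() gives the byte value 0..255; keep only the top two bits.
        if ((ord($bytes[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}

// Made-up example: "PHP " (4 bytes) + 3 Chinese characters (3 bytes each)
// + " string" (7 bytes) = 20 bytes, 14 characters.
$s = "PHP 中文字 string";

echo strlen($s), "\n";                 // 20
echo count_utf8_chars($s), "\n";       // 14
echo iconv_strlen($s, "UTF-8"), "\n";  // 14
```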

一梦浮鱼 2024-11-16 03:01:40

iconv_strlen() counts the occurrences of characters in the given byte sequence str on the basis of the specified character set, the result of which is not necessarily identical to the length of the string in byte.

Translations:

  • byte sequence: another word for string, which is a sequence of bytes (1 byte = 8 bits), e.g.: 01011010 00011001 01101011. Byte sequences represent characters like A, B, C etc.
  • character set: a.k.a. encoding, specifies how bytes map to characters; e.g. 01000001 represents A in the ASCII character set.
  • not necessarily identical to the length […] in byte: in the ASCII character set, one byte represents exactly one character. This is not the case for all character sets; in some, two, three or more bytes are used to represent one character. That is because one byte can only hold 256 different values, and some languages are written using more than 256 characters (like Chinese and Japanese). Unicode even attempts to map all characters of all human languages into a single character set, which needs more than one byte for most characters: UTF-8, for example, uses one to four bytes per character, with ASCII characters still taking a single byte.
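The same characters become different byte sequences depending on the character set. The sketch below assumes a UTF-8 source file and an iconv build that supports UTF-16BE and UTF-32BE (glibc's does). It checks the 01000001 = A example from the list, then re-encodes the two-character string "Aこ" and prints the bytes, the byte length and the iconv_strlen() result for each charset.

```php
<?php
// Assumes this file is saved as UTF-8 and iconv supports UTF-16BE/UTF-32BE.

// The byte 01000001 (decimal 65) is "A" in ASCII, and also in UTF-8.
echo sprintf("%08b", ord("A")), "\n";   // 01000001

$text = "Aこ"; // 2 characters
$results = [];

foreach (["UTF-8", "UTF-16BE", "UTF-32BE"] as $charset) {
    // Re-encode the same two characters into another character set.
    $bytes = iconv("UTF-8", $charset, $text);
    // Byte length vs. character count, counted in the matching charset.
    $results[$charset] = [strlen($bytes), iconv_strlen($bytes, $charset)];
    printf("%-8s %-16s bytes=%d chars=%d\n",
        $charset, bin2hex($bytes), strlen($bytes), iconv_strlen($bytes, $charset));
}
// UTF-8    41e38193         bytes=4 chars=2
// UTF-16BE 00413053         bytes=4 chars=2
// UTF-32BE 0000004100003053 bytes=8 chars=2
```

The character count stays at 2 only because each call passes the charset the bytes are actually in; calling iconv_strlen() on the UTF-32BE bytes with "UTF-8" would not give 2.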

In summary:

iconv_strlen() counts the characters in the given string, taking into account the character set. Therefore, the number of characters may not be equal to the number of bytes.
