Should there be something like 'bytelen' (alongside 'strlen')?

Posted 2024-08-24 08:58:27

In my opinion the 'strlen' function should only return the number of characters in a string. Nothing else. And it does, whether it counts ASCII characters or Unicode characters. A character is a character, pointing to a given position on an ASCII table or a UTF-8 table. Nothing more.

If you would like to know, for whatever reason, the byte length of a string, then you should use a different function. I am a newbie in PHP scripting, so I have not found that function yet. (Should it be something like 'bytelen()'?)


Comments (4)

流星番茄 2024-08-31 08:58:27

mb_strlen() does what you're after.
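To make the difference concrete, here is a minimal sketch comparing the two functions on the same string. It assumes the mbstring extension is loaded and PHP 7 or later (for the "\u{...}" escape, used here so the result does not depend on how the file is saved); the sample string is just an illustration.

```php
<?php
// "héllo": the é (U+00E9) takes two bytes in UTF-8, every other letter one.
$s = "h\u{e9}llo";

// strlen() counts bytes.
echo strlen($s), "\n";             // 6

// mb_strlen() counts characters (code points) in the given encoding.
echo mb_strlen($s, 'UTF-8'), "\n"; // 5
```

Passing the encoding explicitly avoids depending on whatever default the server happens to be configured with.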

2024-08-31 08:58:27

Yes, that would be the most logical design. However, PHP was not planned to support multibyte charsets from the beginning. Instead, it has been evolving over the years in a somewhat chaotic manner. You've tagged your question as PHP 4, but PHP 5 does not have decent Unicode support yet (and I don't think that will change in the near future).

There are a few reasons for this anyway:

  • PHP is not a closed-source commercial product owned by a company with a centralized design controlled by enterprise rules.

  • PHP was released in 1995 as a personal project by someone who needed some functionality in his static home page: at that time, it had no need for Unicode support.

  • If you modify core functions like strlen(), you must do it in a way that doesn't break existing functionality. That's not easy. Writing a new, separate function is much easier.

Update

Sorry, I forgot the second part of your question. If you need to handle Unicode strings you have to use a separate set of functions:

You might also find these chapters interesting:

Please take note of the PHP version required by each function you are planning to use; PHP 4 is pretty old.
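As an illustration of what "a separate set of functions" looks like in practice, here is a small sketch using the mbstring extension next to the byte-oriented core functions. It assumes mbstring is installed and PHP 7 or later (for the "\u{...}" escapes); the sample string is made up for the example.

```php
<?php
// "naïve café": ï (U+00EF) and é (U+00E9) each take two bytes in UTF-8.
$s = "na\u{ef}ve caf\u{e9}";

// Length: bytes versus characters.
echo strlen($s), "\n";                     // 12
echo mb_strlen($s, 'UTF-8'), "\n";         // 10

// Position of a substring: byte offset versus character offset.
echo strpos($s, 'caf'), "\n";              // 7
echo mb_strpos($s, 'caf', 0, 'UTF-8'), "\n"; // 6

// Taking the first five characters keeps the ï intact;
// substr($s, 0, 3) would instead cut the two-byte ï in half.
echo mb_substr($s, 0, 5, 'UTF-8'), "\n";   // naïve
```

The pattern is the same throughout: the core function works on bytes, and the mb_ counterpart takes an encoding and works on characters.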

来世叙缘 2024-08-31 08:58:27

If I'm not grossly misunderstanding you, then strlen() is your 'bytelen()', as alluded to in the other responses here.

strlen() itself has no support for UTF-8 or other multibyte character sets; if you want a proper character-counting strlen(), you'll need mb_strlen().

Pentium10's function strBytes($str), from glancing over it (not testing), looks like it would be a good alternative if you know your encoding is UTF-8 and you're stuck with a very old version of PHP 4 for some reason.

(And I do recommend taking a look at Álvaro G. Vicario's post for the reasons behind this behaviour. Proper, native UTF-8 support is due to come with PHP 6.)
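If you want the name from the question anyway, a thin wrapper is enough. This is only a sketch: bytelen() is a hypothetical helper, not a PHP built-in. It prefers mb_strlen() with the '8bit' encoding, which counts raw bytes even on old setups where mbstring.func_overload redirected strlen() to the multibyte version (that setting was deprecated in PHP 7.2 and removed in PHP 8.0), and falls back to strlen() when mbstring is missing.

```php
<?php
/**
 * Hypothetical helper: byte length of a string, regardless of its encoding.
 */
function bytelen(string $s): int
{
    // '8bit' treats every byte as one unit, so the result is a byte count.
    if (function_exists('mb_strlen')) {
        return mb_strlen($s, '8bit');
    }

    // Without mbstring there is no overloading, so strlen() already counts bytes.
    return strlen($s);
}

echo bytelen("abc"), "\n";       // 3
echo bytelen("\u{20ac}"), "\n";  // 3 (the euro sign is three bytes in UTF-8)
```

On PHP 8 and later the wrapper is equivalent to calling strlen() directly.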

記憶穿過時間隧道 2024-08-31 08:58:27
/**
     * Count the number of bytes of a given string.
     * Input string is expected to be ASCII or valid UTF-8.
     * Warning: the function doesn't return the number of chars
     * in the string, but the number of bytes.
     *
     * @param string $str The string whose bytes are counted
     *
     * @return int The length in bytes of the given string.
     */
    function strBytes($str)
    {
      // STRINGS ARE EXPECTED TO BE IN ASCII OR UTF-8 FORMAT

      // Number of characters in string. mb_strlen() (mbstring extension) is
      // used because plain strlen() counts bytes, not characters, unless
      // mbstring.func_overload is active.
      $strlen_var = mb_strlen($str, 'UTF-8');

      // string bytes counter
      $d = 0;

     /*
      * Iterate over every character in the string, advancing the byte
      * counter by the sequence length announced by each lead byte.
      * The isset() check stops the loop if malformed input would make
      * the counter run past the end of the string.
      */
      for ($c = 0; $c < $strlen_var && isset($str[$d]); ++$c) {

          // $str[$d] replaces $str{$d}: curly-brace offsets were removed in PHP 8.0
          $ord_var_c = ord($str[$d]);

          switch (true) {
              case ($ord_var_c <= 0x7F):
                  // characters U-00000000 - U-0000007F (same as ASCII)
                  $d++;
                  break;

              case (($ord_var_c & 0xE0) == 0xC0):
                  // characters U-00000080 - U-000007FF, mask 110XXXXX
                  // see http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
                  $d+=2;
                  break;

              case (($ord_var_c & 0xF0) == 0xE0):
                  // characters U-00000800 - U-0000FFFF, mask 1110XXXX
                  // see http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
                  $d+=3;
                  break;

              case (($ord_var_c & 0xF8) == 0xF0):
                  // characters U-00010000 - U-001FFFFF, mask 11110XXX
                  // see http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
                  $d+=4;
                  break;

              case (($ord_var_c & 0xFC) == 0xF8):
                  // characters U-00200000 - U-03FFFFFF, mask 111110XX
                  // (5-byte form from the original UTF-8 design; not valid under RFC 3629)
                  $d+=5;
                  break;

              case (($ord_var_c & 0xFE) == 0xFC):
                  // characters U-04000000 - U-7FFFFFFF, mask 1111110X
                  // (6-byte form from the original UTF-8 design; not valid under RFC 3629)
                  $d+=6;
                  break;

              default:
                  // stray continuation byte or invalid lead byte: count it as one byte
                  $d++;
          }
      }

      return $d;
    }
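For reference, the lead-byte masks used above can be checked in isolation. The following is a rough sketch, not a validator: utf8SequenceLength() is a hypothetical helper that only handles the 1- to 4-byte forms allowed by RFC 3629, and the sample string (PHP 7+ "\u{...}" escapes) is made up for the example. Walking a valid UTF-8 string lead byte by lead byte should land on the same total that strlen() reports.

```php
<?php
/**
 * Hypothetical helper: length in bytes of the UTF-8 sequence that starts
 * with the given lead byte, or 0 if the byte cannot start a sequence.
 */
function utf8SequenceLength(int $leadByte): int
{
    if ($leadByte < 0x80) {
        return 1; // 0xxxxxxx: ASCII
    }
    if (($leadByte & 0xE0) === 0xC0) {
        return 2; // 110xxxxx
    }
    if (($leadByte & 0xF0) === 0xE0) {
        return 3; // 1110xxxx
    }
    if (($leadByte & 0xF8) === 0xF0) {
        return 4; // 11110xxx
    }
    return 0;     // continuation byte or invalid lead byte
}

// One character of each length: 1 + 2 + 3 + 4 bytes.
$s = "a\u{e9}\u{20ac}\u{1f600}";

$total = 0;
$n = strlen($s);
for ($i = 0; $i < $n; $i += $len) {
    $len = utf8SequenceLength(ord($s[$i]));
    if ($len === 0) {
        break; // malformed input: stop instead of looping forever
    }
    $total += $len;
}

echo $total, "\n";     // 10
echo strlen($s), "\n"; // 10
```

Since both numbers agree for well-formed input, plain strlen() is the simpler choice whenever it has not been overloaded.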