Encoding - string byte length

Posted 2025-02-09 01:40:30

I have a file, file1.pl:

use strict;
use warnings;
use Encode;

my @flist = `svn diff --summarize ...`;

foreach my $file (@flist) {
  my $foo = "$one/$file";
  use bytes;
  print(bytes::length($one)."\n");
  print(bytes::length($file)."\n");
  print(bytes::length($foo)."\n");
}
# 76
# 31
# 108

and a second file, file2.pl, with the same main logic. But in file2.pl the output is:

# 76
# 31
# 110 <-- ?

Both files have the same encoding (ISO-8859-1). To get the same result as in file1.pl, I have to use

my $foo = "$one/".decode('UTF-8', $file);

in file2.pl. What could be the reason for that difference, and why is decode('UTF-8', $file) required in file2.pl? It seems to be related to "What if I don't decode?", but in what way, and why only in file2.pl? Thanks.

Perl v5.10.1

Comments (1)

纸伞微斜 2025-02-16 01:40:30

Don't use bytes.

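The documentation for bytes says: "Use of this module for anything other than debugging purposes is strongly discouraged."

bytes::length returns the length of a string's internal storage. That depends on how Perl happens to store the string, not on what the string contains, so it's useless.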


"What could be the reason for that difference?"

$one and $file contained strings stored using different internal storage formats. One of them had to be converted before the two could be concatenated, and that conversion changed the number of bytes in the internal storage.
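The question doesn't show how $one is set in either file, so the following is only one scenario that happens to match the reported numbers, not a diagnosis. It assumes $file holds raw bytes containing one two-byte UTF-8 sequence, and that in file2.pl $one ended up in the UTF-8 internal format (simulated here with utf8::upgrade). The values and the internal_len helper are made up for illustration.

use strict;
use warnings;
use bytes qw( );
use Encode qw( decode );

# Hypothetical stand-ins for the question's variables.
my $one = "a" x 76;                      # 76 ASCII characters
utf8::upgrade( $one );                   # switch to the UTF-8 internal format; content unchanged
my $file = ( "b" x 29 ) . "\xC3\xA9";    # 31 raw bytes: 29 ASCII + the UTF-8 encoding of U+00E9

# Debugging helper only: size of the internal storage.
sub internal_len { return bytes::length( $_[0] ) }

# $file is upgraded byte by byte, so 0xC3 and 0xA9 each grow to two bytes.
my $raw = "$one/$file";

# Decoding first turns the two bytes into one character (two bytes internally).
my $decoded = "$one/" . decode( 'UTF-8', $file );

print internal_len( $one ), "\n";        # 76
print internal_len( $file ), "\n";       # 31
print internal_len( $raw ), "\n";        # 110
print internal_len( $decoded ), "\n";    # 108
print length( $raw ), "\n";              # 108
print length( $decoded ), "\n";          # 107

Without the utf8::upgrade line, the undecoded concatenation has 108 bytes of internal storage, which is what file1.pl reported. Note that length() is unaffected by the upgrade in both cases.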

use strict;
use warnings;
use feature qw( say );
use bytes qw( );
use Encode qw( encode );

sub dump_lengths {
   my $s = shift;
   say
      join " ",
         length( $s ),
         length( encode( "UTF-8", $s ) ),
         bytes::length( $s );
}
                         # +------ Length of string
my $x = chr( 0xE9 );     # | +---- Length of its UTF-8 encoding
my $y = chr( 0x2660 );   # | | +-- Length of internal storage
                         # | | |
dump_lengths( $x );      # 1 2 1
dump_lengths( $y );      # 1 3 3

my $z = $x . $y;

dump_lengths( $z );      # 2 5 5
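
If what you actually need is a byte count, a more robust approach is to decode the input once, work with characters, and measure the encoded form explicitly, as the second column above does. Here is a minimal sketch, assuming the command's output is UTF-8. The utf8_byte_length helper and the sample file name are hypothetical.

use strict;
use warnings;
use Encode qw( decode encode );

# Hypothetical helper: number of bytes $s occupies once encoded as UTF-8.
sub utf8_byte_length {
    my ($s) = @_;
    return length( encode( 'UTF-8', $s ) );
}

# Stand-in for one line of command output (assumed to be UTF-8 bytes).
my $raw_line = "caf\xC3\xA9.txt\n";
chomp( my $file = decode( 'UTF-8', $raw_line ) );

print length( $file ), "\n";              # 8 characters
print utf8_byte_length( $file ), "\n";    # 9 bytes

# The result does not depend on the internal storage format.
my $x = chr( 0xE9 );
my $u = $x;
utf8::upgrade( $u );
print utf8_byte_length( $x ), " ", utf8_byte_length( $u ), "\n";   # 2 2

Unlike bytes::length, this gives the same answer no matter how Perl stores the string internally.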