Perl: messed-up encoding after text concatenation
I have encountered a weird situation while updating/upgrading some legacy code.
I have a variable which contains HTML. Before I can output it, it has to be filled with lots of data. In essence, I have the following:
for my $line (@lines) {
    $output = loadstuff($line, $output);
}
Inside loadstuff(), there is the following:
sub loadstuff {
    my ($line, $output) = @_;
    # The process is simplified here for better understanding.
    my $stuff = getOtherStuff($line);
    my $result = $output.$stuff;
    return $result;
}
This function builds a page which consists of different areas. Each area is loaded independently, which is why there is a for loop.
Trouble starts right about here. When I load the page from the ground up (click on a link, Perl executes and delivers HTML), everything loads fine. Whenever I load a second page via AJAX for comparison, that HTML has broken encoding.
I tracked the problem down to this line: my $result = $output.$stuff. Before the concatenation, $output and $stuff are fine. But afterward, the encoding in $result is messed up.
Does somebody have a clue why concatenation messes up my encoding? While we are on the subject, why does it only happen when the call is done via AJAX?
Edit 1
The Perl and the AJAX call both execute the very same functions for building up a page. So, whenever I fix it for AJAX, it is broken for freshly reloaded pages. It really seems to happen only if AJAX starts the call.
The only difference in this particular case is that the current values for the page are compared with older ones (it is a backup/restore function). From here on, everything is the same. The encoding in the variables (as far as I can tell) is OK. I even tried the Encode functions only on the values loaded via AJAX, but to no avail. The files themselves seem to be utf8 according to "Kate".
Besides that, I have another function with the same behavior which uses the EXACT same functions, values and files. When the call is started from Perl/Apache, the encoding is OK. Via AJAX, again, it is messed up.
I have been examining the AJAX request (jQuery) and could not find anything odd. The encoding seems to be utf8 too.
Comments (3)
Perl has a “utf8” flag for every scalar value, which may be “on” or “off”. “On” state of the flag tells perl to treat the value as a string of Unicode characters.
If you concatenate a string whose utf8 flag is off with one whose flag is on, Perl upgrades the first one to its internal Unicode representation, treating each of its bytes as a Latin-1 character. If those bytes were actually UTF-8 encoded text, the result is garbled. This is the usual source of problems.
Before concatenation, you need to convert both variables either to bytes with Encode::encode() or to Perl's internal format with Encode::decode(). See perldoc Encode.
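The effect described above can be reproduced in isolation. The following is a minimal sketch, not the asker's actual code: $bytes and $chars are hypothetical stand-ins for $output and $stuff, assuming one of them holds undecoded UTF-8 bytes and the other a decoded character string.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-ins for $output and $stuff from the question.
my $bytes = "\xC3\xA4";     # UTF-8 encoded bytes for "ä", utf8 flag off
my $chars = "\x{20AC}";     # character string (euro sign), utf8 flag on

# Mixing them: Perl upgrades $bytes as if each byte were a Latin-1
# character, so the two bytes turn into two separate characters.
my $mixed = $bytes . $chars;

# Decoding the byte string first keeps "ä" as a single character.
my $clean = decode('UTF-8', $bytes) . $chars;

printf "mixed: %d characters, clean: %d characters\n",
    length($mixed), length($clean);
```

Encoding $mixed back to UTF-8 for output yields double-encoded bytes for the "ä", which is the typical "broken encoding" seen in the browser.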
Expanding on the previous answer, here's a little more information that I found useful when I started messing with character encodings in Perl.
This is an excellent introduction to Unicode in Perl: http://perldoc.perl.org/perluniintro.html. The section "Perl's Unicode Model" is particularly relevant to the issue you're seeing.
A good rule to use in Perl is to decode data to Perl characters on its way in and encode it into bytes on its way out. You can do this explicitly using Encode::encode and Encode::decode. If you're reading from or writing to a filehandle, you can specify an encoding on the filehandle by using binmode and setting a layer: perldoc -f binmode
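As a rough illustration of the "decode on the way in, encode on the way out" rule, here is a small sketch. The sample data and variable names are made up for the example, and an in-memory filehandle stands in for a real file or socket.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Incoming data: raw UTF-8 bytes, e.g. from a request body or a file.
my $raw_in = "Gr\xC3\xBC\xC3\x9Fe";    # "Grüße" as UTF-8 bytes

# Decode at the boundary, then work with characters inside the program.
my $text = decode('UTF-8', $raw_in);
my $page = '<p>' . $text . '</p>';

# Option 1: encode explicitly on the way out.
my $raw_out = encode('UTF-8', $page);

# Option 2: let an I/O layer do the encoding
# (in-memory filehandle used here only for the demo).
my $buffer = '';
open(my $fh, '>', \$buffer) or die "cannot open in-memory handle: $!";
binmode($fh, ':encoding(UTF-8)');
print {$fh} $page;
close($fh);
```

Both options should produce the same bytes. The point is that every string inside the program is a character string, so concatenation never mixes the two kinds.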
You can tell which of the strings in your example has been decoded into Perl characters using Encode::is_utf8.
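The original example appears to be missing here. A possible check, with made-up values assigned to $output and $stuff purely for illustration, could look like this:

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

# Made-up values: one string left as raw bytes, one decoded.
my $output = "caf\xC3\xA9";                    # raw UTF-8 bytes, not decoded
my $stuff  = decode('UTF-8', "caf\xC3\xA9");   # decoded character string

for my $pair ([output => $output], [stuff => $stuff]) {
    my ($name, $value) = @$pair;
    printf "%-6s flag=%s length=%d\n",
        $name, (is_utf8($value) ? 'on' : 'off'), length($value);
}
```

Note that the flag describes Perl's internal representation, so treat it as a debugging hint rather than proof that a string was decoded correctly.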
A colleague of mine found the answer to this problem. It really had something to do with the fact that AJAX started the call.
The file structure is as follows:
1 Handler, accessed by Apache
1 Handler, accessed by Apache, which contains only AJAX responders. We call it the AJAX-Handler
1 package, which contains functions relevant to the entire software and which accesses yet other packages from our own framework
Inside of the AJAX-Handler, we print the result as such
Now, when I replace $r->print($output); with print($output);, the problem disappears! I know that this is not the recommended way to print stuff in mod_perl, but this seems to work. Still, any ideas on how to do this the proper way are welcome.
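One approach that may be worth trying, following the advice in the other answers, is to keep $r->print but encode the output to bytes explicitly and declare the charset. This is only a sketch and was not confirmed in this thread: it assumes $output is a decoded character string, send_html is a hypothetical helper, and FakeRequest is a stand-in for the mod_perl request object so the example runs outside Apache.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for the mod_perl request object. In a real
# handler, $r is supplied by mod_perl and this package is not needed.
{
    package FakeRequest;
    sub new          { return bless { type => '', body => '' }, shift }
    sub content_type { my ($self, $type) = @_; $self->{type} = $type; return }
    sub print        { my ($self, @data) = @_; $self->{body} .= join('', @data); return }
}

# Hypothetical helper: encode once, at the output boundary.
sub send_html {
    my ($r, $html) = @_;
    $r->content_type('text/html; charset=UTF-8');
    $r->print(encode('UTF-8', $html));   # hand bytes, not characters, to the server
    return;
}

my $r = FakeRequest->new;
send_html($r, "<p>\x{20AC} 10</p>");
```

If the strings reaching the handler are a mix of bytes and characters, they would still need to be decoded consistently before this point.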