What problems arise when migrating legacy Perl code to UTF-8?
Until now, the project I work on has used only ASCII in its source code. Because of several upcoming changes in the I18N area, and because we need some Unicode strings in our tests, we are thinking about biting the bullet and moving the source code to UTF-8, using the utf8 pragma (use utf8;).
Since the code is ASCII-only now, I don't expect any trouble with the code itself. However, I'm not sure which side effects we might run into, and I think some are quite likely given our environment (perl5.8.8, Apache2, mod_perl, MSSQL Server with FreeTDS driver).
If you have done such migrations in the past: what problems can I expect? How can I manage them?
The utf8 pragma merely tells Perl that your source code is UTF-8 encoded. If you have only used ASCII in your source, Perl won't have any problems understanding it. You might want to make a branch in your source control just to be safe. :)

If you need to read UTF-8 data from files, or write UTF-8 to files, you'll need to set the encodings on your filehandles and encode your data the way the external consumer expects it. See, for instance, the question "With a utf8-encoded Perl script, can it open a filename encoded as GB2312?".
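As a rough illustration of the two points above, here is a minimal sketch. It assumes the script file itself is saved as UTF-8, and the variable names are made up for the example. It shows how the pragma changes the way a literal is read, and that characters must be encoded to bytes before they leave the program.

```perl
use strict;
use warnings;
use utf8;                      # assumption: this file is saved as UTF-8
use Encode qw(encode);

# With the pragma, the literal is read as a single character.
my $with_pragma = length "é";

# Without it, the same two source bytes are read as two characters.
my $without_pragma = do { no utf8; length "é" };

# Encode characters to bytes before handing them to anything external.
my $bytes = encode('UTF-8', "é");

printf "with pragma: %d, without pragma: %d, encoded bytes: %d\n",
    $with_pragma, $without_pragma, length $bytes;
```

For ASCII-only source both lengths are identical, which is why adding the pragma to existing ASCII code should not change how Perl reads it.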
Check out the Perl documentation that tells you about Unicode:
Also see Juerd's Perl Unicode Advice.
A few years ago I moved our in-house mod_perl platform (~35k LOC) to UTF-8. Here are the things which we had to consider/change:
open($fh,"<:utf8",$filename)
:raw layer (on older perls, even 5.8.x versions)
$b=substr(lc($utf8string),0,2048) fails randomly, but $a=lc($utf8string);$b=substr($a,0,2048) works!
$uri=utf_decode($r->uri())
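The filehandle and substr/lc items above might look roughly like this in practice. This is only a sketch under a few assumptions: the temporary file and variable names are invented for the example, and it uses the stricter :encoding(UTF-8) layer where the answer uses :utf8. It does not reproduce the random failure described, it only shows the two-step form that reportedly worked.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Seven characters, written with escapes so the example does not
# depend on how this file is saved.
my $text = "Gr\x{fc}\x{df}e \x{263a}";

my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close $tmp_fh;

# Text out: the layer encodes characters to UTF-8 bytes.
open(my $out, '>:encoding(UTF-8)', $filename) or die "open: $!";
print {$out} $text;
close $out;

# Text in: the layer decodes the bytes back into characters.
open(my $in, '<:encoding(UTF-8)', $filename) or die "open: $!";
my $chars = do { local $/; <$in> };
close $in;

# Binary in: :raw hands back the bytes untouched.
open(my $raw, '<:raw', $filename) or die "open: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# Two-step form of lc + substr, as in the workaround above.
my $lower  = lc $chars;
my $prefix = substr($lower, 0, 5);

printf "%d characters, %d bytes\n", length $chars, length $bytes;
```

Reading the same file through both layers makes the difference visible: the decoded string has fewer elements than the raw one, because the non-ASCII characters take more than one byte each.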
One more thing, and this is the golden rule: don't just hack until it works. Make sure you fully understand what is happening in a given encoding or decoding situation!
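One way to check your understanding of a given situation is to look at the actual bytes. The sketch below is an illustration only: the input value and the hex_dump helper are hypothetical. It shows what happens when data that is already encoded gets encoded a second time, a typical result of hacking until the output looks right in one place.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: show each element of a string as two hex digits.
sub hex_dump {
    my ($string) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $string;
}

# UTF-8 bytes as they might arrive from a request or a database driver.
my $incoming = "caf\xC3\xA9";

my $chars = decode('UTF-8', $incoming);   # decode once on the way in
my $once  = encode('UTF-8', $chars);      # encode once on the way out
my $twice = encode('UTF-8', $once);       # bug: encoding bytes again

print "incoming: ", hex_dump($incoming), "\n";
print "once:     ", hex_dump($once), "\n";
print "twice:    ", hex_dump($twice), "\n";
```

The once-encoded output matches the incoming bytes, while the double-encoded value is longer, because each byte above 0x7F was treated as a character of its own and encoded again.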
I'm sure you already had most of these sorted out, but hopefully all of this helps someone out there avoid the many hours of debugging that we went through.