How can I handle unicode with Perl's DBI?
My delicious-to-wp Perl script works, but for all "weird" characters it gives even weirder output.
So I tried
$description = decode_utf8( $description );
but that doesn't make a difference. I would like, e.g., “go live” to come out as “go live” and not as â€œgo liveâ€. How can I handle unicode in Perl so that this works?
UPDATE: I found that the problem was the character set of the DBI connection, which I had to set from Perl:
my $sql = qq{SET NAMES 'utf8';};
$dbh->do($sql);
That was the tricky part I had to set. Thanks!
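The kind of garbling described in the question is what UTF-8 bytes look like when some layer treats them as a single-byte encoding. A minimal sketch with the core Encode module, assuming ISO-8859-1 as the mistaken encoding (the question does not say which one was actually involved):

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The quoted phrase, written with escapes so the encoding of this
# source file does not matter.
my $text = "\x{201C}go live\x{201D}";

# What a UTF-8 connection carries: 3 + 7 + 3 bytes.
my $bytes = encode('UTF-8', $text);

# Assumption for illustration: a layer misreads those bytes as
# ISO-8859-1, so every byte becomes a character of its own.
my $garbled = decode('ISO-8859-1', $bytes);

# Decoding the same bytes as UTF-8 recovers the original text.
my $restored = decode('UTF-8', $bytes);

printf "characters: %d, bytes: %d, garbled characters: %d\n",
    length($text), length($bytes), length($garbled);
```

Each curly quote turns into three characters in the garbled string, which matches the "even weirder output" pattern.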
Comments (6)
It's worth noting that if you're running a recent enough version of DBD::mysql (3.0008 or later), you can do the following:
$dbh->{'mysql_enable_utf8'} = 1;
and then everything is decode()d/encode()d for you on the way out of and into DBI.
Enable UTF8 when you connect to the database, by passing mysql_enable_utf8 in the connect attributes.
This should get you character mode strings with the UTF8 flag set as needed.
For the details, see DBI's General Interface Rules & Caveats and the DBD::mysql documentation for mysql_enable_utf8.
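A minimal sketch of enabling the attribute at connect time. The DSN, user name, password, and the helper names `connect_attributes` and `connect_utf8` are placeholders invented for illustration; only `mysql_enable_utf8` comes from the answers here, and `RaiseError` is a standard DBI attribute:

```perl
use strict;
use warnings;

# Hypothetical connection settings; replace with your own.
my %config = (
    dsn      => 'dbi:mysql:dbname=db_name',
    user     => 'db_user',
    password => 'db_pass',
);

# Attributes handed to DBI->connect. mysql_enable_utf8 asks DBD::mysql
# to decode and encode UTF-8 on the way out of and into the database.
sub connect_attributes {
    return { RaiseError => 1, mysql_enable_utf8 => 1 };
}

# DBI is loaded lazily so this file still compiles on a machine
# without the database driver installed.
sub connect_utf8 {
    require DBI;
    return DBI->connect(@config{qw(dsn user password)}, connect_attributes());
}
```

A caller would then write `my $dbh = connect_utf8();` and use the handle as usual.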
The term
definitely saves the day for accessing a UTF-8 declared database, but take note: if you are going to do any Perl processing of data obtained from the db, it would be wise to store it in a Perl variable as a decoded utf8 string first, as this operation is not implicit.
Of course, for proper I/O handling of utf8 strings (reading, printing, writing to output), remember to set
and
the latter being essential for printing out utf8 strings. Hope this helps.
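A rough sketch of the two steps described above, explicit decoding and an output layer. It assumes the driver hands back raw UTF-8 bytes, uses the core Encode module, and picks `:encoding(UTF-8)` as the output layer; these are common choices, not necessarily the exact calls this answer had in mind, and the sample value is made up:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for a value fetched from the database as raw UTF-8 bytes
# (hypothetical data: "caf" followed by e-acute encoded as C3 A9).
my $raw = "caf\xC3\xA9";

# Step 1: decode explicitly so Perl treats the value as characters.
my $string = decode('UTF-8', $raw);

# Step 2: put an encoding layer on the output handle so characters
# are encoded back to UTF-8 when printed.
binmode(STDOUT, ':encoding(UTF-8)');

print "$string\n";
print length($raw), " bytes, ", length($string), " characters\n";
```

Without step 2, printing a string that contains characters above 255 triggers Perl's "Wide character" warning.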
It may have nothing to do with Perl. Check to make sure you're using UTF encodings in the pertinent MySQL table columns.
Leave this öne out:
when using:
Otherwise your output will have double utf8 encoding, resulting in unreadable double byte characters!
It took me a couple of hours to figure this out..
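What double encoding does to a single character can be seen without a database. A minimal sketch with the core Encode module; the sample character and the helper `hex_bytes` are only for illustration:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

my $char = "\x{F6}";                  # o with diaeresis, one character

my $once = encode('UTF-8', $char);    # correct encoding: two bytes

# Encoding a string that is already UTF-8 bytes treats each byte as a
# character and encodes it again, doubling the length.
my $twice = encode('UTF-8', $once);

print "once:  ", hex_bytes($once),  "\n";   # once:  C3 B6
print "twice: ", hex_bytes($twice), "\n";   # twice: C3 83 C2 B6
```

If two layers both encode on output, for example the driver and an I/O layer, the result is the second form.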
By default, the Perl/MySQL driver handles binary data (at least that is what I concluded from some experiments with MySQL 5.1 and 5.5).
Without setting mysql_enable_utf8, I encoded the strings to UTF-8 before writing them to the database and decoded them from UTF-8 after reading them back.
Do not rely on Perl's internal string representation as an array of bytes. Be aware that the internal 'utf8' is not guaranteed to be standard UTF-8; conversely, the single-byte encoding is not guaranteed to be ISO-8859-1. Really do encode/decode to/from UTF-8 (and not 'utf8').
There are also some MySQL settings regarding the encodings (like SET NAMES above). As far as I remember there is a client encoding, a connection encoding, and a server encoding, and how they interact when they do not all have the same value is not quite clear to me. Setting all of them to UTF-8, together with the recipe above, worked for me.
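The encode-before-write, decode-after-read recipe could be sketched as follows with the core Encode module. The helper names `to_db` and `from_db` are hypothetical, and the database calls themselves are left out:

```perl
use strict;
use warnings;
use Encode qw(encode decode find_encoding);

# 'UTF-8' (with the hyphen) selects the strict, standard encoding;
# 'utf8' selects Perl's more permissive internal variant.
print find_encoding('UTF-8')->name, "\n";   # utf-8-strict
print find_encoding('utf8')->name,  "\n";   # utf8

# Croak on invalid input instead of silently substituting characters,
# and leave the caller's data untouched.
my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();

# Characters to bytes, just before handing a value to the driver.
sub to_db {
    my ($chars) = @_;
    return encode('UTF-8', $chars, $check);
}

# Bytes to characters, right after fetching a value from the driver.
sub from_db {
    my ($bytes) = @_;
    return decode('UTF-8', $bytes, $check);
}
```

With this split, the rest of the script only ever sees character strings, and a malformed byte sequence fails loudly at the boundary.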