How do I handle Unicode with Perl's DBI?

Posted on 2024-07-24 08:00:15

My delicious-to-wp Perl script works, but it turns every "weird" character into even weirder output.
So I tried

$description = decode_utf8( $description );

but that doesn't make a difference. I would like, e.g., “go live” to stay “go live” and not become “go live†. How can I handle Unicode in Perl so that this works?
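For readers wondering where output like this comes from, here is a minimal sketch of the usual mechanism, assuming the database hands back UTF-8 bytes and something later encodes them a second time. The variable names are made up for illustration, and nothing here talks to a real database.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The character string the author wants to see: curly quotes around "go live".
my $text = "\x{201C}go live\x{201D}";

# What a UTF-8 column hands back when nothing decodes it: raw bytes.
my $bytes = encode('UTF-8', $text);

# Treating every byte as a Latin-1 character and encoding again gives mojibake.
my $double = encode('UTF-8', decode('ISO-8859-1', $bytes));

# Decoding the bytes exactly once recovers the original characters.
my $fixed = decode('UTF-8', $bytes);

binmode STDOUT, ':encoding(UTF-8)';
printf "characters: %d, bytes: %d, double-encoded bytes: %d\n",
    length $text, length $bytes, length $double;
print "decoding once restores the text\n" if $fixed eq $text;
```

The point of the sketch is only that the data must be decoded once and encoded once; where that happens (driver, script, or output layer) is a separate choice.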

UPDATE: I found that the problem was the connection character set, which I had to set through DBI in Perl:

my $sql = qq{SET NAMES 'utf8';};
$dbh->do($sql);

That was the missing piece, and it was tricky to find. Thanks!

Comments (6)

心如狂蝶 2024-07-31 08:00:15

It's worth noting that if you're running a recent enough version of DBD::mysql (3.0008 or later), you can set $dbh->{'mysql_enable_utf8'} = 1; and everything is then decode()d/encode()d for you on the way out of and into DBI.

迷离° 2024-07-31 08:00:15

Enable UTF-8 when you connect to the database, like this:

my $dbh = DBI->connect(
    "dbi:mysql:dbname=db_name",
    "db_user", "db_pass",
    { RaiseError => 0, PrintError => 0, mysql_enable_utf8 => 1 },
) or die "Connect to database failed: $DBI::errstr";

This should get you character mode strings with the UTF8 flag set as needed.

From DBI General Interface Rules & Caveats:

Perl supports two kinds of strings: Unicode (utf8 internally) and non-Unicode (defaults to iso-8859-1 if forced to assume an encoding). Drivers should accept both kinds of strings and, if required, convert them to the character set of the database being used. Similarly, when fetching from the database character data that isn't iso-8859-1 the driver should convert it into utf8.

And the specifics from the DBD::mysql documentation for mysql_enable_utf8:

Additionally, turning on this flag tells MySQL that incoming data should be treated as UTF-8. This will only take effect if used as part of the call to connect(). If you turn the flag on after connecting, you will need to issue the command SET NAMES utf8 to get the same effect.
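A minimal sketch of the "flag turned on after connecting" case described in that quote. The helper name enable_utf8_after_connect is hypothetical, and the sketch assumes $dbh is an already connected DBD::mysql handle.

```perl
use strict;
use warnings;

# Hypothetical helper: switch an already open DBD::mysql handle to UTF-8.
# Both steps are needed because the flag alone only takes full effect
# when it is passed to connect().
sub enable_utf8_after_connect {
    my ($dbh) = @_;
    $dbh->{mysql_enable_utf8} = 1;    # driver decodes/encodes strings
    $dbh->do(q{SET NAMES utf8});      # server treats the connection as UTF-8
    return $dbh;
}

# Usage, assuming $dbh came from DBI->connect(...) without the flag:
#   enable_utf8_after_connect($dbh);
```

Passing mysql_enable_utf8 => 1 directly to connect() remains the simpler route when you control the connect call.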

同展鸳鸯锦 2024-07-31 08:00:15

The statement

$dbh->do(qq{SET NAMES 'utf8';});

definitely saves the day when accessing a database declared as UTF-8. But take note: if you are going to do any Perl processing on data obtained from the database, it is wise to decode it into a Perl character string first, because this step is not implicit:

$utfstring = decode('utf8',$string_from_db);

Of course, for proper I/O handling of UTF-8 strings (reading, printing, writing to output), remember to set

use open ':utf8';

and

binmode STDOUT, ":utf8";

The latter is essential for printing UTF-8 strings. Hope this helps.
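The decode-then-print steps above can be sketched as follows, assuming the row arrives as undecoded UTF-8 bytes (that is, mysql_enable_utf8 is not set). The helper decode_row and the sample row are hypothetical and stand in for something like a fetchrow_arrayref result.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode every defined column of a fetched row,
# leaving NULLs (undef) untouched.
sub decode_row {
    my ($row) = @_;
    return [
        map {
            my $value = $_;    # copy, so the caller's data is never modified
            defined $value ? decode('UTF-8', $value) : undef;
        } @$row
    ];
}

# Stand-in for a row fetched from the database: "café" as UTF-8 bytes, a NULL, ASCII.
my $row = decode_row([ "caf\xC3\xA9", undef, "plain" ]);

# Encode exactly once on the way out.
binmode STDOUT, ':encoding(UTF-8)';
print join(', ', map { defined $_ ? $_ : 'NULL' } @$row), "\n";
```

Decoding at the boundary keeps functions such as length and regular expressions working on characters rather than bytes.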

痕至 2024-07-31 08:00:15

It may have nothing to do with Perl. Check that the relevant MySQL table columns actually use a UTF-8 character set.

假装爱人 2024-07-31 08:00:15

Leave this öne out:

binmode STDOUT, ":utf8";

when using:

$dbh->do(qq{SET NAMES 'utf8';});

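Otherwise your output will be double-encoded as UTF-8 (the strings coming back from the database are already UTF-8 bytes unless you decode them or set mysql_enable_utf8), resulting in unreadable multi-byte characters!
It took me a couple of hours to figure this out.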
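A small sketch of why this happens, assuming the value from the database is still a string of undecoded UTF-8 bytes. The helper bytes_written is hypothetical; it writes through a UTF-8 output layer into memory so the resulting bytes can be counted.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: write a string through a UTF-8 output layer into
# memory and return the raw bytes that would reach the terminal or file.
sub bytes_written {
    my ($string) = @_;
    my $buffer = '';
    open my $fh, '>:encoding(UTF-8)', \$buffer or die "open failed: $!";
    print {$fh} $string;
    close $fh;
    return $buffer;
}

# "ö" as it arrives when nothing decodes it: two UTF-8 bytes.
my $raw = "\xC3\xB6";
# The same value as a single Perl character, after decoding.
my $decoded = decode('UTF-8', $raw);

my $double = bytes_written($raw);        # every byte gets encoded again
my $single = bytes_written($decoded);    # encoded exactly once

printf "undecoded through the layer: %d bytes\n", length $double;
printf "decoded through the layer:   %d bytes\n", length $single;
```

So there are two consistent setups: print the raw bytes with no output layer, or decode first and keep the layer. Mixing them produces the double encoding.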

傾城如夢未必闌珊 2024-07-31 08:00:15

By default, the Perl/MySQL driver handles data as binary (at least, that is what I concluded from some experiments with MySQL 5.1 and 5.5).

Without setting mysql_enable_utf8, I encoded strings to UTF-8 before writing them to the database and decoded them from UTF-8 after reading them back.

Do not rely on Perl's internal string representation as an array of bytes. Be aware that the internal 'utf8' is not guaranteed to be standard UTF-8; conversely, the single-byte encoding is not guaranteed to be ISO-8859-1. Really do encode/decode to/from 'UTF-8' (and not 'utf8').
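To make the 'UTF-8' versus 'utf8' distinction concrete, here is a short sketch using the Encode module. It assumes a code point just above the Unicode range (0x110000) as the test value; the variable names are illustrative only.

```perl
use strict;
use warnings;
no warnings 'utf8';    # the test value is deliberately outside Unicode
use Encode qw(encode find_encoding FB_CROAK);

# The two names resolve to different encoding objects.
print "UTF-8 resolves to: ", find_encoding('UTF-8')->name, "\n";
print "utf8  resolves to: ", find_encoding('utf8')->name,  "\n";

# One character beyond the last valid Unicode code point (0x10FFFF).
my $beyond = chr(0x110000);

# Lax 'utf8' accepts it; strict 'UTF-8' refuses when asked to croak on errors.
my $lax_ok = eval {
    my $copy = $beyond;
    encode('utf8', $copy, FB_CROAK);
    1;
} ? 1 : 0;
my $strict_ok = eval {
    my $copy = $beyond;
    encode('UTF-8', $copy, FB_CROAK);
    1;
} ? 1 : 0;

print "lax utf8 accepted it:     $lax_ok\n";
print "strict UTF-8 accepted it: $strict_ok\n";
```

For data exchanged with a database or any other program, the strict 'UTF-8' name is the safer default.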

MySQL also has several encoding settings of its own (SET NAMES above is one way to change them). As far as I remember there is a client encoding, a connection encoding, and a server encoding, and how they interact when they do not all have the same value is not quite clear to me. Setting all of them to UTF-8, together with the recipe above, worked for me.
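One way to see what those settings currently are is to ask the server. The sketch below assumes a connected DBI handle for MySQL; the helper names mysql_charsets and mismatched_charsets are hypothetical. It checks only the three session variables that SET NAMES changes (character_set_client, character_set_connection, character_set_results).

```perl
use strict;
use warnings;

# Hypothetical helper: fetch the character_set_* variables of the session
# as a hash reference of name => value.
sub mysql_charsets {
    my ($dbh) = @_;
    my $rows = $dbh->selectall_arrayref(
        q{SHOW VARIABLES LIKE 'character_set_%'}
    );
    return { map { ( $_->[0] => $_->[1] ) } @$rows };
}

# Hypothetical helper: names of the client-side settings that differ
# from the wanted character set.
sub mismatched_charsets {
    my ($charsets, $wanted) = @_;
    my @names = map { "character_set_$_" } qw(client connection results);
    return grep { ( $charsets->{$_} // '' ) ne $wanted } @names;
}

# Usage, assuming $dbh is a connected handle:
#   my @bad = mismatched_charsets(mysql_charsets($dbh), 'utf8');
#   warn "Not UTF-8: @bad\n" if @bad;
```

An empty list from mismatched_charsets means the connection side agrees on one character set; table and column character sets still need to be checked separately.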
