Perl: decoding a Cyrillic string

Posted on 2024-11-07 18:10:10

I've got a problem with the following string:

$str="this is \321\213\321\213\321\213\321\213\321\213 \321\201\320\277\320\260\321\200\321\202\320\260\321\200";

This string is located in an ASCII text file, and I want to store it in a MySQL database (utf8). The sequences such as \321\231 are Cyrillic characters.

What can I do to make \321\213 appear as a Cyrillic character in the MySQL database?

This should be described in RFC 2047, and it looks like a UTF-7 to UTF-8 conversion, but I don't know exactly.
It is a "unicode escape".

Working variant:

use DBI;
use Encode::Escape;
$var1='\321\213';
print decode 'unicode-escape', $var1;
# displays correctly in MySQL when viewed in phpMyAdmin
$dbh = DBI->connect('DBI:mysql:database=test', 'testuser', 'testpass', { mysql_enable_utf8 => 1});

Comments (3)

丑丑阿 2024-11-14 18:10:10

This is not quoted-printable at all. This is the Perl quoted string representation, also known as PERLQQ, of a series of octets. The numbers are octal.

These bytes encode UTF-8 for the most part, but the data contain two errors. It looks like half of a character somehow fell off in each case. I have marked the two places with arrows just below.

my $octets = "this is \321\213\321\213\321\213\321\213\321 \321\201\320\277\320\260\321\200\321\202\320\260\321";
#                                                     ↑↑↑↑                                                 ↑↑↑↑

This is invalid UTF-8, but it can be repaired. We substitute the Unicode replacement character for each broken sequence.

use Encode qw(decode);
my $characters = decode 'UTF-8', $octets, Encode::FB_DEFAULT | Encode::LEAVE_SRC;
# this is ыыыы� спарта�

This character string can now be simply inserted into the database as usual. The DSN in the connect call for DBI or DBIx::Class must include the attribute mysql_enable_utf8.

connect('DBI:mysql:foobar;mysql_enable_utf8=1', …, …);
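
The question describes the string as sitting in an ASCII text file, so the file presumably holds the backslash and the digits literally rather than the bytes themselves. A minimal sketch under that assumption: turn each \NNN octal escape into the byte it names, then decode the bytes as UTF-8 as shown above. The variable names and the shortened sample text are illustrative only and are not taken from the original post.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Literal backslash-octal text, as assumed to appear in the ASCII file.
# Single quotes keep the backslashes as ordinary characters.
my $escaped = 'this is \321\213\321\213 \321\201\320\277\320\260\321\200\321\202\320\260';

# Replace every three-digit octal escape with the byte it stands for.
(my $octets = $escaped) =~ s/\\([0-7]{3})/chr(oct($1))/ge;

# Decode the bytes as UTF-8; a malformed sequence would become U+FFFD.
my $characters = decode('UTF-8', $octets, Encode::FB_DEFAULT);

# Encode on output so the terminal receives UTF-8 bytes.
binmode STDOUT, ':encoding(UTF-8)';
print "$characters\n";
```

The resulting $characters is a Perl character string, which is the form to hand to DBI once mysql_enable_utf8 is set on the connection.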
网名女生简单气质 2024-11-14 18:10:10

You need to convert the codes to characters explicitly. For that, you need to know what the input encoding is. I suppose it's iso-8859-5, but it could be windows-1252 or something else.

use Encode qw( from_to );

my $str="this is \321\213\321\213\321\213\321\213\321 \321\201\320\277\320\260\321\200\321\202\320\260\321";
# from_to converts $str in place and returns the new length in octets
from_to( $str, "iso-8859-5", "utf-8" );
my $out = $str;

I've just seen that your source string is indeed QP, so you need to convert from QP to bytes; that's easy, simply use MIME::QuotedPrint:

use MIME::QuotedPrint ();

my $out = MIME::QuotedPrint::decode($str);
命比纸薄 2024-11-14 18:10:10

The problem is that Perl does not know the string is UTF-8, so you must turn the flag on explicitly.

Encode::_utf8_on($str);
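
For comparison, and not as part of the answer above: the Encode documentation marks _utf8_on as an internal function that does not check whether the string is well-formed UTF-8. A minimal sketch of the documented alternative, decode, which validates the bytes while converting them to characters (the variable names here are illustrative):

```perl
use strict;
use warnings;
use Encode qw(decode);

my $octets     = "\321\213";                 # two bytes: 0xD1 0x8B
my $characters = decode('UTF-8', $octets);   # one character: U+044B

# The byte string has length 2, the decoded string has length 1.
printf "octets: %d, characters: %d\n", length($octets), length($characters);
printf "code point: U+%04X\n", ord($characters);
```

Run as written, this reports two octets and one character with code point U+044B.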
