Perl/MariaDB: conflict with decoding (execute failed: Incorrect string value: '\xE4...')

Posted on 2025-02-08 19:36:36

I extract data from HTML files with Perl and want to insert the data into a database, but umlauts (ä, ö, ...) behave strangely.

The setup is:

use DBI;        
use DBD::mysql;
use HTML::Entities;

$name = ... # extraction from html-file

my $dbh = DBI->connect ("DBI:mysql:db_0",
                        "root", "mypw",
                        { RaiseError => 1
                          , PrintError => 0
                          , mysql_enable_utf8mb4 => 1
                        }
                        ) or die "Fehler beim Verbindungsaufbau zum MariaDB-Server:" .
                                 " $DBI::err -> $DBI::errstr \n";


my $insert_import = $dbh->prepare("INSERT INTO arzt_0 (name) VALUES (?)");
$insert_import->execute($name)
or die "Fehler bei der Ausfuehrung: ".
   "$DBI::err -> $DBI::errstr (stage_imports $DBI::state)\n"
   ;

$insert_import->finish();

The database has character set utf8mb4 and collation utf8mb4_unicode_ci.

Since the content of the scalar $name contains HTML character references for umlauts, for example &#196; for Ä, I used

use HTML::Entities;

to convert the HTML code to the character Ä:

$name = decode_entities $name;

For example, the original content of $name was &#196;rztin, and after decoding it was Ärztin.
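The decoding step described above can be checked in isolation. The following is a minimal sketch, assuming a hypothetical sample value in place of the text extracted from the HTML file; it prints code points rather than the characters themselves, so the result does not depend on the terminal's encoding.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Hypothetical sample value standing in for the text extracted from the HTML file.
my $name = '&#196;rztin';

# Called with a return value in use, decode_entities returns a decoded copy.
my $decoded = decode_entities($name);

# Show the number of characters and each character's code point.
printf "length: %d\n", length $decoded;
printf "code points: %s\n",
    join ' ', map { sprintf 'U+%04X', ord } split //, $decoded;

# length: 6
# code points: U+00C4 U+0072 U+007A U+0074 U+0069 U+006E
```

The decoded value is a string of six characters whose first character is U+00C4 (Ä), not a sequence of UTF-8 bytes.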

If $name is printed to the terminal (print $name) or written to a file, Ärztin is output in both cases.

But when $name is inserted into the database, I get the error:

DBD::mysql::st execute failed: Incorrect string value: '\xE4rztin...' for column .......

If I write in the script:

$name = 'Ärztin';
$insert_import->execute($name)

or more directly

$insert_import->execute('Ärztin')

and run the script, there is no error and Ärztin is written to the column name.

Obviously, the decoded $name is not the same as a string with an umlaut written directly into the scalar. What is the explanation for this, and how can I solve the problem?
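The difference between the two values can be made visible with a short comparison. This is only a sketch of one plausible explanation, and it rests on an assumption that is not stated in the question: that the script file is saved as UTF-8 and does not contain `use utf8`, so the literal 'Ärztin' reaches Perl as UTF-8 bytes rather than as characters.

```perl
use strict;
use warnings;
use Encode qw(encode);
use HTML::Entities qw(decode_entities);

# Decoded value: six characters, the first being U+00C4.
my $decoded = decode_entities('&#196;rztin');

# Assumption: the script is saved as UTF-8 without "use utf8", so the
# literal 'Ärztin' is seen by Perl as these seven bytes.
my $literal = "\xC3\x84rztin";

printf "decoded: %d, literal: %d\n", length $decoded, length $literal;
print  "equal as strings: ", ($decoded eq $literal ? "yes" : "no"), "\n";

# Encoding the decoded text as UTF-8 yields the same bytes as the literal.
my $bytes = encode('UTF-8', $decoded);
print  "equal after encoding: ", ($bytes eq $literal ? "yes" : "no"), "\n";

# decoded: 6, literal: 7
# equal as strings: no
# equal after encoding: yes
```

If that assumption holds, two things may be worth testing before the call to execute: passing the UTF-8 encoded value, or calling utf8::upgrade($name). Which of the two is appropriate depends on the installed DBD::mysql version and how it treats mysql_enable_utf8mb4, so this is a suggestion to try rather than a confirmed fix.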
