Importing text into MySQL: strange format

Posted on 2024-08-20 09:28:09

I'm importing some data from a .txt file into a MySQL database table, using mysqlimport. It seems to import OK (no error messages) but looks very odd when displayed, and can't be searched as expected.

Here are the details. The original text file is saved in UTF-8, with records that look (in a text editor) like this. The second field includes line breaks:

WAR-16,52 ~~~~~ Lorem ipsum dolor sit.
Lorem ipsum dolor sit.
~~~~~ ENDOFRECORD
WAR-16,53~~~~~Lorem ipsum dolor sit.
Lorem ipsum dolor sit.
Lorem ipsum dolor sit.
Lorem ipsum dolor sit.

~~~~~ ENDOFRECORD

The database table into which I am importing is very simple:

+-------+---------------+------+-----+---------+-------+
| Field | Type          | Null | Key | Default | Extra |
+-------+---------------+------+-----+---------+-------+
| id    | varchar(100)  | YES  |     | NULL    |       |
| text  | varchar(5000) | YES  |     | NULL    |       |
+-------+---------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

When I import the file, here's the command I use:

$ mysqlimport -u root db textfile.txt --fields-terminated-by="~~~~~" --lines-terminated-by="ENDOFTHELINE" --default-character-set='utf8'
db.records_list: Records: 18778  Deleted: 0  Skipped: 0  Warnings: 18787

Here's what I see if I then ask MySQL to display the records:

mysql> select * from textfile;
| 
 W A R - 1 6 , 5 2 |  L o r e m  i p s u m  d o l o r  s i t .
L o r e m  i p s u m  d o l o r  s i t .
(etc)

So, it looks as though spaces, or some strange encoding extras, are being added to the text.
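One way to test the encoding suspicion, offered as a hedged sketch rather than a diagnosis: if the "spaces" are really NUL (0x00) bytes, the file is probably UTF-16 rather than UTF-8, because UTF-16 stores every ASCII character as two bytes, one of which is 0x00. The question does not confirm this; it is an assumption worth checking. The POSIX sh snippet below counts NUL bytes in a file and dumps its bytes with `od -c`. The function name `has_nul_bytes` and the two sample files are hypothetical.

```shell
#!/bin/sh
# Sketch: check whether a file contains NUL (0x00) bytes.
# Assumption (not confirmed by the question): NULs between ASCII letters
# suggest UTF-16 text, which a terminal may render as "W A R".

# has_nul_bytes FILE: exit status 0 if FILE has at least one NUL byte.
has_nul_bytes() {
  # tr -cd '\000' deletes every byte except NUL; wc -c counts what is left.
  # The final tr strips the padding spaces that BSD wc prints.
  n=$(tr -cd '\000' < "$1" | wc -c | tr -d ' ')
  [ "$n" -gt 0 ]
}

# Demo on two hypothetical samples: "WAR" as UTF-16LE and as plain ASCII.
tmpdir=$(mktemp -d)
printf 'W\000A\000R\000' > "$tmpdir/utf16le.txt"
printf 'WAR' > "$tmpdir/ascii.txt"

for f in "$tmpdir/utf16le.txt" "$tmpdir/ascii.txt"; do
  if has_nul_bytes "$f"; then
    echo "$(basename "$f"): contains NUL bytes"
  else
    echo "$(basename "$f"): no NUL bytes"
  fi
  # od -c prints one entry per byte; NUL bytes show up as \0.
  od -c "$f"
done

rm -rf "$tmpdir"
```

On the real file, `has_nul_bytes textfile.txt && echo 'NUL bytes present'`, or simply `od -c textfile.txt | head`, would show whether `\0` entries sit between the letters.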

And here's the problem with the database query:

mysql> select * from textfile where id like "%WAR%";

returns nothing; nor does adding spaces:

mysql> select * from textfile where id like "%W A R%";

Only this command returns anything:

mysql> select * from textfile where id like "%W%";

Can anyone guess what might be happening? I feel like it must be an encoding problem, but I can't work it out.
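The three query results fit one pattern, sketched here under an assumption the question does not confirm: suppose each stored letter is followed by a 0x00 byte (UTF-16-style storage) that the client displays as a blank. Then "W", "A" and "R" are never adjacent, so `%WAR%` cannot match; `%W A R%` fails too, because the gaps are NULs rather than spaces; and `%W%` still matches a single letter. The POSIX sh snippet below mimics those three patterns with a fixed-string `grep` on a hypothetical stored value. `stored_id` and `count_matches` are illustrative names, and `grep` merely stands in for MySQL's `LIKE`.

```shell
#!/bin/sh
# Sketch: why a '%WAR%'-style search can fail while '%W%' succeeds.
# Assumption: the id was stored with a 0x00 byte after every ASCII letter.

stored_id() {
  # Hypothetical stored value for "WAR": W, NUL, A, NUL, R, NUL.
  printf 'W\000A\000R\000'
}

# count_matches PATTERN: print 1 if the stored bytes contain PATTERN, else 0.
# grep -a searches the NUL-containing input as text; -F means fixed string.
count_matches() {
  stored_id | grep -a -c -F -- "$1"
}

echo "WAR   -> $(count_matches 'WAR')"    # letters are not adjacent
echo "W A R -> $(count_matches 'W A R')"  # the gaps are NULs, not spaces
echo "W     -> $(count_matches 'W')"      # a single letter is still found

# Deleting the NUL bytes makes the contiguous search succeed again.
echo "WAR after tr -d -> $(stored_id | tr -d '\000' | grep -c -F 'WAR')"
```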

------ UPDATE --------

OK, I've checked the database and connection encoding.

mysql> show variables like "character_set_%";
+--------------------------+----------------------------------------+
| Variable_name            | Value                                  |
+--------------------------+----------------------------------------+
| character_set_client     | latin1                                 |
| character_set_connection | latin1                                 |
| character_set_database   | latin1                                 |
| character_set_filesystem | binary                                 |
| character_set_results    | latin1                                 |
| character_set_server     | latin1                                 |
| character_set_system     | utf8                                   |
| character_sets_dir       | /usr/local/mysql/share/mysql/charsets/ |
+--------------------------+----------------------------------------+
8 rows in set (0.01 sec)

And SHOW TABLE STATUS reports the table's collation as latin1_swedish_ci.

I have re-saved the text file in "Western (Windows Latin 1)" (using TextEdit on Snow Leopard) and tried to import it using the same command as above. However I still have the same encoding problem.

I also tried, again with no luck:

  • creating a new table with UTF-8 and importing the existing file
  • copying & pasting the text into another text file that I've previously imported fine, and trying to import that.
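If a byte dump of the file does show a 0x00 next to every ASCII character (an assumption to verify first; the question does not establish it), a UTF-8 table alone would likely not help, because the file itself would still be UTF-16. A common step in that case is to re-encode the file to UTF-8 with `iconv` before importing. The sketch below assumes UTF-16LE without a byte-order mark; `to_utf8` and the `utf8/` output directory are hypothetical.

```shell
#!/bin/sh
# Sketch: re-encode a text file to UTF-8 before importing it.
# Assumption: the source encoding has already been identified (here
# UTF-16LE without a byte-order mark); iconv does not guess encodings.

# to_utf8 SRC_ENCODING IN OUT: decode IN from SRC_ENCODING, write OUT as UTF-8.
to_utf8() {
  iconv -f "$1" -t UTF-8 "$2" > "$3"
}

# Demo on a hypothetical UTF-16LE sample containing "WAR".
tmpdir=$(mktemp -d)
printf 'W\000A\000R\000' > "$tmpdir/in.txt"
to_utf8 UTF-16LE "$tmpdir/in.txt" "$tmpdir/out.txt"
od -c "$tmpdir/out.txt"   # one byte per letter, no \0 between them
rm -rf "$tmpdir"

# For the real file, keep the same base name, because mysqlimport derives
# the target table name from the file name minus its extension:
#   mkdir -p utf8 && to_utf8 UTF-16LE textfile.txt utf8/textfile.txt
# then re-run the mysqlimport command from the question on utf8/textfile.txt,
# keeping --default-character-set='utf8'.
```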

Still totally baffled :(((

Comments (1)

长不大的小祸害 2024-08-27 09:28:09

As noted in your duplicate question, make sure that your table and connection are both using UTF-8. That is one of the common sources of this issue.

http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html
