Encoding problems when importing a 23 MB .sql file into MySQL via phpMyAdmin
When I import the following .sql file (which inserts 4 records):
SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
CREATE TABLE IF NOT EXISTS `sentences` (
`jp` text character set utf8 collate utf8_unicode_ci,
`eng` text character set utf8 collate utf8_unicode_ci,
`reading` text character set utf8 collate utf8_unicode_ci,
`query` varchar(50) character set utf8 collate utf8_unicode_ci default NULL,
`patternIDs` varchar(100) character set utf8 collate utf8_unicode_ci default NULL,
`hasImage` tinyint(1) NOT NULL,
`imageURL` varchar(100) character set utf8 collate utf8_unicode_ci NOT NULL,
`id` int(11) NOT NULL auto_increment,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=eucjpms;
INSERT INTO `sentences` (`jp`, `eng`, `reading`, `query`, `patternIDs`, `hasImage`, `imageURL`, `id`) VALUES
('ムーリエルは20歳になりました。', 'Muiriel is 20 now.', 'はにぜろさいになりました。', 'ムーリエル', '64', 0, 'none', 1),
('すぐに戻ります。', 'I will be back soon.', 'すぐにもどります。', 'すぐ', '4', 1, 'http://ts1.mm.bing.net/images/thumbnail.aspx?q=959017328936&id=b33b9daf539756a8b0b2364f63088008', 2),
('すぐに諦めて昼寝をするかも知れない。', 'I may give up soon and just nap instead.', 'すぐにあきらめてひるねをするかもしれない。', '昼寝', '19', 1, 'http://ts1.mm.bing.net/images/thumbnail.aspx?q=888895375610&id=5debb6afed90989674d447f9493b4a1d', 3),
('ログアウトするんじゃなかったよ。', 'I shouldn\'t have logged off.', 'ログアウトするんじゃなかったよ。', 'ログアウト', '16', 1, 'http://ts1.mm.bing.net/images/thumbnail.aspx?q=846535990996&id=4e0ad521154e2e7456330af87b24ee71', 4);
and then browse the sentences table, all the Japanese sentences display in UTF-8 without any problems. However, when I import the following file (exactly the same content; the only difference is its size, with about 73,000 records inserted instead of 4):
SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
CREATE TABLE IF NOT EXISTS `sentences` (
`jp` text character set utf8 collate utf8_unicode_ci,
`eng` text character set utf8 collate utf8_unicode_ci,
`reading` text character set utf8 collate utf8_unicode_ci,
`query` varchar(50) character set utf8 collate utf8_unicode_ci default NULL,
`patternIDs` varchar(100) character set utf8 collate utf8_unicode_ci default NULL,
`hasImage` tinyint(1) NOT NULL,
`imageURL` varchar(100) character set utf8 collate utf8_unicode_ci NOT NULL,
`id` int(11) NOT NULL auto_increment,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=eucjpms;
INSERT INTO `sentences` (`jp`, `eng`, `reading`, `query`, `patternIDs`, `hasImage`, `imageURL`, `id`) VALUES
('ムーリエルは20歳になりました。', 'Muiriel is 20 now.', 'はにぜろさいになりました。', 'ムーリエル', '64', 0, 'none', 1),
('すぐに戻ります。', 'I will be back soon.', 'すぐにもどります。', 'すぐ', '4', 1, 'http://ts1.mm.bing.net/images/thumbnail.aspx?q=959017328936&id=b33b9daf539756a8b0b2364f63088008', 2),
('すぐに諦めて昼寝をするかも知れない。', 'I may give up soon and just nap instead.', 'すぐにあきらめてひるねをするかもしれない。', '昼寝', '19', 1, 'http://ts1.mm.bing.net/images/thumbnail.aspx?q=888895375610&id=5debb6afed90989674d447f9493b4a1d', 3),
('ログアウトするんじゃなかったよ。', 'I shouldn\'t have logged off.', 'ログアウトするんじゃなかったよ。', 'ログアウト', '16', 1, 'http://ts1.mm.bing.net/images/thumbnail.aspx?q=846535990996&id=4e0ad521154e2e7456330af87b24ee71', 4),
-- ... (rows omitted) ...
('先生に質問したら、すぐに答えてくれました。', 'When I asked a question to my teacher, he/she immediately answered it.', 'せんせいにしつもんしたら、すぐにこたえてくれました。', '先生', '64, 189', 1, 'http://ts1.mm.bing.net/images/thumbnail.aspx?q=889488746606&id=53a411907232964b30b9ebde03093a66', 73660),
('薬を飲んだら、すぐになおりました。', 'I took a medicine, and soon recovered.', 'くすりをのんだら、すぐになおりました。', '薬', '19, 64, 189', 1, 'http://ts2.mm.bing.net/images/thumbnail.aspx?q=934254550695&id=4400863ae021a4827dd7f9f7380fc2a2', 73661);
the Japanese characters are not displayed. Why is that? Why does phpMyAdmin have encoding problems when importing larger .sql files? Thanks, guys!
Many difficulties can arise from the language and encoding in use. http://www.herongyang.com/PHP-Chinese/ is an invaluable source of information. It is written specifically about Chinese issues, but many of its discussions apply to any Unicode text, including Japanese.
For example, Heron Yang gives a possible flow:
H1. Key Sequences -> from keyboard (Text editor) ->
H2. HTML Document -> (Web server) ->
H3. HTTP Response -> (Internet TCP/IP Connection) ->
H4. HTTP Response -> (Web browser) ->
H5. Visual characters on the screen
Basically, you need to make sure that no step in the import process (or the output process) introduces a problem. The first thing to look at is the "garbled data" page on the phpMyAdmin wiki that Plebsori pointed out. Unfortunately, that wiki illustrates some of the problems but, I think, not the solutions.
I'd start by checking that the two .sql files have exactly the same encoding. To test this, you could open the 73,000-entry file in Notepad++ and delete all but the first four rows. Some text editors change the encoding when saving, so two files can end up with different encodings even though they look exactly the same; make sure you save both files in exactly the same way. For Chinese, I often use Notepad++ to change the encoding of a file. Encoding matters so much that Notepad++ gives it its own menu on the menu bar.
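As a rough way to run that check without an editor, the sketch below tests which encodings a file's bytes decode under without error. The function name `valid_encodings` and the candidate list are illustrative assumptions, not something from the original posts, and a clean decode only shows that the bytes are consistent with an encoding, not that it is the one intended.

```python
from typing import List, Sequence


def valid_encodings(data: bytes, candidates: Sequence[str] = ("utf-8", "euc_jp")) -> List[str]:
    """Return the candidate encodings under which `data` decodes without error."""
    ok = []
    for name in candidates:
        try:
            data.decode(name)  # strict decoding raises on any invalid byte sequence
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok


# Hypothetical usage: compare the small and the large dump.
# with open("small.sql", "rb") as f:
#     print(valid_encodings(f.read()))
# with open("large.sql", "rb") as f:
#     print(valid_encodings(f.read()))
```

If the two files report different results, the difference in encoding is a likely suspect before anything on the server side.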
Another issue that can arise with files is the BOM (byte order mark) at the start of the text stream: http://en.wikipedia.org/wiki/Byte-order_mark. phpMyAdmin might use this invisible mark to determine which conversion to apply. Again, I'd use Notepad++ (Encoding menu) to guarantee that the BOM is present. Because copy/paste might also change the encoding, you can enable the TextFX setting for it (TextFX > TextFX Viz Settings > Viz Copy-Cut also in unicode).
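The BOM check can also be done on the raw bytes. This is a minimal sketch assuming the files are UTF-8; the helper names `has_utf8_bom` and `ensure_utf8_bom` are made up for illustration, and whether phpMyAdmin actually relies on the BOM is the answer's suggestion rather than something verified here.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the byte stream starts with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def ensure_utf8_bom(data: bytes) -> bytes:
    """Return `data` with a UTF-8 BOM prepended, unless one is already present."""
    return data if has_utf8_bom(data) else codecs.BOM_UTF8 + data


# Hypothetical usage on a dump file:
# with open("large.sql", "rb") as f:
#     print(has_utf8_bom(f.read(3)))
```

Comparing the result for the 4-row file and the 73,000-row file shows quickly whether one was saved with a BOM and the other without.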
Finally, there are still a lot of links in the chain. The good news is that once you figure out how to get the data in and out properly while preserving the language, doing it again later is quite straightforward. By the way, if you try the encoding tips above and have verified that the file format is not the source of the problem, there are some tricks for importing data. You can convert the UTF-8 to ASCII (it will look like garbage characters), import it, and then convert it back to the encoding you want inside SQL.
Here are a few suggestions that may help.
I'd suggest confirming that you can actually post a 23 MB file to the server. The PHP configuration file has a setting that limits the size of a POST.
I'd also suggest confirming that the PHP maximum execution time isn't being hit, causing the import to finish early.
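For the size check, the settings meant here are presumably PHP's `post_max_size` and `upload_max_filesize` directives (an assumption; the answer does not name them), which accept shorthand values such as `8M`. The sketch below converts that shorthand to bytes and compares it with the file size. The function names and the example limits are illustrative, and the execution-time limit (`max_execution_time`) is not covered.

```python
import re

# PHP shorthand suffixes are powers of 1024.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_php_size(value: str) -> int:
    """Convert PHP shorthand such as '8M' or '512K' into a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?)\s*", value.upper())
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def upload_fits(file_size: int, post_max_size: str, upload_max_filesize: str) -> bool:
    """True when the file is smaller than both limits (a POST also carries form overhead)."""
    limit = min(parse_php_size(post_max_size), parse_php_size(upload_max_filesize))
    return file_size < limit


# Example values only; read the real ones from your php.ini or phpinfo().
dump_size = 23 * 1024 * 1024
print(upload_fits(dump_size, post_max_size="8M", upload_max_filesize="2M"))
```

If this reports that the file does not fit, the upload is being cut off or rejected before encoding ever comes into play.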
Maybe you could import the SQL file from the command line instead.
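A command-line import might look like the sketch below, which feeds the dump to the standard `mysql` client with an explicit connection character set. It assumes the `mysql` client is installed and on the PATH; the user name, database name, and file name are placeholders, and `build_import_command` and `import_dump` are illustrative helper names.

```python
import subprocess
from typing import List


def build_import_command(user: str, database: str, charset: str = "utf8") -> List[str]:
    """Build the mysql client invocation; -p makes the client prompt for the password."""
    return ["mysql", f"--default-character-set={charset}", "-u", user, "-p", database]


def import_dump(path: str, user: str, database: str) -> None:
    """Stream the dump file into the mysql client on standard input."""
    with open(path, "rb") as dump:
        subprocess.run(build_import_command(user, database), stdin=dump, check=True)


# Hypothetical usage (placeholder names):
# import_dump("sentences.sql", user="alice", database="japanese")
```

Because the whole file goes through a single connection, the `SET NAMES utf8` at the top of the dump applies to every statement, and PHP's upload and execution-time limits are not involved.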
You hit the server's time limit or size limit, and phpMyAdmin is smart enough (or not) to continue from approximately where it stopped. Since the encoding command is at the start of the file, the second connection starts without any encoding settings.
Solution: either repeat the encoding command every few hundred lines, or use file import.
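The first option could be scripted roughly as follows. This is a sketch under two assumptions: the encoding command is `SET NAMES utf8;`, as in the dump's header, and a line ending in `;` marks the end of a statement. The second is only a heuristic, and it means a single INSERT spanning tens of thousands of lines would first have to be split into several statements. The function name `repeat_set_names` is made up.

```python
from typing import Iterable, List


def repeat_set_names(lines: Iterable[str], every: int = 500,
                     statement: str = "SET NAMES utf8;") -> List[str]:
    """Re-emit `statement` roughly every `every` lines, only at statement boundaries."""
    out = []
    since_last = 0
    for line in lines:
        out.append(line)
        since_last += 1
        # Never insert between the rows of a multi-row INSERT: wait for a ';'.
        if since_last >= every and line.rstrip().endswith(";"):
            out.append(statement)
            since_last = 0
    return out


# Hypothetical usage on a dump that has already been read into a list of lines:
# patched = repeat_set_names(dump_lines, every=500)
```

With the command repeated, a resumed import picks up the connection character set again shortly after whatever point phpMyAdmin restarts from.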
File import on Ubuntu: