Migrating MS Access data to MySQL: character encoding issues

Posted 2024-08-07 09:33:12

We have an MS Access .mdb file produced, I think, by an Access 2000 database. I am trying to export a table to SQL with mdbtools, using this command:

mdb-export -S -X \\ -I orig.mdb Reviewer > Reviewer.sql

That produces the file I expect, except for one thing: some of the characters are represented as question marks. This: "He wasn’t ready" shows up like this: "He wasn?t ready". It happens only in some cases (primarily single and double curly quotes), where the content was perhaps pasted into the DB from MS Word. Otherwise, the data look great.

I have tried various values for "export MDB_ICONV=". I've tried using iconv on the resulting file, with ISO-8859-1 in the from/to, with UTF-8 in the from/to, with WINDOWS-1250 and WINDOWS-1252 and WINDOWS-1256 in the from, in various combinations. But I haven't succeeded in getting those curly quotes back.
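Before trying more iconv combinations, it can help to look at the raw bytes of the exported file. The following is a rough diagnostic sketch in Python, not part of mdbtools; the function name `classify_quotes` is hypothetical, and the byte patterns are a heuristic (multi-byte UTF-8 text other than quotes can trigger false positives in the `cp1252` count).

```python
import re

# Byte patterns for the four common "smart quote" characters.
CP1252_QUOTES = re.compile(rb"[\x91-\x94]")               # single bytes in Windows-1252
UTF8_QUOTES = re.compile(rb"\xe2\x80[\x98\x99\x9c\x9d]")  # three-byte UTF-8 sequences
# A literal '?' wedged between two ASCII letters, as in "wasn?t".
SUSPECT_QMARK = re.compile(rb"(?<=[A-Za-z])\?(?=[A-Za-z])")


def classify_quotes(data: bytes) -> dict:
    """Count how curly quotes (or what is left of them) appear in exported bytes."""
    return {
        "cp1252": len(CP1252_QUOTES.findall(data)),
        "utf8": len(UTF8_QUOTES.findall(data)),
        "literal_qmark": len(SUSPECT_QMARK.findall(data)),
    }


sample = b"He wasn?t ready"  # the bytes described in the question
print(classify_quotes(sample))
```

To check the real export, pass `open("Reviewer.sql", "rb").read()` instead of `sample`. If only `literal_qmark` is non-zero, the file contains genuine `0x3F` bytes and no re-encoding of that file will bring the quotes back.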

Frankly, based on the way the resulting file looks, I suspect the issue is either in the original .mdb file, or in mdbtools. The malformed characters are all single question marks, but it is clear that they are not malformed versions of the same thing; so (my gut says) there's not enough data in the resulting file; so (my gut says) the issue can't be fixed in the resulting file.
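That intuition can be illustrated with a small Python sketch. It assumes, without having confirmed it against mdbtools itself, that the export step does something equivalent to a lossy conversion with a `?` replacement character; `to_ascii_lossy` is a hypothetical helper used only for illustration.

```python
def to_ascii_lossy(text: str) -> bytes:
    """Encode to ASCII, substituting '?' for anything ASCII cannot represent."""
    return text.encode("ascii", errors="replace")


left_single = "\u2018"   # left single curly quote
right_single = "\u2019"  # right single curly quote / apostrophe
left_double = "\u201c"   # left double curly quote

# Three different characters collapse to the same single byte.
for ch in (left_single, right_single, left_double):
    print(repr(ch), "->", to_ascii_lossy(ch))

exported = to_ascii_lossy("He wasn\u2019t ready")
print(exported)                  # b'He wasn?t ready'
print(exported.decode("ascii"))  # decoding cannot tell which character was lost
```

Because distinct characters all become the same `?` byte, the fix has to happen at export time rather than on the generated file.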

Has anyone run into this one before? Any tips for moving forward? FWIW, I don't have and never have had MS Access -- the file is coming from a 3rd party -- so this could be as simple as changing something on the database, and I would be very glad to hear that.

Thanks.


Comments (2)

南街女流氓 2024-08-14 09:33:12

Looks like "smart quotes" have claimed yet another victim.

MS Word takes plain ASCII quotes and translates them into the double-byte left-quote and right-quote characters, and translates a single quote into the double-byte apostrophe character. The double-byte characters in question belong to an MS code page which is roughly compatible with Unicode (UTF-16) except for the silly quote characters.

There is a Perl script called 'demoroniser.pl' which undoes all this malarkey and converts the quotes back to plain ASCII.
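The idea behind that kind of cleanup can be sketched in a few lines of Python. This is not demoroniser.pl and does not reproduce its behaviour; it is a minimal illustration covering only the four common curly quotes, and `straighten_quotes` is a hypothetical name. It also only helps if the curly quotes survive the export in the first place.

```python
# Map the four common curly quotes to their plain ASCII counterparts.
SMART_TO_ASCII = str.maketrans({
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote / apostrophe
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
})


def straighten_quotes(text: str) -> str:
    """Replace curly quotes with straight ASCII quotes."""
    return text.translate(SMART_TO_ASCII)


print(straighten_quotes("He wasn\u2019t ready"))  # He wasn't ready
```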

情绪少女 2024-08-14 09:33:12

It's most likely due to the fact that the data in the Access file is UTF, and MDB Tools is trying to convert it to ASCII, Latin (ISO-8859-1), or some other encoding. Since these encodings can't represent all the UTF characters, you end up with question marks. The information here may help you fix your encoding issues by getting MDB Tools to use the correct encoding.
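Which target encodings can hold a curly apostrophe at all is easy to check. The Python sketch below only tests encodability using Python's built-in codecs; it says nothing about how MDB Tools is configured, and `encodable` is a hypothetical helper.

```python
def encodable(ch: str, encoding: str) -> bool:
    """Return True if the character can be represented in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


curly_apostrophe = "\u2019"
for enc in ("ascii", "iso-8859-1", "cp1252", "utf-8"):
    print(enc, encodable(curly_apostrophe, enc))
```

ASCII and ISO-8859-1 have no slot for this character, while Windows-1252 and UTF-8 do, so an export that targets one of the first two has to substitute something for it.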
