Using EmEditor to save a Unicode file in another format distorts/changes the content. Solution?
There is a MySQL backup file which is a huge file - about 3 GB. There is one table that has a LONGBLOB column that stores JPEG image data.
The file imports successfully if done from MySQL Workbench - Data Import/Restore.
I need to open this file and extract the first few lines (about two rows of INSERTs of the table with the image data) so that I can test if another program can import this data into another MySQL database.
I tried opening the file with EmEditor (which is good at opening large files), copying only up to the first INSERT statement of the script (up to about line 25, because the table in question is the first table in the backup script), and pasting the selection into a new file.
Here comes the problem:
However, this messes up the encoding (even though I save as UTF-8). I noticed this when I tried to import (restore) the new file into a MySQL database, again using MySQL Workbench: the restore goes ahead without errors, but the JPEG images in the blob column are now corrupted.
My guess is that the encoding is different between the original file and new file.
EmEditor does not show the encoding of the original file. There is an option to detect it, and it detects it as 'UTF8 Unsigned'. When saving, I save it as UTF8. I also tried saving as ANSI, ISO8859 (Windows default), etc., but every time I get the same result.
Do you have any solution for this particular problem? I.e., I want to cut only the first few lines of the huge backup file and save them to a new file, keeping the encoding the same, so that the images (blobs) are not changed. Is there any way this can be done with EmEditor (i.e., is cut-and-paste the wrong approach)? Is there any specialized software that can do this? How can I diagnose what is going wrong here?
Thanks for any responses.
Comments (2)
UTF-8 is not a good choice for arbitrary binary data. There are many sequences of high bytes which are not valid in UTF-8, so you will mangle them at some point during the load-alter-save process.
If you load the file using an encoding that maps every single byte to a unique character, and re-save the file using that same encoding, you should preserve the original content(*). ISO-8859-1 is the encoding usually chosen for this purpose, since it simply maps each byte 0..0xFF to the Unicode code point with the same number.
(*: assuming the editor is binary-safe with regard to other tricky points like nulls, \n/\r and other control characters... I believe EmEditor can be.)
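To make the round-trip argument above concrete, here is a small Python sketch (an illustration only, not something EmEditor itself runs). It decodes every possible byte value as ISO-8859-1 and encodes it back, then does the same with a lenient UTF-8 decode on the first bytes of a typical JPEG header. The variable names are made up for the example.

```python
# Every possible byte value, standing in for arbitrary binary data.
all_bytes = bytes(range(256))

# ISO-8859-1 maps byte N to code point U+00NN, so decode + encode is lossless.
as_text = all_bytes.decode("iso-8859-1")
latin1_roundtrip = as_text.encode("iso-8859-1")

# Start of a typical JPEG/JFIF file; FF D8 FF E0 is not valid UTF-8.
jpeg_start = b"\xff\xd8\xff\xe0\x00\x10JFIF"

# A lenient UTF-8 decode substitutes U+FFFD for each invalid byte, so
# re-encoding produces different bytes from the ones we started with.
utf8_roundtrip = jpeg_start.decode("utf-8", errors="replace").encode("utf-8")

print(latin1_roundtrip == all_bytes)  # True
print(utf8_roundtrip == jpeg_start)   # False
```

An editor that replaces invalid sequences while loading a file as UTF-8 would damage blob data in the same way, whichever encoding is chosen afterwards when saving.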
When opening the original file in EmEditor, try selecting the encoding as Binary (ASCII View). As bobince said, Binary (ASCII View) maps each byte to a unique character and preserves it when you save the file. I think this should fix your problem.
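If an editor is not required at all, the same "treat it as binary" idea can be sketched in a few lines of Python. This is only one possible approach under stated assumptions: the function name `copy_first_lines` is hypothetical, and it assumes the statements of interest end at ordinary line-feed bytes, as the question's "up to about line 25" suggests.

```python
from itertools import islice


def copy_first_lines(src_path, dst_path, line_count):
    """Copy the first `line_count` lines of src_path to dst_path, byte for byte."""
    # Binary mode ("rb"/"wb") means no decoding, no encoding and no newline
    # translation, so blob bytes pass through unchanged.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Iterating a binary file yields chunks ending at b"\n" and reads
        # lazily, so a multi-gigabyte dump is never loaded as a whole.
        for line in islice(src, line_count):
            dst.write(line)
```

A call such as `copy_first_lines("backup.sql", "sample.sql", 25)` (both file names are placeholders) would write the first 25 lines to a new file. Comparing the two files in a hex viewer, or reading both in binary mode and comparing the bytes, is one way to diagnose whether a given tool altered the data.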