Encoding conversion for a large file

Posted on 2024-11-19 12:45:12

I am faced with a large (~18 GB) file, exported from SQL Server as a Unicode text file, which means its encoding is UTF-16 (little endian). The file is now stored on a computer running Linux, but I have not figured out a way to convert it to UTF-8.

At first I tried using iconv, but the file is too large for that. My next approach was using split and converting the files one by one, but that didn't work either - there were a lot of errors during the conversions.

So, any ideas on how to convert this to UTF-8? Any help will be much appreciated.

Comments (1)

小忆控 2024-11-26 12:45:12

Since you're using SQL Server, I assume your platform is Windows. In the simplest case you can write a quick and dirty .NET application that reads the source line by line and writes the converted file as it goes. Something like this:

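using System;
using System.IO;
using System.Text;

namespace UTFConv {
    class Program {
        static int Main(string[] args) {
            if (args.Length != 2) {
                Console.WriteLine("Usage: UTFConv <source file> <destination file>");
                return 1;
            }
            try {
                // Encoding.Unicode is UTF-16 little endian, the format of the SQL Server export.
                Encoding encSrc = Encoding.Unicode;
                // Encoding.UTF8 makes StreamWriter emit a byte order mark;
                // use new UTF8Encoding(false) instead if the BOM is unwanted.
                Encoding encDst = Encoding.UTF8;
                long lines = 0;
                using (StreamReader src = new StreamReader(args[0], encSrc))
                using (StreamWriter dest = new StreamWriter(args[1], false, encDst)) {
                    string ln;
                    // Only one line is held in memory at a time, so the file size does not matter.
                    // Note: WriteLine ends every line with Environment.NewLine.
                    while ((ln = src.ReadLine()) != null) {
                        lines++;
                        dest.WriteLine(ln);
                    }
                }
                Console.WriteLine("Converted {0} lines", lines);
                return 0;
            } catch (Exception x) {
                Console.WriteLine("Problem converting the file: {0}", x.Message);
                return 1;
            }
        }
    }
}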

Just open Visual Studio, start a new C# Console Application project, paste this code in there, compile, and run it from the command line. The first argument is your source file, the second argument is your destination file. Should work.
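
Since the file in the question sits on a Linux machine, the same convert-as-you-go idea can also be sketched in Python. Treat this as a minimal sketch under stated assumptions rather than a tool tested on 18 GB inputs: the function name `utf16le_to_utf8` and the chunk size are illustrative choices, and it assumes the export really is UTF-16 little endian, optionally starting with a byte order mark. Instead of going line by line, it copies fixed-size chunks of characters and leaves the line endings untouched.

```python
import sys

CHUNK_CHARS = 1 << 20  # characters per read; an arbitrary buffer size (assumption)


def utf16le_to_utf8(src_path, dst_path, chunk_chars=CHUNK_CHARS):
    """Stream a UTF-16LE text file into a UTF-8 file.

    Returns the number of characters written. Only one chunk is held in
    memory at a time, so the size of the input file does not matter.
    """
    written = 0
    # newline="" turns off newline translation, so CRLF stays CRLF.
    with open(src_path, "r", encoding="utf-16-le", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        first = True
        while True:
            # Text-mode reads decode incrementally, so a character is never
            # cut in half at a chunk boundary.
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            if first:
                # Drop a leading byte order mark instead of copying it over.
                if chunk.startswith("\ufeff"):
                    chunk = chunk[1:]
                first = False
            dst.write(chunk)
            written += len(chunk)
    return written


if __name__ == "__main__" and len(sys.argv) == 3:
    count = utf16le_to_utf8(sys.argv[1], sys.argv[2])
    print("Converted {} characters".format(count))
```

Saved under a name of your choice (for example `utf16_to_utf8.py`, a hypothetical file name), it would be run as `python3 utf16_to_utf8.py source.txt destination.txt`. Invalid UTF-16 in the input raises `UnicodeDecodeError`, because the default error handling is strict.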
