Encoding conversion of a large file

Posted on 2024-11-19 12:45:12

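I'm facing a large (~18 GB) file that was exported from SQL Server as a Unicode text file, which means its encoding is UTF-16 (little endian). The file is now stored on a machine running Linux, but I haven't found a way to convert it to UTF-8.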

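At first I tried iconv, but the file is too large. My next approach was to use split and convert the pieces one by one, but that didn't work either: the conversion produced a lot of errors.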

So, any ideas on how to convert it to UTF-8? Any help would be much appreciated.

小忆控 2024-11-26 12:45:12

Since you're using SQL Server, I assume your platform is Windows. In the simplest case you can write a quick-and-dirty .NET application that reads the source line by line and writes the converted file as it goes. Something like this:

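using System;
using System.IO;
using System.Text;

namespace UTFConv {
    class Program {
        static void Main(string[] args) {
            if (args.Length != 2) {
                Console.WriteLine("Usage: UTFConv <source file> <destination file>");
                return;
            }
            try {
                Encoding encSrc = Encoding.Unicode; // UTF-16, little endian
                Encoding encDst = Encoding.UTF8;    // note: writes a UTF-8 byte order mark
                long lines = 0;                     // long, so very large files cannot overflow the counter
                // The file is processed one line at a time rather than loaded whole.
                using (StreamReader src = new StreamReader(args[0], encSrc)) {
                    using (StreamWriter dest = new StreamWriter(args[1], false, encDst)) {
                        string ln;
                        // ReadLine strips the line terminator; WriteLine appends Environment.NewLine.
                        while ((ln = src.ReadLine()) != null) {
                            lines++;
                            dest.WriteLine(ln);
                        }
                    }
                }
                Console.WriteLine("Converted {0} lines", lines);
            } catch (Exception x) {
                Console.WriteLine("Problem converting the file: {0}", x.Message);
            }
        }
    }
}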

Just open Visual Studio, create a new C# Console Application project, paste this code into it, compile, and run it from the command line. The first argument is the source file and the second is the destination file. That should work.
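If the line-by-line approach is a concern (it rewrites every line ending as the platform default, and Encoding.UTF8 puts a byte order mark at the start of the output), a variant that copies fixed-size character blocks might be worth trying. The following is only a sketch under assumptions: the class name Utf16ToUtf8, the method name Transcode, and the 64K-character buffer are made up for illustration, and it assumes the input really is UTF-16 little endian. It relies on StreamReader and StreamWriter keeping their decoder and encoder state between calls, so a surrogate pair that straddles two blocks should still come out as one UTF-8 sequence.

```csharp
using System;
using System.IO;
using System.Text;

static class Utf16ToUtf8 {
    // Copies srcPath (assumed UTF-16 LE) to dstPath as UTF-8, block by block.
    // Returns the number of UTF-16 code units copied.
    public static long Transcode(string srcPath, string dstPath) {
        // Arguments: little endian, expect a BOM, throw on malformed input.
        Encoding srcEnc = new UnicodeEncoding(false, true, true);
        // Arguments: do not emit a UTF-8 BOM, throw on invalid characters.
        Encoding dstEnc = new UTF8Encoding(false, true);

        char[] buffer = new char[1 << 16]; // hypothetical block size
        long total = 0;

        using (StreamReader src = new StreamReader(srcPath, srcEnc))
        using (StreamWriter dst = new StreamWriter(dstPath, false, dstEnc)) {
            int n;
            // Read returns 0 at end of file; line endings pass through untouched.
            while ((n = src.Read(buffer, 0, buffer.Length)) > 0) {
                dst.Write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }

    static int Main(string[] args) {
        if (args.Length != 2) {
            Console.Error.WriteLine("Usage: Utf16ToUtf8 <source file> <destination file>");
            return 1;
        }
        try {
            long units = Transcode(args[0], args[1]);
            Console.WriteLine("Converted {0} UTF-16 code units", units);
            return 0;
        } catch (Exception x) {
            Console.Error.WriteLine("Problem converting the file: {0}", x.Message);
            return 2;
        }
    }
}
```

Because the strict encodings throw instead of silently substituting replacement characters, a malformed input should stop the run with an error message rather than produce a quietly damaged output file.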
