Encoding conversion of a large file
I am faced with a large (~18 GB) file, exported from SQL Server as a Unicode text file, which means its encoding is UTF-16 (little endian). The file is now stored on a computer running Linux, but I have not figured out a way to convert it to UTF-8.
At first I tried using iconv, but the file is too large for that. My next approach was using split and converting the files one by one, but that didn't work either - there were a lot of errors during the conversions.
So, any ideas on how to convert this to UTF-8? Any help will be much appreciated.

Since you're using SQL Server, I assume your platform is Windows. In the simplest case you can write a quick and dirty .NET application that reads the source line by line and writes the converted file as it goes. Something like this:
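A minimal sketch of such a program follows. It is not the original snippet: the class name `Utf16ToUtf8` is made up for illustration, and it assumes the export really is UTF-16 little endian and that no single line is too long to hold in memory.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical class name, chosen for illustration only.
class Utf16ToUtf8
{
    static int Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.Error.WriteLine("Usage: Utf16ToUtf8 <source> <destination>");
            return 1;
        }

        // Encoding.Unicode is UTF-16 little endian in .NET.
        // The reader streams the file, so only one line is in memory at a time.
        using (var reader = new StreamReader(args[0], Encoding.Unicode))
        // UTF8Encoding(false) writes UTF-8 without a byte order mark.
        using (var writer = new StreamWriter(args[1], false, new UTF8Encoding(false)))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // WriteLine ends each line with Environment.NewLine,
                // so the original line endings are normalized.
                writer.WriteLine(line);
            }
        }

        return 0;
    }
}
```

Because the file is processed as a stream, memory use should stay small no matter how large the file is. If the original line endings must be preserved exactly, copying fixed-size `char` buffers with `reader.Read` and `writer.Write` instead of whole lines would be one way to do that.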
Just open Visual Studio, start a new C# Console Application project, paste this code in there, compile, and run it from the command line. The first argument is your source file, the second argument is your destination file. Should work.