Forcing StandardOutputEncoding to UTF8

Posted 2024-12-06 02:03:52

I'm looking to parse UTF8 characters from the standard output stream of another application in my C# project. Using the default approach, characters outside of the ANSI spectrum are corrupted when read from the process' standard output stream.

Now according to Microsoft, what I need to do is set the StandardOutputEncoding:

If the value of the StandardOutputEncoding property is Nothing, the process uses the default standard output encoding for the standard output. The StandardOutputEncoding property must be set before the process is started. Setting this property does not guarantee that the process will use the specified encoding. The application should be tested to determine which encodings the process supports.

However, try as I might to set StandardOutputEncoding to UTF8/CP65001, the output as read, when dumped to a binary file, shows the same mangling of foreign-language characters. They are always read as '?' (0x3F) instead of what they're supposed to be.

I know the assumption at this point would be that the application whose output I'm parsing is simply not sending UTF8 output, but this is definitely not the case: when I dump the output of the application to a file from the command line after forcing the code page of the command prompt to 65001, everything looks fine.

chcp 65001 && slave.exe > file.txt

By this, I know for a fact that the application slave.exe is capable of spitting out UTF8-encoded standard output, but try as I might, I'm unable to get StandardOutputEncoding to do the same in my C# application.
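
One way to back up the claim that file.txt really contains UTF8 is to validate its bytes rather than judge by eye. The sketch below is illustrative only: the Utf8Check class and its method names are made up for this example, and it assumes file.txt is the file produced by the command above. A file consisting only of '?' bytes is also valid UTF8, so the check additionally looks for at least one non-ASCII byte.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of the original question.
static class Utf8Check
{
    // Strict decoder: throws on malformed input instead of substituting U+FFFD.
    private static readonly UTF8Encoding Strict =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // True if the bytes form a well-formed UTF-8 sequence.
    public static bool IsValidUtf8(byte[] data)
    {
        try
        {
            Strict.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // True if at least one byte lies outside the 7-bit ASCII range.
    public static bool HasNonAscii(byte[] data) => data.Any(b => b >= 0x80);

    static void Main(string[] args)
    {
        // Assumed default path: the file written by the chcp/redirect command.
        string path = args.Length > 0 ? args[0] : "file.txt";
        byte[] data = File.ReadAllBytes(path);
        Console.WriteLine("valid UTF-8: {0}, non-ASCII bytes present: {1}",
            IsValidUtf8(data), HasNonAscii(data));
    }
}
```

If both values come back true, the file holds real multi-byte UTF8 sequences rather than replacement '?' characters.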

Each and every time I end up dealing with encoding in .NET, I wish I were back in the C++ world, where everything required more work but was so much more transparent. I'm contemplating writing a C application to read the output of slave.exe into a UTF8-encoded text file ready for C# parsing, but I'm holding off on that approach for now.

Comments (2)

删除会话 2024-12-13 02:03:52

StandardOutputEncoding has no impact whatsoever on the stdout of the executed application. The only thing it does is set the encoding of the StreamReader that sits on top of the binary stdout stream captured from the application being run.
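
A rough way to see why changing the reader's encoding cannot help: once the child has encoded a character that its code page lacks, the byte on the pipe is already 0x3F. The sketch below simulates this, with Encoding.ASCII standing in for the child's console code page (an assumption for illustration; the real code page depends on the system). LossyDemo is a hypothetical name.

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of UtfRedirect or the original answer.
static class LossyDemo
{
    // Simulates a child that encodes text with a code page lacking some characters,
    // followed by a host that decodes the captured bytes as UTF-8.
    public static string RoundTrip(string text)
    {
        // Writer side: unrepresentable characters are replaced with '?' (0x3F).
        byte[] written = Encoding.ASCII.GetBytes(text);

        // Reader side: this is all StandardOutputEncoding controls.
        return Encoding.UTF8.GetString(written);
    }

    static void Main()
    {
        Console.WriteLine(RoundTrip("caf\u00E9")); // prints "caf?"
    }
}
```

The information is lost on the writing side, so no choice of decoder on the reading side can bring the original character back.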

This is OK for applications that natively write UTF8 or Unicode to stdout, but most Microsoft utilities do not do so; they encode their output according to the console's code page instead. The console's code page is set with the Win32 APIs SetConsoleOutputCP and SetConsoleCP, and needs to be forced to UTF8 if that's what you'd like to read. This has to be done on the console the exe is executing in and, to the best of my knowledge, cannot be done from the host's .NET environment.

As such, I have written a proxy application dubbed UtfRedirect, the source code of which I have published on GitHub under the terms of the MIT license. It is intended to be spawned by the .NET host and then told which exe to execute. It sets the code page for the console of the final slave exe, then runs it and pipes the stdout back to the host.

Sample UtfRedirect invocation:

// Requires: using System.Diagnostics; using System.Text;

// At the time of creating the process: launch the proxy, not the target exe.
// The target's name and arguments are passed over stdin after the proxy starts.
_process = new Process
{
    StartInfo =
    {
        FileName = "UtfRedirect.exe",
        Arguments = "",
        RedirectStandardInput = true,
        RedirectStandardOutput = true,
        // StandardErrorEncoding may only be set when stderr is redirected too.
        RedirectStandardError = true,
        StandardOutputEncoding = Encoding.UTF8,
        StandardErrorEncoding = Encoding.UTF8,
        UseShellExecute = false,
    },
};

// At the time of running the process
_process.Start();

// Write the name of the final slave exe to the stdin of UtfRedirect in UTF8
var bytes = Encoding.UTF8.GetBytes(application);
_process.StandardInput.BaseStream.Write(bytes, 0, bytes.Length);
_process.StandardInput.WriteLine();

// Write the arguments to be sent to the final slave exe to the stdin of UtfRedirect in UTF8
bytes = Encoding.UTF8.GetBytes(arguments);
_process.StandardInput.BaseStream.Write(bytes, 0, bytes.Length);
_process.StandardInput.WriteLine();

// Read the output that has been proxied with a forced code page of UTF8
string utf8Output = _process.StandardOutput.ReadToEnd();

不醒的梦 2024-12-13 02:03:52

Modern .NET option:

Console.OutputEncoding = System.Text.Encoding.UTF8;

Source
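
For context, here is a minimal sketch of how that one-liner might be combined with output redirection in the host. It rests on assumptions that have not been checked against slave.exe: that the child attaches to the same console as the host and encodes its output per that console's code page, and that the host actually has a console (setting Console.OutputEncoding may fail otherwise). Utf8Capture and its method names are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical helper, not taken from the answer above.
static class Utf8Capture
{
    public static ProcessStartInfo BuildStartInfo(string fileName, string arguments)
    {
        return new ProcessStartInfo
        {
            FileName = fileName,
            Arguments = arguments,
            RedirectStandardOutput = true,
            StandardOutputEncoding = Encoding.UTF8, // decoder for the captured bytes
            UseShellExecute = false,                // required for redirection
        };
    }

    public static string Run(string fileName, string arguments)
    {
        // Switch the console's output code page to UTF-8 before starting the child.
        // Assumption: the child shares this console and honours its code page.
        Console.OutputEncoding = Encoding.UTF8;

        using (Process process = Process.Start(BuildStartInfo(fileName, arguments)))
        {
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return output;
        }
    }

    static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.Error.WriteLine("usage: Utf8Capture <exe> [arguments]");
            return;
        }
        Console.Write(Run(args[0], string.Join(" ", args, 1, args.Length - 1)));
    }
}
```

The property must be set before Process.Start so that the code page is already in effect when the child begins writing.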
