I need help converting a C# string from one character encoding to another

Posted 2024-10-19 07:18:45

According to Spolsky I can't call myself a developer, so there is a lot of shame behind this question...

Scenario: From a C# application, I would like to take a string value from a SQL db and use it as the name of a directory. I have a secure (SSL) FTP server on which I want to set the current directory using the string value from the DB.
Problem: Everything is working fine until I hit a string value with a "special" character - I seem unable to encode the directory name correctly to satisfy the FTP server.

The code example below

  • uses "special" character é as an example
  • uses WinSCP as an external application for the ftps comms
  • does not show all the code required to set up the Process "_winscp".
  • sends commands to the WinSCP exe by writing to the process's standard input
  • for simplicity, does not get the info from the DB, but instead simply declares a string (but I did do a .Equals to confirm that the value from the DB is the same as the declared string)
  • makes three attempts to set the current directory on the FTP server using different string encodings - all of which fail
  • makes an attempt to set the directory using a string that was created from a hand-crafted byte array - which works

Process _winscp = new Process();
byte[] buffer;

// Attempt 1: send the string as-is
string nameFromString = "Sinéad O'Connor";
_winscp.StandardInput.WriteLine("cd \"" + nameFromString + "\"");

// Attempt 2: round-trip the string through UTF-8
buffer = Encoding.UTF8.GetBytes(nameFromString);
_winscp.StandardInput.WriteLine("cd \"" + Encoding.UTF8.GetString(buffer) + "\"");

// Attempt 3: round-trip the string through ASCII
buffer = Encoding.ASCII.GetBytes(nameFromString);
_winscp.StandardInput.WriteLine("cd \"" + Encoding.ASCII.GetString(buffer) + "\"");

// Attempt 4: build the string from hand-crafted bytes (é as 130); this is the one that works
byte[] nameFromBytes = new byte[] { 83, 105, 110, 130, 97, 100, 32, 79, 39, 67, 111, 110, 110, 111, 114 };
_winscp.StandardInput.WriteLine("cd \"" + Encoding.Default.GetString(nameFromBytes) + "\"");

The UTF8 encoding changes é to 101 (decimal) but the FTP server doesn't like it.

The ASCII encoding changes é to 63 (decimal) but the FTP server doesn't like it.

When I represent é as value 130 (decimal) the FTP server is happy, but I can't find a method that will do this for me (I had to manually construct the string from explicit bytes).

Does anyone know what I should do to my string to encode the é as 130, make the FTP server happy, and finally elevate me to level 1 developer by explaining the one thing every developer should understand?

Comments (2)

一念一轮回 2024-10-26 07:18:45

130 isn't ASCII. ASCII is only 7 bits (see the Encoding.ASCII documentation), so the encoder replaces the "é" with a plain "?" (63) because it has no mapping for it. UTF-8 actually encodes the character as two bytes (decimal: 195 & 169), and it preserves the code point.
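A quick way to check those byte values yourself, as a minimal sketch. The class and method names are made up for illustration, and é is written as the escape \u00E9 so the result does not depend on how the source file is saved.

```csharp
using System;
using System.Text;

public static class EncodingBytesDemo
{
    // Returns the bytes of `text` in the given encoding as space-separated decimals.
    public static string ByteValues(Encoding encoding, string text)
    {
        return string.Join(" ", encoding.GetBytes(text));
    }

    public static void Main()
    {
        const string eAcute = "\u00E9"; // é, U+00E9

        // ASCII has no mapping for é, so the encoder substitutes '?'.
        Console.WriteLine(ByteValues(Encoding.ASCII, eAcute)); // 63

        // UTF-8 encodes U+00E9 as a two-byte sequence.
        Console.WriteLine(ByteValues(Encoding.UTF8, eAcute));  // 195 169
    }
}
```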

Use a code page explicitly, such as Latin (CP 1252); it needs to match whatever the other side uses. As shown below, there is no "130" in the output, so CP 1252 is not the encoding you need :-) But the same principle applies: use an encoding for a specific code page.

Edit: As Hans Passant explained in a comment, the code page to use here is the MS-DOS one (CP 437), which produces the desired result.

// LINQPad -- Encoding is System.Text.Encoding
var enc = Encoding.GetEncoding(1252);
string.Join(" ", enc.GetBytes("Sinéad O'Connor")).Dump();
// -> 83 105 110 233 97 100 32 79 39 67 111 110 110 111 114
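For comparison, here is roughly what the same check looks like with CP 437, the code page named in the edit above. This is only a sketch and assumes a modern .NET runtime (.NET 5 or later), where legacy code pages have to be registered first; the class and method names are made up for illustration.

```csharp
using System;
using System.Text;

public static class Cp437Demo
{
    // Encodes `text` with the OEM United States code page (CP 437).
    public static byte[] ToCp437(string text)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // are not registered by default. On .NET Framework, GetEncoding(437)
        // works without this call.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(437).GetBytes(text);
    }

    public static void Main()
    {
        byte[] bytes = ToCp437("Sin\u00E9ad O'Connor");
        Console.WriteLine(string.Join(" ", bytes));
        // -> 83 105 110 130 97 100 32 79 39 67 111 110 110 111 114
    }
}
```

These are the same bytes as the hand-crafted array in the question, with é as 130.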

See: http://msdn.microsoft.com/en-us/goglobal/bb688114 for more.

Happy coding.

Btw. good selection in artists -- if it was intentional :p

不忘初心 2024-10-26 07:18:45

I think the problem here is that ALL .NET strings are Unicode (UTF-16 internally). A .NET string has no notion of "what encoding am I". So with Encoding.ASCII.GetString(buffer) you just convert the ASCII bytes back into a Unicode string.
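To make that concrete, here is a small sketch of what the round trips in the question actually do. The RoundTrip helper is a made-up name for illustration.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Encodes `text` to bytes and decodes it straight back with the same encoding.
    public static string RoundTrip(Encoding encoding, string text)
    {
        return encoding.GetString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        const string name = "Sin\u00E9ad O'Connor";

        // UTF-8 can represent every character, so the string comes back unchanged.
        Console.WriteLine(RoundTrip(Encoding.UTF8, name) == name); // True

        // ASCII has no é, so it was already replaced by '?' when encoding.
        Console.WriteLine(RoundTrip(Encoding.ASCII, name) == "Sin?ad O'Connor"); // True
    }
}
```

Either way the result is an ordinary Unicode string again, so the bytes that reach the other process still depend on the encoding of the stream they are written to.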

I think your problem should be solved by changing the encoding used for Process.StandardInput, so that WinSCP receives the text in the encoding it expects.
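One way this might look, as a rough sketch: wrap the raw stdin stream in a StreamWriter with an explicit encoding. Whether WinSCP really expects CP 437 on its standard input is an assumption taken from the other answer, CreateWriter is a made-up helper, and a MemoryStream stands in for the process stream so the snippet runs on its own. It assumes .NET 5 or later, where the legacy code page has to be registered first.

```csharp
using System;
using System.IO;
using System.Text;

public static class StdinEncodingDemo
{
    // Wraps a raw stream in a writer that encodes text with `encoding`.
    public static StreamWriter CreateWriter(Stream stream, Encoding encoding)
    {
        return new StreamWriter(stream, encoding) { AutoFlush = true };
    }

    public static void Main()
    {
        // Assumption: .NET Core / .NET 5+; not needed on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp437 = Encoding.GetEncoding(437);

        // Stand-in for process.StandardInput.BaseStream of a started process
        // with RedirectStandardInput = true.
        using (var stream = new MemoryStream())
        using (var writer = CreateWriter(stream, cp437))
        {
            writer.Write("cd \"Sin\u00E9ad O'Connor\"");

            // Show the bytes the child process would receive.
            Console.WriteLine(string.Join(" ", stream.ToArray()));
        }
    }
}
```

With a real process you would pass _winscp.StandardInput.BaseStream instead of the MemoryStream. On newer .NET runtimes, ProcessStartInfo.StandardInputEncoding may be a simpler alternative.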

OR

You should check what Encoding.Default is, because I'm pretty sure it's neither UTF-8 nor ASCII.
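A small sketch for checking this on your own machine. The Describe helper is a made-up name, and the values printed depend on the runtime and system settings, so no output is shown.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DefaultEncodingDemo
{
    // Describes an encoding as "name (code page N)".
    public static string Describe(Encoding encoding)
    {
        return encoding.WebName + " (code page " + encoding.CodePage + ")";
    }

    public static void Main()
    {
        // On .NET Framework this is the system ANSI code page;
        // on .NET Core and later it is UTF-8.
        Console.WriteLine("Encoding.Default: " + Describe(Encoding.Default));

        // The OEM (console) code page associated with the current culture.
        Console.WriteLine("OEM code page: " + CultureInfo.CurrentCulture.TextInfo.OEMCodePage);
    }
}
```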
