C#: Convert a string from UTF-8 to ISO-8859-1 (Latin1)

Posted 2024-08-15 04:13:03

I have googled this topic and looked at every answer, but I still don't get it.

Basically, I need to convert a UTF-8 string to ISO-8859-1, and I do it using the following code:

Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
string msg = iso.GetString(utf8.GetBytes(Message));

My source string is

Message = "ÄäÖöÕõÜü"

But unfortunately my result string becomes

msg = "�ä�ö�õ�ü"

What am I doing wrong here?

Comments (10)

白芷 2024-08-22 04:13:03

Use Encoding.Convert to adjust the byte array before attempting to decode it into your destination encoding.

Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(Message);
byte[] isoBytes = Encoding.Convert(utf8, iso, utfBytes);
string msg = iso.GetString(isoBytes);
生死何惧 2024-08-22 04:13:03

I think your problem is that you assume the bytes that represent the string in UTF-8 will produce the same string when interpreted as something else (ISO-8859-1). That is simply not the case. I recommend that you read this excellent article by Joel Spolsky.
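The mismatch can be seen at the byte level. Here is a minimal sketch (the class and method names are made up for illustration, they are not from the question) that does what the question's code does: encode as UTF-8, then decode the same bytes as ISO-8859-1.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Hypothetical helper: hex dump of the UTF-8 bytes of a string.
    public static string Utf8Hex(string text)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(text));
    }

    // Hypothetical helper: reproduces the question's approach of encoding
    // as UTF-8 and then decoding those same bytes as ISO-8859-1.
    public static string EncodeUtf8DecodeLatin1(string text)
    {
        Encoding iso = Encoding.GetEncoding("ISO-8859-1");
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        // Each byte is read as one ISO-8859-1 character, so every
        // two-byte UTF-8 sequence turns into two unrelated characters.
        return iso.GetString(utf8Bytes);
    }
}
```

For "Ä" (U+00C4), Utf8Hex returns "C3-84", and EncodeUtf8DecodeLatin1 returns a two-character string, U+00C3 followed by U+0084, instead of the original single character.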

哭了丶谁疼 2024-08-22 04:13:03

Try this:

Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(Message);
byte[] isoBytes = Encoding.Convert(utf8, iso, utfBytes);
string msg = iso.GetString(isoBytes);
自控 2024-08-22 04:13:03

You need to fix the source of the string in the first place.

A string in .NET is just a sequence of 16-bit UTF-16 code units (char values), so a string isn't in any particular encoding such as UTF-8 or ISO-8859-1.

It's when you take that string and convert it to a set of bytes that encoding comes into play.

In any case, what you did, encoding a string to a byte array with one character set and then decoding it with another, will not work, as you have seen.

Can you tell us more about where that original string comes from, and why you think it has been encoded wrong?
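To illustrate the point that an encoding only matters at the string/bytes boundary, here is a small sketch (helper names are hypothetical). The same .NET string produces different byte sequences depending on the encoding, and Encoding.Convert operates on those bytes, not on the string.

```csharp
using System;
using System.Text;

public static class EncodingBoundaryDemo
{
    // Hypothetical helper: hex dump of a string's bytes in a given encoding.
    public static string ToHex(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    // Hypothetical helper: transcodes UTF-8 bytes into ISO-8859-1 bytes.
    public static byte[] Utf8ToLatin1(byte[] utf8Bytes)
    {
        Encoding iso = Encoding.GetEncoding("ISO-8859-1");
        return Encoding.Convert(Encoding.UTF8, iso, utf8Bytes);
    }
}
```

For the string "\u00C4\u00E4" (Ää), ToHex with Encoding.UTF8 gives "C3-84-C3-A4", while ToHex with the ISO-8859-1 encoding gives "C4-E4". The string itself is the same in both cases; only the bytes that leave your program differ.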

生生漫 2024-08-22 04:13:03

That code looks a bit strange. To get a string from a UTF-8 byte stream, all you need to do is:

string str = Encoding.UTF8.GetString(utf8ByteArray);

If you need to save an ISO-8859-1 byte stream somewhere, just add one more line of code after the previous one:

byte[] iso88591data = Encoding.GetEncoding("ISO-8859-1").GetBytes(str);
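
One thing to keep in mind with the GetBytes call above: ISO-8859-1 only covers U+0000 to U+00FF, so characters outside that range cannot be encoded and are substituted by default rather than reported. If silent substitution is not acceptable, a sketch like the following (hypothetical helper names) asks for an exception instead, using the GetEncoding overload that takes fallback objects.

```csharp
using System;
using System.Text;

public static class StrictLatin1
{
    // Hypothetical helper: encodes to ISO-8859-1 and throws
    // EncoderFallbackException for characters the target encoding
    // cannot represent, instead of substituting them.
    public static byte[] Encode(string text)
    {
        Encoding strictIso = Encoding.GetEncoding(
            "ISO-8859-1",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        return strictIso.GetBytes(text);
    }

    // Returns false when the text contains characters outside ISO-8859-1.
    public static bool TryEncode(string text, out byte[] bytes)
    {
        try
        {
            bytes = Encode(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            bytes = new byte[0];
            return false;
        }
    }
}
```

With this, TryEncode succeeds for "\u00C4" (Ä) and fails for "\u20AC" (the euro sign), which has no ISO-8859-1 code.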
臻嫒无言 2024-08-22 04:13:03

Maybe this helps. It converts text from one code page to another:

    public static string fnStringConverterCodepage(string sText, string sCodepageIn = "ISO-8859-8", string sCodepageOut="ISO-8859-8")
    {
        string sResultado = string.Empty;
        try
        {
            byte[] tempBytes;
            tempBytes = System.Text.Encoding.GetEncoding(sCodepageIn).GetBytes(sText);
            sResultado = System.Text.Encoding.GetEncoding(sCodepageOut).GetString(tempBytes);
        }
        catch (Exception)
        {
            sResultado = "";
        }
        return sResultado;
    }

Usage:

string sMsg = "ERRO: Não foi possivel acessar o servico de Autenticação";
var sOut = fnStringConverterCodepage(sMsg, "ISO-8859-1", "UTF-8");

Output:

"Não foi possivel acessar o servico de Autenticação"
真心难拥有 2024-08-22 04:13:03

Encoding targetEncoding = Encoding.GetEncoding(1252);
// Encode the input string (utfString) into an array of bytes.
byte[] encodedBytes = targetEncoding.GetBytes(utfString);
// Show the encoded byte values.
Console.WriteLine("Encoded bytes: " + BitConverter.ToString(encodedBytes));
// Decode with the same encoding that produced the bytes;
// Encoding.Default varies by platform.
string decodedString = targetEncoding.GetString(encodedBytes);
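
A note on code page 1252 as used above: Windows-1252 is close to ISO-8859-1 but not identical, since it assigns printable characters such as the euro sign to the 0x80-0x9F range. Also, assuming you are on .NET Core or .NET 5+, code page 1252 is not available until the code pages provider has been registered (on .NET Framework it is built in, and CodePagesEncodingProvider may require the System.Text.Encoding.CodePages package). A minimal sketch with hypothetical helper names:

```csharp
using System;
using System.Text;

public static class CodePage1252Demo
{
    // Hypothetical helper. Assumes .NET Core / .NET 5+, where the code
    // pages provider must be registered before code page 1252 is requested.
    public static Encoding GetWindows1252()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }

    // Encodes to Windows-1252 and decodes with the same encoding.
    public static string RoundTrip(string text)
    {
        Encoding cp1252 = GetWindows1252();
        byte[] bytes = cp1252.GetBytes(text);
        return cp1252.GetString(bytes);
    }
}
```

For example, GetWindows1252().GetBytes("\u20AC") yields the single byte 0x80, whereas ISO-8859-1 has no code for the euro sign at all. If your target really is ISO-8859-1, requesting it by name as in the other answers avoids this difference.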
蓦然回首 2024-08-22 04:13:03

First, specify the input and output encodings (there is no way to reliably detect the encoding of a text file; you have to know it):

Encoding tempInEncoding = Encoding.GetEncoding("utf-8"); // ("Windows-1252");
Encoding tempOutEncoding = Encoding.GetEncoding("iso-8859-1");

For the names you can pass to GetEncoding, see the Microsoft table here:
https://learn.microsoft.com/it-it/dotnet/api/system.text.encodinginfo.getencoding?view=net-8.0

Then convert the string from the input encoding to the output encoding.

// GET - encoding conversion 
string tempOutputStringConverted = get1StringWithEncodingConversion(tempInEncoding, 
                                                                    tempOutEncoding,
                                                                    TextToConvert);


    /// <summary>
    /// GET - convert a string from an input Encoding to an Output encoding
    /// </summary>       
    private static string get1StringWithEncodingConversion(Encoding iInEncoding, Encoding iOutEncoding, string iRow)
    {
        // GET
        byte[] tempInputBytes = iInEncoding.GetBytes(iRow);
        byte[] tempOutputBytes = Encoding.Convert(iInEncoding, iOutEncoding, tempInputBytes);
        // GET - conversion
        string tempOutputString = iOutEncoding.GetString(tempOutputBytes);
        // RET
        return tempOutputString;
    }
旧人哭 2024-08-22 04:13:03

I just used Nathan's solution and it works fine. I needed to convert ISO-8859-1 to Unicode:

string isocontent = Encoding.GetEncoding("ISO-8859-1").GetString(fileContent, 0, fileContent.Length);
byte[] isobytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(isocontent);
byte[] ubytes = Encoding.Convert(Encoding.GetEncoding("ISO-8859-1"), Encoding.Unicode, isobytes);
return Encoding.Unicode.GetString(ubytes, 0, ubytes.Length);
雪落纷纷 2024-08-22 04:13:03

Here is a sample for ISO-8859-9:

protected void btnKaydet_Click(object sender, EventArgs e)
{
    Response.Clear();
    Response.Buffer = true;
    Response.ContentType = "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
    Response.AddHeader("Content-Disposition", "attachment; filename=XXXX.doc");
    Response.ContentEncoding = Encoding.GetEncoding("ISO-8859-9");
    Response.Charset = "ISO-8859-9";
    EnableViewState = false;

    // Render the form to an HTML string.
    StringWriter writer = new StringWriter();
    HtmlTextWriter html = new HtmlTextWriter(writer);
    form1.RenderControl(html);

    // Encode the rendered HTML as ISO-8859-9 bytes for the attachment.
    byte[] bytesInStream = Encoding.GetEncoding("iso-8859-9").GetBytes(writer.ToString());
    MemoryStream memoryStream = new MemoryStream(bytesInStream);

    // Mail the encoded document as an attachment.
    string msgBody = "";
    string Email = "[email protected]";
    SmtpClient client = new SmtpClient("mail.xxxxx.org");
    MailMessage message = new MailMessage(Email, "[email protected]", "ONLINE APP FORM WITH WORD DOC", msgBody);
    Attachment att = new Attachment(memoryStream, "XXXX.doc", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
    message.Attachments.Add(att);
    message.BodyEncoding = System.Text.Encoding.UTF8;
    message.IsBodyHtml = true;
    client.Send(message);
}