"\xC9" (É) not being converted correctly into two bytes

Posted 2024-08-29 08:21:17

Further to this question I've got a supplementary problem.

I've found a track with an "É" in the title.

My code:

var playList = new StreamWriter(playlist, false, Encoding.UTF8);

-

private static void WriteUTF8(StreamWriter playList, string output)
{
    byte[] byteArray = Encoding.UTF8.GetBytes(output);
    foreach (byte b in byteArray)
    {
        playList.Write(Convert.ToChar(b));
    }
}

converts this to the following bytes:

195
137

which is being output as Ã followed by a square (a character that can't be printed in the current font).

I've exported the same file to a playlist in Media Monkey, and it writes the "É" as "Ã‰", which I'm assuming is correct (as KennyTM pointed out).

My question is: how do I get the "‰" symbol output? Do I need to select a different font, and if so, which one?

UPDATE

People seem to be missing the point.

I can get the "É" written to the file using

playList.WriteLine("É");

that's not the problem.

The problem is that Media Monkey requires the file to be in the following format:

#EXTINFUTF8:140,Yann Tiersen - Comptine D'Un Autre Ã‰tÃ©: L'AprÃ¨s Midi
#EXTINF:140,Yann Tiersen - Comptine D'Un Autre Été: L'Après Midi
#UTF8:04-Comptine D'Un Autre Ã‰tÃ©- L'AprÃ¨s Midi.mp3
04-Comptine D'Un Autre Été- L'Après Midi.mp3

where all the "high-ASCII" characters (for want of a better term) are written out as pairs of characters.

UPDATE 2

I should be getting c9 replaced by c3 89.

I was going to put what I'm actually getting, but in doing the tests for this I've managed to get a test program to output the text in the right format "as is". So I need to do some more investigation.
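For reference, a minimal C# sketch (assuming .NET 5 or later for top-level statements; `Hex` is a hypothetical helper, not part of the question's code) showing that UTF-8 encodes U+00C9 as exactly the two bytes c3 89:

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper: format a byte array as space-separated lowercase hex.
static string Hex(byte[] bytes) => string.Join(" ", bytes.Select(b => b.ToString("x2")));

string e = "\u00C9"; // "É", a single UTF-16 code unit with the value 0xC9

// UTF-8 needs two bytes for any code point from U+0080 to U+07FF.
byte[] utf8 = Encoding.UTF8.GetBytes(e);

Console.WriteLine($"U+{(int)e[0]:X4} -> {Hex(utf8)}"); // U+00C9 -> c3 89
```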

请你别敷衍 2024-09-05 08:21:17

Using Convert.ToChar like that is almost certainly a bad idea. You're basically encoding things twice.

You should either be performing the conversion yourself and then writing directly to a stream, or you should be letting the StreamWriter do the conversion. Why are you using a StreamWriter at all if you're trying to perform the conversions yourself?

Are you trying to write to a binary file, or a simple text file? If it's a simple text file, just use a StreamWriter and let that do the conversion. If it's a binary file, use a Stream instead of a StreamWriter, and perform text encoding directly where you need to, writing the bytes straight to the stream afterwards.
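A small sketch of the two options, assuming .NET 5+ top-level statements and a `MemoryStream` standing in for the real playlist file (`ViaStreamWriter` and `ViaStream` are illustrative names). Both routes should produce identical bytes, as long as nothing gets encoded twice:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

var utf8NoBom = new UTF8Encoding(false); // UTF-8 without a byte-order mark
const string line = "\u00C9t\u00E9\n";   // "Été" plus a newline, escaped to avoid source-file encoding issues

// Option 1 (text file): let StreamWriter do the encoding; pass it plain strings.
static byte[] ViaStreamWriter(string text, Encoding enc)
{
    var buffer = new MemoryStream();
    using (var writer = new StreamWriter(buffer, enc))
    {
        writer.Write(text);
    } // disposing flushes the encoded bytes into the buffer
    return buffer.ToArray(); // ToArray still works after the stream is closed
}

// Option 2 (binary file): encode explicitly and write the bytes straight to the Stream.
static byte[] ViaStream(string text, Encoding enc)
{
    var buffer = new MemoryStream();
    byte[] bytes = enc.GetBytes(text);
    buffer.Write(bytes, 0, bytes.Length);
    return buffer.ToArray();
}

byte[] a = ViaStreamWriter(line, utf8NoBom);
byte[] b = ViaStream(line, utf8NoBom);
Console.WriteLine(a.SequenceEqual(b)); // True
```

Mixing the two, i.e. encoding by hand and then handing the result to a StreamWriter as characters, is what goes wrong in the question.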

EDIT: Here's what's happening with your original code:

Encoding.UTF8.GetBytes(text) => byte[] { 0xc3, 0x89 };

Convert.ToChar(0xc3) => char U+00C3
StreamWriter writes U+00C3 as byte[] { 0xc3, 0x83 };

Convert.ToChar(0x89) => char U+0089
StreamWriter writes U+0089 as byte[] { 0xc2, 0x89 };

So that's why you're getting c3 83 c2 89 written to the file.
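The trace above can be reproduced with the question's own `WriteUTF8`. This sketch assumes .NET 5+ top-level statements, and uses `new UTF8Encoding(false)` instead of the question's `Encoding.UTF8` so that no BOM (EF BB BF) precedes the bytes being inspected:

```csharp
using System;
using System.IO;
using System.Text;

// The question's WriteUTF8, unchanged apart from being a local function.
static void WriteUTF8(StreamWriter playList, string output)
{
    byte[] byteArray = Encoding.UTF8.GetBytes(output);
    foreach (byte b in byteArray)
    {
        // Convert.ToChar(byte) yields the char U+00XX with the same numeric value...
        playList.Write(Convert.ToChar(b));
    }
    // ...and the writer then encodes each of those chars as UTF-8 a second time.
}

var buffer = new MemoryStream();
using (var playList = new StreamWriter(buffer, new UTF8Encoding(false)))
{
    WriteUTF8(playList, "\u00C9"); // "É"
}

byte[] written = buffer.ToArray();
Console.WriteLine(BitConverter.ToString(written)); // C3-83-C2-89
```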

别想她 2024-09-05 08:21:17

StreamWriter already converts the characters you send it to UTF-8 — that's its entire purpose. Throw WriteUTF8 away; it's broken and useless.

(WriteUTF8 is taking characters, converting them to UTF-8 bytes, converting each single byte to the character it maps to in the current code page, then encoding each of those characters in UTF-8. So in the best case you have a doubly-UTF-8-encoded string; in the worst, you've completely lost bytes that weren't mapped in the system code page repertoire; especially bad for DBCS code pages.)

The problem you're having with Media Monkey may be just that it doesn't support UTF-8 or Unicode filenames at all. Try asking it to play (and export a playlist for) files with characters that don't fit in your system codepage, for example by renaming a file to αβγ.mp3.

Edit:

#EXTINFUTF8:140,Yann Tiersen - Comptine D'Un Autre Ã‰tÃ©: L'AprÃ¨s Midi
#EXTINF:140,Yann Tiersen - Comptine D'Un Autre Été: L'Après Midi
#UTF8:04-Comptine D'Un Autre Ã‰tÃ©- L'AprÃ¨s Midi.mp3
04-Comptine D'Un Autre Été- L'Après Midi.mp3

OK, what you've got there is a mixture of encodings in the same file, so it's no wonder text editors have trouble opening it. The uncommented and #EXTINF lines are in the system default code page, and are present to support media players that can't read Unicode filenames. Any filename characters not present in the system code page (e.g. Greek as above, on a Western Windows install) will be mangled and unplayable for anything that doesn't know about the #UTF8 (and, for the description, #EXTINFUTF8) lines.
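To illustrate the mangling, here's a sketch assuming .NET 5+ (where code page 1252 has to be registered through `CodePagesEncodingProvider`) and taking Windows-1252 as a stand-in for a Western system code page. A replacement fallback is specified explicitly, because the default encoder for a code page may apply "best fit" substitutions instead of `?`:

```csharp
using System;
using System.Text;

// .NET Core / .NET 5+ ships only a few encodings by default; this makes code page 1252 available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Replacement fallback: characters with no mapping become "?" (no best-fit guessing).
Encoding ansi = Encoding.GetEncoding(1252,
    EncoderFallback.ReplacementFallback, DecoderFallback.ReplacementFallback);
Encoding utf8 = new UTF8Encoding(false);

string name = "\u03B1\u03B2\u03B3.mp3"; // "αβγ.mp3"

// Round-trip the filename through each encoding, as a player reading that line would.
string viaAnsi = ansi.GetString(ansi.GetBytes(name));
string viaUtf8 = utf8.GetString(utf8.GetBytes(name));

Console.WriteLine(viaAnsi);         // ???.mp3 (the original name is gone)
Console.WriteLine(viaUtf8 == name); // True
```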

So if this is your target format, you'll need to grab two encodings and use each in turn, something like:

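private static void writePlaylistEntry(Stream playlist, string filename, int length) {
    Encoding utf8= new UTF8Encoding(false);  // UTF-8 without a BOM
    Encoding ansi= Encoding.Default;         // system ANSI code page (on .NET Framework)
    writeBytes(playlist, utf8.GetBytes("#EXTINFUTF8:"+length+","+filename+"\n"));
    writeBytes(playlist, ansi.GetBytes("#EXTINF:"+length+","+filename+"\n"));
    writeBytes(playlist, utf8.GetBytes("#UTF8:"+filename+"\n"));
    writeBytes(playlist, ansi.GetBytes(filename+"\n"));
}

// Stream has no Write(byte[]) overload on .NET Framework, so pass offset and count explicitly.
private static void writeBytes(Stream playlist, byte[] bytes) {
    playlist.Write(bytes, 0, bytes.Length);
}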
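A usage sketch for the same idea, assuming .NET 5+ and writing to a `MemoryStream` instead of a file. One caveat worth knowing: on .NET Core / .NET 5+, `Encoding.Default` is always UTF-8 rather than the ANSI code page, so this version takes the ANSI encoding as a parameter (Windows-1252 here, registered via `CodePagesEncodingProvider`). `WriteEntry` and `Put` are hypothetical names:

```csharp
using System;
using System.IO;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); // code page 1252 on .NET Core / .NET 5+

// Same idea as writePlaylistEntry above, but with the "ANSI" encoding passed in explicitly.
static void WriteEntry(Stream playlist, string filename, int length, Encoding ansi)
{
    Encoding utf8 = new UTF8Encoding(false);
    void Put(Encoding enc, string text)
    {
        byte[] bytes = enc.GetBytes(text);
        playlist.Write(bytes, 0, bytes.Length);
    }
    Put(utf8, "#EXTINFUTF8:" + length + "," + filename + "\n");
    Put(ansi, "#EXTINF:" + length + "," + filename + "\n");
    Put(utf8, "#UTF8:" + filename + "\n");
    Put(ansi, filename + "\n");
}

var buffer = new MemoryStream();
WriteEntry(buffer, "\u00C9t\u00E9.mp3", 140, Encoding.GetEncoding(1252)); // "Été.mp3"
byte[] output = buffer.ToArray();
Console.WriteLine(BitConverter.ToString(output));
```

Each `É` should appear as C3 89 in the two UTF-8 lines and as the single byte C9 in the two ANSI lines.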
看轻我的陪伴 2024-09-05 08:21:17

I don't do C#, but the symptoms tell me that you're indeed writing it as UTF-8, and that whatever you're viewing the written output with (console, application, ...) is displaying it as ISO-8859-1 rather than UTF-8, while MediaMonkey is displaying it as CP1252.
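In C#, the effect described here can be sketched by decoding the same two bytes three ways (assuming .NET 5+; which viewer actually uses which encoding is the answer's guess, not something this code establishes). Read as ISO-8859-1, 0x89 is an invisible C1 control character, which would explain the "square"; read as Windows-1252 it's the per-mille sign, the "‰" the question asks about:

```csharp
using System;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); // needed for 1252 on .NET Core / .NET 5+

byte[] utf8Bytes = { 0xC3, 0x89 }; // what UTF-8 produces for "É"

string asUtf8 = Encoding.UTF8.GetString(utf8Bytes);                 // "É"
string asLatin1 = Encoding.GetEncoding(28591).GetString(utf8Bytes); // "Ã" + U+0089, an invisible C1 control
string asCp1252 = Encoding.GetEncoding(1252).GetString(utf8Bytes);  // "Ã‰" (0x89 is the per-mille sign)

foreach (string s in new[] { asUtf8, asLatin1, asCp1252 })
{
    // Print each code unit so that the invisible control character shows up.
    Console.WriteLine(string.Join(" ", Array.ConvertAll(s.ToCharArray(), c => $"U+{(int)c:X4}")));
}
```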

If you're viewing them in the IDE console, then you need to configure the IDE to use UTF-8 as console and text file encoding.

Update: you apparently want to write UTF-8 data as CP-1252. Now the question is clearer. Again, I don't do C#, but the Java equivalent would be:

try (Writer writer = new OutputStreamWriter(new FileOutputStream("file.ext"), "windows-1252")) {
    writer.write(someUTF8String); // Encoded as windows-1252: "É" becomes the single byte 0xC9
}

Hopefully this gives some insights.

陌上青苔 2024-09-05 08:21:17

The more fundamental problem is in the name of the method:

 private static void WriteUTF8(...)

.M3U files aren't UTF-8. They're Latin-1 (or Windows-1252).

Instead of Encoding.UTF8, you should be using Encoding.GetEncoding(1252). Then you can just write directly to the stream, you won't need any of this conversion weirdness.
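As a quick illustration of why this sidesteps the conversion problem, here's a sketch (assuming .NET 5+, where code page 1252 must be registered first) comparing the bytes each encoding produces for "Été". Windows-1252 uses a single byte per accented letter, so there's nothing to split into pairs:

```csharp
using System;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); // 1252 on .NET Core / .NET 5+

string title = "\u00C9t\u00E9"; // "Été"
byte[] cp1252 = Encoding.GetEncoding(1252).GetBytes(title);
byte[] utf8 = new UTF8Encoding(false).GetBytes(title);

Console.WriteLine(BitConverter.ToString(cp1252)); // C9-74-E9
Console.WriteLine(BitConverter.ToString(utf8));   // C3-89-74-C3-A9
```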

Update:

I just tried the following C# code and the resulting .M3U opens just fine in both Winamp and WMP:

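using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        // On .NET Core / .NET 5+, code page 1252 must first be enabled with
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        string fileName = @"C:\Temp\Test.m3u";
        using (StreamWriter writer = new StreamWriter(fileName, false,
            Encoding.GetEncoding(1252)))
        {
            writer.WriteLine("#EXTM3U");
            writer.WriteLine("#EXTINF:140,Yann Tiersen " +
                "- Comptine D'Un Autre Été: L'Après Midi");
            writer.WriteLine("04-Comptine D'Un Autre Été- L'Après Midi.mp3");
        }
    }
}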

So, as I said - just use the right encoding to begin with. You don't need all those extra #EXTINFUTF8 and #UTF8 lines, unless it's some bizarre requirement for Media Monkey (it's definitely not part of the basic M3U spec).

我是有多爱你 2024-09-05 08:21:17

Right, first off thanks to everyone for their help and patience.

I've finally got it working correctly. I've implemented a version of bobince's solution which is why he gets the acceptance (up-votes to everyone else). Here's my code:

var playList = new StreamWriter(playlist, false, Encoding.Default);
playList.WriteLine("#EXTM3U");

foreach (string track in tracks)
{
    // Read ID3 tags from file
    var info = new FileProperties(track);

    // Write extended info (#EXTINF:<time>,<artist> - <title>)
    if (Encoding.UTF8.GetBytes(info.Artist).Length != info.Artist.Length ||
        Encoding.UTF8.GetBytes(info.Title).Length != info.Title.Length)
    {
        playList.Close();
        playList = new StreamWriter(playlist, true, Encoding.UTF8);

        playList.WriteLine(string.Format("#EXTINFUTF8:{0},{1} - {2}",
                           info.Duration, info.Artist, info.Title));

        playList.Close();
        playList = new StreamWriter(playlist, true, Encoding.Default);
    }

    playList.WriteLine(string.Format("#EXTINF:{0},{1} - {2}",
                       info.Duration, info.Artist, info.Title));

    // Write the name of the file (without its directory path)
    string file = Path.GetFileName(track);
    if (Encoding.UTF8.GetBytes(file).Length != file.Length)
    {
        playList.Close();
        playList = new StreamWriter(playlist, true, Encoding.UTF8);

        playList.WriteLine(string.Format("#UTF8:{0}", file));

        playList.Close();
        playList = new StreamWriter(playlist, true, Encoding.Default);
    }

    playList.WriteLine(file);
}

playList.Close();

As you can see, I assume I won't need to write UTF-8, but when I do, I close the stream and reopen it with UTF-8 encoding. Then, after writing the offending line, I close it and reopen it with the default encoding.
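For comparison, here's a sketch of the same output written through a single open stream, choosing the encoding per line instead of closing and reopening a StreamWriter. It assumes .NET 5+, uses Windows-1252 in place of `Encoding.Default` (which is UTF-8 on .NET Core), and takes plain parameters instead of the asker's `FileProperties` type; `WriteTrack`, `NeedsUtf8` and `Line` are hypothetical names:

```csharp
using System;
using System.IO;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); // 1252 on .NET Core / .NET 5+

// Equivalent to the asker's check: any non-ASCII char makes the UTF-8 byte count exceed the char count.
static bool NeedsUtf8(string s) => Encoding.UTF8.GetByteCount(s) != s.Length;

// Writes one entry to an already-open stream, picking the encoding per line.
static void WriteTrack(Stream playlist, Encoding ansi, int duration, string artist, string title, string file)
{
    Encoding utf8 = new UTF8Encoding(false); // no BOM in the middle of the file
    void Line(Encoding enc, string text)
    {
        byte[] data = enc.GetBytes(text + "\r\n"); // same line ending as StreamWriter.WriteLine on Windows
        playlist.Write(data, 0, data.Length);
    }
    string info = $"{duration},{artist} - {title}";
    if (NeedsUtf8(artist) || NeedsUtf8(title)) Line(utf8, "#EXTINFUTF8:" + info);
    Line(ansi, "#EXTINF:" + info);
    if (NeedsUtf8(file)) Line(utf8, "#UTF8:" + file);
    Line(ansi, file);
}

var buffer = new MemoryStream();
Encoding cp1252 = Encoding.GetEncoding(1252); // stands in for Encoding.Default on a Western .NET Framework system
byte[] header = cp1252.GetBytes("#EXTM3U\r\n");
buffer.Write(header, 0, header.Length);
WriteTrack(buffer, cp1252, 140, "Yann Tiersen", "Comptine D'Un Autre \u00C9t\u00E9: L'Apr\u00E8s Midi",
    "04-Comptine D'Un Autre \u00C9t\u00E9- L'Apr\u00E8s Midi.mp3");

// Viewing the whole file as 1252 shows the UTF-8 lines as "Ã‰"-style pairs.
string[] lines = cp1252.GetString(buffer.ToArray()).Split("\r\n", StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine(lines.Length); // 5
```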

Now I don't know why my previous code gave inconsistent results. Given what everyone (particularly Jon) said, it should have failed all of the time, or possibly worked all of the time.
