Writing a P6 .ppm in ASCII format using ASCII/UTF8 data from a text box. Encoding confusion?
I'm having some general confusion about encoding in a little tool I'm writing.
First of all, I apologise that the following code is a little butchered, but of the code I have written so far, it's the closest to actually working.
If I use the following code:
/*create file*/
FileStream fileS = new FileStream(filename + ".ppm", FileMode.Create, FileAccess.ReadWrite, FileShare.None, 8, FileOptions.None);
/*create a binary writer*/
BinaryWriter bWriter = new BinaryWriter(fileS, Encoding.ASCII);
/*write ppm header*/
string buffer = "P6 ";
bWriter.Write(buffer.ToCharArray(), 0, buffer.Length);
buffer = width.ToString() + " ";
bWriter.Write(buffer.ToCharArray(), 0, buffer.Length);
buffer = height.ToString() + " ";
bWriter.Write(buffer.ToCharArray(), 0, buffer.Length);
buffer = "255 ";
bWriter.Write(buffer.ToCharArray(), 0, buffer.Length);
/*write data out*/
byte[] messageByte = Encoding.UTF8.GetBytes(ppmDataBox.Text);
bWriter.Write(messageByte, 0, messageByte.Length);
/*close writer and bWriter*/
bWriter.Close();
fileS.Close();
Then what I get is a file saved in UTF-8 format; if I open that file and re-save it as ASCII, I get the PPM I expect.
However, if I change the line:
byte[] messageByte = Encoding.UTF8.GetBytes(ppmDataBox.Text);
to
byte[] messageByte = Encoding.ASCII.GetBytes(ppmDataBox.Text);
Then I do get a file saved in ASCII format, but the file is wrong: the colours are wrong and the data in the file does not match the data in the text box.
I am assuming that the text box is in UTF-8, that the data I am pasting into it is actually in ASCII format/characters, and that I first need to convert that ASCII into its corresponding UTF-8 (that is, the UTF-8 version of those characters). However, if I'm totally honest, this is my first venture into the world of encoding and I'm completely clueless, so please let me know if I'm talking rubbish.
Here is a sample of the kind of data I'm pasting into the text box:
ÿÿ ÿÿ ÿÿ ÿÿ aa aa aa ÿÿ ÿÿ ÿÿ ÿÿ ÿÿ ÿÿ ÿÿ ÿÿ ÿÿ ÿÿ ÿ
It is meant to be yellow with little black squares everywhere, but it's coming out green, and when the file is created in ASCII format the data ends up looking like this:
?? ?? ?? ?? aa aa aa ?? ?? ?? ??
Comments (1)
ASCII is a 7-bit encoding (character values 0 through 127). The ÿ character has a value greater than 127; the exact value depends on which encoding or code page is used (in code page 1252 it is 255). When the ASCII encoding tries to process a character with a value greater than 127, it just writes a question mark.
It looks like you need to map high ASCII characters (character values 128 through 255) to single bytes. That rules out the UTF8, UTF32 and Unicode encodings, since their GetBytes() methods return multiple bytes for a single character with a value greater than 127.
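To make the difference concrete, here is a small sketch that encodes a single ÿ (U+00FF) three ways and prints the resulting bytes. The class name EncodingDemo is made up for illustration, and the RegisterProvider call is an assumption about the runtime: it is needed on .NET Core / .NET 5+ to make code page 1252 available and can be dropped on .NET Framework.

```csharp
using System;
using System.Text;

class EncodingDemo
{
    static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+, where code page 1252
        // must be registered first. On .NET Framework this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        string pixel = "\u00FF"; // the ÿ character from the sample data

        byte[] ascii = Encoding.ASCII.GetBytes(pixel);
        byte[] utf8 = Encoding.UTF8.GetBytes(pixel);
        byte[] cp1252 = Encoding.GetEncoding(1252).GetBytes(pixel);

        Console.WriteLine(BitConverter.ToString(ascii));  // 3F (a question mark)
        Console.WriteLine(BitConverter.ToString(utf8));   // C3-BF (two bytes)
        Console.WriteLine(BitConverter.ToString(cp1252)); // FF (one byte, 255)
    }
}
```

Only the code page 1252 result is the single byte 255 that a PPM with a maximum value of 255 expects for a fully saturated channel.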
To map high ASCII characters to single bytes, try a code page like 1252 or 437. If those don't give the desired mapping, there are many other code pages listed here.
Here's an example using code page 1252:
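A minimal sketch of that approach, not a definitive implementation: the PpmWriter class and its Write method are hypothetical names, and it assumes every character in the text box stands for exactly one pixel byte. The header is still written as plain ASCII, and only the pixel data goes through code page 1252. As above, the RegisterProvider call is only needed on .NET Core / .NET 5+.

```csharp
using System;
using System.IO;
using System.Text;

static class PpmWriter
{
    // Hypothetical helper: writes a binary (P6) PPM whose pixel bytes come
    // from text in which each character stands for one byte (0-255).
    public static void Write(string path, int width, int height, string pixelText)
    {
        // Assumption: .NET Core / .NET 5+. On .NET Framework this can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp1252 = Encoding.GetEncoding(1252);

        // The header is plain ASCII: magic number, width, height, max value.
        byte[] header = Encoding.ASCII.GetBytes($"P6 {width} {height} 255\n");

        // Each character above 127 becomes a single byte instead of '?'
        // (ASCII) or a multi-byte sequence (UTF-8).
        byte[] pixels = cp1252.GetBytes(pixelText);

        using (FileStream file = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            file.Write(header, 0, header.Length);
            file.Write(pixels, 0, pixels.Length);
        }
    }
}
```

With the names from the question, a call would look like PpmWriter.Write(filename + ".ppm", width, height, ppmDataBox.Text). Whether 1252 is the right code page depends on how the pasted text was produced, so treat the choice as something to verify against a known image.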