Why does BinaryWriter prepend gibberish to the start of a stream? How do you avoid it?
I'm debugging some issues with writing pieces of an object to a file and I've gotten down to the base case of just opening the file and writing "TEST" in it. I'm doing this by something like:
static FileStream fs;
static BinaryWriter w;
fs = new FileStream(filename, FileMode.Create);
w = new BinaryWriter(fs);
w.Write("test");
w.Close();
fs.Close();
Unfortunately, this ends up prepending a box to the front of the file and it looks like so:
TEST, with a fun box on the front. Why is this, and how can I avoid it?
Edit: It does not seem to be displaying the box here, but it's the unicode character that looks like gibberish.
Comments (9)
Sounds like byte order marks.
http://en.wikipedia.org/wiki/Byte-order_mark
Perhaps you want to write the string as UTF-8.
They are not byte-order marks but a length prefix, according to MSDN:
And you will need that length prefix if you ever want to read the string back from that point. See BinaryReader.ReadString().
Additional
Since it seems you actually want a file-header checker:
Is it a problem? You read the length prefix back along with the string, so as a type check on the file it works fine.
You can convert the string to a byte[] array, probably using Encoding.ASCII. But then you have to either use a fixed (implied) length or prefix it yourself. After reading the byte[] you can convert it to a string again.
If you had a lot of text to write you could even attach a TextWriter to the same stream. But be careful: the writers want to close their streams. I wouldn't advise this in general, but it is good to know. Here too you will have to mark a point where the other reader can take over (a fixed header works fine).
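The fixed-length header idea can be sketched roughly as follows. This is only an illustration under assumptions: the 4-byte "TEST" magic value, the class name HeaderDemo, and the int payload are hypothetical, and a MemoryStream stands in for the FileStream from the question.

```csharp
using System;
using System.IO;
using System.Text;

class HeaderDemo
{
    // Hypothetical magic header. Its length is fixed (implied), so no prefix is stored.
    static readonly byte[] Magic = Encoding.ASCII.GetBytes("TEST");

    static void Main()
    {
        using (var ms = new MemoryStream())
        {
            // leaveOpen: true keeps the stream usable after the writer is disposed.
            using (var w = new BinaryWriter(ms, Encoding.ASCII, leaveOpen: true))
            {
                w.Write(Magic); // Write(byte[]) emits the raw bytes only
                w.Write(42);    // example payload following the header
            }

            ms.Position = 0;
            using (var r = new BinaryReader(ms, Encoding.ASCII, leaveOpen: true))
            {
                // Read exactly as many bytes as the header is known to contain.
                string header = Encoding.ASCII.GetString(r.ReadBytes(Magic.Length));
                Console.WriteLine(header == "TEST" ? "header ok" : "bad header");
                Console.WriteLine(r.ReadInt32());
            }
        }
    }
}
```

Because the reader knows the header is always four bytes, it can check the file type before handing the rest of the stream to whatever reads the payload.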
That's because a BinaryWriter writes the binary representation of the string, including the length of the string. If you write raw data (e.g. a byte[]) instead, it won't include that length.
You'll notice that it doesn't include the length. If you're going to be writing textual data using the binary writer, you'll need to convert it first.
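A small comparison along these lines shows the difference between the two overloads. It is a sketch, not the answer's original snippet: the helper name Capture is made up for illustration, and the output is captured in a MemoryStream rather than a file.

```csharp
using System;
using System.IO;
using System.Text;

class PrefixDemo
{
    // Hypothetical helper: runs a write action and returns the bytes it produced.
    static byte[] Capture(Action<BinaryWriter> write)
    {
        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms))
        {
            write(w);
            w.Flush();
            return ms.ToArray();
        }
    }

    static void Main()
    {
        // Write(string) stores a length prefix before the encoded characters.
        byte[] asString = Capture(w => w.Write("test"));
        // Write(byte[]) stores only the bytes it is given.
        byte[] asBytes = Capture(w => w.Write(Encoding.ASCII.GetBytes("test")));

        Console.WriteLine(BitConverter.ToString(asString)); // 04-74-65-73-74
        Console.WriteLine(BitConverter.ToString(asBytes));  // 74-65-73-74
    }
}
```

The leading 04 in the first line is the "box" from the question: a control character that most editors cannot display.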
The byte at the start is the length of the string; it's written out as a variable-length integer.
If the encoded string is 127 bytes or fewer, the length is stored in one byte. Once it reaches 128 bytes, the length takes 2 bytes, and it grows to 3 and 4 bytes at larger lengths as well.
The problem here is that you're using BinaryWriter, which writes out data that BinaryReader can read back in later. If you wish to write out in a custom format of your own, you must either drop writing strings like that, or drop using BinaryWriter altogether.
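The prefix sizes described above can be measured with a sketch like this one. The helper name PrefixBytes is hypothetical, and the strings use only the character 'a' so that each character encodes to one byte in the writer's default UTF-8 encoding.

```csharp
using System;
using System.IO;

class PrefixSizeDemo
{
    // Hypothetical helper: returns how many bytes the length prefix occupies
    // for a string of the given length.
    static long PrefixBytes(int length)
    {
        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms))
        {
            w.Write(new string('a', length)); // one byte per character in UTF-8
            w.Flush();
            return ms.Length - length;        // whatever is left over is the prefix
        }
    }

    static void Main()
    {
        foreach (int n in new[] { 4, 127, 128, 16383, 16384 })
            Console.WriteLine($"{n} -> {PrefixBytes(n)}");
        // 4 -> 1, 127 -> 1, 128 -> 2, 16383 -> 2, 16384 -> 3
    }
}
```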
As Henk pointed out in this answer, this is the length of the string (an int, which BinaryWriter stores in a compact variable-length form rather than as a fixed 4 bytes).
If you don't want this, you can either write "TEST" manually by writing the ASCII characters for each letter as bytes, or you could use:
And write the resulting array (which will NOT contain a length int)
What you're seeing is actually a 7-bit encoded integer, which is a kind of integer compression.
The BinaryWriter prepends this to the text so that readers (i.e. BinaryReader) know how long the written string is.
You can read more about the implementation details of this at http://dpatrickcaldwell.blogspot.se/2011/09/7-bit-encoding-with-binarywriter-in-net.html.
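For illustration, the 7-bit encoding can be reproduced by hand. The function below is a hypothetical reimplementation of the scheme (7 data bits per byte, with the high bit set when more bytes follow), not the framework's own code.

```csharp
using System;
using System.Collections.Generic;

class SevenBitDemo
{
    // Hypothetical helper: encodes a non-negative int 7 bits at a time,
    // least significant group first.
    static byte[] Encode7Bit(int value)
    {
        var bytes = new List<byte>();
        uint v = (uint)value;
        while (v >= 0x80)
        {
            bytes.Add((byte)((v & 0x7F) | 0x80)); // low 7 bits plus "more follows" flag
            v >>= 7;
        }
        bytes.Add((byte)v); // final group, high bit clear
        return bytes.ToArray();
    }

    static void Main()
    {
        Console.WriteLine(BitConverter.ToString(Encode7Bit(4)));   // 04
        Console.WriteLine(BitConverter.ToString(Encode7Bit(128))); // 80-01
        Console.WriteLine(BitConverter.ToString(Encode7Bit(300))); // AC-02
    }
}
```

A length of 4 fits in a single byte, which is why the file in the question starts with exactly one unprintable character.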
You can save it as a UTF8 encoded byte array like this:
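One way this might look, as a sketch only: the temporary file path and the class name Utf8Demo are assumptions for illustration, and the file is read back just to show its contents.

```csharp
using System;
using System.IO;
using System.Text;

class Utf8Demo
{
    static void Main()
    {
        string path = Path.GetTempFileName(); // stand-in for the question's filename

        // GetBytes returns only the encoded characters: no length prefix and no BOM.
        byte[] data = Encoding.UTF8.GetBytes("TEST");

        using (var fs = new FileStream(path, FileMode.Create))
        using (var w = new BinaryWriter(fs))
        {
            w.Write(data);
        }

        Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(path))); // 54-45-53-54
        File.Delete(path);
    }
}
```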
That's a byte order mark, most likely. It's because the stream's encoding is set to Unicode.
Remember that Java strings are internally encoded in UTF-16.
So, "test" is actually made of the bytes 0xff, 0xfe (together the byte order mark), 0x74, 0x00, 0x65, 0x00, 0x73, 0x00, 0x74, 0x00.
You probably want to work with bytes instead of streams of characters.
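The byte sequence listed above can be reproduced in C#, the language of the question, with a sketch like the following. It assumes .NET's Encoding.Unicode (UTF-16 little-endian) and joins the preamble and the text by hand; it says nothing about what BinaryWriter itself emits.

```csharp
using System;
using System.Linq;
using System.Text;

class Utf16Demo
{
    static void Main()
    {
        Encoding utf16 = Encoding.Unicode;     // UTF-16 little-endian in .NET
        byte[] bom = utf16.GetPreamble();      // the byte order mark: FF FE
        byte[] text = utf16.GetBytes("test");  // two bytes per character, no BOM

        Console.WriteLine(BitConverter.ToString(bom.Concat(text).ToArray()));
        // FF-FE-74-00-65-00-73-00-74-00
    }
}
```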