Reading characters from a file written by .NET
I'm trying to use Java to read a string from a file that was written with a .NET BinaryWriter.
I think the problem is that the .NET BinaryWriter uses some 7-bit format for its strings. While researching online, I came across this code, which is supposed to function like the BinaryReader's ReadString() method. It lives in my CSDataInputStream class, which extends DataInputStream.
public String readStringCS() throws IOException {
    int stringLength = 0;
    boolean stringLengthParsed = false;
    int step = 0;
    while (!stringLengthParsed) {
        byte part = readByte();
        stringLengthParsed = (((int) part >> 7) == 0);
        int partCutter = part & 127;
        part = (byte) partCutter;
        int toAdd = (int) part << (step * 7);
        stringLength += toAdd;
        step++;
    }
    char[] chars = new char[stringLength];
    for (int i = 0; i < stringLength; i++) {
        chars[i] = readChar();
    }
    return new String(chars);
}
The first part seems to work, as it returns the correct number of characters (7). But when it reads the characters, they are all Chinese! I'm pretty sure the problem is with DataInputStream.readChar(), but I have no idea why it isn't working. I have even tried using
Character.reverseBytes(readChar());
to read the char to see if that would work, but it just returns different Chinese characters.
Maybe I need to emulate .NET's way of reading chars? How would I go about doing that?
Is there something else I'm missing?
Thanks.
Comments (1)
Okay, so you've parsed the length correctly by the sounds of it, but you're then treating it as the length in characters. As far as I can tell from the documentation, it's the length in bytes.
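To make the symptom concrete, here is a minimal sketch of why the output looks Chinese. It assumes the file holds single-byte text such as ASCII or UTF-8 "Hello"; the class and method names are made up for illustration. DataInputStream.readChar() consumes two bytes and combines them high byte first, so 'H' (0x48) and 'e' (0x65) become the single char U+4865, which sits in a CJK block. Swapping the bytes with Character.reverseBytes gives U+6548, another CJK character, which matches the "different Chinese characters" described in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question or answer.
class ReadCharDemo {
    // Reads one char the way DataInputStream does: two bytes, high byte first.
    static char firstCharOf(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readChar();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample payload: plain single-byte text.
        byte[] utf8 = "Hello".getBytes(StandardCharsets.UTF_8);
        char c = firstCharOf(utf8);
        char swapped = Character.reverseBytes(c);
        // Print the code points and the Unicode blocks they fall into.
        System.out.printf("U+%04X in %s%n", (int) c, Character.UnicodeBlock.of(c));
        System.out.printf("U+%04X in %s%n", (int) swapped, Character.UnicodeBlock.of(swapped));
    }
}
```

Reading seven chars this way also consumes fourteen bytes rather than seven, so anything stored after the string would be misread as well.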
So you should read the data into a byte[] of the right length, and then use new String(bytes, encoding), where encoding is the appropriate encoding based on whatever was written from .NET. It will default to UTF-8, but it can be specified as something else.
As an aside, I personally wouldn't extend DataInputStream. I would compose it instead, i.e. make your type or method take a DataInputStream (or perhaps just take an InputStream and wrap that in a DataInputStream). In general, if you favour composition over inheritance it can make code clearer and easier to maintain, in my experience.
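Putting both suggestions together, a rough sketch might look like the following. The class name DotNetStringReader and its method names are hypothetical, and the encoding is passed in because it has to match whatever the .NET side used (UTF-8 unless the writer was constructed with something else). The reader wraps an InputStream instead of extending DataInputStream, parses the 7-bit length prefix as a byte count, reads exactly that many bytes, and only then decodes them.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical helper: composes a DataInputStream rather than extending it.
final class DotNetStringReader {
    private final DataInputStream in;
    private final Charset encoding;

    DotNetStringReader(InputStream source, Charset encoding) {
        this.in = new DataInputStream(source);
        this.encoding = encoding;
    }

    // Length prefix: 7 data bits per byte, lowest group first;
    // a set high bit means another length byte follows.
    int read7BitEncodedInt() throws IOException {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.readUnsignedByte();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IOException("Malformed 7-bit encoded length");
    }

    // The prefix counts bytes, not chars, so read raw bytes and decode afterwards.
    String readString() throws IOException {
        int byteCount = read7BitEncodedInt();
        if (byteCount < 0) {
            throw new IOException("Negative string length");
        }
        byte[] bytes = new byte[byteCount];
        in.readFully(bytes);
        return new String(bytes, encoding);
    }
}
```

A caller would construct it over whatever stream holds the data, for example new DotNetStringReader(new FileInputStream(file), StandardCharsets.UTF_8), and call readString() once per string that was written.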