Reading chars from a file written by .NET

Posted on 2024-12-03 19:45:41

I'm trying to use Java to read a string from a file that was written with a .NET BinaryWriter.

I think the problem is that the .NET BinaryWriter uses some 7-bit format for its strings. By researching online, I came across this code, which is supposed to function like the BinaryReader's ReadString() method. It lives in my CSDataInputStream class, which extends DataInputStream.

public String readStringCS()  throws IOException {
    int stringLength = 0;
    boolean stringLengthParsed = false;
    int step = 0;
    while(!stringLengthParsed) {
        byte part = readByte();
        stringLengthParsed = (((int)part >> 7) == 0);
        int partCutter = part & 127;
        part = (byte)partCutter;
        int toAdd = (int)part << (step*7);
        stringLength += toAdd;
        step++;
    }
    char[] chars = new char[stringLength];
    for(int i = 0; i < stringLength; i++) {
        chars[i] = readChar();
    }
    return new String(chars);
}

The first part seems to be working, as it returns the correct number of characters (7). But when it reads the characters, they are all Chinese! I'm pretty sure the problem is with DataInputStream.readChar(), but I have no idea why it isn't working... I have even tried using

Character.reverseBytes(readChar());

to read the char to see if that would work, but it would just return different Chinese characters.

Maybe I need to emulate .net's way of reading chars? How would I go about doing that?

Is there something else I'm missing?

Thanks.

浅浅淡淡 2024-12-10 19:45:42

Okay, so you've parsed the length correctly by the sounds of it - but you're then treating it as the length in characters. As far as I can tell from the documentation it's the length in bytes.
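As a rough illustration of why the characters came out as Chinese, here is a small sketch. It assumes the file held UTF-8 text such as "Hello" (the real contents are not given in the question), and the class and method names are made up for the demo. DataInputStream.readChar() consumes two bytes and combines them big-endian into one UTF-16 code unit, so two single-byte ASCII characters fuse into one char that happens to fall in a CJK block.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question or answer.
class ReadCharDemo {
    // Reads one char the way the question's loop does: two bytes, big-endian.
    static char firstCharViaReadChar(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readChar();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed payload: the UTF-8 bytes of "Hello", without the length prefix.
        byte[] utf8 = "Hello".getBytes(StandardCharsets.UTF_8);
        char fused = firstCharViaReadChar(utf8);
        // 'H' (0x48) and 'e' (0x65) are combined into the single code unit U+4865.
        System.out.printf("U+%04X script=%s%n",
                (int) fused, Character.UnicodeScript.of(fused));
    }
}
```

Reversing the byte order only swaps which two-byte combination is produced, which would explain why Character.reverseBytes gave different but still wrong characters.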

So you should read the data into a byte[] of the right length, and then use:

return new String(bytes, encoding);

where encoding is the appropriate encoding based on whatever was written from .NET. BinaryWriter defaults to UTF-8, but a different encoding can be specified.
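Put together, the approach described above might look like the following minimal sketch. It assumes the string was written with the default UTF-8 encoding and that the prefix is a 7-bit encoded byte count, as the answer suggests. The names DotNetStrings, read7BitEncodedInt and readString are illustrative, not an existing API.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
class DotNetStrings {
    // Decodes the length prefix: the low 7 bits of each byte carry data,
    // and a set high bit means another byte follows.
    static int read7BitEncodedInt(DataInputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.readUnsignedByte();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IOException("Malformed 7-bit encoded length");
    }

    // Reads the prefix as a BYTE count, then decodes those bytes in one go.
    static String readString(DataInputStream in, Charset encoding) throws IOException {
        int byteLength = read7BitEncodedInt(in);
        if (byteLength < 0) {
            throw new IOException("Negative string length: " + byteLength);
        }
        byte[] bytes = new byte[byteLength];
        in.readFully(bytes); // blocks until all bytes arrive, or throws EOFException
        return new String(bytes, encoding);
    }

    public static void main(String[] args) throws IOException {
        // Hand-built sample: length prefix 7, followed by 7 ASCII bytes.
        byte[] sample = {7, 'H', 'e', 'l', 'l', 'o', '!', '!'};
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(sample))) {
            System.out.println(readString(in, StandardCharsets.UTF_8));
        }
    }
}
```

Using readFully rather than read matters here, since a plain read may return fewer bytes than requested.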

As an aside, I personally wouldn't extend DataInputStream - I would compose it instead, i.e. make your type or method take a DataInputStream (or perhaps just take InputStream and wrap that in a DataInputStream). In general, if you favour composition over inheritance it can make code clearer and easier to maintain, in my experience.
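The composition idea could be sketched roughly as follows. DotNetBinaryReader is a made-up name, and the length-prefix handling assumes the 7-bit encoded byte count discussed above. The class takes any InputStream, wraps it in a DataInputStream internally, and exposes only the operations it needs.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper: it HAS a DataInputStream instead of extending one.
final class DotNetBinaryReader implements Closeable {
    private final DataInputStream in;
    private final Charset encoding;

    DotNetBinaryReader(InputStream source, Charset encoding) {
        this.in = new DataInputStream(source);
        this.encoding = encoding;
    }

    // Reads a length-prefixed string: 7-bit encoded byte count, then the bytes.
    String readString() throws IOException {
        int length = 0;
        int shift = 0;
        int b;
        do {
            if (shift > 28) {
                throw new IOException("Malformed length prefix");
            }
            b = in.readUnsignedByte();
            length |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        if (length < 0) {
            throw new IOException("Negative string length: " + length);
        }
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, encoding);
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    public static void main(String[] args) throws IOException {
        // Hand-built sample holding two strings back to back.
        byte[] sample = {2, 'h', 'i', 3, 'y', 'o', 'u'};
        try (DotNetBinaryReader reader = new DotNetBinaryReader(
                new ByteArrayInputStream(sample), StandardCharsets.UTF_8)) {
            System.out.println(reader.readString());
            System.out.println(reader.readString());
        }
    }
}
```

Callers then see only readString() and close(), so inherited methods such as readChar() or readUTF(), which follow Java's own conventions, cannot be mixed in by accident.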
