Best way to read a structured binary file with Java

Posted on 2024-07-09 07:35:00

I have to read a binary file in a legacy format with Java.

In a nutshell, the file has a header consisting of several integers, bytes and fixed-length char arrays, followed by a list of records that also consist of integers and chars.

In any other language I would create structs (C/C++) or records (Pascal/Delphi) which are byte-by-byte representations of the header and the record. Then I'd read sizeof(header) bytes into a header variable and do the same for the records.

Something like this: (Delphi)

type
  THeader = record
    Version: Integer;
    Type: Byte;
    BeginOfData: Integer;
    ID: array[0..15] of Char;
  end;

...

procedure ReadData(S: TStream);
var
  Header: THeader;
begin
  S.ReadBuffer(Header, SizeOf(THeader));
  ...
end;

What is the best way to do something similar with Java? Do I have to read every single value on its own or is there any other way to do this kind of "block-read"?

梦屿孤独相伴 2024-07-16 07:35:00

To my knowledge, Java forces you to read a file as bytes rather than being able to block read. If you were serializing Java objects, it'd be a different story.

The other examples shown wrap a file in a DataInputStream, but you can also use a shortcut, the RandomAccessFile class:

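RandomAccessFile in = new RandomAccessFile("filename", "r");
int version = in.readInt();      // note: readInt() assumes big-endian byte order
byte type = in.readByte();
int beginOfData = in.readInt();
byte[] tempId = new byte[16];    // the array must be allocated before reading into it
in.readFully(tempId, 0, 16);     // unlike read(), always fills all 16 bytes or throws
String id = new String(tempId);  // decodes with the platform default charset
in.close();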

Note that you could wrap the resulting values in a class, if that would make it easier.

冷…雨湿花 2024-07-16 07:35:00

If you use Preon, all you have to do is this:

public class Header {
    @BoundNumber int version;
    @BoundNumber byte type;
    @BoundNumber int beginOfData;
    @BoundString(size="16") String id;  // ID is array[0..15], i.e. 16 characters
}

Once you have this, you create a Codec with a single line:

Codec<Header> codec = Codecs.create(Header.class);

And you use the Codec like this:

Header header = Codecs.decode(codec, file);

揽清风入怀 2024-07-16 07:35:00

You could use the DataInputStream class as follows:

DataInputStream in = new DataInputStream(new BufferedInputStream(
                         new FileInputStream("filename")));
int x = in.readInt();
double y = in.readDouble();

etc.

Once you get these values you can do with them as you please. Look up the java.io.DataInputStream class in the API for more info.
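
Applied to the header from the question, the same idea might look like the sketch below. It rests on assumptions the question does not state: that the file is little-endian (typical for Delphi on x86), that the record is packed with no padding between fields, and that the ID holds 16 single-byte characters. `LegacyHeader` and `readHeader` are illustrative names, not part of any library.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical holder for the header fields shown in the question.
final class LegacyHeader {
    final int version;
    final byte type;
    final int beginOfData;
    final String id;

    LegacyHeader(int version, byte type, int beginOfData, String id) {
        this.version = version;
        this.type = type;
        this.beginOfData = beginOfData;
        this.id = id;
    }

    // Reads one packed header (4 + 1 + 4 + 16 = 25 bytes) from the stream.
    static LegacyHeader readHeader(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        // DataInputStream always decodes big-endian, so the bytes are reversed
        // here because this sketch ASSUMES a little-endian file.
        int version = Integer.reverseBytes(in.readInt());
        byte type = in.readByte();
        int beginOfData = Integer.reverseBytes(in.readInt());
        byte[] idBytes = new byte[16];
        in.readFully(idBytes); // throws EOFException if the file is truncated
        // ASSUMPTION: single-byte characters padded with NULs or spaces;
        // trim() strips both because it removes every char <= U+0020.
        String id = new String(idBytes, StandardCharsets.ISO_8859_1).trim();
        return new LegacyHeader(version, type, beginOfData, id);
    }
}
```

If the file turns out to be big-endian, the two `Integer.reverseBytes` calls can simply be dropped.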

素罗衫 2024-07-16 07:35:00

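I might have misunderstood you, but it seems to me you are creating in-memory structures that you hope will be a byte-for-byte accurate representation of what you want to read from the hard disk, then copying the whole thing into memory and manipulating it there?

If that is indeed the case, you are playing a very dangerous game. At least in C, the standard does not enforce things like padding or alignment of struct members. Not to mention things like big/little endianness or parity bits... So even if your code happens to run, it is very non-portable and risky - you depend on the compiler's author not changing their mind in future versions.

It is better to create an automaton that both validates the structure being read from the disk, byte by byte, and fills an in-memory structure if the data is indeed OK. You may lose some milliseconds (not as many as it may seem, since modern operating systems do a lot of disk read caching), but you gain platform and compiler independence. Plus, your code will be easy to port to another language.

Post edit: In a way I sympathize with you. In the good old days of DOS/Win3.11, I once wrote a C program to read BMP files, and used exactly the same technique. Everything was fine until I tried to compile it for Windows - oops!! Int was now 32 bits long rather than 16! When I tried to compile on Linux, I discovered that gcc had very different rules for bit-field allocation than Microsoft C (6.0!). I had to resort to macro tricks to make it portable...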
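
In Java, the byte-by-byte decoding with validation suggested above might be sketched as follows. The layout (25-byte packed header, little-endian integers) and the validation rules (version at least 1, `beginOfData` inside the file) are assumptions made for illustration; the real legacy format may need different checks. `ValidatingDecoder` is a hypothetical name.

```java
// Hypothetical decoder that assembles integers explicitly from bytes, so the
// result does not depend on the platform or on any in-memory struct layout.
final class ValidatingDecoder {
    static final int HEADER_SIZE = 4 + 1 + 4 + 16; // ASSUMED packed layout

    // Builds a 32-bit little-endian integer from four bytes, one byte at a time.
    static int int32LE(byte[] b, int off) {
        return (b[off] & 0xFF)
             | (b[off + 1] & 0xFF) << 8
             | (b[off + 2] & 0xFF) << 16
             | (b[off + 3] & 0xFF) << 24;
    }

    // Validates the header bytes and returns the offset of the first record.
    static int checkedBeginOfData(byte[] header, long fileLength) {
        if (header.length < HEADER_SIZE) {
            throw new IllegalArgumentException("header too short: " + header.length);
        }
        int version = int32LE(header, 0);
        if (version < 1) { // ASSUMED rule: versions start at 1
            throw new IllegalArgumentException("unsupported version: " + version);
        }
        int beginOfData = int32LE(header, 5);
        if (beginOfData < HEADER_SIZE || beginOfData > fileLength) {
            throw new IllegalArgumentException("beginOfData out of range: " + beginOfData);
        }
        return beginOfData;
    }
}
```

Rejecting a bad header before any record is parsed keeps a corrupt or foreign file from being silently misread.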

埋情葬爱 2024-07-16 07:35:00

I have used Javolution and javastruct; both handle the conversion between bytes and objects.

Javolution provides classes that represent C types. All you need to do is to write a class that describes the C structure. For example, from the C header file,

struct Date {
    unsigned short year;
    unsigned char month;
    unsigned char day;
};

should be translated into:

public static class Date extends Struct {
    public final Unsigned16 year = new Unsigned16();
    public final Unsigned8 month = new Unsigned8();
    public final Unsigned8 day   = new Unsigned8();
}

Then call setByteBuffer to initialize the object:

Date date = new Date();
date.setByteBuffer(ByteBuffer.wrap(bytes), 0);

javastruct uses annotations to define the fields of a C structure.

@StructClass
public class Foo{

    @StructField(order = 0)
    public byte b;

    @StructField(order = 1)
    public int i;
}

To initialize an object:

Foo f2 = new Foo();
JavaStruct.unpack(f2, b);  // b is the byte[] holding the packed data

梦忆晨望 2024-07-16 07:35:00

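I guess FileInputStream lets you read in bytes. So, open the file with a FileInputStream and read in sizeof(header) bytes. I am assuming that the header has a fixed format and size. I don't see that mentioned in the initial post, but I assume it is the case, because things get much more complex if the header has optional fields and varying sizes.

Once you have the data, you can have a header class to which you assign the contents of the buffer you have already read. Then parse the records in a similar fashion.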
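
A minimal sketch of that idea: the fixed-size header block is read into a byte array first, and a header class then picks the fields out of it through a ByteBuffer. The 25-byte packed layout, the little-endian byte order and the ASCII ID are assumptions, and `ParsedHeader` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical header class populated from an already-read block of bytes.
final class ParsedHeader {
    static final int SIZE = 4 + 1 + 4 + 16; // ASSUMED packed layout, no padding

    final int version;
    final byte type;
    final int beginOfData;
    final String id;

    ParsedHeader(byte[] block) {
        if (block.length < SIZE) {
            throw new IllegalArgumentException("need " + SIZE + " bytes, got " + block.length);
        }
        // ASSUMPTION: the legacy file stores integers in little-endian order.
        ByteBuffer buf = ByteBuffer.wrap(block).order(ByteOrder.LITTLE_ENDIAN);
        version = buf.getInt();     // relative gets advance the buffer position
        type = buf.get();
        beginOfData = buf.getInt();
        byte[] idBytes = new byte[16];
        buf.get(idBytes);
        // ASSUMPTION: ASCII text padded with NULs or spaces, removed by trim().
        id = new String(idBytes, StandardCharsets.US_ASCII).trim();
    }
}
```

The block itself can be filled with `new DataInputStream(new FileInputStream(name)).readFully(block)`, which, unlike a plain `read`, does not return early with a partially filled array.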

旧情勿念 2024-07-16 07:35:00

A link on reading bytes

This one uses ByteBuffer (Java NIO): http://exampledepot.com/egs/java.nio/ReadChannel.html
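
In case the link is unavailable, here is a minimal sketch of reading a block of bytes through a FileChannel into a ByteBuffer. It is not the code from the linked page; `ChannelReads` and `readBlock` are illustrative names, and the little-endian byte order is an assumption about the legacy file.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ChannelReads {
    // Reads exactly `length` bytes starting at `offset`; fails if the file is shorter.
    static ByteBuffer readBlock(Path file, long offset, int length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(length);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ch.position(offset);
            // A single read() may fill only part of the buffer, so loop until full.
            while (buf.hasRemaining()) {
                if (ch.read(buf) < 0) {
                    throw new EOFException("file ended after " + buf.position() + " bytes");
                }
            }
        }
        buf.flip(); // switch the buffer from being filled to being read
        // ASSUMPTION: little-endian data; ByteBuffer defaults to big-endian.
        return buf.order(ByteOrder.LITTLE_ENDIAN);
    }
}
```

The returned buffer can then be decoded with `getInt()`, `get()` and friends, for the header as well as for each record.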

自演自醉 2024-07-16 07:35:00

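As other people have mentioned, DataInputStream and Buffers are probably the low-level APIs you are after for dealing with binary data in Java.

However, you probably want something like Construct (the wiki page has good examples too: http://en.wikipedia.org/wiki/Construct_(python_library)), but for Java.

I don't know of any Java equivalent, but taking that approach (declaratively specifying the struct in code) would probably be the right way to go. With a suitable fluent interface in Java, it would probably be quite similar to a DSL.

EDIT: a bit of googling reveals this:

http://javolution.org/api/javolution/io/Struct.html

This might be the kind of thing you are looking for. I have no idea whether it works or is any good, but it looks like a sensible place to start.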

蓝梦月影 2024-07-16 07:35:00

I would create an object that wraps around a ByteBuffer representation of the data and provide getters to read directly from the buffer. In this way, you avoid copying data from the buffer to primitive types. Furthermore, you could use a MappedByteBuffer to get the byte buffer. If your binary data is complex, you can model it using classes and give each class a sliced version of your buffer.

class SomeHeader {
    private final ByteBuffer buf;
    SomeHeader( ByteBuffer fileBuffer){
       // you may need to set limits accordingly before
       // fileBuffer.limit(...)
       this.buf = fileBuffer.slice();
       // you may need to skip the sliced region
       // fileBuffer.position(endPos)
    }
    public short getVersion(){
        return buf.getShort(POSITION_OF_VERSION_IN_BUFFER);
    }
}

Also useful are the methods for reading unsigned values from byte buffers.
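
Java has no unsigned primitive types, so such helpers usually widen the value to the next larger type. A minimal sketch, assuming Java 8 or later for the `toUnsigned...` methods (`UnsignedReads` is an illustrative name):

```java
import java.nio.ByteBuffer;

// Hypothetical helpers that read unsigned values at an absolute buffer index.
final class UnsignedReads {
    // Unsigned 8-bit value, returned as an int in the range 0..255.
    static int getUnsignedByte(ByteBuffer buf, int index) {
        return Byte.toUnsignedInt(buf.get(index));
    }

    // Unsigned 16-bit value, returned as an int in the range 0..65535.
    static int getUnsignedShort(ByteBuffer buf, int index) {
        return Short.toUnsignedInt(buf.getShort(index));
    }

    // Unsigned 32-bit value, returned as a long in the range 0..4294967295.
    static long getUnsignedInt(ByteBuffer buf, int index) {
        return Integer.toUnsignedLong(buf.getInt(index));
    }
}
```

The multi-byte helpers honor whatever byte order has been set on the buffer with `order(...)`.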

HTH
