What is a good resizable, random-access, efficient byte vector class in Java?
I'm trying to find a class for storing a vector of bytes in Java that supports: random access (so I can get or set a byte anywhere), resizing (so I can either append to the end or manually change the size), reasonable efficiency (I might be storing megabytes of data in these things), and all in memory (I don't have a file system). Any suggestions?
So far the candidates are:
- byte[]. Not resizable.
- java.util.Vector<Byte>. Evil. Also painfully inefficient.
- java.io.ByteArrayOutputStream. Not random-access.
- java.nio.ByteBuffer. Not resizable.
- org.apache.commons.collections.primitives.ArrayByteList. Not resizable. Which is weird, because it will automatically resize if you add stuff to it; you just can't change the size explicitly!
- A pure RAM implementation of java.nio.channels.FileChannel. Can't find one. (Remember, I don't have a file system.)
- Apache Commons VFS and the RAM filesystem. If I must I will, but I'd really like something lighter weight...
This seems like such an odd omission that I'm sure I must have missed something somewhere. I just can't figure out what. What am I missing?
Comments (3)
I would consider a class that wraps lots of byte[] chunks as elements of an ArrayList or Vector.
Make each chunk a fixed size, e.g. 1024 bytes, so your accessor functions can take index >> 10 to select the right element of the ArrayList, and then index & 0x3ff to select the specific byte within that array.
This avoids the waste of treating each byte as a Byte object, at the expense of wasting whatever is left over at the end of the last chunk.
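The chunked layout suggested in this answer could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name ChunkedByteVector and its methods (get, set, append, setSize) are hypothetical and do not come from any library, and the 1024-byte chunk size is simply the figure mentioned in the answer.

```java
import java.util.ArrayList;
import java.util.Arrays;

/** Hypothetical sketch: a resizable, random-access byte vector built from fixed-size chunks. */
final class ChunkedByteVector {
    private static final int CHUNK_BITS = 10;               // 1024-byte chunks, as in the answer
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;   // 0x3ff

    private final ArrayList<byte[]> chunks = new ArrayList<>();
    private int size;

    int size() {
        return size;
    }

    byte get(int index) {
        checkIndex(index);
        // High bits select the chunk, low 10 bits select the byte inside it.
        return chunks.get(index >> CHUNK_BITS)[index & CHUNK_MASK];
    }

    void set(int index, byte value) {
        checkIndex(index);
        chunks.get(index >> CHUNK_BITS)[index & CHUNK_MASK] = value;
    }

    void append(byte value) {
        setSize(size + 1);
        set(size - 1, value);
    }

    /** Grows (zero-filled) or shrinks the vector to exactly newSize bytes. */
    void setSize(int newSize) {
        if (newSize < 0) {
            throw new IllegalArgumentException("negative size: " + newSize);
        }
        int needed = (newSize + CHUNK_MASK) >>> CHUNK_BITS;  // chunks required, rounded up
        while (chunks.size() < needed) {
            chunks.add(new byte[CHUNK_SIZE]);
        }
        while (chunks.size() > needed) {
            chunks.remove(chunks.size() - 1);
        }
        // When shrinking, clear the unused tail of the last chunk so that
        // growing again later exposes zeros rather than stale data.
        int tailStart = newSize & CHUNK_MASK;
        if (newSize < size && tailStart != 0) {
            Arrays.fill(chunks.get(needed - 1), tailStart, CHUNK_SIZE, (byte) 0);
        }
        size = newSize;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
    }
}
```

Growing never copies existing data, since only new chunks are allocated; the trade-off is one extra indirection per access and up to one partly unused chunk at the end.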
In those cases, I simply initialize a reference to an array of reasonable length and, when it gets too small, copy it into a larger one by calling Arrays.copyOf().
And we may wrap that in a class if needed...
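As a minimal sketch of what such a wrapper might look like: the class below keeps a plain byte[] and replaces it with a larger copy via Arrays.copyOf() when it runs out of room. The class name GrowableByteArray, its methods, the starting capacity of 16, and the doubling growth policy are all assumptions made for illustration, not something stated in the answer.

```java
import java.util.Arrays;

/** Hypothetical sketch: a byte vector over a single array, regrown with Arrays.copyOf. */
final class GrowableByteArray {
    private byte[] data = new byte[16];   // arbitrary starting capacity (assumption)
    private int size;

    int size() {
        return size;
    }

    byte get(int index) {
        checkIndex(index);
        return data[index];
    }

    void set(int index, byte value) {
        checkIndex(index);
        data[index] = value;
    }

    void append(byte value) {
        ensureCapacity(size + 1);
        data[size++] = value;
    }

    /** Grows (zero-filled) or shrinks the logical size explicitly. */
    void setSize(int newSize) {
        if (newSize < 0) {
            throw new IllegalArgumentException("negative size: " + newSize);
        }
        ensureCapacity(newSize);
        if (newSize < size) {
            // Clear the dropped region so a later grow exposes zeros.
            Arrays.fill(data, newSize, size, (byte) 0);
        }
        size = newSize;
    }

    /** Returns a copy trimmed to the logical size. */
    byte[] toByteArray() {
        return Arrays.copyOf(data, size);
    }

    private void ensureCapacity(int minCapacity) {
        if (minCapacity > data.length) {
            // Doubling keeps the amortized cost of append constant.
            int doubled = data.length * 2;
            data = Arrays.copyOf(data, Math.max(minCapacity, doubled));
        }
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
    }
}
```

Compared with a chunked layout, access is a single array lookup, but each growth step copies the whole array and briefly needs both the old and the new array in memory.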
ArrayList is more efficient than Vector because it's not synchronized.
To improve efficiency you can start with a decent initial capacity. See the constructor ArrayList(int initialCapacity).
The default capacity is only 10, while you will probably need a much bigger size.
Despite appearances, ArrayList is very efficient even with a million records! I wrote a small test program that adds a million records to an ArrayList without declaring an initial capacity, and on my 5-year-old Linux laptop (Core 2 T5200 CPU) it takes only around 100 milliseconds to fill the whole list. If I declare an initial capacity of 1 million it takes around 60-80 ms, but if I declare 10,000 items it can take around 130-160 ms, so it may be better not to declare anything unless you can make a really good guess of the space needed.
As for memory usage, it takes around 8 MB of memory, which I consider totally reasonable unless you're writing phone software.
As expected, Vector performs a little worse, in the range of 160-200 ms.
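The test program itself is not shown, so the following is only a rough sketch of how such a measurement might be set up, not the author's code. The class name ListFillTiming and the method fillMillis are hypothetical. A single timed run like this is also sensitive to JIT warm-up and garbage collection, so the figures it prints should be read as indicative at best.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

/** Hypothetical sketch of a crude timing harness for filling a List<Byte>. */
final class ListFillTiming {

    /** Appends n boxed bytes to the list and returns the elapsed time in milliseconds. */
    static long fillMillis(List<Byte> list, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            list.add((byte) i);   // the byte is autoboxed to a Byte on each add
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println("ArrayList, default capacity:   "
                + fillMillis(new ArrayList<Byte>(), n) + " ms");
        System.out.println("ArrayList, capacity 1,000,000: "
                + fillMillis(new ArrayList<Byte>(n), n) + " ms");
        System.out.println("ArrayList, capacity 10,000:    "
                + fillMillis(new ArrayList<Byte>(10_000), n) + " ms");
        System.out.println("Vector, default capacity:      "
                + fillMillis(new Vector<Byte>(), n) + " ms");
    }
}
```

Running each case several times and discarding the first runs would give steadier numbers than a single pass.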
Example output, without specifying a start size and with ArrayList implementation.