HashSet of ByteBuffer (actually integers) to separate unique & non-unique elements in a ByteBuffer array
I have an array of ByteBuffers (which actually represent integers). I want to separate the unique and non-unique ByteBuffers (i.e. integers) in the array. Thus I am using a HashSet of this type: HashSet<ByteBuffer> columnsSet = new HashSet<ByteBuffer>()
Just wanted to know if HashSet is a good way to do so? And do I pay a higher cost when doing so for a ByteBuffer than if I had done it for an Integer?
(Actually I am reading serialized data from a DB which needs to be written back after this operation, thus I want to avoid serialization and deserialization from ByteBuffer to Integer and back!)
Your thoughts on this are appreciated.
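For reference, the approach in the question could be sketched roughly as follows. This is only an illustration under assumptions: each ByteBuffer is taken to hold one 4-byte int between its position and limit, and the names UniqueSplitter, split and ofInt are hypothetical, not from the original post.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

class UniqueSplitter {
    /**
     * Puts values that occur exactly once into unique and values that occur
     * more than once into repeated. Relies on ByteBuffer.equals()/hashCode(),
     * which compare the remaining bytes of each buffer.
     */
    static void split(ByteBuffer[] columns, Set<ByteBuffer> unique, Set<ByteBuffer> repeated) {
        Set<ByteBuffer> seen = new HashSet<>();
        for (ByteBuffer b : columns) {
            if (!seen.add(b)) {      // add() returns false if an equal buffer is already present
                repeated.add(b);
            }
        }
        unique.addAll(seen);
        unique.removeAll(repeated);  // keep only the values seen once
    }

    /** Hypothetical helper: wraps an int in a 4-byte buffer (position 0, limit 4). */
    static ByteBuffer ofInt(int v) {
        return ByteBuffer.allocate(4).putInt(0, v);
    }
}
```

Because a ByteBuffer's hash code depends on its remaining content, the buffers should not be read from relatively or otherwise modified while they sit in the set.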
Comments (3)
Creating a ByteBuffer is far more expensive than reading from or writing to a reused ByteBuffer.
The most efficient way to store integers is to use the int type. If you want a Set of these, you can use TIntHashSet, which uses int primitives. You can do multiple read/deserialize/store operations, and the reverse, with O(1) pre-allocated objects.
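A rough sketch of the "decode to int, reuse one buffer for writing back" idea. It assumes each buffer holds a 4-byte int at its current position in the default big-endian order. The class and method names are hypothetical, and java.util.HashSet<Integer> is used only to stay within the JDK; a primitive set such as TIntHashSet would avoid the boxing.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

class ReusedBufferCodec {
    /** Reads the int at the buffer's current position without moving the position. */
    static int peekInt(ByteBuffer b) {
        return b.getInt(b.position());
    }

    /** Collects the distinct int values found in the buffers. */
    static Set<Integer> distinctValues(ByteBuffer[] columns) {
        Set<Integer> values = new HashSet<>();
        for (ByteBuffer b : columns) {
            values.add(peekInt(b));
        }
        return values;
    }

    /**
     * Serializes v into a caller-supplied scratch buffer (capacity >= 4)
     * instead of allocating a new buffer for every value.
     */
    static ByteBuffer writeInto(ByteBuffer scratch, int v) {
        scratch.clear();   // position = 0, limit = capacity
        scratch.putInt(v); // relative write, advances the position by 4
        scratch.flip();    // limit = 4, position = 0: ready to be read or written out
        return scratch;
    }
}
```

The scratch buffer is overwritten on each call, so its content has to be consumed (for example, written to the database) before the next value is serialized.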

First of all, it will work. The overhead of equals() on two ByteBuffers will definitely be higher, but perhaps not enough to offset the benefit of not having to deserialize (though I'm not entirely sure that would be such a big problem).
I'm pretty sure that the performance will asymptotically be the same, but a more memory-efficient solution is to sort your array, then step through it linearly and test successive elements for equality.
As an example, suppose your buffers contain the following:
Sort it:
Once you start iterating, you get ar[0].equals(ar[1]) and you know these are duplicates. Just keep going like that until n-1.
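The sort-then-scan idea might look something like the sketch below. The example values from the original answer did not survive, so the class name SortedDuplicateScan and any test values are purely hypothetical. It relies on ByteBuffer implementing Comparable<ByteBuffer>, which lets Arrays.sort() order the array directly.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortedDuplicateScan {
    /**
     * Sorts the array in place, then walks it once and returns each value
     * that occurs more than once (reported once per distinct value).
     */
    static List<ByteBuffer> duplicates(ByteBuffer[] ar) {
        Arrays.sort(ar); // uses ByteBuffer.compareTo(), a byte-wise comparison of remaining content
        List<ByteBuffer> dups = new ArrayList<>();
        for (int i = 0; i < ar.length - 1; i++) {
            boolean sameAsNext = ar[i].equals(ar[i + 1]);
            boolean alreadyReported = !dups.isEmpty() && dups.get(dups.size() - 1).equals(ar[i]);
            if (sameAsNext && !alreadyReported) {
                dups.add(ar[i]);
            }
        }
        return dups;
    }
}
```

The sorted order is byte-wise rather than numeric, which is fine here: the scan only needs equal values to end up next to each other.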

Collections normally operate on the equals() and hashCode() methods, so the performance implications come from how the objects stored in the collection implement them.
Looking at ByteBuffer and Integer, one can see that the implementation of those methods in Integer is simpler (just one int comparison for equals() and return value; for hashCode()). Thus you could say a Set<ByteBuffer> has a higher cost than a Set<Integer>.
However, I can't tell you right now whether this cost is higher than the serialization and deserialization cost.
In fact, I'd just go for the more readable code unless you really have a performance problem. In that case I'd just try both methods and take the faster one.
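To make the equals()/hashCode() point concrete, here is a small sketch of how the two key types behave. The class name BufferEqualityDemo and its helpers are hypothetical; the buffers are assumed to hold one 4-byte int each.

```java
import java.nio.ByteBuffer;

class BufferEqualityDemo {
    /** Hypothetical helper: wraps an int in a 4-byte buffer (position 0, limit 4). */
    static ByteBuffer ofInt(int v) {
        return ByteBuffer.allocate(4).putInt(0, v);
    }

    /** True if a hash-based set would treat x and y as the same key. */
    static boolean sameKey(ByteBuffer x, ByteBuffer y) {
        return x.equals(y) && x.hashCode() == y.hashCode();
    }

    public static void main(String[] args) {
        ByteBuffer a = ofInt(7);
        ByteBuffer b = ofInt(7);
        // Two distinct buffer objects with the same remaining bytes are the same key.
        System.out.println(sameKey(a, b)); // true

        // Both methods look only at the bytes between position and limit,
        // so moving the position changes the buffer's identity as a key.
        b.position(1);
        System.out.println(sameKey(a, b)); // false

        // Integer.hashCode() simply returns the wrapped value.
        System.out.println(Integer.valueOf(7).hashCode()); // 7
    }
}
```

For 4-byte buffers the extra work per comparison is small, which is why measuring both variants on real data is the safer way to decide.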