Efficient insertion into a collection of long values
I'm collecting metrics for a piece of code and want to store a collection of time differences (primitive long values) for later analysis.
Inserting into this collection should be as efficient as possible, so that it adds the least overhead to the results.
I first tried a ConcurrentLinkedQueue<Long>. This gave the worst performance (probably due to boxing/unboxing).
I've currently settled on a synchronized gnu.trove.TLongArrayList, which is almost 7 times faster for a data set of 5 million longs.
Any recommendations for other collection libraries that would be good candidates to benchmark for this use case would be appreciated. I took a look at the Guava API but couldn't find anything.
Comments (5)
One thing you could do to improve performance is to reduce the size of the data type. If you can reduce it to an int, that would help (the difference between two calls to nanoTime() is often less than 2 billion nanoseconds, which is about 2 seconds).
You can also give the collection a good initial capacity, especially if you know roughly how many values you are likely to have.
If you know the maximum number of values you will record, you can use an int[] together with a counter, in case the maximum is not reached. This will be faster than using an object.
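The int[] plus counter idea could be sketched roughly as follows. This is only an illustration: the class name IntSampleBuffer and its methods are hypothetical, it assumes a single recording thread (it is not synchronized), and it simply rejects any delta that does not fit in an int.

```java
/** Fixed-capacity recorder for nanoTime deltas stored as int (hypothetical helper, not from any library). */
final class IntSampleBuffer {
    private final int[] samples;
    private int count; // number of slots filled so far

    IntSampleBuffer(int maxSamples) {
        samples = new int[maxSamples];
    }

    /** Records one delta; returns false if the buffer is full or the delta does not fit in an int. */
    boolean record(long deltaNanos) {
        if (count == samples.length || (int) deltaNanos != deltaNanos) {
            return false;
        }
        samples[count++] = (int) deltaNanos;
        return true;
    }

    int size() {
        return count;
    }

    int get(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return samples[index];
    }

    public static void main(String[] args) {
        IntSampleBuffer buffer = new IntSampleBuffer(1_000);
        for (int i = 0; i < 10; i++) {
            long start = System.nanoTime();
            Math.sqrt(i); // stand-in for the code being measured
            buffer.record(System.nanoTime() - start);
        }
        System.out.println("recorded " + buffer.size() + " samples");
    }
}
```

With several recording threads this would still need external synchronization or one buffer per thread.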
There's a new version of Trove in the pipeline (the latest is 3.0.0-RC2). This page says that Trove 3 is 10% to 20% faster than Trove 2.
Unfortunately:
You should try fastutil. Depending on the scenario, fastutil may be faster than trove4j.
I'm not sure if your situation allows this, but did you consider saving your data in a separate, unsynchronized data structure for each thread? Something like a ThreadLocal containing a TLongArrayList. This would remove the synchronization overhead.
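A rough sketch of the per-thread approach, using only the JDK. The names PerThreadSamples and LongBuffer are hypothetical; LongBuffer is a minimal stand-in for TLongArrayList so the example does not depend on Trove. It assumes the results are read only after all recording threads have finished.

```java
import java.util.Arrays;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Per-thread, unsynchronized sample storage (hypothetical sketch). */
final class PerThreadSamples {
    /** Minimal growable long buffer; a stand-in for TLongArrayList. */
    static final class LongBuffer {
        private long[] data = new long[1024];
        private int size;

        void add(long value) {
            if (size == data.length) {
                data = Arrays.copyOf(data, size * 2); // grow by doubling
            }
            data[size++] = value;
        }

        long[] toArray() {
            return Arrays.copyOf(data, size);
        }
    }

    // Every buffer ever created, so the results can be merged after the run.
    private final Queue<LongBuffer> allBuffers = new ConcurrentLinkedQueue<>();

    private final ThreadLocal<LongBuffer> local = ThreadLocal.withInitial(() -> {
        LongBuffer buffer = new LongBuffer();
        allBuffers.add(buffer); // happens once per thread
        return buffer;
    });

    /** Hot path: no locking, only the calling thread touches its own buffer. */
    void record(long value) {
        local.get().add(value);
    }

    /** Call only after all recording threads have finished (for example after Thread.join). */
    long[] drain() {
        return allBuffers.stream()
                .flatMapToLong(b -> Arrays.stream(b.toArray()))
                .toArray();
    }

    public static void main(String[] args) throws InterruptedException {
        PerThreadSamples samples = new PerThreadSamples();
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++) {
                samples.record(i);
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("collected " + samples.drain().length + " samples");
    }
}
```

The only shared structure is the registry of buffers, which is touched once per thread rather than once per insert.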
If you know the size of the collection ahead of time, you could use a single unsynchronized long[] array combined with an AtomicInteger counter to get the next insert position.
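A minimal sketch of that idea, assuming the capacity is known up front. The class name AtomicSlotRecorder is hypothetical. Note that the array writes themselves are plain writes, so this sketch assumes the snapshot is taken only after the recording threads have finished (for example after joining them).

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

/** Fixed-size long[] whose next free slot is claimed via an AtomicInteger (hypothetical sketch). */
final class AtomicSlotRecorder {
    private final long[] samples;
    private final AtomicInteger next = new AtomicInteger();

    AtomicSlotRecorder(int capacity) {
        samples = new long[capacity];
    }

    /** Returns false once the array is full. */
    boolean record(long value) {
        if (next.get() >= samples.length) {
            return false; // cheap check that also keeps the counter from growing without bound
        }
        int slot = next.getAndIncrement(); // each caller claims a distinct index
        if (slot >= samples.length) {
            return false; // lost the race for the last slot
        }
        samples[slot] = value;
        return true;
    }

    /** Copy of the filled part; call after the recording threads have finished. */
    long[] snapshot() {
        int filled = Math.min(next.get(), samples.length);
        return Arrays.copyOf(samples, filled);
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicSlotRecorder recorder = new AtomicSlotRecorder(5_000);
        Runnable work = () -> {
            for (int i = 0; i < 2_000; i++) {
                long start = System.nanoTime();
                recorder.record(System.nanoTime() - start);
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("recorded " + recorder.snapshot().length + " samples");
    }
}
```

The counter is still a point of contention between threads, so it would be worth benchmarking this against the per-thread approach.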