在Java中转换为面向列的数组
虽然我的标题中有 Java,但这可以适用于任何 OO 语言。 我想知道一些新想法来提高我正在尝试做的事情的性能。
我有一个不断接收 Object[] 数组的方法。我需要通过多个数组(列表或其他)拆分此数组中的对象,以便该方法接收的所有数组的每一列都有一个独立的列表。
示例:
List<List<Object>> column-oriented = new ArrayList<ArrayList<Object>>();
public void newObject(Object[] obj) {
for(int i = 0; i < obj.length; i++) {
column-oriented.get(i).add(obj[i]);
}
}
注意:为了简单起见,我省略了对象和内容的初始化。
我上面显示的代码当然很慢。我已经尝试过一些其他的事情,但想听听一些新的想法。
知道它对性能非常敏感,您将如何做到这一点?
编辑:
我测试了一些东西,发现:
我没有使用 ArrayList (或任何其他 Collection),而是将 Object[] 数组包装在另一个对象中来存储各个列。如果该数组达到其容量,我将创建另一个大小为两倍的数组,并使用 System.copyArray 将内容从一个数组复制到另一个数组。令人惊讶的是(至少对我来说)这比使用 ArrayList 存储内部列更快......
Although I have Java in the title, this could be for any OO language.
I'd like to know a few new ideas to improve the performance of something I'm trying to do.
I have a method that is constantly receiving an Object[] array. I need to split the Objects in this array through multiple arrays (List or something), so that I have an independent list for each column of all arrays the method receives.
Example:
List<List<Object>> column-oriented = new ArrayList<ArrayList<Object>>();
public void newObject(Object[] obj) {
for(int i = 0; i < obj.length; i++) {
column-oriented.get(i).add(obj[i]);
}
}
Note: For simplicity I've omitted the initialization of objects and stuff.
The code I've shown above is slow of course. I've already tried a few other things, but would like to hear some new ideas.
How would you do this knowing it's very performance sensitive?
EDIT:
I've tested a few things and found that:
Instead of using ArrayList (or any other Collection), I wrapped an Object[] array in another object to store individual columns. If this array reaches its capacity, I create another array with twice de size and copy the contents from one to another using System.copyArray. Surprisingly (at least for me) this is faster that using ArrayList to store the inner columns...
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
答案取决于数据和使用情况。您在此类集合中拥有多少数据?读/写的比例是多少(添加对象数组)?这会影响内部列表的结构更好以及许多其他可能的优化。
复制数据的最快方法是完全避免复制。如果您知道调用者代码不会进一步修改 obj 数组(这是重要条件),则可能的技巧之一是实现自定义的 List 类以用作内部列表。在内部,您将存储共享的
List
。每次调用我们只是将新数组添加到该列表中。自定义内部列表类将知道它代表哪一列(设为n
),并且当要求在位置m
处提供项目时,它将转置m< /code> 和
n
并查询内部结构以获得internalArray.get(m)[n]
。这种实现是不安全的,因为调用者的限制很容易被忘记,但在某些条件下可能会更快(但是,在其他条件下这可能会更慢)。The answer depends on the data and usage profile. How much data do you have in such collections? What is proportions of reads/writes (adding objects array)? This affects what structure for inner list is better and many other possible optimizations.
The fastest way to copy data is avoid copying at all. If you know that
obj
array is not modified further by the caller code (this is important condition), one of possible tricks is to implement you customList
class to use as inner list. Internally you will store sharedList<Object[]>
. Each call we just add new array to that list. Custom inner list class will know which column it represents (let it ben
) and when it is asked to give item at positionm
, it will transposem
andn
and query internal structure to getinternalArray.get(m)[n]
. This implementation is unsafe because of limitation on the caller that is easy to forget about but might be faster under some conditions (however, this might be slower under other).我会尝试使用 LinkedList 作为内部列表,因为它应该具有更好的插入性能。也许将 Object arra 包装到集合中并使用 addAll 也可能有所帮助。
I would try using LinkedList for the inner list, because it should have better performance for insertions. Maybe wrappping Object arra into collection and using addAll might help as well.
由于复制数组,ArrayList 可能会很慢(它使用与您自己编写的集合类似的方法)。
作为替代解决方案,您可以尝试首先简单地存储行,并在必要时创建列。这样,列表中内部数组的复制就会减少到最少。
示例:
根据列数,外部列表仍然有可能在内部复制数组,但普通表包含的行数远多于列数,因此它应该只是一个小数组。
ArrayList may be slow, due to copying of arrays (It uses a similar approach as your self-written collection).
As an alternate solution you could try to simply store the Rows at first and create columns when neccessary. This way, copying of the internal arrays at the list is reduced to a minimum.
Example:
Depending on the number of columns, it's still possible that the outer list is copying arrays internally, but normal tables contains far more rows than columns, so it should be a small array only.
使用
LinkedList
来实现列列表。它随数据线性增长,时间复杂度为 O(1)。 (如果您使用 ArrayList,则必须不时调整内部数组的大小)。收集值后,您可以将该链接列表转换为数组。如果 N 是行数,则您将从为每个列表保留 3*N 个引用(每个 LInkedList 有 prevRef/nextRef/itemRef)传递到仅 N 个引用。
如果有一个数组来保存不同的列列表,那就太好了,但是当然,这并不是一个很大的改进,只有在您提前知道列数的情况下才能做到这一点。
希望有帮助!
编辑测试和理论表明ArrayList在摊销成本方面更好,即总成本除以处理的项目数量...所以不要听从我的“建议”:)
Use a
LinkedList
for implementing the column lists. It's grows linearly with the data and is O(1). (If you use ArrayList it has to resize the internal array from time to time).After collecting the values you can convert that linked lists to arrays. If N is the number of rows you will pass from holding 3*N refs for each list (each LInkedList has prevRef/nextRef/itemRef) to only N refs.
It would be nice to have an array for holding the different column lists, but of course, it's not a big improvement and you can do it only if you know the column count in advance.
Hope it helps!
Edit tests and theory indicate that ArrayList is better in amortized cost, it is, the total cost divided by the number of items processed... so don't follow my 'advice' :)