Improving on single-stepping over an array twice (nested loop on the same array)
I have a large set of data that I want to cycle through in order to determine various statistics on the data set from a point in time 'D1' to a point in time in the future 'D2'. Basically, I want to add to a database each time the difference between the values is larger than 10. For example:
Datum[] data = x;
for( Datum d1 : data ){
    Datum[] tail = y; //From d1 up to 10 elements ahead
    for( Datum d2 : tail ){
        //Calculate difference
        if( (d2.val - d1.val) > 10 ){
            //Insert into database
        }
    }
}
My question is: is there a better algorithm/method for doing this? Since 9 elements from tail are reused in the next iteration of the outer loop, can I benefit somehow from that? My goal was to get this down to much less than O(n^2) (in Big-O notation), but I can't wrap my head around it. Typically, finding a D1, D2 pair that satisfies the criteria means that the next D1 element will have a greater chance of matching as well. Can I use that to my advantage?
I'm trying to get this to be as efficient as possible because the data set is just so large.
4 Answers
An index-based for loop might perform much better than an iterator, since you can index the original array directly and avoid copying to a new array. You'd have much better memory locality, less chance of false sharing, etc.
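A minimal sketch of that, assuming the window really is "up to 10 elements ahead" as in the question (the database insert is left as a comment):

for (int i = 0; i < data.length; i++) {
    // Index directly into data instead of copying the next elements into a tail array
    int end = Math.min(i + 10, data.length - 1);
    for (int j = i + 1; j <= end; j++) {
        if (data[j].val - data[i].val > 10) {
            // Insert the (data[i], data[j]) pair into the database
        }
    }
}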
What you have is a classic sweep-line algorithm, which is O(k*n), where k is the "overlap", i.e. the portion the inner loop runs over. In your case it is at most 10, no matter what n is.
In your shoes, the first thing I would do is profile a typical dataset and find out where the time is going. This should give some hints as to where to focus your optimization efforts.
Assuming the calculation is as simple as the subtraction/comparison, and the arrays are quickly accessed, then your suggestion of looking to optimize saving to the database should be the next priority. For example, writing out a text file and using a bulk insert can give very fast performance compared to individual insert statements. If you stick to using separate inserts, and are using JDBC, then batch updates will be a great help, since they avoid the latency in communicating with the database.
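For instance, a rough sketch of JDBC batching; the table and column names, the Pair type, and the matches list are made up here purely for illustration:

// Uses java.sql.Connection / PreparedStatement; one round trip for the whole batch instead of one per insert.
void insertMatches(Connection conn, List<Pair> matches) throws SQLException {
    String sql = "INSERT INTO large_diffs (d1_val, d2_val) VALUES (?, ?)";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        for (Pair p : matches) {
            ps.setInt(1, p.d1.val);
            ps.setInt(2, p.d2.val);
            ps.addBatch();      // queue the insert client-side
        }
        ps.executeBatch();      // send all queued inserts at once
    }
}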
If that is still not fast enough, consider partitioning the array into N partitions, and have each partition processed by a separate thread. This will be particularly effective if processing is CPU bound.
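As a sketch of what that partitioning might look like (scanRange here is a hypothetical method holding the nested-loop body for outer indices [from, to); in practice each worker should also use its own database connection or hand its results to a single writer):

// Uses java.util.concurrent.ExecutorService / Executors.
void scanInParallel(Datum[] data) throws Exception {
    int threads = Runtime.getRuntime().availableProcessors();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    int chunk = (data.length + threads - 1) / threads;
    List<Future<?>> results = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
        int from = t * chunk;
        int to = Math.min(from + chunk, data.length);
        // Each d1 in [from, to) may still read d2 values just past 'to'; shared reads are safe.
        results.add(pool.submit(() -> scanRange(data, from, to)));
    }
    for (Future<?> r : results) {
        r.get();            // wait for every partition, rethrow if a worker failed
    }
    pool.shutdown();
}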
Finally, look for code-level optimizations, such as avoiding iterators by using an index. If the number of items written to the database is small compared to the number of elements iterated, then the iterator creation may be a bottleneck.
If the number of elements is larger than 10, and critically, more than can fit in the CPU cache, it will be more efficient to break up scanning into smaller blocks. For example, rather than scanning 1000 elements from data2, break it up into (say) 10 scans of 100, with each of the 10 scans using a different value of d1. This is similar to how matrix multiplication is implemented in a block fashion and makes better use of the CPU caches.
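One way to realize that blocking (not necessarily exactly what this answer had in mind) is to iterate over blocks of d2 indices and, for each block, run every d1 whose window overlaps it; K and B below are illustrative values:

// Blocked version of the windowed scan; it visits the same (d1, d2) pairs, just in a cache-friendlier order.
int n = data.length;
int K = 1000;   // window size (blocking only pays off when K is large)
int B = 100;    // block size, chosen so a block fits comfortably in cache
for (int b = 0; b < n; b += B) {
    int bEnd = Math.min(b + B, n);
    // Every d1 whose window [i+1, i+K] overlaps the block [b, bEnd):
    for (int i = Math.max(0, b - K); i <= bEnd - 2; i++) {
        int jFrom = Math.max(i + 1, b);
        int jTo = Math.min(i + K, bEnd - 1);
        for (int j = jFrom; j <= jTo; j++) {
            if (data[j].val - data[i].val > 10) {
                // Insert the (data[i], data[j]) pair into the database
            }
        }
    }
}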
Although you are using two loops, which is typically an O(N^2) algorithm, the second loop has a fixed size of 10 elements, so this reduces to a simple constant factor: you are doing roughly a factor of 10 more work.
There is an asymptotically faster way to solve this problem, but I have serious doubts as to whether it would run faster in practice because your window size (10) is so small. If you want to increase this size - which I'll call k - to be larger, then you might want to consider opting for an approach like the following.
When you're using this algorithm, you need to maintain a window of the k elements ahead of the current one that supports two operations:
1. Slide the window forward: add the next element and evict the oldest one.
2. Given the current element d1, list every element in the window whose value exceeds d1.val + 10.
One way to do this would be to store all of your elements in a data structure combining a balanced binary search tree and a queue. The queue contains all k elements stored in the order in which they appear in the original sequence, and is used so that we can remember which element to evict when we need to add a new element. The balanced BST stores a copy of each of those elements in sorted order. This means that you can implement the above operations like this:
1. To slide the window, enqueue the new element and insert it into the BST, then dequeue the oldest element and delete it from the BST; each of these steps takes O(log k) time.
2. To answer a query for d1, find the successor of d1.val + 10 in the BST in O(log k) time, then walk forward through the BST in sorted order, reporting every element you encounter.
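A minimal sketch of that in Java, using a TreeMap as the balanced BST (mapping each value to how many window elements carry it) and an ArrayDeque as the queue; Datum and val are taken from the question (val assumed to be an int), everything else is illustrative:

// Uses java.util.{ArrayDeque, Deque, Map, TreeMap}. k is the window size.
int k = 10;
Deque<Datum> window = new ArrayDeque<>();             // window elements in arrival order
TreeMap<Integer, Integer> sorted = new TreeMap<>();   // value -> count of window elements with that value

// Prime the window with data[1 .. k].
for (int j = 1; j <= k && j < data.length; j++) {
    window.addLast(data[j]);
    sorted.merge(data[j].val, 1, Integer::sum);
}

for (int i = 0; i < data.length; i++) {
    Datum d1 = data[i];

    // Operation (2): every window value strictly greater than d1.val + 10 is a match.
    for (Map.Entry<Integer, Integer> e : sorted.tailMap(d1.val + 10, false).entrySet()) {
        // value e.getKey() matched e.getValue() times; insert the pair(s) into the database here
    }

    // Operation (1): slide the window forward by one position.
    if (!window.isEmpty()) {
        Datum evicted = window.pollFirst();           // this is data[i + 1]
        sorted.merge(evicted.val, -1, Integer::sum);
        if (sorted.get(evicted.val) == 0) sorted.remove(evicted.val);
    }
    if (i + k + 1 < data.length) {
        Datum incoming = data[i + k + 1];
        window.addLast(incoming);
        sorted.merge(incoming.val, 1, Integer::sum);
    }
}

If you need the matching Datum objects themselves rather than just their values (for example, to record both timestamps), keep a list of elements per value in the TreeMap instead of a count.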
Collectively, if you have n elements and a total of z pairs that you need to insert into the database, this algorithm will take O(n log k + z) time. To see this, note that we perform operation (1) n times, at O(log k) each. We also perform operation (2) n times, which costs O(n log k) overall to find the successors, plus O(z) total time across all iterations to list the matching pairs.
The asymptotic runtime of this algorithm is good compared to the O(nk) algorithm you've originally posted. Assuming that the number of matches isn't "really huge" (say, on the order of nk), this will be much faster as you increase n and k.
If the values you're storing are integers in a small range (say, 0 - 10,000), you can speed this up even further by replacing the balanced BST with a data structure optimized for integers, like a van Emde Boas tree, which reduces this to O(n log log k + z). Again, this is only faster asymptotically, and if you are holding k constant at 10 this is almost certainly not worth it.
Hope this helps!