有趣的是,这可能是堆栈溢出问题

发布于 2024-07-18 17:25:25 字数 799 浏览 12 评论 0 原文

以下过程(解释如下)适用于非常小的列表,但是当列表包含大量项目(1/2 百万)时,应用程序进入“无响应”状态,并且需要大约 2.5 分钟才能完成(非常糟糕)时间)。 我可能会添加应用程序需要处理 1 亿个项目的列表 至少(最终)。

这是有问题的过程的代码:

    public void removeItems(List<long> L, SortedList<long, List<long>> _subLists)
    {
        foreach (KeyValuePair<long, List<long>> kvp in _subLists)
        {
            foreach (long duplicate in kvp.Value)
            {
                int j = L.IndexOf(duplicate);
                L.RemoveRange(j,(int)kvp.Key); 

            }
        }
    }

L 是长值列表。 _subLists 是一个排序列表,其中每个值都是一个列表 L 中的值,开始一些差异(不相关)的算术级数。 与该值关联的键是值包含的系列的长度。

示例:

L = {1,2,3,5,6,7,18,20,21} _subLists = {2,<20>} {3,<1,5>}

该过程只是从 L 中删除算术级数级数。

The following procedure (explanation follows) works fine for really small lists, but when the list contains a larger number of items (1/2 million) the application enters "not responding" state,and it takes about 2.5 minutes to finish (very bad time).
I might add the application needs to process lists of 100 million items
at least (eventually).

here is the code for the problematic procedure:

    public void removeItems(List<long> L, SortedList<long, List<long>> _subLists)
    {
        foreach (KeyValuePair<long, List<long>> kvp in _subLists)
        {
            foreach (long duplicate in kvp.Value)
            {
                int j = L.IndexOf(duplicate);
                L.RemoveRange(j,(int)kvp.Key); 

            }
        }
    }

L is a list of long values.
_subLists is a sorted list where each value is a list of
values from L,starting an arithmetic progression series of some difference (not relevant).
the key associated with that value is the length of the series the values contain.

Example:

L = {1,2,3,5,6,7,18,20,21}
_subLists = {2,<20>}
{3,<1,5>}

The procedure simply removes the arithmetic progression series from L.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

九公里浅绿 2024-07-25 17:25:25

此过程的运行时间(以大 O 表示法表示)为 n^2,这相当慢,如果其中一个列表有 1 亿个条目,则运行时间可能会很慢。 这里不存在堆栈溢出问题,只是迭代这么多数据很慢。 我真的没有在这里看到一个问题,你想让这个更快吗? 如果是这样,那么嵌套的 for 循环肯定是问题所在。

The run time of this procedure in big O notation would be n^2, which is fairly slow and you can expect a slow run time if one of the lists has 100 million entries. There is no stack overflow problem here, it's simply slow to iterate through this much data. I don't really see a question here, are you looking to make this faster? If so, the nested for loop is definitely the problem.

清旖 2024-07-25 17:25:25

你的问题是你要从 L 中删除很多项目,这是一个非常昂贵的操作。 每次删除项目时,都会复制内存以将已删除项目上方的所有项目向下移动。 删除的项目越多,洗牌的项目越多,所需的时间就越长。 内存是性能的瓶颈,RAM 的运行速度比 CPU 慢,如果您分页到磁盘,速度会非常慢。

你如何改善这一点。

最简单的选择是使用 L 的容器,它在删除项目时具有更好的性能 - 例如 LinkedList。 当删除元素时,LinkedList 不需要在内存中移动项目,但它们确实需要更多内存来存储数据(每个值两个指针)。 如果开销太大,则可能使用 LinkedList > 来代替,其中每个 List 保存最大数量的值。

或者,更改删除算法,以便迭代列表 L 并创建一个包含 _subLists 中未找到的值的新列表。 您可以更改 _subLists 存储数据的方式,以便更快地查找范围内的项目。

Your problem is that you are removing a lot of items from L which is a very costly operation. Every time an item is removed, memory is copied to move all the items above the deleted items down. The more items that are removed and the more items to shuffle down, the longer it takes. Memory is a bottleneck to performance, RAM runs slower than the CPU, and if you're paging to disk than it's really slow.

How can you improve this.

The easiest option is to use a container for L that has better performance when removing items - a LinkedList for example. LinkedLists do not need to move items around in memory when elements are removed but they do require more memory to store the data (two pointers per value). If this is too much overhead, then perhaps a LinkedList <List <long>> instead where each List <long> holds a maximum number of values.

Alternatively, change the removal algorithm so that you iterate over the list L and create a new list containing the values not found in the _subLists. You can change the way _subLists stores data to make finding items in ranges quicker.

剧终人散尽 2024-07-25 17:25:25

如果可能:

A) 将 L 转换为排序链表。 O: n * log(n)

B) 将子列表转换为排序列表对,其中第一项是 L 中序列中的 #(在发布的代码片段中重复),第二项是序列的长度。 O: n * log (n)

C) 使用子列表执行一次 L 遍历,以确定在 L 中的给定位置要删除多少个元素。利用两个列表都已排序的事实,以便在任一列表中都不会回溯。 O: n

如果可以使用, 应该能够获得 O: n * log(n) 的复杂度。
当然,我并不是 100% 确定问题的所有细节。 例如 - L 可以有重复项吗? 如果是这样,子列表的顺序重要吗? 根据这些问题的答案,您可能被迫放弃或修改此类算法。 此外,这显然会使用更多的内存。

If possible:

A) Convert L into a sorted linked list. O: n * log(n)

B) Convert sublists into a sorted list pairs where the first item is the # in the sequence in L (duplicate in the posted code snippet) and the second item is the length of the sequence. O: n * log (n)

C) Perform a single pass through L using sublists to determine how many elements to remove at a given spot in L. Take advantage of the fact that both lists are sorted to not backtrack in either list. O: n

Should be able to get O: n * log(n) complexity out of this if it is possible to use.
Of course, I'm not 100% sure about all of the details of the problem. For instance - can L have duplicates? If so, does the order of the sublists matter? You may be forced to ditch or modify such an algorithm depending on the answers to those ?s. Also, this will obviously use more memory.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文