STL 的“partial_sum”有哪些实际用途?
STL 中 partial_sum
算法的实际用途是什么?
还有哪些其他有趣/重要的示例或用例?
What/where are the practical uses of the partial_sum
algorithm in STL?
What are some other interesting/non-trivial examples or use-cases?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(11)
关于部分和需要注意的一件事是,它是撤销相邻差值的操作,就像 - 撤销 + 一样。或者如果你还记得微积分就像微分消除积分的方式,那就更好了。更好,因为相邻差本质上是微分,部分和是积分。
假设您要模拟一辆汽车,并且在每个时间步都需要知道位置、速度和加速度。您只需要存储这些值之一,因为您可以计算其他两个值。假设您存储每个时间步的位置,您可以采用位置的相邻差值来给出速度,并采用速度的相邻差值来给出加速度。或者,如果您存储加速度,则可以采用部分和来给出速度,而速度的部分和给出位置。
部分求和是大多数人不会经常出现的函数之一,但当您找到正确的情况时,它会非常有用。很像微积分。
One thing to note about partial sum is that it is the operation that undoes adjacent difference much like - undoes +. Or better yet if you remember calculus the way differentiation undoes integration. Better because adjacent difference is essentially differentiation and partial sum is integration.
Let's say you have simulation of a car and at each time step you need to know the position, velocity, and acceleration. You only need to store one of those values as you can compute the other two. Say you store the position at each time step you can take the adjacent difference of the position to give the velocity and the adjacent difference of the velocity to give the acceleration. Alternatively, if you store the acceleration you can take the partial sum to give the velocity and the partial sum of the velocity gives the position.
Partial sum is one of those functions that doesn't come up too often for most people but is enormously useful when you find the right situation. A lot like calculus.
上次我(本来)使用它是在将离散概率分布(p(X = k) 的数组)转换为累积分布(p(X <= k) 的数组)时。要从分布中选择一次,您可以从 [0-1) 中随机选择一个数字,然后对累积分布进行二分搜索。
不过,该代码不是用 C++ 编写的,所以我自己完成了部分求和。
Last time I (would have) used it is when converting a discrete probability distribution (an array of p(X = k)) into a cumulative distribution (an array of p(X <= k)). To select once from the distribution, you can pick a number from [0-1) randomly, then binary search into the cumulative distribution.
That code wasn't in C++, though, so I did the partial sum myself.
您可以使用它来生成单调递增的数字序列。例如,以下代码生成一个包含数字 1 到 42 的
向量
:这是日常用例吗?可能不是,尽管我多次发现它很有用。
您还可以使用 std::partial_sum 生成阶乘列表。 (不过,这甚至没那么有用,因为典型的整数数据类型可以表示的阶乘数量非常有限。不过,这很有趣:-D)
You can use it to generate a monotonically increasing sequence of numbers. For example, the following generates a
vector
containing the numbers 1 through 42:Is this an everyday use case? Probably not, though I've found it useful on several occasions.
You can also use
std::partial_sum
to generate a list of factorials. (This is even less useful, though, since the number of factorials that can be represented by a typical integer data type is quite limited. It is fun, though :-D)个人用例:轮盘轮选择
我在轮盘轮选择算法中使用
partial_sum
(链接文本)。该算法从容器中随机选择元素,其概率与预先给定的某个值成线性。因为我要选择的所有元素都带有不一定标准化的值,所以我使用
partial_sum
算法来构造类似“轮盘赌轮”的东西,因为我对所有元素进行了求和。然后我在这个范围内选择一个随机变量(最后一个partial_sum
是所有变量的总和),并使用stl::lower_bound
搜索我的随机搜索落地的“轮子” 。lower_bound
算法返回的元素是所选元素。除了使用
partial_sum
获得清晰且富有表现力的代码的优势之外,我在尝试 GCC 并行模式,为某些算法带来并行化版本,其中之一是partial_sum (链接文本)。我知道的另一个用途:并行处理中最重要的算法原语之一(但可能与 STL 有点距离)
如果您对使用
partial_sum< 的深度优化算法感兴趣/code> (在这种情况下,同义词“scan”或“prefix_sum”下的结果可能比并行算法社区更多。他们一直需要它。您找不到基于 quicksort 或 mergesort 而不使用它。此操作是最重要的并行原语之一。我认为它最常用于计算动态算法中的偏移量。考虑快速排序中的分区步骤,该步骤被分割并馈送到并行线程。在计算之前,您不知道分区的每个槽中的元素数量。因此,您需要为所有线程提供一些偏移量以便以后访问。
也许您会在当前热门的 GPU 处理主题中找到更多信息。一篇关于 Nvidia 的 CUDA 和 scan-primitive 的简短文章,其中包含一些应用程序示例,您可以在 < em>第 39 章. 使用 CUDA 并行前缀和(扫描) 。
Personal Use Case: Roulette-Wheel-Selection
I'm using
partial_sum
in a roulette-wheel-selection algorithm (link text). This algorithm choses randomly elements from a container with a probability which is linear to some value given beforehands.Because all my elements to choose from bringing a not-necessarily normalized value, I use the
partial_sum
algorithm for constructing something like a "roulette-wheel", because I sum up all the elements. Then I chose a random variable in this range (the lastpartial_sum
is the sum of all) and usestl::lower_bound
for searching "the wheel" where my random search landed. The element returned by thelower_bound
algorithm is the chosen one.Besides the advantage of clear and expressive code with the use of
partial_sum
, I could also gain some speed when experimenting with the GCC parallel mode which brings parallelized versions for some algorithms and one of them is the partial_sum (link text).Another use I know of: One of the most important algorithmic primitives in parallel processing (but maybe a little bit away from STL)
If you're interested in heavy optimized algorithms which are using
partial_sum
(in this case maybe more results under the synonyms "scan" or "prefix_sum"), than go to the parallel algorithms community. They need it all the time. You won't find a parallel sorting algorithm based on quicksort or mergesort without using it. This operation is one of the most important parallel primitives used. I think it is most commonly used for calculating offsets in dynamic algorithms. Think of a partition step in quicksort, which is split and fed to the parallel threads. You don't know the number of elements in each slot of the partition before calculating it. So you need some offsets for all the threads for later access.Maybe you will find more informatin in the now-hot topic of GPU processing. One short article regarding Nvidia's CUDA and the scan-primitive with a few application examples you will find in Chapter 39. Parallel Prefix Sum (Scan) with CUDA.
个人用例:CLRS 计数排序的中间步骤:
Personal Use Case: intermediate step in counting sort from CLRS:
您可以构建一个“移动总和”(移动平均线的前体):
然后用以下命令调用它:
You could build a "moving sum" (precursor to a moving average):
And then call it with:
你知道,我实际上曾经使用过 partial_sum() 一次......这是我在工作面试中被问到的一个有趣的小问题。我非常喜欢它,我回家并把它编码了。
问题是:给定一个连续的整数序列,找到具有最高值的最短子序列。例如:
我们会找到子序列 [1,4]
现在明显的解决方案是运行 3 个 for 循环,迭代所有可能的开始和结束。结束,依次将每个可能的子序列的值相加。效率低下,但编码速度快且不易出错。 (特别是当第三个 for 循环只是 accumulate(start,end,0) 时。)
正确的解决方案涉及分而治之/自下而上的方法。例如,将问题空间分成两半,并为每一半计算该部分中包含的最大子序列、包括起始数字的最大子序列、包括结束数字的最大子序列以及整个部分的子序列。有了这些数据,我们就可以将两半结合在一起,而无需对其中任何一个进行任何进一步的评估。显然,每一半的数据可以通过进一步将每一半分成两半(四分之一)、将每个四分之一分成两半(八分之一)等来计算,直到我们有简单的单例情况。这一切都非常有效。
但除此之外,我还想探索第三种(效率稍低)的选择。它类似于 3-for-loop 情况,只是我们添加相邻的数字以避免这么多工作。这个想法是,当我们可以添加 t1=a+b、t2=t1+c 和 t3=t2+d 时,就不需要添加 a+b、a+b+c 和 a+b+c+d。这是空间/计算权衡的问题。它的工作原理是转换序列:
从而为我们提供所有可能的子字符串,从索引 = 0 开始,到索引 = 0,1,2,3,4 结束。
然后我们迭代这个集合,减去连续的可能的“开始”点...
从而给我们所有可能的子序列的值(总和)。
我们可以通过max_element()找到每次迭代的最大值。
第一步最容易通过 partial_sum() 完成。
其余步骤通过 for 循环和 transform(data+i,data+size,data+i,bind2nd(minus(),data[i-1]))< /em>.
显然 O(N^2)。但还是很有趣、很好玩...
You know, I actually did use partial_sum() once... It was this interesting little problem that I was asked on a job interview. I enjoyed it so much, I went home and coded it up.
The problem was: Given a sequential sequence of integers, find the shortest sub-sequence with the highest value. E.g. Given:
We would find the subsequence [1,4]
Now the obvious solution is to run with 3 for loops, iterating over all possible starts & ends, and adding up the value of each possible subsequence in turn. Inefficient, but quick to code up and hard to make mistakes. (Especially when the third for loop is just accumulate(start,end,0).)
The correct solution involves a divide-and-conquer / bottom up approach. E.g. Divide the problem space in half, and for each half compute the largest subsequence contained within that section, the largest subsequence including the starting number, the largest subsequence including the ending number, and the entire section's subsequence. Armed with this data we can then combine the two halves together without any further evaluation of either one. Obviously the data for each half can be computed by further breaking each half into halves (quarters), each quarter into halves (eighths), and so on until we have trivial singleton cases. It's all quite efficient.
But all that aside, there's a third (somewhat less efficient) option that I wanted to explore. It's similar to the 3-for-loop case, only we add the adjacent numbers to avoid so much work. The idea is that there's no need to add a+b, a+b+c, and a+b+c+d when we can add t1=a+b, t2=t1+c, and t3=t2+d. It's a space/computation tradeoff thing. It works by transforming the sequence:
Thereby giving us all possible substrings starting at index=0 and ending at indexes=0,1,2,3,4.
Then we iterate over this set subtracting the successive possible "start" points...
Thereby giving us the values (sums) of all possible subsequences.
We can find the maximum value of each iteration via max_element().
The first step is most easily accomplished via partial_sum().
The remaining steps via a for loop and transform(data+i,data+size,data+i,bind2nd(minus<TYPE>(),data[i-1])).
Clearly O(N^2). But still interesting and fun...
部分和在并行算法中通常很有用。考虑代码
如果您想并行化此代码,您需要知道部分和。我正在使用 GNU 并行版本的partial_sum 来做一些非常类似的事情。
Partial sums are often useful in parallel algorithms. Consider the code
If you want to parallelise this code, you need to know the partial sums. I am using GNUs parallel version of partial_sum for something very similar.
我经常使用部分求和,不是求和而是根据前一个值计算序列中的当前值。
例如,如果您集成一个功能。每个新步骤都是前一个步骤,
vt += dvdt
或vt =integrat_step(dvdt, t_prev, t_prev+dt);
。I often use partial sum not to sum but to calculate the current value in the sequence depending on the previous.
For example, if you integrate a function. Each new step is a previous step,
vt += dvdt
orvt = integrate_step(dvdt, t_prev, t_prev+dt);
.在非参数贝叶斯方法中,有一个 Metropolis-Hastings 步骤(每个观察)决定对新的或现有的聚类进行采样。如果必须对现有集群进行采样,则需要使用不同的权重来完成。这些加权似然在以下示例代码中进行模拟。
请注意,这与 https://stackoverflow.com/users/13005/steve-jessop。添加它是为了提供有关特定情况的更多上下文(非参数贝叶斯方法,例如 Neal 使用 Dirichlet 过程作为先前的算法)以及结合使用
partial_sum
和lower_bound 的实际代码
。In nonparametric Bayesian methods there is a Metropolis-Hastings step (per observation) that determines to sample a new or an existing cluster. If an existing cluster has to be sampled this needs to be done with different weights. These weighted likelihoods are simulated in the following example code.
Note that this is not different from the answer by https://stackoverflow.com/users/13005/steve-jessop. It's added to give a bit more context about a particular situation (nonparametric Bayesian mehods, e.g. the algorithms by Neal using the Dirichlet process as prior) and the actual code which uses
partial_sum
in combination withlower_bound
.我用它来减少我的玩具 lambda 演算解释器中简单标记-清除垃圾收集器的内存使用量。
GC 池是大小相同的对象的数组。目标是消除未链接到其他对象的对象,并将剩余对象压缩到数组的开头。由于对象是在内存中移动的,因此每个链接都需要更新。这需要一个对象重映射表。
partial_sum
允许以压缩格式存储表(每个对象仅一位),直到扫描完成并释放内存为止。由于对象很小,因此可以显着减少内存使用。remove_if
将标记的对象压缩到池的开头。partial_sum
来生成新池中的指针/索引表。将重映射表放在刚刚释放的、仍然很热的内存中对数据缓存特别友好。
I used it to reduce memory usage of a simple mark-sweep garbage collector in my toy lambda calculus interpreter.
The GC pool is an array of objects of identical size. The goal is to eliminate objects that aren't linked to other objects, and condense the remaining objects into the beginning of the array. Since the objects are moved in memory, each link needs to be updated. This necessitates an object remapping table.
partial_sum
allows the table to be stored in compressed format (as little as one bit per object) until the sweep is complete and memory has been freed. Since the objects are small, this significantly reduces memory use.remove_if
to condense the marked objects to the beginning of the pool.partial_sum
over the Boolean values to generate a table of pointers/indexes into the new pool.It's especially friendly to the data cache to put the remap table in the just-freed, thus still hot, memory.