递归填充动态大小向量
也许让我先用伪 C++ 代码陈述我的情况:
std:vector<double> sample(someFunctor f, double lower, double upper) {
double t = (lower + upper)/2;
double newval = f(t);
if (f(upper) - newval > epsilon)
subsample1 = sample(f, t, upper);
if (newval - f(lower) > epsilon)
subsample2 = sample(f, lower, t);
return concat(subsample2, newval, subsample1);
}
其中 concat 只是连接返回的向量。基本上,我对函数进行采样的方式使得两个保存的函数值之间只有很小的差异。
我对上述方式不满意,因为在每个递归步骤中似乎都有相当多的内存分配(分配两个子向量,然后连接这些子向量和另一个元素)。该代码必须在我的算法中对性能至关重要的部分中运行。一旦 upper - lower
相当小,计算 f
就不会花费大量时间。
所以我的问题是:
您是否看到了一种巧妙的方法,可以在所有递归调用中使用相同的数据结构并仅填充该向量的当前部分? (请记住,预先未知所需的功能评估数量)。对此的思考:
- 使用列表而不是向量。但我觉得内存检修不足以仅存储双精度数。
- 在向量中保留漏洞并维护另一个向量,说明哪些条目已填充。递归调用结束时会移动条目,以便
subsample
和newval
之间没有空洞。但现在我通过对第二个向量进行额外的工作来切换复制 - 可能是个坏主意。
你有没有找到一种完全摆脱递归的方法?然而,为了正确性,使用上述分而治之的模式很重要。函数
f
大量使用了上限和下限,并由此获得了相当大的性能。
谢谢你的想法。
根据 Space_C0wb0y 的要求,让我尝试重新表述我的问题。可能第一个解释不是很清楚。
我有一些函数(在数学意义上)我想在给定的时间间隔内采样(例如在某些点进行评估)。
假设区间为[0,100]。我知道 0 和 100 时的函数值。也许那就是 f(0)=0
和 f(100) = 40
。 现在,我以间隔中点(即 50)评估函数。假设我的函数返回 f(50)=10
。 由于 f(0)-f(50) <= 10
,我不需要区间 [0,50] 中的更多样本。但是,我需要对区间 [50,100] 进行进一步计算。因此,在下一步(递归)中,我评估 f(75)
。现在递归地重复上述逻辑。
最后,我想(两个)向量为我提供带有相应参数的函数值,如下所示:
parameter = vector(0, 50, 56.25, 62.5, 75, 100)
value = vector(0, 10, 17.21, 25 34, 40)
我正在寻找最好的(也是性能最高的)方法来递归地构建这些向量。
希望这能澄清事情。
Maybe let me state my situation in pseudo-C++-code first:
std:vector<double> sample(someFunctor f, double lower, double upper) {
double t = (lower + upper)/2;
double newval = f(t);
if (f(upper) - newval > epsilon)
subsample1 = sample(f, t, upper);
if (newval - f(lower) > epsilon)
subsample2 = sample(f, lower, t);
return concat(subsample2, newval, subsample1);
}
where concat just, well, concats the returned vectors. Basically, I am sampling a function in a way such that between two saved function values there is only a small difference.
I am not happy with the way stated above as there seems to be quite a bit of memory allocation (allocate two subvectors, then concat those and another element) in every recursive step. That piece of code will have to run in a part of my algorithm that is critical with respect to performance. Once upper - lower
is fairly small, evaluating f
will not take a great amount of time.
So my questions:
Do you see a clever way to use the same data structure in all recursive calls and just fill the current parts of that vector? (Keeping in mind that the number of function evaluations needed is not known upfront). Thoughts on this:
- Using a list instead of a vector. But I feel the memory overhaul is not adequate for just storing doubles.
- Keep holes in the vector and maintain another vector saying which entries are filled. A the end of a recursive call shift the entries so that there are no holes between the
subsample
s andnewval
. But now I switch copying by shifting with additional work for the second vector - probably bad idea.
Do you see a way to get rid of the recursion totally? However, for correctness it is important that I use the divide-and-conquer pattern stated above. The function
f
makes heavy use of the upper and lower bounds and gains quite a big deal of performance by this.
Thank's for your thoughts.
As per Space_C0wb0y's request let me try to rephrase my problem. Maybe the first explanation was not very clear.
I have some function (in a mathematical sense) that I want to sample (e.g. to evaluate at certain points) in a given interval.
Suppose that interval is [0,100]. I know the functions values at 0 and at 100. Maybe that is f(0)=0
and f(100) = 40
.
Now I evaluate the function at the intervals midpoint, which is 50. Say, my function returns f(50)=10
.
As f(0)-f(50) <= 10
, I do not need further samples in the interval [0,50]. However, I need further computations for the interval [50,100]. Thus, in the next (recursive) step I evaluate f(75)
. Now repeat the above logic recursively.
At the end I would like to (two) vectors giving me the function values with the corresponding parameters like this:
parameter = vector(0, 50, 56.25, 62.5, 75, 100)
value = vector(0, 10, 17.21, 25 34, 40)
I am looking for the best (as is most performant) approach to build these vectors recursively.
Hope this clarifies things.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
由于空间不是您主要关心的问题,所以我将继续使用递归。
1.使用按引用复制而不是按(返回)值复制。
2.不需要传入函子,因为它是常量。
3.如果
low
和high
是整数,速度可能会更快。但这取决于要求。Since space is not your major concern so I'll go on using the recursion.
1. Use copy by reference instead of copy by (return) value.
2. No need to pass in functor as it's constant.
3. It could have been faster if
low
andhigh
are integers. It's depends on requirements though.我建议采用以下方法:
不要使用两个向量,而是使用一个带有对的向量或自定义
struct
来表示参数和值:更改您的算法以写入结果对输出迭代器的评估:
上面是对伪代码的修改,显示了使用输出迭代器时的样子。它没有经过测试,所以我不确定它是否正确。原则上,您可以这样称呼它:
这样您就不必为每个递归调用创建新的向量。如果您可以猜测样本数量的合理下限,您也许能够进行预分配,这样就不需要重新分配。在这种情况下,您可以这样称呼它:
然后您必须添加一个额外的计数器来跟踪生成了多少个样本。
I would recommend the following approach:
Don't use two vectors, but rather one vector with pairs or a custom
struct
to represent parameters and values:Change your algorithm to write the results of the evaluations to an output iterator:
The above is a modification of your pseudo-code showing what it might look like when using an output iterator. It is not tested, so I am not sure if it is correct. In principle, you would call it like this:
This way you will not have to create new vectors for each recursive call. If you can guess a reasonable lower bound for the number of samples, you might be able to preallocate such that no reallocations are required. In that case you would call it like this:
You would then have to add an extra counter to keep track of how many samples were generated.
我不明白你为什么拒绝列表解决方案。
最坏的情况是您的列表大小是原始数据的 3 倍。
我认为这远远少于在每个函数调用上创建新向量时所拥有的。
您应该尝试一下,因为它不需要太多更改,因为两者的界面几乎相同。
I don't see why you decline a list solution.
The worst case would be that your list has 3 times the size than your raw data.
I think that is by far less than what you have when you create a new vector on each function-call.
You should try it out as it does not require that much change, because the interface of both is nearly the same.