Looking for ideas/references/keywords: adaptive parameter control of search algorithms (online learning)

Posted on 2024-10-04 03:41:10

I'm looking for ideas, experiences, references and keywords on adaptive parameter control (online learning) of search-algorithm parameters in combinatorial optimization.

A bit more detail:

I have a framework that optimizes a hard combinatorial optimization problem. It does this with the help of some "small heuristics" that are applied iteratively (large neighborhood search; ruin-and-recreate approach). Each of these "small heuristics" takes some external parameters that control the heuristic's logic to some extent (at the moment these are just random values; some kind of noise to diversify the search).

Now I want a control framework that chooses these parameters in a way that improves convergence, and that is as general as possible, so that new heuristics can be added later without changing the parameter control.

There are at least two general decisions to make:

A: Choose the algorithm pair (one destroy algorithm and one rebuild algorithm) to use in the next iteration.
B: Choose the random parameters of the algorithms.
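
For context, the loop in which these two decisions occur might look roughly like the sketch below. All names (`lns`, `choose_pair`, `choose_params`, `feedback`) are hypothetical placeholders, not taken from the actual framework, and the sketch assumes minimization and a simple accept-if-not-worse rule.

```python
import random

def lns(initial, cost, destroyers, rebuilders, choose_pair, choose_params,
        feedback, iterations=1000, rng=None):
    """Generic ruin-and-recreate loop (illustrative sketch, minimization assumed)."""
    rng = rng or random.Random()
    current = best = initial
    current_cost = best_cost = cost(initial)
    for _ in range(iterations):
        # Decision A: which destroy/rebuild pair to use in this iteration.
        d_name, r_name = choose_pair(rng)
        # Decision B: parameter values; each heuristic may take a different number.
        d_params = choose_params(d_name, rng)
        r_params = choose_params(r_name, rng)
        partial = destroyers[d_name](current, d_params, rng)
        candidate = rebuilders[r_name](partial, r_params, rng)
        cand_cost = cost(candidate)
        improved = cand_cost < best_cost
        # The evaluation of the new solution is the only feedback the controller gets.
        feedback(d_name, r_name, d_params, r_params, cand_cost, improved)
        if cand_cost <= current_cost:  # assumed acceptance rule
            current, current_cost = candidate, cand_cost
        if improved:
            best, best_cost = candidate, cand_cost
    return best, best_cost
```

Because `choose_params` returns whatever tuple a heuristic needs, the controller does not have to know in advance how many parameters a newly added heuristic takes.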

The only feedback is the evaluation function's value for the newly found solution. That leads me to the topic of reinforcement learning. Is that the right direction?

It's not really learning-like behavior, but my simplistic ideas at the moment are:

A: Roulette-wheel selection based on a performance value collected during the iterations (recent results count more than older ones).
So if heuristic 1 found all of the new global best solutions -> high probability of choosing it.
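
A minimal sketch of idea A, assuming an exponentially decayed score as the "performance value" and a small weight floor so that no heuristic dies out completely. The class name `RouletteSelector` and the parameters `decay` and `floor` are made up for illustration.

```python
import random

class RouletteSelector:
    """Roulette-wheel selection over recency-weighted scores (illustrative sketch)."""

    def __init__(self, names, decay=0.9, floor=0.05, rng=None):
        self.names = list(names)
        self.decay = decay    # closer to 1.0 = longer memory of past results
        self.floor = floor    # minimum weight, keeps every heuristic selectable
        self.score = {name: 1.0 for name in self.names}
        self.rng = rng or random.Random()

    def weights(self):
        return [max(self.score[name], self.floor) for name in self.names]

    def choose(self):
        # Selection probability is proportional to the (floored) score.
        return self.rng.choices(self.names, weights=self.weights(), k=1)[0]

    def update(self, name, reward):
        # Exponential smoothing: recent rewards count more than older ones.
        self.score[name] = self.decay * self.score[name] + (1 - self.decay) * reward
```

Here `reward` could be, for example, 1.0 when the heuristic produced a new global best and 0.0 otherwise; that choice is also an assumption.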
B: No idea yet. Maybe I could draw non-uniform random values in the range (0,1) and track some kind of momentum in the changes.
So if heuristic 1 used alpha = 0.3 last time and found no new best solution, then used 0.6 and found a new best solution -> there is momentum towards 1
-> the next random value is likely to be bigger than 0.3. Possible problem: oscillation!
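
One way idea B could be sketched, assuming a triangular distribution on [0, 1] whose mode is shifted by a damped velocity. The class `MomentumParameter` and its `damping` parameter are hypothetical; the damping is only a guess at how the oscillation might be limited, not a tested remedy.

```python
import random

class MomentumParameter:
    """One parameter in [0, 1] sampled around a drifting mode (illustrative sketch)."""

    def __init__(self, start=0.5, damping=0.5, rng=None):
        self.mode = start
        self.velocity = 0.0
        self.damping = damping   # fraction of the old velocity that is kept
        self.last = None         # (value, improved) of the previous trial
        self.rng = rng or random.Random()

    def sample(self):
        # Non-uniform draw from [0, 1] that peaks at the current mode.
        return self.rng.triangular(0.0, 1.0, self.mode)

    def update(self, value, improved):
        if self.last is not None:
            last_value, last_improved = self.last
            if improved != last_improved:
                # Direction points from the failing value to the succeeding one.
                direction = value - last_value if improved else last_value - value
            else:
                direction = 0.0
            self.velocity = (self.damping * self.velocity
                             + (1 - self.damping) * direction)
            self.mode = min(1.0, max(0.0, self.mode + self.velocity))
        self.last = (value, improved)
```

With `start=0.3`, a failure at 0.3 followed by a success at 0.6 gives a velocity of 0.15 and moves the mode to 0.45, so the next draw is more likely to be above 0.3. A heuristic with three parameters would simply hold three such objects.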

Things to note:
- The parameters needed for good convergence of one specific algorithm can change dramatically during the search -> maybe more diversifying operations are needed at the beginning and more intensifying operations at the end.
- A specific pair of destroy/rebuild algorithms may have good synergistic effects (sometimes called coupled neighborhoods). How would one recognize something like that? Is that still within the area of reinforcement learning?
- The different algorithms are controlled by a different number of parameters (some take 1, some take 3).
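
To make the coupled-neighborhood question concrete: one conceivable way to detect synergy is to keep a recency-weighted score per (destroy, rebuild) pair and compare it with what the two heuristics achieve on average. The sketch below is only an assumption about how that could be done; `PairStats` and `synergy` are invented names, and the interaction term is the usual one from a two-way additive model.

```python
from itertools import product

class PairStats:
    """Recency-weighted reward per destroy/rebuild pair (illustrative sketch)."""

    def __init__(self, destroyers, rebuilders, decay=0.9):
        self.destroyers = list(destroyers)
        self.rebuilders = list(rebuilders)
        self.decay = decay
        self.score = {pair: 0.0
                      for pair in product(self.destroyers, self.rebuilders)}

    def update(self, destroy, rebuild, reward):
        key = (destroy, rebuild)
        self.score[key] = self.decay * self.score[key] + (1 - self.decay) * reward

    def synergy(self, destroy, rebuild):
        # Pair score minus what the two heuristics achieve on average on their
        # own; a clearly positive value hints at a coupled neighborhood.
        row = sum(self.score[(destroy, r)] for r in self.rebuilders) / len(self.rebuilders)
        col = sum(self.score[(d, rebuild)] for d in self.destroyers) / len(self.destroyers)
        overall = sum(self.score.values()) / len(self.score)
        return self.score[(destroy, rebuild)] - row - col + overall
```

If the rewards are purely additive (each heuristic contributes independently), `synergy` stays near zero for every pair; it becomes positive only when a pair does better than its two members suggest.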

Any ideas, experiences, references (papers), keywords (ML topics)?
If there are ideas for handling decision B in an offline-learning manner, don't hesitate to mention them.

Thanks for all your input.

Sascha
