复杂的线性规划

发布于 2024-07-11 22:58:45 字数 255 浏览 15 评论 0原文

我正在尝试通过一种方式解决一个看起来像标准线性规划问题的问题。

我们有一组“短语”作为输入,每个短语都有一个权重。 我们需要选择文本中每个短语的重复次数,以最大化总权重,但要遵守最大字符长度限制。

这似乎是一个简单的线性规划问题,除了一个短语可能是另一个短语的子短语这一事实。 例如,如果您的输入短语包括“foo bar”和“foo”,那么如果您重复短语“foo bar”,您也会重复短语“foo”。 我不知道如何在线性规划模型中处理这个问题。

I'm trying to solve a problem that looks like a standard Linear Programming problem with one twist.

We have as input a set of "phrases" each of which has a weight. We need to choose how many times to repeat each phrase in a text to maximize the total weight, subject to a max character length limitation.

This seems like a straightforward linear programming problem, except for the fact that one phrase could be subphrase of another. So for example if your input phrases include "foo bar" and "foo", then if you repeat the phrase "foo bar", you also repeat the phrase "foo". I don't know how to deal with this in a linear programming model.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

以可爱出名 2024-07-18 22:58:45

您还有另一个问题,两个短语的串联可能形成第三个短语。 例如,如果您有

list...4
sting...6
ingrave...7
abcd...5

一个解决方案“listingrave”,则“gain”为 17,而您可以使用“abcd”(“abcdingrave”执行任何操作) >") 只有 12,但对于长度 4,“abcd” 是最佳选择。

不过,这是一个有趣的问题,我认为您需要构建一个自动机来搜索文本中的单词,并且您需要在该自动机中找到一条给定长度的路径,该路径穿过“最甜蜜”的状态(例如,通过给予最多的糖果作为回报)。 我会进一步考虑这一点。

更新:它应该像这样工作:

  1. 从 Aho-Corasick 文本搜索算法构建 DFA 从
  2. 的起始状态开始,搜索具有最大增益的给定长度的行走(不是路径,即允许重复的边和顶点)外交部。 您可以通过动态编程来实现(从 1 个字符较短的步行构建较长的步行)。

You have another problem, a concatenation of two phrases might form a third phrase. For example, if you have

list...4
sting...6
ingrave...7
abcd...5

then a solution "listingrave" has "gain" 17, whereas anything you could do with "abcd" ("abcdingrave") has only 12, although for length 4, "abcd" is the optimum.

It is an interesting problem, though, I think you need to build an automaton that searches text for your words and you need to find a path in that automaton of given length which passes through the "candiest" states (eg. through the states which give the most candy in return). I will consider that further.

UPDATE: It should work like that:

  1. Build a DFA from the Aho-Corasick text searching algorithm
  2. Search for a walk (not path, ie. you allow repeated edges and vertices) of given length with the maximum gain, starting from the starting state of the DFA. You can do it with dynamic programming (build longer walks from 1 character shorter ones).
杯别 2024-07-18 22:58:45

只需重新计算权重即可。 例如:

key = 3
board = 2
keyboard = 5

这将更改为:

key = 3
board = 2
keyboard = 5 + 3 + 2 = 10

如果您这样做,则必须注意包含短语的短语。 您必须确保不这样做:

ey = 7
key = 3 + 7
board = 2
keyboard = 5 + 7 + (3 + 7) + 2

您可以通过在短语长度之后对列表进行排序来避免这种情况。 (最长的第一个)

顺便说一句:这不是一个动态编程问题吗?

Just recalculate the weights. For example:

key = 3
board = 2
keyboard = 5

This changes to:

key = 3
board = 2
keyboard = 5 + 3 + 2 = 10

If you do this you have to watch out for phrases which include a phrase that includes a phrase. You have to make sure you don't do this:

ey = 7
key = 3 + 7
board = 2
keyboard = 5 + 7 + (3 + 7) + 2

You can avoid it by ordering the list after the lengths of the phrases. (Longest first)

BTW: Isn't this a dynamic programming problem?

柠栀 2024-07-18 22:58:45

也许只使用动态编程并每次重新计算整个过程就可以了。 人们可以使用大量优化来加快速度。 (就像尝试仅重新计算最后一部分一样)我认为速度是合理的。 O(n^2 * m) 左右。

Maybe it would work by using only dynamic programming and recalculating the whole thing each time. One could use a whole lot of optimizations to speed it up. (Like trying to recalculate only the last part) I think the speed would be reasonable. O(n^2 * m) or so.

浸婚纱 2024-07-18 22:58:45

这看起来可以作为背包问题来解决,请参阅http:// en.wikipedia.org/wiki/Knapsack_problem,权重按照 gs 建议定义。

This looks like it can be solved as a knapsack problem, see http://en.wikipedia.org/wiki/Knapsack_problem, with weights defined as suggested by gs.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文