How to select dataframe rows with PuLP for linear optimization

Posted 2025-01-11 05:16:34

I am trying to solve the following problem:

I have a pandas dataframe df with multiple columns. I want to find values a and b that maximise the mean of column 'result' over the selected rows, that is, the sum of 'result' divided by the number of selected rows. The rows are selected by a and b, subject to the following constraints:

(df['x'] >= a) & (df['y'] <= b)
2.5 <= a <= 20
0.05 <= b <= 0.35

I tried using PuLP, but I have never worked with it before and am stuck.
Here is sample code for the problem, showing how I tried to solve it:

import pandas as pd
from pulp import LpMaximize, LpProblem, LpStatus, lpSum, LpVariable

df = pd.DataFrame({'x': [2.94, 10.33, 8.67, 10.18, 2.82], 'y': [0.34, 0.21, 0.06, 0.24, 0.28], 'result': [-0.5, 9.55, 13.59, -0.2, 11.59]})
model = LpProblem(name='find_values', sense=LpMaximize)
a = LpVariable(name='a', lowBound=2.5, upBound=20)
b = LpVariable(name='b', lowBound=0.05, upBound=0.35)

# add constraints
model += a <= 20
model += b <= 0.35
# Set the objective
model += lpSum(
    df[(df['x'] >= a) & (df['y'] <= b)]['result']) / len(
    df[(df['x'] >= a) & (df['y'] <= b)])

print(model)
model.solve()
# Get the results
print(model.status)
print(LpStatus[model.status])
print(model.objective.value())
print(a.value())
print(b.value())

Once this is run, the following output is shown:

find_values:
MAXIMIZE
6.806
SUBJECT TO
_C1: a <= 20

_C2: b <= 0.35

VARIABLES
2.5 <= a <= 20 Continuous
0.05 <= b <= 0.35 Continuous

1
Optimal
Objective value
None
2.5
0.05

To me it looks like the mistake is in the objective function: the printed model shows a fixed value there, which suggests the expression was evaluated before the solver ran.
However, I have no idea how to formulate it so that it works, or whether this is even possible with PuLP.

卸妝后依然美 2025-01-18 05:16:34

A PuLP objective has to be built from PuLP variables as a linear expression. Your objective cannot be expressed that way, so you would need to reformulate the problem and solve it as a mixed-integer program.
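To illustrate the point, here is a minimal sketch (the names linear_expr and comparison are only for illustration) of what PuLP builds from its variables. Arithmetic gives a symbolic linear expression, and a comparison gives a constraint object rather than a boolean, which is why a pandas mask written with a and b cannot depend on the values the solver later picks:

```python
from pulp import LpAffineExpression, LpConstraint, LpVariable

a = LpVariable(name='a', lowBound=2.5, upBound=20)
b = LpVariable(name='b', lowBound=0.05, upBound=0.35)

# Arithmetic on LpVariable objects builds a symbolic linear expression.
linear_expr = 2 * a + 3 * b
is_affine = isinstance(linear_expr, LpAffineExpression)

# A comparison builds a constraint object, not a True/False value, so it
# cannot serve as a row filter that follows the solver's choice of a.
comparison = a >= 3
is_constraint = isinstance(comparison, LpConstraint)

print(is_affine, is_constraint)
```

In the question's code the row filter is therefore resolved while the model is being built, and the objective collapses to a constant before the solver runs.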

Alternatively, you can keep the objective (which does not appear to be differentiable) and treat it as a black box, using scipy.optimize.dual_annealing. Be aware that this approach does not guarantee finding the global maximum:

import numpy as np
from scipy.optimize import dual_annealing

# df is the dataframe defined in the question
def objective(ab):
    a, b = ab
    mask = (df.x >= a) & (df.y <= b)
    # no row selected: the mean is undefined, so return a large penalty
    if not mask.any():
        return -1000.0
    vals = df[mask].result.values
    return np.sum(vals) / vals.size

# transform the maximization problem into a minimization problem
res = dual_annealing(lambda ab: -1.0*objective(ab), bounds=((2.5, 20.0), (0.05, 0.35)))

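This yields a solution (a=6.55, b=0.13) with objective value 13.59. The optimizer is stochastic and the optimum is not unique, so the reported a and b can differ between runs.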
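Because the objective only changes when a crosses one of the x values or b crosses one of the y values, a small data set can also be searched exhaustively. The following is a rough sketch of that idea, not part of the approach above: it assumes the sample dataframe from the question, and the helper name best_mean_by_enumeration is made up for illustration. It tries every data value inside the bounds (plus the bounds themselves) as a threshold and keeps the pair with the highest mean.

```python
import itertools

import pandas as pd

df = pd.DataFrame({
    'x': [2.94, 10.33, 8.67, 10.18, 2.82],
    'y': [0.34, 0.21, 0.06, 0.24, 0.28],
    'result': [-0.5, 9.55, 13.59, -0.2, 11.59],
})


def best_mean_by_enumeration(df, a_bounds=(2.5, 20.0), b_bounds=(0.05, 0.35)):
    """Return (mean, a, b) for the best threshold pair, or None if no row is ever selected."""
    a_lo, a_hi = a_bounds
    b_lo, b_hi = b_bounds
    # The selected set only changes at data values, so the data values
    # inside the bounds, together with the bounds, cover every distinct selection.
    a_cands = sorted({a_lo, a_hi, *df['x'][df['x'].between(a_lo, a_hi)]})
    b_cands = sorted({b_lo, b_hi, *df['y'][df['y'].between(b_lo, b_hi)]})

    best = None
    for a, b in itertools.product(a_cands, b_cands):
        mask = (df['x'] >= a) & (df['y'] <= b)
        if not mask.any():
            continue  # the mean is undefined when no row is selected
        mean = df.loc[mask, 'result'].mean()
        if best is None or mean > best[0]:
            best = (mean, a, b)
    return best


best = best_mean_by_enumeration(df)
print(best)
```

On the sample data this finds the same objective value of 13.59, reached by selecting only the row with result 13.59. The cost grows with the product of the number of distinct x and y values, so this is only practical for modest data sizes.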
