最大限度地减少运输时间
[底部更新(包括解决方案源代码)]
我有一个具有挑战性的业务问题,计算机可以帮助解决。
沿着山区,有一条蜿蜒曲折的长河,水流湍急。沿着河流的某些部分有一些环境敏感的土地,适合种植需求量很大的特定类型的稀有水果。一旦田间劳动者收获了水果,就开始将水果运送到加工厂。尝试将水果向上游或通过陆地或空中运送的成本非常高。到目前为止,将它们运送到工厂的最具成本效益的机制是使用仅由河流恒流驱动的下游集装箱。我们有能力建造 10 个加工厂,需要将这些加工厂建在河边,以尽量减少水果在运输途中的总时间。水果可能需要很长时间才能到达最近的下游工厂,但这会直接影响它们的销售价格。实际上,我们希望最小化到最近的各个下游工厂的距离总和。工厂可以位于距离水果接入点下游 0 米处。
问题是:如果找到了32个水果种植区,那么为了利润最大化,这10个加工厂应该建在河的多远的地方,这些地区距河床上游的距离是(米): 10, 40, 90, 160, 250, 360, 490, ... (n^2)*10 ... 9000, 9610, 10320?
[希望所有致力于解决这个问题以及创造类似问题和使用场景的工作都能够帮助提高人们对软件/商业方法专利的破坏性和窒息性的认识并产生普遍的抵制(无论这些专利在多大程度上)可能被认为在某个地区是合法的)。]
更新
Update1:忘记添加:我相信这个问题是这个。
更新2:我编写的一个算法在不到一秒的时间内给出了答案,我相信它相当好(但它在样本值上还不稳定)。我稍后会提供更多详细信息,但简短如下。将植物以相等的间距放置。循环遍历所有内部植物,在每个植物上,您通过测试其两个邻居之间的每个位置来重新计算其位置,直到问题在该空间内得到解决(贪婪算法)。因此,您可以在固定 1 和 3 的情况下优化工厂 2。然后工厂 3 保持 2 和 4 固定...当你到达终点时,你循环返回并重复,直到你进入一个完整的循环,其中每个加工厂的重新计算位置停止变化..也在每个循环结束时,你尝试移动一个个挤在一起、离果场较近的加工厂变成了一个果场距离较远的地区。有很多方法可以改变细节,从而产生准确的答案。我还有其他候选算法,但都有故障。 [我稍后会发布代码。] 正如迈克·邓拉维(Mike Dunlavey)下面提到的,我们可能只是想要“足够好”。
为了了解什么可能是“足够好”的结果:
10010 total length of travel from 32 locations to plants at
{10,490,1210,1960,2890,4000,5290,6760,8410,9610}
更新3:嗯,首先给出了正确的精确解决方案,但没有(尚未)发布程序或算法,所以我写了一个产生相同值的程序或算法。
/************************************************************
This program can be compiled and run (eg, on Linux):
$ gcc -std=c99 processing-plants.c -o processing-plants
$ ./processing-plants
************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
//a: Data set of values. Add extra large number at the end
int a[]={
10,40,90,160,250,360,490,640,810,1000,1210,1440,1690,1960,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240,99999
};
//numofa: size of data set
int numofa=sizeof(a)/sizeof(int);
//a2: will hold (pt to) unique data from a and in sorted order.
int *a2;
//max: size of a2
int max;
//num_fixed_loc: at 10 gives the solution for 10 plants
int num_fixed_loc;
//xx: holds index values of a2 from the lowest error winner of each cycle memoized. accessed via memoized offset value. Winner is based off lowest error sum from left boundary upto right ending boundary.
//FIX: to be dynamically sized.
int xx[1000000];
//xx_last: how much of xx has been used up
int xx_last=0;
//SavedBundle: data type to "hold" memoized values needed (total traval distance and plant locations)
typedef struct _SavedBundle {
long e;
int xx_offset;
} SavedBundle;
//sb: (pts to) lookup table of all calculated values memoized
SavedBundle *sb; //holds winning values being memoized
//Sort in increasing order.
int sortfunc (const void *a, const void *b) {
return (*(int *)a - *(int *)b);
}
/****************************
Most interesting code in here
****************************/
long full_memh(int l, int n) {
long e;
long e_min=-1;
int ti;
if (sb[l*max+n].e) {
return sb[l*max+n].e; //convenience passing
}
for (int i=l+1; i<max-1; i++) {
e=0;
//sum first part
for (int j=l+1; j<i; j++) {
e+=a2[j]-a2[l];
}
//sum second part
if (n!=1) //general case, recursively
e+=full_memh(i, n-1);
else //base case, iteratively
for (int j=i+1; j<max-1; j++) {
e+=a2[j]-a2[i];
}
if (e_min==-1) {
e_min=e;
ti=i;
}
if (e<e_min) {
e_min=e;
ti=i;
}
}
sb[l*max+n].e=e_min;
sb[l*max+n].xx_offset=xx_last;
xx[xx_last]=ti; //later add a test or a realloc, etc, if approp
for (int i=0; i<n-1; i++) {
xx[xx_last+(i+1)]=xx[sb[ti*max+(n-1)].xx_offset+i];
}
xx_last+=n;
return e_min;
}
/*************************************************************
Call to calculate and print results for given number of plants
*************************************************************/
int full_memoization(int num_fixed_loc) {
char *str;
long errorsum; //for convenience
//Call recursive workhorse
errorsum=full_memh(0, num_fixed_loc-2);
//Now print
str=(char *) malloc(num_fixed_loc*20+100);
sprintf (str,"\n%4d %6d {%d,",num_fixed_loc-1,errorsum,a2[0]);
for (int i=0; i<num_fixed_loc-2; i++)
sprintf (str+strlen(str),"%d%c",a2[ xx[ sb[0*max+(num_fixed_loc-2)].xx_offset+i ] ], (i<num_fixed_loc-3)?',':'}');
printf ("%s",str);
return 0;
}
/**************************************************
Initialize and call for plant numbers of many sizes
**************************************************/
int main (int x, char **y) {
int t;
int i2;
qsort(a,numofa,sizeof(int),sortfunc);
t=1;
for (int i=1; i<numofa; i++)
if (a[i]!=a[i-1])
t++;
max=t;
i2=1;
a2=(int *)malloc(sizeof(int)*t);
a2[0]=a[0];
for (int i=1; i<numofa; i++)
if (a[i]!=a[i-1]) {
a2[i2++]=a[i];
}
sb = (SavedBundle *)calloc(sizeof(SavedBundle),max*max);
for (int i=3; i<=max; i++) {
full_memoization(i);
}
free(sb);
return 0;
}
[Updates at bottom (including solution source code)]
I have a challenging business problem that a computer can help solve.
Along a mountainous region flows a long winding river with strong currents. Along certain parts of the river are plots of environmentally sensitive land suitable for growing a particular type of rare fruit that is in very high demand. Once field laborers harvest the fruit, the clock starts ticking to get the fruit to a processing plant. It's very costly to try and send the fruits upstream or over land or air. By far the most cost effective mechanism to ship them to the plant is downstream in containers powered solely by the river's constant current. We have the capacity to build 10 processing plants and need to locate these along the river to minimize the total time the fruits spend in transit. The fruits can take however long before reaching the nearest downstream plant but that time directly hurts the price at which they can be sold. Effectively, we want to minimize the sum of the distances to the nearest respective downstream plant. A plant can be located as little as 0 meters downstream from a fruit access point.
The question is: In order to maximize profits, how far up the river should we build the 10 processing plants if we have found 32 fruit growing regions, where the regions' distances upstream from the base of the river are (in meters):
10, 40, 90, 160, 250, 360, 490, ... (n^2)*10 ... 9000, 9610, 10320?
[It is hoped that all work going towards solving this problem and towards creating similar problems and usage scenarios can help raise awareness about and generate popular resistance towards the damaging and stifling nature of software/business method patents (to whatever degree those patents might be believed to be legal within a locality).]
UPDATES
Update1: Forgot to add: I believe this question is a special case of this one.
Update2: One algorithm I wrote gives an answer in a fraction of a second, and I believe is rather good (but it's not yet stable across sample values). I'll give more details later, but the short is as follows. Place the plants at equal spacings. Cycle over all the inner plants where at each plant you recalculate its position by testing every location between its two neighbors until the problem is solved within that space (greedy algorithm). So you optimize plant 2 holding 1 and 3 fixed. Then plant 3 holding 2 and 4 fixed... When you reach the end, you cycle back and repeat until you go a full cycle where every processing plant's recalculated position stops varying.. also at the end of each cycle, you try to move processing plants that are crowded next to each other and are all near each others' fruit dumps into a region that has fruit dumps far away. There are many ways to vary the details and hence the exact answer produced. I have other candidate algorithms, but all have glitches. [I'll post code later.] Just as Mike Dunlavey mentioned below, we likely just want "good enough".
To give an idea of what might be a "good enough" result:
10010 total length of travel from 32 locations to plants at
{10,490,1210,1960,2890,4000,5290,6760,8410,9610}
Update3: mhum gave the correct exact solution first but did not (yet) post a program or algorithm, so I wrote one up that yields the same values.
/************************************************************
This program can be compiled and run (eg, on Linux):
$ gcc -std=c99 processing-plants.c -o processing-plants
$ ./processing-plants
************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
//a: Data set of values. Add extra large number at the end
int a[]={
10,40,90,160,250,360,490,640,810,1000,1210,1440,1690,1960,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240,99999
};
//numofa: size of data set
int numofa=sizeof(a)/sizeof(int);
//a2: will hold (pt to) unique data from a and in sorted order.
int *a2;
//max: size of a2
int max;
//num_fixed_loc: at 10 gives the solution for 10 plants
int num_fixed_loc;
//xx: holds index values of a2 from the lowest error winner of each cycle memoized. accessed via memoized offset value. Winner is based off lowest error sum from left boundary upto right ending boundary.
//FIX: to be dynamically sized.
int xx[1000000];
//xx_last: how much of xx has been used up
int xx_last=0;
//SavedBundle: data type to "hold" memoized values needed (total traval distance and plant locations)
typedef struct _SavedBundle {
long e;
int xx_offset;
} SavedBundle;
//sb: (pts to) lookup table of all calculated values memoized
SavedBundle *sb; //holds winning values being memoized
//Sort in increasing order.
int sortfunc (const void *a, const void *b) {
return (*(int *)a - *(int *)b);
}
/****************************
Most interesting code in here
****************************/
long full_memh(int l, int n) {
long e;
long e_min=-1;
int ti;
if (sb[l*max+n].e) {
return sb[l*max+n].e; //convenience passing
}
for (int i=l+1; i<max-1; i++) {
e=0;
//sum first part
for (int j=l+1; j<i; j++) {
e+=a2[j]-a2[l];
}
//sum second part
if (n!=1) //general case, recursively
e+=full_memh(i, n-1);
else //base case, iteratively
for (int j=i+1; j<max-1; j++) {
e+=a2[j]-a2[i];
}
if (e_min==-1) {
e_min=e;
ti=i;
}
if (e<e_min) {
e_min=e;
ti=i;
}
}
sb[l*max+n].e=e_min;
sb[l*max+n].xx_offset=xx_last;
xx[xx_last]=ti; //later add a test or a realloc, etc, if approp
for (int i=0; i<n-1; i++) {
xx[xx_last+(i+1)]=xx[sb[ti*max+(n-1)].xx_offset+i];
}
xx_last+=n;
return e_min;
}
/*************************************************************
Call to calculate and print results for given number of plants
*************************************************************/
int full_memoization(int num_fixed_loc) {
char *str;
long errorsum; //for convenience
//Call recursive workhorse
errorsum=full_memh(0, num_fixed_loc-2);
//Now print
str=(char *) malloc(num_fixed_loc*20+100);
sprintf (str,"\n%4d %6d {%d,",num_fixed_loc-1,errorsum,a2[0]);
for (int i=0; i<num_fixed_loc-2; i++)
sprintf (str+strlen(str),"%d%c",a2[ xx[ sb[0*max+(num_fixed_loc-2)].xx_offset+i ] ], (i<num_fixed_loc-3)?',':'}');
printf ("%s",str);
return 0;
}
/**************************************************
Initialize and call for plant numbers of many sizes
**************************************************/
int main (int x, char **y) {
int t;
int i2;
qsort(a,numofa,sizeof(int),sortfunc);
t=1;
for (int i=1; i<numofa; i++)
if (a[i]!=a[i-1])
t++;
max=t;
i2=1;
a2=(int *)malloc(sizeof(int)*t);
a2[0]=a[0];
for (int i=1; i<numofa; i++)
if (a[i]!=a[i-1]) {
a2[i2++]=a[i];
}
sb = (SavedBundle *)calloc(sizeof(SavedBundle),max*max);
for (int i=3; i<=max; i++) {
full_memoization(i);
}
free(sb);
return 0;
}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
让我给你一个 Metropolis-Hastings 算法的简单示例。
假设您有一个状态向量x和一个拟合优度函数P(x),它可以是您想要编写的任何函数。
假设您有一个随机分布 Q,可用于修改向量,例如
x' = x + N(0, 1) * sigma
,其中 N 是大约 0 的简单正态分布,sigma 是您选择的标准差。通常,它需要大约 1000 个周期的老化时间,然后开始收集样本。再经过大约 10,000 个循环后,样本的平均值就是您的答案。
它需要诊断和调整。通常会绘制样本,并且您正在寻找的是稳定(不会移动太多)并且具有高接受率(非常模糊)的“模糊毛毛虫”图。您可以使用的主要参数是sigma。
如果sigma太小,绘图就会模糊,而且会四处游荡。
如果太大,绘图将不会模糊 - 它将有水平线段。
通常,起始向量x是随机选择的,并且通常会选择多个起始向量,以查看它们是否最终位于同一位置。
没有必要同时改变状态向量x的所有分量。您可以循环使用它们,一次改变一个,或者使用类似的方法。
另外,如果您不需要诊断图,则可能不需要保存样本,而只需动态计算平均值和方差即可。
在我熟悉的应用程序中,P(x) 是概率的度量,它通常位于对数空间中,因此它可以从 0 到负无穷大变化。
然后执行“也许接受”步骤,即
(exp(logp' - logp))
Let me give you a simple example of a Metropolis-Hastings algorithm.
Suppose you have a state vector x, and a goodness-of-fit function P(x), which can be any function you care to write.
Suppose you have a random distribution Q that you can use to modify the vector, such as
x' = x + N(0, 1) * sigma
, where N is a simple normal distribution about 0, and sigma is a standard deviation of your choosing.Usually it is done with a burn-in time of maybe 1000 cycles, after which you start collecting samples. After another maybe 10,000 cycles, the average of the samples is what you take as an answer.
It requires diagnostics and tuning. Typically the samples are plotted, and what you are looking for is a "fuzzy caterpilar" plot that is stable (doesn't move around much) and has a high acceptance rate (very fuzzy). The main parameter you can play with is sigma.
If sigma is too small, the plot will be fuzzy but it will wander around.
If it is too large, the plot will not be fuzzy - it will have horizontal segments.
Often the starting vector x is chosen at random, and often multiple starting vectors are chosen, to see if they end up in the same place.
It is not necessary to vary all components of the state vector x at the same time. You can cycle through them, varying one at a time, or some such method.
Also, if you don't need the diagnostic plot, it may not be necessary to save the samples, but just calculate the average and variance on the fly.
In the applications I'm familiar with, P(x) is a measure of probability, and it is typically in log-space, so it can vary from 0 to negative infinity.
Then to do the "maybe accept" step it is
(exp(logp' - logp))
除非我犯了错误,否则这里有确切的解决方案(通过动态编程方法获得):
Unless I've made an error, here are exact solutions (obtained through a dynamic programming approach):