Hey there,
In C++, I want to evaluate a user-defined mathematical function at given parameters. The function returns several values in an array: it is a vector-valued function f: R^n -> R^m with n input coordinates and m output components. For example:
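#include <cmath>   // std::sin, std::cos, std::exp
#include "mex.h"   // mxArray, mxGetPr (MATLAB MEX API)

// Evaluates f at `point`, which must hold at least 4 values; returns m = 3 outputs.
// The caller owns the returned array and must release it with delete[].
double *my_func(const mxArray *point)
{
    const double *dat = mxGetPr(point);
    double *vals = new double[3];
    vals[0] = dat[0]*dat[0]*dat[0]*dat[0]*dat[0];
    vals[1] = std::sin(dat[0])*dat[1]*dat[2]*dat[2]*std::cos(dat[1]);
    vals[2] = std::exp(dat[0])*std::sin(dat[0])*dat[3];
    return vals;
}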
Currently I do this on the CPU: I call the function once and get back an array with all function values. Now I want to parallelize it on the GPU, and I have been thinking about how to do it.
I assume it would be wasteful to evaluate my_func() completely in each thread, since then each thread would calculate the whole function array. Is this assumption right?
Is there a convenient way to calculate only the i-th element of the function array and return it, so that, e.g., 5 threads could calculate the function array in parallel instead of one CPU calculating it entirely alone?
The only way I could think of was:
// One scalar function per output component (needs <cmath> and "mex.h").
double my_func0(const mxArray *point)
{
    const double *dat = mxGetPr(point);
    return dat[0]*dat[0]*dat[0]*dat[0]*dat[0];
}
double my_func1(const mxArray *point)
{
    const double *dat = mxGetPr(point);
    return std::sin(dat[0])*dat[1]*dat[2]*dat[2]*std::cos(dat[1]);
}
double my_func2(const mxArray *point)
{
    const double *dat = mxGetPr(point);
    return std::exp(dat[0])*std::sin(dat[0])*dat[3];
}
etc. But this would be quite inconvenient for anyone who uses the program later, because they would always have to write new C++ functions to extend the function array instead of just adapting ONE single C++ function. A further problem: the number of functions is dynamic, so I would have to select the function at run time, i.e. call my_func_<i> for a computed index i, and I don't know whether that is a good way to do it. So the question is: is there a better way to deal with this problem?
When you say "user-defined", I presume you mean that someone else writes my_func() and your code then calls it?
If this is the case, consider running many calls to my_func() in parallel rather than trying to break the function up. This means whoever writes my_func() only needs to write one function, and you are responsible for delegating the calls, ensuring each one has the correct data to work on, and gathering up the results.
Update Based on Comments
In your situation, if the operation required to calculate each member of vals is different, then the user would either have to parameterise my_func() by the required index, as you suggested with double my_func(const mxArray *point, const unsigned & index) (note how it now returns a single double value as opposed to the whole result array), or provide a different my_func() for each index, e.g. double my_func_n(const mxArray *point). You could then call this function or set of functions from as many different threads as you like and get a single result for further computation. We are, however, ignoring many concurrency issues to do with reading and writing data simultaneously, which need thinking about.
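Both options can be sketched in plain C++11, under some assumptions: the point is passed as a raw double array instead of an mxArray so the example compiles without MATLAB, and the names Component, make_components, my_func_at and evaluate_components are hypothetical. Storing the per-component functions in a table also speaks to the question's concern about calling my_func_<i> by name: the index simply selects an entry at run time. This is a sketch, not a definitive design.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Hypothetical: one scalar function per output component f_i.
using Component = std::function<double(const double *)>;

// The user only edits this table to add or change output components.
std::vector<Component> make_components()
{
    return {
        [](const double *d) { return d[0] * d[0] * d[0] * d[0] * d[0]; },
        [](const double *d) { return std::sin(d[0]) * d[1] * d[2] * d[2] * std::cos(d[1]); },
        [](const double *d) { return std::exp(d[0]) * std::sin(d[0]) * d[3]; },
    };
}

// Index-parameterised form: evaluates only the index-th output.
double my_func_at(const std::vector<Component> &f, const double *point, std::size_t index)
{
    assert(index < f.size());
    return f[index](point);
}

// Evaluates every component concurrently, one asynchronous task per component.
// `point` and `f` are only read, so the tasks can share them without locking.
std::vector<double> evaluate_components(const std::vector<Component> &f, const double *point)
{
    std::vector<std::future<double>> tasks;
    for (std::size_t i = 0; i < f.size(); ++i)
        tasks.push_back(std::async(std::launch::async, my_func_at, std::cref(f), point, i));

    std::vector<double> out;
    for (auto &t : tasks)
        out.push_back(t.get());  // gather results in index order
    return out;
}
```

Extending f to more outputs only means appending another lambda to the table; the calling code stays the same. For formulas as cheap as these, starting a thread per component will likely cost more than the arithmetic itself, so this pays off mainly when each component is expensive.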
General Multitasking Advice
Before looking into multitasking with your GPU, have a look at standard multithreading on a CPU (I recommend the Boost Thread library to help: http://www.boost.org/). Once you see how threads are created and used, you may better understand what you can do with them and how to go about it.
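The basic CPU pattern (create threads, give each one its own slice of data, join them, collect the results) can be sketched as follows, applied to the idea of running many whole my_func() calls in parallel over many points. It assumes standard C++11 std::thread rather than Boost (boost::thread offers a similar construct-then-join interface), replaces the mxArray argument with a std::vector<double>, and uses the hypothetical names my_func_plain and evaluate_all.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the user's my_func(): takes the coordinates
// directly instead of an mxArray and returns all m = 3 outputs.
std::vector<double> my_func_plain(const std::vector<double> &p)
{
    assert(p.size() >= 4);  // the formulas below read p[0]..p[3]
    return {
        p[0] * p[0] * p[0] * p[0] * p[0],
        std::sin(p[0]) * p[1] * p[2] * p[2] * std::cos(p[1]),
        std::exp(p[0]) * std::sin(p[0]) * p[3],
    };
}

// Evaluates my_func_plain at every point, splitting the points into
// contiguous chunks, one chunk per worker thread.
std::vector<std::vector<double>> evaluate_all(
    const std::vector<std::vector<double>> &points, unsigned num_threads)
{
    std::vector<std::vector<double>> results(points.size());
    if (num_threads == 0) num_threads = 1;
    const std::size_t chunk = (points.size() + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(points.size(), begin + chunk);
        if (begin >= end) break;
        // Each thread only reads `points` and writes its own slots of
        // `results`, so no locking is needed.
        workers.emplace_back([&points, &results, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                results[i] = my_func_plain(points[i]);
        });
    }
    for (auto &w : workers) w.join();  // wait until every chunk is done
    return results;
}
```

Splitting by points rather than by output component keeps the user's single function intact and gives each thread enough work to outweigh its start-up cost.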
Multitasking with a GPU becomes more useful if you are applying mathematical functions to very large matrices or vectors and it is possible to use hardware implementations of certain graphical functions to achieve the mathematical result. There are further libraries to support GPGPU (General Purpose GPU) programming, such as OpenCL, Nvidia's CUDA, or ATI's Stream. Have a look at what these libraries provide to give you an idea of how applicable they are to your situation.