Platform-independent parallelization without changing the framework?

Posted on 2024-11-19 03:43:30


I hope the title did not mislead you.

My problem is the following: I am currently trying to speed up a raytracer with the help of the graphics card. It works fine, despite the fact that it actually got slower this way. :)

This is because I trace one ray against the whole geometry at once on the graphics card (my "tracing server") and then fetch the result, which is awfully slow. So I have to gather a number of rays, compute them together, and fetch the results in one go to speed this up.

The next problem is that I am not allowed to rewrite the surrounding framework, which should know nothing, or as little as possible, about this parallelization.

So here is my approach:
I thought about using several threads, each of which gets a ray and asks my "tracing server" to compute the intersections. The thread is then suspended until enough rays have been gathered to compute the intersections on the graphics card and fetch the results back efficiently. This means that each thread will wait until the results have been fetched.
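
Below is a minimal sketch of that batching idea, assuming hypothetical Ray and HitResult types and a traceBatchOnGPU() function standing in for the "tracing server". It uses standard mutexes and condition variables; boost::thread offers the same interface if the std ones are not available. Each worker thread hands in one ray and blocks until the batch containing it has been traced:

    // Sketch only: Ray, HitResult and traceBatchOnGPU() are placeholders for
    // whatever the framework and the GPU-side "tracing server" actually provide.
    #include <condition_variable>
    #include <cstddef>
    #include <memory>
    #include <mutex>
    #include <vector>

    struct Ray {};
    struct HitResult {};

    // Stand-in for the GPU call that traces a whole batch of rays at once.
    std::vector<HitResult> traceBatchOnGPU(const std::vector<Ray>& rays);

    class RayBatcher {
        struct Batch {
            std::vector<Ray> rays;
            std::vector<HitResult> results;
            bool done = false;
            std::condition_variable cv;
        };

    public:
        explicit RayBatcher(std::size_t batchSize)
            : batchSize_(batchSize), current_(std::make_shared<Batch>()) {}

        // Called by each worker thread: submit one ray, block until the batch
        // containing it has been traced, then return this ray's result.
        HitResult trace(const Ray& ray) {
            std::unique_lock<std::mutex> lock(mutex_);
            std::shared_ptr<Batch> batch = current_;   // the batch this ray joins
            const std::size_t index = batch->rays.size();
            batch->rays.push_back(ray);

            if (batch->rays.size() == batchSize_) {
                current_ = std::make_shared<Batch>();  // later rays start a new batch
                // The lock stays held during the GPU call, which also serializes
                // access to the single "tracing server".
                batch->results = traceBatchOnGPU(batch->rays);
                batch->done = true;
                batch->cv.notify_all();                // wake the threads waiting on this batch
            } else {
                batch->cv.wait(lock, [&] { return batch->done; });
            }
            return batch->results[index];
        }

    private:
        std::size_t batchSize_;
        std::shared_ptr<Batch> current_;               // batch currently being filled
        std::mutex mutex_;
    };

One thing this sketch glosses over is the tail: if fewer rays than batchSize are left at the end, the last batch never fills, so something would have to flush it, e.g. an explicit flush() call or a timeout.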

As you can see, I already have some kind of plan, but I do not know the following:

  • Which threading framework should I use to stay platform-independent?
  • Should I use a thread pool of fixed size, or create threads as needed?
  • Can any given thread library handle at least 1000 waiting threads (because that is roughly the number I would need to gather for the fetch to be efficient)?

But I could also imagine doing this with a single thread that

  1. dumps its load (a new ray) onto the "tracing server" and fetches the next load, until
  2. there is enough load to fetch the results.
  3. The thread would then take the results one by one, do the further calculations until all results are processed, and then go back to step one until all rays are done (a sketch of this loop follows the list).
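
A sketch of that single-threaded loop, under the same assumptions as above (nextRay(), traceBatchOnGPU() and shade() are hypothetical placeholders for however the framework supplies rays and consumes results):

    #include <cstddef>
    #include <vector>

    struct Ray {};
    struct HitResult {};

    bool nextRay(Ray& out);                                  // hypothetical: false when no rays are left
    std::vector<HitResult> traceBatchOnGPU(const std::vector<Ray>& rays);
    void shade(const Ray& ray, const HitResult& hit);        // the "further calculations"

    void traceAll(std::size_t batchSize) {
        std::vector<Ray> batch;
        batch.reserve(batchSize);

        bool moreRays = true;
        while (moreRays) {
            // Step 1: dump rays into the batch until it is full or we run out.
            Ray ray;
            while (batch.size() < batchSize && (moreRays = nextRay(ray)))
                batch.push_back(ray);
            if (batch.empty())
                break;

            // Step 2: enough load gathered, fetch all results in one go.
            const std::vector<HitResult> hits = traceBatchOnGPU(batch);

            // Step 3: process the results one by one, then start over.
            for (std::size_t i = 0; i < batch.size(); ++i)
                shade(batch[i], hits[i]);
            batch.clear();
        }
    }

This variant needs no synchronization at all; the trade-off is that the gather/trace/process loop has to live wherever the rays are driven from, which may be harder to hide from the surrounding framework than the blocking-threads version.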

Also, if you have a better idea of how to parallelize this, please tell me about it.

Regards,

Nobody

PS
If you need this information: The two platforms I want to use are Linux and Windows.


Comments (1)

提笔书几行 2024-11-26 03:43:30


Use either Threading Building Blocks (TBB) or boost::thread.

http://www.boost.org/doc/libs/1_46_0/doc/html/thread.html

http://threadingbuildingblocks.org/

As far as thread pool vs. on-demand threads goes, a thread pool is generally the better idea, since it avoids thread-creation overhead.
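
For illustration, a minimal fixed-size pool along those lines using boost::thread_group; the worker function traceRaysFromQueue() is a hypothetical stand-in for whatever pulls rays from a shared queue and submits them to the GPU batcher. TBB would instead let you express the per-ray work as tasks and manage the pool for you:

    #include <boost/thread.hpp>

    // Hypothetical worker: pops rays from a shared queue and submits them to
    // the batcher until no rays are left. Body omitted in this sketch.
    void traceRaysFromQueue() {}

    int main() {
        unsigned n = boost::thread::hardware_concurrency();   // may return 0 if unknown
        if (n == 0) n = 4;                                     // fall back to a small fixed size
        boost::thread_group pool;
        for (unsigned i = 0; i < n; ++i)
            pool.create_thread(&traceRaysFromQueue);           // threads are created once, up front
        pool.join_all();                                       // wait for all workers to finish
        return 0;
    }

A fixed pool sized around the number of cores would also sidestep the 1000-waiting-threads concern: the 1000 rays sit in a queue (or in the batcher) rather than in 1000 blocked threads.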

The number of waiting threads you can have is going to depend on the underlying system more than anything else:

Maximum number of threads per process in Linux?
