Hardware needed for a highly concurrent multi-threaded application

Posted on 2024-08-14 17:29:55

I am looking for hardware that must run about 256 computationally intensive, real-time, concurrent tasks around the clock (one multi-threaded C application). Each task needs about 40-50 MFLOPS, so all tasks together require about 10 GFLOPS. CPU-RAM speed is insignificant. All tasks must be managed by a Linux kernel (32-bit, with SMP).
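As a rough check of the figures above, here is a minimal Python sketch of the aggregate demand. It assumes all 256 tasks run at the same sustained rate and uses the 40 and 50 MFLOPS estimates as bounds; the function and variable names are only illustrative.

```python
# Back-of-the-envelope sizing for the workload described above.
TASKS = 256
MFLOPS_PER_TASK = (40, 50)  # low and high per-task estimates from the question


def aggregate_gflops(tasks, mflops_per_task):
    """Total sustained demand in GFLOPS for `tasks` identical tasks."""
    return tasks * mflops_per_task / 1000.0


low = aggregate_gflops(TASKS, MFLOPS_PER_TASK[0])
high = aggregate_gflops(TASKS, MFLOPS_PER_TASK[1])
print(f"aggregate demand: {low:.2f} to {high:.2f} GFLOPS")
# prints: aggregate demand: 10.24 to 12.80 GFLOPS
```

So "about 10 GFLOPS" is the lower bound; the upper estimate is closer to 13 GFLOPS of sustained throughput.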

I am looking for a single-mainboard solution with one multi-core CPU (if such a CPU exists). If no such CPU exists, then I need a multi-socket mainboard solution (with multiple CPUs).

Can you please recommend a professional CPU/mainboard solution that satisfies these requirements? It is also very important that there are no issues with the Linux kernel (2.6.25). No virtualization is needed, and there is no need for a large amount of RAM or CPU cache. I would also prefer Intel architecture and proven stability. I still have doubts that it is feasible at all.

Thank you in advance.

UPDATE:
I think I have found the right answer here and here.

Comments (7)

拥抱我好吗 2024-08-21 17:29:55

UltraSPARC T2 has 8 cores with 8 threads each, plus integrated high-bandwidth memory and I/O. The T5140 carries two of them, for 128 hardware threads.

The theoretical max raw performance of the 8 floating point units is 11 Giga flops per second (GFlops/s). A huge advantage over other implementations however is that 64 threads can share the units and thus we can achieve an extremely high percentage of theoretical peak. Our experiments have achieved nearly 90% of the 11 Gflop/s. - (http://blogs.oracle.com/deniss/entry/floating_point_performance_on_the)
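The quoted numbers can be compared with the question's demand in a few lines of Python. This is only a sketch under loud assumptions: that the "nearly 90%" sustained fraction would carry over to the asker's workload, and that each chip in a two-chip T5140 behaves like the single T2 measured in the blog post. Neither is confirmed in this thread.

```python
PEAK_GFLOPS_PER_CHIP = 11.0    # theoretical peak quoted above
SUSTAINED_FRACTION = 0.90      # "nearly 90%" from the quoted experiments (assumed to transfer)
HW_THREADS_PER_CHIP = 64       # 8 cores x 8 threads
TASKS = 256
DEMAND_GFLOPS = (10.24, 12.8)  # 256 tasks x 40-50 MFLOPS each


def sustained_gflops(chips):
    """Sustained throughput if every chip reaches the assumed fraction of peak."""
    return chips * PEAK_GFLOPS_PER_CHIP * SUSTAINED_FRACTION


def tasks_per_hw_thread(chips):
    """How many of the 256 tasks each hardware thread would have to host."""
    return TASKS / (chips * HW_THREADS_PER_CHIP)


for chips in (1, 2):
    supply = sustained_gflops(chips)
    covers_high = supply >= DEMAND_GFLOPS[1]
    print(chips, round(supply, 1), tasks_per_hw_thread(chips), covers_high)
```

Under these assumptions a single chip sustains about 9.9 GFLOPS, slightly below even the low estimate of 10.24 GFLOPS, while two chips give about 19.8 GFLOPS with two tasks per hardware thread.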

£冰雨忧蓝° 2024-08-21 17:29:55
  1. Rent some Amazon EC2 nodes.

  2. Updated: How about PS3s then? NASA uses them for its simulation engines.

  3. Maybe use CPUs plus GPUs in commercial servers?

  4. Build it around FPGAs: nowadays, some variants include processors that can run Linux.

寄人书 2024-08-21 17:29:55

Even though you've given us the specs you think you need, we might be able to help you out better if you tell us what the application is intended to accomplish, and how it was implemented.

There may be a better way to split up or handle the work than your current solution.

最偏执的依靠 2024-08-21 17:29:55

Not Intel architecture, but these run Linux and have 64 cores on a single die.

TILEPro64

乄_柒ぐ汐 2024-08-21 17:29:55

Get a bunch of four- or eight-core machines and split the processing across the machines using some sort of grid or clustering software. Maybe have a look at Beowulf.

As you mentioned, 10 GFLOPS isn't exactly to be sneezed at, so in a single machine it'll be expensive. There's also the problem of what you do when the machine breaks: you're unlikely to have a second machine of similar spec available. If you build a cluster using commodity hardware, you're a little more resilient and it's easier to find replacement machines.
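To make the resilience point concrete, here is a small Python sketch that spreads the 256 tasks over a cluster and shows the load on the busiest node before and after one machine fails. The cluster size of 8 is a hypothetical value chosen for illustration, not something stated in the thread, and the sketch ignores any communication cost between nodes.

```python
def partition(tasks, nodes):
    """Spread `tasks` as evenly as possible over `nodes` machines."""
    base, extra = divmod(tasks, nodes)
    return [base + 1 if i < extra else base for i in range(nodes)]


def peak_node_gflops(tasks, nodes, mflops_per_task=50):
    """Worst-case demand on the busiest node, in GFLOPS (50 MFLOPS per task)."""
    return max(partition(tasks, nodes)) * mflops_per_task / 1000.0


NODES = 8  # hypothetical cluster size, not taken from the thread

# All nodes healthy, then one node lost and its tasks redistributed.
print(partition(256, NODES), peak_node_gflops(256, NODES))
print(partition(256, NODES - 1), peak_node_gflops(256, NODES - 1))
```

With 8 nodes each machine hosts 32 tasks (1.6 GFLOPS at the high estimate); after losing one, the busiest survivor hosts 37 tasks (1.85 GFLOPS), so each node needs some headroom if the real-time deadlines are to hold during a failure.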

向日葵 2024-08-21 17:29:55

MFLOPS and GFLOPS are very poor indicators of how well a program can run on any given CPU. These days, cache footprint is much more important; perhaps branch prediction accuracy as well.

There's almost no way to gauge performance of a given application on different architectures without actually giving it a spin. And even then, you may not get a good idea if you were unlucky enough to unknowingly build with compiler options that ruined your cache footprint, or used a bad threading library, or any of a hundred other things.

老子叫无熙 2024-08-21 17:29:55

I see you'd prefer Intel, but if you need one chip, I will again suggest the Cell processor.
Its theoretical peak performance is around 25 GFLOPS, and kernel 2.6.25 already had support for it.

You could try a pre-slim PlayStation 3 for experimenting (that would cost you little) or get yourself a server-based solution at around US$8K. You will have to rewrite and fine-tune your threads to take advantage of the SPU co-processors there, but you could meet your computational needs without breaking a sweat with a single Cell (1 PPC core + 8 SPUs).

NB: with a PlayStation 3 you'd have only 6 available co-processors, but you don't seem to be on a budget with this project.
You could at least try IBM's Cell developer kit, which offers an emulator, to see if you can code your solution to run on it.

There are commercially available Cell products, both as stand-alone servers in blade form factor and as PCI Express add-on boards for PC workstations, from
Mercury Computer Systems:
http://www.mc.com/microsites/cell/products.aspx?id=6986

Mercury does not list any prices on the site, but the pricing seems to be around the previously mentioned US$8,000 for these PCI Express cards.

A PlayStation 3 console can be purchased for about US$300 and would allow you to prototype your application and check whether it reaches the needed performance. (I myself got one and have Fedora 9 running on it, although I did that as a hobbyist and have not, so far, used it for any calculations. I also put together a 12-machine PlayStation 3 cluster for molecular simulations at the local university. The application they ran did not take advantage of the SPUs while I was in touch with them. But even so, clocked at 3.2 GHz, they performed better than standard, similarly priced PCs, even considering that PS3s are priced 5x higher around here.)
