Raspberry Pi clusters, neural networks and brain simulation
Since the RBPI (Raspberry Pi) has very low power consumption and a very low production price, one could build a very big cluster with them. I'm not sure, but a cluster of 100,000 RBPIs would take little power and little room.
Now I think it might not be as powerful as existing supercomputers in terms of FLOPS or other sorts of computing measurements, but could it allow better neuronal network simulation?
I'm not sure if saying "1 CPU = 1 neuron" is a reasonable statement, but it seems valid enough.
So does it mean such a cluster would be more efficient for neuronal network simulation, since it's far more parallel than other classical clusters?
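To make the "1 CPU = 1 neuron" idea a bit more concrete, here is a rough sketch of what each node might compute per timestep, assuming a simple leaky integrate-and-fire model (the question doesn't name a neuron model, so the model and all parameters here are illustrative assumptions):

```python
# Minimal sketch: one "neuron" per node, assuming a leaky integrate-and-fire model.
# The parameters and the update rule are illustrative assumptions, not from the question.

TAU = 20.0        # membrane time constant (ms)
V_REST = -65.0    # resting potential (mV)
V_THRESH = -50.0  # spike threshold (mV)
V_RESET = -70.0   # reset potential (mV)
DT = 1.0          # timestep (ms)

def step(v, input_current):
    """Advance the membrane potential by one timestep; return (new_v, spiked)."""
    dv = (-(v - V_REST) + input_current) * (DT / TAU)
    v += dv
    if v >= V_THRESH:
        return V_RESET, True   # spike: reset and report it
    return v, False

# Each node would run this loop, exchanging spikes with its neighbours over the network.
v = V_REST
for t in range(100):
    v, spiked = step(v, input_current=20.0)
    if spiked:
        pass  # here the node would send the spike to connected neurons
```

The per-neuron arithmetic is tiny; as the answers below point out, the hard part is exchanging the spikes between nodes fast enough.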
8 Answers
Using Raspberry Pi itself doesn't solve the whole problem of building a massively parallel supercomputer: how to connect all your compute cores together efficiently is a really big problem, which is why supercomputers are specially designed, not just made of commodity parts. That said, research units are really beginning to look at ARM cores as a power-efficient way to bring compute power to bear on exactly this problem: for example, this project that aims to simulate the human brain with a million ARM cores.
http://www.zdnet.co.uk/news/emerging-tech/2011/07/08/million-core-arm-machine-aims-to-simulate-brain-40093356/ "Million-core ARM machine aims to simulate brain"
http://www.eetimes.com/electronics-news/4217840/Million-ARM-cores-brain-simulator "A million ARM cores to host brain simulator"
It's very specialist, bespoke hardware, but conceptually, it's not far from the network of Raspberry Pis you suggest. Don't forget that ARM cores have all the features that JohnB mentioned the Xeon has (Advanced SIMD instead of SSE, can do 64-bit calculations, overlap instructions, etc.), but sit at a very different MIPS-per-Watt sweet-spot: and you have different options for what features are included (if you don't want floating-point, just buy a chip without floating-point), so I can see why it's an appealing option, especially when you consider that power use is the biggest ongoing cost for a supercomputer.
Seems unlikely to be a good/cheap system to me.
Consider a modern Xeon CPU. It has 8 cores running at 5 times the clock speed, so just on that basis it can do 40 times as much work. Plus it has SSE, which seems suited to this application and will let it calculate 4 things in parallel. So we're up to maybe 160 times as much work. Then it has multithreading, can do 64-bit calculations, overlap instructions etc. I would guess it would be at least 200 times faster for this kind of work.
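The answer's back-of-the-envelope multipliers can be laid out explicitly; the 8x, 5x and 4x factors below are the answer's own rough estimates, not benchmarks:

```python
# Rough throughput comparison from the answer above (all factors are the
# answer's own estimates, not measurements).
cores = 8          # Xeon cores vs. the Pi's single core
clock_ratio = 5    # roughly 5x the clock speed of an early Raspberry Pi
simd_lanes = 4     # SSE: four single-precision operations per instruction

all_cores = cores * clock_ratio            # 40x
with_simd = all_cores * simd_lanes         # 160x

print(f"cores + clock: ~{all_cores}x, with SSE: ~{with_simd}x")
# Multithreading, 64-bit ops and instruction overlap push the guess to ~200x.
```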
Then finally, the results of at least 200 local "neurons" would be in local memory, but on the Raspberry Pi network you'd have to communicate between 200 of them... which would be very much slower.
I think the Raspberry Pi is great and I certainly plan to get at least one :P But you're not going to build a cheap and fast network of them that will compete with a network of "real" computers :P
Anyway, the fastest hardware for this kind of thing is likely to be a graphics card GPU, as it's designed to run many copies of a small program in parallel. Or just program an FPGA with a few hundred copies of a "hardware" neuron.
GPUs and FPGAs do this kind of thing much better than a CPU; the Nvidia GPUs that support CUDA programming have in effect hundreds of separate processing units. Or at least they can use the evolution of the pixel pipelines (where the card could render multiple pixels in parallel) to produce huge increases in speed. A CPU gives you a few cores that can carry out relatively complex steps; a GPU gives you hundreds of threads that can carry out simple steps.
So for tasks made up of simple threads, something like a single GPU will outperform a cluster of beefy CPUs (or a stack of Raspberry Pis).
However, for creating a cluster running something like "Condor" (http://research.cs.wisc.edu/condor/), which can be used for things like disease outbreak modelling, where you are running the same mathematical model millions of times with variable starting points (size of outbreak, wind direction, how infectious the disease is, etc.), something like the Pi would be ideal, as you are generally looking for a full-blown CPU that can run standard code.
Some well-known uses of this approach are SETI@home and Folding@home (searching for aliens and cancer research).
A lot of universities have a cluster such as this, so I can see some of them trying the approach with multiple Raspberry Pis.
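As a rough sketch of the embarrassingly parallel, Condor-style workload described above (the model function and its parameters are invented for illustration; Condor itself would farm the runs out across machines rather than local cores):

```python
# Sketch of an embarrassingly parallel parameter sweep, the kind of workload
# Condor (or a pile of Raspberry Pis) handles well. The "model" is a stand-in.
import random
from multiprocessing import Pool

def outbreak_model(params):
    """Toy stand-in for a disease-outbreak run with a given starting point."""
    initial_cases, infectiousness = params
    random.seed(hash(params))
    cases = initial_cases
    for _ in range(30):                      # 30 simulated days
        cases *= 1 + infectiousness * random.random()
    return params, cases

if __name__ == "__main__":
    # Millions of runs in practice; a handful here.
    sweep = [(c, r) for c in (1, 5, 10) for r in (0.1, 0.2, 0.3)]
    with Pool() as pool:
        for params, cases in pool.map(outbreak_model, sweep):
            print(params, round(cases, 1))
```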
But for simulating neurons in a brain you require very low latency between the nodes; there are special OSes and applications that make multiple systems act as one. You also need special networks to link it all together, to give latency between nodes of less than 1 millisecond.
http://en.wikipedia.org/wiki/InfiniBand
The Raspberry Pi just will not manage this in any way.
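A quick way to see the gap the answer is pointing at is to measure the round-trip time between two nodes over ordinary TCP/Ethernet, which on a Pi is typically hundreds of microseconds to milliseconds, versus the single-digit microseconds InfiniBand is built for. A minimal sketch (the hostname and port are placeholders):

```python
# Minimal round-trip latency probe over TCP (hostname/port are placeholders).
# Run echo_server() on one node and measure_rtt("<other-node>") on another.
import socket, time

PORT = 50007  # arbitrary port chosen for this sketch

def echo_server():
    with socket.create_server(("", PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            while data := conn.recv(64):
                conn.sendall(data)

def measure_rtt(host, samples=1000):
    with socket.create_connection((host, PORT)) as s:
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(samples):
            s.sendall(b"x")
            s.recv(64)
        return (time.perf_counter() - start) / samples * 1e3  # ms per round trip

# print(f"mean RTT: {measure_rtt('192.168.1.42'):.3f} ms")
```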
So yes, I think people will make clusters out of them and I think they will be very pleased, but more likely universities and small organisations. They aren't going to compete with the top supercomputers.
That said, we are going to get a few and test them against the current nodes in our cluster to see how they compare with a desktop that has a dual-core 3.2 GHz CPU and cost £650! I reckon for that money we could get 25 Raspberries, and they will use much less power, so it will be interesting to compare. This will be for disease outbreak modelling.
I am undertaking a large amount of neural network research in the area of chaotic time series prediction (with echo state networks).
Although I think using Raspberry Pis in this way will offer little to no benefit over, say, a strong CPU or a GPU, I have been using a Raspberry Pi to manage the distribution of simulation jobs to multiple machines. The processing-power advantage of a big core rules that out on the Raspberry Pi itself; not only that, but running multiple Pis in this configuration would generate large overheads of waiting for them to sync, data transfer, etc.
Due to the low cost and robustness of the Pi, I have it hosting the source of the network data, as well as mediating the jobs to the agent machines. It can also hard-reset and restart a machine if a simulation fails and takes the machine down with it, allowing for optimal uptime.
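The answer doesn't describe the actual setup, but one minimal way a Pi could mediate jobs to agent machines is to expose a shared queue over the network, e.g. with Python's multiprocessing managers (the port, auth key and job fields below are invented for the sketch):

```python
# Sketch of a Pi acting as a job broker: it exposes a shared queue that
# agent machines pull simulation jobs from. Names, port and key are placeholders.
from multiprocessing.managers import BaseManager
from queue import Queue

class JobManager(BaseManager):
    pass

def run_broker():
    jobs = Queue()
    for seed in range(100):                  # enqueue simulation jobs
        jobs.put({"seed": seed, "epochs": 500})
    JobManager.register("get_jobs", callable=lambda: jobs)
    manager = JobManager(address=("", 50000), authkey=b"not-a-real-key")
    manager.get_server().serve_forever()     # runs on the Pi

def run_agent(broker_host):
    JobManager.register("get_jobs")
    manager = JobManager(address=(broker_host, 50000), authkey=b"not-a-real-key")
    manager.connect()
    jobs = manager.get_jobs()
    while not jobs.empty():
        job = jobs.get()
        print("running simulation", job)     # a real agent would run the model here
```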
Neural networks are expensive to train, but very cheap to run. While I would not recommend using these (even clustered) to iterate over a learning set for endless epochs, once you have the weights, you can transfer the learning effort into them.
Used in this way, one Raspberry Pi should be useful for much more than a single neuron. Given the ratio of memory to CPU, it will likely be memory-bound in its scale. Assuming about 300 MB of free memory to work with (which will vary according to OS/drivers/etc.) and assuming you are working with 8-byte double-precision weights, you will have an upper limit on the order of 5000 "neurons" (before becoming storage-bound), although so many other factors can change this, and it is like asking: "How long is a piece of string?"
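The 5000 figure appears to assume a dense, all-to-all weight matrix; under that assumption the arithmetic works out roughly as follows:

```python
# Back-of-the-envelope check of the ~5000-neuron figure, assuming a fully
# connected weight matrix of 8-byte doubles (the answer's stated weight size).
free_bytes = 300 * 1024**2      # ~300 MB of free RAM, as assumed above
bytes_per_weight = 8            # double precision

# n neurons fully connected -> n*n weights -> n*n*8 bytes
n = int((free_bytes / bytes_per_weight) ** 0.5)
print(n)  # ~6270, i.e. "on the order of 5000" once other overheads are included
```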
Some engineers at Southampton University built a Raspberry Pi supercomputer:
I have ported a spiking network (see http://www.raspberrypi.org/phpBB3/viewtopic.php?f=37&t=57385&e=0 for details) to the Raspberry Pi and it runs about 24 times slower than on my old Pentium-M notebook from 2005 with SSE and prefetch optimizations.
It all depends on the type of computing you want to do. If you are doing very numerically intensive algorithms with not much memory movement between the processor caches and RAM, then a GPU solution is indicated. The middle ground is an Intel PC chip using the SIMD assembly language instructions - you can still easily end up being limited by the rate at which you can transfer data to and from RAM. For nearly the same cost you can get 50 ARM boards with, say, 4 cores per board and 2 GB RAM per board. That's 200 cores and 100 GB of RAM. The amount of data that can be shuffled between the CPUs and RAM per second is very high. It could be a good option for neural nets that use large weight vectors. Also, the latest ARM GPUs and the new Nvidia ARM-based chip (used in slate tablets) have GPU compute as well.