Parallel processing on an FPGA: how do I get started?
I have a computationally intensive task that I implemented in CUDA, and now I want to make it even faster with an FPGA (if possible).
The system I want to implement is a series of computations, each similar to matrix multiplication in the sense of being parallel. There are also some non-parallel parts in between. It works on large amounts of data.
Although I want it to be as fast as possible, I have enough time to learn and explore FPGAs.
I'm asking for suggestions on how to get started. Which FPGA should I choose, and where can I learn about it? Are there any websites, online classes, or books? I've decided to do this anyway, but your opinion on whether this will be faster on an FPGA would be helpful too.
Comments (2)
The big wins from an FPGA over using a GPU come from:
The downside is getting data to and from the FPGA. Draw a data-transfer diagram before you start. Even if the FPGA provided an infinite speedup, you might still find it's not worth the effort if there's a lot of data to be shuffled back and forth!
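To make the data-transfer point concrete, here is a rough back-of-envelope sketch in Python. The function name, the byte counts, the link bandwidth, and the baseline time are all made-up illustration values, and the model assumes transfers and compute do not overlap, which a real design may improve on.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def offload_time(bytes_in, bytes_out, link_bytes_per_s, compute_s):
    """Wall time for one offloaded step: send inputs, compute, read results back.

    Assumption: transfer and compute happen one after the other (no overlap).
    """
    transfer_s = (bytes_in + bytes_out) / link_bytes_per_s
    return transfer_s + compute_s


# Hypothetical workload: 4 GiB in, 4 GiB out, over a link sustaining 4 GiB/s.
baseline_s = 1.5  # assumed time of the existing implementation, in seconds
best_case_s = offload_time(4 * GIB, 4 * GIB, 4 * GIB, compute_s=0.0)

print(f"baseline: {baseline_s:.2f} s, infinitely fast accelerator: {best_case_s:.2f} s")
print("worth offloading" if best_case_s < baseline_s else "transfer-bound, not worth it")
```

With these assumed numbers the transfers alone take 2.0 s, so even a zero-time computation on the FPGA would lose to a 1.5 s baseline. Plugging in your own measured sizes and link speed gives a quick first check before any hardware work.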
You'll likely want a PCI Express based board, which is (I imagine) a whole new learning curve before you get to do anything with the FPGA. But if you're up for it, it'll be a very interesting task!
In terms of choosing an FPGA, try out the software tools from the various vendors; at the learning stage that's much more important than the chips themselves. You won't find (at this early learning stage) a show-stopper feature in any of the various chips. Also take into account the availability of boards with your required interfaces, and any IP core you might need for the high-speed interfacing (e.g. PCIe).
You can get a substantial speedup on most parallel problems with an FPGA.
However, in addition to implementing your computation on the FPGA, there's a lot of work involved in getting the data back and forth from the CPU/main memory. This will require implementation of (for example) a PCI Express endpoint in the FPGA logic (bus mastering for maximum speed) and custom drivers on the software side. Most operating systems will require those drivers to be developed in kernel mode.
And you can't just use the most straightforward approach for FPGA programming either. You're going to need to worry about pipelining and clock synchronization in order to maximize throughput.
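As a rough illustration of why pipelining matters for throughput, the sketch below counts clock cycles in Python for a simplified model. It assumes every stage takes exactly one clock cycle and the pipeline never stalls; the function names and the 8-stage, 1000-item figures are hypothetical.

```python
def unpipelined_cycles(n_items, n_stages):
    """Each item passes through all stages before the next item starts."""
    return n_items * n_stages


def pipelined_cycles(n_items, n_stages):
    """A new item enters every cycle once the pipeline is running.

    The first result appears after n_stages cycles (the fill latency),
    then one further result completes per cycle.
    """
    if n_items == 0:
        return 0
    return n_stages + n_items - 1


n_items, n_stages = 1000, 8  # hypothetical workload and pipeline depth
slow = unpipelined_cycles(n_items, n_stages)
fast = pipelined_cycles(n_items, n_stages)
print(f"unpipelined: {slow} cycles, pipelined: {fast} cycles, ratio: {slow / fast:.2f}")
```

Under these assumptions the pipelined design needs 1007 cycles instead of 8000. Latency per item is unchanged, but throughput approaches one result per clock, which is the effect the "most straightforward approach" leaves on the table.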
In other words, it's a substantially difficult task even for engineers with years of FPGA experience. I strongly suggest you find someone to work with on this. Depending on how proprietary your project is, you might find skilled academics willing to work with you as long as you provide them with all materials and publication rights.
If you're determined to go it alone, you'll need some hardware. Many different companies offer FPGAs wired up as accelerator cards, for example http://www.nallatech.com/pci-express-cards.html
Whether you choose a Xilinx or an Altera FPGA, you'll find considerable sample code and tutorials for getting PCI Express working.