How do I know if hardware/software codesign is useful for a specific application?
I will be in my final year of Electrical and Computer Engineering next semester, and I am searching for a graduation project in embedded systems or hardware design. My professor advised me to find a current system and try to improve it using hardware/software codesign, and he gave me the example of an automated license plate recognition system, where I could use dedicated hardware written in VHDL or Verilog to make the system perform better.
I have searched a bit and found some YouTube videos showing such systems working fine.
So I don't know whether there is any room for improvement. How can I tell if a certain algorithm or system is slow and could benefit from codesign?
5 Answers
In many cases, this is an architectural question that can only be answered with large amounts of experience, or even larger amounts of system modeling and analysis. In other cases, five minutes on the back of an envelope can show you that a specialized co-processor adds weeks of work but no performance improvement.
An example of a hard case is any modern mobile phone processor. Take a look at the TI OMAP5430. Notice that it has at least 10 processors of varying types (the PowerVR block alone has multiple execution units) and dozens of full-custom peripherals. Any time you wish to offload something from the 'main' CPUs, there is a potential bus bandwidth/silicon area/time-to-market cost that has to be considered.
An easy case would be something like what your professor mentioned. A DSP/GPU/FPGA will perform image processing tasks, such as 2D convolution, orders of magnitude faster than a CPU. But 'housekeeping' tasks like file management are not something one would tackle with an FPGA.
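To make the easy case concrete, here is a minimal C sketch of the kind of kernel one would offload: a naive 2D convolution over an 8-bit grayscale image. The function name, the fixed 3x3 kernel size, and the clamping scheme are illustrative assumptions, not anything from the answer above.

```c
/* Minimal sketch (assumed names/sizes): naive 3x3 convolution over an
 * 8-bit grayscale image, the classic kernel to offload to hardware. */
#include <stdint.h>

void conv2d_3x3(uint8_t *out, const uint8_t *in,
                const int8_t kernel[3][3], int shift,
                int width, int height)
{
    /* skip the 1-pixel border so every tap stays inside the image */
    for (int y = 1; y < height - 1; y++) {
        for (int x = 1; x < width - 1; x++) {
            int acc = 0;
            for (int ky = -1; ky <= 1; ky++)
                for (int kx = -1; kx <= 1; kx++)
                    acc += kernel[ky + 1][kx + 1] *
                           in[(y + ky) * width + (x + kx)];
            if (acc < 0) acc = 0;   /* clamp negatives first */
            acc >>= shift;          /* scale back into 8-bit range */
            if (acc > 255) acc = 255;
            out[y * width + x] = (uint8_t)acc;
        }
    }
}
```

The four nested loops run for every output pixel of every frame; pipelined FPGA logic can typically produce one output per clock, which is where the orders-of-magnitude gap comes from.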
In your case, I don't think that your professor expects you to do something 'real'. I think what he's looking for is your understanding of what CPUs/GPUs/DSPs are good at, and what custom hardware is good at. You may wish to look for an interesting niche problem, such as those in bioinformatics.
I don't know what codesign is, but I have done some Verilog before; I think simple image (or signal) processing tasks are good candidates for such embedded systems, because they often involve real-time processing of massive amounts of data (preferably SIMD operations).
Image processing tasks often look easy, because our brain does mind-bogglingly complex processing for us, but they are actually very challenging. I think this challenge is what matters, not whether such a system has been implemented before. I would go with implementing the Hough transform (first for lines and circles, then the generalized one; it is considered a slow algorithm in image processing) and doing some real-time segmentation. I'm sure it will be a challenging task as it evolves.
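For a sense of why the Hough transform is considered slow, here is a rough C sketch of the classical line-detecting version, assuming a binary edge image is already available; the 1-degree angle step and the accumulator layout are illustrative assumptions.

```c
/* Rough sketch (assumed sizes/layout): classical Hough transform for
 * lines, voting into an angle x distance accumulator. Link with -lm. */
#include <math.h>
#include <stdint.h>
#include <string.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define N_THETA 180  /* 1-degree steps, an illustrative choice */

/* edges: binary image, nonzero marks an edge pixel
 * acc:   N_THETA * n_rho bins covering rho in [-rho_max, rho_max] */
void hough_lines(const uint8_t *edges, int width, int height,
                 unsigned *acc, int n_rho)
{
    double rho_max = hypot(width, height);
    memset(acc, 0, sizeof(unsigned) * N_THETA * n_rho);

    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++) {
            if (!edges[y * width + x])
                continue;
            /* every edge pixel votes once per angle: this inner loop
             * is what makes the algorithm slow in software */
            for (int t = 0; t < N_THETA; t++) {
                double theta = t * M_PI / N_THETA;
                double rho = x * cos(theta) + y * sin(theta);
                int r = (int)((rho + rho_max) / (2.0 * rho_max) * (n_rho - 1));
                acc[t * n_rho + r]++;
            }
        }
}
```

The cost grows with edge-pixel count times angular resolution, but the votes are independent of one another, which is exactly the kind of work that parallel hardware voting units handle well.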
First thing to do when partitioning is to look at the dataflows. Draw a block diagram of where each of the "subalgorithms" fits, along with the data going in and out. Anytime you have to move large amounts of data from one domain to another, start looking to move part of the problem to the other side of the split.
For example, consider an image processing pipeline which does an edge-detect followed by a compare with threshold, then some more processing. The output of the edge-detect will be (say) 16-bit signed values, one for each pixel. The final output is a binary image (a bit set indicates where the "significant" edges are).
One (obviously naive, but it makes the point) implementation might be to do the edge detect in hardware, ship the edge image to software and then threshold it. That involves shipping a whole image of 16-bit values "across the divide".
Better, do the thresholding in hardware as well. Then you can ship 8 "1-bit pixels" per byte (or even run-length encode it).
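As a sketch of that bandwidth saving, here is what the packing could look like, expressed in C for clarity; the function name and the MSB-first bit order are assumptions for illustration.

```c
/* Sketch (assumed name/bit order): threshold 16-bit edge values and
 * pack the resulting binary image 8 pixels per byte, MSB first. */
#include <stdint.h>

/* n is the pixel count and is assumed to be a multiple of 8 */
void threshold_and_pack(uint8_t *out, const int16_t *in,
                        int n, int16_t threshold)
{
    for (int i = 0; i < n; i += 8) {
        uint8_t byte = 0;
        for (int b = 0; b < 8; b++)
            byte = (uint8_t)((byte << 1) | (in[i + b] > threshold));
        out[i / 8] = byte;  /* 1 bit/pixel instead of 16 bits/pixel */
    }
}
```

Going from 16 bits per pixel to 1 bit per pixel cuts the traffic "across the divide" by a factor of 16 before any run-length encoding.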
Once you have a sensible bandwidth partition, you have to find out if the blocks that fit in each domain are a good fit for that domain, or maybe consider a different partition.
I would add that in general, HW/SW codesign is useful when it reduces cost.
There are two major cost factors in embedded systems: development cost and production cost.
The higher your production volume, the more important production cost becomes, and the less important development cost becomes.
Today it is harder to develop hardware than software, which means the development cost of a codesign solution will be higher; so codesign is useful mostly for high-volume production. However, today you need FPGAs (or something similar) to do codesign, and they cost a lot.
That means codesign is useful when the cost of the necessary FPGA is lower than that of an existing solution for your type of problem (CPU, GPU, DSP, etc.), assuming both solutions meet your other requirements. That will (mostly) be the case for high-performance systems, because FPGAs are costly today.
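As a back-of-the-envelope illustration of this break-even argument, the sketch below amortizes the extra development cost of codesign over its per-unit saving; every figure is a made-up placeholder, purely for illustration.

```c
/* Back-of-the-envelope sketch: amortize the extra development cost of
 * codesign over its per-unit saving. All figures are placeholders. */
#include <stdio.h>

int main(void)
{
    double dev_sw       = 50e3;  /* software-only development cost     */
    double dev_codesign = 200e3; /* codesign development cost (higher) */
    double unit_cpu     = 120.0; /* per-unit cost, CPU/GPU solution    */
    double unit_fpga    = 60.0;  /* per-unit cost, FPGA-based solution */

    /* break-even volume = extra development cost / per-unit saving */
    double n = (dev_codesign - dev_sw) / (unit_cpu - unit_fpga);
    printf("codesign pays off above %.0f units\n", n); /* 2500 here */
    return 0;
}
```

Above the break-even volume the cheaper bill of materials wins; below it, the software-only solution does.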
So, basically, you will want to codesign your system if it will be produced in high volumes and is a high-performance device.
This is a bit simplified and might become false in a decade or so. There is ongoing research on HW/SW synthesis from high-level specifications, and FPGA prices are falling. That means that in a decade or so, codesign might become useful for most embedded systems.
Whatever project you end up doing, my suggestion would be to build both a software version and a hardware version of the algorithm and do a performance comparison. You can also compare development time and so on. This will make your project a lot more scientific, and more helpful to everyone else should you choose to publish anything. Blindly assuming that hardware is faster than software is not a good idea, so profiling is important.
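For the software side of that comparison, a minimal timing harness along these lines is enough to start with; it uses POSIX clock_gettime, and run_algorithm is a hypothetical stand-in for whatever kernel you are benchmarking.

```c
/* Minimal sketch of a software-side timing harness (POSIX). The
 * run_algorithm() symbol is a hypothetical stand-in for the kernel
 * being compared against its hardware implementation. */
#include <stdio.h>
#include <time.h>

extern void run_algorithm(void);

int main(void)
{
    struct timespec t0, t1;
    const int runs = 100;  /* repeat to average out scheduling noise */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < runs; i++)
        run_algorithm();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double elapsed = (t1.tv_sec - t0.tv_sec) +
                     (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("mean per run: %.3f ms\n", 1e3 * elapsed / runs);
    return 0;
}
```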