Link lost with FPGA device
I am trying to debug a somewhat strange problem in the device driver for a PCIe FPGA device. Both the device driver and the FPGA image are developed in-house.
The target system is x86, and the OS is Fedora 9. It has a PCIe card with the FPGA plugged into its only PCIe slot. The FPGA image is loaded from the EEPROM after boot.
The driver is written in such a way that it uses the /sys/bus/pci/devices/0000:02:00.0/ resource files (where 0000:02:00.0 is the PCI slot of the card containing the FPGA) to configure the FPGA.
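For readers unfamiliar with this approach, here is a minimal sketch of how user-space code can map one of these sysfs resource files and access registers through it. The BAR index (`resource0`), the mapping length, and the register offsets are assumptions for illustration and are not taken from the actual driver.

```python
import mmap
import os
import struct

def open_bar(path, length):
    """Map `length` bytes of a sysfs PCI resource file (e.g. .../resource0)."""
    fd = os.open(path, os.O_RDWR | os.O_SYNC)
    try:
        return mmap.mmap(fd, length, mmap.MAP_SHARED,
                         mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        os.close(fd)  # the mapping remains valid after the descriptor is closed

def read_reg32(bar, offset):
    """Read a little-endian 32-bit value at `offset` in the mapping."""
    return struct.unpack_from("<I", bar, offset)[0]

def write_reg32(bar, offset, value):
    """Write a little-endian 32-bit value at `offset` in the mapping."""
    struct.pack_into("<I", bar, offset, value & 0xFFFFFFFF)
```

If the resource file is missing, `os.open` raises `FileNotFoundError`, which is the failure described in this question. Note that this sketch does not guarantee that each access reaches the device as a single 32-bit bus transaction; real register access code may need stricter control over access width.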
When the system boots (or when it returns from hibernation), the FPGA link seems to be lost, and the resource files are missing. When the FPGA boots properly, everything works fine (the resource files are there).
When the system enters hibernation, the FPGA is powered off. When it returns from hibernation, the FPGA is powered on before driver initialization starts.
I suspect one of the following:
- a bug in the firmware (something related to PCI plug-in?)
- a bug in the kernel (least likely, because other PCI cards are recognized fine; only this PCI card causes problems)
My questions are:
- Has anyone had similar problems?
- What else could be wrong?
- Any suggestions on how to debug this issue?
EDIT
I just found this bug, which is very similar to the problem I am seeing.
Comments (2)
I finally managed to debug my problem. Just before entering hibernation, all processes that are still using the resource files are killed. For some unknown reason, one process did not release its resources and was killed. We have a watchdog that respawns every process that is not running.
When coming back from hibernation, this process was respawned, and since it could not open the resource files, it died again, and a critical error was declared. A very short time later, the OS added the resource files, and the process could continue normally.
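One way to tolerate this kind of race is to have the respawned process wait for the resource file to appear instead of failing on the first attempt. The following is only a sketch under that assumption; the function name, the timeout, and the polling interval are hypothetical and not part of the original fix.

```python
import os
import time

def wait_for_path(path, timeout_s=10.0, poll_s=0.1):
    """Return True once `path` exists, or False if `timeout_s` elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)  # give the OS time to re-create the sysfs entries
```

A process could call, for example, `wait_for_path("/sys/bus/pci/devices/0000:02:00.0/resource0")` (the `resource0` name is an assumption) and declare a critical error only if it returns `False`.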
A PCIe card has to reply to an "Is anybody there?" message within a certain time. Is it possible that your card is not responding quickly enough after hibernation / reset?
Without more details of your design, it is hard to do anything but guess.
Can you list the differences between the system working and not working, i.e. what do you do differently to get the card to work?
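To help compare the working and non-working states, a small check like the one below could be run in both situations. It is a rough sketch, assuming the standard sysfs layout where each enumerated device has a `config` file exposing its configuration space; the function name and the status strings are made up for illustration. A vendor ID that reads back as 0xFFFF conventionally means that nothing answered the configuration read.

```python
import os
import struct

def pci_device_status(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Classify a PCI device such as "0000:02:00.0" by inspecting sysfs."""
    dev_dir = os.path.join(sysfs_root, bdf)
    if not os.path.isdir(dev_dir):
        return "not enumerated"
    # The first two bytes of configuration space hold the vendor ID.
    with open(os.path.join(dev_dir, "config"), "rb") as f:
        header = f.read(2)
    if len(header) < 2:
        return "config space unreadable"
    (vendor_id,) = struct.unpack("<H", header)
    if vendor_id == 0xFFFF:
        return "enumerated but not responding"
    return "present (vendor 0x%04x)" % vendor_id
```

Calling `pci_device_status("0000:02:00.0")` right after resume and again a few seconds later would show whether the device is simply slow to appear or never enumerates at all.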