Reading multiple text files in parallel using CUDA
I would like to search for a given string in multiple files in parallel using CUDA. I plan to use the PFAC library to search for the given string. The problem with this is how to access multiple files in parallel.
Example: we have a folder containing 1000s of files that have to be searched.
The problem here is how I should access the multiple files in the given folder. The files in the folder should be obtained dynamically, and each thread should be assigned a file in which to search for the given string.
Is this possible?
Edit:
In this post, Very fast text file processing (C++), the author uses the Boost library to read a 3 GB text file in 16 seconds. In my case, however, I have to read 1000s of smaller files.
Thank you
2 Answers
Doing your task in CUDA will not help much over doing the same thing on the CPU.
Assuming that your files are stored on a standard magnetic HDD, a typical single-threaded CPU program would spend, per file, roughly:
1. the HDD seek time to locate the file,
2. the time to read the file's contents (together with the seek, about 15 ms for a small file),
3. about 0.1 ms of CPU time to actually search the contents for the string.
That is 15.1 ms for a single file. If you have 1000 files, it will take 15.1 s to do the work.
Now, if I gave you a super-powerful GPU with infinite memory bandwidth, no latency, and infinite processor speed, you would be able to perform task (3) in no time. However, the HDD reads would still consume exactly the same time: the GPU cannot parallelise the work of another, independent device.
As a result, instead of spending 15.1 s, you would now do it in 15.0 s.
The infinite GPU would give you a 0.6% speedup. A real GPU would not come even close to that!
In the more general case: if you are considering using CUDA, ask yourself: is the actual computation the bottleneck of the problem?
If you deal with thousands of tiny files and you need to perform reads often, consider techniques that can "attack" your bottleneck. These might include, for example:
- storing the files on an SSD or a RAM disk instead of a magnetic HDD,
- caching file contents in RAM between runs,
- merging many small files into fewer large ones to amortise the per-file seek overhead.
There may be more options; I am not an expert in that area.
是的,如果您可以减少读取延迟/带宽的影响,那么使用 CUDA 可能会获得加速。一种方法是同时执行多个搜索。即,如果您可以在大干草堆中搜索 [needle1], .. [needle1000],那么每个线程都可以搜索干草堆碎片并存储命中。需要对每次比较所需的吞吐量进行一些分析,以确定是否可以通过使用 CUDA 来改进您的搜索。这可能有用 http://dl.acm.org/itation.cfm?id= 1855600
Yes, it's probably possible to get a speed-up using CUDA if you can reduce the impact of read latency/bandwidth. One way would be to perform multiple searches concurrently: i.e. if you can search for [needle1], ..., [needle1000] in your large haystack, then each thread could search a haystack piece and store the hits. Some analysis of the throughput required per comparison is needed to determine whether your search is likely to be improved by employing CUDA. This may be useful: http://dl.acm.org/citation.cfm?id=1855600