Using C# Parallel Tasks in an OCR application?
I'm building a Windows Service application that takes as input a directory containing scanned images. My application will iterate through all the images and, for every image, perform some OCR operations in order to grab the barcode, invoice number, and customer number.
Some background info:
- The tasks performed by the application are pretty CPU intensive
- There is a large number of images to process, and the scanned image files are large (~2 MB)
- The application runs on an 8-core server with 16 GB of RAM.
My question:
Since it's doing stuff with images on the file system, I'm unsure whether changing my application to use .NET Parallel Tasks will really make a difference.
Can anybody give me advice about this?
Many thanks!
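For reference, here is a minimal sketch of what switching to .NET Parallel Tasks might look like, assuming the Task Parallel Library's `Parallel.ForEach` with a capped `MaxDegreeOfParallelism`. `ProcessImage` is a hypothetical stand-in for the real OCR calls, and the class and method names are illustrative, not taken from any existing code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class ParallelOcrSketch
{
    // Hypothetical placeholder for the real OCR work (barcode, invoice number,
    // customer number). Replace with the actual OCR library calls.
    public static string ProcessImage(byte[] imageBytes)
    {
        return $"{imageBytes.Length} bytes";
    }

    // Processes every file in 'directory' in parallel, with at most
    // 'maxDegreeOfParallelism' images in flight at once.
    public static IDictionary<string, string> ProcessDirectory(
        string directory, int maxDegreeOfParallelism)
    {
        var results = new ConcurrentDictionary<string, string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };

        // EnumerateFiles returns every file; filter by extension if needed.
        Parallel.ForEach(Directory.EnumerateFiles(directory), options, path =>
        {
            byte[] bytes = File.ReadAllBytes(path); // I/O: load one scanned image
            results[path] = ProcessImage(bytes);    // CPU: run OCR on it
        });

        return results;
    }
}
```

Calling `ProcessDirectory(path, Environment.ProcessorCount)` keeps at most one image per core in progress; the cap can be lowered if disk reads, rather than OCR, turn out to be the limiting factor.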
Answers (2)
If processing an image takes longer than reading N images from the disk, then processing multiple images concurrently is a win. Figure that you can read a 2 MB file from disk in under 100 ms (including seek time), so allow about one second to read 8 images into memory.
So if your image processing takes more than a second per image, I/O isn't a problem: do it concurrently. You can scale that down if you need to (e.g., if processing takes half a second, then you're probably best off with only 4 concurrent images).
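This rule of thumb can be written as a small helper, assuming reads are serialized on a single disk: if one read takes R and processing one image takes P, the disk can keep roughly P / R workers supplied, and there is no benefit beyond the core count for CPU-bound OCR. With R = 125 ms (8 images per second) this gives 8 workers for P = 1 s and 4 for P = 0.5 s, matching the figures above. `MaxUsefulConcurrency` is a hypothetical name, not an existing API.

```csharp
using System;

public static class ConcurrencyEstimate
{
    // Estimates how many images are worth processing at once, assuming one
    // disk that delivers images sequentially and CPU-bound processing.
    public static int MaxUsefulConcurrency(
        TimeSpan readPerImage, TimeSpan processPerImage, int cores)
    {
        if (readPerImage <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(readPerImage));

        // Images the disk can deliver while one image is being processed.
        long diskCanFeed = processPerImage.Ticks / readPerImage.Ticks;

        // At least one worker, never more than one per core.
        return (int)Math.Max(1, Math.Min(diskCanFeed, cores));
    }
}
```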
You should be able to test this fairly quickly: write a program that reads images off the disk in random order and calculates the average time to open, read, and close a file. Also write a program that processes a sample of the images and computes the average processing time. Those numbers should tell you whether or not concurrent processing will be helpful.
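A minimal sketch of that measurement, assuming `Stopwatch` for timing and `File.ReadAllBytes` as the open/read/close step. `TimingSketch`, `AverageTime`, and `ReadFile` are hypothetical names; the work to time is passed in as a delegate, so the same helper covers both the I/O test and the processing test.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;

public static class TimingSketch
{
    // Runs 'work' on a random sample of up to 'sampleSize' files and returns
    // the average wall-clock time per file.
    public static TimeSpan AverageTime(
        IReadOnlyList<string> files, int sampleSize, Action<string> work, int seed = 0)
    {
        var rng = new Random(seed);
        // Random order, so the result isn't biased by the order files sit on disk.
        var sample = files.OrderBy(_ => rng.Next()).Take(sampleSize).ToList();
        if (sample.Count == 0)
            throw new ArgumentException("No files to time.", nameof(files));

        var stopwatch = new Stopwatch();
        foreach (string path in sample)
        {
            stopwatch.Start(); // accumulates across iterations
            work(path);
            stopwatch.Stop();
        }
        return TimeSpan.FromTicks(stopwatch.Elapsed.Ticks / sample.Count);
    }

    // Open, read, and close one file (File.ReadAllBytes does all three).
    public static void ReadFile(string path) => File.ReadAllBytes(path);
}
```

Call it once with `ReadFile` for the average I/O time and once with a delegate that runs the OCR step for the average processing time (subtract the read average if that delegate also loads the file). The OS file cache makes repeated reads of the same files look faster than cold reads, so time files that haven't been read recently.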
I think the answer is, 'It Depends'.
I'd try running the application with some type of Performance Monitoring (even the one in Task Manager) and see how high the CPU gets.
If the CPU is maxing out, it would improve performance to run it in parallel. If not, the disk is the bottleneck, and without some other changes you probably wouldn't get much (if any) gain.
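A rough programmatic version of this check is sketched here, assuming `Process.TotalProcessorTime` as the measure of CPU use; the `CpuCheck` class and `AverageCoresUsed` method are hypothetical. It reports how many cores' worth of CPU time the whole process consumed during a run. One caveat when reading Task Manager instead: a single-threaded loop that keeps one core fully busy shows up as only about 12.5% total CPU on an 8-core server, so for the current sequential version a value near 1.0 here is what "maxing out" looks like.

```csharp
using System;
using System.Diagnostics;

public static class CpuCheck
{
    // Runs 'work' and returns the average number of cores' worth of CPU time
    // this process consumed while it ran. For a single-threaded loop, about 1.0
    // means one core was busy the whole time; well below 1.0 means the loop
    // spent much of its time waiting (for example on disk I/O).
    public static double AverageCoresUsed(Action work)
    {
        using (Process self = Process.GetCurrentProcess())
        {
            TimeSpan cpuBefore = self.TotalProcessorTime;
            Stopwatch wall = Stopwatch.StartNew();

            work();

            wall.Stop();
            self.Refresh(); // discard any cached process information before re-reading
            TimeSpan cpuUsed = self.TotalProcessorTime - cpuBefore;

            double wallMs = wall.Elapsed.TotalMilliseconds;
            return wallMs > 0 ? cpuUsed.TotalMilliseconds / wallMs : 0.0;
        }
    }
}
```

Wrapping a sequential pass over a sample of images in `AverageCoresUsed` gives a single number to compare against 1.0, which is easier to interpret than the machine-wide percentage.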