MODI memory leak
I have an app that uses MODI 2007 to OCR several multi-page TIFF files. When I run it on a directory that contains several good TIFFs along with some that cannot be opened in Windows Picture and Fax Viewer, MODI also fails to OCR those "bad" TIFFs. When this happens, the app cannot reclaim any of the memory MODI used while trying to OCR them. After the tool has tried to OCR too many of these "bad" TIFFs, the machine runs out of memory and the app crashes. I have tried several code fixes from the web that supposedly fix MODI memory leaks, but so far none have worked for me. The part of the code that does the OCR is pasted below:
StringBuilder strRecText = new StringBuilder(10000);
MODI.Document doc1 = new MODI.Document();
doc1.Create(name);
try
{
    doc1.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true); // this will OCR all pages of a multi-page tiff file
}
catch (Exception)
{
    // OCR failed (for example on a "bad" tiff): close and release the document
    doc1.Close(false); // clean up
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();
    GC.WaitForPendingFinalizers();
    System.Runtime.InteropServices.Marshal.FinalReleaseComObject(doc1);
    doc1 = null; // the block below is skipped, so the released document is never touched again
}
if (doc1 != null)
{
    MODI.Images images = doc1.Images;
    for (int imageCounter = 0; imageCounter < images.Count; imageCounter++)
    {
        // separate pages with the page-break character unless disabled
        if (imageCounter > 0 && !noPageBreakFlag)
        {
            strRecText.Append((char)pageBreakChar);
        }
        MODI.Image image = (MODI.Image)images[imageCounter];
        MODI.Layout layout = image.Layout;
        strRecText.Append(layout.Text);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        // release the per-page COM objects, innermost first
        System.Runtime.InteropServices.Marshal.FinalReleaseComObject(layout);
        layout = null;
        System.Runtime.InteropServices.Marshal.FinalReleaseComObject(image);
        image = null;
    }
    File.AppendAllText(ocrFile, strRecText.ToString()); // write the OCR file out to disk
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();
    GC.WaitForPendingFinalizers();
    if (images != null)
    {
        System.Runtime.InteropServices.Marshal.FinalReleaseComObject(images);
        images = null;
    }
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();
    GC.WaitForPendingFinalizers();
    doc1.Close(false); // clean up
    System.Runtime.InteropServices.Marshal.FinalReleaseComObject(doc1);
    doc1 = null;
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();
    GC.WaitForPendingFinalizers();
}
Comments (1)
I've been working on a project using MODI for the last few months. MODI has by far been the most accurate OCR engine I've tried, but it has some major issues with releasing resources and with crashing.
I ended up building a command-line app that takes the path to an image as a command-line parameter, saves the resulting text to a file, and then quits. I then call this command-line app from any software that needs MODI functionality. It sounds like an odd solution, but it's a very simple and straightforward way to work around MODI's memory leaks: when the command-line process exits, its memory is freed by the operating system, so you don't have to worry about your application crashing or about resources not being cleaned up. I have found that the time it takes to start the command-line exe and then read the file it creates is insignificant compared to the time it takes to actually OCR the image, so you lose very little performance.
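The calling side of this approach might look roughly like the sketch below. It is only an illustration under assumptions, not the code I actually use: the helper executable name passed in, the convention that the helper takes the image path and an output text path as its two arguments, and the names `OcrHelperClient`, `BuildStartInfo` and `RunOcr` are all hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class OcrHelperClient
{
    // Builds the start info for the (hypothetical) helper exe.
    // Assumed convention: helper.exe "<image path>" "<output text path>".
    // Simple quoting only; paths containing quotes are not handled.
    public static ProcessStartInfo BuildStartInfo(string helperExe, string imagePath, string outputPath)
    {
        return new ProcessStartInfo
        {
            FileName = helperExe,
            Arguments = "\"" + imagePath + "\" \"" + outputPath + "\"",
            UseShellExecute = false,
            CreateNoWindow = true
        };
    }

    // Runs the helper in its own process and returns the recognized text,
    // or null if the helper timed out or reported failure.
    public static string RunOcr(string helperExe, string imagePath, int timeoutMs)
    {
        string outputPath = Path.GetTempFileName();
        try
        {
            using (Process helper = Process.Start(BuildStartInfo(helperExe, imagePath, outputPath)))
            {
                if (!helper.WaitForExit(timeoutMs))
                {
                    helper.Kill(); // a hung OCR run is discarded along with its memory
                    return null;
                }
                if (helper.ExitCode != 0)
                {
                    return null; // assumed: the helper exits non-zero when OCR fails
                }
            }
            return File.ReadAllText(outputPath);
        }
        finally
        {
            File.Delete(outputPath); // the temp file is only a hand-off channel
        }
    }
}
```

A caller would then do something like `string text = OcrHelperClient.RunOcr("ModiOcr.exe", tiffPath, 120000);` and treat a null result as a "bad" tiff. Whatever MODI leaked while processing that file disappears when the helper process ends.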