Parsing images to get information out of them
For several days I have mused about the three-fold job of
a. getting
b. parsing
c. storing a number of pages.
Two days ago I thought that getting the pages would be the major task. But that isn't the case - I now guess the parsing job will be the heroic task. Each of the pages to be parsed is a PNG image.
So the question is - after getting all of them, how do I parse them!? That seems to be the real issue. I guess there are some Perl modules out there that can help with this...
Well - I think this job can only be done with some OCR embedded! Question: is there a Perl module that can be used here to support this task?
BTW: see the result pages.
BTW: since I figure I can find all 790 result pages within a certain range between Id=0 and Id=100000, I thought I could go through them with a loop:
http://www.foundationfinder.ch/ShowDetails.php?Id=11233&InterfaceLanguage=&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=927&InterfaceLanguage=1&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=949&InterfaceLanguage=1&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=20011&InterfaceLanguage=1&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=10579&InterfaceLanguage=1&Type=Html
I thought I could go the Perl way, but I am not very sure: I was trying to use LWP::UserAgent on the same URLs [see above] with different query arguments, and I am wondering whether LWP::UserAgent provides a way to loop through the query arguments. I am not sure that LWP::UserAgent has a method for that. Well - I have sometimes heard that it is easier to use Mechanize. But is it really easier!?
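Here is a minimal sketch of such a download loop, assuming plain LWP::UserAgent is sufficient; the Id range comes from the post above, while the filename scheme and the one-second pause are only illustrative choices:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new( timeout => 30 );

    # Try every Id in the range mentioned above and keep only those
    # that actually return a page.
    for my $id ( 0 .. 100_000 ) {
        my $url = "http://www.foundationfinder.ch/ShowDetails.php"
                . "?Id=$id&InterfaceLanguage=1&Type=Html";

        my $response = $ua->get($url);
        next unless $response->is_success;    # skip Ids that do not exist

        # Hypothetical storage scheme: one local file per Id.
        open my $fh, '>', "page_$id.html"
            or die "Cannot write page_$id.html: $!";
        print {$fh} $response->decoded_content;
        close $fh;

        sleep 1;                              # be polite to the server
    }

For this part WWW::Mechanize would look almost the same - its get() call simply replaces the LWP::UserAgent one - so for a plain download loop neither module is noticeably easier than the other.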
But - to be frank: the first task, GETTING all the pages, is not very difficult - at least not compared with the parsing... So how can the parsing be done!?
Any ideas or suggestions?
Looking forward to hearing from you...
zero
Comments (1)
I would suggest using Image::OCR::Tesseract.
I've had good experience with Tesseract in the past using C++.
See this for further info.
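As a rough sketch of how Image::OCR::Tesseract could then be applied to the downloaded pages - assuming the PNG images are already on disk, and noting that the glob pattern and output handling are made up for illustration (the module also needs the tesseract and ImageMagick command-line tools installed):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Image::OCR::Tesseract 'get_ocr';

    # Hypothetical input: the result pages saved as PNG files
    # in the current directory.
    for my $png ( glob 'page_*.png' ) {
        my $text = get_ocr($png);   # runs tesseract on the image, returns its text
        print "--- $png ---\n$text\n";
    }

The returned text can then be parsed and stored however suits step c.; the print here is only a placeholder.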