C# - Reading text from an existing process
We have to read text from an existing VB6 application. We use the FindWindow, GetWindowText, and EnumChildWindows functions from user32.dll to enumerate the windows in that process and read the text they display.
This approach lets us read about 90% of the text, but there is one particular control (or box) that we generally cannot read.
UI spy-type programs cannot target the text we need either, so I assume the application renders it directly to the screen with GDI/GDI+ rather than through a control or window.
Is there a way to determine how they are rendering the text, and possibly read it?
We would rather not grab the window's hDC, render it onto a bitmap, and then somehow reverse-CAPTCHA the text... that could be a nightmare.
SOLUTION: We discovered that we only need to look for 2-3 known phrases in this box rather than actually OCR the text. So we are going to render the box to a bitmap and compare it against 2-3 pre-stored bitmaps, which only requires a pixel-by-pixel comparison.
The top answer led us to this solution.
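The phrase-matching idea above might look roughly like the following minimal sketch, shown in Python for brevity (the logic ports directly to C#). It assumes the box has already been captured from the window's hDC into a 2D array of 0xRRGGBB values (the capture step is not shown), and that each pre-stored phrase bitmap was captured at the same size and position. The function names and the "READY"/"BUSY" phrases are hypothetical.

```python
from typing import Dict, Optional, Sequence

# Hypothetical representation: a captured bitmap is a list of rows, and each
# row is a list of 0xRRGGBB pixel values. Capturing the box from the window's
# hDC is assumed to have happened already and is not shown here.
Bitmap = Sequence[Sequence[int]]


def bitmaps_equal(a: Bitmap, b: Bitmap) -> bool:
    """True only if both bitmaps have the same dimensions and identical pixels."""
    return len(a) == len(b) and all(list(ra) == list(rb) for ra, rb in zip(a, b))


def identify_phrase(captured: Bitmap, templates: Dict[str, Bitmap]) -> Optional[str]:
    """Return the name of the first pre-stored bitmap that matches exactly, else None."""
    for phrase, template in templates.items():
        if bitmaps_equal(captured, template):
            return phrase
    return None


# Usage with tiny made-up 2x2 "phrase" bitmaps (names are hypothetical).
WHITE, BLACK = 0xFFFFFF, 0x000000
templates = {
    "READY": [[BLACK, WHITE], [WHITE, BLACK]],
    "BUSY": [[WHITE, BLACK], [BLACK, WHITE]],
}
captured = [[BLACK, WHITE], [WHITE, BLACK]]
print(identify_phrase(captured, templates))  # READY
```

Exact equality is the simplest test and works as long as the application renders the phrases identically every time; if font smoothing or colours vary between machines, a per-pixel tolerance would be needed instead.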
Top answer:
If they're drawing directly to a surface, there's no way to get the text without resorting to some kind of OCR.
Update: after thinking about your problem, I think that doing what you describe (grabbing the window's hDC and creating a bitmap from it) would be relatively easy, at least compared with trying to intercept the API calls that render the text in the first place.
It wouldn't be as difficult as, for example, doing OCR on handwriting. As long as you can determine which font the Visual Basic 6 application uses to draw the text, and as long as the text you want to scrape is drawn at the same location on the form every time, it should be relatively easy to split the drawn text into individual characters (as tiny bitmaps) and compare each one against a pre-generated set of characters that you have drawn in the same font at the same size. The characters would then match exactly, pixel for pixel.
There might be a problem if the program runs on different systems and ends up drawing the text with a different font, since the pre-generated characters would then no longer match.
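The character-by-character approach in the answer could be sketched roughly as follows, again in Python for brevity. This is a minimal sketch under several assumptions: the text region has already been captured into a 2D array of pixel values, the background is a single known colour, and adjacent characters are separated by at least one blank column. The names `to_ink_mask`, `split_into_glyphs`, `read_text` and the tiny "I"/"L" glyph table are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical glyph representation: rows of 1 (ink) / 0 (background),
# stored as tuples so glyphs can be used as dictionary keys.
Glyph = Tuple[Tuple[int, ...], ...]


def to_ink_mask(pixels: Sequence[Sequence[int]], background: int) -> List[List[int]]:
    """Map every pixel that differs from the background colour to 1 (ink), else 0."""
    return [[0 if p == background else 1 for p in row] for row in pixels]


def split_into_glyphs(mask: Sequence[Sequence[int]]) -> List[Glyph]:
    """Split a one-line ink mask into glyphs at fully blank columns."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    glyphs: List[Glyph] = []
    start: Optional[int] = None  # first column of the glyph being scanned
    for x in range(width + 1):
        # The position just past the right edge counts as blank, flushing the last glyph.
        blank = x == width or all(mask[y][x] == 0 for y in range(height))
        if not blank and start is None:
            start = x
        elif blank and start is not None:
            glyphs.append(tuple(tuple(mask[y][start:x]) for y in range(height)))
            start = None
    return glyphs


def read_text(mask: Sequence[Sequence[int]], glyph_table: Dict[Glyph, str],
              unknown: str = "?") -> str:
    """Look up each glyph in a table of pre-rendered glyphs; unknown glyphs become '?'."""
    return "".join(glyph_table.get(g, unknown) for g in split_into_glyphs(mask))


# Usage with made-up 3-pixel-high glyphs standing in for pre-rendered characters.
I_GLYPH: Glyph = ((1,), (1,), (1,))
L_GLYPH: Glyph = ((1, 0), (1, 0), (1, 1))
glyph_table = {I_GLYPH: "I", L_GLYPH: "L"}
line = [
    [1, 0, 1, 0, 0, 1],
    [1, 0, 1, 0, 0, 1],
    [1, 0, 1, 1, 0, 1],
]
print(read_text(line, glyph_table))  # ILI
```

Splitting on blank columns is the simplest segmentation that fits the fixed-font, fixed-position case the answer describes. It drops spaces, and characters whose ink has an internal gap (a double quote, for example) would be split in two, so the glyph table may need to account for that.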