OCR of an RSA key fob (security token)
I put together a quick WinForm/embedded IE browser control which logs into our company's bank website each morning and scrapes/exports the desired deposit information (the bank is a smallish regional bank). Since we have a few dozen "pseudoaccounts" that draw from the same master account, this actually takes 10-15 minutes to retrieve.
Anyway, the only problem is that our business bank account requires an RSA security token (http://www.rsa.com/node.aspx?id=1156)--if you are not familiar, it is a small device which shows a random 6-digit number every 15(?) seconds, so I have to prompt for this value before starting. This is on top of the website's login-based security model, so even if you create a read-only account that can't do anything, you still have to put the RSA number in. We have 5 of these tokens for different people in the company.
From our perspective this is nuisance security. I was joking about using a web camera to OCR the digits from the key fob so they didn't have to type it in -- mainly so that the scraping/export would be done before anyone arrives in the morning. Well, they asked if I could really do it.
So now I ask you: how hard would it be (how many hours do you think it would take) to OCR these digits reliably from a JPEG image produced by the camera? I already know I can get the JPEG easily. I think you get 3 tries to log in, so it really needs to hit a 99% accuracy rate. I could work on this in my off time, but they don't want me to put more than a few hours into it, so I want to leverage as much existing code as possible. This is a 7-segment display (like an alarm clock), so it's not exactly the kind of text an OCR package is used to seeing.
Also--there is a countdown timer on the side of the display; typically when it is down to 1 bar, you wait until the next number appears and it starts over at 5 bars (like signal strength on your cell phone). So this would need to be OCR'd as well, even though it is not text.
Anyway, the more I think about it as I type this, the less convinced I am that I can truly get this right, so maybe I should just work on it in my spare time?
Comments (5)
There are at least two well-documented open source seven-segment OCR programs designed precisely for the task of automatically reading RSA SecurID fobs:
ssocr: Seven Segment Optical Character Recognition. It has links to other OCR and image processing software.
LCDOCR.pm: OCR Build Your Own with Perl modules - May 2007 - Linux Magazine Online
and more general-purpose software that runs on Symbian cell phones may be open by now:
The latest work on helping visually-impaired people seems to be Real-Time Detection and Reading of LED/LCD Displays for Visually Impaired Persons - Proc IEEE Workshop Appl Comput Vis. 2011
This is actually easier than it might at first appear. I've used this technique in the past, based on the fact that the digits always look the same, and always appear in the same locations.
Just create ten little masks, one for each of the digits, and prepare a script that splits your one jpg image into pieces, one for each digit position. Line up the camera once, then leave it like that. Now you have ten masks for 0-9, and the actual digits on the device. For each digit position, multiply its pixel values by each mask in turn, sum the products, and choose the mask with the highest total. That tells you which mask each position fits best, and therefore which digit it shows.
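A minimal sketch of that mask-matching idea in Python with NumPy follows. Everything specific in it is an assumption made for illustration: the 13x7 cell size, the segment layout, the names `make_mask`, `render_code` and `read_digits`, and a display of lit digits on a dark background with evenly spaced cells (a dark-on-light LCD would need inverting first). One deliberate variation: both the mask and the image are rescaled to the range -1..+1 before multiplying, because with plain 0/1 masks the "8" mask scores at least as high as any other on every digit.

```python
import numpy as np

CELL_H, CELL_W = 13, 7  # hypothetical size of one digit cell, in pixels

# (row slice, column slice) of each segment inside a cell, in the order:
# top, top-left, top-right, middle, bottom-left, bottom-right, bottom
SEGMENT_BOXES = [
    (slice(0, 1), slice(1, 6)),
    (slice(1, 6), slice(0, 1)),
    (slice(1, 6), slice(6, 7)),
    (slice(6, 7), slice(1, 6)),
    (slice(7, 12), slice(0, 1)),
    (slice(7, 12), slice(6, 7)),
    (slice(12, 13), slice(1, 6)),
]

# Which segments are lit for each digit, in the same order as SEGMENT_BOXES.
LIT_SEGMENTS = {
    0: "1110111", 1: "0010010", 2: "1011101", 3: "1011011", 4: "0111010",
    5: "1101011", 6: "1101111", 7: "1010010", 8: "1111111", 9: "1111011",
}


def make_mask(digit):
    """Return a CELL_H x CELL_W array: 1.0 where the digit is lit, else 0.0."""
    mask = np.zeros((CELL_H, CELL_W))
    for box, lit in zip(SEGMENT_BOXES, LIT_SEGMENTS[digit]):
        if lit == "1":
            mask[box] = 1.0
    return mask


MASKS = np.stack([make_mask(d) for d in range(10)])  # shape (10, H, W)


def render_code(code):
    """Build a synthetic test image by placing digit masks side by side."""
    return np.hstack([make_mask(int(ch)) for ch in code])


def read_digits(image, n_digits=6):
    """Split a grayscale image into equal cells and pick the best mask for each."""
    image = np.asarray(image, dtype=float)
    span = image.max() - image.min()
    if span == 0:
        raise ValueError("image has no contrast")
    # Rescale the image and the masks to -1..+1 so that a lit pixel under an
    # unlit mask area (and vice versa) counts against that mask.
    signed_image = 2.0 * (image - image.min()) / span - 1.0
    signed_masks = 2.0 * MASKS - 1.0
    digits = []
    for cell in np.split(signed_image, n_digits, axis=1):
        scores = (signed_masks * cell).sum(axis=(1, 2))  # one score per mask
        digits.append(str(int(scores.argmax())))
    return "".join(digits)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = render_code("403917")
    noisy = np.clip(clean + rng.normal(0.0, 0.1, clean.shape), 0.0, 1.0)
    print(read_digits(noisy))
```

With a real camera the masks would be cropped from reference photos of the fob rather than drawn synthetically, and the image would first be cropped to the display so that the cells line up.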
Disclaimer: I don't think this is a great idea for security reasons, as other commenters have pointed out.
I believe there is a software version of the RSA SecurID token. See here
I'm not sure it'll work for your situation (you might have to talk to the bank), but if it does it's probably easier and more reliable than OCR would be.
Just for grins you might try feeding a scan of your RSA token into Tesseract OCR and see how well it performs out-of-the-box. My guess is that you'll need to do considerable tweaking of the scan contrast/brightness values in order to get a clear text image for scanning.
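As a rough illustration of the kind of contrast tweaking meant here, the sketch below stretches a low-contrast grayscale image with percentiles and then binarizes it with an Otsu threshold, using NumPy only. The technique choice, the function names `stretch_contrast` and `otsu_threshold`, and the percentile values are assumptions for illustration, not something this comment prescribes. The binarized result is what would then be handed to the OCR engine; that call is not shown.

```python
import numpy as np


def stretch_contrast(gray, low_pct=2, high_pct=98):
    """Map the low/high percentiles of the image to 0 and 1, clipping the rest."""
    gray = np.asarray(gray, dtype=float)
    lo, hi = np.percentile(gray, [low_pct, high_pct])
    if hi <= lo:
        return np.zeros_like(gray)  # flat image: nothing to stretch
    return np.clip((gray - lo) / (hi - lo), 0.0, 1.0)


def otsu_threshold(img01, bins=256):
    """Return the threshold in [0, 1] that maximizes between-class variance."""
    hist, edges = np.histogram(img01, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(p)             # weight of the "dark" class up to each bin
    w1 = 1.0 - w0                 # weight of the "bright" class above it
    m = np.cumsum(p * centers)    # cumulative mean up to each bin
    total_mean = m[-1]
    valid = (w0 > 1e-12) & (w1 > 1e-12)
    between = np.zeros(bins)
    between[valid] = (total_mean * w0[valid] - m[valid]) ** 2 / (w0[valid] * w1[valid])
    k = int(between.argmax())
    return edges[k + 1]           # upper edge of the last "dark" bin


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic low-contrast frame: a slightly brighter block on a gray background.
    truth = np.zeros((20, 40), dtype=bool)
    truth[5:15, 10:30] = True
    gray = 0.40 + 0.08 * truth + rng.normal(0.0, 0.005, truth.shape)

    stretched = stretch_contrast(gray)
    binary = stretched >= otsu_threshold(stretched)
    print("pixels recovered correctly:", int((binary == truth).sum()), "of", truth.size)
```

Real fob photos will also have glare and uneven lighting, so fixed percentiles and a single global threshold are only a starting point.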
You can try using the OCR API at http://www.webservius.com/corp/docs/wisetrend.pdf - for the volumes you're talking about, it will likely be free for you. To quickly test whether or not the digits will be recognized, you can send a test image to [email protected] and you will get back the OCR results over email.