How do I parse MICR line data?
I have a digital check scanner that is able to capture the MICR line from the check. It will return the MICR line in raw format as a string, with delimiters to separate the account number, routing number, and check number. However, each bank formats this MICR line differently, so there's no standard way to parse this data.
Some companies I have tried are Inlite Research Inc and Accusoft Pegasus. The API from Inlite Research works for some banks, but cannot read Bank of America checks correctly. I'm still testing out the API from Accusoft.
What I am asking is whether anyone knows of an API that will accurately parse the MICR line into its different components. Is there an API that will let me add new check format definitions if I encounter a check that the API cannot handle correctly? Or does anyone know how to write, or has anyone already written, a routine to parse the MICR line?
I would appreciate any help I can get. Thank you.
This should be the correct answer based on my research as well. MICR patterns are too varied to parse reliably without a collection of regex patterns that pull out the relevant information. It would be nice to see the collection of regex patterns you have come up with, with group names such as:
It has been 6 years since this question was originally asked, and I have run across it numerous times in the past 2 weeks. I finally found an ACTUAL solution for properly parsing a MICR line. I've written some code to do so and it works on 99.9% of the checks I've scanned so far, so I have to share it and make sure people understand how this should be done.
For 11 years I have done this job. We have always used Magtek check scanners. Recently I decided to move to an imaging scanner so we could get scans of all our checks. I went with Panini check scanners. Unfortunately, their API doesn't break apart the MICR line, but our Magtek scanners were programmable to give us whatever we wanted. I created a basic string that could be matched with a pattern every time. It would always come out as: <aaaaaaaaa/bbbbbbbb/ccc> where a is route number, b is account number, and c is check number. Over and over I keep wondering how the scanner, just a simple serial device, can figure it out and get it right EVERY SINGLE TIME for a decade.
I started by using Patrick's own answer, sort of, to build a table of MICR patterns as I came across ones I hadn't seen before. The problem is that I reached a point where one pattern would closely match a different kind of check and the data would be slightly off. I then tried keying the patterns on the routing number, until I ran across two checks from BofA that had identical routing numbers and completely different MICR lines. I was so disappointed that my face met my desk in frustration.
After much more research, the proper way is right-to-left parsing of the MICR line. MICR lines are read right to left, and of course the field giving us the most trouble is the on-us field. All my example snippets are C# code.
Start by looping through the string backwards:
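A minimal sketch of that backwards loop, written in Python rather than the C# mentioned above. The function name is made up for illustration and this is not the author's code:

```python
def walk_backwards(micr):
    """Yield (index, character) pairs from the last character to the first."""
    for i in range(len(micr) - 1, -1, -1):
        yield i, micr[i]


# The rightmost character of the raw MICR string comes out first.
for index, char in walk_backwards("T123T 45U"):
    print(index, repr(char))
```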
Evaluate each character as you loop. If your first character is the amount character, it's a business check. Read until you get another amount character, then save that value.
If the next character is the on-us symbol, assume that the check number is at the far left of the on-us field. If the next character is a digit instead, keep reading and filling a buffer with the digits (REMEMBER YOU ARE WORKING BACKWARDS, so the buffer fills in reverse order!) until you reach the on-us character. If your buffer contains only digits, that's your check number. If it's empty, just move on.
Next, collect the entire on-us field in a buffer until you reach the transit character. Once you reach the transit character, keep reading into a fresh buffer until you reach the next transit character. That buffer is now your routing number.
If it's a business check, you still have more characters to read. Keep reading until you reach ANOTHER on-us character. You've now reached the auxiliary on-us field, which should be the check number. Read until you reach the next on-us character, which should be the end of your string (its left edge). You now have your check number.
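The walk described above can be sketched roughly as follows, in Python rather than C#. This is an illustration under assumptions, not the author's code: the symbol placeholders `T` (transit), `U` (on-us) and `A` (amount) are made up, since every scanner emits its own substitutes for the MICR symbols, and the helper names are hypothetical. The sketch reads the auxiliary on-us field whenever characters remain to the left of the routing number.

```python
# Assumed placeholder characters; substitute whatever your scanner emits.
TRANSIT, ON_US, AMOUNT = "T", "U", "A"


def read_back(micr, pos, stop):
    """Move left from index pos until the stop symbol is found.

    Returns (text that was passed over, in normal reading order,
             index of the stop symbol, or a negative number if absent).
    """
    end = pos
    while pos >= 0 and micr[pos] != stop:
        pos -= 1
    return micr[pos + 1:end + 1], pos


def scan_fields(micr):
    """Split a raw MICR string into fields by scanning right to left."""
    micr = micr.strip()
    fields = {"amount": None, "check": None, "on_us": None, "routing": None}
    pos = len(micr) - 1

    # Optional amount field at the far right, wrapped in amount symbols.
    if pos >= 0 and micr[pos] == AMOUNT:
        fields["amount"], pos = read_back(micr, pos - 1, AMOUNT)
        pos -= 1  # step over the opening amount symbol
    while pos >= 0 and micr[pos] == " ":
        pos -= 1

    # Digits to the right of the on-us symbol are taken as the check number.
    trailing, pos = read_back(micr, pos, ON_US)
    if trailing.strip().isdigit():
        fields["check"] = trailing.strip()

    # Everything between the on-us symbol and the transit symbol.
    on_us, pos = read_back(micr, pos - 1, TRANSIT)
    fields["on_us"] = on_us.strip()

    # The routing number sits between the two transit symbols.
    fields["routing"], pos = read_back(micr, pos - 1, TRANSIT)

    # Anything further left is the auxiliary on-us field (business checks).
    if pos > 0:
        _, pos = read_back(micr, pos - 1, ON_US)
        if pos > 0:
            aux, pos = read_back(micr, pos - 1, ON_US)
            fields["check"] = aux.strip()
    return fields


print(scan_fields("T123456789T 000111222333U 1001"))
print(scan_fields("U001234U T123456789T 9876543210U A0000012345A"))
```

When the check number is not found on the right or in the auxiliary field, `check` stays `None` and the whole on-us text is returned for further splitting.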
Now, look at the value you stripped from the regular on-us field. If you already have a check number, then the on-us value is your account number. If you DO NOT have a check number, then you should split the on-us field by spaces and assume that the far-left set of digits (array element 0) is your check number. HOWEVER, if after splitting by spaces you only have ONE element in the array, that means the on-us field likely contains dashes separating the items. Split the on-us field by dashes and assume that the far-left array element is the check number and the rest is your account number. I've seen some that have as many as 3 dashes in the on-us field, like this: nnnn-1234-56-7, where nnnn is the check number and the rest is the account number.
Once you've got your account number separated from check number, strip any miscellaneous characters (spaces, dashes, etc.) from it and you're done.
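A small sketch of that on-us splitting rule, again in Python rather than C# and not the author's code. The function name is hypothetical, and the sketch assumes the scanner reports the MICR dash as an ASCII `-`:

```python
import re


def split_on_us(on_us, check=None):
    """Return (check number, account number) from the on-us field text.

    check is the check number found elsewhere on the line, if any.
    """
    on_us = on_us.strip()
    if check is not None:
        # Check number already known: the whole field is the account.
        account = on_us
    else:
        parts = on_us.split()           # first try splitting on spaces
        if len(parts) == 1:
            parts = on_us.split("-")    # fall back to dashes
        if len(parts) > 1:
            check, rest = parts[0], parts[1:]
        else:
            rest = parts                # nothing to split: no check number
        account = "".join(rest)
    # Strip spaces, dashes and any other non-digit characters.
    return check, re.sub(r"\D", "", account)


print(split_on_us("1001 000111222333"))
print(split_on_us("1001-1234-56-7"))
print(split_on_us("000-111 222", check="1001"))
```

If the field cannot be split at all, the sketch returns `None` for the check number instead of guessing.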
This is my solution to all my MICR problems. Hopefully it helps someone else.
Thanks go, in part, to this document: http://www.transact-tech.com/uploads/printers/files/100-9094-Rev-C-MICR-Programmers-Guide.pdf
Sorry for the late reply. I didn't see any answers to the question so I thought nobody responded.
To answer the questions above, I found a solution after thinking the problem over and talking with various vendors. The check scanner that I'm using is already able to read the MICR line. The problem lies in parsing the MICR line for relevant information such as the routing transit number, account number, check/serial number, and amount (if there is one). After speaking with a handful of third-party companies and trying out the available trial versions of MICR parsers, I came to the conclusion that there is no universal parser out there. I was still faced with the problem of the non-conforming On-Us field. Each bank formats this field differently. Sometimes the symbols are arranged differently as well. So I decided to write my own parser. I think this is the most logical way to proceed, as these third-party vendors told me that they each roll their own parsing software.
I wrote the parser around a table of MICR line patterns. Each time I encounter a new MICR line format, I update this table. The parser matches every scanned check against this table and, if it finds a match, uses that pattern to extract the relevant information.
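A pattern table of this kind might look something like the sketch below. It is an illustration only: the three patterns, their names, and the placeholder symbols `T` (transit) and `U` (on-us) are all assumptions, not the table this answer actually used.

```python
import re

# Hypothetical pattern table; each entry uses named groups for the fields.
# T = transit symbol, U = on-us symbol (assumed placeholders).
MICR_PATTERNS = [
    ("serial_after_on_us", re.compile(
        r"^T(?P<routing>\d{9})T\s*(?P<account>[\d -]+?)U\s*(?P<check>\d+)$")),
    ("serial_before_account", re.compile(
        r"^T(?P<routing>\d{9})T\s*(?P<check>\d{4,})\s+(?P<account>[\d-]+)U$")),
    ("aux_on_us_business", re.compile(
        r"^U(?P<check>\d+)U\s*T(?P<routing>\d{9})T\s*(?P<account>[\d -]+?)U$")),
]


def parse_with_table(micr):
    """Try each pattern in order; return the first match's fields or None."""
    micr = micr.strip()
    for name, pattern in MICR_PATTERNS:
        match = pattern.match(micr)
        if match:
            fields = match.groupdict()
            # Keep only the digits of the account number.
            fields["account"] = re.sub(r"\D", "", fields["account"])
            fields["pattern"] = name
            return fields
    return None  # unknown format: time to add a new row to the table


print(parse_with_table("T123456789T 000111222333U 1001"))
print(parse_with_table("U001234U T123456789T 9876543210U"))
```

Because the first matching row wins, the order of the table matters, and a line that matches nothing signals a format that still needs a definition.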
I hope my experience and the solution I came up with will also help those who ran across the same issue.
Thank you to all those who responded, and good luck.
The basic pattern of a MICR:
xxxxxxxxxxx /rrrrrrrrr/ ooooooooooo baaaaaaaaaab
where 'x' is AuxOnUs, 'r' is the routing number, 'o' is OnUs, and 'a' is the amount; 'b' and '/' are special MICR symbols.
A minimal MICR line is just:
/rrrrrrrrr/ ooooooooo
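As a rough sketch, the layout above can be written as one regular expression with named groups. It assumes the raw string really uses `/` for the transit symbol and `b` for the amount symbol, as in the pattern shown, and a ten-digit amount. The function and group names are made up, and any other symbol a scanner emits would simply end up inside the captured field text.

```python
import re

# aux on-us (optional), /routing/, on-us, then an optional b-wrapped amount.
MICR_LAYOUT = re.compile(
    r"^\s*(?P<aux_on_us>[^/]*?)\s*"
    r"/(?P<routing>\d{9})/\s*"
    r"(?P<on_us>[^b]*?)\s*"
    r"(?:b(?P<amount>\d{10})b)?\s*$"
)


def split_micr(line):
    """Return the four top-level fields of a MICR line, or None."""
    match = MICR_LAYOUT.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # An empty aux on-us field is reported as None for consistency.
    fields["aux_on_us"] = fields["aux_on_us"] or None
    return fields


print(split_micr("/123456789/ 1234 5678901"))
print(split_micr("000123 /123456789/ 5678901 b0000012345b"))
```

The OnUs text is returned whole here, without trying to separate serial number from account number.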
AuxOnUs is generally only used by business checks, and it pretty much always means there is a serial number.
The routing number is always consistent; it's the only part of the MICR that is universal.
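Because the routing number is the one dependable field, it can also serve as a sanity check on a parse. The sketch below assumes a US ABA routing number, whose nine digits carry a checksum with weights 3, 7, 1. That checksum is general background rather than something stated in this answer, and the function name is made up.

```python
def is_valid_routing(routing):
    """Check the ABA checksum: 3*d1 + 7*d2 + d3 + 3*d4 + ... must end in 0."""
    if len(routing) != 9 or not routing.isdigit():
        return False
    weights = (3, 7, 1) * 3
    total = sum(w * int(d) for w, d in zip(weights, routing))
    return total % 10 == 0


print(is_valid_routing("021000021"))
print(is_valid_routing("123456789"))
```

A failed checksum usually points to a misread digit or to field boundaries that were guessed wrongly.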
Amount is generally not encoded in the MICR, but sometimes it is.
OnUs is the tricky part. It normally consists of the check serial number and the account, but each bank handles it differently. Usually the serial number will be 4 digits, but it may be 5 or more. If there's an AuxOnUs field, you can be pretty sure the OnUs is just the account number.
The OnUs can contain spaces and dashes. It would be nice if there were a consistent way of dividing it, but I've seen so many variations that I think it's better to just leave it as an "OnUs" field instead of separating it into serial and account, unless you're the paying bank, in which case you should know what format your own checks use.