Parsing QuickBooks IIF format files
I'm working with QuickBooks' IIF file format. I need to write a parser that reads and writes IIF files, and I'm running into some issues reading them.
The files are simple: they're tab-delimited. Every line is either a table definition or a row. Definitions begin with '!' and the table name, and rows begin with just the table name. Here's the problem I'm running into: some of the fields allow line breaks.
When I first encountered this, I thought, okay, just parse it tab by tab instead of line by line. But to do that I had to replace the line breaks with tabs, and I wound up with more values than there were columns: each value containing line breaks was spread out across several columns.
How would you parse such a file?
Edit: An example
!CUST NAME REFNUM TIMESTAMP BADDR1 BADDR2 BADDR3 BADDR4 BADDR5 SADDR1 SADDR2 SADDR3 SADDR4 SADDR5 PHONE1 PHONE2 FAXNUM CONT1 CONT2 CTYPE TERMS TAXABLE LIMIT RESALENUM REP TAXITEM NOTEPAD SALUTATION COMPANYNAME FIRSTNAME MIDINIT LASTNAME CUSTFLD1 CUSTFLD2 CUSTFLD3 CUSTFLD4 CUSTFLD5 CUSTFLD6 CUSTFLD7 CUSTFLD8 CUSTFLD9 CUSTFLD10 CUSTFLD11 CUSTFLD12 CUSTFLD13 CUSTFLD14 CUSTFLD15 JOBDESC JOBTYPE JOBSTATUS JOBSTART JOBPROJEND JOBEND HIDDEN DELCOUNT
CUST St. Mark 359 1176670332 Saint Mark Catholic Church 609 W Main St City, State Zip
!CLASS NAME REFNUM TIMESTAMP HIDDEN DELCOUNT
!INVITEM NAME REFNUM TIMESTAMP INVITEMTYPE DESC PURCHASEDESC ACCNT ASSETACCNT COGSACCNT QNTY QNTY PRICE COST TAXABLE PAYMETH TAXVEND TAXDIST PREFVEND REORDERPOINT EXTRA CUSTFLD1 CUSTFLD2 CUSTFLD3 CUSTFLD4 CUSTFLD5 DEP_TYPE ISPASSEDTHRU HIDDEN DELCOUNT USEID
INVITEM Labor 1 1119915308 SERV Labor 0
!TIMEACT DATE JOB EMP ITEM PITEM DURATION PROJ NOTE XFERTOPAYROLL BILLINGSTATUS
TIMEACT 3/8/08 876 Development Jane Doe {Consultant} Labor 00:15 Renewing all domain name for 876 Development.
REIMBURSEMENT: 44.72 for one year renewal on all domain names. N 1
TIMEACT 3/17/08 Greg:Bridge Jane Doe {Consultant} Labor 01:00 Preparing Studio N 1
TIMEACT 3/17/08 John Doe and Associates Jane Doe {Consultant} Labor 00:06 REIMBURSEMENT: Toner cartridge on ebay & Fuser from FastPrinters- ask wendell before invoicing to see if this fixed the problem
49.99 (include tax) toner
$175.18 (include tax) fuser
N 1
TIMEACT 3/17/08 John Doe II Jane Doe {Consultant} Labor 01:00 Fixing Kandis's computer - replaced entire computer with similar system N 1
Comments (6)
I did it:
It has been a while since I have done IIF, but unless they have fixed it, QuickBooks will barf on those line breaks anyway. It seems these folks had the same issue, and they handled it with spaces.
Personally I would lean toward pipes or something that will clearly delineate the line break when it comes into QuickBooks. If you absolutely positively must have the line breaks, join the Intuit Developer Network and use the SDK to send these values to QB once your program imports them.
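As a rough illustration of that suggestion on the writing side, here is a minimal Python sketch that swaps embedded line breaks for a visible marker (a space, or a pipe) before a row is written. The helper names `clean_field` and `format_row` and the `marker` parameter are made up for this example; they are not part of any QuickBooks tooling.

```python
import re

def clean_field(value, marker=" "):
    """Replace embedded line breaks with a marker so the row stays on one line."""
    # CRLF, lone CR, and lone LF all collapse to the marker.
    text = re.sub(r"\r\n|\r|\n", marker, str(value))
    # A tab inside a value would be read as a field separator, so replace it too.
    return text.replace("\t", " ")

def format_row(table, values, marker=" "):
    """Build one tab-delimited row that starts with the table name."""
    return "\t".join([table] + [clean_field(v, marker) for v in values])

if __name__ == "__main__":
    note = "Renewing all domain names.\nREIMBURSEMENT: 44.72"
    print(format_row("TIMEACT", ["3/8/08", note, "N", 1], marker=" | "))
```

Passing `marker=" | "` gives the pipe-style delimiter; the default is a plain space.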
Why not replace line breaks with spaces instead of tabs?
I run into this sort of thing all the time. The key to handling special cases like this when you're parsing is to replace the special case with something that's extremely unlikely to occur in the text, and then replace it again when you're done.
For example, you have line breaks in the output which can be readily detected with a Regex. Use Regex.Replace to convert them to something like LINEBREAK. Make it something that stands out in an editor for debugging. Then do the rest of your parsing as normal, and as the last step, replace the special token with the original value (or something new).
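The placeholder approach might look roughly like this in Python, with `re.sub` standing in for .NET's `Regex.Replace`. The sketch rests on one assumption that the answer does not spell out: a line break counts as embedded unless the next line starts with "!" or with a table name taken from a definition line, followed by a tab. `PLACEHOLDER`, `protect_embedded_breaks`, and `parse_rows` are hypothetical names.

```python
import re

PLACEHOLDER = "<<LINEBREAK>>"  # assumed never to occur in real field values

def protect_embedded_breaks(text):
    """Swap line breaks that sit inside a field for PLACEHOLDER."""
    text = text.replace("\r\n", "\n")
    # Table names come from definition lines such as "!TIMEACT\tDATE\t...".
    names = re.findall(r"^!([^\t\n]+)", text, flags=re.MULTILINE)
    if not names:
        return text
    starts = "|".join(re.escape(n) for n in sorted(set(names)))
    # Keep a newline only if it ends the text or the next line opens
    # a definition ("!") or a row (known table name, then a tab or end of line).
    pattern = re.compile(r"\n(?!\Z|!|(?:" + starts + r")(?:\t|$))",
                         flags=re.MULTILINE)
    return pattern.sub(PLACEHOLDER, text)

def parse_rows(text):
    """Return every definition and row as a list of fields."""
    rows = []
    for line in protect_embedded_breaks(text).split("\n"):
        if not line:
            continue
        # Last step: put the original line breaks back inside each field.
        rows.append([f.replace(PLACEHOLDER, "\n") for f in line.split("\t")])
    return rows

if __name__ == "__main__":
    sample = ("!TIMEACT\tDATE\tNOTE\tFLAG\n"
              "TIMEACT\t3/8/08\tRenewing.\nREIMBURSEMENT: 44.72\tN\n")
    for row in parse_rows(sample):
        print(row)
```

A note whose continuation line happens to begin with a table name and a tab would be misread as a new row, so this is a heuristic rather than a guarantee.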
Ideas:
Preprocess your file, replacing each embedded line break (assuming it's a single CR or LF) with some high-ASCII character. Then parse by tab, and finally replace that high-ASCII character with the line break again.
Rather than processing line by line, process character by character. Mind you, that still only works if the embedded line breaks are somehow different from the standard CRLF at the end of a record.
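A minimal Python sketch of the character-by-character idea follows. It assumes exactly the condition mentioned above: records end with CRLF, while an embedded break is a lone CR or a lone LF. Whether real IIF exports satisfy this would need to be checked against actual files; `split_records` is a hypothetical name.

```python
def split_records(data):
    """Split text into records of fields, treating only CRLF as a record end."""
    records, fields, buf = [], [], []
    i = 0
    while i < len(data):
        ch = data[i]
        if ch == "\r" and data[i + 1:i + 2] == "\n":
            # CRLF closes the current field and the current record.
            fields.append("".join(buf))
            records.append(fields)
            fields, buf = [], []
            i += 2
            continue
        if ch == "\t":
            # A tab closes the current field.
            fields.append("".join(buf))
            buf = []
        else:
            # Everything else, including a lone CR or LF, belongs to the field.
            buf.append(ch)
        i += 1
    if buf or fields:
        # Final record without a trailing CRLF.
        fields.append("".join(buf))
        records.append(fields)
    return records

if __name__ == "__main__":
    data = "!TIMEACT\tNOTE\tFLAG\r\nTIMEACT\tline one\nline two\tN\r\n"
    for record in split_records(data):
        print(record)
```

When reading from disk in Python, opening the file with `open(path, newline="")` keeps the CR and LF characters as they are; the default text mode would translate CRLF to LF and erase the distinction this relies on.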
I found this on CodePlex. You can get it as a NuGet package.
https://qif.codeplex.com/
I tested it and I was able to read and write the QuickBooks format.