Importing and parsing text files containing PCL: ASP.NET C# technology suggestions?
I need to scrape an old mainframe text file containing Printer Control Language (PCL) for a data import. Altering the mainframe functions isn't an option. The printout contains product sales information and has a hierarchical layout.
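Before any pattern matching, the PCL escape sequences have to come out of each line. The sketch below assumes PCL 5-style escapes: ESC followed by a parameterized character, an optional group character, and value/parameter pairs (e.g. ESC & l 0 O), or the two-character form such as ESC E. `PclStripper` is a hypothetical helper name. It does not handle commands that are followed by binary data (raster graphics, font downloads), so treat it as a starting point rather than a complete PCL parser.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper that removes PCL 5 escape sequences from one line of report text.
// Assumption: the file contains no commands followed by binary payloads.
public static class PclStripper
{
    // Parameterized escape: ESC, a parameterized character (0x21-0x2F), an optional
    // group character (0x60-0x7E), then value/parameter pairs. A lowercase parameter
    // character (0x60-0x7E) chains another pair; an uppercase one (0x40-0x5E) ends it.
    // Example: ESC & l 0 O (portrait orientation).
    private static readonly Regex Parameterized = new Regex(
        @"\x1B[\x21-\x2F][\x60-\x7E]?(?:[+-]?[0-9]*(?:\.[0-9]*)?[\x60-\x7E])*[+-]?[0-9]*(?:\.[0-9]*)?[\x40-\x5E]",
        RegexOptions.Compiled);

    // Two-character escape, e.g. ESC E (printer reset): ESC followed by 0x30-0x7E.
    private static readonly Regex TwoCharacter = new Regex(@"\x1B[\x30-\x7E]", RegexOptions.Compiled);

    // Any control character left over except tab (stray ESC, form feed, CR, ...).
    private static readonly Regex Controls = new Regex(@"[\x00-\x08\x0A-\x1F\x7F]", RegexOptions.Compiled);

    public static string Strip(string line)
    {
        string text = Parameterized.Replace(line, string.Empty);
        text = TwoCharacter.Replace(text, string.Empty);
        return Controls.Replace(text, string.Empty);
    }
}
```

Stripping escapes first keeps the later regular expressions simple, since they only ever see printable report text.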
My initial plan was to set up a SQL Server Integration Services (SSIS) import. Ultimately, though, this will be a data-import ASP.NET MVC 3 website backed by a SQL Server 2005 database, so we could avoid SSIS altogether. I already build C# ASP.NET MVC 3 websites, so using related technologies should be manageable.
Has anyone succeeded in parsing a text report back into useful import data using text patterns (such as regular expressions) in C# or SSIS? Are there any examples that use the State design pattern?
Many answers I find cover only a small part of the problem, such as how to load a text file and take the nth column in C#. This is more involved: I need to identify each line type with a pattern that depends on which import state I am in. Off-the-shelf software would be even better.
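One way to apply the State pattern here, sketched under assumptions: each state class knows which kind of line it expects and returns the state for the following line. All class names are hypothetical, and so are the patterns in `ReportPatterns` (price lines start with a six-digit item number, related-product lines are indented item numbers, summary lines start with `TOTAL`); they would need to be replaced with patterns matching the real report.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical record shapes; the real report's fields are not known.
public class RelatedItem { public string Line = ""; public string Description = ""; }

public class Product
{
    public string PriceLine = "";
    public string Description = "";
    public List<RelatedItem> Related = new List<RelatedItem>();
    public string Summary = "";
}

// The State pattern's "state" role: consume one line, return the state for the next line.
public interface IImportState
{
    IImportState Handle(string line, ImportContext ctx);
}

// The "context" role: holds the current state and the data parsed so far.
public class ImportContext
{
    public List<Product> Products { get; } = new List<Product>();
    public Product Current { get; set; }
    private IImportState _state = new ExpectPriceLine();

    public void Feed(string line) => _state = _state.Handle(line, this);
}

// HYPOTHETICAL patterns; replace with ones matching the real report columns.
public static class ReportPatterns
{
    public static readonly Regex Price = new Regex(@"^\d{6}\s");
    public static readonly Regex Related = new Regex(@"^\s{2,}\d{6}\s");
    public static readonly Regex Summary = new Regex(@"^\s*TOTAL\b");
}

public class ExpectPriceLine : IImportState
{
    public IImportState Handle(string line, ImportContext ctx)
    {
        if (!ReportPatterns.Price.IsMatch(line)) return this; // skip blank or stray lines
        ctx.Current = new Product { PriceLine = line.Trim() };
        ctx.Products.Add(ctx.Current);
        return new ExpectDescription();
    }
}

public class ExpectDescription : IImportState
{
    public IImportState Handle(string line, ImportContext ctx)
    {
        ctx.Current.Description = line.Trim();
        return new ExpectRelatedOrSummary();
    }
}

public class ExpectRelatedOrSummary : IImportState
{
    public IImportState Handle(string line, ImportContext ctx)
    {
        if (ReportPatterns.Related.IsMatch(line))
        {
            ctx.Current.Related.Add(new RelatedItem { Line = line.Trim() });
            return new ExpectRelatedDescription();
        }
        if (ReportPatterns.Summary.IsMatch(line))
        {
            ctx.Current.Summary = line.Trim();
            return new ExpectPriceLine();
        }
        throw new FormatException("Unexpected line inside product block: " + line);
    }
}

public class ExpectRelatedDescription : IImportState
{
    public IImportState Handle(string line, ImportContext ctx)
    {
        ctx.Current.Related[ctx.Current.Related.Count - 1].Description = line.Trim();
        return new ExpectRelatedOrSummary();
    }
}
```

Calling `ctx.Feed(line)` for each line of `File.ReadLines(path)` leaves the parsed products in `ctx.Products`. Throwing `FormatException` on an unexpected line inside a product block is a design choice: it surfaces report layout changes early instead of silently importing bad rows.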
Text file example:
this part may be a header for the page which needs to be skipped
this part may be a header for the page which needs to be skipped
this part may be a header for the page which needs to be skipped
first line containing prices
second line containing product description for the first line
third line containing a related product (listing all flavors)
fourth line containing a description for the third line
[third and fourth may repeat]
[product set summary line]
[ repeat for next product]
this part may be a footer for the page that needs to be skipped
this part may be a footer for the page that needs to be skipped
At any point, the products will span between pages,
having header and footer lines between product data.
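Because products can span page breaks, one option is to strip page headers and footers in a separate pass so the product parser sees a continuous stream. This sketch assumes the first header line of each page ends in `PAGE <n>`, that the header has a fixed number of lines (three in the example above), and that footer lines start with `CONTINUED` or `END OF PAGE`; `PageFilter` and both patterns are hypothetical. A leading form feed, if present, is dropped before matching.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PageFilter
{
    // HYPOTHETICAL: the first header line of every page ends with "PAGE <n>".
    private static readonly Regex HeaderStart = new Regex(@"\bPAGE\s+\d+\s*$");

    // HYPOTHETICAL: footer lines start with "CONTINUED" or "END OF PAGE".
    private static readonly Regex Footer = new Regex(@"^\s*(?:CONTINUED|END OF PAGE)\b");

    // Yields only body lines, dropping each page's header block and footer lines,
    // so a product that spans a page break comes out as consecutive lines.
    public static IEnumerable<string> StripPageChrome(IEnumerable<string> lines, int headerLineCount)
    {
        int headerLinesLeft = 0;
        foreach (string raw in lines)
        {
            // Print files often mark page breaks with a form feed; remove it before matching.
            string line = raw.Replace("\f", string.Empty);

            if (HeaderStart.IsMatch(line))
            {
                headerLinesLeft = headerLineCount - 1; // this line is the first header line
                continue;
            }
            if (headerLinesLeft > 0)
            {
                headerLinesLeft--;
                continue;
            }
            if (Footer.IsMatch(line))
            {
                continue;
            }
            yield return line;
        }
    }
}
```

Since `StripPageChrome` is lazy, it can be chained over `File.ReadLines(path)` and its output fed straight into a line-by-line parser without loading the whole file into memory.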
Replies (2)
I've done a lot of parsing in C#. However, here, it's not clear to me what kind of text you need to parse (your example doesn't appear to show the actual text). Obviously, you need some way to identify the type of each line.
Here are a couple of articles that may help:
A Text Parsing Helper Class
A sscanf() Replacement for .NET
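The per-line identification this answer describes could be sketched as an ordered table of (line type, regex) rules where the first match wins, with named groups capturing field values. The `LineType` values and all patterns are hypothetical. Lines that match no rule (such as descriptions) come back as `Unknown`, which is exactly where the parser's current state has to decide what the line means.

```csharp
using System.Text.RegularExpressions;

public enum LineType { Unknown, Blank, Summary, Related, Price }

public static class LineClassifier
{
    // HYPOTHETICAL rules, checked in order; the first matching pattern wins,
    // so more specific patterns must come before more general ones.
    private static readonly (LineType Type, Regex Pattern)[] Rules =
    {
        (LineType.Blank, new Regex(@"^\s*$")),
        (LineType.Summary, new Regex(@"^\s*TOTAL\b")),
        (LineType.Related, new Regex(@"^\s{2,}\d{6}\s")),
        (LineType.Price, new Regex(@"^\d{6}\s+(?<price>\d+\.\d{2})")),
    };

    // Returns the first matching line type and its Match, so callers can read named groups.
    public static LineType Classify(string line, out Match match)
    {
        foreach (var (type, pattern) in Rules)
        {
            match = pattern.Match(line);
            if (match.Success)
            {
                return type;
            }
        }
        match = Match.Empty;
        return LineType.Unknown;
    }
}
```

For a price line, `match.Groups["price"].Value` then holds the first price column, so field extraction and line identification share one pattern.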
I worked on COBOL integrations for some years, and I had to break text strings apart based on a "COBOL book" (copybook) that held the field specifications.
You can use AGPC.FixedLayout to help with this kind of integration, without needing substrings to extract each field.
This is the NuGet package: https://www.nuget.org/packages/AGPC.FixedLayout
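The underlying idea, declaring a record layout once instead of scattering `Substring` calls, can also be sketched without a library. This is not the AGPC.FixedLayout API; `FieldSpec`, `FixedWidthReader`, and the column positions are hypothetical. `ParseImpliedDecimal` handles the COBOL convention of an implied decimal point (as in `PIC 9(5)V99`), where `0000499` with a scale of 2 means 4.99. Signed zoned-decimal (overpunch) fields would need extra handling.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical layout entry: field name, zero-based start column, and width,
// similar in spirit to one line of a COBOL copybook.
public sealed class FieldSpec
{
    public string Name { get; }
    public int Start { get; }
    public int Length { get; }

    public FieldSpec(string name, int start, int length)
    {
        Name = name;
        Start = start;
        Length = length;
    }
}

public static class FixedWidthReader
{
    // Cuts one fixed-width record into named, trimmed fields.
    public static Dictionary<string, string> Read(string line, IEnumerable<FieldSpec> layout)
    {
        var fields = new Dictionary<string, string>();
        foreach (FieldSpec spec in layout)
        {
            // Mainframe transfers often drop trailing blanks, so tolerate short lines.
            if (spec.Start >= line.Length)
            {
                fields[spec.Name] = string.Empty;
                continue;
            }
            int length = Math.Min(spec.Length, line.Length - spec.Start);
            fields[spec.Name] = line.Substring(spec.Start, length).Trim();
        }
        return fields;
    }

    // Converts a digit string with an implied decimal point (the COBOL "V" in a
    // PIC clause) into a decimal, e.g. "0000499" with scale 2 -> 4.99.
    public static decimal ParseImpliedDecimal(string digits, int scale)
    {
        decimal value = decimal.Parse(digits, NumberStyles.Integer, CultureInfo.InvariantCulture);
        for (int i = 0; i < scale; i++)
        {
            value /= 10m;
        }
        return value;
    }
}
```

With a layout of `Item` (0, 6), `Desc` (6, 10), and `Price` (16, 7), the record `100001ICE CREAM 0000499` yields `Item` = `100001`, `Desc` = `ICE CREAM`, and a price of 4.99 after `ParseImpliedDecimal(fields["Price"], 2)`.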