Importing a small number of records from a very large CSV file in BizTalk 2006
I have a BizTalk project that imports an incoming CSV file and dumps it into a database table. The import works fine, but I only need to keep about 200-300 records from a file with upwards of a million rows. My orchestration discards the unwanted rows, but the problem is that the flat file I'm importing is still 250MB, and when it is converted to XML using a regular flat file pipeline it takes hours to process and sometimes causes the server to run out of memory.
Is there something I can do to have the Custom Pipeline itself discard rows I don't care about? The very first item in each CSV row is one of a few strings, and I only want to keep rows that start with a certain string.
Thanks for any help you're able to provide.
1 Answer
A custom pipeline component would certainly be the best solution, but it would need to execute in the decode stage, before the disassembler component.
Making it 100% streaming-enabled would be complex (but certainly doable). Depending on the size of the resulting trimmed CSV file, though, you could simply pre-process the entire input file as soon as your custom component runs and either build the result in memory (in a MemoryStream) if it's small, or write it to a file and then return the resulting FileStream to BizTalk to continue processing from there.
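To make that concrete, here is a rough sketch (not production code) of what the Execute method of such a decode-stage component could look like, using the file-based approach. The KeepPrefix value is hypothetical (in a real component you would expose it as a configurable property), and the design-time plumbing a pipeline component also needs (class GUID, component-category attributes, IBaseComponent, IComponentUI, IPersistPropertyBag) is omitted for brevity.

```csharp
using System;
using System.IO;
using Microsoft.BizTalk.Component.Interop;
using Microsoft.BizTalk.Message.Interop;

// Sketch of a decode-stage component that pre-filters a large CSV message.
// Only the runtime Execute method is shown; design-time boilerplate omitted.
public class CsvPreFilterComponent : IComponent
{
    // Hypothetical prefix identifying the rows to keep.
    private const string KeepPrefix = "KEEP";

    public IBaseMessage Execute(IPipelineContext pContext, IBaseMessage pInMsg)
    {
        Stream original = pInMsg.BodyPart.GetOriginalDataStream();

        // Write the surviving rows to a temp file so the 250MB input never has
        // to sit in memory; for a few hundred rows a MemoryStream would also work.
        string tempPath = Path.GetTempFileName();
        var filtered = new FileStream(tempPath, FileMode.Create, FileAccess.ReadWrite,
                                      FileShare.None, 64 * 1024, FileOptions.DeleteOnClose);

        using (var reader = new StreamReader(original))
        {
            var writer = new StreamWriter(filtered);
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Keep only rows whose first field starts with the prefix.
                if (line.StartsWith(KeepPrefix, StringComparison.OrdinalIgnoreCase))
                    writer.WriteLine(line);
            }
            writer.Flush();
        }

        // Hand the trimmed stream back to BizTalk; the flat file disassembler
        // downstream now only sees the 200-300 rows you care about.
        filtered.Seek(0, SeekOrigin.Begin);
        pInMsg.BodyPart.Data = filtered;

        // Let the messaging engine dispose of the stream when it is done.
        pContext.ResourceTracker.AddResource(filtered);

        return pInMsg;
    }
}
```

With a component like this sitting in the decode stage of a custom receive pipeline, the regular flat file disassembler and the rest of your orchestration stay exactly as they are; they just receive a file that is a few kilobytes instead of 250MB.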