如何在 SSIS 或其他工具中处理带有许多抑制列的不规则的右文本文件?

发布于 2024-07-07 16:21:58 字数 474 浏览 6 评论 0原文

参差不齐的右侧平面文件选项非常适合以 CRLF 终止的单个尾随字段,但是当任意数量的空白尾随字段被抑制并且该行以 CRLF 提前终止时,我在 SSIS 中找不到其他选择,只能阅读一些内容列包含单个“REMAINDER”列,然后使用带有 SUBSTRING 操作的派生列转换来一一提取“可选”列。

我想到的一种可能性是,在连接管理器获取该线路之前,是否有任何方法可以将该线路通过 SSIS 中的“padder”组件。 然后,您可以获得使用连接管理器设置传入列长度的所有好处。 我总是可以创建一个外部程序(或一个完全独立的 SSIS 数据流)来传递一个文件,通过该文件将用空格填充每一行(并且用参差不齐的右侧 - 只需添加相同的最小数量的空格来解释每个行的所有抑制字段行就足够了),但这似乎相当浪费磁盘空间,并且还需要再次完全读取(和写入)文件一次。

或者,也许有一个第三方数据源组件,如果该行提前终止,它会自动将其简单地设置为 NULL 或空白和列。

还有其他选择吗?

The ragged right flat file option is great for a single trailing field terminated with a CRLF, but when any number of trailing fields which are blank are suppressed and the line terminated early with CRLF, I have found no alternative in SSIS but to read a few columns in with a single "REMAINDER" column and then used a Derived Column transform with SUBSTRING operations to extract the "optional" columns one by one.

One possibility which has occurred to me is if there is any way to pass the line through a "padder" component within SSIS prior to the connection manager getting a hold of it. Then you could get all the benefits of using the connection manager to set up the incoming column lengths. I could always create an external program (or a completely separate SSIS dataflow) to pass a file through which would pad every row out with spaces (and with ragged right - just adding the same minimum number of spaces to account for all suppressed fields to every line would be sufficient), but that does seem rather wasteful of disk space and also require the file to be read (and written) completely one more time.

Alternatively, perhaps there is a third party data source component which will simply NULL or blank and columns automatically if the line is terminated prematurely.

Any other options?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

家住魔仙堡 2024-07-14 16:21:58

我认为你提出的解决方案听起来很合理。 磁盘空间很便宜。 在导入数据之前清理格式的两步过程是完全可以接受的。

I think your proposed solution sounds reasonable. Disk space is cheap. It is perfectly acceptable to have a two-step process in which you clean the formatting before you import the data.

冰火雁神 2024-07-14 16:21:58

使用脚本组件将列分隔为数据源。

Use a script component to separated the column as the data source.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文