SSIS:导入文件,有些带有列名,有些没有

发布于 2024-12-08 18:39:27 字数 548 浏览 0 评论 0原文

或者可能是由于日志设备的配置不一致,我需要通过 SSIS 加载 csv 文件的集合,该集合有时会具有带有列名称的第一行,有时则不会。文件格式在其他方面是相同的。

日志记录配置似乎有可能标准化,因此我不想将编程时间浪费在打开每个文件并确定它是否具有标题行,然后根据情况进行不同处理的脚本任务上。

相反,我想指定类似 Destination.MaxNumberOfErrors 的内容,这将允许每个文件最多一个错误行(因此,如果文件中的唯一问题是标头,则它不会失败)。不过,平面文件源错误是致命的,所以我没有找到让它继续运行的方法。

失败代码的含义是由组件定义的,但是 错误是致命的,管道停止执行。可能有错误 在此之前发布的消息包含有关失败的更多信息。

我最好的选择似乎是暂时忽略第一个数据行,然后看看是否可以实现更统一的配置。当然,当该策略到位时,数据集是无效的。我应该补充一点,数据非常大,因此 ETL 例程需要尽可能高效。在我看来,如果有任何替代方案,这会禁止任何文件解析或条件分割。

问题是是否有办法配置文件源以继续解决此致命错误?

OrPresumably due to inconsistent configuration of logging devices, I need to load a collection of csv files via SSIS that will sometimes have a first row with column names and will sometimes not. The file format is otherwise identical.

There seems a chance that the logging configuration can be standardized, so I don't want to waste programming time with a script task that opens each file and determines whether it has a header row and then processes it differently depending.

Rather, I would like to specify something like Destination.MaxNumberOfErrors, that would allow up to one error row per file (so if the only problem in the file was the header, it would not fail). The Flat File Source error is fatal though, so I don't see a way of getting it to keep going.

The meaning of the failure code is defined by the component, but the
error is fatal and the pipeline stopped executing. There may be error
messages posted before this with more information about the failure.

My best choice seems to be to simply ignore the first data row for now and wait to see if a more uniform configuration can be achieved. Of course, the dataset is invalid while this strategy is in place. I should add that the data is very big, so the ETL routines need to be as efficient as possible. In my opinion this contraindicates any file parsing or conditional splitting if there is any alternative.

The question is if there is a way to configure the File Source to continue from this fatal error?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

伏妖词 2024-12-15 18:39:27

是的,有!

在编辑器的“错误输出”页面中,将每行的错误响应更改为“重定向行”。然后,您可以通过源的错误输出将问题行(在您的情况下为标题)作为单个列来捕获它们。

Yes there is!

In the "Error Output" page in the editor, change the Error response for each row to "Redirect row". Then you can trap the problem rows (the headers, in your case) by taking them as a single column through the error output of your source.

回忆那么伤 2024-12-15 18:39:27

如果您可以假设标头名称的值永远不会出现在数据中,则将平面文件连接管理器定义为没有标头。数据流中的第一步将检查第 1-N 列的值与标题行的值。仅当值不匹配时才让数据流过。

还有比这更复杂的问题吗?

If you can assume the values for header names would never appear in your data, then define your flat file connection manager as having no headers. The first step inside your data flow would check the values of column 1-N vs the header row values. Only let the data flow through if the values don't match.

Is there something more complex to the problem than that?

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文