当前位置：文江博客话题详情

如何从 DTS/SSIS 导入中删除控制字符？

发布于 2024-09-07 06:44:38 字数 289 浏览 5 评论 0原文

我工作的公司有很多系统，其中源数据来源于平面文件或 Excel 工作簿。然后使用 DTS 将这些文件导入到 SQL Server 数据库中。

这些文件经常包含各种控制字符（例如，我刚刚花了一个小时发现某些记录中有一个 \0 ，显然 SAP 喜欢用它来填充内容），理想情况下我们希望在它们进入我们的数据库之前将其删除。

我没有大量使用 DTS/SSIS，但我认为它们中一定已经有一些东西，或者至少是我应该使用的消毒最佳实践？

任何有关如何在许多导入中以可重复的方式执行此操作的想法将不胜感激。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

度的依靠╰つ 2024-09-14 06:44:38

我只能根据我使用 SSIS 的经验来发言，所以如果这只是一点用处，我深表歉意。

在一般卫生方面，我还没有遇到任何适合这项工作的专用组件。我认为这是因为不同的场景对“坏数据”的定义可能有很大不同，但可以说，诸如删除不可打印字符之类的事情可能是常见的业务需求，因此这有点令人惊讶。

当然，您始终可以依靠派生列转换，或者在更复杂的情况下，依靠脚本组件，但这些很难使跨包的逻辑复制变得轻松。在某些情况下，我们使用 CozyRoc 组件来创建可以在包之间共享的脚本，但除此之外由于首先必须实际访问这些组件，因此每次需要该功能时仍然需要进行一定级别的设置。

因此，在这种情况下，我个人认为最好的选择是将常见的卫生习惯合并到自定义数据流组件。我们为自己开发了一个来处理一些非常常见的数据验证逻辑，并设置组件编辑器，以便您可以选择复选框来启用您想要应用于给定列的任何检查。然后，只需拖放并连接数据流即可。至少对我们来说，花时间开发这个组件绝对是值得的。

我不知道 DTS 在这方面提供了什么（我认为它相当有限，或者至少不友好，但我可能是错的），所以这可能对您的旧包没有用处。我觉得凯德的解决方案在这种情况下可能是最没有压力的，假设它对于您的输入数据是可行的。

就一般 SSIS 见解而言，我找到了 Jamie Thomson 的博客（以前是此处）非常有帮助，因此您可能需要检查一下他之前是否讨论过这个特定主题。

无论如何，希望这至少有一点帮助，祝你好运找到解决方案。

附注：感谢您希望在将数据放入数据库之前对其进行清理。我们从一个永远不清理数据的特定供应商处获取数据，然后尝试将其放入 XML 中以通过 Web 服务将其发送给我们。不用说，XML 解析器不太喜欢不可打印的字符。

I can only speak coming from my experiences with SSIS, so I apologise if this is only marginally useful.

In terms of general sanitation, I haven't come across any sort of purpose-build component for this job. I assume that's due to the notion that different scenarios can have widely different definitions of "bad data", but arguably things like stripping out non-printable characters might be a common business requirement so this is somewhat surprising.

Naturally you can always fall back on the Derived Column Transformation or, in more complex cases, the Script Component, but these hardly make replication of the logic across packages painless. There are cases where we've used CozyRoc components to create scripts that we could share between packages, but in addition to having to actually have access to those components in the first place, it still involves a certain level of setup every time you need the functionality.

So, in cases like this, I personally feel like the best option is to consolidate your common sanitation routines into a Custom Data Flow Component. We developed one for ourselves to handle some very common data validation logic, and set the component editor up so that you could select checkboxes to enable whichever checks you wanted applied to a given column. Then it just becomes a matter of drag-and-drop and wiring up your data flow. At least for us, it was definitely worth taking the time to develop the component.

I don't know what DTS offers in this regard (I think it's fairly limited, or at least not friendly, but I could be wrong), so this is likely not useful for your legacy packages. I feel like Cade's solution would probably be most stress-free in that case, assuming it's feasible for your input data.

As far as general SSIS insight goes, I've found Jamie Thomson's blogs (previously here) very helpful, so you might want to check to see if he's discussed this particular topic before.

In any case, hopefully this was at least a little helpful, and good luck coming up with a solution.

On a side note: Kudos for wanting to sanitize your data before sticking it in your database. We get data from a particular vendor who doesn't ever clean their data, then tries to shove it in XML to send it to us via a web service. Needless to say, the XML parser isn't too fond of non-printable characters.

回复收藏 0 原文