Character-level profiling in SSIS or SQL Server
I need to profile reference fields in a database to understand the patterns they are composed of. This needs to be done at a character level as there will be no spaces or punctuation in the reference fields.
As an example I'm looking for a solution that will take input like:
ABA1235DV6778
ABA1235DV6788
ABA2335DV6778
And suggest patterns like:
ABA\d\d35DV67\d\d
This will be used later to validate those reference fields, once I understand the permissible values in those columns.
I have looked at the profiling functionality in SSIS but it seems to lack granularity. Does anybody know how I can tune the profiling in SSIS 2008 or have an efficient function for SQL Server 2008 that can be used to achieve this?
Any help would be greatly appreciated,
Niall
It's not really clear from your post exactly what logic you want to apply to the strings. I'm guessing you want to use some form of edit distance calculation to identify similar strings, then generate a regular expression (http://www.regular-expressions.info/regexmagic.html) that matches them all. Those are typically tasks that would be implemented in an external program written in an appropriate language, not in SSIS or SQL Server. It is certainly not something you can do with pre-existing SSIS functionality.
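To make the two steps concrete, here is a minimal sketch in Python (chosen only for brevity; the same logic ports to .NET). It is one possible reading of the question, not a definitive method: `edit_distance` is a plain Levenshtein distance, and `suggest_pattern` is a hypothetical helper that assumes all values in a group have the same length and generalises each character position independently. Both function names and the choice of character classes (`\d`, `[A-Z]`, `.`) are assumptions made for illustration.

```python
import re
import string
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def _generalise(chars: set) -> str:
    """Map the set of characters seen at one position to a regex fragment."""
    if len(chars) == 1:
        return re.escape(next(iter(chars)))       # constant position
    if all(c in string.digits for c in chars):
        return r"\d"                              # varying digits
    if all(c in string.ascii_uppercase for c in chars):
        return "[A-Z]"                            # varying uppercase letters
    return "."                                    # mixed: fall back to any char


def suggest_pattern(values: Iterable[str]) -> str:
    """Suggest one regex for a group of equal-length strings (assumption)."""
    items: List[str] = list(values)
    if not items:
        raise ValueError("no values supplied")
    if len({len(v) for v in items}) != 1:
        raise ValueError("this sketch only handles equal-length values")
    # zip(*items) yields one tuple of characters per position.
    return "".join(_generalise(set(column)) for column in zip(*items))


if __name__ == "__main__":
    refs = ["ABA1235DV6778", "ABA1235DV6788", "ABA2335DV6778"]
    print(edit_distance(refs[0], refs[1]))  # 1
    print(suggest_pattern(refs))            # ABA\d\d35DV67\d8
```

Note that on the three sample values this produces `ABA\d\d35DV67\d8` rather than `ABA\d\d35DV67\d\d`, because the final character is `8` in every sample; a rule that widens constant digits, or simply more data, would be needed to arrive at the broader pattern. The edit distance could be used beforehand to group similar values so that each group gets its own pattern.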
So I would forget SSIS for now and work out the best way to implement your algorithm in .NET (or whatever other language you're comfortable with). Once you've done that you can decide whether to: