Building a flat file from many other tables in a SQL Server environment

Posted on 2025-01-07 22:37:27

Sorry if this is the wrong place, but I'm not sure what Stack Exchange site is suitable for these types of questions.

I have been given the task of building a pool of addresses for direct marketing. I have to:

  • Merge different address lists (all of which are already in the database)
  • Correct names based on some rules (e.g. if a name looks like "John & Mary Doe", I need to split it into "John Doe" and "Mary Doe")
  • Deduplicate records
  • Find customer numbers by joining with our customer database on names and addresses
  • Join with several tables that supply customer information (age, gender, customer value, which products he/she has already bought from us, when he/she was last contacted)
  • Join with several tables that supply regional information (how many products we have sold in that region, what the sociodemographic structure is, what the average income of the region is)
  • Construct a flat table with all of this information and make it available in the same database (no input/output from/to other systems or flat files)
  • Schedule it to run every night
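
To make the name-correction rule in the list above concrete, here is a rough sketch in Python. It is only an illustration, not part of the original question: the function name `split_joint_name` is hypothetical, and it assumes the last word is a surname shared by everyone joined with "&" or "and". Names that don't fit that pattern are returned unchanged.

```python
import re
from typing import List


def split_joint_name(name: str) -> List[str]:
    """Split a joint name such as 'John & Mary Doe' into one name per person.

    Assumption (hypothetical rule): first names are joined by '&' or 'and',
    and the final word is the surname shared by all of them.
    """
    cleaned = name.strip()
    # Split on '&' or the standalone word 'and', swallowing surrounding spaces.
    parts = re.split(r"\s*(?:&|\band\b)\s*", cleaned)
    if len(parts) < 2 or not all(parts):
        return [cleaned]  # not a joint name, leave it alone

    *first_names, last_part = parts
    tokens = last_part.split()
    if len(tokens) < 2:
        return [cleaned]  # no shared surname to distribute

    surname = tokens[-1]
    first_names.append(" ".join(tokens[:-1]))
    return [f"{first} {surname}" for first in first_names]


people = split_joint_name("John & Mary Doe")
```

Here `people` holds one entry per person, which could then be written back as separate rows. The same rule would need to be re-expressed in whichever tool ends up doing the work.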

So now I am considering what the best way of achieving this is. I can think of the following alternatives.

  1. Just write a big SQL batch with UNIONs, JOINs and SELECT INTOs
    + straightforward
    + flexible
    - hard to read
    - hard to maintain
    - slow if I create intermediate tables between every step

  2. SQL Server Integration Services
    + easy to read and maintain
    + good error handling
    + self-documentation
    + visual
    - I need to learn it
    - not sure if it's flexible enough when it comes to window functions (ranking, row number, etc.)
    - not sure if it would be faster or slower than SQL
    - AFAIK no way to create SQL from data flow (just in case)

  3. SQL Server Analysis Services
    - I don't even know how it differs from SSIS

  4. Some other way??
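
For reference, the kind of window-function step mentioned above (row numbers for deduplication) might look roughly like this in plain SQL. This is a minimal sketch, run against SQLite through Python's `sqlite3` module only so that it is self-contained; the table `address_pool` and its columns are hypothetical. The `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)` form inside a CTE is also valid T-SQL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address_pool (
    id        INTEGER PRIMARY KEY,
    full_name TEXT,
    street    TEXT,
    zip       TEXT,
    source    TEXT
);
INSERT INTO address_pool (full_name, street, zip, source) VALUES
    ('John Doe', '1 Main St', '12345', 'list_a'),
    ('John Doe', '1 Main St', '12345', 'list_b'),
    ('Mary Doe', '1 Main St', '12345', 'list_a');
""")

# Number the rows within each (name, street, zip) group and keep only the first.
DEDUP_SQL = """
WITH ranked AS (
    SELECT id, full_name, street, zip, source,
           ROW_NUMBER() OVER (
               PARTITION BY full_name, street, zip
               ORDER BY id
           ) AS rn
    FROM address_pool
)
SELECT full_name, street, zip, source
FROM ranked
WHERE rn = 1
ORDER BY id;
"""

deduped = conn.execute(DEDUP_SQL).fetchall()
```

Which row survives in each group is decided by the `ORDER BY` inside the window; ordering by `id` here is an arbitrary choice for the sketch.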

Thanks

Comments (1)

你的他你的她 2025-01-14 22:37:27

SSIS, as the name says, is useful for data integration/transformation from many different sources. If you are doing everything on SQL Server, use stored procedures; that should be enough for your needs. Temporary tables will be very useful here.
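
A staged approach with temporary tables could be sketched along these lines. This is only an illustration under stated assumptions: it runs against SQLite through Python's `sqlite3` module so that it is self-contained, and every table and column name (`list_a`, `list_b`, `customers`, `regions`, `flat_pool`) is hypothetical. On SQL Server the same steps would live in a stored procedure using `#temp` tables.

```python
import sqlite3


def build_flat_table(conn: sqlite3.Connection) -> None:
    """Rebuild the flat table in stages, one temporary table per step."""
    conn.executescript("""
    -- Step 1: merge the address lists into a temporary staging table.
    DROP TABLE IF EXISTS temp.stage_merged;
    CREATE TEMP TABLE stage_merged AS
        SELECT full_name, zip FROM list_a
        UNION   -- UNION (unlike UNION ALL) also drops exact duplicates
        SELECT full_name, zip FROM list_b;

    -- Step 2: enrich with customer and regional data, then publish.
    DROP TABLE IF EXISTS flat_pool;
    CREATE TABLE flat_pool AS
        SELECT m.full_name, m.zip, c.customer_no, r.avg_income
        FROM stage_merged AS m
        LEFT JOIN customers AS c
               ON c.full_name = m.full_name AND c.zip = m.zip
        LEFT JOIN regions AS r
               ON r.zip = m.zip;
    """)


# Tiny made-up data set to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE list_a (full_name TEXT, zip TEXT);
CREATE TABLE list_b (full_name TEXT, zip TEXT);
CREATE TABLE customers (customer_no INTEGER, full_name TEXT, zip TEXT);
CREATE TABLE regions (zip TEXT, avg_income INTEGER);
INSERT INTO list_a VALUES ('John Doe', '12345'), ('Mary Doe', '12345');
INSERT INTO list_b VALUES ('John Doe', '12345'), ('Ann Lee', '99999');
INSERT INTO customers VALUES (1001, 'John Doe', '12345');
INSERT INTO regions VALUES ('12345', 52000);
""")

build_flat_table(conn)
rows = conn.execute(
    "SELECT full_name, zip, customer_no, avg_income "
    "FROM flat_pool ORDER BY full_name"
).fetchall()
```

Because each step drops and recreates its output, the whole routine can be rerun every night without manual cleanup.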

On the other hand...

SSIS is a very powerful tool because all data flow work is done in memory, so it is quick, but you need to watch your memory consumption. How much that matters depends on the amount of data, but you also need to avoid asynchronous operations (grouping, sorting) here, because they can easily fill your memory. If there are a lot of transformations, updates, calculations, etc. in the data flow, SSIS can be even faster.

But...

You have a third option here: combine the two using the "Data Flow" and "Execute SQL Task" components.
