Best way to build a DataMart from multiple external systems?

Posted 2024-09-27 06:53:43 · 275 words · 11 views · 0 comments

I'm in the planning stages of building a SQL Server DataMart for mail/email/SMS contact info and history. Each piece of data is located in a different external system. Because of this, email addresses do not have account numbers and SMS phone numbers do not have email addresses, etc. In other words, there isn't a shared primary key. Some data overlaps, but there isn't much I can do except keep the most complete version when duplicates arise.

Is there a best practice for building a DataMart with this data? Would it be an acceptable practice to create a key table with a column for each external key? Then, a unique primary ID can be assigned to tie this to other DataMart tables.
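For illustration, here is a minimal sketch of the key table described above, using Python's built-in `sqlite3` as a stand-in for SQL Server. The table names (`contact_key`, `email_history`), the column names, and the helper `get_or_create_contact` are all hypothetical, chosen only to show the shape of the idea.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the SQL Server data mart.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contact_key (
    contact_id     INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key owned by the mart
    account_number TEXT UNIQUE,  -- key from the (hypothetical) mail system
    email_address  TEXT UNIQUE,  -- key from the (hypothetical) email system
    sms_phone      TEXT UNIQUE   -- key from the (hypothetical) SMS system
);
CREATE TABLE email_history (
    contact_id INTEGER NOT NULL REFERENCES contact_key(contact_id),
    sent_at    TEXT NOT NULL,
    subject    TEXT
);
""")

KEY_COLUMNS = {"account_number", "email_address", "sms_phone"}

def get_or_create_contact(conn, column, value):
    """Return the surrogate ID for an external key, creating a row if needed."""
    if column not in KEY_COLUMNS:
        raise ValueError(f"unknown key column: {column}")
    # The column name is checked against a whitelist above, so the f-string is safe.
    row = conn.execute(
        f"SELECT contact_id FROM contact_key WHERE {column} = ?", (value,)
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        f"INSERT INTO contact_key ({column}) VALUES (?)", (value,)
    )
    return cur.lastrowid

# The same external key always maps to the same surrogate ID.
a = get_or_create_contact(conn, "email_address", "pat@example.com")
b = get_or_create_contact(conn, "sms_phone", "+15550100")
c = get_or_create_contact(conn, "email_address", "pat@example.com")
```

One caveat when porting this: SQLite allows several NULLs in a `UNIQUE` column, whereas a SQL Server unique constraint allows only one, so a filtered unique index (`WHERE column IS NOT NULL`) would probably be needed there.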

Looking for ideas/suggestions on approaches I may not have yet thought of.

Thanks.

Comments (3)

吾性傲以野 2024-10-04 06:53:43

The email address or phone number itself sounds like a suitable business key. Typically a "staging" database is used to load the data from multiple sources and then assign surrogate keys and do other transformations.
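A rough sketch of that staging step in plain Python, assuming the email address is the business key: rows from several sources are normalized, the most complete version of each duplicate is kept, and a surrogate key is assigned. The field names (`email`, `name`, `city`, `source`), the sample rows, and the "most non-empty fields wins" rule are assumptions for illustration only.

```python
from itertools import count

def normalize_email(value):
    """Normalize the business key so trivially different spellings match."""
    return value.strip().lower()

def completeness(record):
    """Number of populated fields; used to pick the 'most complete' duplicate."""
    return sum(1 for v in record.values() if v not in (None, ""))

def build_dimension(staged_rows):
    """Collapse staged rows by business key, then assign surrogate keys."""
    best = {}
    for row in staged_rows:
        key = normalize_email(row["email"])
        candidate = {**row, "email": key}
        # Keep the candidate only if it is more complete than what we have.
        if key not in best or completeness(candidate) > completeness(best[key]):
            best[key] = candidate
    ids = count(1)
    return [{"contact_sk": next(ids), **rec} for rec in best.values()]

# Hypothetical staged rows pulled from two source systems.
staged = [
    {"email": "Pat@Example.com ", "name": None, "city": None, "source": "email_system"},
    {"email": "pat@example.com", "name": "Pat", "city": "Leeds", "source": "crm_export"},
    {"email": "sam@example.com", "name": "Sam", "city": None, "source": "email_system"},
]
dim = build_dimension(staged)
```

In a real load this logic would more likely live in the staging database itself (for example as a set-based `MERGE` or a windowed query), but the order of operations is the same: normalize, deduplicate, then assign keys.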

Are you familiar with data warehouse methods and design patterns? If you don't have previous knowledge or experience then consider hiring some help. BI / data warehouse projects have a very high failure rate and mistakes can be expensive.

飘落散花 2024-10-04 06:53:43

Well, with no other information to tie the disparate pieces together, your datamart is going to be pretty rudimentary. You'll be able to get the types of data (SMS, email, mail) and metrics for each type over time ("this week/month/quarter/year we averaged 42.5 SMS texts per day, and 8000 emails per month! w00t!"). With just phone numbers and email addresses, your "other datamarts" will likely have to be phone company names or internet domains. I guess you could link from that into some sort of geographical information (internet provider locations?), or maybe financial information for the companies. It's all kind of a blur if you don't already know which direction you want to head.
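To make the "metrics for each type over time" point concrete, here is a small sketch that counts messages per channel per month. The `(channel, date)` event shape and the sample values are made up for the example.

```python
from collections import Counter
from datetime import date

def monthly_counts(events):
    """Count events per (channel, 'YYYY-MM') from (channel, date) pairs."""
    return Counter((channel, d.strftime("%Y-%m")) for channel, d in events)

# Hypothetical contact history rows.
events = [
    ("sms", date(2024, 9, 1)),
    ("sms", date(2024, 9, 2)),
    ("email", date(2024, 9, 2)),
    ("sms", date(2024, 10, 1)),
]
counts = monthly_counts(events)
```

Without a shared key linking the channels, per-channel counts like these are about as far as the analysis can go.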

To be honest, this sounds like someone high-up is having a knee-jerk reaction to the "datamart" buzzword coupled with hearing something about how important communication metrics are, so they sent orders on down the chain to "get us some datamarts to run stats on all our e-mails!"

You need to figure out what it is that you or your employer is expecting to get out of this project, and then figure out if the data you're currently collecting gives you a trail to follow to that information. Right now it sounds like you're doing it backwards ("I have this data, what's it good for?"). It's entirely possible that you don't currently have the data you need, which means you'll need to buy it (who knows if you could) or start collecting it, in which case you won't have nice looking graphs and trend-lines for upper-management to look at for some time... falling right in line with the warning dportas gave you in his second paragraph ;)
