Practical problems of data transformation in a data warehouse

Posted 2024-10-29 03:38:52


I need to explain the practical problems that might be encountered when transforming an organization's transactional (and other) data from its diverse sources into the data warehouse. As far as I know, this is about cleansing and scrubbing data. If anyone knows about any practical problems, please help me. Thanks for your help.


Comments (2)

左岸枫 2024-11-05 03:38:52


That's a broad topic, but I'll offer a few good starting points.

For starters, think about history. If a transaction updates some data point, do you need to apply that change retroactively, or do you need to remember what the value was at any given point in time? For example, suppose you have a monthly report of customers by city, and one of your customers moves. How should the DW reflect that?
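One common way to "remember what the value was" is what Kimball calls a Type 2 slowly changing dimension. Instead of overwriting the customer's city, you close the current row and append a new version with effective dates. The sketch below uses an in-memory list and hypothetical names (`CustomerRow`, `apply_move`, `city_as_of`). It illustrates the idea and is not a production loader.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerRow:
    customer_id: str          # natural (business) key from the source system
    city: str
    valid_from: date          # first day this version is in effect
    valid_to: Optional[date]  # exclusive end date; None means "still current"


def apply_move(dim: list[CustomerRow], customer_id: str, new_city: str, effective: date) -> None:
    """Type 2 update: expire the current version (if any) and append a new one."""
    for row in dim:
        if row.customer_id == customer_id and row.valid_to is None:
            if row.city == new_city:
                return  # no real change, keep the current version
            row.valid_to = effective
    dim.append(CustomerRow(customer_id, new_city, effective, None))


def city_as_of(dim: list[CustomerRow], customer_id: str, when: date) -> Optional[str]:
    """Return the city that was in effect for the customer on a given date."""
    for row in dim:
        if (row.customer_id == customer_id
                and row.valid_from <= when
                and (row.valid_to is None or when < row.valid_to)):
            return row.city
    return None
```

With this layout, a monthly "customers by city" report can pick the version that was valid at each month end. The customer is then counted in the old city before the move and in the new city after it. If the business instead wants history restated, a Type 1 update (simply overwrite `city`) is the simpler choice.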

Think about data acceptance. Is every input row a good input? For example, if you're dealing with web data, there are crawlers and spammers that you might not want to count the same as you count user traffic.
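A minimal sketch of an acceptance step for web data follows. Rows that fail a rule are kept with a reason instead of being silently dropped, so the rejections can be audited later. The bot pattern and the field names `url` and `user_agent` are assumptions for illustration. Real crawler detection usually needs more than user-agent matching, and a crude substring match can also flag legitimate agents.

```python
import re
from typing import Iterable

# Hypothetical user-agent fragments treated as non-human traffic.
BOT_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)


def classify_hits(rows: Iterable[dict]) -> tuple[list[dict], list[tuple[dict, str]]]:
    """Split raw web hits into accepted rows and (row, reason) rejections."""
    accepted: list[dict] = []
    rejected: list[tuple[dict, str]] = []
    for row in rows:
        agent = row.get("user_agent") or ""
        if not row.get("url"):
            rejected.append((row, "missing url"))
        elif BOT_PATTERN.search(agent):
            rejected.append((row, "bot user agent"))
        else:
            accepted.append(row)
    return accepted, rejected
```

Keeping the rejected rows, with their reasons, in a quarantine table lets you report how much traffic was excluded and why.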

Think about data synchronization. Do all your inputs use the same keys? Do you know how to translate between them? Does Team A mean the same thing by "cust_id" as Team B does? A project glossary is very helpful here.
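One way to "translate between" keys is a crosswalk table. It is keyed on the pair (source system, source key), so that `cust_id` 417 in one system is never confused with `cust_id` 417 in another. The table contents and the system names `billing` and `crm` below are hypothetical. In practice, building and maintaining the crosswalk is the hard part, and the lookup itself is trivial.

```python
from typing import Optional

# Hypothetical crosswalk: (source system, source key) -> warehouse customer key.
CROSSWALK: dict[tuple[str, str], int] = {
    ("billing", "000417"): 1,
    ("crm", "C-417"): 1,   # same real customer as billing 000417
    ("crm", "C-522"): 2,
}


def to_warehouse_key(source: str, source_key: str) -> Optional[int]:
    """Translate one source system's customer id into the shared warehouse key."""
    # Strip stray whitespace, a common formatting difference between feeds.
    return CROSSWALK.get((source, source_key.strip()))


def unmapped(source: str, keys: list[str]) -> list[str]:
    """List source keys that have no warehouse mapping yet."""
    return [k for k in keys if to_warehouse_key(source, k) is None]
```

The `unmapped` list is a natural input for the people who own the glossary: every key on it represents a question about what a source's identifier really means.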

Think about localization. Are your inputs all in the same time zone? Do they all use the same calendar system? Do you need to handle Unicode?
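Two of these questions can be handled in one small normalization step: convert every timestamp to UTC using the zone of the source that produced it, and normalize text to a single Unicode form (NFC here) so that visually identical strings compare equal. The source names and their offsets are hypothetical. Tokyo is used because Japan has no daylight saving time. For zones that do observe DST, `zoneinfo.ZoneInfo` is the better choice than a fixed offset. Calendar-system differences are not covered by this sketch.

```python
import unicodedata
from datetime import datetime, timedelta, timezone

# Hypothetical per-source settings: where each feed's clock lives.
SOURCE_TZ: dict[str, timezone] = {
    "store_tokyo": timezone(timedelta(hours=9)),  # JST, no DST
    "web": timezone.utc,
}


def to_utc(local_text: str, source: str) -> datetime:
    """Parse a naive local timestamp from one source and convert it to UTC."""
    naive = datetime.strptime(local_text, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=SOURCE_TZ[source]).astimezone(timezone.utc)


def normalize_text(value: str) -> str:
    """Apply Unicode NFC and trim whitespace so equal-looking strings match."""
    return unicodedata.normalize("NFC", value).strip()
```

Without the NFC step, "Café" typed with a precomposed "é" and "Café" typed as "e" plus a combining accent would appear as two different values in the DW.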

Think about reporting. Are the data you're capturing able to answer the questions people will ask of the DW? If not, how can you capture data that can?

Think about presentation. Should you be showing customers the same data you're using for internal reporting? Does finance need to see a different slice of the data than marketing?
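Separate slices per audience are usually implemented with database views or column-level permissions. The sketch below expresses the same idea in plain Python: one warehouse row is projected down to the columns a given audience should see. The audiences and column names are hypothetical.

```python
# Hypothetical audience definitions: which columns each consumer may see.
VIEWS: dict[str, tuple[str, ...]] = {
    "customer": ("order_id", "order_date", "total"),
    "marketing": ("order_id", "order_date", "region", "campaign"),
    "finance": ("order_id", "order_date", "total", "cost", "margin"),
}


def present(rows: list[dict], audience: str) -> list[dict]:
    """Project warehouse rows down to the columns one audience should see."""
    columns = VIEWS[audience]
    return [{c: row[c] for c in columns if c in row} for row in rows]
```

Declaring the slices in one place makes it easy to review what, for example, customers can see compared with finance.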

This really only scratches the surface of the issues that come up on a major DW project. I would refer you to Ralph Kimball's assorted books on data warehousing for a more in-depth discussion of problems and solutions. Hope this helps you get started.

极致的悲 2024-11-05 03:38:52


You give the answer in your question.

According to my knowledge this is about cleansing and scrubbing data.

And you are correct. Cleansing data means that you have a company-wide list of clean element attributes, and a mapping that changes the unclean elements into clean elements.
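Here is a minimal sketch of "a list of clean element attributes plus a mapping" for a single attribute, country. The clean list and the variant spellings are hypothetical. Values with no mapping come back as `None` so they can be routed to whoever owns the attribute, rather than guessed at.

```python
from typing import Optional

# Hypothetical company-wide clean values for one attribute ("country").
CLEAN_COUNTRIES = {"US", "DE", "GB"}

# Hypothetical agreed mapping from the variants sources actually send.
COUNTRY_MAP = {
    "usa": "US", "united states": "US", "u.s.": "US",
    "germany": "DE", "deutschland": "DE",
    "uk": "GB", "united kingdom": "GB", "great britain": "GB",
}


def cleanse_country(raw: str) -> Optional[str]:
    """Map a raw country value to its clean code, or None if nobody has mapped it."""
    value = raw.strip()
    if value.upper() in CLEAN_COUNTRIES:
        return value.upper()  # already a clean code, just normalize case
    return COUNTRY_MAP.get(value.casefold())
```

As the answer says, the function is the easy part. Agreeing on the contents of `CLEAN_COUNTRIES` and `COUNTRY_MAP` across departments is the hard part.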

Processing the data against the clean element attributes is a piece of cake compared to creating the company-wide list of clean element attributes.

You have to get people from different departments to agree on what data to warehouse, and to agree on what each element means. This is a difficult sociological problem. It's not a terribly hard technical problem.

Good luck getting your company-wide list of clean element attributes.
