Is there a resource for getting some real-world ETL examples?

Posted 2024-10-31 00:51:12 · 589 characters · 1 view · 0 comments

I'm fully convinced that a significant part of the work I'm doing falls into the domain of ETL, but I didn't even know the term existed until 3 months ago. I've found SSIS to be a bit of a mismatch for my skill set, i.e. my instincts are that writing C# code in a well-thought-out way will give me the result I need (also, my employer doesn't own SSIS). I started looking at WF because it seemed logical, but I came back to my original conclusion: I really need to understand the fundamentals of the problem domain, and once I do, it will make the most sense to leverage my experience and code the solution in .NET/C# (I'm a one-man team and that doesn't seem to be changing). So far I have a sort of hodge-podge of syncher utilities, and it was the difficulty that began arising in managing them all that led me to seek out this knowledge.

QUESTION 1 is: is there a resource for me to get some examples of how it's all put together for things like:

  • extracting from REST services with usage limits --> loading to databases, for synchronization that is as close to real time as possible
  • extracting from in-house 3rd party apps like QuickBooks --> loading to databases
  • monitoring for changes in database and updating external systems in carefully tracked batches (i.e. the same information that was extracted is changed by an LOB app and then needs to be pushed back)

QUESTION 2 is: I've yet to grasp where the T part will come into play. Thus far I've been pulling the information that represents logical entities in one system and pushing them into another.

Comments (2)

输什么也不输骨气 2024-11-07 00:51:12

I don't have any examples of the exact scenarios you're looking at, but if you want to learn more about ETL itself, you can try taking a look at the articles on Ayende's site. He has an extremely easy-to-use framework for ETL processes called Rhino ETL, and a video showing how to use it.

As for where the T part comes into play, the T stands for Transform. This is the step in the process where you can (but do not necessarily have to) change the shape of the data. After Extracting from one data source, you can add or remove fields, aggregate information, break objects up into tables, map tables into objects, etc. You then proceed to Load the data into the new data store or system.
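To make the transform step concrete, here is a minimal sketch in Python (chosen for brevity; the same shape carries over to C#). Everything in it is an assumption for illustration: the `extracted` records, their field names, and the `transform` function are hypothetical and do not come from Rhino ETL or any particular system. It drops a field, breaks each nested object up into two flat "tables", and aggregates a total per customer.

```python
from collections import defaultdict

# Hypothetical records, shaped the way an extract step might return them.
extracted = [
    {"id": 1, "customer": "Acme", "internal_note": "x",
     "lines": [{"sku": "A", "qty": 2, "price": 5.0},
               {"sku": "B", "qty": 1, "price": 3.5}]},
    {"id": 2, "customer": "Acme", "internal_note": "y",
     "lines": [{"sku": "A", "qty": 1, "price": 5.0}]},
]


def transform(orders):
    """Reshape nested order objects into flat rows plus a per-customer total."""
    order_rows = []              # one row per order (target "orders" table)
    line_rows = []               # one row per order line (target "order_lines" table)
    totals = defaultdict(float)  # aggregated amount per customer

    for order in orders:
        order_total = sum(line["qty"] * line["price"] for line in order["lines"])
        # Remove a field (internal_note is not copied) and add a computed one (total).
        order_rows.append({
            "order_id": order["id"],
            "customer": order["customer"],
            "total": order_total,
        })
        # Break the nested object up into rows keyed back to the parent order.
        for line in order["lines"]:
            line_rows.append({
                "order_id": order["id"],
                "sku": line["sku"],
                "qty": line["qty"],
                "price": line["price"],
            })
        totals[order["customer"]] += order_total

    return order_rows, line_rows, dict(totals)


order_rows, line_rows, totals = transform(extracted)
```

If the source and target systems already share the same shape, this step can be a straight pass-through, which is likely why it has not been noticeable so far in a pure sync scenario.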

Hope that helps some.
