.NET ETL Process
First, some background: we are developing a data warehouse and researching which tools to use for our ETL process. The team is very developer-centric, and everyone is proficient in C#. So far I have looked at RhinoETL, Pentaho (Kettle), and Astrix Centerprise. SSIS is out for a number of reasons that are outside the scope of this question.
At this point I am leaning towards something more developer-oriented like RhinoETL, because it seems like the path of least resistance for a group of developers. Do the more visual, designer-oriented products bring anything to the table that RhinoETL doesn't? Are there specific things I should pay attention to when evaluating these ETL tools? Are there any other tools we should investigate?
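To make "developer-oriented" concrete, here is a minimal sketch of a code-first ETL pipeline in which each step is an ordinary function that lazily passes rows to the next one. It is loosely modeled on the row-pipeline idea behind code-first tools such as RhinoETL, but it is not RhinoETL's API: it is written in Python for brevity, and every name in it (`extract_rows`, `uppercase_field`, `load_into`, `run_pipeline`) is hypothetical, with in-memory lists standing in for real sources and targets.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# A row is a plain dictionary of column name -> value (an assumption for this sketch).
Row = Dict[str, object]
# An operation takes the upstream rows and lazily yields its own output rows.
Operation = Callable[[Iterable[Row]], Iterator[Row]]


def extract_rows(source: List[Row]) -> Operation:
    """Hypothetical extract step: yields copies of rows from an in-memory source."""
    def op(_upstream: Iterable[Row]) -> Iterator[Row]:
        for row in source:
            yield dict(row)  # copy so later steps cannot mutate the source
    return op


def uppercase_field(column: str) -> Operation:
    """Hypothetical transform step: upper-cases one string column."""
    def op(upstream: Iterable[Row]) -> Iterator[Row]:
        for row in upstream:
            row[column] = str(row[column]).upper()
            yield row
    return op


def load_into(target: List[Row]) -> Operation:
    """Hypothetical load step: appends each row to an in-memory target."""
    def op(upstream: Iterable[Row]) -> Iterator[Row]:
        for row in upstream:
            target.append(row)
            yield row
    return op


def run_pipeline(operations: List[Operation]) -> int:
    """Chains the operations lazily and returns the number of rows that reached the end."""
    rows: Iterable[Row] = iter(())
    for op in operations:
        rows = op(rows)
    return sum(1 for _ in rows)  # draining the last step pulls every row through the chain


source = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
warehouse: List[Row] = []
processed = run_pipeline([extract_rows(source), uppercase_field("name"), load_into(warehouse)])
```

Because each step is plain code, it can be unit-tested, code-reviewed, and diffed in source control like the rest of the team's C# work, which is the main appeal of the code-first approach over a visual designer.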
Answers (2)
I know this is a late answer, but since I needed a proper ETL tool with all the SSIS features in a 100% .NET environment, I ended up developing my own.
Admittedly, its performance is not as good as that of SSIS. If you need maximum performance to integrate and transform huge volumes of data, I believe you should still use SSIS.
The main thing I needed, which no other ETL-like tool such as RhinoETL provides, is a proper tracing system: one that records every single detail and makes those traces easy to manipulate and persist when necessary. I built many out-of-the-box adapters for the file system, FTP, SFTP, XML, CSV, Entity Framework Core, and bulk loading. I also wrote a visual tool for viewing the structure of the transformation process.
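The tracing idea can be illustrated by wrapping every pipeline step so that it records how many rows it emitted, how long it spent, and a few sample rows. This is a minimal sketch, not the answerer's tool: it is written in Python, and `StepTrace`, `traced`, and the two example steps are hypothetical names. Note that the recorded time is inclusive, because pulling a row from a step also runs the upstream steps it reads from.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Operation = Callable[[Iterable[Row]], Iterator[Row]]


@dataclass
class StepTrace:
    """Illustrative per-step trace record (field names are hypothetical)."""
    step: str
    rows_out: int = 0
    seconds: float = 0.0
    samples: List[Row] = field(default_factory=list)


def traced(name: str, op: Operation, traces: List[StepTrace], max_samples: int = 3) -> Operation:
    """Wraps `op` so each emitted row is counted, timed, and (for the first few) sampled."""
    trace = StepTrace(step=name)
    traces.append(trace)  # registered at wrap time, so traces follow pipeline order

    def wrapper(upstream: Iterable[Row]) -> Iterator[Row]:
        inner = iter(op(upstream))
        while True:
            start = time.perf_counter()
            try:
                row = next(inner)
            except StopIteration:
                trace.seconds += time.perf_counter() - start
                return
            # Inclusive time: also covers upstream steps that `op` pulls rows from.
            trace.seconds += time.perf_counter() - start
            trace.rows_out += 1
            if len(trace.samples) < max_samples:
                trace.samples.append(dict(row))  # snapshot, so later mutation is not recorded
            yield row

    return wrapper


def extract_all(_upstream: Iterable[Row]) -> Iterator[Row]:
    """Hypothetical extract step producing ten rows."""
    for i in range(10):
        yield {"id": i}


def keep_even_ids(upstream: Iterable[Row]) -> Iterator[Row]:
    """Hypothetical filter step."""
    for row in upstream:
        if int(row["id"]) % 2 == 0:
            yield row


traces: List[StepTrace] = []
rows: Iterable[Row] = iter(())
for name, op in [("extract", extract_all), ("keep_even_ids", keep_even_ids)]:
    rows = traced(name, op, traces)(rows)
result = list(rows)
```

After the run, `traces` holds one record per step in pipeline order, which could then be logged or persisted; in this example the extract step reports 10 rows and the filter step reports 5.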
It has taken me 10 months so far, and I have open-sourced it. It still lacks a lot of documentation (a huge amount of work), and I must add a much bigger set of unit tests (also a huge amount of work) before I can decently release it as a beta. Even though it is still an alpha, it is the foundation of all the ETL processes at my company, and it works like a charm!
Recently a coworker and I did some simple performance testing of RhinoETL against SSIS. For simple data flows, SSIS consistently outperformed RhinoETL (it moved 2,000,000 records about 30% faster). On the other hand, if you use source control (in our case TFS), you cannot easily view the differences between versions of .dtsx files (SSIS packages), whereas developing with RhinoETL lets you use those TFS features as you would for any other source code.
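When repeating a comparison like this, it helps to pin down how "faster" is computed, since "30% faster" can mean 30% more throughput or 30% less wall-clock time, and the answer does not say which. The sketch below is a hypothetical Python harness: it takes the best of several runs of a workload, reports rows per second, and computes both percentages explicitly. `copy_rows` is only a stand-in workload; timing an external tool such as an SSIS package would mean wrapping its execution in the `process` callable, which is an assumption not shown here.

```python
import time
from typing import Callable


def measure_throughput(process: Callable[[int], int], row_count: int, repeats: int = 3) -> float:
    """Runs `process` `repeats` times and returns the best observed rows per second."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        processed = process(row_count)
        elapsed = time.perf_counter() - start
        if processed != row_count:
            raise ValueError(f"expected {row_count} rows, got {processed}")
        best = min(best, elapsed)
    return row_count / max(best, 1e-9)  # guard against a zero reading on a coarse timer


def percent_faster(fast_seconds: float, slow_seconds: float) -> float:
    """Throughput gain of the faster run, in percent: (slow / fast - 1) * 100."""
    return (slow_seconds / fast_seconds - 1.0) * 100.0


def percent_less_time(fast_seconds: float, slow_seconds: float) -> float:
    """Wall-clock reduction of the faster run, in percent: (1 - fast / slow) * 100."""
    return (1.0 - fast_seconds / slow_seconds) * 100.0


def copy_rows(row_count: int) -> int:
    """Stand-in workload: builds rows and copies them to an in-memory target."""
    source = [{"id": i, "value": i * 2} for i in range(row_count)]
    target = [dict(row) for row in source]
    return len(target)


rate = measure_throughput(copy_rows, 100_000)
```

With hypothetical durations of 100 s and 130 s, `percent_faster(100, 130)` is 30 (up to floating-point rounding), while `percent_less_time(100, 130)` is only about 23.1, so the two readings of "30% faster" differ noticeably.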
RhinoETL has another advantage if you develop a user interface on top of your data warehouse: you can share code between the two programs.
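As an illustration of that code sharing, the sketch below defines one business rule and reuses it in both an ETL transform step and a UI form validator, so the two can never disagree. In a .NET solution the shared part would usually live in a class library referenced by both projects; here it is shown in Python, and the rule and all function names (`normalize_country_code`, `clean_country`, `validate_country_field`) are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, Optional


# --- Shared module (hypothetical): the single definition of the business rule ---
def normalize_country_code(raw: str) -> Optional[str]:
    """Returns a two-letter upper-case country code, or None if the input is invalid."""
    code = raw.strip().upper()
    return code if re.fullmatch(r"[A-Z]{2}", code) else None


# --- ETL side: a transform step that applies the shared rule to each row ---
def clean_country(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    for row in rows:
        code = normalize_country_code(row["country"])
        if code is not None:  # rows with invalid codes are dropped in this sketch
            yield {**row, "country": code}


# --- UI side: form validation that reuses exactly the same rule ---
def validate_country_field(value: str) -> str:
    """Returns an error message for the UI, or an empty string when the value is valid."""
    return "" if normalize_country_code(value) else "Enter a two-letter country code."
```

Dropping invalid rows is just one policy choice for the sketch; a real pipeline might instead route them to an error table, but either way the validity check itself comes from the shared function.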
Although several members of our SSIS team come from .NET backgrounds, our management decided to continue developing with SSIS (though they upgraded to SSIS 2008, which is another topic altogether), because they felt it was easier to have a developer learn SSIS than .NET.