Data streaming with Entity Framework
I'm designing an ELT system for a data warehouse and am wondering what the most efficient, yet still (in some sense) safe, way of extracting data from the source database would be.
I need to read a couple of tables from the source database and organize them into POCO objects that I can work with effectively. These roughly correspond to the dimensions of my cube. To get the facts into my cube, I need to bulk load huge amounts of data from other tables, apply some (non-trivial) transformations to them, and write them into a table in the target database.
Although in principle I would only benefit from a small subset of O/RM features, I'm still wondering whether Entity Framework could be an option. My question is therefore whether EF (in its newest version) can handle streaming data. By that I mean that I keep some kind of DataReader open, load a couple of POCOs, transform them, write the results into the second database, dispose of them all as soon as I can (I cannot keep them all in memory because it would blow up), and continue reading until I'm done.
I obviously don't need any change management for these objects, and I want to keep them (at least the second category, the facts) alive only for a short time and dispose of them while still in the same transaction. For me, disposing means not only that I get rid of the POCOs, but also that EF keeps no infrastructure for them and does not waste even a single byte of memory on any of those objects anymore.
The advantage I see in using an O/RM is that it could simplify querying and transformation to some extent, but I'm not willing to sacrifice too much performance, and I'm limited in the overall amount of memory I can consume. Does it make sense to go for EF, or should I stick with a plain old ADO.NET DataReader?
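The read, transform, write loop described above does not depend on any particular O/RM. As a minimal sketch of the pattern only, here it is in Python with in-memory `sqlite3` databases standing in for the source and target; it is not an Entity Framework or ADO.NET implementation. The table names (`sales`, `fact_revenue`), the columns, the `Fact` type and the transformation are all hypothetical, since the question gives no schema.

```python
import sqlite3
from typing import Iterator, NamedTuple


class Fact(NamedTuple):
    # Hypothetical fact row; the real schema is not given in the question.
    product_id: int
    quantity: int
    unit_price: float


def stream_facts(conn: sqlite3.Connection, chunk_size: int) -> Iterator[Fact]:
    """Yield source rows one by one, fetching at most chunk_size rows at a time."""
    cursor = conn.execute("SELECT product_id, quantity, unit_price FROM sales")
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        for row in rows:
            yield Fact(*row)


def transform(fact: Fact, dimension: dict) -> tuple:
    """Placeholder transformation: resolve the dimension key, compute revenue."""
    return dimension[fact.product_id], fact.quantity * fact.unit_price


def run_elt(source, target, dimension, batch_size=1000) -> int:
    """Stream, transform and write in batches inside one target transaction."""
    written = 0
    batch = []
    with target:  # commits on success, rolls back if an exception escapes
        for fact in stream_facts(source, batch_size):
            batch.append(transform(fact, dimension))
            if len(batch) >= batch_size:
                target.executemany("INSERT INTO fact_revenue VALUES (?, ?)", batch)
                written += len(batch)
                batch.clear()  # drop references so the rows can be collected
        if batch:  # flush the final, partially filled batch
            target.executemany("INSERT INTO fact_revenue VALUES (?, ?)", batch)
            written += len(batch)
    return written


def demo():
    """Build tiny in-memory source and target databases and run the loop."""
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE sales (product_id INTEGER, quantity INTEGER, unit_price REAL)"
    )
    source.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(1, 2, 2.5), (2, 1, 10.0), (1, 4, 0.5)],
    )
    source.commit()
    target.execute("CREATE TABLE fact_revenue (product_name TEXT, revenue REAL)")
    dimension = {1: "widget", 2: "gadget"}  # small dimension kept in memory
    written = run_elt(source, target, dimension, batch_size=2)
    rows = target.execute(
        "SELECT product_name, revenue FROM fact_revenue ORDER BY rowid"
    ).fetchall()
    return written, rows
```

Only the small dimension lookup and at most one batch of transformed rows are held in memory at any time, which is the property the question asks for. In .NET the same shape would presumably use a forward-only data reader on the source side and batched or bulk inserts on the target side.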
Comments (1)
Use BLToolkit. We do that, and it works very nicely. It has ONLY the small subset of features that is good for ETL, such as not remembering which objects it loaded in a transaction.
If you use EF, you are dead. ORMs are NOT for data loads, they are for business objects. A lot of the higher-level features (uniquing, etc.) come with a HUGE price the moment you move 10 million objects ;)