Alternatives to in-memory DataSets
I am moving a complex process out of SQL and into a .NET application. I'm taking a brute-force approach: pulling down only the data that is needed from SQL, then storing it in DataTables. Using a pipeline pattern with stepping, I broke out the processes that can run in parallel (those that neither depend on other processes nor work on the same data).
Everything is going fine, but I want to know whether there is an in-memory SQL solution that would perform better than the DataSet/DataTable structures. We're talking about 50k rows at a time with up to 1m supporting data rows (read 5b rows). Row size for 1 row (with all supporting data rows) is probably around 1 KB on average (due to large strings).
My question is specifically about the performance of DataSets, their memory overhead, and persistence. I will need to serialize the data to disk at each stage for recovery purposes.
Would it be better to just map the rows to a strongly typed model instead? I don't need any relationships or other benefits of DataSets; I replaced most of the search functionality with my own using parallel processing.
The data uses only primitive types: no blobs, streams, geography, etc.
Comments (1)
For local SQL operations, take a look at SQLite. It can run entirely in memory (open the database with the special name `:memory:`), and even an on-disk database with disk caching and transaction journalling turned off would probably come pretty close.
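To make the SQLite suggestion concrete, here is a minimal sketch of an in-memory database that is snapshotted to disk for recovery. It assumes the Microsoft.Data.Sqlite NuGet package (other ADO.NET SQLite providers differ in detail). The `work_item` table, its columns, and the `InMemorySqliteSketch` class are hypothetical names chosen for illustration, not anything from the question.

```csharp
// Assumes the Microsoft.Data.Sqlite NuGet package is referenced.
using System;
using Microsoft.Data.Sqlite;

public static class InMemorySqliteSketch
{
    // Loads 1000 hypothetical rows into an in-memory database, counts the
    // even-numbered ones with SQL, and copies the database to a file.
    public static long Run(string checkpointPath)
    {
        // ":memory:" keeps the whole database in RAM; it is discarded
        // when this connection is closed.
        using var memory = new SqliteConnection("Data Source=:memory:");
        memory.Open();

        using (var create = memory.CreateCommand())
        {
            create.CommandText =
                "CREATE TABLE work_item (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)";
            create.ExecuteNonQuery();
        }

        // Batch the inserts in a single transaction and reuse one command.
        using (var tx = memory.BeginTransaction())
        using (var insert = memory.CreateCommand())
        {
            insert.Transaction = tx;
            insert.CommandText =
                "INSERT INTO work_item (id, payload) VALUES ($id, $payload)";
            var id = insert.Parameters.Add("$id", SqliteType.Integer);
            var payload = insert.Parameters.Add("$payload", SqliteType.Text);
            for (var i = 1; i <= 1000; i++)
            {
                id.Value = i;
                payload.Value = "row " + i;
                insert.ExecuteNonQuery();
            }
            tx.Commit();
        }

        long evenCount;
        using (var query = memory.CreateCommand())
        {
            query.CommandText = "SELECT COUNT(*) FROM work_item WHERE id % 2 = 0";
            evenCount = Convert.ToInt64(query.ExecuteScalar());
        }

        // Checkpoint: copy the in-memory database into a file on disk.
        using (var disk = new SqliteConnection("Data Source=" + checkpointPath))
        {
            disk.Open();
            memory.BackupDatabase(disk);
        }

        return evenCount;
    }
}
```

Calling `InMemorySqliteSketch.Run("stage1.db")` would leave a regular SQLite file behind that a later run could reopen. Whether this beats DataTables for the workload in the question would need to be measured.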
Steve Shaunessey developed a fast in-memory SQL engine at Borland many years ago. I don't know if it was ever productized. Take a look around the Embarcadero.com web site to see if any remnants of his work have survived.
I noticed aidaim.com advertises an in-memory SQL engine. No experience with it, just FYI.
Also consider LINQ for in-memory query operations. If you pay attention to what you're doing, LINQ's query composition and deferred execution work well with large data, IMO. And, no SQL parser required.
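As a rough sketch of the LINQ route combined with a strongly typed model, the example below composes a query that runs only when it is enumerated, then checkpoints the result to disk as JSON. The `WorkItem` shape, the `Stage`, `Save`, and `Load` helpers, and the choice of System.Text.Json are all assumptions for illustration; it also assumes .NET 5 or later for `record` types.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;

// Hypothetical row shape; the real columns are not given in the question.
public sealed record WorkItem(int Id, string Category, string Payload);

public static class LinqPipelineSketch
{
    // Composes a query without running it. Filtering and projection
    // happen row by row only when the caller enumerates the result.
    public static IEnumerable<WorkItem> Stage(IEnumerable<WorkItem> rows, string category) =>
        rows.Where(r => r.Category == category)
            .Select(r => r with { Payload = r.Payload.Trim() });

    // Checkpoint: materialize the stage output and write it as JSON.
    public static void Save(IEnumerable<WorkItem> rows, string path) =>
        File.WriteAllText(path, JsonSerializer.Serialize(rows.ToList()));

    // Recovery: read a checkpoint back into typed rows.
    public static List<WorkItem> Load(string path) =>
        JsonSerializer.Deserialize<List<WorkItem>>(File.ReadAllText(path))
        ?? new List<WorkItem>();
}
```

A stage could then be run as `LinqPipelineSketch.Save(LinqPipelineSketch.Stage(rows, "a"), "stage1.json")`. Because the query is deferred, rows added to the source list before enumeration are still picked up. For very large stages, a streaming writer or a binary format might be a better fit than a single JSON string.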