A simple bulk data persistence framework

Posted on 2024-10-04 19:26:00

Is there an ACID framework for bulk data persistence that would also allow some basic search capabilities? I am not looking for a full-blown DBMS, but rather something fast, light and simple. Even something that just took care of atomic commits would be great, so I don't have to reinvent that part to survive power failures.

SQL Server is too slow for this and has too much overhead, and SQLite is even slower (though with potentially less overhead?).

Basically, I need to store a large quantity of timestamped data every second. As normalized data, each second's batch would correspond to ~10k table rows, but as binary data it can be represented in ~200 KB. Obviously, writing 200 KB to disk is a piece of cake compared to writing 10k rows to a relational database.

I could simply persist it in one or more large binary files and then implement some indexing of my own to allow fast filtering on certain fields, but the only things that frighten me are non-atomic transactions and read/write locking scenarios.
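One way to get the atomic-commit part without a DBMS is the classic write-to-temp-then-swap pattern: write the new batch to a temporary file, force it to disk, then atomically move it into place. A minimal sketch under that assumption (the class and the .tmp/.bak naming are just illustrative):

    using System.IO;

    static class AtomicBatchWriter
    {
        // After a power failure, the target file is either the previous
        // complete version or the new complete version - never a torn write.
        public static void Commit(string path, byte[] batch)
        {
            string temp = path + ".tmp";
            using (var fs = new FileStream(temp, FileMode.Create, FileAccess.Write))
            {
                fs.Write(batch, 0, batch.Length);
                fs.Flush(true); // flush through the OS cache to the physical disk
            }

            if (File.Exists(path))
                File.Replace(temp, path, path + ".bak"); // atomic swap, keeps a backup
            else
                File.Move(temp, path);
        }
    }

This only covers whole-file atomicity, though; concurrent readers and writers would still need something like a ReaderWriterLockSlim in-process, or a versioned-file scheme across processes.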

Any recommendations? I am using C#, by the way, so anything with a .NET wrapper would be preferred.

[Edit] Regarding ACID, I just found this, for example: Managed wrapper for Transactional NTFS (although TxF is a "Vista and later" feature).
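For what it's worth, the TxF route boils down to: create a kernel transaction, open the file handle against it, write, and commit; nothing becomes visible to other readers until the commit succeeds. A rough, untested P/Invoke sketch (the Win32 entry points are real, the wrapper itself is illustrative):

    using System;
    using System.ComponentModel;
    using System.IO;
    using System.Runtime.InteropServices;
    using Microsoft.Win32.SafeHandles;

    static class TxfWriter
    {
        [DllImport("ktmw32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
        static extern SafeFileHandle CreateTransaction(IntPtr attrs, IntPtr uow,
            uint options, uint isoLevel, uint isoFlags, uint timeoutMs, string description);

        [DllImport("ktmw32.dll", SetLastError = true)]
        static extern bool CommitTransaction(SafeFileHandle transaction);

        [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
        static extern SafeFileHandle CreateFileTransacted(string name, uint access,
            uint share, IntPtr security, uint disposition, uint flags, IntPtr template,
            SafeFileHandle transaction, IntPtr miniVersion, IntPtr extended);

        const uint GENERIC_WRITE = 0x40000000;
        const uint CREATE_ALWAYS = 2;
        const uint FILE_ATTRIBUTE_NORMAL = 0x80;

        public static void Write(string path, byte[] data)
        {
            using (SafeFileHandle tx = CreateTransaction(IntPtr.Zero, IntPtr.Zero,
                0, 0, 0, 0, "batch write"))
            {
                if (tx.IsInvalid) throw new Win32Exception();

                using (SafeFileHandle file = CreateFileTransacted(path, GENERIC_WRITE, 0,
                    IntPtr.Zero, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, IntPtr.Zero,
                    tx, IntPtr.Zero, IntPtr.Zero))
                {
                    if (file.IsInvalid) throw new Win32Exception();
                    using (var fs = new FileStream(file, FileAccess.Write))
                        fs.Write(data, 0, data.Length);
                }

                // The write only becomes durable and visible here.
                if (!CommitTransaction(tx)) throw new Win32Exception();
            }
        }
    }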

Comments (1)

寄离 2024-10-11 19:26:00

Traditional SQL-based storage will give you ACID, but bulk updates of many rows will be slow. On the other hand, NoSQL solutions/key-value stores usually won't give you reliable transactions or a seamless way to index for fast lookups on anything other than a single key. So we need something that combines the benefits of both approaches.

I would consider using CouchDB (a NoSQL map/reduce document-based DB with a RESTful API) and adopting the following strategy: CouchDB doesn't have transactions in the sense of saving multiple documents atomically, but when it comes to saving a single document it is extremely reliable and atomic, and it provides multi-version concurrency control as well.

So if each batch of 10,000 records is ~200-300 KB, you can save it as a single document. That may sound strange, but the point is that you can build views on top of your document collections, and these views are effectively incremental indexes; one document may produce multiple view rows. Views are written in JavaScript (evaluated only once, on document creation/update), so you can index however you want - by keywords, numeric values, dates - virtually anything you can compute in JavaScript. Fetching view results is very fast, because they are pre-indexed in a B+-tree.
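As a concrete illustration (the database name, document shape and view name are all hypothetical), saving a batch and defining a timestamp index from C# could look like this:

    using System;
    using System.Net.Http;
    using System.Text;
    using System.Threading.Tasks;

    static class CouchSketch
    {
        static readonly HttpClient Http =
            new HttpClient { BaseAddress = new Uri("http://localhost:5984/") };

        // One second's batch of records goes in as a single document;
        // CouchDB commits the whole PUT atomically.
        public static async Task SaveBatchAsync(string docId, string batchJson)
        {
            var response = await Http.PutAsync("samples/" + docId,
                new StringContent(batchJson, Encoding.UTF8, "application/json"));
            response.EnsureSuccessStatusCode(); // 201 Created => committed
        }

        // One-time setup: a design document whose JavaScript map function emits
        // every record keyed by its timestamp, building the incremental index.
        public static async Task CreateViewAsync()
        {
            const string designDoc = @"{
                ""views"": {
                    ""by_time"": {
                        ""map"": ""function(doc) { for (var i = 0; i < doc.records.length; i++) emit(doc.records[i].ts, doc.records[i]); }""
                    }
                }
            }";
            var response = await Http.PutAsync("samples/_design/indexes",
                new StringContent(designDoc, Encoding.UTF8, "application/json"));
            response.EnsureSuccessStatusCode();
        }
    }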

Benefits of this approach:

  • CouchDB uses JSON over HTTP as its data transport protocol, so you can use any HTTP client, any REST client, or a native C# wrapper (there are several available)
  • Your bulk insert of that ~200 KB document will be atomic and take just a single HTTP request
  • Your inserts can be asynchronous, because each one is just an HTTP request
  • You get MVCC - CouchDB is very good about concurrency, so you can forget about locks and the like
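Continuing the sketch above, fetching a time range back out is a single GET against that view (names still hypothetical; view keys are JSON-encoded):

    // Rows come back sorted by key (the timestamp), straight from the index.
    string json = await Http.GetStringAsync(
        "samples/_design/indexes/_view/by_time?startkey=1286200000&endkey=1286203600");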

Just give it a chance - it saved me tons of time.
