Looking for an easy-to-use embedded key-value store for C++
I need to write a C++ application that reads and writes large amounts of data (more than the available RAM), but always sequentially.
To keep the data in a future-proof and easy-to-document format, I use Protocol Buffers. Protocol Buffers, however, does not handle large amounts of data.
My previous solution consisted of creating one file per data unit (and storing them all in one directory), but this does not seem particularly scalable.
This time I would like to try an embedded database. To get similar functionality I only need to store key->value associations (so SQLite seems like overkill). The values will be the binary serialization output from Protocol Buffers.
I expect the database to handle the "what to keep in memory, what to move to disk ASAP" issue and the "how to efficiently store large amounts of data on disk" issue, and ideally to optimize for my sequential read pattern (by reading the next entries ahead of time).
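To make the requirement concrete, here is a minimal sketch of the access pattern described above: opaque byte strings stored under a key, appended to a single file, and read back strictly in order. The names `append_record`, `next_record`, and `count_records` are hypothetical, and a plain `std::string` stands in for the serialized Protocol Buffers output. This is only an illustration of the pattern, not a substitute for an embedded database: it has no index, no caching, and no crash safety.

```cpp
#include <cstdint>
#include <fstream>
#include <istream>
#include <ostream>
#include <string>

// Writes one field as a 4-byte length followed by the raw bytes.
// Assumption: the length is stored in host byte order, so the file
// is not portable across machines with different endianness.
static void write_field(std::ostream& out, const std::string& s) {
    const std::uint32_t n = static_cast<std::uint32_t>(s.size());
    out.write(reinterpret_cast<const char*>(&n), sizeof n);
    out.write(s.data(), n);
}

// Reads one length-prefixed field; returns false at end of file
// or if the record is truncated.
static bool read_field(std::istream& in, std::string& s) {
    std::uint32_t n = 0;
    if (!in.read(reinterpret_cast<char*>(&n), sizeof n)) return false;
    s.resize(n);
    if (n == 0) return true;
    return static_cast<bool>(in.read(&s[0], n));
}

// Appends one key/value record to the end of the stream.
void append_record(std::ostream& out, const std::string& key,
                   const std::string& value) {
    write_field(out, key);
    write_field(out, value);
}

// Reads the next record in file order; returns false when none is left.
bool next_record(std::istream& in, std::string& key, std::string& value) {
    return read_field(in, key) && read_field(in, value);
}

// Usage example: scans a file from start to end, holding only one
// record in memory at a time, and returns the number of records seen.
std::size_t count_records(const char* path) {
    std::ifstream in(path, std::ios::binary);
    std::string key, value;
    std::size_t n = 0;
    while (next_record(in, key, value)) ++n;
    return n;
}
```

Random access by key, memory management, and read-ahead are exactly the parts this sketch leaves out, which is why an embedded store is attractive here.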
While searching, I was surprised by the lack of alternatives. I do not want to keep the database in a separate process, because I do not need this separation (this rules out Redis).
The only option I found was Berkeley DB, but it has an unpleasant low-level C API. Then, the best option I found was stldb4 on top of Berkeley DB. The API seems quite nice and fits my needs.
However, I am worried. stldb4 seems to be a weird (it depends on libferris stuff) and unmaintained (last release one year ago) solution for a problem I would have thought to be quite common.
Does any of you have a better suggestion on how to handle this?
Thanks for your answers.
Comments (3)
I think I have found the answer to my problem.
I had not noticed that Berkeley DB provides two APIs for C++, one of which is an STL API.
The STL API provides STL-compatible vector and map abstractions that give direct access to the database. Thus, doing
value = data_container[key]
becomes possible. This seems to be the best solution for me: using the Berkeley DB STL API directly, together with Protocol Buffers.
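As a rough illustration of why a map-like interface is convenient here, the sketch below writes and reads serialized bytes through any container that offers `operator[]`, `find`, and `end`. The helper names `put_bytes` and `get_bytes` are hypothetical, and `std::map` is used purely as an in-memory stand-in. Whether a database-backed map compiles against these templates unchanged is an assumption to verify against the library's documentation.

```cpp
#include <map>
#include <string>

// Stores already-serialized bytes under a key in any map-like container.
template <typename Map>
void put_bytes(Map& container, const std::string& key,
               const std::string& bytes) {
    container[key] = bytes;  // same shape as data_container[key] = value
}

// Looks up a key without inserting it; returns false if it is absent.
template <typename Map>
bool get_bytes(const Map& container, const std::string& key,
               std::string& out) {
    auto it = container.find(key);
    if (it == container.end()) return false;
    out = it->second;
    return true;
}

// Usage example with the in-memory stand-in. In real code the value
// would be the serialized output of a Protocol Buffers message.
inline bool example_round_trip() {
    std::map<std::string, std::string> data_container;
    put_bytes(data_container, "unit-1", std::string("\x08\x01", 2));
    std::string value;
    return get_bytes(data_container, "unit-1", value) && value.size() == 2;
}
```

Keeping the calling code behind two small helpers like these means the choice of backing container stays in one place.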
I'd suggest Kyoto Cabinet.
Berkeley DB seems to fit your needs. Sure, its API is a bit awkward, but if you would rather have a nice API, SQLite might be a better solution, even though I think its performance might not be as good.