A data storage backend for an Erlang application when the data doesn't fit in memory

Posted on 2024-07-09 01:05:48

I'm researching options for organizing data storage for an Erlang application. The data it will use is basically a huge collection of binary blobs indexed by short string ids. Each blob is under 10 KB, but there are many of them. I expect them to total up to 200 GB, so the data obviously cannot fit into memory. The typical operations are reading a blob by its id, updating a blob by its id, or adding a new blob. During any given period of the day only a subset of the ids is in use, so storage access might benefit from an in-memory cache. Performance is quite critical: the target is around 500 reads and 500 updates per second on commodity hardware (say, an EC2 VM).

Any suggestions on what to use here? As I understand it, dets is out of the question, as it is limited to 2 GB (or was it 4 GB?). Mnesia is probably out of the question too; my impression is that it was mainly designed for cases where the data fits in memory. I'm considering trying EDTK's Berkeley DB driver for the task. Would it work in the above scenario? Does anybody have experience using it in production under similar conditions?
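
For a rough sense of scale, the figures above can be turned into a quick back-of-envelope estimate. This is only a sketch: it assumes decimal units (1 KB = 1000 bytes), treats 10 KB as the upper bound for every blob, and ignores index and file system overhead.

```python
# Figures taken from the question; decimal units are an assumption.
TOTAL_BYTES = 200 * 10**9      # up to 200 GB in total
MAX_BLOB_BYTES = 10 * 10**3    # each blob is under 10 KB
READS_PER_SEC = 500
UPDATES_PER_SEC = 500

# Blobs are at most 10 KB, so a full 200 GB store holds at least this many.
min_blob_count = TOTAL_BYTES // MAX_BLOB_BYTES

# Upper bounds on payload throughput, ignoring any storage overhead.
max_read_bytes_per_sec = READS_PER_SEC * MAX_BLOB_BYTES
max_write_bytes_per_sec = UPDATES_PER_SEC * MAX_BLOB_BYTES

print(min_blob_count, max_read_bytes_per_sec, max_write_bytes_per_sec)
```

So the store has to index at least about 20 million ids, while the payload bandwidth itself is modest (at most about 5 MB/s in each direction). The harder part is likely to be the number of random accesses per second rather than the raw throughput.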

Comments (5)

只怪假的太真实 2024-07-16 01:05:48

tcerl grew out of running into the same size limit. I'm not using Erlang these days, but it sounds like what you're looking for.

庆幸我还是我 2024-07-16 01:05:48

Have you looked at what CouchDB is doing? It might not be quite what you are after as a drop-in product, but it contains a lot of Erlang code for storing data. There is also some talk of providing a native Erlang interface in addition to the REST API.

叫思念不要吵 2024-07-16 01:05:48

Is there any reason why you can't just use a file system, treating the file name as your string id and the file contents as the binary blob? You can choose a file system that fits your performance requirements, and you should get caching basically for free, provided by your OS.
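
A minimal sketch of that idea, in Python for brevity (the same layout carries over to Erlang's `file` module). Everything here is an illustrative assumption rather than part of the suggestion itself: the `FileBlobStore` name, hashing ids with SHA-256 to get safe file names, and the two-level directory fan-out to keep directories small. It does not call `fsync`, so durability across power loss is not addressed.

```python
import hashlib
import os
import tempfile
from pathlib import Path


class FileBlobStore:
    """Hypothetical store: one file per blob, named after a hash of its id."""

    def __init__(self, root):
        self.root = Path(root)

    def _path(self, blob_id: str) -> Path:
        # Hash the id so arbitrary strings are safe as file names, and use
        # the first four hex digits as two directory levels (256 * 256
        # buckets) so that no single directory grows too large.
        digest = hashlib.sha256(blob_id.encode("utf-8")).hexdigest()
        return self.root / digest[:2] / digest[2:4] / digest

    def put(self, blob_id: str, data: bytes) -> None:
        """Add a new blob or overwrite an existing one."""
        path = self._path(blob_id)
        path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary file in the same directory, then rename it
        # over the target so readers never see a half-written blob.
        fd, tmp_name = tempfile.mkstemp(dir=path.parent)
        try:
            with os.fdopen(fd, "wb") as tmp:
                tmp.write(data)
            os.replace(tmp_name, path)
        except BaseException:
            os.unlink(tmp_name)
            raise

    def get(self, blob_id: str):
        """Return the blob's bytes, or None if the id is unknown."""
        try:
            return self._path(blob_id).read_bytes()
        except FileNotFoundError:
            return None
```

Reads, updates, and inserts each map to a single file operation, and recently used blobs stay in the OS page cache without any application-level cache. Whether a given file system copes well with tens of millions of small files is something to measure rather than assume.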

森林很绿却致人迷途 2024-07-16 01:05:48

Mnesia can store data on disk just fine. There's also dets (disk-based term storage), which is roughly analogous to Berkeley DB. It's in the standard library: http://www.erlang.org/doc/apps/stdlib/index.html

哎呦我呸! 2024-07-16 01:05:48

I would recommend Apache CouchDB.

It's a great fit for Erlang, and from the sound of it (you mention ID-based blobs and don't mention any relational requirements) you're looking for a document-oriented database.

Since the interface is REST, you can very simply add a commodity HTTP cache in front of it if you need caching.

The documentation for CouchDB is of very high quality.

It also has built-in Map-Reduce :)
