NoSQL with my own custom binary files?

Originally, I had to deal with just 1.5[TB] of data. Since I only needed fast writes/reads (without any SQL), I designed my own flat binary file format (implemented in Python) and easily (and happily) saved my data and manipulated it on one machine. Of course, for backup purposes, I added two machines to be used as exact mirrors (kept in sync with rsync).
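(For context, a flat binary format like the one described usually amounts to a sequence of fixed-size records packed with struct. The layout below, including the record fields, sizes, and file name, is a hypothetical sketch for illustration, not the actual format in question.)

    import struct

    # Hypothetical fixed-size record: 64-bit key, 64-bit timestamp,
    # 48-byte payload (64 bytes per record in total).
    RECORD = struct.Struct('<qq48s')

    def append_record(path, key, ts, payload):
        """Append one fixed-size record to a flat binary file."""
        with open(path, 'ab') as f:
            f.write(RECORD.pack(key, ts, payload))

    def read_record(path, index):
        """Random-access read of the index-th record (possible because records are fixed-size)."""
        with open(path, 'rb') as f:
            f.seek(index * RECORD.size)
            return RECORD.unpack(f.read(RECORD.size))

    append_record('data.bin', 42, 1300000000, b'hello')
    print(read_record('data.bin', 0))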

Presently, my needs are growing, and I need to build a solution that will successfully scale up to 20[TB] (and even more) of data. I would be happy to continue using my flat file format for storage. It is fast, reliable, and gives me everything I need.

What I am concerned about is replication, data consistency, etc. across the network (obviously, the data will have to be distributed; not all of it can be stored on one machine).

Are there any ready-made solutions (Linux/Python based) that would allow me to keep using my file format for storage, yet would handle the other components that NoSQL solutions normally provide (data consistency / availability / easy replication)?

Basically, all I want to make sure of is that my binary files stay consistent throughout my network. I am using a network of 60 Core Duo machines (each with 1 GB RAM and a 1.5 TB disk).
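(As an aside on verifying that mirrored binary files really are identical across machines, a per-file content checksum is the usual building block. Below is a minimal sketch, assuming SHA-256 and a placeholder shard name; how the digests are exchanged between nodes is left open.)

    import hashlib

    def file_digest(path, chunk_size=1 << 20):
        """Compute the SHA-256 of a large binary file, reading it in 1 MiB chunks."""
        h = hashlib.sha256()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(chunk_size), b''):
                h.update(chunk)
        return h.hexdigest()

    if __name__ == '__main__':
        # 'shard_000.bin' is a placeholder; each node would build a {path: digest}
        # manifest for its local shards, compare it with its mirrors, and re-sync
        # any shard whose digest differs.
        print(file_digest('shard_000.bin'))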


Comments (2)

清旖 2024-11-07 05:47:53


Approach: distributed MapReduce in Python with the Disco Project

This seems like a good way of approaching your problem. I have used the Disco Project for similar problems.

You can distribute your files among n machines (processes) and implement the map and reduce functions that fit your logic.

The Disco Project tutorial describes exactly how to implement a solution to problems like yours. You will be impressed by how little code you need to write, and you can definitely keep the format of your binary files; a sketch of what a job looks like is shown below.
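(To give a sense of how small a Disco job is, here is a sketch modeled on the word-count example from Disco's tutorial. The input URL is the tutorial's sample text file; for custom binary files you would plug in your own reader and record-parsing logic, which is not shown here.)

    from disco.core import Job, result_iterator

    def map(line, params):
        # Emit one (word, 1) pair per word in the input line.
        for word in line.split():
            yield word, 1

    def reduce(iter, params):
        # Group the pairs by word and sum the counts for each word.
        from disco.util import kvgroup
        for word, counts in kvgroup(sorted(iter)):
            yield word, sum(counts)

    if __name__ == '__main__':
        job = Job().run(input=['http://discoproject.org/media/text/chekhov.txt'],
                        map=map,
                        reduce=reduce)
        for word, count in result_iterator(job.wait(show=True)):
            print(word, count)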

Another similar option is to use Amazon's Elastic MapReduce.

荆棘i 2024-11-07 05:47:53


Perhaps some of the commentary on the Kivaloo system developed for Tarsnap will help you decide what's most appropriate: http://www.daemonology.net/blog/2011-03-28-kivaloo-data-store.html

Without knowing more about your application (size/type of records, frequency of reads/writes) or your custom format, it's hard to say more.
