关于非关系型数据库(NoSQL)的问题

发布于 2024-08-28 17:12:36 字数 1693 浏览 4 评论 0原文

虽然我还没有使用过任何新的 NoSQL 数据库,但我已经尝试通过阅读 Wikipedia 文章、博客以及查看一些 NoSQL DB 文档来了解情况。

我刚刚(重新)阅读了 2009 年 8 月版的 php|architect,特别是关于非关系数据库的文章,我的脑海中突然出现了一些问题,我知道这篇文章对这个主题的介绍相当简单,但它是足以让我感到困惑...

CouchDB

我关于 CouchDB 的主要问题是为什么如此大肆宣传?。据我了解,CouchDB 提供了一个 Web 服务,允许您在数据库内创建数据库和文档,这些文档可以具有多个 JSON 编码的属性,并且还具有特殊的 _id_rev 用于跟踪文档修订的属性。

我真的没有对此大惊小怪,几年前,在一个宠物项目中,我编写了一个类似的(?)系统来存储文档,其结构是这样的:

documents/
  document-name/
    (revision) timestamp/
      (contents) md5-hash.txt
        PHP Serialized Data

我确信我错过了一些非常基本的东西,否则(从 PHP 开发人员的角度来看)这将具有与 CouchDB 相同的优点并且速度更快 - 无需编码和解码 JSON。


Amazon SimpleDB

现在这个真的让我头晕……作者(Russell Smith)给出了以下示例:

$sdb->putAttributes('phparch', 'may', array('title' => array('value' => 'May 2009'), 'have' => array('value' => false)));
$sdb->putAttributes('phparch', 'june', array('title' => array('value' => 'June 2009'), 'have' => array('value' => true)));
$sdb->putAttributes('phparch', 'july', array('title' => array('value' => 'July 2009'), 'have' => array('value' => true)));

然后他说 Amazon 现在支持类似 SQL 的接口,然后执行以下查询:

$sdb->select('phparch', 'SELECT * FROM phparch WHERE have = "1"');

他没有给出任何类似的例子来说明如何在 CouchDB 中执行该查询(但是他在视图和 Map/Reduce 上留下了一些提示),但我认为这也是可能的,所以我的问题是:亚马逊(和CouchDB)会这样做吗?

我的第一个猜测是他们打开所有文档(可能在分布式环境中),然后应用reduce操作来过滤属性匹配的文档搜索条件,但即使在并行计算中,这是否也过于昂贵(CPU 和磁盘 I/O)?


我知道我忽略了一些重要的东西,比如分布、一致性等,但我只是试图掌握 NoSQL 存储的基本内部工作原理。

PS:另外,谁能解释一下为什么 CouchDB 和 Amazon SimpleDB 都是用 Erlang 构建的?

Although I've not yet used any of the new NoSQL databases I've tried to keep myself informed by reading Wikipedia articles, blogs and the peeking into some of the NoSQL DBs documentation.

I've just (re)read the August 2009 edition of php|architect, specifically the article about the Non-Relation Databases and a few questions popped up in my head, I understand that the article is pretty light on the subject but it was enough to get me confused...

CouchDB

My main question regarding CouchDB is why so much hype?. From what I understood CouchDB provides a Web Service that lets you create databases and documents inside the database, the documents can have several JSON-encoded attributes and also have a special _id and _rev attribute for tracking revisions of the document.

I really don't get all the fuss about this, some years ago for a pet project I coded a similar (?) system for storing documents and the structure was something like this:

documents/
  document-name/
    (revision) timestamp/
      (contents) md5-hash.txt
        PHP Serialized Data

I'm sure I'm missing something very fundamental, otherwise (from the viewpoint of a PHP developer) this would have the same benefits as CouchDB and be faster - no need to encode and decode JSON.


Amazon SimpleDB

Now this one really gets my head spinning... The author (Russell Smith) gives the following example:

$sdb->putAttributes('phparch', 'may', array('title' => array('value' => 'May 2009'), 'have' => array('value' => false)));
$sdb->putAttributes('phparch', 'june', array('title' => array('value' => 'June 2009'), 'have' => array('value' => true)));
$sdb->putAttributes('phparch', 'july', array('title' => array('value' => 'July 2009'), 'have' => array('value' => true)));

He then says that Amazon now supports a SQL-like interface and then executes the following query:

$sdb->select('phparch', 'SELECT * FROM phparch WHERE have = "1"');

He doesn't give any analogous example of how to do that query in CouchDB (he leaves some hints on Views and Map/Reduce however) but I suppose it is also possible, so my question is: how does Amazon (and CouchDB) do it?

My first guess would be that they open all documents (in possible in a distributed environment) and then apply a reduce operation to filter the documents whose attributes don't match the search criteria, but wouldn't this be overly expensive (CPU and Disk I/O) even in parallel computing?


I know I'm ignoring some important stuff like distribution, consistency and so on but I'm just trying to grasp the very basic inner workings of NoSQL storages.

PS: Also, can anyone explain me why both CouchDB and Amazon SimpleDB are built with Erlang?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

2024-09-04 17:12:36

围绕 nosql 的争论归结于索引、可用性和可扩展性。如果您想获取 has = 1 的文档,则索引允许面向文档的存储不打开所有文档。可用性和可扩展性使这些系统能够轻松横向扩展,并在面对不可靠的硬件时保持稳健。

erlang 是为多处理器系统设计的,因此也非常适合分布式系统。

the fuss around nosql is down to indexing, availability, and scalability. indexing is what allows the document-oriented stores to NOT open all documents if you want to get the ones where have = 1. availablity and scalability allow these systems to easily scale out and be robust in the face of unreliable hardware.

erlang is designed for multi-processor systems and so is an ideal fit for distributed systems too.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文