How do I take the average of a large data set in MongoDB and CouchDB?

Posted on 2024-11-19 16:03:16, 480 characters, 3 views, 0 comments

I'm looking at this chart...

http://www.mongodb.org/display/DOCS/MongoDB,+CouchDB,+MySQL+Compare+Grid

...which says:

Query Method

CouchDB - Map/reduce of javascript functions to lazily build an index per query

MongoDB - Dynamic; object-based query language

What exactly does this mean? For example, if I want to take an average of 1,000,000,000 values, does CouchDB automatically do it in a MapReduce way?

Can someone walk me through how to take an average of 1,000,000,000 values with both systems? That would be a very illuminating example.

Thanks.

Comments (2)

辞别 2024-11-26 16:03:16

CouchDB's views are a strange and fascinating beast.

CouchDB does incremental map/reduce. That is to say, once you specify your "view", it works somewhat like a materialized view in a relational database. It will not matter if you're averaging 3 or 3 billion documents. The result is there.

But there is a threefold gotcha in there:

1) Querying is fast once the view has been created and is up to date. View creation can be slow if you have lots of small documents (if possible, go with fatter documents). Once the view is created, the intermediate reduction steps are stored inside the B-tree nodes and you won't have to recompute them.

2) Views are updated lazily when you query them. To get predictable performance, you had better set up some sort of job to update them regularly.
How do you Schedule Index Updates in CouchDB

3) You need to have a pretty good idea of how you'll query your data with composite keys, ranges and grouping. CouchDB sucks at doing ad-hoc querying.
http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
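To make the first point more concrete, here is a minimal sketch of why stored intermediate reductions keep an average cheap to maintain. It is a simplified model, not CouchDB's actual B-tree: the `IncrementalAverage` class, its "leaves", and the `leafReductions` counter are all hypothetical names made up for illustration. The idea is only that each group of values keeps a cached partial result, and a change recomputes just the affected partial.

```javascript
// Simplified model of an incrementally maintained reduction.
// NOT CouchDB's real B-tree: each "leaf" is just an array of emitted values
// with a cached partial result { sum, count }.

function reduceLeaf(values) {
  let sum = 0;
  for (const v of values) sum += v;
  return { sum, count: values.length };
}

// Merge already-reduced partial results (the role a "rereduce" step plays).
function combine(partials) {
  return partials.reduce(
    (acc, p) => ({ sum: acc.sum + p.sum, count: acc.count + p.count }),
    { sum: 0, count: 0 }
  );
}

class IncrementalAverage {
  constructor(leaves) {
    this.leaves = leaves.map((values) => [...values]);
    this.partials = this.leaves.map(reduceLeaf); // computed once, up front
    this.leafReductions = this.leaves.length; // work counter, for illustration
  }

  // Adding a value only recomputes the partial of the leaf it lands in.
  add(leafIndex, value) {
    this.leaves[leafIndex].push(value);
    this.partials[leafIndex] = reduceLeaf(this.leaves[leafIndex]);
    this.leafReductions += 1;
  }

  // Reading the average touches only the cached partials, not the raw values.
  average() {
    const { sum, count } = combine(this.partials);
    return count === 0 ? null : sum / count;
  }
}

const index = new IncrementalAverage([[64, 75], [58]]);
console.log(index.average()); // (64 + 75 + 58) / 3
index.add(1, 70);
console.log(index.average()); // (64 + 75 + 58 + 70) / 4
```

In this toy model the expensive part is the initial build, which mirrors the point that view creation is slow while later queries are fast.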

I'm sure someone will soon post the details of how to average 1,000,000,000 items in both databases, but you have to understand that CouchDB makes you do more upfront work in order to benefit from its incremental approach. It's really something quite unique, but it is not really intended for scenarios where you're doing averages or anything else on ad-hoc queried data.

In Mongo, you can use either map/reduce (which is not incremental, so it will matter whether you are averaging 3 or 3 billion documents, although Mongo is considered to be blazingly fast due to its memory-mapped I/O approach) or its aggregation features. http://www.mongodb.org/display/DOCS/Aggregation
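As a rough illustration of the map/reduce route in Mongo, the sketch below writes the three functions in the shape MongoDB's map/reduce uses: a map function that reads the current document from `this` and calls `emit`, a reduce function, and a finalize function. So that it can run without a server, it is driven by `runMapReduce`, a hypothetical in-memory stand-in invented for this example. The `height` field and the sample documents are assumptions as well.

```javascript
// In MongoDB, emit() is supplied by the server. Here it is a module-level
// variable that the stand-in driver below assigns before calling map.
let emit = null;

// Map: `this` is the current document. Emit a partial { sum, count } under a
// single key so that every document ends up in one group.
function mapFn() {
  emit(null, { sum: this.height, count: 1 });
}

// Reduce: may be applied to its own output, so it returns the same shape
// that map emits.
function reduceFn(key, values) {
  const out = { sum: 0, count: 0 };
  for (const v of values) {
    out.sum += v.sum;
    out.count += v.count;
  }
  return out;
}

// Finalize: turn the reduced partial into the average.
function finalizeFn(key, reduced) {
  return reduced.sum / reduced.count;
}

// Hypothetical in-memory stand-in for the server side (not a MongoDB API).
function runMapReduce(docs, map, reduce, finalize) {
  const groups = new Map();
  emit = (key, value) => {
    const k = JSON.stringify(key);
    if (!groups.has(k)) groups.set(k, { key, values: [] });
    groups.get(k).values.push(value);
  };
  for (const doc of docs) map.call(doc);
  return [...groups.values()].map(({ key, values }) => ({
    _id: key,
    value: finalize(key, reduce(key, values)),
  }));
}

const people = [
  { _id: "Dominic Barnes", height: 64 },
  { _id: "Some Tall Guy", height: 75 },
  { _id: "Some Short(er) Guy", height: 58 },
];

const result = runMapReduce(people, mapFn, reduceFn, finalizeFn);
console.log(result); // one row: _id null, value 197 / 3
```

Unlike a CouchDB view, nothing here is cached between runs: each call walks every document again, which is the "not incremental" behaviour described above.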

若有似无的小暗淡 2024-11-26 16:03:16

I cannot speak about MongoDB, but I can tell you about CouchDB. CouchDB can only be natively queried via a Map/Reduce View Engine. In fact, a great place to start is this section of the wiki.

A view contains a map function, and an optional reduce function. The typical language for writing these functions is JavaScript, but there is an Erlang option available, and it is possible to build a view engine in just about any other programming language.

The map function builds a data set out of the documents in the database, and the reduce function aggregates that data set. The map function is run on every single document in the database once the view has been created and first queried. After that, it runs only on documents that are newly created, modified, or deleted. As such, view indexes are built incrementally, not dynamically.

In the case of 1,000,000,000 values, CouchDB will not need to calculate the results of your query every single time it's requested. Instead, it will only report on the value of the view index it has stored, which itself only changes whenever a document is created/updated/deleted.

As for writing Map/Reduce functions, a lot of that work is left up to the programmer, as there are no built-in map functions (i.e. it's not "automatic"). However, there are a few native reduce functions (_sum, _count, _stats) available.

Here's a simple example in which we'll calculate the average height of some people.

// sample documents
{ "_id": "Dominic Barnes", "height": 64 }
{ "_id": "Some Tall Guy", "height": 75 }
{ "_id": "Some Short(er) Guy", "height": 58 }

// map function
function (doc) {
  // first param is "key", which we do not need since `_id` is stored anyway
  emit(null, doc.height);
}

// reduce function
_stats

The results of this view would look like this:

{
  "rows": [
    {
      "key": null,
      "value": {
        "sum": 197,
        "count": 3,
        "min": 58,
        "max": 75,
        "sumsqr": 13085
      }
    }
  ]
}

Calculating the average from here is as simple as dividing the sum by the count. If you want the average calculated within the view itself, you could check out this example.
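Both options can be sketched in a few lines. The first part divides sum by count on a row shaped like the `_stats` result above. The second part is one possible custom reduce in CouchDB's `function (keys, values, rereduce)` signature. It is an illustrative sketch and not the linked example. It carries `sum` and `count` along with the average, because an average of averages would be wrong once partial results are combined. The name `averageReduce` and the way it is called here are assumptions: the calls simulate in Node what the view engine would do.

```javascript
// Option 1: derive the average client-side from a _stats row.
var statsRow = {
  key: null,
  value: { sum: 197, count: 3, min: 58, max: 75, sumsqr: 13085 }
};
var meanFromStats = statsRow.value.sum / statsRow.value.count;

// Option 2: a custom reduce that keeps the average inside the view.
// Written with `var` and plain loops so it does not rely on newer syntax.
function averageReduce(keys, values, rereduce) {
  var sum = 0;
  var count = 0;
  var i;
  if (rereduce) {
    // values are results of earlier averageReduce calls
    for (i = 0; i < values.length; i++) {
      sum += values[i].sum;
      count += values[i].count;
    }
  } else {
    // values are the raw heights emitted by the map function
    for (i = 0; i < values.length; i++) {
      sum += values[i];
      count += 1;
    }
  }
  return { sum: sum, count: count, average: count === 0 ? null : sum / count };
}

// Simulated calls: two partial reductions, then a rereduce over them.
// `keys` is unused by this reduce, so null is passed for simplicity.
var partA = averageReduce(null, [64, 75], false);
var partB = averageReduce(null, [58], false);
var total = averageReduce(null, [partA, partB], true);

console.log(meanFromStats, total.average); // both are 197 / 3
```

Using the built-in `_stats` and dividing on the client is usually the simpler of the two, since the built-in reducers already handle the rereduce step for you.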
