Different floating-point precision in nearly identical views?

Posted on 2024-12-15 03:56:40

I create the database with the following bash script:

#!/bin/bash
# Create the database, then POST each ';'-separated JSON document from the resource file.
curl -X PUT http://127.0.0.1:5984/sales
IFS=';'
vals=$(cat sales_upload.json)
for i in $vals
do
    curl -X POST http://127.0.0.1:5984/sales -H "Content-Type: application/json" -d "$i"
done
unset IFS

and this resource file (sales_upload.json):

{
    "Type" : "customer",
    "LastName" : "Welsh", 
    "FirstName" : "Jim",
    "Address" : "340 West 50th Street, New York, NY",
    "TotalSpent" : 734.34
};
{
    "Type" : "customer",
    "LastName" : "Zuch", 
    "FirstName" : "Bo",
    "Address" : "116 10th Avenue, New York, NY",
    "TotalSpent" : 1102.47
};
{
    "Type" : "customer",
    "LastName" : "Libby", 
    "FirstName" : "Joe",
    "Address" : "611 Fifth Avenue, New York, NY",
    "TotalSpent" : 290.01
};
{
    "Type" : "customer",
    "LastName" : "Grant", 
    "FirstName" : "Sue",
    "Address" : "7 West 55th Street, Manhattan, NY",
    "TotalSpent" : 430.83
};
{
    "Type" : "salesman",
    "LastName" : "Green", 
    "FirstName" : "Gwen",
    "Level" : 1
};
{
    "_id" : "_design/logic",
    "language" : "javascript",
    "views" :
    {
        "customers": {
            "map" : "function(doc) { if (doc.Type == 'customer')  emit(null, {LastName: doc.LastName, FirstName: doc.FirstName, Address: doc.Address}) }"
        },
        "total_purchases": {
            "map" : "function(doc) { if (doc.Type == 'customer')  emit(null, doc.TotalSpent) }",
            "reduce" : "function(keys, values) { return sum(values) }"
        }
    }
}

When I call
curl -X GET http://127.0.0.1:5984/sales/_design/logic/_view/total_purchases

I get:

{"rows":[ {"key":null,"value":2557.65} ]}

But if I change the first argument of emit in total_purchases to emit(doc.LastName, doc.TotalSpent), I get:

{"rows":[ {"key":null,"value":2557.6499999999996} ]}

Why is that?



Comments (1)

南烟 2024-12-22 03:56:40

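The difference between your results comes from the change to your view function. The first argument to emit determines how the view index is built. In the first case, all emitted values are stored under the null key. In the second, you have spread the index across different keys, i.e. the customers' last names.

The internal B-tree in CouchDB therefore differs between the two views. So why would the sum come out differently?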

CouchDB uses incremental map/reduce. You can read about it here:
http://damienkatz.net/2008/02/incremental_map.html

In that post Damien makes the point:

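To make incremental Map/Reduce possible, the Reduce function has the requirement that not only must it be referentially transparent, but it must also be commutative and associative for the array value input, to be able reduce on it's own output and get the same answer, like this:

f(Key, Values) == f(Key, [ f(Key, Values) ] )

This requirement of reduce functions allows CouchDB to store off intermediated reductions directly into inner nodes of btree indexes, and the view index updates and retrievals will have logarithmic cost. It also allows the indexes to be spread across machines and reduced at query time with logarithmic cost.

The incremental design makes it possible to use map/reduce to query huge partitioned clusters in realtime, instead of having to wait for a whole map/reduce job to complete or having stale, occasionally updated indexes. The downside is it may be harder to write the Reduce function in an associative and commutative manner.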
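
The property in the quoted passage can be tried outside CouchDB. The following is a minimal sketch, not CouchDB's actual implementation: reduceSum mirrors the function(keys, values, rereduce) signature that CouchDB passes to JavaScript reduce functions, and the split into two chunks is a made-up stand-in for B-tree nodes.

```javascript
// Sketch only: simulates reduce and rereduce in plain JavaScript.
// The chunk boundaries are hypothetical; CouchDB picks them from its B-tree layout.

// A sum reducer with the (keys, values, rereduce) signature of CouchDB reduce functions.
// For a plain sum, the rereduce step is the same operation: add up the partial sums.
function reduceSum(keys, values, rereduce) {
  return values.reduce((acc, v) => acc + v, 0);
}

const values = [734.34, 1102.47, 290.01, 430.83]; // TotalSpent values from the question

// One pass over all values, as if everything sat in a single node.
const direct = reduceSum(null, values, false);

// Two partial reductions (as if stored in inner nodes), then a rereduce over them.
const partials = [
  reduceSum(null, values.slice(0, 2), false),
  reduceSum(null, values.slice(2), false),
];
const rereduced = reduceSum(null, partials, true);

console.log(direct, rereduced);
// Mathematically these are the same sum; in floating point they may differ in the last digits.
console.log(Math.abs(direct - rereduced) < 1e-9); // true
```

Whether the two printed totals match exactly depends on how the values happen to be grouped, which is the effect the question ran into.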

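So I assume what is happening is that in the first view, since all values sit under the same key, no intermediate reductions are stored, while in the second view partial sums are stored. You are then probably seeing a difference in how the floating-point numbers are rounded in those intermediate sums. See here:
Is floating point math broken?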
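
As a small illustration of why grouping matters, the sketch below adds the same numbers in different groupings. The 0.1/0.2/0.3 case is the standard textbook example. For the question's values no particular output is claimed, since which grouping CouchDB actually uses depends on its index layout; the helper permutations is written just for this example.

```javascript
// Floating-point addition is not associative: grouping changes the rounding.
const a = (0.1 + 0.2) + 0.3;
const b = 0.1 + (0.2 + 0.3);
console.log(a); // 0.6000000000000001
console.log(b); // 0.6
console.log(a === b); // false

// Helper for this example: all orderings of an array.
function permutations(items) {
  if (items.length <= 1) return [items];
  return items.flatMap((item, i) =>
    permutations([...items.slice(0, i), ...items.slice(i + 1)]).map((rest) => [item, ...rest])
  );
}

// Sum the question's values left to right in every possible order
// and keep the distinct results.
const totals = [734.34, 1102.47, 290.01, 430.83];
const distinctSums = new Set(
  permutations(totals).map((order) => order.reduce((acc, v) => acc + v, 0))
);
console.log([...distinctSums]); // every entry is within rounding error of 2557.65
```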

Two suggestions may help you solve this. The first is to use the built-in Erlang version of the reduce function. See here:

http://wiki.apache.org/couchdb/Built-In_Reduce_Functions

The call is slightly different: "reduce": "_sum"
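
For reference, this is roughly what the question's design document could look like with the built-in reducer swapped in. It is a sketch written as a JavaScript object so it can be serialized and uploaded; the customers view is omitted for brevity, and only the reduce value and the emit key differ from the original total_purchases view.

```javascript
// Sketch: the design document from the question, with the JavaScript reduce
// replaced by the built-in "_sum" reducer.
const designDoc = {
  _id: "_design/logic",
  language: "javascript",
  views: {
    total_purchases: {
      map: "function(doc) { if (doc.Type == 'customer') emit(doc.LastName, doc.TotalSpent) }",
      reduce: "_sum",
    },
  },
};

// Serialize for upload, e.g. as the body of a POST to the sales database.
console.log(JSON.stringify(designDoc, null, 2));
```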

Second, you could emit the float as an integer, as described here:
Is floating point math broken?

Hope this helps.
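
One way to read the second suggestion is to emit amounts in cents. The sketch below simulates that outside CouchDB: emit here is a local stand-in that collects rows, and mapCents is a hypothetical map function, not part of the original design document.

```javascript
// Sketch: emit amounts as integer cents so that sums are exact regardless of grouping.
const rows = [];

// Local stand-in for CouchDB's emit(); it just records the key/value pair.
function emit(key, value) {
  rows.push({ key, value });
}

// Hypothetical map function: same filter as the question, but the value is in cents.
function mapCents(doc) {
  if (doc.Type == 'customer') emit(doc.LastName, Math.round(doc.TotalSpent * 100));
}

// Trimmed copies of the question's documents.
const docs = [
  { Type: "customer", LastName: "Welsh", TotalSpent: 734.34 },
  { Type: "customer", LastName: "Zuch", TotalSpent: 1102.47 },
  { Type: "customer", LastName: "Libby", TotalSpent: 290.01 },
  { Type: "customer", LastName: "Grant", TotalSpent: 430.83 },
  { Type: "salesman", LastName: "Green", Level: 1 },
];
docs.forEach(mapCents);

// Integer addition is exact here, so any grouping of partial sums gives the same total.
const totalCents = rows.reduce((acc, row) => acc + row.value, 0);
console.log(totalCents);       // 255765
console.log(totalCents / 100); // 2557.65
```

The division by 100 happens once, on the client side, after the view has returned the integer total.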

