
Sampling a large CouchDB database for local development, avoiding long view builds

Posted 2024-09-16 07:00:20

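CouchDB makes it convenient to develop locally (CouchApps) and then push to a remote production server. Unfortunately, with a production-sized dataset, working on views can be cumbersome.

What are good ways to take a sample of a CouchDB database for use in local development?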


烟凡古楼 2024-09-23 07:00:20

The answer is filtered replication. I like to do this in two parts:

1. Replicate the production database, example_db, to my local server as example_db_full.
2. Perform filtered replication from example_db_full to example_db, where the filter cuts out enough data that view builds are fast but keeps enough that I can confirm my code works.
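
As a rough sketch of those two steps, the request bodies for CouchDB's `POST /_replicate` endpoint might be built as follows. The host URLs, the design document name `sampling`, and the filter name `random` are hypothetical placeholders, and authentication is left out. This assumes the filter function is stored under `filters.random` in a `_design/sampling` document in the source database (example_db_full), and that it reads the `p` query parameter.

```javascript
// Build the two replication request bodies described above.
// "sampling/random" is a hypothetical design-document/filter name.
function buildSamplingReplications(productionUrl, localUrl, fraction) {
  // Step 1: full copy of production into a local staging database.
  var full = {
    source: productionUrl + "/example_db",
    target: localUrl + "/example_db_full",
    create_target: true
  };

  // Step 2: filtered copy from the staging database into the working database.
  var sampled = {
    source: localUrl + "/example_db_full",
    target: localUrl + "/example_db",
    create_target: true,
    filter: "sampling/random",
    // query_params are passed to the filter function as req.query.
    query_params: { p: String(fraction) }
  };

  return [full, sampled];
}

// Send one replication request to the local server.
// Requires a runtime with a global fetch (for example Node 18+ or a browser).
function runReplication(localUrl, body) {
  return fetch(localUrl + "/_replicate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body)
  }).then(function (res) { return res.json(); });
}
```

The two bodies would be posted in order, waiting for the first replication to finish before starting the second.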

Which documents to select can be application-specific. For now, I am satisfied with a simple random pass/fail at a percentage that I specify. The randomness is consistent (i.e., the same document always passes or always fails).

My technique is to normalize the content checksum in the document's _rev field to the range [0.0, 1.0). Then I simply specify some fraction (e.g. 0.01), and if the normalized checksum value is <= my fraction, the document passes.

function(doc, req) {
  if(/^_design\//.test(doc._id))
    return true;

  if(!req.query.p)
    throw {error: "Must supply a 'p' parameter with the fraction"
                  + " of documents to pass [0.0-1.0]"};

  var p = parseFloat(req.query.p);
  if(!(p >= 0.0 && p <= 1.0)) // Also catches NaN
    throw {error: "Must supply a 'p' parameter with the fraction of documents"
                  + " to pass [0.0-1.0]"};

  // Consider the first 8 characters of the doc checksum (for now, taken
  // from _rev) as a real number on the range [0.0, 1.0), i.e.
  // ["00000000", "ffffffff").
  var ONE = 4294967295; // parseInt("ffffffff", 16);
  var doc_val = parseInt(doc._rev.match(/^\d+-([0-9a-f]{8})/)[1], 16);

  return doc_val <= (ONE * p);
}
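
To check a chosen fraction before replicating, the same arithmetic can be run outside CouchDB. The sketch below uses hypothetical helper names `revFraction` and `passes`. It divides by the maximum value instead of multiplying `p` by it, which should agree with the filter above up to floating-point rounding. Returning `null` for a `_rev` that does not match the pattern is an assumption of this sketch; the filter above would throw instead.

```javascript
// Map the first 8 hex characters of a _rev checksum onto [0.0, 1.0].
// Returns null when the revision string does not have the expected form.
function revFraction(rev) {
  var m = /^\d+-([0-9a-f]{8})/.exec(rev);
  if (!m) return null;
  return parseInt(m[1], 16) / 4294967295; // 4294967295 === 0xffffffff
}

// Decide whether a revision would pass for a given fraction p.
function passes(rev, p) {
  var f = revFraction(rev);
  return f !== null && f <= p;
}
```

For instance, a revision such as "1-967a00dff5e02add41819138abb3284d" maps to about 0.59, so it would be dropped at p = 0.01 and kept at p = 0.6.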
