Sampling a large CouchDB database for local development to avoid long view builds

Posted 2024-09-16 07:00:20


CouchDB makes it convenient to develop CouchApps locally and then push them to remote production. Unfortunately, with production-sized data sets, working on views can be cumbersome because every view build takes a long time.

What are good ways to take samples of a CouchDB database for use in local development?


烟凡古楼 2024-09-23 07:00:20


The answer is filtered replication. I like to do this in two parts:

  1. Replicate the production database, example_db, to my local server as example_db_full.
  2. Perform a filtered replication from example_db_full to example_db, where the filter cuts out enough data that view builds are fast but keeps enough data that I can confirm my code works.
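The two steps can be driven through CouchDB's `_replicate` endpoint. This is a minimal sketch, assuming a local server at http://127.0.0.1:5984, a hypothetical production URL, and the filter stored as `sample` in a hypothetical design document `_design/sampling` on example_db_full. It builds the request bodies (`source`, `target`, `create_target`, `filter`, `query_params`) and posts them with the global `fetch` available in Node 18+. Authentication is left out; on a secured server you would need to add an `Authorization` header.

```javascript
// Hypothetical locations; adjust for your own setup.
const PROD = "https://prod.example.com:5984/example_db"; // assumption
const LOCAL = "http://127.0.0.1:5984";                    // assumption

// Step 1: full copy of production into example_db_full on the local server.
function fullCopyBody(source, localBase) {
  return {
    source: source,
    target: `${localBase}/example_db_full`,
    create_target: true,
  };
}

// Step 2: filtered copy from example_db_full into example_db.
// "sampling/sample" means filter "sample" in _design/sampling on the SOURCE db.
function sampleCopyBody(localBase, p) {
  return {
    source: `${localBase}/example_db_full`,
    target: `${localBase}/example_db`,
    create_target: true,
    filter: "sampling/sample",
    query_params: { p: String(p) }, // arrives in the filter as req.query.p
  };
}

// POST one replication request to the server's _replicate endpoint.
async function replicate(serverBase, body) {
  const res = await fetch(`${serverBase}/_replicate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    throw new Error(`_replicate failed: ${res.status} ${await res.text()}`);
  }
  return res.json();
}
```

Usage: `await replicate(LOCAL, fullCopyBody(PROD, LOCAL));` followed by `await replicate(LOCAL, sampleCopyBody(LOCAL, 0.01));`. Posting step 1 to the local server makes it a pull replication. Replication only adds or updates documents on the target, so re-running step 2 with a smaller `p` does not shrink an existing example_db; deleting and recreating it is the simple way to resample.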

Which documents to select is application-specific. For now, I am satisfied with a simple random pass/fail at a percentage I can specify. The randomness is consistent: the same document revision always passes or always fails. (An update produces a new _rev, so an edited document can flip between passing and failing.)

My technique is to normalize the content checksum in the document's _rev field onto the range [0.0, 1.0). Then I simply specify some fraction (e.g. 0.01), and the document passes if its normalized checksum value is less than or equal to that fraction.
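The normalization can be tried outside CouchDB. This sketch uses hypothetical helper names (`revToUnit`, `passes`, `simulatedPassRate`). It divides the first 8 hex digits by 2^32 rather than 0xffffffff so that every value falls strictly below 1.0. To check that roughly a fraction `p` of documents pass, it simulates revisions with MD5 digests of synthetic content; this is an assumption for testing only, since real revision hashes are computed by CouchDB from its own internal input.

```javascript
const crypto = require("crypto");

// Map a revision string like "3-9f86d081..." to [0, 1) using the first
// 8 hex digits after the dash. Returns null for strings not shaped "N-<hex>".
function revToUnit(rev) {
  const m = /^\d+-([0-9a-f]{8})/.exec(rev);
  if (!m) return null;
  return parseInt(m[1], 16) / 4294967296; // 2^32, so the result is < 1
}

// Deterministic pass/fail: the same revision always gives the same answer.
function passes(rev, p) {
  const v = revToUnit(rev);
  return v !== null && v <= p;
}

// Fraction of n simulated revisions that pass for a given p.
function simulatedPassRate(n, p) {
  let passed = 0;
  for (let i = 0; i < n; i++) {
    const hash = crypto.createHash("md5").update(`doc-${i}`).digest("hex");
    if (passes(`1-${hash}`, p)) passed++;
  }
  return passed / n;
}
```

With 10,000 simulated revisions, `simulatedPassRate(10000, 0.1)` should land close to 0.1, because the leading hex digits of a hash are close to uniformly distributed.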

function(doc, req) {
  // Always pass design documents so views (and this filter) are copied too.
  if(/^_design\//.test(doc._id))
    return true;

  if(!req.query.p)
    throw {error: "Must supply a 'p' parameter with the fraction"
                  + " of documents to pass [0.0-1.0]"};

  var p = parseFloat(req.query.p);
  if(!(p >= 0.0 && p <= 1.0)) // Also catches NaN
    throw {error: "Must supply a 'p' parameter with the fraction of documents"
                  + " to pass [0.0-1.0]"};

  // Consider the first 8 hex characters of the doc checksum (for now, taken
  // from _rev) as a real number on the range [0.0, 1.0): divide by 2^32,
  // one more than the largest 8-digit value "ffffffff".
  var m = /^\d+-([0-9a-f]{8})/.exec(doc._rev || "");
  if(!m)
    return false; // Unexpected _rev format: skip the doc instead of crashing.
  var doc_val = parseInt(m[1], 16) / 4294967296; // 2^32

  return doc_val <= p;
}
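The filter has to be stored in a design document on the database being filtered, which for step 2 is example_db_full, because CouchDB evaluates a replication filter against the source's changes. A sketch, assuming the hypothetical names `_design/sampling` and `sample`: CouchDB keeps each filter as a string of JavaScript source under the design document's `filters` field, so a function's `.toString()` text can be stored there. Older CouchDB JavaScript engines may not accept newer syntax such as arrow functions, so a classic `function(doc, req) {...}` expression is the safer choice.

```javascript
// Build a design document holding one filter under filters.<filterName>.
function makeFilterDesignDoc(ddocName, filterName, filterFn) {
  return {
    _id: `_design/${ddocName}`,
    language: "javascript",
    filters: { [filterName]: filterFn.toString() }, // stored as source text
  };
}

// PUT the design document into a database (Node 18+ global fetch).
// If the document already exists, CouchDB answers 409 unless ddoc._rev
// is set to the current revision.
async function putDesignDoc(dbUrl, ddoc) {
  const res = await fetch(`${dbUrl}/${ddoc._id}`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(ddoc),
  });
  if (!res.ok) {
    throw new Error(`PUT ${ddoc._id} failed: ${res.status} ${await res.text()}`);
  }
  return res.json();
}
```

Usage: pass the filter above as `filterFn`, then call `putDesignDoc("http://127.0.0.1:5984/example_db_full", ddoc)` before starting the filtered replication. Since the filter passes design documents, `_design/sampling` is also copied into example_db.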