Need help with riak-js

Posted 2024-09-25 11:13:16

I'm new to Node.js and Riak and am trying to use riak-js. I wrote the following CoffeeScript to create N entries holding the squares of the integers 1..N. The script works fine for N=10: if I pass a console.log() callback to db.get(), it prints the squares of 1..10.

db = require('riak-js').getClient({debug:false})

N = 10

for i in [1..N]
 db.save('Square', String(i), String(i*i))

for i in [1..N]
 db.get('Square', String(i))

My problem is that when I set N=1000, the script takes about 10 seconds to complete. Is this normal? I was expecting something well under 1 second. I have a single Riak node on my local machine, an Acer Aspire 5740 with an i3 CPU and 4GB RAM, running Ubuntu 10.04. To get a RAM-only store, I have set storage_backend in $RIAK/rel/riak/etc/app.config to riak_kv_ets_backend. The riak-admin status command confirms this setting.

Q1: Perhaps riak-js is setting some default disk-based backend for my bucket? How do I find out/override this?

Q2: I don't think it's a Node.js issue, but am I doing something wrong in my asynchronous usage?

尹雨沫 2024-10-02 11:13:16

A1: riak-js does not use any hidden setting, it is up to you to configure your Riak nodes.

A2: Your script seems fine, there's nothing you're doing wrong.

The truth is I haven't started benchmarking or seriously considering performance issues.

That said, every request is queued internally and issued serially. It makes the API simpler and you don't run into race conditions, but it has its limitations. Ideally I want to build a wrapper around riak-js that will take care of:

Holding several instances to make requests in parallel
Automatically reconnecting to other nodes in the cluster when one goes down

Your example runs in ~5sec on my MBP (using Bitcask).

 =>  time coffee test.coffee 

real    0m5.181s
user    0m1.245s
sys     0m0.369s

Just as a proof of concept, take a look at this:

dbs = [require('riak-js').getClient({debug: false}), require('riak-js').getClient({debug: false})]

N = 1000

for i in [1..N]
  db = dbs[i % 2]
  db.save('sq', String(i), String(i*i))

for i in [1..N]
  db = dbs[i % 2]
  db.get('sq', String(i))

Results:

 =>  time coffee test.coffee 

real    0m3.341s
user    0m1.133s
sys     0m0.319s

This will improve by using more clients hitting the DB.
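The two-client proof of concept above generalizes to any number of clients. Here is a minimal sketch in plain JavaScript (runnable directly with node), assuming only that each client is an independent object with its own request queue. `makeRoundRobin` is a hypothetical helper, not part of riak-js, and the stand-in clients exist only so the sketch runs without a Riak node:

```javascript
// Hypothetical helper (not part of riak-js): hands out the given clients in
// rotation so that consecutive requests land on different request queues.
function makeRoundRobin(clients) {
  if (!Array.isArray(clients) || clients.length === 0) {
    throw new Error('need at least one client');
  }
  let next = 0;
  return function pick() {
    const client = clients[next];
    next = (next + 1) % clients.length;
    return client;
  };
}

// Stand-in clients so the sketch runs without a Riak node. With riak-js the
// array would instead hold several require('riak-js').getClient({debug: false})
// instances, as in the example above.
const fakeClients = ['a', 'b', 'c'].map((name) => ({
  name,
  saved: [],
  save(bucket, key, value) {
    this.saved.push([bucket, key, value]);
  },
}));

const pick = makeRoundRobin(fakeClients);
for (let i = 1; i <= 6; i++) {
  pick().save('sq', String(i), String(i * i));
}
```

Whether more clients keep helping depends on the node and the machine, so the pool size is something to measure rather than assume.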

Otherwise the answer is the Protocol Buffers interface, no doubt about it. I couldn't get it running with your example so I'll have to dig into it. But that should be lightning fast.

Make sure you're running the latest Riak (there have been many performance improvements). Also take into account a little overhead for CoffeeScript compilation.
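The `time` figures above include process startup and, when run through `coffee`, compilation. To time only the requests, one option is to count completed callbacks inside the script. Below is a minimal sketch in plain JavaScript, assuming the client accepts a completion callback (the question mentions passing one to db.get()). `makeCompletionTimer` and `fakeGet` are hypothetical names, and `fakeGet` only simulates an asynchronous request so the sketch runs without Riak:

```javascript
// Hypothetical helper: returns a callback that invokes `done` with the elapsed
// milliseconds once it has been called `expected` times. It ignores its own
// arguments, so it does not depend on any particular callback signature.
function makeCompletionTimer(expected, done) {
  const start = process.hrtime.bigint();
  let remaining = expected;
  return function onResponse() {
    remaining -= 1;
    if (remaining === 0) {
      const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
      done(elapsedMs);
    }
  };
}

// Stand-in for an asynchronous request (an assumption, not the riak-js API):
// it simply calls back on a later turn of the event loop.
function fakeGet(bucket, key, callback) {
  setImmediate(callback);
}

const N = 1000;
const onResponse = makeCompletionTimer(N, (ms) => {
  console.log(`${N} responses in ${ms.toFixed(1)} ms`);
});
for (let i = 1; i <= N; i++) {
  fakeGet('Square', String(i), onResponse);
}
```

Passing the same `onResponse` to each real request in place of `fakeGet` would report the time from the first issued request to the last response, independent of startup cost.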

娇纵 2024-10-02 11:13:16

Here is my test file:

db = require('../lib').getClient({debug:false})

# process.argv entries are strings, so convert before using N as a range bound
N = if process.argv[2] then parseInt(process.argv[2], 10) else 10

for i in [1..N]
 db.save('Square', String(i), String(i*i))

for i in [1..N]
 db.get('Square', String(i))

After compiling, I get the following times:

$ time node test1.js 1000

real 0m3.759s
user 0m0.823s
sys  0m0.421s

After running many iterations, my times were similar at that volume regardless of backend. I tested ets and dets. The OS will cache your disk blocks on the first run at a particular volume, so subsequent runs are faster.

Following up on frank06's answer, I would also look into connection handling. This is not so much an issue with Riak as an issue with how riak-js sets up its connections. Also note that in Riak all nodes are the same, so if you had a three-node cluster you would create connections to all three nodes and round-robin across them in some fashion. The Protocol Buffers API is the way to go, but it requires some extra care in setting up.
