Filtering Redis hash entries

I'm using Redis to store hashes with ~100k records per hash. I want to implement filtering (faceting) of the records within a given hash. Note that a hash entry can belong to n filters.

After reading this and this, it looks like I should:

  1. Implement a sorted SET per filter. The values within the SET correspond to the keys within a HASH.
  2. Retrieve the HASH keys from the given filter SET.
  3. Once I have the HASH keys from the SET, fetch the corresponding entries from the HASH. This should give me all entries that belong to the filter (see the sketch after this list).
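
To make that concrete, here's a rough sketch of what I mean in Python with redis-py (my real clients are Predis and ServiceStack, and the key names filter:foo and records are just placeholders):

    import redis

    r = redis.Redis()

    # 1. One sorted SET per filter; each member is a key (field) of the HASH.
    #    The score is whatever ordering makes sense for paging.
    r.zadd('filter:foo', {'record:1': 1, 'record:2': 2})
    r.hset('records', mapping={'record:1': '{"name":"a"}',
                               'record:2': '{"name":"b"}'})

    # 2. Pull one page of HASH keys (500 per page) from the filter SET.
    page = 0
    keys = r.zrange('filter:foo', page * 500, page * 500 + 499)

    # 3. My idea: PIPELINE a fetch per key to get the entries in one round trip.
    pipe = r.pipeline()
    for key in keys:
        pipe.hget('records', key)
    entries = pipe.execute()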

Firstly, is the above approach correct at a high level?

Assuming the approach is OK, the bit I'm missing is: what's the most efficient way to retrieve the HASH entries? Am I right in thinking that once I have the HASH keys I should use a PIPELINE to queue multiple HGETALL commands, passing in each HASH key? Is there a better approach?

My concern about using a PIPELINE is that I believe it will block all other clients while servicing the command. I'll be paging the filtered results with 500 results per page. With multiple browser-based clients performing filtering, not to mention the back-end processes that populate the SETs and HASHes, it sounds like there's potential for a lot of contention if PIPELINE does block. Could anyone provide a view on this?

If it helps, I'm using Redis 2.2.4, Predis for the web clients, and ServiceStack for the back end.

Thanks,
Paul

帝王念 2024-11-08 21:14:48

Redis is a lock-free, non-blocking, async server, so there is no added contention when using pipelining. Redis hums along happily, processing each operation as soon as it receives it, so in practice it can process multiple pipelined operations. In essence, redis-server really doesn't care whether an operation is pipelined or not; it just processes each operation as it receives it.

The benefit of pipelining is reduced client latency: instead of waiting for a response from redis-server after each operation before sending the next one, the client can pump all the operations out in a single write and then read back all the responses in a single read.
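
As a minimal sketch of that in Python with redis-py (invented key names; my actual example below is C#):

    import redis

    r = redis.Redis()

    # transaction=False gives plain pipelining with no MULTI/EXEC wrapper.
    pipe = r.pipeline(transaction=False)

    # These are only buffered client-side; nothing hits the server yet.
    pipe.hget('records', 'record:1')
    pipe.hget('records', 'record:2')
    pipe.hget('records', 'record:3')

    # One socket write sends all three commands, one blocking read collects
    # all three replies; redis-server still executes them one at a time.
    results = pipe.execute()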

An example of this in action is in my Redis mini StackOverflow clone: each click makes a call to ToQuestionResults(), which, because the operations are pipelined, sends all operations in 1 socket write call and reads the results in 1 blocking socket read, which is more efficient than a blocking read per call:

https://github.com/ServiceStack/ServiceStack.Examples/blob/master/src/RedisStackOverflow/RedisStackOverflow.ServiceInterface/IRepository.cs#L180

"My concern about using a PIPELINE is that I believe it will block all other clients while servicing the command."

This is not a valid concern, and I wouldn't overthink how Redis works here; assume it's doing it the most efficient way, where pipelining doesn't block the processing of other clients' commands. Conceptually you can think of redis-server as processing each command (pipelined or not) in FIFO order (i.e. no time is wasted waiting to read the entire pipeline).

You're describing something closer to MULTI/EXEC (i.e. Redis transactions), where all operations are done at once as soon as the Redis server reads EXEC (i.e. the end of the transaction). This is not a problem either: redis-server still doesn't waste any time waiting to receive your entire transaction, it just queues the partial command set in a temporary queue until it receives the final EXEC, which is then processed all at once.
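
For contrast, a minimal redis-py sketch of the MULTI/EXEC flavour (transaction=True, the default, adds the MULTI/EXEC wrapper around the queued commands):

    import redis

    r = redis.Redis()

    # The queued commands are sent as MULTI ... EXEC; redis-server parks
    # them in a temporary queue and runs them all at once on EXEC.
    pipe = r.pipeline(transaction=True)
    pipe.zadd('filter:foo', {'record:1': 1})
    pipe.hset('records', 'record:1', '{"name":"a"}')
    pipe.execute()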

This is how Redis achieves atomicity: by processing each command, one at a time, as soon as it receives it. Since there are no other threads, there is no thread context switching, no locking, and no multi-threading issues. It basically achieves concurrency by processing each command really fast.

So in this case I would use pipelining, as it's always a win; the more commands you pipeline, the bigger the win (since you reduce the blocking read count).

岁月蹉跎了容颜 2024-11-08 21:14:48

Individual operations do block, but it doesn't matter as they shouldn't be long-running. It sounds like you are retrieving more information than you really need - HGETALL will return 100,000 items when you only need 500.

Sending 500 HGET operations may work (assuming the set stores both hash and key), though it's possible that using hashes at all is a case of premature optimization - you may be better off using regular keys and MGET.
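
As a rough sketch of that alternative in Python with redis-py, assuming each record sits under its own regular key and the filter set stores those key names (record:* and filter:foo are invented names):

    import redis

    r = redis.Redis()

    # One regular key per record instead of one field in a 100k-entry hash.
    r.mset({'record:1': '{"name":"a"}', 'record:2': '{"name":"b"}'})

    # A page of 500 key names from the filter set, then a single MGET.
    keys = r.zrange('filter:foo', 0, 499)
    page = r.mget(keys)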

甜扑 2024-11-08 21:14:48

I think you misunderstand what pipelining does. It doesn't block while all the commands are being sent. All it's doing is BUFFERING the commands, then executing them all at once at the end, so they are executed as if they were one single command. At no time does blocking occur. The same is true for Redis MULTI/EXEC. The closest thing you get to blocking/locking in Redis is optimistic locking by using WATCH, which will cause EXEC to fail if the Redis key has been written to since you called WATCH.
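
For illustration, the WATCH pattern looks roughly like this in Python with redis-py (counter is an invented key; redis-py surfaces a failed EXEC as a WatchError):

    import redis

    r = redis.Redis()

    with r.pipeline() as pipe:
        while True:
            try:
                pipe.watch('counter')            # fail EXEC if this key changes
                current = int(pipe.get('counter') or 0)
                pipe.multi()                     # start buffering the transaction
                pipe.set('counter', current + 1)
                pipe.execute()                   # raises if 'counter' was written
                break
            except redis.WatchError:
                continue                         # lost the race, retry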

Even more efficient than calling hget 500 times within a pipeline block is to just call hmget('hash-key', *keys), where keys is an array of the 500 hash keys you are looking up. This will result in a single call to Redis, which is the same as if it were pipelined, but should be faster to execute since you aren't looping in Ruby.
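
In Python with redis-py, that single call would look something like this ('hash-key' and filter:foo stand in for your real names):

    import redis

    r = redis.Redis()

    # The 500 hash keys for the current page, pulled from the filter set.
    keys = r.zrange('filter:foo', 0, 499)

    # One HMGET returns only the requested fields in a single round trip.
    entries = r.hmget('hash-key', keys)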
