Is JMS (or any messaging solution) appropriate for a follower/following model?
For the sake of simplicity, let's assume I'm cloning Twitter (I'm not). Every user can follow other users and be followed by other users. For each user you follow, you receive every tweet he sends. Everything is stored in a datastore (be it a NoSQL solution or a sharded relational database).
However, when users are online, do you think it is appropriate to have them receive tweets via JMS, rather than polling the database and retrieving new tweets? The setup would be:
- when a user registers (or when he logs in), a JMS Topic is created, named after him (or his id)
- when a user logs in, he subscribes to the JMS Topic of each of the users he follows
- a session-scoped object (one per user) acts as a JMS message listener
- all received messages are stored in the session (in memory)
- the UI is updated via ajax polling of the session-scoped object
- when the user logs out, or his session times out, the message listener is destroyed
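The flow in the list above can be sketched in plain Java. This is only an in-process model under stated assumptions, not JMS code: `TopicBroker` and `SessionInbox` are hypothetical names, the broker is a stand-in for a real JMS provider, and tweets are plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-process stand-in for a JMS provider: one "topic" per author id.
final class TopicBroker {
    private final Map<String, List<Consumer<String>>> topics = new ConcurrentHashMap<>();

    // Returns a handle that removes the subscription again (logout or session timeout).
    Runnable subscribe(String authorId, Consumer<String> listener) {
        List<Consumer<String>> listeners =
                topics.computeIfAbsent(authorId, id -> new CopyOnWriteArrayList<>());
        listeners.add(listener);
        return () -> listeners.remove(listener);
    }

    // Delivers a tweet to every listener currently subscribed to the author's topic.
    void publish(String authorId, String tweet) {
        for (Consumer<String> listener : topics.getOrDefault(authorId, List.of())) {
            listener.accept(tweet);
        }
    }
}

// Session-scoped object: buffers tweets in memory until the next ajax poll drains them.
final class SessionInbox {
    private final ConcurrentLinkedQueue<String> buffer = new ConcurrentLinkedQueue<>();
    private final List<Runnable> subscriptions = new ArrayList<>();

    SessionInbox(TopicBroker broker, List<String> followedAuthorIds) {
        for (String authorId : followedAuthorIds) {
            subscriptions.add(broker.subscribe(authorId, buffer::add));
        }
    }

    // Called by the ajax endpoint: returns and clears whatever arrived since the last poll.
    List<String> drain() {
        List<String> out = new ArrayList<>();
        String tweet;
        while ((tweet = buffer.poll()) != null) {
            out.add(tweet);
        }
        return out;
    }

    // Called on logout or session timeout: drops all topic subscriptions.
    void close() {
        subscriptions.forEach(Runnable::run);
        subscriptions.clear();
    }
}
```

Note that each online follower holds its own copy of every buffered tweet, and the number of topics grows with the number of users; both points matter when judging memory use and provider limits.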
The idea behind this is supposedly to boost performance, i.e. to avoid querying the datastore too often and instead keep recent items cached in memory.
The whole thing is of course expected to run in a cluster, and be scalable.
However, I'm not sure:
- whether this is actually worth it (in terms of performance and scalability gains)
- whether JMS adds an undesirable overhead comparable to the cost of querying the datastore (which would make the whole complication useless)
At some point (when the thing is functional) I will run some benchmarks, but I'd like to hear some initial remarks.
Comments (2)
Sounds reasonable enough. You'd need to be sure that your JMS implementation of choice supports a potentially very large number of topics - not all of them handle that gracefully.
My main design question is this: when a user first logs in, his session store of messages will be empty, and you'd have to wait for it to fill up. Wouldn't you then have to hit the database anyway, or would this not be an issue?
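One way to picture the empty-store problem: the session store is backfilled from the datastore once at login, while live messages keep arriving. The sketch below is a minimal illustration under assumptions that are not in the question: `StoredTweet` and `WarmTimeline` are hypothetical names, and tweet ids are assumed to be unique and increasing over time, so they can be used both for deduplication and for ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical tweet value; ids are assumed unique and increasing over time.
final class StoredTweet {
    final long id;
    final String text;

    StoredTweet(long id, String text) {
        this.id = id;
        this.text = text;
    }
}

// Session-scoped timeline: backfilled once at login, then fed by live messages.
final class WarmTimeline {
    private final Map<Long, StoredTweet> byId = new TreeMap<>();

    // One datastore read at login; tweets already delivered live are not duplicated.
    synchronized void backfill(List<StoredTweet> fromDatastore) {
        for (StoredTweet t : fromDatastore) {
            byId.putIfAbsent(t.id, t);
        }
    }

    // Live delivery path (what the message listener would call).
    synchronized void onMessage(StoredTweet t) {
        byId.putIfAbsent(t.id, t);
    }

    // Tweet texts in ascending id order.
    synchronized List<String> texts() {
        List<String> out = new ArrayList<>();
        for (StoredTweet t : byId.values()) {
            out.add(t.text);
        }
        return out;
    }
}
```

Subscribing before the backfill and deduplicating by id means a tweet that arrives during login is neither lost nor shown twice, at the cost of one database hit per login.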
Also, you're not really making use of the event-driven nature of JMS here. Messages received from the topic are just dumped into the session store for later retrieval.
Since it's not really event-driven, you could perhaps consider a distributed in-memory data store instead, such as EhCache+JGroups, or JBossCache3 (which I can highly recommend). New tweets would be dropped into this distributed store, and readers would just need to trawl through it looking for items of interest. This could be more memory-efficient, since each node would hold only one copy of each tweet rather than one copy per subscribed session. You could also pre-load the cache at system start-up.
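The shared-store alternative might look roughly like the following. It is a single-JVM sketch only: `SharedTweetStore` is a hypothetical name, a `ConcurrentSkipListMap` stands in for the distributed cache, and no EhCache or JBossCache API is used. Each tweet is stored once, and readers scan for entries newer than the last sequence number they have seen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical local stand-in for a distributed cache: one copy of each tweet.
final class SharedTweetStore {
    // key: increasing sequence number; value: {authorId, text}
    private final ConcurrentSkipListMap<Long, String[]> tweets = new ConcurrentSkipListMap<>();
    private final AtomicLong nextSeq = new AtomicLong();

    // Stores the tweet once and returns its sequence number.
    long put(String authorId, String text) {
        long seq = nextSeq.incrementAndGet();
        tweets.put(seq, new String[] {authorId, text});
        return seq;
    }

    // Readers scan only entries newer than the last sequence they have seen
    // and keep those written by authors they follow.
    List<String> newerThan(long lastSeenSeq, Set<String> followedAuthorIds) {
        List<String> out = new ArrayList<>();
        for (String[] tweet : tweets.tailMap(lastSeenSeq, false).values()) {
            if (followedAuthorIds.contains(tweet[0])) {
                out.add(tweet[1]);
            }
        }
        return out;
    }
}
```

The trade-off compared with per-user topics is that memory is spent once per tweet, while every poll pays for a scan over recent entries.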
Note: I don't have practical experience with the system setup you described in your question, so the following are purely theoretical considerations.
One factor is what your users will see the next time they log in:
Case 1: JMS is nice, because a queue can remember which messages have already been delivered. But wait: this means you'd need one queue per message receiver.
Case 2: Here you can really work with topics per message sender, and expire messages that are older than x hours. So again, JMS may be a good option.
Performance
JMS implementations can usually keep messages in memory, and to access the messages in a queue/topic you don't have to search a large index - so I assume this should be faster than a database, and probably still faster than an in-memory database. When a node fails, you can either re-create the queues from the backing database, or use the high-availability and persistence features built into some JMS implementations.
But I fully agree with skaffman: compared to a distributed in-memory data store, you'll be using more memory. The advantage of JMS is that it simplifies automated expiry of messages (and some other things), and I don't know if it's a good idea to re-implement that functionality.
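To give a feel for what re-implementing expiry would involve, here is a minimal time-to-live buffer. In JMS itself, a producer-side time-to-live (`MessageProducer.setTimeToLive`) covers this; the class below is only a hypothetical stand-in, with `ExpiringBuffer` as an assumed name and the current time passed in explicitly. It also assumes `add` is called with non-decreasing timestamps.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical buffer that drops messages older than a fixed time-to-live.
final class ExpiringBuffer {
    private static final class Entry {
        final long expiresAtMillis;
        final String text;

        Entry(long expiresAtMillis, String text) {
            this.expiresAtMillis = expiresAtMillis;
            this.text = text;
        }
    }

    private final Deque<Entry> entries = new ArrayDeque<>();
    private final long ttlMillis;

    ExpiringBuffer(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Assumes nowMillis never decreases between calls, so entries stay ordered by expiry.
    synchronized void add(String text, long nowMillis) {
        entries.addLast(new Entry(nowMillis + ttlMillis, text));
    }

    // Evicts expired entries from the head, then returns the remaining texts oldest first.
    synchronized List<String> live(long nowMillis) {
        while (!entries.isEmpty() && entries.peekFirst().expiresAtMillis <= nowMillis) {
            entries.removeFirst();
        }
        List<String> out = new ArrayList<>();
        for (Entry e : entries) {
            out.add(e.text);
        }
        return out;
    }
}
```

Even this small version leaves out persistence, redelivery and clustering, which is part of why reusing the expiry support of a messaging provider can be attractive.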
So maybe what I would do is save only IDs in the queues and hold the actual messages in a Java object cache. This way you'll once again have to use an index, but you get the convenience of JMS with most of the memory efficiency of the object cache. If the cache lives only on the sender's side, it doesn't even have to be distributed, assuming that all messages from one sender reside on one (replicated) node - but that probably depends on a lot of other architectural decisions.
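The ID-in-queue idea could be sketched as follows. Everything here is an assumption for illustration: `IdQueueWithCache` is a hypothetical name, in-memory queues stand in for JMS queues, and a `ConcurrentHashMap` stands in for the object cache.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: per-receiver queues carry only tweet ids,
// while a single cache holds one copy of each tweet body.
final class IdQueueWithCache {
    private final Map<Long, String> bodyCache = new ConcurrentHashMap<>();
    private final Map<String, Queue<Long>> queues = new ConcurrentHashMap<>();

    // Stores the body once and enqueues only the id for every follower.
    void publish(long tweetId, String body, Collection<String> followerIds) {
        bodyCache.put(tweetId, body);
        for (String followerId : followerIds) {
            queues.computeIfAbsent(followerId, id -> new ConcurrentLinkedQueue<>()).add(tweetId);
        }
    }

    // Drains the receiver's id queue and resolves each id through the cache (the index lookup).
    List<String> receive(String receiverId) {
        List<String> out = new ArrayList<>();
        Queue<Long> queue = queues.get(receiverId);
        if (queue == null) {
            return out;
        }
        Long tweetId;
        while ((tweetId = queue.poll()) != null) {
            String body = bodyCache.get(tweetId);
            if (body != null) { // the body may already have been evicted
                out.add(body);
            }
        }
        return out;
    }

    // Removes a body from the cache, e.g. once it is older than x hours.
    void evict(long tweetId) {
        bodyCache.remove(tweetId);
    }
}
```

The per-follower cost shrinks to one id per tweet, but receivers have to tolerate ids whose bodies are no longer cached.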