Should I use Drools in this situation?
I'll use a university's library system to explain my use case. Students register in the library system and provide their profile: gender, age, department, previously completed courses, currently registered courses, books already borrowed, etc. Each book in the library system defines some borrowing rules based on the student's profile. For example, a computer algorithms textbook can only be borrowed by students currently registered in that class; another textbook may only be borrowed by students in the math department; there could also be a rule that a student can borrow at most 2 computer networking books. As a result of the borrowing rules, when a student searches or browses the library system, they only see the books they are allowed to borrow. So the requirement really comes down to efficiently generating the list of books that a student is eligible to borrow.
Here is how I envision the design using Drools: each book has a rule with a few field constraints on the student profile as its LHS, and the RHS simply adds the book ID to a global result list. All the book rules are loaded into a RuleBase. When a student searches or browses the library system, a stateless session is created from the RuleBase and the student's profile is asserted as the fact. Every book the student can borrow then fires its rule, and the global result list ends up holding the complete list of books the student can borrow.
A few assumptions: the library will handle millions of books; I don't expect the book rules to be too complicated, at most 3 simple field constraints per rule on average; the number of students the system needs to handle is around 100K, so the load is fairly heavy. My questions are: how much memory will Drools take if loaded with a million book rules? How fast will it be for all those million rules to fire? If Drools is the right fit, I'd like to hear some best practices for designing such a system from experienced users. Thanks.
Comments (5)
First, don't make a rule for every book. Make rules for the restrictions: there are far fewer restrictions defined than books. This will have a huge impact on running time and memory usage.
Running a ton of books through the rule engine is going to be expensive, especially since you won't show all the results to the user: only 10-50 per page. One idea that comes to mind is to use the rule engine to build a set of query criteria. (I wouldn't actually do this; see below.)
Here's what I have in mind:
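A minimal sketch of that idea, written in plain Java rather than as Drools rules. The class name, the column names (restriction, restricted_to) and the restriction codes are all hypothetical, and the sketch assumes each book carries exactly one restriction. Every restriction the student satisfies contributes one OR-ed clause, so the database does the filtering and the paging.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: collects the search clauses a student qualifies for.
final class BookCriteria {
    private final List<String> clauses = new ArrayList<>();
    private final List<Object> params = new ArrayList<>();

    // Adds one OR-ed alternative together with its bind parameters.
    void allow(String clause, Object... args) {
        clauses.add("(" + clause + ")");
        Collections.addAll(params, args);
    }

    // WHERE fragment with '?' placeholders, suitable for a prepared statement.
    String where() { return String.join(" OR ", clauses); }

    List<Object> params() { return params; }

    // One "rule" per restriction kind, not one per book.
    static BookCriteria forStudent(String department, List<String> currentCourses,
                                   int networkingBooksOut) {
        BookCriteria c = new BookCriteria();
        c.allow("restriction = 'NONE'");
        c.allow("restriction = 'DEPARTMENT' AND restricted_to = ?", department);
        for (String course : currentCourses) {
            c.allow("restriction = 'COURSE' AND restricted_to = ?", course);
        }
        if (networkingBooksOut < 2) { // the "at most 2 networking books" example
            c.allow("restriction = 'NETWORKING_LIMIT'");
        }
        return c;
    }
}
```

The resulting fragment and parameter list would be appended to the normal search query, so only one page of matching books is ever fetched.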
But I wouldn't actually do that!
This is how I would have changed the problem:
Not showing the books to the user is a poor experience. A user may want to peruse the books to see which books to get next time. Show the books, but disallow the checkout of restricted books. This way, you only have 1-50 books to run through the rules at a time per user. This will be pretty zippy. The above rules would become:
Here I am using activation-group to make sure only one rule fires, and salience to make sure the rules fire in the order I want.
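For readers unfamiliar with those two attributes, the behaviour can be emulated in plain Java: within an activation group only the first rule to fire takes effect, and higher salience fires first. The sketch below is not Drools code; the record names, restriction codes and reasons are hypothetical. It is meant to be run against only the 10-50 books on the current page.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

final class CheckoutRules {
    record Student(String department, Set<String> currentCourses) {}
    record Book(String id, String restriction, String restrictedTo) {}
    record Decision(boolean allowed, String reason) {}
    record Rule(int salience, BiPredicate<Student, Book> when, Decision then) {}

    // Higher salience is tried first; the salience-0 rule is the catch-all "allow".
    static final List<Rule> RULES = List.of(
        new Rule(20,
            (s, b) -> b.restriction().equals("COURSE")
                   && !s.currentCourses().contains(b.restrictedTo()),
            new Decision(false, "not enrolled in the required course")),
        new Rule(10,
            (s, b) -> b.restriction().equals("DEPARTMENT")
                   && !s.department().equals(b.restrictedTo()),
            new Decision(false, "restricted to another department")),
        new Rule(0, (s, b) -> true, new Decision(true, ""))
    );

    // First matching rule wins, which mimics a single activation group.
    static Decision decide(Student s, Book b) {
        return RULES.stream()
            .sorted(Comparator.comparingInt(Rule::salience).reversed())
            .filter(r -> r.when().test(s, b))
            .findFirst()
            .map(Rule::then)
            .orElseThrow();
    }
}
```

Because every book is shown and only the checkout decision is computed, the cost per request is bounded by the page size rather than by the size of the catalogue.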
Finally, keep the rules cached. Drools allows, and recommends, loading the rules into a knowledge base only once and then creating sessions from it. Knowledge bases are expensive; sessions are cheap.
My experience with Drools (or rules engines in general) is that it is a good fit if user visibility into the rules is important, or if changing the rules quickly without turning each change into a coding project is important, or if the set of rules is so large that it is hard to manage, reason about and analyze in code (so you would have business people asking technical people to go read the code and tell them what happens in situation X).
That being said, rules engines can be a bottleneck. They don't run anywhere close to the performance of plain code, so you do need to manage that up front architecturally. In this specific case there is certainly a database behind the system, and that adds another performance consideration: the database will return a query a whole lot faster than you can analyze the whole set in code.
I would absolutely not implement this by making a million rule objects. Instead I would define book types that multiple books can be assigned to, run the rules against the book types, and then only show books that belong to an allowed type. This way you could load the types, pass them through the rules engine, and then push the allowed types into a database query that pulls the list of books in those types.
Types get a bit more complicated because in practice a book is likely to have two types (allowed if you are taking a certain course, or more generally if you are part of the department), but the approach should still hold.
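A rough sketch of that two-step flow, assuming a hypothetical schema with a book table and a book_type join table so that one book can carry several types. Plain Java predicates stand in for the rules engine here; the class, type and column names are made up for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class TypeFilter {
    record Student(String department, Set<String> currentCourses) {}

    // Step 1: evaluate the (few) rules against book types, not against books.
    static Set<String> allowedTypes(Student s, Map<String, Predicate<Student>> typeRules) {
        return typeRules.entrySet().stream()
            .filter(e -> e.getValue().test(s))
            .map(Map.Entry::getKey)
            .collect(Collectors.toCollection(TreeSet::new));
    }

    // Step 2: let the database pull the books. A book with several types is
    // returned if any one of its types is allowed, hence the DISTINCT.
    static String bookQuery(int typeCount) {
        if (typeCount < 1) {
            throw new IllegalArgumentException("need at least one allowed type");
        }
        String marks = String.join(", ", Collections.nCopies(typeCount, "?"));
        return "SELECT DISTINCT b.id FROM book b "
             + "JOIN book_type bt ON bt.book_id = b.id "
             + "WHERE bt.type_id IN (" + marks + ")";
    }
}
```

The allowed type IDs are then bound to the placeholders, and the usual paging clause can be added to the same query.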
How fast is your computer and how much memory have you got? In one sense you can only find out by building a proof of concept and filling it with the right quantity of (randomly-generated) test data. My experience is that Drools is faster than you expect, and that you have to have very good knowledge of what's under the hood to be able to predict what is going to make it slow.
Note that you are talking about a million rule session facts (i.e. Book objects), not a million rules. There are only a handful of rules, which won't take long to fire. The potentially slow part is inserting the million objects, because Drools needs to decide which rules to put on the Agenda for each new fact.
It's a shame that none of us has an answer for some particular set-up with a million facts.
As for the implementation, my approach would be to insert a Book object for each book that the student wants to check out, retract the ones that are not allowed, then run one query to get the remaining (allowed) Book objects and another query to get the list of reasons. Alternatively, use RequestedBook objects that have additional boolean allowed and String reasonDisallowed properties that you can set in your rules.
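A possible shape for such a RequestedBook fact, as a minimal sketch. Only the two property names come from the answer; the bookId field, the optimistic default and the disallow helper are assumptions.

```java
// Fact object inserted once per book the student asks to check out.
final class RequestedBook {
    private final String bookId;
    private boolean allowed = true;       // assumed default: allowed until a rule says otherwise
    private String reasonDisallowed = "";

    RequestedBook(String bookId) {
        this.bookId = bookId;
    }

    String getBookId() { return bookId; }

    boolean isAllowed() { return allowed; }

    String getReasonDisallowed() { return reasonDisallowed; }

    // Hypothetical helper a rule consequence could call instead of retracting the fact,
    // so the reason stays attached to the book it applies to.
    void disallow(String reason) {
        this.allowed = false;
        this.reasonDisallowed = reason;
    }
}
```

Compared with retracting disallowed books, this keeps every requested book in the result, which makes it straightforward to show the user why a particular book cannot be checked out.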
Any time we are looking at large data sets (which is what this question is about: whether or not Drools is a good fit in a large data set case), think outside the box (see below). Any time we are talking about "millions of objects" or similar log-N type problems, I don't think the tool in question is necessarily the problem. So yes, Drools (or JBoss Rules) can be used, but it only makes sense in a certain context.
When you have log-N of anything (cross-referencing large data sets against inputs), I would recommend more novel approaches such as database-backed Bloom filters. These can be implemented as Java objects and referenced by Drools for the fact lookup (this does require custom coding).
Bloom filters are tiny in-memory structures with only basic insert()/contains() functions, and they have one drawback: a false-positive rate of about 1%. So the filter serves as a primary cache. If you construct the Drools question so that the answer is generally "no", a Bloom-filter-backed fact-table lookup will be lightning fast with a tiny memory footprint (about 1.1 bytes per record in my implementation), so roughly 1 MB of RAM for this case. Then, in the "contains" case (which might be a false positive), use the database-backed fact table to confirm. Again, if the lookup is false 80% of the time, the Bloom filter will be a huge saving in memory and time. Otherwise, pure lookups (Drools facts, database, anything) against 1M records will be very expensive every time, in both memory and speed.
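A compact sketch of that two-tier lookup, not the implementation the answer refers to. It uses the standard sizing formulas m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, which for a 1% false-positive rate come to about 9.6 bits (roughly 1.2 bytes) per record, in line with the figure quoted above. The FactLookup class and its in-memory Set standing in for the database are assumptions.

```java
import java.util.BitSet;
import java.util.Set;

final class BloomFilter {
    private final BitSet bits;
    private final int size;   // number of bits (m)
    private final int hashes; // number of hash functions (k)

    BloomFilter(int expectedItems, double falsePositiveRate) {
        double ln2 = Math.log(2);
        this.size = Math.max(1,
            (int) Math.ceil(-expectedItems * Math.log(falsePositiveRate) / (ln2 * ln2)));
        this.hashes = Math.max(1, (int) Math.round((double) size / expectedItems * ln2));
        this.bits = new BitSet(size);
    }

    void insert(String key) {
        for (int i = 0; i < hashes; i++) bits.set(index(key, i));
    }

    // false means "definitely absent"; true means "possibly present".
    boolean contains(String key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(index(key, i))) return false;
        }
        return true;
    }

    // Double hashing: the i-th probe is a + i * b, reduced modulo the bit count.
    private int index(String key, int i) {
        int a = mix(key.hashCode());
        int b = mix(a ^ 0x9e3779b9) | 1; // forced odd so the step is never zero
        return Math.floorMod(a + i * b, size);
    }

    private static int mix(int h) {
        h ^= h >>> 16; h *= 0x85ebca6b;
        h ^= h >>> 13; h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }
}

final class FactLookup {
    private final BloomFilter filter;
    private final Set<String> database; // stand-in for the database-backed fact table
    int databaseHits = 0;               // counts how often the slow path was taken

    FactLookup(BloomFilter filter, Set<String> database) {
        this.filter = filter;
        this.database = database;
    }

    boolean contains(String key) {
        if (!filter.contains(key)) return false; // definite "no": skip the database
        databaseHits++;
        return database.contains(key);           // "maybe": confirm against the source of truth
    }
}
```

The more often the answer is "no", the fewer lookups reach the database, which is where the saving described above comes from.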
I would be worried about needing the number of rules to be a function of the number of students. That could really make things tricky, and it sounds like the biggest problem.