Out-of-core rule engines
Are there any implementations of production rule systems that operate out of core?
I've checked out the open source implementations like CLIPS and Jess, but these only operate in memory, so they tend to crash or force heavy disk swapping when operating on large numbers of facts and rules (e.g. in the billions/trillions).
I'm playing around with the idea of possibly porting a simple rules engine, like Pychinko, to a SQL backend using Django's ORM. However, supporting the level of functionality found in CLIPS would be very non-trivial, and I don't want to reinvent the wheel.
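To make the idea concrete, here is a minimal sketch of the shape I have in mind, assuming an already-configured Django project. The Fact, Rule, and Condition models and the candidate_facts helper are illustrative placeholders, not Pychinko's actual classes; the point is just to let the database index and filter working memory instead of holding every fact in RAM.

```python
# Sketch: push working memory into a SQL backend via Django's ORM.
# Assumes a configured Django project; all names here are illustrative.
from django.db import models


class Fact(models.Model):
    """A single triple in working memory, stored as a row instead of in RAM."""
    subject = models.CharField(max_length=255, db_index=True)
    predicate = models.CharField(max_length=255, db_index=True)
    obj = models.CharField(max_length=255, db_index=True)

    class Meta:
        unique_together = ("subject", "predicate", "obj")


class Rule(models.Model):
    """A production rule; its LHS patterns live in a related table."""
    name = models.CharField(max_length=255, unique=True)


class Condition(models.Model):
    """One pattern in a rule's LHS; a NULL field means 'bind any value'."""
    rule = models.ForeignKey(Rule, related_name="conditions", on_delete=models.CASCADE)
    subject = models.CharField(max_length=255, null=True, blank=True)
    predicate = models.CharField(max_length=255, null=True, blank=True)
    obj = models.CharField(max_length=255, null=True, blank=True)


def candidate_facts(condition):
    """Let the database do the filtering instead of iterating in memory."""
    qs = Fact.objects.all()
    if condition.subject is not None:
        qs = qs.filter(subject=condition.subject)
    if condition.predicate is not None:
        qs = qs.filter(predicate=condition.predicate)
    if condition.obj is not None:
        qs = qs.filter(obj=condition.obj)
    return qs.iterator()  # stream matching rows rather than loading them all
```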
Are there any alternatives for scaling up a production rule system?
2 Answers
You can check JENA and similar RDF rule engines, which are designed to work with very large fact databases.
This isn't a direct answer to your question, but it may give you a line of attack on the problem.
Back in the '80s and '90s we fielded an information retrieval system that allowed for very large numbers of standing queries. Specifically, we had systems with 64MB of memory (which was a buttload in those days) ingesting upwards of a million messages a day and applying 10,000 to 100,000+ standing queries against that stream.
If all we had done was to iteratively apply each standing query against the most recent documents we would have been dead meat. What we did was to perform a sort of inversion of the queries, specifically identifying the must have and may have terms in the query. We then used the term list from the document to find those queries that had any sort of chance to succeed. The customer learned to create queries that had strong differentiators and, as a result, sometimes only 10 or 20 queries had to be fully evaluated.
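To sketch the idea in code (illustrative Python, not the system we actually built; StandingQuery and the index helpers are made-up names): index the standing queries by their must-have terms, then use a document's term set to pull out only the queries that have any chance of matching before doing full evaluation.

```python
# Rough sketch of inverting standing queries by their must-have terms.
from collections import defaultdict


class StandingQuery:
    def __init__(self, name, must_have, may_have):
        self.name = name
        self.must_have = set(must_have)  # every one must appear in the document
        self.may_have = set(may_have)    # optional terms, used in later scoring

    def matches(self, doc_terms):
        # Full evaluation, run only on the few surviving candidates.
        return self.must_have <= doc_terms


def build_index(queries):
    """Map each must-have term to the queries that require it."""
    index = defaultdict(set)
    for q in queries:
        for term in q.must_have:
            index[term].add(q)
    return index


def candidate_queries(index, doc_terms):
    """Return only queries sharing at least one must-have term with the document."""
    hits = set()
    for term in doc_terms:
        hits |= index.get(term, set())
    return hits


# Usage: instead of evaluating every standing query against each document,
# evaluate only the handful that share required terms with it.
queries = [StandingQuery("q1", ["rete", "engine"], ["clips"]),
           StandingQuery("q2", ["django", "orm"], ["sql"])]
index = build_index(queries)
doc = {"rete", "engine", "memory"}
survivors = [q.name for q in candidate_queries(index, doc) if q.matches(doc)]
```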
I don't know your dataset, and I don't know what your rules look like, but there might be something similar you could try.