Ideal algorithmic approach: rules engine / decision tree and some learning logic?
My requirement is probably close to what one expects of an "expert system". I'm looking for the simplest solution that can give me real-time or near-real-time inference, with some offline (non-real-time) learning capabilities.
To elaborate, my problem is this:
Watch a log that is being updated live, and classify each entry as Red, Green, or Blue.
The classification into Red, Green, or Blue is based on logic codified as production rules (as I imagine it today).
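Before committing to a rule engine, it may help to see how little is needed to express rules of the form "if condition, then label" directly in C++. The sketch below is a plain ordered rule list, not a Rete network: every rule is tested against each entry and the first match wins. The `LogEntry` fields, `Rule`, `RuleSet`, and the example rules are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum class Color { Red, Green, Blue };

// Hypothetical log entry; real fields depend on the log format.
struct LogEntry {
    std::string id;       // correlation key shared by related entries
    std::string level;    // e.g. "ERROR", "INFO"
    std::string message;  // free-text payload
};

// A production rule: if `condition` holds for an entry, assign `label`.
struct Rule {
    std::string name;
    std::function<bool(const LogEntry&)> condition;
    Color label;
};

class RuleSet {
public:
    void add(Rule r) {
        assert(r.condition && "rule needs a condition");
        rules_.push_back(std::move(r));
    }

    // Test rules in insertion order; the first matching rule decides.
    // Entries that no rule decides stay Blue (undecided for now).
    Color classify(const LogEntry& e) const {
        for (const auto& r : rules_) {
            if (r.condition(e)) return r.label;
        }
        return Color::Blue;
    }

private:
    std::vector<Rule> rules_;
};
```

A Rete-based engine tends to pay off when there are many rules sharing conditions and facts that persist across entries (such as the pending Blue entries); for a handful of stateless per-entry rules, a linear scan like this may already be enough.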
Where it gets challenging:
1) Log entries tagged Blue must eventually be re-tagged Red or Green based on subsequent log entries, which we hope will carry more detailed information, so some state has to be remembered. The exact time to wait isn't known in advance, but there is a maximum limit. At any given point in time, there could be several hundred thousand entries tagged Blue.
2) The rules that determine Red and Green are not perfect, so labeling mistakes sometimes happen, and an occasional manual audit reveals them. My main challenge is to see whether I can automate some part of rule updating with minimal programming effort.
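The "remembering" in point 1 can be kept outside any rule engine with two structures: a hash map from a correlation key to the entry's deadline, and a min-heap of deadlines so entries that exceed the maximum wait can be expired without scanning the whole set. A minimal sketch, assuming each entry carries a hypothetical correlation `id` shared by its follow-up entries and a timestamp in whole seconds; `PendingBlue` is an illustrative name. Expired ids are returned to the caller, because what should happen to an entry that is still Blue at its deadline is a policy the question leaves open.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Remembers entries tagged Blue until a follow-up resolves them
// or they exceed the maximum wait.
class PendingBlue {
public:
    explicit PendingBlue(std::int64_t max_wait_s) : max_wait_(max_wait_s) {
        assert(max_wait_s > 0);
    }

    // Remember a Blue entry first seen at time `now` (seconds).
    void add(const std::string& id, std::int64_t now) {
        const std::int64_t deadline = now + max_wait_;
        if (pending_.emplace(id, deadline).second) {
            deadlines_.push({deadline, id});
        }
    }

    // A later entry decided the label: forget the pending entry.
    // Returns false if `id` was not pending (already resolved or expired).
    bool resolve(const std::string& id) { return pending_.erase(id) > 0; }

    // Remove and return every pending id whose deadline is <= now.
    std::vector<std::string> expire(std::int64_t now) {
        std::vector<std::string> out;
        while (!deadlines_.empty() && deadlines_.top().first <= now) {
            Item top = deadlines_.top();
            deadlines_.pop();
            auto it = pending_.find(top.second);
            // Lazy deletion: skip heap items whose entry was already resolved
            // (or re-added later with a different deadline).
            if (it != pending_.end() && it->second == top.first) {
                pending_.erase(it);
                out.push_back(top.second);
            }
        }
        return out;
    }

    std::size_t size() const { return pending_.size(); }

private:
    using Item = std::pair<std::int64_t, std::string>;  // (deadline, id)
    std::int64_t max_wait_;
    std::unordered_map<std::string, std::int64_t> pending_;  // id -> deadline
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> deadlines_;  // earliest first
};
```

Resolved entries leave the map right away but stay in the heap until their deadline comes up (lazy deletion), so `resolve` is a single hash-map erase and `add` is a heap push, O(log n).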
My (continuing) study suggests that a rule engine based on the RETE algorithm might serve my classification and labeling, including the re-labeling. If that works, I still need to figure out how to automate the "learning from mistakes" part. Can one take a statistical approach, such as Bayesian classification? Also, could one use Bayesian classification entirely instead of a rules engine for the initial classification, provided I've manually trained the system sufficiently? The Bayesian approach seems to "dumb down" the task of maintaining a correct set of rules by taking a "trust the statistics" approach, especially since there are these periodic manual audits.
PS: My main application is written in C++ (if that matters).
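To make the Bayesian option concrete: a multinomial naive Bayes classifier over the tokens of a log message can be trained incrementally, so folding in an audit correction is just one more `train` call with the auditor's label. A minimal sketch, assuming whitespace tokenization and add-one (Laplace) smoothing, which are common defaults rather than anything the question specifies; `NaiveBayes` and `Label` are illustrative names. It covers only the Red/Green decision, not the waiting logic for Blue entries.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <sstream>
#include <string>
#include <unordered_map>
#include <unordered_set>

enum class Label { Red = 0, Green = 1 };

// Multinomial naive Bayes over whitespace-separated tokens.
class NaiveBayes {
public:
    // Add one labeled example: initial training data or an audit correction.
    void train(const std::string& text, Label y) {
        const int c = static_cast<int>(y);
        ++docs_[c];
        std::istringstream in(text);
        std::string tok;
        while (in >> tok) {
            ++counts_[c][tok];
            ++tokens_[c];
            vocab_.insert(tok);
        }
    }

    // Return the label with the highest log-posterior score.
    Label classify(const std::string& text) const {
        assert(docs_[0] + docs_[1] > 0 && "train before classifying");
        double best = -std::numeric_limits<double>::infinity();
        Label best_label = Label::Red;
        for (int c = 0; c < 2; ++c) {
            // Log prior, add-one smoothed so a class with no examples is not -inf.
            double score = std::log((docs_[c] + 1.0) / (docs_[0] + docs_[1] + 2.0));
            // The extra +1 reserves probability mass for tokens never seen in training.
            const double denom = tokens_[c] + static_cast<double>(vocab_.size()) + 1.0;
            std::istringstream in(text);
            std::string tok;
            while (in >> tok) {
                auto it = counts_[c].find(tok);
                const double n = (it == counts_[c].end()) ? 0.0 : static_cast<double>(it->second);
                score += std::log((n + 1.0) / denom);  // Laplace-smoothed likelihood
            }
            if (score > best) {
                best = score;
                best_label = static_cast<Label>(c);
            }
        }
        return best_label;
    }

private:
    long docs_[2] = {0, 0};    // training examples per label
    long tokens_[2] = {0, 0};  // total tokens per label
    std::unordered_map<std::string, long> counts_[2];  // token counts per label
    std::unordered_set<std::string> vocab_;            // all tokens seen
};
```

In practice the features would likely be fields extracted from the log entry rather than raw words; the training and scoring logic stays the same.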
This sounds like Complex Event Processing (CEP), where you have rules and the ability to use time calculations, like "event X is within 2 minutes after event Y".
In Java land, Drools Fusion (or Drools Expert) would handle that really well (I am biased, though). In C++ land... well, maybe you can set up a drools-camel-server and communicate with it through XML.
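For a sense of what the temporal part of CEP does, the condition "event X is within 2 minutes after event Y" can be checked over a time-ordered stream by keeping only recent Y timestamps in a sliding window. This is a minimal sketch of that single operator, not of how Drools Fusion implements it; it assumes events arrive in non-decreasing timestamp order (seconds) and that the caller already knows which events are X and which are Y. `FollowedByWithin` is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Detects "X occurs within `window` seconds after some Y" on an ordered stream.
class FollowedByWithin {
public:
    explicit FollowedByWithin(std::int64_t window_s) : window_(window_s) {
        assert(window_s >= 0);
    }

    // Record an occurrence of Y at time t.
    void on_y(std::int64_t t) {
        evict(t);
        ys_.push_back(t);
    }

    // Returns true if some Y happened in [t - window, t].
    bool on_x(std::int64_t t) {
        evict(t);
        return !ys_.empty();
    }

private:
    // Drop Y timestamps too old to match anything at time t or later.
    void evict(std::int64_t t) {
        while (!ys_.empty() && ys_.front() < t - window_) ys_.pop_front();
    }

    std::int64_t window_;
    std::deque<std::int64_t> ys_;  // timestamps of recent Y events, oldest first
};
```

Because timestamps only move forward, anything evicted can never match again, so the window holds at most the Y events from the last `window` seconds.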