Event correlation and filtering - how and where to start?
I have an asynchronous stream of events, where each event carries the following information:
- Agency (one of many Agencies possible to be served by my solution)
- Agent (one of many Agents in an Agency)
- Served-Entity (a person/organization served by 1 or more agencies)
- Date+Time
- Class-Data (tags from a fixed but large set of tags)
What I need to do:
Correlate events based on Served-Entity, Date+Time and Class-Data, and create a consolidated new event. Example:
Event #0021: { Agency='XYZ', Agent='ABC', Served-Entity='MMN', Date+Time='12-03-2011/11:03:37', Class-Data='missed-delivery,no-repeat,untracable,orphan' }
Event #0193: { Agency='KLM', Agent='DAY', Served-Entity='MMN', Date+Time='12-03-2011/12:32:21', Class-Data='missed-delivery,orphan,lost' }
Event #1217: { Agency='KLM', Agent='CARE', Served-Entity='MMN', Date+Time='12-03-2011/18:50:45', Class-Data='escalated' }
Here there are 3 events spread out in time (more than 7 hours between the first and the last) that are for the same Served-Entity (MMN), occur within a certain time window (say 24 hours), and have matching or related Class-Data.
Finally, create a consolidated (new) event that represents the inference drawn.
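To make the correlation step concrete, here is a minimal batch sketch in Python. It is only one possible reading of the requirement, not a CEP engine. The names `Event`, `make_event`, `RELATED`, `tags_related`, `summarize` and `correlate` are hypothetical, the `RELATED` table is an invented stand-in for whatever "related Class-Data" ends up meaning, and the timestamp is assumed to be day-month-year.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed day-month-year order; the question does not say which it is.
STAMP = "%d-%m-%Y/%H:%M:%S"

# Hypothetical table of tags that count as "related" without being equal.
RELATED = {"escalated": {"missed-delivery", "lost"}}


@dataclass(frozen=True)
class Event:
    event_id: str
    agency: str
    agent: str
    served_entity: str
    when: datetime
    tags: frozenset


def make_event(event_id, agency, agent, served_entity, stamp, class_data):
    """Build an Event from the string fields used in the question."""
    return Event(event_id, agency, agent, served_entity,
                 datetime.strptime(stamp, STAMP),
                 frozenset(class_data.split(",")))


def tags_related(a, b):
    """True if the tag sets overlap or are linked through RELATED."""
    if a & b:
        return True
    return (any(RELATED.get(t, set()) & b for t in a)
            or any(RELATED.get(t, set()) & a for t in b))


def summarize(group):
    """Build the consolidated (synthesized) event for one group."""
    return {
        "served_entity": group[0].served_entity,
        "source_ids": [e.event_id for e in group],
        "start": group[0].when,
        "end": group[-1].when,
        "tags": frozenset().union(*(e.tags for e in group)),
    }


def correlate(events, window=timedelta(hours=24)):
    """Group events per served entity. An event joins the open group when it
    is within `window` of the group's first event and its tags relate to the
    tags already seen in that group. Groups of one event are not reported."""
    by_entity = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.when):
        by_entity[ev.served_entity].append(ev)

    consolidated = []
    for evs in by_entity.values():
        group = []
        for ev in evs:
            if group:
                seen = frozenset().union(*(g.tags for g in group))
                fits = (ev.when - group[0].when <= window
                        and tags_related(ev.tags, seen))
            else:
                fits = True
            if fits:
                group.append(ev)
            else:
                if len(group) > 1:
                    consolidated.append(summarize(group))
                group = [ev]
        if len(group) > 1:
            consolidated.append(summarize(group))
    return consolidated


sample = [
    make_event("0021", "XYZ", "ABC", "MMN", "12-03-2011/11:03:37",
               "missed-delivery,no-repeat,untracable,orphan"),
    make_event("0193", "KLM", "DAY", "MMN", "12-03-2011/12:32:21",
               "missed-delivery,orphan,lost"),
    make_event("1217", "KLM", "CARE", "MMN", "12-03-2011/18:50:45",
               "escalated"),
]
result = correlate(sample)
```

With the three example events, `result` holds a single consolidated event for MMN that lists all three source IDs. On a live stream the same grouping would have to run over a sliding window instead of a sorted batch.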
Be able to create reports on a per Agency, per Agent, per Served-Entity basis, based on things like specific Class-Data tags (e.g. missed-delivery) over a certain period of time. This could be done using the original/input events, or the synthesized (inference) events.
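As a rough illustration of the reporting requirement, the sketch below counts events carrying a given tag in a time range, grouped by one field. The function `tag_report` and the dictionary keys are hypothetical, and the dates assume day-month-year order. Tags are kept as a set per event so that new tags need no schema change.

```python
from collections import Counter
from datetime import datetime


def tag_report(events, tag, start, end, key="agency"):
    """Count events that carry `tag` with start <= when < end, grouped by
    `key` ("agency", "agent" or "served_entity")."""
    counts = Counter()
    for ev in events:
        if tag in ev["tags"] and start <= ev["when"] < end:
            counts[ev[key]] += 1
    return counts


# The three example events, with the date assumed to be 12 March 2011.
sample_events = [
    {"agency": "XYZ", "agent": "ABC", "served_entity": "MMN",
     "when": datetime(2011, 3, 12, 11, 3, 37),
     "tags": {"missed-delivery", "no-repeat", "untracable", "orphan"}},
    {"agency": "KLM", "agent": "DAY", "served_entity": "MMN",
     "when": datetime(2011, 3, 12, 12, 32, 21),
     "tags": {"missed-delivery", "orphan", "lost"}},
    {"agency": "KLM", "agent": "CARE", "served_entity": "MMN",
     "when": datetime(2011, 3, 12, 18, 50, 45),
     "tags": {"escalated"}},
]

day_start = datetime(2011, 3, 12)
day_end = datetime(2011, 3, 13)
per_agency = tag_report(sample_events, "missed-delivery", day_start, day_end)
per_entity = tag_report(sample_events, "missed-delivery", day_start, day_end,
                        key="served_entity")
```

At thousands of events per minute this would more likely be a query against whatever store archives the events. The loop only shows the shape of the aggregation.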
This is not a requirement today, but it is quite likely in future that the set of "tags" appearing in Class-Data will grow without any human intervention. So I am not sure whether this should then be treated as unstructured data.
Also not an immediate requirement, but in future there may be a need to identify trends / patterns of event occurrences (i.e. Event1 led to Event2 led to Event3).
The event arrival rate could be quite high... possibly thousands of events per minute. Maybe more. And, I need to archive the original/synthesized events for a period of time (a month or so).
My solution needs to be based on FOSS components (preferably). The research I have done so far points in the direction of CEP (Complex Event Processing), Bayesian networks/classification, and predictive analytics.
Looking for some suggestions regarding approach to take. I'd prefer to take the path which meets most of my goals, with minimum difficulty/time, or to put another way, "learning AI" or "formal statistical methods" isn't my short-term goal :-)
Comments (2)
Mike,
Did you consider something like Esper/NEsper to see if they might meet your requirements? I've looked at something similar myself, especially on Erlang (check my post here), and you'd find some useful answers there.
IC
Your problem is a tactical problem as opposed to a procedural problem. Both types have their own set of tools, and you will be in a world of pain if you try to solve a tactical problem with procedural tools.
Just to clarify terms: when I say procedural, I am talking about use cases where you can say do X, then Y, then Z. With tactical problems, X, Y, and Z can occur at any time, and you must be able to handle each event as it arrives.
You are on the right track with CEP. You might also look into using a rules engine. You didn't mention what your dev environment is, but if it's Java, you might take a look at Jess. If you really want a nice and robust rules engine, take a look at TIBCO BusinessEvents. It is very powerful and fault tolerant, but definitely not free.