Programming library for (fast) signature-based matching?
I've been given an assignment to build an application that will perform signature-based matching on network traffic (at layer 7). Matching will be performed in real time, and it needs to be fast so that the system stays responsive at all times.
At first I thought about using regular expressions as signatures and PCRE as the matching library, but this seems to be too slow. There will be a few thousand signatures to match against.
Since I don't have much experience with signature-based content matching, I am asking:
- Should I use regular expressions as signatures and find a faster library?
- Is there any other library (free or commercial) for fast signature-based matching?
Comments (1)
To build an efficient RE matcher, you compile the regular expression into a finite state automaton (FSA) with accepting and non-accepting states.
When you have more than one RE, you can easily form their disjunction (alternation) and compile that into a single FSA, whose accepting states are each labelled with the RE that accepted.
So with a few thousand REs, you compute one huge disjunction and build a single FSA for the whole set.
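To make the idea of labelled accepting states concrete, here is a minimal sketch in Python. It is deliberately restricted to literal byte signatures anchored at the start of the payload, so the automaton can be built directly as a trie; a real RE-to-FSA compiler handles full regular expressions (via subset construction), which is not shown here. The signature names and byte strings are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple


def build_dfa(signatures: Dict[str, bytes]) -> Tuple[List[Dict[int, int]], Dict[int, str]]:
    """Build a transition table plus a map from accepting state to signature name."""
    transitions: List[Dict[int, int]] = [{}]  # state 0 is the start state
    accepting: Dict[int, str] = {}
    for name, literal in signatures.items():
        state = 0
        for byte in literal:  # iterating over bytes yields ints
            nxt = transitions[state].get(byte)
            if nxt is None:
                transitions.append({})
                nxt = len(transitions) - 1
                transitions[state][byte] = nxt
            state = nxt
        # Label the final state; the first signature wins if two are identical.
        accepting.setdefault(state, name)
    return transitions, accepting


def match_prefix(
    transitions: List[Dict[int, int]],
    accepting: Dict[int, str],
    payload: bytes,
) -> Optional[str]:
    """Return the name of the longest signature that is a prefix of payload, or None."""
    state = 0
    matched = accepting.get(0)
    for byte in payload:
        nxt = transitions[state].get(byte)
        if nxt is None:
            break  # no transition: the automaton is stuck, stop scanning
        state = nxt
        if state in accepting:
            matched = accepting[state]
    return matched


# Hypothetical signatures, for illustration only.
SIGNATURES = {"http_get": b"GET ", "http_post": b"POST ", "ssh_banner": b"SSH-2.0"}
TRANSITIONS, ACCEPTING = build_dfa(SIGNATURES)
```

Calling `match_prefix(TRANSITIONS, ACCEPTING, b"GET /index.html")` walks the table one byte at a time and returns `"http_get"`. The point is that the cost per input byte is one table lookup, no matter how many signatures were compiled in.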
Most standard lexer generators (e.g., FLEX) do exactly this: they take one RE per token, compile them all into a single automaton, and tell you which token matched. So you ought to be able to use FLEX as a starting point.
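As a rough illustration of the "one RE per token, report which one matched" structure, here is a sketch using Python's standard `re` module. Each signature becomes a named group in one big alternation, and `lastgroup` reports which one matched. Note that Python's `re` is a backtracking engine, not a compiled FSA, so this only shows the shape of the approach and not the performance you would get from FLEX-generated tables. The signature names and patterns are hypothetical, and `str` is used for brevity where real traffic would be `bytes`.

```python
import re
from typing import Iterator, Tuple

# Hypothetical signatures. Inner groups are non-capturing so that
# `lastgroup` always refers to the outer, named signature group.
SIGNATURES = {
    "http_request": r"(?:GET|POST|HEAD) \S+ HTTP/1\.[01]",
    "ssh_banner": r"SSH-2\.0-\S+",
    "smtp_helo": r"(?:HELO|EHLO) \S+",
}

# One combined pattern: the disjunction of all signatures.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in SIGNATURES.items())
)


def scan(payload: str) -> Iterator[Tuple[str, str]]:
    """Yield (signature_name, matched_text) for every match in payload."""
    for m in COMBINED.finditer(payload):
        yield m.lastgroup, m.group()
```

For example, `list(scan("EHLO mail.example.com"))` yields one pair tagged `"smtp_helo"`. With a few thousand signatures you would want the same structure on top of an automaton-based engine rather than a backtracking one.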