String searching / wildcard matching
I've been working on a relatively small project for my company to play with. It's basically a proxy in node.js, and the features at the moment are relatively simple:
- Caching
- HTTP(S)
- Blacklist
- Configurable
- etc.
I'm at the stage where I'm building the blacklisting system. My blacklist file is a plain file with each blacklisted site on a single line.
The blacklist would be constructed so that you could use the following types of blacklist values:
- google.com
- google.com/path
- ww2.google.com/path
- 202.55.66.201
- 202.55.66.[100-200]
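To make the entry types above concrete, here is a minimal sketch of how one line of the blacklist file might be classified. The function name parseBlacklistLine and the shape of the returned objects are assumptions made for illustration; they are not part of the original project.

```javascript
// Hypothetical helper: turn one blacklist line into a small rule object.
// Returns null for blank lines.
function parseBlacklistLine(line) {
  const entry = line.trim();
  if (entry === "") return null;

  // IP range in the form 202.55.66.[100-200]
  const range = /^(\d{1,3}\.\d{1,3}\.\d{1,3}\.)\[(\d{1,3})-(\d{1,3})\]$/.exec(entry);
  if (range) {
    return {
      type: "ipRange",
      prefix: range[1],       // e.g. "202.55.66."
      low: Number(range[2]),  // e.g. 100
      high: Number(range[3]), // e.g. 200
    };
  }

  // Plain IPv4 address such as 202.55.66.201
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(entry)) {
    return { type: "ip", value: entry };
  }

  // Anything else is treated as a host with an optional path
  const slash = entry.indexOf("/");
  if (slash === -1) {
    return { type: "host", host: entry.toLowerCase(), path: null };
  }
  return {
    type: "host",
    host: entry.slice(0, slash).toLowerCase(),
    path: entry.slice(slash),
  };
}
```

Parsing each line once at load time means the per-request check never has to look at raw text again.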
Within node.js, when a request comes in, what I have available is the requested URL from the client side. This is looked up in the IP cache file; if it does not exist there, the host gets pinged and I get the IP for that request.
So I have a few bits of information at hand: 1 being the domain, 2 being the IP, 3 being the port.
Now the problem is finding the fastest way to check these values against the file-based blacklist.
As these values are not direct lookups, I'm not sure about putting them into an object and doing:
if (ip in blacklist || domain in blacklist || fullUri in blacklist) {
    // block
}
Even if I did that, it would not really be beneficial, as I can't check IP ranges etc.; it lacks support for the more demanding site blacklisting techniques.
I was thinking of some sort of database system, but that is something I want to avoid. So basically what I'm asking is: is there some way to perform wildcard lookups on a data file without causing too much overhead?
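One way to keep the cheap exact lookups while still supporting ranges, without a database, is to hold exact entries in a Set and only loop over the (usually few) range rules. The sketch below assumes the blacklist has already been parsed into the hypothetical index object shown. The field names, the exact-host-only matching and the simple path-prefix test are all assumptions.

```javascript
// Hypothetical in-memory index, built once from the blacklist file.
const index = {
  hosts: new Set(["google.com"]),                     // whole host blocked
  ips: new Set(["202.55.66.201"]),                    // single IPs
  hostPaths: new Map([["ww2.google.com", ["/path"]]]), // host -> blocked path prefixes
  ipRanges: [{ prefix: "202.55.66.", low: 100, high: 200 }],
};

// True when ip starts with the range prefix and its last octet is in [low, high].
function ipInRange(ip, range) {
  if (!ip.startsWith(range.prefix)) return false;
  const rest = ip.slice(range.prefix.length);
  if (!/^\d{1,3}$/.test(rest)) return false;
  const last = Number(rest);
  return last >= range.low && last <= range.high;
}

// request is assumed to look like { host, ip, path }.
function isBlocked(index, request) {
  const host = request.host.toLowerCase();

  // Exact matches are constant-time Set lookups.
  if (index.hosts.has(host) || index.ips.has(request.ip)) return true;

  // Path rules: simple prefix match against the paths listed for this host.
  const prefixes = index.hostPaths.get(host) || [];
  if (prefixes.some((p) => request.path.startsWith(p))) return true;

  // Range rules need a scan, but only over the range entries.
  return index.ipRanges.some((r) => ipInRange(request.ip, r));
}
```

With this layout the cost per request is a few hash lookups plus one pass over the range rules, regardless of how many exact entries the file contains.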
Comments (2)
I think the more efficient way would be to loop over each line of the file and compare it against your information. This would also allow pattern matching. So, in pseudocode:
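A minimal sketch of that line-by-line comparison, not the answerer's own code. It assumes a `*` wildcard syntax in the blacklist lines, which differs from the `[100-200]` range syntax in the question; the function names are hypothetical.

```javascript
// Convert a pattern with "*" wildcards into an anchored, case-insensitive RegExp.
// Every other regex metacharacter in the pattern is escaped first.
function wildcardToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$", "i");
}

// Loop over each blacklist line and return the first line that matches
// any of the request values (domain, IP, full URI, ...), or null if none do.
function findMatchingLine(lines, values) {
  for (const raw of lines) {
    const line = raw.trim();
    if (line === "") continue; // skip blank lines
    const re = wildcardToRegExp(line);
    if (values.some((v) => re.test(v))) return line;
  }
  return null;
}
```

For a busy proxy it would probably be better to compile the patterns once when the file is loaded rather than on every request.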
You can load the file when your Node.js process boots. You can then process the whole file and separate it into 3 arrays (IPs, domains and ports).
Searching elements in memory is fast.
You can then use setInterval to reload the contents of the file and save them to memory, so you always have the latest blacklist.
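A rough sketch of this load-and-reload approach follows. How ports appear in the file is not stated, so treating a purely numeric line as a port is an assumption, as are the function names, the file name and the reload interval in the usage comment.

```javascript
const fs = require("fs");

// Current blacklist held in memory; replaced wholesale on every reload.
let blacklist = { ips: [], domains: [], ports: [] };

// Split the file contents into the three arrays.
function splitBlacklist(text) {
  const result = { ips: [], domains: [], ports: [] };
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "") continue;
    if (/^\d+$/.test(line)) {
      result.ports.push(Number(line)); // assumption: numeric-only line is a port
    } else if (/^\d{1,3}(\.\d{1,3}){2}\.(\d{1,3}|\[\d+-\d+\])$/.test(line)) {
      result.ips.push(line);           // single IP or 202.55.66.[100-200] range
    } else {
      result.domains.push(line);       // domain, optionally with a path
    }
  }
  return result;
}

// Read the file synchronously and swap the in-memory copy.
function loadBlacklist(filePath) {
  blacklist = splitBlacklist(fs.readFileSync(filePath, "utf8"));
  return blacklist;
}

// Load once at boot, then reload on a timer.
function watchBlacklist(filePath, intervalMs) {
  loadBlacklist(filePath);
  return setInterval(() => {
    try {
      loadBlacklist(filePath);
    } catch (err) {
      // Keep the previous blacklist if the file is temporarily unreadable.
      console.error("blacklist reload failed:", err.message);
    }
  }, intervalMs);
}

// Usage (hypothetical path and interval):
// watchBlacklist("./blacklist.txt", 60 * 1000);
```

Replacing the whole object on reload keeps request handling simple: a request sees either the old list or the new one, never a half-updated mix.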