How to filter bots on an Express.js server
I have created an Express Node.js API and deployed it to AWS (Elastic Beanstalk with 2 EC2 instances).
I am using the morgan-body package to log the requests and responses on my endpoints, but it seems that tons of bots are "attacking" my API. This results in millions of logs every month, which costs me a fortune with Datadog.
I have used morgan-body's built-in "skip" feature to filter requests based on their user agents, but new ones seem to appear every day.
Is there a way to skip logging for all kinds of bots without checking them one by one?
Here is my code. Many thanks for your help! :)
```
morganBody(app, {
  // Skip logging when the User-Agent starts with a known bot or health-check prefix.
  skip: (req, res) => {
    const userAgent = req.get('user-agent');
    if (!userAgent) {
      return false;
    }
    const skippedPrefixes = [
      'ELB-HealthChecker',
      'Mozilla',
      'Mozlila',
      'Python',
      'python',
      'l9explore',
      'Go-http-client',
    ];
    return skippedPrefixes.some((prefix) => userAgent.startsWith(prefix));
  },
  logRequestBody: false,
  logResponseBody: false,
});
```
Comments (2)
Welcome to the internet. Bot/spam detection is far from a trivial problem to solve: every check you add can be countered by matching logic on the client side.
AWS itself has a tool for it:
https://aws.amazon.com/waf/features/bot-control/
A good strategy for filtering traffic depends on your use case.
Some suggestions.
There should be more material available on the internet.
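One possible reading of "based on use case" is to invert the check: rather than listing bot user agents one by one, log only requests that target routes the API actually serves. The sketch below is a minimal illustration under that assumption. `KNOWN_PREFIXES` and `skipUnknownRoutes` are hypothetical names, and the route prefixes are placeholders, not taken from the question.

```javascript
// Hypothetical route prefixes; replace with the paths your API actually serves.
const KNOWN_PREFIXES = ['/api/orders', '/api/users'];

// Returns true when the request path is outside the allowlist,
// meaning the logger should skip it.
function skipUnknownRoutes(req) {
  const path = req.path || '';
  return !KNOWN_PREFIXES.some(
    (prefix) => path === prefix || path.startsWith(prefix + '/')
  );
}

// Possible wiring with the morganBody call from the question (not executed here):
// morganBody(app, { skip: skipUnknownRoutes, logRequestBody: false, logResponseBody: false });
```

This only reduces log volume for probes against paths that do not exist. Bots that hit real endpoints would still be logged, so it complements rather than replaces something like AWS WAF.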
I figured out part of the answer by simply skipping all GET requests.
(I am still getting some POST requests from bots, which increase my log volume, and I still do not know how to filter them...)
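The code for this workaround is not shown in the comment. A minimal sketch of what skipping GET requests could look like with the `skip` option from the question follows. `SKIPPED_METHODS` and `skipByMethod` are hypothetical names introduced here for illustration.

```javascript
// Assumption: only non-GET traffic is worth logging for this API.
const SKIPPED_METHODS = new Set(['GET']);

// Returns true when the request's HTTP method is in the skip set.
// Express exposes the method as an uppercase string on req.method.
function skipByMethod(req) {
  return SKIPPED_METHODS.has(req.method);
}

// Possible wiring with the morganBody call from the question (not executed here):
// morganBody(app, { skip: skipByMethod, logRequestBody: false, logResponseBody: false });
```

As the comment notes, this does nothing about bots that send POST requests; those would need a separate check.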