How can you determine whether a visitor to your website is a bot?

Posted on 2024-08-03 14:29:36

I know that the user agent is one indicator, but it is easy to spoof. What other reliable indicators show that a visitor is really a bot? Inconsistent headers? Whether images and JavaScript are requested? Thanks!

Comments (6)

帅气尐潴 2024-08-10 14:29:36

CVSTrac uses a honeypot page to accomplish this. It is a page linked from somewhere on the site so that crawlers reach it, while humans usually ignore it. CVSTrac goes one step further by allowing the user to prove that they are human.
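
The honeypot idea can be sketched roughly as follows. This is not CVSTrac's actual code, only a minimal illustration in Python; the trap path, the class name, and the use of a plain string as the client identifier are all assumptions made for the example.

```python
# Hypothetical trap path: linked from the site in a way that crawlers
# follow but human visitors usually do not.
TRAP_PATH = "/trap/do-not-follow"


class HoneypotTracker:
    """Remembers which clients have requested the trap page."""

    def __init__(self, trap_path=TRAP_PATH):
        self.trap_path = trap_path
        self.flagged = set()

    def record(self, client_id, path):
        """Record one request; return True if the client is currently flagged."""
        if path == self.trap_path:
            self.flagged.add(client_id)
        return client_id in self.flagged

    def clear(self, client_id):
        """Unflag a client, for example after they prove they are human."""
        self.flagged.discard(client_id)
```

A client stays flagged on every later request until `clear` is called, which mirrors the "prove you are human" step described above.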

温柔女人霸气范 2024-08-10 14:29:36

"Whether images/javascript are requested?" I would go for this one, although Google and other crawlers do request images and JavaScript files nowadays.

How about request timing? Bots read your content a lot faster than humans do.
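
One way to act on request timing is a sliding-window counter per client. The sketch below is only an illustration: the class name, the default limits, and the choice to pass the timestamp in explicitly are assumptions, and sensible thresholds depend on the site.

```python
from collections import defaultdict, deque


class RateMonitor:
    """Flags clients that make too many requests within a time window."""

    def __init__(self, max_requests=10, window_seconds=5.0):
        # Assumed defaults; tune them against real traffic.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)

    def is_too_fast(self, client_id, now):
        """Record a request at time `now` (seconds); return True if over the limit."""
        hits = self._hits[client_id]
        hits.append(now)
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests
```

In a running server, `now` would typically come from `time.monotonic()`; taking it as a parameter keeps the class easy to test.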

超可爱的懒熊 2024-08-10 14:29:36

We look for four things:

  • The user agent string. It is very easy to fake, but crawlers often use their own distinctive user agent string.

  • The speed of page access. More than one page every half second or so is usually a good indication.

  • Whether they request just the HTML or the entire page. Some crawlers only ask for the HTML, which is usually a good tip-off.

  • The incoming URL.
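
The first three checks in the list above could be combined along these lines. This is a hedged sketch, not the answerer's code: the `ClientStats` fields, the user agent substrings, and the signal names are invented for illustration. The incoming URL check is left out because the answer does not say what is inspected.

```python
from dataclasses import dataclass

# Illustrative substrings only; real crawlers use many different strings.
CRAWLER_UA_HINTS = ("bot", "crawler", "spider")


@dataclass
class ClientStats:
    user_agent: str
    page_requests: int        # HTML pages requested
    asset_requests: int       # images, scripts, stylesheets requested
    seconds_observed: float   # time between first and last request


def bot_signals(stats):
    """Return the names of the bot indicators that this client triggers."""
    signals = []
    ua = stats.user_agent.lower()
    if any(hint in ua for hint in CRAWLER_UA_HINTS):
        signals.append("crawler-user-agent")
    # More than one page every half second, i.e. over two pages per second.
    if stats.seconds_observed > 0 and stats.page_requests / stats.seconds_observed > 2:
        signals.append("fast-page-access")
    # Several pages fetched without any of the assets those pages reference.
    if stats.page_requests > 1 and stats.asset_requests == 0:
        signals.append("html-only")
    return signals
```

Returning a list of signals rather than a yes/no answer lets the caller decide how many indicators are enough, since each one alone can be wrong.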

一身骄傲 2024-08-10 14:29:36

A reverse captcha of sorts can help as well: you could create a text input field with display: none; in its style attribute (or in your stylesheet). If a value is posted for it, chances are you're dealing with a bot.

Edit: This was actually something that had been aggregated in my RSS reader. If I can find the source, I'll link a good example.
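
A minimal sketch of the hidden-field check, assuming the submitted form arrives as a dictionary of strings. The field name `website`, the form markup, and the function name are hypothetical choices for this example.

```python
# Hypothetical field name; something a form-filling bot is likely to populate.
HONEYPOT_FIELD = "website"

# Example markup: the second input is hidden from human visitors via CSS.
FORM_HTML = """
<form method="post" action="/comment">
  <input type="text" name="comment">
  <input type="text" name="website" style="display: none;">
  <input type="submit" value="Post">
</form>
"""


def looks_like_bot(form_data):
    """Return True if the hidden field came back with a non-empty value."""
    return bool(form_data.get(HONEYPOT_FIELD, "").strip())
```

A human never sees the hidden input, so a browser submits it empty; a bot that fills in every field gives itself away.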

离笑几人歌 2024-08-10 14:29:36

Take a look at Bad Behavior, a library that employs a wide array of bot detection techniques.

静谧幽蓝 2024-08-10 14:29:36

Isn't that what CAPTCHA was invented for?
