Age verification form and crawlers
I have created a website about a beer brand and had to include an age verification page. The verification script is written in PHP and uses sessions to store the verification variable. No matter which link you use to enter the website, the script takes you to the verification page first. The verification is very simple: there are two buttons, "I'm under 21" and "I'm over 21". If you click the latter, you can browse the website.
After some time I discovered that web crawlers cannot get past the verification page. I checked the website in Google Webmaster Tools, and the only text content scanned was from the verification page.
I read somewhere that crawlers are not able to submit form buttons. Is that true?
Considering that age verification pages are useless anyway, maybe I should keep it as the starting page but not force visitors through it, e.g. when they follow links that lead directly to subpages?
Comments (2)
Why not make the buttons links instead of submit buttons?
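A minimal sketch of what link-based verification might look like in PHP. The file names `/verify.php` and `/underage.php`, the query parameters `age` and `return`, the session key `age_verified`, and all three function names are assumptions made for illustration; none of them come from the original script.

```php
<?php
// Hypothetical helper: accept only local paths as redirect targets,
// so the "return" parameter cannot be abused as an open redirect.
function safe_return_path(?string $path): string
{
    if ($path === null || $path === '' || $path[0] !== '/'
        || strpos($path, '//') === 0 || strpos($path, '\\') !== false) {
        return '/';
    }
    return $path;
}

// Builds the two choices as plain links instead of submit buttons.
function render_age_links(string $returnPath): string
{
    $query = http_build_query([
        'age'    => 'over',
        'return' => safe_return_path($returnPath),
    ]);
    $overUrl = htmlspecialchars('/verify.php?' . $query, ENT_QUOTES);

    return '<a href="' . $overUrl . '">I\'m over 21</a> '
         . '<a href="/underage.php">I\'m under 21</a>';
}

// Handles a click on the "over 21" link. Marks the session as verified
// and returns the path to redirect to, or null if nothing was confirmed.
function handle_age_link(array $query, array &$session): ?string
{
    if (($query['age'] ?? '') !== 'over') {
        return null;
    }
    $session['age_verified'] = true; // assumed session key

    $return = $query['return'] ?? null;
    return safe_return_path(is_string($return) ? $return : null);
}
```

On the verification page this could be wired up by calling `session_start()`, passing `$_GET` and `$_SESSION` to `handle_age_link()`, and sending a `Location` header when it returns a path. Note that the result is still stored in the session, so a crawler that does not keep cookies would be sent back to the verification page on its next request.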
Just have your age verification page detect the major crawler user agents and redirect them to a main content page. You can set whatever variables are necessary automatically in the same code block.
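As a rough sketch of that idea in PHP: the function names, the `age_verified` session key, and the token list below are assumptions for illustration, and the list is neither exhaustive nor official. Here the check runs wherever the site decides whether to redirect to the verification page, rather than relying on a session value that a cookie-less crawler would lose between requests.

```php
<?php
// Hypothetical helper: true when the User-Agent contains a well-known
// crawler token. The token list is an illustrative assumption.
function is_known_crawler(string $userAgent): bool
{
    $tokens = ['Googlebot', 'bingbot', 'DuckDuckBot', 'YandexBot', 'Baiduspider'];
    foreach ($tokens as $token) {
        // Case-insensitive substring match against the header value.
        if (stripos($userAgent, $token) !== false) {
            return true;
        }
    }
    return false;
}

// Decides whether this request should be sent to the verification page.
function needs_age_check(array $session, string $userAgent): bool
{
    if (!empty($session['age_verified'])) { // assumed session key
        return false;                       // visitor already confirmed
    }
    return !is_known_crawler($userAgent);   // let known crawlers through
}
```

At the top of each protected page, something like `if (needs_age_check($_SESSION, $_SERVER['HTTP_USER_AGENT'] ?? '')) { header('Location: /verify.php'); exit; }` would apply it after `session_start()`, with `/verify.php` again being a placeholder path. Keep in mind that the User-Agent header is supplied by the client, so anyone can send a crawler's string to skip the page.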