For a system I'm working on I've got a bit of a problem: I'm messing with one of the basic rules of HTTP and I'm allowing users to post data through a GET request.
Don't get mad at me just yet; I have a reason for this. Users arrive in my application from an external environment and I can't prompt them for any extra input, so all necessary data is in the GET query. They should be able to close the browser window right after it opens, and the input should be saved. And no, I can't do this through AJAX, an API, or another under-the-hood method.
These requirements kind of rule out captchas, calculations, forms, etc. So I'm left with the problem that I really do want some type of verification to prevent bots/crawlers from "accidentally" submitting something.
One of the solutions I am looking into is making a very lightweight landing page that submits itself through a javascript onload handler, but that would be the ugliest thing in my application, so I'm trying to avoid it. Another is to let the landing page do none of the processing itself and instead use an AJAX call for it. That would, however, mean that older browsers (and many mobile phones) would need another solution.
Background: Application written in PHP 5.3, built on the Yii Framework, 100% cross-browser compatible (this includes pretty much every mobile phone out there).
Some more background: The "external environments" I'm talking about vary from e-mail clients to websites. Manipulating our content at runtime isn't possible.
Update: Here's what I'm going to do: I'm probably going to combine solutions posted here in a fallback mechanism so that a chain of verifications will be attempted:
1. Ajax verification
2. Non-Ajax javascript verification (automatic form submission)
3. Prompt for user input (user has to click a confirm button)
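A rough sketch of what such a fallback landing page could look like, assuming a hypothetical `/submit` endpoint that receives the re-posted payload (endpoint and field names are made up for illustration):

```php
<?php
// Sketch of the fallback landing page. The /submit endpoint and field
// names are assumptions. Step 1 tries AJAX, step 2 falls back to a
// javascript auto-submit, and step 3 relies on <noscript> so users
// without javascript get a manual confirm button.

function render_landing_page(array $get)
{
    $data = htmlspecialchars(http_build_query($get), ENT_QUOTES);
    return <<<HTML
<form id="f" method="post" action="/submit">
  <input type="hidden" name="data" value="$data">
  <noscript><input type="submit" value="Confirm"></noscript>
</form>
<script>
var form = document.getElementById('f');
if (window.XMLHttpRequest) {            // 1. AJAX verification
  var xhr = new XMLHttpRequest();
  xhr.open('POST', '/submit', true);
  xhr.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
  xhr.send('data=' + encodeURIComponent(form.elements.data.value));
} else {                                // 2. non-AJAX javascript fallback
  form.submit();
}
</script>
HTML;
}

echo render_landing_page($_GET);
```

Crawlers that don't execute javascript and don't click submit buttons never trigger any of the three steps, which is the point of the chain.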
Besides this I'm going to implement a bot trap as described at http://www.kloth.net/internet/bottrap.php
After I'm done building this I'll update the post if I did anything differently.
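The kloth.net bot trap boils down to a page that is disallowed in robots.txt and never linked visibly, so anything that requests it is a crawler ignoring the rules and gets its address blacklisted. A rough sketch (the blacklist location is an assumption):

```php
<?php
// Sketch of a kloth.net-style bot trap. The trap URL is listed as
// Disallow in robots.txt and only linked invisibly, so any client that
// requests it is ignoring robots.txt. Blacklist path is made up.

$blacklist = sys_get_temp_dir() . '/bot-blacklist.txt';

// Inside the trap page itself: record the offender's IP.
if (isset($_SERVER['REMOTE_ADDR'])) {
    file_put_contents(
        $blacklist,
        $_SERVER['REMOTE_ADDR'] . ' ' . date('c') . "\n",
        FILE_APPEND | LOCK_EX
    );
}

// Before processing any GET submission: check the list.
function ip_is_blacklisted($ip, $blacklist)
{
    if (!is_readable($blacklist)) {
        return false;
    }
    foreach (file($blacklist) as $line) {
        if (strpos($line, $ip . ' ') === 0) {
            return true;
        }
    }
    return false;
}
```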
If you are able to modify the place that your users are coming from, you could try including a checksum. Calculate some kind of checksum or hash of all the fields in the GET request and add it to the GET request itself (i.e. through javascript, but do it in the place your users are coming from, not where they land). Then, in your application, reject all hits with an incorrect checksum.
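On the receiving side, the check could look like the sketch below. It assumes a secret shared between the referring site and the application; the function names and the `sig` parameter are made up for illustration.

```php
<?php
// Sketch of signature verification for GET submissions. Assumes the
// referring side appends a 'sig' parameter computed over the other
// parameters with a shared secret.

function compute_signature(array $params, $secret)
{
    unset($params['sig']);  // the signature itself is not part of the input
    ksort($params);         // canonical parameter order
    return hash_hmac('sha256', http_build_query($params), $secret);
}

function request_is_signed(array $get, $secret)
{
    if (!isset($get['sig'])) {
        return false;
    }
    // On PHP 5.6+ you would prefer hash_equals() for a timing-safe compare.
    return compute_signature($get, $secret) === $get['sig'];
}

// Referring side: append the signature to the generated link's parameters.
$secret = 'shared-secret';
$params = array('user' => '42', 'action' => 'save');
$params['sig'] = compute_signature($params, $secret);

// Application side: reject anything that does not verify.
var_dump(request_is_signed($params, $secret)); // bool(true)
```

A plain hash without a secret would already stop crawlers from stumbling into valid submissions, but an HMAC additionally prevents third parties from forging valid links.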
It's hard to tell where your app is and where the external environment really is. But one simple bot-removal technique I use is to add a hidden field named 'login' or 'name' and give it an empty value.
Humans will never fill this hidden field, but spam bots almost always do, so you can discard any request where that field is not empty.
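A minimal sketch of the check on the receiving side, assuming the decoy arrives as a GET parameter named `login`:

```php
<?php
// Honeypot check: real users never see the hidden decoy field, so it
// stays empty; spam bots tend to fill in every field they find.
// The field name 'login' is the decoy suggested above.

function looks_like_bot(array $params)
{
    return isset($params['login']) && $params['login'] !== '';
}

var_dump(looks_like_bot(array('user' => '42')));                 // bool(false)
var_dump(looks_like_bot(array('user' => '42', 'login' => 'x'))); // bool(true)
```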
Now you must prevent crawlers, not only spam bots. I've never done this, but here are some thoughts. You could add a hidden 'human' input to the form on the first mouseMove event (though keyboard-only users -- and think of blind people -- would then be treated as robots). So if that field is missing you could launch a javascript confirm() asking "Press OK to confirm that you are a robot, or Cancel if you are human".
You could also make the anchor link contain a default value that the hidden field's value overwrites in js. Most crawlers will not overwrite the values, especially if they would have to cancel a confirmation to get the correct behavior (while the mouseMove event spares most human users that confirmation).