Using HTTP_REFERER to block direct access to pages inside a site

Posted on 2024-07-04 01:52:19

I have control over the HttpServer but not over the ApplicationServer or the Java applications sitting there, yet I need to block direct access to certain pages on those applications. Precisely, I don't want users automating access to forms by issuing direct GET/POST HTTP requests to the appropriate servlet.

So, I decided to block users based on the value of HTTP_REFERER. After all, if a user is navigating inside the site, the request will carry an appropriate HTTP_REFERER. Well, that was what I thought.
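
Spoofing is worth demonstrating concretely before relying on this. A minimal sketch (Python, with placeholder URLs based on the address in the question) showing that the client simply sets the Referer header itself, so the server cannot distinguish a forged value from a genuine in-site navigation:

```python
# The Referer header is supplied by the client; any script can claim to
# come from inside the site. URLs below are illustrative placeholders.
import urllib.request

req = urllib.request.Request(
    "http://mywebaddress.cl/servlet1?param=1",
    headers={"Referer": "http://mywebaddress.cl/someform"},  # forged referer
)

# The forged header is sent exactly as given.
print(req.get_header("Referer"))
```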

I implemented a rewrite rule in the .htaccess file that says:

RewriteEngine on 

# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} !^http://mywebaddress(.cl)?/.* [NC]
RewriteRule (servlet1|servlet2)/.+\?.+ - [F]

I expected to forbid access to users that didn't navigate the site but issued direct GET requests to the "servlet1" or "servlet2" servlets using query strings. But my expectations ended abruptly because the regular expression (servlet1|servlet2)/.+\?.+ didn't work at all.

I was really disappointed when I changed that expression to (servlet1|servlet2)/.+ and it worked, so well in fact that my users were blocked whether they navigated the site or not.

So, my question is: how can I keep "robots" from directly accessing certain pages if I have no access/privileges/time to modify the application?

Comments (9)

弥繁 2024-07-11 01:52:20

You can't tell users and malicious scripts apart by their HTTP requests. But you can analyze which users are requesting too many pages in too short a time, and block their IP addresses.
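
The suggestion above can be sketched as a small sliding-window rate limiter; the window size and threshold below are illustrative values, not recommendations:

```python
# Count requests per IP in a sliding time window; addresses that exceed
# the threshold become candidates for an IP block.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 10   # look-back window (assumed value)
MAX_REQUESTS = 5      # allowed requests per window (assumed value)

hits = defaultdict(deque)  # ip -> timestamps of recent requests

def allow(ip, now=None):
    """Return True if this request is within the rate limit."""
    now = time.monotonic() if now is None else now
    q = hits[ip]
    # Drop timestamps that fell out of the window.
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()
    if len(q) >= MAX_REQUESTS:
        return False  # too many requests in the window
    q.append(now)
    return True
```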

标点 2024-07-11 01:52:20

Using the referrer as a method of verification is very unreliable. As other people have mentioned, it is easily spoofed. Your best solution is to modify the application (if you can).

You could use a CAPTCHA, or set some sort of cookie or session cookie that keeps track of the page the user last visited (a session would be harder to spoof), keep track of page-view history, and only allow users who have browsed the required pages to reach the page you want to block.

This obviously requires you to have access to the application in question; however, it is the most foolproof way (not completely, but "good enough" in my opinion).
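
A rough sketch of the session-history idea above, with hypothetical page names and an in-memory dict standing in for the real session store:

```python
# Record which pages a session has visited and only serve the protected
# page if the required form page was browsed first.
REQUIRED_PREDECESSOR = "/form"   # page users must browse first (placeholder)
PROTECTED = "/servlet1"          # page to guard (placeholder)

sessions = {}  # session_id -> set of visited paths

def visit(session_id, path):
    """Record a page view; return False if access should be denied."""
    history = sessions.setdefault(session_id, set())
    if path == PROTECTED and REQUIRED_PREDECESSOR not in history:
        return False  # direct access without browsing the form first
    history.add(path)
    return True
```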

筑梦 2024-07-11 01:52:20

JavaScript is another helpful tool to prevent (or at least delay) screen scraping. Most automated scraping tools don't have a JavaScript interpreter, so you can do things like setting hidden fields, etc.

Edit: Something along the lines of this Phil Haack article.

内心激荡 2024-07-11 01:52:20

I'm guessing you're trying to prevent screen scraping?

In my honest opinion it's a tough one to solve, and trying to fix it by checking the value of HTTP_REFERER is just a sticking plaster. Anyone going to the bother of automating submissions will be savvy enough to send the correct referer from their 'automaton'.

You could try rate limiting, but without actually modifying the app to force some kind of is-this-a-human validation (a CAPTCHA) at some point, you're going to find this hard to prevent.

谈情不如逗狗 2024-07-11 01:52:20

If you're trying to prevent search engine bots from accessing certain pages, make sure you're using a properly formatted robots.txt file.

Using HTTP_REFERER is unreliable because it is easily faked.

Another option is to check the user agent string for known bots (this may require code modification).
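
The user-agent check could look something like this sketch; the token list is illustrative, and (as with the referer) the header is trivial to fake:

```python
# Case-insensitive substring match of the User-Agent header against a
# list of known bot/tool tokens. The list here is just an example.
KNOWN_BOT_TOKENS = ("googlebot", "bingbot", "curl", "wget", "python-requests")

def looks_like_bot(user_agent):
    """Return True if the User-Agent contains a known bot token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)
```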

俯瞰星空 2024-07-11 01:52:20

To make things a little clearer:

  1. Yes, I know that using HTTP_REFERER is completely unreliable and somewhat childish, but I'm pretty sure that the people who learned (from me, maybe?) to build automations with Excel VBA won't figure out how to subvert HTTP_REFERER within the time it takes to ship the final solution.

  2. I don't have access/privileges to modify the application code. Politics. Do you believe that? So I must wait until the rights holders make the changes I requested.

  3. From previous experience, I know that the requested changes will take two months to reach production. No, tossing Agile methodology books at their heads didn't improve anything.

  4. This is an intranet app, so I don't have a lot of youngsters trying to undermine my prestige. But I'm young enough to try to undermine the prestige of "a very fancy global consultancy service that comes from India" where, curiously, not a single Indian works.

So far, the best answer comes from "Michel de Mare": block users based on their IPs. Well, I did that yesterday. Today I wanted to make something more generic, because I have a lot of kangaroo users (jumping from one IP address to another) since they use VPN or DHCP.
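
For the kangaroo-user problem, one generic refinement is to block the enclosing subnet rather than individual addresses, so DHCP/VPN hops inside the same pool stay covered. The /24 prefix below is an assumption about the address pool, not something from the post:

```python
# Block whole subnets instead of single addresses, so users who hop
# between addresses in the same DHCP/VPN pool remain blocked.
import ipaddress

blocked_networks = set()

def block(ip, prefix=24):
    """Block the whole subnet that contains this address."""
    net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    blocked_networks.add(net)

def is_blocked(ip):
    """Check whether the address falls inside any blocked subnet."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in blocked_networks)
```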

青春如此纠结 2024-07-11 01:52:20

You might be able to use an anti-CSRF token to achieve what you're after.

This article explains it in more detail: Cross-Site Request Forgeries
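
The anti-CSRF-token idea can be sketched as follows; the key handling is deliberately simplified and the function names are illustrative. The server embeds a secret-derived token in the form it serves, and the servlet only accepts submissions that echo a valid token back:

```python
# HMAC-based token tied to the session: easy to issue when serving the
# form, cheap to verify on submission, hard to forge without the key.
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)  # per-deployment secret (illustrative)

def issue_token(session_id):
    """Token embedded as a hidden form field when the form is served."""
    return hmac.new(SECRET_KEY, session_id.encode(), hashlib.sha256).hexdigest()

def verify_token(session_id, token):
    """Constant-time check of the token submitted with the form."""
    expected = issue_token(session_id)
    return hmac.compare_digest(expected, token)
```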

很快妥协 2024-07-11 01:52:19

I'm not sure if I can solve this in one go, but we can go back and forth as necessary.

First, I want to repeat what I think you are saying and make sure I'm clear. You want to disallow requests to servlet1 and servlet2 if the request doesn't have the proper referer and it does have a query string? I'm not sure I understand (servlet1|servlet2)/.+\?.+ because it looks like you are requiring a file under servlet1 and 2. I think maybe you are combining PATH_INFO (before the "?") with a GET query string (after the "?"). It appears that the PATH_INFO part will work but the GET query test will not, because the RewriteRule pattern is matched against the URL path only; the query string has to be tested separately with a RewriteCond on %{QUERY_STRING}. I made a quick test on my server using script1.cgi and script2.cgi, and the following rules accomplished what you are asking for. They are obviously edited a little to match my environment:

RewriteCond %{HTTP_REFERER} !^http://(www.)?example.(com|org) [NC]
RewriteCond %{QUERY_STRING} ^.+$
RewriteRule ^(script1|script2)\.cgi - [F]

The above caught all wrong-referer requests to script1.cgi and script2.cgi that tried to submit data using a query string. However, you can also submit data via PATH_INFO and by POSTing it. I used this form to protect against all three methods being used with an incorrect referer:

RewriteCond %{HTTP_REFERER} !^http://(www.)?example.(com|org) [NC]
RewriteCond %{QUERY_STRING} ^.+$ [OR]
RewriteCond %{REQUEST_METHOD} ^POST$ [OR]
RewriteCond %{PATH_INFO} ^.+$
RewriteRule ^(script1|script2)\.cgi - [F]

Based on the example you were trying to get working, I think this is what you want:

RewriteCond %{HTTP_REFERER} !^http://mywebaddress(.cl)?/.* [NC]
RewriteCond %{QUERY_STRING} ^.+$ [OR]
RewriteCond %{REQUEST_METHOD} ^POST$ [OR]
RewriteCond %{PATH_INFO} ^.+$
RewriteRule (servlet1|servlet2)\b - [F]

Hopefully this at least gets you closer to your goal. Please let us know how it works, I'm interested in your problem.

(BTW, I agree that referer blocking is poor security, but I also understand that reality sometimes forces imperfect and partial solutions, which you seem to already acknowledge.)

守不住的情 2024-07-11 01:52:19

I don't have a solution, but I'm betting that relying on the referrer will never work because user-agents are free to not send it at all or spoof it to something that will let them in.
