Using HTTP_REFERER to block direct access to pages inside a site

Posted on 2024-07-04 01:52:19

I have control over the HttpServer but not over the ApplicationServer or the Java applications sitting there, yet I need to block direct access to certain pages on those applications. Precisely, I don't want users automating access to forms by issuing direct GET/POST HTTP requests to the appropriate servlet.

So, I decided to block users based on the value of HTTP_REFERER. After all, if a user is navigating inside the site, the request will carry an appropriate HTTP_REFERER. Well, that was what I thought.
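
Spoofing is worth demonstrating concretely before relying on this. A minimal sketch (Python, with placeholder URLs based on the address in the question) showing that the client simply sets the Referer header itself, so the server cannot distinguish a forged value from a genuine in-site navigation:

```python
# The Referer header is supplied by the client; any script can claim to
# come from inside the site. URLs below are illustrative placeholders.
import urllib.request

req = urllib.request.Request(
    "http://mywebaddress.cl/servlet1?param=1",
    headers={"Referer": "http://mywebaddress.cl/someform"},  # forged referer
)

# The forged header is sent exactly as given.
print(req.get_header("Referer"))
```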

I implemented a rewrite rule in the .htaccess file that says:

RewriteEngine on 

# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} !^http://mywebaddress(.cl)?/.* [NC]
RewriteRule (servlet1|servlet2)/.+\?.+ - [F]

I expected to forbid access to users that didn't navigate the site but issued direct GET requests to the "servlet1" or "servlet2" servlets using query strings. But my expectations ended abruptly because the regular expression (servlet1|servlet2)/.+\?.+ didn't work at all.

I was really disappointed when I changed that expression to (servlet1|servlet2)/.+ and it worked, so well in fact that my users were blocked whether they navigated the site or not.

So, my question is: how can I keep "robots" from directly accessing certain pages if I have no access/privileges/time to modify the application?

Comments (9)

弥繁 2024-07-11 01:52:20

You can't tell users and malicious scripts apart by their HTTP requests. But you can analyze which users are requesting too many pages in too short a time, and block their IP addresses.
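
The suggestion above can be sketched as a small sliding-window rate limiter; the window size and threshold below are illustrative values, not recommendations:

```python
# Count requests per IP in a sliding time window; addresses that exceed
# the threshold become candidates for an IP block.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 10   # look-back window (assumed value)
MAX_REQUESTS = 5      # allowed requests per window (assumed value)

hits = defaultdict(deque)  # ip -> timestamps of recent requests

def allow(ip, now=None):
    """Return True if this request is within the rate limit."""
    now = time.monotonic() if now is None else now
    q = hits[ip]
    # Drop timestamps that fell out of the window.
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()
    if len(q) >= MAX_REQUESTS:
        return False  # too many requests in the window
    q.append(now)
    return True
```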

标点 2024-07-11 01:52:20

Using the referrer as a method of verification is very unreliable. As other people have mentioned, it is easily spoofed. Your best solution is to modify the application (if you can).

You could use a CAPTCHA, or set some sort of cookie or session cookie that keeps track of the page the user last visited (a session would be harder to spoof), keep track of page-view history, and only allow users who have browsed the required pages to reach the page you want to block.

This obviously requires you to have access to the application in question; however, it is the most foolproof way (not completely, but "good enough" in my opinion).
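
A rough sketch of the session-history idea above, with hypothetical page names and an in-memory dict standing in for the real session store:

```python
# Record which pages a session has visited and only serve the protected
# page if the required form page was browsed first.
REQUIRED_PREDECESSOR = "/form"   # page users must browse first (placeholder)
PROTECTED = "/servlet1"          # page to guard (placeholder)

sessions = {}  # session_id -> set of visited paths

def visit(session_id, path):
    """Record a page view; return False if access should be denied."""
    history = sessions.setdefault(session_id, set())
    if path == PROTECTED and REQUIRED_PREDECESSOR not in history:
        return False  # direct access without browsing the form first
    history.add(path)
    return True
```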

筑梦 2024-07-11 01:52:20

JavaScript is another helpful tool to prevent (or at least delay) screen scraping. Most automated scraping tools don't have a JavaScript interpreter, so you can do things like setting hidden fields, etc.

Edit: Something along the lines of this Phil Haack article.

内心激荡 2024-07-11 01:52:20

I'm guessing you're trying to prevent screen scraping?

In my honest opinion it's a tough one to solve, and trying to fix it by checking the value of HTTP_REFERER is just a sticking plaster. Anyone going to the bother of automating submissions will be savvy enough to send the correct referer from their 'automaton'.

You could try rate limiting, but without actually modifying the app to force some kind of is-this-a-human validation (a CAPTCHA) at some point, you're going to find this hard to prevent.

谈情不如逗狗 2024-07-11 01:52:20

If you're trying to prevent search engine bots from accessing certain pages, make sure you're using a properly formatted robots.txt file.

Using HTTP_REFERER is unreliable because it is easily faked.

Another option is to check the user agent string for known bots (this may require code modification).
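
The user-agent check could look something like this sketch; the token list is illustrative, and (as with the referer) the header is trivial to fake:

```python
# Case-insensitive substring match of the User-Agent header against a
# list of known bot/tool tokens. The list here is just an example.
KNOWN_BOT_TOKENS = ("googlebot", "bingbot", "curl", "wget", "python-requests")

def looks_like_bot(user_agent):
    """Return True if the User-Agent contains a known bot token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)
```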

俯瞰星空 2024-07-11 01:52:20

To make things a little clearer:

  1. Yes, I know that using HTTP_REFERER is completely unreliable and somewhat childish, but I'm pretty sure that the people who learned (from me, maybe?) to build automations with Excel VBA won't figure out how to subvert HTTP_REFERER within the time it takes to ship the final solution.

  2. I don't have access/privileges to modify the application code. Politics. Do you believe that? So I must wait until the rights holders make the changes I requested.

  3. From previous experience, I know that the requested changes will take two months to reach production. No, tossing Agile methodology books at their heads didn't improve anything.

  4. This is an intranet app, so I don't have a lot of youngsters trying to undermine my prestige. But I'm young enough to try to undermine the prestige of "a very fancy global consultancy service that comes from India" where, curiously, not a single Indian works.

So far, the best answer comes from "Michel de Mare": block users based on their IPs. Well, I did that yesterday. Today I wanted to make something more generic, because I have a lot of kangaroo users (jumping from one IP address to another) since they use VPN or DHCP.
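
For the kangaroo-user problem, one generic refinement is to block the enclosing subnet rather than individual addresses, so DHCP/VPN hops inside the same pool stay covered. The /24 prefix below is an assumption about the address pool, not something from the post:

```python
# Block whole subnets instead of single addresses, so users who hop
# between addresses in the same DHCP/VPN pool remain blocked.
import ipaddress

blocked_networks = set()

def block(ip, prefix=24):
    """Block the whole subnet that contains this address."""
    net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    blocked_networks.add(net)

def is_blocked(ip):
    """Check whether the address falls inside any blocked subnet."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in blocked_networks)
```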

青春如此纠结 2024-07-11 01:52:20

You might be able to use an anti-CSRF token to achieve what you're after.

This article explains it in more detail: Cross-Site Request Forgeries
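
The anti-CSRF-token idea can be sketched as follows; the key handling is deliberately simplified and the function names are illustrative. The server embeds a secret-derived token in the form it serves, and the servlet only accepts submissions that echo a valid token back:

```python
# HMAC-based token tied to the session: easy to issue when serving the
# form, cheap to verify on submission, hard to forge without the key.
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)  # per-deployment secret (illustrative)

def issue_token(session_id):
    """Token embedded as a hidden form field when the form is served."""
    return hmac.new(SECRET_KEY, session_id.encode(), hashlib.sha256).hexdigest()

def verify_token(session_id, token):
    """Constant-time check of the token submitted with the form."""
    expected = issue_token(session_id)
    return hmac.compare_digest(expected, token)
```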

很快妥协 2024-07-11 01:52:19

I'm not sure if I can solve this in one go, but we can go back and forth as necessary.

First, I want to repeat what I think you are saying and make sure I'm clear. You want to disallow requests to servlet1 and servlet2 if the request doesn't have the proper referer and it does have a query string? I'm not sure I understand (servlet1|servlet2)/.+\?.+ because it looks like you are requiring a file under servlet1 and 2. I think maybe you are combining PATH_INFO (before the "?") with a GET query string (after the "?"). It appears that the PATH_INFO part will work but the GET query test will not, because the RewriteRule pattern is matched against the URL path only; the query string has to be tested separately with a RewriteCond on %{QUERY_STRING}. I made a quick test on my server using script1.cgi and script2.cgi, and the following rules accomplished what you are asking for. They are obviously edited a little to match my environment:

RewriteCond %{HTTP_REFERER} !^http://(www.)?example.(com|org) [NC]
RewriteCond %{QUERY_STRING} ^.+$
RewriteRule ^(script1|script2)\.cgi - [F]

The above caught all wrong-referer requests to script1.cgi and script2.cgi that tried to submit data using a query string. However, you can also submit data via PATH_INFO and by POSTing it. I used this form to protect against all three methods being used with an incorrect referer:

RewriteCond %{HTTP_REFERER} !^http://(www.)?example.(com|org) [NC]
RewriteCond %{QUERY_STRING} ^.+$ [OR]
RewriteCond %{REQUEST_METHOD} ^POST$ [OR]
RewriteCond %{PATH_INFO} ^.+$
RewriteRule ^(script1|script2)\.cgi - [F]

Based on the example you were trying to get working, I think this is what you want:

RewriteCond %{HTTP_REFERER} !^http://mywebaddress(.cl)?/.* [NC]
RewriteCond %{QUERY_STRING} ^.+$ [OR]
RewriteCond %{REQUEST_METHOD} ^POST$ [OR]
RewriteCond %{PATH_INFO} ^.+$
RewriteRule (servlet1|servlet2)\b - [F]

Hopefully this at least gets you closer to your goal. Please let us know how it works, I'm interested in your problem.

(BTW, I agree that referer blocking is poor security, but I also understand that reality sometimes forces imperfect and partial solutions, which you seem to already acknowledge.)

守不住的情 2024-07-11 01:52:19

I don't have a solution, but I'm betting that relying on the referrer will never work because user-agents are free to not send it at all or spoof it to something that will let them in.
