PHP file_exists() check vs. MySQL SELECT: server load
I'm implementing an IP banning functionality in a project of mine.
First of all, I'd like to avoid .htaccess for this purpose because the CMS would probably reset it upon modification, so I have to use a PHP send-header-and-die solution.
Obviously every HTTP request will be checked.
Considering a highly trafficked site, I have two solutions for storing the ip ban info:
1 - In a directory, let's say /bans/, I can create N files where N = number of banned ips, so:
/bans/23.23.23.23.ban
would ban 23.23.23.23. In this case all I have to do from my script is check with file_exists(), for instance:
<?php
// Deny the request if a ban file exists for this ip.
if (file_exists("bans/" . $_SERVER['REMOTE_ADDR'] . ".ban")) {
    header("HTTP/1.0 403 Forbidden");
    die();
}
else {
    // Continue surfing ....
}
?>
2 - Use a MySQL table, let's say cms_bans, and execute a SELECT for every HTTP request to check if the ip is in the ban list.
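For reference, option 2 would look roughly like this (the cms_bans schema, the ip column and the mysqli connection details below are just placeholders, not my actual code):
<?php
// Option 2 sketch: one SELECT per HTTP request against a cms_bans(ip) table.
// Connection credentials and table layout are placeholders.
$db = new mysqli("localhost", "user", "pass", "cms");

$stmt = $db->prepare("SELECT ip FROM cms_bans WHERE ip = ? LIMIT 1");
$stmt->bind_param("s", $_SERVER['REMOTE_ADDR']);
$stmt->execute();
$stmt->store_result();

if ($stmt->num_rows > 0) {
    header("HTTP/1.0 403 Forbidden");
    die();
}
// Continue surfing ....
?>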
Considering those two solutions, which one has less overhead (filesystem vs MySQL :D), assuming MySQL query caching is disabled?
Please give only motivated answers, not just personal preferences.
Thanks
Comments (7)
Use a MySQL table with the MEMORY engine. Dump it every once in a while to another, permanent table to keep the IPs stored and persistent after a server restart.
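A rough sketch of that setup (the table and column names, and the use of PDO, are only an illustration of the idea):
<?php
// Sketch: a MEMORY table for fast per-request lookups, plus a persistent copy.
// Table/column names and credentials are illustrative only.
$pdo = new PDO("mysql:host=localhost;dbname=cms", "user", "pass");

// One-time setup: the lookup table lives in RAM, the archive on disk.
$pdo->exec("CREATE TABLE IF NOT EXISTS cms_bans_mem (ip VARCHAR(45) PRIMARY KEY) ENGINE=MEMORY");
$pdo->exec("CREATE TABLE IF NOT EXISTS cms_bans_disk (ip VARCHAR(45) PRIMARY KEY) ENGINE=InnoDB");

// Run from cron every once in a while: persist the in-memory rows.
$pdo->exec("INSERT IGNORE INTO cms_bans_disk SELECT ip FROM cms_bans_mem");

// Run once after a MySQL restart: reload RAM from the persistent copy.
$pdo->exec("INSERT IGNORE INTO cms_bans_mem SELECT ip FROM cms_bans_disk");
?>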
I would tend to think the file_exists() has less overhead since there's no remote connection to make and it can be cached by PHP. But if you've got a huge number of bans, and the db connection is already being made for some other part of the app, then the MySQL solution starts to look better and is certainly easier to manage. That said, I'm not a fan of either approach and would suggest moving the bans to the network layer, via your firewall/proxy/load balancer.
Alternatively, if you don't have a large number of bans and they don't change very often, you'll be better off storing the list directly as a PHP array, include()'ing that in your code, and then using in_array() to scan for the bans:
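Something along these lines (the file name bans.php and the variable name are illustrative):
<?php
// bans.php -- regenerate this file whenever the ban list changes.
$banned_ips = array(
    '23.23.23.23',
    // ... one entry per banned ip
);
?>

<?php
// In the front controller:
include 'bans.php';

if (in_array($_SERVER['REMOTE_ADDR'], $banned_ips, true)) {
    header("HTTP/1.0 403 Forbidden");
    die();
}
// Continue surfing ....
?>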
I would prefer the MySQL database. You can store additional data in the table, like banned_by, banned_on, bann_reason and access_count. In your code you only do a single SELECT per request; assuming you have an index on the ip column, this should be pretty fast. If someone hits a ban, you do an UPDATE (for instance, to increment access_count).
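The two statements I have in mind look roughly like this (a sketch; the cms_bans table name and the exact column names are assumptions):
<?php
// Sketch: look the ip up on every request, count the hit if it is banned.
// Table and column names are assumptions.
$pdo = new PDO("mysql:host=localhost;dbname=cms", "user", "pass");

$stmt = $pdo->prepare("SELECT ip FROM cms_bans WHERE ip = ? LIMIT 1");
$stmt->execute(array($_SERVER['REMOTE_ADDR']));

if ($stmt->fetch()) {
    // Someone hit a ban: bump the counter, then refuse the request.
    $upd = $pdo->prepare("UPDATE cms_bans SET access_count = access_count + 1 WHERE ip = ?");
    $upd->execute(array($_SERVER['REMOTE_ADDR']));

    header("HTTP/1.0 403 Forbidden");
    die();
}
?>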
Why stick with one over the other?
I would set up a MySQL-based ban table that I can expand easily. MySQL is a fast system and is a lot more flexible with the future in mind.
But you can also cache the results yourself in a file and read from that. Now anyone who's familiar with MySQL can add bans directly; they don't have to know your special format in order to work with it (in the case of extensions to your ban system, working with it, etc.). The question here, then, is file permissions. So you'd just have to throw in a little extra code to compensate for that.
If they need the cache updated, give them the ability.
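One way to handle that cache regeneration (a sketch; cms_bans is the table name from the question, bans_cache.php is an assumed file name):
<?php
// Sketch: rebuild a PHP cache file from the ban table whenever bans change.
// The per-request check then only include()s the cache and never touches MySQL.
$pdo = new PDO("mysql:host=localhost;dbname=cms", "user", "pass");

$ips = $pdo->query("SELECT ip FROM cms_bans")->fetchAll(PDO::FETCH_COLUMN);

$code = '<?php $banned_ips = ' . var_export($ips, true) . ';';
file_put_contents('bans_cache.php', $code, LOCK_EX);
?>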
You don't say what CMS you are using, but if it is open source, or your own, why not modify it to use .htaccess for the bans?
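For example, the CMS could regenerate a marked deny block itself whenever a ban is added or removed (a sketch using Apache 2.2-style directives; the marker comments and file path are illustrative):
<?php
// Sketch: let the CMS rewrite its own deny block inside .htaccess.
// Apache 2.2-style directives; marker comments and paths are illustrative.
$banned_ips = array('23.23.23.23');   // e.g. loaded from the ban table

$rules  = "# BEGIN cms-bans\n";
$rules .= "Order Allow,Deny\nAllow from all\n";
foreach ($banned_ips as $ip) {
    $rules .= "Deny from " . $ip . "\n";
}
$rules .= "# END cms-bans\n";

$htaccess = file_get_contents('.htaccess');
// Replace the marked block if it exists, otherwise append it.
$new = preg_replace('/# BEGIN cms-bans.*?# END cms-bans\n/s', $rules, $htaccess, -1, $count);
file_put_contents('.htaccess', $count ? $new : $htaccess . "\n" . $rules, LOCK_EX);
?>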
I don't think this question is really performance related.
Such a simple key-value lookup will never be a bottleneck compared to the other parts of the application.
That's a mistake that gets made very often: people tend to optimize the less resource-consuming parts of a site for no particular reason other than that it came to mind.
That's why the question should be motivated, not driven by personal preference alone.
行的字符串As far as your original question, using
file_exists()
is quicker for small numbers of bans (about <1000 bans) whereas using MySQL is quicker for larger numbers. The connection is made to the database only one time and the answer is sent back only one time, so the "bottleneck" of MySQL only adds a set, constant amount of time to the required time to do the query. MySQL (and other database) software then scales extremely well because there is a constant byte-width for each row, so it need only check bytes nRX to nRX+Y for integer multiples of n.In older file systems the Operating System can not make such an assumption since files can be of variable length. Thus it would scan for the
end_of_file
bit. Newer operating systems create a database of every file (The "File Allocation Table") at the beginning of the partition and it need merely search this. The problem is- the more files on the computer, the longer it takes to search this table. Also, fragmentation of the drive can make it harder to find if a file still exists. These little slow downs don't equal the time it takes to connect to an SQL database... for small numbers of bans.What would be an even better solution would be to have a text file containing one ban per line.
Then you just use
strpos($file_contents, $_SERVER["REMOTE_ADDR"]
. Note that the fewer lines of PHP you have, the quicker it will ultimately run since the C back-end of PHP is about 100x quicker than the interpretation. Therefore in two lines we canget_file_contents()
(dump the contents to RAM) andstrpos()
(search for a string within the RAM) and it's entirely handled by the C back-end, which iterates through quite quickly.There are even quicker ways to do it, too, if you're willing to write your own database that keeps the bans listed numerically (allowing binary search).
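In code, the whole check collapses to a couple of lines (a sketch; banned.txt with one ip per line is an assumption, and the surrounding newlines avoid partial matches such as 23.23.23.2 hitting inside 23.23.23.23):
<?php
// Sketch of the flat-file approach: one banned ip per line in banned.txt.
// Assumes Unix newlines in the file.
$bans = "\n" . file_get_contents('banned.txt') . "\n";

if (strpos($bans, "\n" . $_SERVER['REMOTE_ADDR'] . "\n") !== false) {
    header("HTTP/1.0 403 Forbidden");
    die();
}
// Continue surfing ....
?>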
Although, as several people have already stated, this is not where any main bottleneck will occur in your server. Optimizing the "check for ban" portion of your website will net a 0.01% speed increase for your whole site. What you want to be really careful to optimize are loops that run more than 100 times, calls to remote servers, and queries that return several rows of a database to be parsed.
Also, don't write a function to perform something that already has a built-in PHP function. I spent a year manually parsing strings with hundreds of substr(strpos()) lines before I learned how to use preg_replace().