Implementing a PHP file uploader: some basic technical questions
I am currently working on a file uploader (something like RapidShare), but on a very small scale.

However, one thing I am concerned about is how to organize the files. If I place all files in a single upload directory, the number of files in that directory will soon (in about a month) reach a million. Will this slow anything down on my system? Will file accesses and lookups take more time? How do I counter this problem?

Also, I am looking to implement multi-server upload, meaning the admin can choose multiple servers to which files can be uploaded. How would this work? Would the user upload to my server, which then immediately pushes the file via FTP or some other mechanism to the other servers?

Users should not be able to download the file via a download manager, and resume functionality should not be supported for normal users. How do I implement that? Can I give the downloading user direct access to the file's location, or do I have to "serve" the file using a script with fopen, fread, and print?

Thanks for all your help; I really appreciate any answers.
1 Answer
To be honest, it looks like you are missing some of the experience practically required to implement the system you describe. Additionally, "on a very small scale" clearly contradicts reaching a million files in less than a month.
I'll try to give answers to your questions.
Organizing files is largely about giving them reasonable names. If you let the user choose the filename, make sure you filter it correctly to block attacks based on filenames such as "../../../etc/passwd" (you SHOULD understand why). I recommend using hashes as filenames; additionally, you can assign them public "filenames" (actually aliases, stored in a database). After the upload, calculate a hash of the file. As the number of files grows, store them in directories named after the first two characters of the hash. This is what the Git VCS does, and I really like that approach.
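For illustration, a minimal sketch of that scheme might look like the following; the storage root path and the form field name userfile are assumptions, not part of the original question:

<?php
// Minimal sketch of hash-based storage. $storageRoot should sit outside
// the web root; both names here are illustrative placeholders.
$storageRoot = '/var/uploads';

// Never trust the client-supplied name; keep it only as a display alias.
// basename() strips any path components like "../../../etc/passwd".
$alias = basename($_FILES['userfile']['name']);

// Hash the uploaded file's content and shard by the first two hex
// characters of the hash, like Git does with its object store.
$tmpPath = $_FILES['userfile']['tmp_name'];
$hash    = sha1_file($tmpPath);
$dir     = $storageRoot . '/' . substr($hash, 0, 2);

if (!is_dir($dir)) {
    mkdir($dir, 0750, true);
}
move_uploaded_file($tmpPath, $dir . '/' . $hash);

// Store ($hash, $alias) in the database so users see the friendly name
// while the filesystem only ever sees hash-named files.
?>

With two-character sharding each directory holds roughly 1/256 of the files, which keeps per-directory listings and lookups manageable even at a million files.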
What exactly do you mean by multi-server upload? If you plan to have a single upload server and mirror the uploaded files to other servers, you can easily separate those two processes: create a simple upload page, and write a separate mirror script that sends the files, e.g. via FTP, to the other servers. If, on the other hand, you want to build a cluster (several web servers serving the same purpose, performing load balancing and providing high availability), there is no short answer on how to do that. Many wise men earn a lot of money because they have the experience and skills needed to implement such systems. If you are keen enough to do it yourself, you should read some books on the subject.
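A minimal sketch of such a mirror script, using PHP's FTP extension; the host mirror.example.com, the credentials, and the paths are placeholders for illustration:

<?php
// Push one locally stored file to a mirror server over FTP.
// Assumes the remote directory already exists; creating it is omitted.
function mirrorFile($localPath, $remotePath)
{
    $conn = ftp_connect('mirror.example.com');
    if (!$conn || !ftp_login($conn, 'mirroruser', 'secret')) {
        return false;
    }
    ftp_pasv($conn, true); // passive mode is friendlier to firewalls

    // ftp_put() takes the remote name first, then the local source.
    $ok = ftp_put($conn, $remotePath, $localPath, FTP_BINARY);
    ftp_close($conn);
    return $ok;
}

// Example: called from a cron job that walks the upload directory.
mirrorFile('/var/uploads/ab/ab34ef0123456789', '/uploads/ab/ab34ef0123456789');
?>

Running this from a cron job (rather than during the upload request itself) keeps the user-facing upload fast and makes retries after a failed transfer straightforward.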
I don't want to question your motivation, but why do you want to prevent the use of download managers? They are very helpful for resuming aborted downloads and thus help lower your server's traffic; that saves you traffic, bandwidth, energy costs, and CPU time. Is that really so bad for you? Technically, you would need to configure your HTTP server (e.g. Apache) to disable resume; I have no clue what the appropriate option is, but I reckon there is one. Alternatively, you can serve the files via a PHP script instead of linking to them directly: the script receives the ID of the file as a URL parameter and sends the file's content (which, in this case, must not reside under the WWW root) back to the client. That way, whether or not to implement resume support is entirely up to you, so you can easily "disable" it. If what you actually want is to prevent multiple downloads, I would instead recommend using complex IDs such as hashes (so nobody can guess the download link) and a script that deletes the file after the download completes. As I said, disabling download managers harms both you and your users.
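As a sketch of that serving-script alternative: the lookupPath() helper below (mapping a public ID to a path outside the web root via your database) is an assumed stub, and the point is simply that the script never advertises or honours Range requests, so resume cannot work:

<?php
// download.php — minimal serving sketch without resume support.
// lookupPath() stands in for a database lookup; implement it yourself.
function lookupPath($id)
{
    // e.g. SELECT path FROM files WHERE id = :id  (assumed schema)
    return null; // placeholder stub
}

$path = lookupPath(isset($_GET['id']) ? $_GET['id'] : '');
if ($path === null || !is_readable($path)) {
    http_response_code(404);
    exit;
}

// No Range handling and "Accept-Ranges: none", so clients cannot
// resume or download the file in parallel segments via this script.
header('Accept-Ranges: none');
header('Content-Type: application/octet-stream');
header('Content-Length: ' . filesize($path));
header('Content-Disposition: attachment; filename="download"'); // or the stored alias
readfile($path); // streams the file without loading it fully into memory
?>

Deleting the file (or invalidating the ID in the database) right after readfile() returns would give you the single-download behaviour mentioned above.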
I hope this helps you get a general picture of the complexity of your idea.