Auto-generating file names without conflicts
I'm writing a "file sharing hosting" service, and I want to rename every file on upload to a unique name and somehow keep track of the names in a database. Since I don't want two or more files to have the same name (which is surely impossible), I'm looking for an algorithm that generates random names for me based on a key or something similar.
Moreover, I don't want to generate a name and then search the database to see whether that file already exists. I want to be sure, 100% (or 99%), that a generated filename has never been created by my application before.
Any idea how I can write such an application?
5 Answers
You could produce a hash based on the file contents itself. There are two good reasons to do this:
Allows you to never store the same file twice - for example, if you have two copies of a music file which are identical in content, you could check whether you have already stored that file, and store it only once.
You separate the meta-data (a file name is just meta-data) from the blob. So you would have a storage system indexed by the hash of the file contents, and you would then associate the file meta-data with that hash lookup code.
The risk of finding two files that compute the same hash but do not actually have the same contents is low, depending on the size of the hash, and you can mitigate it effectively by hashing the file in chunks (which could then lead to some interesting storage optimisation scenarios :P).
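For what it's worth, here is a minimal sketch of this idea in Python, assuming SHA-256 and a local blob directory; the `store_file` and `STORAGE_ROOT` names are just illustrative, not from any particular library:

```python
import hashlib
import os
import shutil

STORAGE_ROOT = "blobs"  # hypothetical blob directory for this sketch

def store_file(upload_path: str) -> str:
    """Store a file under the hex digest of its contents; return that digest."""
    sha = hashlib.sha256()
    with open(upload_path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):  # read and hash in chunks
            sha.update(chunk)
    digest = sha.hexdigest()
    os.makedirs(STORAGE_ROOT, exist_ok=True)
    dest = os.path.join(STORAGE_ROOT, digest)
    if not os.path.exists(dest):  # identical content is stored only once
        shutil.copyfile(upload_path, dest)
    return digest  # map the original file name to this digest in the database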
GUIDs are one way. You're basically guaranteed not to get any repeats (if you have a proper random generator).
You could also append the time since the epoch.
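A quick sketch of both suggestions combined, assuming Python's standard `uuid` and `time` modules; the `unique_name` helper and the exact layout are arbitrary choices for illustration:

```python
import time
import uuid

def unique_name() -> str:
    # GUID plus seconds since the epoch; the ordering is an arbitrary choice
    return f"{int(time.time())}-{uuid.uuid4().hex}"

print(unique_name())  # e.g. "1700000000-9f4c2b8d0a1e4b6f8c7d5e4f3a2b1c0d"
```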
The best solutions have already been mentioned. I just want to add some thoughts.
The simplest solution is to have a counter and increment it on every new file. This works quite well as long as only one thread creates new files. If multiple threads, processes, or even systems add new files, things get a bit more complicated: you must coordinate the creation of new IDs with locking or a similar synchronisation method. You could also assign ID ranges to every process to reduce the synchronisation work, or extend the file ID with a unique process ID.
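A rough sketch of such a synchronised counter for a single multi-threaded process, assuming Python's `threading` module; the `FileIdCounter` name is made up, and cross-process coordination would need a different mechanism (a database sequence, for example):

```python
import threading

class FileIdCounter:
    """Hands out strictly increasing IDs; safe across threads in one process."""

    def __init__(self, start: int = 0) -> None:
        self._value = start
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:  # only one thread may increment at a time
            self._value += 1
            return self._value

counter = FileIdCounter()
print(f"{counter.next_id()}.bin")  # -> "1.bin"
```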
A better solution might be to use GUIDs in this scenario, so you do not have to care about synchronisation between processes.
Finally, you can add some random data to every identifier to make it harder to guess, if that is a requirement.
Also common is storing files in a directory structure where the location of a file depends on its name. The file abcdef1234.xyz might be stored as /ab/cd/ef/1234.xyz. This avoids directories with a huge number of files. I am not really aware why this is done - maybe file system limitations, maybe performance issues - but it is quite common. I do not know whether similar things are common when the files are stored directly in the database.
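A small sketch of that layout, assuming a two-character fan-out three levels deep; the `shard_path` helper and its parameters are just illustrative:

```python
import os

def shard_path(name: str, depth: int = 3, width: int = 2) -> str:
    """Map 'abcdef1234.xyz' to 'ab/cd/ef/1234.xyz'."""
    parts = [name[i * width:(i + 1) * width] for i in range(depth)]
    return os.path.join(*parts, name[depth * width:])

print(shard_path("abcdef1234.xyz"))  # -> ab/cd/ef/1234.xyz
```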
The best way is to simply use a counter. The first file is 1, the next is 2, another is 3, and so on...
But it seems you want random names. To do this quickly, you could make sure that each random number is greater than the number of the last file created. You can cache the last file name and simply offset your random number from it.
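A tiny sketch of that idea, assuming numeric names and an in-memory cache of the last name issued; the names here are made up, and a real application would persist the cached value:

```python
import random

last_name = 0  # the application would cache/persist the last name it issued

def next_random_name() -> int:
    """Return a random number strictly greater than the previous one."""
    global last_name
    last_name += random.randint(1, 1_000_000)
    return last_name

print(next_random_name())
print(next_random_name())  # always larger than the first
```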