Future-proof file storage
I accept file uploads from users. Each file has a pointer in the DB that records where the file lives on the filesystem.
Currently I store the files on the filesystem without any categorization, and each file is simply named with a unique value. All categorization, naming, etc. is done in the app using the DB.
A factor that I'm concerned about is that of file synchronization issues.
If I wanted to set up filesystem synchronization where, for example, the user's files are automatically updated by bridging with a PC app, would this system still work well?
I have no idea how such a system would work, so hopefully I can get some input.
Basically, is representing a file's name and location purely in the database optimal, especially if that file may be synchronized with a PC application?
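To make the setup in the question concrete, here is a minimal sketch of the "opaque file on disk, pointer in the DB" model. It assumes SQLite and a hypothetical `files` table with `storage_name`, `display_name` and `folder` columns; the table, column and function names are illustrative, not taken from the asker's app.

```python
import os
import sqlite3
import tempfile
import uuid

# Hypothetical schema: the names and folders users see live only in the db;
# on disk every file just has an opaque, unique name.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    id           INTEGER PRIMARY KEY,
    storage_name TEXT NOT NULL UNIQUE,  -- opaque name on disk
    display_name TEXT NOT NULL,         -- name shown to the user
    folder       TEXT NOT NULL          -- virtual folder, e.g. '/docs'
)
"""


def save_upload(db, storage_dir, data, display_name, folder="/"):
    """Write the bytes under a fresh unique name and record a pointer row."""
    storage_name = uuid.uuid4().hex
    with open(os.path.join(storage_dir, storage_name), "wb") as fh:
        fh.write(data)
    cur = db.execute(
        "INSERT INTO files (storage_name, display_name, folder) VALUES (?, ?, ?)",
        (storage_name, display_name, folder),
    )
    db.commit()
    return cur.lastrowid


def read_file(db, storage_dir, file_id):
    """Follow the db pointer back to the physical file and return its bytes."""
    row = db.execute(
        "SELECT storage_name FROM files WHERE id = ?", (file_id,)
    ).fetchone()
    if row is None:
        raise KeyError(file_id)
    with open(os.path.join(storage_dir, row[0]), "rb") as fh:
        return fh.read()


# Usage: store one upload in a throwaway directory and read it back.
with tempfile.TemporaryDirectory() as storage_dir:
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    file_id = save_upload(db, storage_dir, b"hello", "notes.txt", "/docs")
    contents = read_file(db, storage_dir, file_id)
    on_disk = os.listdir(storage_dir)
    db.close()
```

The file on disk is named by a random UUID, so "notes.txt" and "/docs" exist only in the database row.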
We use exactly this model for file storage, along with (shameless plug) SabreDAV to make it look like a normal filesystem to the end user.
I think this is a perfectly fine model: as long as the lookup from DB record to file is documented and easy to perform, there shouldn't be an issue. Just make backups of your DB :)
One other piece of advice: we run md5() on the file ID to generate a unique filename, and we use parts of that hash to build a directory structure. For example, ID 1 yields b026324c6904b2a9cb4b88d6d61c81d1, and the resulting filename becomes:
b02/632/4c6/904b2a9cb4b88d6d61c81d1
The reason for this is that many filesystems become very slow once a single directory holds a large number of files (or directories). It's much, much faster to traverse a few subdirectories.
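The hashing-and-splitting scheme above can be sketched as follows. This is an illustration of the idea, not the answerer's code; `sharded_path` and `store` are hypothetical names. One caveat: the MD5 of the string "1" is c4ca4238a0b923820dcc509a6f75849b, while the hash quoted above appears to be the MD5 of "1" followed by a newline (what `echo 1 | md5sum` prints). The sketch assumes the ID is hashed as its plain decimal string.

```python
import hashlib
import os


def sharded_path(file_id, root="", widths=(3, 3, 3)):
    """Map a file id to a nested path derived from the MD5 of the id.

    The leading hex characters become directory names: each intermediate
    level has at most 16**3 = 4096 subdirectories, and files spread out
    evenly across the leaves instead of piling up in one directory.
    """
    # Assumption: the id is hashed as its decimal string with no newline.
    digest = hashlib.md5(str(file_id).encode("ascii")).hexdigest()
    parts, pos = [], 0
    for width in widths:
        parts.append(digest[pos:pos + width])
        pos += width
    parts.append(digest[pos:])  # the remaining characters are the file name
    return os.path.join(root, *parts)


def store(root, file_id, data):
    """Create the intermediate directories on demand, then write the file."""
    path = sharded_path(file_id, root)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(data)
    return path
```

Because the path is a pure function of the ID, it never needs to be stored; the DB only has to keep the ID.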
The Boring Answer™:
I think it depends on what you want to do, as always :)
I mean, take your regular web hosting company. Developers are syncing files to web servers all the time. Would it make sense for a web server to store hash-generated file names in a DB that pointed to physical files? No. Then you couldn't log in with your FTP client and upload files like that, and you'd have to code a custom module to get Apache to work, etc. Instant headache.
Does it make sense for Flickr to use a DB? Yes, absolutely! (Then again, you can't log in with an FTP client and manage your photos, and that's probably a good thing!)
Just remember, a file system is a (very simple) db too. And it's a db that comes with a lot of useful free tools.
my 2¢
Yes, the way you are doing this is the best way to do it. You are using a filesystem to store files and a database to store structured data.
One suggestion I would make is that you create a directory tree on the filesystem. You may one day run up against your filesystem's limit on the maximum number of files per directory. I have built systems that create a new subdirectory for each day or week.
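A per-day or per-week directory tree like the one described above might look like this. It is a sketch under assumptions: the `YYYY/MM/DD` and `YYYY/Www` layouts (using ISO-8601 week numbers) and the helper names `dated_dir` and `store_upload` are illustrative choices, not the answerer's actual layout.

```python
import datetime as dt
import os


def dated_dir(root, when, per="day"):
    """Return the subdirectory for files uploaded on `when`.

    per="day"  -> root/YYYY/MM/DD
    per="week" -> root/YYYY/Www (ISO-8601 year and week number)
    """
    if per == "day":
        return os.path.join(
            root, f"{when.year:04d}", f"{when.month:02d}", f"{when.day:02d}"
        )
    if per == "week":
        iso_year, iso_week, _weekday = when.isocalendar()
        return os.path.join(root, f"{iso_year:04d}", f"W{iso_week:02d}")
    raise ValueError("per must be 'day' or 'week'")


def store_upload(root, name, data, when=None, per="day"):
    """Write `data` into the directory for `when` (default: today), creating it if needed."""
    when = when or dt.date.today()
    directory = dated_dir(root, when, per)
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, name)
    with open(path, "wb") as fh:
        fh.write(data)
    return path
```

The ISO year is used for weekly buckets so that days like 1 January 2021, which belong to week 53 of 2020, stay in the same directory as the rest of their week.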
Make sure you have good backups of the database as well as the document repository.
All you need to make such a system work is to make sure the API you use (or, more likely, create) can talk to the database and to the filesystem in a sensible way. Since this is what your site is already doing anyway, it shouldn't be hard to implement.
The mere fact that your files are given identifiers instead of plain-English names is mostly irrelevant to remote synchronization.
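To illustrate why opaque names don't get in the way, here is a hypothetical sync-facing layer: the server joins DB rows with the files on disk and exposes only user-visible paths plus content hashes, which a desktop client could diff against its own copy. The row shape `(folder, display_name, storage_name)`, the choice of SHA-256, and all function names are assumptions for this sketch, and conflict resolution is deliberately left open.

```python
import hashlib
import os
import posixpath
import tempfile


def content_hash(path, chunk_size=1 << 16):
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def server_manifest(rows, storage_dir):
    """Build {user-visible path: content hash} from db rows.

    `rows` are (folder, display_name, storage_name) tuples; the opaque
    storage names are resolved here and never shown to the sync client.
    """
    return {
        posixpath.join(folder, display_name):
            content_hash(os.path.join(storage_dir, storage_name))
        for folder, display_name, storage_name in rows
    }


def plan_sync(server, client):
    """Compare two {path: hash} manifests and list what would need to move."""
    return {
        "to_client": sorted(p for p in server if p not in client),
        "to_server": sorted(p for p in client if p not in server),
        # Both sides have the path but with different contents; deciding
        # which side wins is a policy choice left open here.
        "changed": sorted(p for p in server if p in client and server[p] != client[p]),
    }


# Usage: one file stored under an opaque name, compared with a client copy.
with tempfile.TemporaryDirectory() as storage_dir:
    with open(os.path.join(storage_dir, "9f1c0a"), "wb") as fh:
        fh.write(b"draft v2")
    server = server_manifest([("/docs", "report.txt", "9f1c0a")], storage_dir)

client = {
    "/docs/report.txt": hashlib.sha256(b"draft v1").hexdigest(),
    "/docs/new.txt": hashlib.sha256(b"new").hexdigest(),
}
plan = plan_sync(server, client)
```

The client only ever sees "/docs/report.txt"; the storage name "9f1c0a" stays on the server.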
Store a hash of each file (e.g. SHA-1) in the database rather than a path, and have a separate database that maps the hash to the path. Write a small app that keeps this hash database in sync, so that when you move your files to a different location it's easy to build a new database with the updated paths.
That way you can also have the system load the file from a different location depending on which hash database you use to locate it, which offers some transparency if people need to access the same file from different locations (e.g. NFS or WebDAV).
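A minimal sketch of the "small app" suggested above, assuming the main database keeps only SHA-1 hashes and the separate hash-to-path index is a plain dict rebuilt by walking the storage root. `sha1_of_file` and `build_hash_index` are hypothetical names; a real system would persist the index rather than rebuild it on every lookup.

```python
import hashlib
import os
import shutil
import tempfile


def sha1_of_file(path, chunk_size=1 << 16):
    """SHA-1 of a file's contents, read in chunks so large files are fine."""
    h = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_hash_index(root):
    """Walk `root` and map each file's SHA-1 to its current path.

    The main database stores only hashes, so after files are moved this
    index is simply rebuilt and the main database does not change. Files
    with identical contents share a hash, so only one of their paths is kept.
    """
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            index[sha1_of_file(path)] = path
    return index


# Usage: index a file, move it elsewhere, rebuild, and find it by hash again.
with tempfile.TemporaryDirectory() as old_root, tempfile.TemporaryDirectory() as new_root:
    with open(os.path.join(old_root, "a.bin"), "wb") as fh:
        fh.write(b"payload")
    file_hash = hashlib.sha1(b"payload").hexdigest()  # what the main db stores
    before = build_hash_index(old_root)[file_hash]
    shutil.move(before, os.path.join(new_root, "moved.bin"))
    after = build_hash_index(new_root)[file_hash]
    found_after_move = os.path.basename(after)
```

The stored hash is unchanged by the move; only the rebuilt index points somewhere new.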
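The location-dependent lookup described above could be as simple as keeping one hash index per access method and choosing which one to consult. In this sketch the locations, paths and the `dav.example.com` URL are made up for illustration, and `resolve` is a hypothetical helper.

```python
import hashlib

REPORT = hashlib.sha1(b"quarterly report").hexdigest()  # stands in for a stored hash

# Hypothetical indexes, one per access location: the keys are identical,
# only the way each location reaches the file differs.
INDEXES = {
    "local": {REPORT: "/srv/files/a1/report.pdf"},
    "nfs": {REPORT: "/mnt/nfs/files/a1/report.pdf"},
    "webdav": {REPORT: "https://dav.example.com/files/a1/report.pdf"},
}


def resolve(file_hash, location, indexes=INDEXES):
    """Return where `file_hash` can be reached from `location`."""
    try:
        return indexes[location][file_hash]
    except KeyError:
        raise FileNotFoundError(f"{file_hash} is not indexed for {location!r}") from None
```

The main database never changes when a new access path is added; only a new index is built.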