I'm writing an app that will store a large number of image (and possibly video) files. After they're uploaded, they will immediately be pushed out to a cloud CDN for actual serving to the public. The idea is to keep the images in a reliable store that is easy to back up. I anticipate on the order of 200,000 objects of up to 10KB each, and possibly a smaller number of video files of a few MB each.
By default I would go with Postgres, which the documentation suggests would be OK.
- Is this a sensible idea?
- Will it make backing up the database a complete nightmare? Experiences?
- Any reliability issues?
- Will this affect the performance of other parts of the database? Bear in mind that the database will only be hit once or twice for each image.
Comments (3)
I have experience storing images in a database this way in Oracle and MySQL. Performance and reliability are not an issue. Backing up is: your backups will get very large. Since backing up is time-consuming and expensive, it is worth saving space. If removing the images shrinks your database by 80%, it might be a good idea to store them elsewhere. Backing up separate files is more efficient, because you can easily create incremental backups containing only new and modified images.
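To illustrate the incremental-backup point for images kept as separate files, here is a minimal Python sketch. It is not taken from any particular backup tool: the function name `incremental_backup`, the directory layout, and the use of the file modification time as the "changed since last backup" test are all assumptions made for this example.

```python
import shutil
from pathlib import Path


def incremental_backup(source: Path, dest: Path, since: float) -> list[Path]:
    """Copy files under `source` modified after `since` into `dest`.

    `since` is a Unix timestamp, for example the time of the previous
    backup run. Returns the relative paths that were copied.
    Assumption: a newer modification time means the file is new or changed.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        # Skip directories and files untouched since the last backup.
        if not src.is_file() or src.stat().st_mtime <= since:
            continue
        rel = src.relative_to(source)
        target = dest / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)  # copy2 also keeps the modification time
        copied.append(rel)
    return copied
```

Each run writes only the new and modified images into `dest`, so the backup size tracks the amount of change rather than the size of the whole collection. A database dump containing the same images would normally include all of them every time.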
I have experience with PostgreSQL, both storing images as bytea (a BLOB-like data type), which went well, and storing images in a "dual solution" (images in the filesystem, metadata in a database such as MySQL or PostgreSQL), which I do not recommend.
There are 3 aspects, or architecture considerations, that can help us in our decision:
I recommend:
- Store as blob (Binary Large OBject, with indirect storage) in your table: for original image storage, but with a separate backup. See Ivan's answer, the PostgreSQL additional supplied modules, how-tos, etc.
- Store as bytea (or blob) in a separate database (with DBlink): for original image storage in another (unified) database. In this case I prefer bytea, but blob is nearly the same. Separating the database is the best way to build a "unified image web service".
- Store as bytea (BYTE Array, with direct storage) in your table: for caching processed images (typically thumbnails). Cache the small images so they can be sent quickly to the web browser (avoiding rendering problems) and to reduce server processing. Also cache the essential metadata, such as width and height. Database caching is the easiest way, but check your needs and server configuration (e.g. Apache modules): storing thumbnails in the file system may be better, so compare performance. Remember that this is a (unified) web service, so the cache can be stored in a separate database with no backups, serving many tables. See also the PostgreSQL binary data types manual, tests with a bytea column, etc.
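As a rough illustration of the thumbnail-cache option, the Python sketch below stores a PNG thumbnail together with its width and height in a bytea column. The table `thumbnail_cache`, its columns, and both function names are hypothetical. It also assumes a psycopg2-style DB-API driver, meaning `%s` placeholders and Python `bytes` values passed as bytea; check your own driver before relying on that.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Hypothetical cache table; the names are illustrative only.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS thumbnail_cache (
    image_id  integer PRIMARY KEY,
    width     integer NOT NULL,
    height    integer NOT NULL,
    data      bytea   NOT NULL
)
"""

INSERT_SQL = (
    "INSERT INTO thumbnail_cache (image_id, width, height, data) "
    "VALUES (%s, %s, %s, %s)"
)


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk at the start of a PNG."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    # Width and height are two big-endian 32-bit integers after "IHDR".
    return struct.unpack(">II", data[16:24])


def cache_thumbnail(conn, image_id: int, png_bytes: bytes) -> None:
    """Insert a thumbnail and its dimensions via a DB-API connection.

    Assumption: `conn` comes from a psycopg2-style driver.
    """
    width, height = png_dimensions(png_bytes)
    cur = conn.cursor()
    try:
        cur.execute(INSERT_SQL, (image_id, width, height, png_bytes))
    finally:
        cur.close()
    conn.commit()
```

Keeping width and height next to the bytes means a page can emit its image dimensions without decoding the image again. Running `CREATE_TABLE_SQL` once against the cache database would set up the table this sketch expects.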
My experience is limited to SQL Server, but I have several million PDF files larger than 10KB in a database that is still performing quite nicely. Of course, indexes are required. A full database backup takes no longer than expected with such an amount of data. Again, this is for MS SQL Server!