Should I store images in SQL or not?
Generally, I had thought it was always better to store images in the filesystem and link to them via a database entry. However, I am trying to optimize my db design and have a few questions.
My images are all really small thumbnails in black and white (not greyscale, but true B&W) and are 70x70 in size. If we take an image (which is basically a 2D array of 1s and 0s), it can be stored as binary data of approximately 600 bytes.
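To make the size estimate concrete, here is a minimal sketch of packing a 70x70 one-bit image into bytes. The helper names `pack_bits` and `unpack_bits` and the most-significant-bit-first layout are assumptions chosen for illustration, not something from the question. 70 * 70 = 4900 bits, which rounds up to 613 bytes.

```python
import random

SIDE = 70  # thumbnail width and height in pixels


def pack_bits(pixels):
    """Pack a flat list of 0/1 pixel values into bytes, 8 pixels per byte."""
    out = bytearray((len(pixels) + 7) // 8)  # round up to whole bytes
    for i, bit in enumerate(pixels):
        if bit:
            out[i // 8] |= 0x80 >> (i % 8)  # most significant bit first
    return bytes(out)


def unpack_bits(data, count):
    """Inverse of pack_bits: recover the first `count` pixel values."""
    return [(data[i // 8] >> (7 - i % 8)) & 1 for i in range(count)]


# Random pixels stand in for a real thumbnail.
pixels = [random.randint(0, 1) for _ in range(SIDE * SIDE)]
packed = pack_bits(pixels)
print(len(packed))  # 613
```

The last byte carries four unused padding bits, which is why the stored size is 613 bytes rather than 612.5.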
So my question is whether querying the 600 bytes stored in a db would be faster than querying a link and then accessing the filesystem, assuming there are a lot of "image" queries being made.
Does anyone have any experience with this area?
If it matters, I am using MySQL, and MonetDB (separately, but have the same question for both).
Many thanks,
Brett
Comments (6)
If it's only 600 bytes, then I wouldn't worry too much and would store them in the database as a BLOB.
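A minimal sketch of what storing such a thumbnail as a BLOB could look like. It uses Python's built-in `sqlite3` module purely as a stand-in so the example runs anywhere; the asker uses MySQL and MonetDB, whose drivers and column types differ. The table name `thumbnails` and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite is only a stand-in for a real MySQL/MonetDB connection.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE thumbnails ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " bitmap BLOB NOT NULL)"
)

# Placeholder payload: 613 zero bytes, the packed size of a 70x70 1-bit image.
bitmap = bytes(613)

# Parameter binding passes the bytes object through as a BLOB value.
cur = conn.execute(
    "INSERT INTO thumbnails (name, bitmap) VALUES (?, ?)",
    ("example", bitmap),
)
thumb_id = cur.lastrowid

# Read the image back with a single query on the primary key.
(stored,) = conn.execute(
    "SELECT bitmap FROM thumbnails WHERE id = ?", (thumb_id,)
).fetchone()
print(len(stored))  # 613
```

With this layout the image arrives in the same round trip as the row lookup, so there is no second step to open a file.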
There's an interesting article at High Scalability about how Flickr is architected. This might prove to be a useful read for you.
Since you tagged the question sql-server, I recommend reading To Blob or Not To Blob, a research paper co-authored by the late Jim Gray. The paper goes into plenty of detail on storing BLOBs in the file system (NTFS) vs. the database (SQL Server), and you'll be amazed how many angles are considered. It's a MUST read. But the conclusion is this:
Your case falls clearly in the 'To BLOB' case.
As far as I understand, there's no problem storing even bigger files in the DB, as long as you're not using SELECT * for no reason (frankly speaking, there's never a reason to use SELECT * at all).
BLOBs and TEXTs are stored separately from other data and don't affect performance if not queried explicitly.
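The point about avoiding SELECT * can be sketched as follows: list only the columns a query needs, and fetch the BLOB column only when the image itself is wanted. This again uses `sqlite3` as a runnable stand-in for MySQL or MonetDB, with a hypothetical `thumbnails` table; how each engine physically stores BLOBs is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute(
    "CREATE TABLE thumbnails (id INTEGER PRIMARY KEY, name TEXT, bitmap BLOB)"
)
conn.executemany(
    "INSERT INTO thumbnails (name, bitmap) VALUES (?, ?)",
    [(f"thumb-{i}", bytes(613)) for i in range(3)],
)

# Listing query: name the columns explicitly so the BLOB is never transferred.
listing = conn.execute(
    "SELECT id, name FROM thumbnails ORDER BY id"
).fetchall()

# Image query: ask for the BLOB only for the row that is actually displayed.
first_id = listing[0][0]
bitmap = conn.execute(
    "SELECT bitmap FROM thumbnails WHERE id = ?", (first_id,)
).fetchone()[0]

print(listing)
print(len(bitmap))
```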
If you're talking about a web application, storing images in the database is just stupid: you get none of the benefits a desktop application might gain, only the difficulties.
This is not a matter of file size only, but also of the maximum number of records you expect to have once the database is in use. In the old days we used to do this kind of math for every type of field. Just multiply 600 bytes by the maximum number of records, and if the result is something manageable, don't worry about the speed.
As @codeholic says, if you're not using SELECT *, everything goes fine.
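The multiplication described above is easy to run for a few table sizes. The record counts below are hypothetical, and the figure covers only the raw image bytes (613 per packed 70x70 bitmap), not row overhead or indexes.

```python
BYTES_PER_IMAGE = 613  # 70 * 70 bits, rounded up to whole bytes


def total_blob_bytes(record_count, bytes_per_image=BYTES_PER_IMAGE):
    """Raw storage for the image column alone, ignoring row and index overhead."""
    return record_count * bytes_per_image


# Hypothetical table sizes; substitute the maximum you actually expect.
for records in (10_000, 1_000_000, 100_000_000):
    size_mib = total_blob_bytes(records) / (1024 * 1024)
    print(f"{records:>11,} records -> {size_mib:,.1f} MiB")
```

If the largest figure still fits comfortably in memory or on fast storage, the image column is unlikely to be the bottleneck.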
Storing the image in the database (and serving it on each request) prevents you from caching those images in a proxy server (or rather, complicates the task many-fold and rules out almost all out-of-the-box solutions). The catch is that to measure the impact you need to look at it differently: instead of "how much time does a single query to fetch an image take", ask yourself "how much time does a series of (put a reasonable number here) queries to get the same set of images take". Maybe also ask yourself "must I pay the cost of a round trip to the DB and back?".
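One way to look at the "series of queries for the same set of images" framing is to count database round trips rather than time a single fetch. The sketch below is an assumption-laden illustration, not a proxy-cache setup: it uses `sqlite3` as a stand-in database, a hypothetical `thumbnails` table, and an in-process `functools.lru_cache` in place of a caching proxy.

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute("CREATE TABLE thumbnails (id INTEGER PRIMARY KEY, bitmap BLOB)")
conn.executemany(
    "INSERT INTO thumbnails (id, bitmap) VALUES (?, ?)",
    [(i, bytes(613)) for i in range(1, 6)],
)

stats = {"db_roundtrips": 0}  # how many times the database was actually hit


def fetch_from_db(thumb_id):
    """Fetch one thumbnail, counting the round trip."""
    stats["db_roundtrips"] += 1
    row = conn.execute(
        "SELECT bitmap FROM thumbnails WHERE id = ?", (thumb_id,)
    ).fetchone()
    return row[0]


@lru_cache(maxsize=1024)
def fetch_cached(thumb_id):
    """Same fetch, but repeated requests are served from memory."""
    return fetch_from_db(thumb_id)


# Simulate 100 page views that each show the same 5 thumbnails.
for _ in range(100):
    for thumb_id in range(1, 6):
        fetch_cached(thumb_id)

print(stats["db_roundtrips"])  # 5
```

Out of 500 requests only 5 reach the database. Without some cache in front of it, every request would pay the round trip, which is the cost the paragraph above asks you to measure.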
Not that the idea is without merit - updating the images in a single location might be an important factor. Also, if high availability is a factor, it's much easier to configure with the DB as the central point of data (putting the images on the file system means synchronizing them between nodes when they're updated). Change tracking, permissions, additional data, avoiding "broken links" - these might play a part as well.
All of the above aside, I've had bad experiences with the "filesystem" technique. We're currently considering a move to the "database" technique.