Storing images in a database
Possible Duplicate:
Storing Images in DB - Yea or Nay?
For ages I've been told not to store images in the database, or any big BLOB for that matter. While I can understand why databases aren't (or weren't) efficient at that, I never understood why they couldn't be. If I can put a file somewhere and reference it, why couldn't the database engine do the same? I'm glad Damien Katz mentioned it on a recent Stack Overflow podcast, and Joel Spolsky and Jeff Atwood, at least silently, agreed.
I've been reading hints that Microsoft SQL Server 2008 should be able to handle BLOBs efficiently. Is that true? If so, what is stopping us from just storing images there and getting rid of one problem? One thing I can think of is that while an image can be served very quickly by a static web server if it's a file somewhere, when it's in the database it has to travel from the database to the web server application (which might be slower than the static web server) before it's served. Shouldn't caching help with, or even solve, that last issue?
Yes, it's true: SQL Server 2008 implemented a feature like the one you mention, called FILESTREAM. It's a good argument for storing blobs in a DB, if you are certain you will only ever use SQL Server for your app (or are willing to pay the price, either in performance or in developing a similar layer on top of the new DB server). That said, I expect similar layers will start to appear for other DB servers, if they don't already exist.
As always, what the real benefits would be depends on the particular scenario. If you will serve lots of relatively static, big files, then this approach plus caching will probably be the best option in terms of the performance/manageability trade-off.
This white paper describes the FILESTREAM feature of SQL Server 2008, which allows storage of and efficient access to BLOB data using a combination of SQL Server 2008 and the NTFS file system. It covers choices for BLOB storage, configuring Windows and SQL Server for using FILESTREAM data, considerations for combining FILESTREAM with other features, and implementation details such as partitioning and performance.
Just because you can do something doesn't mean you should.
If you care about efficiency you'll still most likely not want to do this for any sufficiently large scale file serving.
Also it looks like this topic has been heavily discussed...
I'll try to decompose your question and address your various parts as best I can.
SQL Server 2008 and the Filestream Type - Vinko's answer above is the best one I've seen so far. The Filestream type in SQL Server 2008 is what you were looking for. Filestream is in version 1, so there are still some reasons why I wouldn't recommend using it for an enterprise application. As an example, my recollection is that you can't split the storage of the underlying physical files across multiple Windows UNC paths. Sooner or later that will become a pretty serious constraint for an enterprise app.
Storing Files in the Database - In the grander scheme of things, Damien Katz's original direction was correct. Most of the big enterprise content management (ECM) players store files on the filesystem and metadata in the RDBMS. If you go even bigger and look at Amazon's S3 service, you're looking at physical files with a non-relational database backend. Unless you're measuring your files under storage in the billions, I wouldn't recommend going this route and rolling your own.
A Bit More Detail on Files in the Database - At first glance, a lot of things speak for files in the database. One is simplicity, two is transactional integrity. Since the Windows file system cannot be enlisted in a transaction, writes that need to occur across the database and filesystem need to have transaction compensation logic built in. I didn't really see the other side of the story until I talked to DBAs. They generally don't like commingling business data and blobs (backup becomes painful) so unless you have a separate database dedicated to file storage, this option is generally not as appealing to DBAs. You're right that the database will be faster, all other things being equal. Not knowing the use case for your application, I can't say much about the caching option. Suffice it to say that in many enterprise applications, the cache hit rate on documents is just too darn low to justify caching them.
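To make "transaction compensation logic" concrete, here is a minimal sketch in Python, using the standard `sqlite3` module as a stand-in for the RDBMS. The `documents` table, the `save_document` helper, and the storage directory are all hypothetical names chosen for illustration. The file is written first, and if the metadata insert fails, the file is deleted again so the two stores stay consistent.

```python
import os
import sqlite3
import tempfile
import uuid


def save_document(conn, storage_dir, name, data):
    """Write the bytes to disk, then record metadata; undo the file on DB failure."""
    path = os.path.join(storage_dir, uuid.uuid4().hex)
    with open(path, "wb") as f:
        f.write(data)
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO documents (name, path, size) VALUES (?, ?, ?)",
                (name, path, len(data)),
            )
    except sqlite3.Error:
        # Compensation: the database rolled back, so remove the orphaned file.
        os.remove(path)
        raise
    return path


storage_dir = tempfile.mkdtemp()
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "name TEXT UNIQUE NOT NULL, path TEXT NOT NULL, size INTEGER NOT NULL)"
)

save_document(conn, storage_dir, "logo.png", b"fake image bytes")

try:
    # The same name again violates the UNIQUE constraint, so the insert fails.
    save_document(conn, storage_dir, "logo.png", b"other bytes")
except sqlite3.IntegrityError:
    pass

files_on_disk = len(os.listdir(storage_dir))
rows_in_db = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
```

Note that this only covers a failed insert. A crash between the file write and the commit would still leave an orphaned file, which is why real systems usually add a periodic cleanup pass as well.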
Hope this helps.
One of the classical reasons for caution about storing blobs in databases is that the data will be stored and edited (changed) under transaction control, which means that the DBMS needs to ensure that it can roll back changes, and recover changes after a crash. This is normally done by some variation on the theme of a transaction log. If the DBMS is to record a change in a 2 GB blob, then it has to have a way of identifying what has changed. This might be simple-minded (the before image and the after image) or more sophisticated (some sort of binary delta operation), which is more computationally expensive. Even so, sometimes the net result will be gigabytes of data to be stored through the logs. This hurts system performance. There are various ways of limiting the impact of the changes, by reducing the amount of data flowing through the logs, but there are trade-offs.
The penalty for storing filenames in the database is that the DBMS has no control (in general) over when the files change - and hence again, the reproducibility of the data is compromised; you cannot guarantee that something outside the DBMS has not changed the data. (There's a very general version of that argument - you can't be sure that someone hasn't tampered with the database storage files in general. But I'm referring to storing a file name in the database referencing a file not controlled by the DBMS. Files controlled by the DBMS are protected against casual change by the unprivileged.)
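One common way to at least detect such out-of-band changes (not something the DBMS does for you) is to store a content hash next to the file name. The sketch below assumes a hypothetical `files` table and uses Python's standard `sqlite3` and `hashlib` modules. It cannot prevent tampering, it only notices it.

```python
import hashlib
import os
import sqlite3
import tempfile


def sha256_of(path):
    """Hash the file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(conn, path):
    """Compare the file's current hash with the one recorded in the database."""
    row = conn.execute(
        "SELECT sha256 FROM files WHERE path = ?", (path,)
    ).fetchone()
    return row is not None and row[0] == sha256_of(path)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, sha256 TEXT NOT NULL)")

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"original contents")
conn.execute("INSERT INTO files VALUES (?, ?)", (path, sha256_of(path)))

before = is_unchanged(conn, path)

# Something outside the DBMS edits the file; the database is never told.
with open(path, "wb") as f:
    f.write(b"tampered contents")

after = is_unchanged(conn, path)
os.remove(path)
```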
The new SQL Server functionality sounds interesting. I've not explored what it does, so I can't comment on the extent to which it avoids or limits the problems alluded to above.
There are options within SQL Server to manage where it stores large blobs of data; these have been there since at least SQL Server 2005, so I don't know why you couldn't store large BLOBs. MOSS (Microsoft Office SharePoint Server), for instance, stores all of the documents you upload to it in a SQL database.
There are of course some performance implications, as with just about anything, so take care not to retrieve the blob if you don't need it, and don't include it in indexes, etc.
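As a rough illustration of that advice, the sketch below uses Python's standard `sqlite3` module rather than SQL Server, with a hypothetical `images` table. The listing query names only the small columns, the index covers only `name`, and the BLOB is fetched by a separate query when the bytes are actually needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "size INTEGER NOT NULL, data BLOB NOT NULL)"
)
# Index only the small lookup column; the BLOB column stays out of the index.
conn.execute("CREATE INDEX idx_images_name ON images (name)")

payload = b"\x89PNG" + bytes(1024)  # stand-in for real image bytes
conn.execute(
    "INSERT INTO images (name, size, data) VALUES (?, ?, ?)",
    ("logo.png", len(payload), payload),
)


def list_images(conn):
    """Listing: name the columns explicitly so the BLOB is never fetched."""
    return conn.execute(
        "SELECT id, name, size FROM images ORDER BY id"
    ).fetchall()


def load_image(conn, image_id):
    """Fetch the BLOB only when the bytes are actually needed."""
    row = conn.execute(
        "SELECT data FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    return None if row is None else row[0]


listing = list_images(conn)
image_bytes = load_image(conn, listing[0][0])
```

The main thing to avoid is `SELECT *` on such a table, since it drags every blob across the connection even when only the names are displayed.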