Best practices for mixing an RDBMS and files on the filesystem

Posted 2024-12-13 13:06:24

In one of the tables in the schema I am working on, I need to deal with a couple of thousand data sheets, which are mostly PDF documents and sometimes image files such as PNG or JPG. The schema models an electronics distributor's portal, where new products are added to the portfolio frequently.

These documents (data sheets) are added when a new product is introduced, but they need updates from time to time (e.g. because a newer version of the document is released, not because the product itself changes), so I expect updates to be an asynchronous procedure.

Given this, should I keep only the file name/path of the data sheets (and similar documents) in my table, with the actual files on the filesystem, or should I take the BLOB approach? I am almost certain it should be the former, but I still wanted to ask the community for advice and see whether there are pitfalls to watch out for.


Comments (1)

转瞬即逝 2024-12-20 13:06:28

For completeness, let me just mention that some databases allow you to have a "hybrid" of these two approaches, for example Oracle BFILE or MS SQL Server FILESTREAM.

There is also an interesting discussion at Ask Tom on storing files in Oracle BLOBs (in a nutshell: "BLOBs are better than files").


BTW, you don't necessarily need to choose one over the other. If you can afford the storage overhead and you are operating in a read-mostly environment, you could store the "master" data in the BLOB for integrity but "cache" that same data in a file for quick read-only access. Some considerations:

  • You'd need to make sure the file is updated/removed if the BLOB is updated/removed.
  • Consider creating/updating the file on demand.
  • Consider evicting old files from the "cache" even if the corresponding BLOBs still exist.
  • Consider using several "caches" (e.g. if you have a middle tier that is distributed across multiple physical machines, each machine could have its own file cache).
  • And finally, you'd need to make sure all this works robustly in a concurrent environment.

So, this is not the simplest approach but, depending on your needs, may be a good tradeoff between integrity, performance and implementation effort.
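
To make the "master BLOB plus file cache" idea more concrete, here is a minimal sketch in Python using SQLite. Everything named here is an assumption made for illustration and not part of the original discussion: the `datasheet` table, its `version` column, and the helpers `init_db`, `put_datasheet` and `cached_path`. The sketch creates the cache file on demand, names it after the row's version so an updated BLOB never serves a stale file, and publishes the file with an atomic rename so concurrent readers never see a partially written file.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def init_db(conn):
    # Hypothetical schema: the BLOB is the master copy, "version" changes on every update.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS datasheet ("
        " id INTEGER PRIMARY KEY,"
        " version INTEGER NOT NULL,"
        " content BLOB NOT NULL)"
    )


def put_datasheet(conn, doc_id, data):
    """Insert a new data sheet, or replace its content and bump the version."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "UPDATE datasheet SET version = version + 1, content = ? WHERE id = ?",
            (data, doc_id),
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO datasheet (id, version, content) VALUES (?, 1, ?)",
                (doc_id, data),
            )


def cached_path(conn, cache_dir, doc_id):
    """Return a file holding the current content, creating it on demand."""
    row = conn.execute(
        "SELECT version FROM datasheet WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        raise KeyError(doc_id)
    version = row[0]

    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / f"{doc_id}-v{version}.bin"

    if not target.exists():
        row = conn.execute(
            "SELECT content FROM datasheet WHERE id = ? AND version = ?",
            (doc_id, version),
        ).fetchone()
        if row is None:
            # The row changed between the two queries; start over.
            return cached_path(conn, cache_dir, doc_id)
        # Write to a temp file in the same directory, then rename atomically.
        fd, tmp = tempfile.mkstemp(dir=cache_dir)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(row[0])
            os.replace(tmp, target)
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise

    # Evict cache files left over from older versions of this document.
    for old in cache_dir.glob(f"{doc_id}-v*.bin"):
        if old != target:
            old.unlink(missing_ok=True)
    return target
```

A web tier could call `cached_path` and then serve the returned file directly. Because each machine only needs its own `cache_dir`, this also fits the "several caches" consideration; eviction of rarely used files (for example by last access time) is left out of the sketch.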
