存储长二进制(原始数据)字符串

发布于 2024-12-04 13:03:43 字数 213 浏览 1 评论 0原文

我们正在捕获一个大小可变(从 100k 到 800k)的原始二进制字符串,并且我们希望存储这些单独的字符串。它们不需要索引(废话),并且不会对该字段的内容进行查询。

这些插入内容的数量将非常大(它们用于存档目的),假设每天 10,000 个。对于像这样的大型二进制字符串来说,最好的字段类型是什么?它应该是 text 还是 blob 还是其他?

We are capturing a raw binary string that is variable in size (from 100k to 800k) and we would like to store these individual strings. They do not need to be indexed (duh) and there will be no queries on the contents of the field.

The quantity of these inserts will be very large (they are for archival purposes), let's say 10,000 per day. What is the best field type for large binary strings like these? Should it be text or blob or something else?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

我纯我任性 2024-12-11 13:03:43

PostgreSQL 而言,类型 text 是不可能的。为此,它比 bytea 更慢、占用更多空间并且更容易出错。

基本上有 3 种方法:

  1. 使用类型 bytea(基本上相当于 SQL blob 类型的 pg)。

  2. 使用大对象

  3. 将 blob 作为文件存储在文件系统中,并且仅存储文件名
    在数据库中。

每个都有其自己的优点和缺点。

1. 处理起来相当简单,但需要最多的磁盘空间。需要一些解码和编码,这使得它也很慢。备份大小快速增长!

2. 处理起来有点尴尬,但是您有自己的基础设施来操作 blob - 如果您需要的话。而且您可以更轻松地进行单独的备份。

3. 是迄今为止最快的方法,并且使用的磁盘空间最少。但它不提供在数据库中存储时获得的引用完整性。

我有许多与图像文件类似的实现:将小缩略图存储在字节字段中,以实现引用完整性和快速引用。将原始图像作为文件存储在文件系统中。当然,您需要考虑何时以及如何删除过时的文件,如何备份外部文件等。

As far as PostgreSQL is concerned, type text is out of the question. It is slower, uses more space and is more error-prone than bytea for the purpose.

There are basically 3 approaches:

  1. Use type bytea (basically the pg equivalent of the SQL blob type).

  2. Use "large objects".

  3. Store blobs as files in the filesystem and only store the filename
    in the database.

Each has it's own advantages and disadvantages.

1. is rather simple to handle but needs the most disk space. Some decoding and encoding is required, which makes it also slow-ish. Backups grow rapidly in size!

2. is slightly awkward in handling, but you have your own infrastructure to manipulate the blobs - if you should need that. And you can more easily make separate backups.

3. is by far the fastest way and uses the least disk space. But it does not provide the referential integrity that you get when you store inside the database.

I have a number of implementations like that for image files: store a small thumbnail in a bytea-field for referential integrity and quick reference. Store the original image as file in the file-system. Of course, you need to put some thought into when and how to delete outdated files, how to backup the external files and such.

酒几许 2024-12-11 13:03:43

您可能应该在大多数数据库中使用blobtext 列通常会根据文本编码或区域设置进行规范化或转换;因此实际字节可能不会按预期保留。在几乎所有情况下,这是 textblob 列之间的唯一区别。

You should probably use blob in most databases. text columns will often be normalized or transformed according to a text encoding or locale; and so the actual bytes may not be preserved as expected. In almost all cases, this is the only difference between a text and blob column.

我们有用户 VARBINARY 列来存储最多 300 MB 的文件。

We have user VARBINARY column to store upto 300 MB of files.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文