存储长二进制(原始数据)字符串
我们正在捕获一个大小可变(从 100k 到 800k)的原始二进制字符串,并且我们希望存储这些单独的字符串。它们不需要索引(废话),并且不会对该字段的内容进行查询。
这些插入内容的数量将非常大(它们用于存档目的),假设每天 10,000 个。对于像这样的大型二进制字符串来说,最好的字段类型是什么?它应该是 text
还是 blob
还是其他?
We are capturing a raw binary string that is variable in size (from 100k to 800k) and we would like to store these individual strings. They do not need to be indexed (duh) and there will be no queries on the contents of the field.
The quantity of these inserts will be very large (they are for archival purposes), let's say 10,000 per day. What is the best field type for large binary strings like these? Should it be text
or blob
or something else?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
就 PostgreSQL 而言,类型
text
是不可能的。为此,它比bytea
更慢、占用更多空间并且更容易出错。基本上有 3 种方法:
使用类型
bytea
(基本上相当于 SQL blob 类型的 pg)。使用“大对象”。
将 blob 作为文件存储在文件系统中,并且仅存储文件名
在数据库中。
每个都有其自己的优点和缺点。
1. 处理起来相当简单,但需要最多的磁盘空间。需要一些解码和编码,这使得它也很慢。备份大小快速增长!
2. 处理起来有点尴尬,但是您有自己的基础设施来操作 blob - 如果您需要的话。而且您可以更轻松地进行单独的备份。
3. 是迄今为止最快的方法,并且使用的磁盘空间最少。但它不提供在数据库中存储时获得的引用完整性。
我有许多与图像文件类似的实现:将小缩略图存储在字节字段中,以实现引用完整性和快速引用。将原始图像作为文件存储在文件系统中。当然,您需要考虑何时以及如何删除过时的文件,如何备份外部文件等。
As far as PostgreSQL is concerned, type
text
is out of the question. It is slower, uses more space and is more error-prone thanbytea
for the purpose.There are basically 3 approaches:
Use type
bytea
(basically the pg equivalent of the SQL blob type).Use "large objects".
Store blobs as files in the filesystem and only store the filename
in the database.
Each has it's own advantages and disadvantages.
1. is rather simple to handle but needs the most disk space. Some decoding and encoding is required, which makes it also slow-ish. Backups grow rapidly in size!
2. is slightly awkward in handling, but you have your own infrastructure to manipulate the blobs - if you should need that. And you can more easily make separate backups.
3. is by far the fastest way and uses the least disk space. But it does not provide the referential integrity that you get when you store inside the database.
I have a number of implementations like that for image files: store a small thumbnail in a bytea-field for referential integrity and quick reference. Store the original image as file in the file-system. Of course, you need to put some thought into when and how to delete outdated files, how to backup the external files and such.
您可能应该在大多数数据库中使用
blob
。text
列通常会根据文本编码或区域设置进行规范化或转换;因此实际字节可能不会按预期保留。在几乎所有情况下,这是text
和blob
列之间的唯一区别。You should probably use
blob
in most databases.text
columns will often be normalized or transformed according to a text encoding or locale; and so the actual bytes may not be preserved as expected. In almost all cases, this is the only difference between atext
andblob
column.我们有用户 VARBINARY 列来存储最多 300 MB 的文件。
We have user VARBINARY column to store upto 300 MB of files.