PostgreSQL: BYTEA vs. OID + Large Objects?
I started an application with Hibernate 3.2 and PostgreSQL 8.4. I have some byte[] fields that were mapped as @Basic (= PG bytea) and others that got mapped as @Lob (= PG Large Object). Why the inconsistency? Because I was a Hibernate noob.

Now, those fields are at most 4 KB (but average 2-3 KB). The PostgreSQL documentation mentions that LOs are good when the fields are big, but I didn't see what 'big' meant.

I have upgraded to PostgreSQL 9.0 with Hibernate 3.6, and I was forced to change the annotation to @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType"). This bug brought forward a potential compatibility issue, and I eventually found out that Large Objects are a pain to deal with compared to a normal field.

So I am thinking of changing all of it to bytea. But I am concerned that bytea fields are encoded in hex, so there is some overhead in encoding and decoding, and this would hurt performance.

Are there good benchmarks about the performance of both of these? Has anybody made the switch and seen a difference?
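For illustration, the two mappings described above might look like this. This is a sketch; the Attachment entity and its fields are hypothetical, not from the original application:

```java
import javax.persistence.Basic;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Lob;

@Entity
public class Attachment {
    @Id
    private Long id;

    // Maps to a PostgreSQL bytea column: the value travels with the
    // row and is TOASTed automatically when large.
    @Basic
    private byte[] thumbnail;

    // Maps to a PostgreSQL Large Object: the column stores an OID
    // pointing into the pg_largeobject system table.
    @Lob
    private byte[] original;
}
```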
4 Answers
Basically, there are cases where each makes sense. bytea is simpler and generally preferred. The client libraries handle the decoding for you, so that's not an issue.

However, LOBs have some neat features, such as the ability to seek within them and treat the LOB as a byte stream instead of a byte array.

"Big" means "big enough that you don't want to send it to the client all at once." Technically, bytea is limited to 1 GB compressed and a LOB is limited to 2 GB compressed, but in practice you hit other limits first anyway. If it's big enough that you don't want it directly in your result set and don't want to send it to the client all at once, use a LOB.
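As a sketch of that seek/stream capability, the PostgreSQL JDBC driver exposes a Large Object extension API (org.postgresql.largeobject). This example assumes an open connection, an OID already stored in some row, and an active transaction, so it is illustrative rather than runnable as-is:

```java
import java.sql.Connection;
import org.postgresql.PGConnection;
import org.postgresql.largeobject.LargeObject;
import org.postgresql.largeobject.LargeObjectManager;

public class LoSlice {
    // Read a slice of a Large Object without loading the whole value
    // into memory. Must be called inside a transaction.
    static byte[] readSlice(Connection conn, long oid, int offset, int len)
            throws Exception {
        LargeObjectManager lom =
                conn.unwrap(PGConnection.class).getLargeObjectAPI();
        LargeObject lo = lom.open(oid, LargeObjectManager.READ);
        try {
            lo.seek(offset);      // random access, like an OS file
            return lo.read(len);  // only this slice crosses the wire
        } finally {
            lo.close();
        }
    }
}
```

Nothing comparable exists for bytea: a bytea column always arrives in the result set as one complete value.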
bytea input can be in hex or escape format; that's your choice. Storage will be the same either way. As of version 9.0, the output defaults to hex, but you can change this by editing the bytea_output parameter.

I haven't seen any benchmarks.
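To make the encoding overhead concrete, here is a rough sketch of what the 9.0+ hex text format looks like: a literal backslash-x prefix followed by two hex digits per byte. This is illustrative Java, not the driver's actual code:

```java
import java.util.Arrays;

public class ByteaHex {
    // Encode raw bytes in the bytea hex text format, e.g. {0xDE, 0xAD} -> "\xdead".
    static String encode(byte[] data) {
        StringBuilder sb = new StringBuilder("\\x");
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Decode the hex text format back into raw bytes.
    static byte[] decode(String text) {
        String hex = text.substring(2); // strip the leading "\x"
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xDE, (byte) 0xAD, (byte) 0xBE, (byte) 0xEF};
        String wire = encode(data);
        System.out.println(wire);                           // prints "\xdeadbeef"
        System.out.println(Arrays.equals(decode(wire), data)); // prints "true"
    }
}
```

The round trip is a straight byte-to-two-characters mapping, which is cheap compared to the older escape format's variable-length octal escapes.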
tl;dr Use bytea
...unless you need streaming or >1GB values
Bytea: A byte sequence that works like any other TOAST-able value. Limited to 1GB per value, 32TB per table.
Large object: Binary data split up into multiple rows. Supports seek, read, and write like an OS file, so operations don't require loading it all into memory at once. Limited to 4TB per value, 32TB per database.
Large objects have the following downsides:

- There is only one large object table per database.
- Large objects aren't automatically removed when the "owning" record is deleted. See the lo_manage function in the lo module.
- Since there is only one table, large object permissions have to be handled record by record.
- Streaming is difficult, and has less support in client drivers than simple bytea.
- They are part of the system schema, so you have limited to no control over options like partitioning and tablespaces.

I venture that 93% of real-world uses of large objects would be better served by bytea.
I don't have a comparison of large objects and bytea handy, but note that the switch to the hex output format in 9.0 was also made because it is faster than the previous custom encoding. As far as text encoding of binary data goes, you probably won't get much faster than what currently exists.

If that is not good enough for you, you can consider using the binary protocol between the PostgreSQL client and server. Then you basically get the bytes straight from disk, much like large objects. I don't know whether the PostgreSQL JDBC driver supports that yet, but a quick search suggests it does not.