将任意键/值对与 Postgres 数据库中的表行关联的最佳执行方法?
我为 Postgres 8.4 DB 准备了一个完美的关系数据模式,但我需要能够将任意键/值对与我的几个表关联起来,分配的键随行而变化。键/值对是用户生成的,因此我无法提前预测它们或争论有序的架构更改。
我有以下要求:
- 经常读取键/值对,偶尔写入。读取速度必须相当快。
- 不需要(当前)查询键或值。 (但有一天它可能会派上用场。)
我看到以下可能的解决方案:
- Entity -属性值 模式/反模式。 令人烦恼,但这种烦恼通常会被我的 ORM 所抵消。
- 将键/值对作为序列化 JSON 数据存储在文本列上。 一个简单的解决方案,ORM 再次派上用场,但我可以告别未来的自己对查询的需求。
- 在其他一些 NoSQL 数据库中存储键/值对——可能是键/值或文档存储。 ORM 在这里没有帮助。我必须自己管理单独的查询(以及迫在眉睫的数据完整性问题?)。
我担心查询性能,因为我希望有一天会有很多这样的查询。我还担心程序员的表现,因为我必须构建、维护和使用该死的东西。这里有明显的最佳方法吗?或者我错过了什么?
I have an otherwise perfectly relational data schema in place for my Postgres 8.4 DB, but I need the ability to associate arbitrary key/value pairs with several of my tables, with the assigned keys varying by row. Key/value pairs are user-generated, so I have no way of predicting them ahead of time or wrangling orderly schema changes.
I have the following requirements:
- Key/value pairs will be read often, written occasionally. Reads must be reasonably fast.
- No (present) need to query off of the keys or values. (But it might come in handy some day.)
I see the following possible solutions:
- The Entity-Attribute-Value pattern/antipattern. Annoying, but the annoyance would be generally offset by my ORM.
- Storing key/value pairs as serialized JSON data on a text column. A simple solution, and again the ORM comes in handy, but I can kiss my future self's need for queries good-bye.
- Storing key/value pairs in some other NoSQL db--probably a key/value or document store. ORM is no help here. I'll have to manage the separate queries (and looming data integrity issues?) myself.
I'm concerned about query performance, as I hope to have a lot of these some day. I'm also concerned about programmer performance, as I have to build, maintain, and use the darned thing. Is there an obvious best approach here? Or something I've missed?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
这正是 PostgreSQL 中 hstore 数据类型的用途。
http://www.postgresql.org/docs/current/static/hstore.html
它真的很快(你可以索引它)并且很容易处理。唯一的缺点是您只能存储字符数据,但其他解决方案也会遇到这个问题。
索引支持“exists”运算符,因此您可以快速查询存在某个键的行,或者特定属性具有特定值的行。
9.0 版本变得更好,因为取消了一些大小限制。
That's precisely what the hstore datatype is for in PostgreSQL.
http://www.postgresql.org/docs/current/static/hstore.html
It's really fast (you can index it) and quite easy to handle. The only drawback is that you can only store character data, but you'd have that problem with the other solutions as well.
Indexes support "exists" operator, so you can query quite quickly for rows where a certain key is present, or for rows where a specific attribute has a specific value.
And with 9.0 it got even better because some size restrictions were lifted.
hstore 通常是一个很好的解决方案,但我个人更喜欢使用普通的键:值表。一个表包含定义,另一个表包含值和将值绑定到定义的关系,以及将值绑定到其他表中的特定记录的关系。
为什么我反对 hstore?因为它就像一个注册表模式。经常作为反模式的例子被提及。你可以把任何东西放在那里,很难简单地验证它是否仍然需要,当加载整行时(尤其是在 ORM 中),整个 hstore 都会被加载,这可能有很多垃圾并且没有什么意义。更不用说需要将 hstore 数据类型转换为您的语言类型,并在保存时再次转换回来。所以你会得到一些类型转换的开销。
所以实际上我正在尝试将我正在工作的公司中的所有 hstore 转换为简单的键:值表。但这并不是那么困难的任务,因为保存在 hstore 中的结构非常巨大(或者至少很大),并且读/写对象会产生巨大的函数调用开销。因此,制作一个简单的任务,如“select * from base_product where id = 1;”使服务器出汗并严重影响性能。需要指出的是,性能问题不是因为 db,而是因为 python 必须多次转换从 postgres 收到的结果。而 key:value 不需要这样的转换。
hstore is generally good solution for that, but personally I prefer to use plain key:value tables. One table with definitions, other table with values and relation to bind values to definition, and relation to bind values to particular record in other table.
Why I'm against hstore? Because it's like a registry pattern. Often mentioned as example of anti pattern. You can put anything there, it's hard to easy validate if it's still needed, when loading a whole row (in ORM especially), the whole hstore is loaded which can have much junk and very little sense. Not mentioning that there is need to convert hstore data type into your language type and convert back again when saved. So you get some overhead of type conversion.
So actually I'm trying to convert all hstores in company I'm working for into simple key:value tables. It's not that hard task though, because structures kept here in hstore are huge (or at least big), and reading/writing an object crates huge overhead of function calls. Thus making a simple task like that "select * from base_product where id = 1;" is making a server sweat and hits performance badly. Want to point that performance issue is not because db, but because python has to convert several times results received from postgres. While key:value is not requiring such conversion.
由于您不控制数据,因此不要尝试使问题变得过于复杂。
这类似于 EAV,但没有 attribute_keys 表,如果您不控制其中的内容,则该表没有附加值。
为了提高速度,您应该定期执行“使用 sometable_attributes_idx 对 sometable_attributes 进行集群”,这样一行的所有属性将在物理上接近。
As you do not control data then do not try to overcomplicate this.
This is like EAV, but without attribute_keys table, which has no added value if you do not control what will be there.
For speed you should periodically do "cluster sometable_attributes using sometable_attributes_idx", so all attributes for one row will be physically close.