PostgreSQL - 整数[]最佳实践
最近在开发一个网络应用程序时,我决定在数据模型中使用整数[]。有 2 个表,一个包含文章数据,第二个包含标签(标签 ID 和描述),决定在article.tags integer[] 列中标记一篇文章的标签 ID。
正如 Milen A. Radev 指出的:
提示:数组不是集合;搜索特定的数组元素可能是数据库设计错误的标志。考虑使用一个单独的表,其中每个项目都有一行,该项目将成为数组元素。这将更容易搜索,并且对于大量元素来说可能会更好地扩展。
不仅如此,结果还必须使用 JDBC 和 iBatis 来处理整数[],我应该说“有趣”。
目前,我可以摆脱我必须做的事情的工作实施。为了简单起见,它可能会使用存储article.id 和tag.id 关系的单独表进行重新设计。
最后,我很困惑整数[]最适合什么用途以及在什么情况下使用?
我想我经过一番努力才明白它不适合什么。
Working on a web app lately I decided to use integer[] in the data model. Having 2 tables, one with articles' data and a second with tags (tag id and description), decided to the tag ids an article would be tagged with in a article.tags integer[] column.
As Milen A. Radev pointed out:
Tip: Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.
Not only that, but having to work with integer[] using JDBC and iBatis turned out, shall I say "interesting".
For the moment, I can get away with the working implementation in place for what I had to do. It will probably get re-worked for simplicity's sake using a separate table storing article.id and tag.id relationships.
In the end I'm puzzled by what is integer[] best used for and in what context?
I think I figured out the hard way what it's not best for.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
恕我直言,由于任何数组都违反 1NF,所以最好的上下文是:...(鼓声)..... 无。
这就涉及到为什么我们会有不适合查询的数据的问题。所有值都可能是可搜索的,如果我们不应该对可搜索值使用数组,我们会再次得出这样的结论:它永远不值得使用。
这仅留下一个假设的情况,即您仅出于在客户端上进行分析和操作的目的而存储数组。我确信这些存在,但根据我的经验不存在。
编辑:上面我说过,“我确信这些存在......”看看@mu的答案太短了。
IMHO, since any array is a violation of 1NF, the best context is:... (drumroll)..... none.
This gets to the question of why we would have data that is not meant to be queried. All values are potentially searchable, and if we are not supposed to use an array on searchable values, we are again led to the conclusion it is never worth using.
This leaves only a hypothetical case where you are storing an array solely for the purpose of analysis and manipulation on the client. I'm sure these exist, but not in my experience.
EDIT: Above I said, "I'm sure these exist..." Look at @mu is too short's answer for an example.
integer[] 对于转换表来说是很好的选择。其中键是索引,并且已知每个索引都有一个值,或者有某种方式来表示空位置(例如-1)。我认为在这种情况下它会比外键更快。
另一个用途是图表。每次测试运行的结果是什么。测试运行有 6 个结果。是一行,integer[] 是 6 个结果的数组。
integer[] would be good for a conversion table. Where the key is the index, and it is known that every index has a value, or there is someway to represent empty positions (like -1). I would think in this case it would be faster than a foreign key.
Another use would be a chart. What's the result per test run. Test runs have 6 results. is a row, the integer[] is the array of 6 results.
如果数组是一个整体的值,我可能会考虑使用数组数据类型。但是什么时候数组只是一个整体的值呢?我真的不知道。
也许只有在搜索整体价值时才更有意义。在您想知道相应的行之前,拥有一个应该准确包含您正在搜索的项目的表似乎没有多大意义。所以如果你想搜索完整的数组作为键。
IFF the array is a value as a whole, I would potentially consider using the array datatype. But when is an array only a value as a whole? I don't really know.
Maybe when it would only make more sense to search for the value as a whole. Having a table that should contain exactly the items you are searching for before you want to know the corresponding row doesn't seem to make much sense. So if you want to search for the complete array as key.
我可以想到三个应用:
第一个是非规范化。权衡包括: 您无法轻松地单独更新或处理元素。但一次获取所有这些内容既简单又快捷。它还节省了大量空间。
第二个稍微相关的是,您使用数组不是为了存储,而是为了一些中间处理。例如,如果您想要使用非 SQL 语言批量处理数据,而该语言无法轻松处理正确的集合。
第三个用于存储数据的有序列表。我遇到过一些类似的应用程序,但很难具体说明。当然,您也可以在带有额外列的表中表示该位置,但有时这没有多大意义,因为您不需要单独访问数据库中的各个部分。在某些情况下,这只是客户端应用程序想要稍后存储和检索的列表。
但你的总体感觉是对的。如果您不太了解,您的第一直觉可能应该是不要使用数组。
I can think of three applications:
The first is for denormalization. Tradeoffs include: You can't easily update or process the elements individually. But it's easy and quick to fetch all of them at once. It also saves a lot of space.
The second, slightly related, is that you use arrays not for storage but for some intermediate processing. For example, if you want to process data in batches in a non-SQL-ish language that doesn't handle proper sets easily.
The third is for storing an ordered list of data. I have come across a few applications like that, but it's hard to put a finger on it. Of course you can also represent that in a table with an extra column for the position, but sometimes this doesn't make much sense because you don't need to access the individual pieces separately in the database. In some cases, this is just a list that the client application wants to store and retrieve later.
But your overall sense is right. If you don't know better, your first instinct should probably be not to use arrays.
在处理树结构(例如注释线程)时,我在 PostgreSQL 中使用了数组。您可以将从根到节点的路径存储为分支编号数组。然后,以正确的显示顺序拉出整个树是一件简单的事情:
PostgreSQL 以唯一合理的方式比较数组。使用数组作为从根开始的路径还为您提供了一种计算节点深度的简单方法。您可以使用字符串(例如每个分支编号 3 个基于 96 的数字)和 ASCII-betical 排序来达到相同的目的,但数组更清晰。
是的,还有其他处理树的方法,这些方法更加迂腐正确,但使用数组提供了非常清晰的实现。如果我进行了大量的树操作,那么维护路径数组将涉及大量繁忙的工作,因此我可能会采用不同的表示形式。
不完全是 Java 特有的,但在某些情况下,数组是手头数据的自然且有用的表示(甚至在 SQL 中)。
I've used arrays in PostgreSQL when dealing with tree structures such as comment threads. You can store the path from the root to your node as an array of branch numbers. Then, pulling out the whole tree in the correct display order is a simple matter of:
PostgreSQL compares arrays in the only sensibly way. Using an array for the path from the root also gives you an easy way to compute the depth of a node. You could use a string (with say 3 base-96 digits per branch number) and ASCII-betical sorting for the same purpose but an array is a lot clearer.
Yes, there are other ways of dealing with trees that are more pedantically correct but using an array offered a crystal clear implementation. If I was doing a lot of tree manipulation then maintaining the path arrays would involve a lot of busy work so I'd probably go with a different representation.
Not exactly Java-specific but there are cases where arrays are a natural and useful representation (even in SQL) of the data at hand.