JPA 性能 - SELECT DISTINCT 和/或 Java Set

发布于 2025-01-17 22:44:42 字数 970 浏览 0 评论 0原文

我有一个带有JParepository的Spring Boot应用程序。我现在正在尝试获得DB 中某个字段的所有唯一值。

假设我有一个对象foo,带有整数PK id 和一个非唯一的整数字段bar。 I want to get all the unique values for bar and I'm wondering which query from the following is the most efficient:

  1. Use DISTINCT in the SELECT声明并将结果存储在列表中。
@Query("select distinct f.bar from Foo f")
List<Integer> getUniqueBar();
  1. 中使用不同的选择并将结果存储在集合中。
@Query("select distinct f.bar from Foo f")
Set<Integer> getUniqueBar();
  1. 不要在选择中使用不同的,然后将结果存储在集合中。
@Query("select f.bar from Foo f")
Set<Integer> getUniqueBar();

所有3个查询产生相同的结果。哪一个是最有效的?

IMO,应该是1,因为它在DB中进行了重复的过滤,但是我不确定是否在集合上使用列表会带来很多增益。

(只要它是Collection),我对将结果存储的实际容器不感兴趣。)

I have a Spring Boot application with a JpaRepository. I am trying right now to obtain all the unique values for a certain field in my db.

Let's suppose I have an object Foo with an Integer PK id and a non-unique Integer field bar. I want to get all the unique values for bar and I'm wondering which query from the following is the most efficient:

  1. Use DISTINCT in the SELECT statement and store the result in a List.
@Query("select distinct f.bar from Foo f")
List<Integer> getUniqueBar();
  1. Use DISTINCT in the SELECT and store the result in a Set.
@Query("select distinct f.bar from Foo f")
Set<Integer> getUniqueBar();
  1. Don't use DISTINCT in the select and store the result in a Set.
@Query("select f.bar from Foo f")
Set<Integer> getUniqueBar();

All the 3 queries yield the same result. Which one is the most efficient?

IMO, it should be 1, since it does the duplicate filtering in the db, but I'm not sure if using a List over a Set brings much gain.

(I'm not interested in the actual container I'm storing the result into, as long as it is a Collection.)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

情场扛把子 2025-01-24 22:44:42

选项1或2最适合性能,选项3将返回表中的所有条,甚至重复的值。我猜想您看不到数据库中重复的值的原因是因为您将结果映射到无法包含重复项的集合。而且,如果我们谈论选项1的性能与选项2相比,我会说这实际上取决于您将使用这些收藏品的方法。例如,请参见以下问题:绩效和内存分配比较设置

我会选择选项2,以使其在数据库中进行过滤,并在代码中清楚地表明,该集合中不应有重复项。

Option 1 or 2 is best for performance, option 3 will return all bars from the table, even duplicated values. I'm guessing the reason you don't see the duplicated values from the database is because you map the result to a Set which cannot contain duplicates. And if we talk about performance for option 1 compared to option 2 I would say it really depends on what you will be using these collections for. For example see this question: Performance and Memory allocation comparison between List and Set.

I would go with the Option 2 to make the filtering happen in the database and make it clear in the code that there should be no duplicates in this collection.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文