避免 for 循环并尝试使用集合 API(性能)

发布于 2024-12-10 13:17:05 字数 918 浏览 0 评论 0原文

我有一个旧项目的一段代码。

逻辑(高层)如下:
用户发送一系列{id,Xi},其中id是数据库中对象的主键。
目的是更新数据库,但 Xi 值系列始终唯一

即,如果用户发送 {1,X1} 并且在数据库中我们有 {1,X2},{2,X1} 则输入应该被拒绝,否则我们最终会有重复项,即 {1,X1},{2,X1} 即我们在不同行中有两次 X1

在较低级别,用户发送一系列封装此信息的自定义对象。

目前,此方法的实现使用“暴力”,即对输入和 jdbc 结果集进行连续 for 循环以确保唯一性。

我不喜欢这种方法,而且实际的实现存在细微的错误,但这是另一个故事了。

我正在寻找一种在编码和性能方面更好的方法。

我的想法如下:

  • 从用户的输入列表创建一个Set。如果 Set 的大小与列表不同,则用户的输入有重复。在此停止。
  • 从 jdbc 加载数据。
  • 使用用户的输入创建一个 HashMap。关键是主键。
  • 循环结果集。如果 HashMap包含与 ResultSet 的行 id 具有相同值的键,则将其添加到 HashMap
  • 最后得到 HashMap< /code> 的值作为 List。如果它包含重复项,则拒绝输入。

这是我想出的算法。
还有比这更好的方法吗? (我假设我本身的算法没有错误)

I have a piece of code from an old project.

The logic (in a high level) is as follows:
The user sends a series of {id,Xi} where id is the primary key of the object in the database.
The aim is that the database is updated but the series of Xi values is always unique.

I.e. if the user sends {1,X1} and in the database we have {1,X2},{2,X1} the input should be rejected otherwise we end up with duplicates i.e. {1,X1},{2,X1} i.e. we have X1 twice in different rows.

In lower level the user sends a series of custom objects that encapsulate this information.

Currently the implementation for this uses "brute-force" i.e. continuous for-loops over input and jdbc resultset to ensure uniqueness.

I do not like this approach and moreover the actual implementation has subtle bugs but this is another story.

I am searching for a better approach, both in terms of coding and performance.

What I was thinking is the following:

  • Create a Set from the user's input list. If the Set has different size than list, then user's input has duplicates.Stop there.
  • Load data from jdbc.
  • Create a HashMap<Long,String> with the user's input. The key is the primary key.
  • Loop over result set. If HashMap does not contain a key with the same value as ResultSet's row id then add it to HashMap
  • In the end get HashMap's values as a List.If it contains duplicates reject input.

This is the algorithm I came up.
Is there a better approach than this? (I assume that I am not erroneous on the algorithm it self)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

德意的啸 2024-12-17 13:17:05

纯粹从性能角度来看,为什么不让数据库发现存在重复项(例如 {1,X1},{2,X1})?在表中设置唯一约束,然后当更新语句因抛出异常而失败时,捕获它并处理您在这些输入条件下想要执行的操作。如果您需要回滚任何部分更新,您可能还希望将其作为单个事务运行。当然,这是假设您没有任何其他业务规则来驱动此处未提及的更新。

使用您的算法,您花费了太多时间迭代 HashMapList 来删除重复项(恕我直言)。

Purely from performance point of view , why not let the database figure out that there are duplicates ( like {1,X1},{2,X1} ) ? Have a unique constraint in place in the table and then when the update statement fails by throwing the exception , catch it and deal with what you would want to do under these input conditions. You may also want to run this as a single transaction just if you need to rollback any partial updates. Ofcourse this is assuming that you dont have any other business rules driving the updates that you havent mentioned here.

With your algorithm , you are spending too much time iterating over HashMaps and Lists to remove duplicates IMHO.

荭秂 2024-12-17 13:17:05

由于您无法更改数据库,如评论中所述。我可能会扩展你的 Set 想法。创建一个 HashMap 并将数据库中的所有项目放入其中,然后还使用数据库中的所有值创建一个 HashSet在其中。

然后,当您检查用户输入时,根据哈希图检查键并查看值是否相同,如果相同,那么您无需执行任何操作,因为确切的输入已经在您的数据库中。

如果它们不相同,则根据 HashSet 检查该值以查看它是否已存在。如果是这样,那么您就有了一个副本。

应该比循环执行得更好。

编辑:

对于多个更新,请对从数据库创建的 HashMap 执行所有更新,然后再次检查 Map 的值集,看看其大小是否与按键设置。

可能有更好的方法来做到这一点,但这是我得到的最好的方法。

Since you can't change the database, as stated in the comments. I would probably extend out your Set idea. Create a HashMap<Long, String> and put all of the items from the database in it, then also create a HashSet<String> with all of the values from your database in it.

Then as you go through the user input, check the key against the hashmap and see if the values are the same, if they are, then great you don't have to do anything because that exact input is already in your database.

If they aren't the same then check the value against the HashSet to see if it already exists. If it does then you have a duplicate.

Should perform much better than a loop.

Edit:

For multiple updates perform all of the updates on the HashMap created from your database then once again check the Map's value set to see if its' size is different from the key set.

There might be a better way to do this, but this is the best I got.

左耳近心 2024-12-17 13:17:05

我会选择数据库端解决方案。假设一个表包含 idvalue 列,您应该创建一个包含所有“值”的列表,并使用以下 SQL:

select count(*) from tbl where value in (:values);

绑定 :values<但是,值列表中的 /code> 参数适合您的环境。 (当使用 Spring JDBC 和支持 in 运算符的数据库时,这很简单,对于较少的设置来说则更简单。作为最后的手段,您可以动态生成 SQL。)您将获得一个包含一行和一列数字类型。如果为0,则可以插入新数据;如果为 1,则报告约束违规。 (如果是其他问题,你就会遇到一个全新的问题。)

如果你需要检查用户输入中的每个项目,请将查询更改为:

select value from tbl where value in (:values)

将结果存储在一个集合中(称为例如duplicates),然后循环遍历用户输入项并检查当前项的值是否在duplicates中。

这应该比将整个数据集放入内存中表现更好。

I'd opt for a database-side solution. Assuming a table with the columns id and value, you should make a list with all the "values", and use the following SQL:

select count(*) from tbl where value in (:values);

binding the :values parameter to the list of values however is appropriate for your environment. (Trivial when using Spring JDBC and a database that supports the in operator, less so for lesser setups. As a last resort you can generate the SQL dynamically.) You will get a result set with one row and one column of a numeric type. If it's 0, you can then insert the new data; if it's 1, report a constraint violation. (If it's anything else you have a whole new problem.)

If you need to check for every item in the user input, change the query to:

select value from tbl where value in (:values)

store the result in a set (called e.g. duplicates), and then loop over the user input items and check whether the value of the current item is in duplicates.

This should perform better than snarfing the entire dataset into memory.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文