Performing "optimized" set operations on java.util.ArrayList or in SQL
This is the first time I have needed to ask a question; until today, existing threads were enough to answer my queries.
Scenario:
I have two tables (say A and B) in a database. Some rows are the same in both and some differ. Note that both tables have millions of rows, or at least more than a million.
I need to perform set operations on them: count the rows common to both (intersection), the new rows (A-B), and the old rows (B-A).
I have two choices:
1) Perform the set operations as queries on the database itself.
2) Fetch the records and perform the operations on a java.util.ArrayList in memory.
In both cases it takes too long, around 5 to 10 minutes. What would be the best (optimized) approach for this?
Please help.
Comments (3)
First of all, union, minus, etc. are not operations on tables but on the results of select statements. In your case you are scanning huge tables, creating huge result sets, and then comparing every row of those results. This is very slow. You have to optimize the queries so that indices can be used. Second, loading the results from the DB into Java and then processing them is inherently slower than processing them in the DB and fetching only the finished result in Java.
Go for the database approach, because hundreds of man-years went into optimizing the query logic of the RDBMS. I am sure that you do not want to replicate this effort in your client. If using SQL is currently too slow, make sure that all required indices are in place, and analyze the query execution plan (how to do that varies among database management systems) to find any bottlenecks in your query, such as full table scans.
I would leave it up to the database.
If the tables do not already have a primary key, you can create an indexed column in both tables that stores a hash of the column values, and use that column to do easier comparisons, for instance:
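As a rough illustration of the hash-comparison idea only, here is a minimal Java sketch that hashes each row's column values and counts the common, new, and old rows by comparing the hashes. It is not the code from the answer: the class and method names (RowHashDiff, rowHash, hashes, counts), the choice of SHA-256, and the representation of a row as a list of strings are all assumptions made for the example. In the approach described above, the hash would live in an indexed column and the database would do the comparison.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: names and row representation are illustrative assumptions.
final class RowHashDiff {

    // Hash one row's column values with SHA-256. Each value is length-prefixed
    // so that ("ab", "c") and ("a", "bc") do not feed the same bytes to the digest.
    static String rowHash(List<String> columns) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String col : columns) {
                if (col == null) {          // mark NULL distinctly from ""
                    md.update((byte) 'N');
                    continue;
                }
                byte[] bytes = col.getBytes(StandardCharsets.UTF_8);
                md.update(Integer.toString(bytes.length).getBytes(StandardCharsets.UTF_8));
                md.update((byte) ':');
                md.update(bytes);
            }
            return Base64.getEncoder().encodeToString(md.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Collect the distinct row hashes of a table (duplicate rows collapse,
    // which matches set semantics).
    static Set<String> hashes(List<List<String>> rows) {
        Set<String> out = new HashSet<>();
        for (List<String> row : rows) {
            out.add(rowHash(row));
        }
        return out;
    }

    // Returns { |A intersect B|, |A - B|, |B - A| } computed on the row hashes.
    static long[] counts(List<List<String>> a, List<List<String>> b) {
        Set<String> ha = hashes(a);
        Set<String> hb = hashes(b);
        long common = 0;
        for (String h : ha) {
            if (hb.contains(h)) {
                common++;
            }
        }
        return new long[] { common, ha.size() - common, hb.size() - common };
    }

    public static void main(String[] args) {
        List<List<String>> a = Arrays.asList(
                Arrays.asList("1", "alice"), Arrays.asList("2", "bob"));
        List<List<String>> b = Arrays.asList(
                Arrays.asList("2", "bob"), Arrays.asList("3", "carol"));
        long[] c = counts(a, b);
        // prints "same=1 new=1 old=1"
        System.out.println("same=" + c[0] + " new=" + c[1] + " old=" + c[2]);
    }
}
```

Comparing fixed-size hashes through a HashSet costs roughly one lookup per row, whereas ArrayList operations such as retainAll or removeAll scan the other list for every element. With millions of rows this still means pulling every row out of the database, so it is a sketch of the comparison logic rather than a replacement for doing the work in SQL.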