从 SQL Server 中非常大的表 (600M) 获取子集 (200M) 的最快方法

发布于 2024-09-28 06:53:09 字数 710 浏览 4 评论 0原文

我们面临以下问题,我们正在努力想出最好的解决方案。

我们使用的是 SQL Server 2008。我们有一个包含超过 6 亿条记录、大约 25 列的表。其中一列是 ID,并已建立索引。我们需要从此表中获取记录的子集。主要有两种情况:

a)子集包含1000到2亿行之间的任意位置;要获取的行的 ID 存储在 SQL Server 的索引表中;

b) 子集包含少于 100 万行;要获取的行的 ID 存储在应用程序服务器的内存(.NET Web 应用程序)中。

我们的挑战是尽快获得这个子集。我们确实需要这在几秒钟内发生。

到目前为止我们的解决方案:

a)我们在两个表之间进行连接。这可行,但速度不够快。该查询类似于 SELECT * FROM Big_Table JOIN IndexTable ON Big_Table.ID = IndexTable.ID。

b) 我们确实没有解决方案。我们尝试运行 WHERE IN 查询,但如果子集接近百万行,则需要很长时间(SELECT * FROM Big_Table WHERE ID IN (ID1, ID2, ID3...))。解决方案可能是将索引存储在临时表中,但随后又会回到情况 a。

我们正在尝试尽可能地优化 SQL Server,特别是使用良好的索引和分区。今天我更多地问你认为最好的方法是提取两种情况(a 和 b)的子集。

任何想法或建议表示赞赏。

谢谢

we are facing the following problem and we are trying to come up with the best possible solution.

We are using SQL Server 2008. We have a table that has more than 600 millions records, and has about 25 columns. One of the columns is an ID and is indexed. We need to get a subset of records from this table. There are mainly 2 cases:

a) the subset contains anywhere between 1000 and 200 millions rows ; the IDs of the rows to get are stored in an indexed table in SQL Server;

b) the subset contains less than 1 million rows ; the IDs of the rows to get are stored in the application's server's memory (.NET web app).

Our challenge is to get this subset as fast as possible. We do need this to happen in a few seconds.

Our solutions so far:

a) we do a join between the two tables. That works but that is not fast enough. The query looks something like SELECT * FROM Big_Table JOIN IndexTable ON Big_Table.ID = IndexTable.ID.

b) we do not really have a solution. We tried to run a WHERE IN query but that simply takes forever if the subset approaches the million rows (SELECT * FROM Big_Table WHERE ID IN (ID1, ID2, ID3...)). A solution could be to store the indexes in a temporary table but then this falls back to case a.

We are trying to optimize SQL Server as much as we can, notably using good indexes and partioning. I am more asking today of what you think the best approach is to extract the subset in both cases (a and b).

Any idea or suggestion is appreciated.

Thanks

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

筑梦 2024-10-05 06:53:09

既然你正在进行分页,你不能先从 IndexTable 中子选择一系列行,然后将其加入到大表中吗?

像这样的东西:

select * from big_table where id in (select top 100 id from indextable)

Since you're doing pagination, couldn't you subselect a range of rows from the IndexTable first, then join this to the big table?

Something like:

select * from big_table where id in (select top 100 id from indextable)
×纯※雪 2024-10-05 06:53:09

对于您的问题a)(SELECT * FROM Big_Table JOIN IndexTable ON Big_Table.ID = IndexTable.ID查询)最明显的问题是:您确定在ID上有索引吗?两个表中的代码>列?

The most obvious question for your problem a) (the SELECT * FROM Big_Table JOIN IndexTable ON Big_Table.ID = IndexTable.ID query) is: are you sure to have indices on the ID column in both tables?

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文