SQL Join 需要很长时间

发布于 2024-07-25 10:38:30 字数 429 浏览 5 评论 0原文

我有两个表

(1) MonthlyTarget {SalesManCode, TargetMonthYear, TargetValue}; 该表有 1966177 行。

(2) MonthlySales  {SalesManCode, SaleDate, AchievedValue};

该表有 400310 行。

我必须进行一个查询,生成如下表所示的结果:

{SalesManCode, JanTar, JanAch, FebTar, FebAch,....., DecTar, DecAch}

问题是,连接这两个表需要很长时间。

应该查询什么?

如何优化查询?

我不想考虑索引。

I have two tables

(1) MonthlyTarget {SalesManCode, TargetMonthYear, TargetValue}; this table has 1966177 rows.

(2) MonthlySales  {SalesManCode, SaleDate, AchievedValue};

this table has 400310 rows.

I have to make a query that produces a result like the following table:

{SalesManCode, JanTar, JanAch, FebTar, FebAch,....., DecTar, DecAch}

The problem is, joining these two tables taking a long time.

What should be the query?

How can the query be optimized?

I don't want to consider indexing.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

叹沉浮 2024-08-01 10:38:31

好吧,如果您不想考虑索引,那么您将始终执行全表扫描,并且性能不会提高。

Well, if you don't want to consider indexing, then you will always be preforming full table scans and performance will not be improved.

享受孤独 2024-08-01 10:38:31

您的 MonthlyTarget 表中似乎缺少一些列,即“T​​argetDate”列。

除了每个人都已经说过的关于索引的内容之外,有时分而治之的方法确实很有帮助。 不要将 1966177 行表连接到 400310 行表,而是创建小型临时表并将它们连接在一起:

CREATE TABLE #MonthlySalesAgg
(
    SalesManCode int,
    JanTar money,
    FebTar money,
    MarTar money,
    AprTar money,
    MayTar money,
    JunTar money,
    JulTar money,
    AugTar money,
    SepTar money,
    OctTar money,
    NovTar money,
    DecTar money

    PRIMARY KEY CLUSTERED (SalesManCode)
)

INSERT INTO #MonthlySalesAgg
SELECT *
FROM
(SELECT SalesManCode, TargetValue, SaleMonth = Month(TargetDate) FROM MonthlyTarget) as temp
PIVOT
(
    Max(TargetValue)
    FOR [SaleMonth] IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])
) as p

CREATE TABLE #MonthlyTargetAgg
(
    SalesManCode int,
    JanAch money,
    FebAch money,
    MarAch money,
    AprAch money,
    MayAch money,
    JunAch money,
    JulAch money,
    AugAch money,
    SepAch money,
    OctAch money,
    NovAch money,
    DecAch money

    PRIMARY KEY CLUSTERED (SalesManCode)
)

INSERT INTO #MonthlyTargetAgg
SELECT * FROM
(SELECT SalesManCode, AchievedValue, SaleMonth = Month(SaleDate) FROM MonthlySales) as temp
PIVOT
(
    Sum(AchievedValue)
    FOR [SaleMonth] IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])
) as p

上面的查询创建两个中间表,它们应包含与 SalesMan 表相同数量的记录。 加入它们很简单:

SELECT *
FROM #MonthlyTargetAgg target
INNER JOIN #MonthlySalesAgg sales ON target.SalesManCode = sales.SalesManCode

如果您发现自己需要始终按月提取数据,请将代码移至视图中。

PIVOT 需要 SQL Server 2005 或更高版本,并且它通常是一个非常有用的运算符。 希望 SQL Server 2008 将允许用户一次旋转多个列,这将导致比上面显示的查询更简单的结果。

使用 SQL Server 2000:

PIVOT 是语法糖。 例如,

SELECT * FROM
(SELECT SalesManCode, AchievedValue, SaleMonth = Month(SaleDate) FROM MonthlySales) as temp
PIVOT
(
    Sum(AchievedValue)
    FOR [SaleMonth] IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])
) as p

变成

SELECT
    SalesManCode,
    [1] = Sum(case SaleMonth when 1 then AchievedValue else 0 end),
    [2] = Sum(case SaleMonth when 2 then AchievedValue else 0 end),
    [3] = Sum(case SaleMonth when 3 then AchievedValue else 0 end),
    [4] = Sum(case SaleMonth when 4 then AchievedValue else 0 end),
    [5] = Sum(case SaleMonth when 5 then AchievedValue else 0 end),
    [6] = Sum(case SaleMonth when 6 then AchievedValue else 0 end),
    [7] = Sum(case SaleMonth when 7 then AchievedValue else 0 end),
    [8] = Sum(case SaleMonth when 8 then AchievedValue else 0 end),
    [9] = Sum(case SaleMonth when 9 then AchievedValue else 0 end),
    [10] = Sum(case SaleMonth when 10 then AchievedValue else 0 end),
    [11] = Sum(case SaleMonth when 11 then AchievedValue else 0 end),
    [12] = Sum(case SaleMonth when 12 then AchievedValue else 0 end)
FROM
    (SELECT SalesManCode, AchievedValue, SaleMonth = Month(SaleDate) FROM MonthlySales) as temp
GROUP BY SalesManCode

It looks like you're missing some columns in your MonthlyTarget table, namely a "TargetDate" column.

In addition to what everyone has already said about indexing, sometimes a divide-and-conquer approach can really help. Rather than joining a 1966177 row table to a 400310 row table, create to tiny temp tables and join them together instead:

CREATE TABLE #MonthlySalesAgg
(
    SalesManCode int,
    JanTar money,
    FebTar money,
    MarTar money,
    AprTar money,
    MayTar money,
    JunTar money,
    JulTar money,
    AugTar money,
    SepTar money,
    OctTar money,
    NovTar money,
    DecTar money

    PRIMARY KEY CLUSTERED (SalesManCode)
)

INSERT INTO #MonthlySalesAgg
SELECT *
FROM
(SELECT SalesManCode, TargetValue, SaleMonth = Month(TargetDate) FROM MonthlyTarget) as temp
PIVOT
(
    Max(TargetValue)
    FOR [SaleMonth] IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])
) as p

CREATE TABLE #MonthlyTargetAgg
(
    SalesManCode int,
    JanAch money,
    FebAch money,
    MarAch money,
    AprAch money,
    MayAch money,
    JunAch money,
    JulAch money,
    AugAch money,
    SepAch money,
    OctAch money,
    NovAch money,
    DecAch money

    PRIMARY KEY CLUSTERED (SalesManCode)
)

INSERT INTO #MonthlyTargetAgg
SELECT * FROM
(SELECT SalesManCode, AchievedValue, SaleMonth = Month(SaleDate) FROM MonthlySales) as temp
PIVOT
(
    Sum(AchievedValue)
    FOR [SaleMonth] IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])
) as p

The queries above create two intermediate tables which should contain the same number of records as your SalesMan table. Joining them is straightforward:

SELECT *
FROM #MonthlyTargetAgg target
INNER JOIN #MonthlySalesAgg sales ON target.SalesManCode = sales.SalesManCode

If you find yourself needing to pull out data by month all the time, move the code into a view instead.

PIVOT requires SQL Server 2005 or higher, and its often a very useful operator. Hopefully SQL Server 2008 will allow users to pivot on more than one column at a time, which will result in an even simpler query than shown above.

Using SQL Server 2000:

PIVOT is syntax sugar. For example,

SELECT * FROM
(SELECT SalesManCode, AchievedValue, SaleMonth = Month(SaleDate) FROM MonthlySales) as temp
PIVOT
(
    Sum(AchievedValue)
    FOR [SaleMonth] IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])
) as p

Becomes

SELECT
    SalesManCode,
    [1] = Sum(case SaleMonth when 1 then AchievedValue else 0 end),
    [2] = Sum(case SaleMonth when 2 then AchievedValue else 0 end),
    [3] = Sum(case SaleMonth when 3 then AchievedValue else 0 end),
    [4] = Sum(case SaleMonth when 4 then AchievedValue else 0 end),
    [5] = Sum(case SaleMonth when 5 then AchievedValue else 0 end),
    [6] = Sum(case SaleMonth when 6 then AchievedValue else 0 end),
    [7] = Sum(case SaleMonth when 7 then AchievedValue else 0 end),
    [8] = Sum(case SaleMonth when 8 then AchievedValue else 0 end),
    [9] = Sum(case SaleMonth when 9 then AchievedValue else 0 end),
    [10] = Sum(case SaleMonth when 10 then AchievedValue else 0 end),
    [11] = Sum(case SaleMonth when 11 then AchievedValue else 0 end),
    [12] = Sum(case SaleMonth when 12 then AchievedValue else 0 end)
FROM
    (SELECT SalesManCode, AchievedValue, SaleMonth = Month(SaleDate) FROM MonthlySales) as temp
GROUP BY SalesManCode
烂柯人 2024-08-01 10:38:31

检查表上的索引是否正确。 如果不查看数据库本身,几乎不可能说出来,但 99% 的慢连接是由于不正确或丢失的表索引造成的。

Check for proper indexing on the tables. It is almost impossible to say without looking at the database itself but 99% of the time slow joins are due to improper or missing table indices.

晨与橙与城 2024-08-01 10:38:31

我不想考虑索引。

你必须考虑索引。 无论您如何编写查询,数据库引擎别无选择,只能扫描每个表以查找连接,并且它可能会一遍又一遍地执行此操作。 你没有选择。

如果您因为无法控制数据库而不想搞乱索引,请考虑将数据导出到本地 SQL Express 实例。

哎呀,即使将数据导出到平面文件,按 SalesManCode 对文件进行排序并编写一个简单的程序来读取和匹配它们也会更快。

I don't want to consider indexing.

You have to consider indexing. It doesn't matter how you write the query, the db engine has no choice but to scan each table looking for the joins and it will likely do that over and over again. You have no choice.

If you don't want to mess with indexing because you don't have control over the database, consider exporting the data to a local SQL Express instance.

Heck, even exporting the data to flat files, sorting the files by SalesManCode and writing a simple program to read and match them would be faster.

旧城烟雨 2024-08-01 10:38:31

除了解雇一群销售人员之外,请考虑以下选项:

  • 创建一个批处理流程以每晚运行并使用查询的输出填充一些报告表
  • 创建所需查询的索引/物化视图(尽管您需要在索引视图上创建索引,所以这可能会破坏您的无索引规则)

Short of firing a bunch of sales people, consider the following options:

  • create a batch process to run this nightly and populate some reporting tables with the output of your query
  • create an indexed/materialized view of the query you want (although you will need to create an index on the indexed view, so maybe this breaks your no-indexes rule)
难得心□动 2024-08-01 10:38:31

你有过滤器吗? 您能否将一些部分结果存储在临时表中,然后在减小数据大小后加入到其余数据中?

Do you have any filters? Could you store some partial results in a temp-table, then join to the rest of your data after you've reduced the size of your data?

世界等同你 2024-08-01 10:38:31

如果无法选择索引,则唯一加快速度的其他方法是将其放在更快的服务器上。 有些事情告诉我索引会更容易。

If indexing isn't an option, the only other way to speed it up is putting it on a faster server. Something tells me indexing would be easier though.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文