用于选择首选糖果的高效 SQL 2000 查询

发布于 2024-07-25 11:23:45 字数 722 浏览 3 评论 0原文

(我希望我能想出一个更具描述性的标题...如果您能说出我要询问的查询类型,请建议一个或编辑这篇文章)

数据库:SQL Server 2000

示例数据(假设 500,000 行):

Name   Candy       PreferenceFactor
Jim    Chocolate   1.0
Brad   Lemon Drop   .9
Brad   Chocolate    .1
Chris  Chocolate    .5
Chris  Candy Cane   .5
499,995 more rows...

请注意,具有给定“名称”的行数是无限的。

期望的查询结果:

Jim    Chocolate   1.0
Brad   Lemon Drop   .9
Chris  Chocolate    .5
~250,000 more rows...

(由于 Chris 对拐杖糖和巧克力的偏好相同,因此一致的结果就足够了)。

问题: 如何从数据中选择名称、糖果,其中每个结果行都包含唯一的名称,以便所选糖果的每个名称具有最高的 PreferenceFactor。 (快速有效的答案优先)。

表上需要哪些索引? 如果 Name 和 Candy 是另一个表的整数索引(除了需要一些联接之外),这会有什么不同吗?

(I wish I could have come up with a more descriptive title... suggest one or edit this post if you can name the type of query I'm asking about)

Database: SQL Server 2000

Sample Data (assume 500,000 rows):

Name   Candy       PreferenceFactor
Jim    Chocolate   1.0
Brad   Lemon Drop   .9
Brad   Chocolate    .1
Chris  Chocolate    .5
Chris  Candy Cane   .5
499,995 more rows...

Note that the number of rows with a given 'Name' is unbounded.

Desired Query Results:

Jim    Chocolate   1.0
Brad   Lemon Drop   .9
Chris  Chocolate    .5
~250,000 more rows...

(Since Chris has equal preference for Candy Cane and Chocolate, a consistent result is adequate).

Question:
How do I Select Name, Candy from data where each resulting row contains a unique Name such that the Candy selected has the highest PreferenceFactor for each Name. (speedy efficient answers preferred).

What indexes are required on the table? Does it make a difference if Name and Candy are integer indexes into another table (aside from requiring some joins)?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(11

牵你的手,一向走下去 2024-08-01 11:23:45

您会发现以下查询优于给出的所有其他答案,因为它适用于单次扫描。 这模拟了 MS Access 的 First 和 Last 聚合函数,这基本上就是您正在做的事情。

当然,您的 CandyPreference 表中可能会使用外键而不是名称。 为了回答您的问题,实际上最好将 Candy 和 Name 作为另一个表的外键。

如果 CandyPreferences 表中还有其他列,那么拥有包含相关列的覆盖索引将产生更好的性能。 使列尽可能小将增加每页的行数并再次提高性能。 如果您最常使用 WHERE 条件执行查询来限制行,那么覆盖 WHERE 条件的索引就变得很重要。

Peter 的做法是正确的,但存在一些不必要的复杂性。

CREATE TABLE #CandyPreference (
   [Name] varchar(20),
   Candy varchar(30),
   PreferenceFactor decimal(11, 10)
)
INSERT #CandyPreference VALUES ('Jim', 'Chocolate', 1.0)
INSERT #CandyPreference VALUES ('Brad', 'Lemon Drop', .9)
INSERT #CandyPreference VALUES ('Brad', 'Chocolate', .1)
INSERT #CandyPreference VALUES ('Chris', 'Chocolate', .5)
INSERT #CandyPreference VALUES ('Chris', 'Candy Cane', .5)

SELECT
   [Name],
   Candy = Substring(PackedData, 13, 30),
   PreferenceFactor = Convert(decimal(11,10), Left(PackedData, 12))
FROM (
   SELECT
      [Name],
      PackedData = Max(Convert(char(12), PreferenceFactor) + Candy)
   FROM CandyPreference
   GROUP BY [Name]
) X

DROP TABLE #CandyPreference

实际上我不推荐这种方法,除非性能至关重要。 “规范”的方法是 OrbMan 的标准 Max/GROUP BY 派生表,然后连接到它以获取选定的行。 但是,当有多个列参与 Max 的选择,并且选择器的最终组合可以重复时,即当没有列提供任意唯一性时(如这里的情况),该方法开始变得困难如果 PreferenceFactor 相同,我们就使用该名称。

编辑:最好提供更多使用说明,以帮助提高清晰度并帮助人们避免问题。

  • 作为一般经验法则,当尝试提高查询性能时,如果可以节省 I/O,您可以进行大量额外的数学计算。 保存整个表查找或扫描可以大大加快查询速度,即使使用所有转换和子字符串等也是如此。
  • 由于精度和排序问题,使用此方法使用浮点数据类型可能不是一个好主意。 不过,除非您处理的是非常大或非常小的数字,否则无论如何您都不应该在数据库中使用浮点型。
  • 最好的数据类型是那些在转换为二进制或字符后不打包并按相同顺序排序的数据类型。 Datetime、smalldatetime、bigint、int、smallint 和tinyint 都直接转换为二进制并正确排序,因为它们没有打包。 对于二进制,避免 left() 和 right(),使用 substring() 使值可靠地返回到原始值。
  • 我利用了在此查询中小数点前面只有一位数字的 Preference,允许直接转换为 char,因为小数点之前总是至少有一个 0。 如果可能有更多数字,则必须对转换后的数字进行小数对齐,以便正确排序。 最简单的方法可能是乘以您的偏好评级,这样就没有小数部分,转换为 bigint,然后转换为二进制(8)。 一般来说,数字之间的转换比 char 和其他数据类型之间的转换要快,尤其是对于日期数学。
  • 注意空值。 如果有的话,你必须将它们转换成某种东西然后再转换回来。

You will find that the following query outperforms every other answer given, as it works with a single scan. This simulates MS Access's First and Last aggregate functions, which is basically what you are doing.

Of course, you'll probably have foreign keys instead of names in your CandyPreference table. To answer your question, it is in fact very much best if Candy and Name are foreign keys into another table.

If there are other columns in the CandyPreferences table, then having a covering index that includes the involved columns will yield even better performance. Making the columns as small as possible will increase the rows per page and again increase performance. If you are most often doing the query with a WHERE condition to restrict rows, then an index that covers the WHERE conditions becomes important.

Peter was on the right track for this, but had some unneeded complexity.

CREATE TABLE #CandyPreference (
   [Name] varchar(20),
   Candy varchar(30),
   PreferenceFactor decimal(11, 10)
)
INSERT #CandyPreference VALUES ('Jim', 'Chocolate', 1.0)
INSERT #CandyPreference VALUES ('Brad', 'Lemon Drop', .9)
INSERT #CandyPreference VALUES ('Brad', 'Chocolate', .1)
INSERT #CandyPreference VALUES ('Chris', 'Chocolate', .5)
INSERT #CandyPreference VALUES ('Chris', 'Candy Cane', .5)

SELECT
   [Name],
   Candy = Substring(PackedData, 13, 30),
   PreferenceFactor = Convert(decimal(11,10), Left(PackedData, 12))
FROM (
   SELECT
      [Name],
      PackedData = Max(Convert(char(12), PreferenceFactor) + Candy)
   FROM CandyPreference
   GROUP BY [Name]
) X

DROP TABLE #CandyPreference

I actually don't recommend this method unless performance is critical. The "canonical" way to do it is OrbMan's standard Max/GROUP BY derived table and then a join to it to get the selected row. Though, that method starts to become difficult when there are several columns that participate in the selection of the Max, and the final combination of selectors can be duplicated, that is, when there is no column to provide arbitrary uniqueness as in the case here where we use the name if the PreferenceFactor is the same.

Edit: It's probably best to give some more usage notes to help improve clarity and to help people avoid problems.

  • As a general rule of thumb, when trying to improve query performance, you can do a LOT of extra math if it will save you I/O. Saving an entire table seek or scan speeds up the query substantially, even with all the converts and substrings and so on.
  • Due to precision and sorting issues, use of a floating point data type is probably a bad idea with this method. Though unless you are dealing with extremely large or small numbers, you shouldn't be using float in your database anyway.
  • The best data types are those that are not packed and sort in the same order after conversion to binary or char. Datetime, smalldatetime, bigint, int, smallint, and tinyint all convert directly to binary and sort correctly because they are not packed. With binary, avoid left() and right(), use substring() to get the values reliably returned to their originals.
  • I took advantage of Preference having only one digit in front of the decimal point in this query, allowing conversion straight to char since there is always at least a 0 before the decimal point. If more digits are possible, you would have to decimal-align the converted number so things sort correctly. Easiest might be to multiply your Preference rating so there is no decimal portion, convert to bigint, and then convert to binary(8). In general, conversion between numbers is faster than conversion between char and another data type, especially with date math.
  • Watch out for nulls. If there are any, you must convert them to something and then back.
半夏半凉 2024-08-01 11:23:45
select c.Name, max(c.Candy) as Candy, max(c.PreferenceFactor) as PreferenceFactor
from Candy c
inner join (
    select Name, max(PreferenceFactor) as MaxPreferenceFactor
    from Candy
    group by Name
) cm on c.Name = cm.Name and c.PreferenceFactor = cm.MaxPreferenceFactor
group by c.Name
order by PreferenceFactor desc, Name
select c.Name, max(c.Candy) as Candy, max(c.PreferenceFactor) as PreferenceFactor
from Candy c
inner join (
    select Name, max(PreferenceFactor) as MaxPreferenceFactor
    from Candy
    group by Name
) cm on c.Name = cm.Name and c.PreferenceFactor = cm.MaxPreferenceFactor
group by c.Name
order by PreferenceFactor desc, Name
染年凉城似染瑾 2024-08-01 11:23:45

我尝试过:

SELECT X.PersonName,
    (
        SELECT TOP 1 Candy
        FROM CandyPreferences
        WHERE PersonName=X.PersonName AND PreferenceFactor=x.HighestPreference
    ) AS TopCandy
FROM 
(
    SELECT PersonName, MAX(PreferenceFactor) AS HighestPreference
    FROM CandyPreferences
    GROUP BY PersonName
) AS X

这似乎有效,尽管没有真实数据和实际负载,我无法谈论效率。

不过,我确实在 PersonName 和 Candy 上创建了主键。 使用 SQL Server 2008 并且没有附加索引表明它使用了两个聚集索引扫描,因此情况可能会更糟。


我更多地使用了这个,因为我需要一个借口来使用“datadude”的数据生成计划功能。 首先,我重构了一个表,使其具有单独的糖果名称和人名表。 我这样做主要是因为它允许我使用测试数据生成而无需阅读文档。 架构变为:

CREATE TABLE [Candies](
    [CandyID] [int] IDENTITY(1,1) NOT NULL,
    [Candy] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_Candies] PRIMARY KEY CLUSTERED 
(
    [CandyID] ASC
),
 CONSTRAINT [UC_Candies] UNIQUE NONCLUSTERED 
(
    [Candy] ASC
)
)
GO

CREATE TABLE [Persons](
    [PersonID] [int] IDENTITY(1,1) NOT NULL,
    [PersonName] [nvarchar](100) NOT NULL,
 CONSTRAINT [PK_Preferences.Persons] PRIMARY KEY CLUSTERED 
(
    [PersonID] ASC
)
)
GO

CREATE TABLE [CandyPreferences](
    [PersonID] [int] NOT NULL,
    [CandyID] [int] NOT NULL,
    [PrefernceFactor] [real] NOT NULL,
 CONSTRAINT [PK_CandyPreferences] PRIMARY KEY CLUSTERED 
(
    [PersonID] ASC,
    [CandyID] ASC
)
)
GO

ALTER TABLE [CandyPreferences]  
WITH CHECK ADD  CONSTRAINT [FK_CandyPreferences_Candies] FOREIGN KEY([CandyID])
REFERENCES [Candies] ([CandyID])
GO

ALTER TABLE [CandyPreferences] 
CHECK CONSTRAINT [FK_CandyPreferences_Candies]
GO

ALTER TABLE [CandyPreferences]  
WITH CHECK ADD  CONSTRAINT [FK_CandyPreferences_Persons] FOREIGN KEY([PersonID])
REFERENCES [Persons] ([PersonID])
GO

ALTER TABLE [CandyPreferences] 
CHECK CONSTRAINT [FK_CandyPreferences_Persons]
GO

查询变为:

SELECT P.PersonName, C.Candy
FROM (
    SELECT X.PersonID,
        (
            SELECT TOP 1 CandyID
            FROM CandyPreferences
            WHERE PersonID=X.PersonID AND PrefernceFactor=x.HighestPreference
        ) AS TopCandy
    FROM 
    (
        SELECT PersonID, MAX(PrefernceFactor) AS HighestPreference
        FROM CandyPreferences
        GROUP BY PersonID
    ) AS X
) AS Y
INNER JOIN Persons P ON Y.PersonID = P.PersonID
INNER JOIN Candies C ON Y.TopCandy = C.CandyID

对于 150,000 颗糖果、200,000 个人和 500,000 个 CandyPreferences,查询花费了大约 12 秒并生成了 200,000 行。


下面的结果让我很惊讶。 我更改了查询以删除最终的“漂亮”联接:

SELECT X.PersonID,
    (
        SELECT TOP 1 CandyID
        FROM CandyPreferences
        WHERE PersonID=X.PersonID AND PrefernceFactor=x.HighestPreference
    ) AS TopCandy
FROM 
(
    SELECT PersonID, MAX(PrefernceFactor) AS HighestPreference
    FROM CandyPreferences
    GROUP BY PersonID
) AS X

现在,对于 200,000 行,这需要两到三秒的时间。

现在,需要明确的是,我在这里所做的一切都不是为了提高此查询的性能:我认为 12 秒是成功的。 它现在表示 90% 的时间都花在聚集索引搜索上。

I tried:

SELECT X.PersonName,
    (
        SELECT TOP 1 Candy
        FROM CandyPreferences
        WHERE PersonName=X.PersonName AND PreferenceFactor=x.HighestPreference
    ) AS TopCandy
FROM 
(
    SELECT PersonName, MAX(PreferenceFactor) AS HighestPreference
    FROM CandyPreferences
    GROUP BY PersonName
) AS X

This seems to work, though I can't speak to efficiency without real data and a realistic load.

I did create a primary key over PersonName and Candy, though. Using SQL Server 2008 and no additional indexes shows it using two clustered index scans though, so it could be worse.


I played with this a bit more because I needed an excuse to play with the Data Generation Plan capability of "datadude". First, I refactored the one table to have separate tables for candy names and person names. I did this mostly because it allowed me to use the test data generation without having to read the documentation. The schema became:

CREATE TABLE [Candies](
    [CandyID] [int] IDENTITY(1,1) NOT NULL,
    [Candy] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_Candies] PRIMARY KEY CLUSTERED 
(
    [CandyID] ASC
),
 CONSTRAINT [UC_Candies] UNIQUE NONCLUSTERED 
(
    [Candy] ASC
)
)
GO

CREATE TABLE [Persons](
    [PersonID] [int] IDENTITY(1,1) NOT NULL,
    [PersonName] [nvarchar](100) NOT NULL,
 CONSTRAINT [PK_Preferences.Persons] PRIMARY KEY CLUSTERED 
(
    [PersonID] ASC
)
)
GO

CREATE TABLE [CandyPreferences](
    [PersonID] [int] NOT NULL,
    [CandyID] [int] NOT NULL,
    [PrefernceFactor] [real] NOT NULL,
 CONSTRAINT [PK_CandyPreferences] PRIMARY KEY CLUSTERED 
(
    [PersonID] ASC,
    [CandyID] ASC
)
)
GO

ALTER TABLE [CandyPreferences]  
WITH CHECK ADD  CONSTRAINT [FK_CandyPreferences_Candies] FOREIGN KEY([CandyID])
REFERENCES [Candies] ([CandyID])
GO

ALTER TABLE [CandyPreferences] 
CHECK CONSTRAINT [FK_CandyPreferences_Candies]
GO

ALTER TABLE [CandyPreferences]  
WITH CHECK ADD  CONSTRAINT [FK_CandyPreferences_Persons] FOREIGN KEY([PersonID])
REFERENCES [Persons] ([PersonID])
GO

ALTER TABLE [CandyPreferences] 
CHECK CONSTRAINT [FK_CandyPreferences_Persons]
GO

The query became:

SELECT P.PersonName, C.Candy
FROM (
    SELECT X.PersonID,
        (
            SELECT TOP 1 CandyID
            FROM CandyPreferences
            WHERE PersonID=X.PersonID AND PrefernceFactor=x.HighestPreference
        ) AS TopCandy
    FROM 
    (
        SELECT PersonID, MAX(PrefernceFactor) AS HighestPreference
        FROM CandyPreferences
        GROUP BY PersonID
    ) AS X
) AS Y
INNER JOIN Persons P ON Y.PersonID = P.PersonID
INNER JOIN Candies C ON Y.TopCandy = C.CandyID

With 150,000 candies, 200,000 persons, and 500,000 CandyPreferences, the query took about 12 seconds and produced 200,000 rows.


The following result surprised me. I changed the query to remove the final "pretty" joins:

SELECT X.PersonID,
    (
        SELECT TOP 1 CandyID
        FROM CandyPreferences
        WHERE PersonID=X.PersonID AND PrefernceFactor=x.HighestPreference
    ) AS TopCandy
FROM 
(
    SELECT PersonID, MAX(PrefernceFactor) AS HighestPreference
    FROM CandyPreferences
    GROUP BY PersonID
) AS X

This now takes two or three seconds for 200,000 rows.

Now, to be clear, nothing I've done here has been meant to improve the performance of this query: I considered 12 seconds to be a success. It now says it spends 90% of its time in a clustered index seek.

话少情深 2024-08-01 11:23:45

对 Emtucifor 解决方案发表评论(因为我无法定期发表评论)

我喜欢这个解决方案,但有一些关于如何改进它的评论(在这种特定情况下)。

如果将所有内容都放在一张表中,则效果不大,但像 John Saunders 的解决方案中那样,只有很少的表,情况就会有所不同。

当我们处理 [CandyPreferences] 表中的数字时,我们可以使用数学运算而不是串联来获取最大值。

我建议 PreferenceFactor 使用十进制而不是实数,因为我相信我们在这里不需要实际数据类型的大小,甚至进一步我建议十进制(n,n),其中n<10仅将小数部分存储在5个字节中。 假设十进制(3,3)就足够了(1000级偏好因子),我们可以做简单的

PackedData = Max(PreferenceFactor + CandyID)

此外,如果我们知道我们有少于 1,000,000 个 CandyID,我们可以添加转换为:

PackedData = Max( Cast(PreferenceFactor + CandyID asdecimal(9,3)))

允许 sql server 在临时表中使用 5 个字节

使用底函数拆包非常简单快捷。

Niikola

-- 后来添加 ---

我测试了 John 和 Emtucifor 的两种解决方案(修改为使用 John 的结构并使用我的建议)。 我还测试了有连接和没有连接的情况。

Emtucifor 的解决方案显然获胜,但利润并不大。 如果 SQL Server 必须执行一些物理读取,情况可能会有所不同,但在所有情况下它们都是 0。

以下是查询:

    SELECT
   [PersonID],
   CandyID = Floor(PackedData),
   PreferenceFactor = Cast(PackedData-Floor(PackedData) as decimal(3,3))
FROM (
   SELECT
      [PersonID],
      PackedData = Max(Cast([PrefernceFactor] + [CandyID] as decimal(9,3)))
   FROM [z5CandyPreferences] With (NoLock)
   GROUP BY [PersonID]
) X

SELECT X.PersonID,
        (
                SELECT TOP 1 CandyID
                FROM z5CandyPreferences
                WHERE PersonID=X.PersonID AND PrefernceFactor=x.HighestPreference
        ) AS TopCandy,
                    HighestPreference as PreferenceFactor
FROM 
(
        SELECT PersonID, MAX(PrefernceFactor) AS HighestPreference
        FROM z5CandyPreferences
        GROUP BY PersonID
) AS X


Select p.PersonName,
       c.Candy,
       y.PreferenceFactor
  From z5Persons p
 Inner Join (SELECT [PersonID],
                    CandyID = Floor(PackedData),
                    PreferenceFactor = Cast(PackedData-Floor(PackedData) as decimal(3,3))
                    FROM ( SELECT [PersonID],
                                  PackedData = Max(Cast([PrefernceFactor] + [CandyID] as decimal(9,3)))
                             FROM [z5CandyPreferences] With (NoLock)
                            GROUP BY [PersonID]
                         ) X
            ) Y on p.PersonId = Y.PersonId
 Inner Join z5Candies c on c.CandyId=Y.CandyId

Select p.PersonName,
       c.Candy,
       y.PreferenceFactor
  From z5Persons p
 Inner Join (SELECT X.PersonID,
                    ( SELECT TOP 1 cp.CandyId
                        FROM z5CandyPreferences cp
                       WHERE PersonID=X.PersonID AND cp.[PrefernceFactor]=X.HighestPreference
                    ) CandyId,
                    HighestPreference as PreferenceFactor
               FROM ( SELECT PersonID, 
                             MAX(PrefernceFactor) AS HighestPreference
                        FROM z5CandyPreferences
                       GROUP BY PersonID
                    ) AS X
            ) AS Y on p.PersonId = Y.PersonId
 Inner Join z5Candies as c on c.CandyID=Y.CandyId

以及结果:

 TableName          nRows
 ------------------ -------
 z5Persons          200,000
 z5Candies          150,000
 z5CandyPreferences 497,445


Query                       Rows Affected CPU time Elapsed time
--------------------------- ------------- -------- ------------
Emtucifor     (no joins)          183,289   531 ms     3,122 ms
John Saunders (no joins)          183,289 1,266 ms     2,918 ms
Emtucifor     (with joins)        183,289 1,031 ms     3,990 ms
John Saunders (with joins)        183,289 2,406 ms     4,343 ms


Emtucifor (no joins)
--------------------------------------------
Table               Scan count logical reads
------------------- ---------- -------------
z5CandyPreferences           1         2,022 


John Saunders (no joins)
--------------------------------------------
Table               Scan count logical reads
------------------- ---------- -------------
z5CandyPreferences     183,290       587,677

Emtucifor (with joins)
--------------------------------------------
Table               Scan count logical reads
------------------- ---------- -------------
Worktable                    0             0
z5Candies                    1           526
z5CandyPreferences           1         2,022
z5Persons                    1           733

John Saunders (with joins) 
--------------------------------------------
Table               Scan count logical reads
------------------- ---------- -------------
z5CandyPreferences      183292       587,912
z5Persons                    3           802
Worktable                    0             0
z5Candies                    3           559
Worktable                    0             0

Comment on Emtucifor solution (as I cant make regular comments)

I like this solution, but have some comments how it could be improved (in this specific case).

It can't be done much if you have everything in one table, but having few tables as in John Saunders' solution will make things a bit different.

As we are dealing with numbers in [CandyPreferences] table we can use math operation instead of concatenation to get max value.

I suggest PreferenceFactor to be decimal instead of real, as I believe we don't need here size of real data type, and even further I would suggest decimal(n,n) where n<10 to have only decimal part stored in 5 bytes. Assume decimal(3,3) is enough (1000 levels of preference factor), we can do simple

PackedData = Max(PreferenceFactor + CandyID)

Further, if we know we have less than 1,000,000 CandyIDs we can add cast as:

PackedData = Max(Cast(PreferenceFactor + CandyID as decimal(9,3)))

allowing sql server to use 5 bytes in temporary table

Unpacking is easy and fast using floor function.

Niikola

-- ADDED LATER ---

I tested both solutions, John's and Emtucifor's (modified to use John's structure and using my suggestions). I tested also with and without joins.

Emtucifor's solution clearly wins, but margins are not huge. It could be different if SQL server had to perform some Physical reads, but they were 0 in all cases.

Here are the queries:

    SELECT
   [PersonID],
   CandyID = Floor(PackedData),
   PreferenceFactor = Cast(PackedData-Floor(PackedData) as decimal(3,3))
FROM (
   SELECT
      [PersonID],
      PackedData = Max(Cast([PrefernceFactor] + [CandyID] as decimal(9,3)))
   FROM [z5CandyPreferences] With (NoLock)
   GROUP BY [PersonID]
) X

SELECT X.PersonID,
        (
                SELECT TOP 1 CandyID
                FROM z5CandyPreferences
                WHERE PersonID=X.PersonID AND PrefernceFactor=x.HighestPreference
        ) AS TopCandy,
                    HighestPreference as PreferenceFactor
FROM 
(
        SELECT PersonID, MAX(PrefernceFactor) AS HighestPreference
        FROM z5CandyPreferences
        GROUP BY PersonID
) AS X


Select p.PersonName,
       c.Candy,
       y.PreferenceFactor
  From z5Persons p
 Inner Join (SELECT [PersonID],
                    CandyID = Floor(PackedData),
                    PreferenceFactor = Cast(PackedData-Floor(PackedData) as decimal(3,3))
                    FROM ( SELECT [PersonID],
                                  PackedData = Max(Cast([PrefernceFactor] + [CandyID] as decimal(9,3)))
                             FROM [z5CandyPreferences] With (NoLock)
                            GROUP BY [PersonID]
                         ) X
            ) Y on p.PersonId = Y.PersonId
 Inner Join z5Candies c on c.CandyId=Y.CandyId

Select p.PersonName,
       c.Candy,
       y.PreferenceFactor
  From z5Persons p
 Inner Join (SELECT X.PersonID,
                    ( SELECT TOP 1 cp.CandyId
                        FROM z5CandyPreferences cp
                       WHERE PersonID=X.PersonID AND cp.[PrefernceFactor]=X.HighestPreference
                    ) CandyId,
                    HighestPreference as PreferenceFactor
               FROM ( SELECT PersonID, 
                             MAX(PrefernceFactor) AS HighestPreference
                        FROM z5CandyPreferences
                       GROUP BY PersonID
                    ) AS X
            ) AS Y on p.PersonId = Y.PersonId
 Inner Join z5Candies as c on c.CandyID=Y.CandyId

And the results:

 TableName          nRows
 ------------------ -------
 z5Persons          200,000
 z5Candies          150,000
 z5CandyPreferences 497,445


Query                       Rows Affected CPU time Elapsed time
--------------------------- ------------- -------- ------------
Emtucifor     (no joins)          183,289   531 ms     3,122 ms
John Saunders (no joins)          183,289 1,266 ms     2,918 ms
Emtucifor     (with joins)        183,289 1,031 ms     3,990 ms
John Saunders (with joins)        183,289 2,406 ms     4,343 ms


Emtucifor (no joins)
--------------------------------------------
Table               Scan count logical reads
------------------- ---------- -------------
z5CandyPreferences           1         2,022 


John Saunders (no joins)
--------------------------------------------
Table               Scan count logical reads
------------------- ---------- -------------
z5CandyPreferences     183,290       587,677

Emtucifor (with joins)
--------------------------------------------
Table               Scan count logical reads
------------------- ---------- -------------
Worktable                    0             0
z5Candies                    1           526
z5CandyPreferences           1         2,022
z5Persons                    1           733

John Saunders (with joins) 
--------------------------------------------
Table               Scan count logical reads
------------------- ---------- -------------
z5CandyPreferences      183292       587,912
z5Persons                    3           802
Worktable                    0             0
z5Candies                    3           559
Worktable                    0             0
东京女 2024-08-01 11:23:45

您可以使用以下 select 语句

select Name,Candy,PreferenceFactor
from candyTable ct 
where PreferenceFactor = 
    (select max(PreferenceFactor) 
     from candyTable where ct.Name = Name)

,但通过此选择,您将在结果集中获得“Chris”2 次。

如果您想获得用户最喜欢的食物而不是使用

select top 1 Name,Candy,PreferenceFactor
from candyTable ct
where name = @name
and PreferenceFactor= 
    (select max([PreferenceFactor]) 
     from candyTable where name = @name )

我认为将名称和糖果更改为整数类型可能会帮助您提高性能。 您还应该在两列上插入索引。

[编辑]已更改! 到 @

you could use following select statements

select Name,Candy,PreferenceFactor
from candyTable ct 
where PreferenceFactor = 
    (select max(PreferenceFactor) 
     from candyTable where ct.Name = Name)

but with this select you will get "Chris" 2 times in your result set.

if you want to get the the most preferred food by user than use

select top 1 Name,Candy,PreferenceFactor
from candyTable ct
where name = @name
and PreferenceFactor= 
    (select max([PreferenceFactor]) 
     from candyTable where name = @name )

i think changing the name and candy to integer types might help you improve performance. you also should insert indexes on both columns.

[Edit] changed ! to @

嗼ふ静 2024-08-01 11:23:45
SELECT Name, Candy, PreferenceFactor
  FROM table AS a
 WHERE NOT EXISTS(SELECT * FROM table AS b
                   WHERE b.Name = a.Name
                     AND (b.PreferenceFactor > a.PreferenceFactor OR (b.PreferenceFactor = a.PreferenceFactor AND b.Candy > a.Candy))
SELECT Name, Candy, PreferenceFactor
  FROM table AS a
 WHERE NOT EXISTS(SELECT * FROM table AS b
                   WHERE b.Name = a.Name
                     AND (b.PreferenceFactor > a.PreferenceFactor OR (b.PreferenceFactor = a.PreferenceFactor AND b.Candy > a.Candy))
小傻瓜 2024-08-01 11:23:45
select name, candy, max(preference)
from tablename
where candy=@candy
order by name, candy

通常需要对经常包含在 where 子句中的列建立索引。 在这种情况下,我想说名称和糖果列的索引将具有最高优先级。

列的查找表通常取决于列中重复值的数量。 在 250,000 行中,如果只有 50 个重复值,则确实需要在那里有整数引用(外键)。 在这种情况下,应该进行糖果引用,而名称引用实际上取决于数据库中不同人员的数量。

select name, candy, max(preference)
from tablename
where candy=@candy
order by name, candy

usually indexing is required on columns which are frequently included in where clause. In this case I would say indexing on name and candy columns would be of highest priority.

Having lookup tables for columns usually depends on number of repeating values with in columns. Out of 250,000 rows, if there are only 50 values that are repeating, you really need to have integer reference (foreign key) there. In this case, candy reference should be done and name reference really depends on the number of distinct people within the database.

蒗幽 2024-08-01 11:23:45

我将您的列名称更改为 PersonName 以避免任何常见的保留字冲突。

SELECT     PersonName, MAX(Candy) AS PreferredCandy, MAX(PreferenceFactor) AS Factor
FROM         CandyPreference
GROUP BY PersonName
ORDER BY Factor DESC

I changed your column Name to PersonName to avoid any common reserved word conflicts.

SELECT     PersonName, MAX(Candy) AS PreferredCandy, MAX(PreferenceFactor) AS Factor
FROM         CandyPreference
GROUP BY PersonName
ORDER BY Factor DESC
提赋 2024-08-01 11:23:45
SELECT d.Name, a.Candy, d.MaxPref
FROM myTable a, (SELECT Name, MAX(PreferenceFactor) AS MaxPref FROM myTable) as D
WHERE a.Name = d.Name AND a.PreferenceFactor = d.MaxPref

这将为您提供与给定名称匹配的 PrefFactor 的行。
(例如,如果约翰对于柠檬和巧克力的 HighPref 为 1)。

请原谅我的回答,因为我是在没有 SQL 查询分析器的情况下编写的。

SELECT d.Name, a.Candy, d.MaxPref
FROM myTable a, (SELECT Name, MAX(PreferenceFactor) AS MaxPref FROM myTable) as D
WHERE a.Name = d.Name AND a.PreferenceFactor = d.MaxPref

This should give you rows with matching PrefFactor for a given Name.
(e.g. if John as a HighPref of 1 for Lemon & Chocolate).

Pardon my answer as I am writing it without SQL Query Analyzer.

酒儿 2024-08-01 11:23:45

像这样的事情会起作用:

select name
, candy  = substring(preference,7,len(preference))
  -- convert back to float/numeric
, factor = convert(float,substring(preference,1,5))/10
from (
  select name, 
    preference = (
      select top 1 
           -- convert from float/numeric to zero-padded fixed-width string
           right('00000'+convert(varchar,convert(decimal(5,0),preferencefactor*10)),5)
         + ';' + candy
       from candyTable b
       where a.name = b.name
       order by 
         preferencefactor desc
       , candy
       )
  from (select distinct name from candyTable) a
  ) a

性能应该与方法一样好。 检查您的查询计划。

相关子查询中的 TOP 1 ... ORDER BY 允许我们为外部查询中每行返回的行指定任意规则。 在这种情况下,我们希望每个名字的偏好系数最高,并在决胜局中提供糖果。

子查询只能返回一个值,所以我们必须将糖果和偏好因子合并到一个字段中。 这里分号只是为了提高可读性,但在其他情况下,您可以使用它来解析外部查询中与 CHARINDEX 的组合字段。

如果您想要输出完全精确,您可以使用它(假设preferencefactor是一个浮点数):

convert(varchar,preferencefactor) + ';' + candy

然后用以下命令解析它:

factor = convert(float,substring(preference,1,charindex(';',preference)-1))
candy = substring(preference,charindex(';',preference)+1,len(preference))

Something like this would work:

select name
, candy  = substring(preference,7,len(preference))
  -- convert back to float/numeric
, factor = convert(float,substring(preference,1,5))/10
from (
  select name, 
    preference = (
      select top 1 
           -- convert from float/numeric to zero-padded fixed-width string
           right('00000'+convert(varchar,convert(decimal(5,0),preferencefactor*10)),5)
         + ';' + candy
       from candyTable b
       where a.name = b.name
       order by 
         preferencefactor desc
       , candy
       )
  from (select distinct name from candyTable) a
  ) a

Performance should be decent with with method. Check your query plan.

TOP 1 ... ORDER BY in a correlated subquery allows us to specify arbitrary rules for which row we want returned per row in the outer query. In this case, we want the highest preference factor per name, with candy for tie-breaks.

Subqueries can only return one value, so we must combine candy and preference factor into one field. The semicolon is just for readability here, but in other cases, you might use it to parse the combined field with CHARINDEX in the outer query.

If you wanted full precision in the output, you could use this instead (assuming preferencefactor is a float):

convert(varchar,preferencefactor) + ';' + candy

And then parse it back with:

factor = convert(float,substring(preference,1,charindex(';',preference)-1))
candy = substring(preference,charindex(';',preference)+1,len(preference))
若水微香 2024-08-01 11:23:45

我还测试了 ROW_NUMBER() 版本 + 添加了附加索引

创建索引 IX_z5CandyPreferences On z5CandyPreferences(PersonId,PrefernceFactor,CandyID)

Emtucifor 和 ROW_NUMBER() 版本(带有索引)之间的响应时间是边际的(如果有的话 - 测试应该重复几次次并取平均值,但我预计不会有任何显着差异)

这是查询:

Select p.PersonName,
       c.Candy,
       y.PrefernceFactor
  From z5Persons p
 Inner Join (Select * from (Select cp.PersonId,
       cp.CandyId,
       cp.PrefernceFactor,
       ROW_NUMBER() over (Partition by cp.PersonId Order by cp.PrefernceFactor, cp.CandyId ) as hp
  From z5CandyPreferences cp) X
   Where hp=1) Y on p.PersonId = Y.PersonId
 Inner Join z5Candies c on c.CandyId=Y.CandyId

以及带有和不带有新索引的结果:

                           |     Without index    |      With Index
                           ----------------------------------------------
Query (Aff.Rows 183,290)   |CPU time Elapsed time | CPU time Elapsed time
-------------------------- |-------- ------------ | -------- ------------
Emtucifor     (with joins) |1,031 ms     3,990 ms |   890 ms     3,758 ms
John Saunders (with joins) |2,406 ms     4,343 ms | 1,735 ms     3,414 ms
ROW_NUMBER()  (with joins) |2,094 ms     4,888 ms |   953 ms     3,900 ms.


Emtucifor (with joins)         Without index |              With Index
-----------------------------------------------------------------------
Table              |Scan count logical reads | Scan count logical reads
-------------------|---------- ------------- | ---------- -------------
Worktable          |         0             0 |          0             0
z5Candies          |         1           526 |          1           526
z5CandyPreferences |         1         2,022 |          1           990
z5Persons          |         1           733 |          1           733

John Saunders (with joins)     Without index |              With Index
-----------------------------------------------------------------------
Table              |Scan count logical reads | Scan count logical reads
-------------------|---------- ------------- | ---------- -------------
z5CandyPreferences |    183292       587,912 |    183,290       585,570
z5Persons          |         3           802 |          1           733
Worktable          |         0             0 |          0             0
z5Candies          |         3           559 |          1           526
Worktable          |         0             0 |          -             -


ROW_NUMBER() (with joins)      Without index |              With Index 
-----------------------------------------------------------------------
Table              |Scan count logical reads | Scan count logical reads
-------------------|---------- ------------- | ---------- -------------
z5CandyPreferences |         3          2233 |          1           990
z5Persons          |         3           802 |          1           733
z5Candies          |         3           559 |          1           526
Worktable          |         0             0 |          0             0

I tested also ROW_NUMBER() version + added additional index

Create index IX_z5CandyPreferences On z5CandyPreferences(PersonId,PrefernceFactor,CandyID)

Response times between Emtucifor's and ROW_NUMBER() version (with index in place) are marginal (if any - test should be repeated number of times and take averages, but I expect there would not be any significant difference)

Here is query:

Select p.PersonName,
       c.Candy,
       y.PrefernceFactor
  From z5Persons p
 Inner Join (Select * from (Select cp.PersonId,
       cp.CandyId,
       cp.PrefernceFactor,
       ROW_NUMBER() over (Partition by cp.PersonId Order by cp.PrefernceFactor, cp.CandyId ) as hp
  From z5CandyPreferences cp) X
   Where hp=1) Y on p.PersonId = Y.PersonId
 Inner Join z5Candies c on c.CandyId=Y.CandyId

and results with and without new index:

                           |     Without index    |      With Index
                           ----------------------------------------------
Query (Aff.Rows 183,290)   |CPU time Elapsed time | CPU time Elapsed time
-------------------------- |-------- ------------ | -------- ------------
Emtucifor     (with joins) |1,031 ms     3,990 ms |   890 ms     3,758 ms
John Saunders (with joins) |2,406 ms     4,343 ms | 1,735 ms     3,414 ms
ROW_NUMBER()  (with joins) |2,094 ms     4,888 ms |   953 ms     3,900 ms.


Emtucifor (with joins)         Without index |              With Index
-----------------------------------------------------------------------
Table              |Scan count logical reads | Scan count logical reads
-------------------|---------- ------------- | ---------- -------------
Worktable          |         0             0 |          0             0
z5Candies          |         1           526 |          1           526
z5CandyPreferences |         1         2,022 |          1           990
z5Persons          |         1           733 |          1           733

John Saunders (with joins)     Without index |              With Index
-----------------------------------------------------------------------
Table              |Scan count logical reads | Scan count logical reads
-------------------|---------- ------------- | ---------- -------------
z5CandyPreferences |    183292       587,912 |    183,290       585,570
z5Persons          |         3           802 |          1           733
Worktable          |         0             0 |          0             0
z5Candies          |         3           559 |          1           526
Worktable          |         0             0 |          -             -


ROW_NUMBER() (with joins)      Without index |              With Index 
-----------------------------------------------------------------------
Table              |Scan count logical reads | Scan count logical reads
-------------------|---------- ------------- | ---------- -------------
z5CandyPreferences |         3          2233 |          1           990
z5Persons          |         3           802 |          1           733
z5Candies          |         3           559 |          1           526
Worktable          |         0             0 |          0             0
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文