在 Sybase Ase 中优化 SQL 透视表的最佳方法
我正在寻找优化以下从历史表创建透视结果集的 SQL 语句。这可能已经是最有效的方法,但我一直认为必须有一种更有效的方法来执行此操作。
我试图优化
select Col1, Col2,
Max(case when TypeId = 1 then ColValue end) as Pivot1,
Max(case when TypeId = 2 then ColValue end) as Pivot2,
Max(case when TypeId = 3 then ColValue end) as Pivot3,
Max(case when TypeId = 4 then ColValue end) as Pivot4,
Max(case when TypeId = 5 then ColValue end) as Pivot5,
Max(case when TypeId = 6 then ColValue end) as Pivot6,
Max(case when TypeId = 7 then ColValue end) as Pivot7,
Max(case when TypeId = 8 then ColValue end) as Pivot8,
Max(case when TypeId = 9 then ColValue end) as Pivot9,
Max(case when TypeId = 10 then ColValue end) as Pivot10,
Max(case when TypeId = 11 then ColValue end) as Pivot11
from RowTable
group by Col1, Col2
UPDATE 的 SQL 语句:下面是表定义
CREATE TABLE dbo.RowTable (
Id int NOT NULL,
Col1 char(8) NOT NULL,
Col2 tinyint NOT NULL,
TypeId int NOT NULL,
ColValue datetime NOT NULL,
CreatedBy varchar(50) NOT NULL,
Rowstamp timestamp NOT NULL
)
LOCK DATAROWS
GO
ALTER TABLE dbo.RowTable
ADD CONSTRAINT ukRowTable
UNIQUE (Col1, Col2, TypeId)
WITH max_rows_per_page = 0, reservepagegap = 0
I am looking to optimize the following SQL statement that is creating a Pivoted Result set from a Historical Table. This may already be the most per-formant way of doing it but I keep thinking that there has to be a more per-formant way of doing this.
SQL Statement that I am trying to optimize
select Col1, Col2,
Max(case when TypeId = 1 then ColValue end) as Pivot1,
Max(case when TypeId = 2 then ColValue end) as Pivot2,
Max(case when TypeId = 3 then ColValue end) as Pivot3,
Max(case when TypeId = 4 then ColValue end) as Pivot4,
Max(case when TypeId = 5 then ColValue end) as Pivot5,
Max(case when TypeId = 6 then ColValue end) as Pivot6,
Max(case when TypeId = 7 then ColValue end) as Pivot7,
Max(case when TypeId = 8 then ColValue end) as Pivot8,
Max(case when TypeId = 9 then ColValue end) as Pivot9,
Max(case when TypeId = 10 then ColValue end) as Pivot10,
Max(case when TypeId = 11 then ColValue end) as Pivot11
from RowTable
group by Col1, Col2
UPDATE: Below is the table definition
CREATE TABLE dbo.RowTable (
Id int NOT NULL,
Col1 char(8) NOT NULL,
Col2 tinyint NOT NULL,
TypeId int NOT NULL,
ColValue datetime NOT NULL,
CreatedBy varchar(50) NOT NULL,
Rowstamp timestamp NOT NULL
)
LOCK DATAROWS
GO
ALTER TABLE dbo.RowTable
ADD CONSTRAINT ukRowTable
UNIQUE (Col1, Col2, TypeId)
WITH max_rows_per_page = 0, reservepagegap = 0
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
对原始问题的答复
1。表的性能按原样。好吧,在任何人都可以从性能方面评估该代码之前,我们需要创建表语句,包括索引。
更高级别的性能。
透视是一种以行、列的形式表达可用数据的功能。如果数据库(表)标准化为 3NF 或 5NF(面向行),那么对行对象执行列函数将会很慢。与产品无关。如果您希望快速进行列式访问(对于透视或任何其他列式功能),则需要 6NF 中的数据。这也恰好使得任务所需的 SQL 更加简单。
如果您的数据建模者准备了用于透视的表(通常是数据仓库类型使用;维度事实结构),那么它可能不是真正的 6NF,但至少它会比 5NF 更好,并且更容易提取透视值。当我看到 DDL 时,我将能够确定它是什么(真正的 6NF;比 5NF 更好,但不是 6NF)。然后我可以确定您是否使用最好的代码来获得您需要的内容。
只有当表不在 6NF 中时,它才会很慢或“昂贵”。
在这个阶段,从您的代码来看,它甚至看起来不像一个数据透视(使用该术语的标准含义),它看起来像是各种值的
MAX()
(调用结果)列Pivotx
不会使其成为数据透视);并且您正在读取每一行一次。也就是说,你有一种程序性思维模式,而不是一种枢轴思维或面向集合的思维模式。因此,代码很可能无法获得您需要的值(无论其性能是否良好,是一个单独的问题)。您使用
GROUP BY
确认了非过程集的过程方法,这会很慢(创建工作表;如果您的数据很大,这将是巨大的),并且通过维度可以更快地获得相同的信息。为什么不使用这个可透视表的维度表?发布与此表相关的所有维度表的 DDL,或发布数据模型。回复评论
我想帮助你,但有两个障碍。首先,互动间隔 19 天。其次,您发布的 SQL 将不起作用:对于每一行,它在 11 列中返回相同的
ColValue
;我无法弄清楚您使用。好的,需要MAX()
的目的MAX()
来击败GROUP BY
提交。因此,我仍然不知道你的意图是什么(而不是你编码的内容)。这种混淆是合理的,但在这里我们失去了意义。是的,有更快的方法,但我需要了解意图和父表(例如,您是否有一个
(Col1, Col2)
是 Unique 的表?如果它是数据库,那么表并不独立,它们是相关的,并且这些关系有一定的目的,我知道您不认为它们是相关的,但这种限制产生了您发布的代码;解决方案是。 >超出该限制。无论如何,为了避免进一步延迟,请尝试此代码,这只是一个猜测,对我来说似乎不正确,因为
(Col1, Col2, TypeId) 是唯一的;因此每个
Col1, Col2
结果行只有一组TypeId
(结果集中的列标题):也许你可以给我反馈。
对更新问题的回复
好的,现在我们有一个非标准化表。新的一组步骤。它是一个构造的结果集,使用返回标量的相关子查询。这不是行与列的重新排列;而是行与列的重新排列。它不是标准枢轴(因此提供的代码不是枢轴)。死容易。您可能希望更改问题标题,因为人们正在寻找真正的枢轴。是的,这会表现得更好(假设您的 DDL 是真实表的真实表示)。
需要明确的是,Pivot(即 MS SQL
PIVOT
函数)是一种不同的动物。我可以为非标准化数据库提供一个丑陋且缓慢的枢轴;或来自 5NF 数据库的干净但缓慢的 Pivot;或者来自 6NF 数据库的干净快速的 Pivot。不是这样的。假设它是一个关系数据库。给定提供的 DDL,将有一个 ParentTable,其中
(Col1, Col2)
是唯一的。代码:
如果没有 ParentTable(即它不是关系数据库),请使用
SELECT-INTO
即时创建一个 ParentTable,或使用派生表:您可以删除 RowTable 中的
Id
列,它是 100% 冗余的列和索引,没有任何作用。Response to Original Question
1. Performance with Table As Is.Well, before anyone can evaluate that code, perfomance-wise, we need the create table statement, including the indices.
Higher Class of Performance.
Pivoting is a function of expressing the data available in rows, in columns. If the database (table) is say, normalised to 3NF or 5NF, which is row-oriented, then performing columnar functions on row objects is going to be slow. Nothing to do with the product. If you want columnar access at speed (for Pivoting or any other columnar function), you need the data in 6NF. That also happens to make the SQL required for the task much more straight-forward.
If your data modeller prepared the table for pivoting (typically data warehouse type usage; Dimension-Fact structure), then it may not be true 6NF, but at least it will be better than 5NF, and easier to extract Pivoted values. When I see the DDL, I will be able to determine what it is (true 6NF; better than 5NF but not 6NF). Then I can determine if you are using the best code to obtain what you need.
It is only slow or "expensive" when the table is not in 6NF.
At this stage, from your code, it does not even look like a Pivot (using the standard meaning of the term), it looks like a
MAX()
of various values (calling the resulting columnPivotx
does not make it a Pivot); and you are reading every row, once. That is, you have a procedural mindset, not a Pivoting or set-oriented mindset. Therefore, the code is likely not to get the values you require (whether it performs well or not, is a separate issue).Your use of
GROUP BY
confirms the procedural approach to the non-procedural set, and that is going to be slow (creates worktables; which will be huge if your data is huge), and the same info can be obtained much faster via the Dimensions. Why don't you use the dimension tables for this pivotable table ? Post either the DDL for all Dimension tables that are related to this table, or the Data Model.Response to Comments
I am trying to help you, but there are two obstacles. First, 19 days between interactions. Second, your posted SQL will not work: for each row, it returns the same
ColValue
in 11 columns;I cannot figure out the purpose of your use of. Ok, TheMAX()
MAX()
is required to beat theGROUP BY
into submission. Therefore I am still at a loss as to what you intend (not what you have coded). The obfuscation is fair enough, but here we have lost the meaning.Yes, there are faster ways, but I need to understand the intent, and the parent tables (eg. do you have a table where
(Col1, Col2)
is Unique ? If it is a database, then tables do not stand alone, they are related, and the relations have some purpose. I realise you do not think they are relevant, but that limitation has produced the code you have posted; the solution is beyond that limitation.Anyway, in order to avoid further delay, please try this code. It is just a guess, doesn't appear correct to me, because
(Col1, Col2, TypeId)
is Unique; therefore there will be only one set ofTypeId
(column heading in the result set) for eachCol1, Col2
result row:And perhaps your can give me feedback on that.
Response to Updated Question
Ok, now we have one unnormalised table. New set of steps. It is a constructed result set, using Correlated Subqueries which return Scalars. It is not a re-arrangement of rows vs columns; it is not a standard Pivot (therefore the code provided is not a pivot). Dead easy. You may wish to change the Question Heading, because people are looking for a true Pivot. And yes, this will perform much better (assuming your DDL is a true representation of the real tables).
To be clear, a Pivot (ala MS SQL
PIVOT
function) is a different animal. I can provide an ugly-and-slow Pivot for an Unnormalised database; or a clean-but-slow Pivot from a 5NF database; or a clean-and-fast Pivot from a 6NF database. This is not it.Let's assume it is a Relational Database. Given the DDL provided, there would be a ParentTable in which
(Col1, Col2)
is Unique.The code:
If there is not a ParentTable (ie. it is not a Relational Database), create one on the fly with
SELECT-INTO
, or use a Derived Table:You can get rid of the
Id
column in RowTable, it is a 100% redundant column and index, that serves no purpose.旋转本质上是一项昂贵的操作。我不认为这可以优化。
Pivoting is an inherently expensive operation. I don't think that this can be optimized.