SQL Server cross-row compression
I'm having to return ~70,000 rows of 4 columns of INTs in a specific order and can only use very shallow caching as the data involved is highly volatile and has to be up to date. One property of the data is that it is often highly repetitive when it is in order.
I've started to look at various methods of reducing the row count in order to reduce network bandwidth and client-side processing time/resources, but have not managed to find any technique in T-SQL that lets me 'compress' runs of repetitive rows down into a single row plus a 'count' column. For example, I'd like to turn this:
prop1  prop2  prop3  prop4
--------------------------
0      0      1      53
0      0      2      55
1      1      1      8
1      1      1      8
1      1      1      8
1      1      1      8
0      0      2      55
0      0      2      55
0      0      1      53
Into:
prop1  prop2  prop3  prop4  count
---------------------------------
0      0      1      53     1
0      0      2      55     1
1      1      1      8      4
0      0      2      55     2
0      0      1      53     1
I'd estimate that if this was possible, in many cases what would be a 70,000 row result set would be down to a few thousand at most.
Am I barking up the wrong tree here (is there implicit compression as part of the SQL Server protocol)?
Is there a way to do this (SQL Server 2005)?
Is there a reason I shouldn't do this?
Thanks.
Comments (3)
You can use the COUNT function! This will require you to use the GROUP BY clause, where you tell COUNT how to break up, or group, the rows. GROUP BY is used alongside aggregate functions in SQL.
Update: The OP mentioned these are ordered by y and x, which are not part of the result set. In this case, you can still use y and x as part of the GROUP BY. Keep in mind that order means nothing without ordering columns, so in this case we have to respect that by including y and x in the GROUP BY.
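As a quick check of the GROUP BY suggestion, here is a minimal sketch run against the sample rows from the question. The table variable @t and the seq column are assumptions made for illustration; seq stands in for whatever columns define the real order (y and x in the update above). It is written to run on SQL Server 2005, so the rows are loaded with UNION ALL rather than a multi-row VALUES list.

```sql
-- Hypothetical sample table; seq stands in for the real ordering columns.
DECLARE @t TABLE (seq INT PRIMARY KEY, prop1 INT, prop2 INT, prop3 INT, prop4 INT);

INSERT INTO @t (seq, prop1, prop2, prop3, prop4)
SELECT 1, 0, 0, 1, 53 UNION ALL
SELECT 2, 0, 0, 2, 55 UNION ALL
SELECT 3, 1, 1, 1, 8  UNION ALL
SELECT 4, 1, 1, 1, 8  UNION ALL
SELECT 5, 1, 1, 1, 8  UNION ALL
SELECT 6, 1, 1, 1, 8  UNION ALL
SELECT 7, 0, 0, 2, 55 UNION ALL
SELECT 8, 0, 0, 2, 55 UNION ALL
SELECT 9, 0, 0, 1, 53;

-- Plain GROUP BY: one row per distinct combination, wherever it occurs.
SELECT prop1, prop2, prop3, prop4, COUNT(*) AS [count]
FROM @t
GROUP BY prop1, prop2, prop3, prop4;
```

On this sample the query returns three rows (counts 2, 3 and 4) rather than the five runs the question asks for, because non-adjacent duplicates are merged and their positions are lost. Adding the ordering column to the GROUP BY goes the other way: if its values are unique per row, every group contains exactly one row and nothing is compressed. Grouping alone therefore may not be enough when only consecutive duplicates should collapse.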
This will work, though it is painful to look at:
The gist is the following:
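The original snippet is not reproduced here. As a minimal sketch of one common way to collapse consecutive duplicates on SQL Server 2005 (not necessarily the code this answer used), the difference between two ROW_NUMBER() sequences can serve as a run identifier. The table variable @t, the seq ordering column and the run_id alias are hypothetical names chosen for illustration.

```sql
-- Hypothetical sample table; seq stands in for the real ordering columns.
DECLARE @t TABLE (seq INT PRIMARY KEY, prop1 INT, prop2 INT, prop3 INT, prop4 INT);

INSERT INTO @t (seq, prop1, prop2, prop3, prop4)
SELECT 1, 0, 0, 1, 53 UNION ALL
SELECT 2, 0, 0, 2, 55 UNION ALL
SELECT 3, 1, 1, 1, 8  UNION ALL
SELECT 4, 1, 1, 1, 8  UNION ALL
SELECT 5, 1, 1, 1, 8  UNION ALL
SELECT 6, 1, 1, 1, 8  UNION ALL
SELECT 7, 0, 0, 2, 55 UNION ALL
SELECT 8, 0, 0, 2, 55 UNION ALL
SELECT 9, 0, 0, 1, 53;

WITH numbered AS (
    SELECT seq, prop1, prop2, prop3, prop4,
           -- Position in the whole result minus position among identical rows.
           -- The difference stays constant within a run of consecutive
           -- duplicates and changes when the same values reappear later.
           ROW_NUMBER() OVER (ORDER BY seq)
         - ROW_NUMBER() OVER (PARTITION BY prop1, prop2, prop3, prop4
                              ORDER BY seq) AS run_id
    FROM @t
)
SELECT prop1, prop2, prop3, prop4, COUNT(*) AS [count]
FROM numbered
GROUP BY prop1, prop2, prop3, prop4, run_id
ORDER BY MIN(seq);  -- keep the runs in their original order
```

On the sample data this yields the five rows from the question, with counts 1, 1, 4, 2 and 1 in that order. With a two-column ordering such as (y, x), both ROW_NUMBER() calls would use ORDER BY y, x, and the final ordering would need a single sortable value, for example the minimum of the overall row number within each run.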
70,000 rows of four integer columns is not really a worry for bandwidth on a modern LAN, unless you have many workstations executing this query concurrently; and on a WAN with more restricted bandwidth you could use DISTINCT to eliminate duplicate rows, an approach which would be frugal with your bandwidth but consume some server CPU. Again, however, unless you have a really overloaded server that is always performing at or near peak loads, this additional consumption would be a mere blip. 70,000 rows is next to nothing.