SQL Server 中的伪随机可重复排序（不是 NEWID() 也不是 RAND()）

发布于 2024-07-11 18:41:08 字数 1252 浏览 10 评论 0原文

我想以可重复的方式对结果进行随机排序，以用于分页等目的。因为这个 NEWID() 太随机了，无法重新获得相同的结果。按兰德（种子）排序将是理想的，因为使用相同的种子会产生相同的随机集合。不幸的是， Rand() 状态每行都会重置，有人有解决方案吗？

declare @seed as int;
set @seed = 1000;

create table temp (
id int,
date datetime)

insert into temp (id, date) values (1,'20090119')
insert into temp (id, date) values (2,'20090118')
insert into temp (id, date) values (3,'20090117')
insert into temp (id, date) values (4,'20090116')
insert into temp (id, date) values (5,'20090115')
insert into temp (id, date) values (6,'20090114')

-- re-seeds for every item
select *, RAND(), RAND(id+@seed) as r from temp order by r
--1 2009-01-19 00:00:00.000 0.277720118060575   0.732224964471124
--2 2009-01-18 00:00:00.000 0.277720118060575   0.732243597442382
--3 2009-01-17 00:00:00.000 0.277720118060575   0.73226223041364
--4 2009-01-16 00:00:00.000 0.277720118060575   0.732280863384898
--5 2009-01-15 00:00:00.000 0.277720118060575   0.732299496356156
--6 2009-01-14 00:00:00.000 0.277720118060575   0.732318129327415
-- Note how the last column is +=~0.00002

drop table temp

-- interestingly this works:
select RAND(@seed), RAND()
--0.732206331499865 0.306382810665955

请注意，我尝试了 Rand(ID) 但结果只是排序。显然 Rand(n) < 兰特(n+1)

原文

I would like to randomly sort a result in a repeatable fashion for purposes such as paging. For this NEWID() is too random in that the same results cannot be re-obtained. Order by Rand(seed) would be ideal as with the same seed the same random collection would result. Unfortunately, the Rand() state resets with every row, does anyone have a solution?

declare @seed as int;
set @seed = 1000;

create table temp (
id int,
date datetime)

insert into temp (id, date) values (1,'20090119')
insert into temp (id, date) values (2,'20090118')
insert into temp (id, date) values (3,'20090117')
insert into temp (id, date) values (4,'20090116')
insert into temp (id, date) values (5,'20090115')
insert into temp (id, date) values (6,'20090114')

-- re-seeds for every item
select *, RAND(), RAND(id+@seed) as r from temp order by r
--1 2009-01-19 00:00:00.000 0.277720118060575   0.732224964471124
--2 2009-01-18 00:00:00.000 0.277720118060575   0.732243597442382
--3 2009-01-17 00:00:00.000 0.277720118060575   0.73226223041364
--4 2009-01-16 00:00:00.000 0.277720118060575   0.732280863384898
--5 2009-01-15 00:00:00.000 0.277720118060575   0.732299496356156
--6 2009-01-14 00:00:00.000 0.277720118060575   0.732318129327415
-- Note how the last column is +=~0.00002

drop table temp

-- interestingly this works:
select RAND(@seed), RAND()
--0.732206331499865 0.306382810665955

Note, I tried Rand(ID) but that just turns out to be sorted. Apparently Rand(n) < Rand(n+1)

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

旧伤慢歌 2024-07-18 18:41:08

基于 gkrogers 哈希建议，这非常有效。对性能有什么想法吗？

declare @seed as int;
set @seed = 10;

create table temp (
id int,
date datetime)

insert into temp (id, date) values (1,'20090119')
insert into temp (id, date) values (2,'20090118')
insert into temp (id, date) values (3,'20090117')
insert into temp (id, date) values (4,'20090116')
insert into temp (id, date) values (5,'20090115')
insert into temp (id, date) values (6,'20090114')

-- re-seeds for every item
select *, HASHBYTES('md5',cast(id+@seed as varchar)) r
from temp order by r
--1 2009-01-19 00:00:00.000 0x6512BD43D9CAA6E02C990B0A82652DCA
--5 2009-01-15 00:00:00.000 0x9BF31C7FF062936A96D3C8BD1F8F2FF3
--4 2009-01-16 00:00:00.000 0xAAB3238922BCC25A6F606EB525FFDC56
--2 2009-01-18 00:00:00.000 0xC20AD4D76FE97759AA27A0C99BFF6710
--3 2009-01-17 00:00:00.000 0xC51CE410C124A10E0DB5E4B97FC2AF39
--6 2009-01-14 00:00:00.000 0xC74D97B01EAE257E44AA9D5BADE97BAF

drop table temp

编辑：注意，如果使用动态 SQL，则在查询中使用的 @seed 声明可以替换为参数或常量 int。（不需要以 TSQL 方式声明 @int）

Building off of gkrogers hash suggestion this works great. Any thoughts on performance?

declare @seed as int;
set @seed = 10;

create table temp (
id int,
date datetime)

insert into temp (id, date) values (1,'20090119')
insert into temp (id, date) values (2,'20090118')
insert into temp (id, date) values (3,'20090117')
insert into temp (id, date) values (4,'20090116')
insert into temp (id, date) values (5,'20090115')
insert into temp (id, date) values (6,'20090114')

-- re-seeds for every item
select *, HASHBYTES('md5',cast(id+@seed as varchar)) r
from temp order by r
--1 2009-01-19 00:00:00.000 0x6512BD43D9CAA6E02C990B0A82652DCA
--5 2009-01-15 00:00:00.000 0x9BF31C7FF062936A96D3C8BD1F8F2FF3
--4 2009-01-16 00:00:00.000 0xAAB3238922BCC25A6F606EB525FFDC56
--2 2009-01-18 00:00:00.000 0xC20AD4D76FE97759AA27A0C99BFF6710
--3 2009-01-17 00:00:00.000 0xC51CE410C124A10E0DB5E4B97FC2AF39
--6 2009-01-14 00:00:00.000 0xC74D97B01EAE257E44AA9D5BADE97BAF

drop table temp

EDIT: Note, the declaration of @seed as it's use in the query could be replace with a parameter or with a constant int if dynamic SQL is used. (declaration of @int in a TSQL fashion is not necessary)

回复收藏 0 原文

分开我的手 2024-07-18 18:41:08

您可以使用每行中的值来重新计算 rand 函数：

Select *, Rand(@seed + id) as r from temp order by r

添加 ID 可确保为每行重新播种 rand。但对于种子值，您将始终返回相同的行序列（前提是表不更改）

You can use a value from each row to re-evaluate the rand function:

Select *, Rand(@seed + id) as r from temp order by r

adding the ID ensures that the rand is reseeded for each row. But for a value of seed you will always get back the same sequence of rows (provided that the table does not change)

回复收藏 0 原文

流年里的时光 2024-07-18 18:41:08

创建哈希可能比创建种子随机数更耗时。

为了获得 RAND([seed]) 输出的更多变化，您还需要使 [seed] 发生显着变化。可能比如......

SELECT
    *,
    RAND(id * 9999)    AS [r]
FROM
   temp
ORDER BY
   r

使用常量可以确保您所要求的可复制性。但如果您希望表变得足够大，请小心 (id * 9999) 导致溢出的结果......

Creating a hash can be much more time consuming than creating a seeded random number.

To get more variation in the ourput of RAND([seed]) you need to make the [seed] vary significantly too. Possibly such as...

SELECT
    *,
    RAND(id * 9999)    AS [r]
FROM
   temp
ORDER BY
   r

Using a constant ensures the replicability you asked for. But be careful of the result of (id * 9999) causing an overflow if you expect your table to get big enough...

回复收藏 0 原文

时光磨忆 2024-07-18 18:41:08

SELECT *, checksum(id) AS r FROM table ORDER BY r

这种作品。尽管 checksum() 的输出对我来说看起来并不那么随机。 MSDN 文档指出：

[...]，我们不建议使用 CHECKSUM 来检测值是否已更改，除非您的应用程序可以容忍偶尔丢失更改。考虑使用 HashBytes 代替。当指定MD5哈希算法时，HashBytes对于两个不同的输入返回相同结果的概率远低于CHECKSUM。

但可能会更快。

SELECT *, checksum(id) AS r FROM table ORDER BY r

This kind of works. Although the output from checksum() does not look all that random to me. The MSDN Documentation states:

[...], we do not recommend using CHECKSUM to detect whether values have changed, unless your application can tolerate occasionally missing a change. Consider using HashBytes instead. When an MD5 hash algorithm is specified, the probability of HashBytes returning the same result for two different inputs is much lower than that of CHECKSUM.

But may be it faster.

回复收藏 0 原文

尬尬 2024-07-18 18:41:08

经过一些阅读后，这是一种可以接受的方法。

Select Rand(@seed) -- now rand is seeded

Select *, 0 * id + Rand() as r from temp order by r

表达式中包含 id 会导致每一行都重新计算。但将其乘以 0 可确保它不会影响 rand 的结果。

这是多么可怕的做事方式啊！

After doing some reading this is an accepted method.

Select Rand(@seed) -- now rand is seeded

Select *, 0 * id + Rand() as r from temp order by r

Having id in the expression causes it to be reevaluated every row. But multiplying it by 0 ensures that it doesnt not affect the outcome of rand.

What a horrible way of doing things!

回复收藏 0 原文

爱的故事 2024-07-18 18:41:08

create table temp (
id int,
date datetime)

insert into temp (id, date) values (1,'20090119')
insert into temp (id, date) values (2,'20090118')
insert into temp (id, date) values (3,'20090117')
insert into temp (id, date) values (4,'20090116')
insert into temp (id, date) values (5,'20090115')
insert into temp (id, date) values (6,'20090114')

-- re-seeds for every item
select *, NEWID() r
from temp order by r

drop table temp

create table temp (
id int,
date datetime)

insert into temp (id, date) values (1,'20090119')
insert into temp (id, date) values (2,'20090118')
insert into temp (id, date) values (3,'20090117')
insert into temp (id, date) values (4,'20090116')
insert into temp (id, date) values (5,'20090115')
insert into temp (id, date) values (6,'20090114')

-- re-seeds for every item
select *, NEWID() r
from temp order by r

drop table temp

回复收藏 0 原文

赠意 2024-07-18 18:41:08

这在过去对我来说效果很好，它可以应用于任何表（只需添加 ORDER BY 子句）：

SELECT *
FROM MY_TABLE
ORDER BY  
  (SELECT ABS(CAST(NEWID() AS BINARY(6)) % 1000) + 1);

This has worked well for me in the past, and it can be applied to any table (just bolt on the ORDER BY clause):

SELECT *
FROM MY_TABLE
ORDER BY  
  (SELECT ABS(CAST(NEWID() AS BINARY(6)) % 1000) + 1);

回复收藏 0 原文

~没有更多了~

关于作者

邮友

暂无简介

0 文章

0 评论

23 人气

关注发私信

友情链接

文江博客

SQL Server 中的伪随机可重复排序（不是 NEWID() 也不是 RAND()）

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（7）

关于作者

相关话题

热门标签

推荐作者

lixs

敷衍

盗梦空间

tian

13375331123

你对谁都笑

友情链接

SQL Server 中的伪随机可重复排序（不是 NEWID() 也不是 RAND()）

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（7）

关于作者

相关话题

热门标签

推荐作者

lixs

敷衍

盗梦空间

tian

13375331123

你对谁都笑

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

敷衍