随机生成 MySQL 行时确保字段唯一

发布于 2024-12-05 17:40:41 字数 233 浏览 1 评论 0原文

我正在生成大约 100,000 行随机数据,大约有 10 列,并且我需要所有值都是唯一的。

我能想到的唯一可靠的方法是对每个随机生成进行查询,以确保它尚不存在,但这会增加数十万个查询。另一种选择是在表上创建一个 UNIQUE 约束,但是我需要有某种方法来计算有多少重复项失败,以便我可以重新创建这些行(我需要知道我将生成完全相同数量的随机行)按照指定),或者在插入失败时使用新一代重试。有办法做到这一点吗?或者我是否错过了其他一些方法?

I'm generating somewhere around 100,000 rows of random data, with around 10 columns, and I need to have all of values to be unique.

The only solid way I could think of doing this would be doing a query on each random generation to make sure that it does not exist yet, but that adds hundreds of thousands of queries. The other alternative would be creating a UNIQUE constraint on the table, but then I would need to have some way of counting how many duplicates fail, so that I can recreate those rows (I need to know that I will generate exactly as many random rows as specified), or having something that retries with a new generation when that insert fails. Is there a way to do that? Or am I missing some other approach all together?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

烛影斜 2024-12-12 17:40:41

我应该在 python 中生成一个列表中的随机数。
然后,使用 set 来获取唯一值。

l = [1,5,5,4,5,4,8,9,2]

set(l)

set([1, 2, 4, 5, 8, 9])

现在,您可以生成 200.000 个数字以获得至少 100.000 个唯一的数字,然后截断列表,等等。


我尝试了这个:

import random
l = []
for i in range (200000):
    l.append (random.random())
print len(set(l))

它给了我一组长度为 200.000 的数据。生成的随机数中没有重复项!

I should generate random numbers in python, in a list.
Then, use set to get uniques.

l = [1,5,5,4,5,4,8,9,2]

set(l)

set([1, 2, 4, 5, 8, 9])

Now, you may generate 200.000 numbers to get at least 100.000 unique ones, and truncate the list, or so.


I tried this :

import random
l = []
for i in range (200000):
    l.append (random.random())
print len(set(l))

and it gave me a set of length 200.000. There were no dups within the generated random !

季末如歌 2024-12-12 17:40:41

您在列中填充什么类型的数据?它必须遵循任何特定格式吗?

无论数据格式如何,我都会离线生成所有数据,然后将其一次性插入数据库中。

例如:

from base64 import b64encode
from random import shuffle

arr=[]
for i in range(0,1000000): #Assuming you need 1mil entries
    arr.append(b64encode(str(i)))
shuffle(arr)
#Add arr[] to the database

这样您就知道所有字符串都是唯一的,并且随机播放将使顺序(更加)随机。

What sorts of data are you filling the columns with? Does it have to follow any particular format?

Regardless of the data format, I'd generate all the data offline and then insert it into the DB all at once.

For example:

from base64 import b64encode
from random import shuffle

arr=[]
for i in range(0,1000000): #Assuming you need 1mil entries
    arr.append(b64encode(str(i)))
shuffle(arr)
#Add arr[] to the database

This way you know that all the strings will be unique, and the shuffle will make the order (more) random.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文