最有效的方法...唯一的随机字符串

发布于 2024-08-07 01:45:47 字数 1037 浏览 2 评论 0原文

我需要高效地将 5 个字符的随机字符串插入数据库,同时确保它是唯一的。生成随机字符串不是问题,但目前我正在做的是生成字符串,然后检查数据库是否已经存在......如果存在,我会重新开始。

有没有更有效的方法来完成这个过程?

请注意,我不想使用 GUID 或任何其他超过 5 个字符的内容...我必须坚持使用 5 个字符。

PS:我不认为这有什么区别,但我的字符串都是区分大小写的。

这是“随机字符串”部分 -

    Public Function GetRandomNumbers(ByVal numChars As Integer) As String
    Dim chars As String() = { _
     "A", "B", "C", "D", "E", "F", _
     "G", "H", "I", "J", "K", "L", _
     "M", "N", "O", "P", "Q", "R", _
     "S", "T", "U", "V", "W", "X", _
     "Y", "Z", "0", "1", "2", "3", _
     "4", "5", "6", "7", "8", "9", _
     "a", "b", "c", "d", "e", "f", _
     "g", "h", "i", "j", "k", "l", _
     "m", "n", "o", "p", "q", "r", _
     "s", "t", "u", "v", "w", "x", _
     "y", "z"}
    Dim rnd As New Random()
    Dim random As String = String.Empty
    Dim i As Integer = 0
    While i < numChars
        random += chars(rnd.[Next](0, 62))
        System.Math.Max(System.Threading.Interlocked.Increment(i), i - 1)
    End While
    Return random
End Function

I need to efficiently insert a 5 character RANDOM string into a database while also ensuring that it is UNIQUE. Generating the random string is not the problem, but currently what I am doing is generating the string and then checking the DB if it exists already... if it does, I start over.

Is there a more efficient way to do this process?

Please note, I do NOT want to use GUID or anything else that is more than 5 characters.... I MUST stick to 5 characters.

PS: I don't think it makes a difference, but my strings are all case sensitive.

Here is the "Random String" portion -

    Public Function GetRandomNumbers(ByVal numChars As Integer) As String
    Dim chars As String() = { _
     "A", "B", "C", "D", "E", "F", _
     "G", "H", "I", "J", "K", "L", _
     "M", "N", "O", "P", "Q", "R", _
     "S", "T", "U", "V", "W", "X", _
     "Y", "Z", "0", "1", "2", "3", _
     "4", "5", "6", "7", "8", "9", _
     "a", "b", "c", "d", "e", "f", _
     "g", "h", "i", "j", "k", "l", _
     "m", "n", "o", "p", "q", "r", _
     "s", "t", "u", "v", "w", "x", _
     "y", "z"}
    Dim rnd As New Random()
    Dim random As String = String.Empty
    Dim i As Integer = 0
    While i < numChars
        random += chars(rnd.[Next](0, 62))
        System.Math.Max(System.Threading.Interlocked.Increment(i), i - 1)
    End While
    Return random
End Function

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

笑看君怀她人 2024-08-14 01:45:48

您可以生成 GUID 并且只使用前 5 个字符吗?

You could generate a GUID and only use the first 5 characters?

小镇女孩 2024-08-14 01:45:48

随机性更重要,还是唯一性更重要? ——注意我说的是“更”重要;我知道你两者都需要。

如果随机性更重要,那么您将需要某种方法来跟踪历史值。数据库本身(具有适当的索引)将是实现此目的的最佳方法。

如果唯一性更重要,那么只需使用计数器并将其用零填充到五位数字即可。当然,这会将您限制为 100,000 行,因此您可以选择使用计数器和字符空间转换(例如,1 =“A”、2 =“B”、27 =“AA”等) 。

Is randomness more important, or is uniqueness more important? -- note that I said "more" important; I get the fact that you need both.

If randomness is more important, then you're going to need some way to track historical values. The database itself (with an appropriate index) is going to be the best way to do this.

If uniqueness is more important, then simply use a counter and zero-pad it to five digits. This will, of course, limit you to 100,000 rows, so you could alternatively use a counter and a transformation into character space (eg, 1 = "A", 2 = "B", 27 = "AA", and so on).

梦境 2024-08-14 01:45:48

有一种方法可以随机选择未使用的唯一单词,但它可能不会比您现在所做的更好。

原理是,您确定哪些单词的排列是未使用的,根据有多少未使用的排列生成一个随机数,然后选择一个。

例如,如果您使用具有三个字符的单词,并且只有字符 0 和 1,则有八种可能的排列。如果您已经使用了组合“010”和“100”,您将得到如下所示的结果:

PI = 排列索引
UI = 未使用的排列索引

No. PI UI
----------
000 0  0
001 1  1
010 2  -
011 3  2
100 4  -
101 5  3
110 6  4
111 7  5

要选择未使用的排列,您只需生成 0 到 5 之间的随机数,然后选择相应的排列即可。

保留所有可能排列的列表当然是不切实际的,因此您需要一个可以从字符串确定排列索引的函数,以及一个可以从排列索引确定字符串的函数。

此外,要确定哪些排列未使用,您必须检查哪些排列已使用,因此您仍然需要在某些时候查询表。

There is a method to pick unused unique words by random, but it's probably not going to be any better than what you are doing now.

The principle is that you determine which permutations of the words that are unused, generate a random number based on how many unused permuations there are, and pick that one.

If you for example would use a word with three characters, and only the character 0 and 1, there are eight possible permutations. If you already used the combinations "010" and "100", you would get something that looks like this:

PI = permutation index
UI = unused permutation index

No. PI UI
----------
000 0  0
001 1  1
010 2  -
011 3  2
100 4  -
101 5  3
110 6  4
111 7  5

To pick an unused permutation, you simply generate a random number from 0 to 5, and pick the corresponding permutation.

Keeping a list of all possible permuations is of course not practical, so you would need a function that can determine the permutation index from the string, and one function that can determine the string from the permutation index.

Also, to determine which permutations are unused, you have to check which are used, so you still have to query the table at some point.

や三分注定 2024-08-14 01:45:48

如果要将字符串插入到现有的已填充表中,那么您始终需要检查该字符串是否不存在(它不必是显式 SELECT)。您可以手动执行此操作,也可以对列设置 UNIQUE 约束并让数据库执行此操作。因此,如果数据库因为该字符串已存在而返回错误,则生成另一个错误。

请注意,如果您有一个空表并想用多个随机字符串填充它,那么这是一个不同的问题。

If you are inserting the string to an existing, populated, table then you will always need to check if the string doesn't exist there (it doesn't have to be an explicit SELECT). You can either to it manually, or have an UNIQUE constraint on the column and let the database to do it. So if the database returns an error because the string is already there, generate another one.

Note that if you have an empty table and want to populate it with multiple random strings, it's a different problem.

怪异←思 2024-08-14 01:45:48

我认为你应该坚持你最初的想法。对索引施加唯一约束并让数据库为您检查/报告重复项将是相当有效的重复项检查方法,但此假设取决于未提供的一些信息,例如行数以及随机选择的数据遇到重复项的可能性。

使用您的参数完全预填充唯一的序列池需要 4.59 亿行的表。

您可以使用布隆过滤器将可管理的统计信息加载到数据库或主内存中并避免重复,但根据行计数和过滤器配置,当行计数占 4.59 亿限制的相当大的百分比时,这可能会导致过滤器饱和。由于过滤器可以报告误报,因此您应该努力确保系统不会陷入永远尝试通过过滤器的排列的情况。

I think you should stick to your origional idea. Putting a unique constraint on the index and letting the database check/report dupes for you would be fairly effecient method of dupe checking but this assumption depends on some information not provided like the number of rows and likelyhood of encountering dupes with randomly selected data.

Fully pre-populating a unique sequence pool with your parameters requires a 459 million row table.

You could use a bloom filter to load managable statistics into a database or main memory and avoid dupes but depending on the row count and filter configuration this may lead to saturation of the filter when the row count is an appreciable percentage of the 459 million limit.. Since the filter can report false positives you should work to ensure that you don't get into a situation where your system is stuck trying permutations that pass the filter approaching forever.

无法言说的痛 2024-08-14 01:45:48

既然您知道您的单词有多长,为什么不采用基于树的方法呢? (我们称之为随机树遍历)

假设您的单词有 n 个字符。生成 S 中所有符号 s 的列表,并将每个符号的计数器与字符串中可能的位置相关联,本质上是一个维度为 s 乘以 n 的矩阵 M。现在掷骰子并选择第一个字母并查找 M(s,1)。如果 M(s,1) 大于或等于以 s 开头的可能单词的数量,则再次滚动。否则增加 M(s,1)。

对从 1 到 n 的每个字母重复此操作。

应该很快,直到你用完很多单词。

As you know how long your word shall be, why not a tree-based approach? (Let's call it randomised tree walk)

Say your word has n characters. Generate a list of all symbols s in S and relate a counter for each symbol and possible position in the string, essentially a matrix M of dimensions s times n. Now roll your dice and choose the first letter and lookup M(s,1). If M(s,1) is larger or equal the number of possible words starting with s, roll again. Otherwise increment M(s,1).

Repeat this for every letter 1 till n.

Should be pretty fast until you have used up to many words.

眉目亦如画i 2024-08-14 01:45:47

创建一个表,其中包含按顺序添加的大量 5 字符字符串(因此它们是唯一的),并以 GUID 作为主键。添加一列来指示它们是否被使用。

当您需要一个新号码时,您可以从池中选择前 1 个,按 guid 排序(因此它变成随机的),并将结果设置为“已用”。

Create a table with a big pool of 5-character strings that are added in sequence (so they are unique), and have a GUID as their primary key. Add a column to indicate whether they are used or not.

When you need a new number, you select top 1 from the pool, order by the guid (so it becomes random), and set the result as "spent".

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文