
Generating a not-quite-globally unique identifier

Posted 2024-08-16 03:13:16

I've found a number of different questions on generating UIDs, but as far as I can tell, my requirements here are somewhat unique (ha).

To summarize: I need to generate a very short ID that's "locally" unique, but does not have to be "globally" or "universally" unique. The constraints are not simply based on aesthetic or space concerns; the ID is essentially being used as a hardware tag and is thus subject to the hardware's constraints. Here are the specifications:

Hard Requirements

The ID must contain only decimal digits (the underlying data is a BCD);
The maximum length of the ID is 12 characters (digits).
Must be generated offline - a database/web connection is not always available!

Soft Requirements

We'd like it to begin with the calendar year and/or month. As this does waste a lot of entropy, I don't mind compromising on this or scrapping it entirely (if necessary).
IDs generated from a particular machine should appear sequential.
IDs do not have to sort by machine - for example, it's perfectly fine for machine 1 to spit out [123000, 124000, 125000], and machine 2 to spit out [123500, 123600, 124100].
However, the more sequential-looking in a collective sense, the better. A set of IDs like [200912000001, 200912000002, 200912000003, ...] would be perfect, although this obviously does not scale across multiple machines.

Usage Scenario:

IDs within the scope of this scheme will be generated from 10, maybe 100 different machines at most.
There will not be more than a few million IDs generated, total.
Concurrency is extremely low. A single machine will not generate IDs more often than every 5 minutes or so. Also, most likely no more than 5 machines at a time will generate IDs within the same hour or even the same day. I expect fewer than 100 IDs to be generated within one day on a given machine and fewer than 500 for all machines.
A small number of machines (3-5) would most likely be responsible for generating more than 80% of the IDs.

I know that it's possible to encode a timestamp down to 100 ms or even 10 ms precision using fewer than 12 decimal digits, which is more than enough to guarantee a "unique enough" ID for this application. The reason I am asking this here on SO is that I would really like to either incorporate a human-readable year/month in there, or encode some piece of information about the source machine, or both.
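To make the timestamp option concrete, here is a minimal sketch of encoding a time as a count of 10 ms units since a fixed epoch. The class name `TimestampId` and the epoch of 2009-01-01 UTC are assumptions made for illustration, not part of the question. At 10 ms resolution, 12 decimal digits cover roughly 317 years from the chosen epoch.

```csharp
using System;

// Hypothetical helper; the name and the epoch are illustrative assumptions.
static class TimestampId
{
    // Assumed custom epoch: any fixed instant earlier than the first ID would do.
    static readonly DateTime Epoch = new DateTime(2009, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Number of .NET ticks (100 ns each) in one 10 ms unit.
    const long TicksPerUnit = TimeSpan.TicksPerMillisecond * 10;

    // Encodes a UTC time as a zero-padded 12-digit count of 10 ms units.
    public static string Encode(DateTime utc)
    {
        long units = (utc - Epoch).Ticks / TicksPerUnit;
        if (units < 0 || units >= 1000000000000L)
            throw new ArgumentOutOfRangeException(nameof(utc));
        return units.ToString("D12");
    }

    // Recovers the time (rounded down to 10 ms) from an ID.
    public static DateTime Decode(string id)
    {
        return Epoch.AddTicks(long.Parse(id) * TicksPerUnit);
    }
}
```

Calling `TimestampId.Encode(DateTime.UtcNow)` yields an ID that sorts chronologically on a single machine. Two machines can still collide if they generate within the same 10 ms unit, which the usage figures above make unlikely but not impossible.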

I'm hoping that someone can either help with a compromise on those soft requirements... or explain why none of them are possible given the other requirements.

(P.S. My "native" language is C# but code in any language or even pseudocode is fine if anybody has any brilliant ideas.)

Update:

Now that I've had the chance to sleep on it, I think what I'm actually going to do is use a timestamp encoding by default, and allow individual installations to switch to a machine-sequential ID by defining their own 2- or 3-digit machine ID. That way, customers who want to mess with the ID and pack in human-readable information can sort out their own method of ensuring uniqueness, and we're not responsible for misuse. Maybe we help out by providing a server utility to handle machine IDs if they happen to be doing all online installations.

太阳公公是暖光 2024-08-23 03:13:16

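"The reason I am asking this here on SO, is because I would really like to either try to incorporate human-readable year/month in there or encode some piece of information about the source machine, or both."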

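I've dealt with this before, and in the long run, trying to store useful information in a serial number is a bad idea. A device serial number should be meaningless, just like a primary key. Once you start putting real data into a serial number, you have put business logic into it, and you will be forced to maintain it like any other code you will come to hate. Trust me. ;o)

If you try to store date/time values, you waste number space on invalid times and dates; for example, the month field will never hold a value greater than 12.

A straight epoch/unit-time counter would be better, but for machines that only generate a few IDs per minute you would still waste a lot of space.

Check out the VIN page on Wikipedia. There is only room for a few manufacturers and a few thousand cars, and vehicle identification numbers are now being reused because the space ran out once meaning was packed into them.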

http://en.wikipedia.org/wiki/VIN

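That's not to say that all meaning in a serial number is bad; just restrict it strictly to what ensures the numbers don't collide.

Something like this...

Positions 1-3: 999 machines
Positions 4-12: sequence number

That's all you need to avoid collisions. If you add a single location digit, you're screwed when you get to 11 locations.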
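The layout above can be sketched as follows. This is only an illustration: the class name `PrefixedSerialGenerator` is hypothetical, the machine number is assumed to be assigned uniquely at install time, and a real deployment would need to persist the counter across restarts.

```csharp
using System;

// Hypothetical sketch of the "3-digit machine + 9-digit sequence" layout.
class PrefixedSerialGenerator
{
    readonly int machineId; // 0..999, assumed unique per machine
    long next;              // next sequence number; NOT persisted in this sketch

    public PrefixedSerialGenerator(int machineId, long firstSerial = 0)
    {
        if (machineId < 0 || machineId > 999)
            throw new ArgumentOutOfRangeException(nameof(machineId));
        if (firstSerial < 0 || firstSerial > 999999999L)
            throw new ArgumentOutOfRangeException(nameof(firstSerial));
        this.machineId = machineId;
        this.next = firstSerial;
    }

    // Returns positions 1-3 (machine) followed by positions 4-12 (sequence).
    public string Generate()
    {
        if (next > 999999999L)
            throw new InvalidOperationException("Sequence space exhausted.");
        return machineId.ToString("D3") + (next++).ToString("D9");
    }
}
```

With a billion sequence numbers per machine and "a few million" IDs in total, no machine comes close to exhausting its range, and IDs from one machine are strictly sequential.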

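Sorry if this sounds like a rant. I deal with this a lot in the manufacturing of electronics and various mechanical parts. It never ends well in the long run unless there is a large amount of space available or a secondary tag (which, wow, provides the necessary ID space mentioned earlier).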


隔岸观火 2024-08-23 03:13:16

How about yyMMddhhmmID?

yy = two-digit year
MM = two-digit month
dd = two-digit day
hh = two-digit hour (24-hour time)
mm = two-digit minute
ID = machine-specific ID

Example: 091211153201 (11 December 2009, 15:32) from machine with ID = 01.
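A minimal sketch of this first format, for illustration only; the class name `MinuteStampId` is hypothetical. One detail worth noting when writing it in .NET: the custom format specifier for the 24-hour clock is `HH`, whereas `hh` is the 12-hour clock.

```csharp
using System;
using System.Globalization;

// Hypothetical helper illustrating the yyMMddhhmmID layout.
static class MinuteStampId
{
    // Formats a timestamp to the minute and appends a 2-digit machine ID.
    public static string Format(DateTime timestamp, int machineId)
    {
        if (machineId < 0 || machineId > 99)
            throw new ArgumentOutOfRangeException(nameof(machineId));
        // "HH" = 24-hour clock; the invariant culture avoids locale-specific calendars.
        return timestamp.ToString("yyMMddHHmm", CultureInfo.InvariantCulture)
             + machineId.ToString("D2");
    }
}
```

Because this layout has no counter, uniqueness per machine relies on the stated rate of at most one ID every five minutes or so, and on the clock never being set backward.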

Alternatively (if you don't like two-digit years (Y2K lol)), how about yyyyMMIDxxxx?

yyyy = four-digit year
MM = two-digit month
ID = machine-specific ID
xxxx = sequentially-incremented integer

Example: 200912010001 from machine with ID = 01.

As you said that each machine will generate at most one identifier every five minutes, this gives you room for 8,928 (24 * 31 * 60 / 5 = 8928) identifiers per month, which fits in xxxx. You could also squeeze the year down to a three-digit year yyy (e.g. 009) if you needed an extra digit in the xxxx sequence or the machine ID.

Both of these fit timestamp/machine ID as you requested.

We all like concrete code:

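using System;

// Identifies the machine that generates the IDs.
class Machine {
    public int ID { get; private set; }
    public Machine(int id) {
        ID = id;
    }
}

// Builds IDs of the form yyyyMMIDxxxx.
class IdentifierGenerator {
    readonly Machine machine;
    int seed;                    // next value of the xxxx counter
    const int digits = 4;        // width of the xxxx counter
    readonly int modulus;        // 10^digits; the counter wraps here
    readonly string seedFormat;  // "0000", zero-pads the counter
    public IdentifierGenerator(Machine machine) {
        this.machine = machine;
        this.modulus = (int)Math.Pow(10, digits);
        this.seedFormat = new string('0', digits);
    }

    public string Generate() {
        string identifier = DateTime.Now.ToString("yyyyMM")
                                + machine.ID.ToString("00")
                                + seed.ToString(seedFormat);
        // Note: the counter is neither persisted nor reset each month here.
        seed = (seed + 1) % modulus;
        return identifier;
    }
}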

Machine m = new Machine(1);
IdentifierGenerator gen = new IdentifierGenerator(m);
Console.WriteLine(gen.Generate());
Console.WriteLine(gen.Generate());

Outputs:

200912010000
200912010001


凉栀 2024-08-23 03:13:16

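When you install the software, also install a machine ID file or registry entry that contains a unique numeric ID. Since you only have a handful of machines, this number should not need more than 3 or 4 digits. Use these as the most significant digits, and generate the remaining digits sequentially, starting from 1.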


心房的律动 2024-08-23 03:13:16

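I gather you're developing for Windows (re: your comment about "MSI/EXE" on Jason's answer). So you could obtain some unique hardware property via WMI or similar (e.g. the processor or HDD serial number, or a NIC's MAC address) as the basis for a unique machine ID. Another option might be to use the unique serial number of the hardware you have developed yourselves (if it has one).

This will most likely be longer than you need, so you could truncate or hash it to reduce it to (say) 16 bits or so and use that as your machine ID. Obviously this can lead to collisions, but the small number of machines (about 100) means this is unlikely, and using the truncated output of a cryptographic hash (e.g. MD5) makes it even less likely.

Then, since you have a (most likely unique) machine ID, you can generate essentially unique IDs using the methods listed in the other answers.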
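The hashing step could look roughly like the sketch below. It assumes the hardware property has already been read into a string (the WMI query itself is not shown), and the class name `MachineIdFromHardware` and the choice of a 5-digit result are illustrative assumptions. The hash bytes are combined explicitly so that the result does not depend on the CPU's byte order.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: reduces a hardware serial string to a short decimal machine ID.
static class MachineIdFromHardware
{
    // Returns a value in 0..99999 derived from the MD5 hash of the serial.
    public static int Derive(string hardwareSerial)
    {
        if (string.IsNullOrEmpty(hardwareSerial))
            throw new ArgumentException("Serial must not be empty.", nameof(hardwareSerial));

        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(hardwareSerial));
            // Take the first four bytes as a big-endian unsigned integer.
            uint value = ((uint)hash[0] << 24) | ((uint)hash[1] << 16)
                       | ((uint)hash[2] << 8) | hash[3];
            return (int)(value % 100000u);
        }
    }
}
```

Collisions are not negligible at this size: for 100 machines in a 5-digit space the birthday approximation gives a chance of roughly 5% that at least two machines share an ID, so checking for duplicates at install time would still be prudent. A 5-digit machine ID also leaves only 7 of the 12 digits for the rest of the identifier.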


裂开嘴轻声笑有多痛 2024-08-23 03:13:16

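There are 864,000 ticks of 100 ms in 24 hours, so appending that count to the date, as in 09.12.24.86400.0, might work, but you would have to drop the century to fit in 12 digits, and you would have no room left for a machine ID.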


习惯那些不曾习惯的习惯 2024-08-23 03:13:16

Idea number one:

YYMMDDmmnnnn

where

YY is the two-digit year
MM is the two-digit month
DD is the two-digit day
mm is a two-digit code unique to that machine (00 - 99)
nnnn is a sequential four-digit code for that machine on that day.

Idea number two:

mmmmnnnnnnnn

where

mmmm is a four-digit code unique to the machine
nnnnnnnn is a sequential number.


漫雪独思 2024-08-23 03:13:16

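My suggestion is to combine several approaches in one ID. For example: start with two year digits and two month digits, then generate a random number, seeded with the time, for the next few digits, and finish with a unique machine ID for the last few digits. Or something like that.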


℡Ms空城旧梦 2024-08-23 03:13:16

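Each machine has a starting ID DDNNN, where DD is a unique machine identifier and NNN is the current identifier generated by that machine that day. Each machine keeps track of the IDs it has generated on a particular day and, when a new ID is needed, assigns the next one by adding 1 to the last. It resets its counter to 0 at the start of each day. The date YYYYDOY (4-digit year, 3-digit day of year) is prepended to the number each machine generates. The number is guaranteed to be unique because the machine identifiers are unique.

If you need more room for more machines, you can drop the millennium from the year and add a digit to the machine ID: YYYDOYDDDNNN.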
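A minimal sketch of the YYYYDOY + DD + NNN scheme follows. The class name `DailySequenceGenerator` is hypothetical, and the current time is passed in as a parameter purely to keep the example easy to check. The counter lives in memory only, so a real implementation would need to persist the day and counter across restarts.

```csharp
using System;

// Hypothetical sketch of the YYYYDOY + DD + NNN layout.
class DailySequenceGenerator
{
    readonly int machineId;                   // 0..99, assumed unique per machine
    DateTime currentDay = DateTime.MinValue;  // day the counter belongs to
    int counter;                              // IDs issued so far on currentDay

    public DailySequenceGenerator(int machineId)
    {
        if (machineId < 0 || machineId > 99)
            throw new ArgumentOutOfRangeException(nameof(machineId));
        this.machineId = machineId;
    }

    public string Generate(DateTime now)
    {
        // Reset the counter whenever the calendar day changes.
        if (now.Date != currentDay)
        {
            currentDay = now.Date;
            counter = 0;
        }
        if (counter > 999)
            throw new InvalidOperationException("More than 1000 IDs requested in one day.");

        string id = now.Year.ToString("D4")       // YYYY
                  + now.DayOfYear.ToString("D3")  // DOY, 001..366
                  + machineId.ToString("D2")      // DD
                  + counter.ToString("D3");       // NNN
        counter++;
        return id;
    }
}
```

For machine 7 on 24 December 2009 (day 358 of that year), the first two calls return 200935807000 and 200935807001. Note that this sketch would reissue IDs if the system clock were set back to an earlier day, which is one reason to persist the last issued ID.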
