Idiomatic "guaranteed-unique" identifiers in C++

Posted on 2024-11-01 09:44:53

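Is there an idiomatic C++ way to reserve and recycle identifiers that are guaranteed to be unique? My requirements are:

Assuming there exists an ID that is not currently reserved, reserve_id(void) returns that ID.
In an unbroken sequence of reserve_id() calls, no single identifier is returned twice.
There exists a function recycle(id_type) that returns an identifier to the available pool.

I have, for instance, seen Boost::Uuid, but a) I see no documentation that asserts the guaranteed uniqueness of two UUIDs, and b) I am constrained to an earlier version of Boost (1.40) for the time being. I could push to upgrade if it were particularly well suited to the task.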

梦在夏天 2024-11-08 09:44:53

I think you've already solved this problem for most practical purposes by finding Boost::Uuid, with the exception of your requirement to recycle identifiers already generated.

From the documentation you linked to in the question:

"When UUIDs are generated by one of the defined mechanisms, they are either guaranteed to be unique, different from all other generated UUIDs (that is, it has never been generated before and it will never be generated again), or it is extremely likely to be unique (depending on the mechanism)."

If you're hell-bent on recycling and re-using existing identifiers, I suppose you could build up a pool of UUIDs over time, generating a new one only when you need one and find that the pool is empty. But I can't imagine a scenario where that would be preferable to generating a new UUID.

EDIT: You've commented that you need a guarantee as to uniqueness. Realistically, you're never going to get one when programmatically generating a unique identifier. In practice, you're going to store the generated ID in a data type which has finite size, and so the possible set of IDs you can generate is finite too. IMHO, the best you can achieve then is to simulate uniqueness within a tolerance threshold.

You could do this by

Using a technique that makes the chance of getting a duplicate UUID very remote (this is what Boost::UUID will do);
Wrapping the generation of the highly-likely-to-be-unique UUID in some other logic that looks up the newly generated UUID in a list of already-generated UUIDs, to eliminate the tiny chance that the new one is a duplicate. Obviously, this becomes less practical as the list approaches very large quantities of UUIDs. How many do you anticipate generating?
If you want truly huge quantities of unique IDs, bigger than would fit in a native type, you could implement a type that manages the memory and does the necessary maths, and just produce sequential IDs, or you could perhaps use something like the GNU Bignum Library to do it for you.
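
The second option (look up each new candidate among the IDs already handed out) could be sketched roughly as follows. This is only an illustration under some assumptions: CheckedIdPool is a hypothetical name, and std::mt19937_64 stands in for a real UUID generator such as Boost.Uuid, so the IDs here are 64-bit random numbers rather than UUIDs. The sketch is not thread-safe.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <stdexcept>
#include <unordered_set>

// Hypothetical helper (not part of Boost): hands out random 64-bit IDs and
// checks every candidate against the set of IDs that are currently reserved.
class CheckedIdPool
{
public:
    typedef std::uint64_t id_type;

    CheckedIdPool() : m_rng(std::random_device()()) {}

    // Draws candidates until one is found that is not currently reserved.
    id_type reserve_id()
    {
        for (;;)
        {
            id_type candidate = m_rng();
            // insert() reports whether the value was newly added to the set.
            if (m_live.insert(candidate).second)
                return candidate;
        }
    }

    // Returns an ID to the available pool; rejects IDs that are not reserved.
    void recycle(id_type id)
    {
        if (m_live.erase(id) == 0)
            throw std::invalid_argument("ID is not currently reserved.");
    }

    // Number of IDs currently reserved.
    std::size_t size() const { return m_live.size(); }

private:
    std::mt19937_64 m_rng;
    std::unordered_set<id_type> m_live;
};
```

The lookup makes uniqueness among live IDs certain rather than merely probable, at the cost of memory proportional to the number of reserved IDs, which is the practicality concern raised above.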

笑咖 2024-11-08 09:44:53

What sort of uniqueness do you require?
Just unique for the lifetime of the program or unique across multiple runs/cross-process?

If it is the former, then you could just allocate a byte of memory with new and use the address of that memory as your identifier. This is guaranteed to be unique until you delete the memory, at which point the address may be recycled.

This could easily be wrapped in a class like this:

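#include <stdint.h>

class UID
{
public:
        // uintptr_t can hold a pointer value; uint64_t is not guaranteed to.
        typedef uintptr_t id_type;

        // Allocates one byte and uses its address as the identifier.
        static id_type reserve_id()
        {
                uint8_t* idBlock = new uint8_t;
                *idBlock = validId;
                return reinterpret_cast<id_type>(idBlock);
        }

        // Frees the byte so that its address may be handed out again.
        // The marker check only catches some misuse: passing an ID that is
        // not currently reserved reads freed memory, which is undefined.
        static void recycle(id_type id)
        {
                uint8_t* idBlock = reinterpret_cast<uint8_t*>(id);
                if (*idBlock == validId)
                {
                        *idBlock = 0;
                        delete idBlock;
                }
        }
private:
        static const uint8_t validId = 0x1D;
};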

Possibly a bit unusual, but it meets your requirements if you only need per-process uniqueness :)

匿名。 2024-11-08 09:44:53

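How long are the IDs supposed to be valid? Do you really need to recycle them, or can you live with them being unique forever? How many do you need to generate at once? How many bits can you devote to the ID?

Here is a simple recipe: take your Ethernet card's MAC address (which is globally unique, barring hardware problems), mix in the time/date (to millisecond resolution) and an incrementing integer counter (incremented once per generated ID), and you have an ID that is unique within the span of your time/date range, as long as you don't generate MAXINT of them in a single millisecond on this machine. It is not random-looking and it is easy for an attacker to predict, so don't use it for security, and it is certainly not the most efficient use of bits, but it is globally unique.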
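
A minimal sketch of the recipe above, under several assumptions: CompositeId and CompositeIdGenerator are hypothetical names, the field layout is purely illustrative, and the 48-bit node value is passed in by the caller because reading the real MAC address is platform-specific. The generator is not thread-safe, and two generators created with the same node value could collide.

```cpp
#include <chrono>
#include <cstdint>

// Illustrative layout (an assumption, not a standard format).
struct CompositeId
{
    std::uint64_t node;     // e.g. the 48-bit MAC address, supplied by the caller
    std::uint64_t millis;   // milliseconds since the system_clock epoch
    std::uint32_t counter;  // incremented once per generated ID
};

inline bool operator==(const CompositeId& a, const CompositeId& b)
{
    return a.node == b.node && a.millis == b.millis && a.counter == b.counter;
}

class CompositeIdGenerator
{
public:
    // Only the low 48 bits of node are kept, matching the size of a MAC address.
    explicit CompositeIdGenerator(std::uint64_t node)
        : m_node(node & 0xFFFFFFFFFFFFULL)
        , m_counter(0)
    {
    }

    // Combines the node, the current time in milliseconds and the counter.
    CompositeId next()
    {
        using namespace std::chrono;
        const std::uint64_t now = static_cast<std::uint64_t>(
            duration_cast<milliseconds>(
                system_clock::now().time_since_epoch()).count());
        CompositeId id = { m_node, now, m_counter++ };
        return id;
    }

private:
    std::uint64_t m_node;
    std::uint32_t m_counter;
};
```

Because the counter is never reset, two IDs from the same generator can only coincide if the 32-bit counter wraps around within a single millisecond, which mirrors the MAXINT caveat in the recipe.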

不喜欢何必死缠烂打 2024-11-08 09:44:53

Yes, it's simple.

The reserve_id function is operator new(0).
This allocates zero bytes, but returns a unique address.
The recycle function is, of course, operator delete.
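
Spelled out, that idea might look like the sketch below. It assumes the platform provides std::uintptr_t (an optional but widely available type) and, like the other address-based approach in this thread, it only guarantees uniqueness among IDs that are currently reserved within one process.

```cpp
#include <cstdint>
#include <new>

typedef std::uintptr_t id_type;

// Reserves an ID by allocating a zero-sized block; its address is the ID.
// Throws std::bad_alloc if the allocation fails.
inline id_type reserve_id()
{
    return reinterpret_cast<id_type>(::operator new(0));
}

// Recycles an ID obtained from reserve_id(). Each ID must be recycled at
// most once; recycling any other value is undefined behaviour.
inline void recycle(id_type id)
{
    ::operator delete(reinterpret_cast<void*>(id));
}
```

A successful zero-sized allocation returns a non-null pointer that differs from every other pointer still allocated, so no two live IDs are equal; once an ID is recycled, the allocator is free to hand the same address out again.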

娇柔作态 2024-11-08 09:44:53

The problem does not seem connected to C++; it is more of a fundamental issue. How many IDs are expected to be valid at any given time? If you expect to have few valid IDs at any given time, just put them in a container such as a linked list, vector or set, depending on your performance requirements and the relative recycle/reserve frequency. A sorted linked list is probably the best option, as you will have both recycle and reserve operations in O(n). A vector has O(n), O(n log n) and a set has O(n log n), O(n) respectively (might be wrong, I did the thinking very quickly).

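#include <cstdlib>
#include <set>

typedef unsigned int ID;
std::set<ID> container;   // IDs that are currently reserved

void recycle(ID id) {
    // abort if unsuccessful (= invalid ID)
    if (container.erase(id) == 0) {
        std::abort();
    }
}

// Assumes at least one ID is free; otherwise the loop never ends.
ID reserve() {
    static ID last = 0;
    while (container.count(last) != 0) {
        last++;
    }
    container.insert(last);   // mark the ID as reserved
    return last;
}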

ま昔日黯然 2024-11-08 09:44:53

Here's a simplified implementation I use in .NET projects, quickly translated to C++ (may want to add some bounds validation which was implicit in C#):

Can rent (T acquire()) and return (void release(T)) successive integral numbers, with an optional start (offset). You can use these as an index or seed to more complex identifiers/structures if needed.
Does not require iterations, instead using offset logic in a second index array (duplicating memory requirements).
Released numbers are returned on next acquire.
Some extra convenience methods like getting the total numbers rented (T size()) or validating them (bool validate(T)).

#include <stdexcept>
#include <vector>

template <class T>
class NumberPool
{
public:
    NumberPool(const T capacity, const T offset = 0)
        : m_capacity{ capacity }
        , m_offset{ offset }
    {
        m_indices.resize(m_capacity);
        m_numbers.resize(m_capacity);
        clear();
    }

    T acquire()
    {
        if (m_size == m_capacity)
            throw std::out_of_range("No more numbers available.");

        T i = m_size++;
        T n = m_numbers[i];
        m_indices[n] = i;

        return n + m_offset;
    }

    T capacity() const
    {
        return m_capacity;
    }

    void clear()
    {
        m_size = 0;
        for (T i = 0; i < m_capacity; ++i)
            m_numbers[i] = i;
    }

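    void release(T n)
    {
        // Reject numbers outside the pool's range before touching the arrays.
        if (n < m_offset || n - m_offset >= m_capacity)
            throw std::out_of_range("Number does not belong to this pool.");
        n -= m_offset;

        // Reject numbers that are not currently acquired (e.g. a double release).
        T i = m_indices[n];
        if (i >= m_size || m_numbers[i] != n)
            throw std::invalid_argument("Number is not currently acquired.");

        // Swap the released number with the last acquired one and shrink.
        T last = m_numbers[--m_size];
        m_indices[last] = i;
        m_numbers[m_size] = n;
        m_numbers[i] = last;
    }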

    T size() const
    {
        return m_size;
    }

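    bool validate(T n) const
    {
        // Numbers outside [offset, offset + capacity) were never issued here.
        if (n < m_offset || n - m_offset >= m_capacity)
            return false;
        n -= m_offset;

        T i = m_indices[n];
        return i < m_size && m_numbers[i] == n;
    }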

private:
    std::vector<T> m_indices;
    std::vector<T> m_numbers;
    T m_capacity;
    T m_size;
    T m_offset;
};

Can be used as follows (feel free to make proper asserts in a test framework):

NumberPool<uint16_t> pool{ 5, 1000 };
uint16_t n1000 = pool.acquire();
uint16_t n1001 = pool.acquire();
uint16_t n1002 = pool.acquire();
uint16_t size3 = pool.size();

bool n1002true = pool.validate(n1002);
pool.release(n1002);
bool n1002false = pool.validate(n1002);
uint16_t size2 = pool.size();

uint16_t n1002b = pool.acquire();
uint16_t n1003 = pool.acquire();
uint16_t n1004 = pool.acquire();
uint16_t size5 = pool.size();
bool n1004true = pool.validate(n1004);

pool.clear();
uint16_t size0 = pool.size();

bool n1000false = pool.validate(n1000);
bool n1004false = pool.validate(n1004);

uint16_t n1000b = pool.acquire();
bool n1000btrue = pool.validate(n1000b);
