非常基本的基数排序

发布于 2024-10-17 23:23:42 字数 3135 浏览 2 评论 0原文

我刚刚写了一个简单的迭代基数排序，我想知道我的想法是否正确。
递归实现似乎更为常见。

我正在对 4 字节整数进行排序（为了简单起见，无符号）。
我使用 1 字节作为“数字”。所以我有 2^8=256 个桶。
我首先对最高有效数字 (MSD) 进行排序。
每次排序后，我按照它们在存储桶中存在的顺序将它们放回到数组中，然后执行下一次排序。
所以我最终做了 4 个桶排序。
它似乎适用于一小部分数据。因为我正在做 MSD，所以我猜这并不稳定，并且可能因不同的数据而失败。

我错过了什么重要的事情吗？

#include <iostream>
#include <vector>
#include <list>

using namespace std;

void radix(vector<unsigned>&);
void print(const vector<list<unsigned> >& listBuckets);
unsigned getMaxForBytes(unsigned bytes);
void merge(vector<unsigned>& data, vector<list<unsigned> >& listBuckets);

int main()
{
    unsigned d[] = {5,3,6,9,2,11,9, 65534, 4,10,17,13, 268435455, 4294967294,4294967293, 268435454,65537};
    vector<unsigned> v(d,d+17);

    radix(v);
    return 0;
}

void radix(vector<unsigned>& data)
{
    int bytes = 1;                                  //  How many bytes to compare at a time
    unsigned numOfBuckets = getMaxForBytes(bytes) + 1;
    cout << "Numbuckets" << numOfBuckets << endl;
    int chunks = sizeof(unsigned) / bytes;

    for(int i = chunks - 1; i >= 0; --i) 
    {
        vector<list<unsigned> > buckets;            // lazy, wasteful allocation
        buckets.resize(numOfBuckets);

        unsigned mask = getMaxForBytes(bytes);
        unsigned shift = i * bytes * 8;
        mask = mask << shift;

        for(unsigned j = 0; j < data.size(); ++j)
        {
            unsigned bucket = data[j] & mask;       //  isolate bits of current chunk
            bucket = bucket >> shift;               //  bring bits down to least significant

            buckets[bucket].push_back(data[j]); 
        }

        print(buckets);

        merge(data,buckets);
    }
}

unsigned getMaxForBytes(unsigned bytes)
{
    unsigned max = 0;
    for(unsigned i = 1; i <= bytes; ++i)
    {
        max = max << 8;
        max |= 0xFF;
    }

    return max;
}

void merge(vector<unsigned>& data, vector<list<unsigned> >& listBuckets)
{
    int index = 0;
    for(unsigned i = 0; i < listBuckets.size(); ++i)
    {
        list<unsigned>& list = listBuckets[i];
        std::list<unsigned>::const_iterator it = list.begin();

        for(; it != list.end(); ++it)
        {
            data[index] = *it;
            ++index;
        }
    }
}

void print(const vector<list<unsigned> >& listBuckets)
{
    cout << "Printing listBuckets: " << endl;
    for(unsigned i = 0; i < listBuckets.size(); ++i)
    {
        const list<unsigned>& list = listBuckets[i];

        if(list.size() == 0) continue;

        std::list<unsigned>::const_iterator it = list.begin();  //  Why do I need std here!?
        for(; it != list.end(); ++it)
        {
            cout << *it << ", ";
        }

        cout << endl;
    }
}

更新：
似乎以 LSD 形式运行良好，可以通过更改基数中的块循环来修改它，如下所示：

for(int i = chunks - 1; i >= 0; --i)

原文

I just wrote a simple iterative radix sort and I'm wondering if I have the right idea.
Recursive implementations seem to be much more common.

I am sorting 4-byte integers (unsigned to keep it simple).
I am using 1-byte as the 'digit'. So I have 2^8=256 buckets.
I am sorting the most significant digit (MSD) first.
After each sort I put them back into array in the order they exist in buckets and then perform the next sort.
So I end up doing 4 bucket sorts.
It seems to work for a small set of data. Since I am doing it MSD I'm guessing that's not stable and may fail with different data.

Did I miss anything major?

#include <iostream>
#include <vector>
#include <list>

using namespace std;

void radix(vector<unsigned>&);
void print(const vector<list<unsigned> >& listBuckets);
unsigned getMaxForBytes(unsigned bytes);
void merge(vector<unsigned>& data, vector<list<unsigned> >& listBuckets);

int main()
{
    unsigned d[] = {5,3,6,9,2,11,9, 65534, 4,10,17,13, 268435455, 4294967294,4294967293, 268435454,65537};
    vector<unsigned> v(d,d+17);

    radix(v);
    return 0;
}

void radix(vector<unsigned>& data)
{
    int bytes = 1;                                  //  How many bytes to compare at a time
    unsigned numOfBuckets = getMaxForBytes(bytes) + 1;
    cout << "Numbuckets" << numOfBuckets << endl;
    int chunks = sizeof(unsigned) / bytes;

    for(int i = chunks - 1; i >= 0; --i) 
    {
        vector<list<unsigned> > buckets;            // lazy, wasteful allocation
        buckets.resize(numOfBuckets);

        unsigned mask = getMaxForBytes(bytes);
        unsigned shift = i * bytes * 8;
        mask = mask << shift;

        for(unsigned j = 0; j < data.size(); ++j)
        {
            unsigned bucket = data[j] & mask;       //  isolate bits of current chunk
            bucket = bucket >> shift;               //  bring bits down to least significant

            buckets[bucket].push_back(data[j]); 
        }

        print(buckets);

        merge(data,buckets);
    }
}

unsigned getMaxForBytes(unsigned bytes)
{
    unsigned max = 0;
    for(unsigned i = 1; i <= bytes; ++i)
    {
        max = max << 8;
        max |= 0xFF;
    }

    return max;
}

void merge(vector<unsigned>& data, vector<list<unsigned> >& listBuckets)
{
    int index = 0;
    for(unsigned i = 0; i < listBuckets.size(); ++i)
    {
        list<unsigned>& list = listBuckets[i];
        std::list<unsigned>::const_iterator it = list.begin();

        for(; it != list.end(); ++it)
        {
            data[index] = *it;
            ++index;
        }
    }
}

void print(const vector<list<unsigned> >& listBuckets)
{
    cout << "Printing listBuckets: " << endl;
    for(unsigned i = 0; i < listBuckets.size(); ++i)
    {
        const list<unsigned>& list = listBuckets[i];

        if(list.size() == 0) continue;

        std::list<unsigned>::const_iterator it = list.begin();  //  Why do I need std here!?
        for(; it != list.end(); ++it)
        {
            cout << *it << ", ";
        }

        cout << endl;
    }
}

Update:
Seems to work well in LSD form which it can be modified by changing the the chunk loop in radix as follows:

for(int i = chunks - 1; i >= 0; --i)

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

世态炎凉 2024-10-24 23:23:42

让我们看一下使用两位十进制数的示例：

49, 25, 19, 27, 87, 67, 22, 90, 47, 91

按第一位数字排序会产生

19, 25, 27, 22, 49, 47, 67, 87, 90, 91

接下来，按第二位数字排序会产生

90, 91, 22, 25, 27, 47, 67, 87, 19, 49

似乎是错误的，不是吗？或者这不是你正在做的事情？如果我弄错了，也许你可以向我们展示代码。

如果您对具有相同第一位数字的所有组进行第二次存储桶排序，则您的算法将相当于递归版本。也会很稳定。唯一的区别是您将按广度优先而不是深度优先进行存储桶排序。

Let's look at en example with two-digit decimal numbers:

49, 25, 19, 27, 87, 67, 22, 90, 47, 91

Sorting by the first digit yields

19, 25, 27, 22, 49, 47, 67, 87, 90, 91

Next, you sort by the second digit, yielding

90, 91, 22, 25, 27, 47, 67, 87, 19, 49

Seems wrong, doesn't it? Or isn't this what you are doing? Maybe you can show us the code if I got you wrong.

If you are doing the second bucket sort on all groups with the same first digit(s), your algorithm would be equivalent to the recursive version. It would be stable as well. The only difference is that you'd do the bucket sorts breadth-first instead of depth-first.

回复收藏 0 原文

前事休说 2024-10-24 23:23:42

您还需要确保在重新组装之前将每个桶从 MSD 排序到 LSD。
例子：
19,76,90,34,84,12,72,38
在 MSD 上分为 10 个桶 [0-9]
B0=[];B1=[19,12];B2=[];B3=[34,38];B4=[];B5=[];B6=[];B7=[76,72];B8 =[84];B9=[90];
如果你要重新组装然后再次排序，那就行不通了。相反，对每个存储桶进行递归排序。
B1排序为B1B2=[12];B1B9=[19]
一旦所有内容都排序完毕，您就可以正确地重新组装。