使用 IComparer 进行随机播放

发布于 2024-07-13 17:21:51 字数 1976 浏览 6 评论 0原文

首先，我确实了解费舍尔-耶茨洗牌。但为了论证起见，我想允许用户从下拉列表中选择排序选项。该列表将包括“随机”选项。根据他们的选择结果，我只想用 IComparer 实例替换我的排序。 IComparer 会是什么样子？

谷歌提出了大量有缺陷的结果，全部采用这种形式：

public class NaiveRandomizer<T> : IComparer<T>
{
    private static Random rand = new Random();

    public int Compare(T x, T y)
    {
        return (x.Equals(y))?0:rand.Next(-1, 2);
    }
}

然而，这种实现是有偏见的，甚至在某些情况下会抛出异常。可以使用以下代码来演示这种偏差：

void Test()
{
    Console.WriteLine("NaiveRandomizer Test:");
    var data = new List<int>() {1,2,3};
    var sortCounts = new Dictionary<string, int>(6);
    var randomly = new NaiveRandomizer<int>();

    for (int i=0;i<10000;i++)
    {   //always start with same list, in _the same order_.
        var dataCopy = new List<int>(data); 
        dataCopy.Sort(randomly);

        var key = WriteList(dataCopy);
        if (sortCounts.ContainsKey(key))
            sortCounts[key]++;
        else
            sortCounts.Add(key, 1);
    }

    foreach (KeyValuePair<string, int> item in sortCounts)
        Console.WriteLine(item.Key + "\t" + item.Value);
}

string WriteList<T>(List<T> list)
{
   string delim = "";
   string result = "";
   foreach(T item in list)
   {
       result += delim + item.ToString();
       delim = ", ";
   }
   return result;
}

那么如何实现随机 IComparer 来解决这些问题呢？允许要求每次调用 .Sort() 使用单独的 IComparer 实例，因为我没有看到任何其他方法可以做到这一点：项目必须进行比较使用一些其他真正随机的值，但该值对于给定排序操作中的项目也必须一致。

我有一个开始这里，但它是仓促发布的，极其慢，甚至没有返回所有可能的排序（测试表明，如果你不计算的话，它至少消除了偏见缺少的选项）。我不期望像 Fisher-Yates 那样具有 O(n) 性能，但我确实想要一些合理的东西（n log n 对于较小的 n），并且我确实希望它显示所有可能的排序。不幸的是，该链接是该问题当前接受的答案，因此我希望能够用更好的东西替换它。

如果不出意外的话，我希望这能吸引所有那些寻找 IComparable 解决方案的谷歌查询——他们最终会在这里而不是在其他地方告诉他们使用不正确的版本。

原文

First of all, I do know about the Fisher-Yates shuffle. But lets say for arguments sake that I want to allow the user to pick a sort option from a Dropdown list. This list would include a "Random" option. Based on the result of their selection I just want to substitute in an IComparer instance for my sort. What would the IComparer look like?

Google brings up a plethora of flawed results that all take this form:

public class NaiveRandomizer<T> : IComparer<T>
{
    private static Random rand = new Random();

    public int Compare(T x, T y)
    {
        return (x.Equals(y))?0:rand.Next(-1, 2);
    }
}

However, that implementation is biased and will even throw an exception in some circumstances. The bias can be demonstrated with the following code:

void Test()
{
    Console.WriteLine("NaiveRandomizer Test:");
    var data = new List<int>() {1,2,3};
    var sortCounts = new Dictionary<string, int>(6);
    var randomly = new NaiveRandomizer<int>();

    for (int i=0;i<10000;i++)
    {   //always start with same list, in _the same order_.
        var dataCopy = new List<int>(data); 
        dataCopy.Sort(randomly);

        var key = WriteList(dataCopy);
        if (sortCounts.ContainsKey(key))
            sortCounts[key]++;
        else
            sortCounts.Add(key, 1);
    }

    foreach (KeyValuePair<string, int> item in sortCounts)
        Console.WriteLine(item.Key + "\t" + item.Value);
}

string WriteList<T>(List<T> list)
{
   string delim = "";
   string result = "";
   foreach(T item in list)
   {
       result += delim + item.ToString();
       delim = ", ";
   }
   return result;
}

So how could you implement a random IComparer<T> that solved those issues? It is allowed to require each call to .Sort() to use a separate IComparer instance, as I don't see any other way to do this: items must be compared using some other, truly random value, but that value must also be consistent for an item within a given sort operation.

I have a start here, but it was posted in haste, is extremely slow, and doesn't even return all possible sorts (testing shows that it does at least eliminate bias, if you don't count the missing options). I don't expect O(n) performance like Fisher-Yates, but I do want something reasonable (n log n for a small-ish n), and I do expect it to show all possible sorts. Unfortunately, that link is the current accepted answer for it's question and so I'm hoping to be able to replace it with something a little better.

If nothing else, I want this to be a magnet for all those google queries looking for an IComparable solution- that they'll end up here instead of somewhere else telling them to use the incorrect version.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

天荒地未老 2024-07-20 17:21:51

我对此线程感到有些惊讶发布了多少个错误答案。只是为了其他提出与 OP 发布的解决方案类似的解决方案的人，以下代码看起来是正确的：

int[] nums = new int[1000];
for (int i = 0; i < nums.Length; i++)
{
    nums[i] = i;
}

Random r = new Random();
Array.Sort<int>(nums, (x, y) => r.Next(-1, 2));

foreach(var num in nums)
{
    Console.Write("{0} ", num);
}

但是，代码偶尔会抛出异常，但并非总是如此。这就是调试变得有趣的原因:)如果您运行它足够多次，或者在循环中执行排序过程 50 次左右，您将收到一条错误消息：

IComparer（或其依赖的 IComparable 方法）当 Array.Sort 调用 x 时没有返回零。比较 (x)。 x: '0' x's type: 'Int32' IComparer: ''.

换句话说，快速排序将某个数字 x 与其自身进行比较并得到非零结果。代码的明显解决方案是这样写：

Array.Sort<int>(nums, (x, y) =>
    {
        if (x == y) return 0;
        else return r.NextDouble() < 0.5 ? 1 : -1;
    });

但即使这样也不起作用，因为有时 .NET 会比较 3 个数字，而这些数字会返回不一致的结果，例如 A > > 。 B、B＞ C，并且C＞1。一个（哎呀！）。无论您使用 Guid、GetHashCode 还是任何其他随机生成的输入，像上面所示的解决方案仍然是错误的。

话虽如此，Fisher-Yates 是洗牌数组的标准方法，因此没有真正的理由首先使用 IComparer。 Fisher-Yates 的复杂度为 O(n)，而使用 IComparer 的任何实现都在幕后使用快速排序，其时间复杂度为 O(n log n)。没有充分的理由不使用众所周知的、高效的、标准的算法来解决此类问题。

但是，如果您确实坚持使用 IComparer 和 rand，请在排序之前应用随机数据。这需要将数据投影到另一个对象上，这样你就不会丢失随机数据：

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
    class Pair<T, U>
    {
        public T Item1 { get; private set; }
        public U Item2 { get; private set; }
        public Pair(T item1, U item2)
        {
            this.Item1 = item1;
            this.Item2 = item2;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            Pair<int, double>[] nums = new Pair<int, double>[1000];
            Random r = new Random();
            for (int i = 0; i < nums.Length; i++)
            {
                nums[i] = new Pair<int, double>(i, r.NextDouble());
            }

            Array.Sort<Pair<int, double>>(nums, (x, y) => x.Item2.CompareTo(y.Item2));

            foreach (var item in nums)
            {
                Console.Write("{0} ", item.Item1);
            }

            Console.ReadKey(true);
        }
    }
}

或者用你的坏自我获取 LINQy：

Random r = new Random();
var nums = from x in Enumerable.Range(0, 1000)
           orderby r.NextDouble()
           select x;

I was somewhat surprised in this thread how many wrong answers were posted. Just for the sake of others who come up with a solution similar to the one posted by the OP, the following code looks correct:

int[] nums = new int[1000];
for (int i = 0; i < nums.Length; i++)
{
    nums[i] = i;
}

Random r = new Random();
Array.Sort<int>(nums, (x, y) => r.Next(-1, 2));

foreach(var num in nums)
{
    Console.Write("{0} ", num);
}

However, the code will throw an exception occasionally, but not always. That's what makes it fun to debug :) If you run it enough times, or execute the sort procedure in a loop 50 or so times, you'll get an error stating:

IComparer (or the IComparable methods it relies upon) did not return zero when Array.Sort called x. CompareTo(x). x: '0' x's type: 'Int32' The IComparer: ''.

In other words, the quick sort compared some number x to itself and got a non-zero result. The obvious solution to the code would be write:

Array.Sort<int>(nums, (x, y) =>
    {
        if (x == y) return 0;
        else return r.NextDouble() < 0.5 ? 1 : -1;
    });

But even this doesn't work, because there are occasions where .NET compares 3 numbers against one another which return inconsistent results, such as A > B, B > C, and C > A (oops!). No matter if you use a Guid, GetHashCode, or any other randomly generated input, a solution like the one shown above is still wrong.

With that being said, Fisher-Yates is the standard way of shuffling arrays, so there's no real reason to use IComparer in the first place. Fisher-Yates is O(n) whereas any implementation using IComparer uses a quicksort behinds the scenes which has a time-complexity of O(n log n). There's just no good reason not to use the well-known, efficient, standard algorithm to solve this kind of problem.

However, if you really insist on using an IComparer and a rand, then apply your random data before you sort. This requires a projection of the data onto another object so you don't lose your random data:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
    class Pair<T, U>
    {
        public T Item1 { get; private set; }
        public U Item2 { get; private set; }
        public Pair(T item1, U item2)
        {
            this.Item1 = item1;
            this.Item2 = item2;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            Pair<int, double>[] nums = new Pair<int, double>[1000];
            Random r = new Random();
            for (int i = 0; i < nums.Length; i++)
            {
                nums[i] = new Pair<int, double>(i, r.NextDouble());
            }

            Array.Sort<Pair<int, double>>(nums, (x, y) => x.Item2.CompareTo(y.Item2));

            foreach (var item in nums)
            {
                Console.Write("{0} ", item.Item1);
            }

            Console.ReadKey(true);
        }
    }
}

Or get LINQy with your bad self:

Random r = new Random();
var nums = from x in Enumerable.Range(0, 1000)
           orderby r.NextDouble()
           select x;

回复收藏 0 原文

清浅ˋ旧时光 2024-07-20 17:21:51

我在其他地方得到的一个建议是创建一个单独的 IArranger 界面，该界面描述排列集合的单个操作。这可以在 IComparer/IComparable 无法工作的情况下工作，因为它对整个集合而不是单个项目进行操作。它可能看起来像这样：

public interface IArranger<T>
{
    IEnumerable<T> Arrange(IEnumerable<T> items);
}

然后我可以使用适当的 Fisher-Yates 算法从 IArranger 接口实现 Shuffle，并且还可以实现包装每个附加的 IEnumerable.Sort()/IComparable /IComparer 我关心的品种。这可能看起来像这样：

public class ComparerArranger<T> : IArranger<T>
{
    private IComparer<T> comparer;

    public ComparableArranger(IComparer<T> comparer)
    {
        this.comparer = comparer;
    }

    public IEnumerable<T> Arrange(IEnumerable<T> items)
    {
       return items.OrderBy(i => i, comparer);
    }
}

或者

//uses the default Comparer for the type (Comparer<T>.Default)
public class TypeArranger<T> : IArranger<T> 
{
    public IEnumerable<T> Arrange(IEnumerable<T> items)
    {
       return items.OrderBy(i => i);
    }
}

或者

public class ShuffleArranger<T> : IArranger<T>
{
    //naive implementation for demonstration
    // if I ever develop this more completely I would try to
    // avoid needing to call .ToArray() in here
    // and use a better prng
    private Random r = new Random();

    public IEnumerable<T> Arrange(IEnumerable<T> items)
    {
        var values = items.ToArray();

        //valid Fisher-Yates shuffle on the values array
        for (int i = values.Length; i > 1; i--)
        {
            int j = r.Next(i);
            T tmp = values[j];
            values[j] = values[i - 1];
            values[i - 1] = tmp;
        }
        foreach (var item in values) yield return item;
    }
}

对于最后一步，我通过扩展方法向任何 IEnumerable 添加对此的支持。然后，您仍然可以获得简单的运行时算法交换，您可以更好地实现洗牌算法，并且使用它的代码感觉很自然：

public static IEnumerable<T> Arrange(this IEnumerable<T> items, IArranger<T> arranger)
{
    return arranger.Arrange(items);
}

One suggestion I got elsewhere was to create a separate IArranger interface that describes a single operation to Arrange a collection. This can work where IComparer/IComparable cannot because it operates on an entire collection, instead of individual items. It might look something like this:

public interface IArranger<T>
{
    IEnumerable<T> Arrange(IEnumerable<T> items);
}

Then I could implement a Shuffle from the IArranger interface using a proper Fisher-Yates algorithm, and also have implementations that wrap each additional IEnumerable.Sort()/IComparable/IComparer varieties that I care about. That might look something like this:

public class ComparerArranger<T> : IArranger<T>
{
    private IComparer<T> comparer;

    public ComparableArranger(IComparer<T> comparer)
    {
        this.comparer = comparer;
    }

    public IEnumerable<T> Arrange(IEnumerable<T> items)
    {
       return items.OrderBy(i => i, comparer);
    }
}

//uses the default Comparer for the type (Comparer<T>.Default)
public class TypeArranger<T> : IArranger<T> 
{
    public IEnumerable<T> Arrange(IEnumerable<T> items)
    {
       return items.OrderBy(i => i);
    }
}

public class ShuffleArranger<T> : IArranger<T>
{
    //naive implementation for demonstration
    // if I ever develop this more completely I would try to
    // avoid needing to call .ToArray() in here
    // and use a better prng
    private Random r = new Random();

    public IEnumerable<T> Arrange(IEnumerable<T> items)
    {
        var values = items.ToArray();

        //valid Fisher-Yates shuffle on the values array
        for (int i = values.Length; i > 1; i--)
        {
            int j = r.Next(i);
            T tmp = values[j];
            values[j] = values[i - 1];
            values[i - 1] = tmp;
        }
        foreach (var item in values) yield return item;
    }
}

For a final step, I add support for this to any IEnumerable via an extension method. Then you still get the simple run-time algorithm swapping, you have a better implementation of the shuffle algorithm, and the code to use it feels natural:

public static IEnumerable<T> Arrange(this IEnumerable<T> items, IArranger<T> arranger)
{
    return arranger.Arrange(items);
}

回复收藏 0 原文

岛歌少女 2024-07-20 17:21:51

IComparer 在某个点（对于 T 的相同实例）要求零返回，使得在数学上不可能创建一个在统计上模拟 Fisher-Yates Shuffle 的通用 IComparer。总会有偏见。对于真正的洗牌，您永远不想强迫它返回任何特定值。

回复收藏 0 原文

千年*琉璃梦 2024-07-20 17:21:51

基于预分配随机值的隐藏字段进行排序怎么样？

回复收藏 0 原文

初见终念 2024-07-20 17:21:51

跟进 James Curran 的想法：让 IComparer 将“排序”值维护为列表；如果出现新值，则将其插入列表中的随机位置；按列表索引进行比较。通过将列表维护为平衡树或其他东西来进行优化。此类 IComparer 的每个实例都将保持一致且随机的排序顺序，因此您可以选择让随机排序始终保持相同的随机顺序或每次都不同。如果您更喜欢以这种方式“随机”阅读，较小的修改甚至允许将相同的元素“排序”到不同的排序位置。

回复收藏 0 原文

断爱 2024-07-20 17:21:51

一个有趣的尝试。最有可能的是 IComparer 的误用/滥用。

您正在尝试使用并非为此目的构建的机制来进行随机加权排序。

为什么不实现你自己的排序例程和你自己的比较器呢？我有一种感觉，即使这样也还不够。

回复收藏 0 原文

余生再见 2024-07-20 17:21:51

不要这样做。

迄今为止提出的所有算法都会在输出中引入某种偏差（有些比其他更大）。

@Princess 和@Luke 建议在数据旁边存储一个随机数。然而，由于这些随机数中的任何两个都可能具有与另一个相同的值，因此这两个项目之间的排序顺序将存在确定性偏差。

最坏的情况是排序例程“稳定”（即被认为相等的对象总是按照它们输入的顺序输出）。 Array.Sort 碰巧不稳定（它在内部使用 QuickSort），但每当两个项目具有相同的值时，仍然会出现偏差，这取决于它们在输入中的位置（特别是它们相对于 QuickSort 的位置）枢）。

随着该随机数的密钥空间增加，冲突的概率会下降（具有良好的随机性来源），但请记住，随着要排序的值的数量增加，生日悖论表明，其中至少一对碰撞起来速度很快。

对于整数键，该键有 2^32 个唯一值，即使假设随机值完全均匀分布（有 75,000 行），也有 50% 的概率发生冲突。维基百科。

您提出的加密哈希方法可能具有足够大的密钥空间（160）位，使得冲突的可能性可以忽略不计，但是您的算法在实际进行比较之前将所有随机性分解回单个整数，这否定了更大的键空间。

最好的方法是将不同的“sortOrder”值与每个数据项相关联，使用经过验证的算法对这些值进行混洗，然后按该值对结果进行排序。

如果您使用 Array.Sort，则会有一个重载，它采用“键”数组和“值”数组。键数组通常是排序的，但每当移动键数组中的值时，值数组中的相应条目也会移动。

就像是：


Something[] data;//populated somewhere
int[] keys = new int[data.Length];//or long if you might have lots of data
for(int i=0;i<keys.Length;++i) {
 keys[i] = i;
}

Shuffle(keys);

Array.Sort(keys, data);

Don't do it.

All of the algorithms proposed thus far introduce some sort of bias into the output (some bigger than others).

@Princess and @Luke propose storing a random number alongside the data. However because there is a possibility that any two of these random numbers could have the same value as another, the sort order between those two items will be deterministically biased

The worst case for this would be if the sorting routine is "stable" (that is that objects that are considered equal are always output in the same order they were input in). Array.Sort doesn't happen to be stable (it uses QuickSort internally) but there is still a bias that occurs whenever two items have the same value that depends on where they are in the input (and specifically where they are relative to the QuickSort's pivot).

As the keyspace for this random number increases, the probability of a collision goes down (with a good source of randomness), but keep in mind that as the number of values you are sorting goes up, the birthday paradox dictates that the likelyhood of at least one pair amongst them colliding goes up very quickly.

For an integer key, there are 2^32 unique values for the key and even assuming that there is a perfectly even distribution of random values, with 75,000 rows, there is a 50% probability that there will be a collision. Wikipedia.

The cryptographic hash approach that you proposed potentially has a large enough keyspace (160) bits to make the chance of a collision negligible, but your algorithm decomposes all of that randomness back down to a single int before actually doing the compare which negates the benefit of that larger keyspace.

Your best approach is to associate a distinct "sortOrder" value with each of your data items shuffle these values using a proven algorithm, and then order the results by that value.

If you are using Array.Sort, there is an overload that takes an array of "keys" and an array of "values". The keys array is sorted normally, but whenever a value in the keys array is moved, the corresponding entry in the values array is also moved.

Something like:


Something[] data;//populated somewhere
int[] keys = new int[data.Length];//or long if you might have lots of data
for(int i=0;i<keys.Length;++i) {
 keys[i] = i;
}

Shuffle(keys);

Array.Sort(keys, data);

回复收藏 0 原文

~没有更多了~