Random number vs. GetHashCode() in CompareTo()?

Posted 2024-12-21 05:01:29

I'm using the Random class in my struct's CompareTo() to pick, with equal probability, one of the structs when both have the same field values. The Random class is instantiated with a fixed seed to get a reproducible sequence of pseudo-random values, to ensure that my program would give the same exact comparison results no matter how many times I run it with the same input.

I'm thinking of replacing the random numbers with a memory reference or GetHashCode() instead. Will doing so guarantee that:

(1) the selection is made with equal probability, and

(2) that I would end up with the same results if I run the program again?

using System;

struct MyStruct : IComparable<MyStruct>
{
    // Shared generator created with a fixed seed so the sequence is reproducible.
    // The seed value 12345 is an assumption; the question does not state it.
    // Note: System.Random is not thread-safe.
    private static readonly Random _random = new Random(12345);

    private readonly float _param1;
    private readonly float _param2;
    private readonly int _randValue; // tie-breaker drawn once, at construction

    public MyStruct(float param1, float param2)
    {
        _param1 = param1;
        _param2 = param2;
        _randValue = _random.Next();
    }

    public int CompareTo(MyStruct other)
    {
        if (_param1 < other._param1)
        {
            return -1;
        }
        else if (_param1 > other._param1)
        {
            return 1;
        }
        else if (_param2 > other._param2)
        {
            return -1;
        }
        else if (_param2 < other._param2)
        {
            return 1;
        }
        // If both params are equal, then select one of the structs with
        // equal probability
        else if (_randValue < other._randValue)
        {
            return -1;
        }
        else if (_randValue > other._randValue)
        {
            return 1;
        }

        return 0;
    }
}

Thanks!

Comments (6)

童话 2024-12-28 05:01:29

I'm using the Random class in my struct's CompareTo() to pick, with equal probability, one of the structs when both have the same field values.

First off, that's a completely bizarre thing to do. That's like saying "When I'm asked to sort a bunch of numbers, and two of them are both 12, I pick one of the 12's at random to be smaller". That doesn't make a lick of sense. Those two twelves are identical. You don't have a way to tell one twelve from another!

Why are you doing this strange thing? If the two values are identical then say they are identical.

Upon reading your code more carefully, I see that you're persisting the random number into the state of the structure. If you want to do this strange thing, that's the right way to go about it.

I originally thought that you were randomizing the comparison operator itself. That is an extremely dangerous thing to do. Sorting algorithms are allowed to depend heavily on the comparison defining a total order. A comparison is required to produce a total ordering that is self-consistent. You must absolutely never say that the first item is bigger than the second, the second is bigger than the third, and the third is bigger than the first. That violates the required transitivity of the comparison, and a sort algorithm is permitted to go into an infinite loop or exhibit any other odd behaviour when given a comparison operation that is ill-behaved.
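To make the danger concrete, here is a minimal C# sketch, assuming a hypothetical comparer (`RandomTieComparer`, not from the question) that flips a coin on every call when two values tie. It checks antisymmetry: `Compare(a, b)` and `Compare(b, a)` must have opposite signs. The coin-flip comparer breaks this, and never even returns 0 for `Compare(12, 12)`; the question's design, which draws the number once and stores it, does not have this problem.

```csharp
using System;
using System.Collections.Generic;

// BAD (hypothetical): when two ints tie, flip a fresh coin on every call.
// Repeated comparisons of the same pair can disagree, and even
// Compare(x, x) is never 0, which breaks the IComparer<T> contract.
public sealed class RandomTieComparer : IComparer<int>
{
    private readonly Random _rng = new Random(1); // arbitrary seed

    public int Compare(int x, int y)
    {
        if (x != y) return x.CompareTo(y);
        return _rng.Next(2) == 0 ? -1 : 1; // coin flip on every call
    }
}

public static class ContractCheck
{
    // Antisymmetry: Compare(a, b) and Compare(b, a) must have opposite signs
    // (or both be 0).
    public static bool IsAntisymmetric(IComparer<int> cmp, int a, int b) =>
        Math.Sign(cmp.Compare(a, b)) == -Math.Sign(cmp.Compare(b, a));

    // Repeats the antisymmetry check and counts how often it fails.
    public static int CountViolations(IComparer<int> cmp, int a, int b, int trials)
    {
        int violations = 0;
        for (int i = 0; i < trials; i++)
        {
            if (!IsAntisymmetric(cmp, a, b)) violations++;
        }
        return violations;
    }
}
```

Running `ContractCheck.CountViolations(new RandomTieComparer(), 12, 12, 1000)` reports violations, while `Comparer<int>.Default` reports none.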

I'm thinking of replacing the random numbers with a memory reference or GetHashCode() instead.

That is an even worse idea. GetHashCode is useful for one thing and one thing only: balancing a hash table. If you're not balancing a hash table and you call GetHashCode you are doing something wrong.

Moreover, think it through. The situation you're in is that two structs otherwise compare as equal. GetHashCode is contractually required to return the same result for any two structs that compare as equal. GetHashCode is explicitly not a source of disambiguation between two identical things! It is in fact the opposite of that.
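A minimal sketch of the point above, assuming a hypothetical `HashTieStruct` that replaces the stored random number with a hash of its own fields. Because the hash is computed from the same fields that already tied, the "tie-breaker" can only ever return 0. (`System.HashCode.Combine` is used here for brevity; its seed is randomized per process, so its values are not stable between runs either.)

```csharp
using System;

// Hypothetical variant of MyStruct that uses a hash of its fields as the tie-breaker.
public readonly struct HashTieStruct : IComparable<HashTieStruct>
{
    public readonly float Param1;
    public readonly float Param2;

    public HashTieStruct(float param1, float param2)
    {
        Param1 = param1;
        Param2 = param2;
    }

    // Equal fields must give equal hashes, which is exactly why this cannot disambiguate.
    public override int GetHashCode() => HashCode.Combine(Param1, Param2);

    public override bool Equals(object obj) =>
        obj is HashTieStruct o && Param1.Equals(o.Param1) && Param2.Equals(o.Param2);

    public int CompareTo(HashTieStruct other)
    {
        int byParam1 = Param1.CompareTo(other.Param1);
        if (byParam1 != 0) return byParam1;

        int byParam2 = other.Param2.CompareTo(Param2); // larger Param2 first, as in the question
        if (byParam2 != 0) return byParam2;

        // "Tie-breaker": identical fields produce identical hashes, so this is always 0.
        return GetHashCode().CompareTo(other.GetHashCode());
    }
}
```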

Will this guarantee that the selection is made with equal probability?

Nope. GetHashCode is not a source of randomness and no guarantees whatsoever are made about the distribution of hash codes.

Will this guarantee that I would end up with the same results if I run the program again?

Absolutely not.
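On the reproducibility question, a small sketch assuming a hypothetical hash-based tie-breaker for strings. On .NET Core and .NET 5+, `string.GetHashCode()` is randomized per process, so which string "wins" a tie can change between runs even though the answer is consistent within a single run.

```csharp
using System;

public static class HashTieBreak
{
    // Hypothetical tie-breaker: orders two strings by their hash codes.
    // Consistent within one process, but on .NET Core / .NET 5+ string hash
    // codes are randomized per process, so the result may differ between runs.
    public static int CompareByHash(string a, string b) =>
        a.GetHashCode().CompareTo(b.GetHashCode());
}
```

To see the effect, print `HashTieBreak.CompareByHash("apple", "banana")` and run the program several times; on .NET Core the sign may change from run to run.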

可是我不能没有你 2024-12-28 05:01:29

Your code isn't as dangerous as some suspect, because you are consistent in the use of the numbers (they're random only on object creation).

What I can't see though, is why on earth this could give any benefit.

Consider the case without _randValue. Say you've one struct (we'll call it x) where _param1 equals 2.0 and _param2 equals .12, and another struct (we'll call it y) where _param1 equals 2.0 and _param2 equals .12.

Well, the only thing that makes x and y different is that you've added a _randValue to them.

Because they're structs, they don't even have a persistent identity across assignments and boxings. If we do MyStruct z = x, we don't have another pointer to x; we have a brand new MyStruct.
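A short sketch of the copy semantics described above, using a hypothetical mutable `ParamPair` struct: assignment copies the fields, and boxing the same value twice yields two unrelated heap objects, so there is no stable reference to hash or compare.

```csharp
using System;

// Hypothetical stand-in for MyStruct's two fields.
public struct ParamPair
{
    public float Param1;
    public float Param2;
}

public static class CopySemantics
{
    // Assignment copies the fields: z is a new, independent value.
    public static (float xAfter, float zAfter) CopyThenMutate()
    {
        var x = new ParamPair { Param1 = 2.0f, Param2 = 0.12f };
        ParamPair z = x;   // a copy, not a second pointer to x
        z.Param1 = 99.0f;  // changes only the copy
        return (x.Param1, z.Param1);
    }

    // Boxing the same value twice allocates two distinct heap objects,
    // so a struct has no stable "memory reference" to use as a tie-breaker.
    public static bool BoxesAreSameObject()
    {
        var x = new ParamPair { Param1 = 2.0f, Param2 = 0.12f };
        object first = x;
        object second = x;
        return ReferenceEquals(first, second);
    }
}
```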

And even besides that, it makes no difference.

The only effects of your changes are:

  1. You've added extra memory usage to all cases of the structure.
  2. You've made sorting more expensive.
  3. You've made construction more expensive.
  4. You've made construction a multi-threading bottleneck, because System.Random is not thread-safe and you have to lock around every call to Random.Next().

None of these are likely to be particularly significant, but premature pessimisation is the root of much weirdness.
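A sketch of what the shared, seeded generator has to look like once several threads construct structs, assuming a hypothetical `SharedRandom` helper with an arbitrary seed. The lock makes the calls safe but serializes them; and because thread scheduling decides which constructor receives which number, a fixed seed alone no longer guarantees identical results from run to run.

```csharp
using System;

// Hypothetical helper: one seeded Random shared by every constructor.
// System.Random is not thread-safe, so each call goes through a lock.
public static class SharedRandom
{
    private static readonly object Gate = new object();
    private static readonly Random Rng = new Random(12345); // arbitrary fixed seed

    public static int Next()
    {
        lock (Gate)
        {
            return Rng.Next(); // serialized: only one thread draws at a time
        }
    }
}
```

Single-threaded, the values match `new Random(12345)` exactly; under concurrency the same values are handed out, but to whichever thread happens to get the lock first.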

紫﹏色ふ单纯 2024-12-28 05:01:29

By "Memory Reference" do you mean the address of the struct? If you want predictability then you can't use memory addresses.

What are you proposing to hash? If you hash properties of the struct that are equal, the hash codes will be equal as well.

I guess I'm confused by 1) why Random is not working for you and 2) why you don't just call two structs with equal values "equal"?
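If the only concern is getting the same output for the same input, a stable sort may be enough, with no tie-breaker at all. A sketch, assuming a hypothetical `Entry` struct with the question's two fields plus a `Label` used only to observe where ties end up: LINQ's `OrderBy`/`ThenByDescending` are documented as stable, so equal elements keep their input order (unlike `List<T>.Sort` and `Array.Sort`, which are unstable).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical struct with the question's two fields and no tie-breaker.
// Label exists only so we can see where tied elements end up.
public readonly struct Entry
{
    public float Param1 { get; }
    public float Param2 { get; }
    public string Label { get; }

    public Entry(float param1, float param2, string label)
    {
        Param1 = param1;
        Param2 = param2;
        Label = label;
    }
}

public static class StableOrdering
{
    // OrderBy/ThenByDescending perform a stable sort: elements whose keys are
    // equal keep their input order, so the same input always yields the same output.
    public static List<Entry> Sort(IEnumerable<Entry> entries) =>
        entries.OrderBy(e => e.Param1)
               .ThenByDescending(e => e.Param2) // larger Param2 first, as in the question
               .ToList();
}
```

Sorting `b1, a, b2` (where `b1` and `b2` tie) gives `a, b1, b2`; sorting `b2, a, b1` gives `a, b2, b1`. Ties are deterministic without being "broken" at all.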

一个人的旅程 2024-12-28 05:01:29

Since the Random class is doing what you want, and you're able to seed it to ensure that you get the same values every time, why do you want to change it?

I'm not entirely sure what you plan to do using a memory reference, but even if you could point at the same address and see the same data every time you run the code, you couldn't guarantee a fair distribution of values in memory unless you had filled it with a random function anyway.

A hashing function should return a fair spread of values, but it's not really the tool for the job: if you want a random number, use a random number generator!
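A minimal sketch of the seeded approach, using only `System.Random`: two generators created with the same seed produce the same sequence. Microsoft's documentation does not guarantee the same sequence for a given seed across major .NET versions, so "reproducible" here means on the same runtime.

```csharp
using System;

public static class SeededSequence
{
    // Draws `count` values from a Random created with the given seed.
    public static int[] Draw(int seed, int count)
    {
        var rng = new Random(seed);
        var values = new int[count];
        for (int i = 0; i < count; i++)
        {
            values[i] = rng.Next();
        }
        return values;
    }
}
```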

So尛奶瓶 2024-12-28 05:01:29

I'd personally prefer just a pure random number, but to answer your points:

  1. Yes, it's a hash algorithm, just like md5 or sha (although this algorithm was not specifically created for the purposes you describe).
  2. Yes, the value will be sustained between program launches (@henk-holterman is correct, but only for strings: string hash codes are the ones not guaranteed to stay the same).
  3. GetHashCode will be way faster.

世界和平 2024-12-28 05:01:29

My reading of your code says that you are using rand as a tie-breaker. I can't see why you would want identical objects differentiated, or even care about the order of identical objects.

e.g. in this list:

 A
 B
 B
 C

why would you care or want to know which instance of B is first?

I would suggest that a better solution would be to add a fine-grained field that makes sense to the user, say a creation date or a last-modified timestamp. You would then have a meaningful tie-breaker; ties could still occur, but I just don't think they would be a problem.
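A sketch of the suggested tie-breaker, assuming a hypothetical `TimestampedStruct` whose creation time is passed in by the caller (for example `DateTime.UtcNow`) rather than drawn from `Random`. Ties on the timestamp still compare as 0, which, as noted above, is usually not a problem.

```csharp
using System;

// Hypothetical variant of MyStruct: a caller-supplied creation time replaces
// the random tie-breaker.
public readonly struct TimestampedStruct : IComparable<TimestampedStruct>
{
    private readonly float _param1;
    private readonly float _param2;
    private readonly DateTime _createdUtc;

    public TimestampedStruct(float param1, float param2, DateTime createdUtc)
    {
        _param1 = param1;
        _param2 = param2;
        _createdUtc = createdUtc;
    }

    public int CompareTo(TimestampedStruct other)
    {
        int byParam1 = _param1.CompareTo(other._param1);
        if (byParam1 != 0) return byParam1;

        int byParam2 = other._param2.CompareTo(_param2); // larger _param2 first
        if (byParam2 != 0) return byParam2;

        // Meaningful tie-breaker: older entries first. Equal timestamps still return 0.
        return _createdUtc.CompareTo(other._createdUtc);
    }
}
```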
