Lexicographical ordering of multiple doubles

Posted 2024-09-09 02:30:42

Consider a class that holds two doubles

class path_cost {
   double length;
   double time;
};

If I want to lexicographically order a list of path_costs, I have a problem. Read on :)

If I use exact equality for the length test, like so

bool operator<(const path_cost& rhs) const {
   if (length == rhs.length) return time < rhs.time;
   return length < rhs.length;
}

the resulting order is likely to be wrong, because a small deviation (e.g. due to numerical inaccuracies in the calculation of the length) may cause the length test to fail, so that e.g.

{ 231.00000000000001, 40 } < { 231.00000000000002, 10 }

erroneously holds.

If I alternatively use a tolerance like so

bool operator<(const path_cost& rhs) const {
   if (std::fabs(length - rhs.length) < 1e-6) return time < rhs.time;
   return length < rhs.length;
}

then the sorting algorithm may fail horribly, since the < operator is no longer transitive (that is, a < b and b < c no longer guarantee a < c).

Any ideas? Solutions? I have thought about partitioning the real line, so that numbers within each partition are considered equal, but that still leaves too many cases where the equality test fails but should not.

(UPDATE by James Curran, hopefully explaining the problem):
Given the numbers:

A = {231.0000012000, 10}
B = {231.0000005000, 40}
C = {231.0000001000, 60}
- A.length and B.length differ by 7e-7, so we use time, and A < B.
- B.length and C.length differ by 4e-7, so we use time, and B < C.
- A.length and C.length differ by 1.1e-6, so we use length, and A > C.
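
The cycle can be checked directly. Below is a minimal sketch, not the poster's actual code: `PathCost`, `tolerant_less` and `check_non_transitive` are illustrative names, the tolerance is assumed to be 1e-6, and the three lengths are chosen so that they differ pairwise by 7e-7, 4e-7 and 1.1e-6.

```cpp
#include <cassert>
#include <cmath>

// Illustrative stand-in for path_cost with public members (assumed name).
struct PathCost {
    double length;
    double time;
};

// Tolerance-based comparison as in the question, with tolerance 1e-6.
bool tolerant_less(const PathCost& a, const PathCost& b) {
    if (std::fabs(a.length - b.length) < 1e-6) return a.time < b.time;
    return a.length < b.length;
}

// Three costs whose lengths differ pairwise by 7e-7, 4e-7 and 1.1e-6.
void check_non_transitive() {
    const PathCost a{231.0000012, 10};
    const PathCost b{231.0000005, 40};
    const PathCost c{231.0000001, 60};
    assert(tolerant_less(a, b));   // within tolerance: times decide, 10 < 40
    assert(tolerant_less(b, c));   // within tolerance: times decide, 40 < 60
    assert(!tolerant_less(a, c));  // outside tolerance: lengths decide, a > c
    assert(tolerant_less(c, a));   // so c < a, contradicting a < b < c
}
```

Since a < b, b < c and c < a all hold at once, the comparison is not a strict weak ordering, which is what `std::sort` requires of its comparator.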

(Update by Esben Mose Hansen)
This is not purely theoretical. The standard sort algorithms tend to crash or worse when given a non-transitive sort operator. And this is exactly what I have been contending with (and boy was that fun to debug ;) )

倾听心声的旋律 2024-09-16 02:30:42

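Do you really want just a compare function?

Why don't you sort by length first, then group the pairs into groups whose lengths you consider the same, and then sort within each group by time?

Once sorted by length, you can apply whatever heuristic you need to determine "equality" of lengths and do the grouping.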
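
One way this two-pass idea could look is sketched below. The grouping heuristic is an assumption made for illustration, not something the answer specifies: a group ends as soon as a length exceeds the group's smallest length by more than `tol`, which caps the width of every group at `tol`. `PathCost` and `sort_grouped` are hypothetical names, and the data is assumed to contain no NaNs.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Illustrative stand-in for path_cost with public members (assumed name).
struct PathCost {
    double length;
    double time;
};

// Sorts by length, then re-sorts by time inside each group of "equal" lengths.
void sort_grouped(std::vector<PathCost>& v, double tol) {
    assert(tol >= 0.0);

    // Pass 1: exact comparison on length is a valid ordering (no NaNs assumed).
    std::sort(v.begin(), v.end(),
              [](const PathCost& a, const PathCost& b) { return a.length < b.length; });

    // Pass 2: walk the sorted range; each group spans at most `tol` in length.
    auto first = v.begin();
    while (first != v.end()) {
        auto last = std::next(first);
        while (last != v.end() && last->length - first->length <= tol) ++last;
        std::sort(first, last,
                  [](const PathCost& a, const PathCost& b) { return a.time < b.time; });
        first = last;
    }
}
```

Both calls to `std::sort` use an exact, transitive comparison, so the tolerance never reaches the comparator. The price is that where the group boundaries fall depends on the data set as a whole, not on any single pair of values.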

野鹿林 2024-09-16 02:30:42

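I don't think you are going to be able to do what you want. Essentially you seem to be saying that in certain cases you want to ignore the fact that a > b and pretend a = b. I'm pretty sure you can construct a proof that if a and b are equivalent whenever their difference is smaller than a certain value, then a and b are equivalent for all values of a and b. Something along the lines of:

For a tolerance C and two numbers A and B where, without loss of generality, A > B, there exist D(n) = B + n*(C/10) for 0 <= n <= (10*(A-B))/C such that, trivially, D(n) is within the tolerance of D(n-1) and D(n+1) and therefore equivalent to them. Also, D(0) is B and D((10*(A-B))/C) = A, so A and B can be said to be equivalent.

I think the only way you can solve that problem is using a partitioning method. Something like multiplying by 10^6 and then converting to an int should partition pretty well, but it means that if you have 1.00001*10^-6 and 0.999999*10^-6 they will come out in different partitions, which may not be desired.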
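
A partition-based comparison could be sketched as follows. This is an illustration under assumptions, not a definitive implementation: `PathCost`, `bucket` and `bucketed_less` are hypothetical names, the scale of 10^6 is taken from the suggestion above, the partition index is kept as a `double` via `std::floor` instead of being converted to an int (to sidestep integer overflow for large lengths), and no NaNs are assumed.

```cpp
#include <cassert>
#include <cmath>
#include <tuple>

// Illustrative stand-in for path_cost with public members (assumed name).
struct PathCost {
    double length;
    double time;
};

// Index of the fixed-width partition containing `length`.
// With scale = 1e6 each partition is 1e-6 wide.
double bucket(double length, double scale = 1e6) {
    assert(scale > 0.0);
    return std::floor(length * scale);
}

// Lexicographic comparison on (partition index, time).
bool bucketed_less(const PathCost& a, const PathCost& b) {
    return std::make_tuple(bucket(a.length), a.time) <
           std::make_tuple(bucket(b.length), b.time);
}
```

Because each value is first mapped to a fixed key and the keys are then compared exactly, the comparison is transitive and safe to hand to `std::sort`. The caveat from the answer remains: 0.999999e-6 and 1.00001e-6 land in partitions 0 and 1 even though they are far closer together than one partition width.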

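The problem then becomes looking at your data to work out how best to partition it, which I can't help with since I don't know anything about your data. :)

P.S. Do the algorithms actually crash whenever they are given such an operator, or only when they encounter specific unsolvable cases?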
