Haskell:“groupBy”的令人惊讶的行为

发布于 2024-08-03 02:09:25 字数 391 浏览 9 评论 0原文

我试图找出库函数 groupBy (来自 Data.List)的行为,它旨在通过作为第一个参数传入的“相等测试”函数对列表的元素进行分组。类型签名表明相等测试只需要具有类型

(a -> a -> Bool)

但是,当我在 GHCi 6.6 中使用 (<) 作为“相等测试”时,结果不是我所期望的:

ghci> groupBy (<) [1, 2, 3, 2, 4, 1, 5, 9]
[[1,2,3,2,4],[1,5,9]]

相反,我期望运行严格递增的数字,就像这样:

[[1,2,3],[2,4],[1,5,9]]

我错过了什么?

I'm trying to figure out the behavior of the library function groupBy (from Data.List), which purports to group elements of a list by an "equality test" function passed in as the first argument. The type signature suggests that the equality test just needs to have type

(a -> a -> Bool)

However, when I use (<) as the "equality test" in GHCi 6.6, the results are not what I expect:

ghci> groupBy (<) [1, 2, 3, 2, 4, 1, 5, 9]
[[1,2,3,2,4],[1,5,9]]

Instead I'd expect runs of strictly increasing numbers, like this:

[[1,2,3],[2,4],[1,5,9]]

What am I missing?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

画▽骨i 2024-08-10 02:09:25

查看 groupBy

groupBy                 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _  []           =  []
groupBy eq (x:xs)       =  (x:ys) : groupBy eq zs
                           where (ys,zs) = span (eq x) xs

现在比较这两个输出:

Prelude List> groupBy (<) [1, 2, 3, 2, 4, 1, 5, 9]
[[1,2,3,2,4],[1,5,9]]
Prelude List> groupBy (<) [8, 2, 3, 2, 4, 1, 5, 9]
[[8],[2,3],[2,4],[1,5,9]]

简而言之,发生的情况是这样的:groupBy 假设给定函数(第一个参数)测试相等性,因此假设比较函数是自反传递对称(参见等价关系)。这里的问题是小于关系不是自反的,也不是对称的。


编辑:以下实现仅假设传递性:

groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' _   []                        = []
groupBy' _   [x]                       = [[x]]
groupBy' cmp (x:xs@(x':_)) | cmp x x'  = (x:y):ys
                           | otherwise = [x]:r
  where r@(y:ys) = groupBy' cmp xs

Have a look at the ghc implementation of groupBy:

groupBy                 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _  []           =  []
groupBy eq (x:xs)       =  (x:ys) : groupBy eq zs
                           where (ys,zs) = span (eq x) xs

Now compare these two outputs:

Prelude List> groupBy (<) [1, 2, 3, 2, 4, 1, 5, 9]
[[1,2,3,2,4],[1,5,9]]
Prelude List> groupBy (<) [8, 2, 3, 2, 4, 1, 5, 9]
[[8],[2,3],[2,4],[1,5,9]]

In short, what happens is this: groupBy assumes that the given function (the first argument) tests for equality, and thus assumes that the comparison function is reflexive, transitive and symmetric (see equivalence relation). The problem here is that the less-than relation is not reflexive, nor symmetric.


Edit: The following implementation only assumes transitivity:

groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' _   []                        = []
groupBy' _   [x]                       = [[x]]
groupBy' cmp (x:xs@(x':_)) | cmp x x'  = (x:y):ys
                           | otherwise = [x]:r
  where r@(y:ys) = groupBy' cmp xs
昔日梦未散 2024-08-10 02:09:25

事实是“<”不是平等测试。

您可能会期望某些行为,因为您会以不同的方式实现它,但这并不是它所承诺的。

为什么它输出的是合理答案的一个例子是,如果它扫描它,那么

[1, 2, 3, 2, 4, 1, 5, 9] ->
[[1,2,3], [2,4], [1,5,9]]

Now 有 3 组相等的元素。因此,它检查它们中的任何一个实际上是否相同:

因为它知道每个组中的所有元素都相等,所以它可以只查看每个组中的第一个元素,1、2 和 1。

1 > > 2?是的!所以它合并了前两组。

1> 1?不!所以剩下最后一组。

现在它比较所有元素是否相等。

...只是,您没有传递给它它所期望的那种功能。

简而言之,当它需要相等性测试时,给它一个相等性测试

The fact that "<" isn't an equality test.

You might expect some behavior because you'd implement it differently, but that isn't what it promises.

An example of why what it outputs is a reasonable answer is if it sweeps through it, doing

[1, 2, 3, 2, 4, 1, 5, 9] ->
[[1,2,3], [2,4], [1,5,9]]

Now has 3 groups of equal elements. So it checks if any of them are in fact the same:

Since it knows all elements in each group is equal, it can just look at the first element in each, 1, 2 and 1.

1 > 2? Yes! So it merges the first two groups.

1 > 1? No! So it leaves the last group be.

And now it's compared all elements for equality.

...only, you didn't pass it the kind of function it expected.

In short, when it wants an equality test, give it an equality test.

深巷少女 2024-08-10 02:09:25

问题在于 Haskell 报告中 groupBy 的参考实现将元素与第一个元素进行比较,因此组并不是严格递增的(它们只需全部大于第一个元素)。相反,您想要的是在相邻元素上进行测试的groupBy版本,例如实现此处

The problem is that the reference implementation of groupBy in the Haskell Report compares elements against the first element, so the groups are not strictly increasing (they just have to be all bigger than the first element). What you want instead is a version of groupBy that tests on adjacent elements, like the implementation here.

这样的小城市 2024-08-10 02:09:25

我想指出的是,groupBy 函数还要求在应用之前对列表进行排序。

例如:

equalityOp :: (a, b1) -> (a, b2) -> Bool
equalityOp x y = fst x == fst y

testData = [(1, 2), (1, 4), (2, 3)]

correctAnswer = groupBy equalityOp testData == [[(1, 2), (1, 4)], [(2, 3)]]

otherTestData = [(1, 2), (2, 3), (1, 4)]

incorrectAnswer = groupBy equalityOp otherTestData == [[(1, 2)], [(2, 3)], [(1, 4)]]

出现此行为是因为 groupBy 在其定义中使用了 span。为了获得不依赖于我们以任何特定顺序拥有底层列表的合理行为,我们可以定义一个函数:

groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' eq []     = []
groupBy' eq (x:xs) = (x:similarResults) : (groupBy' eq differentResults)
    where similarResults   = filter (eq x) xs
          differentResults = filter (not . eq x) xs

I'd just like to point out that the groupBy function also requires your list to be sorted before being applied.

For example:

equalityOp :: (a, b1) -> (a, b2) -> Bool
equalityOp x y = fst x == fst y

testData = [(1, 2), (1, 4), (2, 3)]

correctAnswer = groupBy equalityOp testData == [[(1, 2), (1, 4)], [(2, 3)]]

otherTestData = [(1, 2), (2, 3), (1, 4)]

incorrectAnswer = groupBy equalityOp otherTestData == [[(1, 2)], [(2, 3)], [(1, 4)]]

This behaviour comes about because groupBy is using span in its definition. To get reasonable behaviour which doesn't rely on us having the underlying list in any particular order we can define a function:

groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' eq []     = []
groupBy' eq (x:xs) = (x:similarResults) : (groupBy' eq differentResults)
    where similarResults   = filter (eq x) xs
          differentResults = filter (not . eq x) xs
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文