Haskell 中 .NET 的 GroupBy 函数

发布于 2024-11-08 15:26:34 字数 974 浏览 8 评论 0原文

.NET框架中的LINQ库确实有一个非常有用的函数，称为GroupBy，我一直在使用它。它在 Haskell 中的类型如下所示

Ord b => (a-> b) -> [a] -> [(b, [a])]

它的目的是根据给定的分类函数 f 将项目分类到桶中，每个桶包含相似的项目，即 (b, l) 这样对于 l 中的任何项 x，fx == b。

它在 .NET 中的性能是 O(N)，因为它使用哈希表，但在 Haskell 中我可以接受 O(N*log(N))。

我在标准 Haskell 库中找不到类似的东西。另外，我在标准函数方面的实现有点庞大：

myGroupBy :: Ord k => (a -> k) -> [a] -> [(k, [a])]
myGroupBy f = map toFst
        . groupBy ((==) `on` fst) 
        . sortBy (comparing fst) 
        . map (\a -> (f a, a))
    where
        toFst l@((k,_):_) = (k, map snd l)

这绝对不是我想在特定于问题的代码中看到的东西。

我的问题是：如何最大限度地利用标准库来实现这个功能？

此外，似乎缺乏这样的标准函数暗示经验丰富的 Haskellers 可能很少需要它，因为他们可能知道一些更好的方法。这是真的吗？可以用什么更好的方式实现类似的功能？

另外，考虑到 groupBy 已经被采用，它的好名字是什么？ :)

原文

LINQ library in .NET framework does have a very useful function called GroupBy, which I have been using all the time.
Its type in Haskell would look like

Ord b => (a-> b) -> [a] -> [(b, [a])]

Its purpose is to classify items based on the given classification function f into buckets, with each bucket containing similar items, that is (b, l) such that for any item x in l, f x == b.

Its performance in .NET is O(N) because it uses hash-tables, but in Haskell I am OK with O(N*log(N)).

I can't find anything similar in standard Haskell libraries. Also, my implementation in terms of standard functions is somewhat bulky:

myGroupBy :: Ord k => (a -> k) -> [a] -> [(k, [a])]
myGroupBy f = map toFst
        . groupBy ((==) `on` fst) 
        . sortBy (comparing fst) 
        . map (\a -> (f a, a))
    where
        toFst l@((k,_):_) = (k, map snd l)

This is definitely not something I want to see amongst my problem-specific code.

My question is: how can I implement this function nicely exploiting standard libraries to their maximum?

Also, the seeming absence of such a standard function hints that it may rarely be needed by experienced Haskellers because they may know some better way. Is that true? What can be used to implement similar functionality in a better way?

Also, what would be the good name for it, considering groupBy is already taken? :)

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

草莓味的萝莉 2024-11-15 15:26:34

GHC.Exts.groupWith

groupWith :: Ord b => (a -> b) -> [a] -> [[a]]

作为广义列表理解的一部分引入：http://www.haskell.org/ghc/docs/7.0.2/html/users_guide/syntax-extns.html#generalized-list-compressives

GHC.Exts.groupWith

groupWith :: Ord b => (a -> b) -> [a] -> [[a]]

Introduced as part of generalised list comprehensions: http://www.haskell.org/ghc/docs/7.0.2/html/users_guide/syntax-extns.html#generalised-list-comprehensions

回复收藏 0 原文

往昔成烟 2024-11-15 15:26:34

使用 Data.Map 作为中间结构：

import Control.Arrow ((&&&))
import qualified Data.Map as M

myGroupBy f = M.toList . M.fromListWith (++) . map (f &&& return)

map 操作将输入列表转换为与包含元素的单例列表配对的键列表。 M.fromListWith (++) 将其转换为 Data.Map，当两个项目具有相同的键时连接，并且 M.toList 得到两人再次退出。

请注意，这会颠倒列表，因此如有必要，请进行调整。例如，如果您只需要每个组中元素的总和，也可以轻松地将 return 和 (++) 替换为其他类似幺半群的操作。

Using Data.Map as the intermediate structure:

import Control.Arrow ((&&&))
import qualified Data.Map as M

myGroupBy f = M.toList . M.fromListWith (++) . map (f &&& return)

The map operation turns the input list into a list of keys paired with singleton lists containing the elements. M.fromListWith (++) turns this into a Data.Map, concatenating when two items have the same key, and M.toList gets the pairs back out again.

Note that this reverses the lists, so adjust for that if necessary. It is also easy to replace return and (++) with other monoid-like operations if you for example only wanted the sum of the elements in each group.

回复收藏 0 原文

~没有更多了~