Data.Foldable for unordered containers

Posted on 2024-12-17 19:54:11

I'm working on a Haskell-meets-SQL language for database manipulations, and on a common type class library to go with it, cribbing from Hackage wherever it makes sense.

Because a significant objective of a database query optimizer is to eliminate unnecessary sorting, it's important to preserve a static representation of where sorting is in fact necessary. Which brings us to defining a typeclass for folds.

Haskell's Data.Foldable has the following (eliding default definitions, which aren't relevant to the point I'm making):

class Foldable t where
  -- | Combine the elements of a structure using a monoid.
  fold :: Monoid m => t m -> m

  -- | Map each element of the structure to a monoid,
  -- and combine the results.
  foldMap :: Monoid m => (a -> m) -> t a -> m

  -- | Right-associative fold of a structure.
  foldr :: (a -> b -> b) -> b -> t a -> b

  -- | Left-associative fold of a structure.
  foldl :: (a -> b -> a) -> a -> t b -> a

  -- | A variant of 'foldr' that has no base case,
  -- and thus may only be applied to non-empty structures.
  foldr1 :: (a -> a -> a) -> t a -> a

  -- | A variant of 'foldl' that has no base case,
  -- and thus may only be applied to non-empty structures.
  foldl1 :: (a -> a -> a) -> t a -> a

It seems to me that this class ignores a distinction which is, for practical purposes, not so important to most Haskell applications but of much more interest in a database setting. To wit: all Data.Foldable instances come with an ordering.

What is the name for the generalization of this concept that applies to container types which don't impose an ordering on their elements?

For Haskell's Data.Set it works out fine, because the implementation requires an Ord context. The ordering requirement is an implementation artifact, though, and for many useful types the ordering being used may not have any domain-level meaning.
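
To make the point concrete, here is a minimal sketch (the names `example`, `observedOrder`, and `total` are made up for illustration) of how the existing Foldable instance for Data.Set lets a non-commutative monoid observe the Ord-based traversal order, while a commutative monoid cannot tell the difference:

```haskell
import qualified Data.Set as Set
import Data.Monoid (Sum(..))

-- A set built from elements supplied in no particular order.
example :: Set.Set Int
example = Set.fromList [3, 1, 2]

-- Lists form a non-commutative monoid, so this fold exposes the
-- traversal order, which for Data.Set is the ascending Ord order.
observedOrder :: [Int]
observedOrder = foldMap (\x -> [x]) example

-- Sum is commutative, so this fold gives the same answer under
-- any traversal order.
total :: Int
total = getSum (foldMap Sum example)
```

Here `observedOrder` is `[1,2,3]` regardless of the order in which the elements were inserted, which is exactly the kind of incidental ordering the question is about.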

For sets more generally, the fold :: Monoid m => t m -> m definition on its own is mostly right (so is foldMap). I say mostly because its type carries the associativity law (through the definition of Monoid) but not the commutativity law that an unordered fold requires. The other variants (foldr, foldl, foldr1, foldl1) don't even exist for such containers.
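
One way to state the missing law in types might look like the sketch below. Everything here is an assumption for illustration: `CommutativeMonoid`, `UnorderedFoldable`, `ufoldMap`, and `ufold` are hypothetical names, not part of base, and the commutativity law is a promise made by instance authors rather than something the compiler checks.

```haskell
import qualified Data.Set as Set
import Data.Monoid (Sum(..), Product(..), All(..), Any(..))

-- Hypothetical marker class: instances promise x <> y == y <> x.
-- The law is unchecked; for Sum and Product it also relies on the
-- underlying Num instance being commutative.
class Monoid m => CommutativeMonoid m

instance Num a => CommutativeMonoid (Sum a)
instance Num a => CommutativeMonoid (Product a)
instance CommutativeMonoid All
instance CommutativeMonoid Any

-- Hypothetical class for containers with no meaningful element order.
-- Only commutative monoids are accepted, so no traversal order leaks.
class UnorderedFoldable f where
  ufoldMap :: CommutativeMonoid m => (a -> m) -> f a -> m

-- Counterpart of 'fold', defined in terms of 'ufoldMap'.
ufold :: (UnorderedFoldable f, CommutativeMonoid m) => f m -> m
ufold = ufoldMap id

-- Data.Set can reuse its ordinary foldMap internally; the class
-- constraint is what prevents callers from observing the order.
instance UnorderedFoldable Set.Set where
  ufoldMap = foldMap
```

With these definitions, `getSum (ufoldMap Sum (Set.fromList [1, 2, 3]))` type checks, while `ufoldMap (\x -> [x])` is rejected because lists have no `CommutativeMonoid` instance.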

I don't want to introduce sorts where they aren't needed. I also don't want to introduce non-determinism where it can't be tracked. I'm interested in building a language and library that doesn't have a toList :: Set a -> [a] function lying around anywhere, because such a function forces a choice between:

Allowing people to observe implementation details about how a set/relation is physically stored
Losing track of non-determinism as an effect

Obviously both sortBy :: (a -> a -> Ordering) -> Set a -> [a] and shuffle :: Set a -> Data.Random.RVar [a] are useful, unobjectionable, and will be included. In fact, sortBy can be given an even more general type: sortBy :: (TheUnorderedFoldableClassIAmTryingToName f) => (a -> a -> Ordering) -> f a -> [a].
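
As a rough sketch of that generalized sortBy, assuming a hypothetical class called `Unordered` (a stand-in for the class I'm trying to name) whose only route to a list requires an explicit comparison:

```haskell
import qualified Data.List as List
import qualified Data.Set as Set

-- Hypothetical class; the name is a placeholder.
class Unordered f where
  -- The only way to obtain a list is to state which order is wanted.
  sortBy :: (a -> a -> Ordering) -> f a -> [a]

instance Unordered Set.Set where
  -- Set.toList is used internally only; the explicit sort then
  -- determines the order the caller sees.
  sortBy cmp = List.sortBy cmp . Set.toList

-- Example: ask for descending order explicitly.
descending :: [Int]
descending = sortBy (flip compare) (Set.fromList [2, 3, 1])
```

One caveat with this sketch: Data.List.sortBy is stable, so if the comparison treats distinct elements as equal, the underlying storage order can still show through among the ties.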

What is this idea called? If I'm way off base, where did I leave the base path?
