有效地检查(大)列表的所有元素是否相同

发布于 2024-11-09 10:58:49 字数 2228 浏览 0 评论 0原文

问题

让我们假设我们有一个列表 xs(可能是一个非常大的列表),并且我们想要检查它的所有元素是否相同。

我想出了各种想法:

解决方案 0

检查 tail xs 中的所有元素是否等于 head xs

allTheSame :: (Eq a) => [a] -> Bool
allTheSame xs = and $ map (== head xs) (tail xs)

解决方案 1

检查 length xs 是等于从 xs 中获取元素而获得的列表长度,同时它们等于 head xs

allTheSame' :: (Eq a) => [a] -> Bool
allTheSame' xs = (length xs) == (length $ takeWhile (== head xs) xs)

解决方案 2

递归解决方案:allTheSame 返回 <如果 xs 的前两个元素相等,并且 allTheSamexs< 的其余部分返回 True,则 code>True /code>

allTheSame'' :: (Eq a) => [a] -> Bool
allTheSame'' xs
  | n == 0 = False
  | n == 1 = True
  | n == 2 = xs !! 0 == xs !! 1
  | otherwise = (xs !! 0 == xs !! 1) && (allTheSame'' $ snd $ splitAt 2 xs)
    where  n = length xs

解决方案 3

分而治之:

allTheSame''' :: (Eq a) => [a] -> Bool
allTheSame''' xs
  | n == 0 = False
  | n == 1 = True
  | n == 2 = xs !! 0 == xs !! 1
  | n == 3 = xs !! 0 == xs !! 1 && xs !! 1 == xs !! 2
  | otherwise = allTheSame''' (fst split) && allTheSame''' (snd split)
    where n = length xs
          split = splitAt (n `div` 2) xs

解决方案 4

我只是在写这个问题时想到了这一点:

allTheSame'''' :: (Eq a) => [a] -> Bool
allTheSame'''' xs = all (== head xs) (tail xs)

问题

  1. 我认为解决方案 0 不是很高效,至少在内存方面,因为 map将在将 and 应用于其元素之前构造另一个列表。我是对的吗?

  2. 解决方案 1 仍然不是很高效,至少在内存方面是这样,因为 takeWhile 将再次构建一个额外的列表。我对吗?

  3. 解决方案 2 是尾递归(对吗?),它应该非常高效,因为只要 (xs !! 0 == xs !! 1) 它就会返回 False 为 False。我说得对吗?

  4. 解决方案 3 应该是最好的,因为它的复杂度应该是 O(log n)

  5. 解决方案 4 看起来相当不错对我来说哈斯克尔式的(是吗?),但它可能与解决方案 0 相同,因为 all p = 和 .映射 p (来自 Prelude.hs)。我是对的吗?

  6. 还有其他更好的方式来编写allTheSame吗?现在,我希望有人会回答这个问题,告诉我有一个内置函数可以做到这一点:我已经用 hoogle 进行了搜索,但没有找到它。不管怎样,既然我正在学习 Haskell,我相信这对我来说是一个很好的练习:)

欢迎任何其他评论。谢谢你!

Problem

Let us suppose that we have a list xs (possibly a very big one), and we want to check that all its elements are the same.

I came up with various ideas:

Solution 0

checking that all elements in tail xs are equal to head xs:

allTheSame :: (Eq a) => [a] -> Bool
allTheSame xs = and $ map (== head xs) (tail xs)

Solution 1

checking that length xs is equal to the length of the list obtained by taking elements from xs while they're equal to head xs

allTheSame' :: (Eq a) => [a] -> Bool
allTheSame' xs = (length xs) == (length $ takeWhile (== head xs) xs)

Solution 2

recursive solution: allTheSame returns True if the first two elements of xs are equal and allTheSame returns True on the rest of xs

allTheSame'' :: (Eq a) => [a] -> Bool
allTheSame'' xs
  | n == 0 = False
  | n == 1 = True
  | n == 2 = xs !! 0 == xs !! 1
  | otherwise = (xs !! 0 == xs !! 1) && (allTheSame'' $ snd $ splitAt 2 xs)
    where  n = length xs

Solution 3

divide and conquer:

allTheSame''' :: (Eq a) => [a] -> Bool
allTheSame''' xs
  | n == 0 = False
  | n == 1 = True
  | n == 2 = xs !! 0 == xs !! 1
  | n == 3 = xs !! 0 == xs !! 1 && xs !! 1 == xs !! 2
  | otherwise = allTheSame''' (fst split) && allTheSame''' (snd split)
    where n = length xs
          split = splitAt (n `div` 2) xs

Solution 4

I just thought about this while writing this question:

allTheSame'''' :: (Eq a) => [a] -> Bool
allTheSame'''' xs = all (== head xs) (tail xs)

Questions

  1. I think Solution 0 is not very efficient, at least in terms of memory, because map will construct another list before applying and to its elements. Am I right?

  2. Solution 1 is still not very efficient, at least in terms of memory, because takeWhile will again build an additional list. Am I right?

  3. Solution 2 is tail recursive (right?), and it should be pretty efficient, because it will return False as soon as (xs !! 0 == xs !! 1) is False. Am I right?

  4. Solution 3 should be the best one, because it complexity should be O(log n)

  5. Solution 4 looks quite Haskellish to me (is it?), but it's probably the same as Solution 0, because all p = and . map p (from Prelude.hs). Am I right?

  6. Are there other better ways of writing allTheSame? Now, I expect someone will answer this question telling me that there's a build-in function that does this: I've searched with hoogle and I haven't found it. Anyway, since I'm learning Haskell, I believe that this was a good exercise for me :)

Any other comment is welcome. Thank you!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(9

两仪 2024-11-16 10:58:49

gatoatigrado 的回答为衡量各种解决方案的性能提供了一些很好的建议。这是一个更具象征意义的答案。

我认为解决方案 0(或者,完全等效,解决方案 4)将是最快的。请记住,Haskell 是懒惰的,因此 map 在应用 and 之前不必构造整个列表。建立对此的直觉的一个好方法是玩弄无穷大。例如:

ghci> and $ map (< 1000) [1..]
False

这询问是否所有数字都小于 1,000。如果 map 在应用 and 之前构造了整个列表,那么这个问题永远无法得到解答。即使你给列表一个非常大的右端点,表达式仍然会快速回答(也就是说,Haskell 不会根据列表是否无限执行任何“魔法”)。

为了开始我的示例,让我们使用这些定义:

and [] = True
and (x:xs) = x && and xs

map f [] = []
map f (x:xs) = f x : map f xs

True && x = x
False && x = False

这是 allTheSame [7,7,7,7,8,7,7,7] 的评估顺序。会有额外的分享,写下来太痛苦了。为了简洁起见,我还将更早地评估 head 表达式(无论如何它都会被评估,所以几乎没有什么不同)。

allTheSame [7,7,7,7,8,7,7,7]
allTheSame (7:7:7:7:8:7:7:7:[])
and $ map (== head (7:7:7:7:8:7:7:7:[])) (tail (7:7:7:7:8:7:7:7:[]))
and $ map (== 7)  (tail (7:7:7:7:8:7:7:7:[]))
and $ map (== 7)          (7:7:7:8:7:7:7:[])
and $ (== 7) 7 : map (== 7) (7:7:8:7:7:7:[])
(== 7) 7 && and (map (== 7) (7:7:8:7:7:7:[]))
True     && and (map (== 7) (7:7:8:7:7:7:[]))
            and (map (== 7) (7:7:8:7:7:7:[]))
(== 7) 7 && and (map (== 7)   (7:8:7:7:7:[]))
True     && and (map (== 7)   (7:8:7:7:7:[]))
            and (map (== 7)   (7:8:7:7:7:[]))
(== 7) 7 && and (map (== 7)     (8:7:7:7:[]))
True     && and (map (== 7)     (8:7:7:7:[]))
            and (map (== 7)     (8:7:7:7:[]))
(== 7) 8 && and (map (== 7)       (7:7:7:[]))
False    && and (map (== 7)       (7:7:7:[]))
False

看看我们甚至不需要检查最后 3 个 7 吗?这是一种惰性求值,使得列表的工作方式更像是一个循环。所有其他解决方案都使用昂贵的函数,例如 length (必须一直走到列表末尾才能给出答案),因此它们的效率会较低,而且也无法在无限情况下工作列表。在 Haskell 中,处理无限列表和提高效率通常是相辅相成的。

gatoatigrado's answer gives some nice advice for measuring the performance of various solutions. Here is a more symbolic answer.

I think solution 0 (or, exactly equivalently, solution 4) will be the fastest. Remember that Haskell is lazy, so map will not have to construct the whole list before and is applied. A good way to build intuition about this is to play with infinity. So for example:

ghci> and $ map (< 1000) [1..]
False

This asks whether all numbers are less than 1,000. If map constructed the entire list before and were applied, then this question could never be answered. The expression will still answer quickly even if you give the list a very large right endpoint (that is, Haskell is not doing any "magic" depending on whether a list is infinite).

To start my example, let's use these definitions:

and [] = True
and (x:xs) = x && and xs

map f [] = []
map f (x:xs) = f x : map f xs

True && x = x
False && x = False

Here is the evaluation order for allTheSame [7,7,7,7,8,7,7,7]. There will be extra sharing that is too much of a pain to write down. I will also evaluate the head expression earlier than it would be for conciseness (it would have been evaluated anyway, so it's hardly different).

allTheSame [7,7,7,7,8,7,7,7]
allTheSame (7:7:7:7:8:7:7:7:[])
and $ map (== head (7:7:7:7:8:7:7:7:[])) (tail (7:7:7:7:8:7:7:7:[]))
and $ map (== 7)  (tail (7:7:7:7:8:7:7:7:[]))
and $ map (== 7)          (7:7:7:8:7:7:7:[])
and $ (== 7) 7 : map (== 7) (7:7:8:7:7:7:[])
(== 7) 7 && and (map (== 7) (7:7:8:7:7:7:[]))
True     && and (map (== 7) (7:7:8:7:7:7:[]))
            and (map (== 7) (7:7:8:7:7:7:[]))
(== 7) 7 && and (map (== 7)   (7:8:7:7:7:[]))
True     && and (map (== 7)   (7:8:7:7:7:[]))
            and (map (== 7)   (7:8:7:7:7:[]))
(== 7) 7 && and (map (== 7)     (8:7:7:7:[]))
True     && and (map (== 7)     (8:7:7:7:[]))
            and (map (== 7)     (8:7:7:7:[]))
(== 7) 8 && and (map (== 7)       (7:7:7:[]))
False    && and (map (== 7)       (7:7:7:[]))
False

See how we didn't even have to check the last 3 7's? This is lazy evaluation making a list work more like a loop. All your other solutions use expensive functions like length (which have to walk all the way to the end of the list to give an answer), so they will be less efficient and also they will not work on infinite lists. Working on infinite lists and being efficient often go together in Haskell.

一向肩并 2024-11-16 10:58:49

首先,我认为您不想使用列表。你的很多算法都依赖于计算长度,这是不好的。您可能需要考虑 向量 包,与 O(n) 相比,它将为您提供 O(1) 长度列表。向量的内存效率也更高,特别是如果您可以使用未装箱或可存储的变体。

话虽如此,您确实需要考虑代码中的遍历和使用模式。如果 Haskell 的列表可以按需生成并使用一次,那么它们将非常高效。这意味着您不应该保留对列表的引用。像这样的事情:

average xs = sum xs / length xs

要求将整个列表保留在内存中(通过 sumlength),直到两次遍历完成。如果您可以一步完成列表遍历,那么效率会更高。

当然,您可能仍然需要保留列表,例如检查所有元素是否相等,如果不相等,则对数据执行其他操作。在这种情况下,对于任何大小的列表,您可能最好使用更紧凑的数据结构(例如向量)。

现在这已经不再是问题了,下面我们来看看这些函数中的每一个。在我显示核心的地方,它是使用 ghc-7.0.3 -O -ddump-simpl 生成的。另外,不要费心去判断使用 -O0 编译时的 Haskell 代码性能。使用您实际用于生产代码的标志来编译它,通常至少是 -O,也许还有其他选项。

解决方案 0

allTheSame :: (Eq a) => [a] -> Bool
allTheSame xs = and $ map (== head xs) (tail xs)

GHC 生成了这个核心:

Test.allTheSame
  :: forall a_abG. GHC.Classes.Eq a_abG => [a_abG] -> GHC.Bool.Bool
[GblId,
 Arity=2,
 Str=DmdType LS,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [3 3] 16 0}]
Test.allTheSame =
  \ (@ a_awM)
    ($dEq_awN :: GHC.Classes.Eq a_awM)
    (xs_abH :: [a_awM]) ->
    case xs_abH of _ {
      [] ->
        GHC.List.tail1
        `cast` (CoUnsafe (forall a1_axH. [a1_axH]) GHC.Bool.Bool
                :: (forall a1_axH. [a1_axH]) ~ GHC.Bool.Bool);
      : ds1_axJ xs1_axK ->
        letrec {
          go_sDv [Occ=LoopBreaker] :: [a_awM] -> GHC.Bool.Bool
          [LclId, Arity=1, Str=DmdType S]
          go_sDv =
            \ (ds_azk :: [a_awM]) ->
              case ds_azk of _ {
                [] -> GHC.Bool.True;
                : y_azp ys_azq ->
                  case GHC.Classes.== @ a_awM $dEq_awN y_azp ds1_axJ of _ {
                    GHC.Bool.False -> GHC.Bool.False; GHC.Bool.True -> go_sDv ys_azq
                  }
              }; } in
        go_sDv xs1_axK
    }

实际上,这看起来相当不错。它会产生一个空列表的错误,但这很容易修复。这是 _ { [] -> 的 xs_abH 情况。在此 GHC 执行工作者/包装器转换之后,递归工作者函数是 letrec { go_sDv 绑定。工人检查其论点。如果[],则已到达列表末尾并返回 True。否则,它将剩余元素的头部与第一个元素进行比较,然后返回 False 或检查列表的其余部分。

其他三个功能。

  1. 地图完全融合了
    并且不分配临时的
    列表。
  2. 接近定义的顶部
    请注意 Cheap=True 语句。
    这意味着 GHC 认为
    功能“便宜”,因此
    内联的候选者。在通话时
    站点,如果具体参数类型
    可以确定,GHC可能会
    内联 allTheSame 并生成
    非常紧密的内环,完全
    绕过 Eq 字典
    抬头。
  3. 工人函数是
    尾递归。

结论:非常有力的竞争者。

解决方案 1

allTheSame' :: (Eq a) => [a] -> Bool
allTheSame' xs = (length xs) == (length $ takeWhile (== head xs) xs)

即使不查看核心,我也知道这不会那么好。该列表被遍历多次,首先遍历 length xs,然后遍历 length $ takeWhile。不仅会产生多次遍历的额外开销,而且意味着第一次遍历后列表必须保留在内存中并且不能被 GC 回收。对于一个大列表来说,这是一个严重的问题。

Test.allTheSame'
  :: forall a_abF. GHC.Classes.Eq a_abF => [a_abF] -> GHC.Bool.Bool
[GblId,
 Arity=2,
 Str=DmdType LS,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [3 3] 20 0}]
Test.allTheSame' =
  \ (@ a_awF)
    ($dEq_awG :: GHC.Classes.Eq a_awF)
    (xs_abI :: [a_awF]) ->
    case GHC.List.$wlen @ a_awF xs_abI 0 of ww_aC6 { __DEFAULT ->
    case GHC.List.$wlen
           @ a_awF
           (GHC.List.takeWhile
              @ a_awF
              (let {
                 ds_sDq :: a_awF
                 [LclId, Str=DmdType]
                 ds_sDq =
                   case xs_abI of _ {
                     [] -> GHC.List.badHead @ a_awF; : x_axk ds1_axl -> x_axk
                   } } in
               \ (ds1_dxa :: a_awF) ->
                 GHC.Classes.== @ a_awF $dEq_awG ds1_dxa ds_sDq)
              xs_abI)
           0
    of ww1_XCn { __DEFAULT ->
    GHC.Prim.==# ww_aC6 ww1_XCn
    }
    }

仅仅看核心并不能说明更多的事情。但是,请注意以下几行:

case GHC.List.$wlen @ a_awF xs_abI 0 of ww_aC6 { __DEFAULT ->
        case GHC.List.$wlen

这是列表遍历发生的地方。第一个获取外部列表的长度并将其绑定到 ww_aC6。第二个获取内部列表的长度,但直到接近底部时才会发生绑定,在

of ww1_XCn { __DEFAULT ->
GHC.Prim.==# ww_aC6 ww1_XCn

长度(都是 Int)可以通过 primop 拆箱并比较,但这是一个很小的值引入开销后的安慰。

结论:不好。

解决方案2

allTheSame'' :: (Eq a) => [a] -> Bool
allTheSame'' xs
  | n == 0 = False
  | n == 1 = True
  | n == 2 = xs !! 0 == xs !! 1
  | otherwise = (xs !! 0 == xs !! 1) && (allTheSame'' $ snd $ splitAt 2 xs)
    where  n = length xs

这和解决方案1有同样的问题。链表被遍历多次,并且不能被GC。但这里情况更糟,因为现在要计算每个子列表的长度。我预计这在任何大尺寸的列表中都会有最差的性能。另外,当您期望列表很大时,为什么要对 1 和 2 元素的列表进行特殊处理?

结论:想都别想。

方案3

allTheSame''' :: (Eq a) => [a] -> Bool
allTheSame''' xs
  | n == 0 = False
  | n == 1 = True
  | n == 2 = xs !! 0 == xs !! 1
  | n == 3 = xs !! 0 == xs !! 1 && xs !! 1 == xs !! 2
  | otherwise = allTheSame''' (fst split) && allTheSame''' (snd split)
    where n = length xs
          split = splitAt (n `div` 2) xs

与方案2有同样的问题,即按length多次遍历列表。我不确定分而治之的方法是解决这个问题的好选择,它最终可能比简单的扫描花费更长的时间。但这取决于数据,并且值得测试。

结论:也许,如果你使用不同的数据结构。

解决方案 4

allTheSame'''' :: (Eq a) => [a] -> Bool
allTheSame'''' xs = all (== head xs) (tail xs)

这基本上是我的第一个想法。我们再检查一下核心。

Test.allTheSame''''
  :: forall a_abC. GHC.Classes.Eq a_abC => [a_abC] -> GHC.Bool.Bool
[GblId,
 Arity=2,
 Str=DmdType LS,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [3 3] 10 0}]
Test.allTheSame'''' =
  \ (@ a_am5)
    ($dEq_am6 :: GHC.Classes.Eq a_am5)
    (xs_alK :: [a_am5]) ->
    case xs_alK of _ {
      [] ->
        GHC.List.tail1
        `cast` (CoUnsafe (forall a1_axH. [a1_axH]) GHC.Bool.Bool
                :: (forall a1_axH. [a1_axH]) ~ GHC.Bool.Bool);
      : ds1_axJ xs1_axK ->
        GHC.List.all
          @ a_am5
          (\ (ds_dwU :: a_am5) ->
             GHC.Classes.== @ a_am5 $dEq_am6 ds_dwU ds1_axJ)
          xs1_axK
    }

好吧,还不错。与解决方案 1 类似,这在空列表上会出错。列表遍历隐藏在 GHC.List.all 中,但它可能会在调用站点扩展为良好的代码。

结论:又一个强有力的竞争者。

因此,在所有这些列表中,我希望解决方案 0 和 4 是唯一值得使用的,而且它们几乎是相同的。在某些情况下我可能会考虑选项 3。

编辑:在这两种情况下,空列表上的错误可以像@augustss的答案一样简单地修复。

下一步是使用 criterion 进行一些时间分析。

First of all, I don't think you want to be working with lists. A lot of your algorithms rely upon calculating the length, which is bad. You may want to consider the vector package, which will give you O(1) length compared to O(n) for a list. Vectors are also much more memory efficient, particularly if you can use Unboxed or Storable variants.

That being said, you really need to consider traversals and usage patterns in your code. Haskell's lists are very efficient if they can be generated on demand and consumed once. This means that you shouldn't hold on to references to a list. Something like this:

average xs = sum xs / length xs

requires that the entire list be retained in memory (by either sum or length) until both traversals are completed. If you can do your list traversal in one step, it'll be much more efficient.

Of course, you may need to retain the list anyway, such as to check if all the elements are equal, and if they aren't, do something else with the data. In this case, with lists of any size you're probably better off with a more compact data structure (e.g. vector).

Now that this is out of they way, here's a look at each of these functions. Where I show core, it was generated with ghc-7.0.3 -O -ddump-simpl. Also, don't bother judging Haskell code performance when compiled with -O0. Compile it with the flags you would actually use for production code, typically at least -O and maybe other options too.

Solution 0

allTheSame :: (Eq a) => [a] -> Bool
allTheSame xs = and $ map (== head xs) (tail xs)

GHC produces this Core:

Test.allTheSame
  :: forall a_abG. GHC.Classes.Eq a_abG => [a_abG] -> GHC.Bool.Bool
[GblId,
 Arity=2,
 Str=DmdType LS,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [3 3] 16 0}]
Test.allTheSame =
  \ (@ a_awM)
    ($dEq_awN :: GHC.Classes.Eq a_awM)
    (xs_abH :: [a_awM]) ->
    case xs_abH of _ {
      [] ->
        GHC.List.tail1
        `cast` (CoUnsafe (forall a1_axH. [a1_axH]) GHC.Bool.Bool
                :: (forall a1_axH. [a1_axH]) ~ GHC.Bool.Bool);
      : ds1_axJ xs1_axK ->
        letrec {
          go_sDv [Occ=LoopBreaker] :: [a_awM] -> GHC.Bool.Bool
          [LclId, Arity=1, Str=DmdType S]
          go_sDv =
            \ (ds_azk :: [a_awM]) ->
              case ds_azk of _ {
                [] -> GHC.Bool.True;
                : y_azp ys_azq ->
                  case GHC.Classes.== @ a_awM $dEq_awN y_azp ds1_axJ of _ {
                    GHC.Bool.False -> GHC.Bool.False; GHC.Bool.True -> go_sDv ys_azq
                  }
              }; } in
        go_sDv xs1_axK
    }

This looks pretty good, actually. It will produce an error with an empty list, but that's easily fixed. This is the case xs_abH of _ { [] ->. After this GHC performed a worker/wrapper transformation, the recursive worker function is the letrec { go_sDv binding. The worker examines its argument. If [], it's reached the end of the list and returns True. Otherwise it compares the head of the remaining to the first element and either returns False or checks the rest of the list.

Three other features.

  1. The map was entirely fused away
    and doesn't allocate a temporary
    list.
  2. Near the top of the definition
    notice the Cheap=True statement.
    This means GHC considers the
    function "cheap", and thus a
    candidate for inlining. At a call
    site, if a concrete argument type
    can be determined, GHC will probably
    inline allTheSame and produce a
    very tight inner loop, completely
    bypassing the Eq dictionary
    lookup.
  3. The worker function is
    tail-recursive.

Verdict: Very strong contender.

Solution 1

allTheSame' :: (Eq a) => [a] -> Bool
allTheSame' xs = (length xs) == (length $ takeWhile (== head xs) xs)

Even without looking at core I know this won't be as good. The list is traversed more than once, first by length xs then by length $ takeWhile. Not only do you have the extra overhead of multiple traversals, it means that the list must be retained in memory after the first traversal and can't be GC'd. For a big list, this is a serious problem.

Test.allTheSame'
  :: forall a_abF. GHC.Classes.Eq a_abF => [a_abF] -> GHC.Bool.Bool
[GblId,
 Arity=2,
 Str=DmdType LS,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [3 3] 20 0}]
Test.allTheSame' =
  \ (@ a_awF)
    ($dEq_awG :: GHC.Classes.Eq a_awF)
    (xs_abI :: [a_awF]) ->
    case GHC.List.$wlen @ a_awF xs_abI 0 of ww_aC6 { __DEFAULT ->
    case GHC.List.$wlen
           @ a_awF
           (GHC.List.takeWhile
              @ a_awF
              (let {
                 ds_sDq :: a_awF
                 [LclId, Str=DmdType]
                 ds_sDq =
                   case xs_abI of _ {
                     [] -> GHC.List.badHead @ a_awF; : x_axk ds1_axl -> x_axk
                   } } in
               \ (ds1_dxa :: a_awF) ->
                 GHC.Classes.== @ a_awF $dEq_awG ds1_dxa ds_sDq)
              xs_abI)
           0
    of ww1_XCn { __DEFAULT ->
    GHC.Prim.==# ww_aC6 ww1_XCn
    }
    }

Looking at the core doesn't tell much beyond that. However, note these lines:

case GHC.List.$wlen @ a_awF xs_abI 0 of ww_aC6 { __DEFAULT ->
        case GHC.List.$wlen

This is where the list traversals happen. The first gets the length of the outer list and binds it to ww_aC6. The second gets the length of the inner list, but the binding doesn't happen until near the bottom, at

of ww1_XCn { __DEFAULT ->
GHC.Prim.==# ww_aC6 ww1_XCn

The lengths (both Ints) can be unboxed and compared by a primop, but that's a small consolation after the overhead that's been introduced.

Verdict: Not good.

Solution 2

allTheSame'' :: (Eq a) => [a] -> Bool
allTheSame'' xs
  | n == 0 = False
  | n == 1 = True
  | n == 2 = xs !! 0 == xs !! 1
  | otherwise = (xs !! 0 == xs !! 1) && (allTheSame'' $ snd $ splitAt 2 xs)
    where  n = length xs

This has the same problem as solution 1. The list is traversed multiple times, and it can't be GC'd. It's worse here though, because now the length is calculated for each sub-list. I'd expect this to have the worst performance of all on lists of any significant size. Also, why are you special-casing lists of 1 and 2 elements when you're expecting the list to be big?

Verdict: Don't even think about it.

Solution 3

allTheSame''' :: (Eq a) => [a] -> Bool
allTheSame''' xs
  | n == 0 = False
  | n == 1 = True
  | n == 2 = xs !! 0 == xs !! 1
  | n == 3 = xs !! 0 == xs !! 1 && xs !! 1 == xs !! 2
  | otherwise = allTheSame''' (fst split) && allTheSame''' (snd split)
    where n = length xs
          split = splitAt (n `div` 2) xs

This has the same problem as Solution 2. Namely, the list is traversed multiple times by length. I'm not certain a divide-and-conquer approach is a good choice for this problem, it could end up taking longer than a simple scan. It would depend on the data though, and be worth testing.

Verdict: Maybe, if you used a different data structure.

Solution 4

allTheSame'''' :: (Eq a) => [a] -> Bool
allTheSame'''' xs = all (== head xs) (tail xs)

This was basically my first thought. Let's check the core again.

Test.allTheSame''''
  :: forall a_abC. GHC.Classes.Eq a_abC => [a_abC] -> GHC.Bool.Bool
[GblId,
 Arity=2,
 Str=DmdType LS,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [3 3] 10 0}]
Test.allTheSame'''' =
  \ (@ a_am5)
    ($dEq_am6 :: GHC.Classes.Eq a_am5)
    (xs_alK :: [a_am5]) ->
    case xs_alK of _ {
      [] ->
        GHC.List.tail1
        `cast` (CoUnsafe (forall a1_axH. [a1_axH]) GHC.Bool.Bool
                :: (forall a1_axH. [a1_axH]) ~ GHC.Bool.Bool);
      : ds1_axJ xs1_axK ->
        GHC.List.all
          @ a_am5
          (\ (ds_dwU :: a_am5) ->
             GHC.Classes.== @ a_am5 $dEq_am6 ds_dwU ds1_axJ)
          xs1_axK
    }

Ok, not too bad. Like solution 1, this will error on empty lists. The list traversal is hidden in GHC.List.all, but it will probably be expanded to good code at a call site.

Verdict: Another strong contender.

So between all of these, with lists I'd expect that Solutions 0 and 4 are the only ones worth using, and they are pretty much the same. I might consider Option 3 in some cases.

Edit: in both cases, the errors on empty lists can be simply fixed as in @augustss's answer.

The next step would be to do some time profiling with criterion.

小帐篷 2024-11-16 10:58:49

使用连续对的解决方案:

allTheSame xs = and $ zipWith (==) xs (tail xs)

A solution using consecutive pairs:

allTheSame xs = and $ zipWith (==) xs (tail xs)
雨后咖啡店 2024-11-16 10:58:49

Q1 - 是的,我认为你的简单解决方案很好,没有内存泄漏。 Q4 - 解决方案 3 不是 log(n),通过一个非常简单的论点,您需要查看所有列表元素以确定它们是否相同,并且查看 1 个元素需要 1 个时间步长。 Q5——是的。 Q6,见下文。

解决这个问题的方法是输入并运行它

main = do
    print $ allTheSame (replicate 100000000 1)

,然后运行 ​​ghc -O3 -optc-O3 --make Main.hs &&时间./Main.我最喜欢最后一个解决方案(您也可以使用模式匹配来清理它),

allTheSame (x:xs) = all (==x) xs

打开 ghci 并在这些东西上运行“:step fcn”。它将教会您很多关于惰性求值正在扩展的内容。一般来说,当您匹配构造函数时,例如“x:xs”,这是常数时间。当您调用“length”时,Haskell 需要计算列表中的所有元素(尽管它们的值仍然是“待计算”),因此解决方案 1 和 2 很糟糕。

编辑 1

抱歉,如果我之前的回答有点肤浅。似乎手动扩展东西确实有一点帮助(尽管与其他选项相比,这是一个微不足道的改进),

{-# LANGUAGE BangPatterns #-}
allTheSame [] = True
allTheSame ((!x):xs) = go x xs where
    go !x [] = True
    go !x (!y:ys) = (x == y) && (go x ys)

似乎 ghc 已经专门化了该功能,但您也可以查看专门化编译指示,以防它没有为您的代码工作 [链接]。

Q1 -- Yeah, I think your simple solution is fine, there is no memory leak. Q4 -- Solution 3 is not log(n), via the very simple argument that you need to look at all list elements to determine whether they are the same, and looking at 1 element takes 1 time step. Q5 -- yes. Q6, see below.

The way to go about this is to type it in and run it

main = do
    print $ allTheSame (replicate 100000000 1)

then run ghc -O3 -optc-O3 --make Main.hs && time ./Main. I like the last solution best (you can also use pattern matching to clean it up a little),

allTheSame (x:xs) = all (==x) xs

Open up ghci and run ":step fcn" on these things. It will teach you a lot about what lazy evaluation is expanding. In general, when you match a constructor, e.g. "x:xs", that's constant time. When you call "length", Haskell needs to compute all of the elements in the list (though their values are still "to-be-computed"), so solution 1 and 2 are bad.

edit 1

Sorry if my previous answer was a bit shallow. It seems like expanding things manually does help a little (though compared to the other options, it's a trivial improvement),

{-# LANGUAGE BangPatterns #-}
allTheSame [] = True
allTheSame ((!x):xs) = go x xs where
    go !x [] = True
    go !x (!y:ys) = (x == y) && (go x ys)

It seems that ghc is specializing the function already, but you can look at the specialize pragma too, in case it doesn't work for your code [ link ].

时间海 2024-11-16 10:58:49

这是另一个版本(不需要遍历整个列表,以防某些内容不匹配):

allTheSame [] = True
allTheSame (x:xs) = isNothing $ find (x /= ) xs

这在语法上可能不正确,但我希望您明白了。

Here is another version (don't need to traverse whole list in case something doesn't match):

allTheSame [] = True
allTheSame (x:xs) = isNothing $ find (x /= ) xs

This may not be syntactically correct , but I hope you got the idea.

甜妞爱困 2024-11-16 10:58:49

这是另一种有趣的方法:

{-# INLINABLE allSame #-}
allSame :: Eq a => [a] -> Bool
allSame xs = foldr go (`seq` True) xs Nothing where
  go x r Nothing = r (Just x)
  go x r (Just prev) = x == prev && r (Just x)

通过跟踪前一个元素而不是第一个元素,可以轻松更改此实现以实现增加减少。要对照第一个检查所有内容,您可以将 prev 重命名为 first,并将 Just x 替换为 Just first.


这将如何优化?我没有详细查看,但我将根据我所了解的有关 GHC 优化的一些知识来讲述一个好故事。

首先假设列表融合没有发生。然后 foldr 将被内联,给出类似于

allSame xs = allSame' xs Nothing where
  allSame' [] = (`seq` True)
  allSame' (x : xs) = go x (allSame' xs)

Eta 扩展的东西,然后产生

allSame' [] acc = acc `seq` True
allSame' (x : xs) acc = go x (allSame' xs) acc

内联 go

allSame' [] acc = acc `seq` True
allSame' (x : xs) Nothing = allSame' xs (Just x)
allSame' (x : xs) (Just prev) =
  x == prev && allSame' xs (Just x)

现在 GHC 可以识别出 Maybe 值始终是 只需在递归调用上,并使用工作者包装器转换来利用这一点:

allSame' [] acc = acc `seq` True
allSame' (x : xs) Nothing = allSame'' xs x
allSame' (x : xs) (Just prev) = x == prev && allSame'' xs x

allSame'' [] prev = True
allSame'' (x : xs) prev = x == prev && allSame'' xs x

现在请记住,

allSame xs = allSame' xs Nothing

并且 allSame' 不再是递归的,因此它可以进行 beta 缩减:

allSame [] = True
allSame (x : xs) = allSame'' xs x

allSame'' [] _ = True
allSame'' (x : xs) prev = x == prev && allSame'' xs x

所以这高阶代码已转变为高效的递归代码,无需额外分配。

使用 -O2 -ddump-simpl -dsuppress-all -dno-suppress-type-signatures 编译定义 allSame 的模块会产生以下结果(我已经对其进行了一些清理) ):

allSame :: forall a. Eq a => [a] -> Bool
allSame =
  \ (@ a) ($dEq_a :: Eq a) (xs0 :: [a]) ->
    let {
      equal :: a -> a -> Bool
      equal = == $dEq_a } in
    letrec {
      go :: [a] -> a -> Bool
      go =
        \ (xs :: [a]) (prev :: a) ->
          case xs of _ {
            [] -> True;
            : y ys ->
              case equal y prev of _ {
                False -> False;
                True -> go ys y
              }
          }; } in
    case xs0 of _ {
      [] -> True;
      : x xs -> go xs x
    }

正如你所看到的,这和我描述的结果本质上是一样的。 equal = == $dEq_a 位是从 Eq 字典中提取相等方法并将其保存在变量中的位置,因此只需提取一次。


如果列表融合确实发生怎么办?这里提醒一下定义:

allSame xs = foldr go (`seq` True) xs Nothing where
  go x r Nothing = r (Just x)
  go x r (Just prev) = x == prev && r (Just x)

如果我们调用 allSame (build g)foldr 将根据规则 build 融合>foldr cn (build g) = gc n, yielding

allSame (build g) = g go (`seq` True) Nothing

除非 g 已知,否则这不会让我们有任何有趣的地方。所以让我们选择一些简单的东西:

replicate k0 a = build $ \c n ->
  let
    rep 0 = n
    rep k = a `c` rep (k - 1)
  in rep k0

所以如果h = allSame (replicate k0 a)h变成

let
  rep 0 = (`seq` True)
  rep k = go a (rep (k - 1))
in rep k0 Nothing

Eta扩展,

let
  rep 0 acc = acc `seq` True
  rep k acc = go a (rep (k - 1)) acc
in rep k0 Nothing

内联go

let
  rep 0 acc = acc `seq` True
  rep k Nothing = rep (k - 1) (Just a)
  rep k (Just prev) = a == prev && rep (k - 1) (Just a)
in rep k0 Nothing

同样,GHC可以看到递归调用始终是 Just,因此

let
  rep 0 acc = acc `seq` True
  rep k Nothing = rep' (k - 1) a
  rep k (Just prev) = a == prev && rep' (k - 1) a
  rep' 0 _ = True
  rep' k prev = a == prev && rep' (k - 1) a
in rep k0 Nothing

由于 rep 不再递归,GHC 可以减少它:

let
  rep' 0 _ = True
  rep' k prev = a == prev && rep' (k - 1) a
in
  case k0 of
    0 -> True
    _ -> rep' (k - 1) a

如您所见,这可以在没有任何分配的情况下运行!显然,这是一个愚蠢的例子,但在许多更有趣的情况下也会发生类似的情况。例如,如果您编写一个导入 allSame 函数并按

foo :: Int -> Bool
foo n = allSame [0..n]

上述方式定义和编译它的 AllSameTest 模块,您将得到以下内容(未清理)。

$wfoo :: Int# -> Bool
$wfoo =
  \ (ww_s1bY :: Int#) ->
    case tagToEnum# (># 0 ww_s1bY) of _ {
      False ->
        letrec {
          $sgo_s1db :: Int# -> Int# -> Bool
          $sgo_s1db =
            \ (sc_s1d9 :: Int#) (sc1_s1da :: Int#) ->
              case tagToEnum# (==# sc_s1d9 sc1_s1da) of _ {
                False -> False;
                True ->
                  case tagToEnum# (==# sc_s1d9 ww_s1bY) of _ {
                    False -> $sgo_s1db (+# sc_s1d9 1) sc_s1d9;
                    True -> True
                  }
              }; } in
        case ww_s1bY of _ {
          __DEFAULT -> $sgo_s1db 1 0;
          0 -> True
        };
      True -> True
    }

foo :: Int -> Bool
foo =
  \ (w_s1bV :: Int) ->
    case w_s1bV of _ { I# ww1_s1bY -> $wfoo ww1_s1bY }

这可能看起来很恶心,但您会注意到任何地方都没有 : 构造函数,并且 Int 都已拆箱,因此该函数可以以零分配运行。

Here's another fun way:

{-# INLINABLE allSame #-}
allSame :: Eq a => [a] -> Bool
allSame xs = foldr go (`seq` True) xs Nothing where
  go x r Nothing = r (Just x)
  go x r (Just prev) = x == prev && r (Just x)

By keeping track of the previous element, rather than the first one, this implementation can easily be changed to implement increasing or decreasing. To check all of them against the first instead, you could rename prev to first, and replace Just x with Just first.


How will this be optimized? I haven't checked in detail, but I'm going to tell a good story based on some things I know about GHC's optimizations.

Suppose first that list fusion does not occur. Then foldr will be inlined, giving something like

allSame xs = allSame' xs Nothing where
  allSame' [] = (`seq` True)
  allSame' (x : xs) = go x (allSame' xs)

Eta expansion then yields

allSame' [] acc = acc `seq` True
allSame' (x : xs) acc = go x (allSame' xs) acc

Inlining go,

allSame' [] acc = acc `seq` True
allSame' (x : xs) Nothing = allSame' xs (Just x)
allSame' (x : xs) (Just prev) =
  x == prev && allSame' xs (Just x)

Now GHC can recognize that the Maybe value is always Just on the recursive call, and use a worker-wrapper transformation to take advantage of this:

allSame' [] acc = acc `seq` True
allSame' (x : xs) Nothing = allSame'' xs x
allSame' (x : xs) (Just prev) = x == prev && allSame'' xs x

allSame'' [] prev = True
allSame'' (x : xs) prev = x == prev && allSame'' xs x

Remember now that

allSame xs = allSame' xs Nothing

and allSame' is no longer recursive, so it can be beta-reduced:

allSame [] = True
allSame (x : xs) = allSame'' xs x

allSame'' [] _ = True
allSame'' (x : xs) prev = x == prev && allSame'' xs x

So the higher-order code has turned into efficient recursive code with no extra allocation.

Compiling the module defining allSame using -O2 -ddump-simpl -dsuppress-all -dno-suppress-type-signatures yields the following (I've cleaned it up a bit):

allSame :: forall a. Eq a => [a] -> Bool
allSame =
  \ (@ a) ($dEq_a :: Eq a) (xs0 :: [a]) ->
    let {
      equal :: a -> a -> Bool
      equal = == $dEq_a } in
    letrec {
      go :: [a] -> a -> Bool
      go =
        \ (xs :: [a]) (prev :: a) ->
          case xs of _ {
            [] -> True;
            : y ys ->
              case equal y prev of _ {
                False -> False;
                True -> go ys y
              }
          }; } in
    case xs0 of _ {
      [] -> True;
      : x xs -> go xs x
    }

As you can see, this is essentially the same as the result I described. The equal = == $dEq_a bit is where the equality method is extracted from the Eq dictionary and saved in a variable so it only needs to be extracted once.


What if list fusion does occur? Here's a reminder of the definition:

allSame xs = foldr go (`seq` True) xs Nothing where
  go x r Nothing = r (Just x)
  go x r (Just prev) = x == prev && r (Just x)

If we call allSame (build g), the foldr will fuse with the build according to the rule foldr c n (build g) = g c n, yielding

allSame (build g) = g go (`seq` True) Nothing

That doesn't get us anywhere interesting unless g is known. So let's choose something simple:

replicate k0 a = build $ \c n ->
  let
    rep 0 = n
    rep k = a `c` rep (k - 1)
  in rep k0

So if h = allSame (replicate k0 a), h becomes

let
  rep 0 = (`seq` True)
  rep k = go a (rep (k - 1))
in rep k0 Nothing

Eta expanding,

let
  rep 0 acc = acc `seq` True
  rep k acc = go a (rep (k - 1)) acc
in rep k0 Nothing

Inlining go,

let
  rep 0 acc = acc `seq` True
  rep k Nothing = rep (k - 1) (Just a)
  rep k (Just prev) = a == prev && rep (k - 1) (Just a)
in rep k0 Nothing

Again, GHC can see the recursive call is always Just, so

let
  rep 0 acc = acc `seq` True
  rep k Nothing = rep' (k - 1) a
  rep k (Just prev) = a == prev && rep' (k - 1) a
  rep' 0 _ = True
  rep' k prev = a == prev && rep' (k - 1) a
in rep k0 Nothing

Since rep is no longer recursive, GHC can reduce it:

let
  rep' 0 _ = True
  rep' k prev = a == prev && rep' (k - 1) a
in
  case k0 of
    0 -> True
    _ -> rep' (k - 1) a

As you can see, this can run with no allocation whatsoever! Obviously, it's a silly example, but something similar will happen in many more interesting cases. For example, if you write an AllSameTest module importing the allSame function and defining

foo :: Int -> Bool
foo n = allSame [0..n]

and compile it as described above, you'll get the following (not cleaned up).

$wfoo :: Int# -> Bool
$wfoo =
  \ (ww_s1bY :: Int#) ->
    case tagToEnum# (># 0 ww_s1bY) of _ {
      False ->
        letrec {
          $sgo_s1db :: Int# -> Int# -> Bool
          $sgo_s1db =
            \ (sc_s1d9 :: Int#) (sc1_s1da :: Int#) ->
              case tagToEnum# (==# sc_s1d9 sc1_s1da) of _ {
                False -> False;
                True ->
                  case tagToEnum# (==# sc_s1d9 ww_s1bY) of _ {
                    False -> $sgo_s1db (+# sc_s1d9 1) sc_s1d9;
                    True -> True
                  }
              }; } in
        case ww_s1bY of _ {
          __DEFAULT -> $sgo_s1db 1 0;
          0 -> True
        };
      True -> True
    }

foo :: Int -> Bool
foo =
  \ (w_s1bV :: Int) ->
    case w_s1bV of _ { I# ww1_s1bY -> $wfoo ww1_s1bY }

That may look disgusting, but you'll note that there are no : constructors anywhere, and that the Ints are all unboxed, so the function can run with zero allocation.

戒ㄋ 2024-11-16 10:58:49

我想我可能只是实现 find 并重做这个。不过,我认为了解其内部结构是有启发性的。 (请注意解决方案如何依赖于相等性的传递性,但还要注意问题如何要求相等性具有传递性才能保持一致。)

sameElement x:y:xs = if x /= y then Nothing else sameElement y:xs
sameElement [x] = Just x
allEqual [] = True
allEqual xs = isJust $ sameElement xs

我喜欢 sameElement 如何查看第一个 O (1) 列表的元素,然后返回结果或在列表的某些后缀(特别是尾部)上递归。我对这个结构没什么好说的,我只是喜欢它:-)

我想我做了与 this< 相同的比较/a>.相反,如果我使用 sameElement x:xs 进行递归,我会像解决方案 0 中那样将输入列表的头部与每个元素进行比较。

切线:如果愿意,可以通过以下方式报告两个不匹配的元素:将 Nothing 替换为 Left (x, y),将 Just x 替换为 Right xisJust 与 任一 (const False) (const True)

I think I might just be implementing find and redoing this. I think it's instructive, though, to see the innards of it. (Note how the solution depends on equality being transitive, though note also how the problem requires equality to be transitive to be coherent.)

sameElement x:y:xs = if x /= y then Nothing else sameElement y:xs
sameElement [x] = Just x
allEqual [] = True
allEqual xs = isJust $ sameElement xs

I like how sameElement peeks at the first O(1) elements of the list, then either returns a result or recurses on some suffix of the list, in particular the tail. I don't have anything smart to say about that structure, I just like it :-)

I think I do the same comparisons as this. If instead I had recursed with sameElement x:xs, I would compare the head of the input list to each element like in solution 0.

Tangent: one could, if one wanted, report the two mismatching elements by replacing Nothing with Left (x, y) and Just x with Right x and isJust with either (const False) (const True).

南…巷孤猫 2024-11-16 10:58:49

这个实现是优越的。

allSame [ ] = True
allSame (h:t) = aux h t

aux x1 [ ]                 = True
aux x1 (x2:xs) | x1==x2    = aux x2 xs 
               | otherwise = False

考虑到 (==) 运算符的传递性,假设 Eq 的实例已得到很好的实现,如果您希望确保表达式链的相等性,例如 a = b = c = d,则只需确保 a= b、b=c、c=d,并且d=a,而不是上面提供的技术,例如a=b、a=c、a=d、b=c、b=d、c=d。

我提出的解决方案随着您希望测试的元素数量线性增长,即使您引入常数因子以希望提高其效率,后者也是二次方。

它也优于使用组的解决方案,因为您最终不必使用长度。

你也可以用逐点的方式写得很好,但我不会用如此琐碎的细节来烦你。

This implementation is superior.

allSame [ ] = True
allSame (h:t) = aux h t

aux x1 [ ]                 = True
aux x1 (x2:xs) | x1==x2    = aux x2 xs 
               | otherwise = False

Given the transitivity of the (==) operator, assuming the instance of Eq is well implemented if you wish to assure the equality of a chain of expressions, eg a = b = c = d, you will only need to assure that a=b, b=c, c=d, and that d=a, Instead of the provided techniques above, eg a=b, a=c, a=d, b=c , b=d, c=d.

The solution I proposed grows linearly with the number of elements you wish to test were's the latter is quadratic even if you introduce constant factors in hopes of improving its efficiency.

It's also superior to the solution using group since you don't have to use length in the end.

You can also write it nicely in pointwise fashion but I won't bore you with such trivial details.

美人骨 2024-11-16 10:58:49

虽然效率不是很高(即使前两个元素不匹配,它也会遍历整个列表),但这里有一个厚颜无耻的解决方案:

import Data.List (group)

allTheSame :: (Eq a) => [a] -> Bool
allTheSame = (== 1) . length . group

只是为了好玩。

While not very efficient (it will traverse the whole list even if the first two elements don't match), here's a cheeky solution:

import Data.List (group)

allTheSame :: (Eq a) => [a] -> Bool
allTheSame = (== 1) . length . group

Just for fun.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文