In Haskell, how do I sort an infinite list of strings?

Posted 2024-08-23 22:24:04 · 461 words · 7 views · 0 comments

So basically, if I have a (finite or infinite) list of (finite or infinite) lists of strings, is it possible to sort the list by length first and then by lexicographic order, excluding duplicates? A sample input/output would be:

Input:

[["a", "b",...], ["a", "aa", "aaa"], ["b", "bb", "bbb",...], ...]

Output:

["a", "b", "aa", "bb", "aaa", "bbb", ...]

I know that the input list is not a valid Haskell expression, but suppose there is an input like that. I tried using a merge algorithm, but it tends to hang on the inputs that I give it. Can somebody explain and show a decent sorting function that can do this? If there isn't any function like that, can you explain why?

In case somebody didn't understand what I meant by the sorting order: shorter strings come first, and if two or more strings have the same length, they are ordered using the < operator.
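The ordering described above (shorter strings first, ties broken with `<`) is commonly called shortlex order. A minimal sketch of it as a Haskell comparison function, assuming plain `String`s; `shortlex` and `sortShortlex` are illustrative names, not library functions:

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- Shortlex order: compare lengths first, then fall back to the ordinary
-- lexicographic Ord instance for String. Comparing the pair (length s, s)
-- does exactly that, because tuples are compared component by component.
shortlex :: String -> String -> Ordering
shortlex = comparing (\s -> (length s, s))

-- For a finite list this is just an ordinary sort with that comparison.
sortShortlex :: [String] -> [String]
sortShortlex = sortBy shortlex
```

For example, `sortShortlex ["bbb", "aa", "b", "a"]` gives `["a","b","aa","bbb"]`. Note that `shortlex "b" "ab"` is `LT`, whereas plain `compare "b" "ab"` is `GT`.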

Thanks!


Comments (6)

秋凉 2024-08-30 22:24:04

Ultimately, you can't sort an infinite list, because items at the tail of the list could percolate all the way to the front of the result, so you can't finish sorting an infinite list until you've seen the last item, but your list is infinite, so you'll never get there.

The only way you could even try to sort an infinite list would require constraints on the inhabitants of the list. If the values of the list items come from a well-founded set and the contents of the list are unique, then you could at least make some progress in returning the initial elements of the result. For instance, if the list consisted of distinct natural numbers, you could return the first 0 you see, then the first 1, etc., but you couldn't make any headway in the result until you saw 2, no matter how far down the list you went. Ultimately, if you ever skipped an element of the set because it wasn't present in the source, you'd cease to produce new output elements until you had the entire input in hand.
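The natural-number case can be sketched directly. This is a hypothetical illustration (`sortDistinctNats` and `swappedPairs` are names made up here): it searches the input for 0, then 1, and so on, and keeps producing output only as long as every number actually occurs. A single missing number makes `elem` search the infinite input forever.

```haskell
-- Hypothetical sketch: "sort" an infinite list of distinct naturals by
-- searching the input for 0, then 1, then 2, ...
-- Each `elem` call terminates only if the number is present, so the
-- output stalls forever at the first natural missing from the input.
sortDistinctNats :: [Integer] -> [Integer]
sortDistinctNats xs = filter (`elem` xs) [0 ..]

-- An infinite input containing every natural exactly once: [1,0,3,2,5,4,...]
swappedPairs :: [Integer]
swappedPairs = concatMap (\k -> [2 * k + 1, 2 * k]) [0 ..]
```

`take 6 (sortDistinctNats swappedPairs)` yields `[0,1,2,3,4,5]`. With a gap, e.g. `filter (/= 3) swappedPairs`, `take 3` still returns `[0,1,2]`, but asking for a fourth element never returns.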

You can do the same thing with strings, because under the length-then-lexicographic order they are well-founded, but that is only even remotely viable if you plan on returning all possible strings.
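Enumerating every string over a finite alphabet in that order is straightforward, because there are only finitely many strings of each length. A sketch, assuming a non-empty alphabet given in ascending order without duplicates (`allStrings` is a name chosen for illustration):

```haskell
import Control.Monad (replicateM)

-- All strings over the alphabet in shortlex order: every string of
-- length 0, then every string of length 1, and so on.
-- In the list monad, replicateM n alphabet produces the length-n strings
-- in lexicographic order, provided the alphabet is in ascending order.
allStrings :: [Char] -> [String]
allStrings alphabet = concatMap (\n -> replicateM n alphabet) [0 ..]
```

`take 7 (allStrings "ab")` gives `["","a","b","aa","ab","ba","bb"]`. Filtering this enumeration by membership in an infinite input would still stall at the first string the input lacks, exactly as in the natural-number case.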

In short, if you need this, you're going about solving the problem you have in the wrong way. This isn't a tractable path to any solution you will want to use.

As yairchu noted, merging a finite number of sorted infinite lists works fine though.
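A sketch of that finite case for the question's ordering, assuming every inner list is already sorted in shortlex order; `shortlex`, `mergeBy` and `mergeAllUnique` are illustrative names, not library functions. Because the merged output is sorted, equal strings end up adjacent and `map head . group` can drop them lazily.

```haskell
import Data.List (group)
import Data.Ord (comparing)

-- Shortlex order: shorter strings first, ties broken lexicographically.
shortlex :: String -> String -> Ordering
shortlex = comparing (\s -> (length s, s))

-- Ordinary merge of two lists that are each sorted by `cmp`.
-- Lazy: each output element needs only the two current heads.
mergeBy :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
mergeBy _ [] ys = ys
mergeBy _ xs [] = xs
mergeBy cmp xa@(x:xs) ya@(y:ys) = case cmp x y of
  GT -> y : mergeBy cmp xa ys
  _  -> x : mergeBy cmp xs ya

-- Merge a FINITE list of sorted (possibly infinite) lists, then remove
-- duplicates, which are adjacent after merging.
mergeAllUnique :: [[String]] -> [String]
mergeAllUnique = map head . group . foldr (mergeBy shortlex) []
```

For example, `take 6 (mergeAllUnique [iterate ('a' :) "a", ["a","aa","aaa"], iterate ('b' :) "b"])` gives `["a","b","aa","bb","aaa","bbb"]`. With an infinite outer list, the `foldr` would need the head of an infinite chain of merges before emitting anything, which is why this only works for finitely many lists.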

友欢 2024-08-30 22:24:04
  • In general it is impossible to sort infinite lists, because the smallest item could be arbitrarily far down the list, and we must find it before we can output it.

  • Merging infinite sorted lists is possible.

  • In general, merging an infinite list of sorted lists is impossible, for the same reason that sorting them is.

  • Merging an infinite list of sorted lists, which are sorted by heads (forall i j. i < j => head (lists !! i) <= head (lists !! j)), is possible.

So I'm guessing that what you really want is to merge a sorted infinite list of sorted lists. It's even a task that makes some sense. There's even some existing code that uses it, implemented for monadic lists there - kinda ugly syntax-wise etc. So here's a version for plain lists:

mergeOnSortedHeads :: Ord b => (a -> b) -> [[a]] -> [a]
mergeOnSortedHeads _ [] = []
mergeOnSortedHeads f ([]:xs) = mergeOnSortedHeads f xs
-- the first list has the smallest head, so emit it, then re-insert
-- ("bury") the rest of that list so the heads stay sorted by f
mergeOnSortedHeads f ((x:xs):ys) =
  x : mergeOnSortedHeads f (bury xs ys)
  where
    bury [] ks = ks
    bury js [] = [js]
    bury js ([]:ks) = bury js ks
    bury jj@(j:js) ll@(kk@(k:ks):ls)
      | f j <= f k = jj : ll
      | otherwise = kk : bury jj ls

ghci> take 20 $ mergeOnSortedHeads id $ [[0,4,6], [2,3,9], [3,5..], [8]] ++ map repeat [12..]
[0,2,3,3,4,5,6,7,8,9,9,11,12,12,12,12,12,12,12,12]
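Applied to the question's ordering, a sketch could use the key `\s -> (length s, s)` and drop duplicates afterwards. This assumes each inner list is already in shortlex order and the inner lists are ordered by their heads (the precondition above); the definition is repeated so the block compiles on its own, and `shortlexUnion` and `sample` are illustrative names.

```haskell
import Data.List (group)

mergeOnSortedHeads :: Ord b => (a -> b) -> [[a]] -> [a]
mergeOnSortedHeads _ [] = []
mergeOnSortedHeads f ([]:xs) = mergeOnSortedHeads f xs
mergeOnSortedHeads f ((x:xs):ys) =
  x : mergeOnSortedHeads f (bury xs ys)
  where
    bury [] ks = ks
    bury js [] = [js]
    bury js ([]:ks) = bury js ks
    bury jj@(j:_) ll@(kk@(k:_):ls)
      | f j <= f k = jj : ll
      | otherwise = kk : bury jj ls

-- Shortlex key (length first, then ordinary string order); duplicates
-- are adjacent in the merged output, so `map head . group` removes them.
shortlexUnion :: [[String]] -> [String]
shortlexUnion = map head . group . mergeOnSortedHeads (\s -> (length s, s))

-- Input in the spirit of the question: an infinite outer list whose
-- inner lists are shortlex-sorted and have shortlex-sorted heads.
sample :: [[String]]
sample = [["a", "b", "c"], ["a", "aa", "aaa"], iterate ('b' :) "b"]
      ++ [[replicate k 'c'] | k <- [2 ..]]
```

Here `take 10 (shortlexUnion sample)` should give `["a","b","c","aa","bb","cc","aaa","bbb","ccc","bbbb"]`, even though the outer list is infinite.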

btw: what do you need this for?

嗼ふ静 2024-08-30 22:24:04

Well, I'm going to ignore your request for sorting infinite data.

To sort by length of the sublists, then by lexicographic order, we can do this pretty easily. Oh, and you want duplicates removed.

We'll start with a sample:

> s
[["a","b"],["a","aa","aaa"],["b","bb","bbb"]]

And then build the program incrementally.

First, sorting on length (using Data.Ord.comparing to build the comparison function for sortBy; sortBy and nub come from Data.List):

> sortBy (comparing length) s
[["a","b"],["a","aa","aaa"],["b","bb","bbb"]]

Ok. That looks reasonable. So let's just concat, and sortBy length then alpha:

> sortBy (comparing length) . nub . concat $ s
["a","b","aa","bb","aaa","bbb"]

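That only works if your input is already sorted: sortBy is stable, so strings of the same length keep their input order. Otherwise you'll need a slightly different comparison function for sortBy.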
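For finite input that is not already in alphabetical order, one option is to sort on the pair `(length s, s)` so that ties are broken lexicographically, then drop adjacent duplicates. A sketch; `sortFiniteShortlex` is an illustrative name:

```haskell
import Data.List (group, sortOn)

-- Finite input only: flatten, sort by (length, string), then remove
-- duplicates, which are adjacent after sorting.
-- Unlike sorting by length alone, this does not rely on the input
-- already being in alphabetical order.
sortFiniteShortlex :: [[String]] -> [String]
sortFiniteShortlex = map head . group . sortOn (\s -> (length s, s)) . concat
```

For example, `sortFiniteShortlex [["bb","a"], ["aaa","b","a"], ["aa"]]` gives `["a","b","aa","bb","aaa"]`. `map head . group` after sorting also avoids the quadratic cost of `nub`.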

潜移默化 2024-08-30 22:24:04

Thanks to everyone for their input, and sorry for the late reply. Turns out I was just approaching the problem in the wrong way. I was trying to do what Yairchu showed, but I was using the built-in function length to do the merging, and length doesn't work on an infinite list for obvious reasons. Anyway, I solved my problem by sorting as I created the list on the go, not on the end result. I wonder what other languages offer infinite lists? Such a weird but useful concept.

雨的味道风的声音 2024-08-30 22:24:04

Here is an algorithm that lets you sort online:

It is not efficient, but it is lazy enough to let you go to different sort generations, even if you sort infinite lists. It is a nice gimmick, but not very usable. For example, sorting the infinite list [10,9..]:

*Main> take 10 $ sortingStream [10,9..] !! 0
[9,8,7,6,5,4,3,2,1,0]
*Main> take 10 $ sortingStream [10,9..] !! 1
[8,7,6,5,4,3,2,1,0,-1]
*Main> take 10 $ sortingStream [10,9..] !! 2
[7,6,5,4,3,2,1,0,-1,-2]
*Main> take 10 $ sortingStream [10,9..] !! 3
[6,5,4,3,2,1,0,-1,-2,-3]
*Main> take 10 $ sortingStream [10,9..] !! 4
[5,4,3,2,1,0,-1,-2,-3,-4]
*Main> take 10 $ sortingStream [10,9..] !! 1000
[-991,-992,-993,-994,-995,-996,-997,-998,-999,-1000]

As you can see, the sorting improves with each generation. The code:

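-- produce f xs = [f xs, f (f xs), f (f (f xs)), ...]
produce :: ([a] -> [a]) -> [a] -> [[a]]
produce f xs = f xs : (produce f (f xs))

-- each generation is one more bubble-sort pass over the previous one
sortingStream :: (Ord a) => [a] -> [[a]]
sortingStream = produce ss

-- a single bubble-sort pass: the larger of each adjacent pair moves right
ss :: (Ord a) => [a] -> [a]
ss [] = []
ss [x] = [x]
ss [x,y]    | x <= y = [x,y]
            | otherwise = [y,x]
ss (x:y:xs) | x <= y  =  x: (ss (y:xs))
            | otherwise =  y:(ss (x:xs))
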
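Since each generation is one bubble-sort pass, the same stream can be written with `iterate`, and on a finite list of length n (n ≥ 2) the list is fully sorted after n - 1 passes, i.e. at index n - 2. A sketch; `bubblePass` and `sortingStream'` are illustrative names, with `bubblePass` behaving like `ss` above:

```haskell
-- One bubble-sort pass (same behaviour as `ss`): the larger element of
-- each adjacent pair is carried to the right.
bubblePass :: Ord a => [a] -> [a]
bubblePass (x:y:rest)
  | x <= y    = x : bubblePass (y : rest)
  | otherwise = y : bubblePass (x : rest)
bubblePass xs = xs

-- Same stream as `produce ss`: iterate starts with the untouched input,
-- so drop it with tail.
sortingStream' :: Ord a => [a] -> [[a]]
sortingStream' = tail . iterate bubblePass
```

For example, `sortingStream' [5,4,3,2,1] !! 3` is `[1,2,3,4,5]`, and `take 10 (sortingStream' [10,9..] !! 1000)` reproduces the `[-991,...,-1000]` generation shown above.
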
川水往事 2024-08-30 22:24:04

Whether it can be done depends very much on the nature of your input data. If you can 'stop looking' for lists of a certain length when you've seen a longer one and there are only a finite number of lists of each length, then you can go through the lengths in ascending order, sort those and concatenate the results. Something like this should work:

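import Data.List (nub, sort)

-- strings of length <= n at the front of one (length-sorted) inner list
listsUptoLength n xss = takeWhile (\xs -> length xs <= n) $ xss
-- walk the outer list, stopping at the first inner list with nothing of length <= n
listsUptoLength' n [] = []
listsUptoLength' n (xss:xsss) = case listsUptoLength n xss of
    [] -> []
    xss' -> xss' : listsUptoLength' n xsss
-- all strings of exactly length n
listsOfLength n xsss = concatMap (\xss -> (filter (\xs -> length xs == n) xss)) (listsUptoLength' n xsss)

-- for each length in turn: collect, deduplicate, sort, and emit
sortInfinite xsss = concatMap (\n -> sort . nub $ (listsOfLength n xsss)) [0..]

f xs y = [xs ++ replicate n y | n <- [1..]]
test = [ map (\x -> [x]) ['a'..'e'], f "" 'a', f "" 'b', f "b" 'a', f "a" 'b' ] ++ [f start 'c' | start <- f "" 'a']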

(The names could probably be chosen to be more illuminating :)

I'm guessing you're working with regular expressions, so I think something like this could be made to work; I repeat the request for more background!
