The Haskell pattern matching concept behind a text splitter

Posted 2025-01-03 00:31:37 · 364 words · 2 views · 0 comments

I want to know the pattern matching concept behind this code snippet:

 split :: String -> Char -> [String]
 split [] delim = [""]
 split (c:cs) delim
     | c == delim = "" : rest
     | otherwise = (c : head rest) : tail rest
       where
         rest = split cs delim

I know that head returns the first element of a list and tail returns the rest, but I still cannot understand how this function works. It takes a string and breaks it into a list of strings at a given delimiter character.

幸福丶如此 2025-01-10 00:31:37

Maybe it's clearer in the following form:

split [] delim = [""]    -- a list containing only an empty String
split (c:cs) delim = let (firstWord:moreWords) = split cs delim
                     in if c == delim
                           then "" : firstWord : moreWords
                           else (c:firstWord) : moreWords

The function traverses the input string, comparing each character with the delimiter. If the current character is not the delimiter, it is prepended to the first word (which may be empty) of the result of splitting the remainder of the string. If it is the delimiter, an empty string is added to the front of that result, which starts a new word.
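To make the two cases concrete, here is a minimal sketch that restates the definition above with the type signature from the question and applies it to a few inputs. The name `examples` and the sample strings are only illustrative and are not part of the question.

```haskell
-- Same definition as above, with the type signature from the question.
split :: String -> Char -> [String]
split [] _ = [""]    -- a list containing only an empty String
split (c:cs) delim =
    let (firstWord:moreWords) = split cs delim   -- lazy pattern: not forced yet
    in if c == delim
          then "" : firstWord : moreWords        -- delimiter: start a new word
          else (c : firstWord) : moreWords       -- otherwise: extend the first word

-- Illustrative sample inputs (hypothetical, chosen for this sketch).
examples :: [[String]]
examples =
    [ split "abc cde" ' '   -- ["abc","cde"]
    , split "a,,b" ','      -- ["a","","b"]  (adjacent delimiters give an empty word)
    , split ",x" ','        -- ["","x"]      (leading delimiter gives an empty first word)
    , split "" ','          -- [""]          (base case)
    ]
```

Note that the result is never the empty list: the base case returns `[""]`, which is why matching the recursive result against `firstWord:moreWords` always succeeds.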

For example, the evaluation of split "abc cde" ' ' proceeds like this:

split "abc cde" ' '
    ~> 'a' == ' ' ? No, next guard
    ~> ('a' : something) : somethingElse

where something and somethingElse will be determined later by splitting the remainder "bc cde". After looking at the first character, it has already been determined that, whatever the final result is, its first entry starts with 'a'. Going on to determine the rest:

split "bc cde" ' '
    ~> ('b' : something1) : somethingElse1
       where (something1 : somethingElse1) = split "c cde" ' '

So now the first two characters of the first entry of the result are known. The next step determines that something1 starts with 'c'. Finally we reach the delimiter. In that case the first element of the result (the empty string at that point) is determined without reference to later recursive calls, and only the remainder of the result is left to be found by the recursion.
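The observation that the first entry is fixed as soon as the delimiter is reached can be checked on an infinite input. The following is a small sketch under the assumption that `split` is the let-based definition above (repeated here so the block is self-contained). The names `firstOfInfinite` and `firstTwo` are made up for this illustration.

```haskell
split :: String -> Char -> [String]
split [] _ = [""]
split (c:cs) delim =
    let (firstWord:moreWords) = split cs delim   -- only evaluated on demand
    in if c == delim
          then "" : firstWord : moreWords
          else (c : firstWord) : moreWords

-- The input never ends, but the first word is complete once the first
-- delimiter has been seen, so head can return it.
firstOfInfinite :: String
firstOfInfinite = head (split ("abc " ++ cycle "xyz") ' ')   -- "abc"

-- Likewise, the first two words are available after the second delimiter.
firstTwo :: [String]
firstTwo = take 2 (split ("ab cd " ++ repeat 'z') ' ')       -- ["ab","cd"]
```

Asking for the third word of the second input would not terminate, because that word has no closing delimiter and no end.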

Another way of formulating the algorithm is the following (thanks to @dave4420 for the suggestion):

split input delim = foldr combine [""] input
  where
    combine c rest@(~(wd : wds))
        | c == delim = "" : rest
        | otherwise  = (c : wd) : wds
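The `~` in `rest@(~(wd : wds))` makes the pattern lazy (irrefutable), so `rest` is only taken apart when `wd` or `wds` is actually needed. Below is a rough sketch comparing the two formulations. The names `splitRec`, `splitFold`, `agree`, and `lazyFold` are chosen here for illustration and do not come from the answer.

```haskell
-- Explicit recursion, as in the first formulation.
splitRec :: String -> Char -> [String]
splitRec [] _ = [""]
splitRec (c:cs) delim =
    let (firstWord:moreWords) = splitRec cs delim
    in if c == delim
          then "" : firstWord : moreWords
          else (c : firstWord) : moreWords

-- The foldr formulation, with the lazy pattern kept as in the answer.
splitFold :: String -> Char -> [String]
splitFold input delim = foldr combine [""] input
  where
    combine c rest@(~(wd : wds))
        | c == delim = "" : rest
        | otherwise  = (c : wd) : wds

-- Both versions give the same result on a finite input (space as delimiter).
agree :: String -> Bool
agree s = splitRec s ' ' == splitFold s ' '

-- Thanks to the lazy pattern, the fold also works on an infinite input.
lazyFold :: String
lazyFold = head (splitFold ("abc " ++ repeat 'x') ' ')   -- "abc"
```

With a strict pattern `rest@(wd : wds)`, `combine` would have to evaluate the result for the rest of the input before it could even test the guard, so `lazyFold` would not terminate.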
