Haskell 字节串:如何模式匹配?

发布于 2024-09-30 01:13:11 字数 1221 浏览 2 评论 0原文

我是 Haskell 新手,在弄清楚如何模式匹配 ByteString 时遇到了一些麻烦。我的函数的 [Char] 版本如下所示:

dropAB :: String -> String
dropAB []       = []
dropAB (x:[])   = x:[]
dropAB (x:y:xs) = if x=='a' && y=='b'
                  then dropAB xs
                  else x:(dropAB $ y:xs) 

正如预期的那样,这会从字符串中过滤掉所有出现的“ab”。但是,我在尝试将其应用于 ByteString 时遇到问题。

天真的版本

dropR :: BS.ByteString -> BS.ByteString
dropR []         = []
dropR (x:[])     = [x]
<...>

产生

Couldn't match expected type `BS.ByteString'
       against inferred type `[a]'
In the pattern: []
In the definition of `dropR': dropR [] = []

[] 显然是罪魁祸首,因为它是针对常规 String 而不是 ByteString。在 BS.empty 中进行替换似乎是正确的事情,但给出了“绑定位置的限定名称:BS.empty”。让我们尝试一下,

dropR :: BS.ByteString -> BS.ByteString
dropR empty              = empty        
dropR (x cons empty)     = x cons empty
<...>

这会导致 (x cons empty) 出现“模式解析错误”。我真的不知道我还能在这里做什么。

附带说明一下,我尝试使用此函数执行的操作是从某些文本中过滤掉特定的 UTF16 字符。如果有一种干净的方法来实现这一点,我很想听听,但这种模式匹配错误似乎是新手 Haskeller 应该真正理解的东西。

I'm a Haskell newbie, and having a bit of trouble figuring out how to pattern match a ByteString. The [Char] version of my function looks like:

dropAB :: String -> String
dropAB []       = []
dropAB (x:[])   = x:[]
dropAB (x:y:xs) = if x=='a' && y=='b'
                  then dropAB xs
                  else x:(dropAB $ y:xs) 

As expected, this filters out all occurrences of "ab" from a string. However, I have problems trying to apply this to a ByteString.

The naive version

dropR :: BS.ByteString -> BS.ByteString
dropR []         = []
dropR (x:[])     = [x]
<...>

yields

Couldn't match expected type `BS.ByteString'
       against inferred type `[a]'
In the pattern: []
In the definition of `dropR': dropR [] = []

[] is clearly the culprit, as it is for a regular String not a ByteString. Subbing in BS.empty seems like the right thing but gives "Qualified name in the binding position: BS.empty." Leaving us to try

dropR :: BS.ByteString -> BS.ByteString
dropR empty              = empty        
dropR (x cons empty)     = x cons empty
<...>

this gives "parse error in pattern" for (x cons empty). I don't really know what else I can do here.

As a side note, what I'm trying to do with this function is to filter out a specific UTF16 character from some text. If there's a clean way to accomplish that, I'd love to hear it, but this pattern matching error seems like something that a newbie haskeller should really understand.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

音栖息无 2024-10-07 01:13:11

您可以使用 查看模式 来实现此类操作

{-# LANGUAGE ViewPatterns #-}    
import Data.ByteString (ByteString, cons, uncons, singleton, empty)
import Data.ByteString.Internal (c2w) 

dropR :: ByteString -> ByteString
dropR (uncons -> Nothing) = empty
dropR (uncons -> Just (x,uncons -> Nothing)) = singleton x
dropR (uncons -> Just (x,uncons -> Just(y,xs))) =
    if x == c2w 'a' && y == c2w 'b'
    then dropR xs
    else cons x (dropR $ cons y xs)

You can use view patterns for such things

{-# LANGUAGE ViewPatterns #-}    
import Data.ByteString (ByteString, cons, uncons, singleton, empty)
import Data.ByteString.Internal (c2w) 

dropR :: ByteString -> ByteString
dropR (uncons -> Nothing) = empty
dropR (uncons -> Just (x,uncons -> Nothing)) = singleton x
dropR (uncons -> Just (x,uncons -> Just(y,xs))) =
    if x == c2w 'a' && y == c2w 'b'
    then dropR xs
    else cons x (dropR $ cons y xs)
给我一枪 2024-10-07 01:13:11

最新版本的 GHC (7.8) 有一个称为模式同义词的功能,可以将其添加到 gawi 的示例中:

{-# LANGUAGE ViewPatterns, PatternSynonyms #-}

import Data.ByteString (ByteString, cons, uncons, singleton, empty)
import Data.ByteString.Internal (c2w)

infixr 5 :<

pattern b :< bs <- (uncons -> Just (b, bs))
pattern Empty   <- (uncons -> Nothing)

dropR :: ByteString -> ByteString
dropR Empty          = empty
dropR (x :< Empty)   = singleton x
dropR (x :< y :< xs)
  | x == c2w 'a' && y == c2w 'b' = dropR xs
  | otherwise                    = cons x (dropR (cons y xs))

进一步,您可以将其抽象为在任何类型类上工作(当/如果我们得到 关联模式同义词)。模式定义保持不变:

{-# LANGUAGE ViewPatterns, PatternSynonyms, TypeFamilies #-}

import qualified Data.ByteString as BS
import Data.ByteString (ByteString, singleton)
import Data.ByteString.Internal (c2w)
import Data.Word

class ListLike l where
  type Elem l

  empty  :: l
  uncons :: l -> Maybe (Elem l, l)
  cons   :: Elem l -> l -> l

instance ListLike ByteString where
  type Elem ByteString = Word8

  empty  = BS.empty
  uncons = BS.uncons
  cons   = BS.cons

instance ListLike [a] where
  type Elem [a] = a

  empty         = []
  uncons []     = Nothing
  uncons (x:xs) = Just (x, xs)
  cons          = (:)

在这种情况下,dropR 可以在 [Word8]ByteString 上工作:

-- dropR :: [Word8]    -> [Word8]
-- dropR :: ByteString -> ByteString
dropR :: (ListLike l, Elem l ~ Word8) => l -> l
dropR Empty          = empty
dropR (x :< Empty)   = cons x empty
dropR (x :< y :< xs)
  | x == c2w 'a' && y == c2w 'b' = dropR xs
  | otherwise                    = cons x (dropR (cons y xs))

而且最重要的是:

import Data.ByteString.Internal (w2c)

infixr 5 :•    
pattern b :• bs <- (w2c -> b) :< bs

dropR :: (ListLike l, Elem l ~ Word8) => l -> l
dropR Empty              = empty
dropR (x   :< Empty)     = cons x empty
dropR ('a' :• 'b' :• xs) = dropR xs
dropR (x   :< y   :< xs) = cons x (dropR (cons y xs))

您可以在我关于模式同义词的帖子中查看更多信息。

The latest version of GHC (7.8) has a feature called pattern synonyms which can be added to gawi's example:

{-# LANGUAGE ViewPatterns, PatternSynonyms #-}

import Data.ByteString (ByteString, cons, uncons, singleton, empty)
import Data.ByteString.Internal (c2w)

infixr 5 :<

pattern b :< bs <- (uncons -> Just (b, bs))
pattern Empty   <- (uncons -> Nothing)

dropR :: ByteString -> ByteString
dropR Empty          = empty
dropR (x :< Empty)   = singleton x
dropR (x :< y :< xs)
  | x == c2w 'a' && y == c2w 'b' = dropR xs
  | otherwise                    = cons x (dropR (cons y xs))

Going further you can abstract this to work on any type class (this will look nicer when/if we get associated pattern synonyms). The pattern definitions stay the same:

{-# LANGUAGE ViewPatterns, PatternSynonyms, TypeFamilies #-}

import qualified Data.ByteString as BS
import Data.ByteString (ByteString, singleton)
import Data.ByteString.Internal (c2w)
import Data.Word

class ListLike l where
  type Elem l

  empty  :: l
  uncons :: l -> Maybe (Elem l, l)
  cons   :: Elem l -> l -> l

instance ListLike ByteString where
  type Elem ByteString = Word8

  empty  = BS.empty
  uncons = BS.uncons
  cons   = BS.cons

instance ListLike [a] where
  type Elem [a] = a

  empty         = []
  uncons []     = Nothing
  uncons (x:xs) = Just (x, xs)
  cons          = (:)

in which case dropR can work on both [Word8] and ByteString:

-- dropR :: [Word8]    -> [Word8]
-- dropR :: ByteString -> ByteString
dropR :: (ListLike l, Elem l ~ Word8) => l -> l
dropR Empty          = empty
dropR (x :< Empty)   = cons x empty
dropR (x :< y :< xs)
  | x == c2w 'a' && y == c2w 'b' = dropR xs
  | otherwise                    = cons x (dropR (cons y xs))

And for the hell of it:

import Data.ByteString.Internal (w2c)

infixr 5 :•    
pattern b :• bs <- (w2c -> b) :< bs

dropR :: (ListLike l, Elem l ~ Word8) => l -> l
dropR Empty              = empty
dropR (x   :< Empty)     = cons x empty
dropR ('a' :• 'b' :• xs) = dropR xs
dropR (x   :< y   :< xs) = cons x (dropR (cons y xs))

You can see more on my post on pattern synonyms.

冬天的雪花 2024-10-07 01:13:11

模式使用数据构造函数。 http://book.realworldhaskell.org/read/defining-types-streamlined -functions.html

你的empty只是第一个参数的绑定,它可能是x并且它不会改变任何东西。

您无法在模式中引用普通函数,因此 (x cons empty) 不合法。注意:我猜 (cons xempty) 确实是您的意思,但这也是非法的。

ByteStringString 有很大不同。 String[Char] 的别名,因此它是一个真实的列表,并且 : 运算符可以在模式中使用。

ByteString 是 Data.ByteString.Internal.PS !(GHC.ForeignPtr.ForeignPtr GHC.Word.Word8) !Int !Int (即指向本机 char* + 偏移量 + 长度的指针)。由于 ByteString 的数据构造函数是隐藏的,因此您必须使用函数而不是模式来访问数据。


这是使用 text 包解决 UTF-16 过滤器问题的解决方案(当然不是最好的解决方案):

module Test where

import Data.ByteString as BS
import Data.Text as T
import Data.Text.IO as TIO
import Data.Text.Encoding

removeAll :: Char -> Text -> Text
removeAll c t =  T.filter (/= c) t

main = do
  bytes <- BS.readFile "test.txt"
  TIO.putStr $ removeAll 'c' (decodeUtf16LE bytes)

Patterns use data constructors. http://book.realworldhaskell.org/read/defining-types-streamlining-functions.html

Your empty is just a binding for the first parameter, it could have been x and it would not change anything.

You can't reference a normal function in your pattern so (x cons empty) is not legal. Note: I guess (cons x empty) is really what you meant but this is also illegal.

ByteString is quite different from String. String is an alias of [Char], so it's a real list and the : operator can be used in patterns.

ByteString is Data.ByteString.Internal.PS !(GHC.ForeignPtr.ForeignPtr GHC.Word.Word8) !Int !Int (i.e. a pointer to a native char* + offset + length). Since the data constructor of ByteString is hidden, you must use functions to access the data, not patterns.


Here a solution (surely not the best one) to your UTF-16 filter problem using the text package:

module Test where

import Data.ByteString as BS
import Data.Text as T
import Data.Text.IO as TIO
import Data.Text.Encoding

removeAll :: Char -> Text -> Text
removeAll c t =  T.filter (/= c) t

main = do
  bytes <- BS.readFile "test.txt"
  TIO.putStr $ removeAll 'c' (decodeUtf16LE bytes)
2024-10-07 01:13:11

为此,我将对 uncons :: ByteString -> 的结果进行模式匹配。也许(Word8,ByteString)。

Haskell 中的模式匹配仅适用于使用“data”或“newtype”声明的构造函数。 ByteString 类型不会导出其无法进行模式匹配的构造函数。

For this, I would pattern match on the result of uncons :: ByteString -> Maybe (Word8, ByteString).

Pattern matching in Haskell only works on constructors declared with 'data' or 'newtype.' The ByteString type doesn't export its constructors you cannot pattern match.

优雅的叶子 2024-10-07 01:13:11

只是为了解决您收到的错误消息及其含义:

Couldn't match expected type `BS.ByteString'
       against inferred type `[a]'
In the pattern: []
In the definition of `dropR': dropR [] = []

因此编译器期望您的函数类型为:BS.ByteString -> BS.ByteString 因为您在签名中指定了该类型。然而它推断(通过查看函数的主体)该函数实际上是 [a] -> 类型。 [a]。那里存在不匹配,因此编译器会抱怨。

问题是您将 (:) 和 [] 视为语法糖,而实际上它们只是列表类型的构造函数(与 ByteString 非常不同)。

Just to address the error message you received and what it means:

Couldn't match expected type `BS.ByteString'
       against inferred type `[a]'
In the pattern: []
In the definition of `dropR': dropR [] = []

So the compiler expected your function to be of type: BS.ByteString -> BS.ByteString because you gave it that type in your signature. Yet it inferred (by looking at the body of your function) that the function is actually of type [a] -> [a]. There is a mismatch there so the compiler complains.

The trouble is you are thinking of (:) and [] as syntactic sugar, when they are actually just the constructors for the list type (which is VERY different from ByteString).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文