Splitting a ByteString on a ByteString (instead of a Word8 or Char)

Posted on 2024-08-04 10:08:11

I know I already have the Haskell Data.ByteString.Lazy function to split a CSV on a single character, such as:

split :: Word8 -> ByteString -> [ByteString]

But I want to split on a multi-character ByteString (like splitting on a String instead of a Char):

split :: ByteString -> ByteString -> [ByteString]

I have multi-character separators in a csv-like text file that I need to parse, and the individual characters themselves appear in some of the fields, so choosing just one separator character and discarding the others would contaminate the data import.

I've had some ideas on how to do this, but they seem kind of hacky (e.g. take three Word8s, test if they're the separator combination, start a new field if they are, recurse further), and I imagine I would be reinventing a wheel anyway. Is there a way to do this without rebuilding the function from scratch?

Comments (2)

七色彩虹 2024-08-11 10:08:11

The documentation for breakSubstring in the bytestring package includes a function that does what you are asking for:

-- null, drop, length and breakSubstring are the Data.ByteString versions, not the Prelude ones
tokenise x y = h : if null t then [] else tokenise x (drop (length x) t)
    where (h,t) = breakSubstring x y
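
For anyone who wants to paste this into a file, here is a self-contained sketch of the same idea with the imports spelled out. The name splitOn and the guard for an empty separator are my additions for illustration, not part of the bytestring API. breakSubstring is exported by the strict Data.ByteString module, so if the input is a lazy ByteString it would presumably need converting first (for example with toStrict from Data.ByteString.Lazy).

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Data.ByteString (ByteString)
import qualified Data.ByteString as B

-- | Split a strict ByteString on every occurrence of a multi-byte separator.
-- 'splitOn' is a hypothetical name, not a function from bytestring.
splitOn :: ByteString -> ByteString -> [ByteString]
splitOn sep input
  | B.null sep = [input]   -- assumption: an empty separator splits nothing
  | otherwise  = go input
  where
    go s =
      let (h, t) = B.breakSubstring sep s
      in if B.null t
           then [h]                               -- no separator left
           else h : go (B.drop (B.length sep) t)  -- skip the separator, continue
```

With this loaded in GHCi (and OverloadedStrings enabled), splitOn "|~|" "a|~|b|c|~|d" evaluates to ["a","b|c","d"]: the lone "|" inside the second field is left alone because only the full three-byte separator matches. A trailing separator yields a final empty field, the same as the tokenise version above.
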
仲春光 2024-08-11 10:08:11

There are a few functions in bytestring for splitting on subsequences:

breakSubstring :: ByteString -> ByteString -> (ByteString,ByteString)
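
To make the return value concrete, here is a small sketch. The first component is everything before the first match; the second component starts at the match itself, or is empty when the pattern does not occur. The names found, missing and splitFirst are hypothetical, chosen only for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Data.ByteString (ByteString)
import qualified Data.ByteString as B

-- Separator present: the second component still begins with the separator.
found :: (ByteString, ByteString)
found = B.breakSubstring "::" "key::value::rest"

-- Separator absent: the whole input comes back first, the remainder is empty.
missing :: (ByteString, ByteString)
missing = B.breakSubstring "::" "no separator here"

-- | The field before the first separator and the rest with the separator
-- removed, or Nothing if the separator does not occur.
-- 'splitFirst' is a hypothetical helper, not part of bytestring.
splitFirst :: ByteString -> ByteString -> Maybe (ByteString, ByteString)
splitFirst sep s
  | B.null sep = Nothing   -- assumption: treat an empty separator as "no match"
  | otherwise  =
      case B.breakSubstring sep s of
        (h, t)
          | B.null t  -> Nothing
          | otherwise -> Just (h, B.drop (B.length sep) t)
```

Here found is ("key", "::value::rest") and missing is ("no separator here", ""), so checking whether the second component is empty is how you tell a match from a miss.
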

There's also a
