Haskell: counting the list elements that satisfy a query

Posted on 2024-12-20 14:52:03

I'm currently working with CSV files, which I parse into a [[String]].
The first [String] in that list is the header row, e.g.:

["Code","Address","Town"]

and the rest are rows of data:

["ABA","12,east road", "London"]

I would like to create a query system where the input and the result look something like this:

>count "Town"="*London*" @1="A*"
14 rows

The column can be given either as a quoted name or as an @ followed by the column's index.
I have a case expression that recognises the first word of the input, since I'm going to extend my CSV reader with more commands.
When it sees the word count, it calls a function that returns the number of matching rows. I'm not sure how to start parsing the query.
My first idea was to split the string that follows the word count into a list of strings, one per condition. I would apply the first condition, then check the rows that satisfied it against the next condition, and so on, ending up with the rows that satisfy every condition. Then I would count those rows and return the number. Another case expression would recognise whether a condition starts with a string or an @ symbol.
A * stands for zero or more arbitrary characters.
I'm not sure how to start implementing this, or whether I'm missing a problem I might run into with this approach. I'd be grateful for any help getting started. I'm not very advanced with Haskell (I'm just starting), so I'd also appreciate keeping it simple. Thank you.

離殇 2024-12-27 14:52:03

Here's one possible approach.

First, let us move away from your list-of-lists-of-strings representation a bit and represent records as lists of key/value pairs, so that a database is just a list of records:

type Field  = (String, String) -- key, value                                    
type Record = [Field]
type Db     = [Record]

Reading in CSV data in your representation then becomes:

type Csv = [[String]]

fromCsv :: Csv -> Db
fromCsv []         = []
fromCsv (ks : vss) = map (zip ks) vss
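
As a small illustration of what fromCsv produces, here is a sketch using the header from the question. The sample rows and the names sample and sampleDb are made up for this example. It also shows that zip stops at the shorter list, so a row with fewer cells than the header simply ends up with fewer fields.

```haskell
type Field  = (String, String) -- key, value
type Record = [Field]
type Db     = [Record]
type Csv    = [[String]]

fromCsv :: Csv -> Db
fromCsv []         = []
fromCsv (ks : vss) = map (zip ks) vss

-- Hypothetical sample data (not from the original post).
sample :: Csv
sample =
  [ ["Code", "Address", "Town"]
  , ["ABA", "12,east road", "London"]
  , ["XYZ", "1 High Street"]           -- ragged row: no "Town" cell
  ]

sampleDb :: Db
sampleDb = fromCsv sample
-- [ [("Code","ABA"),("Address","12,east road"),("Town","London")]
-- , [("Code","XYZ"),("Address","1 High Street")] ]
```

Looking up "Town" in the second record gives Nothing, which the selection function later in this answer treats as "the filter does not match".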

Now, let us talk about queries. In your setting, a query is essentially a list of filters, where each filter identifies a field and matches a set of values:

type Query  = [Filter]
type Filter = (Selector, ValueFilter)

Fields are selected either by name or by a one-based (!) index:

data Selector = FieldName String | FieldIndex Int

Values are matched by applying a sequence of simple parsers, where a parser either recognises a single character or otherwise a sequence of zero or more arbitrary characters:

type ValueFilter = [Parser]
data Parser      = Char Char | Wildcard

Parsing can be implemented using the list-of-successes method, where each success denotes the remaining input, i.e., the part of the input that was not consumed by the parser. An empty list of remaining inputs denotes failure. (So, note the difference between [] and [[]] in the produced results in the cases below.)

parse :: Parser -> String -> [String]
parse (Char c) (c' : cs') | c == c' = [cs']
parse Wildcard []                   = [[]]
parse Wildcard cs@(_ : cs')         = cs : parse Wildcard cs'
parse _ _                           = []
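
To make the difference between [] and [[]] concrete, here are a few sample calls with their results. The list name examples is only for illustration.

```haskell
data Parser = Char Char | Wildcard

parse :: Parser -> String -> [String]
parse (Char c) (c' : cs') | c == c' = [cs']
parse Wildcard []                   = [[]]
parse Wildcard cs@(_ : cs')         = cs : parse Wildcard cs'
parse _ _                           = []

-- Each entry is the list of possible remaining inputs.
examples :: [[String]]
examples =
  [ parse (Char 'a') "abc"  -- ["bc"]          one success, "bc" is left over
  , parse (Char 'a') "xbc"  -- []              failure: wrong character
  , parse (Char 'a') ""     -- []              failure: nothing to consume
  , parse Wildcard "ab"     -- ["ab","b",""]   consume 0, 1 or 2 characters
  , parse Wildcard ""       -- [""]            i.e. [[]]: success, nothing left
  ]
```

A wildcard never fails. It returns one alternative for every possible number of characters it could swallow, and that is what makes backtracking possible in the next step.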

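Filtering a value then amounts to backtracking: we run the parsers in sequence over every alternative and accept the value if at least one alternative consumes the whole input: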

filterValue :: ValueFilter -> String -> Bool
filterValue ps cs = any null (go ps cs)
  where
    go [] cs       = [cs]
    go (p : ps) cs = concatMap (go ps) (parse p cs)
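
To connect this to the pattern syntax in the question, here is a minimal sketch of a helper that turns a pattern such as "*London*" into a ValueFilter. The names compilePattern and matches are hypothetical and not part of the answer. The sketch assumes that every * is a wildcard and that there is no way to escape a literal *.

```haskell
data Parser      = Char Char | Wildcard
type ValueFilter = [Parser]

parse :: Parser -> String -> [String]
parse (Char c) (c' : cs') | c == c' = [cs']
parse Wildcard []                   = [[]]
parse Wildcard cs@(_ : cs')         = cs : parse Wildcard cs'
parse _ _                           = []

filterValue :: ValueFilter -> String -> Bool
filterValue ps cs = any null (go ps cs)
  where
    go [] rest       = [rest]
    go (q : qs) rest = concatMap (go qs) (parse q rest)

-- Hypothetical helper: '*' becomes a wildcard, anything else a literal.
compilePattern :: String -> ValueFilter
compilePattern = map toParser
  where
    toParser '*' = Wildcard
    toParser c   = Char c

-- Hypothetical convenience wrapper: does the value match the pattern?
matches :: String -> String -> Bool
matches pat = filterValue (compilePattern pat)

-- matches "*London*" "East London"  is True
-- matches "A*"       "ABA"          is True
-- matches "A*"       "BAA"          is False
-- matches "London"   "Londonderry"  is False (the whole value must be consumed)
```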

Value selection is straightforward:

select :: Selector -> Record -> Maybe String
select (FieldName s) r                           = lookup s r
select (FieldIndex n) r | n > 0 && n <= length r = Just (snd (r !! (n - 1)))
                        | otherwise              = Nothing

Applying a record filter now amounts to constructing a predicate over records:

apply :: Filter -> Record -> Bool
apply (s, vf) r = case select s r of
  Nothing -> False
  Just v  -> filterValue vf v

Finally, for executing a complete query, we have

exec :: Query -> Db -> [Record]
exec = (flip . foldl . flip) (filter . apply)
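
If the point-free definition of exec is hard to read, the following sketch spells out the same idea in two more explicit ways. The names execFold and execAll are made up for this example, and the supporting definitions are repeated so that the block stands on its own.

```haskell
type Field  = (String, String)
type Record = [Field]
type Db     = [Record]

data Selector    = FieldName String | FieldIndex Int
data Parser      = Char Char | Wildcard
type ValueFilter = [Parser]
type Filter      = (Selector, ValueFilter)
type Query       = [Filter]

parse :: Parser -> String -> [String]
parse (Char c) (c' : cs') | c == c' = [cs']
parse Wildcard []                   = [[]]
parse Wildcard cs@(_ : cs')         = cs : parse Wildcard cs'
parse _ _                           = []

filterValue :: ValueFilter -> String -> Bool
filterValue ps cs = any null (go ps cs)
  where
    go [] rest       = [rest]
    go (q : qs) rest = concatMap (go qs) (parse q rest)

select :: Selector -> Record -> Maybe String
select (FieldName s) r = lookup s r
select (FieldIndex n) r
  | n > 0 && n <= length r = Just (snd (r !! (n - 1)))
  | otherwise              = Nothing

apply :: Filter -> Record -> Bool
apply (s, vf) r = maybe False (filterValue vf) (select s r)

-- The answer's definition, unchanged.
exec :: Query -> Db -> [Record]
exec = (flip . foldl . flip) (filter . apply)

-- The same fold written out: start with all records and narrow them
-- down with one filter at a time.
execFold :: Query -> Db -> [Record]
execFold q db = foldl (\rows f -> filter (apply f) rows) db q

-- A single pass: keep the records that satisfy every filter.
execAll :: Query -> Db -> [Record]
execAll q = filter (\r -> all (`apply` r) q)
```

All three versions return the same records in the same order. execFold is close to the plan described in the question: apply one condition, then check the surviving rows against the next.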

(I leave the parsing of queries themselves as an exercise:

readQuery :: String -> Maybe Query
readQuery = ...

but I recommend using a parser-combinator library such as parsec or uulib.)
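
For readers who want a starting point for that exercise, here is one possible sketch. It uses Text.ParserCombinators.ReadP, a small parser-combinator module that ships with base, rather than parsec or uulib. It rests on several assumptions that the original post does not state: names and patterns are double-quoted with no escape sequences, filters are separated by whitespace, there are no spaces around the = sign, and the leading command word (count) has already been stripped. The helper names quoted, selector, valueFilter, filterP and queryP are invented for this example.

```haskell
import Data.Char (isDigit, isSpace)
import Text.ParserCombinators.ReadP

data Selector    = FieldName String | FieldIndex Int deriving (Eq, Show)
data Parser      = Char Char | Wildcard              deriving (Eq, Show)
type ValueFilter = [Parser]
type Filter      = (Selector, ValueFilter)
type Query       = [Filter]

-- A double-quoted string without escape sequences (an assumption).
quoted :: ReadP String
quoted = between (char '"') (char '"') (munch (/= '"'))

-- Either "Name" or @N.
selector :: ReadP Selector
selector = (FieldName <$> quoted)
       +++ (FieldIndex . read <$> (char '@' *> munch1 isDigit))

-- A quoted pattern in which '*' is the wildcard.
valueFilter :: ReadP ValueFilter
valueFilter = map toParser <$> quoted
  where
    toParser '*' = Wildcard
    toParser c   = Char c

-- selector=pattern
filterP :: ReadP Filter
filterP = (,) <$> selector <* char '=' <*> valueFilter

-- Whitespace-separated filters that must cover the whole input.
queryP :: ReadP Query
queryP = skipSpaces *> sepBy filterP (munch1 isSpace) <* skipSpaces <* eof

readQuery :: String -> Maybe Query
readQuery s = case [q | (q, "") <- readP_to_S queryP s] of
  (q : _) -> Just q
  []      -> Nothing
```

With these definitions, readQuery "\"Town\"=\"*London*\" @1=\"A*\"" yields a two-filter query, while malformed input such as "@x=\"a\"" yields Nothing. The same structure carries over to parsec almost combinator for combinator.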

Now, let's test. First, we introduce a small database in CSV format:

csv :: Csv
csv =
  [ ["Name" , "City"      ]
     -------  ------------                                                      
  , ["Will" , "London"    ]
  , ["John" , "London"    ]
  , ["Chris", "Manchester"]
  , ["Colin", "Liverpool" ]
  , ["Nick" , "London"    ]
  ]

Then, we construct a simple query:

-- "Name"="*i*" @2="London"                                                     
query :: Query
query =
  [ (FieldName "Name", [Wildcard, Char 'i', Wildcard])
  , (FieldIndex 2,
      [Char 'L', Char 'o', Char 'n', Char 'd', Char 'o', Char 'n'])
  ]

And, indeed, running our query against the database yields:

> exec query (fromCsv csv)
[[("Name","Will"),("City","London")],[("Name","Nick"),("City","London")]]

Or, if you are only after counting the results of your query:

> length $ exec query (fromCsv csv)
2

Of course, this is just one approach, and one can surely think of many alternatives. A nice aspect of breaking the problem down into small functions, as we have done above, is that you can easily test and experiment with small pieces of the solution in isolation.

凯凯我们等你回来 2024-12-27 14:52:03

I am not that proficient in Haskell either, but I would approach it this way. What you want is essentially:

f $ filter g list

Here 'f' can be something like 'count' (which is really just length), and 'g' is the filtering function corresponding to your query. First, you would split the input into 'head' and 'tail' (the tail being the list); then you could use Parsec to parse the query. Your Parsec parser would simply return a tuple: the first component would be the function 'f' (which could be 'length' if the parser encounters 'count'), and the second would be the predicate 'g', which simply returns True or False. You would have these types:

f :: [[String]] -> Int   -- applied to the filtered list of rows
g :: [String] -> Bool

Building 'f' and 'g' is quite easy with Parsec. I think if you play a little with the examples on the linked page, you'll figure it out yourself.
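
A minimal sketch of the f $ filter g list shape, with hand-written stand-ins instead of functions produced by a parser. The predicate here is hard-coded as a crude version of "Town"="*London*", and the name countRows is made up for this example. Note that for f $ filter g list to typecheck over a list of rows, f has to accept the filtered list of rows, so it has type [[String]] -> Int.

```haskell
import Data.List (isInfixOf)

-- Stand-in for the function a parser would return on seeing "count".
f :: [[String]] -> Int
f = length

-- Stand-in predicate: does the third column contain "London"?
-- Rows with fewer than three cells do not match.
g :: [String] -> Bool
g row = case drop 2 row of
  (town : _) -> "London" `isInfixOf` town
  []         -> False

-- Split off the header row, then count the matching data rows.
countRows :: [[String]] -> Int
countRows []         = 0
countRows (_ : rows) = f $ filter g rows
```

In a full solution, 'g' would be built from the parsed query rather than written by hand.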
