Haskell: counting the list elements that satisfy a query
I'm currently working with CSV files, which I parse into a [[String]].
The first [String] in that list is the header row, e.g.:
["Code","Address","Town"]
and the rest are rows of data:
["ABA","12,east road", "London"]
I would like to create a query system where the input and the result look something like this:
>count "Town"="*London*" @1="A*"
14 rows
The column can be given either by name, as a string, or as an @ followed by the index of the column.
I have a case switch to recognise the first word of the input, since I'm going to extend my CSV reader with other functions.
When it sees the word count, it will go to a function that returns a count of rows. I'm not sure how to start parsing the query.
At first I thought I might split the string that follows the word count into a list of strings, one per condition, apply the first condition, and then check the rows that satisfied it against the next condition. That would leave a list of rows for which all conditions are satisfied; I would then count the entries and return that number. There would also be a case switch to recognise whether a condition starts with a string or an @ symbol.
A * represents zero or more arbitrary characters at that position in the value.
I am not sure how to start implementing this, or whether I'm missing a problem I might run into with my solution. I would be grateful for any help getting started. I'm not very advanced with Haskell (I'm just starting), so I would also appreciate keeping it simple. Thank you.
Comments (2)
Here's one possible approach.
First, let us move away from your list-of-list-of-string representation a bit and represent records as key/value pairs, such that a database is just a list of records:
Reading in CSV data in your representation then becomes:
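The code for these two steps did not survive on this page, so here is a minimal sketch of what they could look like. The names Field, Value, Record, Db and toDb are assumptions chosen for illustration, not necessarily the ones the answer used.

```haskell
type Field  = String
type Value  = String
type Record = [(Field, Value)]  -- one row: each column name paired with its cell
type Db     = [Record]

-- Treat the first row as the header and pair it with every later row.
-- Note that zip silently truncates a row that is longer or shorter than the header.
toDb :: [[String]] -> Db
toDb []              = []
toDb (header : rows) = map (zip header) rows
```

With the header and row from the question, toDb yields a single record in which "Town" is paired with "London".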
Now, let us talk about queries. In your setting, a query is essentially a list of filters, where each filter identifies a field and matches a set of values:
Fields are selected either by name or by a one-based (!) index:
Values are matched by applying a sequence of simple parsers, where a parser either recognises a single character or otherwise a sequence of zero or more arbitrary characters:
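As a rough sketch of the three definitions just described (all type and constructor names here are assumptions, and toValueFilter is a hypothetical helper added for convenience), the query from the question could be represented like this:

```haskell
type Field = String

-- A query is a list of filters; a row has to pass all of them.
type Query  = [Filter]
type Filter = (Selector, ValueFilter)

-- How a filter picks its column.
data Selector
  = FieldName Field   -- e.g. "Town"
  | FieldIndex Int    -- e.g. @1, counted from one
  deriving (Eq, Show)

-- A value pattern is a sequence of primitive parsers.
type ValueFilter = [Parser]

data Parser
  = Literal Char  -- recognises exactly this character
  | Wildcard      -- recognises zero or more arbitrary characters ('*')
  deriving (Eq, Show)

-- Hypothetical helper: turn a pattern string such as "A*" into a ValueFilter.
toValueFilter :: String -> ValueFilter
toValueFilter = map step
  where
    step '*' = Wildcard
    step c   = Literal c

-- The query from the question: "Town"="*London*" @1="A*"
exampleQuery :: Query
exampleQuery =
  [ (FieldName "Town", toValueFilter "*London*")
  , (FieldIndex 1,     toValueFilter "A*")
  ]
```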
Parsing can be implemented using the list-of-successes method, where each success denotes the remaining input, i.e., the part of the input that was not consumed by the parser. An empty list of remaining inputs denotes failure. (So, note the difference between [] and [[]] in the produced results in the cases below.)
Filtering values then develops into backtracking:
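Here is one way the list-of-successes parser and the backtracking value filter might be written. This is a sketch under assumed names (Parser, Literal, Wildcard, parse, filterValue); using tails for the wildcard is a choice made here, not something the answer states.

```haskell
import Data.List (tails)

data Parser = Literal Char | Wildcard

-- Each element of the result is one possible remaining input.
-- []   means the parser failed.
-- [[]] means it succeeded once and consumed everything.
parse :: Parser -> String -> [String]
parse (Literal c) (c' : rest) | c == c' = [rest]
parse (Literal _) _                     = []
-- A wildcard may consume 0, 1, 2, ... characters, so every suffix is a success.
parse Wildcard    input                 = tails input

-- Run the parsers in sequence over all remaining inputs found so far.
-- The value matches if at least one path ends with nothing left over.
filterValue :: [Parser] -> String -> Bool
filterValue ps value = any null (foldl step [value] ps)
  where
    step remaining p = concatMap (parse p) remaining
```

Keeping every alternative in the list is what provides the backtracking: if one way of splitting the input around a wildcard fails later on, the other candidates are still in the list.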
Value selection is straightforward:
Applying a record filter now amounts to constructing a predicate over records:
Finally, for executing a complete query, we have
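The three steps above (value selection, turning a filter into a predicate, and executing a whole query) could be sketched as follows. The names select, apply and exec are assumptions, and returning Maybe from select so that a missing column counts as "no match" is a choice of this sketch rather than something the answer specifies.

```haskell
import Data.List (tails)

type Field  = String
type Value  = String
type Record = [(Field, Value)]
type Db     = [Record]

data Selector = FieldName Field | FieldIndex Int
data Parser   = Literal Char | Wildcard
type Filter   = (Selector, [Parser])
type Query    = [Filter]

-- Same list-of-successes matcher as described earlier in the answer.
parse :: Parser -> String -> [String]
parse (Literal c) (c' : rest) | c == c' = [rest]
parse (Literal _) _                     = []
parse Wildcard    input                 = tails input

filterValue :: [Parser] -> Value -> Bool
filterValue ps v = any null (foldl (\rests p -> concatMap (parse p) rests) [v] ps)

-- Look a value up by column name or by one-based position.
select :: Selector -> Record -> Maybe Value
select (FieldName name) record = lookup name record
select (FieldIndex n)   record = lookup n (zip [1 ..] (map snd record))

-- A filter becomes a predicate over records; a missing field never matches.
apply :: Filter -> Record -> Bool
apply (selector, vf) record = maybe False (filterValue vf) (select selector record)

-- Keep the records that satisfy every filter of the query.
exec :: Query -> Db -> [Record]
exec query = filter (\record -> all (`apply` record) query)
```

Checking all filters per record gives the same result as the approach in the question (filter by the first condition, then filter the survivors by the next), just expressed in one pass.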
(I leave the parsing of queries themselves as an exercise, but I recommend using a parser-combinator library such as parsec or uulib.)
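For readers who want a starting point for that exercise, here is a small sketch that uses Text.ParserCombinators.ReadP, the parser-combinator module that ships with base, instead of parsec or uulib. It assumes the query text is what follows the count keyword, that strings are double-quoted with no escape sequences, and that filters are separated by spaces. All names (quoted, selector, filterP, queryP, parseQuery) are made up for this example.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

data Selector = FieldName String | FieldIndex Int deriving (Eq, Show)
data Parser   = Literal Char | Wildcard          deriving (Eq, Show)
type Filter   = (Selector, [Parser])
type Query    = [Filter]

-- A double-quoted string; escape sequences are not supported in this sketch.
quoted :: ReadP String
quoted = between (char '"') (char '"') (munch (/= '"'))

-- Either "Name" or @N with a decimal index.
selector :: ReadP Selector
selector = (FieldName <$> quoted)
       +++ (FieldIndex . read <$> (char '@' *> munch1 isDigit))

-- selector="pattern", where '*' in the pattern becomes a wildcard.
filterP :: ReadP Filter
filterP = do
  s <- selector
  _ <- char '='
  p <- quoted
  pure (s, map toParser p)
  where
    toParser '*' = Wildcard
    toParser c   = Literal c

-- Zero or more filters separated by spaces, covering the whole input.
queryP :: ReadP Query
queryP = skipSpaces *> sepBy filterP (munch1 (== ' ')) <* skipSpaces <* eof

parseQuery :: String -> Maybe Query
parseQuery input = case readP_to_S queryP input of
  [(q, "")] -> Just q
  _         -> Nothing
```

Input that does not fit the grammar, such as an unquoted Town=London, makes parseQuery return Nothing.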
Now, let's test. First, we introduce a small database in CSV format:
Then, we construct a simple query:
And, indeed, running our query against the database yields:
Or, if you are only after counting the results of your query:
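The test walked through in the last few steps (a small database, a query, running it, counting the results) could look like the following end-to-end sketch. The sample rows are made up for illustration, apart from the first data row, which comes from the question. All function names are assumptions, and pat is a hypothetical helper.

```haskell
import Data.List (tails)

type Record   = [(String, String)]
data Selector = FieldName String | FieldIndex Int
data Parser   = Literal Char | Wildcard
type Filter   = (Selector, [Parser])

toDb :: [[String]] -> [Record]
toDb []              = []
toDb (header : rows) = map (zip header) rows

parse :: Parser -> String -> [String]
parse (Literal c) (c' : rest) | c == c' = [rest]
parse (Literal _) _                     = []
parse Wildcard    input                 = tails input

filterValue :: [Parser] -> String -> Bool
filterValue ps v = any null (foldl (\rests p -> concatMap (parse p) rests) [v] ps)

select :: Selector -> Record -> Maybe String
select (FieldName name) r = lookup name r
select (FieldIndex n)   r = lookup n (zip [1 ..] (map snd r))

exec :: [Filter] -> [Record] -> [Record]
exec q = filter (\r -> all (\(s, vf) -> maybe False (filterValue vf) (select s r)) q)

-- Hypothetical helper: '*' becomes a wildcard, anything else a literal.
pat :: String -> [Parser]
pat = map (\c -> if c == '*' then Wildcard else Literal c)

-- Made-up sample data in the [[String]] form produced by the CSV reader.
csv :: [[String]]
csv =
  [ ["Code", "Address",      "Town"]
  , ["ABA",  "12,east road", "London"]
  , ["FFR",  "33,west road", "Paris"]
  , ["ACL",  "4,north road", "East London"]
  , ["XYZ",  "9,south road", "London"]
  ]

-- count "Town"="*London*" @1="A*"
query :: [Filter]
query = [(FieldName "Town", pat "*London*"), (FieldIndex 1, pat "A*")]

results :: [Record]
results = exec query (toDb csv)

-- The answer in the format shown in the question.
report :: String
report = show (length results) ++ " rows"
```

On this sample data, results holds the ABA and ACL records, so putStrLn report in GHCi prints 2 rows.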
Of course, this is just one approach, and one can certainly think of many alternatives. A nice aspect of breaking the problem down into small functions, as we have done above, is that you can easily test and experiment with small chunks of the solution in isolation.
I am not that proficient in Haskell either, but I would approach it this way. What you want is essentially:
Here 'f' can be something like 'count' (which would actually be length), and 'g' is the filtering function corresponding to your query. First, you would split the input into 'head' (the header row) and 'tail' (the list of data rows); then you could use Parsec to parse the query. Your Parsec parser would simply return a tuple: the first component would be the function 'f' (which could be 'length' if the parser encounters 'count'); the second would be 'g', which simply returns true/false for a row. You would have these types:
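The snippets for this answer are missing from this page, so what follows is only a guess at the shape being described: apply 'f' to the rows that pass 'g'. The names Row, runQuery, countRows and townHasLondon are assumptions, and townHasLondon is a hand-written stand-in for the predicate that a query parser would build.

```haskell
import Data.List (isInfixOf)

type Row = [String]

-- 'f' summarises the rows that survive; 'g' decides whether a row survives.
-- The first row of the table is the header, so it is skipped before filtering.
runQuery :: ([Row] -> a) -> (Row -> Bool) -> [Row] -> a
runQuery f g table = f (filter g (drop 1 table))

-- For the "count" command, 'f' is simply length.
countRows :: [Row] -> Int
countRows = length

-- Stand-in for a parsed query: "the third column contains London".
townHasLondon :: Row -> Bool
townHasLondon (_ : _ : town : _) = "London" `isInfixOf` town
townHasLondon _                  = False
```

A call such as runQuery countRows townHasLondon table then returns the number of matching data rows.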
Building the 'f' and 'g' is quite easy with parsec. I think if you play a little with the examples on the linked page, you'll figure it out yourself.