F#: How to tokenize user input, separating numbers, units, and words?
I am fairly new to F#, but have spent the last few weeks reading reference materials. I wish to process a user-supplied input string, identifying and separating the constituent elements. For example, for this input:
XYZ Hotel: 6 nights at 220EUR / night plus 17.5% tax
the output should resemble something like a list of tuples:
[ ("XYZ", Word); ("Hotel:", Word);
  ("6", Number); ("nights", Word);
  ("at", Operator); ("220", Number);
  ("EUR", CurrencyCode); ("/", Operator); ("night", Word);
  ("plus", Operator); ("17.5", Number); ("%", PerCent); ("tax", Word) ]
Since I'm dealing with user input, it could be anything, so expecting users to comply with a grammar is out of the question. I want to identify the numbers (integers, floats, negatives, ...), the units of measure (optional, but could include SI or Imperial physical units, currency codes, and counts such as "night/s" in my example), mathematical operators (as math symbols or as words, including "at", "per", "of", "discount", etc.), and all other words.
I have the impression that I should use active pattern matching -- is that correct? -- but I'm not exactly sure how to start. Any pointers to appropriate reference material or similar examples would be great.
Comments (2)
I put together an example using the FParsec library. The example is not robust at all, but it gives a pretty good picture of how to use FParsec.
It sounds like what you really want is just a lexer. A good alternative to FParsec would be FsLex. (A good intro tutorial, albeit somewhat dated, can be found on my old blog.) Using FsLex you can take your input text and get it properly tokenized into a stream of tagged tokens.
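As a rough illustration of that kind of tokenization, here is a minimal sketch that uses plain .NET regular expressions and partial active patterns instead of FsLex (FsLex needs a separate lexer definition file and a build step, so this is a stand-in, not FsLex output). The token kinds come from the question; the lookup tables `currencyCodes`, `operatorWords`, and `operatorSymbols`, the `Kind` type, and the helper names are assumptions made up for the example.

```fsharp
open System.Text.RegularExpressions

// Token kinds taken from the desired output in the question.
type Kind = Word | Number | Operator | CurrencyCode | PerCent

// Hypothetical lookup tables; extend them for real input.
let currencyCodes = set [ "EUR"; "USD"; "GBP" ]
let operatorWords = set [ "at"; "per"; "of"; "plus"; "minus"; "discount" ]
let operatorSymbols = set [ "+"; "-"; "*"; "/" ]

// Partial active patterns: each one recognises a single kind of lexeme.
let (|Num|_|) (s: string) =
    if Regex.IsMatch(s, @"^-?\d+(\.\d+)?$") then Some s else None

let (|Currency|_|) (s: string) =
    if currencyCodes.Contains s then Some s else None

let (|Op|_|) (s: string) =
    if operatorWords.Contains(s.ToLowerInvariant()) || operatorSymbols.Contains s
    then Some s
    else None

// Order matters: the first matching pattern wins, anything else is a Word.
let classify (lexeme: string) =
    match lexeme with
    | Num _ -> Number
    | "%" -> PerCent
    | Currency _ -> CurrencyCode
    | Op _ -> Operator
    | _ -> Word

// Split into lexemes: a number, a run of letters (optionally followed by ':'),
// or any other single non-space character. A '-' directly before digits is
// treated as a sign, so "5-3" lexes as "5" and "-3".
let tokenize (input: string) =
    Regex.Matches(input, @"-?\d+(?:\.\d+)?|[A-Za-z]+:?|\S")
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Value, classify m.Value)
    |> List.ofSeq
```

Calling `tokenize "XYZ Hotel: 6 nights at 220EUR / night plus 17.5% tax"` yields a list of `(string * Kind)` pairs in the shape the question asks for. Because the regular expression splits digits from letters, "220EUR" comes out as two tokens without the user having to type a space.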
The next step, once you have a list of tokens, is to do some form of pattern matching / analysis to extract semantic information (which I assume is what you are really after). With a normalized token stream, that should be fairly simple.
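A minimal sketch of such a matching pass is below. It assumes a normalized token type in which numbers are already parsed and words are lowercased; the `Token` and `Fact` types and the `extract` function are hypothetical names invented for this example, not part of FsLex or FParsec.

```fsharp
// Hypothetical normalized tokens: numeric text already converted to float.
type Token =
    | Num of float
    | Currency of string
    | Percent
    | Op of string
    | Word of string

// Hypothetical semantic facts we want to pull out of the stream.
type Fact =
    | Price of float * string      // amount, currency code
    | Rate of float                // a percentage such as a tax rate
    | Quantity of float * string   // count, unit word

// Walk the list and recognise short token sequences with list patterns.
// Tokens that do not start a known sequence are skipped.
let rec extract (tokens: Token list) : Fact list =
    match tokens with
    | Num amount :: Currency code :: rest -> Price(amount, code) :: extract rest
    | Num pct :: Percent :: rest -> Rate pct :: extract rest
    | Num n :: Word unitName :: rest -> Quantity(n, unitName) :: extract rest
    | _ :: rest -> extract rest
    | [] -> []
```

For the hotel example, a stream such as `[Num 6.0; Word "nights"; Op "at"; Num 220.0; Currency "EUR"; ...]` produces a `Quantity`, a `Price`, and (for `17.5 %`) a `Rate`. The more specific patterns are listed first so that a number followed by a currency code is never mistaken for a plain quantity.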
That should at least get you started. But note that the more 'formal' your input gets, the more useful lexing becomes. (And if you only accept a very specific input, then you can use a proper parser and be done with it!)