Best way to parse a list of numbers
I need to process a list of numbers that appears inside an English sentence. It can take any of the following forms:
items 1, 2 and 3
items 2 through 5
items 1 to 20
items 4 or 8
My initial instinct is to write a simple state machine to parse it, but I was wondering whether there is a better (simpler) way, such as a regular expression. Any advice?
Comments (3)
If you have C++11, the following parser (AXE) will parse all your formats (I didn't test it):
If you don't have C++11, you can write a similar parser in C++ using boost::spirit. Such a parser is easier and shorter to write and debug than one built on regular expressions, and you also get a lot of flexibility in defining parsing rules and semantic actions.
If you're wedded to Java, use the Regular Expression functionality.
http://download.oracle.com/javase/tutorial/essential/regex/
But if you're not, a sed script works best for simple text processing.
It seems very simple to write a parser for those strings using a regular expression for each case, or a single expression with an alternative for each. You need to use something like \d+ to match the numbers. I would also group each set of similar combinators (like "and"/"or" and "to"/"through") into a single alternative to make it easier to process the results.
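As a rough illustration of that idea, here is a minimal sketch in Python (chosen only for brevity; the same pattern should carry over to other regex engines). The names ITEM and parse_items are made up for this example. It assumes that "to" and "through" denote an inclusive range, and that commas, "and", and "or" merely separate entries, so they are skipped rather than matched explicitly.

```python
import re

# One entry: a number, optionally followed by a range word and a second number.
# "to" and "through" are grouped into a single alternative (assumed equivalent).
ITEM = re.compile(r"(\d+)(?:\s+(?:to|through)\s+(\d+))?")

def parse_items(text):
    """Return every item number mentioned in text, expanding ranges inclusively."""
    numbers = []
    for match in ITEM.finditer(text):
        start = int(match.group(1))
        # A lone number is treated as a range of length one.
        end = int(match.group(2)) if match.group(2) else start
        numbers.extend(range(start, end + 1))
    return numbers

print(parse_items("items 1, 2 and 3"))   # [1, 2, 3]
print(parse_items("items 2 through 5"))  # [2, 3, 4, 5]
print(parse_items("items 4 or 8"))       # [4, 8]
```

This sketch does no validation: any text between entries is ignored, and a descending range such as "5 to 2" yields nothing. If "and" and "or" need different handling, they would have to be captured in their own group instead of being skipped.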