Creating a list lexer/parser
I need to create a lexer/parser which deals with input data of variable length and structure.
Say I have a list of reserved keywords:
keyWordList = ['command1', 'command2', 'command3']
and a user input string:
userInput = 'The quick brown command1 fox jumped over command2 the lazy dog command3'
userInputList = userInput.split()
How would I go about writing this function:
INPUT:
tokenize(userInputList, keyWordList)
OUTPUT:
[['The', 'quick', 'brown'], 'command1', ['fox', 'jumped', 'over'], 'command2', ['the', 'lazy', 'dog'], 'command3']
I've written a tokenizer that can identify keywords, but I have been unable to figure out an efficient way to embed groups of non-keywords into lists that are one level deeper.
RE solutions are welcome, but I would really like to see the underlying algorithm as I am probably going to extend the application to lists of other objects and not just strings.
Comments (4)
Something like this:
This returns a generator, so wrap your call in a call to list().
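A minimal sketch of a generator in that spirit, not necessarily the answerer's own code. It assumes the items are hashable (so the keywords can be put in a set) and uses itertools.groupby to split the input into alternating runs of keywords and non-keywords:

```python
from itertools import groupby


def tokenize(words, keywords):
    """Yield each keyword on its own and each run of non-keywords as a list."""
    keywords = set(keywords)  # assumption: items are hashable
    # groupby collects consecutive items that share the same key value,
    # here True for keywords and False for everything else.
    for is_keyword, group in groupby(words, key=lambda w: w in keywords):
        if is_keyword:
            # Consecutive keywords are emitted one by one, not nested.
            yield from group
        else:
            yield list(group)


keyWordList = ['command1', 'command2', 'command3']
userInput = 'The quick brown command1 fox jumped over command2 the lazy dog command3'
result = list(tokenize(userInput.split(), keyWordList))
print(result)
```

Because nothing here is specific to strings, the same function should work on lists of other hashable objects.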
That is easy to do with some regex:
Now you just have to split the first element of each tuple.
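One way this could look, as a sketch rather than the answerer's original code: the pattern below is an assumption, built from the keyword list so that re.findall returns (text before keyword, keyword) tuples. The helper name tokenize_regex is hypothetical.

```python
import re


def tokenize_regex(text, keywords):
    """Split text into lists of non-keywords separated by bare keywords."""
    alternation = '|'.join(re.escape(k) for k in keywords)
    # Group 1: whatever precedes the next keyword (non-greedy).
    # Group 2: the keyword itself, matched as a whole word.
    pattern = re.compile(r'(.*?)\s*\b(' + alternation + r')\b\s*')
    result = []
    for before, keyword in pattern.findall(text):
        if before:
            # Split the first element of each tuple into words.
            result.append(before.split())
        result.append(keyword)
    return result


keyWordList = ['command1', 'command2', 'command3']
userInput = 'The quick brown command1 fox jumped over command2 the lazy dog command3'
print(tokenize_regex(userInput, keyWordList))
```

Note that this sketch drops any words that follow the last keyword, and it only works on strings.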
For more than one level of nesting, regex may not be a good fit.
There are some nice parsers to choose from on this page: http://wiki.python.org/moin/LanguageParsing
I think Lepl is a good one.
Try this:
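A plain-loop sketch of the underlying algorithm, offered as an illustration and not as the answerer's original code. The name tokenize_loop is hypothetical. It only relies on the `in` operator, so it should also work for lists of objects that are not hashable:

```python
def tokenize_loop(items, keywords):
    """Group runs of non-keywords into sublists; keep keywords at top level."""
    result = []
    pending = []  # non-keywords collected since the last keyword
    for item in items:
        if item in keywords:
            if pending:
                # Close the current run before emitting the keyword.
                result.append(pending)
                pending = []
            result.append(item)
        else:
            pending.append(item)
    if pending:
        # Keep any trailing non-keywords after the last keyword.
        result.append(pending)
    return result


keyWordList = ['command1', 'command2', 'command3']
userInput = 'The quick brown command1 fox jumped over command2 the lazy dog command3'
print(tokenize_loop(userInput.split(), keyWordList))
```

The membership test against a list is linear in the number of keywords; converting the keywords to a set first would speed it up when the items are hashable.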
Or have a look at PyParsing. It is quite a nice little lexer/parser combination.