Yes, I've used them. Yes, you can do things without them -- but any time you choose the wrong tool for the job, you'll be making needless pain for yourself.
Some examples of the non-standard uses I've personally put the technology to:
scraping data from reports generated by legacy systems
picking out patterns in data too complex for a regexp
protocol analysis
text based adventure games
the metaprogramming API that ate Toledo (not its real name)
Syntax highlighting. The SciTE text editor allows you to write your own lexer (in C++) to provide syntax highlighting for any custom language. I wrote my own custom lexer for SciTE as a refresher on this topic (I studied it a while ago at university).
Regular expressions are often used as an alternative for pattern matching and simple language processing. This has become even more common in recent years thanks to the improved regex support in frameworks such as .NET. In many cases developers may not even know of lexing/parsing techniques and so fall into using regexes by default.
However, as another answer says, regexes can quickly become inefficient, slow, and difficult to maintain for anything more than a simple grammar/language. In that situation lexers/parsers are generally the better choice.
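To make that concrete, here is a hedged sketch of the regex approach -- in Python rather than .NET, with made-up token names and a toy expression -- showing roughly where a regex is comfortable: producing a flat token stream. Once tokens have to nest, you're past what a regex can maintainably do.

    import re

    # Classic regex tokenizer sketch: one alternative per token type.
    TOKEN_RE = re.compile(r"""
        (?P<NUMBER> \d+(?:\.\d+)?)    # integer or decimal literal
      | (?P<NAME>   [A-Za-z_]\w*)     # identifier
      | (?P<OP>     [+\-*/()])        # single-character operator
      | (?P<SKIP>   \s+)              # whitespace, discarded below
    """, re.VERBOSE)

    def tokenize(text):
        pos = 0
        while pos < len(text):
            m = TOKEN_RE.match(text, pos)
            if not m:
                raise SyntaxError("unexpected character at position %d" % pos)
            pos = m.end()
            if m.lastgroup != "SKIP":
                yield (m.lastgroup, m.group())

    print(list(tokenize("rate * (hours + 1.5)")))
    # [('NAME', 'rate'), ('OP', '*'), ('OP', '('), ('NAME', 'hours'),
    #  ('OP', '+'), ('NUMBER', '1.5'), ('OP', ')')]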
Yes, I've used them in real-world stuff -- but mostly, the creation of custom languages that you'd use lexers and parsers for has been supplanted by languages defined in XML. More verbose, but then you don't have to do all that work...
Yes, I've used them. I'm a big fan of ANTLR. I give some tips and tricks on using ANTLR here, and a brief endorsement of it here. It's possible to hand-write your own parser using ad hoc methods, but it's a lot harder, and it will take a lot longer to figure out how to make changes when you need to grow the language your parser is supposed to parse.
Any place you handle text input ends up using some kind of lexer/parser, although sometimes it's the degenerate case (lex anything but a comma as one token type and a comma as another; parse a number, a name, a number, and an end of line -- that sort of thing). In one way of looking at it, sscanf could be considered the most degenerate case of a lexer/parser generator.
As for a full-blown lex/yacc operation? I expect that gets used mostly for general-purpose languages (GPLs) and for things that fall under the loose definition of DSLs.
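A hedged sketch of what that degenerate case looks like (Python standing in for a single sscanf format string; the three-field record layout is invented for illustration):

    # Commas are one token type, everything between them is the other;
    # the "grammar" is just: number, name, number, end of line.
    def parse_line(line):
        fields = [f.strip() for f in line.split(",")]   # the entire "lexer"
        if len(fields) != 3:
            raise ValueError("expected: number, name, number")
        count, name, price = fields                     # the entire "parser"
        return int(count), name, float(price)

    print(parse_line("3, widget, 4.25"))   # (3, 'widget', 4.25)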
Any time there exists a static document (e.g., a file) or a dynamic document (e.g., a stream occurring over time), and that document has any kind of structure, you will find yourself needing some kind of parser. For simple enough structures, you can get by with ad hoc parsing (string hacking, regexes, etc.). For structures that do not nest, you can get by with a finite state machine; here a lexer generator is often helpful. For complex structures, you pretty much need an organized parser. You can write parsers by hand if you are familiar with recursive-descent-style parsing. For really complex structures, a parser generator is almost always a big win.
If you want to process a computer language, you pretty much need lexers and parsers as a starting place. They aren't enough; you have to do something with the parser result.
A really spectacular usage of lexing and parsing that we did is to translate JOVIAL, a 1960s language, into C, for the B-2 stealth bomber. See http://www.semdesigns.com/Products/Services/NorthropGrummanB2.html
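To make the "organized parser" point concrete, here is a minimal recursive-descent sketch handling the one thing a finite state machine can't track -- nesting. The grammar is invented for illustration: expr := term (('+'|'-') term)*, term := NUMBER | '(' expr ')'.

    import re

    def parse(text):
        # Trivial lexer: numbers and single-character operators.
        tokens = re.findall(r"\d+|[-+()]", text)   # silently skips whitespace
        pos = 0

        def peek():
            return tokens[pos] if pos < len(tokens) else None

        def take():
            nonlocal pos
            tok = tokens[pos]
            pos += 1
            return tok

        def term():                     # term := NUMBER | '(' expr ')'
            if peek() == "(":
                take()
                value = expr()          # the recursion carries the nesting
                if take() != ")":
                    raise SyntaxError("missing closing parenthesis")
                return value
            return int(take())

        def expr():                     # expr := term (('+'|'-') term)*
            value = term()
            while peek() in ("+", "-"):
                op, rhs = take(), term()
                value = value + rhs if op == "+" else value - rhs
            return value

        result = expr()
        if peek() is not None:
            raise SyntaxError("trailing input")
        return result

    print(parse("2 + (3 - 1) + 10"))   # 14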
A great example of a lexer/parser in use in many systems exists in Apache Lucene (an open-source search-index library). Both the query parser and the document tokenizer use these techniques. While I guess you could categorize the query parser in Lucene as a DSL parser, it is still being used to help solve a real-world problem.
For that matter, I'm sure Google is employing some sort of lexer/parser for its own query syntax and document parsing.
I just wrote a lexer/parser by hand to allow simple string-based query expressions to be handled by an IBindingListView implementation. That was the first useful thing I have actually been able to use the technique for, rather than just having heard about it.
Pretty pedestrian example, but I'm pretty pedestrian in my experience with them.
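IBindingListView is a .NET interface, so as a language-neutral sketch, here is the rough shape such a hand-rolled filter parser can take. The clause syntax -- field, comparison operator, quoted value, joined by AND -- is my guess at the kind of expression meant, not the poster's actual grammar.

    import operator
    import re

    # String comparison only, to keep the sketch short.
    OPS = {"=": operator.eq, "<>": operator.ne,
           "<": operator.lt, ">": operator.gt}

    CLAUSE = re.compile(r"(\w+)\s*(<>|<|>|=)\s*'([^']*)'")

    def compile_filter(text):
        # Split on AND, then lex/parse each clause with one regex.
        predicates = []
        for part in re.split(r"\s+AND\s+", text, flags=re.IGNORECASE):
            m = CLAUSE.fullmatch(part.strip())
            if m is None:
                raise ValueError("bad clause: %r" % part)
            field, op, value = m.groups()
            predicates.append(
                lambda row, f=field, o=OPS[op], v=value: o(str(row[f]), v))
        return lambda row: all(p(row) for p in predicates)

    rows = [{"Name": "ann", "City": "Oslo"}, {"Name": "bob", "City": "Rome"}]
    keep = compile_filter("City = 'Oslo'")
    print([r for r in rows if keep(r)])   # [{'Name': 'ann', 'City': 'Oslo'}]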
I have not used one of the big guys to do any lexical analysis yet; I have, however, written my own lexer by hand for a project I worked on. We had to parse data that came back from a Near Space project's data computer, which was written to an SD card in binary. I had to pull the bits apart, convert them from binary to decimal, and then write the entire contents out to a comma-separated file.
It is a lot of fun to sit down, think through it logically, and write a state machine for the task at hand!
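A hedged sketch of that kind of binary-to-CSV pass. The record layout here (little-endian: u32 timestamp, i16 temperature, u16 pressure) is invented, since the real telemetry format isn't given:

    import csv
    import struct

    RECORD = struct.Struct("<IhH")   # assumed layout: 8 bytes per record

    def dump_to_csv(binary_path, csv_path):
        with open(binary_path, "rb") as src, \
             open(csv_path, "w", newline="") as dst:
            writer = csv.writer(dst)
            writer.writerow(["timestamp", "temperature", "pressure"])
            while True:
                chunk = src.read(RECORD.size)
                if len(chunk) < RECORD.size:   # EOF (or trailing partial record)
                    break
                writer.writerow(RECORD.unpack(chunk))   # binary -> decimal fields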
Yes! The team I work with has implemented a document-generation framework, which among other things allows (mostly arithmetic) expressions to be evaluated. We're using a parser to extract expressions from the inputs/definitions for the generated documents and create expression trees for them. Afterwards those trees are evaluated, and the results are written to the final document.
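A minimal sketch of that two-phase shape -- build an expression tree first, then evaluate it against some inputs -- here piggybacking on Python's ast module rather than the team's actual parser; the variable names are illustrative:

    import ast
    import operator

    BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub,
              ast.Mult: operator.mul, ast.Div: operator.truediv}

    def build_tree(expression):
        # Phase 1: parse the expression text into a tree.
        return ast.parse(expression, mode="eval").body

    def evaluate(node, variables):
        # Phase 2: walk the tree and compute a value.
        if isinstance(node, ast.BinOp):
            return BINOPS[type(node.op)](evaluate(node.left, variables),
                                         evaluate(node.right, variables))
        if isinstance(node, ast.Constant):   # numeric literal
            return node.value
        if isinstance(node, ast.Name):       # placeholder from the document
            return variables[node.id]
        raise ValueError("unsupported node: %r" % node)

    tree = build_tree("unit_price * quantity + 5")
    print(evaluate(tree, {"unit_price": 9.5, "quantity": 4}))   # 43.0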