What is regex theoretically powerful enough to do?
If you ask a question about parsing HTML with regex, you will certainly be referred to this famous rant. Though there is no canonical rant for it, I've also been told that regexes aren't powerful enough to parse SQL.
I'm a self-taught programmer, so I don't know much about languages from a theoretical perspective. Practically speaking, what are examples of languages or grammars that regex can always parse successfully?
To be specific, I'd really like a few examples of languages that are used in the real world that fit in the category of regular languages, rather than some axioms or equivalent conditions, etc.
Comments (4)
They are great for input validations.
They are great for parsing well structured data files.
They are not great for parsing a language like HTML or SQL, but they can be used for splitting a language into the relevant tokens.
Regexes are often misused, and they have a reputation for being difficult to use and understand. Much of this reputation is well earned, but not all of it.
Use them for simple cases. Get comfortable with them in the simple cases, and the more complex cases will make more sense. Walk before you run.
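As a rough illustration of the tokenizing point above, here is a minimal Python sketch. The token classes and the `tokenize` helper are made up for this example and cover only a tiny SQL-like subset; it is not a real SQL lexer. The regex only splits the input into tokens. Making sense of how those tokens nest would be the job of a separate parser.

```python
import re

# Hypothetical token classes for a tiny SQL-like subset (illustration only).
TOKEN_RE = re.compile(r"""
    (?P<NUMBER>\d+)
  | (?P<STRING>'[^']*')
  | (?P<NAME>[A-Za-z_][A-Za-z_0-9]*)
  | (?P<OP><=|>=|<>|[=<>*,()])
  | (?P<SKIP>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Split text into (kind, value) pairs; raise on unrecognized input."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}: {text[pos]!r}")
        if match.lastgroup != "SKIP":  # drop whitespace, keep everything else
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("SELECT a FROM t WHERE x >= 10"))
```

Note that keywords such as `SELECT` come out as plain `NAME` tokens here; telling keywords from identifiers is left out to keep the sketch short.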
I have used regex extensively for report processing. Perl (backronymed as "Practical Extraction and Report Language") has been used extensively to parse reports from *nix systems. I have used AWK (which is about as close to a regex-only language as you can get) extensively for decades to parse logs, reports, etc.
Regex, like any other computer language or function, is a tool in a toolbox. It can parse HTML and it can parse SQL; the question is to what level, and how well the regex was written. No tool is going to be perfect, but if you use the right tool for the right job you'll always have a plethora of them available.
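The kind of line-oriented report processing described above might look like the following Python sketch. The log format (`date time LEVEL message`) and the `count_levels` helper are assumptions invented for the example, not something taken from a real system.

```python
import re

# Assumed log format for illustration: "<date> <time> <LEVEL> <message>"
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.*)$"
)


def count_levels(lines):
    """Count how many well-formed lines were logged at each level."""
    counts = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        level = match.group("level")
        counts[level] = counts.get(level, 0) + 1
    return counts


sample = [
    "2024-01-01 10:00:00 ERROR disk full",
    "2024-01-01 10:00:01 INFO ok",
    "not a log line",
    "2024-01-01 10:00:02 ERROR again",
]
print(count_levels(sample))
```

Each line is a flat record with no nesting, which is why a single regex per line is enough here.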
In short, regexps can't parse structures with an unknown level of nesting (like HTML), because most regexp engines are based on finite state machines. This limits your expression to a predefined number of states, so it cannot keep track of arbitrarily deep nesting.
You can still pick pieces out of HTML with a regexp, but you can't get things like the current path to an element in the tree.
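To make the "current path" point concrete, here is a hedged Python sketch. The regex only recognizes individual tags; the path information comes from an explicit stack kept outside the regex. The `TAG_RE` pattern and `element_paths` function are hypothetical and deliberately simplified: they ignore comments, void and self-closing elements, and attribute values containing `>`.

```python
import re

# Simplified tag pattern: optional "/", a tag name, then anything up to ">".
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def element_paths(markup):
    """Return the path to each opening tag, tracked with an explicit stack."""
    stack = []
    paths = []
    for match in TAG_RE.finditer(markup):
        closing, name = match.group(1), match.group(2).lower()
        if closing:
            # Pop only when the closing tag matches the innermost open tag.
            if stack and stack[-1] == name:
                stack.pop()
        else:
            stack.append(name)
            paths.append("/".join(stack))
    return paths


print(element_paths("<html><body><p>hi</p><div><p>x</p></div></body></html>"))
```

The stack is the part a finite state machine cannot provide by itself: its depth is unbounded, while the regex has a fixed number of states.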
Regexes are great for parsing things with only repetitions. They go wrong when you have forms of recursion. I think the most useful thing is to show the simplest language they can't parse:
n open parentheses followed by n close parentheses, for instance:
(()) and ((((()))))
If you know you cannot parse that, you can easily conclude that you cannot parse most programming languages.
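A small Python sketch of why this language is out of reach, using hypothetical helper names. The closest plain regex, `\(*\)*`, checks the shape (some opens, then some closes) but cannot check that the two counts are equal. Comparing the counts takes a step outside the regex.

```python
import re

# "Any number of opens, then any number of closes": the shape, not the counts.
LOOSE_RE = re.compile(r"\(*\)*")


def regex_accepts(text):
    """True if the text has the right shape according to the regex alone."""
    return LOOSE_RE.fullmatch(text) is not None


def is_n_open_n_close(text):
    """True only if the text is n opens followed by exactly n closes."""
    match = re.fullmatch(r"(\(*)(\)*)", text)
    # The equality check below is the counting a pure regex cannot express.
    return match is not None and len(match.group(1)) == len(match.group(2))


for sample in ["(())", "((((()))))", "(((", "(()"]:
    print(sample, regex_accepts(sample), is_n_open_n_close(sample))
```

The regex alone accepts `(((` and `(()` even though neither is balanced; only the version that compares lengths rejects them.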
So I think you could parse basic SQL (though not if you allow things like subqueries). Other prime examples of regex-parseable strings are web addresses, email addresses, phone numbers, etc.
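For flat strings like these, a single pattern is usually enough. The Python sketch below uses two assumed formats chosen only for illustration: a `NNN-NNN-NNNN` phone number and a deliberately loose "something@something.something" email shape. Real-world phone and email rules are far broader than this.

```python
import re

# Assumed formats for illustration only; real rules are more permissive.
PHONE_RE = re.compile(r"\d{3}-\d{3}-\d{4}")
SIMPLE_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_phone(text):
    """True if the whole string matches the assumed NNN-NNN-NNNN format."""
    return PHONE_RE.fullmatch(text) is not None


def is_simple_email(text):
    """True if the whole string has a loose local@domain.tld shape."""
    return SIMPLE_EMAIL_RE.fullmatch(text) is not None


print(is_phone("555-123-4567"), is_phone("5551234567"))
print(is_simple_email("a@b.co"), is_simple_email("a@b"))
```

Using `fullmatch` rather than `search` matters for validation, since a pattern that matches somewhere inside a longer string would otherwise let invalid input through.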
If you're looking for actual programming languages that can be parsed using regexes, you won't find many (though I think, from my limited knowledge of it, that parsing assembly should be possible). Most uses, however, are found in parsing simple strings or lines.