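Why can't regular expressions use keywords instead of characters?
Okay, I barely understand the basics of RegEx, but why couldn't it have been designed to use keywords (like SQL) instead of cryptic wildcard characters and symbols?
Is it for performance, since a RegEx is interpreted/parsed at runtime (not compiled)?
Or maybe for speed of writing? Once you have learned some "simple" character combinations, does typing one character become easier than typing a keyword?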
You really want this?
Ok, but it's your funeral, man.
Download the library that does this here:
http://flimflan.com/blog/ReadableRegularExpressions.aspx
Regular expressions have a mathematical (actually, language theory) background and are coded somewhat like a mathematical formula. You can define them by a set of rules, for example: every letter is a regular expression that represents itself, and if a and b are regular expressions, then a?, a|b and ab are regular expressions, too.
Using a keyword-based language would be a great burden for simple regular expressions. Most of the time, you will just use a simple text string as search pattern:
Or maybe very simple patterns:
Once you get used to regular expressions, this syntax is very clear and precise. In more complicated situations you will probably use something else since a large regular expression is obviously hard to read.
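As a rough illustration of those two cases, here is how a plain string and a very simple pattern look with Python's `re` module. The sample lines and the emoticon pattern are made up for this sketch and are not taken from the answer.

```python
import re

lines = [
    "int main(void) {",
    "    return 0;  /* done :-) */",
    "}",
]

# A plain text string is already a valid pattern: it matches itself.
literal_hits = [line for line in lines if re.search("main", line)]

# A very simple pattern: ':-' followed by either ')' or '('.
smiley = re.compile(r":-[)(]")
smiley_hits = [line for line in lines if smiley.search(line)]
```

Neither pattern would get shorter or clearer if every symbol had to be spelled out as a keyword.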
Perl 6 is taking a pretty revolutionary step forward in regex readability. Consider an address of the form:
100 E Main St Springfield MA 01234
Here's a moderately-readable Perl 5 compatible regex to parse that (many corner cases not handled):
This Perl 6 regex has the same behavior:
A Perl 6 grammar is a class, and the tokens are all invokable methods. Use it like this:
This example comes from a talk I presented at the Frozen Perl 2009 workshop. The Rakudo implementation of Perl 6 is complete enough that this example works today.
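For readers without Perl at hand, the idea of building a pattern from named pieces can be approximated in Python. This is only a sketch: it is not the regex or grammar from the talk, the token names and their patterns are assumptions chosen for this one sample address, and many corner cases are not handled.

```python
import re

# Hypothetical "tokens", loosely mirroring the idea of named grammar rules.
TOKENS = {
    "number": r"\d+",
    "direction": r"[NSEW]",
    "street": r"\w+(?: \w+)*? (?:St|Ave|Rd)",
    "city": r"\w+(?: \w+)*?",
    "state": r"[A-Z]{2}",
    "zip": r"\d{5}",
}

def named(name):
    """Wrap a token's pattern in a named capture group."""
    return "(?P<{}>{})".format(name, TOKENS[name])

# The direction is optional; every other part is required.
ADDRESS = re.compile(
    r"{} (?:{} )?{} {} {} {}".format(*(named(n) for n in TOKENS))
)

m = ADDRESS.fullmatch("100 E Main St Springfield MA 01234")
parts = m.groupdict() if m else None
```

Each part of the match can then be read back by name, for example `parts["city"]`.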
Well, if you had keywords, how would you easily differentiate them from actually matched text? How would you handle whitespace?
Source text
Company: A Dept.: B
Standard regex:
Or even:
Keyword regex (trying really hard not to build a strawman...)
Or simplified:
No, it's probably not better.
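To make the point concrete, here is one way the symbol-based form could look in Python for the source text above. The pattern is an assumption written for this sketch, not the one from the answer.

```python
import re

text = "Company: A Dept.: B"

# Symbols carry the structure; everything else is literal text.
# The dot in "Dept." is escaped so it matches a literal period.
pattern = re.compile(r"Company:\s+(.+)\s+Dept\.:\s+(.+)")

company, dept = pattern.fullmatch(text).groups()
```

In a keyword syntax, words such as Company and Dept would need some form of quoting so they are not mistaken for keywords, and whitespace would need its own keyword as well.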
Because it corresponds to formal language theory and its mathematical notation.
It's Perl's fault...!
Actually, more specifically, Regular Expressions come from early Unix development, and concise syntax was a lot more highly valued then. Storage, processing time, physical terminals, etc. were all very limited, rather unlike today.
The history of Regular Expressions on Wikipedia explains more.
There are alternatives to Regex, but I'm not sure any have really caught on.
EDIT: Corrected by John Saunders: Regular Expressions were popularised by Unix, but first implemented by the QED editor. The same design constraints applied, even more so, to earlier systems.
Actually, no, the world did not begin with Unix. If you read the Wikipedia article, you'll see that
This is much earlier than Perl. The Wikipedia entry on Regular Expressions attributes the first implementations of regular expressions to Ken Thompson of UNIX fame, who implemented them in QED and then in the ed editor. I guess that the commands had short names for performance reasons, though that was long before they were used client-side. Mastering Regular Expressions is a great book about regular expressions; it covers the option to annotate a regular expression (with the /x flag) to make it easier to read and understand.
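Python has a counterpart of the /x flag in `re.VERBOSE`. The time-of-day pattern below is a made-up example for this sketch, not one from the book.

```python
import re

# With re.VERBOSE, whitespace in the pattern is ignored and '#' starts
# a comment, so each piece of the pattern can be annotated.
TIME = re.compile(
    r"""
    (?P<hour>[01]\d|2[0-3])   # two-digit hour, 00 to 23
    :                         # literal separator
    (?P<minute>[0-5]\d)       # two-digit minute, 00 to 59
    """,
    re.VERBOSE,
)

m = TIME.fullmatch("23:59")
```

The pattern keeps the terse symbols but gains room for explanation, which is often enough to make a longer regex maintainable.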
Because the idea of regular expressions--like many things that originate from UNIX--is that they are terse, favouring brevity over readability. This is actually a good thing. I've ended up writing regular expressions (against my better judgement) that are 15 lines long. If that had a verbose syntax it wouldn't be a regex, it'd be a program.
It's actually pretty easy to implement a "wordier" form of regex -- please see my answer here. In a nutshell: write a handful of functions that return regex strings (and take parameters if necessary).
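A minimal sketch of that approach in Python might look like the following. All helper names here are hypothetical and are not taken from the linked answer; each helper simply returns a regex string.

```python
import re

# Hypothetical helper names; each one just returns a regex string.
def literal(text):
    return re.escape(text)

def one_or_more(pattern):
    return "(?:{})+".format(pattern)

def optional(pattern):
    return "(?:{})?".format(pattern)

def either(*patterns):
    return "(?:{})".format("|".join(patterns))

def capture(pattern):
    return "({})".format(pattern)

DIGIT = r"\d"

# Reads as: optional sign, then one or more digits, then a unit.
pattern = (
    optional(either(literal("+"), literal("-")))
    + capture(one_or_more(DIGIT))
    + capture(either(literal("kg"), literal("lb")))
)

m = re.fullmatch(pattern, "-42kg")
```

Because the helpers only produce strings, the result can be handed to any ordinary regex function, and `literal` keeps user text from being read as regex syntax.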
I don't think keywords would give any benefit. Regular expressions as such are complex but also very powerful.
What I think is more confusing is that every supporting library invents its own syntax instead of using (or extending) the classic Perl regex (e.g. \1, $1, {1}, ... for replacements and many more examples).
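Python's `re` module is one example of this: in a replacement string it accepts `\1` and `\g<1>`, while `$1` is treated as literal text. A short sketch with made-up input:

```python
import re

text = "hello world"
swap = r"(\w+) (\w+)"

# Backslash-style group references are understood by Python's re.sub.
with_backslash = re.sub(swap, r"\2 \1", text)
with_g_syntax = re.sub(swap, r"\g<2> \g<1>", text)

# Dollar-style references are not: the replacement is inserted literally.
with_dollar = re.sub(swap, "$2 $1", text)
```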
I know it's answering your question the wrong way around, but RegexBuddy has a feature that explains your regular expression in plain English. This might make it a bit easier to learn.
If the language you are using supports POSIX regexes, you can use them.
An example:
would be the same as
The bracket notation is much clearer about what it is matching. I would still learn the "cryptic wildcard characters and symbols", since you will still see them in other people's code and need to understand them.
There are more examples in the table on regular-expressions.info's page.
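Python's standard `re` module does not understand POSIX bracket classes, so the sketch below expands a few of them into explicit ranges before compiling. The helper name and the ASCII-only mapping are assumptions made for illustration.

```python
import re

# ASCII-only equivalents of a few POSIX bracket classes.
POSIX_CLASSES = {
    "[:digit:]": "0-9",
    "[:upper:]": "A-Z",
    "[:lower:]": "a-z",
    "[:alpha:]": "A-Za-z",
    "[:alnum:]": "A-Za-z0-9",
}

def expand_posix(pattern):
    """Replace POSIX class names with explicit ranges (hypothetical helper)."""
    for name, ranges in POSIX_CLASSES.items():
        pattern = pattern.replace(name, ranges)
    return pattern

# One uppercase letter followed by one or more digits.
expanded = expand_posix("[[:upper:]][[:digit:]]+")
m = re.fullmatch(expanded, "A123")
```

The named form says what each bracket is for, while the expanded form is what most regex engines accept directly.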
For some reason, my previous answer got deleted. Anyway, I think Ruby Regexp Machine would fit the bill, at http://www.rubyregexp.sf.net. It is my own project, but I think it should work.