Is knowing regular expressions helpful for building a language?

Posted on 2024-08-12 21:49:58

I'm reading Flex & Bison from O'Reilly, and would like to know if learning Regular Expressions beforehand will help in developing a programming language?

Comments (5)

梦开始←不甜 2024-08-19 21:49:59

Regular expressions can be defined using formal language theory, so they're complementary concepts.

It would be a good idea to understand both regular expressions and formal language theory well before starting to build a language.

So to answer your Boolean question: Yes.

晨敛清荷 2024-08-19 21:49:59

The regular expressions needed for a conventional programming language's syntax are quite simple, so, strictly speaking, you don't need to be a regexp expert to write a compiler. On the other hand, regexps are a basic programming skill, so I'd say you need to know them... for pretty much everything.

太傻旳人生 2024-08-19 21:49:59

I'd say so. Sounds like you've run across the Flex scanner in Example 1.3 of Flex & Bison (p. 5):

/* recognize tokens for the calculator and print them out */
%%
"+"      { printf("PLUS\n"); }
"-"      { printf("MINUS\n"); }
"*"      { printf("TIMES\n"); }
"/"      { printf("DIVIDE\n"); }
"|"      { printf("ABS\n"); }
[0-9]+   { printf("NUMBER %s\n", yytext); }
\n       { printf("NEWLINE\n"); }
[ \t]    { }
.        { printf("Mystery character %s\n", yytext); }
%%

As you've seen, NUMBER, whitespace, and the mystery character are defined using simple regular expressions (well, the others are too, but they're not very interesting). Your programming language will doubtless use other regexes (e.g., think about tokens for hex literals, octal literals, floats/doubles, and comments in C/C++/Java). Regexes are also a useful technique for programming in general, so I'd go ahead and learn something about them now.
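
To make those literal examples concrete, here is a minimal Python sketch (not from the book) of regexes for the token kinds mentioned above. The patterns are simplified assumptions: they ignore suffixes such as `u` or `f`, hex floats, and digit separators. `LITERALS` and `classify` are hypothetical names chosen for illustration.

```python
import re

# Simplified patterns for C-style literals and comments.
# Assumptions: no integer/float suffixes, no hex floats, no digit separators.
LITERALS = [
    ("HEX", r"0[xX][0-9a-fA-F]+"),
    ("FLOAT", r"(?:[0-9]+\.[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?|[0-9]+[eE][-+]?[0-9]+"),
    ("OCTAL", r"0[0-7]*"),
    ("DECIMAL", r"[1-9][0-9]*"),
    ("LINE_COMMENT", r"//[^\n]*"),
    # Classic scanner pattern for /* ... */ that cannot run past the first "*/".
    ("BLOCK_COMMENT", r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/"),
]


def classify(text):
    """Return the name of the first pattern matching all of `text`, or None."""
    for name, pattern in LITERALS:
        if re.fullmatch(pattern, text):
            return name
    return None


for sample in ["0x1F", "017", "42", "3.14", "1e10", "089", "// note", "/* note */"]:
    print(sample, "->", classify(sample))
```

Note that `089` is rejected by every pattern, which mirrors C, where a leading `0` starts an octal literal and `8` is not an octal digit. In a real Flex scanner the same patterns would be written as rules, and Flex would pick the longest match rather than the first one that fits.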

后来的我们 2024-08-19 21:49:59

If you are creating an interpreted language, you can use regexes to identify the various atoms (tokens) in a line of code.
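
As a rough illustration of that idea, the sketch below splits one line of a toy language into atoms with a single combined regex. The token names and patterns in `TOKEN_SPEC` are assumptions made up for this example, not part of any particular language.

```python
import re

# Hypothetical token set for a toy language (illustrative assumption).
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"[ \t]+"),
    ("MYSTERY", r"."),
]

# One alternation of named groups; the name of the group that matched
# tells us which kind of atom we found.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def atoms(line):
    """Split one line of code into (kind, text) pairs, dropping whitespace."""
    result = []
    for match in MASTER.finditer(line):
        kind = match.lastgroup
        if kind != "SKIP":
            result.append((kind, match.group()))
    return result


print(atoms("total = price * 3"))
# [('IDENT', 'total'), ('OP', '='), ('IDENT', 'price'), ('OP', '*'), ('NUMBER', '3')]
```

One design caveat: Python's `|` takes the first alternative that matches, while Flex takes the longest match, so the order of entries in `TOKEN_SPEC` matters more here than the order of rules does in a Flex file.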

私藏温柔 2024-08-19 21:49:59

Maybe I'm off track because the other answerers think you're asking about PCRE or something. But if you're talking about inventing a language, then regular expressions are about as important as the syntax and anything else.

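Regular expressions describe the regular languages, the lowest level (Type 3) of the Chomsky hierarchy. They are equivalent in power to deterministic finite automata and sit one step below the context-free languages recognized by pushdown automata. This is very important stuff to know about and exceptionally necessary when parsing anything, especially code.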
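
The link between regular expressions and finite automata can be sketched in a few lines of Python. The hand-written automaton below is meant to accept the same strings as the regex `[0-9]+`. The state names and the `step` and `accepts` helpers are hypothetical, and this is only an illustration, not how Flex generates its tables.

```python
import re

# States of a hand-written DFA intended to be equivalent to the regex [0-9]+
START, IN_NUMBER, DEAD = "start", "in_number", "dead"


def step(state, ch):
    """Transition function: a digit moves to IN_NUMBER, anything else is fatal."""
    if state != DEAD and "0" <= ch <= "9":
        return IN_NUMBER
    return DEAD


def accepts(text):
    """Run the DFA over `text`; IN_NUMBER is the only accepting state."""
    state = START
    for ch in text:
        state = step(state, ch)
    return state == IN_NUMBER


# Cross-check the automaton against the regex on a few inputs.
for sample in ["123", "", "12a", "a12", "0"]:
    print(repr(sample), accepts(sample), re.fullmatch(r"[0-9]+", sample) is not None)
```

The automaton has no memory beyond its current state, which is why nested constructs such as balanced parentheses are left to a grammar-based parser like the one Bison generates.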
