What are the differences between Perl, Python, AWK and sed?
What are the main differences among them? And in which typical scenarios is it better to use each language?
In order of appearance, the languages are sed, awk, perl, python.
The sed program is a stream editor and is designed to apply the actions from a script to each line (or, more generally, to specified ranges of lines) of the input file or files. Its language is based on ed, the Unix editor, and although it has conditionals and so on, it is hard to work with for complex tasks. You can work minor miracles with it - but at a cost to the hair on your head. However, it is probably the fastest of the programs when attempting tasks within its remit. (It has the least powerful regular expressions of the programs discussed - adequate for many purposes, but certainly not PCRE - Perl-Compatible Regular Expressions.)
The awk program (name from the initials of its authors - Aho, Weinberger, and Kernighan) is a tool initially for formatting reports. It can be used as a souped-up sed; in its more recent versions, it is computationally complete. It uses an interesting idea - the program is based on 'patterns matched' and 'actions taken when the pattern matches'. The patterns are fairly powerful (Extended Regular Expressions). The language for the actions is similar to C. One of the key features of awk is that it splits the input automatically into records and each record into fields.
Perl was written in part as an awk-killer and sed-killer. Two of the programs provided with it are a2p and s2p for converting awk scripts and sed scripts into Perl. Perl is one of the earliest of the next generation of scripting languages (Tcl/Tk can probably claim primacy). It has powerful integrated regular expression handling with a vastly more powerful language. It provides access to almost all system calls and has the extensibility of the CPAN modules. (Neither awk nor sed is extensible.) One of Perl's mottos is "TMTOWTDI - There's more than one way to do it" (pronounced "tim-toady"). Perl has 'objects', but it is more of an add-on than a fundamental part of the language.
Python was written last, and probably in part as a reaction to Perl. It has some interesting syntactic ideas (indenting to indicate levels - no braces or equivalents). It is more fundamentally object-oriented than Perl; it is just as extensible as Perl.
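To make the awk model described above more concrete (an implicit loop over records, automatic splitting into fields, and pattern/action pairs), here is a rough Python sketch. It is only an illustration, not how awk is implemented: the `awk_like` function, the `rules` list and the sample data are all made up for this example, and it ignores awk features such as BEGIN/END blocks and awk's lenient string-to-number conversion.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# A rule is a (pattern, action) pair, mirroring awk's `pattern { action }`.
Rule = Tuple[Callable[[List[str]], bool], Callable[[List[str]], str]]


def awk_like(lines: Iterable[str], rules: List[Rule],
             sep: Optional[str] = None) -> List[str]:
    """Hypothetical helper: run every rule against every input record."""
    output = []
    for line in lines:                        # the implicit main loop
        record = line.rstrip("\n")
        # fields[0] plays the role of $0, fields[1] of $1, and so on.
        fields = [record] + record.split(sep)
        for pattern, action in rules:
            if pattern(fields):               # action runs only on a match
                output.append(action(fields))
    return output


# Roughly what `awk '$3 > 100 { print $1, $3 }'` does on this made-up data.
# Unlike awk, float() raises on non-numeric text, so the data is kept numeric.
rules = [
    (lambda f: len(f) > 3 and float(f[3]) > 100,
     lambda f: f"{f[1]} {f[3]}"),
]
data = ["alice admin 120\n", "bob dev 80\n", "carol ops 250\n"]
result = awk_like(data, rules)
print(result)
```

The point of the comparison is that awk gives you the loop, the record splitting and the field variables for free, which is why short awk programs are often one-liners.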
OK - when to use each?
I'm not aware of anything that Perl can do that Python can't, nor vice versa. The choice between the two would depend on other factors. I learned Perl before there was a Python, so I tend to use it. Python has less accreted syntax and is generally somewhat simpler to learn. Perl 6, when it becomes available, will be a fascinating development.
(Note that the 'overviews' of Perl and Python, in particular, are woefully incomplete; whole books could be written on the topic.)
After mastering a few dozen languages, one gets tired of absolute recommendations against tools, like in this answer regarding sed and awk.
Sed is the best tool for extremely simple command-line pipelines. In the hands of a sed master, it's suitable for one-offs of arbitrary complexity, but it should not be used in production code except in very simple substitution pipelines. Stuff like 's/this/that/'.
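For comparison, a simple substitution like 's/this/that/' can be approximated in Python as below. This is a minimal sketch under some assumptions: the `substitute` helper and its `replace_all` parameter are invented for the example, and Python's `re` syntax is not the same as sed's basic regular expressions, so only simple patterns carry over unchanged.

```python
import re
from typing import Iterable, List


def substitute(lines: Iterable[str], pattern: str, replacement: str,
               replace_all: bool = False) -> List[str]:
    """Hypothetical helper: apply one substitution to every line.

    Like sed's s command, only the first match on each line is replaced
    unless replace_all is set (the counterpart of sed's trailing g flag).
    """
    regex = re.compile(pattern)
    count = 0 if replace_all else 1   # for re, count=0 means "replace all"
    return [regex.sub(replacement, line, count=count) for line in lines]


sample = ["this and this\n", "nothing to change\n"]
first_only = substitute(sample, "this", "that")
every_match = substitute(sample, "this", "that", replace_all=True)
print(first_only)
print(every_match)
```

The sed version is a single short command, which is the argument for keeping sed around for exactly this kind of pipeline step.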
Gawk (the GNU awk) is by far the best choice for complex data reformatting when there is only a single input source and a single output (or, multiple outputs sequentially written). Since a great deal of real-world work conforms to this description, and a good programmer can learn gawk in two hours, it is the best choice. On this planet, simpler and faster is better!
Perl or Python are far better than any version of awk or sed when you have very complex input/output scenarios. The more complex the problem is, the better off you are using Python, from a maintenance and readability standpoint. Note, however, that a good programmer can write readable code in any language, and a bad programmer can write unmaintainable crap in any useful language, so the choice of Perl or Python can safely be left to the preferences of the programmer if said programmer is skilled and clever.
I wouldn't call sed a fully-fledged programming language; it is a stream editor with language constructs aimed at editing text files programmatically.
Awk is a little more of a general-purpose language, but it is still best suited for text processing.
Perl and Python are fully-fledged, general-purpose programming languages. Perl has its roots in text processing and has a number of awk-like constructs (there is even an awk-to-perl script floating around on the net). There are many differences between Perl and Python; your best bet is probably to read the summaries of both languages on something like Wikipedia to get a good grasp of what they are.
First, there are two unrelated things in the list "Perl, Python awk and sed".
Thing 1 - simplistic text manipulation tools.
sed. It has a fixed, relatively simple scope of work defined by the idea of reading and examining each line of a file. sed is not designed to be particularly readable. It is designed to be very small and very efficient on very tiny unix servers.
awk. It has a slightly less fixed, less simple scope of work. However, the main loop of an awk program is defined by the implicit reading of lines of a source file.
These are not "complete" programming languages. While you can -- with some work -- write fairly sophisticated programs in awk, it rapidly gets complicated and difficult to read.
Thing 2 - general-purpose programming languages. These have a rich variety of statement types, numerous built-in data structures, and no wired-in assumptions or shortcuts to speak of.
Perl.
Python.
When to use them.
sed. Never. It really doesn't have any value in the modern era of computers with more than 32K of memory. Perl or Python do the same things more clearly.
awk. Never. Like sed, it reflects an earlier era of computing. Rather than maintain this language (in addition to all the others required for a successful system), it's more pleasant to simply do everything in one pleasant language.
Perl. Any programming problem of any kind. If you like free-thinking syntax, where there are many, many ways to do the same thing, perl is fun.
Python. Any programming problem of any kind. If you like fairly limited syntax, where there are fewer choices, less subtlety, and (perhaps) more clarity. Python's object-oriented nature makes it more suitable for large, complex problems.
Background -- I'm not bashing sed and awk out of ignorance. I learned awk over 20 years ago. Did many things with it; used to teach it as a core unix skill. I learned Perl about 15 years ago. Did many sophisticated things with it. I've left both behind because I can do the same things in Python -- and it is simpler and more clear.
There are two serious problems with sed and awk, neither of which is their age.
The incompleteness of their implementation. Everything sed and awk do can be done in Python or Perl, often more simply and sometimes faster, too. A shell pipeline has some performance advantages because of its multi-processing. Python offers a subprocess module to allow me to recover those advantages.
The need to learn yet another language. By doing things in Python (or Perl) your implementation depends on fewer languages, with a resulting increase in clarity.
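On the first point, about recovering a shell pipeline's multi-processing through subprocess, the following is a rough sketch of what that might look like. It chains two child processes with `subprocess.Popen` so that both stages run concurrently, the way `cmd1 | cmd2` does in a shell. The stages here are small Python one-liners launched via `sys.executable` purely so the example does not depend on any external tool being installed; the `run_pipeline` name and the two stage scripts are assumptions made for this illustration, not anything from the answer.

```python
import subprocess
import sys

# Stand-in stage scripts (assumed for this sketch). In practice each
# stage could be any external command.
UPPERCASE = "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())"
ONLY_ERRORS = ("import sys\nfor line in sys.stdin:\n"
               "    if 'ERROR' in line:\n        sys.stdout.write(line)")


def run_pipeline(text: str) -> str:
    """Feed text through two concurrently running child processes."""
    stage1 = subprocess.Popen([sys.executable, "-c", UPPERCASE],
                              stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                              text=True)
    # stage2 reads directly from stage1's output pipe.
    stage2 = subprocess.Popen([sys.executable, "-c", ONLY_ERRORS],
                              stdin=stage1.stdout, stdout=subprocess.PIPE,
                              text=True)
    stage1.stdout.close()        # the parent no longer needs this end
    stage1.stdin.write(text)     # fine for small inputs; large inputs
    stage1.stdin.close()         # would need a writer thread to avoid blocking
    output, _ = stage2.communicate()
    stage1.wait()
    return output


if __name__ == "__main__":
    print(run_pipeline("ok\nerror one\n"), end="")
```

Whether this is clearer than the equivalent shell pipeline is a matter of taste; it does show that the concurrency itself is available from Python.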
When to use them: awk - never - S. Lott.
I think S. Lott slightly missed the mark with this recommendation. The fact is, on Linux and the other UNIX environments, awk is a useful tool to be used with bash, sh, and ksh for quick text processing. The idea of scripting itself is that you solve your problem by gluing together this tool and that tool. Hence in admin scripts, it is common to see ls, grep, |, awk, time, ps, etc. Each is a tool that the scripter combines, like a builder working brick by brick, to finish the building (to solve the problem at hand).
For instance, I am a member of the team managing paintball gear supplies dotcom. This e-commerce site is based on the LAMP stack. For automated processing and normalizing of data feeds from various suppliers into the back-end database, we employ and maintain a diversified mix of scripts, including bash, Perl, PHP, and even expect. Each has its strengths based on the available modules and APIs. In the bash scripts we use awk for quick pattern matching and the appropriate actions on those patterns, without the need to switch to Perl. One thing I would also like to point out, which has not been emphasized in the thread, is that a fair number of these scripts were purchased or obtained from open source. If the script came as Perl, we maintain it as Perl; if the script came as PHP, we maintain it as PHP; if it came as bash, we maintain it as bash; we do not rewrite it in another language just because we think it is less efficient in the original language.