Text processing with elisp
Since I've converted to the Church of Emacs, I've been trying to do everything from inside it, and I was wondering how to do some text processing quickly and efficiently with it.
As an example, let's take this list that I was editing a few minutes ago in org-mode.
** Diego: b QI
** bruno-gil: b QI
** Koma: jo
** um: rsrs pr0n
** FelipeAugusto: esp
** GustavoPupo: pinto tr etc
** GP: lit gtk
** Alan: jo mil pc
** Jost: b hq jo 1997
** Herbert: b rsrs pr0n
** Andre: maia mil pseudo
** Rodrigo: c
** caue: b rsrs 7arte pseudo
** kenny: cri gif
** daniel: gtk mu pr0n rsrs b
** tony: an 1997 esp
** Vitor: b jo mimimi
** raphael: b rpg 7arte
** Luca: b lit gnu pc prog mmu 7arte 1997
** LZZ: an qt
** William: b an jo pc 1997
** Epic: gtk
** Aldo: b pseudo pol mil fur
** GustavoKyon: an gtk
** CarlosIsaksen : an hq jo 7arte gtk 1997
** Peter: pseudo pol mil est 1997 gtk lit lang
** leandro: b jo cb
** frederico: 7arte lit gtk
** rol: b an pseudo mimimi 7arte
** mathias: jo lit
** henrique: 1997 h gtk qt
** eumané: an qt
** walrus: cri de
** FilipePinheiro: lit pseudo
** Igor: pseudo b
** Erick: b jo rpg q 1997 gtk
** Gabriel: pr0n rsrs qt
** george: clo mimimi
** anão: hq jo 1997 rsrs clô b
** jeff: 7arte gtk
** davidatenas: an 7arte 1997 esp qt
** HHahaah: b
** Eduardo: b
It is a list of names associated with tags, and I want to get a list of tags associated with names.
In bash, I would first echo the whole pasted list in single quotes, pipe it to awk, loop over each line adding each of its parts to the right temporary variable, and then keep messing with the result until it looks the way I want.
echo '** Diego: b QI
** bruno-gil: b QI
** Koma: jo
** um: rsrs pr0n
** FelipeAugusto: esp
** GustavoPupo: pinto tr etc
** GP: lit gtk
** Alan: jo mil pc
** Jost: b hq jo 1997
** Herbert: b rsrs pr0n
** Andre: maia mil pseudo
** Rodrigo: c
** caue: b rsrs 7arte pseudo
** kenny: cri gif
** daniel: gtk mu pr0n rsrs b
** tony: an 1997 esp
** Vitor: b jo mimimi
** raphael: b rpg 7arte
** Luca: b lit gnu pc prog mmu 7arte 1997
** LZZ: an qt
** William: b an jo pc 1997
** Epic: gtk
** Aldo: b pseudo pol mil fur
** GustavoKyon: an gtk
** CarlosIsaksen : an hq jo 7arte gtk 1997
** Peter: pseudo pol mil est 1997 gtk lit lang
** leandro: b jo cb
** frederico: 7arte lit gtk
** rol: b an pseudo mimimi 7arte
** mathias: jo lit
** henrique: 1997 h gtk qt
** eumané: an qt
** walrus: cri de
** FilipePinheiro: lit pseudo
** Igor: pseudo b
** Erick: b jo rpg q 1997 gtk
** Gabriel: pr0n rsrs qt
** george: clo mimimi
** anão: hq jo 1997 rsrs clô b
** jeff: 7arte gtk
** davidatenas: an 7arte 1997 esp qt
** HHahaah: b
** Eduardo: b' | awk '{sub(":","");for (i=3;i<=NF;i++) members[$i] = members[$i] " " $2}; END{for (j in members) print j ": " members[j]}' | sort
... and TA-DA! The expected output in less than 2 minutes, done in an intuitive and incremental way. Can you show me how to do something like this in elisp, preferably in an Emacs buffer, with elegance and simplicity?
Thanks!
Comments (6)
The first thing I would do is to take advantage of org-mode's tag support. Instead of
** Diego: b QI
you would have
** Diego :b:QI:
which org-mode recognizes as the tags "b" and "QI". To transform your current format to the standard org-mode format, you can use the following (assuming the buffer with your source is called "asdf"):
It's not pretty or efficient, but it gets the job done.
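A minimal sketch of such a conversion, written for illustration rather than taken from the answer. It assumes every entry sits on its own line in the form "** Name: tag1 tag2"; the function name my-tags-to-org is hypothetical.

```elisp
(defun my-tags-to-org (buffer)
  "Rewrite \"** Name: tag1 tag2\" lines in BUFFER as org headlines with tags.
For example, \"** Diego: b QI\" becomes \"** Diego :b:QI:\"."
  (with-current-buffer buffer
    (save-excursion
      (goto-char (point-min))
      ;; Group 1: the stars plus the name (ending in a non-space,
      ;; non-colon character).  Group 2: everything after the colon.
      (while (re-search-forward "^\\(\\*+ +[^:\n]*[^: \n]\\) *:\\(.*\\)$" nil t)
        (let* ((head (match-string-no-properties 1))
               ;; split-string clobbers the match data, so protect it
               ;; for the replace-match call below.
               (tags (save-match-data
                       (split-string (match-string-no-properties 2)))))
          (replace-match
           (if tags
               (concat head " :" (mapconcat #'identity tags ":") ":")
             head)
           t t))))))
```

For example, M-: (my-tags-to-org "asdf") rewrites the list in place, so "** CarlosIsaksen : an hq" becomes "** CarlosIsaksen :an:hq:".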
After that, you can use the following to produce a buffer that has the format
you wanted from the shell script:
This puts the results in a buffer called "results", creating it if it doesn't
already exist. Basically, it collects all the tags in the buffer "asdf",
sorts them, then loops through each tag, searches "asdf" for each headline with
that tag and inserts it into "results".
With a bit of cleaning up, this could be made into a function; basically just
replacing "asdf" and "results" with arguments. If you need that done, I can do
that.
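A sketch of that report step already turned into a function, with the source and target buffers as arguments. It follows the approach described above (collect all tags, sort them, then scan for the headlines carrying each tag), but it recognizes headlines ending in ":tag1:tag2:" with a regular expression instead of org-mode's own tag functions. The name my-org-tags-report is hypothetical, and the "TAG: NAME ..." output only approximates the awk version.

```elisp
(defun my-org-tags-report (source target)
  "List, for each tag in buffer SOURCE, the headlines carrying it.
SOURCE holds org headlines like \"** Diego :b:QI:\".  The report is
written to buffer TARGET (created if needed) as \"TAG: NAME ...\" lines."
  (let ((headline-re "^\\*+ +\\(.*?\\) +:\\([^ \t\n]+\\):[ \t]*$")
        all-tags)
    ;; Pass 1: collect every distinct tag in SOURCE.
    (with-current-buffer source
      (save-excursion
        (goto-char (point-min))
        (while (re-search-forward headline-re nil t)
          (dolist (tag (split-string (match-string-no-properties 2) ":" t))
            (unless (member tag all-tags)
              (push tag all-tags))))))
    ;; Pass 2: for each tag, in sorted order, rescan SOURCE for its headlines.
    (with-current-buffer (get-buffer-create target)
      (erase-buffer)
      (dolist (tag (sort all-tags #'string<))
        (let (names)
          (with-current-buffer source
            (save-excursion
              (goto-char (point-min))
              (while (re-search-forward headline-re nil t)
                ;; Read the name before split-string clobbers the match data.
                (let ((name (match-string-no-properties 1))
                      (line-tags (split-string
                                  (match-string-no-properties 2) ":" t)))
                  (when (member tag line-tags)
                    (push name names))))))
          (insert tag ": " (mapconcat #'identity (nreverse names) " ") "\n"))))))
```

Calling (my-org-tags-report "asdf" "results") then gives one line per tag, such as "b: Diego bruno-gil". Rescanning the source once per tag keeps the code close to the description; a single pass with a hash table would scale better.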
There is a function shell-command-on-region that pretty much does what it says. You can highlight a region, do M-|, type the name of a shell command, and the data is piped to that command. Give it an argument and the region is replaced with the result of the command.
For a trivial example, highlight a region, type 'C-u 0 M-| wc' (control-u, zero, meta-pipe and then 'wc'), and the region will be replaced with the number of lines, words and characters in that region, which is the order in which wc prints them.
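The same thing can be done from Lisp by calling shell-command-on-region directly. A minimal sketch, assuming awk and sort are on the PATH; the command name my-invert-tags-with-awk is hypothetical, and it simply feeds the region to the question's awk pipeline.

```elisp
(defun my-invert-tags-with-awk (start end)
  "Replace the region from START to END with its tags inverted by awk.
Each line is expected to look like \"** Name: tag1 tag2\"."
  (interactive "r")
  ;; The fifth argument, REPLACE, makes the command's output replace
  ;; the region, like M-| with a prefix argument.
  (shell-command-on-region
   start end
   (concat "awk '{sub(\":\",\"\"); "
           "for (i=3;i<=NF;i++) members[$i] = members[$i] \" \" $2} "
           "END{for (j in members) print j \": \" members[j]}' | sort")
   nil t))
```

Select the list and run M-x my-invert-tags-with-awk. With several tags, the final line order depends on sort and the current locale, just as in the shell.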
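Another thing you can do is figure out how to manipulate one line, record it as a macro, and then run the macro repeatedly. For example, 'C-x ( C-s foo RET bar C-x )' will search for the word "foo", then type the word "bar", changing it to "foobar" (RET ends the search with point after the match; C-g would cancel the search and move point back to where it started). You can then do 'C-u 0 C-x e' once, which will keep running the macro until it doesn't find any more occurrences of "foo": an argument of zero means "repeat until an error", whereas a plain C-u would run it only four times.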
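For a simple edit like this one, the recorded macro corresponds to a short loop in Lisp. A sketch, using the hypothetical command name my-append-bar-after-foo:

```elisp
(defun my-append-bar-after-foo ()
  "Insert \"bar\" after each \"foo\" between point and the end of the buffer.
This mirrors the keyboard macro: search for \"foo\", type \"bar\", and
repeat until the search fails."
  (interactive)
  ;; With NOERROR (the third argument) non-nil, search-forward returns
  ;; nil instead of signaling an error when nothing is left, ending the loop.
  (while (search-forward "foo" nil t)
    (insert "bar")))
```

Unlike the macro, the loop ends quietly instead of through a failed search.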
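Ok, here is my first attempt in elisp:
let binds the text to a symbol. Now I change foobar to something fancy.
(split-string) splits the text into strings, and (temphash (make-hash-table :test 'equal)) creates a hash table that compares its keys with equal, so strings work as keys.
It's a little mind-bending, especially the double mapcar, but it sorta works. Here is the full "code":
And here is the output: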
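A minimal sketch along the lines described (let, split-string, a hash table created with :test 'equal, and two nested loops, written with mapc because only their side effects matter). It is an illustration, not the answer's code; it assumes the list arrives as a string with one "** Name: tags" entry per line, and my-invert-tag-list is a hypothetical name.

```elisp
(defun my-invert-tag-list (text)
  "Return an alist of (TAG NAME...) entries built from TEXT, sorted by tag.
TEXT holds lines of the form \"** Name: tag1 tag2 ...\"."
  (let ((temphash (make-hash-table :test 'equal)) ; string keys need `equal'
        result)
    ;; Outer loop: one iteration per line of TEXT.
    (mapc (lambda (line)
            (let* ((parts (split-string line ":"))
                   ;; "** Diego" -> ("**" "Diego") -> "Diego"
                   (name (car (last (split-string (car parts)))))
                   (tags (split-string (or (cadr parts) ""))))
              ;; Inner loop: append NAME to the list kept under each tag.
              (mapc (lambda (tag)
                      (puthash tag
                               (append (gethash tag temphash) (list name))
                               temphash))
                    tags)))
          (split-string text "\n" t))
    (maphash (lambda (tag names) (push (cons tag names) result)) temphash)
    (sort result (lambda (a b) (string< (car a) (car b))))))
```

In the buffer holding the list, M-: (my-invert-tag-list (buffer-string)) returns entries such as ("b" "Diego" "bruno-gil" ...).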
If you know *nix pipes, then you are familiar with functional programming, because functional programming treats programs as successive transformations of data through function application. Remember function composition from school math? Basically, g ∘ f means that you first apply f and then immediately apply g: (g ∘ f)(x) = g(f(x)). A functional program is one giant function composition. And a pipe is just function composition in the opposite direction: (g ∘ f)(x) in math is the same as x | f | g on the command line.
There is a third-party library, dash.el, that provides a multitude of functions for list and tree transformations, as well as functions and macros that ease a functional approach. One of them is the threading macro ->>, which mimics command-line piping.
So if we want to manipulate text data by serially applying operations, our function may look like this:
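dash.el is a separate package, so this sketch swaps in thread-last, which ships with Emacs (in the subr-x library since Emacs 25.1) and, like ->>, inserts each result as the last argument of the next form. It assumes the list is given as a string with one "** Name: tags" entry per line, every line containing a colon; my-list-tags is a hypothetical name.

```elisp
(require 'subr-x) ; thread-last
(require 'seq)    ; seq-mapcat, seq-sort

(defun my-list-tags (text)
  "Return the sorted, de-duplicated tags found in TEXT.
TEXT holds lines of the form \"** Name: tag1 tag2 ...\"."
  ;; Each form receives the previous result as its last argument,
  ;; like the stages of a shell pipeline.
  (thread-last (split-string text "\n" t)
    ;; Keep the part after the first colon, e.g. " b QI".
    ;; (Assumes every line has a colon.)
    (mapcar (lambda (line) (substring line (1+ (string-match ":" line)))))
    ;; Split each tag string on whitespace and concatenate the lists.
    (seq-mapcat #'split-string)
    ;; Drop repeated tags, keeping first occurrences.
    (delete-dups)
    (seq-sort #'string<)))
```

string< compares character codes, so tags starting with digits come first, then uppercase ones such as "QI", then lowercase ones.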
The function that does exactly what you want would then look like this:
(puthash k (cons (car it) (gethash k h)) h) looks cryptic, but it simply means that under each key in the hash table there is a list, and every time you find a new value you add it to the front of that list. So if under b there is (Diego) and we find that bruno-gil should be under b too, the value under b becomes (bruno-gil Diego).
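The idiom can be tried on its own. A small sketch reproducing the "b" example above; the helper name my-add-under-key is hypothetical.

```elisp
(defun my-add-under-key (key value table)
  "Put VALUE in front of the list stored under KEY in hash TABLE."
  ;; Same shape as (puthash k (cons (car it) (gethash k h)) h):
  ;; gethash returns nil for a missing key, so the first value
  ;; starts a fresh one-element list.
  (puthash key (cons value (gethash key table)) table))

(let ((h (make-hash-table :test 'equal)))
  (my-add-under-key "b" "Diego" h)      ; "b" -> ("Diego")
  (my-add-under-key "b" "bruno-gil" h)  ; "b" -> ("bruno-gil" "Diego")
  (gethash "b" h))
```

Because gethash is a generalized variable, (push value (gethash key table)) does the same thing.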
The previous alternatives are interesting, but I do not believe they capture the "how would I do this in Emacs as a recent convert" aspect of the question. I suspect someone learning Emacs with an eye to using Emacs Lisp to do the whole job might start out with something like:
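One possible shape for such a first attempt, sketched for illustration: a command that walks the current buffer with re-search-forward, builds an association list of tags, and writes the result to a buffer. The command name my-invert-tags-in-buffer and the output buffer name "*tags*" are hypothetical, and each entry is assumed to sit on its own "** Name: tags" line.

```elisp
(defun my-invert-tags-in-buffer ()
  "Read \"** Name: tag ...\" lines in the current buffer and write one
\"TAG: NAME ...\" line per tag to the buffer \"*tags*\"."
  (interactive)
  (let (alist)                          ; entries look like (TAG NAME...)
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward "^\\*+ +\\([^:\n]+?\\) *:\\(.*\\)$" nil t)
        (let ((name (match-string-no-properties 1))
              (tags (split-string (match-string-no-properties 2))))
          (dolist (tag tags)
            (let ((entry (assoc tag alist)))
              (if entry
                  ;; Known tag: add the name at the end to keep input order.
                  (setcdr entry (append (cdr entry) (list name)))
                ;; New tag: start a new entry.
                (push (list tag name) alist)))))))
    (setq alist (sort alist (lambda (a b) (string< (car a) (car b)))))
    (with-current-buffer (get-buffer-create "*tags*")
      (erase-buffer)
      (dolist (entry alist)
        (insert (car entry) ": " (mapconcat #'identity (cdr entry) " ") "\n")))
    (when (called-interactively-p 'interactive)
      (pop-to-buffer "*tags*"))))
```

Running M-x my-invert-tags-in-buffer in the buffer holding the list shows the inverted list. An association list is fine for a few dozen tags; for much larger inputs a hash table avoids the linear assoc lookups.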
This is my second attempt. I wrote a little macro and some functions to deal with such data.
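As an illustration of the macro-plus-functions style, here is a small sketch with made-up names: a macro, my-do-entries, that binds the name and tags of each "** Name: tags" line of a string, and a function, my-tag-counts, built on it that counts how often each tag is used.

```elisp
(require 'subr-x) ; string-trim, string-remove-prefix

(defmacro my-do-entries (spec &rest body)
  "Evaluate BODY once for each \"** Name: tag ...\" line of a string.
SPEC is (NAME-VAR TAGS-VAR TEXT): NAME-VAR is bound to the name and
TAGS-VAR to the list of tags of the current line of TEXT."
  (declare (indent 1))
  (let ((line (make-symbol "line"))
        (parts (make-symbol "parts")))
    `(dolist (,line (split-string ,(nth 2 spec) "\n" t))
       (let* ((,parts (split-string ,line ":"))
              ;; "** Diego" -> "Diego"
              (,(nth 0 spec)
               (string-trim (string-remove-prefix "**" (car ,parts))))
              (,(nth 1 spec) (split-string (or (cadr ,parts) ""))))
         ,@body))))

(defun my-tag-counts (text)
  "Return an alist of (TAG . COUNT) for TEXT, most used tags first."
  (let ((counts (make-hash-table :test 'equal))
        result)
    (my-do-entries (_name tags text)
      (dolist (tag tags)
        (puthash tag (1+ (gethash tag counts 0)) counts)))
    (maphash (lambda (tag n) (push (cons tag n) result)) counts)
    (sort result (lambda (a b) (> (cdr a) (cdr b))))))
```

The macro keeps the line parsing in one place, so other functions (inverting the list, filtering by tag) can reuse it with a different body.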