Text processing with elisp

Posted on 2024-08-16 19:37:19

Since I've converted to the Church of Emacs, I've been trying to do everything from inside it, and I was wondering how to do some text processing quickly and efficiently with it.

As an example, let's take this list that I was editing some minutes ago on org-mode.

** Diego: b QI
** bruno-gil: b QI
** Koma: jo
** um: rsrs pr0n
** FelipeAugusto: esp
** GustavoPupo: pinto tr etc
** GP: lit gtk
** Alan: jo mil pc
** Jost: b hq jo 1997
** Herbert: b rsrs pr0n
** Andre: maia mil pseudo
** Rodrigo: c
** caue: b rsrs 7arte pseudo
** kenny: cri gif
** daniel: gtk mu pr0n rsrs b
** tony: an 1997 esp
** Vitor: b jo mimimi
** raphael: b rpg 7arte
** Luca: b lit gnu pc prog mmu 7arte 1997
** LZZ: an qt
** William: b an jo pc 1997
** Epic: gtk
** Aldo: b pseudo pol mil fur
** GustavoKyon: an gtk
** CarlosIsaksen : an hq jo 7arte gtk 1997
** Peter: pseudo pol mil est 1997 gtk lit lang
** leandro: b jo cb
** frederico: 7arte lit gtk
** rol: b an pseudo mimimi 7arte
** mathias: jo lit
** henrique: 1997 h gtk qt
** eumané: an qt
** walrus: cri de
** FilipePinheiro: lit pseudo
** Igor: pseudo b
** Erick: b jo rpg q 1997 gtk
** Gabriel: pr0n rsrs qt
** george: clo mimimi
** anão: hq jo 1997 rsrs clô b
** jeff: 7arte gtk
** davidatenas:  an 7arte 1997 esp qt
** HHahaah: b 
** Eduardo: b

It is a list of names associated with tags, and I want to get a list of tags associated with names.

In bash, I would first echo the whole pasted list in single quotes and pipe it to awk, looping over each line, adding each of its parts to the right temporary variable, and then tweaking the script until the output looked the way I wanted.

echo '** Diego: b QI
** bruno-gil: b QI
** Koma: jo
** um: rsrs pr0n
** FelipeAugusto: esp
** GustavoPupo: pinto, tr etc
** GP: lit gtk
** Alan: jo mil pc
** Jost: b hq jo 1997
** Herbert: b rsrs pr0n
** Andre: maia mil pseudo
** Rodrigo: c
** caue: b rsrs 7arte pseudo
** kenny: cri gif
** daniel: gtk mu pr0n rsrs b
** tony: an 1997 esp
** Vitor: b jo mimimi
** raphael: b rpg 7arte
** Luca: b lit gnu pc prog mmu 7arte 1997
** LZZ: an qt
** William: b an jo pc 1997
** Epic: gtk
** Aldo: b pseudo pol mil fur
** GustavoKyon: an gtk
** CarlosIsaksen : an hq jo 7arte gtk 1997
** Peter: pseudo pol mil est 1997 gtk lit lang
** leandro: b jo cb
** frederico: 7arte lit gtk
** rol: b an pseudo mimimi 7arte
** mathias: jo lit
** henrique: 1997 h gtk qt
** eumané: an qt
** walrus: cri de
** FilipePinheiro: lit pseudo
** Igor: pseudo b
** Erick: b jo rpg q 1997 gtk
** Gabriel: pr0n rsrs qt
** george: clo mimimi
** anão: hq jo 1997 rsrs clô b
** jeff: 7arte gtk
** davidatenas:  an 7arte 1997 esp qt
** HHahaah: b
** Eduardo: b
' | awk '{sub(":","");for (i=3;i<=NF;i++) members[$i] = members[$i] " " $2}; END{for (j in members) print j ": " members[j]}' | sort

... and TA-DA! The expected output in less than 2 minutes, done in an intuitive and incremental way. Can you show me how to do something like this in elisp, preferably in an emacs buffer, with elegance and simplicity?

Thanks!


Comments (6)

少女情怀诗 2024-08-23 19:37:19

The first thing I would do is to take advantage of org-mode's tag support. Instead of

** Diego: b QI

You would have

** Diego                          :b:QI:

Which org-mode recognizes as the tags "b" and "QI".

To transform your current format to the standard org-mode format, you can use
the following (assuming the buffer with your source is called "asdf")

(with-current-buffer "asdf"
  (beginning-of-buffer)
  (replace-string " " ":")
  (beginning-of-buffer)
  (replace-string "**:" "** ")
  (beginning-of-buffer)
  (replace-string "::" " :")
  (beginning-of-buffer)
  (replace-string "\n" ":\n")
  (org-set-tags-command t t))

It's not pretty or efficient, but it gets the job done.
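
The same conversion can also be done in a single pass with re-search-forward and replace-match, which is the usual way to do replacements from Lisp code. This is only a sketch: the function name my/line-to-org-tags is made up for illustration, and it assumes every line looks like `** NAME: tag tag ...`. It only rewrites the text and does not align the tags the way org-mode would.

```elisp
(defun my/line-to-org-tags ()
  "Rewrite `** NAME: tag tag' lines in the current buffer as `** NAME :tag:tag:'.
Hypothetical helper; assumes one name, a colon, then whitespace-separated tags."
  (goto-char (point-min))
  (while (re-search-forward "^\\*\\* \\([^:\n]+?\\) *: *\\(.*\\)$" nil t)
    (let ((name (match-string 1))
          ;; `split-string' clobbers the match data that `replace-match' needs.
          (tags (save-match-data (split-string (match-string 2)))))
      (replace-match
       (if tags
           (format "** %s :%s:" name (mapconcat #'identity tags ":"))
         (format "** %s" name))
       t t))))
```

Evaluate it, switch to the buffer holding the list, and run it with M-: (my/line-to-org-tags).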

After that, you can use the following to produce a buffer that has the format
you wanted from the shell script:

(let ((results (get-buffer-create "results"))
      tags)
  ;; Collect every distinct tag used in "asdf".
  (with-current-buffer "asdf"
    (goto-char (point-min))
    (while (org-at-heading-p)
      (mapc (lambda (item)
              (when (and item (not (member item tags)))
                (push item tags)))
            (org-get-local-tags))
      (outline-next-visible-heading 1)))
  (setq tags (sort tags #'string<))
  ;; For each tag, list the headings that carry it.
  (with-current-buffer results
    (erase-buffer)
    (mapc (lambda (item)
            (insert (format "%s: %s\n"
                            item
                            (with-current-buffer "asdf"
                              (org-map-entries '(substring-no-properties (org-get-heading t)) item)))))
          tags)
    ;; Strip the parentheses left over from printing the lists.
    (goto-char (point-min))
    (while (re-search-forward "[()]" nil t)
      (replace-match ""))))

This puts the results in a buffer called "results", creating it if it doesn't
already exist. Basically, it is collecting all the tags in the buffer "asdf",
sorting them, then looping through each tag and searching for each headline with
that tag in "asdf" and inserting it to "results".

With a bit of cleaning up, this could be made into a function; basically just
replacing "asdf" and "results" with arguments. If you need that done, I can do
that.

旧人哭 2024-08-23 19:37:19

There is a function shell-command-on-region that pretty much does what it says. You can highlight a region, do M-|, type a shell command, and the region's text is piped to that command. Give it a prefix argument and the region is replaced with the output of the command.

For a trivial example, highlight a region, type 'C-u 0 M-| wc' (control-u, zero, meta-pipe and then 'wc') and the region will be replaced with the number of lines, words and characters in that region.

Another thing you can do is figure out how to manipulate one line, make it a macro, and then run the macro repeatedly. For example, 'C-x ( C-s foo RET bar C-x )' will search for the word "foo", then type the word "bar", changing it to "foobar". (End the search with RET rather than C-g, since C-g would also abort the macro definition.) You can then do 'C-u 0 C-x e' once, which will keep running the macro until it doesn't find any more occurrences of "foo".
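
The same function can be called from Lisp, which lets you keep an existing shell or awk pipeline and still work on a buffer. A minimal sketch, assuming a Unix-like shell is available; the wrapper name my/filter-buffer-through is invented for this example:

```elisp
(defun my/filter-buffer-through (command)
  "Replace the current buffer's accessible text with the output of shell COMMAND.
Hypothetical wrapper around `shell-command-on-region'."
  ;; Fourth argument nil: no separate output buffer.
  ;; Fifth argument t: replace the region with the command's output.
  (shell-command-on-region (point-min) (point-max) command nil t))
```

With the list in the current buffer, M-: (my/filter-buffer-through "sort") sorts it in place, and the awk one-liner from the question could be passed the same way.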

溇涏 2024-08-23 19:37:19

Ok, here is my first attempt in elisp:

  1. I start a buffer with elisp and paredit modes on, open double quotes and paste the text
  2. I bind it to a symbol using let
(let ((foobar "** Diego: b QI
** bruno-gil: b QI
** Koma: jo
** um: rsrs pr0n
** FelipeAugusto: esp
** GustavoPupo: pinto, tr etc
** GP: lit gtk
** Alan: jo mil pc
** Jost: b hq jo 1997
** Herbert: b rsrs pr0n
** Andre: maia mil pseudo
** Rodrigo: c
** caue: b rsrs 7arte pseudo
** kenny: cri gif
** daniel: gtk mu pr0n rsrs b
** tony: an 1997 esp
** Vitor: b jo mimimi
** raphael: b rpg 7arte
** Luca: b lit gnu pc prog mmu 7arte 1997
** LZZ: an qt
** William: b an jo pc 1997
** Epic: gtk
** Aldo: b pseudo pol mil fur
** GustavoKyon: an gtk
** CarlosIsaksen : an hq jo 7arte gtk 1997
** Peter: pseudo pol mil est 1997 gtk lit lang
** leandro: b jo cb
** frederico: 7arte lit gtk
** rol: b an pseudo mimimi 7arte
** mathias: jo lit
** henrique: 1997 h gtk qt
** eumané: an qt
** walrus: cri de
** FilipePinheiro: lit pseudo
** Igor: pseudo b
** Erick: b jo rpg q 1997 gtk
** Gabriel: pr0n rsrs qt
** george: clo mimimi
** anão: hq jo 1997 rsrs clô b
** jeff: 7arte gtk
** davidatenas:  an 7arte 1997 esp qt
** HHahaah: b 
** Eduardo: b 
"))
  foobar)

Now I change foobar to something fancy.

  3. First I remove the symbols with a regexp and split the text into strings using (split-string)
  4. Then I do a mapcar to turn each line into a list of words
(mapcar #'(lambda (y) (split-string y " " t)) (split-string (replace-regexp-in-string "[:\*]" "" foobar) "\n" t))
  5. Then I create a hash table and bind it to temphash: ((temphash (make-hash-table :test 'equal)))
  6. And then I loop over the nested lists to add the elements to the hash table. I think I'm not supposed to do non-functional programming with mapcar, but nobody is looking ;)
(mapcar #'(lambda (l)
              (mapcar #'(lambda (m) (puthash m (format "%s %s" (car l) (let ((tempel (gethash m temphash)))
                                                            (if tempel tempel ""))) temphash)) (cdr l)))
          (mapcar #'(lambda (y) (split-string y " " t)) (split-string (replace-regexp-in-string "[:\*]" "" foobar) "\n" t)))
  7. Finally, I extract the elements from the hash table into another set of nested lists with a handy function stolen from Xah Lee's webpage,
  8. And then I pretty-print it to another buffer with M-x pp-eval-last-sexp

It's a little mind-bending, especially the double mapcar, but it sorta works. Here is the full "code":

;; Stolen from Xah Lee's page

(defun hash-to-list (hashtable)
  "Return a list that represents the hashtable."
  (let (mylist)
    (maphash (lambda (kk vv) (setq mylist (cons (list kk vv) mylist))) hashtable)
    mylist))

;; Code

(let ((foobar "** Diego: b QI
** bruno-gil: b QI
** Koma: jo
** um: rsrs pr0n
** FelipeAugusto: esp
** GustavoPupo: pinto, tr etc
** GP: lit gtk
** Alan: jo mil pc
** Jost: b hq jo 1997
** Herbert: b rsrs pr0n
** Andre: maia mil pseudo
** Rodrigo: c
** caue: b rsrs 7arte pseudo
** kenny: cri gif
** daniel: gtk mu pr0n rsrs b
** tony: an 1997 esp
** Vitor: b jo mimimi
** raphael: b rpg 7arte
** Luca: b lit gnu pc prog mmu 7arte 1997
** LZZ: an qt
** William: b an jo pc 1997
** Epic: gtk
** Aldo: b pseudo pol mil fur
** GustavoKyon: an gtk
** CarlosIsaksen : an hq jo 7arte gtk 1997
** Peter: pseudo pol mil est 1997 gtk lit lang
** leandro: b jo cb
** frederico: 7arte lit gtk
** rol: b an pseudo mimimi 7arte
** mathias: jo lit
** henrique: 1997 h gtk qt
** eumané: an qt
** walrus: cri de
** FilipePinheiro: lit pseudo
** Igor: pseudo b
** Erick: b jo rpg q 1997 gtk
** Gabriel: pr0n rsrs qt
** george: clo mimimi
** anão: hq jo 1997 rsrs clô b
** jeff: 7arte gtk
** davidatenas:  an 7arte 1997 esp qt
** HHahaah: b 
** Eduardo: b 
")
      (temphash (make-hash-table :test 'equal)))
  ;; `mapc' instead of `mapcar': the loops run only for their side effects.
  (mapc (lambda (l)
          (mapc (lambda (m)
                  (puthash m
                           (format "%s %s" (car l) (or (gethash m temphash) ""))
                           temphash))
                (cdr l)))
        (mapcar (lambda (y) (split-string y " " t))
                (split-string (replace-regexp-in-string "[:*]" "" foobar) "\n" t)))
  (hash-to-list temphash))

And here is the output:

(("clô" "anão ")
 ("clo" "george ")
 ("q" "Erick ")
 ("de" "walrus ")
 ("h" "henrique ")
 ("cb" "leandro ")
 ("lang" "Peter ")
 ("est" "Peter ")
 ("fur" "Aldo ")
 ("pol" "Peter Aldo ")
 ("qt" "davidatenas Gabriel eumané henrique LZZ ")
 ("mmu" "Luca ")
 ("prog" "Luca ")
 ("gnu" "Luca ")
 ("rpg" "Erick raphael ")
 ("mimimi" "george rol Vitor ")
 ("an" "davidatenas eumané rol CarlosIsaksen GustavoKyon William LZZ tony ")
 ("mu" "daniel ")
 ("gif" "kenny ")
 ("cri" "walrus kenny ")
 ("7arte" "davidatenas jeff rol frederico CarlosIsaksen Luca raphael caue ")
 ("c" "Rodrigo ")
 ("pseudo" "Igor FilipePinheiro rol Peter Aldo caue Andre ")
 ("maia" "Andre ")
 ("1997" "davidatenas anão Erick henrique Peter CarlosIsaksen William Luca tony Jost ")
 ("hq" "anão CarlosIsaksen Jost ")
 ("pc" "William Luca Alan ")
 ("mil" "Peter Aldo Andre Alan ")
 ("gtk" "jeff Erick henrique frederico Peter CarlosIsaksen GustavoKyon Epic daniel GP ")
 ("lit" "FilipePinheiro mathias frederico Peter Luca GP ")
 ("etc" "GustavoPupo ")
 ("tr" "GustavoPupo ")
 ("pinto," "GustavoPupo ")
 ("esp" "davidatenas tony FelipeAugusto ")
 ("pr0n" "Gabriel daniel Herbert um ")
 ("rsrs" "anão Gabriel daniel caue Herbert um ")
 ("jo" "anão Erick mathias leandro CarlosIsaksen William Vitor Jost Alan Koma ")
 ("QI" "bruno-gil Diego ")
 ("b" "Eduardo HHahaah anão Erick Igor rol leandro Aldo William Luca raphael Vitor daniel caue Herbert Jost bruno-gil Diego "))
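
For comparison, the double mapcar can be flattened into two nested dolist loops that collect names in lists instead of concatenated strings. This is a sketch rather than a drop-in replacement: the function name my/invert-tags is made up, and unlike the code above it returns a sorted alist with names in their original order.

```elisp
(defun my/invert-tags (text)
  "Return a sorted alist of (TAG . NAMES) from `** NAME: TAG...' lines in TEXT.
Hypothetical helper written for illustration."
  (let ((table (make-hash-table :test 'equal))
        result)
    (dolist (line (split-string text "\n" t))
      ;; Drop stars and colons, then split the line into words.
      (let* ((words (split-string (replace-regexp-in-string "[:*]" "" line) " " t))
             (name (car words)))
        ;; Every word after the name is a tag: record the name under it.
        (dolist (tag (cdr words))
          (puthash tag (cons name (gethash tag table)) table))))
    ;; Names were pushed newest first, so reverse them to restore input order.
    (maphash (lambda (tag names) (push (cons tag (reverse names)) result)) table)
    (sort result (lambda (a b) (string< (car a) (car b))))))
```

Calling (my/invert-tags "** Diego: b QI\n** Koma: jo\n") gives one entry per tag, each followed by the names that carry it.
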
鸠魁 2024-08-23 19:37:19

If you know *nix pipes, then you are familiar with functional programming, because functional programming treats programs as successive transformations of data using application of functions. Remember function composition from school math? Basically, g ∘ f means that you first apply f and then immediately apply g: (g ∘ f)(x) = g(f(x)). A functional program is one giant function composition. And a pipe is just a function composition, just with an opposite direction: (g ∘ f)(x) in math is the same as x | f | g in command line.

There is a third-party library dash.el that provides a multitude of functions for list and tree transformations and also functions and macros that ease the functional approach. One of them is a threading macro ->>, which mimics command line piping:

(->> '(1 2 3) (-map '1+) (-reduce '+)) ; returns 9
;; equivalent to (-reduce '+ (-map '1+ '(1 2 3)))
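
If installing dash.el is not an option, Emacs 25 and later ship a similar macro, thread-last, in the built-in subr-x library. A small sketch of the same pipeline using only built-ins; the function name my/sum-of-increments is invented for the example:

```elisp
(require 'subr-x) ; provides `thread-last' in Emacs 25 and later

(defun my/sum-of-increments (numbers)
  "Add 1 to each element of NUMBERS and return the sum of the results."
  (thread-last numbers
    (mapcar #'1+)   ; (1 2 3) becomes (2 3 4)
    (apply #'+)))   ; sum the incremented list
```

(my/sum-of-increments '(1 2 3)) returns 9, the same value as the dash version.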

So if we want to manipulate text data by serially applying operations, our function may look like this:

(defun key-value-swap (s)
  (->> s
       nil ; Split into lines
       nil ; Remove stars from each line
       nil ; Split each line
       nil ; Add 1st element as a value to each element starting from
           ; 2nd as keys
       nil ; Return a hash-table
       ))

The function that does exactly what you want would then look like this:

(require 'dash) ; ->>, --map, -each
(require 's)    ; s-lines, s-split, s-chop-prefix (s.el, also third-party)

(defun key-value-swap (s)
  (let ((h (make-hash-table :test 'equal)))
    (->> s
         s-lines ; split into lines
         (--map (s-split "\\(\\s-\\|:\\)" ; split each line
                         (s-chop-prefix "** " it) ; throw away stars
                         t))
         (--map (-each (cdr it) ; for every field in the line, except 1st
                  (lambda (k) ; prepend 1st field to the list under key
                    (puthash k (cons (car it) (gethash k h)) h)))))
    h)) ; return hash-table

(puthash k (cons (car it) (gethash k h)) h) looks cryptic, but it simply means that under each key in the hash table there is a list, to which you prepend every time you find a new value. So if under b there is (Diego) and we find that bruno-gil should be under b too, the value under b becomes (bruno-gil Diego).
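
That accumulation step can be tried on its own with nothing but built-ins. A tiny sketch, with the function name my/demo-accumulate made up for the demonstration:

```elisp
(defun my/demo-accumulate ()
  "Store two names under the key \"b\" and return the resulting list."
  (let ((h (make-hash-table :test 'equal)))
    ;; First value: `gethash' returns nil, so the list becomes ("Diego").
    (puthash "b" (cons "Diego" (gethash "b" h)) h)
    ;; Second value is consed onto the front of the existing list.
    (puthash "b" (cons "bruno-gil" (gethash "b" h)) h)
    (gethash "b" h)))
```

Evaluating (my/demo-accumulate) returns ("bruno-gil" "Diego"), with the most recent name first.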

风苍溪 2024-08-23 19:37:19

The previous alternatives are interesting, but I do not believe they capture the "how would I do this in Emacs as a recent convert" aspect of the question. I suspect someone learning Emacs with an eye to using Emacs Lisp to do the whole job might start out with something like:

(defun create-tags-to-name (buffer-name)
  "Fill buffer BUFFER-NAME with `** TAG: LIST-OF-NAMES' lines.
The lines are built by transposing every line between point and
the end of the current buffer that matches the format
`** NAME: LIST-OF-TAGS', where the list items are whitespace
separated."
  (interactive "BDestination buffer: ")
  (let ((buf (get-buffer-create buffer-name))
        (tag-to-name-list (list))
        name tags element)
    ;; Clear the destination buffer
    (with-current-buffer buf
      (erase-buffer))
    ;; Build the list of tag to name associations.  The name pattern
    ;; accepts any character except a colon, so accented names match too.
    (while (re-search-forward "^\\*\\* \\([^:\n]+\\):\\(.+\\)$" nil t)
      (setq name (buffer-substring (match-beginning 1) (match-end 1))
            tags (split-string (buffer-substring (match-beginning 2) (match-end 2))))
      ;; For each tag add the name to the tag's name list
      (while tags
        (let ((tag (car tags)))
          (setq element (assoc tag tag-to-name-list)
                tags (cdr tags))
          (if element
              (setcdr element (append (list name) (cdr element)))
            (setq tag-to-name-list (append (list (cons tag (list name))) tag-to-name-list))))))
    ;; Dump the associations to the target buffer
    (with-current-buffer buf
      (while tag-to-name-list
        (setq element (car tag-to-name-list)
              tag-to-name-list (cdr tag-to-name-list))
        (insert (concat "** " (car element) ":"))
        (let ((tag-list (cdr element)))
          (while tag-list
            (insert " " (car tag-list))
            (setq tag-list (cdr tag-list))))
        (insert "\n")))))
沉溺在你眼里的海 2024-08-23 19:37:19

This is my second attempt. I wrote a little macro and some functions to deal with such data.

(defun better-numberp (s)
  (string-match "^ *[0-9.,]* *$" s))

;; Note: the rule bodies are run with `eval' and read the variables `%'
;; and `%r', so this relies on dynamic binding (`lexical-binding' nil).
(defmacro awk-like (&rest args)
  (let ((arg (car (last args)))
        (calls (mapcar #'(lambda (l)
                           (cond
                            ((numberp (car l)) (cons `(lambda (f) (equal %r ,(car l))) (cdr l)))
                            ((stringp (car l)) (cons `(lambda (f) (string-match ,(car l) %)) (cdr l)))
                            (t l)))
                       (butlast args))))
    `(mapcar #'(lambda (%%)
                 (let ((%r 0))
                   (mapcar
                    #'(lambda (l)
                        (setq %r (1+ %r))
                        (let ((% l))
                          (dolist (tipo ',calls)
                            (progn
                              (setq % (cond
                                       ((funcall (car tipo) %) (eval (cadr tipo))) (t %)))
                              (set (intern (format "%%%d" %r)) %))) %)) %%)))
             (mapcar #'(lambda (y) (split-string y " " t))
                     (split-string ,arg "\n" t)))))

(defun hash-to-list (hashtable)
  "Return a list that represents the hashtable."
  (let (mylist)
    (maphash (lambda (kk vv) (setq mylist (cons (list kk vv) mylist))) hashtable)
    mylist))

(defun append-hash (key value hashtable)
  (let ((current (gethash key hashtable)))
    (puthash key 
             (cond
              ((null current) (list value))
              ((listp current) (cons value current))
              (t current)) 
             hashtable)))

(let ((foohash (make-hash-table :test 'equal)))
  (awk-like
   (2 (replace-regexp-in-string ":" "" %))
   ((lambda (f) (> %r 2))  (append-hash % %2 foohash))
   "** Diego: b QI
** bruno-gil: b QI
** Koma: jo
** um: rsrs pr0n
** FelipeAugusto: esp
** GustavoPupo: pinto tr etc
** GP: lit gtk
** Alan: jo mil pc
** Jost: b hq jo 1997
** Herbert: b rsrs pr0n
** Andre: maia mil pseudo
** Rodrigo: c
** caue: b rsrs 7arte pseudo
** kenny: cri gif
** daniel: gtk mu pr0n rsrs b
** tony: an 1997 esp
** Vitor: b jo mimimi
** raphael: b rpg 7arte
** Luca: b lit gnu pc prog mmu 7arte 1997
** LZZ: an qt
** William: b an jo pc 1997
** Epic: gtk
** Aldo: b pseudo pol mil fur
** GustavoKyon: an gtk
** CarlosIsaksen: an hq jo 7arte gtk 1997
** Peter: pseudo pol mil est 1997 gtk lit lang
** leandro: b jo cb
** frederico: 7arte lit gtk
** rol: b an pseudo mimimi 7arte
** mathias: jo lit
** henrique: 1997 h gtk qt
** eumané: an qt
** walrus: cri de
** FilipePinheiro: lit pseudo
** Igor: pseudo b
** Erick: b jo rpg q 1997 gtk
** Gabriel: pr0n rsrs qt
** george: clo mimimi
** anão: hq jo 1997 rsrs clô b
** jeff: 7arte gtk
** davidatenas:  an 7arte 1997 esp qt
** HHahaah: b
** Eduardo: b
")
  (hash-to-list foohash))