How do I extract XML processing instructions in Emacs Lisp?

Posted on 2025-01-10 22:30:38

I would like to extract the processing instructions (particularly xml-model) from an XML file, but neither (n)xml-parse-file nor libxml-parse-xml-region recognizes processing instructions.

Is there a clean way to extract processing instructions, or do I have to search for PIs with a regex?

Edit: here is a first draft of the functionality I am looking for:

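(require 'cl-lib)
(require 'subr-x) ; for `string-join' on older Emacs versions

(cl-defun extract-processing-instructions (&rest processing-instructions)
  "Return the XML processing instructions in the current buffer as strings.
PROCESSING-INSTRUCTIONS are regexps matched against the start of the PI
target; with no arguments, every processing instruction is returned."
  (interactive)
  (let ((pi-re
         ;; Non-greedy body, so two PIs on the same line are not merged.
         (format "<\\?\\(%s\\).*?\\?>"
                 (string-join processing-instructions "\\|")))
        (result))
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward pi-re nil t)
        (push (match-string 0) result)))
    (nreverse result)))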

(cl-defun pi-str2sexp (pi-str)
  "Convert the processing instruction string PI-STR to a sexp.
The result has the shape (TARGET ((ATTR . VALUE) ...)), in the style
of xml-parse-*."
  (let (sexp attr-alist)
    (save-match-data
      ;; Get and push the PI target.  `intern' (rather than `make-symbol')
      ;; yields symbols that `eq' and `assq' can find later.
      (string-match "<\\?\\([[:alnum:]-]+\\)" pi-str)
      (push (intern (match-string 1 pi-str)) sexp)
      ;; Construct the attribute alist; values may contain spaces.
      (while (string-match "\\([[:alnum:]-]+\\)=\"\\([^\"]*\\)\""
                           pi-str (match-end 0))
        (push (cons (intern (match-string 1 pi-str))
                    (match-string 2 pi-str))
              attr-alist)))
    ;; Finally, push the attribute alist and return the sexp.
    (push (nreverse attr-alist) sexp)
    (nreverse sexp)))

Edit 2: It turns out that advising, or generally building upon, xml-parse-* for this (as suggested by @Tom Regner) is a huge pain. :(

What I came up with is a context manager; the idea was to use it as around advice for xml-parse-tag-1, which is at the heart of xml-parse-* (stand-alone use is of course also an option):

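(defun --replace-first-group (regex-replace-alist)
  "Apply each replacement in REGEX-REPLACE-ALIST to the whole current buffer.
Each entry has the form (REGEXP REPLACEMENT)."
  (save-excursion
    (dolist (expression regex-replace-alist)
      (goto-char (point-min))
      ;; `replace-regexp' is meant for interactive use, so loop explicitly.
      (while (re-search-forward (car expression) nil t)
        (replace-match (cadr expression))))))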

(cl-defmacro with-parsable-pi (buffer &body body)
  "Context manager that treats xml processing instructions in BUFFER as normal elements."
  (declare (indent defun))
  ;; Use an uninterned symbol so BODY cannot clash with the binding.
  (let ((old-buffer (make-symbol "old-buffer")))
    `(let ((,old-buffer ,buffer))
       (with-temp-buffer
         (insert-buffer-substring ,old-buffer)
         (goto-char (point-min))
         (--replace-first-group '(("\\(\\?\\)>" "/>") ("<\\(\\?\\)" "<")))
         ,@body))))

This allows calls such as

(with-parsable-pi (current-buffer)
  (xml-parse-tag-1))

so it is at least possible to get one element at a time; but since the XML exposed inside the context manager isn't actually valid, and xml-parse-* (rightfully) signals an error when it encounters invalid XML, it isn't possible to process more than one element at a time.

I was thinking of maybe introducing a pseudo root element or something, but the kludge spiral is ghastly enough as it is.

Another idea, of course, would be to run an XPath query to extract the processing instructions. If only there were a solid XPath solution in Emacs Lisp...
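
As far as I know, Emacs has no built-in XPath engine, so one way to try the XPath idea is to hand the buffer to an external tool. The following is only a sketch: it assumes the xmllint command-line program from libxml2 is installed and on the PATH, and the function name my-xpath-processing-instructions is made up for illustration. It pipes the buffer to xmllint and returns the raw output as one string, or nil when xmllint exits with a non-zero status (no match, or a parse error).

```elisp
(require 'subr-x) ; for `string-trim' on older Emacs versions

(defun my-xpath-processing-instructions (&optional buffer)
  "Return the processing instructions in BUFFER as one string, or nil.
Hypothetical helper: relies on the external `xmllint' program."
  (unless (executable-find "xmllint")
    (error "xmllint not found on PATH"))
  (with-current-buffer (or buffer (current-buffer))
    (let ((source (current-buffer)))
      (with-temp-buffer
        (let* ((output (current-buffer))
               (status
                (with-current-buffer source
                  ;; The trailing "-" makes xmllint read from stdin.
                  ;; (list output nil) keeps stdout and discards stderr.
                  (call-process-region
                   (point-min) (point-max) "xmllint"
                   nil (list output nil) nil
                   "--xpath" "//processing-instruction()" "-"))))
          ;; A non-zero status means no match or a parse error.
          (when (eql status 0)
            (string-trim (buffer-string))))))))
```

The result is xmllint's own serialization of the matched nodes, so it would still have to be split and passed through something like pi-str2sexp to get sexps.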

Comments (1)

私藏温柔 2025-01-17 22:30:38

Ok, I think I found a satisfactory solution: xmltok-forward-prolog!
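
To get a feel for what xmltok-forward-prolog reports before any filtering, here is a minimal sketch that runs it on a string and returns only the region types. The helper name my-prolog-region-types and the sample document in the note below are made up for illustration; xmltok itself ships with nxml-mode.

```elisp
(require 'xmltok)

(defun my-prolog-region-types (xml-string)
  "Return the type symbols xmltok reports for the prolog of XML-STRING.
Hypothetical helper for exploring `xmltok-forward-prolog'."
  (with-temp-buffer
    (insert xml-string)
    (goto-char (point-min))
    ;; Each region is a vector [TYPE START END]; keep only TYPE.
    (mapcar (lambda (region) (aref region 0))
            (xmltok-forward-prolog))))
```

Calling it on a document such as "<?xml-model href=\"a.rnc\"?>\n<root/>" should yield a list that contains processing-instruction-left and processing-instruction-right, which are the two region types the code in this answer filters for and merges.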

So here is the code I came up with for extracting processing instructions:

(require 'cl-lib)
(require 'seq)
(require 'xmltok) ; part of nxml-mode; provides `xmltok-forward-prolog'

(cl-defun filter-xmltok-prolog (&optional (buffer (current-buffer))
                                          (filter-re "processing-instruction-.*"))
  "Return the prolog regions of BUFFER whose type matches FILTER-RE.
Runs `xmltok-forward-prolog' in BUFFER and keeps the vectors whose
index 0 (the type symbol) matches FILTER-RE.  Returns a list of vectors."
  (with-current-buffer buffer
    (save-excursion
      (goto-char (point-min))
      (let ((raw-prolog-data (xmltok-forward-prolog)))
        (seq-filter
         (lambda (x)
           (string-match-p filter-re (symbol-name (aref x 0))))
         raw-prolog-data)))))

(cl-defun --merge-pi-data (pi-data)
  "Merge processing-instruction-left/-right pairs in PI-DATA.
PI-DATA is meant to be data filtered with `filter-xmltok-prolog' against
\"processing-instruction-.*\".  Returns a list of vectors holding the
start and end positions of each processing instruction at index 1 and 2."
  (let ((left (car pi-data))
        (right (cadr pi-data)))
    (cond
     ((null pi-data) nil)
     (t (cons
         (vector 'processing-instruction
                 (aref left 1) (aref right 2))
         (--merge-pi-data (cddr pi-data)))))))

;; test
(--merge-pi-data '([processing-instruction-left 40 51] [processing-instruction-right 52 126]))

(cl-defun pi-str2s-exp (pi-str)
  "Convert the processing instruction string PI-STR to a sexp.
The result has the shape (TARGET ((ATTR . VALUE) ...)), in the style
of xml-parse-*."
  (let (sexp attr-alist)
    (save-match-data
      ;; Get and push the PI target.  `intern' (rather than `make-symbol')
      ;; yields symbols that `eq' and `assq' can find later.
      (string-match "<\\?\\([[:alnum:]-]+\\)" pi-str)
      (push (intern (match-string 1 pi-str)) sexp)
      ;; Construct the attribute alist; values may contain spaces.
      (while (string-match "\\([[:alnum:]-]+\\)=\"\\([^\"]*\\)\""
                           pi-str (match-end 0))
        (push (cons (intern (match-string 1 pi-str))
                    (match-string 2 pi-str))
              attr-alist)))
    ;; Finally, push the attribute alist and return the sexp.
    (push (nreverse attr-alist) sexp)
    (nreverse sexp)))

(cl-defun get-processing-instructions (&optional (buffer (current-buffer)))
  "Extract the processing instructions from BUFFER.
Returns a list of sexp representations in the style of xml-parse-*."
  ;; Read the substrings from BUFFER, which need not be the current buffer.
  (with-current-buffer buffer
    (mapcar #'pi-str2s-exp
            (mapcar (lambda (v)
                      (buffer-substring (aref v 1) (aref v 2)))
                    (--merge-pi-data (filter-xmltok-prolog buffer))))))


(cl-defun test/get-pis-from-file (file)
  (with-temp-buffer
    (insert-file-contents file)
    (get-processing-instructions)))

(test/get-pis-from-file "~/some/xml/file.xml")

I'm not at all an Emacs Lisp expert and this isn't at all tested thoroughly, but it works for now! :)
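
One limitation worth keeping in mind: xmltok-forward-prolog stops at the end of the prolog, so processing instructions inside or after the root element are not covered by the code above. A possible way to collect those is to keep tokenizing with xmltok-forward once the prolog has been skipped. This is only a sketch, not a thoroughly tested solution; the function name is hypothetical, and it assumes that xmltok-forward leaves the symbol processing-instruction in xmltok-type for a PI token and nil at the end of the buffer.

```elisp
(require 'xmltok)

(defun my-instance-processing-instructions (&optional buffer)
  "Return the PIs after the prolog of BUFFER as a list of strings.
Hypothetical helper built on `xmltok-forward'."
  (with-current-buffer (or buffer (current-buffer))
    (save-excursion
      (let (result)
        ;; Skip the prolog first; its PIs are handled separately.
        (goto-char (point-min))
        (xmltok-forward-prolog)
        ;; `xmltok-forward' scans one token, moves point past it and
        ;; records the token start and type in `xmltok-start' and
        ;; `xmltok-type'.
        (while (and (not (eobp))
                    (progn (xmltok-forward) xmltok-type))
          (when (eq xmltok-type 'processing-instruction)
            (push (buffer-substring-no-properties xmltok-start (point))
                  result)))
        (nreverse result)))))
```

The strings it returns have the same form as those handed to pi-str2s-exp, so the two result lists could be combined if PIs from the whole document are needed.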
