How do I extract XML processing instructions in Emacs Lisp?
I would like to extract the processing instructions (particularly xml-model) from an XML file; however, neither (n)xml-parse-file nor libxml-parse-xml-region recognizes processing instructions.
Is there a clean way to extract processing instructions or do I have to regex search for PIs?
edit: Here is a first draft of the functionality I was looking for:
(cl-defun extract-processing-instructions (&rest processing-instructions)
  "Return XML processing instructions in the current buffer as a list of strings.
If PROCESSING-INSTRUCTIONS (target names) are given, return only those."
  (interactive)
  (let ((pi-re
         ;; Non-greedy body, so two PIs on one line are not merged into one
         ;; match; \\(?:.\\|\n\\) also lets a PI span several lines.
         (format "<\\?\\(?:%s\\)\\(?:.\\|\n\\)*?\\?>"
                 (if processing-instructions
                     (regexp-opt processing-instructions)
                   "[^ \t\n?]+")))
        (result))
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward pi-re nil t)
        (push (match-string 0) result)))
    (nreverse result)))
(cl-defun pi-str2sexp (pi-str)
  "Transform the processing instruction string PI-STR into a sexp.
The result is (TARGET ((ATTR . VALUE) ...)), in the style of xml-parse-*."
  (let (sexp attr-alist)
    (save-match-data
      ;; get and push pi-element-name
      (string-match "<\\?\\([[:alnum:]_:.-]+\\)" pi-str)
      ;; `intern' rather than `make-symbol', so that the symbols are `eq'
      ;; to the ones produced by xml-parse-*
      (push (intern (match-string 1 pi-str)) sexp)
      ;; construct attribute alist; values may contain spaces
      (while (string-match "\\([[:alnum:]_:.-]+\\)=\"\\([^\"]*\\)\""
                           pi-str (match-end 0))
        (push (cons (intern (match-string 1 pi-str))
                    (match-string 2 pi-str))
              attr-alist)))
    ;; finally: push attr alist and return sexp
    (push (nreverse attr-alist) sexp)
    (nreverse sexp)))
edit 2: It turns out that advising, or generally building upon, xml-parse-* for this (as suggested by @Tom Regner) is a huge pain. :(
What I came up with is a context manager; the idea was to use it as around-advice for xml-parse-tag-1, which is at the heart of xml-parse-* (stand-alone use is of course also an option):
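(cl-defmacro --replace-first-group (regex-replace-alist)
  "Replace all matches of each regexp in REGEX-REPLACE-ALIST in the current buffer.
Each entry has the form (REGEXP REPLACEMENT)."
  `(save-excursion
     (dolist (expression ,regex-replace-alist)
       (goto-char (point-min))
       ;; `replace-regexp' is intended for interactive use; loop explicitly
       (while (re-search-forward (car expression) nil t)
         (replace-match (cadr expression) t t)))))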
(cl-defmacro with-parsable-pi (buffer &body body)
  "Context manager that treats xml processing instructions in BUFFER as normal elements."
  (declare (indent defun))
  ;; uninterned symbol, so BODY cannot accidentally capture the binding
  (let ((old-buffer (make-symbol "old-buffer")))
    `(let ((,old-buffer ,buffer))
       (with-temp-buffer
         (insert-buffer-substring ,old-buffer)
         (goto-char (point-min))
         (--replace-first-group '(("\\(\\?\\)>" "/>") ("<\\(\\?\\)" "<")))
         ,@body))))
This allows, for example, calls like
(with-parsable-pi (current-buffer)
  (xml-parse-tag-1))
so it is at least possible to get an element at a time; but since the XML exposed in the context manager isn't actually valid and xml-parse-* (rightfully) errors if invalid XML is encountered, it isn't possible to process more than one element at a time.
I was thinking of maybe introducing a pseudo root element or something, but the kludge spiral is ghastly enough as it is.
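For illustration only, the pseudo root idea could look roughly like the sketch below. The function name my-parse-with-pseudo-root and the wrapper element pseudo-root are hypothetical, and the textual rewrite is naive: it assumes there is no DOCTYPE, that no PI contains a "?" in its data, and that "<?" never appears inside comments or CDATA.

```elisp
(require 'xml)

(defun my-parse-with-pseudo-root (xml-string)
  "Parse XML-STRING with PIs rewritten as empty elements under a pseudo root.
Hypothetical helper; `pseudo-root' is an arbitrary wrapper element name."
  (with-temp-buffer
    (insert xml-string)
    ;; Naively rewrite "<?target data?>" into "<target data/>".
    (goto-char (point-min))
    (while (re-search-forward "<\\?\\([^?]*\\)\\?>" nil t)
      (replace-match "<\\1/>" t))
    ;; Wrap everything so the buffer has exactly one top-level element.
    (goto-char (point-min))
    (insert "<pseudo-root>")
    (goto-char (point-max))
    (insert "</pseudo-root>")
    ;; `xml-parse-region' returns a list of top-level nodes; take the wrapper.
    (car (xml-parse-region (point-min) (point-max)))))
```

The former PIs can then be picked out of the wrapper node with xml-get-children, e.g. (xml-get-children (my-parse-with-pseudo-root text) 'xml-model). Note that the XML declaration also shows up as an ordinary xml element with this approach.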
Another idea, of course, would be to run an XPath query to extract the processing instructions. If only there were a solid XPath solution in Emacs Lisp...
Comments (1)
OK, I think I found a satisfactory solution: xmltok-forward-prolog! So here is the code I came up with for extracting processing instructions:
I'm not at all an Emacs Lisp expert and this isn't at all tested thoroughly, but it works for now! :)
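The answer's own code is not included in this copy. As a rough sketch of how xmltok-forward-prolog could be used for this (not the answerer's implementation), the function below relies on its documented return value, a list of [TYPE START END] vectors, and assumes that every processing instruction in the prolog contributes one processing-instruction-left and one processing-instruction-right region. The name my-prolog-processing-instructions is hypothetical.

```elisp
(require 'cl-lib)
(require 'xmltok)

(defun my-prolog-processing-instructions ()
  "Return the processing instructions in the current buffer's prolog as strings.
Hypothetical helper; assumes each PI yields one `processing-instruction-left'
and one `processing-instruction-right' region from `xmltok-forward-prolog'."
  (save-excursion
    (goto-char (point-min))
    ;; Bind xmltok's global state locally so the call leaves it untouched.
    (let (xmltok-errors xmltok-dtd starts ends)
      (dolist (region (xmltok-forward-prolog))
        (pcase (aref region 0)
          ;; Left part begins at "<?", so record its START.
          ('processing-instruction-left (push (aref region 1) starts))
          ;; Right part runs up to and including "?>", so record its END.
          ('processing-instruction-right (push (aref region 2) ends))))
      ;; Pair the n-th start with the n-th end, in buffer order.
      (cl-mapcar #'buffer-substring-no-properties
                 (sort starts #'<)
                 (sort ends #'<)))))
```

This only looks at the prolog, i.e. everything before the root element, which is where xml-model normally lives. The XML declaration is reported under its own region type and is therefore not expected in the result.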