Regex for indenting an XML file
Is it possible to write a regex (search and replace) that, when run on an XML string, outputs that string nicely indented?
If so, what's the regex? :)
Comments (7)
No.
Use an XML parser to read the string, then an XML serialiser to write it back out in 'pretty' mode.
Each XML processor has its own options, so it depends on the platform, but here is the somewhat long-winded way that works on DOM Level 3 LS-compliant implementations:
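The listing for the DOM Level 3 LS approach is not reproduced here. As a rough stand-in, the same parse-then-serialise idea can be sketched in Python with the standard library's `xml.dom.minidom`. Note that minidom is not a DOM Level 3 LS implementation, and `pretty_print` is a name made up for this sketch.

```python
from xml.dom import minidom


def pretty_print(xml_text: str, indent: str = "  ") -> str:
    """Parse xml_text, then serialise it back out with indentation."""
    # Parsing rejects malformed input, which a regex replace would not notice.
    document = minidom.parseString(xml_text)
    # toprettyxml() adds a newline and indentation around each element node.
    return document.toprettyxml(indent=indent)


if __name__ == "__main__":
    print(pretty_print("<a><b>text</b><c/></a>"))
```

One caveat: minidom keeps whitespace-only text nodes that are already in the input, so running this on XML that is already indented tends to produce extra blank lines.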
Doing this would be far, far simpler if you didn't use a regex. In fact, I'm not even sure it's possible with a regex.
Most languages have XML libraries that would make this task very simple. What language are you using?
I don't know whether a regex, in isolation, could pretty-print arbitrary XML input. You would need a program applying the regex to find a tag, locate the matching closing tag (if the tag is not self-closing), and so on. Using a regex here is really using the wrong tool for the job. The simplest way to pretty-print XML is to use an XML parser: read the XML in, set the appropriate serialization options, and serialize it back out.
Why do you want to use a regex to solve this problem?
Using a regex for this will be a nightmare. Keeping track of the indentation level based on the hierarchy of the nodes will be almost impossible. Perhaps Perl 5.10's regular expression engine might help, since it's now reentrant. But let's not go down that road... Besides, you will need to take into account CDATA sections, which can embed XML declarations that need to be ignored by the indentation and preserved intact.
Stick with DOM. As suggested in the other answer, some libraries already provide a function that will indent a DOM tree for you. If not, building one will be much simpler than creating and maintaining the regexes that would do the same task.
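To give a feel for how small such a hand-built indenter can be, here is a minimal sketch in Python. It uses `xml.etree.ElementTree` rather than a W3C DOM, and `indent_tree` with its parameters is an illustrative name, not a library function.

```python
import xml.etree.ElementTree as ET


def indent_tree(element: ET.Element, level: int = 0, unit: str = "  ") -> None:
    """Rewrite whitespace-only .text/.tail so each child sits on its own line."""
    children = list(element)
    if not children:
        return  # leaf element: leave its text exactly as it is
    child_pad = "\n" + unit * (level + 1)
    # Only whitespace-only text is replaced, so real character data survives.
    if not (element.text or "").strip():
        element.text = child_pad
    for child in children:
        indent_tree(child, level + 1, unit)
        if not (child.tail or "").strip():
            child.tail = child_pad
    # The last child is followed by the parent's closing tag, one level up.
    last = children[-1]
    if not (last.tail or "").strip():
        last.tail = "\n" + unit * level


if __name__ == "__main__":
    root = ET.fromstring("<a><b>text</b><c><d/></c></a>")
    indent_tree(root)
    print(ET.tostring(root, encoding="unicode"))
```

The recursion depth is the indentation level, which is exactly the bookkeeping that is so awkward to do in a regex. One limitation of this particular sketch: ElementTree turns CDATA sections into ordinary text when parsing, so they come back out escaped rather than as CDATA.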
The dark voodoo regexp described here works great.
http://www.perlmonks.org/?node_id=261292
Its main advantage over using XML::LibXML and others is that it's an order of magnitude faster.
This would only be achievable with multiple regexes, which would act like a state machine.
What you are looking for is far better suited to an off-the-cuff parser.
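A minimal sketch of that idea in Python: a regex only splits the input into tokens, and a depth counter in the surrounding loop plays the role of the state. `TOKEN` and `indent_with_regex` are names invented here. The sketch assumes well-formed input with no `>` inside attribute values and no DOCTYPE internal subset, and it moves text onto its own line, which changes whitespace in mixed content.

```python
import re

# A token is a comment, a CDATA section, a processing instruction or XML
# declaration, any other tag, or a run of text between tags.
TOKEN = re.compile(
    r"<!--.*?-->|<!\[CDATA\[.*?\]\]>|<\?.*?\?>|<[^>]+>|[^<]+", re.DOTALL
)


def indent_with_regex(xml_text: str, unit: str = "  ") -> str:
    lines = []
    depth = 0  # the only state: how many elements are currently open
    for token in TOKEN.findall(xml_text):
        if not token.strip():
            continue  # skip whitespace-only text between tags
        is_markup = token.startswith("<")
        is_special = token.startswith(("<!", "<?"))
        if token.startswith("</"):
            depth -= 1  # closing tag: dedent before printing it
            lines.append(unit * depth + token)
        elif is_markup and not is_special and not token.endswith("/>"):
            lines.append(unit * depth + token)
            depth += 1  # opening tag: indent everything that follows
        else:
            # Self-closing tag, comment, CDATA, declaration, or text.
            lines.append(unit * depth + token.strip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(indent_with_regex("<a><b>text</b><c/></a>"))
```

Even this small version needs separate alternatives for comments and CDATA so that markup inside them is not counted as tags, which is where a single search-and-replace pattern runs out of road.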
From this link: