Regex to strip tags while keeping CDATA

Posted on 2024-09-06 07:57:28

Possible Duplicate:
RegEx match open tags except XHTML self-contained tags

Hi all,

I know how everyone loves a regex question, so here is mine. I have an XML tree in which some nodes contain CDATA. How do I return just a single string containing the text, with the tags stripped?

Let's look at an example:

<xml>
  <node>I'm plain text.</node>
  <node><![CDATA[I'm text in cdata... and may contain html, <strong>yikes!</strong>]]></node>
</xml>

This would return:

I'm plain text. I'm text in cdata... and may contain html, yikes!

I've read the advice against parsing a non-regular language like XML with regular expressions, but I'm sure this is doable. What do you reckon, guys?
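
For what it's worth, the transformation described above can be sketched with two substitutions. This is only a rough illustration under assumptions: Python is chosen arbitrarily (the question names no language), and `strip_tags_keep_cdata` is a hypothetical helper name. It unwraps CDATA sections first and then drops anything that looks like a tag. It does not decode entities, and it will misbehave on comments, on a `>` inside an attribute value, and on other XML that a real parser handles.

```python
import re


def strip_tags_keep_cdata(xml: str) -> str:
    """Quick-and-dirty text extraction; not a real XML parser."""
    # Unwrap CDATA sections so their contents are treated like ordinary text.
    unwrapped = re.sub(r"<!\[CDATA\[(.*?)\]\]>", r"\1", xml, flags=re.DOTALL)
    # Replace anything that looks like a tag with a space. This also removes
    # the HTML tags that were inside the CDATA section.
    no_tags = re.sub(r"<[^>]+>", " ", unwrapped)
    # Collapse runs of whitespace (including newlines) into single spaces.
    return " ".join(no_tags.split())


sample = """<xml>
  <node>I'm plain text.</node>
  <node><![CDATA[I'm text in cdata... and may contain html, <strong>yikes!</strong>]]></node>
</xml>"""

print(strip_tags_keep_cdata(sample))
```

Replacing each tag with a space keeps adjacent nodes from running together, at the cost of splitting a word that has inline markup in the middle of it.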

Thanks,
Kevin

EDIT: This was a problem that needed a quick and dirty solution to deal with a few lines of XML. I was surprised at the initial flat refusal, but from further reading (in particular from links provided later on) I see that experienced programmers know it's something that should be avoided wherever possible. Live and learn. Thanks.
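
For comparison, here is a minimal sketch of the parser-based route that the edit above alludes to, using only the Python standard library. Python is an assumption (the question names no language), and `html_to_text` and `extract_text` are hypothetical helper names. `xml.etree.ElementTree` hands back CDATA content as ordinary text, and each text node is then run through `html.parser.HTMLParser` to drop any embedded HTML tags. Treating every text node as an HTML fragment is itself an assumption: it means entities in plain text nodes get decoded a second time.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects only the character data of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(fragment: str) -> str:
    # A fresh parser per fragment keeps broken markup in one node
    # from affecting the next one.
    collector = _TextCollector()
    collector.feed(fragment)
    collector.close()
    return "".join(collector.chunks)


def extract_text(xml: str) -> str:
    root = ET.fromstring(xml)
    # itertext() yields every text node in document order; CDATA content
    # arrives as plain text with the wrapper already removed.
    pieces = [html_to_text(text) for text in root.itertext()]
    # Join the nodes and collapse whitespace into single spaces.
    return " ".join(" ".join(pieces).split())


sample = """<xml>
  <node>I'm plain text.</node>
  <node><![CDATA[I'm text in cdata... and may contain html, <strong>yikes!</strong>]]></node>
</xml>"""

print(extract_text(sample))
```

On the sample from the question this should print the single string shown there, and unlike a pure regex it fails loudly (`ET.ParseError`) when the XML is not well-formed.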
