Mechanism for removing specific tags (but keeping their content) from an XHTML document?
I would like a brief and easy way to strip tags from an XHTML document, and I believe there must be something concise enough among the options: XSLT, XPath, XQuery, or custom C# programming using the .NET XML namespace. I'm open to others.
For example, I want to strip all <b> tags from an XHTML document but keep their inner content and child tags (i.e. not simply skip the bold tag and its children).
I need to maintain the structure of the original document minus the stripped tags.
Thoughts:
I've seen XSLT's ability to match elements for selection; however I want to match everything by default with a couple of exceptions, and I'm unsure it's conducive to this. This is what I'm looking at right now.
XQuery I haven't started to look into. (Update for XQuery: Took a brief look at this technology and it's comparable enough to SQL in function that I fail to see how it can maintain the nested node structure of the original document - I think this is not a contender).
A custom C#/.NET XML namespace program might be viable, as I already have an idea for it, but my immediate assumption is that it would be more involved than using one of the XML-specific matching languages that were created for this kind of task.
... another kind of enabling technology I haven't yet considered...
Have you thought of XSLT? It is the language specifically designed for transforming XML and tree structures in general.
This transformation:
when applied to any XHTML document, such as the one below:
produces the wanted, correct result, in this case:
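The underlying idea is to copy every node unchanged except the stripped element, whose children are promoted into its place. As a rough illustration of that idea only (this is not the XSLT from this answer), here is a minimal sketch using Python's standard-library xml.dom.minidom. The function name strip_tags and the sample markup are assumptions made up for the example, and the sketch assumes the stripped tag is never the document's root element.

```python
from xml.dom import minidom


def strip_tags(xhtml: str, tag: str = "b") -> str:
    """Remove every <tag> element but keep its children in place.

    Hypothetical helper for illustration; assumes well-formed XML input
    and that <tag> is not the root element of the document.
    """
    doc = minidom.parseString(xhtml)
    # Copy the result into a list first, since the tree is mutated below.
    for node in list(doc.getElementsByTagName(tag)):
        parent = node.parentNode
        # Move each child in front of the element, preserving order.
        # insertBefore detaches the child from its old parent first.
        while node.firstChild is not None:
            parent.insertBefore(node.firstChild, node)
        # The element is now empty and can be dropped.
        parent.removeChild(node)
    return doc.documentElement.toxml()


if __name__ == "__main__":
    sample = "<p>Hello <b>bold <i>and italic</i></b> world</p>"
    print(strip_tags(sample))
```

Called on the sample above, the <i> element and all text survive in their original order; only the <b> wrapper goes away. An XSLT stylesheet would express the same thing declaratively, with a copy-everything rule plus one overriding rule for the stripped element.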