How to retain custom entities such as &ndash; while transforming XML using XSLT
I'm trying to transform XML using XSLT. Here is the example I'm using.
Input XML:
<!DOCTYPE printArtifactGroup [<!ENTITY ndash "&#38;#38;ndash;">]>
<group>
  <begin>
    <head>
      <text>(VOLS 0200)</text>
    </head>
    <data>
      <text>Health 161&ndash;1 to 16&ndash;32&ndash;End 2006</text>
    </data>
  </begin>
</group>
XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output encoding="utf8" method="xml" indent="yes" />
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="begin">
    <xsl:copy-of select="data" />
  </xsl:template>
</xsl:stylesheet>
Python code to run the XSLT transformation:
import lxml.etree as ET
from lxml import etree
parser = etree.XMLParser(load_dtd=True, resolve_entities=True, huge_tree=True)
dom = ET.parse('test.xml', parser)
xslt = ET.parse('test.xsl')
transform = ET.XSLT(xslt)
newdom = transform(dom)
processed_file = 'processed.xml'
with open(processed_file, 'w') as file:
    file.write(str(newdom))
print(newdom)
print('Task Done')
Actual output:
<?xml version="1.0"?>
<group>
  <data>
    <text>Health 161&amp;ndash;1 to 16&amp;ndash;32&amp;ndash;End 2006</text>
  </data>
</group>
Expected output:
<group>
  <begin>
    <data>
      <text>Health 161&ndash;1 to 16&ndash;32&ndash;End 2006</text>
    </data>
  </begin>
</group>
The XML parser resolves the & (ampersand) entity to &amp;, so our custom entity &ndash; is converted to &amp;ndash;.
This is the default behavior, but we have a huge XML file, and comparing the transformed XML with the source is difficult when entities are changed.
Is there any way to generate the expected output while retaining the original entities?
Thanks in advance; any ideas or suggestions are really appreciated.
The (modern) XSLT way (XSLT 2 and later, available to Python through the Python API of SaxonC) would be to use
and then an xsl:character-map, e.g.
That way the text element is output as e.g. <text>Health 161&ndash;1 to 16&ndash;32&ndash;End 2006</text>.
(Note: I have solely presented the entity/character map issue; I have not tried to implement the other part of your transformation in that sample.)
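For readers staying with lxml, whose XSLT engine (libxslt) implements XSLT 1.0 and so has no xsl:character-map, the same idea can be approximated in Python after serialization. The following is a minimal sketch, not the answer's original code: it assumes the DTD maps ndash to the real character U+2013, it uses the standard library's ElementTree as a stand-in for the actual transformation, and the names CHARACTER_MAP and apply_character_map are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the entity is declared as the real en dash (U+2013), so the
# parser produces ordinary text that any transformation can copy unchanged.
SOURCE = (
    '<!DOCTYPE group [<!ENTITY ndash "&#x2013;">]>'
    "<group><data>"
    "<text>Health 161&ndash;1 to 16&ndash;32&ndash;End 2006</text>"
    "</data></group>"
)

# Hypothetical character map: each key is written back as an entity reference.
CHARACTER_MAP = {"\u2013": "&ndash;"}


def apply_character_map(serialized, character_map):
    """Replace mapped characters in already-serialized XML text."""
    return serialized.translate(str.maketrans(character_map))


root = ET.fromstring(SOURCE)
# encoding="unicode" returns a str and leaves non-ASCII characters as they are.
serialized = ET.tostring(root, encoding="unicode")
result = apply_character_map(serialized, CHARACTER_MAP)
print(result)
```

Note that this maps every en dash in the output, including any typed literally in the source, and that the result only parses again if a DOCTYPE declaring ndash is written along with it.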
Here's a trick: Replace the internal entity definition with
where e is a new namespace prefix, and replace the match="begin" template with two templates:
During parsing of the source XML, this will replace &ndash;, which is a named character reference in HTML, with an XML element <ndash/>. And during XSLT processing, this element is replaced with a sequence of characters. The disable-output-escaping prevents the XSLT processor from outputting this as &amp;ndash;.
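The marker-element idea can be tried without an XSLT processor. The sketch below is an illustration under assumptions, not the answer's stylesheet: it uses only the Python standard library, swaps the disable-output-escaping step for a small hand-written serializer, assumes the document uses no other namespaces, and the namespace URI urn:example:entities and the function serialize are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Hypothetical namespace URI reserved for the marker elements.
ENTITY_NS = "urn:example:entities"

# The entity expands to an empty marker element instead of text.
SOURCE = (
    "<!DOCTYPE group ["
    "<!ENTITY ndash \"<e:ndash xmlns:e='urn:example:entities'/>\">"
    "]>"
    "<group><data>"
    "<text>Health 161&ndash;1 to 16&ndash;32&ndash;End 2006</text>"
    "</data></group>"
)


def serialize(element):
    """Serialize a tree, writing each marker element as an entity reference."""
    marker_prefix = "{" + ENTITY_NS + "}"
    if element.tag.startswith(marker_prefix):
        # {urn:example:entities}ndash becomes &ndash;
        return "&" + element.tag[len(marker_prefix):] + ";"
    attributes = "".join(
        " " + name + "=" + quoteattr(value)
        for name, value in element.attrib.items()
    )
    parts = ["<" + element.tag + attributes + ">", escape(element.text or "")]
    for child in element:
        parts.append(serialize(child))
        # Text following a child element is stored on the child as its tail.
        parts.append(escape(child.tail or ""))
    parts.append("</" + element.tag + ">")
    return "".join(parts)


root = ET.fromstring(SOURCE)
result = serialize(root)
print(result)
```

Because the markers are ordinary elements during processing, code or templates that copy nodes carry them along unchanged until serialization, which is the only point where they turn back into entity references.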